summaryrefslogtreecommitdiffstats
path: root/kernel/rcu
Commit message (Collapse)AuthorAgeFilesLines
...
| | | * rcu: Upgrade sync_exp_work_done() to smp_mb()Paul E. McKenney2019-06-131-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The sync_exp_work_done() function uses smp_mb__before_atomic(), but there is no obvious atomic in the ensuing code. The ordering is absolutely required for grace periods to work correctly, so this commit upgrades the smp_mb__before_atomic() to smp_mb(). Fixes: 6fba2b3767ea ("rcu: Remove deprecated RCU debugfs tracing code") Reported-by: Andrea Parri <andrea.parri@amarulasolutions.com> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| | | * rcu: Set a maximum limit for back-to-back callback invocationPaul E. McKenney2019-05-281-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, if a CPU has more than 10,000 callbacks pending, it will increase rdp->blimit to LONG_MAX. If you are lucky, LONG_MAX is only about two billion, but this is still a bit too many callbacks to invoke back-to-back while otherwise ignoring the world. This commit therefore sets a maximum limit of DEFAULT_MAX_RCU_BLIMIT, which is set to 10,000, for rdp->blimit. Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| | | * rcu: Correctly unlock root node in rcu_check_gp_start_stall()Neeraj Upadhyay2019-05-281-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On systems whose rcu_node tree has only one node, the rcu_check_gp_start_stall() function's values of rnp and rnp_root will be identical. In this case, it clearly does not make sense to release both rnp->lock and rnp_root->lock, but that is exactly what this function does in the last early exit. This commit therefore unlocks only rnp->lock when rnp and rnp_root are equal. Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| | | * rcu: Dump specified number of blocked tasksNeeraj Upadhyay2019-05-281-1/+1
| | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The dump_blkd_tasks() function dumps at most 10 blocked tasks, ignoring the value of the ncheck parameter. This commit therefore substitutes the value of ncheck for the hard-coded value of 10. Because all callers currently pass 10 as the number, this patch does not change behavior, but it is clearly an accident waiting to happen. Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| * | rcu: Remove unused rdp local from synchronize_rcu_expedited()Jiang Biao2019-05-281-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | Because rdp is initialized but never used in synchronize_rcu_expedited(), this commit removes it. Signed-off-by: Jiang Biao <benbjiang@tencent.com> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| * | rcu: Rename rcu_data's ->deferred_qs to ->exp_deferred_qsPaul E. McKenney2019-05-283-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | The rcu_data structure's ->deferred_qs field is used to indicate that the current CPU is blocking an expedited grace period (perhaps a future one). Given that it is used only for expedited grace periods, its current name is misleading, so this commit renames it to ->exp_deferred_qs. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| * | rcu: Add checks for dynticks counters in rcu_is_cpu_rrupt_from_idle()Joel Fernandes (Google)2019-05-281-4/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It would be good to combine the dynticks and dynticks_nesting counters in order to simplify the code. Unfortunately, there are concerns about usermode upcalls appearing to RCU as half of an interrupt, as Byungchul learned [1]. The "half" in "half interrupt" is due to an unpaired rcu_irq_enter(): Normally, each rcu_irq_enter() has a later call to rcu_irq_exit(). Out of an abundance of caution, Paul added warnings [2] in the RCU code which if not fired by 2021 will be interpreted as meaning that this half-interrupt scenario cannot happen any more, thus permitting simplification of this code. In the meantime, this commit makes the following changes: (1) Combining these two counters requires that rcu_rrupt_from_idle() is invoked only from hard-interrupt contexts as discussed here [3]. This commit therefore adds the required lockdep_assert_in_irq() to check this constraint. (2) Furthermore, rcu_rrupt_from_idle() is not explicit about how it is using the counters which can lead to weird future bugs. This commit therefore adds comments indicating the meaning and use of each counter. (3) Lastly, this commit checks for counter underflows as another check that half interrupts don't occur. (Previously, the function would simply return true upon underflow.) All these checks checks are NOOPs if PROVE_LOCKING (and thus PROVE_RCU) are disabled. [1] https://lore.kernel.org/patchwork/patch/952349/ [2] Commit e11ec65cc8d6 ("rcu: Add warning to detect half-interrupts") [3] https://lore.kernel.org/lkml/20190312150514.GB249405@google.com/ Cc: byungchul.park@lge.com Cc: kernel-team@android.com Cc: rcu@vger.kernel.org Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| * | rcu: Avoid self-IPI in sync_sched_exp_online_cleanup()Paul E. McKenney2019-05-251-6/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The sync_sched_exp_online_cleanup() is invoked at online time to handle the case where the start of an expedited grace period ran concurrently with a CPU being taken offline and then immediately being placed online. It checks to see if RCU needs an expedited quiescent state from the incoming CPU, sending it an IPI if so. However, it is quite possible that sync_sched_exp_online_cleanup() is running on that CPU, in which case it is considerably less overhead to simply request the quiescent state locally instead of simulating a self-IPI. This commit therefore places the last few lines of rcu_exp_handler() into a new rcu_exp_need_qs() function, which is invoked both by rcu_exp_handler() and by sync_sched_exp_online_cleanup() in the self-IPI case. This also reduces the rcu_exp_handler() function's state space by removing the direct call that this smp_call_function_single() uses to emulate the requested self-IPI. This in turn will allow tighter error checking in rcu_is_cpu_rrupt_from_idle(). Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
| * | rcu: Avoid self-IPI in sync_rcu_exp_select_node_cpus()Paul E. McKenney2019-05-251-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Although sync_rcu_exp_select_node_cpus() treats the current CPU as being in a quiescent state, it might well migrate to some other CPU before reaching the smp_call_function_single(), which could then result in an unnecessary simulated self-IPI. This commit therefore instead simply refuses to invoke smp_call_function_single() on the current CPU, which causes the later rcu_report_exp_cpu_mult() to report this CPU's quiescent state with less overhead. This also reduces the rcu_exp_handler() function's state space by removing the direct call that this smp_call_function_single() uses to emulate the requested self-IPI. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> [ paulmck: Use get_cpu() instead of preempt_disable() per Joel Fernandes. ]
| * | rcu: Inline invoke_rcu_callbacks() into its sole remaining callerPaul E. McKenney2019-05-251-17/+3
| | | | | | | | | | | | | | | | | | | | | This commit saves a few lines of code by inlining invoke_rcu_callbacks() into its sole remaining caller. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| * | rcu: Use irq_work to get scheduler's attention in clean contextPaul E. McKenney2019-05-252-0/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When rcu_read_unlock_special() is invoked with interrupts disabled, is either not in an interrupt handler or is not using RCU_SOFTIRQ, is not the first RCU read-side critical section in the chain, and either there is an expedited grace period in flight or this is a NO_HZ_FULL kernel, the end of the grace period can be unduly delayed. The reason for this is that it is not safe to do wakeups in this situation. This commit fixes this problem by using the irq_work subsystem to force a later interrupt handler in a clean environment. Because set_tsk_need_resched(current) and set_preempt_need_resched() are invoked prior to this, the scheduler will force a context switch upon return from this interrupt (though perhaps at the end of any interrupted preempt-disable or BH-disable region of code), which will invoke rcu_note_context_switch() (again in a clean environment), which will in turn give RCU the chance to report the deferred quiescent state. Of course, by then this task might be within another RCU read-side critical section. But that will be detected at that time and reporting will be further deferred to the outermost rcu_read_unlock(). See rcu_preempt_need_deferred_qs() and rcu_preempt_deferred_qs() for more details on the checking. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| * | rcu: Allow rcu_read_unlock_special() to raise_softirq() if in_irq()Paul E. McKenney2019-05-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When running in an interrupt handler, raise_softirq() and raise_softirq_irqoff() have extremely low overhead: They simply set a bit in a per-CPU mask, which is checked upon exit from that interrupt handler. Therefore, if rcu_read_unlock_special() is invoked within an interrupt handler and RCU_SOFTIRQ is in use, this commit make use of raise_softirq_irqoff() even if there is no expedited grace period in flight and even if this is not a nohz_full CPU. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| * | rcu: Only do rcu_read_unlock_special() wakeups if expeditedPaul E. McKenney2019-05-251-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, rcu_read_unlock_special() will do wakeups whenever it is safe to do so. However, wakeups are expensive, and they are only really needed when the just-ended RCU read-side critical section is blocking an expedited grace period (in which case speed is of the essence) or on a nohz_full CPU (where it might be a good long time before an interrupt arrives). This commit therefore checks for these conditions, and does the expensive wakeups only if doing so would be useful. Note it can be rather expensive to determine whether or not the current task (as opposed to the current CPU) is blocking the current expedited grace period. Doing so requires traversing the ->blkd_tasks list, which can be quite long. This commit therefore cheats: If the current task is on a given ->blkd_tasks list, and some task on that list is blocking the current expedited grace period, the code assumes that the current task is blocking that expedited grace period. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| * | rcu: Check for wakeup-safe conditions in rcu_read_unlock_special()Paul E. McKenney2019-05-251-5/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When RCU core processing is offloaded from RCU_SOFTIRQ to the rcuc kthreads, a full and unconditional wakeup is required to initiate RCU core processing. In contrast, when RCU core processing is carried out by RCU_SOFTIRQ, a raise_softirq() suffices. Of course, there are situations where raise_softirq() does a full wakeup, but these do not occur with normal usage of rcu_read_unlock(). The reason that full wakeups can be problematic is that the scheduler sometimes invokes rcu_read_unlock() with its pi or rq locks held, which can of course result in deadlock in CONFIG_PREEMPT=y kernels when rcu_read_unlock() invokes the scheduler. Scheduler invocations can happen in the following situations: (1) The just-ended reader has been subjected to RCU priority boosting, in which case rcu_read_unlock() must deboost, (2) Interrupts were disabled across the call to rcu_read_unlock(), so the quiescent state must be deferred, requiring a wakeup of the rcuc kthread corresponding to the current CPU. Now, the scheduler may hold one of its locks across rcu_read_unlock() only if preemption has been disabled across the entire RCU read-side critical section, which in the days prior to RCU flavor consolidation meant that rcu_read_unlock() never needed to do wakeups. However, this is no longer the case for any but the first rcu_read_unlock() following a condition (e.g., preempted RCU reader) requiring special rcu_read_unlock() attention. For example, an RCU read-side critical section might be preempted, but preemption might be disabled across the rcu_read_unlock(). The rcu_read_unlock() must defer the quiescent state, and therefore leaves the task queued on its leaf rcu_node structure. If a scheduler interrupt occurs, the scheduler might well invoke rcu_read_unlock() with one of its locks held. However, the preempted task is still queued, so rcu_read_unlock() will attempt to defer the quiescent state once more. When RCU core processing is carried out by RCU_SOFTIRQ, this works just fine: The raise_softirq() function simply sets a bit in a per-CPU mask and the RCU core processing will be undertaken upon return from interrupt. Not so when RCU core processing is carried out by the rcuc kthread: In this case, the required wakeup can result in deadlock. The initial solution to this problem was to use set_tsk_need_resched() and set_preempt_need_resched() to force a future context switch, which allows rcu_preempt_note_context_switch() to report the deferred quiescent state to RCU's core processing. Unfortunately for expedited grace periods, there can be a significant delay between the call for a context switch and the actual context switch. This commit therefore introduces a ->deferred_qs flag to the task_struct structure's rcu_special structure. This flag is initially false, and is set to true by the first call to rcu_read_unlock() requiring special attention, then finally reset back to false when the quiescent state is finally reported. Then rcu_read_unlock() attempts full wakeups only when ->deferred_qs is false, that is, on the first rcu_read_unlock() requiring special attention. Note that a chain of RCU readers linked by some other sort of reader may find that a later rcu_read_unlock() is once again able to do a full wakeup, courtesy of an intervening preemption: rcu_read_lock(); /* preempted */ local_irq_disable(); rcu_read_unlock(); /* Can do full wakeup, sets ->deferred_qs. */ rcu_read_lock(); local_irq_enable(); preempt_disable() rcu_read_unlock(); /* Cannot do full wakeup, ->deferred_qs set. */ rcu_read_lock(); preempt_enable(); /* preempted, >deferred_qs reset. */ local_irq_disable(); rcu_read_unlock(); /* Can again do full wakeup, sets ->deferred_qs. */ Such linked RCU readers do not yet seem to appear in the Linux kernel, and it is probably best if they don't. However, RCU needs to handle them, and some variations on this theme could make even raise_softirq() unsafe due to the possibility of its doing a full wakeup. This commit therefore also avoids invoking raise_softirq() when the ->deferred_qs set flag is set. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
| * | rcu: Enable elimination of Tree-RCU softirq processingSebastian Andrzej Siewior2019-05-253-134/+140
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some workloads need to change kthread priority for RCU core processing without affecting other softirq work. This commit therefore introduces the rcutree.use_softirq kernel boot parameter, which moves the RCU core work from softirq to a per-CPU SCHED_OTHER kthread named rcuc. Use of SCHED_OTHER approach avoids the scalability problems that appeared with the earlier attempt to move RCU core processing to from softirq to kthreads. That said, kernels built with RCU_BOOST=y will run the rcuc kthreads at the RCU-boosting priority. Note that rcutree.use_softirq=0 must be specified to move RCU core processing to the rcuc kthreads: rcutree.use_softirq=1 is the default. Reported-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> [ paulmck: Adjust for invoke_rcu_callbacks() only ever being invoked from RCU core processing, in contrast to softirq->rcuc transition in old mainline RCU priority boosting. ] [ paulmck: Avoid wakeups when scheduler might have invoked rcu_read_unlock() while holding rq or pi locks, also possibly fixing a pre-existing latent bug involving raise_softirq()-induced wakeups. ] Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
* / treewide: Add SPDX license identifier - Makefile/KconfigThomas Gleixner2019-05-212-0/+2
|/ | | | | | | | | | | | | | Add SPDX license identifiers to all Make/Kconfig files which: - Have no license information of any form These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Merge tag 'trace-v5.2' of ↵Linus Torvalds2019-05-152-11/+6
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "The major changes in this tracing update includes: - Removal of non-DYNAMIC_FTRACE from 32bit x86 - Removal of mcount support from x86 - Emulating a call from int3 on x86_64, fixes live kernel patching - Consolidated Tracing Error logs file Minor updates: - Removal of klp_check_compiler_support() - kdb ftrace dumping output changes - Accessing and creating ftrace instances from inside the kernel - Clean up of #define if macro - Introduction of TRACE_EVENT_NOP() to disable trace events based on config options And other minor fixes and clean ups" * tag 'trace-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (44 commits) x86: Hide the int3_emulate_call/jmp functions from UML livepatch: Remove klp_check_compiler_support() ftrace/x86: Remove mcount support ftrace/x86_32: Remove support for non DYNAMIC_FTRACE tracing: Simplify "if" macro code tracing: Fix documentation about disabling options using trace_options tracing: Replace kzalloc with kcalloc tracing: Fix partial reading of trace event's id file tracing: Allow RCU to run between postponed startup tests tracing: Fix white space issues in parse_pred() function tracing: Eliminate const char[] auto variables ring-buffer: Fix mispelling of Calculate tracing: probeevent: Fix to make the type of $comm string tracing: probeevent: Do not accumulate on ret variable tracing: uprobes: Re-enable $comm support for uprobe events ftrace/x86_64: Emulate call function while updating in breakpoint handler x86_64: Allow breakpoints to emulate call instructions x86_64: Add gap to int3 to allow for call emulation tracing: kdb: Allow ftdump to skip all but the last few entries tracing: Add trace_total_entries() / trace_total_entries_cpu() ...
| * rcu: validate arguments for rcu tracepointsYafang Shao2019-04-082-11/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When CONFIG_RCU_TRACE is not set, all these tracepoints are defined as do-nothing macro. We'd better make those inline functions that take proper arguments. As RCU_TRACE() is defined as do-nothing marco as well when CONFIG_RCU_TRACE is not set, so we can clean it up. Link: http://lkml.kernel.org/r/1553602391-11926-4-git-send-email-laoar.shao@gmail.com Reviewed-by: Paul E. McKenney <paulmck@linux.ibm.com> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* | Merge tag 'printk-for-5.2' of ↵Linus Torvalds2019-05-071-1/+1
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk Pull printk updates from Petr Mladek: - Allow state reset of printk_once() calls. - Prevent crashes when dereferencing invalid pointers in vsprintf(). Only the first byte is checked for simplicity. - Make vsprintf warnings consistent and inlined. - Treewide conversion of obsolete %pf, %pF to %ps, %pF printf modifiers. - Some clean up of vsprintf and test_printf code. * tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: lib/vsprintf: Make function pointer_string static vsprintf: Limit the length of inlined error messages vsprintf: Avoid confusion between invalid address and value vsprintf: Prevent crash when dereferencing invalid pointers vsprintf: Consolidate handling of unknown pointer specifiers vsprintf: Factor out %pO handler as kobject_string() vsprintf: Factor out %pV handler as va_format() vsprintf: Factor out %p[iI] handler as ip_addr_string() vsprintf: Do not check address of well-known strings vsprintf: Consistent %pK handling for kptr_restrict == 0 vsprintf: Shuffle restricted_pointer() printk: Tie printk_once / printk_deferred_once into .data.once for reset treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively lib/test_printf: Switch to bitmap_zalloc()
| * | treewide: Switch printk users from %pf and %pF to %ps and %pS, respectivelySakari Ailus2019-04-091-1/+1
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | %pF and %pf are functionally equivalent to %pS and %ps conversion specifiers. The former are deprecated, therefore switch the current users to use the preferred variant. The changes have been produced by the following command: git grep -l '%p[fF]' | grep -v '^\(tools\|Documentation\)/' | \ while read i; do perl -i -pe 's/%pf/%ps/g; s/%pF/%pS/g;' $i; done And verifying the result. Link: http://lkml.kernel.org/r/20190325193229.23390-1-sakari.ailus@linux.intel.com Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Cc: sparclinux@vger.kernel.org Cc: linux-um@lists.infradead.org Cc: xen-devel@lists.xenproject.org Cc: linux-acpi@vger.kernel.org Cc: linux-pm@vger.kernel.org Cc: drbd-dev@lists.linbit.com Cc: linux-block@vger.kernel.org Cc: linux-mmc@vger.kernel.org Cc: linux-nvdimm@lists.01.org Cc: linux-pci@vger.kernel.org Cc: linux-scsi@vger.kernel.org Cc: linux-btrfs@vger.kernel.org Cc: linux-f2fs-devel@lists.sourceforge.net Cc: linux-mm@kvack.org Cc: ceph-devel@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Acked-by: David Sterba <dsterba@suse.com> (for btrfs) Acked-by: Mike Rapoport <rppt@linux.ibm.com> (for mm/memblock.c) Acked-by: Bjorn Helgaas <bhelgaas@google.com> (for drivers/pci) Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Petr Mladek <pmladek@suse.com>
| |
| \
| \
| \
| \
| \
*-----. \ Merge branches 'consolidate.2019.04.09a', 'doc.2019.03.26b', ↵Paul E. McKenney2019-04-0912-826/+827
|\ \ \ \ \ | |_|_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'fixes.2019.03.26b', 'srcu.2019.03.26b', 'stall.2019.03.26b' and 'torture.2019.03.26b' into HEAD consolidate.2019.04.09a: Lingering RCU flavor consolidation cleanups. doc.2019.03.26b: Documentation updates. fixes.2019.03.26b: Miscellaneous fixes. srcu.2019.03.26b: SRCU updates. stall.2019.03.26b: RCU CPU stall warning updates. torture.2019.03.26b: Torture-test updates.
| | | | * rcuperf: Fix cleanup path for invalid perf_type stringsPaul E. McKenney2019-03-261-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the specified rcuperf.perf_type is not in the rcu_perf_init() function's perf_ops[] array, rcuperf prints some console messages and then invokes rcu_perf_cleanup() to set state so that a future torture test can run. However, rcu_perf_cleanup() also attempts to end the test that didn't actually start, and in doing so relies on the value of cur_ops, a value that is not particularly relevant in this case. This can result in confusing output or even follow-on failures due to attempts to use facilities that have not been properly initialized. This commit therefore sets the value of cur_ops to NULL in this case and inserts a check near the beginning of rcu_perf_cleanup(), thus avoiding relying on an irrelevant cur_ops value. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| | | | * rcutorture: Fix cleanup path for invalid torture_type stringsPaul E. McKenney2019-03-261-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the specified rcutorture.torture_type is not in the rcu_torture_init() function's torture_ops[] array, rcutorture prints some console messages and then invokes rcu_torture_cleanup() to set state so that a future torture test can run. However, rcu_torture_cleanup() also attempts to end the test that didn't actually start, and in doing so relies on the value of cur_ops, a value that is not particularly relevant in this case. This can result in confusing output or even follow-on failures due to attempts to use facilities that have not been properly initialized. This commit therefore sets the value of cur_ops to NULL in this case and inserts a check near the beginning of rcu_torture_cleanup(), thus avoiding relying on an irrelevant cur_ops value. Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| | | | * rcutorture: Fix expected forward progress duration in OOM notifierNeeraj Upadhyay2019-03-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The rcutorture_oom_notify() function has a misplaced close parenthesis that results in increasingly long delays in rcu_fwd_progress_check()'s checking for various RCU forward-progress problems. This commit therefore puts the parenthesis in the right place. Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| | | | * rcutorture: Remove ->ext_irq_conflict fieldPaul E. McKenney2019-03-261-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Back when there was a separate RCU-bh flavor, the ->ext_irq_conflict field was used to prevent executing local_bh_enable() while interrupts were disabled. However, there is no longer an RCU-bh flavor, so this commit removes the no-longer-needed ->ext_irq_conflict field. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| | | | * rcutorture: Make rcutorture_extend_mask() comment match the codePaul E. McKenney2019-03-261-1/+1
| |_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The code actually rarely uses more than one type of RCU read-side protection, as is actually desired given that we need some reasonable probability of preempting RCU read-side critical sections, which cannot happen with multiple types of protection. This comment therefore adjusts the comment. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| | | * rcu: Fix nohz status in stall warningNeeraj Upadhyay2019-03-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Documentation/RCU/stallwarn.txt file says that stall warnings print "D" if dyntick-idle processing is enabled, but the code in print_cpu_stall_fast_no_hz() prints "." instead. This commit therefore reverses the sense of the test to make the code match the documentation. Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| | | * rcu: Move forward-progress checkers into tree_stall.hPaul E. McKenney2019-03-263-166/+173
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit further consolidates stall-warning functionality by moving forward-progress checkers into kernel/rcu/tree_stall.h, updating a comment or two while in the area. More specifically, this commit moves show_rcu_gp_kthreads(), rcu_check_gp_start_stall(), rcu_fwd_progress_check(), sysrq_rcu, sysrq_show_rcu(), sysrq_rcudump_op, and rcu_sysrq_init() from kernel/rcu/tree.c to kernel/rcu/tree_stall.h. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| | | * rcu: Move irq-disabled stall-warning checking to tree_stall.hPaul E. McKenney2019-03-263-21/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The rcu_iw_handler() function's sole purpose in life is to indicate whether a stalled CPU had interrupts disabled, so it belongs in kernel/rcu/tree_stall.h. This commit therefore makes that move, clarifying its header comment while in the area. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| | | * rcu: Organize functions in tree_stall.hPaul E. McKenney2019-03-262-84/+97
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit does only code movement, removal of now-unneeded forward declarations, and addition of comments. It organizes the functions that implement RCU CPU stall warnings for normal grace periods into three categories: 1. Control of RCU CPU stall warnings, including computing timeouts. 2. Interaction of stall warnings with grace periods. 3. Actual printing of the RCU CPU stall-warning messages. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| | | * rcu: Move FAST_NO_HZ stall-warning code to tree_stall.hPaul E. McKenney2019-03-263-81/+80
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit further consolidates the stall-warning code by moving print_cpu_stall_info() and its helper functions along with zero_cpu_stall_ticks() to kernel/rcu/tree_stall.h. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| | | * rcu: Inline RCU stall-warning info helper functionsPaul E. McKenney2019-03-263-22/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The print_cpu_stall_info_begin() and print_cpu_stall_info_end() print a single character each onto the console, and are a holdover from a time when RCU CPU stall warning messages could be abbreviated using a long-gone Kconfig option. This commit therefore adds these single characters to already-printed strings in the calling functions, and then eliminates both print_cpu_stall_info_begin() and print_cpu_stall_info_end(). Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| | | * rcu: Move rcu_print_task_exp_stall() to tree_exp.hPaul E. McKenney2019-03-262-31/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because expedited CPU stall warnings are contained within the kernel/rcu/tree_exp.h file, rcu_print_task_exp_stall() should live there too. This commit carries out the required code motion. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| | | * rcu: Inline RCU task stall-warning helper functionsPaul E. McKenney2019-03-262-30/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The rcu_print_detail_task_stall(), rcu_print_task_stall_begin(), and rcu_print_task_stall_end() functions were defined to allow long-gone Kconfig options to provide an abbreviated RCU CPU stall warning printout. This commit saves a few lines of code by inlining them into their sole callers. While in the area, a useless call of rcu_print_detail_task_stall_rnp() on the root rcu_node structure was eliminated. If there is only one rcu_node structure, its tasks get printed twice, but if there are more, the root rcu_node structure is guaranteed to have an empty list of blocked tasks, hence the uselessness. (Long ago, root rcu_node structures with non-empty ->blkd_tasks lists could happen, but no longer.) Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| | | * rcu: Move RCU CPU stall-warning code out of tree.cPaul E. McKenney2019-03-263-294/+299
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit completes the process of consolidating the code for RCU CPU stall warnings for normal grace periods by moving the remaining such code from kernel/rcu/tree.c to kernel/rcu/tree_stall.h. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| | | * rcu: Move RCU CPU stall-warning code out of tree_plugin.hPaul E. McKenney2019-03-262-90/+95
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The RCU CPU stall-warning code for normal grace periods is currently scattered across two files, due to earlier Tiny RCU support for RCU CPU stall warnings and for old Kconfig options that have long since been retired. Given that it is hard for the lead RCU maintainer to find relevant stall-warning code, it would be good to consolidate it. This commit continues this process by moving stall-warning code from kernel/rcu/tree_plugin.c to a new kernel/rcu/tree_stall.h file. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| | | * rcu: Move RCU CPU stall-warning code out of update.cPaul E. McKenney2019-03-264-58/+66
| |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The RCU CPU stall-warning code for normal grace periods is currently scattered across three files, due to earlier Tiny RCU support for RCU CPU stall warnings and for old Kconfig options that have long since been retired. Given that it is hard for the lead RCU maintainer to find relevant stall-warning code, it would be good to consolidate it. This commit starts this process by moving stall-warning code from kernel/rcu/update.c to a new kernel/rcu/tree_stall.h file. Note that the definitions of rcu_cpu_stall_suppress and rcu_cpu_stall_timeout must remain in kernel/rcu/update.h to provide compatibility for kernel boot parameter lists. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| | * srcu: Remove cleanup_srcu_struct_quiesced()Paul E. McKenney2019-03-263-30/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The cleanup_srcu_struct_quiesced() function was added because NVME used WQ_MEM_RECLAIM workqueues and SRCU did not, which meant that NVME workqueues waiting on SRCU workqueues could result in deadlocks during low-memory conditions. However, SRCU now also has WQ_MEM_RECLAIM workqueues, so there is no longer a potential for deadlock. Furthermore, it turns out to be extremely hard to use cleanup_srcu_struct_quiesced() correctly due to the fact that SRCU callback invocation accesses the srcu_struct structure's per-CPU data area just after callbacks are invoked. Therefore, the usual practice of using srcu_barrier() to wait for callbacks to be invoked before invoking cleanup_srcu_struct_quiesced() fails because SRCU's callback-invocation workqueue handler might be delayed, which can result in cleanup_srcu_struct_quiesced() being invoked (and thus freeing the per-CPU data) before the SRCU's callback-invocation workqueue handler is finished using that per-CPU data. Nor is this a theoretical problem: KASAN emitted use-after-free warnings because of this problem on actual runs. In short, NVME can now safely invoke cleanup_srcu_struct(), which avoids the use-after-free scenario. And cleanup_srcu_struct_quiesced() is quite difficult to use safely. This commit therefore removes cleanup_srcu_struct_quiesced(), switching its sole user back to cleanup_srcu_struct(). This effectively reverts the following pair of commits: f7194ac32ca2 ("srcu: Add cleanup_srcu_struct_quiesced()") 4317228ad9b8 ("nvme: Avoid flush dependency in delete controller flow") Reported-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Bart Van Assche <bvanassche@acm.org>
| | * srcu: Check for in-flight callbacks in _cleanup_srcu_struct()Paul E. McKenney2019-03-261-0/+2
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If someone fails to drain the corresponding SRCU callbacks (for example, by failing to invoke srcu_barrier()) before invoking either cleanup_srcu_struct() or cleanup_srcu_struct_quiesced(), the resulting diagnostic is an ambiguous use-after-free diagnostic, and even then only if you are running something like KASAN. This commit therefore improves SRCU diagnostics by adding checks for in-flight callbacks at _cleanup_srcu_struct() time. Note that these diagnostics can still be defeated, for example, by invoking call_srcu() concurrently with cleanup_srcu_struct(). Which is a really bad idea, but sometimes all too easy to do. But even then, these diagnostics have at least some probability of catching the problem. Reported-by: Sagi Grimberg <sagi@grimberg.me> Reported-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Tested-by: Bart Van Assche <bvanassche@acm.org>
| * rcu: Correct READ_ONCE()/WRITE_ONCE() for ->rcu_read_unlock_specialPaul E. McKenney2019-03-262-3/+3
| | | | | | | | | | | | | | | | | | | | | | The task_struct structure's ->rcu_read_unlock_special field is only ever read or written by the owning task, but it is accessed both at process and interrupt levels. It may therefore be accessed using plain reads and writes while interrupts are disabled, but must be accessed using READ_ONCE() and WRITE_ONCE() or better otherwise. This commit makes a few adjustments to align with this discipline. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| * rcu: Fix typo in tree_exp.h commentPaul E. McKenney2019-03-261-1/+1
| | | | | | | | | | | | | | This commit changes a rcu_exp_handler() comment from rcu_preempt_defer_qs() to rcu_preempt_deferred_qs() in order to better match reality. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| * rcu: Eliminate redundant NULL-pointer checkPaul E. McKenney2019-03-261-5/+2
| | | | | | | | | | | | | | | | Because rcu_wake_cond() checks for a null task_struct pointer, there is no need for its callers to do so. This commit eliminates the redundant check. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| * rcu: Fix force_qs_rnp() header commentZhouyi Zhou2019-03-261-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, threads blocked on offlining CPUS were migrated to the root rcu_node structure, thus requiring RCU priority boosting on this structure. However, since commit d19fb8d1f3f6 ("rcu: Don't migrate blocked tasks even if all corresponding CPUs offline"), RCU does not migrate blocked tasks. Consequently, RCU no longer does RCU priority boosting on the root rcu_node structure as of commit 1be0085b515e ("rcu: Don't initiate RCU priority boosting on root rcu_node"). This commit therefore brings comments for the force_qs_rnp() function's header comment in line with this new no-root-boosting reality. Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com> [ paulmck: Also remove obsolete comment on suppressing new grace periods. ] Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| * rcu: Update jiffies_to_sched_qs and adjust_jiffies_till_sched_qs() commentsPaul E. McKenney2019-03-261-1/+2
| | | | | | | | | | | | | | | | This commit better documents the jiffies_to_sched_qs default-value strategy used by adjust_jiffies_till_sched_qs() Reported-by: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| * rcu: Default jiffies_to_sched_qs to jiffies_till_sched_qsNeeraj Upadhyay2019-03-261-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current code only calls adjust_jiffies_till_sched_qs() if jiffies_till_sched_qs is left at its default value, so when the jiffies_till_sched_qs kernel-boot parameter actually is specified, jiffies_to_sched_qs will be left with the value zero, which will result in useless slowdowns of cond_resched(). This commit therefore changes rcu_init_geometry() to unconditionally invoke adjust_jiffies_till_sched_qs(), which ensures that jiffies_to_sched_qs will be initialized in all cases, thus maintaining good cond_resched() performance. Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| * rcu: Fix self-wakeups for grace-period kthreadNeeraj Upadhyay2019-03-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current rcu_gp_kthread_wake() function uses in_interrupt() and thus does a self-wakeup from all interrupt contexts, including the pointless case where the GP kthread happens to be running with bottom halves disabled, along with the impossible case where the GP kthread is running within an NMI handler (you are not supposed to invoke rcu_gp_kthread_wake() from within an NMI handler. This commit therefore replaces the in_interrupt() with in_irq(), so that the self-wakeups happen only from handlers for hardware interrupts and softirqs. This also makes the code match the comment. Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * rcu: Report error for bad rcu_nocbs= parameter valuesPaul E. McKenney2019-03-261-2/+10
| | | | | | | | | | | | | | | | | | | | | | This commit prints a console message when cpulist_parse() reports a bad list of CPUs, and sets all CPUs' bits in that case. The reason for setting all CPUs' bits is that this is the safe(r) choice for real-time workloads, which would normally be the ones using the rcu_nocbs= kernel boot parameter. Either way, later RCU console log messages list the actual set of CPUs whose RCU callbacks will be offloaded. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| * rcu: Allow rcu_nocbs= to specify all CPUsPaul E. McKenney2019-03-261-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the rcu_nocbs= kernel boot parameter requires that a specific list of CPUs be specified, and has no way to say "all of them". As noted by user RavFX in a comment to Phoronix topic 1002538, this is an inconvenient side effect of the removal of the RCU_NOCB_CPU_ALL Kconfig option. This commit therefore enables the rcu_nocbs= kernel boot parameter to be given the string "all", as in "rcu_nocbs=all" to specify that all CPUs on the system are to have their RCU callbacks offloaded. Another approach would be to make cpulist_parse() check for "all", but there are uses of cpulist_parse() that do other checking, which could conflict with an "all". This commit therefore focuses on the specific use of cpulist_parse() in rcu_nocb_setup(). Just a note to other people who would like changes to Linux-kernel RCU: If you send your requests to me directly, they might get fixed somewhat faster. RavFX's comment was posted on January 22, 2018 and I first saw it on March 5, 2019. And the only reason that I found it -at- -all- was that I was looking for projects using RCU, and my search engine showed me that Phoronix comment quite by accident. Your choice, though! ;-) Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
| * rcu: Move common code out of if-else blockAkira Yokosawa2019-03-261-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | As the result of recent addition of "rdp->core_needs_qs = false;" in the "if" block, now both branches of the if-else have the same assignment. Factor it out and reduce line count. Signed-off-by: Akira Yokosawa <akiyks@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
| * rcu: Set rcutree.kthread_prio sysfs access to read-onlyLiu Song2019-03-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The rcutree.kthread_prio kernel-boot parameter is used to set the priority for boost (rcub), per-CPU (rcuc), and grace-period (rcu_preempt or rcu_sched) kthreads. It is also used by rcutorture to check whether it is possible to meaningfully test RCU priority boosting. However, all of these cases will either ignore or be confused by any post-boot changes to rcutree.kthread_prio. Note that the user really can change the priorities of all of these kthreads using chrt, given sufficient privileges. Therefore, the read-write nature of sysfs access to rcutree.kthread_prio is thus at best an attractive nuisance. This commit therefore changes sysfs access to rcutree.kthread_prio to be read-only. Signed-off-by: Liu Song <liu.song11@zte.com.cn> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
OpenPOWER on IntegriCloud