summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorRoland Dreier <rdreier@cisco.com>2008-06-19 15:04:07 -0700
committerIngo Molnar <mingo@elte.hu>2008-06-20 13:19:32 +0200
commitbb10ed0994927d433f6dbdf274fdb26cfcf516b7 (patch)
tree832f26f1c45e1ddd67beb655a9474d167b9e4cd1
parent8a8cde163ea724baf74e7752a31a69d3121a240e (diff)
downloadtalos-op-linux-bb10ed0994927d433f6dbdf274fdb26cfcf516b7.tar.gz
talos-op-linux-bb10ed0994927d433f6dbdf274fdb26cfcf516b7.zip
sched: fix wait_for_completion_timeout() spurious failure under heavy load
It seems that the current implementaton of wait_for_completion_timeout() has a small problem under very high load for the common pattern: if (!wait_for_completion_timeout(&done, timeout)) /* handle failure */ because the implementation very roughly does (lots of code deleted to show the basic flow): static inline long __sched do_wait_for_common(struct completion *x, long timeout, int state) { if (x->done) return timeout; do { timeout = schedule_timeout(timeout); if (!timeout) return timeout; } while (!x->done); return timeout; } so if the system is very busy and x->done is not set when do_wait_for_common() is entered, it is possible that the first call to schedule_timeout() returns 0 because the task doing wait_for_completion doesn't get rescheduled for a long time, even if it is woken up early enough. In this case, wait_for_completion_timeout() returns 0 without even checking x->done again, and the code above falls into its failure case purely for scheduler reasons, even if the hardware event or whatever was being waited for happened early enough. It would make sense to add an extra test to do_wait_for() in the timeout case and return 1 if x->done is actually set. A quick audit (not exhaustive) of wait_for_completion_timeout() callers seems to indicate that no one actually cares about the return value in the success case -- they just test for 0 (timed out) versus non-zero (wait succeeded). Signed-off-by: Ingo Molnar <mingo@elte.hu>
-rw-r--r--kernel/sched.c10
1 files changed, 10 insertions, 0 deletions
diff --git a/kernel/sched.c b/kernel/sched.c
index 4a3cb0614158..577f160131bd 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4405,6 +4405,16 @@ do_wait_for_common(struct completion *x, long timeout, int state)
spin_unlock_irq(&x->wait.lock);
timeout = schedule_timeout(timeout);
spin_lock_irq(&x->wait.lock);
+
+ /*
+ * If the completion has arrived meanwhile
+ * then return 1 jiffy time left:
+ */
+ if (x->done && !timeout) {
+ timeout = 1;
+ break;
+ }
+
if (!timeout) {
__remove_wait_queue(&x->wait, &wait);
return timeout;
OpenPOWER on IntegriCloud