author     Patrick Williams <iawillia@us.ibm.com>  2012-02-07 16:49:39 -0600
committer  A. Patrick Williams III <iawillia@us.ibm.com>  2012-02-08 08:10:56 -0600
commit     4c500ad53631f8a42d64a88112b30b19a0c6373b
tree       e54808328ea6a2c25647ad05283b37a52dfd42f3  /src/usr/trace
parent     1ae3dfacaccbcb9bdb1b0e8306118844331a6e10
Fix a deadlock in trace.
This is actually a workaround; based on the symptom, I believe there are many other hazards in the trace code. What happened here is that a task caused a page fault in the trace code while holding the trace mutex. This prevented the code that services page faults, such as the PNOR resource provider, from executing traces, so the faulting task and the fault handler each waited on the other and all code deadlocked. There are similar deadlock hazards in the binary and %s handling of trace. We need to revisit trace to ensure that it can never cause a page fault while holding the global mutex. We talked recently about revamping trace entirely, so this is just one more design item to consider.

Change-Id: I28e32d2d79cf419a7a7eb680627e79a88bc6a5a7
Reviewed-on: http://gfw160.austin.ibm.com:8080/gerrit/649
Reviewed-by: Mark W. Wenning <wenning@us.ibm.com>
Reviewed-by: CAMVAN T. NGUYEN <ctnguyen@us.ibm.com>
Tested-by: Jenkins Server
Reviewed-by: Van H. Lee <vanlee@us.ibm.com>
Reviewed-by: A. Patrick Williams III <iawillia@us.ibm.com>
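The message names the binary and %s paths as remaining hazards. The following is a minimal sketch, not Hostboot code, of one way the same idea could extend to a binary argument: read one byte from every page the argument spans before taking the lock, so any fault is serviced while the mutex is still free. `prefault_read`, `trace_binary`, `g_trace_mutex`, and `g_trace_buf` are hypothetical names; `std::mutex` stands in for Hostboot's `mutex_lock`/`iv_trac_mutex`, and the 4 KiB page size is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <mutex>

constexpr std::size_t kPageSize = 4096;  // assumed page size, not from Hostboot

// Read one byte from every page spanned by [p, p + len) so that any
// demand-paging fault is taken here, before the caller acquires a lock.
// Returns the number of pages touched.
std::size_t prefault_read(const void* p, std::size_t len) {
    if (p == nullptr || len == 0) return 0;
    const volatile unsigned char* bytes =
        static_cast<const volatile unsigned char*>(p);
    const std::uintptr_t start = reinterpret_cast<std::uintptr_t>(p);
    const std::uintptr_t first = start / kPageSize;
    const std::uintptr_t last = (start + len - 1) / kPageSize;
    std::size_t touched = 0;
    for (std::uintptr_t page = first; page <= last; ++page) {
        // Offset of the first byte of this page that lies inside the range.
        const std::uintptr_t off = (page == first) ? 0 : page * kPageSize - start;
        volatile unsigned char sink = bytes[off];  // volatile read is not elided
        (void)sink;
        ++touched;
    }
    return touched;
}

// Hypothetical stand-ins for the global trace mutex and a trace buffer.
std::mutex g_trace_mutex;
unsigned char g_trace_buf[64];

// Copy a binary argument into the trace buffer, truncating to the buffer
// size. Any fault on the source happens before the critical region.
std::size_t trace_binary(const void* data, std::size_t len) {
    if (data == nullptr || len == 0) return 0;
    if (len > sizeof(g_trace_buf)) len = sizeof(g_trace_buf);
    prefault_read(data, len);                          // fault in source pages now
    std::lock_guard<std::mutex> lock(g_trace_mutex);   // critical region start
    assert(len <= sizeof(g_trace_buf));                // never copy past the buffer
    std::memcpy(g_trace_buf, data, len);               // source is already resident
    return len;
}
```

For a %s argument, computing the length with `strlen` already reads every byte outside the lock, which has a similar effect. This sketch also assumes a touched page cannot be evicted again before the copy under the lock; touching a page does not by itself pin it.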
Diffstat (limited to 'src/usr/trace')
-rw-r--r--  src/usr/trace/trace.C  8
1 file changed, 8 insertions, 0 deletions
diff --git a/src/usr/trace/trace.C b/src/usr/trace/trace.C
index 1510eac78..29e7f7771 100644
--- a/src/usr/trace/trace.C
+++ b/src/usr/trace/trace.C
@@ -213,6 +213,14 @@ void Trace::initBuffer( trace_desc_t **o_td,
// Store buffer name internally in upper case
strupr(l_comp);
+ // The page containing the trace-descriptor destination might not be
+ // loaded yet, so we write to it outside of the mutex to force a page
+ // fault to bring the page in. If we don't do this, we can end up with
+ // a dead-lock where this code is blocked due to a page-fault while
+ // holding the trace mutex, which in turn blocks the code that handles
+ // page faults.
+ *o_td = NULL;
+
// CRITICAL REGION START
mutex_lock(&iv_trac_mutex);
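The reordering in this hunk can be illustrated with a small single-threaded model. This is a hypothetical sketch, not the Hostboot implementation: `LazyPage` stands in for the not-yet-loaded page that holds `*o_td`, a boolean stands in for `iv_trac_mutex`, and `page_fault_handler` stands in for the fault-servicing code (such as the PNOR resource provider) that needs to trace. Rather than hanging, the model counts the cases where the handler would have blocked on the held mutex, then continues so both orderings can be compared.

```cpp
#include <cassert>

// Hypothetical model state; none of these names come from Hostboot.
static bool g_trace_mutex_held = false;  // stands in for iv_trac_mutex
static int g_deadlocks = 0;              // would-be deadlocks observed

// Stand-in for the task that services page faults and must trace to do so.
static void page_fault_handler() {
    if (g_trace_mutex_held) {
        // Real system: the handler blocks on the trace mutex while the
        // faulting task blocks on the page, so neither can proceed.
        ++g_deadlocks;
        return;
    }
    // Otherwise the handler could take the mutex, trace, and release it.
}

// A destination page that is not resident until first touched.
struct LazyPage {
    bool resident = false;
    const void* slot = nullptr;
    void store(const void* value) {
        if (!resident) {           // first touch raises a page fault
            page_fault_handler();
            resident = true;       // the handler loads the page
        }
        slot = value;
    }
};

// Ordering before the fix: the first touch happens inside the critical region.
static void init_buffer_unsafe(LazyPage& o_td, const void* desc) {
    g_trace_mutex_held = true;     // mutex_lock(&iv_trac_mutex);
    o_td.store(desc);              // may fault while the mutex is held
    g_trace_mutex_held = false;    // mutex_unlock(&iv_trac_mutex);
}

// Ordering after the fix: touch the destination before locking.
static void init_buffer_prefault(LazyPage& o_td, const void* desc) {
    o_td.store(nullptr);           // *o_td = NULL; fault taken with mutex free
    assert(o_td.resident);         // page is loaded before the critical region
    g_trace_mutex_held = true;     // mutex_lock(&iv_trac_mutex);
    o_td.store(desc);              // no fault: page already resident
    g_trace_mutex_held = false;    // mutex_unlock(&iv_trac_mutex);
}
```

Calling `init_buffer_unsafe` on a fresh `LazyPage` records one would-be deadlock; `init_buffer_prefault` records none, because the `*o_td = NULL` store takes the fault before the lock is acquired.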