author | Patrick Williams <iawillia@us.ibm.com> | 2012-02-07 16:49:39 -0600 |
---|---|---|
committer | A. Patrick Williams III <iawillia@us.ibm.com> | 2012-02-08 08:10:56 -0600 |
commit | 4c500ad53631f8a42d64a88112b30b19a0c6373b (patch) | |
tree | e54808328ea6a2c25647ad05283b37a52dfd42f3 /src/usr/trace | |
parent | 1ae3dfacaccbcb9bdb1b0e8306118844331a6e10 (diff) | |
Fix a deadlock in trace.

This is actually a workaround and, based on the symptom, I
believe there are many other hazards in the trace code.

What happened here is that a task caused a page-fault in
the trace code while holding the trace mutex. This
prevented code like the PNOR resource provider from being
able to execute traces, which deadlocked all code.

There are similar deadlock hazards in the binary and %s
handling of trace. We need to revisit trace to ensure
that it can never cause a page-fault while holding the
global mutex. We talked recently about revamping trace
entirely, so this is just one more design item to consider.

Change-Id: I28e32d2d79cf419a7a7eb680627e79a88bc6a5a7
Reviewed-on: http://gfw160.austin.ibm.com:8080/gerrit/649
Reviewed-by: Mark W. Wenning <wenning@us.ibm.com>
Reviewed-by: CAMVAN T. NGUYEN <ctnguyen@us.ibm.com>
Tested-by: Jenkins Server
Reviewed-by: Van H. Lee <vanlee@us.ibm.com>
Reviewed-by: A. Patrick Williams III <iawillia@us.ibm.com>
Diffstat (limited to 'src/usr/trace')
-rw-r--r-- | src/usr/trace/trace.C | 8 |
1 files changed, 8 insertions, 0 deletions
diff --git a/src/usr/trace/trace.C b/src/usr/trace/trace.C
index 1510eac78..29e7f7771 100644
--- a/src/usr/trace/trace.C
+++ b/src/usr/trace/trace.C
@@ -213,6 +213,14 @@ void Trace::initBuffer( trace_desc_t **o_td,
// Store buffer name internally in upper case
strupr(l_comp);
+ // The page containing the trace-descriptor destination might not be
+ // loaded yet, so we write to it outside of the mutex to force a page
+ // fault to bring the page in. If we don't do this, we can end up with
+ // a dead-lock where this code is blocked due to a page-fault while
+ // holding the trace mutex, which in turn blocks the code that handles
+ // page faults.
+ *o_td = NULL;
+
// CRITICAL REGION START
mutex_lock(&iv_trac_mutex);
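
The pattern in this hunk (touch the destination before taking the lock) can be sketched outside of hostboot as follows. This is only an illustration under assumptions: `TraceDesc`, `g_traceMutex`, `g_buffer`, and `initBufferSketch` are hypothetical stand-ins, `std::mutex` replaces the hostboot mutex API, and on a conventional OS the page-fault path does not depend on a user-level mutex, so the hazard itself is specific to environments like the one described in the commit message.

```cpp
#include <mutex>

// Hypothetical stand-in for the trace descriptor; not the hostboot type.
struct TraceDesc { char name[16]; };

static std::mutex g_traceMutex;          // plays the role of iv_trac_mutex
static TraceDesc  g_buffer = {"DEMO"};   // a single pre-allocated descriptor

// Write to the caller's destination before locking, so that any page
// fault on *o_td is taken while the mutex is still free.
void initBufferSketch(TraceDesc** o_td)
{
    *o_td = nullptr;   // pre-fault: may block on a page-in, but holds no lock

    // Critical region: the page holding *o_td was already touched above.
    std::lock_guard<std::mutex> lock(g_traceMutex);
    *o_td = &g_buffer;
}
```

A caller would pass the address of its own descriptor pointer, for example `TraceDesc* td; initBufferSketch(&td);`. The early write is overwritten inside the critical region, so its only purpose is to move the potential fault ahead of the lock acquisition.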