summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>2019-03-22 20:28:00 +0530
committerStewart Smith <stewart@linux.ibm.com>2019-03-28 15:24:12 +1100
commit2c71d7032484f4267ca425b5a58c5a6f06a2eaf3 (patch)
tree139296b503c070079aba65f604f6cb79ac745225
parent08de14aaa937ad703182302470c3525bb328b6de (diff)
downloadblackbird-skiboot-2c71d7032484f4267ca425b5a58c5a6f06a2eaf3.tar.gz
blackbird-skiboot-2c71d7032484f4267ca425b5a58c5a6f06a2eaf3.zip
Fix hang in pnv_platform_error_reboot path due to TOD failure.
On TOD failure, with TB stuck, when linux heads down to pnv_platform_error_reboot() path due to unrecoverable hmi event, the panic cpu gets stuck in OPAL inside ipmi_queue_msg_sync(). At this time, rest all other cpus are in smp_handle_nmi_ipi() waiting for panic cpu to proceed. But with panic cpu stuck inside OPAL, linux never recovers/reboot. p0 c1 t0 NIA : 0x000000003001dd3c <.time_wait+0x64> CFAR : 0x000000003001dce4 <.time_wait+0xc> MSR : 0x9000000002803002 LR : 0x000000003002ecf8 <.ipmi_queue_msg_sync+0xec> STACK: SP NIA 0x0000000031c236e0 0x0000000031c23760 (big-endian) 0x0000000031c23760 0x000000003002ecf8 <.ipmi_queue_msg_sync+0xec> 0x0000000031c237f0 0x00000000300aa5f8 <.hiomap_queue_msg_sync+0x7c> 0x0000000031c23880 0x00000000300aaadc <.hiomap_window_move+0x150> 0x0000000031c23950 0x00000000300ab1d8 <.ipmi_hiomap_write+0xcc> 0x0000000031c23a90 0x00000000300a7b18 <.blocklevel_raw_write+0xbc> 0x0000000031c23b30 0x00000000300a7c34 <.blocklevel_write+0xfc> 0x0000000031c23bf0 0x0000000030030be0 <.flash_nvram_write+0xd4> 0x0000000031c23c90 0x000000003002c128 <.opal_write_nvram+0xd0> 0x0000000031c23d20 0x00000000300051e4 <opal_entry+0x134> 0xc000001fea6e7870 0xc0000000000a9060 <opal_nvram_write+0x80> 0xc000001fea6e78c0 0xc000000000030b84 <nvram_write_os_partition+0x94> 0xc000001fea6e7960 0xc0000000000310b0 <nvram_pstore_write+0xb0> 0xc000001fea6e7990 0xc0000000004792d4 <pstore_dump+0x1d4> 0xc000001fea6e7ad0 0xc00000000018a570 <kmsg_dump+0x140> 0xc000001fea6e7b40 0xc000000000028e5c <panic_flush_kmsg_end+0x2c> 0xc000001fea6e7b60 0xc0000000000a7168 <pnv_platform_error_reboot+0x68> 0xc000001fea6e7bd0 0xc0000000000ac9b8 <hmi_event_handler+0x1d8> 0xc000001fea6e7c80 0xc00000000012d6c8 <process_one_work+0x1b8> 0xc000001fea6e7d20 0xc00000000012da28 <worker_thread+0x88> 0xc000001fea6e7db0 0xc0000000001366f4 <kthread+0x164> 0xc000001fea6e7e20 0xc00000000000b65c <ret_from_kernel_thread+0x5c> This is because, there is a while loop towards the end of ipmi_queue_msg_sync() which keeps looping until "sync_msg" does not match with "msg". It loops over time_wait_ms() until exit condition is met. In normal scenario time_wait_ms() calls run pollers so that ipmi backend gets a chance to check ipmi response and set sync_msg to NULL. while (sync_msg == msg) time_wait_ms(10); But in the event when TB is in failed state time_wait_ms()->time_wait_poll() returns immediately without calling pollers and hence we end up looping forever. This patch fixes this hang by calling opal_run_pollers() in TB failed state as well. Fixes: 1764f2452 ("opal: Fix hang in time_wait* calls on HMI for TB errors.") Cc: skiboot-stable@lists.ozlabs.org Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
-rw-r--r--core/timebase.c10
1 files changed, 10 insertions, 0 deletions
diff --git a/core/timebase.c b/core/timebase.c
index 0ae11d25..b1a02a4a 100644
--- a/core/timebase.c
+++ b/core/timebase.c
@@ -29,6 +29,16 @@ static void time_wait_poll(unsigned long duration)
unsigned long period = msecs_to_tb(5);
if (this_cpu()->tb_invalid) {
+ /*
+ * Run pollers to allow some backends to process response.
+ *
+ * In TOD failure case where TOD is unrecoverable, running
+ * pollers allows ipmi backend to deal with ipmi response
+ * from bmc and helps ipmi_queue_msg_sync() to get un-stuck.
+ * Thus it avoids linux kernel to hang during panic due to
+ * TOD failure.
+ */
+ opal_run_pollers();
cpu_relax();
return;
}
OpenPOWER on IntegriCloud