| Commit message (Collapse) | Author | Age | Files | Lines |
| |
These messages just fill up the OPAL console log with noise, causing us
to lose useful information.
They have been like this since the first commit in skiboot. Make them
trace-level messages.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
The elog_reject_head() routine sets 'elog_read_from_fsp_head_state' to
either 'ELOG_STATE_REJECTED' or 'ELOG_STATE_NONE', depending on its
current value.
We can remove the elog_reject_head() call from 'opal_kexec_elog_notify()'
because it is called again right afterwards inside
'fsp_opal_resend_pending_logs()', which makes the call in
opal_kexec_elog_notify() redundant.
Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
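
The transition described in this commit message can be sketched roughly as follows. This is a standalone illustration, not the skiboot source: the `head_state` variable, the function name, and in particular the assumption that an in-flight fetch (`ELOG_STATE_FETCHING`) is the state that maps to `ELOG_STATE_REJECTED` are hypothetical.

```c
#include <assert.h>

/* Hypothetical subset of the head states. Only REJECTED and NONE are
 * named in the commit message; FETCHING is an assumption. */
enum elog_head_state {
	ELOG_STATE_NONE,
	ELOG_STATE_FETCHING,	/* assumed: a fetch from the FSP is in flight */
	ELOG_STATE_REJECTED,
};

enum elog_head_state head_state = ELOG_STATE_NONE;

/* Reject the head entry: an in-flight fetch is marked REJECTED so that
 * its result can be discarded; any other state drops back to NONE. */
void sketch_reject_head(void)
{
	if (head_state == ELOG_STATE_FETCHING)
		head_state = ELOG_STATE_REJECTED;
	else
		head_state = ELOG_STATE_NONE;

	/* Postcondition stated in the commit message. */
	assert(head_state == ELOG_STATE_REJECTED ||
	       head_state == ELOG_STATE_NONE);
}
```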
| |
We use the 'elog_enabled' flag to check whether the host OS is ready to
receive error logs. This has nothing to do with reading error logs from
the service processor.
This patch removes the check so that 'elog_enabled' stays free of
FSP-specific code and can be moved into core/errorlog.c in upcoming patches.
With this change, in some corner cases we may end up reading the same error
log twice from the FSP. This happens because we call 'elog_reject_head' inside
'fsp_opal_resend_pending_logs', which sets the state to either
'ELOG_STATE_REJECTED' or 'ELOG_STATE_NONE'. A subsequent call to
'fsp_elog_check_and_fetch_head' then reads an error log from the FSP
that was already read. This happens only when
'fsp_opal_resend_pending_logs' is called, which is twice per reboot,
so we can ignore it.
Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
Fix typos, alignment, and letter-case mismatches to make the code
clearer.
Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
This is a corner case related to a recent change that went upstream. It
has only been observed at the petitboot prompt, where we see only one
error log in /sys/firmware/opal/elog instead of all of them.
Below is a snippet of the code where the elog module in the kernel is
initialised.
{
..
...
rc = request_threaded_irq(irq, NULL, elog_event, =<=======
IRQF_TRIGGER_HIGH | IRQF_ONESHOT, "opal-elog", NULL); |
if (rc) { |
pr_err("%s: Can't request OPAL event irq (%d)\n", |
__func__, rc); |
return rc; |
} |
/* We are now ready to pull error logs from opal. */ |
if (opal_check_token(OPAL_ELOG_RESEND)) |
opal_resend_pending_logs(); =<=======
}
Scenario:
While elog_enabled is true, OPAL sets OPAL_EVENT_ERROR_LOG_AVAIL
whenever it has error logs waiting to be fetched by the kernel.
The race occurs between the two lines marked with arrows above. As soon
as the kernel registers the error log handler, it sees that
OPAL_EVENT_ERROR_LOG_AVAIL is set and schedules the handler. The
handler's 'opal_get_elog_size' (kernel) call moves the state from
ELOG_STATE_FETCHED_DATA to ELOG_STATE_FETCHED_INFO and clears
OPAL_EVENT_ERROR_LOG_AVAIL. At the same time, the
'opal_resend_pending_logs' (kernel) call moves the state machine from
ELOG_STATE_FETCHED_INFO to ELOG_STATE_NONE in OPAL.
Because of that, the read call that the kernel makes after
'opal_get_elog_size' fails, even though the elog kobject has already
been created for that error log.
Later in the resend routine in OPAL, we call opal_commit_elog_in_host(),
which sets OPAL_EVENT_ERROR_LOG_AVAIL. So the kernel calls
'opal_get_elog_size' again and gets the info of the same error log that
was fetched earlier. This also changes the state machine to
ELOG_STATE_FETCHED_INFO and clears OPAL_EVENT_ERROR_LOG_AVAIL.
Below is a snippet from the registered elog_event handler:
{
...
...
/* we may get notified twice, let's handle
* that gracefully and not create two conflicting
* entries.
*/
if (kset_find_obj(elog_kset, name))
return IRQ_HANDLED;
...
...
}
The kernel checks whether a kobject already exists for the error log.
Here the kobject is found, so the handler returns without reading the
error log data.
So this patch changes the flag's initial value from true to false,
which resolves the race.
Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
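
The effect of starting with the flag cleared can be sketched as below. This is a simplified model under stated assumptions, not the OPAL code: `pending_logs`, `event_avail`, and all three function names are hypothetical, and the idea that the resend call is what marks the host as ready is an assumption drawn from the scenario above.

```c
#include <assert.h>
#include <stdbool.h>

bool elog_enabled = false;	/* after the fix: host not assumed ready at init */
unsigned int pending_logs;	/* hypothetical count of logs waiting for the host */
bool event_avail;		/* stand-in for OPAL_EVENT_ERROR_LOG_AVAIL */

/* The event is only raised when the host is ready and logs are waiting. */
void sketch_update_event(void)
{
	event_avail = elog_enabled && pending_logs > 0;
	assert(!event_avail || elog_enabled);
}

/* A new log is queued; with the flag false the event stays clear. */
void sketch_log_arrived(void)
{
	pending_logs++;
	sketch_update_event();
}

/* Assumed: the host's resend request is what marks it as ready. */
void sketch_resend_pending_logs(void)
{
	elog_enabled = true;
	sketch_update_event();
}
```

In this model the handler cannot be scheduled before the resend request, so the two marked calls in the kernel snippet no longer race.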
| |
In some corner cases the host may send an acknowledgement without
reading the actual data (fsp_opal_elog_info -> fsp_opal_elog_ack).
Because of this, elog_read_from_fsp_head_state may be stuck in the
wrong state (ELOG_STATE_HOST_INFO) and unable to send the remaining
ELOGs to the host. Hence reset the ELOG state and start sending the
remaining ELOGs.
Also, in the normal case we ACK logs that have already been processed
(elog_read_processed). Hence rearrange the code so that we go
through elog_read_processed first.
Finally, return OPAL_PARAMETER if we are not able to find the ELOG ID.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
[stewart@linux.vnet.ibm.com: spelling fix]
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
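
The lookup order described in this commit message might look roughly like this. It is a sketch only: the fixed-size id sets, the `head_reset` flag, the return codes, and every function name are hypothetical stand-ins for the real lists and for OPAL_PARAMETER.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_IDS	 8
#define SKETCH_SUCCESS	 0
#define SKETCH_PARAMETER (-1)	/* stands in for OPAL_PARAMETER */

struct sketch_ids {
	uint64_t id[SKETCH_MAX_IDS];
	size_t count;
};

/* Remove `id` from the set if present; report whether it was found. */
bool sketch_take(struct sketch_ids *set, uint64_t id)
{
	assert(set->count <= SKETCH_MAX_IDS);
	for (size_t i = 0; i < set->count; i++) {
		if (set->id[i] == id) {
			set->count--;
			set->id[i] = set->id[set->count];
			return true;
		}
	}
	return false;
}

/* ACK a log: check the processed set first (the normal case), then the
 * pending set (host acknowledged without reading), else report an error. */
int sketch_elog_ack(struct sketch_ids *processed, struct sketch_ids *pending,
		    bool *head_reset, uint64_t id)
{
	if (sketch_take(processed, id))
		return SKETCH_SUCCESS;
	if (sketch_take(pending, id)) {
		*head_reset = true;	/* unstick the head state */
		return SKETCH_SUCCESS;
	}
	return SKETCH_PARAMETER;
}
```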
| |
The chance of elog_read_pending being in an inconsistent state is very
small. Just to be on the safe side, disable notification if the list
is not in a consistent state.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Reviewed-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
ELOG enables event notification once a new log is available. It is
disabled after the host completes reading the log (it has to complete
both fsp_opal_elog_info and fsp_opal_elog_read).
Ideally we should disable notification as soon as the host consumes the
event (after fsp_opal_elog_info). Also, if the host fails to call
fsp_opal_elog_read (e.g. in situations like a duplicate event), we end
up keeping the notification enabled forever.
This patch introduces a new ELOG state (ELOG_STATE_HOST_INFO). As soon
as the host consumes the event, the elog moves to this new state so that
event notification is disabled.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
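
A minimal model of the new state, assuming notification is tied to a single "fetched but not yet consumed" state, could look like this. The state set is reduced to three values and all function names are hypothetical; the real state machine has more states than shown.

```c
#include <assert.h>
#include <stdbool.h>

enum elog_head_state {
	ELOG_STATE_NONE,
	ELOG_STATE_FETCHED_DATA,	/* log fetched, host not yet told */
	ELOG_STATE_HOST_INFO,		/* host has consumed the event */
};

enum elog_head_state head_state = ELOG_STATE_NONE;
bool notify_enabled;

/* Only a fetched-but-unconsumed log keeps the event raised. */
void sketch_set_state(enum elog_head_state s)
{
	head_state = s;
	notify_enabled = (s == ELOG_STATE_FETCHED_DATA);
}

/* Host "info" call: consuming the event disables notification at once. */
bool sketch_elog_info(void)
{
	if (head_state != ELOG_STATE_FETCHED_DATA)
		return false;
	sketch_set_state(ELOG_STATE_HOST_INFO);
	assert(!notify_enabled);
	return true;
}

/* Host "read" call: only valid after the info call. */
bool sketch_elog_read(void)
{
	if (head_state != ELOG_STATE_HOST_INFO)
		return false;
	sketch_set_state(ELOG_STATE_NONE);
	return true;
}
```

Even if the host never issues the read call, the notification is already off after the info call, which is the behaviour the commit message asks for.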
| |
We use the elog notifier to notify the host of logs from multiple
sources (FSP-generated logs in fsp-elog-read.c and OPAL-generated logs
in fsp-elog-write.c).
The OPAL-generated log path sets the elog event bit whenever it has new
logs to send to the host, but it relies on fsp-elog-read.c to disable
the event bit, which is wrong!
This patch creates a common function to enable/disable event notification.
It enables event notification if any source is ready to send an error
log to the host, and disables notification once all errors have been
sent to the host.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
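
One way to picture such a common function is a per-source ready mask, with the event bit derived from it. This is only a sketch: the source identifiers, the variables, and the function name are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical source identifiers, one bit per log producer. */
enum elog_source {
	ELOG_SRC_FSP  = 1u << 0,	/* logs read from the service processor */
	ELOG_SRC_OPAL = 1u << 1,	/* logs generated by OPAL itself */
};

uint32_t elog_ready_sources;	/* which sources have logs pending */
bool elog_event_bit;		/* stand-in for the OPAL event bit */

/* Record whether `src` has logs ready, then recompute the event bit:
 * it is set while any source is ready and cleared once none is. */
void sketch_update_elog_notify(enum elog_source src, bool ready)
{
	assert(src == ELOG_SRC_FSP || src == ELOG_SRC_OPAL);

	if (ready)
		elog_ready_sources |= (uint32_t)src;
	else
		elog_ready_sources &= ~(uint32_t)src;

	elog_event_bit = (elog_ready_sources != 0);
}
```

With this shape, neither source can clear the bit while the other still has logs to deliver.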
| |
ELOG enables event notification once a new log is available. It is
disabled after the host completes reading the log (it has to complete
both fsp_opal_elog_info and fsp_opal_elog_read).
In some corner cases like kexec, the host may end up reading the same
ELOG id twice (calling fsp_opal_elog_info twice because of a resend
request). The host detects the duplicate and does not read the actual
log (fsp_opal_elog_read()). In such situations we fail to disable event
notification :-(
Scenario :
        OPAL                          Host
  ------------------------------------------------------------
  OPAL_EVENT_ELOG_AVAIL   -->   kexec
  OPAL_EVENT_ELOG_AVAIL   -->   elog client registered
                          <--   read ELOG (id=x)
                          <--   resend elog (opal_resend_pending_logs())
  resend all ELOG         -->   read ELOG (id=x) -- Duplicate ELOG !
        bhoom!!
kernel call trace:
------------------
[ 28.055923] CPU: 10 PID: 20 Comm: irq/29-opal-elo Not tainted 4.4.0-24-generic #43-Ubuntu
[ 28.056012] task: c0000000ef982a20 ti: c0000000efa38000 task.ti: c0000000efa38000
[ 28.056100] NIP: c000000008010a24 LR: c000000008010a24 CTR: 0000000030033758
[ 28.056188] REGS: c0000000efa3b9c0 TRAP: 0901 Not tainted (4.4.0-24-generic)
[ 28.056274] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 22000844 XER: 20000000
[ 28.056499] CFAR: c000000008009958 SOFTE: 1
GPR00: c000000008131e8c c0000000efa3bc40 c0000000095b4200 0000000000000900
GPR04: c0000000094a63c8 0000000000000001 9000000100009033 0000000000000062
GPR08: 0000000000000000 0000000000000000 c0000000ef960400 9000000100001003
GPR12: c00000000806de48 c00000000fb45f00
[ 28.057042] NIP [c000000008010a24] arch_local_irq_restore+0x74/0x90
[ 28.057117] LR [c000000008010a24] arch_local_irq_restore+0x74/0x90
[ 28.057189] Call Trace:
[ 28.057221] [c0000000efa3bc40] [c0000000f108a980] 0xc0000000f108a980 (unreliable)
[ 28.057326] [c0000000efa3bc60] [c000000008131e8c] irq_finalize_oneshot.part.2+0xbc/0x250
[ 28.057429] [c0000000efa3bcb0] [c000000008132170] irq_thread_fn+0x80/0xa0
[ 28.057519] [c0000000efa3bcf0] [c00000000813263c] irq_thread+0x1ac/0x280
[ 28.057609] [c0000000efa3bd80] [c0000000080e61e0] kthread+0x110/0x130
[ 28.057698] [c0000000efa3be30] [c000000008009538] ret_from_kernel_thread+0x5c/0xa4
[ 28.057799] Instruction dump:
[ 28.057844] 994d02ca 2fa30000 409e0024 e92d0020 61298000 7d210164 38210020 e8010010
[ 28.057995] 7c0803a6 4e800020 60420000 4bff17ad <60000000> 4bffffe4 60420000 e92d0020
This patch adds a kexec notifier client. It disables event notification
during kexec. Once the host is ready to receive ELOGs again, it calls
fsp_opal_resend_pending_logs(), which re-enables ELOG notification.
This fixes the above issue. I will add a follow-up patch to improve event state.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
We don't need to validate the msg->resp message as it's always
allocated.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Reviewed-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
These errors are essentially assert()s - something has gone wrong and
it's likely because of a bug somewhere. These are inconsistencies we
should *never* hit, so have FWTS throw warnings on them.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
Remove useless elog_reject_head call in acknowledgement path.
We can ACK elog in any order.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
Presently we continue to read the error log even though the elog state
is "REJECTED". This patch fixes this by rearranging the code.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
Presently we allocate 256K for the elog read buffer, but we only
read one log at a time and the maximum size of an ELOG is 16KB.
Effectively we are not using the remaining 240K. This patch reduces
the size of the buffer to 16K.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
This is probably not the best collection of things in the world,
but it means that opal.h is much closer to being directly usable
by an OS.
This triggers a bunch of #include fixes throughout the tree.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
Recent gcc 4.9 will silently add conditional traps for NULL checks in
cases where it thinks we may dereference a NULL pointer. It seems to
be pretty keen on doing so when we dereference the result of list_top.
Most of the time, these aren't bugs because we have some other state
variable that tells us whether our list contains something or not but
we hit a bug in the FSP code recently where that was getting out of
sync, and the result of a trap in real mode isn't pretty ....
So this adds explicit NULL checks in a number of places where gcc added
trap instructions. With this patch, the current tree doesn't generate
any. I didn't find a way to make gcc warn unfortunately.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
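
The kind of check this commit adds can be illustrated with a tiny stand-in list. This sketch does not use the list implementation from the tree; `sketch_list`, `sketch_list_top`, and `sketch_head_id` are hypothetical names, and only the pattern (test the result of a "top of list" lookup before dereferencing it) is the point.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_node {
	struct sketch_node *next;
	int id;
};

struct sketch_list {
	struct sketch_node *head;
};

/* Like list_top: returns the first entry, or NULL for an empty list. */
struct sketch_node *sketch_list_top(const struct sketch_list *l)
{
	assert(l != NULL);
	return l->head;
}

/* Return the id of the head entry, or -1 if the list is empty.
 * The explicit NULL check replaces reliance on a separate "list is
 * non-empty" state variable that might have gone out of sync. */
int sketch_head_id(const struct sketch_list *pending)
{
	struct sketch_node *n = sketch_list_top(pending);

	if (!n)
		return -1;
	return n->id;
}
```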
| |
Linux can request fewer bytes than the actual elog size.
Also make sure Linux is not requesting more than the elog size.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
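
The size rule above amounts to a bounds check before the copy. A minimal sketch, assuming a flat log buffer and hypothetical names and return codes (the real call's signature and error values are not shown in this log):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_OK	 0
#define SKETCH_ERR_PARAM (-1)	/* hypothetical error code */

/* Copy `requested` bytes of the log into `dst`. A request smaller than
 * the log is allowed; a request larger than the log is rejected. */
int sketch_copy_elog(char *dst, size_t requested,
		     const char *log, size_t log_size)
{
	assert(dst != NULL && log != NULL);

	if (requested > log_size)
		return SKETCH_ERR_PARAM;

	memcpy(dst, log, requested);
	return SKETCH_OK;
}
```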
| |
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| |
This patch adds support to re-send Sapphire logs
to the host.
Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| |
This patch consists of elog-read framework changes to
support pulling of Sapphire logs directly on to the
host.
Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| |
been read.
The host would first query OPAL about the error log format,
size and log id and later make a call to read the complete log
buffer from OPAL to host.
This patch introduces a new elog_head_state to indicate
that host has queried elog info before fetching the elog_buffer.
Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| |
This patch fixes a bunch of typos and removes NULL assignment
to global variables.
Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|