| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
| |
I don't see a reason why there would need to be a PHB3 *specific*
subsystem in the error logs, so rename it to PHB so that PHB4 and
later can use it too without continually redefining it.
This shouldn't change any existing assumptions because it's unused.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We use TCE mapped area to write data to console. Console header
(fsp_serbuf_hdr) is modified by both FSP and OPAL (OPAL updates
next_in pointer in fsp_serbuf_hdr and FSP updates next_out pointer).
Kernel makes opal_console_write() OPAL call to write data to console.
OPAL write data to TCE mapped area and sends MBOX command to FSP.
If our console becomes full and we have data to write to console,
we keep on waiting until FSP reads data.
In some corner cases, where FSP is active but not responding to
console MBOX message (due to buggy IPMI) and we have heavy console
write happening from kernel, then eventually our console buffer
becomes full. At this point OPAL starts sending OPAL_BUSY_EVENT to
kernel. Kernel will keep on retrying. This is creating kernel soft
lockups. In some extreme case when every CPU is trying to write to
console, user will not be able to ssh and thinks system is hang.
If we reset FSP or restart IPMI daemon on FSP, system recovers and
everything becomes normal.
This patch adds workaround to above issue by returning OPAL_HARDWARE
when cosole is full. Side effect of this patch is, we may endup dropping
latest console data. But better to drop console data than system hang.
Alternative approach is to drop old data from console buffer, make space
for new data. But in normal condition only FSP can update 'next_out'
pointer and if we touch that pointer, it may introduce some other
race conditions. Hence we decided to just new console write request.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Trigging a Host Initiated Reset (when the host detects the FSP has gone
out to lunch and should be rebooted), would cause "Unknown Command" messages
to appear in the OPAL log.
This patch implements those messages
How to trigger FSP RR(HIR):
$ putmemproc 300000f8 0x00000000deadbeef
s1 k0:n0:s0:p00
ecmd_ppc putmemproc 300000f8 0x00000000deadbeef
Log showing unknown command:
/ # cat /sys/firmware/opal/msglog | grep -i ,3
[ 110.232114723,3] FSP: fsp_trigger_reset() entry
[ 188.431793837,3] FSP #0: Link down, starting R&R
[ 464.109239162,3] FSP #0: Got XUP with no pending message !
[ 466.340598554,3] FSP-DPO: Unknown command 0xce0900
[ 466.340600126,3] FSP: Unhandled message ce0900
The message we need to handle is "Get PLID after host initiated FipS
reset/reload". When the FSP comes back from HIR, it asks "hey, so, which
error log explains why you rebooted me?". So, we tell it.
Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
High volume of SYNC errors onto LPC bus cause degraded system
performance and are likely due to bad hardware present onto system.
Thus once LPC SYNC errors cross a certain threshold, OPAL should log
them onto BMC as unrecoverable errors in manufacturing mode. This
will help manufacturing screen bad parts, causing such errors.
Cc: stable
Signed-off-by: Vipin K Parashar <vipin@linux.vnet.ibm.com>
[stewart@linux.vnet.ibm.com: s/mfg/manufacturing/]
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
It replaces two letter SRC components macro name with some meaningful
components name to make it more legible.
E.g:
OPAL_XS => OPAL_SRC_COMPONENT_XSCOM
Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
| |
componenet => component
Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
| |
Since these were unused, it's safe to just reassign them and not change
anything we expose to anyone.
Reported-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
| |
This makes the (two) duplicates easier to spot
Reported-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
OPAL retries XSCOM read/write operations forever till it succeeds.
This can cause XSCOM ops to hang forever when XSCOM engine remains
busy for some reason. Changed it to retry XSCOM operations only
XSCOM_BUSY_MAX_RETRIES number of times instead of retrying forever.
Also added logic to reset XSCOM engine after XSCOM_BUSY_RESET_THRESHOLD
number of retries to unblock it when it remains busy.
Cc: stable # 9c2d82394fd2 ("xscom: Return OPAL_WRONG_STATE on XSCOM ops..")
Signed-off-by: Vipin K Parashar <vipin@linux.vnet.ibm.com>
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
| |
Log LPC SYNC errors as OPAL_PLATFORM_ERR_EVT errors with generic
predictive error (OPAL_PREDICTIVE_ERR_GENERAL) severity.
Signed-off-by: Vipin K Parashar <vipin@linux.vnet.ibm.com>
[stewart@linux.vnet.ibm.com: use const char* rather than strcpy]
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem Description:
During FSP termination/reset, FSP received mbox command from OPAL for
"Fetching platform management function data". As FSP is in termination
state DMAE operation failed to write memory data to hypervisor,
so FSP sent mbox command with response status as 0x24 to OPAL and
OPAL committed a predictive log with SRC BB822411 and sent back
response status as 0xFE, which FSP IPMI will not understand the
failure at the Host and IPMI will log the error.
Fix:This patch is to fix when OPAL receives a bad response from FSP 0x24
due to DMAE error, commit informational log and return response status
as SUCCESS and for all other bad status response commit predictive log.
Signed-off-by: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
| |
See https://github.com/lucasdemarchi/codespel
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a new OPAL call OPAL_CEC_REBOOT2 which will
be used to handle abnormal reboot/termination by kernel host.
This call will allow host kernel to pass reboot type and additional
debug data which needs to be captured/saved somewhere (for later
analysis) before going down.
Currently it will support two reboot types (0). normal reboot, that
will behave similar to that of opal_cec_reboot() call, and
(1). platform error reboot, that will trigger a system checkstop
using xscom address and FIR bit information obtained via device-tree
property 'ibm,sw-checkstop-fir'.
For unsupported reboot type, this call will do nothing and return
with OPAL_UNSUPPORTED.
In future, we can overload this call to support additional reboot types.
Signed-off-by: Vipin K Parashar <vipin@linux.vnet.ibm.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Reviewed-by: Samuel Mendoza-Jonas <sam.mj@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
There are now no users of the call_out parameter and future users should
use the log_append_msg() and log_append_data() functions, so remove all
references to call_out.
Signed-off-by: Samuel Mendoza-Jonas <sam.mj@au1.ibm.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove the callback functionality from log_error() and replace it with
the ability to append to a user data section, or add addtional user
data sections to an error log.
For multiline or otherwise complex logging the convention is now to call
opal_elog_create() to obtain a errorlog buffer and use log_append_msg()
and log_append_data() to append to the user data section. Additional
user data sections can be added to the error log via log_add_section().
The caller is then responsible for calling log_commit().
For simple logs log_simple_error() takes care of creating and committing
the error log as before.
Signed-off-by: Samuel Mendoza-Jonas <sam.mj@au1.ibm.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
FSP implements the IPMI commands support that can be accessed over
the mailbox interface from the host. The host needs to provide and
receive the message/data bytes of the IPMI command in the TCE space
in the KCS (Keyboard Controller Style) format.
Signed-off-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com>
Acked-By: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
| |
Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We currently just log an error when we don't find an active PSI link
15 minutes after it went down. Add a PEL log, with sufficient severity
so it gets pushed to the administrator.
V2:
Reset the timeout correctly to prevent error log flooding.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Tested-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
This is probably not the best collection of things in the world,
but it means that opal.h is much closer to being directly usable
by an OS.
This triggers a bunch of #include fixes throughout the tree.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit cf6f4e8912d29fb89ce85c84834607065ad595a5 introduced a platform
independent frontend for error logging. However it failed to move the
generic parts of the fsp-elog.h header into the platform independent
one, instead relying on the fact that up until now fsp-elog.h was
included whenever a function needed to log errors.
This patch moves the platform independent defines into the frontend
header file (errorlog.h) and removes the include of the platform
specific header in generic code paths.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
| |
None of these actually produce any warnings. My next patch will.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to support fsp-less machines we need to be able to log errors
using a BMC or some other mechanism. Currently the error logging code
is tightly coupled to the platform making it difficult to add
different platforms.
This patch factors out the generic parts of the error logging code in
preparation for adding different logging backends. It also adds a
generic mechanism for pre-allocating a specific number of objects.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
| |
Remove opal prefix from opal_* structures in errorlog.h
Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Currently, all the error log data structures needed to commit an error
in PEL are declared in opal.h and are exposed to kernel.
At present we do not have any infrastrucutre in the kernel to make use
of these ABIs.
Any changes to these strucutes in the future can
potentially cause an ABI breakage. In order to avoid
any breakage this patch moves all the error log data structures
to a new errorlog.h file and deprecating the OPAL tokens for
error-log write functionality calls only.
Once the kernel support is added (anytime in future),
one needs to move all of this back to opal.h and re-enable
the OPAL tokens for the same.
Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|