summaryrefslogtreecommitdiffstats
path: root/include/errorlog.h
Commit message (Collapse)AuthorAgeFilesLines
* errorlog: Rename PHB3 to just PHBRussell Currey2018-09-171-2/+2
| | | | | | | | | | | | | I don't see a reason why there would need to be a PHB3 *specific* subsystem in the error logs, so rename it to PHB so that PHB4 and later can use it too without continually redefining it. This shouldn't change any existing assumptions because it's unused. Signed-off-by: Russell Currey <ruscur@russell.cc> Reviewed-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* FSP/CONSOLE: Workaround for unresponsive ipmi daemonVasant Hegde2017-06-141-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We use TCE mapped area to write data to console. Console header (fsp_serbuf_hdr) is modified by both FSP and OPAL (OPAL updates next_in pointer in fsp_serbuf_hdr and FSP updates next_out pointer). Kernel makes opal_console_write() OPAL call to write data to console. OPAL write data to TCE mapped area and sends MBOX command to FSP. If our console becomes full and we have data to write to console, we keep on waiting until FSP reads data. In some corner cases, where FSP is active but not responding to console MBOX message (due to buggy IPMI) and we have heavy console write happening from kernel, then eventually our console buffer becomes full. At this point OPAL starts sending OPAL_BUSY_EVENT to kernel. Kernel will keep on retrying. This is creating kernel soft lockups. In some extreme case when every CPU is trying to write to console, user will not be able to ssh and thinks system is hang. If we reset FSP or restart IPMI daemon on FSP, system recovers and everything becomes normal. This patch adds workaround to above issue by returning OPAL_HARDWARE when cosole is full. Side effect of this patch is, we may endup dropping latest console data. But better to drop console data than system hang. Alternative approach is to drop old data from console buffer, make space for new data. But in normal condition only FSP can update 'next_out' pointer and if we touch that pointer, it may introduce some other race conditions. Hence we decided to just new console write request. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* FSP: Notify FSP of Platform Log ID after Host Initiated Reset ReloadStewart Smith2017-05-101-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Trigging a Host Initiated Reset (when the host detects the FSP has gone out to lunch and should be rebooted), would cause "Unknown Command" messages to appear in the OPAL log. This patch implements those messages How to trigger FSP RR(HIR): $ putmemproc 300000f8 0x00000000deadbeef s1 k0:n0:s0:p00 ecmd_ppc putmemproc 300000f8 0x00000000deadbeef Log showing unknown command: / # cat /sys/firmware/opal/msglog | grep -i ,3 [ 110.232114723,3] FSP: fsp_trigger_reset() entry [ 188.431793837,3] FSP #0: Link down, starting R&R [ 464.109239162,3] FSP #0: Got XUP with no pending message ! [ 466.340598554,3] FSP-DPO: Unknown command 0xce0900 [ 466.340600126,3] FSP: Unhandled message ce0900 The message we need to handle is "Get PLID after host initiated FipS reset/reload". When the FSP comes back from HIR, it asks "hey, so, which error log explains why you rebooted me?". So, we tell it. Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* lpc: Log LPC SYNC errors as unrecoverable ones for manufacturingVipin K Parashar2016-08-251-0/+1
| | | | | | | | | | | | | High volume of SYNC errors onto LPC bus cause degraded system performance and are likely due to bad hardware present onto system. Thus once LPC SYNC errors cross a certain threshold, OPAL should log them onto BMC as unrecoverable errors in manufacturing mode. This will help manufacturing screen bad parts, causing such errors. Cc: stable Signed-off-by: Vipin K Parashar <vipin@linux.vnet.ibm.com> [stewart@linux.vnet.ibm.com: s/mfg/manufacturing/] Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* include/errorlog.h : Renames SRC component's macro nameMukesh Ojha2016-07-281-134/+134
| | | | | | | | | | | It replaces two letter SRC components macro name with some meaningful components name to make it more legible. E.g: OPAL_XS => OPAL_SRC_COMPONENT_XSCOM Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* errorlog : Typo mistakeMukesh Ojha2016-07-141-1/+1
| | | | | | | componenet => component Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* Reassign duplicate error log component IDs to free onesStewart Smith2016-07-141-2/+2
| | | | | | | | Since these were unused, it's safe to just reassign them and not change anything we expose to anyone. Reported-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* Re-order error log component ID defines to be in orderStewart Smith2016-07-141-14/+14
| | | | | | | This makes the (two) duplicates easier to spot Reported-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* hw/xscom: Reset XSCOM engine after finite number of retries when busyVipin K Parashar2016-07-051-0/+1
| | | | | | | | | | | | | | OPAL retries XSCOM read/write operations forever till it succeeds. This can cause XSCOM ops to hang forever when XSCOM engine remains busy for some reason. Changed it to retry XSCOM operations only XSCOM_BUSY_MAX_RETRIES number of times instead of retrying forever. Also added logic to reset XSCOM engine after XSCOM_BUSY_RESET_THRESHOLD number of retries to unblock it when it remains busy. Cc: stable # 9c2d82394fd2 ("xscom: Return OPAL_WRONG_STATE on XSCOM ops..") Signed-off-by: Vipin K Parashar <vipin@linux.vnet.ibm.com> Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* hw/lpc: Log LPC SYNC errors as OPAL_PLATFORM_ERR_EVT errorsVipin K Parashar2016-05-031-0/+1
| | | | | | | | | Log LPC SYNC errors as OPAL_PLATFORM_ERR_EVT errors with generic predictive error (OPAL_PREDICTIVE_ERR_GENERAL) severity. Signed-off-by: Vipin K Parashar <vipin@linux.vnet.ibm.com> [stewart@linux.vnet.ibm.com: use const char* rather than strcpy] Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* OPAL:Handle mbox response with bad status:0x24 during FSP terminationMamatha2016-04-061-0/+2
| | | | | | | | | | | | | | | | | | | Problem Description: During FSP termination/reset, FSP received mbox command from OPAL for "Fetching platform management function data". As FSP is in termination state DMAE operation failed to write memory data to hypervisor, so FSP sent mbox command with response status as 0x24 to OPAL and OPAL committed a predictive log with SRC BB822411 and sent back response status as 0xFE, which FSP IPMI will not understand the failure at the Host and IPMI will log the error. Fix:This patch is to fix when OPAL receives a bad response from FSP 0x24 due to DMAE error, commit informational log and return response status as SUCCESS and for all other bad status response commit predictive log. Signed-off-by: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* Fix spelling mistakesStewart Smith2015-08-261-1/+1
| | | | | | See https://github.com/lucasdemarchi/codespel Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* opal-api: Add OPAL call to handle abnormal reboots.Vipin K Parashar2015-07-311-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a new OPAL call OPAL_CEC_REBOOT2 which will be used to handle abnormal reboot/termination by kernel host. This call will allow host kernel to pass reboot type and additional debug data which needs to be captured/saved somewhere (for later analysis) before going down. Currently it will support two reboot types (0). normal reboot, that will behave similar to that of opal_cec_reboot() call, and (1). platform error reboot, that will trigger a system checkstop using xscom address and FIR bit information obtained via device-tree property 'ibm,sw-checkstop-fir'. For unsupported reboot type, this call will do nothing and return with OPAL_UNSUPPORTED. In future, we can overload this call to support additional reboot types. Signed-off-by: Vipin K Parashar <vipin@linux.vnet.ibm.com> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Reviewed-by: Samuel Mendoza-Jonas <sam.mj@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* errorlog: Deprecate elog callback parameterSamuel Mendoza-Jonas2015-07-311-4/+2
| | | | | | | | | | There are now no users of the call_out parameter and future users should use the log_append_msg() and log_append_data() functions, so remove all references to call_out. Signed-off-by: Samuel Mendoza-Jonas <sam.mj@au1.ibm.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* errorlog: Simplify log calling conventionSamuel Mendoza-Jonas2015-07-311-7/+9
| | | | | | | | | | | | | | | | | | | Remove the callback functionality from log_error() and replace it with the ability to append to a user data section, or add addtional user data sections to an error log. For multiline or otherwise complex logging the convention is now to call opal_elog_create() to obtain a errorlog buffer and use log_append_msg() and log_append_data() to append to the user data section. Additional user data sections can be added to the error log via log_add_section(). The caller is then responsible for calling log_commit(). For simple logs log_simple_error() takes care of creating and committing the error log as before. Signed-off-by: Samuel Mendoza-Jonas <sam.mj@au1.ibm.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* fsp/ipmi: Add the in-band IPMI support for FSP systemsNeelesh Gupta2015-07-101-0/+6
| | | | | | | | | | | FSP implements the IPMI commands support that can be accessed over the mailbox interface from the host. The host needs to provide and receive the message/data bytes of the IPMI command in the TCE space in the KCS (Keyboard Controller Style) format. Signed-off-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com> Acked-By: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* sparse: declare opal_err_info as staticCédric Le Goater2015-07-061-1/+1
| | | | | | Signed-off-by: Cédric Le Goater <clg@fr.ibm.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* PSI: Report a PEL for PSI timeoutAnanth N Mavinakayanahalli2015-02-091-0/+1
| | | | | | | | | | | | | We currently just log an error when we don't find an active PSI link 15 minutes after it went down. Add a PEL log, with sufficient severity so it gets pushed to the administrator. V2: Reset the timeout correctly to prevent error log flooding. Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Tested-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* Move skiboot internal things from opal.h to opal-api.hStewart Smith2015-02-061-0/+1
| | | | | | | | | | This is probably not the best collection of things in the world, but it means that opal.h is much closer to being directly usable by an OS. This triggers a bunch of #include fixes throughout the tree. Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* elog: Clean up error logging headersAlistair Popple2014-12-021-2/+174
| | | | | | | | | | | | | | | Commit cf6f4e8912d29fb89ce85c84834607065ad595a5 introduced a platform independent frontend for error logging. However it failed to move the generic parts of the fsp-elog.h header into the platform independent one, instead relying on the fact that up until now fsp-elog.h was included whenever a function needed to log errors. This patch moves the platform independent defines into the frontend header file (errorlog.h) and removes the include of the platform specific header in generic code paths. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* Liberally sprinkle attribute((warn_unused_result)) around the placeStewart Smith2014-11-271-1/+1
| | | | | | None of these actually produce any warnings. My next patch will. Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* fsp/elog: Create a logging frontendAlistair Popple2014-10-221-0/+41
| | | | | | | | | | | | | | In order to support fsp-less machines we need to be able to log errors using a BMC or some other mechanism. Currently the error logging code is tightly coupled to the platform making it difficult to add different platforms. This patch factors out the generic parts of the error logging code in preparation for adding different logging backends. It also adds a generic mechanism for pre-allocating a specific number of objects. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* elog: Remove opal prefix from data structures in errorlog.hDeepthi Dharwar2014-07-251-2/+2
| | | | | | | Remove opal prefix from opal_* structures in errorlog.h Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* elog: Move error log data structres from opal.h to new file errorlog.hDeepthi Dharwar2014-07-251-0/+135
Currently, all the error log data structures needed to commit an error in PEL are declared in opal.h and are exposed to kernel. At present we do not have any infrastrucutre in the kernel to make use of these ABIs. Any changes to these strucutes in the future can potentially cause an ABI breakage. In order to avoid any breakage this patch moves all the error log data structures to a new errorlog.h file and deprecating the OPAL tokens for error-log write functionality calls only. Once the kernel support is added (anytime in future), one needs to move all of this back to opal.h and re-enable the OPAL tokens for the same. Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
OpenPOWER on IntegriCloud