talos-skiboot/include, branch master

Expose PNOR Flash partitions to host MTD driver via devicetree

2018-04-30T08:55:23+00:00

Signal skiboot completion to BMC when done

2018-04-30T08:55:22+00:00

sensors: Dont add DTS sensors when OCC inband sensors are available

2018-04-19T23:56:06+00:00

There are two sets of core temperature sensors today. One is DTS scom based core temperature sensors and the second group is the sensors provided by OCC. DTS is the highest temperature among the different temperature zones in the core while OCC core temperature sensors are the average temperature of the core. DTS sensors are read directly by the host by SCOMing the DTS sensors while OCC sensors are read and updated by OCC to main memory. Reading DTS sensors by SCOMing is a heavy and slower operation as compared to reading OCC sensors which is as good as reading memory. So dont add DTS sensors when OCC sensors are available. Signed-off-by: Shilpasri G Bhat Acked-by: Vaidyanathan Srinivasan Signed-off-by: Stewart Smith

core/opal: Emergency stack for re-entry

2018-04-19T01:23:07+00:00

This detects OPAL being re-entered by the OS, and switches to an emergency stack if it was. This protects the firmware's main stack from re-entrancy and allows the OS to use NMI facilities for crash / debug functionality. Further nested re-entry will destroy the previous emergency stack and prevent returning, but those should be rare cases. This stack is sized at 16kB, which doubles the size of CPU stacks, so as not to introduce a regression in primary stack size. The 16kB stack originally had a 4kB machine check stack at the top, which was removed by 80eee1946 ("opal: Remove machine check interrupt patching in OPAL."). So it is possible the size could be tightened again, but that would require further analysis. Signed-off-by: Nicholas Piggin Signed-off-by: Stewart Smith

asm/head: implement quiescing without stack or clobbering regs

2018-04-19T01:23:07+00:00

Quiescing currently is implmeented in C in opal_entry before the opal call handler is called. This works well enough for simple cases like fast reset when one CPU wants all others out of the way. Linux would like to use it to prevent an sreset IPI from interrupting firmware, which could lead to deadlocks when crash dumping or entering the debugger. Linux interrupts do not recover well when returning back to general OPAL code, due to r13 not being restored. OPAL also can't be re-entered, which may happen e.g., from the debugger. So move the quiesce hold/reject to entry code, beore the stack or r1 or r13 registers are switched. OPAL can be interrupted and returned to or re-entered during this period. This does not completely solve all such problems. OPAL will be interrupted with sreset if the quiesce times out, and it can be interrupted by MCEs as well. These still have the issues above. Signed-off-by: Nicholas Piggin Signed-off-by: Stewart Smith

core/stack: backtrace unwind basic OPAL call details

2018-04-19T01:23:07+00:00

Put OPAL callers' r1 into the stack back chain, and then use that to unwind back to the OPAL entry frame (as opposed to boot entry, which has a 0 back chain). >From there, dump the OPAL call token and the caller's r1. A backtrace looks like this: CPU 0000 Backtrace: S: 0000000031c03ba0 R: 000000003001a548 ._abort+0x4c S: 0000000031c03c20 R: 000000003001baac .opal_run_pollers+0x3c S: 0000000031c03ca0 R: 000000003001bcbc .opal_poll_events+0xc4 S: 0000000031c03d20 R: 00000000300051dc opal_entry+0x12c --- OPAL call entry token: 0xa caller R1: 0xc0000000006d3b90 --- This is pretty basic for the moment, but it does give you the bottom of the Linux stack. It will allow some interesting improvements in future. First, with the eframe, all the call's parameters can be printed out as well. The ___backtrace / ___print_backtrace API needs to be reworked in order to support this, but it's otherwise very simple (see opal_trace_entry()). Second, it will allow Linux's stack to be passed back to Linux via a debugging opal call. This will allow Linux's BUG() or xmon to also print the Linux back trace in case of a NMI or MCE or watchdog lockup that hits in OPAL. Signed-off-by: Nicholas Piggin Signed-off-by: Stewart Smith

opal/hmi: Generate hmi event for recovered HDEC parity error.

2018-04-17T08:52:10+00:00

Signed-off-by: Mahesh Salgaonkar Signed-off-by: Stewart Smith

opal/hmi: check thread 0 tfmr to validate latched tfmr errors.

2018-04-17T08:52:10+00:00

Due to P9 errata, HDEC parity and TB residue errors are latched for non-zero threads 1-3 even if they are cleared. But these are not latched on thread 0. Hence, use xscom SCOMC/SCOMD to read thread 0 tfmr value and ignore them on non-zero threads if they are not present on thread 0. Signed-off-by: Mahesh Salgaonkar Signed-off-by: Stewart Smith

opal/hmi: Fix soft lockups during TOD errors

2018-04-17T08:52:10+00:00

There are some TOD errors which do not affect working of TOD and TB. They stay in valid state. Hence we don't need rendez vous for TOD errors that does not affect TB working. TOD errors that affects TOD/TB will report a global error on TFMR[44] alongwith bit 51, and they will go in rendez vous path as expected. But the TOD errors that does not affect TB register sets only TFMR bit 51. The TFMR bit 51 is cleared when any single thread clears the TOD error. Once cleared, the bit 51 is reflected to all the cores on that chip. Any thread that reads the TFMR register after the error is cleared will see TFMR bit 51 reset. Hence the threads that see TFMR[51]=1, falls through rendez-vous path and threads that see TFMR[51]=0, returns doing nothing. This ends up in a soft lockups in host kernel. This patch fixes this issue by not considering TOD interrupt (TFMR[51]) as a core-global error and hence avoiding rendez-vous path completely. Instead threads that see TFMR[51]=1 will now take different path that just do the TOD error recovery. Signed-off-by: Mahesh Salgaonkar Signed-off-by: Stewart Smith

opal/hmi: Do not send HMI event if no errors are found.

2018-04-17T08:52:10+00:00

For TOD errors, all the cores in the chip get HMIs. Any one thread from any core can fix the issue and TFMR will have error conditions cleared. Rest of the threads need take any action if TOD errors are already cleared. Hence thread 0 of every core should get a fresh copy of TFMR before going ahead recovery path. Initialize recover = -1, so that if no errors found that thread need not send a HMI event to linux. This helps in stop flooding host with hmi event by every thread even there are no errors found. Signed-off-by: Mahesh Salgaonkar Signed-off-by: Stewart Smith