| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
| |
This is required by the architecture and the implementations, I've
observed failures to wake up on big cores without this.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
| |
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
Currently if Linux boots with a non-zero PCR, things can go bad where
some early userspace programs can take illegal instructions. This is
being fixed in Linux, but in the mean time, we should cleanup in
skiboot also.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current HMI error message does not specifiy where the HMI
error occured.
The original error message was
NPU: FIR#0 FIR 0x0080100000000000 mask 0x009a48180f01ffff
The enhanced error message is
NPU2: [Loc: UOPWR.0000000-Node0-Proc0] P:0 FIR#0 FIR 0x0000100000000000 mask 0x009a48180f03ffff
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
An earlier commit introduced a counter variable poller_recursion to
limit to the number number of error messages shown when opal_pollers
are run recursively. However the check for the counter value was
placed in a way that the poller recursion was only detected first 16
times and then allowed afterwards.
This patch fixes this by moving the check for the counter value inside
the conditional branch with some re-factoring so that opal_poller
recursion is not erroneously allowed after poll_recursion is detected
first 16 times.
Fixes: b6a729e118f4 ("Limit number of Poller recursion detected errors to display")
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the recent patch:
eddff9bf40 hmi: Clear unknown debug trigger
I rebased the code from an older skiboot before the HMI rework. When I
did this, I missed the handled flag. Without this the HMER is not
cleared properly and the HMI keeps happening.
This properly sets the handled flag and hence clears the HMER bit.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
| |
op-build commit 736a08b996e292a449c4996edb264011dfe56a40
added hcode to the VERSION partition, let's parse it out
and let the user know.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
p9_stop_thread should fail the operation if it finds the thread was
already quiescd. This implies something else is doing direct controls
on the thread (e.g., pdbg) or there is some exceptional condition we
don't know how to deal with. Proceeding here would cause things to
trample on each other, for example the hard lockup watchdog trying to
send a sreset to the core while it is stopped for debugging with pdbg
will end in tears.
If p9_stop_thread times out waiting for the thread to quiesce, do
not hit it with a core_start direct control, because we don't know
what state things are in and doing more things at this point is worse
than doing nothing. There is no good recipe described in the workbook
to de-assert the core_stop control if it fails to quiesce the thread.
After timing out here, the thread may eventually quiesce and get
stuck, but that's simpler to debug than undefied behaviour.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Firstly, p9_cont_thread should check that the thread actually was
quiesced before it tries to resume it. Anything could happen if we
try this from an arbitrary thread state.
Then when resuming a quiesced thread that is inactive or stopped (in
a stop idle state), we must not send a core_start direct control,
clear_maint must be used in these cases.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
| |
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On some systems, seeing hangs like this when Linux starts:
[ 170.027252763,5] OCC: All Chip Rdy after 0 ms
[ 170.062930145,5] INIT: Starting kernel at 0x20011000, fdt at 0x30ae0530 366247 bytes)
[ 171.238270428,5] OPAL: Switch to little-endian OS
If you look at the in memory skiboot console (or do 'nvram -p
ibm,skiboot --update-config log-level-driver=7') we see the console get
spammed with:
[ 5209.109790675,7] HMI: Received HMI interrupt: HMER = 0x0000400000000000
[ 5209.109792716,7] HMI: Received HMI interrupt: HMER = 0x0000400000000000
[ 5209.109794695,7] HMI: Received HMI interrupt: HMER = 0x0000400000000000
[ 5209.109796689,7] HMI: Received HMI interrupt: HMER = 0x0000400000000000
We're taking the debug trigger (bit 17) early on, before the
hmi_debug_trigger function in the kernel is set up.
This clears the HMI in Skiboot and reports to the kernel instead of
bringing down the machine.
Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
core/pci-quirk.c:70:7: warning: missing field 'vendor_id' initializer
[-Wmissing-field-initializers]
{NULL}
^
Instead use an empty initaliser, as this is what the kernel does.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
Practically speaking, I don't think you'd *currently* hit this.
Found with Clang's scan-build.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
Acked-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were getting:
[ 0.047155847,5] INIT: Starting kernel at 0x0, fdt at 0x30409d70 10390 bytes)
Now it says:
[ 0.048059045,5] INIT: Starting kernel at 0x0, fdt at 0x30409d80 10406 bytes
Fixes: 293ca03683bf ("init: print the FDT blob size in decimal")
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SBE on P9 provides one shot programmable timer facility. We can use this
to implement OPAL timers and hence limit the reliance on the Linux
heartbeat (similar to HW timer facility provided by SLW on P8).
Design:
- We will continue to run Linux heartbeat.
- Each chip has SBE. This patch always schedules timer on SBE on master chip.
- Start timer option starts new timer or modifies an active timer for the
specified timeout.
- SBE expects timeout value in microseconds. We track timeout value in TB.
Hence we convert tb to microseconds before sending request to SBE.
- We are requesting ack from SBE for timer message. It gaurantees that
SBE has scheduled timer.
- Disabling SBE timer
We expect SBE to send timer expiry interrupt whenever timer expires. We
wait for 10 more ms before disabling timer.
In future we can consider below alternative approaches:
- Presently SBE timer disable is permanent (until we reboot system).
SBE sends "I'm back" interrupt after reset. We can consider restarting
timer after SBE reset.
- Reset SBE and start timer again.
- Each chip has SBE. On multi chip system we can try to schedule timer
on different chip.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Lets move P8 timer support code from slw.c to sbe-p8.c (as suggested
by BenH). There is a difference between timer support in P8 and P9.
Hence I think it makes sense to name it as sbe-p8.c.
Note that this is pure code movement and renaming functions/variables.
No functionality changes.
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SBE (Self Boot Engine) on P9 has two different jobs:
- Boot the chip up to the point the core is functional
- Provide various services like timer, scom, stash MPIPL, etc., at runtime
OPAL can communicate to SBE via a set of data and control registers provided
by the PSU block in P9 chip.
- Four 8 byte registers for Host to send command packets to SBE
- Four 8 byte registers for SBE to send response packets to Host
- Two doorbell registers (1 on each side) to alert either party
when data is placed in above mentioned data register
Protocol constraints:
Only one command is accepted in the command buffer until the response for the
command is enqueued in the response buffer by SBE.
Usage:
We will use SBE for various purposes like timer, MPIPL, etc.
This patch implements the SBE MBOX spec for OPAL to communicate with
SBE.
Design consideration:
- Each chip has SBE. We need to track SBE messages per chip. Hence added
per chip sbe structure and list of messages to that chip
- SBE accepts only one command at a time. Hence serialized MBOX commands.
- OPAL gets interrupted once SBE sets doorbell register
- OPAL has to clear doorbell register after reading response
- Every command class has timeout option. Timed out messages are discarded
- SBE MBOX commands can be classified into four types :
- Those that must be sent to the master only (ex: sending MDST/MDDT info)
- Those that must be sent to slaves only (ex: continue MPIPL)
- Those that must be sent to all chips (ex: close insecure window)
- Those that can be sent to any chip (ex: timer)
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
| |
Otherwise we could exit OPAL holding locks, potentially leading
to all sorts of problems later on.
Cc: stable # 5.3+
Fixes: 7a3e2c4ee3aa0
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We only want to use the device part of the bdfn when looking up the
switch down port. The required bit twiddling happens inside
find_devfn() and the masking here is broken since:
a) Keeps the fn part of the bdfn, and
b) Masks off part of the device number.
This breaks looking up the slot information in some cases.
Fixes: 6878b806682f ("pci-dt-slot: Big ol' cleanup")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
| |
Processor recovery is per core error. All threads on that core receive
HMI. All threads don't need to generate HMI event for same error.
Let thread 0 only generate the event.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
With this patch now we see reason string printed for CORE_WOF[43] bit.
[ 477.352234986,7] HMI: [Loc: U78D3.001.WZS004A-P1-C48]: P:8 C:22 T:3: Processor recovery occurred.
[ 477.352240742,7] HMI: Core WOF = 0x0000000000100000 recovered error:
[ 477.352242181,7] HMI: PC - Thread hang recovery
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The underlying data that we get from HDAT can only really describe a
PCIe system. As such we can simplify the devicetree slot lookup code
by only caring about the important cases, namly, root ports and switch
downstream ports.
This also fixes a bug where root port didn't get a Slot label applied
which results in devices under that port not having ibm,loc-code set.
This results in the EEH core being unable to report the location of
EEHed devices under that port.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are two sets of core temperature sensors today. One is DTS scom
based core temperature sensors and the second group is the sensors
provided by OCC. DTS is the highest temperature among the different
temperature zones in the core while OCC core temperature sensors are
the average temperature of the core. DTS sensors are read directly by
the host by SCOMing the DTS sensors while OCC sensors are read and
updated by OCC to main memory.
Reading DTS sensors by SCOMing is a heavy and slower operation as
compared to reading OCC sensors which is as good as reading memory.
So dont add DTS sensors when OCC sensors are available.
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
| |
Hackish fix from benh
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Direct control xscom can take more time to complete. We seem to
wait too little on Boston failing fast-reboot for no good reason.
Increase timeout to 1 sec as a reasonable value for sreset to be delivered
and core to start executing instructions.
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
| |
Fix the logic error in the loop that iterated incorrectly
over garded cpu.
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If an NMI interrupts the middle of running pollers and the OS
invokes pollers again (e.g., for console output), the poller
re-entrancy check will prevent it from running and spam the
console.
That check was designed to catch a poller calling opal_run_pollers,
OPAL re-entrancy is something different and is detected elsewhere.
Avoid the poller recursion check if OPAL has been re-entered. This
is a best-effort attempt to cope with errors.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This detects OPAL being re-entered by the OS, and switches to an
emergency stack if it was. This protects the firmware's main stack
from re-entrancy and allows the OS to use NMI facilities for crash
/ debug functionality.
Further nested re-entry will destroy the previous emergency stack
and prevent returning, but those should be rare cases.
This stack is sized at 16kB, which doubles the size of CPU stacks,
so as not to introduce a regression in primary stack size. The 16kB
stack originally had a 4kB machine check stack at the top, which was
removed by 80eee1946 ("opal: Remove machine check interrupt patching
in OPAL."). So it is possible the size could be tightened again, but
that would require further analysis.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Quiescing currently is implmeented in C in opal_entry before the
opal call handler is called. This works well enough for simple
cases like fast reset when one CPU wants all others out of the way.
Linux would like to use it to prevent an sreset IPI from
interrupting firmware, which could lead to deadlocks when crash
dumping or entering the debugger. Linux interrupts do not recover
well when returning back to general OPAL code, due to r13 not being
restored. OPAL also can't be re-entered, which may happen e.g.,
from the debugger.
So move the quiesce hold/reject to entry code, beore the stack or
r1 or r13 registers are switched. OPAL can be interrupted and
returned to or re-entered during this period.
This does not completely solve all such problems. OPAL will be
interrupted with sreset if the quiesce times out, and it can be
interrupted by MCEs as well. These still have the issues above.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Put OPAL callers' r1 into the stack back chain, and then use that to
unwind back to the OPAL entry frame (as opposed to boot entry, which
has a 0 back chain).
>From there, dump the OPAL call token and the caller's r1. A backtrace
looks like this:
CPU 0000 Backtrace:
S: 0000000031c03ba0 R: 000000003001a548 ._abort+0x4c
S: 0000000031c03c20 R: 000000003001baac .opal_run_pollers+0x3c
S: 0000000031c03ca0 R: 000000003001bcbc .opal_poll_events+0xc4
S: 0000000031c03d20 R: 00000000300051dc opal_entry+0x12c
--- OPAL call entry token: 0xa caller R1: 0xc0000000006d3b90 ---
This is pretty basic for the moment, but it does give you the bottom
of the Linux stack. It will allow some interesting improvements in
future.
First, with the eframe, all the call's parameters can be printed out
as well. The ___backtrace / ___print_backtrace API needs to be
reworked in order to support this, but it's otherwise very simple
(see opal_trace_entry()).
Second, it will allow Linux's stack to be passed back to Linux via
a debugging opal call. This will allow Linux's BUG() or xmon to
also print the Linux back trace in case of a NMI or MCE or watchdog
lockup that hits in OPAL.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
| |
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Due to P9 errata, HDEC parity and TB residue errors are latched for
non-zero threads 1-3 even if they are cleared. But these are not
latched on thread 0. Hence, use xscom SCOMC/SCOMD to read thread 0 tfmr
value and ignore them on non-zero threads if they are not present on
thread 0.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
| |
Helps in debugging...
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While testing TFMR parity/corrupt error it has been observed that HMIs are
delivered twice for this error
- First time HMI is delivered with HMER[4,5]=1 and TFMR[60]=1.
- Second time HMI is delivered with HMER[4,5]=1 and TFMR[60]=0 with valid TB.
On second HMI we end up throwing below error message even though TB is in
valid state.
"HMI: TB invalid without core error reported"
This patch fixes this issue by ignoring HMER[5] and checking only for
TFMR[60] before setting this_cpu()->tb_invalid to true.
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are some TOD errors which do not affect working of TOD and TB. They
stay in valid state. Hence we don't need rendez vous for TOD errors that
does not affect TB working.
TOD errors that affects TOD/TB will report a global error on TFMR[44]
alongwith bit 51, and they will go in rendez vous path as expected.
But the TOD errors that does not affect TB register sets only TFMR bit 51.
The TFMR bit 51 is cleared when any single thread clears the TOD error.
Once cleared, the bit 51 is reflected to all the cores on that chip. Any
thread that reads the TFMR register after the error is cleared will see
TFMR bit 51 reset. Hence the threads that see TFMR[51]=1, falls through
rendez-vous path and threads that see TFMR[51]=0, returns doing
nothing. This ends up in a soft lockups in host kernel.
This patch fixes this issue by not considering TOD interrupt (TFMR[51])
as a core-global error and hence avoiding rendez-vous path completely.
Instead threads that see TFMR[51]=1 will now take different path that
just do the TOD error recovery.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For TOD errors, all the cores in the chip get HMIs. Any one thread from any
core can fix the issue and TFMR will have error conditions cleared. Rest of
the threads need take any action if TOD errors are already cleared. Hence
thread 0 of every core should get a fresh copy of TFMR before going ahead
recovery path. Initialize recover = -1, so that if no errors found that
thread need not send a HMI event to linux. This helps in stop flooding host
with hmi event by every thread even there are no errors found.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Do this before we check for TFAC errors. Otherwise the event at host console
shows no error reported in HMER register.
Without this patch the console event show HMER with all zeros
[ 216.753417] Severe Hypervisor Maintenance interrupt [Recovered]
[ 216.753498] Error detail: Timer facility experienced an error
[ 216.753509] HMER: 0000000000000000
[ 216.753518] TFMR: 3c12000870e04000
After this patch it shows old HMER values on host console:
[ 2237.652533] Severe Hypervisor Maintenance interrupt [Recovered]
[ 2237.652651] Error detail: Timer facility experienced an error
[ 2237.652766] HMER: 0840000000000000
[ 2237.652837] TFMR: 3c12000870e04000
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
This patch reworks the HMI handling for TFAC errors by introducing
4 rendez-vous points improve the thread synchronization while handling
timebase errors that requires all thread to clear dirty data from TB/HDEC
register before clearing the errors.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
| |
The test for TFAC error is now redundant so we remove it and
remove the HMER argument.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
| |
Currently no functional change. This is a first step to completely
rewriting how these things are handled.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
| |
It returns a 64-bit flags mask currently set to provide info
about which timer facilities were lost, and whether an event
was generated.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Writing to HMER acts as an "AND". The current code writes back the
value we originally read with the bits we handled cleared. This is
racy, if a new bit gets set in HW after the original read, we'll end
up clearing it without handling it.
Instead, use an all 1's mask with only the bit handled cleared.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
| |
We want to make sure all reporting and actions are based
upon the same snapshot of HMER in case bits get added
by HW while we are in OPAL.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Currently we restore PCIe bus numbers right after the link is
up. Unfortunately as this point we haven't done CRS so config space
may not be accessible.
This moves the bus number restore till after CRS has happened.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
The PCIe slot capability can be implemented in a root or switch
downstream port to set the maximum power a card is allowed to draw
from the system. This patch adds support for setting the power limit
when the platform has defined one.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Deprecate the old "opal-interrupts", it's still there, but the new
property follows the standard and allow us to specify whether an
interrupt is level or edge sensitive.
Similarly create "interrupt-names" whose content is identical to
"opal-interrupts-names".
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Skiboot does not calculate the actual size and start location of the
initramfs if it is wrapped by an STB container (for example if loading
an initramfs from the ROOTFS partition).
Check if the initramfs is in an STB container and determine the size and
location correctly in the same manner as the kernel. Since
load_initramfs() is called after load_kernel() move the call to
trustedboot_exit_boot_services() into load_and_boot_kernel() so it is
called after both of these.
Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
If the power state is already the required value, return
OPAL_SUCCESS rather than OPAL_PARAMETER to avoid spurrious
errors during boot.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-By: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
| |
Document the bridge class code hack and what ibm,pci-config-space-type
actually means. Also replace some of the pci_cap() calls with a variable
to make it a bit more readable.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The blocklevel abstraction allows for regions of the backing store to be
marked as ECC protected so that blocklevel can decode/encode the ECC
bytes into the buffer automatically without the caller having to be ECC
aware.
Unfortunately this abstraction is far from perfect, this is only useful
if reads and writes are performed at the start of the ECC region or in
some circumstances at an ECC aligned position - which requires the
caller be aware of the ECC regions.
The problem that has arisen is that the blocklevel abstraction is
initialised somewhere but when it is later called the caller is unaware
if ECC exists in the region it wants to arbitrarily read and write to.
This should not have been a problem since blocklevel knows. Currently
misaligned reads will fail ECC checks and misaligned writes will
overwrite ECC bytes and the backing store will become corrupted.
This patch add the smarts to blocklevel_read() and blocklevel_write() to
cope with the problem. Note that ECC can always be bypassed by calling
blocklevel_raw_() functions.
All this work means that the gard tool can can safely call
blocklevel_read() and blocklevel_write() and as long as the blocklevel
knows of the presence of ECC then it will deal with all cases.
This also commit removes code in the gard tool which compensated for
inadequacies no longer present in blocklevel.
Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com>
Tested-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
[stewart: core/flash: Adapt to new libflash ECC API
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|