| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
| |
Log HMI errors as step 1. OS will need to deduce
and interpret the HMI event.
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Print DSISR and DAR, to help with deciphering machine check exceptions,
and improve the output a bit, decode NIP symbol, improve alignment, etc.
Also print a specific header for machine check, because we do expect to
see these if there is a hardware failure.
Before:
[ 0.005968779,3] ***********************************************
[ 0.005974102,3] Unexpected exception 200 !
[ 0.005978696,3] SRR0 : 000000003002ad80 SRR1 : 9000000000001000
[ 0.005985239,3] HSRR0: 00000000300027b4 HSRR1: 9000000030001000
[ 0.005991782,3] LR : 000000003002ad80 CTR : 0000000000000000
[ 0.005998130,3] CFAR : 00000000300b58bc
[ 0.006002769,3] CR : 40000004 XER: 20000000
[ 0.006008069,3] GPR00: 000000003002ad80 GPR16: 0000000000000000
[ 0.006015170,3] GPR01: 0000000031c03bd0 GPR17: 0000000000000000
[...]
After:
[ 0.003287941,3] ***********************************************
[ 0.003561769,3] Fatal MCE at 000000003002ad80 .nvram_init+0x24
[ 0.003579628,3] CFAR : 00000000300b5964
[ 0.003584268,3] SRR0 : 000000003002ad80 SRR1 : 9000000000001000
[ 0.003590812,3] HSRR0: 00000000300027b4 HSRR1: 9000000030001000
[ 0.003597355,3] DSISR: 00000000 DAR : 0000000000000000
[ 0.003603480,3] LR : 000000003002ad68 CTR : 0000000030093d80
[ 0.003609930,3] CR : 40000004 XER : 20000000
[ 0.003615698,3] GPR00: 00000000300149e8 GPR16: 0000000000000000
[ 0.003622799,3] GPR01: 0000000031c03bc0 GPR17: 0000000000000000
[...]
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current boot sequence inherits MSR[ME] from the IPL firmware, and
never changes it. Some environments disable MSR[ME] (e.g., mambo), and
others can enable it (hostboot).
This has two problems. First, MSR[ME] must be disabled while in the
process of taking over the interrupt vector from the previous
environment. Second, after installing our machine check handler,
MSR[ME] should be enabled to get some useful output rather than a
checkstop.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
get_symbol is difficult to use. Add snprintf_symbol helper which
prints a symbol into a buffer with length, and returns the number
of bytes used, similarly to snprintf. Use this in the stack dumping
code rather than open-coding it.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
OCC shares the frequency list to host by copying the pstate table to
main memory in HOMER. This table is parsed during boot to create
device-tree properties for frequency and pstate IDs. OCC can update
the pstate table to present a new set of frequencies to the host. But
host will remain oblivious to these changes unless it is re-inited
with the updated device-tree CPU frequency properties. So this patch
allows to re-parse the pstate table and update the device-tree
properties during fast-reboot.
OCC updates the pstate table when asked to do so using pstate-table
bias command. And this is mainly used by WOF team for
characterization purposes.
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Tested-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
pci_reset() currently does a platform reboot if it fails. It
should not know about fast-reboot at this level, so instead have
it return an error, and the fast reboot caller will do the
platform reboot.
The code essentially does the same thing, but flexibility is
improved. Ideally the fast reboot code should perform pci_reset
and all such fail-able operations before the CPU resets itself
and destroys its own stack. That's not the case now, but that
should be the goal.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
As a safer side move the imc catalog preload after the STB init
to make sure the imc catalog resource get's verified and measured
properly during loading when both secure and trusted boot modes
are on.
Signed-off-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
| |
A likely copy and paste oversight.
Fixes: 0d84ea6b (core: Add support for quiescing OPAL)
Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
| |
Create cpuidle device-tree after slw_init(), so that we can stop the
deeper states from being added , when wakeup engine is not present or
failed.
Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Keep track of lock owner name and replace lock_depth counter
with a per-cpu list of locks held by the cpu.
This allows us to print the actual locks held in case we hit
the (in)famous message about opal_pollers being run with a
lock held.
It also allows us to warn (and drop them) if locks are still
held when returning to the OS or completing a scheduled job.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[stewart: fix unit tests]
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
| |
This gives us per-cpu guard values as well. For now I just
xor a magic constant with the CPU PIR value.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
| |
We *disable* the secure boot part, but we keep the verified boot
part as we don't currently have container verification code for Mambo.
We can run a small part of the code currently though.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The linux documentation, reserved-memory.txt, says that memory-region is
a phandle that pairs to a children of /reserved-memory.
This updates /ibm,secureboot/ibm,cvc/memory-region to point to
/reserved-memory/secure-crypt-algo-code instead of
/ibm,hostboot/reserved-memory/secure-crypt-algo-code.
Signed-off-by: Claudio Carvalho <cclaudio@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
List of libstb calls that were superseded:
sb_verify() -> secureboot_verify()
tb_measure() -> trustedboot_measure()
stb_final() -> trustedboot_exit_boot_services()
stb_init() -> secureboot_init() and trustedboot_init()
The new functions are supported in both P8 and P9.
Signed-off-by: Claudio Carvalho <cclaudio@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
The flash driver calls libstb to verify and measure every PNOR partition
requested at boot time.
This removes redundat code from init.c used to verify and measure
BOOTKERNEL.
Signed-off-by: Claudio Carvalho <cclaudio@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
| |
This adds the flash_map_resource_name() to allow skiboot subsystems to
lookup the name of a PNOR partition. Thus, we don't need to duplicate
the same information in other places (e.g. libstb).
Signed-off-by: Claudio Carvalho <cclaudio@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
To (slightly) lower the barrier for contributions, we can make valgrind
optional with just a small amount of plumbing.
This allows make check to run successfully without valgrind.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
This patch handles OCC_RESET runtime events in host opal-prd and also
provides support for calling 'hostinterface->wakeup()' which is
required for doing the reset operation.
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
In P9 HBRT sends error logs to FSP via firmware_request interface.
This patch adds support to parse error log and send it to FSP.
CC: Jeremy Kerr <jk@ozlabs.org>
CC: Daniel M Crowell <dcrowell@us.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
| |
No functionality change.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
| |
So that we don't flood OPAL console logs with information that is not
useful.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add few HMI debug prints with location code info few additional info.
No functionality change.
With this patch the log messages will look like:
[210612.175196744,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
[210612.175200449,7] HMI: [Loc: UOPWR.1302LFA-Node0-Proc1]: P:8 C:16 T:1: TFMR(2d12000870e04020) Timer Facility Error
[210660.259689526,7] HMI: Received HMI interrupt: HMER = 0x2040000000000000
[210660.259695649,7] HMI: [Loc: UOPWR.1302LFA-Node0-Proc0]: P:0 C:16 T:1: Processor recovery Done.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
| |
No functionality changes.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
| |
and store it under proc_chip for quick reference during HMI handling
code.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently we reset the timebase value to (almost) zero when
synchronising the timebase of each chip to the Chip TOD network which
results in this:
[ 42.374813167,5] CPU: All 80 processors called in...
[ 2.222791151,5] FLASH: Found system flash: Macronix MXxxL51235F id:0
[ 2.222977933,5] BT: Interface initialized, IO 0x00e4
This patch modifies the chiptod_init() process to use the current
timebase value rather than resetting it to zero. This results in the
timestamps remaining contigious from the start of hostboot until
the petikernel starts. e.g.
[ 70.188811484,5] CPU: All 144 processors called in...
[ 72.458004252,5] FLASH: Found system flash: id:0
[ 72.458147358,5] BT: Interface initialized, IO 0x00e4
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
This will trip the debug checks in debug builds under some circumstances
and is actually a rather bad idea as we might look at a timer that is
concurrently being removed and modified, and thus incorrectly assume
there is no work to do.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On a debug build, _mcount would trash r3 and opal_exit_check
would not restore it, leaving OPAL calls returning garbage.
this fix simply preserves the return value and doesn't let
the compiler get fancy on us. We effectively just get an
extra `mr` instruction to restore r3.
Fixes: 9c565ee6bca4b665d9d1120bfff5e88ee80615bc
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reported-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We require all instructions to be completed before a thread is
considered stopped, by the dctl interface. Long running instructions
like cache misses and CI loads may take a significant amount of time
to complete, and timeouts have been observed in stress testing.
Increase the timeout significantly, to cover this. The workbook
just says to poll, but we like to have timeouts to avoid getting
stuck in firmware.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
| |
Add mambo direct controls to stop threads, which is required for
reliable fast-reboot. Enable direct controls by default on mambo.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is an initial fast reboot implementation for p9 which has only been
tested on the Witherspoon platform, and without the use of NPUs, NX/VAS,
etc.
This has worked reasonably well so far, with no failures in about 100
reboots. It is hidden behind the traditional fast-reboot experimental
nvram option, until more platforms and configurations are tested.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move the boot CPU cleanup and state transition to active, logically
together with secondaries. Don't release secondaries from fast reboot
hold until everyone has cleaned up and transitioned to active.
This is cosmetic, but it is helpful to run the fast reboot state machine
the same way on all CPUs.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
| |
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
| |
Change existing failure error messages to PR_NOTICE so they get
printed to the console, and add some new ones. It's not a more
severe class because it falls back to IPL on failure.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Switch fast reboot to use quiescing rather than "wait for a while".
If firmware can not be quiesced, then fast reboot is skipped. This
significantly improves the robustness of fast reboot in the face of
bugs or unexpected latencies.
Complexity of synchronization in fast-reboot is reduced, because we
are guaranteed to be single-threaded when quiesce succeeds, so locks
can be removed.
In the case that firmware can be quiesced, then it will generally
reduce fast reboot times by nearly 200ms, because quiescing usually
takes very little time.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Quiescing is ensuring all host controlled CPUs (except the current
one) are out of OPAL and prevented from entering. This can be use in
debug and shutdown paths, particularly with system reset sequences.
This patch adds per-CPU entry and exit tracking for OPAL calls, and
adds logic to "hold" or "reject" at entry time, if OPAL is quiesced.
An OPAL call is added, to expose the functionality to Linux, where it
can be used for shutdown, kexec, and before generating sreset IPIs for
debugging (so the debug code does not recurse into OPAL).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
| |
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
| |
Move opal_check_token from asm to C.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
| |
Add entry and exit C functions that can do some more complex
checks before the opal proper call. This requires saving off
volatile registers that have arguments in them.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
| |
Prevent try_lock from modifying the lock state when bust_locks is set.
unlock will not unlock it in that case, so locks will get taken and
never released while bust_locks is set.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
| |
cmpxchg will be used in a subsequent change, and this reduces the
amount of asm code.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[stewart: fix some ifdef __TEST__ foo to ensure unittests work]
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
| |
Add xscom checks which will print something useful and return error
back to callers (which already have error handling plumbed in).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
This reworks the sreset functionality that was brought over from
fast-reboot, and fits it under the generic direct controls APIs.
The fast reboot APIs are implemented using generic direct controls,
which also makes them available on p9.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
| |
Change the p8_sreset_all_others sequence from prenap all, sreset
all; to prenap, sreset all. This makes it more suitable to fit the
direct controls APIs, which does not expose "prenap".
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
| |
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
| |
Don't tie mambo fast reboot to POWER8 CPU type.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently the boot CPU (not the initiator) clears special wakeups
after all CPUs have called in. After the earlier change to have the
initiator wait for secondaries before calling in, this is no longer
necessary.
Have the initiator finish the entire sreset sequence, clearing special
wakeups after all others have called in.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
| |
This function has shrunk to the point it's not so helpful to keep it,
it's no longer power8 specific, and getting rid of it simplifies
error handling a little in future changes.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
| |
There is a 100ms delay when targets reach sreset which does not appear
to have a good purpose. Remove it and therefore reduce the sreset timeout
by the same amount.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
| |
This is a bit of paranoia, but when a CPU changes state to signal it
has reached a particular point, all previous stores should be visible.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Have the initiator wait for all its sreset targets to call in, and
time out after 200ms if they did not. Fail and revert to IPL reboot.
Testing indicates that after successful sreset_all_others(), it
takes less than 102ms (in hundreds of fast reboots) for secondaries
to call in. 100 of that is due to an initial delay, but core
un-splitting was not measured.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|