| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
Direct control xscom can take more time to complete. We seem to
wait too little on Boston failing fast-reboot for no good reason.
Increase timeout to 1 sec as a reasonable value for sreset to be delivered
and core to start executing instructions.
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This improves the security and predictability of the fast reboot
environment.
There can not be a secure fence between fast reboots, because a
malicious OS can modify the firmware itself. However a well-behaved
OS can have a reasonable expectation that OS memory regions it has
modified will be cleared upon fast reboot.
The memory is zeroed after all other CPUs come up from fast reboot,
just before the new kernel is loaded and booted into. This allows
image preloading to run concurrently, and will allow parallelisation
of the clearing in future.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Run the mem_region sanity checkers before proceeding with fast
reboot.
This is the beginning of proactive sanity checks on opal data
for fast reboot (with complements the reactive disable_fast_reboot
cases). This is encouraged to re-use and share any kind of debug
code and unit test code.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
| |
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Acked-By: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
pci_reset() currently does a platform reboot if it fails. It
should not know about fast-reboot at this level, so instead have
it return an error, and the fast reboot caller will do the
platform reboot.
The code essentially does the same thing, but flexibility is
improved. Ideally the fast reboot code should perform pci_reset
and all such fail-able operations before the CPU resets itself
and destroys its own stack. That's not the case now, but that
should be the goal.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is an initial fast reboot implementation for p9 which has only been
tested on the Witherspoon platform, and without the use of NPUs, NX/VAS,
etc.
This has worked reasonably well so far, with no failures in about 100
reboots. It is hidden behind the traditional fast-reboot experimental
nvram option, until more platforms and configurations are tested.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move the boot CPU cleanup and state transition to active, logically
together with secondaries. Don't release secondaries from fast reboot
hold until everyone has cleaned up and transitioned to active.
This is cosmetic, but it is helpful to run the fast reboot state machine
the same way on all CPUs.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
| |
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
| |
Change existing failure error messages to PR_NOTICE so they get
printed to the console, and add some new ones. It's not a more
severe class because it falls back to IPL on failure.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Switch fast reboot to use quiescing rather than "wait for a while".
If firmware can not be quiesced, then fast reboot is skipped. This
significantly improves the robustness of fast reboot in the face of
bugs or unexpected latencies.
Complexity of synchronization in fast-reboot is reduced, because we
are guaranteed to be single-threaded when quiesce succeeds, so locks
can be removed.
In the case that firmware can be quiesced, then it will generally
reduce fast reboot times by nearly 200ms, because quiescing usually
takes very little time.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
| |
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
| |
Don't tie mambo fast reboot to POWER8 CPU type.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently the boot CPU (not the initiator) clears special wakeups
after all CPUs have called in. After the earlier change to have the
initiator wait for secondaries before calling in, this is no longer
necessary.
Have the initiator finish the entire sreset sequence, clearing special
wakeups after all others have called in.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
| |
This function has shrunk to the point it's not so helpful to keep it,
it's no longer power8 specific, and getting rid of it simplifies
error handling a little in future changes.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
| |
There is a 100ms delay when targets reach sreset which does not appear
to have a good purpose. Remove it and therefore reduce the sreset timeout
by the same amount.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
| |
This is a bit of paranoia, but when a CPU changes state to signal it
has reached a particular point, all previous stores should be visible.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Have the initiator wait for all its sreset targets to call in, and
time out after 200ms if they did not. Fail and revert to IPL reboot.
Testing indicates that after successful sreset_all_others(), it
takes less than 102ms (in hundreds of fast reboots) for secondaries
to call in. 100 of that is due to an initial delay, but core
un-splitting was not measured.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
| |
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Pass back failures from sreset_all_others, also change return codes to
OPAL_ form in sreset_all_prepare to match.
Errors will revert to the IPL path, so it's not critical to completely
clean up everything if that would complicate things. Detecting the
error and failing is the important thing.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
| |
Move the mambo sreset code out from the P8 implementation.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
| |
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
The "last man standing" logic has the initiator CPU sreset all
others, then one of them sresets the initiator.
This complicates the fast reboot process and increases potential for
errors. The initiator can simply branch to 0x100 directly.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
| |
This provides a simple API that is amenable to be implemented
by the direct-controls subsystem in a future change.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
| |
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
| |
pm idle requires the system reset vector and IPI facilities before
it can be enabled. Split these out and manage them individually.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
| |
Create a more generic helper for changing HID0 bits on all
processors.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
| |
It uses tlbiel and only cleans up the TLB of the calling core
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Recent CPUs have introduced a lower SMT priority. This uses the
Linux pattern of executing priority nops in descending order to
get a simple portable way to put the CPU into lowest SMT priority.
Introduce smt_lowest() and use it in place of smt_very_low and
smt_low ; smt_very_low sequences.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
| |
Now that we can handle disabling CAPI mode on PHBs, we don't need to
disable fast reboot if there's a CAPI mode PHB.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
| |
In fast-reboot path, OPAL is re-initializing the PCI subsystem.
Accordingly set firmware progress sensor status with IPMI_FW_PCI_INIT.
Signed-off-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the fast reboot path opal sends multiple special sequence to all the
CPUs using xscom operations. On freshly booted system where all CPUs are
in good condition, the fast reboot works fine. But fast reboot fails when
any of the COREs are GARDed by Service processor. This is because xscom
operations fails/timeout on the CPUs/COREs that are GARDed. Fix this issue
by skipping GARDed CPUs during fast reboot path.
The GARDed CPUs are presented as 'bad' to OPAL and OPAL marks that
cpu->state as 'cpu_state_unavailable'. This patch checks the cpu state
to skip GARDed CPUs during fast reboot.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
| |
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
[stewart@linux.vnet.ibm.com: unlock before return (suggested by Mahesh/Andrew),
disable only on non-cancelling fsp codeupdate call (suggested by Vasant)]
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a PHB is in CAPI mode, we cannot safely fast reboot - the PHB will be
fenced during the reboot resulting in major problems when we load the new
kernel.
In order to handle this safely, we need to disable CAPI mode before
resetting PHBs during the fast reboot. However, we don't currently support
this.
In the meantime, when fast rebooting, check if there are any PHBs with a
CAPP attached, and if so, abort the fast reboot and revert to a normal
reboot instead.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
A bit of a hack to free the flattened device tree on fast reset.
This means we don't leak ~500kb memory every fast reset.
We also count the number of fast resets we've done (if enabled),
which means that for stress testing, we have a hope of finding out
how many we managed to do before we hit a problem.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is an experimental patch that implements "Fast reboot" on P8
machines.
The basic idea is that when the OS calls OPAL reboot, we gather all
the threads in the system using a combination of patching the reset
vector and soft-resetting them, then cleanup a few bits of hardware
(we do re-probe PCIe for example), and reload & restart the bootloader.
For Trusted Boot, this means we *add* measurements to the TPM, so you
will get *different* PCR values as compared to a full IPL. This makes
sense as if you want to be sure you are running something known then,
well, do a full IPL as soft reset should never be trusted to clear any
malicious code.
This is very experimental and needs a lot of testing and also auditing
code for other bits of HW that might need to be cleaned up.
BenH TODO: I also need to check if we are properly PERST'ing PCI devices.
This is partially based on old code I had to do that on P7. I only
support it on P8 though as there are issues with the PSI interrupts
on P7 that cannot be reliably solved.
Even though this should be considered somewhat experimental, we've had
a lot of success on a variety of machines. Dozens/hundreds of reboots
across Tuleta, Garrison and Habanero.
Currently, we've hidden it behind a NVRAM config option, which *is*
liable to change in the future (to ensure that only those who know
what they're doing enable it)
You can enable the experimental support via nvram option:
nvram -p ibm,skiboot --update-config experimental-fast-reset=feeling-lucky
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[stewart@linux.vnet.ibm.com: hide behind nvram option, include Mambo fixes
from Mikey]
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
| |
Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
| |
It only exposed one function that is local to the hdat stuff
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|