| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
| |
Fast reset causes the OCC to randomly stop communicating with the BMC,
leading to a fan control failure and possible second CPU guard due
to the resultant FSI bus communication issues.
Disable until Issue 1699 (https://github.com/openbmc/openbmc/issues/1699) is fixed
|
|
|
|
|
|
|
|
|
| |
This option sets the OCC in characterization mode and the changes the
governor to performance.
This patch adds two new sub-options to 'occ' sub-command
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
We print out a whole bunch of things on boot, most of which aren't
interesting, so we should *not* print them instead.
Printing things like what CPUs we found and what PCI devices we found
*are* useful, so continue to do that. But we don't need to splat out
a bunch of things that are always going to be true.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
| |
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
HMIs for NPU xstops are broadcasted to all chips. All cores on all the
chips receive HMI. HMI handler correctly identifies and extracts the
NPU FIR details from affected chip, but while printing FIR data it
prints chip id and location code details of this_cpu()->chip_id which
may not be correct. This patch fixes this issue.
CC: stable # v6.0+
Fixes: 7bcbc78c ("Add location code to NPU2 HMI logging")
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
[stewart: add fixes and cc stable]
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Watchdog resets can return an error code from the BMC indicating that
the BMC watchdog was not initialized. Currently we abort skiboot due to
a missing error handler. This patch implements handling
re-initialization for the watchdog, automatically saving the last
watchdog set values and re-issuing them if needed.
Signed-off-by: William A. Kennington III <wak@google.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
| |
This does not create any behavioral change yet, but this will be useful
in a future commit that adds support for re-initializing the watchdog.
Signed-off-by: William A. Kennington III <wak@google.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
This makes no functional changes, just refactors the completion function
to be used for reset only, since it does nothing but free the message
for set calls. This will be useful for future changes to reduce nesting
depth.
Signed-off-by: William A. Kennington III <wak@google.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
Otherwise it is possible for the reset timer to elapse and trigger the
watchdog to wake back up. This doesn't affect the behavior of the
system since we are providing a NONE action to the BMC. However we would
like to avoid the action from taking place if possible.
Signed-off-by: William A. Kennington III <wak@google.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
| |
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
This makes it easier for future changes to ensure that the watchdog
stops ticking and doesn't requeue itself for execution in the
background. This way it is safe for resets to be performed after the
ticks are assumed to be stopped and it won't start the timer again.
Signed-off-by: William A. Kennington III <wak@google.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The op-build linux kernel has been configured to support the ipmi
watchdog. This driver will always handle the watchdog by either leaving
it enabled if configured, or by disabling it during module load if no
configuration is provided. This increases the coverage of the watchdog
during the boot process. The watchdog should no longer be disabled at
any point during skiboot execution.
Signed-off-by: William A. Kennington III <wak@google.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is no clarification for why this change was needed, but presumably
this is due to a buggy BMC implementation where the Watchdog Set command
was processed concurrently or after the initial Watchdog Reset. This
inversion would cause the watchdog to stop since the DONT_STOP bit was
not set. Since we are now using the DONT_STOP bit during initialization,
the watchdog should not be stopped even if an inversion occurs.
Signed-off-by: William A. Kennington III <wak@google.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The IPMI standard supports setting a DONT_STOP bit during an Watchdog
Set operation. Most of the time we don't want to stop the Watchdog when
updating the settings so we should be using this bit. This patch makes
it possible for callers of set_wdt to prevent the watchdog from being
stopped. This only changes the behavior of the watchdog during the
initial settings update when initializing skiboot. The watchdog is no
longer disabled and then immediately re-enabled.
Signed-off-by: William A. Kennington III <wak@google.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
| |
The IPMI specification denotes that action 0x1 is Host Reset and 0x3 is
Host Power Cycle. Use the correct name for Reset in our watchdog code.
Signed-off-by: William A. Kennington III <wak@google.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Add '--allow-empty' which allows the filename for a given partition to
be blank. If set ffspart will set that part of the PNOR file 'blank' and
set ECC bits if required.
Without this option behaviour is unchanged and ffspart will return an
error if it can not find the partition file.
Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
pflash uses lowercase prefix when running make install in it's
direcetory, but uppercase PREFIX when running it in shared. Use
lowercase everywhere.
With this the OpenBMC bitbake recipie can drop an out of tree patch it's
been carrying for years.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The PHB callback 'get_link_state' is always reporting the link width,
irrespective of the link status and even when the link is down. It is
causing too much work (and failures) when the PHB is probed during pci
init.
The fix is to look at the link status first and report the link as
down when appropriate.
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that links may train in parallel, traces shown during training can
be all mixed up. So add a prefix to all the traces to clearly identify
the chip and link the trace refers to:
OCAPI[<chip id>:<link id>]: this is a very useful message
The lower-level hardware procedures (npu2-hw-procedures.c) also print
traces which would need work. But that code is being reworked to be
better integrated with opencapi and nvidia, so leave it alone for now.
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reorder our link training steps so that they are executed on
fundamental reset instead of during the initial setup. Skiboot always
call a fundamental reset on all the PHBs during pci init.
It is done through a state machine, similarly to what is done for
'real' PHBs.
This is the first step for a longer term goal to be able to trigger an
adapter reset from linux. We'll need the reset callbacks of the PHB to
be defined. We have to handle the various delays differently, since a
linux thread shouldn't stay stuck waiting in opal for too long.
No functional changes.
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rework a bit the code to reset the opencapi adapter:
- make clearer which i2c pin is resetting which device
- break the reset operation in smaller chunks. This is really to
prepare for a future patch.
No functional changes.
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Presence detection is not part of the opencapi specification. So each
platform may choose to implement it the way it wants.
All current platforms implement it through an i2c device where we can
query a pin to know if a device is connected or not. ZZ and Zaius have
a similar design and even use the same i2c information and pin
numbers.
However, presence detection on older ZZ planar (older than v4) doesn't
work, so we don't activate it for now, until our lab systems are
upgraded and it's better tested.
Presence detection on witherspoon is still being worked on. It's
shaping up to be quite different, so we may have to revisit the topic
in a later patch.
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently there is no documentation for function #86, aka
OPAL_CHECK_ASYNC_COMPLETION, which is used on both Linux and FreeBSD,
but not documented yet.
This patch simply adds an initial entry for this function on
OPAL documentation.
Signed-off-by: Breno Leitao <leitao@debian.org>
[stewart: clean up some formatting/details]
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
| |
It's been EOL since April 2017 (now 13 months ago).
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
| |
3d019581c98153 introduced clearing PCR on reinit cpus,
and until (the near future from now) qemu didn't support
this register.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
| |
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
| |
This warning appears to not be particularly useful if you ever
actually *want* to truncate a string.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
| |
hdata/test/../i2c.c:200:1: error: alignment 1 of ‘struct host_i2c_hdr’ is less than 4 [-Werror=packed-not-aligned]
} __packed;
^
Fixes: https://github.com/open-power/skiboot/issues/160
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
Fixes this gcc8 warning:
hdata/test/../spira.c: In function ‘add_iplparams_features’:
hdata/test/../spira.c:1209:38: error: argument to ‘sizeof’ in ‘strncpy’ call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
strncpy(name, feature->name, sizeof(feature->name));
^
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
| |
error: ‘const’ attribute on function returning ‘void’ [-Werror=attributes]
Fix by not putting the const attribute in the stub
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
hdata/test/stubs.c:112:11: error: ‘lock_caller’ alias between functions of incompatible types ‘void(void)’ and ‘_Bool(void)’ [-Werror=attribute-alias]
NOOP_STUB(lock_caller);
^~~~~~~~~~~
We fix it by giving the correct prototype to our stub
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
| |
Add placeholder support for prepare_hbrt_update call into
hostboot runtime (opal-prd) code. This interface is only
called as part of a concurrent code update on a FSP based
system.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
| |
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
(cherry picked from commit aaa3e159cb2ce6baa3d4ca1a283c5f918944c18b)
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
| |
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
(cherry picked from commit 81e58346c8b8a9f85d7b116f2d3c1b6bc724daea)
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
| |
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
(cherry picked from commit 48e5e8be4612b4233b2e443c86fffa1997ab0799)
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
| |
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
(cherry picked from commit c55a54bbf38b6b3144105885e173ae7b6afab091)
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
stop1_lite has been removed since it adds no additional benefit
over stop0_lite. stop2_lite has been removed since currently it adds
minimal benefit over stop2. However, the benefit is eclipsed by the time
required to ungate the clocks
Moreover, Lite states don't give up the SMT resources, can potentially
have a performance impact on sibling threads.
Since current OSs (Linux) aren't smart enough to make good decisions
with these stop states, we're (temporarly) removing them from what
we expose to the OS, the idea being to bring them back in a new
DT representation so that only an OS that knows what to do will
do things with them.
Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[stewart: add to explanation]
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
P9 onwards OPAL is building device tree for BMC based system using
HDAT. We are populating bmc/compatible node with bmc version. Hence
do not delete this property.
CC: Jeremy Kerr <jk@ozlabs.org>
CC: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
| |
Add support for tclreadline package if it is present.
This patch loads the package and uses it when the
simulation stops for any reason.
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some debugging options will print while the console lock is held,
which is why the console lock is taken as a recursive lock.
However console_write calls __flush_console, which will drop and
re-take the lock non-recursively in some cases.
Just set con_need_flush and return from __flush_console if we are
holding the console lock already.
This stack usage message (taken with this patch applied) could lead
to a deadlock without this:
CPU 0000 lowest stack mark 11768 bytes left pc=300cb808 token=0
CPU 0000 Backtrace:
S: 0000000031c03370 R: 00000000300cb808 .list_check_node+0x1c
S: 0000000031c03410 R: 00000000300cb910 .list_check+0x38
S: 0000000031c034b0 R: 00000000300190ac .try_lock_caller+0xb8
S: 0000000031c03540 R: 00000000300192e0 .lock_caller+0x80
S: 0000000031c03600 R: 0000000030012c70 .__flush_console+0x134
S: 0000000031c036d0 R: 00000000300130cc .console_write+0x68
S: 0000000031c03780 R: 00000000300347bc .vprlog+0xc8
S: 0000000031c03970 R: 0000000030034844 ._prlog+0x50
S: 0000000031c03a00 R: 00000000300364a4 .log_simple_error+0x74
S: 0000000031c03b90 R: 000000003004ab48 .occ_pstates_init+0x184
S: 0000000031c03d50 R: 000000003001480c .load_and_boot_kernel+0x38c
S: 0000000031c03e30 R: 000000003001571c .main_cpu_entry+0x62c
S: 0000000031c03f00 R: 0000000030002700 boot_entry+0x1c0
Reported-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The memory errors (CEs and UEs) that are detected as part of background
memory scrubbing are reported by PRD asynchronously to opal-prd along with
affected memory ranges. hservice_memory_error() converts these ranges into
page granularity before hooking up them to soft/hard offline-ing
infrastructure.
But the current implementation of hservice_memory_error() does not hookup
all the pages to soft/hard offline-ing if any of the page offline action
fails. e.g hard offline can fail for:
- Pages that are not part of buddy managed pool.
- Pages that are reserved by kernel using memblock_reserved()
- Pages that are in use by kernel.
But for the pages that are in use by user space application, the hard
offline marks the page as hwpoison, sends SIGBUS signal to kill the
affected application as recovery action and returns success.
Hence, It is possible that some of the pages in that memory range are in
use by application or free. By stopping on first error we loose the
opportunity to hwpoison the subsequent pages which may be free or in use by
application. This patch fixes this issue.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The 0x3a OEM command is for IBM commands, while 0x32 was for AMI ones.
Sometime in the P8 timeframe, AMI BMCs were changed to listen for our
commands on either 0x32 or 0x3a. Since 0x3a is the direction forward,
we'll use that, as P9 machines with AMI BMCs probably also want these
to work, and let's not bet that 0x32 will continue to be okay.
Suggested-by: Joseph Reynolds <jrey@us.ibm.com>
Suggested-by: Maury Zipse <zipse@us.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
| |
Caught by scan-build. We rewrite ecc_len with a different
value prior to use
Signed-off-by: Balbir singh <bsingharora@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
| |
Caught by scan-build, we seem to trap the errors in rc, but
not take any recovery action during blocklevel_write.
Signed-off-by: Balbir singh <bsingharora@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|