| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Initialising raw flash lead to a dead assignment to rc. Check the return
code and take the failure path as necessary. Both before and after the
fix we see output along the lines of the following when flash_init()
fails:
[ 53.283182881,7] IRQ: Registering 0800..0ff7 ops @0x300d4b98 (data 0x3052b9d8)
[ 53.283184335,7] IRQ: Registering 0ff8..0fff ops @0x300d4bc8 (data 0x3052b9d8)
[ 53.283185513,7] PHB#0000: Initializing PHB...
[ 53.288260827,4] FLASH: Can't load resource id:0. No system flash found
[ 53.288354442,4] FLASH: Can't load resource id:1. No system flash found
[ 53.342933439,3] CAPP: Error loading ucode lid. index=200ea
[ 53.462749486,2] NVRAM: Failed to load
[ 53.462819095,2] NVRAM: Failed to load
[ 53.462894236,2] NVRAM: Failed to load
[ 53.462967071,2] NVRAM: Failed to load
[ 53.463033077,2] NVRAM: Failed to load
[ 53.463144847,2] NVRAM: Failed to load
Eventually followed by:
[ 57.216942479,5] INIT: platform wait for kernel load failed
[ 57.217051132,5] INIT: Assuming kernel at 0x20000000
[ 57.217127508,3] INIT: ELF header not found. Assuming raw binary.
[ 57.217249886,2] NVRAM: Failed to load
[ 57.221294487,0] FATAL: Kernel is zeros, can't execute!
[ 57.221397429,0] Assert fail: core/init.c:615:0
[ 57.221471414,0] Aborting!
CPU 0028 Backtrace:
S: 0000000031d43c60 R: 000000003001b274 ._abort+0x4c
S: 0000000031d43ce0 R: 000000003001b2f0 .assert_fail+0x34
S: 0000000031d43d60 R: 0000000030014814 .load_and_boot_kernel+0xae4
S: 0000000031d43e30 R: 0000000030015164 .main_cpu_entry+0x680
S: 0000000031d43f00 R: 0000000030002718 boot_entry+0x1c0
--- OPAL boot ---
Analysis of the execution paths suggests we'll always "safely" end this
way due the setup sequence for the blocklevel callbacks in flash_init()
and error handling in blocklevel_get_info(), and there's no current risk
of executing from unexpected memory locations. As such the issue is
reduced to down to a fix for poor error hygene in the original change
and a resolution for a Coverity warning (famous last words etc).
Fixes: c826e1ca9e5b ("astbmc: Try IPMI HIOMAP for P8 (again)")
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
| |
Update skiboot.tcl device tree to include trace-imc node to help
test the code path in mambo.
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Patch to enhance the imc opal call to support and handle trace_imc mode.
To initialize the trace-mode, TRACE_IMC_SCOM value is written to
TRACE_IMC_ADDR of the respective core.
TRACE_IMC_SCOM is a 64bit value, and each bit represent the following:
0:1 : SAMPSEL
2:33 : CPMC_LOAD
34:40 : CPMC1SEL
41:47 : CPMC2SEL
48:50 : BUFFERSIZE
51:63 : RESERVED
Currently the value for TRACE_IMC_SCOM is hard coded.
During initialization htm_mode is disabled, and enabled only at start.
The opal calls to start/stop the counters, will write CORE_IMC_HTM_MODE_ENABLE/
CORE_IMC_HTM_MODE_DISABLE respectively to the htm_scom_index of the desired
cores.
Additional switch cases are added to the current opal calls to start/stop
the counters for trace-mode.
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Factor out core-imc stop api code from opal_imc_counters_init() for
better readability.
Also fix the error message if, wake_up_engine_state is not
"WAKEUP_ENGINE_PRESENT".
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Cc: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
Add macros needed for Trace mode enablement of IMC(In-Memory
Collection Counters). These macros are used to identify the
trace node in the device-tree and to make appropriate scom calls
to enable trace-mode in the hardware.
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
| |
OPAL call APIs for In-Memory Collection Counter infrastructure(IMC),
includes a new device type called OPAL_IMC_COUNTERS_TRACE. Edit the
documentation to include this information.
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
| |
trace-node information.
Add trace-node information in the device-tree document for IMC.
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
| |
Add documentation for IMC trace-mode in imc.rst.
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
| |
PCIe ports connected to CPU1 and CPU3 now work as x16 instead of x8x8.
Signed-off-by: Artem Senichev <a.senichev@yadro.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
| |
Rename ___backtrace() to backtrace_create() and ___print_backtrace() to
backtrace_print(). Get rid of __backtrace() and __print_backtrace()
wrappers.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
| |
We're about to get rid of __backtrace() and __print_backtrace(), convert
the stack check code to not use them.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
| |
We're about to get rid of __backtrace() and __print_backtrace(), convert
the FSP/IPMI attn code to not use them.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
| |
In ___backtrace(), store the current PIR in the metadata struct, rather
than relying on the caller to do it.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Every time we take a backtrace, we have to store the number of entries, the
OPAL API token, r1 caller and PIR values. Rather than defining these and
passing them around all over the place, let's throw them in a struct.
Define a struct, struct bt_metadata, to store these details, and convert
___backtrace() and ___print_backtrace() to use it.
We change the wrapper functions __backtrace() and __print_backtrace() to
call ___backtrace()/___print_backtrace() with struct bt_metadata, but don't
change their parameter profiles for now - we'll do that later.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
| |
___backtrace() is always called with r1 = __builtin_frame_address(0), and
it's unlikely we're going to need it to do something else any time soon, so
simplify the API by removing the parameter.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On TOD failure, with TB stuck, when linux heads down to
pnv_platform_error_reboot() path due to unrecoverable hmi event, the panic
cpu gets stuck in OPAL inside ipmi_queue_msg_sync(). At this time, rest
all other cpus are in smp_handle_nmi_ipi() waiting for panic cpu to proceed.
But with panic cpu stuck inside OPAL, linux never recovers/reboot.
p0 c1 t0
NIA : 0x000000003001dd3c <.time_wait+0x64>
CFAR : 0x000000003001dce4 <.time_wait+0xc>
MSR : 0x9000000002803002
LR : 0x000000003002ecf8 <.ipmi_queue_msg_sync+0xec>
STACK: SP NIA
0x0000000031c236e0 0x0000000031c23760 (big-endian)
0x0000000031c23760 0x000000003002ecf8 <.ipmi_queue_msg_sync+0xec>
0x0000000031c237f0 0x00000000300aa5f8 <.hiomap_queue_msg_sync+0x7c>
0x0000000031c23880 0x00000000300aaadc <.hiomap_window_move+0x150>
0x0000000031c23950 0x00000000300ab1d8 <.ipmi_hiomap_write+0xcc>
0x0000000031c23a90 0x00000000300a7b18 <.blocklevel_raw_write+0xbc>
0x0000000031c23b30 0x00000000300a7c34 <.blocklevel_write+0xfc>
0x0000000031c23bf0 0x0000000030030be0 <.flash_nvram_write+0xd4>
0x0000000031c23c90 0x000000003002c128 <.opal_write_nvram+0xd0>
0x0000000031c23d20 0x00000000300051e4 <opal_entry+0x134>
0xc000001fea6e7870 0xc0000000000a9060 <opal_nvram_write+0x80>
0xc000001fea6e78c0 0xc000000000030b84 <nvram_write_os_partition+0x94>
0xc000001fea6e7960 0xc0000000000310b0 <nvram_pstore_write+0xb0>
0xc000001fea6e7990 0xc0000000004792d4 <pstore_dump+0x1d4>
0xc000001fea6e7ad0 0xc00000000018a570 <kmsg_dump+0x140>
0xc000001fea6e7b40 0xc000000000028e5c <panic_flush_kmsg_end+0x2c>
0xc000001fea6e7b60 0xc0000000000a7168 <pnv_platform_error_reboot+0x68>
0xc000001fea6e7bd0 0xc0000000000ac9b8 <hmi_event_handler+0x1d8>
0xc000001fea6e7c80 0xc00000000012d6c8 <process_one_work+0x1b8>
0xc000001fea6e7d20 0xc00000000012da28 <worker_thread+0x88>
0xc000001fea6e7db0 0xc0000000001366f4 <kthread+0x164>
0xc000001fea6e7e20 0xc00000000000b65c <ret_from_kernel_thread+0x5c>
This is because, there is a while loop towards the end of
ipmi_queue_msg_sync() which keeps looping until "sync_msg" does not match
with "msg". It loops over time_wait_ms() until exit condition is met. In
normal scenario time_wait_ms() calls run pollers so that ipmi backend gets
a chance to check ipmi response and set sync_msg to NULL.
while (sync_msg == msg)
time_wait_ms(10);
But in the event when TB is in failed state time_wait_ms()->time_wait_poll()
returns immediately without calling pollers and hence we end up looping
forever. This patch fixes this hang by calling opal_run_pollers() in TB
failed state as well.
Fixes: 1764f2452 ("opal: Fix hang in time_wait* calls on HMI for TB errors.")
Cc: skiboot-stable@lists.ozlabs.org
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
| |
Fixes: 7516e382 (core/ipmi: Improve error message)
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If you're trying to boot a gigantic kernel in mambo (which you can
reproduce by building a kernel with CONFIG_MODULES=n) you'll get
misleading errors like:
WARNING: 0: (0): [0:0]: Invalid/unsupported instr 0x00000000[INVALID]
WARNING: 0: (0): PC(EA): 0x0000000030000010 PC(RA):0x0000000030000010 MSR: 0x9000000000000000 LR: 0x0000000000000000
WARNING: 0: (0): numInstructions = 0
WARNING: 1: (1): [0:0]: Invalid/unsupported instr 0x00000000[INVALID]
WARNING: 1: (1): PC(EA): 0x0000000000000E40 PC(RA):0x0000000000000E40 MSR: 0x9000000000000000 LR: 0x0000000000000000
WARNING: 1: (1): numInstructions = 1
WARNING: 1: (1): Interrupt to 0x0000000000000E40 from 0x0000000000000E40
INFO: 1: (2): ** Execution stopped: Continuous Interrupt, Instruction caused exception, **
So add an error to skiboot.tcl to warn the user before this happens.
Making PAYLOAD_ADDR further back is one way to do this but if there's a
less gross way to generally work around this very niche problem, I can
suggest that instead.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
skiboot.tcl defines PAYLOAD_ADDR as 0x20000000, which is the default in
skiboot. This is also the default in skiboot unless kernel-base-address
is set in the device tree.
If you change PAYLOAD_ADDR to something else for mambo, skiboot won't
see it because it doesn't set that DT property, so fix it so that it does.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Acked-by: Michael Neuling <mikey@neuling.org>
[stewart: fix up mambo hacks for STB]
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We would like to successfully boot if we have a dependency on the BMC
for flash even if the BMC is not current ready to service flash
requests. On the assumption that it will become ready, retry for several
minutes to cover a BMC reboot cycle and *eventually* rather than
*immediately* crash out with:
[ 269.549748] reboot: Restarting system
[ 390.297462587,5] OPAL: Reboot request...
[ 390.297737995,5] RESET: Initiating fast reboot 1...
[ 391.074707590,5] Clearing unused memory:
[ 391.075198880,5] PCI: Clearing all devices...
[ 391.075201618,7] Clearing region 201ffe000000-201fff800000
[ 391.086235699,5] PCI: Resetting PHBs and training links...
[ 391.254089525,3] FFS: Error 17 reading flash header
[ 391.254159668,3] FLASH: Can't open ffs handle: 17
[ 392.307245135,5] PCI: Probing slots...
[ 392.363723191,5] PCI Summary:
...
[ 393.423255262,5] OCC: All Chip Rdy after 0 ms
[ 393.453092828,5] INIT: Starting kernel at 0x20000000, fdt at
0x30800a88 390645 bytes
[ 393.453202605,0] FATAL: Kernel is zeros, can't execute!
[ 393.453247064,0] Assert fail: core/init.c:593:0
[ 393.453289682,0] Aborting!
CPU 0040 Backtrace:
S: 0000000031e03ca0 R: 000000003001af60 ._abort+0x4c
S: 0000000031e03d20 R: 000000003001afdc .assert_fail+0x34
S: 0000000031e03da0 R: 00000000300146d8 .load_and_boot_kernel+0xb30
S: 0000000031e03e70 R: 0000000030026cf0 .fast_reboot_entry+0x39c
S: 0000000031e03f00 R: 0000000030002a4c fast_reset_entry+0x2c
--- OPAL boot ---
The OPAL flash API hooks directly into the blocklevel layer, so there's
no delay for e.g. the host kernel, just for asynchronously loaded
resources during boot.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To be able to support migration of guests using the XIVE native
exploitation mode, (where the queue is effectively owned by the
guest), KVM needs to be able to save and restore the HW-modified
fields of the queue, such as the current queue producer pointer and
generation bit, and to retrieve the modified thread context registers
of the VP from the NVT structure : the VP interrupt pending bits.
However, there is no need to set back the NVT structure on P9. P10
should be the same.
Based on previous work from BenH.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
Exiting early in the power off case makes sense since we can't disable
slot power (or assert PERST) for suprise hotplug slots. However, we
should not exit early in the power-on case since it's possible slot
power may have been disabled (or just not enabled at boot time).
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
| |
Working out what was actually going on here took forever.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
We can slot information from the LXVPD without having power control
information about that slot. This patch changes the init path so that
we always override the add_properties() call rather than only when we
have power control information about the slot.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
| |
Useful to know since it changes the behaviour of the slot core.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For some reason we look at the power control indicator and use that to
determine if the slot is "off" rather than the power control flag that
is used to power down the slot.
While we're here change the default behaviour so that the slot is
assumed to be powered on if there's no slot capability, or if there's
no power control available.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The maximum string length for the slot label / device location code in
the PCI summary is currently 32 characters. This results in some IBM
location codes being truncated due to their length, e.g.
PHB#0001:02:11.0 [SWDN] SLOT=C11 x8
PHB#0001:13:00.0 [EP ] *snip* LOC_CODE=U78D3.ND1.WZS004A-P1-C
PHB#0001:13:00.1 [EP ] *snip* LOC_CODE=U78D3.ND1.WZS004A-P1-C
PHB#0001:13:00.2 [EP ] *snip* LOC_CODE=U78D3.ND1.WZS004A-P1-C
PHB#0001:13:00.3 [EP ] *snip* LOC_CODE=U78D3.ND1.WZS004A-P1-C
Which obscure the actual location of the card, and it looks bad. This
patch increases the maximum length of the label string to 80 characters
since that's the maximum length for a location code.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
The hub-id is stored in the PBCQ node rather than the stack node so we
never add it to the PHB node. This breaks the lxvpd slot lookup code
since the hub-id is encoded in the VPD record that we need to find the
slot information.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
P8 and P9 use the same IO VPD setup, so we need to load the IOHUB VPD on
P9 systems too.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Tested-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
[stewart: fixup op920 hdat_to_dt dts expected result, remove incorrect
comment, skip IOVPD loading on non-FSP.]
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GPUs on Redbud and Sequoia platforms are interconnected in groups of
2 or 3 GPUs. The problem with that is if the user decides to pass a single
GPU from a group to the userspace, we need to ensure that links between
GPUs do not get enabled.
A V100 GPU provides a way to disable selected links. In order to only
disable links to peer GPUs, we need a topology map.
This adds an "ibm,nvlink-peers" property to a GPU DT node with phandles
of peer GPUs and NVLink2 bridges. The index in the property is a GPU link
number.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Reza Arbab <arbab@linux.ibm.com>
[stewart: fixed strtol found in review by Reza]
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
| |
The two are similar enough and I'd like to have a slot table for our
Talos.
Cc: Timothy Pearson <tpearson@raptorengineering.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
| |
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
| |
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
Found using afl-lop on P9 HDAT. Pretty obvious what the problem is once
you look at it, and it's much better having a controlled failure mode
than just going off randomly into memory and segfaulting.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
| |
Fixes: e4a06f098c4f34fb5539129dddb6646667f4d5ab
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We've been getting this warning/error from recent GCC:
In file included from hw/ipmi/test/run-fru.c:22:
hw/ipmi/test/../ipmi-fru.c: In function ‘fru_add’:
hw/ipmi/test/../ipmi-fru.c:162:3: warning: ‘strncpy’ output truncated copying 32 bytes from a string of length 38 [-Wstringop-truncation]
strncpy(info.version, version, MAX_STR_LEN + 1);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This patch does two things:
1) Re-arrange some code to shut GCC up.
2) Add extra fu to tests to ensure we're producing correct bytes.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
Tested-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For opencapi, we currently do impedance calibration when initializing
the PHY for the device, which could run in parallel if we were rich
and had multiple opencapi devices. But if 2 devices are on the same
obus, the 2 calibration sequences could overlap, which likely yields
bad results and is useless anyway since it only needs to be done once
per obus.
This patch splits the opencapi PHY reset in 2 parts:
- a 'init' part called serially at boot. That's when zcal is done. If
we have 2 devices on the same socket, the zcal won't be redone,
since we're called serially and we'll see it has already be done for
the obus
- a 'reset' part called during fundamental reset as a prereq for link
training. It does the PHY setup for a set of lanes and the dccal.
The PHY team confirmed there's no dependency between zcal and the
other reset steps and it can be moved earlier.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The zcal procedure needs to be run once per obus. We keep track of
which obus is already calibrated in an array indexed by the obus
number. However, the obus number is inferred from the brick index,
which works well for nvlink but not for opencapi.
Create an obus_index() function, which, from a device, returns the
correct obus index, irrespective of the device type.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
set_iovalid() is called on the PHY reset path. The hw logic it touches
is meaningless for opencapi. It's not hurting as long as all the links
under the NPU are in opencapi mode, but in case of mixing opencapi and
nvlink, we'll be in troubles: the code finds which bit to modify based
on the brick index, which varies depending on the mode. So calling
that function on an opencapi device may modify a nvlink brick! For
example, for brick index 3.
So we simply avoid doing anything when calling set_iovalid() for an
opencapi device.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
| |
Use memcpy as other libffs functions do.
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
| |
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
(cherry picked from commit 8463ee4bc297fab0181fbb418954c3476a2adbde)
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
| |
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
(cherry picked from commit 3d135fe39a6ac509bfa49a9eb9e5f8386fc5109d)
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have new mailing list (skiboot-stable@lists.ozlabs.org) for handling
stable trees!
Now onwards to submit patches to stable tree, one should send patches to
skiboot-stable@lists.ozlabs.org mailing list with subject prefix
[PATH <stable-version>] -OR- CC skiboot-stable@lists.ozlabs.org mailing
list while sending patches to upstream mailing list (skiboot@lists.ozlabs.org).
This will remove the requirement to do the --suppress-cc OR other related
--no-cc-* options from git-send-email to remove "CC" list.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
During some code updates the slot labels were updated to reflect
the phb layout, however expectations were that the slot labels be
aligned with the riser card slots and not the system planar slots.
[stewart: The tale of how we got here is long and varied and not at
all clear. The first ESS systems went out with a skiboot v5.9.8 with
additional SuperMicro patches. It was probably a slot table, but who knows,
we don't have the code so can't check. It's possible it was all coming
in through HDAT instead). The op-build tree (thus the exact patches)
shipped on systems that work correct seems to not be around anywhere anymore
(if it ever was). It was only in skiboot v6.0 that a slot table made
it in, and, of course, only having remote machines in random configs,
including possibly with riser cards from Briggs&Stratton rather than
the ones destined for this system, doesn't make for verifying this
at all. It also doesn't help that *consistently* there is *never*
any review on slot tables, and we've had things be wrong in the past.
Combine this with not upstream Hostboot patches.]
Cc: skiboot-stable@lists.ozlabs.org
Cc: Benjamin Mashak <mashak@us.ibm.com>
Cc: Michael Lim <youhour@us.ibm.com>
Fixes: 64a16ae05bb2 ("p9dsu: Fix slot labels for p9dsu2u")
Fixes: 87517c8737b9 ("p9dsu: Fix p9dsu slot tables")
Fixes: 31231ed300f2 ("p9dsu: Fix p9dsu default variant")
Signed-off-by: Deb McLemore <debmc@linux.ibm.com>
[stewart: added more detailed explanation, cc stable]
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
| |
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
| |
We're getting close to Fedora 30, and keeping N-1 fedora around for too
long doesn't really add much.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
| |
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
| |
We use the same compiler on our CentOS7 image, and it has the bonus of
being able to test against P8 and P9 Mambo.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 815417dcda2e ("init, occ: Initialise OCC earlier on BMC systems")
conditionally invoked occ_pstates_init() only on FSP based systems in
load_and_boot_kernel(). Due to this pstate table is re-parsed on FSP
system and skipped on BMC system during fast-reboot. So this patch fixes
this by invoking occ_pstates_init() on all boxes during fast-reboot.
Cc: skiboot-stable@lists.ozlabs.org
Fixes: 815417dcda2e ("init, occ: Initialise OCC earlier on BMC systems")
Reported-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
OCC can change the pstate table at runtime to modify pstate limits or
for characterization purpose. These changes are reflected by re-parsing
the pstate table during fast-reboot to update the device-tree. Only
relevant pstate DT properties are deleted and newly added during
fast-reboot. The device-tree properties like 'freq-domain-mask' and
'domain-runs-at' are currently hard-coded and need not be updated
during fast-reboot. So this patch removes them from the fast-reboot path.
This patch fixes the below crash:
[ 270.313998453,5] OCC: All Chip Rdy after 0 ms
[ 270.314148918,3] Duplicate property "freq-domain-mask" in node /ibm,opal/power-mgt
[ 270.314208553,0] Aborting!
CPU 083c Backtrace:
S: 0000000035de3a20 R: 000000003001b480 ._abort+0x4c
S: 0000000035de3aa0 R: 0000000030028704 .new_property+0xd8
S: 0000000035de3b30 R: 0000000030028964 .__dt_add_property_cells+0x30
S: 0000000035de3bd0 R: 0000000030042980 .occ_pstates_init+0x7c8
S: 0000000035de3d90 R: 00000000300145f4 .load_and_boot_kernel+0x980
S: 0000000035de3e70 R: 00000000300276b4 .fast_reboot_entry+0x37c
S: 0000000035de3f00 R: 0000000030002ac4 reset_fast_reboot_wakeup+0x40
Fixes: b821f8c2a8e3("power-mgmt : occ : Add 'freq-domain-mask' DT property")
Reported-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Tested-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|