| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
| |
Debian (in its infinite "wisdom") has decided to erase most evidence of
there ever being a ppc64el installer for Debian Jessie.
So, screw them. Backwards compatibility testing was for losers anyway.
There is snapshot.debian.org, but it's *really* slow pulling things from
there, so it's not really an option unless we want to add multiple
minutes to test duration.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
For some system planars we need to apply some fixups to the PCI slot
power controllers. These are done at boot time and are slightly bizarre
in their construction since they share the I2C request completion
callback with the runtime slot power on method which affects the PCI
slot state machine.
This is confusing to say the least, so this patch reworks the fixup code
to use the synchronous I2C request code rather than open-coding the wait
based on what PCI slot state is in use. It also does some general
control flow cleanup and adds some comments explaining what the fixups
are for.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
Split the i2c_request_send() method into two methods: i2c_request_send()
which allocates and populates an i2c_request structure, and
i2c_request_sync() which takes a request structure and blocks until it
completes.
This allows code that allocates an i2c_request structure elsewhere to
make use of the existing busy-wait and request retry logic. Fix the
return types to use int64_t while we're here since these are returning
OPAL_API error codes.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
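A minimal sketch of the "sync" half of such a split, assuming a simulated
bus. Every name here (struct demo_i2c_req, demo_i2c_request_sync(), the
DEMO_* codes) is hypothetical and only illustrates blocking on a
caller-allocated request with busy-retry; it is not the skiboot code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes standing in for OPAL_API error codes. */
#define DEMO_SUCCESS	0
#define DEMO_BUSY	(-2)
#define DEMO_TIMEOUT	(-5)

/* Hypothetical request; the real i2c_request layout is not shown here. */
struct demo_i2c_req {
	int polls_until_done;	/* simulated device latency, in poll cycles */
	int busy_attempts;	/* attempts that report "bus busy" first */
};

/* One attempt: fail if the bus is busy, otherwise poll for completion. */
static int64_t demo_attempt(struct demo_i2c_req *req, int max_polls)
{
	if (req->busy_attempts > 0) {
		req->busy_attempts--;
		return DEMO_BUSY;
	}
	for (int i = 0; i < max_polls; i++) {
		if (req->polls_until_done == 0)
			return DEMO_SUCCESS;
		req->polls_until_done--;
	}
	return DEMO_TIMEOUT;
}

/* Block on a caller-allocated request, retrying while the bus is busy. */
int64_t demo_i2c_request_sync(struct demo_i2c_req *req, int retries)
{
	int64_t rc = DEMO_BUSY;

	assert(req != NULL);
	for (int i = 0; i <= retries && rc == DEMO_BUSY; i++)
		rc = demo_attempt(req, 100);
	return rc;
}
```

Returning int64_t mirrors the commit's point that these helpers hand back
OPAL-style error codes rather than plain ints.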
| |
Use the new built-in state variable rather than a single-use completion
function. Makes things a bit cleaner.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
Allow the submitter to track the state of an I2C request by adding
a state field to the request. This avoids the need to use a stub
completion callback in some cases.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
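A small sketch of the idea, assuming a request type with an optional
completion callback. The enumerators and all demo_* names are invented
for illustration; the commit does not list the real state values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states; the real enumerators are not given in the log. */
enum demo_req_state { DEMO_REQ_NEW, DEMO_REQ_QUEUED, DEMO_REQ_COMPLETE };

struct demo_req {
	enum demo_req_state state;
	int result;
	void (*completion)(struct demo_req *req);	/* may be NULL */
};

void demo_queue(struct demo_req *req)
{
	assert(req != NULL);
	req->state = DEMO_REQ_QUEUED;
}

/* Called by the (simulated) bus driver when the transfer finishes. */
void demo_complete(struct demo_req *req, int result)
{
	assert(req != NULL);
	req->result = result;
	req->state = DEMO_REQ_COMPLETE;
	if (req->completion)
		req->completion(req);
}

/* The submitter can poll this instead of installing a stub callback. */
int demo_is_done(const struct demo_req *req)
{
	assert(req != NULL);
	return req->state == DEMO_REQ_COMPLETE;
}
```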
| |
The delay between the ASSERT_DELAY and DEASSERT_DELAY states is set to
one timebase tick. This state seems to have been a holdover from PHB3
where it was used to add a 1s delay between de-asserting PERST and
polling the link for the CAPI FPGA. There's no requirement for that here
since the link polling on PHB4 is a bit smarter so we should be fine.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
Some time ago Mikey added some code to work around a bug we found where
a certain RAID card wouldn't come back again after a fast-reboot. The
workaround is to set the Link Disable bit before asserting PERST and
clear it after de-asserting PERST.
Currently we do this in the FRESET path, but not in the CRESET path.
This patch moves the PERST control into its own function to reduce
duplication and to ensure the workaround is applied in all circumstances.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
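The ordering described above can be sketched as a single helper. This is
a toy model, assuming two booleans in place of the real PHB registers;
struct demo_phb and demo_set_perst() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two signals; real code pokes PHB registers. */
struct demo_phb {
	bool perst_asserted;
	bool link_disabled;
};

/* Single entry point so every reset path gets the same ordering. */
void demo_set_perst(struct demo_phb *p, bool assert_perst)
{
	assert(p != NULL);
	if (assert_perst) {
		p->link_disabled = true;	/* Link Disable first */
		p->perst_asserted = true;
	} else {
		p->perst_asserted = false;
		p->link_disabled = false;	/* clear only after release */
	}
}
```

Routing both FRESET and CRESET through one function like this is what
keeps the workaround from being skipped on one of the paths.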
| |
When we do an freset the first step is to check if a card is present in
the slot. However, this only occurs when we enter phb4_freset() with the
slot state set to SLOT_NORMAL. This occurs in:
a) The creset path, and
b) When the OS manually requests an FRESET via an OPAL call.
a) is problematic because in the boot path the generic code will put the
slot into FRESET_START manually before calling into phb4_freset(). This
can result in a situation where a device is detected on boot, but not
after a CRESET.
I've noticed this occurring on systems where the PHB's slot presence
detect signal is not wired to an adapter. In this situation we can rely
on the in-band presence mechanism, but the presence check will make
us exit before that has a chance to work.
Additionally, if we enter from the CRESET path this early exit leaves
the slot's PERST signal asserted. This isn't currently an issue,
but if we want to support hotplug of devices into the root port it will
be.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
PERST is asserted at the beginning of the CRESET process to prevent
the downstream device from interacting with the host while the PHB logic
is being reset and re-initialised. There is at least a 100ms wait during
the CRESET processing so it's not necessary to wait this time again
in the FRESET handler.
This patch extends the delay after resetting the PHB logic to the 250ms
PERST wait period that we typically use and sets the skip_perst flag so
that we don't wait this time again in the FRESET handler.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
Initialising raw flash led to a dead assignment to rc. Check the return
code and take the failure path as necessary. Both before and after the
fix we see output along the lines of the following when flash_init()
fails:
[ 53.283182881,7] IRQ: Registering 0800..0ff7 ops @0x300d4b98 (data 0x3052b9d8)
[ 53.283184335,7] IRQ: Registering 0ff8..0fff ops @0x300d4bc8 (data 0x3052b9d8)
[ 53.283185513,7] PHB#0000: Initializing PHB...
[ 53.288260827,4] FLASH: Can't load resource id:0. No system flash found
[ 53.288354442,4] FLASH: Can't load resource id:1. No system flash found
[ 53.342933439,3] CAPP: Error loading ucode lid. index=200ea
[ 53.462749486,2] NVRAM: Failed to load
[ 53.462819095,2] NVRAM: Failed to load
[ 53.462894236,2] NVRAM: Failed to load
[ 53.462967071,2] NVRAM: Failed to load
[ 53.463033077,2] NVRAM: Failed to load
[ 53.463144847,2] NVRAM: Failed to load
Eventually followed by:
[ 57.216942479,5] INIT: platform wait for kernel load failed
[ 57.217051132,5] INIT: Assuming kernel at 0x20000000
[ 57.217127508,3] INIT: ELF header not found. Assuming raw binary.
[ 57.217249886,2] NVRAM: Failed to load
[ 57.221294487,0] FATAL: Kernel is zeros, can't execute!
[ 57.221397429,0] Assert fail: core/init.c:615:0
[ 57.221471414,0] Aborting!
CPU 0028 Backtrace:
S: 0000000031d43c60 R: 000000003001b274 ._abort+0x4c
S: 0000000031d43ce0 R: 000000003001b2f0 .assert_fail+0x34
S: 0000000031d43d60 R: 0000000030014814 .load_and_boot_kernel+0xae4
S: 0000000031d43e30 R: 0000000030015164 .main_cpu_entry+0x680
S: 0000000031d43f00 R: 0000000030002718 boot_entry+0x1c0
--- OPAL boot ---
Analysis of the execution paths suggests we'll always "safely" end this
way due to the setup sequence for the blocklevel callbacks in flash_init()
and error handling in blocklevel_get_info(), and there's no current risk
of executing from unexpected memory locations. As such the issue is
reduced to a fix for poor error hygiene in the original change and a
resolution for a Coverity warning (famous last words etc).
Fixes: c826e1ca9e5b ("astbmc: Try IPMI HIOMAP for P8 (again)")
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
Update skiboot.tcl device tree to include trace-imc node to help
test the code path in mambo.
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
Enhance the imc opal call to support and handle trace_imc mode.
To initialize trace-mode, the TRACE_IMC_SCOM value is written to
TRACE_IMC_ADDR of the respective core.
TRACE_IMC_SCOM is a 64-bit value with the following bit fields:
0:1 : SAMPSEL
2:33 : CPMC_LOAD
34:40 : CPMC1SEL
41:47 : CPMC2SEL
48:50 : BUFFERSIZE
51:63 : RESERVED
Currently the value for TRACE_IMC_SCOM is hard coded.
During initialization htm_mode is disabled; it is enabled only at start.
The opal calls to start/stop the counters will write
CORE_IMC_HTM_MODE_ENABLE/CORE_IMC_HTM_MODE_DISABLE respectively to the
htm_scom_index of the desired cores.
Additional switch cases are added to the current opal calls to start/stop
the counters for trace-mode.
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
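A sketch of packing the fields listed above into one 64-bit word. It
assumes IBM-style bit numbering, where bit 0 is the most significant
bit; that assumption and the demo_* helper names are not taken from the
patch itself.

```c
#include <assert.h>
#include <stdint.h>

/* Place 'val' in bits first..last, counting bit 0 as the MSB (assumed). */
static uint64_t demo_field(unsigned int first, unsigned int last,
			   uint64_t val)
{
	unsigned int width;
	uint64_t mask;

	assert(first <= last && last <= 63);
	width = last - first + 1;
	mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);
	return (val & mask) << (63 - last);
}

/* Compose a TRACE_IMC_SCOM-style value; bits 51:63 stay zero (reserved). */
uint64_t demo_trace_imc_scom(uint64_t sampsel, uint64_t cpmc_load,
			     uint64_t cpmc1sel, uint64_t cpmc2sel,
			     uint64_t buffersize)
{
	return demo_field(0, 1, sampsel) |
	       demo_field(2, 33, cpmc_load) |
	       demo_field(34, 40, cpmc1sel) |
	       demo_field(41, 47, cpmc2sel) |
	       demo_field(48, 50, buffersize);
}
```

Values wider than their field are masked, so an out-of-range argument
cannot spill into a neighbouring field.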
| |
Factor out core-imc stop api code from opal_imc_counters_init() for
better readability.
Also fix the error message shown if wake_up_engine_state is not
"WAKEUP_ENGINE_PRESENT".
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Cc: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
Add macros needed for Trace mode enablement of IMC (In-Memory
Collection Counters). These macros are used to identify the
trace node in the device-tree and to make appropriate scom calls
to enable trace-mode in the hardware.
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
The OPAL call APIs for the In-Memory Collection Counter infrastructure
(IMC) include a new device type called OPAL_IMC_COUNTERS_TRACE. Edit the
documentation to include this information.
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
trace-node information.
Add trace-node information in the device-tree document for IMC.
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
Add documentation for IMC trace-mode in imc.rst.
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
PCIe ports connected to CPU1 and CPU3 now work as x16 instead of x8x8.
Signed-off-by: Artem Senichev <a.senichev@yadro.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
Rename ___backtrace() to backtrace_create() and ___print_backtrace() to
backtrace_print(). Get rid of __backtrace() and __print_backtrace()
wrappers.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
We're about to get rid of __backtrace() and __print_backtrace(), convert
the stack check code to not use them.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
We're about to get rid of __backtrace() and __print_backtrace(), convert
the FSP/IPMI attn code to not use them.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
In ___backtrace(), store the current PIR in the metadata struct, rather
than relying on the caller to do it.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
Every time we take a backtrace, we have to store the number of entries, the
OPAL API token, r1 caller and PIR values. Rather than defining these and
passing them around all over the place, let's throw them in a struct.
Define a struct, struct bt_metadata, to store these details, and convert
___backtrace() and ___print_backtrace() to use it.
We change the wrapper functions __backtrace() and __print_backtrace() to
call ___backtrace()/___print_backtrace() with struct bt_metadata, but don't
change their parameter profiles for now - we'll do that later.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
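For illustration, a struct carrying the four values named above might
look like the following. The field names and the demo_bt_header() helper
are guesses made for this sketch; the log does not show the real
struct bt_metadata definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Field names are guesses based on the values listed in the commit. */
struct demo_bt_metadata {
	unsigned int ents;		/* number of backtrace entries */
	unsigned long token;		/* OPAL API token */
	unsigned long r1_caller;	/* caller's r1 (stack pointer) */
	unsigned long pir;		/* processor ID register value */
};

/* Format a header line from the metadata; returns snprintf()'s result. */
int demo_bt_header(char *buf, size_t len, const struct demo_bt_metadata *m)
{
	assert(buf != NULL && m != NULL);
	return snprintf(buf, len, "CPU %04lx Backtrace (%u entries):",
			m->pir, m->ents);
}
```

Passing one pointer instead of four loose arguments is the whole point:
callers that only forward the data no longer need to know what is in it.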
| |
___backtrace() is always called with r1 = __builtin_frame_address(0), and
it's unlikely we're going to need it to do something else any time soon, so
simplify the API by removing the parameter.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
On TOD failure, with TB stuck, when linux heads down the
pnv_platform_error_reboot() path due to an unrecoverable hmi event, the
panic cpu gets stuck in OPAL inside ipmi_queue_msg_sync(). At this time,
all other cpus are in smp_handle_nmi_ipi() waiting for the panic cpu to
proceed. But with the panic cpu stuck inside OPAL, linux never
recovers/reboots.
p0 c1 t0
NIA : 0x000000003001dd3c <.time_wait+0x64>
CFAR : 0x000000003001dce4 <.time_wait+0xc>
MSR : 0x9000000002803002
LR : 0x000000003002ecf8 <.ipmi_queue_msg_sync+0xec>
STACK: SP NIA
0x0000000031c236e0 0x0000000031c23760 (big-endian)
0x0000000031c23760 0x000000003002ecf8 <.ipmi_queue_msg_sync+0xec>
0x0000000031c237f0 0x00000000300aa5f8 <.hiomap_queue_msg_sync+0x7c>
0x0000000031c23880 0x00000000300aaadc <.hiomap_window_move+0x150>
0x0000000031c23950 0x00000000300ab1d8 <.ipmi_hiomap_write+0xcc>
0x0000000031c23a90 0x00000000300a7b18 <.blocklevel_raw_write+0xbc>
0x0000000031c23b30 0x00000000300a7c34 <.blocklevel_write+0xfc>
0x0000000031c23bf0 0x0000000030030be0 <.flash_nvram_write+0xd4>
0x0000000031c23c90 0x000000003002c128 <.opal_write_nvram+0xd0>
0x0000000031c23d20 0x00000000300051e4 <opal_entry+0x134>
0xc000001fea6e7870 0xc0000000000a9060 <opal_nvram_write+0x80>
0xc000001fea6e78c0 0xc000000000030b84 <nvram_write_os_partition+0x94>
0xc000001fea6e7960 0xc0000000000310b0 <nvram_pstore_write+0xb0>
0xc000001fea6e7990 0xc0000000004792d4 <pstore_dump+0x1d4>
0xc000001fea6e7ad0 0xc00000000018a570 <kmsg_dump+0x140>
0xc000001fea6e7b40 0xc000000000028e5c <panic_flush_kmsg_end+0x2c>
0xc000001fea6e7b60 0xc0000000000a7168 <pnv_platform_error_reboot+0x68>
0xc000001fea6e7bd0 0xc0000000000ac9b8 <hmi_event_handler+0x1d8>
0xc000001fea6e7c80 0xc00000000012d6c8 <process_one_work+0x1b8>
0xc000001fea6e7d20 0xc00000000012da28 <worker_thread+0x88>
0xc000001fea6e7db0 0xc0000000001366f4 <kthread+0x164>
0xc000001fea6e7e20 0xc00000000000b65c <ret_from_kernel_thread+0x5c>
This is because there is a while loop towards the end of
ipmi_queue_msg_sync() which keeps looping for as long as "sync_msg"
matches "msg". It loops over time_wait_ms() until the exit condition is
met. In the normal scenario time_wait_ms() runs the pollers so that the
ipmi backend gets a chance to check the ipmi response and set sync_msg
to NULL.
while (sync_msg == msg)
time_wait_ms(10);
But when TB is in a failed state, time_wait_ms()->time_wait_poll()
returns immediately without calling pollers and hence we end up looping
forever. This patch fixes this hang by calling opal_run_pollers() in TB
failed state as well.
Fixes: 1764f2452 ("opal: Fix hang in time_wait* calls on HMI for TB errors.")
Cc: skiboot-stable@lists.ozlabs.org
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
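A toy model of the hang and the fix, assuming a simulated backend that
needs a few poller runs before it clears the pending message. All demo_*
names are hypothetical, and the max_waits bound exists only so the
sketch terminates; the loop quoted above has no such bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_ctx {
	bool tb_failed;		/* timebase is in the failed state */
	int polls_needed;	/* poller runs before the reply arrives */
	bool msg_pending;	/* stands in for "sync_msg == msg" */
	int poller_runs;
};

static void demo_run_pollers(struct demo_ctx *c)
{
	c->poller_runs++;
	if (c->polls_needed > 0 && --c->polls_needed == 0)
		c->msg_pending = false;
}

static void demo_time_wait(struct demo_ctx *c)
{
	if (c->tb_failed) {
		/* The fix: run pollers even though we cannot time the wait. */
		demo_run_pollers(c);
		return;
	}
	/* Normal path: real code would also wait for the timebase here. */
	demo_run_pollers(c);
}

/* Returns the number of waits taken, or -1 if max_waits ran out. */
int demo_queue_msg_sync(struct demo_ctx *c, int max_waits)
{
	assert(c != NULL);
	for (int i = 0; i < max_waits; i++) {
		if (!c->msg_pending)
			return i;
		demo_time_wait(c);
	}
	return c->msg_pending ? -1 : max_waits;
}
```

If the tb_failed branch returned without calling demo_run_pollers(),
msg_pending would never clear and the caller would spin, which is the
behaviour the backtrace above shows.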
| |
Fixes: 7516e382 (core/ipmi: Improve error message)
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
If you're trying to boot a gigantic kernel in mambo (which you can
reproduce by building a kernel with CONFIG_MODULES=n) you'll get
misleading errors like:
WARNING: 0: (0): [0:0]: Invalid/unsupported instr 0x00000000[INVALID]
WARNING: 0: (0): PC(EA): 0x0000000030000010 PC(RA):0x0000000030000010 MSR: 0x9000000000000000 LR: 0x0000000000000000
WARNING: 0: (0): numInstructions = 0
WARNING: 1: (1): [0:0]: Invalid/unsupported instr 0x00000000[INVALID]
WARNING: 1: (1): PC(EA): 0x0000000000000E40 PC(RA):0x0000000000000E40 MSR: 0x9000000000000000 LR: 0x0000000000000000
WARNING: 1: (1): numInstructions = 1
WARNING: 1: (1): Interrupt to 0x0000000000000E40 from 0x0000000000000E40
INFO: 1: (2): ** Execution stopped: Continuous Interrupt, Instruction caused exception, **
So add an error to skiboot.tcl to warn the user before this happens.
Making PAYLOAD_ADDR further back is one way to do this but if there's a
less gross way to generally work around this very niche problem, I can
suggest that instead.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
skiboot.tcl defines PAYLOAD_ADDR as 0x20000000, which is also the
default in skiboot unless kernel-base-address is set in the device tree.
If you change PAYLOAD_ADDR to something else for mambo, skiboot won't
see it because it doesn't set that DT property, so fix it so that it does.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Acked-by: Michael Neuling <mikey@neuling.org>
[stewart: fix up mambo hacks for STB]
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
We would like to successfully boot if we have a dependency on the BMC
for flash even if the BMC is not currently ready to service flash
requests. On the assumption that it will become ready, retry for several
minutes to cover a BMC reboot cycle and *eventually* rather than
*immediately* crash out with:
[ 269.549748] reboot: Restarting system
[ 390.297462587,5] OPAL: Reboot request...
[ 390.297737995,5] RESET: Initiating fast reboot 1...
[ 391.074707590,5] Clearing unused memory:
[ 391.075198880,5] PCI: Clearing all devices...
[ 391.075201618,7] Clearing region 201ffe000000-201fff800000
[ 391.086235699,5] PCI: Resetting PHBs and training links...
[ 391.254089525,3] FFS: Error 17 reading flash header
[ 391.254159668,3] FLASH: Can't open ffs handle: 17
[ 392.307245135,5] PCI: Probing slots...
[ 392.363723191,5] PCI Summary:
...
[ 393.423255262,5] OCC: All Chip Rdy after 0 ms
[ 393.453092828,5] INIT: Starting kernel at 0x20000000, fdt at 0x30800a88 390645 bytes
[ 393.453202605,0] FATAL: Kernel is zeros, can't execute!
[ 393.453247064,0] Assert fail: core/init.c:593:0
[ 393.453289682,0] Aborting!
CPU 0040 Backtrace:
S: 0000000031e03ca0 R: 000000003001af60 ._abort+0x4c
S: 0000000031e03d20 R: 000000003001afdc .assert_fail+0x34
S: 0000000031e03da0 R: 00000000300146d8 .load_and_boot_kernel+0xb30
S: 0000000031e03e70 R: 0000000030026cf0 .fast_reboot_entry+0x39c
S: 0000000031e03f00 R: 0000000030002a4c fast_reset_entry+0x2c
--- OPAL boot ---
The OPAL flash API hooks directly into the blocklevel layer, so there's
no delay for e.g. the host kernel, just for asynchronously loaded
resources during boot.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
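A sketch of a bounded retry of this kind, assuming a simulated BMC that
starts answering after a number of failed attempts. The demo_* names,
the step size and the way time is accounted are all illustrative and not
taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated BMC: refuses the first 'fails_left' flash requests. */
struct demo_bmc {
	int fails_left;
};

static int demo_flash_read(struct demo_bmc *bmc)
{
	if (bmc->fails_left > 0) {
		bmc->fails_left--;
		return -1;
	}
	return 0;
}

/* Retry until success or until timeout_ms of (simulated) waiting. */
int demo_flash_read_retry(struct demo_bmc *bmc, uint64_t timeout_ms,
			  uint64_t step_ms, uint64_t *waited_ms)
{
	uint64_t waited = 0;
	int rc;

	assert(bmc != NULL && step_ms > 0);
	while ((rc = demo_flash_read(bmc)) != 0 && waited < timeout_ms) {
		/* Firmware would sleep here, e.g. via time_wait_ms(). */
		waited += step_ms;
	}
	if (waited_ms)
		*waited_ms = waited;
	return rc;
}
```

The final error code is still returned once the budget is spent, so a
BMC that never comes back fails eventually rather than immediately.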
| |
To be able to support migration of guests using the XIVE native
exploitation mode (where the queue is effectively owned by the
guest), KVM needs to be able to save and restore the HW-modified
fields of the queue, such as the current queue producer pointer and
generation bit, and to retrieve the modified thread context registers
of the VP from the NVT structure: the VP interrupt pending bits.
However, there is no need to set back the NVT structure on P9. P10
should be the same.
Based on previous work from BenH.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
Exiting early in the power off case makes sense since we can't disable
slot power (or assert PERST) for surprise hotplug slots. However, we
should not exit early in the power-on case since it's possible slot
power may have been disabled (or just not enabled at boot time).
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
Working out what was actually going on here took forever.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
We can have slot information from the LXVPD without having power control
information about that slot. This patch changes the init path so that
we always override the add_properties() call rather than only when we
have power control information about the slot.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
Useful to know since it changes the behaviour of the slot core.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
For some reason we look at the power control indicator and use that to
determine if the slot is "off" rather than the power control flag that
is used to power down the slot.
While we're here change the default behaviour so that the slot is
assumed to be powered on if there's no slot capability, or if there's
no power control available.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
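The resulting decision can be sketched as a small predicate. The struct
and its field names are hypothetical; real code would read the PCIe slot
capability and control registers instead of booleans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what was read from the slot registers. */
struct demo_slot {
	bool has_slot_cap;	/* port implements a slot at all */
	bool has_power_ctl;	/* slot has a power controller */
	bool power_ctl_off;	/* power control flag says "off" */
};

/* Assume "on" unless a power controller exists and reports "off". */
bool demo_slot_is_powered(const struct demo_slot *s)
{
	assert(s != NULL);
	if (!s->has_slot_cap || !s->has_power_ctl)
		return true;
	return !s->power_ctl_off;
}
```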
| |
The maximum string length for the slot label / device location code in
the PCI summary is currently 32 characters. This results in some IBM
location codes being truncated due to their length, e.g.
PHB#0001:02:11.0 [SWDN] SLOT=C11 x8
PHB#0001:13:00.0 [EP ] *snip* LOC_CODE=U78D3.ND1.WZS004A-P1-C
PHB#0001:13:00.1 [EP ] *snip* LOC_CODE=U78D3.ND1.WZS004A-P1-C
PHB#0001:13:00.2 [EP ] *snip* LOC_CODE=U78D3.ND1.WZS004A-P1-C
PHB#0001:13:00.3 [EP ] *snip* LOC_CODE=U78D3.ND1.WZS004A-P1-C
This obscures the actual location of the card, and it looks bad. This
patch increases the maximum length of the label string to 80 characters
since that's the maximum length for a location code.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
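A sketch of how a 32-byte label buffer produces output like the lines
above. It assumes the label is built with snprintf() as "LOC_CODE=<code>";
that format, the demo_format_label() helper and the location code used
in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define DEMO_OLD_LABEL_LEN	32
#define DEMO_NEW_LABEL_LEN	80

/* Build the label; returns true if it fit without truncation. */
bool demo_format_label(char *buf, size_t len, const char *loc_code)
{
	int n;

	assert(buf != NULL && len > 0 && loc_code != NULL);
	n = snprintf(buf, len, "LOC_CODE=%s", loc_code);
	return n >= 0 && (size_t)n < len;
}
```

With a hypothetical code such as "U78D3.ND1.WZS004A-P1-C7-T1", a buffer
of DEMO_OLD_LABEL_LEN bytes keeps only the first 31 characters of the
label, while DEMO_NEW_LABEL_LEN holds it in full.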
| |
The hub-id is stored in the PBCQ node rather than the stack node so we
never add it to the PHB node. This breaks the lxvpd slot lookup code
since the hub-id is encoded in the VPD record that we need to find the
slot information.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
P8 and P9 use the same IO VPD setup, so we need to load the IOHUB VPD on
P9 systems too.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Tested-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
[stewart: fixup op920 hdat_to_dt dts expected result, remove incorrect
comment, skip IOVPD loading on non-FSP.]
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
GPUs on Redbud and Sequoia platforms are interconnected in groups of
2 or 3 GPUs. The problem with that is if the user decides to pass a single
GPU from a group to the userspace, we need to ensure that links between
GPUs do not get enabled.
A V100 GPU provides a way to disable selected links. In order to only
disable links to peer GPUs, we need a topology map.
This adds an "ibm,nvlink-peers" property to a GPU DT node with phandles
of peer GPUs and NVLink2 bridges. The index in the property is a GPU link
number.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Reza Arbab <arbab@linux.ibm.com>
[stewart: fixed strtol found in review by Reza]
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
The two are similar enough and I'd like to have a slot table for our
Talos.
Cc: Timothy Pearson <tpearson@raptorengineering.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
Found using afl-lop on P9 HDAT. Pretty obvious what the problem is once
you look at it, and it's much better having a controlled failure mode
than just going off randomly into memory and segfaulting.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
Fixes: e4a06f098c4f34fb5539129dddb6646667f4d5ab
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
We've been getting this warning/error from recent GCC:
In file included from hw/ipmi/test/run-fru.c:22:
hw/ipmi/test/../ipmi-fru.c: In function ‘fru_add’:
hw/ipmi/test/../ipmi-fru.c:162:3: warning: ‘strncpy’ output truncated copying 32 bytes from a string of length 38 [-Wstringop-truncation]
strncpy(info.version, version, MAX_STR_LEN + 1);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This patch does two things:
1) Re-arrange some code to shut GCC up.
2) Add extra fu to tests to ensure we're producing correct bytes.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
Tested-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
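One common way to make such a copy explicit about truncation is shown
below. This is not necessarily what the patch does; DEMO_MAX_STR_LEN is
set to 31 only because the warning reports a 32-byte copy with
MAX_STR_LEN + 1, and demo_copy_version() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_STR_LEN	31

/* Copy at most DEMO_MAX_STR_LEN bytes and always NUL-terminate. */
size_t demo_copy_version(char dst[DEMO_MAX_STR_LEN + 1], const char *src)
{
	size_t n;

	assert(dst != NULL && src != NULL);
	n = strlen(src);
	if (n > DEMO_MAX_STR_LEN)
		n = DEMO_MAX_STR_LEN;
	memcpy(dst, src, n);
	dst[n] = '\0';
	return n;
}
```

Clamping the length by hand states the intended truncation outright, and
the destination is terminated regardless of the source length.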
| |
For opencapi, we currently do impedance calibration when initializing
the PHY for the device, which could run in parallel if we were rich
and had multiple opencapi devices. But if 2 devices are on the same
obus, the 2 calibration sequences could overlap, which likely yields
bad results and is useless anyway since it only needs to be done once
per obus.
This patch splits the opencapi PHY reset in 2 parts:
- an 'init' part called serially at boot. That's when zcal is done. If
we have 2 devices on the same socket, the zcal won't be redone,
since we're called serially and we'll see it has already been done for
the obus
- a 'reset' part called during fundamental reset as a prereq for link
training. It does the PHY setup for a set of lanes and the dccal.
The PHY team confirmed there's no dependency between zcal and the
other reset steps and it can be moved earlier.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
The zcal procedure needs to be run once per obus. We keep track of
which obus is already calibrated in an array indexed by the obus
number. However, the obus number is inferred from the brick index,
which works well for nvlink but not for opencapi.
Create an obus_index() function, which, from a device, returns the
correct obus index, irrespective of the device type.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
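The once-per-obus bookkeeping can be sketched as follows, assuming the
caller has already resolved the obus index for the device. DEMO_MAX_OBUS
and the demo_* names are hypothetical, and the calibration itself is
reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_OBUS	4	/* hypothetical number of obuses */

struct demo_npu {
	bool zcal_done[DEMO_MAX_OBUS];	/* indexed by obus number */
	int zcal_runs;			/* how many calibrations ran */
};

/* Run zcal for an obus unless it was already done; true if it ran. */
bool demo_zcal_once(struct demo_npu *npu, unsigned int obus)
{
	assert(npu != NULL && obus < DEMO_MAX_OBUS);
	if (npu->zcal_done[obus])
		return false;
	npu->zcal_runs++;		/* stands in for the real procedure */
	npu->zcal_done[obus] = true;
	return true;
}
```

The guard is only as good as the index passed in: if two devices on the
same obus resolve to different indices, the calibration runs twice.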
| |
set_iovalid() is called on the PHY reset path. The hw logic it touches
is meaningless for opencapi. It does no harm as long as all the links
under the NPU are in opencapi mode, but in case of mixing opencapi and
nvlink, we'll be in trouble: the code finds which bit to modify based
on the brick index, which varies depending on the mode. So calling
that function on an opencapi device may modify an nvlink brick! For
example, for brick index 3.
So we simply avoid doing anything when calling set_iovalid() for an
opencapi device.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
Use memcpy as other libffs functions do.
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
(cherry picked from commit 8463ee4bc297fab0181fbb418954c3476a2adbde)
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|