blackbird-skiboot - Blackbird™ skiboot sources

	Commit message	Author	Age	Files	Lines
...
*	astbmc: Handle failure to initialise raw flash	Andrew Jeffery	2019-03-28	1	-1/+4
	Initialising raw flash led to a dead assignment to rc. Check the return code and take the failure path as necessary.
	Both before and after the fix we see output along the lines of the following when flash_init() fails:
	[ 53.283182881,7] IRQ: Registering 0800..0ff7 ops @0x300d4b98 (data 0x3052b9d8)
	[ 53.283184335,7] IRQ: Registering 0ff8..0fff ops @0x300d4bc8 (data 0x3052b9d8)
	[ 53.283185513,7] PHB#0000: Initializing PHB...
	[ 53.288260827,4] FLASH: Can't load resource id:0. No system flash found
	[ 53.288354442,4] FLASH: Can't load resource id:1. No system flash found
	[ 53.342933439,3] CAPP: Error loading ucode lid. index=200ea
	[ 53.462749486,2] NVRAM: Failed to load
	[ 53.462819095,2] NVRAM: Failed to load
	[ 53.462894236,2] NVRAM: Failed to load
	[ 53.462967071,2] NVRAM: Failed to load
	[ 53.463033077,2] NVRAM: Failed to load
	[ 53.463144847,2] NVRAM: Failed to load
	Eventually followed by:
	[ 57.216942479,5] INIT: platform wait for kernel load failed
	[ 57.217051132,5] INIT: Assuming kernel at 0x20000000
	[ 57.217127508,3] INIT: ELF header not found. Assuming raw binary.
	[ 57.217249886,2] NVRAM: Failed to load
	[ 57.221294487,0] FATAL: Kernel is zeros, can't execute!
	[ 57.221397429,0] Assert fail: core/init.c:615:0
	[ 57.221471414,0] Aborting!
	CPU 0028 Backtrace:
	S: 0000000031d43c60 R: 000000003001b274 ._abort+0x4c
	S: 0000000031d43ce0 R: 000000003001b2f0 .assert_fail+0x34
	S: 0000000031d43d60 R: 0000000030014814 .load_and_boot_kernel+0xae4
	S: 0000000031d43e30 R: 0000000030015164 .main_cpu_entry+0x680
	S: 0000000031d43f00 R: 0000000030002718 boot_entry+0x1c0
	--- OPAL boot ---
	Analysis of the execution paths suggests we'll always "safely" end this way due to the setup sequence for the blocklevel callbacks in flash_init() and error handling in blocklevel_get_info(), and there's no current risk of executing from unexpected memory locations. As such the issue is reduced down to a fix for poor error hygiene in the original change and a resolution for a Coverity warning (famous last words etc).
	Fixes: c826e1ca9e5b ("astbmc: Try IPMI HIOMAP for P8 (again)")
	Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
	Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	external/mambo: Mambo hack to add trace-imc nodes in the device-tree	Anju T Sudhakar	2019-03-28	1	-0/+21
	Update skiboot.tcl device tree to include trace-imc node to help test the code path in mambo. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	hw/imc: Enable opal calls to init/start/stop IMC Trace mode	Anju T Sudhakar	2019-03-28	1	-1/+75
	Patch to enhance the imc opal call to support and handle trace_imc mode.
	To initialize the trace-mode, the TRACE_IMC_SCOM value is written to TRACE_IMC_ADDR of the respective core. TRACE_IMC_SCOM is a 64bit value, and its bit fields are as follows:
	0:1 : SAMPSEL
	2:33 : CPMC_LOAD
	34:40 : CPMC1SEL
	41:47 : CPMC2SEL
	48:50 : BUFFERSIZE
	51:63 : RESERVED
	Currently the value for TRACE_IMC_SCOM is hard coded. During initialization htm_mode is disabled, and it is enabled only at start. The opal calls to start/stop the counters will write CORE_IMC_HTM_MODE_ENABLE/CORE_IMC_HTM_MODE_DISABLE respectively to the htm_scom_index of the desired cores. Additional switch cases are added to the current opal calls to start/stop the counters for trace-mode.
	Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
	Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
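The bit layout listed in this commit message can be sketched as a small pack/unpack helper. This is only an illustration, not the skiboot implementation: it assumes the ranges use IBM bit numbering (bit 0 is the most significant bit of the 64-bit value), and the helper names and the `trace_imc_fields` struct are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering bits first..last, assuming IBM numbering (bit 0 = MSB). */
static inline uint64_t ibm_mask(unsigned first, unsigned last)
{
	assert(first <= last && last <= 63);
	unsigned width = last - first + 1;
	uint64_t ones = (width == 64) ? ~0ULL : ((1ULL << width) - 1);
	return ones << (63 - last);
}

/* Return reg with the field first..last replaced by val (excess bits dropped). */
static inline uint64_t ibm_set_field(uint64_t reg, unsigned first,
				     unsigned last, uint64_t val)
{
	uint64_t mask = ibm_mask(first, last);
	return (reg & ~mask) | ((val << (63 - last)) & mask);
}

/* Extract the field first..last as a right-aligned value. */
static inline uint64_t ibm_get_field(uint64_t reg, unsigned first,
				     unsigned last)
{
	return (reg & ibm_mask(first, last)) >> (63 - last);
}

/* Hypothetical container for the fields named in the commit message. */
struct trace_imc_fields {
	uint64_t sampsel;    /* bits 0:1   */
	uint64_t cpmc_load;  /* bits 2:33  */
	uint64_t cpmc1sel;   /* bits 34:40 */
	uint64_t cpmc2sel;   /* bits 41:47 */
	uint64_t buffersize; /* bits 48:50 */
};

/* Build a 64-bit value from the fields; reserved bits 51:63 stay zero. */
static inline uint64_t trace_imc_scom_pack(const struct trace_imc_fields *f)
{
	uint64_t v = 0;

	v = ibm_set_field(v, 0, 1, f->sampsel);
	v = ibm_set_field(v, 2, 33, f->cpmc_load);
	v = ibm_set_field(v, 34, 40, f->cpmc1sel);
	v = ibm_set_field(v, 41, 47, f->cpmc2sel);
	v = ibm_set_field(v, 48, 50, f->buffersize);
	return v;
}
```

Packing a set of fields and reading each one back with `ibm_get_field()` is a quick way to check that no two ranges overlap.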
*	hw/imc: Refactor opal init call for core-imc	Anju T Sudhakar	2019-03-28	1	-27/+43
	Factor out core-imc stop api code from opal_imc_counters_init() for better readability. Also fix the error message shown if wake_up_engine_state is not "WAKEUP_ENGINE_PRESENT". Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Cc: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	include/imc: Trace IMC Macro definitions	Anju T Sudhakar	2019-03-28	2	-0/+31
	Add macros needed for Trace mode enablement of IMC (In-Memory Collection Counters). These macros are used to identify the trace node in the device-tree and to make appropriate scom calls to enable trace-mode in the hardware. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	doc/opal-api: Edit documentation for IMC opal call to include trace-imc	Anju T Sudhakar	2019-03-28	1	-8/+9
	The OPAL call APIs for the In-Memory Collection Counter infrastructure (IMC) include a new device type called OPAL_IMC_COUNTERS_TRACE. Edit the documentation to include this information. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	doc/device-tree: Edit device tree documentation for imc to include trace-node information.	Anju T Sudhakar	2019-03-28	1	-0/+50
	Add trace-node information in the device-tree document for IMC. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	doc/imc: Edit imc.rst documentation to include	Anju T Sudhakar	2019-03-28	1	-0/+67
	Add documentation for IMC trace-mode in imc.rst. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	platforms/vesnin: Disable PCIe port bifurcation	Artem Senichev	2019-03-28	1	-34/+16
	PCIe ports connected to CPU1 and CPU3 now work as x16 instead of x8x8. Signed-off-by: Artem Senichev <a.senichev@yadro.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	core/stack: Rename backtrace functions, get rid of wrappers	Andrew Donnellan	2019-03-28	4	-38/+16
	Rename ___backtrace() to backtrace_create() and ___print_backtrace() to backtrace_print(). Get rid of __backtrace() and __print_backtrace() wrappers. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	core/stack: Convert stack check code to not use backtrace wrapper	Andrew Donnellan	2019-03-28	2	-6/+6
	We're about to get rid of __backtrace() and __print_backtrace(), so convert the stack check code to not use them. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	hw/fsp, hw/ipmi: Convert attn code to not use backtrace wrappers	Andrew Donnellan	2019-03-28	2	-9/+10
	We're about to get rid of __backtrace() and __print_backtrace(), so convert the FSP/IPMI attn code to not use them. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	core/stack: Store PIR in ___backtrace()	Andrew Donnellan	2019-03-28	1	-3/+3
	In ___backtrace(), store the current PIR in the metadata struct, rather than relying on the caller to do it. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	core/stack: Define a backtrace metadata struct	Andrew Donnellan	2019-03-28	2	-39/+54
	Every time we take a backtrace, we have to store the number of entries, the OPAL API token, r1 caller and PIR values. Rather than defining these and passing them around all over the place, let's throw them in a struct. Define a struct, struct bt_metadata, to store these details, and convert ___backtrace() and ___print_backtrace() to use it. We change the wrapper functions __backtrace() and __print_backtrace() to call ___backtrace()/___print_backtrace() with struct bt_metadata, but don't change their parameter profiles for now - we'll do that later. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	core/stack: Remove r1 argument from ___backtrace()	Andrew Donnellan	2019-03-28	2	-8/+3
	___backtrace() is always called with r1 = __builtin_frame_address(0), and it's unlikely we're going to need it to do something else any time soon, so simplify the API by removing the parameter. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	Fix hang in pnv_platform_error_reboot path due to TOD failure.	Mahesh Salgaonkar	2019-03-28	1	-0/+10
	On TOD failure, with TB stuck, when Linux heads down the pnv_platform_error_reboot() path due to an unrecoverable HMI event, the panic CPU gets stuck in OPAL inside ipmi_queue_msg_sync(). At this time, all the other CPUs are in smp_handle_nmi_ipi() waiting for the panic CPU to proceed. But with the panic CPU stuck inside OPAL, Linux never recovers or reboots.
	p0 c1 t0
	NIA : 0x000000003001dd3c <.time_wait+0x64>
	CFAR : 0x000000003001dce4 <.time_wait+0xc>
	MSR : 0x9000000002803002
	LR : 0x000000003002ecf8 <.ipmi_queue_msg_sync+0xec>
	STACK: SP NIA
	0x0000000031c236e0 0x0000000031c23760 (big-endian)
	0x0000000031c23760 0x000000003002ecf8 <.ipmi_queue_msg_sync+0xec>
	0x0000000031c237f0 0x00000000300aa5f8 <.hiomap_queue_msg_sync+0x7c>
	0x0000000031c23880 0x00000000300aaadc <.hiomap_window_move+0x150>
	0x0000000031c23950 0x00000000300ab1d8 <.ipmi_hiomap_write+0xcc>
	0x0000000031c23a90 0x00000000300a7b18 <.blocklevel_raw_write+0xbc>
	0x0000000031c23b30 0x00000000300a7c34 <.blocklevel_write+0xfc>
	0x0000000031c23bf0 0x0000000030030be0 <.flash_nvram_write+0xd4>
	0x0000000031c23c90 0x000000003002c128 <.opal_write_nvram+0xd0>
	0x0000000031c23d20 0x00000000300051e4 <opal_entry+0x134>
	0xc000001fea6e7870 0xc0000000000a9060 <opal_nvram_write+0x80>
	0xc000001fea6e78c0 0xc000000000030b84 <nvram_write_os_partition+0x94>
	0xc000001fea6e7960 0xc0000000000310b0 <nvram_pstore_write+0xb0>
	0xc000001fea6e7990 0xc0000000004792d4 <pstore_dump+0x1d4>
	0xc000001fea6e7ad0 0xc00000000018a570 <kmsg_dump+0x140>
	0xc000001fea6e7b40 0xc000000000028e5c <panic_flush_kmsg_end+0x2c>
	0xc000001fea6e7b60 0xc0000000000a7168 <pnv_platform_error_reboot+0x68>
	0xc000001fea6e7bd0 0xc0000000000ac9b8 <hmi_event_handler+0x1d8>
	0xc000001fea6e7c80 0xc00000000012d6c8 <process_one_work+0x1b8>
	0xc000001fea6e7d20 0xc00000000012da28 <worker_thread+0x88>
	0xc000001fea6e7db0 0xc0000000001366f4 <kthread+0x164>
	0xc000001fea6e7e20 0xc00000000000b65c <ret_from_kernel_thread+0x5c>
	This is because there is a while loop towards the end of ipmi_queue_msg_sync() which keeps looping until "sync_msg" no longer matches "msg". It loops over time_wait_ms() until the exit condition is met. In the normal scenario time_wait_ms() runs the pollers so that the ipmi backend gets a chance to check the ipmi response and set sync_msg to NULL.
	while (sync_msg == msg) time_wait_ms(10);
	But when TB is in a failed state, time_wait_ms()->time_wait_poll() returns immediately without calling pollers and hence we end up looping forever. This patch fixes this hang by calling opal_run_pollers() in the TB failed state as well.
	Fixes: 1764f2452 ("opal: Fix hang in time_wait* calls on HMI for TB errors.")
	Cc: skiboot-stable@lists.ozlabs.org
	Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
	Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
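The hang described in this commit message can be modelled with a small host-side simulation. This is a sketch under stated assumptions, not skiboot code: the `sim_*` names and the `struct sim` fields are hypothetical, and an iteration budget stands in for the endless loop so the old behaviour can be observed without actually hanging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for firmware state; none of this is skiboot code. */
struct sim {
	bool tb_failed;        /* timebase is in a failed state */
	bool poll_when_failed; /* behaviour described as the fix */
	int polls_until_reply; /* poller runs needed before the reply lands */
	const void *sync_msg;  /* outstanding synchronous message */
};

/* Models the backend poller: after enough runs it completes the message. */
static void sim_run_pollers(struct sim *s)
{
	if (s->polls_until_reply > 0 && --s->polls_until_reply == 0)
		s->sync_msg = NULL;
}

/* Models one wait call made from the synchronous send loop. */
static void sim_time_wait(struct sim *s)
{
	if (s->tb_failed && !s->poll_when_failed)
		return; /* old behaviour: return at once, pollers never run */
	sim_run_pollers(s);
}

/*
 * Models the "while (sync_msg == msg)" loop. Returns the number of
 * iterations taken, or -1 when the budget runs out, which corresponds
 * to a hang on real hardware.
 */
static int sim_queue_msg_sync(struct sim *s, const void *msg, int budget)
{
	int spins = 0;

	assert(budget >= 0);
	s->sync_msg = msg;
	while (s->sync_msg == msg) {
		if (spins == budget)
			return -1;
		sim_time_wait(s);
		spins++;
	}
	return spins;
}
```

With `tb_failed` set and `poll_when_failed` clear the loop exhausts its budget; setting `poll_when_failed` lets the simulated poller complete the message and the loop exit.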
*	core/ipmi: Print correct netfn value	Vasant Hegde	2019-03-28	1	-1/+1
	Fixes: 7516e382 (core/ipmi: Improve error message) Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	external/mambo: Error out if kernel is too large	Russell Currey	2019-03-28	1	-0/+4
	If you're trying to boot a gigantic kernel in mambo (which you can reproduce by building a kernel with CONFIG_MODULES=n) you'll get misleading errors like:
	WARNING: 0: (0): [0:0]: Invalid/unsupported instr 0x00000000[INVALID]
	WARNING: 0: (0): PC(EA): 0x0000000030000010 PC(RA):0x0000000030000010 MSR: 0x9000000000000000 LR: 0x0000000000000000
	WARNING: 0: (0): numInstructions = 0
	WARNING: 1: (1): [0:0]: Invalid/unsupported instr 0x00000000[INVALID]
	WARNING: 1: (1): PC(EA): 0x0000000000000E40 PC(RA):0x0000000000000E40 MSR: 0x9000000000000000 LR: 0x0000000000000000
	WARNING: 1: (1): numInstructions = 1
	WARNING: 1: (1): Interrupt to 0x0000000000000E40 from 0x0000000000000E40
	INFO: 1: (2): Execution stopped: Continuous Interrupt, Instruction caused exception,
	So add an error to skiboot.tcl to warn the user before this happens. Making PAYLOAD_ADDR further back is one way to do this but if there's a less gross way to generally work around this very niche problem, I can suggest that instead.
	Signed-off-by: Russell Currey <ruscur@russell.cc>
	Acked-by: Michael Neuling <mikey@neuling.org>
	Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	external/mambo: Populate kernel-base-address in the DT	Russell Currey	2019-03-28	2	-1/+13
	skiboot.tcl defines PAYLOAD_ADDR as 0x20000000, which is also the address skiboot assumes unless kernel-base-address is set in the device tree. If you change PAYLOAD_ADDR to something else for mambo, skiboot won't see it because skiboot.tcl doesn't set that DT property, so fix it so that it does. Signed-off-by: Russell Currey <ruscur@russell.cc> Acked-by: Michael Neuling <mikey@neuling.org> [stewart: fix up mambo hacks for STB] Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	core/flash: Retry requests as necessary in flash_load_resource()	Andrew Jeffery	2019-03-28	1	-2/+31
	We would like to successfully boot if we have a dependency on the BMC for flash even if the BMC is not currently ready to service flash requests. On the assumption that it will become ready, retry for several minutes to cover a BMC reboot cycle, and crash out eventually rather than immediately with:
	[ 269.549748] reboot: Restarting system
	[ 390.297462587,5] OPAL: Reboot request...
	[ 390.297737995,5] RESET: Initiating fast reboot 1...
	[ 391.074707590,5] Clearing unused memory:
	[ 391.075198880,5] PCI: Clearing all devices...
	[ 391.075201618,7] Clearing region 201ffe000000-201fff800000
	[ 391.086235699,5] PCI: Resetting PHBs and training links...
	[ 391.254089525,3] FFS: Error 17 reading flash header
	[ 391.254159668,3] FLASH: Can't open ffs handle: 17
	[ 392.307245135,5] PCI: Probing slots...
	[ 392.363723191,5] PCI Summary:
	...
	[ 393.423255262,5] OCC: All Chip Rdy after 0 ms
	[ 393.453092828,5] INIT: Starting kernel at 0x20000000, fdt at 0x30800a88 390645 bytes
	[ 393.453202605,0] FATAL: Kernel is zeros, can't execute!
	[ 393.453247064,0] Assert fail: core/init.c:593:0
	[ 393.453289682,0] Aborting!
	CPU 0040 Backtrace:
	S: 0000000031e03ca0 R: 000000003001af60 ._abort+0x4c
	S: 0000000031e03d20 R: 000000003001afdc .assert_fail+0x34
	S: 0000000031e03da0 R: 00000000300146d8 .load_and_boot_kernel+0xb30
	S: 0000000031e03e70 R: 0000000030026cf0 .fast_reboot_entry+0x39c
	S: 0000000031e03f00 R: 0000000030002a4c fast_reset_entry+0x2c
	--- OPAL boot ---
	The OPAL flash API hooks directly into the blocklevel layer, so there's no delay for e.g. the host kernel, just for asynchronously loaded resources during boot.
	Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
	Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
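The retry-until-deadline idea in this commit message can be sketched generically as follows. This is not the code from flash_load_resource(): `load_with_retry()`, the callback types, and the `flaky_load`/`no_sleep` stubs are all hypothetical names used only to make the example self-contained, and the interval and timeout values are left to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback types: a load attempt (0 on success) and a sleep. */
typedef int (*load_fn)(void *ctx);
typedef void (*sleep_fn)(unsigned ms);

/*
 * Retry load() until it succeeds or roughly timeout_ms of waiting has
 * accumulated. Returns 0 on success, otherwise the last error code.
 */
static int load_with_retry(load_fn load, void *ctx, sleep_fn sleep_ms,
			   unsigned interval_ms, unsigned timeout_ms)
{
	unsigned waited = 0;
	int rc;

	if (interval_ms == 0)
		interval_ms = 1; /* guarantee forward progress */

	for (;;) {
		rc = load(ctx);
		if (rc == 0)
			return 0;
		if (waited >= timeout_ms)
			return rc; /* give up only after the retry window */
		sleep_ms(interval_ms);
		waited += interval_ms;
	}
}

/* Demo stubs: a loader that fails a fixed number of times, and a no-op sleep. */
struct flaky {
	int failures_left;
	int calls;
};

static int flaky_load(void *ctx)
{
	struct flaky *f = ctx;

	f->calls++;
	if (f->failures_left > 0) {
		f->failures_left--;
		return -1;
	}
	return 0;
}

static void no_sleep(unsigned ms)
{
	(void)ms;
}
```

Passing the sleep function in as a parameter keeps the loop testable on a host, where a real delay of several minutes would be impractical.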
*	xive: Add calls to save/restore the queues and VPs HW state	Cédric Le Goater	2019-03-28	3	-3/+185
	To be able to support migration of guests using the XIVE native exploitation mode (where the queue is effectively owned by the guest), KVM needs to be able to save and restore the HW-modified fields of the queue, such as the current queue producer pointer and generation bit, and to retrieve the modified thread context registers of the VP from the NVT structure: the VP interrupt pending bits. However, there is no need to set back the NVT structure on P9. P10 should be the same. Based on previous work from BenH. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	core/pcie-slot: Don't bail early in the power on case	Oliver O'Halloran	2019-03-28	1	-4/+5
	Exiting early in the power off case makes sense since we can't disable slot power (or assert PERST) for surprise hotplug slots. However, we should not exit early in the power-on case since it's possible slot power may have been disabled (or just not enabled at boot time). Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	core/pcie-slot: Better explain suprise_check	Oliver O'Halloran	2019-03-28	1	-16/+11
	Working out what was actually going on here took forever. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	firenze-pci: Always init slot info from LXVPD	Oliver O'Halloran	2019-03-28	1	-8/+4
	We can get slot information from the LXVPD without having power control information about that slot. This patch changes the init path so that we always override the add_properties() call rather than only when we have power control information about the slot. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	fsp/lxvpd: Print more LXVPD slot information	Oliver O'Halloran	2019-03-28	1	-0/+3
	Useful to know since it changes the behaviour of the slot core. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	core/pcie-slot: Set power state from the PWRCTL flag	Oliver O'Halloran	2019-03-28	1	-3/+3
	For some reason we look at the power control indicator and use that to determine if the slot is "off" rather than the power control flag that is used to power down the slot. While we're here change the default behaviour so that the slot is assumed to be powered on if there's no slot capability, or if there's no power control available. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	core/pci: Increase the max slot string size	Oliver O'Halloran	2019-03-28	1	-1/+1
	The maximum string length for the slot label / device location code in the PCI summary is currently 32 characters. This results in some IBM location codes being truncated due to their length, e.g.
	PHB#0001:02:11.0 [SWDN] SLOT=C11 x8
	PHB#0001:13:00.0 [EP ] snip LOC_CODE=U78D3.ND1.WZS004A-P1-C
	PHB#0001:13:00.1 [EP ] snip LOC_CODE=U78D3.ND1.WZS004A-P1-C
	PHB#0001:13:00.2 [EP ] snip LOC_CODE=U78D3.ND1.WZS004A-P1-C
	PHB#0001:13:00.3 [EP ] snip LOC_CODE=U78D3.ND1.WZS004A-P1-C
	This obscures the actual location of the card, and it looks bad. This patch increases the maximum length of the label string to 80 characters since that's the maximum length for a location code.
	Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
	Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	hw/phb4: Look for the hub-id from in the PBCQ node	Oliver O'Halloran	2019-03-28	1	-3/+9
	The hub-id is stored in the PBCQ node rather than the stack node so we never add it to the PHB node. This breaks the lxvpd slot lookup code since the hub-id is encoded in the VPD record that we need to find the slot information. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	hdata/iohub: Look for IOVPD on P9	Oliver O'Halloran	2019-03-28	4	-3/+14
	P8 and P9 use the same IO VPD setup, so we need to load the IOHUB VPD on P9 systems too. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Tested-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> [stewart: fixup op920 hdat_to_dt dts expected result, remove incorrect comment, skip IOVPD loading on non-FSP.] Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	witherspoon: Add nvlink2 interconnect information	Alexey Kardashevskiy	2019-03-20	1	-1/+131
	GPUs on Redbud and Sequoia platforms are interconnected in groups of 2 or 3 GPUs. The problem with that is if the user decides to pass a single GPU from a group to the userspace, we need to ensure that links between GPUs do not get enabled. A V100 GPU provides a way to disable selected links. In order to only disable links to peer GPUs, we need a topology map. This adds an "ibm,nvlink-peers" property to a GPU DT node with phandles of peer GPUs and NVLink2 bridges. The index in the property is a GPU link number. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Reza Arbab <arbab@linux.ibm.com> [stewart: fixed strtol found in review by Reza] Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	platforms/romulus: Also support talos	Oliver O'Halloran	2019-03-20	1	-1/+2
	The two are similar enough and I'd like to have a slot table for our Talos. Cc: Timothy Pearson <tpearson@raptorengineering.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	hdata: Prevent NULL dereference on duplicate slot map info	Stewart Smith	2019-03-20	1	-0/+4
	Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	hdata_to_dt: fail "gracefully" on fatal op_display()	Stewart Smith	2019-03-20	2	-1/+11
	Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	hdata: Add protection against corrupt ntuples structure	Stewart Smith	2019-03-20	1	-0/+21
	Found using afl-lop on P9 HDAT. Pretty obvious what the problem is once you look at it, and it's much better having a controlled failure mode than just going off randomly into memory and segfaulting. Signed-off-by: Stewart Smith <stewart@linux.ibm.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	Fix broken opal-ci/build-fedora-rawhide.sh symlink	Stewart Smith	2019-03-20	1	-1/+1
	Fixes: e4a06f098c4f34fb5539129dddb6646667f4d5ab Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	hw/ipmi/test/run-fru: Fix string truncation warning, enhance test	Stewart Smith	2019-03-20	2	-8/+19
	We've been getting this warning/error from recent GCC:
	In file included from hw/ipmi/test/run-fru.c:22:
	hw/ipmi/test/../ipmi-fru.c: In function ‘fru_add’:
	hw/ipmi/test/../ipmi-fru.c:162:3: warning: ‘strncpy’ output truncated copying 32 bytes from a string of length 38 [-Wstringop-truncation]
	strncpy(info.version, version, MAX_STR_LEN + 1);
	^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	This patch does two things:
	1) Re-arrange some code to shut GCC up.
	2) Add extra fu to tests to ensure we're producing correct bytes.
	Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
	Tested-by: Michael Neuling <mikey@neuling.org>
	Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
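GCC's -Wstringop-truncation fires when a strncpy() call may leave the destination without a terminating NUL. One common way to make intentional truncation explicit is to compute the copy length first, then memcpy() and terminate by hand. The sketch below shows that general pattern only; it is not the change made in this commit, and `copy_truncated()` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src into dst, truncating to fit, and always NUL-terminate.
 * dst_size is the full size of the destination buffer in bytes.
 * Returns the number of characters stored, excluding the terminator.
 */
static size_t copy_truncated(char *dst, size_t dst_size, const char *src)
{
	const char *nul;
	size_t len;

	assert(dst != NULL && src != NULL);
	if (dst_size == 0)
		return 0;

	/* Look for the end of src within the space we can actually use. */
	nul = memchr(src, '\0', dst_size - 1);
	len = nul ? (size_t)(nul - src) : dst_size - 1;

	memcpy(dst, src, len);
	dst[len] = '\0';
	return len;
}
```

Because the length is bounded before the copy and the terminator is written explicitly, the truncation is visible in the code rather than being a side effect of strncpy().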
*	npu2/hw-procedures: Fix parallel zcal for opencapi	Frederic Barrat	2019-03-20	3	-5/+10
	For opencapi, we currently do impedance calibration when initializing the PHY for the device, which could run in parallel if we were rich and had multiple opencapi devices. But if 2 devices are on the same obus, the 2 calibration sequences could overlap, which likely yields bad results and is useless anyway since it only needs to be done once per obus. This patch splits the opencapi PHY reset in 2 parts:
	- an 'init' part called serially at boot. That's when zcal is done. If we have 2 devices on the same socket, the zcal won't be redone, since we're called serially and we'll see it has already been done for the obus
	- a 'reset' part called during fundamental reset as a prereq for link training. It does the PHY setup for a set of lanes and the dccal.
	The PHY team confirmed there's no dependency between zcal and the other reset steps and it can be moved earlier.
	Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
	Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
	Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	npu2-hw-procedures: Fix zcal in mixed opencapi and nvlink mode	Frederic Barrat	2019-03-20	1	-3/+21
	The zcal procedure needs to be run once per obus. We keep track of which obus is already calibrated in an array indexed by the obus number. However, the obus number is inferred from the brick index, which works well for nvlink but not for opencapi. Create an obus_index() function, which, from a device, returns the correct obus index, irrespective of the device type. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	npu2-hw-procedures: Don't set iovalid for opencapi devices	Frederic Barrat	2019-03-20	1	-0/+3
	set_iovalid() is called on the PHY reset path. The hw logic it touches is meaningless for opencapi. It's not hurting as long as all the links under the NPU are in opencapi mode, but when mixing opencapi and nvlink, we'll be in trouble: the code finds which bit to modify based on the brick index, which varies depending on the mode. So calling that function on an opencapi device may modify a nvlink brick! For example, for brick index 3. So we simply avoid doing anything when calling set_iovalid() for an opencapi device. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	libffs: Fix string truncation gcc warning.	Michal Suchanek	2019-03-20	1	-1/+1
	Use memcpy as other libffs functions do. Signed-off-by: Michal Suchanek <msuchanek@suse.de> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	skiboot v6.2.3 release notes	Vasant Hegde	2019-03-20	1	-0/+45
	Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> (cherry picked from commit 8463ee4bc297fab0181fbb418954c3476a2adbde) Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	skiboot v6.0.19 release notes	Vasant Hegde	2019-03-20	1	-0/+37
	Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> (cherry picked from commit 3d135fe39a6ac509bfa49a9eb9e5f8386fc5109d) Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	Update skiboot stable tree rules	Vasant Hegde	2019-03-18	1	-5/+7
	We have a new mailing list (skiboot-stable@lists.ozlabs.org) for handling stable trees! From now on, to submit patches to a stable tree, either send them to the skiboot-stable@lists.ozlabs.org mailing list with the subject prefix [PATCH <stable-version>], or CC skiboot-stable@lists.ozlabs.org when sending patches to the upstream mailing list (skiboot@lists.ozlabs.org). This removes the need to use --suppress-cc or the related --no-cc-* options of git-send-email to trim the "CC" list. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	p9dsu: Undo slot label name changes	Deb McLemore	2019-03-15	1	-16/+16
	During some code updates the slot labels were updated to reflect the phb layout, however expectations were that the slot labels be aligned with the riser card slots and not the system planar slots.
	[stewart: The tale of how we got here is long and varied and not at all clear. The first ESS systems went out with a skiboot v5.9.8 with additional SuperMicro patches. It was probably a slot table, but who knows, we don't have the code so can't check (it's possible it was all coming in through HDAT instead). The op-build tree (thus the exact patches) shipped on systems that work correctly seems to not be around anywhere anymore (if it ever was). It was only in skiboot v6.0 that a slot table made it in, and, of course, only having remote machines in random configs, including possibly with riser cards from Briggs&Stratton rather than the ones destined for this system, doesn't make for verifying this at all. It also doesn't help that consistently there is never any review on slot tables, and we've had things be wrong in the past. Combine this with non-upstream Hostboot patches.]
	Cc: skiboot-stable@lists.ozlabs.org
	Cc: Benjamin Mashak <mashak@us.ibm.com>
	Cc: Michael Lim <youhour@us.ibm.com>
	Fixes: 64a16ae05bb2 ("p9dsu: Fix slot labels for p9dsu2u")
	Fixes: 87517c8737b9 ("p9dsu: Fix p9dsu slot tables")
	Fixes: 31231ed300f2 ("p9dsu: Fix p9dsu default variant")
	Signed-off-by: Deb McLemore <debmc@linux.ibm.com>
	[stewart: added more detailed explanation, cc stable]
	Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	Drop old Coverity jobs (we build via separate .travis.yml in a branch)	Stewart Smith	2019-03-15	1	-24/+2
	Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	opal-ci: drop fedora 28	Stewart Smith	2019-03-15	4	-41/+30
	We're getting close to Fedora 30, and keeping N-1 fedora around for too long doesn't really add much. Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	opal-ci: Drop unneded reference to ubuntu 12.04	Stewart Smith	2019-03-15	1	-5/+0
	Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	opal-ci: Drop CentOS6 support	Stewart Smith	2019-03-15	2	-38/+0
	We use the same compiler on our CentOS7 image, and it has the bonus of being able to test against P8 and P9 Mambo. Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	fast-reboot: occ: Call occ_pstates_init() on fast-reset on all machines	Shilpasri G Bhat	2019-03-15	1	-2/+4
	Commit 815417dcda2e ("init, occ: Initialise OCC earlier on BMC systems") conditionally invoked occ_pstates_init() only on FSP based systems in load_and_boot_kernel(). As a result, the pstate table is re-parsed on FSP systems and skipped on BMC systems during fast-reboot. This patch fixes that by invoking occ_pstates_init() on all machines during fast-reboot. Cc: skiboot-stable@lists.ozlabs.org Fixes: 815417dcda2e ("init, occ: Initialise OCC earlier on BMC systems") Reported-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	fast-reboot: occ: Remove 'freq-domain-mask' from fast-reboot path	Shilpasri G Bhat	2019-03-15	1	-43/+42
	OCC can change the pstate table at runtime to modify pstate limits or for characterization purpose. These changes are reflected by re-parsing the pstate table during fast-reboot to update the device-tree. Only relevant pstate DT properties are deleted and newly added during fast-reboot. The device-tree properties like 'freq-domain-mask' and 'domain-runs-at' are currently hard-coded and need not be updated during fast-reboot. So this patch removes them from the fast-reboot path. This patch fixes the below crash:
	[ 270.313998453,5] OCC: All Chip Rdy after 0 ms
	[ 270.314148918,3] Duplicate property "freq-domain-mask" in node /ibm,opal/power-mgt
	[ 270.314208553,0] Aborting!
	CPU 083c Backtrace:
	S: 0000000035de3a20 R: 000000003001b480 ._abort+0x4c
	S: 0000000035de3aa0 R: 0000000030028704 .new_property+0xd8
	S: 0000000035de3b30 R: 0000000030028964 .__dt_add_property_cells+0x30
	S: 0000000035de3bd0 R: 0000000030042980 .occ_pstates_init+0x7c8
	S: 0000000035de3d90 R: 00000000300145f4 .load_and_boot_kernel+0x980
	S: 0000000035de3e70 R: 00000000300276b4 .fast_reboot_entry+0x37c
	S: 0000000035de3f00 R: 0000000030002ac4 reset_fast_reboot_wakeup+0x40
	Fixes: b821f8c2a8e3 ("power-mgmt : occ : Add 'freq-domain-mask' DT property")
	Reported-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
	Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
	Tested-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
	Signed-off-by: Stewart Smith <stewart@linux.ibm.com>