Commit message  Author  Age  Files  Lines
...
* ci: Remove debian-jessie boot test.  Stewart Smith  2019-03-28  7  -84/+1
| | | | | | | | | | | | | Debian (in its infinite "wisdom") has decided to erase most evidence of there ever being a ppc64el installer for Debian Jessie. So, screw them. Backwards compatibility testing was for losers anyway. There is snapshot.debian.org, but it's *really* slow pulling things from there, so it's not really an option unless we want to add multiple minutes to test duration. Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* platforms/firenze: Rework I2C controller fixups  Oliver O'Halloran  2019-03-28  1  -49/+51
| | | | | | | | | | | | | | | | | For some system planars we need to apply some fixups to the PCI slot power controllers. These are done at boot time and are slightly bizarre in their construction since they share the I2C request completion callback with the runtime slot power on method which affects the PCI slot state machine. This is confusing to say the least, so this patch reworks the fixup code to use the synchronous I2C request code rather than open-coding the wait based on what PCI slot state is in use. It also does some general control flow cleanup and adds some comments explaining what the fixups are for. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* core/i2c: split i2c_request_send()  Oliver O'Halloran  2019-03-28  2  -45/+55
| | | | | | | | | | | | | | | Split the i2c_request_send() method into two methods: i2c_request_send(), which allocates and populates an i2c_request structure, and i2c_request_sync(), which takes a request structure and blocks until it completes. This allows code that allocates an i2c_request structure elsewhere to make use of the existing busy-wait and request retry logic. Fix the return types to use int64_t while we're here since these are returning OPAL_API error codes. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
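The split between building a request and blocking on it can be pictured with a small, self-contained sketch. This is not the skiboot code: every `demo_` name, the return codes, the retry limit and the simulated bus are assumptions made for illustration. It shows a blocking helper that takes a caller-allocated request, submits it, polls the request's state field until completion, and retries while the result is "busy".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names come from skiboot. */
enum demo_req_state { DEMO_REQ_NEW, DEMO_REQ_QUEUED, DEMO_REQ_COMPLETE };

#define DEMO_OK          0
#define DEMO_BUSY      (-2)
#define DEMO_MAX_RETRY   3

struct demo_i2c_req {
    enum demo_req_state state;  /* tracked by the submitter */
    int64_t result;
    int busy_attempts;          /* simulation only: attempts that report busy */
    int attempts;               /* simulation only: submissions so far */
};

/* Simulated bus: queue the request. */
static void demo_queue(struct demo_i2c_req *req)
{
    req->attempts++;
    req->state = DEMO_REQ_QUEUED;
}

/* Simulated poller: complete a queued request. */
static void demo_poll(struct demo_i2c_req *req)
{
    if (req->state != DEMO_REQ_QUEUED)
        return;
    req->result = (req->attempts <= req->busy_attempts) ? DEMO_BUSY : DEMO_OK;
    req->state = DEMO_REQ_COMPLETE;
}

/* Block until the caller-allocated request completes, retrying on busy. */
static int64_t demo_i2c_request_sync(struct demo_i2c_req *req)
{
    int retry;

    assert(req);
    for (retry = 0; retry <= DEMO_MAX_RETRY; retry++) {
        demo_queue(req);
        while (req->state != DEMO_REQ_COMPLETE)
            demo_poll(req);     /* busy-wait on the state field */
        if (req->result != DEMO_BUSY)
            break;
    }
    return req->result;
}
```

Because the helper only needs a populated request, a caller that builds its request elsewhere can reuse the same wait and retry loop instead of open-coding its own.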
* core/i2c: Poll on request state in i2c_request_send()  Oliver O'Halloran  2019-03-28  1  -22/+4
| | | | | | | | Use the new built-in state variable rather than a single-use completion function. Makes things a bit cleaner. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* core/i2c: Add request state tracking  Oliver O'Halloran  2019-03-28  2  -2/+13
| | | | | | | | | Allow the submitter to track the state of an I2C request by adding a state field to the request. This avoids the need to use a stub completion callback in some cases. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* hw/phb4: Drop FRESET_DEASSERT_DELAY state  Oliver O'Halloran  2019-03-28  2  -6/+0
| | | | | | | | | | | The delay between the ASSERT_DELAY and DEASSERT_DELAY states is set to one timebase tick. This state seems to have been a holdover from PHB3 where it was used to add a 1s delay between de-asserting PERST and polling the link for the CAPI FPGA. There's no requirement for that here since the link polling on PHB4 is a bit smarter so we should be fine. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* hw/phb4: Factor out PERST control  Oliver O'Halloran  2019-03-28  1  -28/+36
| | | | | | | | | | | | | | Some time ago Mikey added some code to work around a bug we found where a certain RAID card wouldn't come back again after a fast-reboot. The workaround is setting the Link Disable bit before asserting PERST and clearing it after de-asserting PERST. Currently we do this in the FRESET path, but not in the CRESET path. This patch moves the PERST control into its own function to reduce duplication and so that the workaround is applied in all circumstances. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* hw/phb4: Remove FRESET presence check  Oliver O'Halloran  2019-03-28  1  -12/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | When we do an FRESET, the first step is to check if a card is present in the slot. However, this only occurs when we enter phb4_freset() with the slot state set to SLOT_NORMAL. This occurs in: a) The creset path, and b) When the OS manually requests an FRESET via an OPAL call. a) is problematic because in the boot path the generic code will put the slot into FRESET_START manually before calling into phb4_freset(). This can result in a situation where a device is detected on boot, but not after a CRESET. I've noticed this occurring on systems where the PHB's slot presence detect signal is not wired to an adapter. In this situation we can rely on the in-band presence mechanism, but the presence check will make us exit before that has a chance to work. Additionally, if we enter from the CRESET path this early exit leaves the slot's PERST signal asserted. This isn't currently an issue, but if we want to support hotplug of devices into the root port it will be. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* hw/phb4: Skip FRESET PERST when coming from CRESET  Oliver O'Halloran  2019-03-28  2  -1/+24
| | | | | | | | | | | | | | | | PERST is asserted at the beginning of the CRESET process to prevent the downstream device from interacting with the host while the PHB logic is being reset and re-initialised. There is at least a 100ms wait during the CRESET processing so it's not necessary to wait this time again in the FRESET handler. This patch extends the delay after resetting the PHB logic to the 250ms PERST wait period that we typically use and sets the skip_perst flag so that we don't wait this time again in the FRESET handler. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
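A minimal sketch of the skip-flag idea follows, assuming a flag that is set by the complete-reset path and consumed once by the fundamental-reset path. The `demo_` names and the bookkeeping field are hypothetical, and whether the real flag is one-shot is an assumption; only the 250ms figure comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_PERST_WAIT_MS 250  /* PERST wait period named in the commit message */

/* Hypothetical slot model: only tracks the flag and the time spent waiting. */
struct demo_slot {
    bool skip_perst;
    unsigned int waited_ms;
};

/* Complete reset: PERST is held for the whole PHB reset, so wait once here. */
static void demo_creset(struct demo_slot *slot)
{
    assert(slot);
    slot->waited_ms += DEMO_PERST_WAIT_MS;
    slot->skip_perst = true;    /* tell the FRESET handler the wait is done */
}

/* Fundamental reset: skip the PERST wait if CRESET already covered it. */
static void demo_freset(struct demo_slot *slot)
{
    assert(slot);
    if (slot->skip_perst) {
        slot->skip_perst = false;   /* assumption: the flag is one-shot */
        return;
    }
    slot->waited_ms += DEMO_PERST_WAIT_MS;
}
```

With this shape, a CRESET followed by an FRESET waits 250ms in total rather than twice that, while a standalone FRESET still performs the full wait.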
* astbmc: Handle failure to initialise raw flash  Andrew Jeffery  2019-03-28  1  -1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Initialising raw flash led to a dead assignment to rc. Check the return code and take the failure path as necessary. Both before and after the fix we see output along the lines of the following when flash_init() fails: [ 53.283182881,7] IRQ: Registering 0800..0ff7 ops @0x300d4b98 (data 0x3052b9d8) [ 53.283184335,7] IRQ: Registering 0ff8..0fff ops @0x300d4bc8 (data 0x3052b9d8) [ 53.283185513,7] PHB#0000: Initializing PHB... [ 53.288260827,4] FLASH: Can't load resource id:0. No system flash found [ 53.288354442,4] FLASH: Can't load resource id:1. No system flash found [ 53.342933439,3] CAPP: Error loading ucode lid. index=200ea [ 53.462749486,2] NVRAM: Failed to load [ 53.462819095,2] NVRAM: Failed to load [ 53.462894236,2] NVRAM: Failed to load [ 53.462967071,2] NVRAM: Failed to load [ 53.463033077,2] NVRAM: Failed to load [ 53.463144847,2] NVRAM: Failed to load Eventually followed by: [ 57.216942479,5] INIT: platform wait for kernel load failed [ 57.217051132,5] INIT: Assuming kernel at 0x20000000 [ 57.217127508,3] INIT: ELF header not found. Assuming raw binary. [ 57.217249886,2] NVRAM: Failed to load [ 57.221294487,0] FATAL: Kernel is zeros, can't execute! [ 57.221397429,0] Assert fail: core/init.c:615:0 [ 57.221471414,0] Aborting! CPU 0028 Backtrace: S: 0000000031d43c60 R: 000000003001b274 ._abort+0x4c S: 0000000031d43ce0 R: 000000003001b2f0 .assert_fail+0x34 S: 0000000031d43d60 R: 0000000030014814 .load_and_boot_kernel+0xae4 S: 0000000031d43e30 R: 0000000030015164 .main_cpu_entry+0x680 S: 0000000031d43f00 R: 0000000030002718 boot_entry+0x1c0 --- OPAL boot --- Analysis of the execution paths suggests we'll always "safely" end this way due to the setup sequence for the blocklevel callbacks in flash_init() and error handling in blocklevel_get_info(), and there's no current risk of executing from unexpected memory locations.
As such the issue is reduced to a fix for poor error hygiene in the original change and a resolution for a Coverity warning (famous last words etc). Fixes: c826e1ca9e5b ("astbmc: Try IPMI HIOMAP for P8 (again)") Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* external/mambo: Mambo hack to add trace-imc nodes in the device-tree  Anju T Sudhakar  2019-03-28  1  -0/+21
| | | | | | | | Update skiboot.tcl device tree to include trace-imc node to help test the code path in mambo. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* hw/imc: Enable opal calls to init/start/stop IMC Trace mode  Anju T Sudhakar  2019-03-28  1  -1/+75
| | | | | | | | | | | | | | | | | | | | | | | | | | | Patch to enhance the imc opal call to support and handle trace_imc mode. To initialize the trace-mode, the TRACE_IMC_SCOM value is written to TRACE_IMC_ADDR of the respective core. TRACE_IMC_SCOM is a 64-bit value whose bit fields are laid out as follows: 0:1 SAMPSEL; 2:33 CPMC_LOAD; 34:40 CPMC1SEL; 41:47 CPMC2SEL; 48:50 BUFFERSIZE; 51:63 RESERVED. Currently the value for TRACE_IMC_SCOM is hard coded. During initialization htm_mode is disabled, and enabled only at start. The opal calls to start/stop the counters will write CORE_IMC_HTM_MODE_ENABLE/ CORE_IMC_HTM_MODE_DISABLE respectively to the htm_scom_index of the desired cores. Additional switch cases are added to the current opal calls to start/stop the counters for trace-mode. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
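The field layout listed in the commit message can be turned into a small packing helper. The sketch below assumes the bit ranges use IBM (big-endian) numbering, where bit 0 is the most significant bit of the 64-bit word; the commit message does not state this explicitly. The `demo_` function names are hypothetical and this is not the macro used in skiboot.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Place `val` into bits first..last of a 64-bit word, using IBM bit
 * numbering (assumption: bit 0 is the MSB, bit 63 the LSB).
 */
static uint64_t demo_set_field(uint64_t reg, unsigned int first,
                               unsigned int last, uint64_t val)
{
    unsigned int width, shift;
    uint64_t mask;

    assert(first <= last && last <= 63);
    width = last - first + 1;
    shift = 63 - last;
    mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);
    assert((val & ~mask) == 0);     /* value must fit in the field */
    return (reg & ~(mask << shift)) | (val << shift);
}

/* Compose a trace-mode control word from the fields in the commit message. */
static uint64_t demo_trace_imc_scom(uint64_t sampsel, uint64_t cpmc_load,
                                    uint64_t cpmc1sel, uint64_t cpmc2sel,
                                    uint64_t buffersize)
{
    uint64_t reg = 0;

    reg = demo_set_field(reg, 0, 1, sampsel);       /* 0:1   SAMPSEL    */
    reg = demo_set_field(reg, 2, 33, cpmc_load);    /* 2:33  CPMC_LOAD  */
    reg = demo_set_field(reg, 34, 40, cpmc1sel);    /* 34:40 CPMC1SEL   */
    reg = demo_set_field(reg, 41, 47, cpmc2sel);    /* 41:47 CPMC2SEL   */
    reg = demo_set_field(reg, 48, 50, buffersize);  /* 48:50 BUFFERSIZE */
    return reg;                                     /* 51:63 reserved, left zero */
}
```

Under that numbering assumption the five fields start at conventional shifts 62, 30, 23, 16 and 13 respectively.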
* hw/imc: Refactor opal init call for core-imc  Anju T Sudhakar  2019-03-28  1  -27/+43
| | | | | | | | | | | | Factor out core-imc stop api code from opal_imc_counters_init() for better readability. Also fix the error message if wake_up_engine_state is not "WAKEUP_ENGINE_PRESENT". Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Cc: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* include/imc: Trace IMC Macro definitions  Anju T Sudhakar  2019-03-28  2  -0/+31
| | | | | | | | | | Add macros needed for Trace mode enablement of IMC (In-Memory Collection Counters). These macros are used to identify the trace node in the device-tree and to make appropriate scom calls to enable trace-mode in the hardware. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* doc/opal-api: Edit documentation for IMC opal call to include trace-imc  Anju T Sudhakar  2019-03-28  1  -8/+9
| | | | | | | | | The OPAL call APIs for the In-Memory Collection Counter (IMC) infrastructure include a new device type called OPAL_IMC_COUNTERS_TRACE. Edit the documentation to include this information. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* doc/device-tree: Edit device tree documentation for imc to include trace-node information.  Anju T Sudhakar  2019-03-28  1  -0/+50
| | | | | | | | | Add trace-node information in the device-tree document for IMC. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* doc/imc: Edit imc.rst documentation to include  Anju T Sudhakar  2019-03-28  1  -0/+67
| | | | | | | Add documentation for IMC trace-mode in imc.rst. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* platforms/vesnin: Disable PCIe port bifurcation  Artem Senichev  2019-03-28  1  -34/+16
| | | | | | | PCIe ports connected to CPU1 and CPU3 now work as x16 instead of x8x8. Signed-off-by: Artem Senichev <a.senichev@yadro.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* core/stack: Rename backtrace functions, get rid of wrappers  Andrew Donnellan  2019-03-28  4  -38/+16
| | | | | | | | | Rename ___backtrace() to backtrace_create() and ___print_backtrace() to backtrace_print(). Get rid of __backtrace() and __print_backtrace() wrappers. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* core/stack: Convert stack check code to not use backtrace wrapper  Andrew Donnellan  2019-03-28  2  -6/+6
| | | | | | | | We're about to get rid of __backtrace() and __print_backtrace(), so convert the stack check code to not use them. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* hw/fsp, hw/ipmi: Convert attn code to not use backtrace wrappers  Andrew Donnellan  2019-03-28  2  -9/+10
| | | | | | | | We're about to get rid of __backtrace() and __print_backtrace(), so convert the FSP/IPMI attn code to not use them. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* core/stack: Store PIR in ___backtrace()  Andrew Donnellan  2019-03-28  1  -3/+3
| | | | | | | | In ___backtrace(), store the current PIR in the metadata struct, rather than relying on the caller to do it. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* core/stack: Define a backtrace metadata struct  Andrew Donnellan  2019-03-28  2  -39/+54
| | | | | | | | | | | | | | | | Every time we take a backtrace, we have to store the number of entries, the OPAL API token, r1 caller and PIR values. Rather than defining these and passing them around all over the place, let's throw them in a struct. Define a struct, struct bt_metadata, to store these details, and convert ___backtrace() and ___print_backtrace() to use it. We change the wrapper functions __backtrace() and __print_backtrace() to call ___backtrace()/___print_backtrace() with struct bt_metadata, but don't change their parameter profiles for now - we'll do that later. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* core/stack: Remove r1 argument from ___backtrace()  Andrew Donnellan  2019-03-28  2  -8/+3
| | | | | | | | | ___backtrace() is always called with r1 = __builtin_frame_address(0), and it's unlikely we're going to need it to do something else any time soon, so simplify the API by removing the parameter. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* Fix hang in pnv_platform_error_reboot path due to TOD failure.  Mahesh Salgaonkar  2019-03-28  1  -0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On TOD failure, with the TB stuck, when Linux heads down the pnv_platform_error_reboot() path due to an unrecoverable HMI event, the panic CPU gets stuck in OPAL inside ipmi_queue_msg_sync(). At this time, all other CPUs are in smp_handle_nmi_ipi() waiting for the panic CPU to proceed. But with the panic CPU stuck inside OPAL, Linux never recovers or reboots. p0 c1 t0 NIA : 0x000000003001dd3c <.time_wait+0x64> CFAR : 0x000000003001dce4 <.time_wait+0xc> MSR : 0x9000000002803002 LR : 0x000000003002ecf8 <.ipmi_queue_msg_sync+0xec> STACK: SP NIA 0x0000000031c236e0 0x0000000031c23760 (big-endian) 0x0000000031c23760 0x000000003002ecf8 <.ipmi_queue_msg_sync+0xec> 0x0000000031c237f0 0x00000000300aa5f8 <.hiomap_queue_msg_sync+0x7c> 0x0000000031c23880 0x00000000300aaadc <.hiomap_window_move+0x150> 0x0000000031c23950 0x00000000300ab1d8 <.ipmi_hiomap_write+0xcc> 0x0000000031c23a90 0x00000000300a7b18 <.blocklevel_raw_write+0xbc> 0x0000000031c23b30 0x00000000300a7c34 <.blocklevel_write+0xfc> 0x0000000031c23bf0 0x0000000030030be0 <.flash_nvram_write+0xd4> 0x0000000031c23c90 0x000000003002c128 <.opal_write_nvram+0xd0> 0x0000000031c23d20 0x00000000300051e4 <opal_entry+0x134> 0xc000001fea6e7870 0xc0000000000a9060 <opal_nvram_write+0x80> 0xc000001fea6e78c0 0xc000000000030b84 <nvram_write_os_partition+0x94> 0xc000001fea6e7960 0xc0000000000310b0 <nvram_pstore_write+0xb0> 0xc000001fea6e7990 0xc0000000004792d4 <pstore_dump+0x1d4> 0xc000001fea6e7ad0 0xc00000000018a570 <kmsg_dump+0x140> 0xc000001fea6e7b40 0xc000000000028e5c <panic_flush_kmsg_end+0x2c> 0xc000001fea6e7b60 0xc0000000000a7168 <pnv_platform_error_reboot+0x68> 0xc000001fea6e7bd0 0xc0000000000ac9b8 <hmi_event_handler+0x1d8> 0xc000001fea6e7c80 0xc00000000012d6c8 <process_one_work+0x1b8> 0xc000001fea6e7d20 0xc00000000012da28 <worker_thread+0x88> 0xc000001fea6e7db0 0xc0000000001366f4 <kthread+0x164> 0xc000001fea6e7e20 0xc00000000000b65c <ret_from_kernel_thread+0x5c> This is because there is a while loop towards the end of ipmi_queue_msg_sync() which keeps looping until "sync_msg" no longer matches "msg". It loops over time_wait_ms() until the exit condition is met. In the normal scenario time_wait_ms() runs the pollers so that the IPMI backend gets a chance to check the IPMI response and set sync_msg to NULL. while (sync_msg == msg) time_wait_ms(10); But when the TB is in a failed state, time_wait_ms()->time_wait_poll() returns immediately without calling the pollers and hence we end up looping forever. This patch fixes the hang by calling opal_run_pollers() in the TB failed state as well. Fixes: 1764f2452 ("opal: Fix hang in time_wait* calls on HMI for TB errors.") Cc: skiboot-stable@lists.ozlabs.org Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
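The hang and its fix can be reproduced in miniature. The sketch below is a simulation under stated assumptions, not the skiboot implementation: all `demo_` names are hypothetical, the "pollers" are a counter that clears the pending message after a few runs, and a `fixed` switch selects whether pollers still run when the timebase has failed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool demo_tb_failed;             /* simulated "timebase failed" state */
static const void *demo_sync_msg;       /* message the waiter is blocked on */
static int demo_polls_until_reply;      /* simulated backend latency */

/* Simulated pollers: after enough runs the backend clears the pending message. */
static void demo_run_pollers(void)
{
    if (demo_polls_until_reply > 0 && --demo_polls_until_reply == 0)
        demo_sync_msg = NULL;
}

/* Simulated wait: without the fix, a failed timebase returns without polling. */
static void demo_time_wait_ms(bool fixed)
{
    if (demo_tb_failed) {
        if (fixed)
            demo_run_pollers();     /* the fix: keep polling even with a dead TB */
        return;
    }
    demo_run_pollers();
}

/*
 * Same shape as `while (sync_msg == msg) time_wait_ms(10);`, with an
 * iteration cap so the broken case terminates. Returns the iteration
 * count, or -1 if the reply never arrived.
 */
static int demo_wait_for_reply(const void *msg, bool fixed, int max_iters)
{
    int i;

    assert(msg != NULL);
    demo_sync_msg = msg;
    for (i = 0; demo_sync_msg == msg; i++) {
        if (i == max_iters)
            return -1;
        demo_time_wait_ms(fixed);
    }
    return i;
}
```

With the timebase marked as failed, the unfixed variant spins until the cap is hit, while the fixed variant returns as soon as the simulated backend has been polled often enough.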
* core/ipmi: Print correct netfn value  Vasant Hegde  2019-03-28  1  -1/+1
| | | | | | Fixes: 7516e382 (core/ipmi: Improve error message) Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* external/mambo: Error out if kernel is too large  Russell Currey  2019-03-28  1  -0/+4
| | | | | | | | | | | | | | | | | | | | | | | | If you're trying to boot a gigantic kernel in mambo (which you can reproduce by building a kernel with CONFIG_MODULES=n) you'll get misleading errors like: WARNING: 0: (0): [0:0]: Invalid/unsupported instr 0x00000000[INVALID] WARNING: 0: (0): PC(EA): 0x0000000030000010 PC(RA):0x0000000030000010 MSR: 0x9000000000000000 LR: 0x0000000000000000 WARNING: 0: (0): numInstructions = 0 WARNING: 1: (1): [0:0]: Invalid/unsupported instr 0x00000000[INVALID] WARNING: 1: (1): PC(EA): 0x0000000000000E40 PC(RA):0x0000000000000E40 MSR: 0x9000000000000000 LR: 0x0000000000000000 WARNING: 1: (1): numInstructions = 1 WARNING: 1: (1): Interrupt to 0x0000000000000E40 from 0x0000000000000E40 INFO: 1: (2): ** Execution stopped: Continuous Interrupt, Instruction caused exception, ** So add an error to skiboot.tcl to warn the user before this happens. Making PAYLOAD_ADDR further back is one way to do this but if there's a less gross way to generally work around this very niche problem, I can suggest that instead. Signed-off-by: Russell Currey <ruscur@russell.cc> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
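The size check behind that error can be sketched as simple address arithmetic. The real check lives in skiboot.tcl and its exact form is not shown here; the C version below is an illustration only. `DEMO_FIRMWARE_BASE` is an assumption inferred from the 0x30000000-range addresses in the logs, and the `demo_` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAYLOAD_ADDR   0x20000000ULL   /* default payload address */
#define DEMO_FIRMWARE_BASE  0x30000000ULL   /* assumption: where the firmware image sits */

/*
 * Return true when a kernel of `kernel_size` bytes loaded at `payload_addr`
 * ends at or below `firmware_base`, i.e. does not run into the firmware.
 * A payload placed at or above the firmware base is outside this sketch
 * and is conservatively rejected.
 */
static bool demo_kernel_fits(uint64_t payload_addr, uint64_t kernel_size,
                             uint64_t firmware_base)
{
    assert(kernel_size > 0);    /* an empty payload is a caller bug */
    if (payload_addr >= firmware_base)
        return false;
    return kernel_size <= firmware_base - payload_addr;
}
```

Moving the payload address further back, as the commit message suggests, enlarges `firmware_base - payload_addr` and so lets a bigger kernel pass the check.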
* external/mambo: Populate kernel-base-address in the DT  Russell Currey  2019-03-28  2  -1/+13
| | | | | | | | | | | | | | skiboot.tcl defines PAYLOAD_ADDR as 0x20000000, which is also the default in skiboot unless kernel-base-address is set in the device tree. If you change PAYLOAD_ADDR to something else for mambo, skiboot won't see it because it doesn't set that DT property, so fix it so that it does. Signed-off-by: Russell Currey <ruscur@russell.cc> Acked-by: Michael Neuling <mikey@neuling.org> [stewart: fix up mambo hacks for STB] Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* core/flash: Retry requests as necessary in flash_load_resource()  Andrew Jeffery  2019-03-28  1  -2/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We would like to successfully boot if we have a dependency on the BMC for flash even if the BMC is not currently ready to service flash requests. On the assumption that it will become ready, retry for several minutes to cover a BMC reboot cycle and *eventually* rather than *immediately* crash out with: [ 269.549748] reboot: Restarting system [ 390.297462587,5] OPAL: Reboot request... [ 390.297737995,5] RESET: Initiating fast reboot 1... [ 391.074707590,5] Clearing unused memory: [ 391.075198880,5] PCI: Clearing all devices... [ 391.075201618,7] Clearing region 201ffe000000-201fff800000 [ 391.086235699,5] PCI: Resetting PHBs and training links... [ 391.254089525,3] FFS: Error 17 reading flash header [ 391.254159668,3] FLASH: Can't open ffs handle: 17 [ 392.307245135,5] PCI: Probing slots... [ 392.363723191,5] PCI Summary: ... [ 393.423255262,5] OCC: All Chip Rdy after 0 ms [ 393.453092828,5] INIT: Starting kernel at 0x20000000, fdt at 0x30800a88 390645 bytes [ 393.453202605,0] FATAL: Kernel is zeros, can't execute! [ 393.453247064,0] Assert fail: core/init.c:593:0 [ 393.453289682,0] Aborting! CPU 0040 Backtrace: S: 0000000031e03ca0 R: 000000003001af60 ._abort+0x4c S: 0000000031e03d20 R: 000000003001afdc .assert_fail+0x34 S: 0000000031e03da0 R: 00000000300146d8 .load_and_boot_kernel+0xb30 S: 0000000031e03e70 R: 0000000030026cf0 .fast_reboot_entry+0x39c S: 0000000031e03f00 R: 0000000030002a4c fast_reset_entry+0x2c --- OPAL boot --- The OPAL flash API hooks directly into the blocklevel layer, so there's no delay for e.g. the host kernel, just for asynchronously loaded resources during boot. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
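A bounded retry loop of the kind described might look like the sketch below. The five-minute budget, the five-second interval, the return codes and all `demo_` names are assumptions; the commit message only says "several minutes". The loader is passed in as a callback so the loop can be exercised without real flash, and the sleep is replaced by bookkeeping.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_OK                 0
#define DEMO_BUSY             (-2)
#define DEMO_RETRY_INTERVAL_MS  5000u                   /* assumption */
#define DEMO_RETRY_BUDGET_MS    (5u * 60u * 1000u)      /* assumption: "several minutes" */

typedef int (*demo_load_fn)(void *ctx);

/*
 * Keep calling `load` while it reports busy, up to the retry budget.
 * `waited_ms` (optional) receives the simulated time spent waiting.
 */
static int demo_load_with_retry(demo_load_fn load, void *ctx, uint32_t *waited_ms)
{
    uint32_t waited = 0;
    int rc;

    assert(load);
    for (;;) {
        rc = load(ctx);
        if (rc != DEMO_BUSY || waited >= DEMO_RETRY_BUDGET_MS)
            break;
        /* Real firmware would sleep here; the sketch only tracks the time. */
        waited += DEMO_RETRY_INTERVAL_MS;
    }
    if (waited_ms)
        *waited_ms = waited;
    return rc;
}

/* Simulated backend: reports busy for the first *ctx calls, then succeeds. */
static int demo_flaky_load(void *ctx)
{
    int *busy_calls = ctx;

    if (*busy_calls > 0) {
        (*busy_calls)--;
        return DEMO_BUSY;
    }
    return DEMO_OK;
}
```

Errors other than "busy" are returned immediately, so only the not-yet-ready case pays the retry cost.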
* xive: Add calls to save/restore the queues and VPs HW state  Cédric Le Goater  2019-03-28  3  -3/+185
| | | | | | | | | | | | | | | | | To be able to support migration of guests using the XIVE native exploitation mode, (where the queue is effectively owned by the guest), KVM needs to be able to save and restore the HW-modified fields of the queue, such as the current queue producer pointer and generation bit, and to retrieve the modified thread context registers of the VP from the NVT structure : the VP interrupt pending bits. However, there is no need to set back the NVT structure on P9. P10 should be the same. Based on previous work from BenH. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* core/pcie-slot: Don't bail early in the power on case  Oliver O'Halloran  2019-03-28  1  -4/+5
| | | | | | | | | | Exiting early in the power off case makes sense since we can't disable slot power (or assert PERST) for surprise hotplug slots. However, we should not exit early in the power-on case since it's possible slot power may have been disabled (or just not enabled at boot time). Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* core/pcie-slot: Better explain suprise_check  Oliver O'Halloran  2019-03-28  1  -16/+11
| | | | | | | Working out what was actually going on here took forever. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* firenze-pci: Always init slot info from LXVPD  Oliver O'Halloran  2019-03-28  1  -8/+4
| | | | | | | | | | We can get slot information from the LXVPD without having power control information about that slot. This patch changes the init path so that we always override the add_properties() call rather than only when we have power control information about the slot. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* fsp/lxvpd: Print more LXVPD slot information  Oliver O'Halloran  2019-03-28  1  -0/+3
| | | | | | | Useful to know since it changes the behaviour of the slot core. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* core/pcie-slot: Set power state from the PWRCTL flag  Oliver O'Halloran  2019-03-28  1  -3/+3
| | | | | | | | | | | | | For some reason we look at the power control indicator and use that to determine if the slot is "off" rather than the power control flag that is used to power down the slot. While we're here change the default behaviour so that the slot is assumed to be powered on if there's no slot capability, or if there's no power control available. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* core/pci: Increase the max slot string size  Oliver O'Halloran  2019-03-28  1  -1/+1
| | | | | | | | | | | | | | | | | | | The maximum string length for the slot label / device location code in the PCI summary is currently 32 characters. This results in some IBM location codes being truncated due to their length, e.g. PHB#0001:02:11.0 [SWDN] SLOT=C11 x8 PHB#0001:13:00.0 [EP ] *snip* LOC_CODE=U78D3.ND1.WZS004A-P1-C PHB#0001:13:00.1 [EP ] *snip* LOC_CODE=U78D3.ND1.WZS004A-P1-C PHB#0001:13:00.2 [EP ] *snip* LOC_CODE=U78D3.ND1.WZS004A-P1-C PHB#0001:13:00.3 [EP ] *snip* LOC_CODE=U78D3.ND1.WZS004A-P1-C This obscures the actual location of the card and looks bad. This patch increases the maximum length of the label string to 80 characters since that's the maximum length for a location code. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
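The truncation is easy to reproduce with `snprintf()`, whose return value also makes it detectable. The sketch below assumes a label of the form `LOC_CODE=<code>`, as in the output quoted in the commit message; the helper name and the location code used with it are made up for illustration and do not come from skiboot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Format a slot label into `buf`. Returns true if the whole label fit,
 * false if snprintf() had to truncate it. snprintf() returns the length
 * the full string would have had, so truncation is `n >= size`.
 */
static bool demo_format_label(char *buf, size_t size, const char *loc_code)
{
    int n;

    assert(buf && size > 0 && loc_code);
    n = snprintf(buf, size, "LOC_CODE=%s", loc_code);
    return n >= 0 && (size_t)n < size;
}
```

A 32-byte buffer holds at most 31 characters plus the terminator, which matches the 31-character truncated labels in the quoted output; an 80-byte buffer leaves room for much longer codes.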
* hw/phb4: Look for the hub-id from in the PBCQ node  Oliver O'Halloran  2019-03-28  1  -3/+9
| | | | | | | | | | The hub-id is stored in the PBCQ node rather than the stack node so we never add it to the PHB node. This breaks the lxvpd slot lookup code since the hub-id is encoded in the VPD record that we need to find the slot information. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* hdata/iohub: Look for IOVPD on P9  Oliver O'Halloran  2019-03-28  4  -3/+14
| | | | | | | | | | | P8 and P9 use the same IO VPD setup, so we need to load the IOHUB VPD on P9 systems too. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Tested-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> [stewart: fixup op920 hdat_to_dt dts expected result, remove incorrect comment, skip IOVPD loading on non-FSP.] Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* witherspoon: Add nvlink2 interconnect information  Alexey Kardashevskiy  2019-03-20  1  -1/+131
| | | | | | | | | | | | | | | | | | | GPUs on Redbud and Sequoia platforms are interconnected in groups of 2 or 3 GPUs. The problem with that is if the user decides to pass a single GPU from a group to the userspace, we need to ensure that links between GPUs do not get enabled. A V100 GPU provides a way to disable selected links. In order to only disable links to peer GPUs, we need a topology map. This adds an "ibm,nvlink-peers" property to a GPU DT node with phandles of peer GPUs and NVLink2 bridges. The index in the property is a GPU link number. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Reza Arbab <arbab@linux.ibm.com> [stewart: fixed strtol found in review by Reza] Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* platforms/romulus: Also support talos  Oliver O'Halloran  2019-03-20  1  -1/+2
| | | | | | | | | The two are similar enough and I'd like to have a slot table for our Talos. Cc: Timothy Pearson <tpearson@raptorengineering.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* hdata: Prevent NULL dereference on duplicate slot map info  Stewart Smith  2019-03-20  1  -0/+4
| | | | Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* hdata_to_dt: fail "gracefully" on fatal op_display()  Stewart Smith  2019-03-20  2  -1/+11
| | | | Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* hdata: Add protection against corrupt ntuples structure  Stewart Smith  2019-03-20  1  -0/+21
| | | | | | | | | | Found using afl-lop on P9 HDAT. Pretty obvious what the problem is once you look at it, and it's much better having a controlled failure mode than just going off randomly into memory and segfaulting. Signed-off-by: Stewart Smith <stewart@linux.ibm.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* Fix broken opal-ci/build-fedora-rawhide.sh symlink  Stewart Smith  2019-03-20  1  -1/+1
| | | | | Fixes: e4a06f098c4f34fb5539129dddb6646667f4d5ab Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* hw/ipmi/test/run-fru: Fix string truncation warning, enhance test  Stewart Smith  2019-03-20  2  -8/+19
| | | | | | | | | | | | | | | | | | We've been getting this warning/error from recent GCC: In file included from hw/ipmi/test/run-fru.c:22: hw/ipmi/test/../ipmi-fru.c: In function ‘fru_add’: hw/ipmi/test/../ipmi-fru.c:162:3: warning: ‘strncpy’ output truncated copying 32 bytes from a string of length 38 [-Wstringop-truncation] strncpy(info.version, version, MAX_STR_LEN + 1); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This patch does two things: 1) Re-arrange some code to shut GCC up. 2) Add extra fu to tests to ensure we're producing correct bytes. Signed-off-by: Stewart Smith <stewart@linux.ibm.com> Tested-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
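One common way to keep the truncating behaviour while avoiding this class of warning is to clamp the length explicitly, copy with `memcpy()` and terminate by hand. The sketch below shows that pattern; it is not necessarily the re-arrangement made in the patch. `DEMO_MAX_STR_LEN` is set to 31 on the assumption that the 32 bytes in the warning correspond to `MAX_STR_LEN + 1`, and `demo_copy_version()` is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

#define DEMO_MAX_STR_LEN 31     /* assumption: 32-byte field including the NUL */

/*
 * Copy `src` into a fixed-size field, truncating to DEMO_MAX_STR_LEN
 * characters and always NUL-terminating. Returns the stored length.
 */
static size_t demo_copy_version(char dst[DEMO_MAX_STR_LEN + 1], const char *src)
{
    size_t len;

    assert(dst && src);
    len = strlen(src);
    if (len > DEMO_MAX_STR_LEN)
        len = DEMO_MAX_STR_LEN;     /* truncation is explicit and intentional */
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}
```

Unlike a bare `strncpy()` with a bound equal to the field size, this always leaves the destination terminated, which is also what a test of the produced bytes would want to check.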
* npu2/hw-procedures: Fix parallel zcal for opencapi  Frederic Barrat  2019-03-20  3  -5/+10
| | | | | | | | | | | | | | | | | | | | | | | | For opencapi, we currently do impedance calibration when initializing the PHY for the device, which could run in parallel if we were rich and had multiple opencapi devices. But if 2 devices are on the same obus, the 2 calibration sequences could overlap, which likely yields bad results and is useless anyway since it only needs to be done once per obus. This patch splits the opencapi PHY reset in 2 parts: - an 'init' part called serially at boot. That's when zcal is done. If we have 2 devices on the same socket, the zcal won't be redone, since we're called serially and we'll see it has already been done for the obus - a 'reset' part called during fundamental reset as a prereq for link training. It does the PHY setup for a set of lanes and the dccal. The PHY team confirmed there's no dependency between zcal and the other reset steps and it can be moved earlier. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* npu2-hw-procedures: Fix zcal in mixed opencapi and nvlink mode  Frederic Barrat  2019-03-20  1  -3/+21
| | | | | | | | | | | | | | The zcal procedure needs to be run once per obus. We keep track of which obus is already calibrated in an array indexed by the obus number. However, the obus number is inferred from the brick index, which works well for nvlink but not for opencapi. Create an obus_index() function, which, from a device, returns the correct obus index, irrespective of the device type. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* npu2-hw-procedures: Don't set iovalid for opencapi devices  Frederic Barrat  2019-03-20  1  -0/+3
| | | | | | | | | | | | | | | | | set_iovalid() is called on the PHY reset path. The hw logic it touches is meaningless for opencapi. It's not hurting as long as all the links under the NPU are in opencapi mode, but in case of mixing opencapi and nvlink, we'll be in trouble: the code finds which bit to modify based on the brick index, which varies depending on the mode. So calling that function on an opencapi device may modify an nvlink brick! For example, for brick index 3. So we simply avoid doing anything when calling set_iovalid() for an opencapi device. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* libffs: Fix string truncation gcc warning.  Michal Suchanek  2019-03-20  1  -1/+1
| | | | | | | | Use memcpy as other libffs functions do. Signed-off-by: Michal Suchanek <msuchanek@suse.de> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* skiboot v6.2.3 release notes  Vasant Hegde  2019-03-20  1  -0/+45
| | | | | | Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> (cherry picked from commit 8463ee4bc297fab0181fbb418954c3476a2adbde) Signed-off-by: Stewart Smith <stewart@linux.ibm.com>