path: root/include
Commit message | Author | Age | Files | Lines
* sensors: Dont add DTS sensors when OCC inband sensors are available | Shilpasri G Bhat | 2018-04-19 | 1 | -1/+1
  There are two sets of core temperature sensors today: the SCOM-based DTS core temperature sensors, and the sensors provided by the OCC. DTS reports the highest temperature among the different temperature zones in the core, while the OCC core temperature sensors report the average temperature of the core. DTS sensors are read directly by the host via SCOM, while OCC sensors are read by the OCC and written to main memory.

  Reading DTS sensors via SCOM is a heavier and slower operation than reading OCC sensors, which is as cheap as reading memory. So don't add DTS sensors when OCC sensors are available.

  Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
  Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* core/opal: Emergency stack for re-entry | Nicholas Piggin | 2018-04-18 | 3 | -5/+15
  This detects when the OS re-enters OPAL and, if so, switches to an emergency stack. This protects the firmware's main stack from re-entrancy and allows the OS to use NMI facilities for crash / debug functionality. Further nested re-entry will destroy the previous emergency stack and prevent returning, but such cases should be rare.

  This stack is sized at 16kB, which doubles the size of CPU stacks, so as not to introduce a regression in primary stack size. The 16kB stack originally had a 4kB machine check stack at the top, which was removed by 80eee1946 ("opal: Remove machine check interrupt patching in OPAL."). So it is possible the size could be tightened again, but that would require further analysis.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
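The stack selection can be modelled in C roughly as below. This is a minimal sketch, not skiboot's entry code. `struct hypo_cpu`, `call_depth` and the `hypo_*` functions are hypothetical, and only the 16kB size comes from the commit message. The first entry lands on the main stack and a nested entry lands on the emergency stack. A further nested entry reuses, and therefore clobbers, that same emergency stack, which is the case the commit message warns about.

```c
#include <stdint.h>

#define HYPO_STACK_SIZE (16 * 1024) /* 16kB, as in the commit message */

/* Hypothetical per-CPU state; not skiboot's struct cpu_thread. */
struct hypo_cpu {
	unsigned int call_depth;                  /* OPAL calls currently in progress */
	uint8_t main_stack[HYPO_STACK_SIZE];
	uint8_t emergency_stack[HYPO_STACK_SIZE];
};

/* Pick the stack for an incoming OPAL call and record the entry. */
static uint8_t *hypo_opal_enter(struct hypo_cpu *cpu)
{
	/* Only the outermost call may use the main stack; any re-entry,
	 * however deep, shares the single emergency stack. */
	uint8_t *base = cpu->call_depth == 0 ? cpu->main_stack
					     : cpu->emergency_stack;

	cpu->call_depth++;
	return base + HYPO_STACK_SIZE;            /* stacks grow down: return the top */
}

static void hypo_opal_exit(struct hypo_cpu *cpu)
{
	cpu->call_depth--;
}
```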
* asm/head: implement quiescing without stack or clobbering regs | Nicholas Piggin | 2018-04-18 | 1 | -1/+1
  Quiescing is currently implemented in C in opal_entry, before the OPAL call handler is called. This works well enough for simple cases like fast reset, when one CPU wants all others out of the way.

  Linux would like to use it to prevent an sreset IPI from interrupting firmware, which could lead to deadlocks when crash dumping or entering the debugger. Linux interrupts do not recover well when returning back to general OPAL code, because r13 is not restored. OPAL also can't be re-entered, which may happen, e.g., from the debugger.

  So move the quiesce hold/reject to the entry code, before the stack or the r1 or r13 registers are switched. OPAL can be interrupted and returned to, or re-entered, during this period.

  This does not completely solve all such problems. OPAL will be interrupted with sreset if the quiesce times out, and it can be interrupted by MCEs as well. These still have the issues above.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* core/stack: backtrace unwind basic OPAL call details | Nicholas Piggin | 2018-04-18 | 1 | -3/+22
  Put the OPAL caller's r1 into the stack back chain, and then use that to unwind back to the OPAL entry frame (as opposed to boot entry, which has a 0 back chain).

  From there, dump the OPAL call token and the caller's r1. A backtrace looks like this:

      CPU 0000 Backtrace:
       S: 0000000031c03ba0 R: 000000003001a548 ._abort+0x4c
       S: 0000000031c03c20 R: 000000003001baac .opal_run_pollers+0x3c
       S: 0000000031c03ca0 R: 000000003001bcbc .opal_poll_events+0xc4
       S: 0000000031c03d20 R: 00000000300051dc opal_entry+0x12c
       --- OPAL call entry token: 0xa caller R1: 0xc0000000006d3b90 ---

  This is pretty basic for the moment, but it does give you the bottom of the Linux stack. It will allow some interesting improvements in future.

  First, with the eframe, all the call's parameters can be printed out as well. The ___backtrace / ___print_backtrace API needs to be reworked in order to support this, but it's otherwise very simple (see opal_trace_entry()).

  Second, it will allow Linux's stack to be passed back to Linux via a debugging OPAL call. This will allow Linux's BUG() or xmon to also print the Linux backtrace in case of an NMI, MCE or watchdog lockup that hits in OPAL.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
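Below is a minimal sketch of the unwinding idea. It assumes a simplified frame layout, `struct hypo_frame` (hypothetical), in which the first word is the back chain as in the PowerPC ELF ABI. The saved LR follows the back chain directly here, whereas the real ABI stores it at a different offset. The walk stops at a 0 back chain (boot entry) or at a back chain that points outside the firmware stack. Under this scheme, that outside pointer is the OPAL caller's r1.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical frame layout: back chain first, then a saved return address. */
struct hypo_frame {
	uintptr_t back_chain;   /* caller's frame, 0 at boot entry, or the OS r1 */
	uintptr_t lr_save;      /* return address for this frame */
};

/* Walk frames inside the firmware stack [stack_lo, stack_hi) and print them.
 * Returns the number of frames printed. *caller_r1 receives a back chain that
 * points outside the firmware stack (the OPAL caller's r1), or 0 if the chain
 * ended at a boot-entry frame. */
static size_t hypo_unwind(uintptr_t sp, uintptr_t stack_lo, uintptr_t stack_hi,
			  uintptr_t *caller_r1)
{
	size_t n = 0;

	*caller_r1 = 0;
	while (sp >= stack_lo && sp < stack_hi) {
		const struct hypo_frame *f = (const struct hypo_frame *)sp;

		printf(" S: %016" PRIxPTR " R: %016" PRIxPTR "\n", sp, f->lr_save);
		n++;
		if (f->back_chain == 0)
			break;                        /* boot entry frame */
		if (f->back_chain < stack_lo || f->back_chain >= stack_hi) {
			*caller_r1 = f->back_chain;   /* OPAL entry frame */
			break;
		}
		if (f->back_chain <= sp)
			break;                        /* corrupt chain: callers sit higher */
		sp = f->back_chain;
	}
	return n;
}
```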
* opal/hmi: Generate hmi event for recovered HDEC parity error. | Mahesh Salgaonkar | 2018-04-17 | 1 | -1/+1
  Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* opal/hmi: check thread 0 tfmr to validate latched tfmr errors. | Mahesh Salgaonkar | 2018-04-17 | 1 | -0/+8
  Due to a P9 erratum, HDEC parity and TB residue errors remain latched on the non-zero threads 1-3 even after they are cleared, but they are not latched on thread 0. Hence, use the xscom SCOMC/SCOMD registers to read the thread 0 TFMR value, and ignore these errors on non-zero threads if they are not present on thread 0.

  Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
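The filtering rule reduces to a small mask operation. In this sketch the `HYPO_TFMR_*` bit positions are placeholders, not the real TFMR layout. The thread 0 value is passed in as a parameter instead of being read through SCOMC/SCOMD.

```c
#include <stdint.h>

/* Hypothetical masks standing in for the HDEC parity and TB residue bits. */
#define HYPO_TFMR_HDEC_PARITY (1ULL << 10)
#define HYPO_TFMR_TB_RESIDUE  (1ULL << 11)
#define HYPO_TFMR_ERRATA_BITS (HYPO_TFMR_HDEC_PARITY | HYPO_TFMR_TB_RESIDUE)

/* On threads 1-3, drop errata-affected error bits unless thread 0 also
 * reports them; all other bits pass through unchanged. */
static uint64_t hypo_filter_tfmr(unsigned int thread_id, uint64_t tfmr,
				 uint64_t thread0_tfmr)
{
	uint64_t stale;

	if (thread_id == 0)
		return tfmr;                  /* thread 0 is not affected */
	stale = tfmr & HYPO_TFMR_ERRATA_BITS & ~thread0_tfmr;
	return tfmr & ~stale;
}
```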
* opal/hmi: Fix soft lockups during TOD errors | Mahesh Salgaonkar | 2018-04-17 | 1 | -0/+1
  Some TOD errors do not affect the working of the TOD and TB, which both stay in a valid state. Hence we don't need a rendez-vous for TOD errors that do not affect TB operation.

  TOD errors that affect the TOD/TB report a global error in TFMR[44] along with bit 51, and they go through the rendez-vous path as expected.

  But TOD errors that do not affect the TB register set only TFMR bit 51. TFMR bit 51 is cleared when any single thread clears the TOD error. Once cleared, bit 51 is reflected to all the cores on that chip, so any thread that reads the TFMR register after the error is cleared will see TFMR bit 51 reset. Hence the threads that see TFMR[51]=1 fall through into the rendez-vous path, while the threads that see TFMR[51]=0 return doing nothing. The rendez-vous can then never complete, which ends up as soft lockups in the host kernel.

  This patch fixes the issue by not treating the TOD interrupt (TFMR[51]) as a core-global error, thereby avoiding the rendez-vous path completely. Instead, threads that see TFMR[51]=1 now take a different path that just does the TOD error recovery.

  Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
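POWER documentation numbers register bits from the most significant end, so TFMR[44] and TFMR[51] correspond to `1ULL << 19` and `1ULL << 12`. Below is a minimal sketch of the new classification. It uses a locally defined `PPC_BIT()` helper and hypothetical enum and function names.

```c
#include <stdint.h>

/* MSB-0 bit numbering: bit 0 is the most significant bit of a 64-bit register. */
#define PPC_BIT(n) (0x8000000000000000ULL >> (n))

#define HYPO_TFMR_CORE_GLOBAL_ERR PPC_BIT(44)  /* TOD/TB-affecting error */
#define HYPO_TFMR_TOD_ERR         PPC_BIT(51)  /* TOD error interrupt */

enum hypo_hmi_path { HYPO_NOTHING, HYPO_TOD_RECOVERY_ONLY, HYPO_RENDEZVOUS };

static enum hypo_hmi_path hypo_classify(uint64_t tfmr)
{
	if (tfmr & HYPO_TFMR_CORE_GLOBAL_ERR)
		return HYPO_RENDEZVOUS;          /* TB affected: all threads must sync */
	if (tfmr & HYPO_TFMR_TOD_ERR)
		return HYPO_TOD_RECOVERY_ONLY;   /* no rendez-vous, just fix the TOD */
	return HYPO_NOTHING;                     /* e.g. another thread already cleared it */
}
```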
* opal/hmi: Do not send HMI event if no errors are found. | Mahesh Salgaonkar | 2018-04-17 | 1 | -1/+1
  For TOD errors, all the cores in the chip get HMIs. Any one thread from any core can fix the issue, after which TFMR will have the error conditions cleared. The rest of the threads need not take any action if the TOD errors are already cleared. Hence thread 0 of every core should get a fresh copy of TFMR before proceeding down the recovery path. Initialize recover = -1, so that if no errors are found, the thread need not send an HMI event to Linux. This stops every thread from flooding the host with HMI events even when no errors are found.

  Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
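Here is a sketch of the `recover = -1` convention. The function names are hypothetical, and a caller-supplied `fix` callback stands in for the actual recovery code. A result of -1 means nothing was found, so no event is queued for Linux.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns -1 if no error bits are set, 1 if recovery succeeded, 0 if it failed. */
static int hypo_handle_tfmr(uint64_t fresh_tfmr, uint64_t error_mask,
			    bool (*fix)(uint64_t errs))
{
	int recover = -1;                 /* "nothing found" until proven otherwise */
	uint64_t errs = fresh_tfmr & error_mask;

	if (errs)
		recover = fix(errs) ? 1 : 0;
	return recover;
}

/* Only queue an HMI event for Linux when something was actually found. */
static bool hypo_should_send_event(int recover)
{
	return recover != -1;
}
```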
* opal/hmi: Rework HMI handling of TFAC errors | Benjamin Herrenschmidt | 2018-04-17 | 2 | -3/+6
  This patch reworks the HMI handling of TFAC errors by introducing four rendez-vous points. These improve thread synchronization while handling timebase errors, which require all threads to clear dirty data from the TB/HDEC registers before the errors are cleared.

  Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* opal/hmi: Add a new opal_handle_hmi2 that returns direct info to Linux | Benjamin Herrenschmidt | 2018-04-17 | 2 | -1/+8
  It returns a 64-bit flags mask, currently set to indicate which timer facilities were lost and whether an event was generated.

  Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* phb4: Restore bus numbers after CRS | Michael Neuling | 2018-04-11 | 1 | -0/+1
  Currently we restore PCIe bus numbers right after the link is up. Unfortunately, at this point we haven't yet handled CRS (Configuration Request Retry Status), so config space may not be accessible.

  This moves the bus number restore until after CRS has happened.

  Signed-off-by: Michael Neuling <mikey@neuling.org>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* phb4: Enable the PCIe slotcap on pluggable slots | Oliver O'Halloran | 2018-04-11 | 1 | -0/+1
  Enables reporting of slot status information, etc. in the config space of the root complex. Currently this is only used to set the slot power limit in our generic PCI code, but we might use it for other things later on.

  Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* core/pci: Set slot power limit when supported | Oliver O'Halloran | 2018-04-11 | 1 | -0/+1
  The PCIe slot capability can be implemented in a root port or a switch downstream port to set the maximum power a card is allowed to draw from the system. This patch adds support for setting the power limit when the platform has defined one.

  Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
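The PCIe Slot Capabilities register holds the limit in two fields. Bits 14:7 are the 8-bit Slot Power Limit Value. Bits 16:15 are the 2-bit Slot Power Limit Scale, whose codes 0 to 3 are multipliers of 1.0, 0.1, 0.01 and 0.001 W. Below is a minimal, hypothetical encoder from milliwatts. It picks the finest scale that fits and truncates rather than rounds. It also conservatively keeps the value at or below 0xEF, because later PCIe revisions give larger values special meanings.

```c
#include <stdint.h>

#define SLOTCAP_PWR_VAL_SHIFT   7   /* Slot Power Limit Value, bits 14:7 */
#define SLOTCAP_PWR_SCALE_SHIFT 15  /* Slot Power Limit Scale, bits 16:15 */

/* Encode @milliwatts as value/scale bits to OR into Slot Capabilities.
 * Returns 0 if the limit is too large for this simple encoder. */
static uint32_t hypo_encode_slot_power(uint32_t milliwatts)
{
	/* milliwatts per unit for scale codes 0 (1.0 W) .. 3 (0.001 W) */
	static const uint32_t mw_per_unit[4] = { 1000, 100, 10, 1 };

	for (int scale = 3; scale >= 0; scale--) {
		uint32_t value = milliwatts / mw_per_unit[scale];

		if (value <= 0xEF)
			return (value << SLOTCAP_PWR_VAL_SHIFT) |
			       ((uint32_t)scale << SLOTCAP_PWR_SCALE_SHIFT);
	}
	return 0;
}
```

For example, 75000 mW encodes as value 75 with scale 0 (1.0 W units), while 10000 mW uses scale 1 (0.1 W units) with value 100.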
* interrupts: Create an "interrupts" property in the OPAL node | Benjamin Herrenschmidt | 2018-04-11 | 1 | -0/+3
  Deprecate the old "opal-interrupts" property. It's still there, but the new property follows the standard and allows us to specify whether an interrupt is level or edge sensitive.

  Similarly, create "interrupt-names", whose content is identical to "opal-interrupts-names".

  Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
* xive: disable store EOI support | Cédric Le Goater | 2018-04-03 | 1 | -0/+3
  Hardware has limitations that would require a sync after each store EOI to make sure the MMIO operations that change the ESB state are ordered. This is a killer for performance, and the PHBs do not support the sync. So remove store EOI for the moment, until hardware is improved.

  Also, while we are changing the XIVE source flags, let's fix the settings for the PHB4s, which should follow these rules:

    - SHIFT_BUG for DD10
    - STORE_EOI for DD20 and if enabled
    - TRIGGER_PAGE for DDx0 and if not STORE_EOI

  Signed-off-by: Cédric Le Goater <clg@kaod.org>
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
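The three rules can be encoded as follows. The flag values and the function are hypothetical. This sketch reads "DD10" as any revision below DD2.0, "DD20" as DD2.0 and later, and "DDx0" as every revision; all three readings are assumptions. `store_eoi_enabled` models the "if enabled" condition, which the commit itself turns off for now.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values; only the rules come from the commit message. */
#define HYPO_SRC_SHIFT_BUG    0x1u
#define HYPO_SRC_STORE_EOI    0x2u
#define HYPO_SRC_TRIGGER_PAGE 0x4u

/* chip_dd uses an assumed encoding: 0x10 = DD1.0, 0x20 = DD2.0, ... */
static uint32_t hypo_phb4_xive_flags(uint8_t chip_dd, bool store_eoi_enabled)
{
	uint32_t flags = 0;

	if (chip_dd < 0x20)
		flags |= HYPO_SRC_SHIFT_BUG;
	else if (store_eoi_enabled)
		flags |= HYPO_SRC_STORE_EOI;

	/* Every revision gets a trigger page unless store EOI is in use. */
	if (!(flags & HYPO_SRC_STORE_EOI))
		flags |= HYPO_SRC_TRIGGER_PAGE;

	return flags;
}
```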
* core/cpu: discover stack region size before initialising memory regions | Nicholas Piggin | 2018-03-27 | 2 | -1/+1
  Stack allocation first allocates a memory region sized to hold stacks for all possible CPUs up to the maximum PIR of the architecture, zeros the region, then initialises all stacks. Max PIR is 32768 on POWER9, which is 512MB for stacks.

  The stack region is then shrunk after CPUs are discovered, but this is a bit of a hack, and it leaves a hole in the memory allocation regions, as it's done after mem regions are initialised.

      0x000000000000..00002fffffff : ibm,os-reserve - OS
      0x000030000000..0000303fffff : ibm,firmware-code - OPAL
      0x000030400000..000030ffffff : ibm,firmware-heap - OPAL
      0x000031000000..000031bfffff : ibm,firmware-data - OPAL
      0x000031c00000..000031c0ffff : ibm,firmware-stacks - OPAL
      *** gap ***
      0x000051c00000..000051d01fff : ibm,firmware-allocs-memory@0 - OPAL
      0x000051d02000..00007fffffff : ibm,firmware-allocs-memory@0 - OS
      0x000080000000..000080b3cdff : initramfs - OPAL
      0x000080b3ce00..000080b7cdff : ibm,fake-nvram - OPAL
      0x000080b7ce00..0000ffffffff : ibm,firmware-allocs-memory@0 - OS

  This change moves zeroing into the per-CPU stack setup. The boot CPU stack is set up based on the current PIR. Then the size of the stack region is set by discovering the maximum PIR of the system from the device tree, before mem regions are initialised.

  This results in all memory being accounted for within memory regions, and less memory fragmentation of OPAL allocations.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
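The 512MB figure follows from 16kB per-CPU stacks: 32768 × 16kB = 2^29 bytes. The stack region in the map above is consistent with this. It starts at 0x31c00000, and the next region starts at 0x51c00000, which is exactly 0x20000000 bytes later. Below is a minimal sketch of sizing the region from a discovered maximum PIR instead. The function name and the inclusive `max_pir` convention are assumptions.

```c
#include <stdint.h>

#define HYPO_STACK_SIZE (16u * 1024u)   /* per-CPU stack size implied by 512MB / 32768 */

/* Size of a stack region covering PIRs 0..max_pir inclusive. */
static uint64_t hypo_stack_region_size(uint32_t max_pir)
{
	return ((uint64_t)max_pir + 1) * HYPO_STACK_SIZE;
}
```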
* mem-map: Use a symbolic constant for exception vector size | Nicholas Piggin | 2018-03-27 | 1 | -0/+5
  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* core/fast-reboot: verify mem regions before fast reboot | Nicholas Piggin | 2018-03-27 | 1 | -0/+4
  Run the mem_region sanity checkers before proceeding with fast reboot.

  This is the beginning of proactive sanity checks on OPAL data for fast reboot (which complement the reactive disable_fast_reboot cases). This encourages re-use and sharing of any kind of debug code and unit test code.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* Revert "NPU2 HMIs: dump out a *LOT* of npu2 registers for debugging" | Stewart Smith | 2018-03-27 | 2 | -8/+3
  This reverts commit fbdc91e693fc3103f7e2a65054ed32bfb26a2e17.

  We don't need this, as we need to do it a different way, with an explicit set of registers, as otherwise we trip other random FIR bits and everything becomes even more terrible.

  I suggest alcohol.

  Cc: stable
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* npu2: Remove DD1 support | Andrew Donnellan | 2018-03-22 | 2 | -3/+0
  Major changes in the NPU between DD1 and DD2 necessitated a fair bit of revision-specific code.

  Now that all our lab machines are DD2, we no longer test anything on DD1 and it's time to get rid of it.

  Remove DD1-specific code and abort probe if we're running on a DD1 machine.

  Cc: Alistair Popple <alistair@popple.id.au>
  Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
  Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
  Acked-By: Alistair Popple <alistair@popple.id.au>
  Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com>
  Acked-by: Balbir Singh <bsingharora@gmail.com>
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* npu2: Remove unused fields in struct npu2 | Andrew Donnellan | 2018-03-22 | 1 | -2/+0
  Trivial cleanup of two unused fields in struct npu2.

  Cc: Alistair Popple <alistair@popple.id.au>
  Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
  Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
  Acked-By: Alistair Popple <alistair@popple.id.au>
  Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com>
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* phb*: Remove the state field in the various phb structures | Oliver O'Halloran | 2018-03-22 | 3 | -186/+3
  We've been carting around this field since the original p7ioc-phb code. As far as I can tell, we never actually use it for anything other than checking whether the PHB has been marked as broken. The _FENCED state is set in a few places, but we never use it, in favour of just checking the MMIO register.

  This patch replaces it with a boolean that indicates whether the PHB has been marked as broken. It also removes the giant, mostly wrong comment explaining its usage that is copied and pasted into each phb header file.

  Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* Reserve OPAL API number for opal_handle_hmi2 function. | Mahesh Salgaonkar | 2018-03-14 | 1 | -1/+2
  Requested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* npu2: Use unfiltered mode in XTS tables | Reza Arbab | 2018-03-12 | 1 | -0/+2
  The XTS_PID context table is limited to 256 possible PIDs/contexts. To relieve this limitation, make use of "unfiltered mode" instead.

  If an entry in the XTS_BDF table has the bit for unfiltered mode set, we can just use one context for that entire BDF/LPAR, regardless of PID. Instead of searching the XTS_PID table, the NMMU checkout request will simply use the entry indexed by the lparshort ID.

  Change opal_npu_init_context() to create these lparshort-indexed wildcard entries (0-15) instead of allocating one for each PID. Check that multiple calls for the same BDF all specify the same msr value.

  In opal_npu_destroy_context(), continue validating the bdf argument, ensuring that it actually maps to an LPAR, but no longer remove anything from the XTS_PID table. If/when we start supporting virtualized GPUs, we might consider actually removing these wildcard entries by keeping a refcount, but keep things simple for now.

  Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
  Acked-by: Alistair Popple <alistair@popple.id.au>
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
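Below is a minimal sketch of the wildcard scheme. It assumes the lparshort index has already been looked up for the BDF, and the table, struct and function names are hypothetical. Any number of PIDs now share one entry per lparshort. A repeated call for the same BDF is accepted only if it passes the same MSR.

```c
#include <stdbool.h>
#include <stdint.h>

#define HYPO_NUM_LPARSHORT 16   /* wildcard entries 0-15, per the commit */

struct hypo_xts_entry {
	bool valid;
	uint16_t bdf;
	uint64_t msr;
};

static struct hypo_xts_entry hypo_xts[HYPO_NUM_LPARSHORT];

/* Returns the lparshort index used for this BDF, or -1 on error. */
static int hypo_init_context(uint16_t bdf, unsigned int lparshort, uint64_t msr)
{
	struct hypo_xts_entry *e;

	if (lparshort >= HYPO_NUM_LPARSHORT)
		return -1;
	e = &hypo_xts[lparshort];
	if (e->valid) {
		/* Repeated calls for the same BDF must agree on the MSR. */
		if (e->bdf != bdf || e->msr != msr)
			return -1;
		return (int)lparshort;
	}
	e->valid = true;
	e->bdf = bdf;
	e->msr = msr;
	return (int)lparshort;
}
```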
* core/lock: Add deadlock detection | Matt Brown | 2018-03-07 | 1 | -0/+5
  This adds simple deadlock detection. The detection looks for circular dependencies in the lock requests. It will abort and display a stack trace when a deadlock occurs. The detection is enabled by DEBUG_LOCKS (enabled by default). The detection may add a slight performance overhead, but skiboot does not have a huge number of locks, so this overhead isn't significant.

  Signed-off-by: Matt Brown <matthew.brown.dev@gmail.com>
  [stewart: fix build with DEBUG_LOCKS off]
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
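The cycle check amounts to following a chain of waits: from a lock to its owner, then to the lock that owner is waiting for, and so on. The walk ends when the chain runs out or leads back to the requesting CPU. Below is a hypothetical sketch of that walk, not skiboot's lock code. The step bound stops the walk if it meets a cycle that does not include the requester.

```c
#include <stdbool.h>
#include <stddef.h>

#define HYPO_MAX_CPUS 8
#define HYPO_NO_CPU   (-1)

struct hypo_lock {
	int owner;                       /* CPU holding the lock, or HYPO_NO_CPU */
};

/* Lock each CPU is currently waiting for, or NULL. */
static struct hypo_lock *hypo_requested[HYPO_MAX_CPUS];

/* Would @cpu waiting on @lock close a cycle of waiters? */
static bool hypo_would_deadlock(int cpu, struct hypo_lock *lock)
{
	int steps = 0;

	while (lock && lock->owner != HYPO_NO_CPU && steps++ < HYPO_MAX_CPUS) {
		if (lock->owner == cpu)
			return true;             /* chain leads back to us */
		lock = hypo_requested[lock->owner];
	}
	return false;
}
```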
* phb4: set PBCQ Tunnel BAR for tunneled operations | Philippe Bergheaud | 2018-03-01 | 2 | -1/+7
  P9 supports PCI tunneled operations (atomics and as_notify) that are initiated by devices.

  A subset of the tunneled operations require a response that must be sent back from the host to the device. For example, an atomic compare and swap will return the compare status, as the swap is only performed on success. Similarly, as_notify reports whether the target thread has been woken up or not, because the operation may fail.

  To enable tunneled operations, a device driver must tell the host where it expects tunneled operation responses. It does this by setting the PBCQ Tunnel BAR Response register to a specific value within the range of its BARs.

  This register is currently initialized by enable_capi_mode(). But, as tunneled operations may also operate in PCI mode, a new API is required to set the PBCQ Tunnel BAR Response register without switching to CAPI mode.

  This patch provides two new OPAL calls to get/set the PBCQ Tunnel BAR Response register.

  Note: there is only one PBCQ Tunnel BAR register, shared between all the devices connected to the same PHB. Only one of these devices can therefore use tunneled operations at any time.

  Signed-off-by: Philippe Bergheaud <felix@linux.vnet.ibm.com>
  Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* phb4: set PHB CMPM registers for tunneled operations | Philippe Bergheaud | 2018-03-01 | 1 | -2/+2
  P9 supports PCI tunneled operations (atomics and as_notify) that require setting the PHB ASN Compare/Mask register with a 16-bit indication.

  This register is currently initialized by enable_capi_mode(). But, as tunneled operations may also work in PCI mode, the ASN Compare/Mask register should instead be initialized in phb4_init_ioda3().

  This patch also adds "ibm,phb-indications" to the device tree, to tell Linux the values of the CAPI, ASN, and NBW indications, when supported.

  Tunneled operations were tested by IBM in CAPI mode, and by Mellanox Technologies in PCI mode.

  Signed-off-by: Philippe Bergheaud <felix@linux.vnet.ibm.com>
  Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* npu2-opencapi: Add OpenCAPI OPAL API calls | Frederic Barrat | 2018-03-01 | 2 | -3/+7
  Add three OPAL API calls that are required by the ocxl driver.

    - OPAL_NPU_SPA_SETUP

      The Shared Process Area (SPA) is a table containing one entry (a "Process Element") per memory context which can be accessed by the OpenCAPI device.

    - OPAL_NPU_SPA_CLEAR_CACHE

      The NPU keeps a cache of recently accessed memory contexts. When a Process Element is removed from the SPA, the cache for the link must be cleared.

    - OPAL_NPU_TL_SET

      The Transaction Layer specification defines several templates for messages to be exchanged on the link. During link setup, the host and device must negotiate what templates are supported on both sides and at what rates those messages can be sent.

  Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
  Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* npu2-opencapi: Train OpenCAPI links and setup devices | Andrew Donnellan | 2018-03-01 | 2 | -2/+57
  Scan the OpenCAPI links under the NPU, and for each link, reset the card, set up a device, train the link and register a PHB.

  Implement the necessary operations for the OpenCAPI PHB type.

  For bringup, test and debug purposes, we allow an NVRAM setting, "opencapi-link-training", that can be set to either disable link training completely or to use the prbs31 test pattern.

  To disable link training:

      nvram -p ibm,skiboot --update-config opencapi-link-training=none

  To use prbs31:

      nvram -p ibm,skiboot --update-config opencapi-link-training=prbs31

  Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
  Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
  Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
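The option handling could be as simple as the sketch below. It assumes the caller has already fetched the `opencapi-link-training` string from NVRAM (how skiboot does that is not shown), and the enum and function names are hypothetical. In this sketch, unknown values fall back to normal training. That is a design choice for the example, not documented behaviour.

```c
#include <string.h>

enum hypo_link_training {
	HYPO_TRAIN_DEFAULT,   /* normal link training */
	HYPO_TRAIN_NONE,      /* "none": skip link training entirely */
	HYPO_TRAIN_PRBS31,    /* "prbs31": run the PRBS31 test pattern */
};

/* @value is the string stored for opencapi-link-training, or NULL if unset. */
static enum hypo_link_training hypo_parse_link_training(const char *value)
{
	if (!value)
		return HYPO_TRAIN_DEFAULT;
	if (strcmp(value, "none") == 0)
		return HYPO_TRAIN_NONE;
	if (strcmp(value, "prbs31") == 0)
		return HYPO_TRAIN_PRBS31;
	return HYPO_TRAIN_DEFAULT;
}
```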
* npu2-hw-procedures: Add support for OpenCAPI PHY link training | Andrew Donnellan | 2018-03-01 | 2 | -0/+10
  Unlike NVLink, which uses the pci-virt framework to fake a PCI configuration space for NVLink devices, the OpenCAPI device model presents us with a real configuration space handled by the device over the OpenCAPI link.

  As a result, we have to train the OpenCAPI link in skiboot before we do PCI probing, so that config space can be accessed, rather than having link training triggered by the Linux driver.

  Add some helper functions to wrap the existing NVLink PHY training sequence so we can easily run it within skiboot.

  Additionally, we add OpenCAPI-specific lane settings, and a function to "bump" lanes that haven't trained properly (this process isn't documented in the workbook, but the hardware experts assure us that this improves link training reliability...)

  We also support the PRBS31 pattern that's used for bringup and test purposes.

  Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
  Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
  Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
  Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com>
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* npu2-opencapi: Configure NPU for OpenCAPI | Andrew Donnellan | 2018-03-01 | 3 | -0/+92
  Scan the device tree for NPUs with OpenCAPI links and configure the NPU per the initialisation sequence in the NPU OpenCAPI workbook.

  Training of individual links and setup of per-AFU/link configuration will be in a later patch.

  Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
  Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
  Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* platform: Add fields for OpenCAPI platform data | Andrew Donnellan | 2018-03-01 | 1 | -0/+14
  Add a platform_ocapi struct to store platform-specific values for resetting OpenCAPI devices via I2C and for setting up the ODL PHY.

  A later patch will add this to the relevant platforms.

  Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* npu2: Rework NPU data structures for OpenCAPI | Andrew Donnellan | 2018-03-01 | 2 | -19/+59
  Unlike NVLink, OpenCAPI registers a separate PHB for each device, in order to allow us to force Linux to use the correct MMIO windows for each NPU link.

  This requires some reworking of NPU data structures to account for the fact that a PHB could correspond to either an NPU (NVLink) or a single link (OpenCAPI).

  At some later point, we may want to rework the NVLink code to present a separate PHB per device in order to simplify this. For now, we split NVLink-specific device data into a separate struct in order to make it clear which fields are NVLink-only.

  Additionally, add helper functions to correctly translate between OpenCAPI/NVLink PHBs and the underlying structures, and various fields for OpenCAPI data that we're going to need later on.

  Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
  Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com>
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* npu2: Split out common helper functions into separate file | Andrew Donnellan | 2018-03-01 | 2 | -0/+7
  Split out common helper functions for NPU register access into a separate file, as these will be used extensively by both NVLink and OpenCAPI code.

  Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
  Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com>
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* NPU2 HMIs: dump out a *LOT* of npu2 registers for debugging | Stewart Smith | 2018-02-28 | 2 | -3/+8
  This is not the way we want to end up doing this. This is a hack to make folk happy and not require crondump to debug nvidia/npu2 issues.

  Cc: stable
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* phb4: Disable lane eq when retrying some nvidia GEN3 devices | Michael Neuling | 2018-02-22 | 2 | -0/+5
  This fixes these NVIDIA cards training at only GEN2 speeds rather than GEN3, by disabling PCIe lane equalisation.

  First, we check if the card is in a whitelist. If it is and the link has not trained optimally, retry with lane equalisation off. We do this on all POWER9 chip revisions, since this is a device issue, not a POWER9 chip issue.

  Signed-off-by: Michael Neuling <mikey@neuling.org>
  Reviewed-by: Russell Currey <ruscur@russell.cc>
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
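Below is a sketch of the retry decision. `0x10de` is NVIDIA's PCI vendor ID. The device IDs are placeholders, since the real whitelist is not reproduced here, and all names are hypothetical. The link generation is modelled as a plain integer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_VENDOR_NVIDIA 0x10de

struct hypo_pci_id { uint16_t vendor, device; };

/* Placeholder device IDs, not the real whitelist. */
static const struct hypo_pci_id hypo_lane_eq_quirk[] = {
	{ PCI_VENDOR_NVIDIA, 0x1234 },
	{ PCI_VENDOR_NVIDIA, 0x5678 },
};

static bool hypo_in_whitelist(uint16_t vendor, uint16_t device)
{
	size_t n = sizeof(hypo_lane_eq_quirk) / sizeof(hypo_lane_eq_quirk[0]);

	for (size_t i = 0; i < n; i++)
		if (hypo_lane_eq_quirk[i].vendor == vendor &&
		    hypo_lane_eq_quirk[i].device == device)
			return true;
	return false;
}

/* Retrain with lane equalisation off only for whitelisted devices that came
 * up below the expected generation, on any POWER9 revision. */
static bool hypo_retry_without_lane_eq(uint16_t vendor, uint16_t device,
				       int trained_gen, int expected_gen)
{
	return trained_gen < expected_gen && hypo_in_whitelist(vendor, device);
}
```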
* opal-api: Re-jig OPAL API numbers because OpenCAPI kernel merge | Stewart Smith | 2018-02-21 | 1 | -3/+6
  74d656d219b98ef3b96f92439337aa6392a7577d added OPAL APIs to the kernel (and this commit is now in Linus' tree) that hadn't yet made their way into OPAL.

  Also, be slightly grumbly about it.

  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* npu2/opal-api: move npu2 checkstop defines to npu2-regs.h | Stewart Smith | 2018-02-21 | 2 | -98/+98
  These aren't API.

  Fixes: b57a5380aa489fa877b2d619225aea2602f20dca
  Reported-by: Michael Ellerman <mpe@ellerman.id.au>
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* sensor-groups: occ: Add support to disable/enable sensor group | Shilpasri G Bhat | 2018-02-21 | 3 | -3/+269
  This patch adds a new OPAL call to enable/disable a sensor group. This call is used to select the sensor groups that need to be copied to main memory by the OCC at runtime.

  Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
  [stewart: rebase and bump OPAL API number]
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* sensors: Support reading u64 sensor values | Shilpasri G Bhat | 2018-02-21 | 7 | -6/+8
  This patch adds support for reading u64 sensor values. It also changes the core and backend implementation code to make this API the base call. The host can use this new API to read sensors up to 64 bits.

  This adds a list to store the pointer to the kernel u32 buffer, for older kernels making async u32 sensor reads.

  Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
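One way to complete an async read for both old and new kernels is sketched below. It uses a hypothetical request record that remembers the width of the caller's buffer. Clamping oversized values for u32 buffers is an assumption of this sketch. The big-endian conversion that a real OPAL call would need is omitted.

```c
#include <stdint.h>

/* Hypothetical pending-read record. */
struct hypo_sensor_req {
	int is_u32;          /* older kernel: 32-bit result buffer */
	void *buf;           /* caller's result buffer */
};

/* Complete a request from the 64-bit base read. */
static void hypo_complete_read(const struct hypo_sensor_req *req, uint64_t value)
{
	if (req->is_u32) {
		uint32_t *out = req->buf;

		*out = value > UINT32_MAX ? UINT32_MAX : (uint32_t)value;
	} else {
		uint64_t *out = req->buf;

		*out = value;
	}
}
```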
* dt: add /cpus/ibm, powerpc-cpu-features device tree bindings | Nicholas Piggin | 2018-02-21 | 2 | -0/+6
  This is a new CPU feature advertising interface that is fine-grained, extensible, aware of privilege levels, and gives control of features to all levels of the stack (firmware, hypervisor, and OS).

  The design and binding specification is described in detail in doc/.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  [stewart: fix maybe-uninitialized warning from older GCC, doc cleanup]
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* vas: Disable VAS/NX-842 on some P9 revisions | Sukadev Bhattiprolu | 2018-02-14 | 1 | -0/+1
  VAS/NX-842 are not functional on some P9 revisions, so disable them in hardware and skip creating their device tree nodes.

  Since the intent is to prevent the OS from configuring VAS/NX, we remove only the platform device nodes but leave the VAS/NX DT nodes under xscom (i.e. we don't skip add_vas_node() in hdata/spira.c).

  Thanks to input from Michael Ellerman and Michael Neuling.

  Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
  Reviewed-by: Michael Ellerman <mpe@ellerman.id.au>
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* hw/npu2: support creset of npu2 devices | Balbir Singh | 2018-02-13 | 1 | -0/+1
  creset calls into the hw procedure that resets the PHY. We don't take them out of reset, just put them in reset.

  Signed-off-by: Balbir Singh <bsingharora@gmail.com>
  Acked-by: Alistair Popple <alistair@popple.id.au>
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* ATTN: Enable flush instruction cache bit in HID register | Vasant Hegde | 2018-02-13 | 1 | -1/+1
  In P9, we have to enable the "flush the instruction cache" bit along with the "attn instruction support" bit to trigger an attention.

  Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* hw/npu2: Implement logging HMI actions | Balbir Singh | 2018-02-08 | 2 | -0/+109
  Log HMI errors as step 1. The OS will need to deduce and interpret the HMI event.

  Signed-off-by: Balbir Singh <bsingharora@gmail.com>
  Acked-by: Alistair Popple <alistair@popple.id.au>
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* core/exception: beautify exception handler, add MCE-involved registers | Nicholas Piggin | 2018-02-08 | 1 | -0/+2
  Print DSISR and DAR, to help with deciphering machine check exceptions, and improve the output a bit: decode the NIP symbol, improve alignment, etc. Also print a specific header for machine checks, because we do expect to see these if there is a hardware failure.

  Before:

      [ 0.005968779,3] ***********************************************
      [ 0.005974102,3] Unexpected exception 200 !
      [ 0.005978696,3] SRR0 : 000000003002ad80 SRR1 : 9000000000001000
      [ 0.005985239,3] HSRR0: 00000000300027b4 HSRR1: 9000000030001000
      [ 0.005991782,3] LR : 000000003002ad80 CTR : 0000000000000000
      [ 0.005998130,3] CFAR : 00000000300b58bc
      [ 0.006002769,3] CR : 40000004 XER: 20000000
      [ 0.006008069,3] GPR00: 000000003002ad80 GPR16: 0000000000000000
      [ 0.006015170,3] GPR01: 0000000031c03bd0 GPR17: 0000000000000000
      [...]

  After:

      [ 0.003287941,3] ***********************************************
      [ 0.003561769,3] Fatal MCE at 000000003002ad80 .nvram_init+0x24
      [ 0.003579628,3] CFAR : 00000000300b5964
      [ 0.003584268,3] SRR0 : 000000003002ad80 SRR1 : 9000000000001000
      [ 0.003590812,3] HSRR0: 00000000300027b4 HSRR1: 9000000030001000
      [ 0.003597355,3] DSISR: 00000000 DAR : 0000000000000000
      [ 0.003603480,3] LR : 000000003002ad68 CTR : 0000000030093d80
      [ 0.003609930,3] CR : 40000004 XER : 20000000
      [ 0.003615698,3] GPR00: 00000000300149e8 GPR16: 0000000000000000
      [ 0.003622799,3] GPR01: 0000000031c03bc0 GPR17: 0000000000000000
      [...]

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* core/init: manage MSR[ME] explicitly, always enable | Nicholas Piggin | 2018-02-08 | 1 | -0/+2
  The current boot sequence inherits MSR[ME] from the IPL firmware, and never changes it. Some environments disable MSR[ME] (e.g., mambo), and others can enable it (hostboot).

  This has two problems. First, MSR[ME] must be disabled while in the process of taking over the interrupt vector from the previous environment. Second, after installing our machine check handler, MSR[ME] should be enabled to get some useful output rather than a checkstop.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
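MSR[ME] is bit 51 in the POWER MSB-0 bit numbering, i.e. the value `0x1000`. The two requirements above reduce to clearing that bit and later setting it. This sketch works on a plain 64-bit value with hypothetical helper names. Real code would read and write the register with `mfmsr`/`mtmsrd`.

```c
#include <stdint.h>

#define MSR_ME 0x1000ULL   /* Machine Check Enable: MSR bit 51 (MSB-0) */

/* Step 1: no machine checks while the interrupt vectors are being replaced. */
static uint64_t hypo_before_vector_takeover(uint64_t msr)
{
	return msr & ~MSR_ME;
}

/* Step 2: once our handler is in place, take MCEs instead of checkstopping. */
static uint64_t hypo_after_mce_handler_installed(uint64_t msr)
{
	return msr | MSR_ME;
}
```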
* core/utils: add snprintf_symbol | Nicholas Piggin | 2018-02-08 | 1 | -2/+1
  get_symbol is difficult to use. Add an snprintf_symbol helper, which prints a symbol into a buffer of a given length and returns the number of bytes used, similarly to snprintf.

  Use this in the stack dumping code rather than open-coding it.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
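Below is a minimal sketch of such a helper over a hypothetical sorted symbol table. The three entries are back-computed from the sample backtrace earlier on this page (for example 0x3001baac - 0x3c = 0x3001ba70). The helper returns whatever `snprintf` returns, which is the untruncated length. The commit message does not say whether the real helper does the same.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct hypo_sym { uintptr_t addr; const char *name; };

/* Hypothetical symbol table, sorted by ascending address. */
static const struct hypo_sym hypo_syms[] = {
	{ 0x3001a4fc, "._abort" },
	{ 0x3001ba70, ".opal_run_pollers" },
	{ 0x3001bbf8, ".opal_poll_events" },
};

/* Print "name+0xoff" for @addr into @buf, or "???" if @addr lies below the
 * first symbol. Like snprintf, returns the length the full string needs. */
static int hypo_snprintf_symbol(char *buf, size_t len, uintptr_t addr)
{
	const struct hypo_sym *best = NULL;
	size_t n = sizeof(hypo_syms) / sizeof(hypo_syms[0]);

	for (size_t i = 0; i < n; i++)
		if (hypo_syms[i].addr <= addr)
			best = &hypo_syms[i];    /* last symbol at or below addr */
	if (!best)
		return snprintf(buf, len, "???");
	return snprintf(buf, len, "%s+0x%lx", best->name,
			(unsigned long)(addr - best->addr));
}
```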
* fast-reboot: move pci_reset error handling into fast-reboot code | Nicholas Piggin | 2018-02-08 | 1 | -1/+1
  pci_reset() currently does a platform reboot if it fails. It should not know about fast-reboot at this level, so instead have it return an error, and the fast reboot caller will do the platform reboot.

  The code essentially does the same thing, but flexibility is improved.

  Ideally the fast reboot code should perform pci_reset and all such fail-able operations before the CPU resets itself and destroys its own stack. That's not the case now, but that should be the goal.

  Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
  Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* xive: Mask MMIO load/store to bad location FIR | Frederic Barrat | 2018-01-30 | 1 | -0/+2
  For OpenCAPI, the trigger page of an interrupt is mapped to user space. The intent is to write the page to raise an interrupt, but there's nothing to prevent a user process from reading it, which has the unfortunate consequence of checkstopping the system.

  Mask the FIR bit raised when an MMIO operation targets an invalid location. This is the recommendation from recent documentation, and hostboot is expected to mask it at some point. In the meantime, let's play it safe.

  Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
  Reviewed-by: Cédric Le Goater <clg@kaod.org>
  Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
OpenPOWER on IntegriCloud