talos-skiboot - Talos™ II skiboot sources

	Commit message	Author	Age	Files	Lines
...
*	npu2-opencapi: Don't send commands to NPU when link is down	Frederic Barrat	2018-07-26	1	-1/+2
Even if an opencapi link is down, we currently always try to issue a config read operation when probing for PCI devices, because of the default scan map used for an opencapi PHB. The config operation fails, as expected, but it can also raise a FIR bit and trigger an HMI. For opencapi, there's no root device like for a "normal" PCI PHB, so there's no reason to do the config operation. To fix it, we keep the scan map blank by default, and only add a device once the link is trained. CC: stable # v6.1+ Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	hw/phb4: Add helpers to dump the IODA tables	Oliver O'Halloran	2018-07-26	1	-0/+89
The IODA tables are stored inside the PHB itself rather than in memory. This makes accessing them slightly tedious, but the process is more or less the same for every table. This patch adds a helper function for dumping the contents of the IODA tables to help with debugging PHB issues. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	hw/phb4: Add a helper to dump the PELT-V	Oliver O'Halloran	2018-07-26	1	-0/+40
The "Partitionable Endpoint Lookup Table (Vector)" is used by the PHB when processing EEH events. The PELT-V defines which PEs should be additionally frozen in the event of an error being flagged on a given PE. Knowing the state of the PELT-V is sometimes useful for debugging PHB issues so this patch adds a helper to dump it. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	hw/phb4: Print the PEs in the EEH dump in hex	Oliver O'Halloran	2018-07-26	1	-1/+1
Linux always displays the PE number in hexadecimal while skiboot displays the PEST index (PE number) in decimal. This makes correlating errors between Skiboot and Linux more annoying than it should be so this patch makes Skiboot print the PEST number in hex. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
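A minimal sketch of the formatting change, assuming a hypothetical helper `format_pe()` and an illustrative `PE[...]` format string (neither is skiboot's actual code): printing with `%x` makes PE 255 appear as `ff`, the same form Linux uses.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not skiboot's code): format a PEST index / PE number
 * in hex so it matches what Linux prints for the same PE.
 */
static int format_pe(char *buf, size_t len, unsigned int pe)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "PE[%03x]", pe);	/* 255 -> "PE[0ff]" */
}
```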
*	Makefile: remove try-cflags on no-altivec and no-vsx	Joel Stanley	2018-07-26	1	-4/+3
As Segher points out, any compiler that is capable of building skiboot will support these flags. Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	Makefile: Remove -mno-direct-move cflag	Joel Stanley	2018-07-26	1	-1/+0
GCC 8 warns that -mno-direct-move is deprecated. We had it there so we wouldn't use VSX registers in skiboot, as they are not saved/restored, however Segher confirms: > if you already have -mno-altivec then -mno-direct-move does zilch So it was never doing anything. Resolves: open-power/skiboot#186 Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	core/cpu.c: assert pir is sane before using	Stewart Smith	2018-07-20	1	-0/+1
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	skiboot 6.0.6 release notes	Stewart Smith	2018-07-19	1	-0/+51
Signed-off-by: Stewart Smith <stewart@linux.ibm.com> (cherry picked from commit 89160502f3695216e9d801b1e97aeee9188a132e) Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	doc: Add a man page for OPAL_PCI_SET_PHB_CAPI_MODE	Vaibhav Jain	2018-07-19	1	-0/+74
We add a man page describing the opal call OPAL_PCI_SET_PHB_CAPI_MODE used for activating/deactivating CAPP attached to a PEC for CAPI 1 & 2. Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> [stewart: nitpicks that Andrew pointed out in review] Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	phb4: Reallocate PEC2 DMA-Read engines to improve GPU-Direct bandwidth	Vaibhav Jain	2018-07-19	2	-3/+39
We reallocate additional 16/8 DMA-Read engines allocated to stack0/1 on PEC2 respectively. This is needed to improve bandwidth available to the Mellanox CX5 adapter when trying to read GPU memory (GPU-Direct). If kernel cxl driver indicates a request to allocate maximum possible DMA read engines when calling enable_capi_mode() and card is attached to PEC2/stack0 slot then we assume it's a Mellanox CX5 adapter. We then allocate additional 16/8 extra DMA read engines to stack0 and stack1 respectively on PEC2. This is done by populating the XPEC_PCI_PRDSTKOVR and XPEC_NEST_READ_STACK_OVERRIDE as suggested by the h/w team. Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
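The selection rule in this commit can be sketched as a pure decision function. This is a minimal sketch only: the type and function names are hypothetical, and the register writes to XPEC_PCI_PRDSTKOVR / XPEC_NEST_READ_STACK_OVERRIDE are deliberately not modelled because their encodings are not shown here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type: extra DMA-read engines per PEC2 stack. */
struct pec2_extra_dma {
	unsigned int stack0;
	unsigned int stack1;
};

/*
 * Only when the driver asks for the maximum number of DMA-read engines AND
 * the card sits on PEC2/stack0 is it assumed to be a CX5; then 16 extra
 * engines go to stack0 and 8 to stack1. Otherwise nothing extra.
 */
static struct pec2_extra_dma pec2_extra_dma_read(bool max_dma_requested,
						 unsigned int pec,
						 unsigned int stack)
{
	struct pec2_extra_dma extra = { 0, 0 };

	assert(pec <= 2 && stack <= 2);	/* assumed bounds: PEC0-2, stacks 0-2 */
	if (max_dma_requested && pec == 2 && stack == 0) {
		extra.stack0 = 16;
		extra.stack1 = 8;
	}
	return extra;
}
```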
*	STOP API: API conditionally supports 255 SCOM restore entries for each quad.	Prem Shanker Jha	2018-07-19	4	-108/+140
This is the first of a series of commits intended for incorporating new mechanisms for SCOM restore. STOP API looks for a specific version in QPMR header of HOMER. If version is greater than 2, it allows - 255 SCOM Restore entries per quad - doesn't divide quad restore region into L2, L3 and EQ sub-region If version is less than or equal to 2, API provides legacy functionality. Key_Cronus_Test=PM_REGRESS RTC: 188827 Change-Id: Iac6ee94619302f745fee0c77acc168eaba04c3da Reviewed-on: http://ralgit01.raleigh.ibm.com/gerrit1/56385 Tested-by: Cronus HW CI <cronushw-ci+hostboot@us.ibm.com> Tested-by: Jenkins Server <pfd-jenkins+hostboot@us.ibm.com> Tested-by: Hostboot CI <hostboot-ci+hostboot@us.ibm.com> Reviewed-by: Gregory S. Still <stillgs@us.ibm.com> Reviewed-by: AMIT J. TENDOLKAR <amit.tendolkar@in.ibm.com> Reviewed-by: Jennifer A. Stofer <stofer@us.ibm.com> Reviewed-on: http://ralgit01.raleigh.ibm.com/gerrit1/56390 Tested-by: Jenkins OP Build CI <op-jenkins+hostboot@us.ibm.com> Tested-by: Jenkins OP HW <op-hw-jenkins+hostboot@us.ibm.com> Tested-by: FSP CI Jenkins <fsp-CI-jenkins+hostboot@us.ibm.com> Reviewed-by: Christian R. Geddes <crgeddes@us.ibm.com> [stewart: Assume 15 entries if 0 are reported, maintains compat with old f/w] Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
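The two rules above (QPMR version > 2 selects the new layout; a report of 0 entries means old firmware, so assume 15) can be sketched as follows. All names are hypothetical, and the 255-entry ceiling is applied as a simple sanity check, which is an assumption rather than something the commit states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the numbers are taken from the commit text. */
#define QPMR_LEGACY_MAX_VERSION   2
#define SCOM_RESTORE_MAX_PER_QUAD 255
#define SCOM_RESTORE_COMPAT       15

/* Version > 2: new layout (up to 255 entries, no L2/L3/EQ sub-regions). */
static bool qpmr_new_scom_layout(uint8_t qpmr_version)
{
	return qpmr_version > QPMR_LEGACY_MAX_VERSION;
}

/* skiboot-side compat: 0 reported entries means old firmware, assume 15. */
static uint32_t scom_entries_to_use(uint32_t reported)
{
	uint32_t n = reported ? reported : SCOM_RESTORE_COMPAT;

	assert(n <= SCOM_RESTORE_MAX_PER_QUAD);
	return n;
}
```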
*	phb4: Disable nodal scoped DMA accesses when PB pump mode is enabled	Alistair Popple	2018-07-17	2	-0/+13
By default when a PCIe device issues a read request via the PHB it is first issued with nodal scope. When accessing GPU memory the NPU does not know at the time of response if the requested memory page is off node or not. Therefore every read of GPU memory by a PHB is retried with larger scope which introduces bandwidth and latency issues. On smaller boxes which have pump mode enabled nodal and group scoped reads are treated the same and both types of request are broadcast to one chip. Therefore we can avoid the retry by disabling nodal scope on the PHB for these boxes. On larger boxes nodal (single chip) and group (multiple chip) scoped reads are treated differently. Therefore we avoid disabling nodal scope on large boxes which have pump mode disabled to avoid all PHB requests being broadcast to multiple chips. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	Move pb_cen_hp_mode_curr register definition to xscom-p9-reg.h	Alistair Popple	2018-07-17	3	-2/+5
Currently it is defined in npu2-regs.h but needs to be used by other files as well so move it somewhere generic. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	Fixup pflash build for ast refactor	Stewart Smith	2018-07-18	2	-2/+13
Fixes: 5b1bc2ffe791ae94361d86b2ae063ee543bf2df5 Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	Fixup unit tests for cpu_queue_job() in mem_region.c	Stewart Smith	2018-07-18	5	-5/+21
Fixes: 06808a037d44231ba36e814ff1dbf66bc8b707da Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	npu2/hw-procedures: Enable parity and credit overflow checks	Reza Arbab	2018-07-17	3	-1/+14
Enable these error checking features by setting the appropriate bits in our one-off initialization of each "NTL Misc Config 2" register. The exception is NDL RX parity checking, which should be disabled during the link training procedures. Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	npu2/hw-procedures: Don't open code NPU2_NTL_MISC_CFG2_BRICK_ENABLE	Reza Arbab	2018-07-17	2	-6/+8
Name this bit properly. There's a lot more cleanup like this to be done, but I'm catching this one now as part of some related changes. Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	lpc: Silence LPC SYNC no-response error when necessary	Andrew Jeffery	2018-07-17	5	-28/+72
Add the ability to silence particular errors from the LPC bus when they can be expected, particularly: LPC[000]: Got SYNC no-response error. Error address reg: 0xd001002f This is necessary on platform exit on some astbmc machines to avoid unnecessary noise in the msglog. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	ast-io: Rework setup/tear-down of communication with the BMC	Andrew Jeffery	2018-07-17	7	-30/+174
It's possible for the platform to configure the BMC with SuperIO access disabled. Rework the interfaces to report failures if SuperIO is not enabled, and clean up once we're finished. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	ast-bmc: Rename LPC FW cycle helpers	Andrew Jeffery	2018-07-17	3	-6/+6
Introduce some consistency for readability and make the names better reflect the nature of the tests. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	core/pci-quirk: Clean up commented code in quirk_astbmc_vga()	Andrew Jeffery	2018-07-17	1	-13/+0
Also remove the comment associated with the commented code. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	core/pci-quirk: Remove broken comment in quirk_astbmc_vga()	Andrew Jeffery	2018-07-17	1	-15/+0
The comment talks about one mechanism to handle the quirk whilst the code uses another. Avoid confusion by removing the comment. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	ast-bmc: Move copy routines to ast-sf-ctrl	Andrew Jeffery	2018-07-17	3	-100/+94
The only user was hw/ast-bmc/ast-sf-ctrl.c, and for accessing flash the copy routines require knowledge of the PNOR LPC offset. For systems using MBOX the ast-sf-ctrl implementation is unused, so move the offset initialisation out of the common code-path and the copy routines to the place where they are necessary. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	doc: Add documentation on supported platforms and CPUs	Michael Ellerman	2018-07-17	2	-0/+61
This adds some info on the platforms and CPUs skiboot supports. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	core/flash: Emit a warning if Skiboot version doesn't match	Samuel Mendoza-Jonas	2018-07-17	1	-0/+4
Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	Recognise signed VERSION partition	Samuel Mendoza-Jonas	2018-07-17	4	-10/+17
A few things need to change to support a signed VERSION partition: - A signed VERSION partition will be 4K + SECURE_BOOT_HEADERS_SIZE (4K). - The VERSION partition needs to be loaded after secure/trusted boot is set up, and therefore after nvram_init(). - Added to the trustedboot resources array. This also moves the ipmi_dt_add_bmc_info() call to after flash_dt_add_fw_version() since it adds info to ibm,firmware-versions. Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	mem_check(): Correct alignment assumptions	Stewart Smith	2018-07-16	1	-2/+2
Back in the dim dark past, mem_check() was written to take the assumption that mem regions need to be sizeof(alloc_hdr) aligned. I can't see any real reason for this, so change it to sizeof(long) aligned as we count space by number of longs, so at least that kind of makes sense. We hit this assert in a future patch when preserving BOOTKERNEL across fast reboots. Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
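A minimal sketch of the relaxed check, using hypothetical helpers rather than the real mem_check(): both the start and the length of a region only need to be multiples of sizeof(long), since free space is counted in longs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if start and length are both long-aligned. */
static bool region_long_aligned(uintptr_t start, size_t len)
{
	return (start % sizeof(long)) == 0 && (len % sizeof(long)) == 0;
}

static void check_region(uintptr_t start, size_t len)
{
	/* Previously the requirement was sizeof(struct alloc_hdr) alignment. */
	assert(region_long_aligned(start, len));
}
```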
*	mem_region: log region name on mem_alloc failure	Stewart Smith	2018-07-16	1	-2/+2
This can help with debugging when trying to do node local allocations. Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	Scan PCI and clear memory simultaneously	Stewart Smith	2018-07-16	3	-19/+40
For many systems, scanning PCI takes about as much time as zeroing all of RAM, so we may as well do them at the same time and cut a few seconds off the total fast reboot time. Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	fast-reboot: parallel memory clearing	Stewart Smith	2018-07-16	3	-4/+103
Arbitrarily pick 16GB as the unit of parallelism, and split up clearing memory into jobs and schedule them node-local to the memory (or on node 0 if we can't work that out because it's the memory up to SKIBOOT_BASE) This seems to cut at least ~40% time from memory zeroing on fast-reboot on a 256GB Boston system. Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	cpu: add cpu_queue_job_on_node()	Nicholas Piggin	2018-07-15	15	-70/+230
Add a job scheduling API which will run the job on the requested chip_id (or return failure). Includes test harness fixes from Stewart. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
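The "run on the requested chip_id or return failure" contract can be sketched as a CPU-selection step. The struct and function here are hypothetical and heavily simplified. They are not the signature of cpu_queue_job_on_node(); the sketch shows only how a NULL result signals that no usable CPU exists on that chip.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified CPU record. */
struct sketch_cpu {
	unsigned int chip_id;
	bool available;
};

/* Pick an available CPU on chip_id, or NULL so the caller can fail. */
static struct sketch_cpu *pick_cpu_on_chip(struct sketch_cpu *cpus, size_t n,
					   unsigned int chip_id)
{
	assert(cpus != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (cpus[i].available && cpus[i].chip_id == chip_id)
			return &cpus[i];
	return NULL;
}
```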
*	skiboot 6.1 release notes	Stewart Smith	2018-07-11	1	-0/+651
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	skiboot 6.0.5 release notes	Stewart Smith	2018-07-11	1	-0/+118
Signed-off-by: Stewart Smith <stewart@linux.ibm.com> (cherry picked from commit 6da102053f99765d8c973805745e0255d44b3e57) Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	capi: Select the correct IODA table entry for the mbt cache.	Christophe Lombard	2018-07-10	1	-9/+9
With the current code, the capi mmio window is not correctly configured in the IODA table entry. The first entry (generally the non-prefetchable BAR) is overwritten. This patch sets the capi window bar at the right place. Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Reviewed-by: Vaibhav Jain <vaibhav@linux.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	cpu: Ensure no-return flag is updated for current cpu_thread	Vaibhav Jain	2018-07-10	1	-0/+2
Presently in case a cpu_thread queues a non returning job on itself, the variable cpu_thread.job_has_no_return is never updated and other cpu_threads can still queue a job on it without triggering any warnings. So this patch updates __cpu_queue_job() to ensure that job_has_no_return is updated on the current cpu_thread before it branches to the job->func(). So if the current job is non-returning then other cpu_threads queuing a job on this cpu will trigger a warning. This should aid in debugging some skiboot deadlocks. Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	npu2/hw-procedures: Fence bricks via NTL instead of MISC	Reza Arbab	2018-07-10	1	-24/+7
There are a couple of places we can set/unset fence for a brick: 1. MISC register: NPU2_MISC_FENCE_STATE 2. NTL register for the brick: NPU2_NTL_MISC_CFG1(ndev) Recent testing of ATS in combination with GPU reset has exposed a side effect of using (1); if fence is set for all six bricks, it triggers a sticky nmmu latch which prevents the NPU from getting ATR responses. This manifests as a hang in the tests. We have npu2_dev_fence_brick() which uses (1), and only two calls to it. Replace the call which sets fence with a write to (2). Remove the corresponding unset call entirely. It's unneeded because the procedures already do a progression from full fence to half to idle using (2). Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	phb4/capp: Calculate STQ/DMA read engines based on link-width for PEC	Vaibhav Jain	2018-07-04	2	-9/+33
Presently in CAPI mode the number of STQ/DMA-read engines allocated on PEC2 for CAPP is fixed to 6 and 0-30 respectively irrespective of the PCI link width. These values are only suitable for x8 cards and quickly run out if a x16 card is plugged to a PEC2 attached slot. This usually manifests as CAPP reporting TLBI timeout due to these messages getting stalled due to insufficient STQs. To fix this we update enable_capi_mode() to check if PEC2 chiplet is in x16 mode and if yes then we allocate 4/0-47 STQ/DMA-read engines for the CAPP traffic. Cc: stable # v5.7+ Fixes: 37ea3cfdc852("capi: Enable capi mode for PHB4") Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
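The link-width decision can be sketched as follows. The struct and helper names are hypothetical, and the numbers are copied from the commit text. The register encodings written by enable_capi_mode() are not shown here, so only the choice itself is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Engine allocation for CAPP on PEC2; values copied from the commit text. */
struct capp_engines {
	unsigned int stq;	/* STQ engines */
	unsigned int dma_first;	/* first DMA-read engine */
	unsigned int dma_last;	/* last DMA-read engine (inclusive) */
};

/* Hypothetical helper: choose the allocation from the PEC2 link width. */
static struct capp_engines capp_engines_for_pec2(bool x16)
{
	struct capp_engines e;

	if (x16)
		e = (struct capp_engines){ .stq = 4, .dma_first = 0, .dma_last = 47 };
	else
		e = (struct capp_engines){ .stq = 6, .dma_first = 0, .dma_last = 30 };
	assert(e.dma_first <= e.dma_last);
	return e;
}
```

In this model an x16 slot gets 48 DMA-read engines (0-47) instead of 31 (0-30).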
*	occ: sensors: Fix the size of the phandle array 'sensors' in DT	Shilpasri G Bhat	2018-07-04	1	-2/+2
Fixes: 99505c03f493 ("sensor-groups: occ: Add support to disable/enable sensor group") Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	core: always flush console before stopping	Nicholas Piggin	2018-07-04	2	-2/+6
This catches a few cases (e.g., fast reboot failure messages) that don't always make it to the console before the machine is rebooted. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	core/cpu: parallelise global CPU register setting jobs	Nicholas Piggin	2018-07-04	1	-10/+37
On a 176 thread system, before: [ 122.319923233,5] OPAL: Switch to big-endian OS [ 126.317897467,5] OPAL: Switch to little-endian OS after: [ 212.439299889,5] OPAL: Switch to big-endian OS [ 212.469323643,5] OPAL: Switch to little-endian OS Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	phb4: Delay training till after PERST is deasserted	Michael Neuling	2018-07-03	1	-0/+14
This helps some cards train on the second PERST (ie fast-reboot). The reason is not clear why but it helps, so YOLO! Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	phb4: Move training trace logging to next state.	Michael Neuling	2018-07-03	1	-2/+2
I'm going to defer training to this state soon, so move the tracing first. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	phb4: Minimise wait when moving through FRESET states	Michael Neuling	2018-07-03	1	-1/+1
We want to get through this as fast as possible so minimise by removing msecs_to_tb() call. Changes number passed from 512 -> 1. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	occ: Update Dynamic Data comment block with new GPU presence fields	Andrew Donnellan	2018-07-03	1	-0/+5
Document new GPU presence fields in the comment block next to struct occ_dynamic_data. Fixes: 9b394a32c8ea ("occ: Add support for GPU presence detection") Suggested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	npu2: Use same compatible string for NVLink and OpenCAPI link nodes in device tree	Andrew Donnellan	2018-07-03	7	-16/+46
Currently, we distinguish between NPU links for NVLink devices and OpenCAPI devices through the use of two different compatible strings - ibm,npu-link and ibm,npu-link-opencapi. As we move towards supporting configurations with both NVLink and OpenCAPI devices behind a single NPU, we need to detect the device type as part of presence detection, which can't happen until well after the point where the HDAT or platform code has created the NPU device tree nodes. Changing a node's compatible string after it's been created is a bit ugly, so instead we should move the device type to a new property which we can add to the node later on. Get rid of the ibm,npu-link-opencapi compatible string, add a new ibm,npu-link-type property, and a helper function to check the link type. Add an "unknown" device type in preparation for later patches to detect device type dynamically. These device tree bindings are entirely internal to skiboot and are not consumed directly by Linux, so this shouldn't break anything (other than internal BML lab environments). Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
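A link-type check of the kind described here can be sketched as a mapping from a property string to an enum. This is a minimal sketch: the enum, the function name, and the exact strings "nvlink" and "opencapi" are assumptions, not the values skiboot is confirmed to use. A missing or unrecognised value falls back to the "unknown" type, which matches the commit's intent of filling the type in later.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum npu_link_type {
	NPU_LINK_UNKNOWN,
	NPU_LINK_NVLINK,
	NPU_LINK_OPENCAPI,
};

/* Map an (assumed) ibm,npu-link-type string value to an enum. */
static enum npu_link_type npu_link_type_from_prop(const char *prop)
{
	if (prop == NULL)
		return NPU_LINK_UNKNOWN;
	if (strcmp(prop, "nvlink") == 0)
		return NPU_LINK_NVLINK;
	if (strcmp(prop, "opencapi") == 0)
		return NPU_LINK_OPENCAPI;
	return NPU_LINK_UNKNOWN;
}
```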
*	pmem: volatile bindings for the poorly enabled	Balbir singh	2018-06-27	1	-0/+26
PMEM_DISK bindings were added, but they rely on a rather recent mmap feature. This patch steals from those bindings to add volatile bindings. I've used these bindings with PMEM_VOLATILE to launch an instance with the publicly available systemsim-p9. The bindings are volatile and one should not expect any data to be saved/retrieved. Signed-off-by: Balbir singh <bsingharora@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	occ: Add support for GPU presence detection	Andrew Donnellan	2018-06-27	2	-3/+24
On the Witherspoon platform, we need to distinguish between NVLink GPUs and OpenCAPI accelerators. In order to do this, we first need to find out whether the SXM2 socket is populated. On Witherspoon, the SXM2 socket's presence detection pin is only visible via I2C from the APSS, and thus can only be exposed to the host via the OCC. The OCC, per OCC Firmware Interface Specification for POWER9 version 0.22, now exposes this to skiboot through a field in the dynamic data shared memory. Add the necessary dynamic data changes required to read the version and GPU presence fields. Add a function, occ_get_gpu_presence(), that can be used to check GPU presence. If the OCC isn't reporting presence (old OCC firmware, or some other reason), we default to assuming there is a device present and wait until link training to fail. This will be used in later patches to fix up the NPU2 probe path for OpenCAPI support on Witherspoon. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	occ: Move occ declarations into occ.h	Andrew Donnellan	2018-06-27	11	-29/+41
OCC declarations are currently split between skiboot.h and occ-sensor.h. Given the growing unwieldiness of skiboot.h it's probably time to move it all into one header. Rename occ-sensor.h to occ.h, move all OCC-related declarations out of skiboot.h, and add #includes as necessary. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	init, occ: Initialise OCC earlier on BMC systems	Andrew Donnellan	2018-06-27	1	-1/+13
We need to use the OCC to obtain presence data for the SXM2 slots on Witherspoon systems. This is needed to determine device type for NVLink GPUs and OpenCAPI devices which can be plugged into the same slot. Support for this will be implemented in a future patch. Currently, OCC initialisation is done just before handing over to Linux, which is well after NPU probe. On FSP systems, OCC boot starts very late, so we wait until the last possible moment to initialise the skiboot side in order to give it the maximum time to boot. On BMC systems, OCC boot starts earlier, so there aren't any issues in moving it earlier in the skiboot init sequence. When running on a BMC machine, call occ_pstates_init() as early as possible in the init sequence. On FSP machines, continue to call it from its current location. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	slw: Fix trivial typo in debug message	Andrew Donnellan	2018-06-27	1	-1/+1
s/goint/going/ Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>