talos-skiboot - Talos™ II skiboot sources

	Commit message	Author	Age	Files	Lines
...
*	libffs: Standardise ffs partition flags	Cyril Bur	2018-04-09	9	-9/+122
\| \| \| \| \| \| \| \| \| \| \| \|	It seems we've developed a character representation for ffs partition flags. Currently only pflash really prints them so it hasn't been a problem, but now ffspart wants to read them in from user input. It is important that what libffs reads and what pflash prints remain consistent, so we should move the code into libffs to avoid problems. Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	external/ffspart: Allow # comments in input file	Cyril Bur	2018-04-09	1	-0/+4
\| \| \| \| \|	Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	skiboot-5.11 release notes	Stewart Smith	2018-04-06	1	-0/+828
\| \| \| \|	Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	core/cpufeatures: Fix setting DARN and SCV HWCAP feature bits	Nicholas Piggin	2018-04-05	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	DARN and SCV have been assigned AT_HWCAP2 (32-63) bits: #define PPC_FEATURE2_DARN 0x00200000 /* darn random number insn */ #define PPC_FEATURE2_SCV 0x00100000 /* scv syscall */ A cpufeatures-aware OS will not advertise these to userspace without this patch. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
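As an illustration of how userspace would consume the two masks quoted above, here is a minimal sketch. The helper names are hypothetical; only the two `#define` values come from the commit message. On Linux the `hwcap2` argument would typically come from `getauxval(AT_HWCAP2)`.

```c
#include <stdbool.h>

/* Mask values as quoted in the commit message. */
#define PPC_FEATURE2_DARN 0x00200000 /* darn random number insn */
#define PPC_FEATURE2_SCV  0x00100000 /* scv syscall */

/* Hypothetical helpers: test one feature bit in an AT_HWCAP2 word. */
static bool demo_hwcap2_has_darn(unsigned long hwcap2)
{
	return (hwcap2 & PPC_FEATURE2_DARN) != 0;
}

static bool demo_hwcap2_has_scv(unsigned long hwcap2)
{
	return (hwcap2 & PPC_FEATURE2_SCV) != 0;
}
```

If the firmware reports the wrong bit positions, a cpufeatures-aware kernel never sets these masks, so checks like the ones above return false even on hardware that supports the instructions.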
*	core/cpu: Prevent clobbering of stack guard for boot-cpu	Vaibhav Jain	2018-04-04	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 90d53934c2da ("core/cpu: discover stack region size before initialising memory regions") introduced memzero for struct cpu_thread in init_cpu_thread(). This has an unintended side effect of clobbering the stack-guard canary of the boot_cpu stack. This results in opal failing to init with this failure message: CPU: P9 generation processor (max 4 threads/core) CPU: Boot CPU PIR is 0x0004 PVR is 0x004e1200 Guard skip = 0 Stack corruption detected ! Aborting! CPU 0004 Backtrace: S: 0000000031c13ab0 R: 0000000030013b0c .backtrace+0x5c S: 0000000031c13b50 R: 000000003001bd18 ._abort+0x60 S: 0000000031c13be0 R: 0000000030013bbc .__stack_chk_fail+0x54 S: 0000000031c13c60 R: 00000000300c5b70 .memset+0x12c S: 0000000031c13d00 R: 0000000030019aa8 .init_cpu_thread+0x40 S: 0000000031c13d90 R: 000000003001b520 .init_boot_cpu+0x188 S: 0000000031c13e30 R: 0000000030015050 .main_cpu_entry+0xd0 S: 0000000031c13f00 R: 0000000030002700 boot_entry+0x1c0 So the patch provides a fix by tweaking the memset() call in init_cpu_thread() to skip over the stack-guard canary. Fixes: 90d53934c2da ("core/cpu: discover stack region size before initialising memory regions") Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
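The idea of zeroing a structure while skipping its canary can be sketched as follows. This is not the actual skiboot change: the structure and function names are hypothetical, and the sketch assumes the guard value is the first member of the structure.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct cpu_thread (assumed layout). */
struct demo_cpu_thread {
	uint64_t stack_guard;	/* canary: must survive re-initialisation */
	uint32_t pir;
	uint32_t state;
};

/* Zero everything after the canary, leaving the canary itself intact. */
static void demo_init_cpu_thread(struct demo_cpu_thread *t)
{
	const size_t skip = offsetof(struct demo_cpu_thread, stack_guard)
			    + sizeof(t->stack_guard);

	memset((char *)t + skip, 0, sizeof(*t) - skip);
}
```

Computing the skip with `offsetof` rather than a hard-coded constant keeps the sketch correct if the guard field changes size.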
*	travis: add -L to all curl invocations to follow redirects	Stewart Smith	2018-04-05	8	-13/+18
\| \| \| \|	Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	skiboot 5.10.4 release notes	Stewart Smith	2018-04-04	1	-0/+28
\| \| \| \| \| \|	Signed-off-by: Stewart Smith <stewart@linux.ibm.com> (cherry picked from commit 604f758b2cbf6629bb2ef3b0e0637ffd7dde472b) Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
*	xive: disable store EOI support	Cédric Le Goater	2018-04-03	3	-7/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Hardware has limitations which would require putting a sync after each store EOI to make sure the MMIO operations that change the ESB state are ordered. This is a killer for performance and the PHBs do not support the sync. So remove the store EOI for the moment, until hardware is improved. Also, while we are changing the XIVE source flags, let's fix the settings for the PHB4s, which should follow these rules: - SHIFT_BUG for DD10 - STORE_EOI for DD20 and if enabled - TRIGGER_PAGE for DDx0 and if not STORE_EOI Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
*	doc: Document ibm,heartbeat-ms	Stewart Smith	2018-04-03	1	-0/+21
\| \| \| \| \| \| \| \|	and by 'document', I mean 'gather a few extra minor bits of breadcrumbs of clues as to what, when and why this thing exists'. Signed-off-by: Stewart Smith <stewart@linux.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
*	phb4: Reset FIR/NFIR registers before PHB4 probe	Vaibhav Jain	2018-04-03	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The function phb4_probe_stack() resets "ETU Reset Register" to unfreeze the PHB before it performs mmio access on the PHB. However, in case the FIR/NFIR registers are set while entering this function, the reset of "ETU Reset Register" won't unfreeze the PHB and it will remain fenced. This leads to failure during initial CRESET of the PHB as mmio access is still not enabled and an error message of the form below is logged: PHB#0000[0:0]: Initializing PHB4... PHB#0000[0:0]: Default system config: 0xffffffffffffffff PHB#0000[0:0]: New system config : 0xffffffffffffffff PHB#0000[0:0]: Initial PHB CRESET is 0xffffffffffffffff PHB#0000[0:0]: Waiting for DLP PG reset to complete... <snip> PHB#0000[0:0]: Timeout waiting for DLP PG reset ! PHB#0000[0:0]: Initialization failed This is especially seen happening during the MPIPL flow where SBE would quiesce and fence the PHB so that it doesn't stomp on the main memory. However, when skiboot enters phb4_probe_stack() after MPIPL, the FIR/NFIR registers are set, forcing the PHB to re-enter fence after ETU reset is done. So to fix this issue the patch introduces new xscom writes to phb4_probe_stack() to reset the FIR/NFIR registers before performing ETU reset to enable mmio access to the PHB. Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Tested-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
*	capi: Poll Err/Status register during CAPP recovery	Vaibhav Jain	2018-04-03	1	-17/+68
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch updates do_capp_recovery_scoms() to poll the CAPP Err/Status control register, check for CAPP-Recovery to complete/fail based on indications of BITS-1,5,9 and then proceed with the CAPP-Recovery scoms only if recovery completed successfully. This would prevent cases where we bring up the PCIe link while the recovery sequencer on CAPP is still busy with casting out cache lines. In case CAPP-Recovery didn't complete successfully an error is returned from do_capp_recovery_scoms() asking phb4_creset() to keep the phb4 fenced and mark it as broken. The loop that implements polling of Err/Status register will also log an error on the PHB when it continues for more than 168ms which is the max time to failure for CAPP-Recovery. Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Reviewed-by: Alastair D'Silva <alastair@d-silva.org> Acked-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
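The polling pattern described above can be sketched as a bounded loop over a status read. Everything here is hypothetical except the 168ms bound: the commit message does not say which of bits 1, 5 and 9 means "done" or "failed", so the masks are passed in by the caller. Unlike the real loop, which logs an error and continues, this sketch simply gives up at the bound.

```c
#include <stdint.h>

#define DEMO_MAX_POLL_MS 168	/* max time to failure, per the commit message */

enum demo_result { DEMO_DONE, DEMO_FAILED, DEMO_TIMEOUT };

/* Hypothetical callback standing in for a read of the Err/Status register. */
typedef uint64_t (*demo_read_fn)(void *ctx);

static enum demo_result demo_poll_recovery(demo_read_fn read_status, void *ctx,
					   uint64_t done_mask, uint64_t fail_mask)
{
	for (int ms = 0; ms <= DEMO_MAX_POLL_MS; ms++) {
		uint64_t reg = read_status(ctx);

		if (reg & fail_mask)
			return DEMO_FAILED;
		if (reg & done_mask)
			return DEMO_DONE;
		/* Real firmware would delay roughly 1ms here before retrying. */
	}
	return DEMO_TIMEOUT;
}

/* Test stub: reports "done" (bit 0) once the countdown in ctx reaches zero. */
static uint64_t demo_countdown_read(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return 0;
	}
	return 0x1;
}
```

A caller would proceed with the recovery scoms only on `DEMO_DONE` and keep the PHB fenced otherwise.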
*	core/lock.c: ensure valid start value for lock spin duration warning	Stewart Smith	2018-04-03	1	-3/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The previous fix in a8e6cc3f4 only addressed half of the problem, as we could also get an invalid value for start, causing us to fail in a weird way. This was caught by the testcases.OpTestHMIHandling.HMI_TFMR_ERRORS test in op-test-framework. You'd get to this part of the test and get the erroneous lock spinning warnings: PATH=/usr/local/sbin:$PATH putscom -c 00000000 0x2b010a84 0003080000000000 0000080000000000 [ 790.140976993,4] WARNING: Lock has been spinning for 790275ms [ 790.140976993,4] WARNING: Lock has been spinning for 790275ms [ 790.140976918,4] WARNING: Lock has been spinning for 790275ms This patch checks the validity of timebase before setting start, and only checks the lock timeout if we got a valid start value. Fixes: a8e6cc3f47525f86ef1d69d69a477b6264d0f8ee Fixes: 84186ef0944c9413262f0974ddab3fb1343ccfe8 Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Tested-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
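The two-sided check described above (validate the timebase before recording `start`, and only evaluate the timeout when `start` is valid) could look roughly like this. The names, the 5000ms threshold and the use of 0 as the "no valid start" sentinel are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_LOCK_TIMEOUT_MS 5000	/* hypothetical threshold */

/* Record a start time only when the timebase can be trusted; 0 = invalid. */
static uint64_t demo_record_start(bool tb_valid, uint64_t now_ms)
{
	return tb_valid ? now_ms : 0;
}

/* Warn only if both the start value and the current timebase are valid. */
static bool demo_should_warn(uint64_t start_ms, bool tb_valid, uint64_t now_ms)
{
	if (start_ms == 0 || !tb_valid)
		return false;
	return now_ms - start_ms > DEMO_LOCK_TIMEOUT_MS;
}
```

With an invalid start, a later valid reading such as 790275ms no longer produces a spurious "spinning for 790275ms" warning.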
*	skiboot 5.11-rc1 release notes	Stewart Smith	2018-03-28	1	-0/+694
\| \| \| \|	Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
*	skiboot 5.10.3 release notes	Stewart Smith	2018-03-28	1	-0/+82
\| \| \| \| \| \|	Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit a6c62330f8b08032434f7cf9587ac5bfb79ffe91) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
*	Fix 'make check' compile for mem_clear_range	Stewart Smith	2018-03-28	1	-2/+3
\| \| \| \| \| \| \| \|	We play funny business with printf format specifiers because of how we do unit tests. Fixes: c32943bfc1e254176ecab564fdb4752403a48cab Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
*	core/cpu: discover stack region size before initialising memory regions	Nicholas Piggin	2018-03-27	5	-40/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Stack allocation first allocates a memory region sized to hold stacks for all possible CPUs up to the maximum PIR of the architecture, zeros the region, then initialises all stacks. Max PIR is 32768 on POWER9, which is 512MB for stacks. The stack region is then shrunk after CPUs are discovered, but this is a bit of a hack, and it leaves a hole in the memory allocation regions as it's done after mem regions are initialised. 0x000000000000..00002fffffff : ibm,os-reserve - OS 0x000030000000..0000303fffff : ibm,firmware-code - OPAL 0x000030400000..000030ffffff : ibm,firmware-heap - OPAL 0x000031000000..000031bfffff : ibm,firmware-data - OPAL 0x000031c00000..000031c0ffff : ibm,firmware-stacks - OPAL * gap * 0x000051c00000..000051d01fff : ibm,firmware-allocs-memory@0 - OPAL 0x000051d02000..00007fffffff : ibm,firmware-allocs-memory@0 - OS 0x000080000000..000080b3cdff : initramfs - OPAL 0x000080b3ce00..000080b7cdff : ibm,fake-nvram - OPAL 0x000080b7ce00..0000ffffffff : ibm,firmware-allocs-memory@0 - OS This change moves zeroing into the per-cpu stack setup. The boot CPU stack is set up based on the current PIR. Then the size of the stack region is set, by discovering the maximum PIR of the system from the device tree, before mem regions are initialised. This results in all memory being accounted within memory regions, and less memory fragmentation of OPAL allocations. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
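The figures in this message can be cross-checked with a little arithmetic. The 16KB per-CPU stack size below is an assumption inferred from the quoted numbers (512MB divided by 32768), not a value stated in the commit; the helper name is hypothetical.

```c
#include <stdint.h>

#define DEMO_MAX_PIR	32768u		/* POWER9 maximum, per the commit message */
#define DEMO_STACK_SIZE	(16u * 1024u)	/* assumed: 512MB / 32768 */

/* Bytes needed to hold one stack for each of `cpus` CPUs. */
static uint64_t demo_stack_region_bytes(uint32_t cpus)
{
	return (uint64_t)cpus * DEMO_STACK_SIZE;
}
```

Under that assumption, 32768 stacks take 0x20000000 bytes (512MB), which matches the distance between 0x31c00000 and 0x51c00000 in the map above, and the 64KB `ibm,firmware-stacks` region shown corresponds to four stacks.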
*	hw/imc: don't access homer memory if it was not initialised	Nicholas Piggin	2018-03-27	1	-0/+3
\| \| \| \| \| \| \| \|	This can happen under mambo, at least. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
*	nvram: run nvram_validate() after nvram_reformat()	Nicholas Piggin	2018-03-27	2	-3/+8
\| \| \| \| \| \| \| \| \| \|	nvram_reformat() sets nvram_valid = true, but it does not set skiboot_part_hdr. Call nvram_validate() instead, which sets everything up properly. Reviewed-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
*	core/fast-reboot: zero memory after fast reboot	Nicholas Piggin	2018-03-27	2	-0/+82
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This improves the security and predictability of the fast reboot environment. There can not be a secure fence between fast reboots, because a malicious OS can modify the firmware itself. However a well-behaved OS can have a reasonable expectation that OS memory regions it has modified will be cleared upon fast reboot. The memory is zeroed after all other CPUs come up from fast reboot, just before the new kernel is loaded and booted into. This allows image preloading to run concurrently, and will allow parallelisation of the clearing in future. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
*	mem-map: Use a symbolic constant for exception vector size	Nicholas Piggin	2018-03-27	3	-9/+16
\| \| \| \| \|	Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
*	core/fast-reboot: verify mem regions before fast reboot	Nicholas Piggin	2018-03-27	4	-7/+39
\| \| \| \| \| \| \| \| \| \| \| \| \|	Run the mem_region sanity checkers before proceeding with fast reboot. This is the beginning of proactive sanity checks on opal data for fast reboot (which complements the reactive disable_fast_reboot cases). It is encouraged to re-use and share any kind of debug code and unit test code. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
*	npu2: Add performance tuning SCOM inits	Reza Arbab	2018-03-27	1	-1/+96
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Peer-to-peer GPU bandwidth latency testing has produced some tunable values that improve performance. Add them to our device initialization. File these under things that need to be cleaned up with nice #defines for the register names and bitfields when we get time. A few of the settings are dependent on the system's particular NVLink topology, so introduce a helper to determine how many links go to a single GPU. Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
*	hw/npu2: Assign a unique LPARSHORTID per GPU	Alistair Popple	2018-03-27	1	-0/+1
\| \| \| \| \| \| \| \| \|	This gets used elsewhere to index items in the XTS tables. Signed-off-by: Alistair Popple <alistair@popple.id.au> [arbab@linux.vnet.ibm.com: Added commit log] Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
*	NPU2: dump NPU2 registers on npu2 HMI	Stewart Smith	2018-03-27	1	-2/+73
\| \| \| \| \| \| \| \| \| \| \|	Due to the nature of debugging npu2 issues, folk are wanting the full list of NPU2 registers dumped when there's a problem. We have to list out each register as traversing the range triggers FIR bits that confuse PRD. Suggested-by: Ryan Black <rblack@us.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
*	Revert "NPU2 HMIs: dump out a LOT of npu2 registers for debugging"	Stewart Smith	2018-03-27	5	-69/+20
\| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit fbdc91e693fc3103f7e2a65054ed32bfb26a2e17. We don't need this as we need to do it a different way, with an explicit set of registers as otherwise we trip other random FIR bits and everything becomes even more terrible. I suggest alcohol. Cc: stable Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
*	dts: Zero struct to avoid using uninitialised value	Cyril Bur	2018-03-27	1	-2/+2
\| \| \| \| \|	Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
*	hw/imc: Don't dereference possible NULL	Cyril Bur	2018-03-27	1	-1/+2
\| \| \| \| \| \|	Fixes: CID 263056 and 263052 Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
*	fast-reboot: occ: Only delete /ibm,opal/power-mgt nodes if they exist	Cyril Bur	2018-03-27	1	-5/+7
\| \| \| \| \| \| \|	Fixes: ac4272bf ("fast-reboot: occ: Delete OCC child nodes in /ibm,opal/power-mgt") Fixes: CID 263053 Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
*	libstb/create-container: munmap() signature file address	Cyril Bur	2018-03-27	2	-19/+21
\| \| \| \| \| \| \| \| \| \|	I also couldn't ignore that the same function had both a void *infile and a char *inFile. The inFile variable is clearly a filename, so why not call it that? Fixes: CID 263054 and 263051 Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
*	npu2-opencapi: Fix memory leak	Cyril Bur	2018-03-27	1	-1/+1
\| \| \| \| \| \|	Fixes: CID 264267 Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
*	npu2: Fix possible NULL dereference	Cyril Bur	2018-03-27	1	-3/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The following pattern exists in several npu2 functions: struct phb *phb = pci_get_phb(phb_id); struct npu2 *p = phb_to_npu2_nvlink(phb); The problem is that pci_get_phb() can return NULL and phb_to_npu2_nvlink() dereferences its parameter. Coverity says that the return value of pci_get_phb() is checked 43 out of 46 times, which suggests we should be more careful. Furthermore, functions with the badly placed call to phb_to_npu2_nvlink() do seem to check that the return value of pci_get_phb() isn't NULL, but this check would be too little too late. This patch just moves the call of phb_to_npu2_nvlink() to after the NULL check for the return value of pci_get_phb(). Affected functions are: opal_npu_map_lpar() opal_npu_init_context() opal_npu_destroy_context() Fixes: CID 264274, 264273, 264272, 264271, 264266, 264265 Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
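The reordering described above, sketched with hypothetical stand-in types and functions (these are not the real skiboot definitions): the conversion helper dereferences its argument, so it must only be called after the NULL check.

```c
#include <stddef.h>

struct demo_npu2 { int lpar; };
struct demo_phb  { struct demo_npu2 *owner; };

/* Dereferences phb, like the conversion helper in the commit message. */
static struct demo_npu2 *demo_phb_to_npu2(struct demo_phb *phb)
{
	return phb->owner;
}

/* Check first, convert second: NULL never reaches the dereference. */
static int demo_map_lpar(struct demo_phb *phb)
{
	struct demo_npu2 *p;

	if (!phb)
		return -1;	/* hypothetical error code */

	p = demo_phb_to_npu2(phb);
	return p->lpar;
}
```

Declaring `p` without an initialiser and assigning it after the check is what keeps the lookup and the conversion from being fused into one unsafe declaration line.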
*	occ-sensors: Remove NULL checks after dereference	Cyril Bur	2018-03-27	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Both scale_sensor() and scale_energy() take the value to scale as a pointer. These functions do not NULL check the pointer before the first time they dereference it, which is fine since passing NULL would be completely pointless. Both functions do perform a pointless NULL check later on. This confuses Coverity and really doesn't make much sense at all. Since calling these functions with NULL as the sensor parameter makes no sense, and currently there's a dereference before the check, just remove the check. Fixes: CID 264276 and 264275 Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
*	mbox: Reduce default BMC timeouts	Cyril Bur	2018-03-27	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Rebooting a BMC can take 70 seconds. Skiboot cannot possibly spin for 70 seconds waiting for a BMC to come back. This also makes the current default of 30 seconds a bit pointless: it is far too short to be a worst-case wait time, but too long to avoid hitting hardlockup detectors and wreaking havoc inside host Linux. Just change it to three seconds so that host Linux will survive: reads and writes will fail, but at least the host stays up. Also refactored the waiting loop just a bit so that it's easier to read. Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Tested-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
*	mbox: Harden against BMC daemon errors	Cyril Bur	2018-03-27	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Bugs present in the BMC daemon mean that skiboot gets presented with mbox windows of size zero. These windows cannot be valid and skiboot already detects these conditions. Currently skiboot warns quite strongly about the occurrence of these problems. The problem for skiboot is that it doesn't take any action. Initially I wanted to avoid putting policy like this into skiboot, but since these bugs aren't going away and skiboot barfing is leading to lockups and ultimately the host going down, something needs to be done. I propose that when we detect the problem we fail the mbox call and punt the problem back up to Linux. I don't like it but at least it will cause errors to cascade and won't bring the host down. I'm not sure how Linux is supposed to detect this or what it can even do but this is better than a crash. Diagnosing a failure to boot if skiboot itself fails to read flash may be marginally more difficult with this patch. This is because skiboot will now only print one warning about the zero sized window rather than continuously spitting it out. Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Tested-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
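A rough sketch of the "fail the call instead of carrying on" policy described above. The structure, return codes and warn-once flag are hypothetical and only illustrate the shape of the check, not the real mbox code.

```c
#include <stdint.h>
#include <stdio.h>

#define DEMO_OK		0
#define DEMO_ERR	(-1)	/* hypothetical error code returned to the caller */

/* Hypothetical description of a flash window handed back by the BMC. */
struct demo_window {
	uint32_t pos;
	uint32_t size;
};

/* Reject zero-sized windows; warn once and let the error propagate upward. */
static int demo_check_window(const struct demo_window *w, int *warned)
{
	if (w->size == 0) {
		if (!*warned) {
			fprintf(stderr, "mbox: BMC returned a zero-sized window\n");
			*warned = 1;
		}
		return DEMO_ERR;
	}
	return DEMO_OK;
}
```

Returning an error lets the failure cascade to the caller, which is the trade-off the message argues for over repeated warnings and an eventual lockup.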
*	occ: Set up OCC messaging even if we fail to setup pstates	Stewart Smith	2018-03-27	1	-8/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This means that we no longer hit this bug if we fail to get valid pstates from the OCC. [console-pexpect]#echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear [ 94.019971181,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8 [ 94.020098392,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8 [ 10.318805] Disabling lock debugging due to kernel taint [ 10.318808] Severe Machine check interrupt [Not recovered] [ 10.318812] NIP [000000003003e434]: 0x3003e434 [ 10.318813] Initiator: CPU [ 10.318815] Error type: Real address [Load/Store (foreign)] [ 10.318817] opal: Hardware platform error: Unrecoverable Machine Check exception [ 10.318821] CPU: 117 PID: 2745 Comm: sh Tainted: G M 4.15.9-openpower1 #3 [ 10.318823] NIP: 000000003003e434 LR: 000000003003025c CTR: 0000000030030240 [ 10.318825] REGS: c00000003fa7bd80 TRAP: 0200 Tainted: G M (4.15.9-openpower1) [ 10.318826] MSR: 9000000000201002 <SF,HV,ME,RI> CR: 48002888 XER: 20040000 [ 10.318831] CFAR: 0000000030030258 DAR: 394a00147d5a03a6 DSISR: 00000008 SOFTE: 1 Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> [Additional changes from Shilpa] Reviewed-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
*	Merge pull request #161 from dcrowell77/gardtool	oohal	2018-03-26	1	-3/+4
\|\ \| \| \| \|	Make gard display show that a record is cleared
\| *	Make gard display show that a record is cleared	Dan Crowell	2018-03-14	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	When clearing gard records, Hostboot only modifies the record_id portion to be 0xFFFFFFFF. The remainder of the entry remains. Without this change it can be confusing to users to know that the record they are looking at is no longer valid.
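The display check this implies is small. In the sketch below only the 0xFFFFFFFF marker comes from the message; the record layout and names are hypothetical, and on-flash byte order is ignored because the all-ones value reads the same either way.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_CLEARED_RECORD_ID 0xFFFFFFFFu	/* value Hostboot writes when clearing */

/* Hypothetical, heavily simplified gard record. */
struct demo_gard_record {
	uint32_t record_id;
	uint32_t error_type;	/* left untouched when the record is cleared */
};

/* A record is cleared when only its record_id has been overwritten. */
static bool demo_record_is_cleared(const struct demo_gard_record *r)
{
	return r->record_id == DEMO_CLEARED_RECORD_ID;
}
```

A listing tool can use such a predicate to label stale entries instead of printing their leftover fields as if they were live.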
* \|	Revert "platforms/astbmc/slots.c: Allow comparison of bus numbers when matching slots"	Stewart Smith	2018-03-22	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit bda7cc4d0354eb3f66629d410b2afc08c79f795f. Ben says: It's on purpose that we do NOT compare the bus numbers, they are always 0 in the slot table we do a hierarchical walk of the tree, matching only the devfn's along the way bcs the bus numbering isn't fixed this breaks all slot naming etc... stuff on anything using the "skiboot" slot tables (P8 opp typically) Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* \|	xive: fix opal_xive_set_vp_info() error path	Cédric Le Goater	2018-03-22	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In case of error, opal_xive_set_vp_info() will return without unlocking the xive object. This is most certainly a typo. Signed-off-by: Cédric Le Goater <clg@kaod.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
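The class of bug described here (an early return that skips the unlock) is commonly avoided with a single exit label. The sketch below uses a hypothetical counter in place of a real lock and is not the actual XIVE code.

```c
/* Hypothetical lock: `held` counts outstanding acquisitions. */
struct demo_lock {
	int held;
};

static void demo_lock_take(struct demo_lock *l)    { l->held++; }
static void demo_lock_release(struct demo_lock *l) { l->held--; }

/* Every path, including the error path, leaves through `out` and unlocks. */
static int demo_set_vp_info(struct demo_lock *l, int bad_arg)
{
	int rc = 0;

	demo_lock_take(l);
	if (bad_arg) {
		rc = -1;	/* hypothetical error code */
		goto out;	/* not `return rc;`, which would leak the lock */
	}
	/* ... update state under the lock ... */
out:
	demo_lock_release(l);
	return rc;
}
```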
* \|	npu2: Remove DD1 support	Andrew Donnellan	2018-03-22	4	-88/+52
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Major changes in the NPU between DD1 and DD2 necessitated a fair bit of revision-specific code. Now that all our lab machines are DD2, we no longer test anything on DD1 and it's time to get rid of it. Remove DD1-specific code and abort probe if we're running on a DD1 machine. Cc: Alistair Popple <alistair@popple.id.au> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-By: Alistair Popple <alistair@popple.id.au> Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* \|	npu2: Remove unused fields in struct npu2	Andrew Donnellan	2018-03-22	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Trivial cleanup of two unused fields in struct npu2. Cc: Alistair Popple <alistair@popple.id.au> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-By: Alistair Popple <alistair@popple.id.au> Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* \|	core/fast-reboot: disable fast reboot upon fundamental entry/exit/locking errors	Nicholas Piggin	2018-03-22	2	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This disables fast reboot in several more cases where serious errors like lock corruption or call re-entrancy are detected. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* \|	core/opal: allow some re-entrant calls	Nicholas Piggin	2018-03-22	1	-3/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This allows a small number of OPAL calls to succeed despite re-entering the firmware, and rejects others rather than aborting. This allows a system reset interrupt that interrupts OPAL to do something useful: sreset other CPUs, use the console (which allows xmon to work or stack traces to be printed), or reboot the system. Use OPAL_INTERNAL_ERROR when rejecting, rather than OPAL_BUSY, which is used for many other things that do not mean a serious permanent error. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* \|	core/opal: abort in case of re-entrant OPAL call	Nicholas Piggin	2018-03-22	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The stack is already destroyed by the time we get here, so there is not much point continuing. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* \|	phb*: Remove the state field in the various phb structures	Oliver O'Halloran	2018-03-22	7	-233/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We've been carting around this field since the original p7ioc-phb code. As far as I can tell we never actually use it for anything other than checking if the PHB has been marked as broken or not. The _FENCED state is set in a few places, but we never use it in favour of just checking the MMIO register. This patch just replaces it with a boolean that indicates if the PHB has been marked as broken and removes the giant, mostly wrong, comment explaining its usage that is copied and pasted into each phb header file. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* \|	SLW: Increase stop4-5 residency by 10x	Akshay Adiga	2018-03-22	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Using the DGEMM benchmark we observed there was a drop of 5-9% throughput with and without stop4/5. In this benchmark the GPU waits on the CPU to wake up and provide the subsequent data block to compute. The wakeup latency accumulates over the run and shows up as a performance drop. Linux enters stop4/5 more aggressively for its wakeup latency. Increasing the residency from 1ms to 10ms makes the performance drop <1%. Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Tested-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* \|	Reserve OPAL API number for opal_handle_hmi2 function.	Mahesh Salgaonkar	2018-03-14	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Requested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* \|	dts: spl_wakeup: Remove all workarounds in the spl wakeup logic	Shilpasri G Bhat	2018-03-14	2	-59/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We coded a few workarounds in the special wakeup logic to handle buggy firmware. Now that it is fixed, remove them as they break the special wakeup protocol. As per the spec we should not de-assert before assert is complete. So follow this protocol. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Tested-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* \|	npu2: Disable fast reboot	Reza Arbab	2018-03-14	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fast reboot does not yet work right with the NPU. It's been disabled on NVLink and OpenCAPI machines. Do the same for NVLink2. This amounts to a port of 3e4577939bbf ("npu: Fix broken fast reset") from the npu code to npu2. Cc: stable # 5.10.x Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
* \|	Add VESNIN platform support	Artem Senichev	2018-03-14	2	-1/+264
\|/ \| \| \| \|	Signed-off-by: Artem Senichev <a.senichev@yadro.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>