It seems we've developed a character representation for ffs partition
flags. Currently only pflash really prints them, so it hasn't been a
problem, but now ffspart wants to read them in from user input.
It is important that what libffs reads and what pflash prints remain
consistent, so we should move the code into libffs to avoid problems.
Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
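A minimal sketch of the idea, assuming a single table that both the printer and the parser consult so the two directions cannot drift apart. The bit values, characters and function names here are hypothetical and are not libffs's actual definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits and characters; libffs's real values may differ. */
struct flag_char {
	uint16_t bit;
	char c;
};

static const struct flag_char flag_table[] = {
	{ 0x0001, 'E' },	/* e.g. ECC protected */
	{ 0x0002, 'P' },	/* e.g. preserved */
	{ 0x0004, 'R' },	/* e.g. read-only */
	{ 0x0008, 'V' },	/* e.g. volatile */
};

#define NFLAGS (sizeof(flag_table) / sizeof(flag_table[0]))

/* Render flags as characters; buf must hold at least NFLAGS + 1 bytes. */
void flags_to_string(uint16_t flags, char *buf)
{
	size_t n = 0;

	for (size_t i = 0; i < NFLAGS; i++)
		if (flags & flag_table[i].bit)
			buf[n++] = flag_table[i].c;
	buf[n] = '\0';
}

/* Parse user input back into flags; returns -1 on an unknown character. */
int string_to_flags(const char *s, uint16_t *flags)
{
	*flags = 0;
	for (; *s; s++) {
		size_t i;

		for (i = 0; i < NFLAGS; i++)
			if (flag_table[i].c == *s)
				break;
		if (i == NFLAGS)
			return -1;
		*flags |= flag_table[i].bit;
	}
	return 0;
}
```

With pflash calling the printer and ffspart calling the parser, adding a new flag means touching one table only.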
| |
Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
DARN and SCV have been assigned AT_HWCAP2 (32-63) bits:
#define PPC_FEATURE2_DARN 0x00200000 /* darn random number insn */
#define PPC_FEATURE2_SCV 0x00100000 /* scv syscall */
A cpufeatures-aware OS will not advertise these to userspace without
this patch.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
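For context, userspace tests these bits in the AT_HWCAP2 auxiliary-vector word. The sketch keeps the check portable by taking that word as a parameter; on Linux with glibc the real value would come from getauxval(AT_HWCAP2) in <sys/auxv.h>. The helper name hwcap2_has() is hypothetical; the bit values are the ones quoted in the commit message.

```c
#include <stdbool.h>

/* Bit values as quoted in the commit message. */
#ifndef PPC_FEATURE2_DARN
#define PPC_FEATURE2_DARN 0x00200000 /* darn random number insn */
#endif
#ifndef PPC_FEATURE2_SCV
#define PPC_FEATURE2_SCV  0x00100000 /* scv syscall */
#endif

/* True when every bit in 'mask' is advertised in the AT_HWCAP2 word. */
static bool hwcap2_has(unsigned long hwcap2, unsigned long mask)
{
	return (hwcap2 & mask) == mask;
}
```

Without the firmware change, a cpufeatures-aware kernel would leave these bits clear, so hwcap2_has(getauxval(AT_HWCAP2), PPC_FEATURE2_DARN) would return false even on capable hardware.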
| |
Commit 90d53934c2da ("core/cpu: discover stack region size before
initialising memory regions") introduced memzero for struct cpu_thread
in init_cpu_thread(). This has the unintended side effect of clobbering
the stack-guard canary of the boot_cpu stack, which results in OPAL
failing to init with this failure message:
CPU: P9 generation processor (max 4 threads/core)
CPU: Boot CPU PIR is 0x0004 PVR is 0x004e1200
Guard skip = 0
Stack corruption detected !
Aborting!
CPU 0004 Backtrace:
S: 0000000031c13ab0 R: 0000000030013b0c .backtrace+0x5c
S: 0000000031c13b50 R: 000000003001bd18 ._abort+0x60
S: 0000000031c13be0 R: 0000000030013bbc .__stack_chk_fail+0x54
S: 0000000031c13c60 R: 00000000300c5b70 .memset+0x12c
S: 0000000031c13d00 R: 0000000030019aa8 .init_cpu_thread+0x40
S: 0000000031c13d90 R: 000000003001b520 .init_boot_cpu+0x188
S: 0000000031c13e30 R: 0000000030015050 .main_cpu_entry+0xd0
S: 0000000031c13f00 R: 0000000030002700 boot_entry+0x1c0
This patch fixes it by tweaking the memset() call in
init_cpu_thread() to skip over the stack-guard canary.
Fixes: 90d53934c2da ("core/cpu: discover stack region size before initialising memory regions")
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
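The "skip over the canary" memset can be sketched as follows. This assumes, purely for illustration, a hypothetical struct whose first member is the stack-guard value; skiboot's real struct cpu_thread is much larger and its layout may differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: the stack-guard canary is the first member. */
struct cpu_thread_sketch {
	uint64_t stack_guard;	/* value checked by __stack_chk_fail paths */
	uint32_t pir;
	uint32_t state;
	void *stack_top;
};

/* Zero everything after the canary, leaving stack_guard untouched. */
void init_cpu_thread_sketch(struct cpu_thread_sketch *t, uint32_t pir)
{
	const size_t skip = offsetof(struct cpu_thread_sketch, stack_guard) +
			    sizeof(t->stack_guard);

	memset((char *)t + skip, 0, sizeof(*t) - skip);
	t->pir = pir;
}
```

Computing the skip with offsetof() keeps the memset correct if fields are added after the canary.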
| |
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
(cherry picked from commit 604f758b2cbf6629bb2ef3b0e0637ffd7dde472b)
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
Hardware has limitations that would require putting a sync after each
store EOI to make sure the MMIO operations that change the ESB state
are ordered. This is a killer for performance, and the PHBs do not
support the sync. So remove the store EOI for the moment, until the
hardware is improved.
Also, while we are changing the XIVE source flags, let's fix the
settings for the PHB4s, which should follow these rules:
- SHIFT_BUG for DD10
- STORE_EOI for DD20 and if enabled
- TRIGGER_PAGE for DDx0 and if not STORE_EOI
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
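The three rules can be expressed as a small flag-selection function. This is a sketch under stated assumptions: the flag values and names are hypothetical, the revision is assumed to be encoded as major << 4 | minor (so DD2.0 is 0x20), and "DDx0" is read as "any DDx.0".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values; skiboot's XIVE source flags may differ. */
#define SRC_SHIFT_BUG    0x1u
#define SRC_STORE_EOI    0x2u
#define SRC_TRIGGER_PAGE 0x4u

/* rev is assumed to be (major << 4) | minor, e.g. DD2.0 == 0x20. */
uint32_t phb4_xive_src_flags(uint8_t rev, bool store_eoi_enabled)
{
	uint32_t flags = 0;

	if (rev == 0x10)				/* DD1.0 */
		flags |= SRC_SHIFT_BUG;
	if (rev == 0x20 && store_eoi_enabled)		/* DD2.0, if enabled */
		flags |= SRC_STORE_EOI;
	if ((rev & 0x0f) == 0 && !(flags & SRC_STORE_EOI)) /* DDx.0 */
		flags |= SRC_TRIGGER_PAGE;
	return flags;
}
```

With store EOI removed for now, store_eoi_enabled would simply be false and DDx.0 parts get the trigger page.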
| |
and by 'document', I mean 'gather a few extra minor bits of breadcrumbs
of clues as to what, when and why this thing exists'.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
The function phb4_probe_stack() resets the "ETU Reset Register" to
unfreeze the PHB before it performs MMIO access on the PHB. However, if
the FIR/NFIR registers are set on entry to this function, resetting
the "ETU Reset Register" won't unfreeze the PHB and it will remain
fenced. This leads to failure during the initial CRESET of the PHB,
as MMIO access is still not enabled, and an error message of the form
below is logged:
PHB#0000[0:0]: Initializing PHB4...
PHB#0000[0:0]: Default system config: 0xffffffffffffffff
PHB#0000[0:0]: New system config : 0xffffffffffffffff
PHB#0000[0:0]: Initial PHB CRESET is 0xffffffffffffffff
PHB#0000[0:0]: Waiting for DLP PG reset to complete...
<snip>
PHB#0000[0:0]: Timeout waiting for DLP PG reset !
PHB#0000[0:0]: Initialization failed
This is especially seen during the MPIPL flow, where the SBE quiesces
and fences the PHB so that it doesn't stomp on main memory. However,
when skiboot enters phb4_probe_stack() after MPIPL, the FIR/NFIR
registers are still set, forcing the PHB to re-enter the fence after
the ETU reset is done.
To fix this issue, the patch adds xscom writes to phb4_probe_stack()
that reset the FIR/NFIR registers before performing the ETU reset,
enabling MMIO access to the PHB.
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Tested-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
This patch updates do_capp_recovery_scoms() to poll the CAPP
Err/Status control register, check whether CAPP-Recovery completed or
failed based on bits 1, 5 and 9, and then proceed with the
CAPP-Recovery scoms only if recovery completed successfully. This
prevents cases where we bring up the PCIe link while the recovery
sequencer on CAPP is still busy casting out cache lines.
If CAPP-Recovery didn't complete successfully, an error is returned
from do_capp_recovery_scoms(), asking phb4_creset() to keep the PHB4
fenced and mark it as broken.
The loop that polls the Err/Status register will also log an error
on the PHB when it runs for more than 168ms, which is the maximum
time to failure for CAPP-Recovery.
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Reviewed-by: Alastair D'Silva <alastair@d-silva.org>
Acked-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
The previous fix in a8e6cc3f4 only addressed half of the problem, as
we could also get an invalid value for start, causing us to fail
in a weird way.
This was caught by the testcases.OpTestHMIHandling.HMI_TFMR_ERRORS
test in op-test-framework.
You'd get to this part of the test and get the erroneous lock
spinning warnings:
PATH=/usr/local/sbin:$PATH putscom -c 00000000 0x2b010a84 0003080000000000
0000080000000000
[ 790.140976993,4] WARNING: Lock has been spinning for 790275ms
[ 790.140976993,4] WARNING: Lock has been spinning for 790275ms
[ 790.140976918,4] WARNING: Lock has been spinning for 790275ms
This patch checks the validity of timebase before setting start,
and only checks the lock timeout if we got a valid start value.
Fixes: a8e6cc3f47525f86ef1d69d69a477b6264d0f8ee
Fixes: 84186ef0944c9413262f0974ddab3fb1343ccfe8
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Tested-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
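The fix can be pictured as recording whether the start value was trustworthy and refusing to judge a timeout otherwise. A minimal sketch, assuming hypothetical names and a caller that knows whether the timebase is valid; it also keeps the check on the current timebase that the earlier fix addressed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state for one spinning lock acquisition. */
struct spin_state {
	bool start_valid;	/* was the timebase valid when we began? */
	uint64_t start;		/* timebase at start, only meaningful if valid */
	bool warned;		/* warn at most once per acquisition */
};

/* Record the start time only if the timebase can be trusted. */
void spin_begin(struct spin_state *s, bool tb_valid, uint64_t tb_now)
{
	s->start_valid = tb_valid;
	s->start = tb_valid ? tb_now : 0;
	s->warned = false;
}

/* Returns true once when we have spun for at least 'timeout' ticks.
 * Never fires if either the start value or the current timebase is
 * untrustworthy, avoiding bogus "spinning for 790275ms" warnings. */
bool spin_check_timeout(struct spin_state *s, bool tb_valid,
			uint64_t tb_now, uint64_t timeout)
{
	if (!s->start_valid || !tb_valid || s->warned)
		return false;
	if (tb_now - s->start < timeout)
		return false;
	s->warned = true;
	return true;
}
```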
| |
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
(cherry picked from commit a6c62330f8b08032434f7cf9587ac5bfb79ffe91)
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
We play funny business with printf format specifiers because
of how we do unit tests.
Fixes: c32943bfc1e254176ecab564fdb4752403a48cab
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
Stack allocation first allocates a memory region sized to hold stacks
for all possible CPUs up to the maximum PIR of the architecture, zeros
the region, then initialises all stacks. Max PIR is 32768 on POWER9,
which works out to 512MB of stacks.
The stack region is then shrunk after CPUs are discovered, but this is
a bit of a hack, and it leaves a hole in the memory allocation regions
as it's done after mem regions are initialised.
0x000000000000..00002fffffff : ibm,os-reserve - OS
0x000030000000..0000303fffff : ibm,firmware-code - OPAL
0x000030400000..000030ffffff : ibm,firmware-heap - OPAL
0x000031000000..000031bfffff : ibm,firmware-data - OPAL
0x000031c00000..000031c0ffff : ibm,firmware-stacks - OPAL
*** gap ***
0x000051c00000..000051d01fff : ibm,firmware-allocs-memory@0 - OPAL
0x000051d02000..00007fffffff : ibm,firmware-allocs-memory@0 - OS
0x000080000000..000080b3cdff : initramfs - OPAL
0x000080b3ce00..000080b7cdff : ibm,fake-nvram - OPAL
0x000080b7ce00..0000ffffffff : ibm,firmware-allocs-memory@0 - OS
This change moves zeroing into the per-cpu stack setup. The boot CPU
stack is set up based on the current PIR. Then the size of the stack
region is set, by discovering the maximum PIR of the system from the
device tree, before mem regions are initialised.
This results in all memory being accounted within memory regions,
and less memory fragmentation of OPAL allocations.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
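The sizing arithmetic is simple enough to check directly. 512MB spread over 32768 PIRs works out to 16KB per stack; the constant below is derived from that arithmetic rather than read from skiboot's source, and the function name is hypothetical.

```c
#include <stdint.h>

/* 512MB / 32768 PIRs = 16KB per stack, i.e. 1 << 14 (derived, not quoted). */
#define STACK_SHIFT_SKETCH 14
#define STACK_SIZE_SKETCH  (UINT64_C(1) << STACK_SHIFT_SKETCH)

/* Size of a region holding one stack for every PIR in [0, max_pir]. */
uint64_t stack_region_size(uint32_t max_pir)
{
	return ((uint64_t)max_pir + 1) * STACK_SIZE_SKETCH;
}
```

The same arithmetic explains the memory map: the gap runs from 0x31c10000 to 0x51bfffff, which is 512MB minus the 64KB (four 16KB stacks) left in ibm,firmware-stacks after shrinking. Sizing from the discovered max PIR before mem regions are set up avoids creating that hole.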
| |
This can happen under mambo, at least.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
nvram_reformat() sets nvram_valid = true, but it does not set
skiboot_part_hdr. Call nvram_validate() instead, which sets
everything up properly.
Reviewed-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
This improves the security and predictability of the fast reboot
environment.
There cannot be a secure fence between fast reboots, because a
malicious OS can modify the firmware itself. However, a well-behaved
OS can reasonably expect that OS memory regions it has modified will
be cleared upon fast reboot.
The memory is zeroed after all other CPUs come up from fast reboot,
just before the new kernel is loaded and booted into. This allows
image preloading to run concurrently, and will allow parallelisation
of the clearing in future.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
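Conceptually, the clearing step walks the memory-region list and zeroes whatever the OS owns. This is a minimal sketch under that assumption; the types and names are hypothetical, and real firmware works on physical ranges (and would parallelise this later) rather than on buffers it can index directly.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified region list; skiboot's mem_region is richer. */
enum region_owner { REGION_OS, REGION_OPAL };

struct region_sketch {
	enum region_owner owner;
	unsigned char *base;
	size_t len;
};

/* Zero every OS-owned region, leaving OPAL's own regions intact. */
void clear_os_regions(struct region_sketch *r, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (r[i].owner == REGION_OS)
			memset(r[i].base, 0, r[i].len);
}
```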
| |
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
Run the mem_region sanity checkers before proceeding with fast
reboot.
This is the beginning of proactive sanity checks on OPAL data
for fast reboot (which complements the reactive disable_fast_reboot
cases). Re-using and sharing any kind of debug code and unit test
code here is encouraged.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
Peer-to-peer GPU bandwidth latency testing has produced some tunable
values that improve performance. Add them to our device initialization.
File these under things that need to be cleaned up with nice #defines
for the register names and bitfields when we get time.
A few of the settings are dependent on the system's particular NVLink
topology, so introduce a helper to determine how many links go to a
single GPU.
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
This gets used elsewhere to index items in the XTS tables.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
[arbab@linux.vnet.ibm.com: Added commit log]
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
Due to the nature of debugging npu2 issues, folk are wanting the
full list of NPU2 registers dumped when there's a problem.
We have to list out each register as traversing the range
triggers FIR bits that confuse PRD.
Suggested-by: Ryan Black <rblack@us.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
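The difference between sweeping a range and walking an explicit list can be sketched like this. The register offsets are hypothetical placeholders, and the demo reader only exists to show which addresses get touched.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical offsets; the real list is spelled out in the npu2 code. */
static const uint64_t npu2_dump_regs[] = {
	0x000, 0x008, 0x010, 0x100, 0x108,
};
#define NPU2_NDUMP (sizeof(npu2_dump_regs) / sizeof(npu2_dump_regs[0]))

/* Read and print only the listed registers instead of every offset in a
 * range, since touching unlisted addresses can set FIR bits. */
void npu2_dump_sketch(uint64_t (*read_reg)(uint64_t off))
{
	for (size_t i = 0; i < NPU2_NDUMP; i++)
		printf("NPU2 0x%03llx = 0x%016llx\n",
		       (unsigned long long)npu2_dump_regs[i],
		       (unsigned long long)read_reg(npu2_dump_regs[i]));
}

/* Demo reader that records which offsets were accessed. */
static uint64_t touched[NPU2_NDUMP];
static size_t ntouched;

uint64_t demo_read(uint64_t off)
{
	if (ntouched < NPU2_NDUMP)
		touched[ntouched++] = off;
	return off * 2;
}
```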
| |
This reverts commit fbdc91e693fc3103f7e2a65054ed32bfb26a2e17.
We don't need this, as we need to do it a different way, with an
explicit set of registers; otherwise we trip other random FIR bits and
everything becomes even more terrible.
I suggest alcohol.
Cc: stable
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
Fixes: CID 263056 and 263052
Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
Fixes: ac4272bf ("fast-reboot: occ: Delete OCC child nodes in /ibm,opal/power-mgt")
Fixes: CID 263053
Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
I also couldn't ignore that the same function had both a void *infile
and a char *inFile. The inFile variable is clearly a filename, so why
not call it that?
Fixes: CID 263054 and 263051
Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
Fixes: CID 264267
Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
The following pattern exists in several npu2 functions:
struct phb *phb = pci_get_phb(phb_id);
struct npu2 *p = phb_to_npu2_nvlink(phb);
The problem is that pci_get_phb() can return NULL and
phb_to_npu2_nvlink() dereferences its parameter. Coverity says that the
return value of pci_get_phb() is checked 43 out of 46 times which
suggests we should be more careful.
Furthermore, functions with the badly placed call to
phb_to_npu2_nvlink() do seem to check that the return value of
pci_get_phb() isn't NULL, but this check would be too little, too late.
This patch just moves the call of phb_to_npu2_nvlink() to after the
NULL check for the return value of pci_get_phb().
Affected functions are:
opal_npu_map_lpar()
opal_npu_init_context()
opal_npu_destroy_context()
Fixes: CID 264274, 264273, 264272, 264271, 264266, 264265
Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
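The corrected ordering looks roughly like this. The structs and lookup helpers are minimal stand-ins invented for illustration (they are not skiboot's definitions), and the return codes are assumed to mirror OPAL's.

```c
#include <stddef.h>
#include <stdint.h>

#define OPAL_SUCCESS	0	/* assumed to mirror opal-api.h */
#define OPAL_PARAMETER	-1

/* Minimal stand-ins for illustration only. */
struct npu2 { int index; };
struct phb { uint64_t id; struct npu2 *npu; };

static struct phb *phb_table[4];

/* Like pci_get_phb(), this may return NULL. */
static struct phb *pci_get_phb_sketch(uint64_t phb_id)
{
	return phb_id < 4 ? phb_table[phb_id] : NULL;
}

/* Like phb_to_npu2_nvlink(), this dereferences its argument. */
static struct npu2 *phb_to_npu2_sketch(struct phb *phb)
{
	return phb->npu;
}

/* Broken order (before the fix):
 *   phb = pci_get_phb(id); p = phb_to_npu2_nvlink(phb);  <- NULL deref
 *   if (!phb) return OPAL_PARAMETER;                    <- too late
 * Fixed order: check first, convert second. */
int64_t opal_npu_op_sketch(uint64_t phb_id)
{
	struct phb *phb = pci_get_phb_sketch(phb_id);
	struct npu2 *p;

	if (!phb)
		return OPAL_PARAMETER;
	p = phb_to_npu2_sketch(phb);
	return p ? OPAL_SUCCESS : OPAL_PARAMETER;
}
```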
| |
Both scale_sensor() and scale_energy() take the value to scale as a
pointer. These functions do not NULL check the pointer before the first
time they dereference it, which is fine since passing NULL would be
completely pointless.
Both functions do perform a pointless NULL check later on. This
confuses Coverity and really doesn't make much sense at all. Since
calling these functions with NULL as the sensor parameter makes no
sense, and there's currently a dereference before the check, just
remove the check.
Fixes: CID 264276 and 264275
Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
Rebooting a BMC can take 70 seconds. Skiboot cannot possibly spin for
70 seconds waiting for a BMC to come back. This also makes the current
default of 30 seconds a bit pointless: it is far too short to be a
worst-case wait time but too long to avoid hitting hardlockup detectors
and wreaking havoc inside host Linux.
Just change it to three seconds so that host Linux will survive; reads
and writes will fail, but at least the host stays up.
Also refactored the waiting loop just a bit so that it's easier to read.
Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Tested-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
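The refactored wait can be pictured as a bounded poll. This is a minimal sketch, assuming hypothetical done/now callbacks in place of skiboot's own timebase and mbox polling helpers; the fake BMC only exists so the loop can be exercised.

```c
#include <stdbool.h>
#include <stdint.h>

#define MBOX_TIMEOUT_MS 3000	/* three seconds, as in the commit */

typedef bool (*done_fn)(void *ctx);
typedef uint64_t (*now_ms_fn)(void *ctx);

/* Poll until the BMC responds or the timeout expires.
 * Returns 0 on response, -1 on timeout (caller fails the read/write). */
int mbox_wait_sketch(done_fn done, now_ms_fn now, void *ctx)
{
	uint64_t start = now(ctx);

	while (!done(ctx)) {
		if (now(ctx) - start >= MBOX_TIMEOUT_MS)
			return -1;
	}
	return 0;
}

/* Fake BMC: time advances 10ms per clock read; it answers at responds_at_ms. */
struct fake_bmc {
	uint64_t clock_ms;
	uint64_t responds_at_ms;
};

static bool fake_done(void *ctx)
{
	struct fake_bmc *b = ctx;

	return b->clock_ms >= b->responds_at_ms;
}

static uint64_t fake_now(void *ctx)
{
	struct fake_bmc *b = ctx;

	b->clock_ms += 10;
	return b->clock_ms;
}
```

A BMC that answers after 500ms succeeds, while one that is rebooting (70 seconds) is abandoned after about three seconds instead of stalling the host.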
| |
Bugs present in the BMC daemon mean that skiboot gets presented with
mbox windows of size zero. These windows cannot be valid and skiboot
already detects these conditions.
Currently skiboot warns quite strongly about the occurrence of these
problems. The problem for skiboot is that it doesn't take any action.
Initially I wanted to avoid putting policy like this into skiboot, but
since these bugs aren't going away and skiboot barfing is leading to
lockups and ultimately the host going down, something needs to be done.
I propose that when we detect the problem we fail the mbox call and punt
the problem back up to Linux. I don't like it but at least it will cause
errors to cascade and won't bring the host down. I'm not sure how Linux
is supposed to detect this or what it can even do but this is better
than a crash.
Diagnosing a failure to boot if skiboot itself fails to read flash may
be marginally more difficult with this patch. This is because skiboot
will now only print one warning about the zero-sized window rather than
continuously spitting it out.
Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Tested-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
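The proposed policy (fail the call, warn once) fits in a few lines. The window struct and function name are hypothetical, and fprintf() stands in for skiboot's own logging.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical window descriptor as returned by the BMC. */
struct mbox_window {
	uint64_t offset;
	uint64_t size;
};

static bool warned_zero_window;	/* print the warning only once */

/* Returns 0 if the window is usable, or -1 so the caller fails the flash
 * request back to Linux instead of retrying (and warning) forever. */
int check_window_sketch(const struct mbox_window *w)
{
	if (w->size != 0)
		return 0;
	if (!warned_zero_window) {
		fprintf(stderr, "mbox: BMC returned a zero-sized window\n");
		warned_zero_window = true;
	}
	return -1;
}
```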
| |
This means that we no longer hit this bug if we fail to get valid pstates
from the OCC.
[console-pexpect]#echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear
echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear
[ 94.019971181,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8
[ 94.020098392,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8
[ 10.318805] Disabling lock debugging due to kernel taint
[ 10.318808] Severe Machine check interrupt [Not recovered]
[ 10.318812] NIP [000000003003e434]: 0x3003e434
[ 10.318813] Initiator: CPU
[ 10.318815] Error type: Real address [Load/Store (foreign)]
[ 10.318817] opal: Hardware platform error: Unrecoverable Machine Check exception
[ 10.318821] CPU: 117 PID: 2745 Comm: sh Tainted: G M 4.15.9-openpower1 #3
[ 10.318823] NIP: 000000003003e434 LR: 000000003003025c CTR: 0000000030030240
[ 10.318825] REGS: c00000003fa7bd80 TRAP: 0200 Tainted: G M (4.15.9-openpower1)
[ 10.318826] MSR: 9000000000201002 <SF,HV,ME,RI> CR: 48002888 XER: 20040000
[ 10.318831] CFAR: 0000000030030258 DAR: 394a00147d5a03a6 DSISR: 00000008 SOFTE: 1
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
[Additional changes from Shilpa]
Reviewed-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| | |
Make gard display show that a record is cleared
| | |
When clearing gard records, Hostboot only modifies the record_id
portion to be 0xFFFFFFFF. The remainder of the entry remains.
Without this change, users may not realise that the record they are
looking at is no longer valid.
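Detecting a cleared record only needs the record_id check described above. This sketch uses a hypothetical subset of the record (host byte order for simplicity; on-flash fields would need endian conversion) and hypothetical helper names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of a gard record; only record_id matters here. */
struct gard_record_sketch {
	uint32_t record_id;
	uint32_t errlog_eid;
};

#define GARD_RECORD_CLEARED 0xFFFFFFFFu	/* what Hostboot writes on clear */

bool gard_record_is_cleared(const struct gard_record_sketch *r)
{
	return r->record_id == GARD_RECORD_CLEARED;
}

/* Label to show next to each record in a listing. */
const char *gard_record_state(const struct gard_record_sketch *r)
{
	return gard_record_is_cleared(r) ? "cleared" : "active";
}
```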
| | |
matching slots"
This reverts commit bda7cc4d0354eb3f66629d410b2afc08c79f795f.
Ben says:
It's on purpose that we do NOT compare the bus numbers,
they are always 0 in the slot table
we do a hierarchical walk of the tree, matching only the
devfn's along the way bcs the bus numbering isn't fixed
this breaks all slot naming etc... stuff on anything using
the "skiboot" slot tables (P8 opp typically)
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
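The hierarchical walk Ben describes compares devfn values hop by hop and deliberately ignores bus numbers. A minimal sketch with a hypothetical path representation:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical (bus, devfn) pair for one hop of a path from the root. */
struct pci_hop {
	uint8_t bus;
	uint8_t devfn;
};

/* Match a slot-table path against a device path by devfn only: slot
 * tables carry bus 0, and real bus numbering isn't fixed. */
bool slot_path_matches(const struct pci_hop *table,
		       const struct pci_hop *dev, size_t depth)
{
	for (size_t i = 0; i < depth; i++)
		if (table[i].devfn != dev[i].devfn)
			return false;
	return true;
}
```

Adding a bus comparison here would make every table entry (bus 0) fail to match real devices, which is the breakage the revert addresses.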
| | |
In case of error, opal_xive_set_vp_info() will return without
unlocking the xive object. This is most certainly a typo.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
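A common way to avoid this class of bug is a single exit label, so every error path after taking the lock goes through the unlock. The lock and function below are hypothetical stand-ins, and the return codes are assumed to mirror OPAL's.

```c
#include <stdint.h>

#define OPAL_SUCCESS	0	/* assumed to mirror opal-api.h */
#define OPAL_PARAMETER	-1

/* Stand-in lock that counts holders, so a leaked lock is visible. */
struct lock_sketch { int held; };
static void lock_take(struct lock_sketch *l) { l->held++; }
static void lock_drop(struct lock_sketch *l) { l->held--; }

/* Single exit point: every path after lock_take() reaches lock_drop(). */
int64_t set_vp_info_sketch(struct lock_sketch *l, uint64_t vp, uint64_t nvps)
{
	int64_t rc = OPAL_SUCCESS;

	lock_take(l);
	if (vp >= nvps) {
		rc = OPAL_PARAMETER;	/* error path still unlocks */
		goto out;
	}
	/* ... update the VP state here ... */
out:
	lock_drop(l);
	return rc;
}
```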
| | |
Major changes in the NPU between DD1 and DD2 necessitated a fair bit of
revision-specific code.
Now that all our lab machines are DD2, we no longer test anything on DD1
and it's time to get rid of it.
Remove DD1-specific code and abort probe if we're running on a DD1 machine.
Cc: Alistair Popple <alistair@popple.id.au>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-By: Alistair Popple <alistair@popple.id.au>
Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| | |
Trivial cleanup of two unused fields in struct npu2.
Cc: Alistair Popple <alistair@popple.id.au>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-By: Alistair Popple <alistair@popple.id.au>
Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| | |
This disables fast reboot in several more cases where serious errors
like lock corruption or call re-entrancy are detected.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| | |
This allows a small number of OPAL calls to succeed despite re-entering
the firmware, and rejects others rather than aborting.
This allows a system reset interrupt that interrupts OPAL to do something
useful: sreset other CPUs, use the console (which allows xmon to work or
stack traces to be printed), or reboot the system.
Use OPAL_INTERNAL_ERROR when rejecting, rather than OPAL_BUSY, which is
used for many other things that do not mean a serious permanent error.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
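The re-entry policy amounts to a whitelist check at the OPAL entry point. In this sketch the token numbers and the particular set of allowed calls are hypothetical (chosen to match the uses listed above), and the return codes are assumed to mirror OPAL's.

```c
#include <stdbool.h>
#include <stdint.h>

#define OPAL_SUCCESS		0	/* assumed to mirror opal-api.h */
#define OPAL_INTERNAL_ERROR	-11

/* Hypothetical token numbers for illustration. */
enum { TOK_CONSOLE_WRITE = 1, TOK_SIGNAL_SYSTEM_RESET, TOK_CEC_REBOOT,
       TOK_OTHER };

static bool reentry_allowed(uint64_t token)
{
	switch (token) {
	case TOK_CONSOLE_WRITE:		/* xmon, stack traces */
	case TOK_SIGNAL_SYSTEM_RESET:	/* sreset other CPUs */
	case TOK_CEC_REBOOT:		/* reboot the system */
		return true;
	default:
		return false;
	}
}

/* Decide what to do when a CPU enters OPAL while already inside it:
 * let whitelisted calls through, reject the rest instead of aborting. */
int64_t opal_reentry_sketch(uint64_t token, bool in_opal)
{
	if (!in_opal || reentry_allowed(token))
		return OPAL_SUCCESS;	/* proceed with the call */
	return OPAL_INTERNAL_ERROR;
}
```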
| | |
The stack is already destroyed by the time we get here, so there
is not much point continuing.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| | |
We've been carting around this field since the original p7ioc-phb code.
As far as I can tell we never actually use it for anything other than
checking if the PHB has been marked as broken or not. The _FENCED
state is set in a few places, but we never use it in favour of just
checking the MMIO register.
This patch just replaces it with a boolean that indicates if
the PHB has been marked as broken and removes the giant, mostly
wrong, comment explaining its usage that is copied and pasted
into each PHB header file.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| | |
Using the DGEMM benchmark, we observed a 5-9% throughput drop with
stop4/5 enabled compared to without. In this benchmark the GPU waits on
the CPU to wake up and provide the subsequent data block to compute. The
wakeup latency accumulates over the run and shows up as a performance drop.
Linux enters stop4/5 more aggressively than its wakeup latency warrants.
Increasing the residency from 1ms to 10ms reduces the performance drop to <1%.
Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Tested-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| | |
Requested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| | |
We coded a few workarounds in the special wakeup logic to handle
buggy firmware. Now that the firmware is fixed, remove them, as they
break the special wakeup protocol. As per the spec, we should not
de-assert before the assert is complete, so follow this protocol.
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Tested-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
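The protocol rule (never de-assert while the assert is still in flight) can be illustrated against a simulated register. Everything here is hypothetical: the simulation only models "completion after N polls" and flags a violation if the request is dropped early; it does not describe the real SCOM interface or what skiboot does on timeout.

```c
#include <stdbool.h>

/* Simulated special-wakeup handshake (hypothetical, for illustration). */
struct spwkup_sim {
	bool asserted;			/* request as last written */
	int polls_until_done;		/* polls before completion shows */
	bool protocol_violation;	/* de-asserted before completion */
};

static void spwkup_write(struct spwkup_sim *s, bool req)
{
	if (!req && s->asserted && s->polls_until_done > 0)
		s->protocol_violation = true;
	s->asserted = req;
}

static bool spwkup_done(struct spwkup_sim *s)
{
	if (s->polls_until_done > 0)
		s->polls_until_done--;
	return s->polls_until_done == 0;
}

/* Assert, then poll for completion without ever de-asserting early.
 * Returns 0 once done, -1 if max_polls ran out (request left asserted). */
int special_wakeup_assert(struct spwkup_sim *s, int max_polls)
{
	spwkup_write(s, true);
	for (int i = 0; i < max_polls; i++)
		if (spwkup_done(s))
			return 0;
	return -1;
}

/* Drop the request; only call after special_wakeup_assert() returned 0. */
void special_wakeup_release(struct spwkup_sim *s)
{
	spwkup_write(s, false);
}
```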
| | |
Fast reboot does not yet work right with the NPU. It's been disabled on
NVLink and OpenCAPI machines. Do the same for NVLink2.
This amounts to a port of 3e4577939bbf ("npu: Fix broken fast reset")
from the npu code to npu2.
Cc: stable # 5.10.x
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
Signed-off-by: Artem Senichev <a.senichev@yadro.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>