| Commit message (Collapse) | Author | Age | Files | Lines |
| |
Reorder our link training steps so that they are executed on
fundamental reset instead of during the initial setup. Skiboot always
calls a fundamental reset on all the PHBs during pci init.
It is done through a state machine, similarly to what is done for
'real' PHBs.
This is the first step toward a longer-term goal of being able to trigger
an adapter reset from Linux. We'll need the reset callbacks of the PHB to
be defined. We have to handle the various delays differently, since a
Linux thread shouldn't stay stuck waiting in OPAL for too long.
No functional changes.
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
Rework the code that resets the opencapi adapter a bit:
- make clearer which i2c pin is resetting which device
- break the reset operation into smaller chunks. This is really to
prepare for a future patch.
No functional changes.
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
Presence detection is not part of the opencapi specification. So each
platform may choose to implement it the way it wants.
All current platforms implement it through an i2c device where we can
query a pin to know if a device is connected or not. ZZ and Zaius have
a similar design and even use the same i2c information and pin
numbers.
However, presence detection on older ZZ planar (older than v4) doesn't
work, so we don't activate it for now, until our lab systems are
upgraded and it's better tested.
Presence detection on witherspoon is still being worked on. It's
shaping up to be quite different, so we may have to revisit the topic
in a later patch.
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
Currently if Linux boots with a non-zero PCR, things can go bad where
some early userspace programs can take illegal instructions. This is
being fixed in Linux, but in the meantime, we should also clean up in
skiboot.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
The BMC Get Device ID command gives BMC firmware version details. Let's add this
to the device tree. User space tools will use this information to display BMC
version details.
Stewart,
I have added the BMC information under the /ibm,firmware-version node, as it is a
firmware version. But maybe we should add a new node (/bmc/firmware), so that we can
keep BMC-related information separate. Let me know your thoughts on this.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
hw/fsp/fsp.c:1011:17: warning: passing an object that undergoes default argument promotion to
'va_start' has undefined behavior [-Wvarargs]
va_start(list, add_words);
^
hw/fsp/fsp.c:1007:59: note: parameter of type 'u8' (aka 'unsigned char') is declared here
void fsp_fillmsg(struct fsp_msg *msg, u32 cmd_sub_mod, u8 add_words, ...)
^
[CC] platforms/ibm-fsp/apollo-pci.o
hw/fsp/fsp.c:1026:17: warning: passing an object that undergoes default argument promotion to
'va_start' has undefined behavior [-Wvarargs]
va_start(list, add_words);
^
hw/fsp/fsp.c:1016:47: note: parameter of type 'u8' (aka 'unsigned char') is declared here
struct fsp_msg *fsp_mkmsg(u32 cmd_sub_mod, u8 add_words, ...)
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
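The warning above comes from C's default argument promotion rules: a `u8` is promoted to `int` when passed through `...`, so it cannot portably be the parameter named in `va_start`. A minimal sketch of one common way to avoid it follows, assuming the last named parameter is simply widened to `unsigned int`. The function `sum_words` is hypothetical and is not the skiboot `fsp_fillmsg`; this message does not say which fix was actually applied.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a skiboot function): adds `add_words` variadic
 * words to `cmd_sub_mod`. The last named parameter is an unsigned int,
 * a type that default argument promotion leaves unchanged, so va_start()
 * is well defined. Declaring it as uint8_t would trigger -Wvarargs.
 */
static uint32_t sum_words(uint32_t cmd_sub_mod, unsigned int add_words, ...)
{
	va_list list;
	uint32_t total = cmd_sub_mod;
	unsigned int i;

	/* Keep the value range of the original u8 parameter. */
	assert(add_words <= 255);

	va_start(list, add_words);
	for (i = 0; i < add_words; i++)
		total += va_arg(list, unsigned int);
	va_end(list);

	return total;
}
```

Callers pass the count exactly as before, for example `sum_words(10u, 3u, 1u, 2u, 3u)`; only the declared type of the counter changes.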
| |
Clang doesn't know about msgsnd, msgclr, msgsync yet. Open code them
using .long asm() calls.
Instead of introducing ifdef hell, do this unconditionally for all
compilers as the code generation does not change.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
SBE on P9 provides one shot programmable timer facility. We can use this
to implement OPAL timers and hence limit the reliance on the Linux
heartbeat (similar to HW timer facility provided by SLW on P8).
Design:
- We will continue to run Linux heartbeat.
- Each chip has SBE. This patch always schedules timer on SBE on master chip.
- Start timer option starts new timer or modifies an active timer for the
specified timeout.
- SBE expects timeout value in microseconds. We track timeout value in TB.
Hence we convert tb to microseconds before sending request to SBE.
- We are requesting ack from SBE for timer message. It guarantees that
SBE has scheduled timer.
- Disabling SBE timer
We expect SBE to send timer expiry interrupt whenever timer expires. We
wait for 10 more ms before disabling timer.
In future we can consider below alternative approaches:
- Presently SBE timer disable is permanent (until we reboot system).
SBE sends "I'm back" interrupt after reset. We can consider restarting
timer after SBE reset.
- Reset SBE and start timer again.
- Each chip has SBE. On multi chip system we can try to schedule timer
on different chip.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
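The timebase-to-microseconds conversion mentioned in the design notes can be sketched as below. This assumes a 512 MHz timebase, which is the usual POWER value but is not stated in this message, and it rounds up so that a timer is never programmed to expire early (a choice made for this sketch). The names `TB_TICKS_PER_USEC` and `tb_to_usecs_roundup` are illustrative.

```c
#include <stdint.h>

/* Assumption: 512 MHz timebase, i.e. 512 timebase ticks per microsecond. */
#define TB_TICKS_PER_USEC	512ULL

/*
 * Convert a timeout expressed in timebase ticks to microseconds,
 * rounding up so the requested delay is never shortened.
 */
static uint64_t tb_to_usecs_roundup(uint64_t tb)
{
	return tb / TB_TICKS_PER_USEC + (tb % TB_TICKS_PER_USEC != 0);
}
```

With these assumptions, one second of timebase (512000000 ticks) converts to 1000000 microseconds.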
| |
Let's move P8 timer support code from slw.c to sbe-p8.c (as suggested
by BenH). There is a difference between timer support in P8 and P9.
Hence I think it makes sense to name it as sbe-p8.c.
Note that this is pure code movement and renaming functions/variables.
No functionality changes.
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
SBE (Self Boot Engine) on P9 has two different jobs:
- Boot the chip up to the point the core is functional
- Provide various services like timer, scom, stash MPIPL, etc., at runtime
OPAL can communicate with SBE via a set of data and control registers provided
by the PSU block in the P9 chip.
- Four 8 byte registers for Host to send command packets to SBE
- Four 8 byte registers for SBE to send response packets to Host
- Two doorbell registers (1 on each side) to alert either party
when data is placed in above mentioned data register
Protocol constraints:
Only one command is accepted in the command buffer until the response for the
command is enqueued in the response buffer by SBE.
Usage:
We will use SBE for various purposes like timer, MPIPL, etc.
This patch implements the SBE MBOX spec for OPAL to communicate with
SBE.
Design consideration:
- Each chip has SBE. We need to track SBE messages per chip. Hence added
per chip sbe structure and list of messages to that chip
- SBE accepts only one command at a time. Hence serialized MBOX commands.
- OPAL gets interrupted once SBE sets doorbell register
- OPAL has to clear doorbell register after reading response
- Every command class has timeout option. Timed out messages are discarded
- SBE MBOX commands can be classified into four types:
- Those that must be sent to the master only (ex: sending MDST/MDDT info)
- Those that must be sent to slaves only (ex: continue MPIPL)
- Those that must be sent to all chips (ex: close insecure window)
- Those that can be sent to any chip (ex: timer)
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
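The "one command at a time" constraint and the per-chip message list can be sketched as a small queue with a single in-flight slot. This is only an illustration under stated assumptions: the structure, the fixed queue depth, and the function names are hypothetical, commands are reduced to integer ids, and the real register accesses, locking, and timeout handling are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_DEPTH	8	/* hypothetical fixed depth for the sketch */

/* Hypothetical per-chip state: pending commands plus one in-flight slot. */
struct sbe_chip {
	int queue[QUEUE_DEPTH];	/* pending command ids, in FIFO order */
	size_t head;		/* index of the oldest pending command */
	size_t count;		/* number of pending commands */
	bool in_flight;		/* true while SBE has not yet responded */
	int current;		/* command id currently owned by SBE */
};

/* Hand the next pending command to SBE if none is in flight; -1 if idle. */
static int sbe_try_send(struct sbe_chip *c)
{
	if (c->in_flight || c->count == 0)
		return -1;
	c->current = c->queue[c->head];
	c->head = (c->head + 1) % QUEUE_DEPTH;
	c->count--;
	c->in_flight = true;	/* here the real code would ring the doorbell */
	return c->current;
}

/* Queue a command for this chip; returns false if the queue is full. */
static bool sbe_queue(struct sbe_chip *c, int cmd)
{
	assert(c != NULL);
	if (c->count == QUEUE_DEPTH)
		return false;
	c->queue[(c->head + c->count) % QUEUE_DEPTH] = cmd;
	c->count++;
	sbe_try_send(c);
	return true;
}

/* Called on the response doorbell (or a timeout): start the next command. */
static int sbe_complete(struct sbe_chip *c)
{
	c->in_flight = false;
	return sbe_try_send(c);
}
```

Keeping one such structure per chip keeps commands to different chips independent while serializing commands to the same SBE.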
| |
There are two sets of core temperature sensors today. One is DTS scom
based core temperature sensors and the second group is the sensors
provided by OCC. DTS is the highest temperature among the different
temperature zones in the core while OCC core temperature sensors are
the average temperature of the core. DTS sensors are read directly by
the host by SCOMing the DTS sensors while OCC sensors are read and
updated by OCC to main memory.
Reading DTS sensors by SCOMing is a heavy and slower operation as
compared to reading OCC sensors which is as good as reading memory.
So don't add DTS sensors when OCC sensors are available.
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
This detects OPAL being re-entered by the OS, and switches to an
emergency stack if it was. This protects the firmware's main stack
from re-entrancy and allows the OS to use NMI facilities for crash
/ debug functionality.
Further nested re-entry will destroy the previous emergency stack
and prevent returning, but those should be rare cases.
This stack is sized at 16kB, which doubles the size of CPU stacks,
so as not to introduce a regression in primary stack size. The 16kB
stack originally had a 4kB machine check stack at the top, which was
removed by 80eee1946 ("opal: Remove machine check interrupt patching
in OPAL."). So it is possible the size could be tightened again, but
that would require further analysis.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
Quiescing currently is implemented in C in opal_entry before the
opal call handler is called. This works well enough for simple
cases like fast reset when one CPU wants all others out of the way.
Linux would like to use it to prevent an sreset IPI from
interrupting firmware, which could lead to deadlocks when crash
dumping or entering the debugger. Linux interrupts do not recover
well when returning back to general OPAL code, due to r13 not being
restored. OPAL also can't be re-entered, which may happen e.g.,
from the debugger.
So move the quiesce hold/reject to entry code, before the stack or
r1 or r13 registers are switched. OPAL can be interrupted and
returned to or re-entered during this period.
This does not completely solve all such problems. OPAL will be
interrupted with sreset if the quiesce times out, and it can be
interrupted by MCEs as well. These still have the issues above.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
Put OPAL callers' r1 into the stack back chain, and then use that to
unwind back to the OPAL entry frame (as opposed to boot entry, which
has a 0 back chain).
From there, dump the OPAL call token and the caller's r1. A backtrace
looks like this:
CPU 0000 Backtrace:
S: 0000000031c03ba0 R: 000000003001a548 ._abort+0x4c
S: 0000000031c03c20 R: 000000003001baac .opal_run_pollers+0x3c
S: 0000000031c03ca0 R: 000000003001bcbc .opal_poll_events+0xc4
S: 0000000031c03d20 R: 00000000300051dc opal_entry+0x12c
--- OPAL call entry token: 0xa caller R1: 0xc0000000006d3b90 ---
This is pretty basic for the moment, but it does give you the bottom
of the Linux stack. It will allow some interesting improvements in
future.
First, with the eframe, all the call's parameters can be printed out
as well. The ___backtrace / ___print_backtrace API needs to be
reworked in order to support this, but it's otherwise very simple
(see opal_trace_entry()).
Second, it will allow Linux's stack to be passed back to Linux via
a debugging opal call. This will allow Linux's BUG() or xmon to
also print the Linux back trace in case of a NMI or MCE or watchdog
lockup that hits in OPAL.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
Due to P9 errata, HDEC parity and TB residue errors are latched for
non-zero threads 1-3 even if they are cleared. But these are not
latched on thread 0. Hence, use xscom SCOMC/SCOMD to read thread 0 tfmr
value and ignore them on non-zero threads if they are not present on
thread 0.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
There are some TOD errors which do not affect the working of TOD and TB. They
stay in a valid state. Hence we don't need a rendez-vous for TOD errors that
do not affect TB working.
TOD errors that affect TOD/TB will report a global error on TFMR[44]
along with bit 51, and they will go in the rendez-vous path as expected.
But TOD errors that do not affect the TB register set only TFMR bit 51.
The TFMR bit 51 is cleared when any single thread clears the TOD error.
Once cleared, the bit 51 is reflected to all the cores on that chip. Any
thread that reads the TFMR register after the error is cleared will see
TFMR bit 51 reset. Hence the threads that see TFMR[51]=1 fall through the
rendez-vous path and the threads that see TFMR[51]=0 return doing
nothing. This ends up in soft lockups in the host kernel.
This patch fixes this issue by not considering the TOD interrupt (TFMR[51])
as a core-global error and hence avoiding the rendez-vous path completely.
Instead, threads that see TFMR[51]=1 will now take a different path that
just does the TOD error recovery.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
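The decision described above can be sketched as a small classifier over the TFMR value. This is a simplified illustration, not the actual handler: the macro and enum names are hypothetical, only bits 44 and 51 from the message are considered, and IBM bit numbering (bit 0 is the most significant bit of the 64-bit register) is assumed.

```c
#include <stdint.h>

/* IBM bit numbering: bit 0 is the most significant bit. */
#define PPC_BIT(bit)		(0x8000000000000000ULL >> (bit))

/* Illustrative names for the two bits discussed in the message. */
#define TFMR_GLOBAL_ERROR	PPC_BIT(44)
#define TFMR_TOD_INTERRUPT	PPC_BIT(51)

enum hmi_path {
	HMI_PATH_NONE,		/* nothing to recover on this thread */
	HMI_PATH_TOD_ONLY,	/* TOD error recovery, no rendez-vous */
	HMI_PATH_RENDEZVOUS,	/* core-global error, all threads synchronize */
};

/*
 * Pick the recovery path from a TFMR snapshot. A TOD interrupt on its
 * own is not treated as a core-global error, so threads that observe
 * different values of bit 51 never wait for each other.
 */
static enum hmi_path classify_tfmr(uint64_t tfmr)
{
	if (tfmr & TFMR_GLOBAL_ERROR)
		return HMI_PATH_RENDEZVOUS;
	if (tfmr & TFMR_TOD_INTERRUPT)
		return HMI_PATH_TOD_ONLY;
	return HMI_PATH_NONE;
}
```

Because bit 51 alone never selects the rendez-vous path in this sketch, a thread that reads TFMR after another thread has cleared the TOD error simply returns without leaving its siblings waiting.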
| |
For TOD errors, all the cores in the chip get HMIs. Any one thread from any
core can fix the issue and TFMR will have the error conditions cleared. The rest
of the threads need not take any action if the TOD errors are already cleared.
Hence thread 0 of every core should get a fresh copy of TFMR before going ahead
with the recovery path. Initialize recover = -1, so that if no errors are found,
that thread need not send an HMI event to Linux. This helps to stop flooding the
host with HMI events from every thread even when no errors are found.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
This patch reworks the HMI handling for TFAC errors by introducing
4 rendez-vous points to improve the thread synchronization while handling
timebase errors that require all threads to clear dirty data from the TB/HDEC
registers before clearing the errors.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
It returns a 64-bit flags mask currently set to provide info
about which timer facilities were lost, and whether an event
was generated.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
Currently we restore PCIe bus numbers right after the link is
up. Unfortunately at this point we haven't done CRS so config space
may not be accessible.
This moves the bus number restore till after CRS has happened.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
Enables reporting of slot status information, etc in the config space of
the root complex. Currently this is only used to set the slot power
limit in our generic PCI code, but we might use it for other things
later on.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
The PCIe slot capability can be implemented in a root or switch
downstream port to set the maximum power a card is allowed to draw
from the system. This patch adds support for setting the power limit
when the platform has defined one.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
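As an illustration of what setting the limit involves, the sketch below encodes a power limit into a Slot Capabilities value. It relies on the PCIe register layout (Slot Power Limit Value in bits 14:7, Slot Power Limit Scale in bits 16:15, scale 00b meaning a 1.0x multiplier). The macro and function names are hypothetical and this is not the skiboot code; it only handles whole watts up to 239 W, since larger values use special encodings.

```c
#include <assert.h>
#include <stdint.h>

/* PCIe Slot Capabilities fields (names here are illustrative). */
#define SLOTCAP_SPL_VALUE_SHIFT	7
#define SLOTCAP_SPL_VALUE_MASK	(0xffu << SLOTCAP_SPL_VALUE_SHIFT)	/* bits 14:7 */
#define SLOTCAP_SPL_SCALE_MASK	(0x3u << 15)				/* bits 16:15 */

/*
 * Return `slotcap` with the slot power limit set to `watts`, using the
 * 1.0x scale (scale field = 00b). Values of 0xF0 and above have special
 * meanings with this scale, so the sketch rejects them.
 */
static uint32_t slotcap_set_power_limit(uint32_t slotcap, unsigned int watts)
{
	assert(watts <= 0xef);

	slotcap &= ~(SLOTCAP_SPL_VALUE_MASK | SLOTCAP_SPL_SCALE_MASK);
	slotcap |= (uint32_t)watts << SLOTCAP_SPL_VALUE_SHIFT;
	return slotcap;
}
```

For a 75 W slot, `slotcap_set_power_limit(0, 75)` yields `0x2580`, leaving all other capability bits untouched when a non-zero input is used.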
| |
Deprecate the old "opal-interrupts". It's still there, but the new
property follows the standard and allows us to specify whether an
interrupt is level or edge sensitive.
Similarly create "interrupt-names" whose content is identical to
"opal-interrupts-names".
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
| |
Hardware has limitations which would require putting a sync after each
store EOI to make sure the MMIO operations that change the ESB state
are ordered. This is a killer for performance and the PHBs do not
support the sync. So remove the store EOI for the moment, until
hardware is improved.
Also, while we are at changing the XIVE source flags, let's fix the
settings for the PHB4s which should follow these rules :
- SHIFT_BUG for DD10
- STORE_EOI for DD20 and if enabled
- TRIGGER_PAGE for DDx0 and if not STORE_EOI
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
Stack allocation first allocates a memory region sized to hold stacks
for all possible CPUs up to the maximum PIR of the architecture, zeros
the region, then initialises all stacks. Max PIR is 32768 on POWER9,
which is 512MB for stacks.
The stack region is then shrunk after CPUs are discovered, but this is
a bit of a hack, and it leaves a hole in the memory allocation regions
as it's done after mem regions are initialised.
0x000000000000..00002fffffff : ibm,os-reserve - OS
0x000030000000..0000303fffff : ibm,firmware-code - OPAL
0x000030400000..000030ffffff : ibm,firmware-heap - OPAL
0x000031000000..000031bfffff : ibm,firmware-data - OPAL
0x000031c00000..000031c0ffff : ibm,firmware-stacks - OPAL
*** gap ***
0x000051c00000..000051d01fff : ibm,firmware-allocs-memory@0 - OPAL
0x000051d02000..00007fffffff : ibm,firmware-allocs-memory@0 - OS
0x000080000000..000080b3cdff : initramfs - OPAL
0x000080b3ce00..000080b7cdff : ibm,fake-nvram - OPAL
0x000080b7ce00..0000ffffffff : ibm,firmware-allocs-memory@0 - OS
This change moves zeroing into the per-cpu stack setup. The boot CPU
stack is set up based on the current PIR. Then the size of the stack
region is set, by discovering the maximum PIR of the system from the
device tree, before mem regions are initialised.
This results in all memory being accounted within memory regions,
and less memory fragmentation of OPAL allocations.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
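The sizes quoted above follow from simple arithmetic, sketched here. The 16kB per-CPU stack size is an assumption inferred from the figures in the message (32768 stacks taking 512MB); the names are illustrative.

```c
#include <stdint.h>

/* Assumption: 16kB of stack per CPU, inferred from 512MB / 32768. */
#define STACK_SIZE	0x4000ULL

/* Size of a stack region holding one stack per possible PIR value. */
static uint64_t stack_region_size(uint64_t nr_pirs)
{
	return nr_pirs * STACK_SIZE;
}
```

With 32768 possible PIR values this gives 0x20000000 bytes (512MB), which matches the gap between 0x31c00000 and 0x51c00000 in the map above, while the 0x10000-byte ibm,firmware-stacks region that remains after shrinking corresponds to 4 stacks.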
| |
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
Run the mem_region sanity checkers before proceeding with fast
reboot.
This is the beginning of proactive sanity checks on OPAL data
for fast reboot (which complement the reactive disable_fast_reboot
cases). Re-using and sharing any kind of debug code and unit test
code for these checks is encouraged.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
This reverts commit fbdc91e693fc3103f7e2a65054ed32bfb26a2e17.
We don't need this as we need to do it a different way, with a explicit
set of registers as otherwise we trip other random FIR bits and everything
becomes even more terrible.
I suggest alcohol.
Cc: stable
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
Major changes in the NPU between DD1 and DD2 necessitated a fair bit of
revision-specific code.
Now that all our lab machines are DD2, we no longer test anything on DD1
and it's time to get rid of it.
Remove DD1-specific code and abort probe if we're running on a DD1 machine.
Cc: Alistair Popple <alistair@popple.id.au>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-By: Alistair Popple <alistair@popple.id.au>
Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
Trivial cleanup of two unused fields in struct npu2.
Cc: Alistair Popple <alistair@popple.id.au>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-By: Alistair Popple <alistair@popple.id.au>
Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
We've been carting around this field since the original p7ioc-phb code.
As far as I can tell we never actually use it for anything other than
checking if the PHB has been marked as broken or not. The _FENCED
state is set in a few places, but we never use it in favour of just
checking the MMIO register.
This patch just replaces it with a boolean that indicates if
the PHB has been marked as broken and removes the giant, mostly
wrong, comment explaining its usage that is copied and pasted
into each phb header file.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
Requested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
The XTS_PID context table is limited to 256 possible pids/contexts. To
relieve this limitation, make use of "unfiltered mode" instead.
If an entry in the XTS_BDF table has the bit for unfiltered mode set, we
can just use one context for that entire bdf/lpar, regardless of pid.
Instead of searching the XTS_PID table, the NMMU checkout request
will simply use the entry indexed by lparshort id instead.
Change opal_npu_init_context() to create these lparshort-indexed
wildcard entries (0-15) instead of allocating one for each pid. Check
that multiple calls for the same bdf all specify the same msr value.
In opal_npu_destroy_context(), continue validating the bdf argument,
ensuring that it actually maps to an lpar, but no longer remove anything
from the XTS_PID table. If/when we start supporting virtualized GPUs, we
might consider actually removing these wildcard entries by keeping a
refcount, but keep things simple for now.
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Acked-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
This adds simple deadlock detection. The detection looks for circular
dependencies in the lock requests. It will abort and display a stack trace
when a deadlock occurs.
The detection is enabled by DEBUG_LOCKS (enabled by default).
While the detection may have a slight performance overhead, as there are
not a huge number of locks in skiboot this overhead isn't significant.
Signed-off-by: Matt Brown <matthew.brown.dev@gmail.com>
[stewart: fix build with DEBUG_LOCKS off]
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
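The idea of looking for circular dependencies in lock requests can be sketched as a walk over a wait-for chain: starting from the lock being requested, follow "owner of this lock, then the lock that owner is waiting for" until the chain ends or comes back to the requester. This is a simplified model under stated assumptions, not the skiboot implementation: the table layout, sizes, and names are hypothetical, and each CPU waits on at most one lock.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPUS	4	/* hypothetical sizes for the sketch */
#define NLOCKS	4
#define NO_CPU	(-1)
#define NO_LOCK	(-1)

/* Hypothetical snapshot of who owns and who waits for what. */
struct lock_state {
	int owner[NLOCKS];	/* CPU holding each lock, or NO_CPU */
	int requested[NCPUS];	/* lock each CPU is spinning on, or NO_LOCK */
};

/*
 * Would `cpu` requesting `lock` close a cycle? Follow the chain of
 * owners and their pending requests; reaching `cpu` again means a
 * circular dependency (this also catches re-taking a lock already held).
 */
static bool would_deadlock(const struct lock_state *s, int cpu, int lock)
{
	int steps;

	assert(cpu >= 0 && cpu < NCPUS && lock >= 0 && lock < NLOCKS);

	/* A chain without a cycle visits each CPU at most once. */
	for (steps = 0; steps < NCPUS; steps++) {
		int owner = s->owner[lock];

		if (owner == NO_CPU)
			return false;	/* lock is free, chain ends */
		if (owner == cpu)
			return true;	/* chain leads back to the requester */
		lock = s->requested[owner];
		if (lock == NO_LOCK)
			return false;	/* owner is not waiting on anything */
	}
	return false;
}
```

For example, if CPU 0 holds lock 0 while CPU 1 holds lock 1 and spins on lock 0, then CPU 0 requesting lock 1 is reported as a deadlock, whereas an uninvolved CPU 2 requesting lock 1 is not.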
| |
P9 supports PCI tunneled operations (atomics and as_notify) that are
initiated by devices.
A subset of the tunneled operations require a response, that must be
sent back from the host to the device. For example, an atomic compare
and swap will return the compare status, as the swap will only be performed
in case of success. Similarly, as_notify reports if the target thread
has been woken up or not, because the operation may fail.
To enable tunneled operations, a device driver must tell the host where
it expects tunneled operation responses, by setting the PBCQ Tunnel BAR
Response register with a specific value within the range of its BARs.
This register is currently initialized by enable_capi_mode(). But, as
tunneled operations may also operate in PCI mode, a new API is required
to set the PBCQ Tunnel BAR Response register, without switching to CAPI
mode.
This patch provides two new OPAL calls to get/set the PBCQ Tunnel
BAR Response register.
Note: as there is only one PBCQ Tunnel BAR register, shared between
all the devices connected to the same PHB, only one of these devices
will be able to use tunneled operations, at any time.
Signed-off-by: Philippe Bergheaud <felix@linux.vnet.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
P9 supports PCI tunneled operations (atomics and as_notify) that require
setting the PHB ASN Compare/Mask register with a 16-bit indication.
This register is currently initialized by enable_capi_mode(). But, as
tunneled operations may also work in PCI mode, the ASN Compare/Mask
register should rather be initialized in phb4_init_ioda3().
This patch also adds "ibm,phb-indications" to the device tree, to tell
Linux the values of CAPI, ASN, and NBW indications, when supported.
Tunneled operations tested by IBM in CAPI mode, by Mellanox Technologies
in PCI mode.
Signed-off-by: Philippe Bergheaud <felix@linux.vnet.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
Add three OPAL API calls that are required by the ocxl driver.
- OPAL_NPU_SPA_SETUP
The Shared Process Area (SPA) is a table containing one entry (a
"Process Element") per memory context which can be accessed by the
OpenCAPI device.
- OPAL_NPU_SPA_CLEAR_CACHE
The NPU keeps a cache of recently accessed memory contexts. When a
Process Element is removed from the SPA, the cache for the link must be
cleared.
- OPAL_NPU_TL_SET
The Transaction Layer specification defines several templates for
messages to be exchanged on the link. During link setup, the host and
device must negotiate what templates are supported on both sides and at
what rates those messages can be sent.
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
Scan the OpenCAPI links under the NPU, and for each link, reset the card,
set up a device, train the link and register a PHB.
Implement the necessary operations for the OpenCAPI PHB type.
For bringup, test and debug purposes, we allow an NVRAM setting,
"opencapi-link-training" that can be set to either disable link training
completely or to use the prbs31 test pattern.
To disable link training:
nvram -p ibm,skiboot --update-config opencapi-link-training=none
To use prbs31:
nvram -p ibm,skiboot --update-config opencapi-link-training=prbs31
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
Unlike NVLink, which uses the pci-virt framework to fake a PCI
configuration space for NVLink devices, the OpenCAPI device model presents
us with a real configuration space handled by the device over the OpenCAPI
link.
As a result, we have to train the OpenCAPI link in skiboot before we do PCI
probing, so that config space can be accessed, rather than having link
training being triggered by the Linux driver.
Add some helper functions to wrap the existing NVLink PHY training sequence
so we can easily run it within skiboot.
Additionally, we add OpenCAPI-specific lane settings, and a function to
"bump" lanes that haven't trained properly (this process isn't documented
in the workbook, but the hardware experts assure us that this improves link
training reliability...) We also support the PRBS31 pattern that's used for
bringup and test purposes.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
Scan the device tree for NPUs with OpenCAPI links and configure the NPU per
the initialisation sequence in the NPU OpenCAPI workbook.
Training of individual links and setup of per-AFU/link configuration will
be in a later patch.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
Add a platform_ocapi struct to store platform-specific values for resetting
OpenCAPI devices via I2C and for setting up the ODL PHY.
A later patch will add this to the relevant platforms.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
Unlike NVLink, OpenCAPI registers a separate PHB for each device, in order
to allow us to force Linux to use the correct MMIO windows for each NPU
link. This requires some reworking of NPU data structures to account for
the fact that a PHB could correspond to either an NPU (NVLink) or a single
link (OpenCAPI).
At some later point, we may want to rework the NVLink code to present a
separate PHB per device in order to simplify this. For now, we split
NVLink-specific device data into a separate struct in order to make it
clear which fields are NVLink-only.
Additionally, add helper functions to correctly translate between
OpenCAPI/NVLink PHBs and the underlying structures, and various fields
for OpenCAPI data that we're going to need later on.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
Split out common helper functions for NPU register access into a separate
file, as these will be used extensively by both NVLink and OpenCAPI code.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
This is not the way we want to end up doing this.
This is a hack to make folk happy and not require crondump to
debug nvidia/npu2 issues.
Cc: stable
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
This fixes these nvidia cards training at only GEN2 speeds rather than
GEN3 by disabling PCIe lane equalisation.
Firstly we check if the card is in a whitelist. If it is and the link
has not trained optimally, retry with lane equalisation off. We do
this on all POWER9 chip revisions since this is a device issue, not
a POWER9 chip issue.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
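The retry decision described above can be sketched as follows. This is a rough illustration only: the table contents, structure, and function names are hypothetical, and the single device ID is a placeholder rather than one of the cards covered by the patch. Only the NVIDIA PCI vendor ID (0x10de) is a known value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct pci_id {
	uint16_t vendor;
	uint16_t device;
};

/* Hypothetical whitelist; the device ID below is a placeholder. */
static const struct pci_id no_eq_whitelist[] = {
	{ 0x10de, 0x0001 },
};

static bool in_no_eq_whitelist(uint16_t vendor, uint16_t device)
{
	size_t i;

	for (i = 0; i < sizeof(no_eq_whitelist) / sizeof(no_eq_whitelist[0]); i++) {
		if (no_eq_whitelist[i].vendor == vendor &&
		    no_eq_whitelist[i].device == device)
			return true;
	}
	return false;
}

/*
 * Retry training with lane equalisation off only for whitelisted cards
 * whose link trained below the best generation it could have reached.
 */
static bool should_retry_without_eq(uint16_t vendor, uint16_t device,
				    unsigned int trained_gen,
				    unsigned int best_gen)
{
	assert(trained_gen >= 1 && trained_gen <= best_gen);
	return in_no_eq_whitelist(vendor, device) && trained_gen < best_gen;
}
```

Keeping the check keyed on the device rather than on the POWER9 chip revision mirrors the reasoning in the message that this is a device issue.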
| |
74d656d219b98ef3b96f92439337aa6392a7577d added OPAL APIs to
kernel (and this commit is now in Linus' tree) that hadn't
yet made their way to OPAL.
Also, be slightly grumbly about it.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
These aren't API.
Fixes: b57a5380aa489fa877b2d619225aea2602f20dca
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
This patch adds a new OPAL call to enable/disable a sensor group. This
call is used to select the sensor groups that need to be copied to
main memory by OCC at runtime.
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
[stewart: rebase and bump OPAL API number]
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
| |
This patch adds support to read u64 sensor values. This also adds
changes to the core and the backend implementation code to make this
API the base call. The host can use this new API to read sensors
up to 64 bits wide.
This adds a list to store the pointer to the kernel u32 buffer, for
older kernels making async sensor u32 reads.
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|