| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The OPAL_PCI_EEH_FREEZE_STATUS call takes a bunch of parameters, one of
them is @phb_status. It is defined as __be64* and always NULL in
the current Linux upstream but if anyone ever decides to read that status,
then the PHB3's handler will assume it is struct OpalIoPhb3ErrorData*
(which is a lot bigger than 8 bytes) and zero it causing the stack
corruption; p7ioc-phb has the same issue.
This removes @phb_status from all eeh_freeze_status() hooks and moves
the error message from PHB4 to the affected OPAL handlers.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-By: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
| |
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Deprecate the old "opal-interrupts", it's still there, but the new
property follows the standard and allow us to specify whether an
interrupt is level or edge sensitive.
Similarly create "interrupt-names" whose content is identical to
"opal-interrupts-names".
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
0679f61244b "fast-reset: by default (if possible)" broke NPU - now
the NV links does not get enabled after reboot.
This disables fast reboot for NPU machines till a better solution is found.
Suggested-by: Andrew Donnellan <andonnel@au1.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
NPU and NPU2 don't use diag data, but the kernel will allocate a buffer for
NPU PHBs regardless. Set ibm,phb-diag-data-size to 0 for NPU PHBs to save a
whole precious 8K.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Acked-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As the comment in npu_dev_populate_pcie_cap() says,
"We should support FLR" and the NPU device advertises its
support. However, when the kernel issues FLR, skiboot does
nothing which leaves NPU in a state which does not allow
to use NV links again after GPU was reset.
This adds basic handling of FLR (function level reset).
This does not update hreset/freset handlers as they are not going to be
called under any circumstance - EEH is not supported for NPU and
the kernel won't issue OPAL reset otherwise.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
A few places (mostly old code) were using:
add_property_cells(hi32(number), lo32(number));
This patch converts them to use the helper rather than doing it manually.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
PCI slot pfreset() operation is obsoleted as nobody uses it. This
removes it and the related PCI slot states. No functional changes
introduced.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently the reserved PE is set to NPU_NUM_OF_PES, which is one
greater than the maximum PE resulting in the following kernel errors
at boot:
[ 0.000000] pnv_ioda_reserve_pe: Invalid PE 4 on PHB#4
[ 0.000000] pnv_ioda_reserve_pe: Invalid PE 4 on PHB#5
Due to a HW errata PE#0 is already reserved in the kernel, so update
the opal-reserved-pe device-tree property to match this.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The bdfn of the emulated/fake nvlink PCIe devices is allocated based
on the topology of GPU connections. Nvlinks going to the same GPU are
allocated to unique functions within the same device number.
In the device-tree every collection of nvlinks going to the same GPU
are given a unique group number which is currently also used as the
device number. To allocate a sequentially unique function number the
code should find the maximum previously allocated function.
However currently the code only checks for a single previously
allocated function number. This works fine on Garrison systems which
only have two links per GPU, but other systems may have more links per
GPU which will result in several links being assigned an identical
function number, resulting boot failure.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
NPUs have 4 PEs which are zero indexed, so {0, 1, 2, 3}. A bad PE number
check in npu_err_inject checks if the PE number is greater than 4 as a
fail case, so it would wrongly perform operations on a non-existant PE 4.
Reported-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Cc: stable
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This rmoves the codes for emulated PCI config space as it can be
supported by generic PCI virtual device:
* The PCI virtual device and NPU device are created at same time.
* Uses PCI virtual device and filter to access NPU (PCI) device's
config space.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
The hardware enforces the buid range is on a 16 irsn boundary
even though there are only 8 irqs. Enforce that here and show
where the value comes from when programming the lsi source id
field in the npu register block.
Signed-off-by: Milton Miller <miltonm@us.ibm.com>
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The NPU BUID register was incorrectly programmed resulting in npu
interrupt level 0 causing a PB_CENT_CRESP_ADDR_ERROR checkstop,
and irqs from npus in odd chips being aliased to and processed
as the interrupts from the corresponding npu on the even chips.
The documentation for the BUID register is confusing, describing
required values of some bits and bits of differing meaning within
contained within one field.
This patch seperates the per-irq-level irq enable mask from the
documented buid base field, leaving the buid base as the part that
is directly compared. It documents the buid as the boundary of a
block of 16 sources (in the form of a 4 bit shift), and documents
that some bits are sourced from another register and are always
compared to that register, so they are not required to be set in
the base and mask fields.
Fixes: cc61799 Nvlink: Add NPU PHB functions
Signed-off-by: Milton Miller <miltonm@us.ibm.com>
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Throughout skiboot (and the kernel) PE numbers are named "pe_no",
"pe_num" and "pe_number", and sized as 16, 32 and 64bit uints depending
on where you look. This is annoying and potentially misleading in cases
such as the OPAL API, where different calls have different int sizes
even though the PE number they want is the same.
Fix this by making *everything* uint64_t pe_number. In doing this, there
are some whitespace fixes and mve_number gets dragged into this as well
for cases like set_msi_{32/64} where they essentially mean the same thing.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows a given source to provide per-interrupt attributes
such as whether it targets OPAL or Linux and it's estimated
frequency.
The former allows to get rid of the double set of ops used to
decide which interrupts go where on some modules like the PHBs
and the latter will be eventually used to implement smart
caching of the source lookups.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Confirmed with Alistair on IRC, and earlier discussions with Russell.
Basically, I was a bit of an idiot and didn't think hard enough before
adding the FWTS annotation.
Without this patch, you get spurious FirmWare Test Suite (FWTS) warnings
about NVLink not working on machines that aren't fully populated with
GPUs.
Fixes: 00e3e275344a42f6a682be72c88c015df87a0e28
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
| |
This uses assert() to assure the PHB device node is created successfully
as we never hit the failing case as I image, to simplify the code.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The DL/PL/AT BARs are assigned according to predetermined MMIO
layout by assign_mmio_bars() when probing NPU device node in
npu_probe_phb(). The AT BAR is covered by NPU LINK#1's second
BAR. assign_mmio_bars() updates the AT BAR register with the
predetermined values (base/size) and then npu_probe_phb() gets
same informatin from the register, which is unecessary.
This passes @at_bar[] to assign_mmio_bars[] where @at_bar[] are
filled, so that assign_mmio_bars() can use it directly without
getting it from AT BAR register. As a result, the code looks a
bit simplified.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
| |
The NPU device node property "ibm,npu-links" indicates number of
NPU links supported. This retrieves the number of links from the
properties instead of counting its child device nodes.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allocating BDFNs to NPU devices and associating NPU devices with PCI
devices of GPUs both rely on comparing PBCQ handles. This will fail if a
system has multiple sets of GPUs behind a single PHB.
Rework this to instead use slot locations. The following changes are
introduced:
- Groups of NPU links that connect to the same GPU are presented in the
slot table entries as st_npu_slot, using ST_LOC_NPU_GROUP
- NPU links are created with the ibm,npu-group-id property replacing the
ibm,pbcq property, which is used in BDFN allocation and GPU association
- Slot comparison is handled slightly differently for NPU devices as the
function of the BDFN is ignored, since the device number represents the
physical GPU the link is connected to
- BDFN allocation for NPU devices is now derived from the groups in the
slot table. For Garrison, the same BDFNs are generated as before.
- Association with GPU PCI devices is performed by comparing the slot
label. This means for future machines with NPUs that slot labels
are compulsory to have NVLink functionality working.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-By: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This makes irq_source public, and change all irq_source_ops to take
the source pointer as a first argument (they can still dig the void *
data out of that).
This will allow us to embed/wrap it for XIVE later on.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
| |
This is more compliant with PAPR, it will also allow us to
use the second cell for other attributes on P9.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
NPU fences aren't recoverable, and as such, would require user
intervention to have a working system again. The fence will be picked up
by the kernel through EEH, but this doesn't happen until the NPU is used
for something. So, let's print a message so it's obvious when this
happens.
A helper function was added to reduce duplication. This also enables code
in skiboot to un-fence a NPU, which is useful to NPU developers but very
stupid otherwise.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
NPU freeze injection works by performing an invalid MMIO read on the NPU
device BAR. If the BAR isn't enabled, which is the case when the
appropriate driver isn't loaded, this checkstops the machine.
Work around this by making sure the BAR is enabled before performing the
read. The idea of an error inject doing anything other than an error
inject isn't great, but it's better than unintentionally crashing your
machine.
Also, fix the comment incorrectly stating the operation was a write
instead of a read.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
EEH errors in the kernel report the physical slot location of the
erroneous PE and its PHB. For NPU devices, the PE's slot location will
refer to the physical GPU the link is associated with, and the PHB is
actually a NPU chip which has no relevance to a physical slot on a board.
Rather than reporting N/A for a NPU PHB's slot location, present the chip
number. It's not particularly useful, but better than nothing.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
| |
We also remove the NPUERR macros so that the FWTS parsing magic
can construct find the prlog statements.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
| |
While reviewing the Debian packaging, codespell found those.
Most proposed fixes are based on codespell's default dictionnary.
Signed-off-by: Frederic Bonnard <frediz@linux.vnet.ibm.com>
Reviewed-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
This creates PCI slot before the PHB is registered. Nothing has
been done in the PCI slot operations except to keep the PCI probe
code going.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, pci_walk_dev() iterates all PCI devices behind the
specified PHB. This extends the function to allow iteration
on PCI devices behind the specified PCI slot so that it can
be used by PCI hotplug logic in the subsequent patches.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When scanning to non-existing PCI device, EEH (frozen) error is
usually happening. We clear the unexpected frozen PE state after
it. The reserved PE number is assumed to be 0 wrongly. So the
frozen state on the reserved PE number isn't cleared properly.
This introduces struct phb_ops::get_reserved_pe_number() to
retrieve the reserved PE number from platforms. Then the EEH
frozen state checking and clearing are applied to the reserved
PE number.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
phb_ops->device_node_fixup() was introduced for NPU1 so that the
chip backend can bind the emulated NPU device with the GPU device
and fixes the device-tree node accordingly. There're couple of
issues as I can image:
* In pci_fixup_nodes(), one PHB has only one level of device
depth in the hierarchy tree. It's true for NPU PHBs, but
false for other PHBs. That indicates the function can be
called for NPU PHBs.
* The callback name indicates the specific work to be done
there. That doesn't make sense. We need another name without
indicating the specific work to do. It will give the backend
on chip level more freedom. Similarly, the callback is called
on basis of PCI device. It's hard for backend to manuplate
the PHB. More freedom the backend will get with more bold
granularity.
This fixes above issues by replacing phb_ops->device_node_fixup()
with phb_ops->phb_final_fixup(). More freedom will be received in
the backends.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All kinds of PHBs are maintaining a spinlock. At mean while, the
spinlock is acquired or released by backends for phb_ops->lock()
or phb_ops->unlock(). There're no difference of the logic on all
kinds of PHBs. So it's reasonable to maintain the lock in the
generic layer (struct phb).
This moves lock from specific PHB to generic one. The spinlock is
initialized when the generic PHB is registered in pci_register_phb().
Also, two inline functions phb_{lock, unlock}() are introduced to
acquire/release it. No logical changes introduced.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The PHB slot location code ueses the ibm,phb-index property to find
slot location names. As the NPU is implemented as a different PHB type
it means the phb-index property overlaps with the other PHBs in the
system.
This patch changes the existing usage of phb-index to npu-index which
allows the phb-index property to be assigned a unique value which can
then be matched by the PHB slot location code.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implements Extended Error Handling callbacks for NVLink devices.
At present, this supports fence mode emulation, and some easily detectable
freezes. There is a lot of work still to be done here, but this enables
EEH to work as expected in some specific scenarios.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Acked-By: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Enable NPU freeze and fence injection through debugfs.
For example, if a NPU is PCI bus 8, a freeze on PE 1 can be injected with:
echo 1:0:0:0:0 >> /sys/kernel/debug/powerpc/PCI0008/err_injct
or a fence on PE 2 on PCI bus 9 with:
echo 2:1:0:0:0 >> /sys/kernel/debug/powerpc/PCI0009/err_injct
These will cause the appropriate EEH event to occur upon a DMA to the
NVLink.
PE number was added to the npu_dev struct to enable this.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Acked-By: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As NPUs are emulated PCI devices, they do not get physically fenced as real
PCI devices do. As such, when the device is in a state that it should be
fenced, we need to emulate this behaviour by returning all 1s in config
space reads.
This will be utilised by error injection in subsequent patches.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Acked-By: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
| |
The version was already set (somewhat obscurely), that has been refactored and the version incremented as per the following patch (which supercedes the previous):
Signed-off-by: Russell Currey <ruscur@russell.cc>
Acked-By: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
No functional change, but static analysis showed up the oddity of
something that is generally unsigned (opal_id) having a signed
value assigned to it.
Took the opportunity to use a define to increase readability.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
| |
hw/npu.c:796:11: warning: Value stored to 'end' during its initialization is never read
uint64_t end = pci_start_addr + pci_mem_size;
^~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This patch adds support for the NPU Nvlink PHB type. It provides
access to each nvlink in the system by exposing them as PCIe devices
under a NPU PHB type. Each PCIe device has a configuration space
implemented in software which indicates the base address of the
DL/TL/PL registers required by the device drivers.
It also presents one LSI per device which is used to signal device
drivers of changes in device status. The configuration space also adds
a vendor specific capability which is used primarily by device drivers
to power on an train the IBM PHY.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|