summaryrefslogtreecommitdiffstats
path: root/arch/powerpc/kernel/pci_of_scan.c
diff options
context:
space:
mode:
authorGavin Shan <gwshan@linux.vnet.ibm.com>2014-04-24 18:00:19 +1000
committerBenjamin Herrenschmidt <benh@kernel.crashing.org>2014-04-28 17:34:32 +1000
commitd2b0f6f77ee525811b6efe864efa6a4eb82eea73 (patch)
tree84205706f9cc2e03425ba3a48edf2a1d527e3267 /arch/powerpc/kernel/pci_of_scan.c
parent7f52a526f64c69c913f0027fbf43821ff0b3a7d7 (diff)
downloadtalos-obmc-linux-d2b0f6f77ee525811b6efe864efa6a4eb82eea73.tar.gz
talos-obmc-linux-d2b0f6f77ee525811b6efe864efa6a4eb82eea73.zip
powerpc/eeh: No hotplug on permanently removed dev
The issue was detected in a bit complicated test case where we have multiple hierarchical PEs shown as following figure: +-----------------+ | PE#3 p2p#0 | | p2p#1 | +-----------------+ | +-----------------+ | PE#4 pdev#0 | | pdev#1 | +-----------------+ PE#4 (have 2 PCI devices) is the child of PE#3, which has 2 p2p bridges. We accidentally had less-known scenario: PE#4 was removed permanently from the system because of permanent failure (e.g. exceeding the max allowd failure times in last hour), then we detects EEH errors on PE#3 and tried to recover it. However, eeh_dev instances for pdev#0/1 were not detached from PE#4, which was still connected to PE#3. All of that was because of the fact that we rely on count-based pcibios_release_device(), which isn't reliable enough. When doing recovery for PE#3, we still apply hotplug on PE#4 and pdev#0/1, which are not valid any more. Eventually, we run into kernel crash. The patch fixes above issue from two aspects. For unplug, we simply skip those permanently removed PE, whose state is (EEH_PE_STATE_ISOLATED && !EEH_PE_STATE_RECOVERING) and its frozen count should be greater than EEH_MAX_ALLOWED_FREEZES. For plug, we marked all permanently removed EEH devices with EEH_DEV_REMOVED and return 0xFF's on read its PCI config so that PCI core will omit them. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Diffstat (limited to 'arch/powerpc/kernel/pci_of_scan.c')
-rw-r--r--arch/powerpc/kernel/pci_of_scan.c9
1 files changed, 9 insertions, 0 deletions
diff --git a/arch/powerpc/kernel/pci_of_scan.c b/arch/powerpc/kernel/pci_of_scan.c
index 83c26d829991..ea6470c21f4e 100644
--- a/arch/powerpc/kernel/pci_of_scan.c
+++ b/arch/powerpc/kernel/pci_of_scan.c
@@ -304,6 +304,9 @@ static struct pci_dev *of_scan_pci_dev(struct pci_bus *bus,
struct pci_dev *dev = NULL;
const __be32 *reg;
int reglen, devfn;
+#ifdef CONFIG_EEH
+ struct eeh_dev *edev = of_node_to_eeh_dev(dn);
+#endif
pr_debug(" * %s\n", dn->full_name);
if (!of_device_is_available(dn))
@@ -321,6 +324,12 @@ static struct pci_dev *of_scan_pci_dev(struct pci_bus *bus,
return dev;
}
+ /* Device removed permanently ? */
+#ifdef CONFIG_EEH
+ if (edev && (edev->mode & EEH_DEV_REMOVED))
+ return NULL;
+#endif
+
/* create a new pci_dev for this device */
dev = of_create_pci_dev(dn, bus, devfn);
if (!dev)
OpenPOWER on IntegriCloud