summaryrefslogtreecommitdiffstats
path: root/drivers/vfio/pci/vfio_pci_config.c
Commit message (Collapse)AuthorAgeFilesLines
* vfio-pci: Virtualize PCIe & AF FLRAlex Williamson2016-09-261-5/+77
| | | | | | | | | | | | | | | | | | | | | We use a BAR restore trick to try to detect when a user has performed a device reset, possibly through FLR or other backdoors, to put things back into a working state. This is important for backdoor resets, but we can actually just virtualize the "front door" resets provided via PCIe and AF FLR. Set these bits as virtualized + writable, allowing the default write to set them in vconfig, then we can simply check the bit, perform an FLR of our own, and clear the bit. We don't actually have the granularity in PCI to specify the type of reset we want to do, but generally devices don't implement both PCIe and AF FLR and we'll favor these over other types of reset, so we should generally lineup. We do test whether the device provides the requested FLR type to stay consistent with hardware capabilities though. This seems to fix several instance of devices getting into bad states with userspace drivers, like dpdk, running inside a VM. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Greg Rose <grose@lightfleet.com>
* vfio/pci: Fix typos in commentsWei Jiangang2016-08-291-4/+4
| | | | | Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* vfio/pci: Allow VPD short readAlex Williamson2016-05-311-1/+2
| | | | | | | | | The size of the VPD area is not necessarily 4-byte aligned, so a pci_vpd_read() might return less than 4 bytes. Zero our buffer and accept anything other than an error. Intel X710 NICs exercise this. Fixes: 4e1a635552d3 ("vfio/pci: Use kernel VPD access functions") Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* vfio_pci: Test for extended capabilities if config space > 256 bytesAlexey Kardashevskiy2016-05-191-6/+11
| | | | | | | | | | | | | | | | | | | | | | | | | PCI-Express spec says that reading 4 bytes at offset 100h should return zero if there is no extended capability so VFIO reads this dword to know if there are extended capabilities. However it is not always possible to access the extended space so generic PCI code in pci_cfg_space_size_ext() checks if pci_read_config_dword() can read beyond 100h and if the check fails, it sets the config space size to 100h. VFIO does its own extended capabilities check by reading at offset 100h which may produce 0xffffffff which VFIO treats as the extended config space presense and calls vfio_ecap_init() which fails to parse capabilities (which is expected) but right before the exit, it writes zero at offset 100h which is beyond the buffer allocated for vdev->vconfig (which is 256 bytes) which leads to random memory corruption. This makes VFIO only check for the extended capabilities if the discovered config size is more than 256 bytes. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* vfio/pci: Add test for BAR restoreAlex Williamson2016-04-281-1/+19
| | | | | | | | | If a device is reset without the memory or i/o bits enabled in the command register we may not detect it, potentially leaving the device without valid BAR programming. Add an additional test to check the BARs on each write to the command register. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* vfio/pci: Hide broken INTx support from userAlex Williamson2016-04-281-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | INTx masking has two components, the first is that we need the ability to prevent the device from continuing to assert INTx. This is provided via the DisINTx bit in the command register and is the only thing we can really probe for when testing if INTx masking is supported. The second component is that the device needs to indicate if INTx is asserted via the interrupt status bit in the device status register. With these two features we can generically determine if one of the devices we own is asserting INTx, signal the user, and mask the interrupt while the user services the device. Generally if one or both of these components is broken we resort to APIC level interrupt masking, which requires an exclusive interrupt since we have no way to determine the source of the interrupt in a shared configuration. This often makes it difficult or impossible to configure the system for userspace use of the device, for an interrupt mode that the user may not need. One possible configuration of broken INTx masking is that the DisINTx support is fully functional, but the interrupt status bit never signals interrupt assertion. In this case we do have the ability to prevent the device from asserting INTx, but lack the ability to identify the interrupt source. For this case we can simply pretend that the device lacks INTx support entirely, keeping DisINTx set on the physical device, virtualizing this bit for the user, and virtualizing the interrupt pin register to indicate no INTx support. We already support virtualization of the DisINTx bit and already virtualize the interrupt pin for platforms without INTx support. By tying these components together, setting DisINTx on open and reset, and identifying devices broken in this particular way, we can provide support for them w/o the handicap of APIC level INTx masking. Intel i40e (XL710/X710) 10/20/40GbE NICs have been identified as being broken in this specific way. We leave the vfio-pci.nointxmask option as a mechanism to bypass this support, enabling INTx on the device with all the requirements of APIC level masking. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Cc: John Ronciak <john.ronciak@intel.com> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
* vfio/pci: Expose shadow ROM as PCI option ROMAlex Williamson2016-02-221-3/+8
| | | | | | | | Integrated graphics may have their ROM shadowed at 0xc0000 rather than implement a PCI option ROM. Make this ROM appear to the user using the ROM BAR. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* vfio/pci: Enable virtual register in PCI config spaceAlex Williamson2016-02-221-4/+30
| | | | | | | | | | | | | Typically config space for a device is mapped out into capability specific handlers and unassigned space. The latter allows direct read/write access to config space. Sometimes we know about registers living in this void space and would like an easy way to virtualize them, similar to how BAR registers are managed. To do this, create one more pseudo (fake) PCI capability to be handled as purely virtual space. Reads and writes are serviced entirely from virtual config space. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* vfio/pci: make an array largerDan Carpenter2015-11-091-2/+2
| | | | | | | | | | | | | | | | | | | | | Smatch complains about a possible out of bounds error: drivers/vfio/pci/vfio_pci_config.c:1241 vfio_cap_init() error: buffer overflow 'pci_cap_length' 20 <= 20 The problem is that pci_cap_length[] was defined as large enough to hold "PCI_CAP_ID_AF + 1" elements. The code in vfio_cap_init() assumes it has PCI_CAP_ID_MAX + 1 elements. Originally, PCI_CAP_ID_AF and PCI_CAP_ID_MAX were the same but then we introduced PCI_CAP_ID_EA in commit f80b0ba95964 ("PCI: Add Enhanced Allocation register entries") so now the array is too small. Let's fix this by making the array size PCI_CAP_ID_MAX + 1. And let's make a similar change to pci_ext_cap_length[] for consistency. Also both these arrays can be made const. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* vfio/pci: Use kernel VPD access functionsAlex Williamson2015-10-271-1/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | The PCI VPD capability operates on a set of window registers in PCI config space. Writing to the address register triggers either a read or write, depending on the setting of the PCI_VPD_ADDR_F bit within the address register. The data register provides either the source for writes or the target for reads. This model is susceptible to being broken by concurrent access, for which the kernel has adopted a set of access functions to serialize these registers. Additionally, commits like 932c435caba8 ("PCI: Add dev_flags bit to access VPD through function 0") and 7aa6ca4d39ed ("PCI: Add VPD function 0 quirk for Intel Ethernet devices") indicate that VPD registers can be shared between functions on multifunction devices creating dependencies between otherwise independent devices. Fortunately it's quite easy to emulate the VPD registers, simply storing copies of the address and data registers in memory and triggering a VPD read or write on writes to the address register. This allows vfio users to avoid seeing spurious register changes from accesses on other devices and enables the use of shared quirks in the host kernel. We can theoretically still race with access through sysfs, but the window of opportunity is much smaller. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Mark Rustad <mark.d.rustad@intel.com>
* vfio: make vfio run on s390Frank Blaschka2014-11-071-0/+7
| | | | | | | | add Kconfig switch to hide INTx add Kconfig switch to let vfio announce PCI BARs are not mapable Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* PCI/AER: Rename PCI_ERR_UNC_TRAIN to PCI_ERR_UNC_UNDChen, Gong2014-09-251-1/+1
| | | | | | | | | | | | | In PCIe r1.0, sec 5.10.2, bit 0 of the Uncorrectable Error Status, Mask, and Severity Registers was for "Training Error." In PCIe r1.1, sec 7.10.2, bit 0 was redefined to be "Undefined." Rename PCI_ERR_UNC_TRAIN to PCI_ERR_UNC_UND to reflect this change. No functional change. [bhelgaas: changelog] Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* vfio/pci: Fix sizing of DPA and THP express capabilitiesAlex Williamson2014-05-301-4/+3
| | | | | | | | | | | | | When sizing the TPH capability we store the register containing the table size into the 'dword' variable, but then use the uninitialized 'byte' variable to analyze the size. The table size is also actually reported as an N-1 value, so correct sizing to account for this. The round_up() for both TPH and DPA is unnecessary, remove it. Detected by Coverity: CID 714665 & 715156 Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* PCI: Rename PCI_VC_PORT_REG1/2 to PCI_VC_PORT_CAP1/2Alex Williamson2013-12-171-6/+6
| | | | | | | | These are set of two capability registers, it's pretty much given that they're registers, so reflect their purpose in the name. Suggested-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* vfio-pci: Test for extended config spaceAlex Williamson2013-09-041-3/+8
| | | | | | | | | | Having PCIe/PCI-X capability isn't enough to assume that there are extended capabilities. Both specs define that the first capability header is all zero if there are no extended capabilities. Testing for this avoids an erroneous message about hiding capability 0x0 at offset 0x100. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* Merge tag 'vfio-for-v3.10' of git://github.com/awilliam/linux-vfioLinus Torvalds2013-05-021-78/+94
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull vfio updates from Alex Williamson: "Changes include extension to support PCI AER notification to userspace, byte granularity of PCI config space and access to unarchitected PCI config space, better protection around IOMMU driver accesses, default file mode fix, and a few misc cleanups." * tag 'vfio-for-v3.10' of git://github.com/awilliam/linux-vfio: vfio: Set container device mode vfio: Use down_reads to protect iommu disconnects vfio: Convert container->group_lock to rwsem PCI/VFIO: use pcie_flags_reg instead of access PCI-E Capabilities Register vfio-pci: Enable raw access to unassigned config space vfio-pci: Use byte granularity in config map vfio: make local function vfio_pci_intx_unmask_handler() static VFIO-AER: Vfio-pci driver changes for supporting AER VFIO: Wrapper for getting reference to vfio_device
| * PCI/VFIO: use pcie_flags_reg instead of access PCI-E Capabilities RegisterYijing Wang2013-04-151-5/+1
| | | | | | | | | | | | | | | | | | Currently, we use pcie_flags_reg to cache PCI-E Capabilities Register, because PCI-E Capabilities Register bits are almost read-only. This patch use pcie_caps_reg() instead of another access PCI-E Capabilities Register. Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * vfio-pci: Enable raw access to unassigned config spaceAlex Williamson2013-04-011-32/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | Devices like be2net hide registers between the gaps in capabilities and architected regions of PCI config space. Our choices to support such devices is to either build an ever growing and unmanageable white list or rely on hardware isolation to protect us. These registers are really no different than MMIO or I/O port space registers, which we don't attempt to regulate, so treat PCI config space in the same way. Reported-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Gavin Shan <shangw@linux.vnet.ibm.com>
| * vfio-pci: Use byte granularity in config mapAlex Williamson2013-04-011-41/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | The config map previously used a byte per dword to map regions of config space to capabilities. Modulo a bug where we round the length of capabilities down instead of up, this theoretically works well and saves space so long as devices don't try to hide registers in the gaps between capabilities. Unfortunately they do exactly that so we need byte granularity on our config space map. Increase the allocation of the config map and split accesses at capability region boundaries. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Gavin Shan <shangw@linux.vnet.ibm.com>
* | vfio: include <linux/slab.h> for kmallocArnd Bergmann2013-03-151-0/+1
|/ | | | | | | | | | The vfio drivers call kmalloc or kzalloc, but do not include <linux/slab.h>, which causes build errors on ARM. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Cc: kvm@vger.kernel.org
* vfio-pci: Manage user power state transitionsAlex Williamson2013-02-181-3/+38
| | | | | | | | | | | | | | | We give the user access to change the power state of the device but certain transitions result in an uninitialized state which the user cannot resolve. To fix this we need to mark the PowerState field of the PMCSR register read-only and effect the requested change on behalf of the user. This has the added benefit that pdev->current_state remains accurate while controlled by the user. The primary example of this bug is a QEMU guest doing a reboot where the device it put into D3 on shutdown and becomes unusable on the next boot because the device did a soft reset on D3->D0 (NoSoftRst-). Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* vfio-pci: Cleanup BAR accessAlex Williamson2013-02-141-3/+2
| | | | | | | | | We can actually handle MMIO and I/O port from the same access function since PCI already does abstraction of this. The ROM BAR only requires a minor difference, so it gets included too. vfio_pci_config_readwrite gets renamed for consistency. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* vfio-pci: Enable PCIe extended capabilities on v1Alex Williamson2013-02-141-3/+3
| | | | | | Even PCIe 1.x had extended config space. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* vfio: Add PCI device driverAlex Williamson2012-07-311-0/+1540
Add PCI device support for VFIO. PCI devices expose regions for accessing config space, I/O port space, and MMIO areas of the device. PCI config access is virtualized in the kernel, allowing us to ensure the integrity of the system, by preventing various accesses while reducing duplicate support across various userspace drivers. I/O port supports read/write access while MMIO also supports mmap of sufficiently sized regions. Support for INTx, MSI, and MSI-X interrupts are provided using eventfds to userspace. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
OpenPOWER on IntegriCloud