summaryrefslogtreecommitdiffstats
path: root/virt/kvm/assigned-dev.c
Commit message (Collapse)AuthorAgeFilesLines
* KVM: Fix device assignment threaded irq handlerAlex Williamson2012-07-111-2/+13
| | | | | | | | | | | | | | | | The kernel no longer allows us to pass NULL for the hard handler without also specifying IRQF_ONESHOT. IRQF_ONESHOT imposes latency in the exit path that we don't need for MSI interrupts. Long term we'd like to inject these interrupts from the hard handler when possible. In the short term, we can create dummy hard handlers that return us to the previous behavior. Credit to Michael for original patch. Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=43328 Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Fix PCI header check on device assignmentJan Kiszka2012-06-151-3/+1
| | | | | | | | | | The masking was wrong (must have been 0x7f), and there is no need to re-read the value as pci_setup_device already does this for us. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=43339 Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Convert intx_mask_lock to spin lockJan Kiszka2012-03-201-7/+7
| | | | | | | | | | | As kvm_notify_acked_irq calls kvm_assigned_dev_ack_irq under rcu_read_lock, we cannot use a mutex in the latter function. Switch to a spin lock to address this. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Allow host IRQ sharing for assigned PCI 2.3 devicesJan Kiszka2012-03-081-29/+180
| | | | | | | | | | | | | | | | | PCI 2.3 allows to generically disable IRQ sources at device level. This enables us to share legacy IRQs of such devices with other host devices when passing them to a guest. The new IRQ sharing feature introduced here is optional, user space has to request it explicitly. Moreover, user space can inform us about its view of PCI_COMMAND_INTX_DISABLE so that we can avoid unmasking the interrupt and signaling it if the guest masked it via the virtualized PCI config space. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: fix error handling for out of range irqMichael S. Tsirkin2012-03-051-3/+1
| | | | | | | | | | | | | | | | find_index_from_host_irq returns 0 on error but callers assume < 0 on error. This should not matter much: an out of range irq should never happen since irq handler was registered with this irq #, and even if it does we get a spurious msix irq in guest and typically nothing terrible happens. Still, better to make it consistent. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Device assignment permission checksAlex Williamson2011-12-251-0/+75
| | | | | | | | | | | | | | | | | | | | | | | | | Only allow KVM device assignment to attach to devices which: - Are not bridges - Have BAR resources (assume others are special devices) - The user has permissions to use Assigning a bridge is a configuration error, it's not supported, and typically doesn't result in the behavior the user is expecting anyway. Devices without BAR resources are typically chipset components that also don't have host drivers. We don't want users to hold such devices captive or cause system problems by fencing them off into an iommu domain. We determine "permission to use" by testing whether the user has access to the PCI sysfs resource files. By default a normal user will not have access to these files, so it provides a good indication that an administration agent has granted the user access to the device. [Yang Bai: add missing #include] [avi: fix comment style] Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Yang Bai <hamo.by@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Remove ability to assign a device without iommu supportAlex Williamson2011-12-251-9/+9
| | | | | | | | | This option has no users and it exposes a security hole that we can allow devices to be assigned without iommu protection. Make KVM_DEV_ASSIGN_ENABLE_IOMMU a mandatory option. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* Merge branch 'kvm-updates/3.2' of ↵Linus Torvalds2011-10-301-29/+33
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm * 'kvm-updates/3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (75 commits) KVM: SVM: Keep intercepting task switching with NPT enabled KVM: s390: implement sigp external call KVM: s390: fix register setting KVM: s390: fix return value of kvm_arch_init_vm KVM: s390: check cpu_id prior to using it KVM: emulate lapic tsc deadline timer for guest x86: TSC deadline definitions KVM: Fix simultaneous NMIs KVM: x86 emulator: convert push %sreg/pop %sreg to direct decode KVM: x86 emulator: switch lds/les/lss/lfs/lgs to direct decode KVM: x86 emulator: streamline decode of segment registers KVM: x86 emulator: simplify OpMem64 decode KVM: x86 emulator: switch src decode to decode_operand() KVM: x86 emulator: qualify OpReg inhibit_byte_regs hack KVM: x86 emulator: switch OpImmUByte decode to decode_imm() KVM: x86 emulator: free up some flag bits near src, dst KVM: x86 emulator: switch src2 to generic decode_operand() KVM: x86 emulator: expand decode flags to 64 bits KVM: x86 emulator: split dst decode to a generic decode_operand() KVM: x86 emulator: move memop, memopp into emulation context ...
| * KVM: Split up MSI-X assigned device IRQ handlerJan Kiszka2011-09-251-13/+19
| | | | | | | | | | | | | | | | The threaded IRQ handler for MSI-X has almost nothing in common with the INTx/MSI handler. Move its code into a dedicated handler. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: Avoid needless registrations of IRQ ack notifier for assigned devicesJan Kiszka2011-09-251-10/+8
| | | | | | | | | | | | | | | | | | We only perform work in kvm_assigned_dev_ack_irq if the guest IRQ is of INTx type. This completely avoids the callback invocation in non-INTx cases by registering the IRQ ack notifier only for INTx. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: Clean up unneeded void pointer castsJan Kiszka2011-09-251-6/+6
| | | | | | | | | | Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* | pci: Add flag indicating device has been assigned by KVMGreg Rose2011-09-231-0/+2
|/ | | | | | | | | | | | | | | | Device drivers that create and destroy SR-IOV virtual functions via calls to pci_enable_sriov() and pci_disable_sriov can cause catastrophic failures if they attempt to destroy VFs while they are assigned to guest virtual machines. By adding a flag for use by the KVM module to indicate that a device is assigned a device driver can check that flag and avoid destroying VFs while they are assigned and avoid system failures. CC: Ian Campbell <ijc@hellion.org.uk> CC: Konrad Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* KVM: Fix off-by-one in overflow check of KVM_ASSIGN_SET_MSIX_NRJan Kiszka2011-07-121-1/+1
| | | | | | | | KVM_MAX_MSIX_PER_DEV implies that up to that many MSI-X entries can be requested. But the kernel so far rejected already the upper limit. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Use pci_store/load_saved_state() around VM device usageAlex Williamson2011-05-211-4/+14
| | | | | | | | | | | Store the device saved state so that we can reload the device back to the original state when it's unassigned. This has the benefit that the state survives across pci_reset_function() calls via the PCI sysfs reset interface while the VM is using the device. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Avi Kivity <avi@redhat.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* KVM: Clean up kvm_vm_ioctl_assigned_deviceJan Kiszka2011-01-121-5/+4
| | | | | | | | | | | | Any arch not supporting device assigment will also not build assigned-dev.c. So testing for KVM_CAP_DEVICE_DEASSIGNMENT is pointless. KVM_CAP_ASSIGN_DEV_IRQ is unconditinally set. Moreover, add a default case for dispatching the ioctl. Acked-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Save/restore state of assigned PCI deviceJan Kiszka2011-01-121-1/+4
| | | | | | | | | | The guest may change states that pci_reset_function does not touch. So we better save/restore the assigned device across guest usage. Acked-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Refactor IRQ names of assigned devicesJan Kiszka2011-01-121-5/+6
| | | | | | | | | Cosmetic change, but it helps to correlate IRQs with PCI devices. Acked-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Switch assigned device IRQ forwarding to threaded handlerJan Kiszka2011-01-121-73/+34
| | | | | | | | | | | | | | | | This improves the IRQ forwarding for assigned devices: By using the kernel's threaded IRQ scheme, we can get rid of the latency-prone work queue and simplify the code in the same run. Moreover, we no longer have to hold assigned_dev_lock while raising the guest IRQ, which can be a lenghty operation as we may have to iterate over all VCPUs. The lock is now only used for synchronizing masking vs. unmasking of INTx-type IRQs, thus is renames to intx_lock. Acked-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Clear assigned guest IRQ on releaseJan Kiszka2011-01-121-0/+3
| | | | | | | | | | | When we deassign a guest IRQ, clear the potentially asserted guest line. There might be no chance for the guest to do this, specifically if we switch from INTx to MSI mode. Acked-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Fix unused but set warningsAndi Kleen2010-08-011-2/+0
| | | | | | | No real bugs in this one. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Update Red Hat copyrightsAvi Kivity2010-08-011-1/+1
| | | | Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: remove CAP_SYS_RAWIO requirement from kvm_vm_ioctl_assign_irqAlex Williamson2010-08-011-3/+0
| | | | | | | | | | | Remove this check in an effort to allow kvm guests to run without root privileges. This capability check doesn't seem to add any security since the device needs to have already been added via the assign device ioctl and the io actually occurs through the pci sysfs interface. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: fix assigned_device_enable_host_msix error handlingjing zhang2010-05-171-2/+6
| | | | | | | | Free IRQ's and disable MSIX upon failure. Cc: Avi Kivity <avi@redhat.com> Signed-off-by: Jing Zhang <zj.barak@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo2010-03-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
* KVM: enable PCI multiple-segments for pass-through deviceZhai, Edwin2010-03-011-1/+3
| | | | | | | | | Enable optional parameter (default 0) - PCI segment (or domain) besides BDF, when assigning PCI device to guest. Signed-off-by: Zhai Edwin <edwin.zhai@intel.com> Acked-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: introduce kvm->srcu and convert kvm_set_memory_region to SRCU updateMarcelo Tosatti2010-03-011-4/+4
| | | | | | | | | | Use two steps for memslot deletion: mark the slot invalid (which stops instantiation of new shadow pages for that slot, but allows destruction), then instantiate the new empty slot. Also simplifies kvm_handle_hva locking. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Fix possible circular locking in kvm_vm_ioctl_assign_device()Sheng Yang2009-12-271-3/+3
| | | | | | | | | | | | | | | | | | One possible order is: KVM_CREATE_IRQCHIP ioctl(took kvm->lock) -> kvm_iobus_register_dev() -> down_write(kvm->slots_lock). The other one is in kvm_vm_ioctl_assign_device(), which take kvm->slots_lock first, then kvm->lock. Update the comment of lock order as well. Observe it due to kernel locking debug warnings. Cc: stable@kernel.org Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Move assigned device code to own fileAvi Kivity2009-12-031-0/+818
Signed-off-by: Avi Kivity <avi@redhat.com>
OpenPOWER on IntegriCloud