summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* KVM: SVM: Improve nested interrupt injectionAlexander Graf2009-09-101-15/+24
| | | | | | | | | | | While trying to get Hyper-V running, I realized that the interrupt injection mechanisms that are in place right now are not 100% correct. This patch makes nested SVM's interrupt injection behave more like on a real machine. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: SVM: Implement INVLPGAAlexander Graf2009-09-101-1/+14
| | | | | | | | | | | SVM adds another way to do INVLPG by ASID which Hyper-V makes use of, so let's implement it! For now we just do the same thing invlpg does, as asid switching means we flush the mmu anyways. That might change one day though. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Implement MSRs used by Hyper-VAlexander Graf2009-09-101-0/+5
| | | | | | | | | | | | | Hyper-V uses some MSRs, some of which are actually reserved for BIOS usage. But let's be nice today and have it its way, because otherwise it fails terribly. [jaswinder: fix build for linux-next changes] Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* x86: Add definition for IGNNE MSRAlexander Graf2009-09-101-0/+1
| | | | | | | Hyper-V accesses MSR_IGNNE while running under KVM. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: SVM: Don't save/restore host cr2Avi Kivity2009-09-101-17/+0
| | | | | | | The host never reads cr2 in process context, so are free to clobber it. The vmx code does this, so we can safely remove the save/restore code. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Only reload guest cr2 if different from host cr2Avi Kivity2009-09-101-2/+7
| | | | | | | | | cr2 changes only rarely, and writing it is expensive. Avoid the costly cr2 writes by checking if it does not already hold the desired value. Shaves 70 cycles off the vmexit latency. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Drop useless atomic test from timer functionJan Kiszka2009-09-101-2/+2
| | | | | | | | | | | The current code tries to optimize the setting of KVM_REQ_PENDING_TIMER but used atomic_inc_and_test - which always returns true unless pending had the invalid value of -1 on entry. This patch drops the test part preserving the original semantic but expressing it less confusingly. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Fix racy event propagation in timerJan Kiszka2009-09-101-6/+10
| | | | | | | | | | | | | | | Minor issue that likely had no practical relevance: the kvm timer function so far incremented the pending counter and then may reset it again to 1 in case reinjection was disabled. This opened a small racy window with the corresponding VCPU loop that may have happened to run on another (real) CPU and already consumed the value. Fix it by skipping the incrementation in case pending is already > 0. This opens a different race windows, but may only rarely cause lost events in case we do not care about them anyway (!reinject). Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Optimize searching for highest IRRGleb Natapov2009-09-102-3/+22
| | | | | | | | | | Most of the time IRR is empty, so instead of scanning the whole IRR on each VM entry keep a variable that tells us if IRR is not empty. IRR will have to be scanned twice on each IRQ delivery, but this is much more rare than VM entry. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Replace pending exception by PF if it happens seriallyGleb Natapov2009-09-101-7/+13
| | | | | | | | Replace previous exception with a new one in a hope that instruction re-execution will regenerate lost exception. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: conditionally disable 2M pagesMarcelo Tosatti2009-09-103-2/+16
| | | | | | | | | | Disable usage of 2M pages if VMX_EPT_2MB_PAGE_BIT (bit 16) is clear in MSR_IA32_VMX_EPT_VPID_CAP and EPT is enabled. [avi: s/largepages_disabled/largepages_enabled/ to avoid negative logic] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: EPT misconfiguration handlerMarcelo Tosatti2009-09-101-1/+85
| | | | | | | | | | Handler for EPT misconfiguration which checks for valid state in the shadow pagetables, printing the spte on each level. The separate WARN_ONs are useful for kerneloops.org. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: add kvm_mmu_get_spte_hierarchy helperMarcelo Tosatti2009-09-102-0/+20
| | | | | | | Required by EPT misconfiguration handler. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: make for_each_shadow_entry aware of largepagesMarcelo Tosatti2009-09-101-0/+5
| | | | | | | | This way there is no need to add explicit checks in every for_each_shadow_entry user. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: more MSR_IA32_VMX_EPT_VPID_CAP capability bitsMarcelo Tosatti2009-09-102-0/+27
| | | | | | | Required for EPT misconfiguration handler. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Move performance counter MSR access interception to generic x86 pathAndre Przywara2009-09-103-28/+30
| | | | | | | | | | | | | The performance counter MSRs are different for AMD and Intel CPUs and they are chosen mainly by the CPUID vendor string. This patch catches writes to all addresses (regardless of VMX/SVM path) and handles them in the generic MSR handler routine. Writing a 0 into the event select register is something we perfectly emulate ;-), so don't print out a warning to dmesg in this case. This fixes booting a 64bit Windows guest with an AMD CPUID on an Intel host. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU audit: largepage handlingMarcelo Tosatti2009-09-101-8/+7
| | | | | | | Make the audit code aware of largepages. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU audit: audit_mappings tweaksMarcelo Tosatti2009-09-101-1/+7
| | | | | | | | | | - Fail early in case gfn_to_pfn returns is_error_pfn. - For the pre pte write case, avoid spurious "gva is valid but spte is notrap" messages (the emulation code does the guest write first, so this particular case is OK). Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU audit: nontrapping ptes in nonleaf levelMarcelo Tosatti2009-09-101-6/+1
| | | | | | | It is valid to set non leaf sptes as notrap. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU audit: update audit_write_protectionMarcelo Tosatti2009-09-101-3/+11
| | | | | | | | - Unsync pages contain writable sptes in the rmap. - rmaps do not exclusively contain writable sptes anymore. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU audit: update count_writable_mappings / count_rmapsMarcelo Tosatti2009-09-101-10/+94
| | | | | | | | | | | | | | | | | | Under testing, count_writable_mappings returns a value that is 2 integers larger than what count_rmaps returns. Suspicion is that either of the two functions is counting a duplicate (either positively or negatively). Modifying check_writable_mappings_rmap to check for rmap existance on all present MMU pages fails to trigger an error, which should keep Avi happy. Also introduce mmu_spte_walk to invoke a callback on all present sptes visible to the current vcpu, might be useful in the future. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: introduce is_last_spte helperMarcelo Tosatti2009-09-101-13/+13
| | | | | | | | | | Hiding some of the last largepage / level interaction (which is useful for gbpages and for zero based levels). Also merge the PT_PAGE_TABLE_LEVEL clearing loop in unlink_children. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Return to userspace on emulation failureAvi Kivity2009-09-102-2/+10
| | | | | | | Instead of mindlessly retrying to execute the instruction, report the failure to userspace. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Use macro to iterate over vcpus.Gleb Natapov2009-09-109-74/+76
| | | | | | | | [christian: remove unused variables on s390] Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Break dependency between vcpu index in vcpus array and vcpu_id.Gleb Natapov2009-09-109-33/+55
| | | | | | | | | Archs are free to use vcpu_id as they see fit. For x86 it is used as vcpu's apic id. New ioctl is added to configure boot vcpu id that was assumed to be 0 till now. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Use pointer to vcpu instead of vcpu_id in timer code.Gleb Natapov2009-09-104-4/+4
| | | | | Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Introduce kvm_vcpu_is_bsp() function.Gleb Natapov2009-09-1011-18/+28
| | | | | | | Use it instead of open code "vcpu_id zero is BSP" assumption. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: s/shadow_pte/spte/Avi Kivity2009-09-102-59/+59
| | | | | | | | | We use shadow_pte and spte inconsistently, switch to the shorter spelling. Rename set_shadow_pte() to __set_spte() to avoid a conflict with the existing set_spte(), and to indicate its lowlevelness. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: Adjust pte accessors to explicitly indicate guest or shadow pteAvi Kivity2009-09-104-21/+21
| | | | | | | | | Since the guest and host ptes can have wildly different format, adjust the pte accessor names to indicate on which type of pte they operate on. No functional changes. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: Fix is_dirty_pte()Avi Kivity2009-09-101-1/+1
| | | | | | | | | | is_dirty_pte() is used on guest ptes, not shadow ptes, so it needs to avoid shadow_dirty_mask and use PT_DIRTY_MASK instead. Misdetecting dirty pages could lead to unnecessarily setting the dirty bit under EPT. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Move rmode structure to vmx-specific codeAvi Kivity2009-09-102-44/+44
| | | | | | rmode is only used in vmx, so move it to vmx.c Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Reorder ioctls in kvm.hAvi Kivity2009-09-101-5/+5
| | | | | | Somehow the VM ioctls got unsorted; resort. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Support Unrestricted Guest featureNitin A Kamble2009-09-103-11/+62
| | | | | | | | | | | | | | | | | | "Unrestricted Guest" feature is added in the VMX specification. Intel Westmere and onwards processors will support this feature. It allows kvm guests to run real mode and unpaged mode code natively in the VMX mode when EPT is turned on. With the unrestricted guest there is no need to emulate the guest real mode code in the vm86 container or in the emulator. Also the guest big real mode code works like native. The attached patch enhances KVM to use the unrestricted guest feature if available on the processor. It also adds a new kernel/module parameter to disable the unrestricted guest feature at the boot time. Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: switch irq injection/acking data structures to irq_lockMarcelo Tosatti2009-09-108-30/+58
| | | | | | | | | | | | | | | | | | | Protect irq injection/acking data structures with a separate irq_lock mutex. This fixes the following deadlock: CPU A CPU B kvm_vm_ioctl_deassign_dev_irq() mutex_lock(&kvm->lock); worker_thread() -> kvm_deassign_irq() -> kvm_assigned_dev_interrupt_work_handler() -> deassign_host_irq() mutex_lock(&kvm->lock); -> cancel_work_sync() [blocked] [gleb: fix ia64 path] Reported-by: Alex Williamson <alex.williamson@hp.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: introduce irq_lock, use it to protect ioapicMarcelo Tosatti2009-09-103-1/+8
| | | | | | | Introduce irq_lock, and use to protect ioapic data structures. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: move coalesced_mmio locking to its own deviceMarcelo Tosatti2009-09-102-6/+5
| | | | | | | | Move coalesced_mmio locking to its own device, instead of relying on kvm->lock. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Grab pic lock in kvm_pic_clear_isr_ackMarcelo Tosatti2009-09-101-0/+2
| | | | | | | isr_ack is protected by kvm_pic->lock. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Cleanup LAPIC interfaceJan Kiszka2009-09-101-12/+0
| | | | | | | | | None of the interface services the LAPIC emulation provides need to be exported to modules, and kvm_lapic_get_base is even totally unused today. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: ppc: e500: Add MMUCFG and PVR emulationLiu Yu2009-09-102-0/+5
| | | | | | | Latest kernel started to use these two registers. Signed-off-by: Liu Yu <yu.liu@freescale.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: ppc: e500: Directly pass pvr to guestLiu Yu2009-09-103-5/+1
| | | | | Signed-off-by: Liu Yu <yu.liu@freescale.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: ppc: e500: Move to Book-3e MMU definitionsLiu Yu2009-09-102-8/+8
| | | | | | | According to commit 70fe3af8403f85196bb74f22ce4813db7dfedc1a. Signed-off-by: Liu Yu <yu.liu@freescale.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Calculate available entries in coalesced mmio ringAvi Kivity2009-09-101-5/+5
| | | | | | | | | | | Instead of checking whether we'll wrap around, calculate how many entries are available, and check whether we have enough (just one) for the pending mmio. By itself, this doesn't change anything, but it paves the way for making this function lockless. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Fix reporting of unhandled EPT violationsAvi Kivity2009-09-101-2/+2
| | | | | | | Instead of returning -ENOTSUPP, exit normally but indicate the hardware exit reason. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Cache pdptrsAvi Kivity2009-09-107-13/+63
| | | | | | | Instead of reloading the pdptrs on every entry and exit (vmcs writes on vmx, guest memory access on svm) extract them on demand. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Simplify pdptr and cr3 managementAvi Kivity2009-09-101-6/+15
| | | | | | | | | Instead of reading the PDPTRs from memory after every exit (which is slow and wrong, as the PDPTRs are stored on the cpu), sync the PDPTRs from memory to the VMCS before entry, and from the VMCS to memory after exit. Do the same for cr3. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Avoid duplicate ept tlb flush when setting cr3Avi Kivity2009-09-101-1/+0
| | | | | | | vmx_set_cr3() will call vmx_tlb_flush(), which will flush the ept context. So there is no need to call ept_sync_context() explicitly. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: do not register i8254 PIO regions until we are initializedGregory Haskins2009-09-101-9/+8
| | | | | | | | | | | We currently publish the i8254 resources to the pio_bus before the devices are fully initialized. Since we hold the pit_lock, its probably not a real issue. But lets clean this up anyway. Reported-by: Avi Kivity <avi@redhat.com> Signed-off-by: Gregory Haskins <ghaskins@novell.com> Acked-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: cleanup io_device codeGregory Haskins2009-09-108-53/+109
| | | | | | | | | | | We modernize the io_device code so that we use container_of() instead of dev->private, and move the vtable to a separate ops structure (theoretically allows better caching for multiple instances of the same ops structure) Signed-off-by: Gregory Haskins <ghaskins@novell.com> Acked-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Clean up coalesced_mmio destructionGregory Haskins2009-09-101-1/+4
| | | | | | | | | | We invoke kfree() on a data member instead of the structure. This works today because the kvm_io_device is the first element of the private structure, but this could change in the future, so lets clean this up. Signed-off-by: Gregory Haskins <ghaskins@novell.com> Acked-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: powerpc: fix some init/exit annotationsStephen Rothwell2009-09-103-5/+5
| | | | | | | | | | | | | | Fixes a couple of warnings like this one: WARNING: arch/powerpc/kvm/kvm-440.o(.text+0x1e8c): Section mismatch in reference from the function kvmppc_44x_exit() to the function .exit.text:kvmppc_booke_exit() The function kvmppc_44x_exit() references a function in an exit section. Often the function kvmppc_booke_exit() has valid usage outside the exit section and the fix is to remove the __exit annotation of kvmppc_booke_exit. Also add some __init annotations on obvious routines. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Avi Kivity <avi@redhat.com>
OpenPOWER on IntegriCloud