summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* KVM: VMX: Implement PCID/INVPCID for guests with EPTMao, Junjie2012-07-128-7/+77
| | | | | | | | | | | | | | | | | This patch handles PCID/INVPCID for guests. Process-context identifiers (PCIDs) are a facility by which a logical processor may cache information for multiple linear-address spaces so that the processor may retain cached information when software switches to a different linear address space. Refer to section 4.10.1 in IA32 Intel Software Developer's Manual Volume 3A for details. For guests with EPT, the PCID feature is enabled and INVPCID behaves as running natively. For guests without EPT, the PCID feature is disabled and INVPCID triggers #UD. Signed-off-by: Junjie Mao <junjie.mao@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Add x86_hyper_kvm to complete detect_hypervisor_platform checkPrarit Bhargava2012-07-113-0/+16
| | | | | | | | | | | | | | | | | | While debugging I noticed that unlike all the other hypervisor code in the kernel, kvm does not have an entry for x86_hyper which is used in detect_hypervisor_platform() which results in a nice printk in the syslog. This is only really a stub function but it does make kvm more consistent with the other hypervisors. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Avi Kivity <avi@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Marcelo Tostatti <mtosatti@redhat.com> Cc: kvm@vger.kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: document mmu-lock and fast page faultXiao Guangrong2012-07-111-1/+129
| | | | | | | Document fast page fault and mmu-lock in locking.txt Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: fix kvm_mmu_pagetable_walk tracepointXiao Guangrong2012-07-112-6/+4
| | | | | | | | The P bit of page fault error code is missed in this tracepoint, fix it by passing the full error code Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: trace fast page faultXiao Guangrong2012-07-112-0/+40
| | | | | | | To see what happen on this path and help us to optimize it Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: fast path of handling guest page faultXiao Guangrong2012-07-111-17/+127
| | | | | | | | | | | | If the the present bit of page fault error code is set, it indicates the shadow page is populated on all levels, it means what we do is only modify the access bit which can be done out of mmu-lock Currently, in order to simplify the code, we only fix the page fault caused by write-protect on the fast path Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: introduce SPTE_MMU_WRITEABLE bitXiao Guangrong2012-07-111-19/+38
| | | | | | | | | | | | This bit indicates whether the spte can be writable on MMU, that means the corresponding gpte is writable and the corresponding gfn is not protected by shadow page protection In the later path, SPTE_MMU_WRITEABLE will indicates whether the spte can be locklessly updated Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: fold tlb flush judgement into mmu_spte_updateXiao Guangrong2012-07-111-13/+20
| | | | | | | mmu_spte_update() is the common function, we can easily audit the path Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: export PFEC.P bit on eptXiao Guangrong2012-07-111-1/+8
| | | | | | | | Export the present bit of page fault error code, the later patch will use it Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: cleanup spte_write_protectXiao Guangrong2012-07-111-16/+29
| | | | | | | Use __drop_large_spte to cleanup this function and comment spte_write_protect Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: abstract spte write-protectXiao Guangrong2012-07-111-27/+31
| | | | | | | | Introduce a common function to abstract spte write-protect to cleanup the code Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: return bool in __rmap_write_protectXiao Guangrong2012-07-111-6/+7
| | | | | | | | The reture value of __rmap_write_protect is either 1 or 0, use true/false instead of these Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Emulate invalid guest state by defaultAvi Kivity2012-07-091-1/+1
| | | | | | | | Our emulation should be complete enough that we can emulate guests while they are in big real mode, or in a mode transition that is not virtualizable without unrestricted guest support. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: implement LTRAvi Kivity2012-07-091-1/+10
| | | | | | Opcode 0F 00 /3. Encountered during Windows XP secondary processor bringup. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: make loading TR set the busy bitAvi Kivity2012-07-091-1/+7
| | | | | | | | Guest software doesn't actually depend on it, but vmx will refuse us entry if we don't. Set the bit in both the cached segment and memory, just to be nice. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: make read_segment_descriptor() return the addressAvi Kivity2012-07-091-5/+8
| | | | | | | Some operations want to modify the descriptor later on, so save the address for future use. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: emulate LLDTAvi Kivity2012-07-091-1/+10
| | | | | | Opcode 0F 00 /2. Used by isolinux durign the protected mode transition. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: emulate BSWAPAvi Kivity2012-07-091-2/+18
| | | | | | | | Opcodes 0F C8 - 0F CF. Used by the SeaBIOS cdrom code (though not in big real mode). Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Improve error reporting during invalid guest state emulationAvi Kivity2012-07-091-1/+5
| | | | | | If instruction emulation fails, report it properly to userspace. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Stop invalid guest state emulation on pending eventAvi Kivity2012-07-091-0/+3
| | | | | | Process the event, possibly injecting an interrupt, before continuing. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: implement ENTERAvi Kivity2012-07-091-1/+27
| | | | | | | | | | Opcode C8. Only ENTER with lexical nesting depth 0 is implemented, since others are very rare. We'll fail emulation if nonzero lexical depth is used so data is not corrupted. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: split push logic from push opcode emulationAvi Kivity2012-07-091-3/+8
| | | | | | | This allows us to reuse the code without populating ctxt->src and overriding ctxt->op_bytes. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: fix byte-sized MOVZX/MOVSXAvi Kivity2012-07-091-2/+2
| | | | | | | | | | | Commit 2adb5ad9fe1 removed ByteOp from MOVZX/MOVSX, replacing them by SrcMem8, but neglected to fix the dependency in the emulation code on ByteOp. This caused the instruction not to have any effect in some circumstances. Fix by replacing the check for ByteOp with the equivalent src.op_bytes == 1. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: emulate LAHFAvi Kivity2012-07-091-1/+8
| | | | | | Opcode 9F. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Continue emulating after batch exhaustedAvi Kivity2012-07-091-1/+1
| | | | | | | If we return early from an invalid guest state emulation loop, make sure we return to it later if the guest state is still invalid. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Fix interrupt exit condition during emulationAvi Kivity2012-07-091-2/+1
| | | | | | | | | | | Checking EFLAGS.IF is incorrect as we might be in interrupt shadow. If that is the case, the main loop will notice that and not inject the interrupt, causing an endless loop. Fix by using vmx_interrupt_allowed() to check if we can inject an interrupt instead. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: emulate SGDT/SIDTAvi Kivity2012-07-091-2/+31
| | | | | | Opcodes 0F 01 /0 and 0F 01 /1 Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Fix SS default ESP/EBP based addressingAvi Kivity2012-07-091-3/+14
| | | | | | | | | | We correctly default to SS when BP is used as a base in 16-bit address mode, but we don't do that for 32-bit mode. Fix by adjusting the default to SS when either ESP or EBP is used as the base register. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: initialize memopAvi Kivity2012-07-091-1/+1
| | | | | | | | | | | memop is not initialized; this can lead to a two-byte operation following a 4-byte operation to see garbage values. Usually truncation fixes things fot us later on, but at least in one case (call abs) it doesn't. Fix by moving memop to the auto-initialized field area. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: emulate LEAVEAvi Kivity2012-07-091-1/+24
| | | | | | Opcode c9; used by some variants of Windows during boot, in big real mode. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Limit iterations with emulator_invalid_guest_stateAvi Kivity2012-07-091-1/+2
| | | | | | | Otherwise, if the guest ends up looping, we never exit the srcu critical section, which causes synchronize_srcu() to hang. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Relax check on unusable segmentAvi Kivity2012-07-091-3/+1
| | | | | | | | | Some userspace (e.g. QEMU 1.1) munge the d and g bits of segment descriptors, causing us not to recognize them as unusable segments with emulate_invalid_guest_state=1. Relax the check by testing for segment not present (a non-present segment cannot be usable). Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: fix LIDT/LGDT in long modeAvi Kivity2012-07-091-0/+4
| | | | | | | | | The operand size for these instructions is 8 bytes in long mode, even without a REX prefix. Set it explicitly. Triggered while booting Linux with emulate_invalid_guest_state=1. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: allow loading null SS in long modeAvi Kivity2012-07-091-4/+8
| | | | | | Null SS is valid in long mode; allow loading it. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: emulate cpuidAvi Kivity2012-07-091-1/+15
| | | | | | | | | | Opcode 0F A2. Used by Linux during the mode change trampoline while in a state that is not virtualizable on vmx without unrestricted_guest, so we need to emulate it is emulate_invalid_guest_state=1. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: change ->get_cpuid() accessor to use the x86 semanticsAvi Kivity2012-07-093-47/+30
| | | | | | | | Instead of getting an exact leaf, follow the spec and fall back to the last main leaf instead. This lets us easily emulate the cpuid instruction in the emulator. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Split cpuid register access from computationAvi Kivity2012-07-092-18/+23
| | | | | | | | | | Introduce kvm_cpuid() to perform the leaf limit check and calculate register values, and let kvm_emulate_cpuid() just handle reading and writing the registers from/to the vcpu. This allows us to reuse kvm_cpuid() in a context where directly reading and writing registers is not desired. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Return correct CPL during transition to protected modeAvi Kivity2012-07-091-2/+13
| | | | | | | | | | | | | In protected mode, the CPL is defined as the lower two bits of CS, as set by the last far jump. But during the transition to protected mode, there is no last far jump, so we need to return zero (the inherited real mode CPL). Fix by reading CPL from the cache during the transition. This isn't 100% correct since we don't set the CPL cache on a far jump, but since protected mode transition will always jump to a segment with RPL=0, it will always work. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: Force cr3 reload with two dimensional paging on mov cr3 emulationAvi Kivity2012-07-091-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the MMU's ->new_cr3() callback does nothing when guest paging is disabled or when two-dimentional paging (e.g. EPT on Intel) is active. This means that an emulated write to cr3 can be lost; kvm_set_cr3() will write vcpu-arch.cr3, but the GUEST_CR3 field in the VMCS will retain its old value and this is what the guest sees. This bug did not have any effect until now because: - with unrestricted guest, or with svm, we never emulate a mov cr3 instruction - without unrestricted guest, and with paging enabled, we also never emulate a mov cr3 instruction - without unrestricted guest, but with paging disabled, the guest's cr3 is ignored until the guest enables paging; at this point the value from arch.cr3 is loaded correctly my the mov cr0 instruction which turns on paging However, the patchset that enables big real mode causes us to emulate mov cr3 instructions in protected mode sometimes (when guest state is not virtualizable by vmx); this mov cr3 is effectively ignored and will crash the guest. The fix is to make nonpaging_new_cr3() call mmu_free_roots() to force a cr3 reload. This is awkward because now all the new_cr3 callbacks to the same thing, and because mmu_free_roots() is somewhat of an overkill; but fixing that is more complicated and will be done after this minimal fix. Observed in the Window XP 32-bit installer while bringing up secondary vcpus. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: handle last_boosted_vcpu = 0 caseRik van Riel2012-07-061-1/+1
| | | | | | | | | | | | If last_boosted_vcpu == 0, then we fall through all test cases and may end up with all VCPUs pouncing on vcpu 0. With a large enough guest, this can result in enormous runqueue lock contention, which can prevent vcpu0 from running, leading to a livelock. Changing < to <= makes sure we properly handle that case. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: s390: Fix sigp sense handling.Cornelia Huck2012-07-032-5/+10
| | | | | | | | | | | | | If sigp sense doesn't have any status bits to report, it should set cc 0 and leave the register as-is. Since we know about the external call pending bit, we should report it if it is set as well. Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: s390: use sigp condition code definesHeiko Carstens2012-07-031-29/+29
| | | | | | | | | Just use the defines instead of using plain numbers and adding a comment behind each line. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: s390: fix sigp set prefix status stored casesHeiko Carstens2012-07-031-2/+5
| | | | | | | | | | | | If an invalid parameter is passed or the addressed cpu is in an incorrect state sigp set prefix will store a status. This status must only have bits set as defined by the architecture. The current kvm implementation missed to clear bits and also did not set the intended status bit ("and" instead of "or" operation). Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: s390: fix sigp sense running condition code handlingHeiko Carstens2012-07-031-2/+2
| | | | | | | | | | | | Only if the sensed cpu is not running a status is stored, which is reflected by condition code 1. If the cpu is running, condition code 0 should be returned. Just the opposite of what the code is doing. Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* s390/smp/kvm: unifiy sigp definitionsHeiko Carstens2012-07-033-85/+64
| | | | | | | | | | | | The smp and the kvm code have different defines for the sigp order codes. Let's just have a single place where these are defined. Also move the sigp condition code and sigp cpu status bits to the new sigp.h header file. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* s390/smp: remove redundant checkHeiko Carstens2012-07-031-2/+2
| | | | | | | | | | | condition code "status stored" for sigp sense running always implies that only the "not running" status bit is set. Therefore no need to check if it is set. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Guard mmu_notifier specific code with CONFIG_MMU_NOTIFIERMarc Zyngier2012-07-031-2/+2
| | | | | | | | | | | In order to avoid compilation failure when KVM is not compiled in, guard the mmu_notifier specific sections with both CONFIG_MMU_NOTIFIER and KVM_ARCH_WANT_MMU_NOTIFIER, like it is being done in the rest of the KVM code. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: code clean for vmx_init()Guo Chao2012-07-031-9/+7
| | | | | Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* apic: fix kvm build on UP without IOAPICMichael S. Tsirkin2012-07-031-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | On UP i386, when APIC is disabled # CONFIG_X86_UP_APIC is not set # CONFIG_PCI_IOAPIC is not set code looking at apicdrivers never has any effect but it still gets compiled in. In particular, this causes build failures with kvm, but it generally bloats the kernel unnecessarily. Fix by defining both __apicdrivers and __apicdrivers_end to be NULL when CONFIG_X86_LOCAL_APIC is unset: I verified that as the result any loop scanning __apicdrivers gets optimized out by the compiler. Warning: a .config with apic disabled doesn't seem to boot for me (even without this patch). Still verifying why, meanwhile this patch is compile-tested only. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reported-by: Randy Dunlap <rdunlap@xenotime.net> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: host side for eoi optimizationMichael S. Tsirkin2012-06-256-4/+193
| | | | | | | | | | | | | | | | | | | | | | | | | Implementation of PV EOI using shared memory. This reduces the number of exits an interrupt causes as much as by half. The idea is simple: there's a bit, per APIC, in guest memory, that tells the guest that it does not need EOI. We set it before injecting an interrupt and clear before injecting a nested one. Guest tests it using a test and clear operation - this is necessary so that host can detect interrupt nesting - and if set, it can skip the EOI MSR. There's a new MSR to set the address of said register in guest memory. Otherwise not much changed: - Guest EOI is not required - Register is tested & ISR is automatically cleared on exit For testing results see description of previous patch 'kvm_para: guest side for eoi avoidance'. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
OpenPOWER on IntegriCloud