path: root/arch/x86/kvm/vmx/nested.c
Commit message | Author | Age | Files | Lines
...
* KVM: nVMX: unconditionally cancel preemption timer in free_nested (CVE-2019-7221) | Peter Shier | 2019-02-07 | 1 file | -0/+1

    Bugzilla: 1671904

    There are multiple code paths where an hrtimer may have been started to emulate an L1 VMX preemption timer that can result in a call to free_nested without an intervening L2 exit where the hrtimer is normally cancelled. Unconditionally cancel in free_nested to cover all cases.

    Embargoed until Feb 7th 2019.

    Signed-off-by: Peter Shier <pshier@google.com>
    Reported-by: Jim Mattson <jmattson@google.com>
    Reviewed-by: Jim Mattson <jmattson@google.com>
    Reported-by: Felix Wilhelm <fwilhelm@google.com>
    Cc: stable@kernel.org
    Message-Id: <20181011184646.154065-1-pshier@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
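    A minimal sketch of the fix's shape (the field name is an assumption based on the description above, not a verified hunk):

        /* In free_nested(): always cancel the hrtimer that emulates the
         * L1 VMX preemption timer, so a still-armed timer cannot fire
         * after the nested state has been freed. Field name assumed. */
        hrtimer_cancel(&vmx->nested.preemption_timer);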
* kvm: vmx: fix some -Wmissing-prototypes warnings | Yi Wang | 2019-01-25 | 1 file | -1/+1

    We get some warnings when building the kernel with W=1:

      arch/x86/kvm/vmx/vmx.c:426:5: warning: no previous prototype for ‘kvm_fill_hv_flush_list_func’ [-Wmissing-prototypes]
      arch/x86/kvm/vmx/nested.c:58:6: warning: no previous prototype for ‘init_vmcs_shadow_fields’ [-Wmissing-prototypes]

    Make them static to fix this.

    Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
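    The fix pattern, sketched on the nested.c warning (the surrounding body is hypothetical; the only change is the storage class):

        /* Before: external linkage with no prototype in any header,
         * so W=1 builds warn. */
        void init_vmcs_shadow_fields(void) { /* ... */ }

        /* After: internal linkage; a static function needs no separate
         * prototype, so the warning disappears. */
        static void init_vmcs_shadow_fields(void) { /* ... */ }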
* kvm: x86/vmx: Use kzalloc for cached_vmcs12 | Tom Roeder | 2019-01-25 | 1 file | -4/+8

    This changes the allocation of cached_vmcs12 to use kzalloc instead of kmalloc. This removes the information leak found by Syzkaller (see Reported-by) in this case and prevents similar leaks from happening based on cached_vmcs12.

    It also changes vmx_get_nested_state to copy out the full 4k VMCS12_SIZE in copy_to_user rather than only the size of the struct.

    Tested: rebuilt against head, booted, and ran the syzkaller repro https://syzkaller.appspot.com/text?tag=ReproC&x=174efca3400000 without observing any problems.

    Reported-by: syzbot+ded1696f6b50b615b630@syzkaller.appspotmail.com
    Fixes: 8fcc4b5923af5de58b80b53a069453b135693304
    Cc: stable@vger.kernel.org
    Signed-off-by: Tom Roeder <tmroeder@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
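    A sketch of both halves of the change (struct member and destination names are assumptions; VMCS12_SIZE is the 4k constant mentioned above):

        /* Zero the whole 4k backing buffer so bytes beyond the
         * initialized struct vmcs12 can never leak to userspace. */
        vmx->nested.cached_vmcs12 = kzalloc(VMCS12_SIZE, GFP_KERNEL);
        if (!vmx->nested.cached_vmcs12)
            return -ENOMEM;

        /* ...and copy out the full region, not just sizeof(struct
         * vmcs12), so a short copy can't expose stale kernel memory. */
        if (copy_to_user(user_vmcs12, vmx->nested.cached_vmcs12,
                         VMCS12_SIZE))
            return -EFAULT;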
* x86/kvm/nVMX: don't skip emulated instruction twice when vmptr address is not backed | Vitaly Kuznetsov | 2019-01-11 | 1 file | -2/+1

    Since commit 09abb5e3e5e50 ("KVM: nVMX: call kvm_skip_emulated_instruction in nested_vmx_{fail,succeed}"), nested_vmx_failValid() results in kvm_skip_emulated_instruction(), so doing it again in handle_vmptrld() when the vmptr address is not backed is wrong: we end up advancing RIP twice.

    Fixes: fca91f6d60b6e ("kvm: nVMX: Set VM instruction error for VMPTRLD of unbacked page")
    Reported-by: Cornelia Huck <cohuck@redhat.com>
    Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
    Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
    Reviewed-by: Cornelia Huck <cohuck@redhat.com>
    Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
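    The shape of the fix, as a before/after sketch (the specific error code shown is an assumption):

        /* Before: RIP advanced twice, once inside nested_vmx_failValid()
         * and once by the explicit skip. */
        nested_vmx_failValid(vcpu, VMXERR_VMPTRLD_INCORRECT_VMCS_REVISION_ID);
        return kvm_skip_emulated_instruction(vcpu);

        /* After: nested_vmx_failValid() already skips, so return its
         * result directly. */
        return nested_vmx_failValid(vcpu,
                                    VMXERR_VMPTRLD_INCORRECT_VMCS_REVISION_ID);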
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm | Linus Torvalds | 2018-12-26 | 1 file | -0/+2

    Pull KVM updates from Paolo Bonzini:

    "ARM:
     - selftests improvements
     - large PUD support for HugeTLB
     - single-stepping fixes
     - improved tracing
     - various timer and vGIC fixes

    x86:
     - Processor Tracing virtualization
     - STIBP support
     - some correctness fixes
     - refactorings and splitting of vmx.c
     - use the Hyper-V range TLB flush hypercall
     - reduce order of vcpu struct
     - WBNOINVD support
     - do not use -ftrace for __noclone functions
     - nested guest support for PAUSE filtering on AMD
     - more Hyper-V enlightenments (direct mode for synthetic timers)

    PPC:
     - nested VFIO

    s390:
     - bugfixes only this time"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (171 commits)
      KVM: x86: Add CPUID support for new instruction WBNOINVD
      kvm: selftests: ucall: fix exit mmio address guessing
      Revert "compiler-gcc: disable -ftracer for __noclone functions"
      KVM: VMX: Move VM-Enter + VM-Exit handling to non-inline sub-routines
      KVM: VMX: Explicitly reference RCX as the vmx_vcpu pointer in asm blobs
      KVM: x86: Use jmp to invoke kvm_spurious_fault() from .fixup
      MAINTAINERS: Add arch/x86/kvm sub-directories to existing KVM/x86 entry
      KVM/x86: Use SVM assembly instruction mnemonics instead of .byte streams
      KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()
      KVM/MMU: Flush tlb directly in kvm_set_pte_rmapp()
      KVM/MMU: Move tlb flush in kvm_set_pte_rmapp() to kvm_mmu_notifier_change_pte()
      KVM: Make kvm_set_spte_hva() return int
      KVM: Replace old tlb flush function with new one to flush a specified range.
      KVM/MMU: Add tlb flush with range helper function
      KVM/VMX: Add hv tlb range flush support
      x86/hyper-v: Add HvFlushGuestAddressList hypercall support
      KVM: Add tlb_remote_flush_with_range callback in kvm_x86_ops
      KVM: x86: Disable Intel PT when VMXON in L1 guest
      KVM: x86: Set intercept for Intel PT MSRs read/write
      KVM: x86: Implement Intel PT MSRs read/write emulation
      ...
* KVM: VMX: Move VM-Enter + VM-Exit handling to non-inline sub-routines | Sean Christopherson | 2018-12-21 | 1 file | -20/+10

    Transitioning to/from a VMX guest requires KVM to manually save/load the bulk of CPU state that the guest is allowed to directly access, e.g. XSAVE state, CR2, GPRs, etc... For obvious reasons, loading the guest's GPR snapshot prior to VM-Enter and saving the snapshot after VM-Exit is done via handcoded assembly. The assembly blob is written as inline asm so that it can easily access KVM-defined structs that are used to hold guest state, e.g. moving the blob to a standalone assembly file would require generating defines for struct offsets.

    The other relevant aspect of VMX transitions in KVM is the handling of VM-Exits. KVM doesn't employ a separate VM-Exit handler per se, but rather treats the VMX transition as a mega instruction (with many side effects), i.e. sets the VMCS.HOST_RIP to a label immediately following VMLAUNCH/VMRESUME. The label is then exposed to C code via a global variable definition in the inline assembly.

    Because of the global variable, KVM takes steps to (attempt to) ensure only a single instance of the owning C function, e.g. vmx_vcpu_run, is generated by the compiler. The earliest approach placed the inline assembly in a separate noinline function[1]. Later, the assembly was folded back into vmx_vcpu_run() and tagged with __noclone[2][3], which is still used today.

    After moving to __noclone, an edge case was encountered where GCC's -ftracer optimization resulted in the inline assembly blob being duplicated. This was "fixed" by explicitly disabling -ftracer in the __noclone definition[4]. Recently, it was found that disabling -ftracer causes build warnings for unsuspecting users of __noclone[5], and more importantly for KVM, prevents the compiler from properly optimizing vmx_vcpu_run()[6]. And perhaps most importantly of all, it was pointed out that there is no way to prevent duplication of a function with 100% reliability[7], i.e. more edge cases may be encountered in the future.

    So to summarize, the only way to prevent the compiler from duplicating the global variable definition is to move the variable out of inline assembly, which has been suggested several times over[1][7][8].

    Resolve the aforementioned issues by moving the VMLAUNCH+VMRESUME and VM-Exit "handler" to standalone assembly sub-routines. Moving only the core VMX transition code allows the struct indexing to remain as inline assembly and also allows the sub-routines to be used by nested_vmx_check_vmentry_hw(). Reusing the sub-routines has a happy side-effect of eliminating two VMWRITEs in the nested_early_check path, as there is no longer a need to dynamically change VMCS.HOST_RIP.

    Note that callers of vmx_vmenter() must account for the CALL modifying RSP, e.g. must subtract op-size from RSP when synchronizing RSP with VMCS.HOST_RSP and "restore" RSP prior to the CALL. There are no great alternatives to fudging RSP. Saving RSP in vmx_vmenter() is difficult because doing so requires a second register (VMWRITE does not provide an immediate encoding for the VMCS field and KVM supports Hyper-V's memory-based eVMCS ABI). The other, more drastic alternative would be to eschew VMCS.HOST_RSP and manually save/load RSP using a per-cpu variable (which can be encoded as e.g. gs:[imm]). But because a valid stack is needed at the time of VM-Exit (NMIs aren't blocked and a user could theoretically insert INT3/INT1ICEBRK at the VM-Exit handler), a dedicated per-cpu VM-Exit stack would be required. A dedicated stack isn't difficult to implement, but it would require at least one page per CPU and knowledge of the stack in the dumpstack routines. And in most cases there is essentially zero overhead in dynamically updating VMCS.HOST_RSP, e.g. the VMWRITE can be avoided for all but the first VMLAUNCH unless nested_early_check=1, which is not a fast path. In other words, avoiding the VMCS.HOST_RSP update by using a dedicated stack would only make the code marginally less ugly while requiring at least one page per CPU and forcing the kernel to be aware (and approve) of the VM-Exit stack shenanigans.

    [1] cea15c24ca39 ("KVM: Move KVM context switch into own function")
    [2] a3b5ba49a8c5 ("KVM: VMX: add the __noclone attribute to vmx_vcpu_run")
    [3] 104f226bfd0a ("KVM: VMX: Fold __vmx_vcpu_run() into vmx_vcpu_run()")
    [4] 95272c29378e ("compiler-gcc: disable -ftracer for __noclone functions")
    [5] https://lkml.kernel.org/r/20181218140105.ajuiglkpvstt3qxs@treble
    [6] https://patchwork.kernel.org/patch/8707981/#21817015
    [7] https://lkml.kernel.org/r/ri6y38lo23g.fsf@suse.cz
    [8] https://lkml.kernel.org/r/20181218212042.GE25620@tassilo.jf.intel.com

    Suggested-by: Andi Kleen <ak@linux.intel.com>
    Suggested-by: Martin Jambor <mjambor@suse.cz>
    Cc: Paolo Bonzini <pbonzini@redhat.com>
    Cc: Nadav Amit <namit@vmware.com>
    Cc: Andi Kleen <ak@linux.intel.com>
    Cc: Josh Poimboeuf <jpoimboe@redhat.com>
    Cc: Martin Jambor <mjambor@suse.cz>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Miroslav Benes <mbenes@suse.cz>
    Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
    Reviewed-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
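    A conceptual sketch of such a standalone entry stub (heavily hedged: the symbol name, the flag-in-RDI ABI, and the x86-64-only encoding are invented for illustration, and the real helper also deals with VM-Fail reporting):

        /* Dispatch between VMLAUNCH (first entry for a VMCS) and
         * VMRESUME (subsequent entries). GNU C file-scope asm. */
        asm(".pushsection .text\n"
            ".global vmenter_sketch\n"
            "vmenter_sketch:\n"
            "   test %rdi, %rdi\n"  /* %rdi != 0 => already launched */
            "   jnz 1f\n"
            "   vmlaunch\n"         /* initial VM-Enter */
            "   ret\n"              /* falls through only on VM-Fail */
            "1: vmresume\n"         /* subsequent VM-Enters */
            "   ret\n"
            ".popsection\n");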
* KVM: VMX: Explicitly reference RCX as the vmx_vcpu pointer in asm blobs | Sean Christopherson | 2018-12-21 | 1 file | -3/+3

    Use '%%" _ASM_CX "' instead of '%0' to dereference RCX, i.e. the 'struct vcpu_vmx' pointer, in the VM-Enter asm blobs of vmx_vcpu_run() and nested_vmx_check_vmentry_hw(). With numbered operands, adding or removing an output parameter requires "rewriting" almost all of the asm blob, which makes it nearly impossible to understand what's being changed in even the most minor patches; the symbolic name avoids that churn.

    Opportunistically improve the code comments.

    Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
    Reviewed-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
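    A self-contained illustration of the idiom (names here are invented; only the "c" constraint and the _ASM_CX-style symbolic reference mirror the change):

        /* Pin an input to RCX/ECX and reference it by name in the
         * template, so renumbering other operands never touches it. */
        #ifdef __x86_64__
        #define MY_ASM_CX "rcx"
        #else
        #define MY_ASM_CX "ecx"
        #endif

        static inline unsigned long load_first_word(const void *obj)
        {
            unsigned long val;
            asm("mov (%%" MY_ASM_CX "), %0"  /* symbolic: always [RE]CX */
                : "=r"(val)
                : "c"(obj)                   /* "c" pins obj to [RE]CX */
                : "memory");
            return val;
        }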
* KVM: x86: Disable Intel PT when VMXON in L1 guest | Luwei Kang | 2018-12-21 | 1 file | -0/+6

    Currently, Intel Processor Trace does not support tracing in L1 guest VMX operation (IA32_VMX_MISC[bit 14] is 0). As mentioned in the SDM, on these processors execution of the VMXON instruction clears IA32_RTIT_CTL.TraceEn, and any attempt to write IA32_RTIT_CTL causes a general-protection exception (#GP).

    Signed-off-by: Luwei Kang <luwei.kang@intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
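    A sketch of how the emulated VMXON path can mirror that behavior (the mode check and helper names are assumptions, not verified kernel API):

        /* On emulated VMXON: clear the guest's RTIT_CTL, stopping any
         * active trace, and intercept further RTIT MSR writes so they
         * raise #GP in the L1 guest, matching IA32_VMX_MISC[14] == 0. */
        if (pt_mode == PT_MODE_HOST_GUEST) {
            vmx->pt_desc.guest.ctl = 0;
            pt_update_intercept_for_msr(vmx);
        }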
* kvm: nVMX: NMI-window and interrupt-window exiting should wake L2 from HLT | Jim Mattson | 2018-12-21 | 1 file | -3/+7

    According to the SDM, "NMI-window exiting" VM-exits wake a logical processor from the same inactive states as would an NMI, and "interrupt-window exiting" VM-exits wake a logical processor from the same inactive states as would an external interrupt. Specifically, they wake a logical processor from the shutdown state and from the states entered using the HLT and MWAIT instructions.

    Fixes: 6dfacadd5858 ("KVM: nVMX: Add support for activity state HLT")
    Signed-off-by: Jim Mattson <jmattson@google.com>
    Reviewed-by: Peter Shier <pshier@google.com>
    Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
    [Squashed comments of two Jim's patches and used the simplified code hunk provided by Sean. - Radim]
    Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
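    A conceptual sketch of the wake-up condition (the helper is invented; the control-field names are the kernel spellings of that era and are assumed):

        /* A halted L2 must be woken by a pending NMI-window or
         * interrupt-window exit, exactly as a real NMI or external
         * interrupt would wake a halted logical processor. */
        static bool nested_window_exit_wakes_l2(struct vmcs12 *vmcs12)
        {
            return nested_cpu_has(vmcs12, CPU_BASED_VIRTUAL_NMI_PENDING) ||
                   nested_cpu_has(vmcs12, CPU_BASED_VIRTUAL_INTR_PENDING);
        }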
* KVM: nVMX: Move the checks for Guest Non-Register States to a separate helper function | Krish Sadhukhan | 2018-12-14 | 1 file | -3/+14

    .. to improve readability and maintainability, and to align the code as per the layout of the checks in chapter "VM Entries" in Intel SDM vol 3C.

    Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
    Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
    Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
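    This commit and the four "Move the checks ..." commits below share one refactoring pattern; a hedged sketch with hypothetical names (the real helpers check far more fields):

        /* One helper per SDM "VM Entries" section; the top-level
         * VM-entry prereq check calls them in the SDM's order. */
        static int nested_check_guest_non_reg_state(struct vmcs12 *vmcs12)
        {
            if (vmcs12->guest_activity_state != GUEST_ACTIVITY_ACTIVE &&
                vmcs12->guest_activity_state != GUEST_ACTIVITY_HLT)
                return -EINVAL;
            return 0;
        }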
* KVM: nVMX: Move the checks for Host Control Registers and MSRs to a separate helper function | Krish Sadhukhan | 2018-12-14 | 1 file | -14/+25

    .. to improve readability and maintainability, and to align the code as per the layout of the checks in chapter "VM Entries" in Intel SDM vol 3C.

    Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
    Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
    Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: nVMX: Move the checks for VM-Entry Control Fields to a separate helper function | Krish Sadhukhan | 2018-12-14 | 1 file | -43/+54

    .. to improve readability and maintainability, and to align the code as per the layout of the checks in chapter "VM Entries" in Intel SDM vol 3C.

    Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
    Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
    Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: nVMX: Move the checks for VM-Exit Control Fields to a separate helper function | Krish Sadhukhan | 2018-12-14 | 1 file | -10/+33

    .. to improve readability and maintainability, and to align the code as per the layout of the checks in chapter "VM Entries" in Intel SDM vol 3C.

    Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
    Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
    Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: nVMX: Remove param indirection from nested_vmx_check_msr_switch() | Sean Christopherson | 2018-12-14 | 1 file | -24/+10

    Passing the enum and doing an indirect lookup is silly when we can simply pass the field directly. Remove the "fast path" code in nested_vmx_check_msr_switch_controls() as it's now nothing more than a redundant check.

    Remove the debug message rather than continue passing the enum for the address field. Having debug messages for the MSRs themselves is useful as MSR legality is a huge space, whereas messing up a physical address means the VMM is fundamentally broken.

    Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
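    The shape of the simplification, sketched (the enum name is hypothetical; the VMCS12 field names follow the usual kernel naming and are assumptions):

        /* Before: pass an enum, look the fields back up in a table. */
        nested_vmx_check_msr_switch(vcpu, VM_EXIT_MSR_STORE_IDX);

        /* After: pass the count and address fields directly. */
        nested_vmx_check_msr_switch(vcpu,
                                    vmcs12->vm_exit_msr_store_count,
                                    vmcs12->vm_exit_msr_store_addr);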
* KVM: nVMX: Move the checks for VM-Execution Control Fields to a separate helper function | Krish Sadhukhan | 2018-12-14 | 1 file | -69/+62

    .. to improve readability and maintainability, and to align the code as per the layout of the checks in chapter "VM Entries" in Intel SDM vol 3C.

    Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
    Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
    Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: nVMX: Prepend "nested_vmx_" to check_vmentry_{pre,post}reqs() | Krish Sadhukhan | 2018-12-14 | 1 file | -7/+8

    .. as they are used only in nested vmx context.

    Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
    Reviewed-by: Liran Alon <liran.alon@oracle.com>
    Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
    Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM nVMX: MSRs should not be stored if VM-entry fails during or after loading guest state | Krish Sadhukhan | 2018-12-14 | 1 file | -4/+12

    According to section "VM-entry Failures During or After Loading Guest State" in Intel SDM vol 3C, when bit 31 of the exit reason is set, "No MSRs are saved into the VM-exit MSR-store area."

    Reported-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
    Suggested-by: Jim Mattson <jmattson@google.com>
    Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
    Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
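    A sketch of the guard (VMX_EXIT_REASONS_FAILED_VMENTRY is the kernel's bit-31 mask; the store helper's name and signature are assumptions):

        /* Bit 31 of the exit reason flags a failed VM-entry; per the
         * SDM, skip the VM-exit MSR-store in that case. */
        if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY))
            nested_vmx_store_msrs(vcpu, vmcs12);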
* x86/kvm/hyper-v: Introduce nested_get_evmcs_version() helper | Vitaly Kuznetsov | 2018-12-14 | 1 file | -0/+1

    The upcoming KVM_GET_SUPPORTED_HV_CPUID ioctl will need to return the Enlightened VMCS version in HYPERV_CPUID_NESTED_FEATURES.EAX when it was enabled.

    Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
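    A sketch of what such a helper can look like (the version encoding, low byte = current and high byte = maximum, is an assumption for illustration):

        static u16 nested_get_evmcs_version(struct kvm_vcpu *vcpu)
        {
            struct vcpu_vmx *vmx = to_vmx(vcpu);

            /* 0 when enlightened VMCS was never enabled for this vCPU. */
            if (vmx->nested.enlightened_vmcs_enabled)
                return (1 << 8) | 1;  /* min version 1, max version 1 */
            return 0;
        }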
* KVM: nVMX: Move nested code to dedicated files | Sean Christopherson | 2018-12-14 | 1 file | -0/+5674

    From a functional perspective, this is (supposed to be) a straight copy-paste of code. Code was moved piecemeal to nested.c, as not all code that could or should be moved was obviously nested-only. The nested code was then re-ordered as needed to compile, i.e. the diff stats may not show this as being a "pure" move, despite there being no intended changes in functionality.

    Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>