path: root/arch/x86/kvm/emulate.c
Commit history (most recent first). Each entry shows the commit subject, the author and date, and the diffstat (files changed, -deleted/+added lines), followed by the commit message.
* KVM: x86: Use a typedef for fastop functions
  Sean Christopherson, 2020-01-27 (1 file, -7/+7)
  Add a typedef for the fastop function prototype to make the code more readable. No functional change intended.
  Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
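  Example: a minimal sketch of the idea; the dispatch-helper signature below is illustrative, not a verbatim copy of the patch.

      /* All fastop blocks share one prototype, so give it a name. */
      typedef void (*fastop_t)(struct fastop *);

      /* Callers can then take a fastop_t instead of an open-coded
       * function-pointer type. */
      static int fastop(struct x86_emulate_ctxt *ctxt, fastop_t fop);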
* KVM: X86: Add 'else' to unify fastop and execute call path
  Miaohe Lin, 2020-01-27 (1 file, -4/+2)
  It also helps eliminate some duplicated code.
  Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Remove unused ctxt param from emulator's FPU accessors
  Sean Christopherson, 2020-01-27 (1 file, -15/+13)
  Remove an unused struct x86_emulate_ctxt * param from low level helpers used to access guest FPU state. The unused param was left behind by commit 6ab0b9feb82a ("x86,kvm: remove KVM emulator get_fpu / put_fpu"). No functional change intended.
  Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Ensure guest's FPU state is loaded when accessing for emulation
  Sean Christopherson, 2020-01-27 (1 file, -0/+39)
  Lock the FPU regs and reload the current thread's FPU state, which holds the guest's FPU state, to the CPU registers if necessary prior to accessing guest FPU state as part of emulation. kernel_fpu_begin() can be called from softirq context, therefore KVM must ensure softirqs are disabled (locking the FPU regs disables softirqs) when touching CPU FPU state.
  Note, for all intents and purposes this reverts commit 6ab0b9feb82a7 ("x86,kvm: remove KVM emulator get_fpu / put_fpu"), but at the time it was applied, removing get/put_fpu() was correct. The re-introduction of {get,put}_fpu() is necessitated by the deferring of FPU state load.
  Fixes: 5f409e20b7945 ("x86/fpu: Defer FPU state load until return to userspace")
  Cc: stable@vger.kernel.org
  Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
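  Example: a hedged sketch of what the re-introduced accessors look like on a kernel with deferred FPU load; the exact calls are an assumption based on the x86/fpu API of that era, not the verbatim patch.

      static void emulator_get_fpu(void)
      {
          /* Disables preemption and softirqs while we own the FPU registers. */
          fpregs_lock();

          /* If the guest's FPU state is still parked in memory because the
           * reload was deferred, bring it into the CPU registers now. */
          if (test_thread_flag(TIF_NEED_FPU_LOAD))
              switch_fpu_return();
      }

      static void emulator_put_fpu(void)
      {
          fpregs_unlock();
      }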
* KVM: x86: Refactor prefix decoding to prevent Spectre-v1/L1TF attacks
  Marios Pomonis, 2020-01-27 (1 file, -2/+14)
  This fixes Spectre-v1/L1TF vulnerabilities in vmx_read_guest_seg_selector(), vmx_read_guest_seg_base(), vmx_read_guest_seg_limit() and vmx_read_guest_seg_ar(). When invoked from emulation, these functions contain index computations based on the (attacker-influenced) segment value. Using constants prevents the attack.
  Cc: stable@vger.kernel.org
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
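  Example: the usual kernel pattern for this class of fix is either to index with compile-time constants (as here) or to clamp an attacker-influenced index with array_index_nospec(); a generic, illustrative sketch.

      #include <linux/nospec.h>

      /* idx derives from guest-controlled data, so clamp it under
       * speculation before using it to index a host-side array. */
      idx = array_index_nospec(idx, ARRAY_SIZE(table));
      val = table[idx];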
* KVM: x86: Protect x86_decode_insn from Spectre-v1/L1TF attacks
  Marios Pomonis, 2020-01-27 (1 file, -3/+8)
  This fixes a Spectre-v1/L1TF vulnerability in x86_decode_insn(). kvm_emulate_instruction() (an ancestor of x86_decode_insn()) is an exported symbol, so KVM should treat it conservatively from a security perspective.
  Fixes: 045a282ca415 ("KVM: emulator: implement fninit, fnstsw, fnstcw")
  Signed-off-by: Nick Finco <nifi@google.com> Signed-off-by: Marios Pomonis <pomonis@google.com> Reviewed-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Add dedicated emulator helpers for querying CPUID features
  Sean Christopherson, 2020-01-21 (1 file, -18/+3)
  Add feature-specific helpers for querying guest CPUID support from the emulator instead of having the emulator do a full CPUID and perform its own bit tests. The primary motivation is to eliminate the emulator's usage of bit() so that future patches can add more extensive build-time assertions on the usage of bit() without having to expose yet more code to the emulator.
  Note, providing a generic guest_cpuid_has() to the emulator doesn't work due to the existing build-time assertions in guest_cpuid_has(), which require the feature being checked to be a compile-time constant.
  No functional change intended.
  Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
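  Example: a sketch of the shape of such a helper; the specific op and feature are illustrative assumptions, not a quote of the patch.

      /* x86.c side: answer a single yes/no feature question for the emulator,
       * so the emulator needs neither bit() nor raw CPUID plumbing. */
      static bool emulator_guest_has_movbe(struct x86_emulate_ctxt *ctxt)
      {
          return guest_cpuid_has(emul_to_vcpu(ctxt), X86_FEATURE_MOVBE);
      }

      /* emulate.c side: the instruction handler simply asks the question. */
      if (!ctxt->ops->guest_has_movbe(ctxt))
          return emulate_ud(ctxt);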
* KVM: X86: avoid unused setup_syscalls_segments call when SYSCALL check failed
  Miaohe Lin, 2019-11-15 (1 file, -4/+2)
  When the SYSCALL/SYSENTER ability check fails, cs and ss are initialized but never used. Delay initializing cs and ss until the SYSCALL/SYSENTER ability check has passed.
  Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: set ctxt->have_exception in x86_decode_insn()
  Jan Dakinevich, 2019-09-11 (1 file, -0/+2)
  x86_emulate_instruction() takes the ctxt->have_exception flag into account during instruction decoding, but in practice this flag is never set in x86_decode_insn().
  Fixes: 6ea6e84309ca ("KVM: x86: inject exceptions produced by x86_decode_insn")
  Cc: stable@vger.kernel.org Cc: Denis Lunev <den@virtuozzo.com> Cc: Roman Kagan <rkagan@virtuozzo.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
  Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* x86: KVM: add xsetbv to the emulator
  Vitaly Kuznetsov, 2019-08-22 (1 file, -1/+22)
  To avoid hardcoding the xsetbv instruction length to '3', we need to support decoding it in the emulator.
  Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
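  Example: a plausible reconstruction of what an XSETBV handler in the emulator looks like; treat the details (especially the set_xcr callback) as assumptions rather than the verbatim patch.

      static int em_xsetbv(struct x86_emulate_ctxt *ctxt)
      {
          u32 eax, ecx, edx;

          /* XSETBV writes EDX:EAX into the XCR selected by ECX. */
          eax = reg_read(ctxt, VCPU_REGS_RAX);
          ecx = reg_read(ctxt, VCPU_REGS_RCX);
          edx = reg_read(ctxt, VCPU_REGS_RDX);

          if (ctxt->ops->set_xcr(ctxt, ecx, ((u64)edx << 32) | eax))
              return emulate_gp(ctxt, 0);

          return X86EMUL_CONTINUE;
      }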
* KVM: x86: Fix x86_decode_insn() return when fetching insn bytes fails
  Sean Christopherson, 2019-08-22 (1 file, -1/+1)
  Jump to the common error handling in x86_decode_insn() if __do_insn_fetch_bytes() fails so that its error code is converted to the appropriate return type. Although the various helpers used by x86_decode_insn() return X86EMUL_* values, x86_decode_insn() itself returns EMULATION_FAILED or EMULATION_OK.
  This doesn't cause a functional issue as the sole caller, x86_emulate_instruction(), currently only cares about success vs. failure, and success is indicated by '0' for both types (X86EMUL_CONTINUE and EMULATION_OK).
  Fixes: 285ca9e948fa ("KVM: emulate: speed up do_insn_fetch")
  Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
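  Example: the fix amounts to routing the failure through the function's common exit so the X86EMUL_* code gets translated; a sketch of the pattern, not the exact diff.

      rc = __do_insn_fetch_bytes(ctxt, 1);
      if (rc != X86EMUL_CONTINUE)
          goto done;    /* previously: return rc, leaking an X86EMUL_* value */

      /* ... rest of decode ... */

      done:
          return (rc == X86EMUL_CONTINUE) ? EMULATION_OK : EMULATION_FAILED;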
* x86/kvm: Fix fastop function ELF metadata
  Josh Poimboeuf, 2019-07-18 (1 file, -13/+31)
  Some of the fastop functions, e.g. em_setcc(), are actually just used as global labels which point to blocks of functions. The global labels are incorrectly annotated as functions. Also the functions themselves don't have size annotations.
  Fixes a bunch of warnings like the following:
      arch/x86/kvm/emulate.o: warning: objtool: seto() is missing an ELF size annotation
      arch/x86/kvm/emulate.o: warning: objtool: em_setcc() is missing an ELF size annotation
      arch/x86/kvm/emulate.o: warning: objtool: setno() is missing an ELF size annotation
      arch/x86/kvm/emulate.o: warning: objtool: setc() is missing an ELF size annotation
  Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
  Link: https://lkml.kernel.org/r/c8cc9be60ebbceb3092aa5dd91916039a1f88275.1563413318.git.jpoimboe@redhat.com
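  Example: the underlying mechanics are ordinary ELF symbol annotations emitted from inline asm; a generic illustration of what correct type/size metadata for an asm-defined function looks like (not the fastop macros themselves).

      asm(".pushsection .text, \"ax\"\n"
          ".type seto, @function\n"        /* mark the symbol as a function */
          "seto:\n"
          "    seto %al\n"
          "    ret\n"
          ".size seto, . - seto\n"         /* give it an ELF size so objtool is satisfied */
          ".popsection\n");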
* Merge tag 'kvm-arm-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
  Paolo Bonzini, 2019-07-11 (1 file, -3/+1)
  KVM/arm updates for 5.3
  - Add support for chained PMU counters in guests
  - Improve SError handling
  - Handle Neoverse N1 erratum #1349291
  - Allow side-channel mitigation status to be migrated
  - Standardise most AArch64 system register accesses to msr_s/mrs_s
  - Fix host MPIDR corruption on 32bit
| * treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 499
    Thomas Gleixner, 2019-06-19 (1 file, -3/+1)
    Based on 1 normalized pattern(s): "this work is licensed under the terms of the gnu gpl version 2 see the copying file in the top level directory", extracted by the scancode license scanner. The SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 35 file(s).
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190604081206.797835076@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | KVM: x86: Use DR_TRAP_BITS instead of hard-coded 15
    Liran Alon, 2019-06-18 (1 file, -1/+1)
    Make all code consistent with kvm_deliver_exception_payload() by using the appropriate symbolic constant instead of a hard-coded number.
    Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Always use 32-bit SMRAM save state for 32-bit kernels
  Sean Christopherson, 2019-04-16 (1 file, -0/+10)
  Invoking the 64-bit variation on a 32-bit kernel will crash the guest, trigger a WARN, and/or lead to a buffer overrun in the host, e.g. rsm_load_state_64() writes r8-r15 unconditionally, but enum kvm_reg and thus x86_emulate_ctxt._regs only define r8-r15 for CONFIG_X86_64.
  KVM allows userspace to report long mode support via CPUID, even though the guest is all but guaranteed to crash if it actually tries to enable long mode. But, a pure 32-bit guest that is ignorant of long mode will happily plod along.
  SMM complicates things as 64-bit CPUs use a different SMRAM save state area. KVM handles this correctly for 64-bit kernels, e.g. uses the legacy save state map if userspace has hidden long mode from the guest, but doesn't fare well when userspace reports long mode support on a 32-bit host kernel (32-bit KVM doesn't support 64-bit guests).
  Since the alternative is to crash the guest, e.g. by not loading state or explicitly requesting shutdown, unconditionally use the legacy SMRAM save state map for 32-bit KVM. If a guest has managed to get far enough to handle SMIs when running under a weird/buggy userspace hypervisor, then don't deliberately crash the guest since there are no downsides (from KVM's perspective) to allowing it to continue running.
  Fixes: 660a5d517aaab ("KVM: x86: save/load state on SMM switch")
  Cc: stable@vger.kernel.org
  Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Don't clear EFER during SMM transitions for 32-bit vCPU
  Sean Christopherson, 2019-04-16 (1 file, -10/+11)
  Neither AMD nor Intel CPUs have an EFER field in the legacy SMRAM save state area, i.e. don't save/restore EFER across SMM transitions. KVM somewhat models this, e.g. doesn't clear EFER on entry to SMM if the guest doesn't support long mode. But during RSM, KVM unconditionally clears EFER so that it can get back to pure 32-bit mode in order to start loading CRs with their actual non-SMM values.
  Clear EFER only when it will be written when loading the non-SMM state so as to preserve bits that can theoretically be set on 32-bit vCPUs, e.g. KVM always emulates EFER_SCE. And because CR4.PAE is cleared only to play nice with EFER, wrap that code in the long mode check as well.
  Note, this may result in a compiler warning about cr4 being consumed uninitialized. Re-read CR4 even though it's technically unnecessary, as doing so allows for more readable code and RSM emulation is not a performance critical path.
  Fixes: 660a5d517aaab ("KVM: x86: save/load state on SMM switch")
  Cc: stable@vger.kernel.org
  Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: clear SMM flags before loading state while leaving SMM
  Sean Christopherson, 2019-04-16 (1 file, -6/+6)
  RSM emulation is currently broken on VMX when the interrupted guest has CR4.VMXE=1. Stop dancing around the issue of HF_SMM_MASK being set when loading SMSTATE into architectural state, e.g. by toggling it for problematic flows, and simply clear HF_SMM_MASK prior to loading architectural state (from SMRAM save state area).
  Reported-by: Jon Doron <arilou@gmail.com> Cc: Jim Mattson <jmattson@google.com> Cc: Liran Alon <liran.alon@oracle.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
  Fixes: 5bea5123cbf0 ("KVM: VMX: check nested state and CR4.VMXE against SMM")
  Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Open code kvm_set_hflags
  Sean Christopherson, 2019-04-16 (1 file, -0/+3)
  Prepare for clearing HF_SMM_MASK prior to loading state from the SMRAM save state map, i.e. kvm_smm_changed() needs to be called after state has been loaded and so cannot be done automatically when setting hflags from RSM.
  Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Load SMRAM in a single shot when leaving SMM
  Sean Christopherson, 2019-04-16 (1 file, -75/+74)
  RSM emulation is currently broken on VMX when the interrupted guest has CR4.VMXE=1. Rather than dance around the issue of HF_SMM_MASK being set when loading SMSTATE into architectural state, ideally RSM emulation itself would be reworked to clear HF_SMM_MASK prior to loading non-SMM architectural state.
  Ostensibly, the only motivation for having HF_SMM_MASK set throughout the loading of state from the SMRAM save state area is so that the memory accesses from GET_SMSTATE() are tagged with role.smm.
  Load all of the SMRAM save state area from guest memory at the beginning of RSM emulation, and load state from the buffer instead of reading guest memory one-by-one. This paves the way for clearing HF_SMM_MASK prior to loading state, and also aligns RSM with the enter_smm() behavior, which fills a buffer and writes SMRAM save state in a single go.
  Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* jump_label: move 'asm goto' support test to Kconfig
  Masahiro Yamada, 2019-01-06 (1 file, -1/+1)
  Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label". The jump label is controlled by HAVE_JUMP_LABEL, which is defined like this:
      #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
      # define HAVE_JUMP_LABEL
      #endif
  We can improve this by testing 'asm goto' support in Kconfig, then make JUMP_LABEL depend on CC_HAS_ASM_GOTO. Ugly #ifdef HAVE_JUMP_LABEL will go away, and CONFIG_JUMP_LABEL will match to the real kernel capability.
  Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
* x86: Clean up 'sizeof x' => 'sizeof(x)'
  Jordan Borgner, 2018-10-29 (1 file, -11/+11)
  "sizeof(x)" is the canonical coding style used in arch/x86 most of the time. Fix the few places that didn't follow the convention. (Also do some whitespace cleanups in a few places while at it.)
  [ mingo: Rewrote the changelog. ]
  Signed-off-by: Jordan Borgner <mail@jordan-borgner.de> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de>
  Link: http://lkml.kernel.org/r/20181028125828.7rgammkgzep2wpam@JordanDesktop
  Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86/kvm: Add Hygon Dhyana support to KVM
  Pu Wen, 2018-09-27 (1 file, -1/+10)
  The Hygon Dhyana CPU has the SVM feature just as AMD family 17h does, so enable the KVM infrastructure support for it.
  Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: pbonzini@redhat.com Cc: rkrcmar@redhat.com Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: x86@kernel.org Cc: thomas.lendacky@amd.com Cc: kvm@vger.kernel.org
  Link: https://lkml.kernel.org/r/654dd12876149fba9561698eaf9fc15d030301f8.1537533369.git.puwen@hygon.cn
* kvm: x86: Remove CR3_PCID_INVD flag
  Junaid Shahid, 2018-08-06 (1 file, -1/+1)
  It is a duplicate of X86_CR3_PCID_NOFLUSH. So just use that instead.
  Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* kvm: x86: use correct privilege level for sgdt/sidt/fxsave/fxrstor access
  Paolo Bonzini, 2018-06-12 (1 file, -6/+6)
  The functions that were used in the emulation of fxrstor, fxsave, sgdt and sidt were originally meant for task switching, and as such they did not check privilege levels. This is very bad when the same functions are used in the emulation of unprivileged instructions. This is CVE-2018-10853.
  The obvious fix is to add a new argument to ops->read_std and ops->write_std, which decides whether the access is a "system" access or should use the processor's CPL.
  Fixes: 129a72a0d3c8 ("KVM: x86: Introduce segmented_write_std", 2017-01-12)
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
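  Example: a sketch of the resulting accessor shape (the parameter name is an assumption); callers pass true only for architecturally implicit supervisor accesses and false for ordinary instruction operands so the guest's CPL is honored.

      /* Emulator callback: one extra flag decides whether CPL is checked. */
      int (*read_std)(struct x86_emulate_ctxt *ctxt, unsigned long addr,
                      void *val, unsigned int bytes,
                      struct x86_exception *fault, bool system);

      /* Descriptor-table reads during task switch: implicit supervisor access. */
      rc = ctxt->ops->read_std(ctxt, addr, &desc, sizeof(desc), &ctxt->exception, true);

      /* fxrstor operands: honor the current privilege level. */
      rc = ctxt->ops->read_std(ctxt, addr, buf, size, &ctxt->exception, false);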
* KVM: x86: introduce linear_{read,write}_system
  Paolo Bonzini, 2018-06-12 (1 file, -32/+32)
  Wrap the common invocation of ctxt->ops->read_std and ctxt->ops->write_std, so as to have a smaller patch when the functions grow another argument.
  Fixes: 129a72a0d3c8 ("KVM: x86: Introduce segmented_write_std", 2017-01-12)
  Cc: stable@vger.kernel.org
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: X86: Fix reserved bits check for MOV to CR3
  Wanpeng Li, 2018-05-14 (1 file, -1/+3)
  The MSB of CR3 is a reserved bit if the PCIDE bit is not set in CR4, so it should be checked when PCIDE is clear; however, commit d1cd3ce900441 ("KVM: MMU: check guest CR3 reserved bits based on its physical address width") removed the bit 63 check unconditionally. This patch fixes it by checking bit 63 of CR3 when the PCIDE bit is not set in CR4.
  Fixes: d1cd3ce900441 (KVM: MMU: check guest CR3 reserved bits based on its physical address width)
  Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Liran Alon <liran.alon@oracle.com> Cc: stable@vger.kernel.org
  Reviewed-by: Junaid Shahid <junaids@google.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
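  Example: conceptually the check looks like this (a sketch, not the literal diff; X86_CR3_PCID_NOFLUSH is the bit 63 mask).

      /* Bit 63 of CR3 is only defined (as the no-flush hint) when CR4.PCIDE=1;
       * otherwise it is reserved and MOV to CR3 must raise #GP. */
      if (!(ctxt->ops->get_cr(ctxt, 4) & X86_CR4_PCIDE) &&
          (new_val & X86_CR3_PCID_NOFLUSH))
          return emulate_gp(ctxt, 0);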
* kvm: Add emulation for movups/movupd
  Stefan Fritsch, 2018-04-04 (1 file, -1/+7)
  This is very similar to the aligned versions movaps/movapd. We have seen the corresponding emulation failures with OpenBSD as guest and with Windows 10 with Intel HD graphics passthrough.
  Signed-off-by: Christian Ehrhardt <christian_ehrhardt@genua.de> Signed-off-by: Stefan Fritsch <sf@sfritsch.de> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Add support for VMware backdoor Pseudo-PMCs
  Arbel Moshe, 2018-03-16 (1 file, -0/+8)
  VMware exposes the following Pseudo-PMCs:
      0x10000: Physical host TSC
      0x10001: Elapsed real time in ns
      0x10002: Elapsed apparent time in ns
  For more info refer to: https://www.vmware.com/files/pdf/techpaper/Timekeeping-In-VirtualMachines.pdf
  VMware allows access to these Pseudo-PMCs even when read via RDPMC in Ring3 and CR4.PCE=0. Therefore, this commit modifies the x86 emulator to allow access to these PMCs in this situation. In addition, emulation of these PMCs was added to kvm_pmu_rdpmc().
  Signed-off-by: Arbel Moshe <arbel.moshe@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
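  Example: an illustrative sketch of the kind of check this adds around RDPMC; the index values come from the commit message, while the helper name and surrounding plumbing are assumptions.

      #define VMWARE_BACKDOOR_PMC_HOST_TSC        0x10000
      #define VMWARE_BACKDOOR_PMC_REAL_TIME       0x10001
      #define VMWARE_BACKDOOR_PMC_APPARENT_TIME   0x10002

      static bool is_vmware_backdoor_pmc(u32 pmc_idx)
      {
          switch (pmc_idx) {
          case VMWARE_BACKDOOR_PMC_HOST_TSC:
          case VMWARE_BACKDOOR_PMC_REAL_TIME:
          case VMWARE_BACKDOOR_PMC_APPARENT_TIME:
              return true;
          }
          return false;
      }

      /* In the emulator's RDPMC permission check: skip the CPL/CR4.PCE test
       * for these indices when VMware backdoor emulation is enabled. */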
* KVM: x86: Always allow access to VMware backdoor I/O ports
  Liran Alon, 2018-03-16 (1 file, -0/+11)
  VMware allows access to these ports even if denied by TSS I/O permission bitmap. Mimic behavior.
  Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Merge branch 'x86/hyperv' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
  Radim Krčmář, 2018-02-01 (1 file, -11/+30)
  Topic branch for stable KVM clocksource under Hyper-V.
  Thanks to Christoffer Dall for resolving the ARM conflict.
| * Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
    Linus Torvalds, 2018-01-29 (1 file, -4/+5)
    Pull x86/pti updates from Thomas Gleixner: "Another set of melted spectrum related changes:
    - Code simplifications and cleanups for RSB and retpolines.
    - Make the indirect calls in KVM speculation safe.
    - Whitelist CPUs which are known not to speculate from Meltdown and prepare for the new CPUID flag which tells the kernel that a CPU is not affected.
    - A less rigorous variant of the module retpoline check which merely warns when a non-retpoline protected module is loaded and reflects that fact in the sysfs file.
    - Prepare for Indirect Branch Prediction Barrier support.
    - Prepare for exposure of the Speculation Control MSRs to guests, so guest OSes which depend on those "features" can use them. Includes a blacklist of the broken microcodes. The actual exposure of the MSRs through KVM is still being worked on"
    * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      x86/speculation: Simplify indirect_branch_prediction_barrier()
      x86/retpoline: Simplify vmexit_fill_RSB()
      x86/cpufeatures: Clean up Spectre v2 related CPUID flags
      x86/cpu/bugs: Make retpoline module warning conditional
      x86/bugs: Drop one "mitigation" from dmesg
      x86/nospec: Fix header guards names
      x86/alternative: Print unadorned pointers
      x86/speculation: Add basic IBPB (Indirect Branch Prediction Barrier) support
      x86/cpufeature: Blacklist SPEC_CTRL/PRED_CMD on early Spectre v2 microcodes
      x86/pti: Do not enable PTI on CPUs which are not vulnerable to Meltdown
      x86/msr: Add definitions for new speculation control MSRs
      x86/cpufeatures: Add AMD feature bits for Speculation Control
      x86/cpufeatures: Add Intel feature bits for Speculation Control
      x86/cpufeatures: Add CPUID_7_EDX CPUID leaf
      module/retpoline: Warn about missing retpoline in module
      KVM: VMX: Make indirect call speculation safe
      KVM: x86: Make indirect calls in emulator speculation safe
| | * KVM: x86: Make indirect calls in emulator speculation safe
      Peter Zijlstra, 2018-01-25 (1 file, -4/+5)
      Replace the indirect calls with CALL_NOSPEC.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Jun Nakajima <jun.nakajima@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: rga@amazon.de Cc: Dave Hansen <dave.hansen@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jason Baron <jbaron@akamai.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Link: https://lkml.kernel.org/r/20180125095843.595615683@infradead.org
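      Example: the illustrative shape of such a change in inline asm (a sketch; the real sites are the fastop/flags asm in emulate.c and the exact constraints differ).

          /* Before: a plain indirect call, exposed to branch-target injection. */
          asm("call *%[fastop]"
              : "+a"(rax)
              : [fastop] "r"(fop));

          /* After: CALL_NOSPEC expands to a retpoline-safe call sequence when
           * retpolines are enabled, and to a normal indirect call otherwise. */
          asm(CALL_NOSPEC
              : "+a"(rax)
              : THUNK_TARGET(fop));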
| * | kvm: x86: fix RSM when PCID is non-zero
      Paolo Bonzini, 2017-12-21 (1 file, -7/+25)
      rsm_load_state_64() and rsm_enter_protected_mode() load CR3, then CR4 & ~PCIDE, then CR0, then CR4. However, setting CR4.PCIDE fails if CR3[11:0] != 0.
      It's probably easier in the long run to replace rsm_enter_protected_mode() with an emulator callback that sets all the special registers (like KVM_SET_SREGS would do). For now, set the PCID field of CR3 only after CR4.PCIDE is 1.
      Reported-by: Laszlo Ersek <lersek@redhat.com> Tested-by: Laszlo Ersek <lersek@redhat.com>
      Fixes: 660a5d517aaab9187f93854425c4c63f4a09195c
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: x86: emulate RDPID
      Paolo Bonzini, 2017-12-14 (1 file, -1/+21)
      This is encoded as F3 0F C7 /7 with a register argument. The register argument is the second array in the group9 GroupDual, while F3 is the fourth element of a Prefix.
      Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
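      Example: a sketch of roughly what the handler looks like; RDPID returns IA32_TSC_AUX, and the exact failure handling here is an assumption.

          static int em_rdpid(struct x86_emulate_ctxt *ctxt)
          {
              u64 tsc_aux = 0;

              /* RDPID reads IA32_TSC_AUX into the destination register. */
              if (ctxt->ops->get_msr(ctxt, MSR_TSC_AUX, &tsc_aux))
                  return emulate_gp(ctxt, 0);

              ctxt->dst.val = tsc_aux;
              return X86EMUL_CONTINUE;
          }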
* | | KVM: x86: emulate sldt and str
      Paolo Bonzini, 2017-12-14 (1 file, -6/+26)
      These are needed to handle the descriptor table vmexits when emulating UMIP.
      Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: x86: add support for UMIP
      Paolo Bonzini, 2017-12-14 (1 file, -0/+8)
      Add the CPUID bits, make the CR4.UMIP bit not reserved anymore, and add UMIP support for instructions that are already emulated by KVM.
      Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | x86,kvm: remove KVM emulator get_fpu / put_fpu
    Rik van Riel, 2017-12-05 (1 file, -24/+0)
    Now that get_fpu and put_fpu do nothing, because the scheduler will automatically load and restore the guest FPU context for us while we are in this code (deep inside the vcpu_run main loop), we can get rid of the get_fpu and put_fpu hooks.
    Signed-off-by: Rik van Riel <riel@redhat.com> Suggested-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: x86: fix em_fxstor() sleeping while in atomic
    David Hildenbrand, 2017-11-17 (1 file, -6/+26)
    Commit 9d643f63128b ("KVM: x86: avoid large stack allocations in em_fxrstor") optimized the stack size, but introduced a guest memory access which might sleep while in atomic.
    Fix it by introducing, again, a second fxregs_state. Try to avoid large stacks by using noinline. Add some helpful comments.
    Reported by syzbot:
        in_atomic(): 1, irqs_disabled(): 0, pid: 2909, name: syzkaller879109
        2 locks held by syzkaller879109/2909:
        #0: (&vcpu->mutex){+.+.}, at: [<ffffffff8106222c>] vcpu_load+0x1c/0x70 arch/x86/kvm/../../../virt/kvm/kvm_main.c:154
        #1: (&kvm->srcu){....}, at: [<ffffffff810dd162>] vcpu_enter_guest arch/x86/kvm/x86.c:6983 [inline]
        #1: (&kvm->srcu){....}, at: [<ffffffff810dd162>] vcpu_run arch/x86/kvm/x86.c:7061 [inline]
        #1: (&kvm->srcu){....}, at: [<ffffffff810dd162>] kvm_arch_vcpu_ioctl_run+0x1bc2/0x58b0 arch/x86/kvm/x86.c:7222
        CPU: 1 PID: 2909 Comm: syzkaller879109 Not tainted 4.13.0-rc4-next-20170811
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
        Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6014 __might_sleep+0x95/0x190 kernel/sched/core.c:5967 __might_fault+0xab/0x1d0 mm/memory.c:4383 __copy_from_user include/linux/uaccess.h:71 [inline] __kvm_read_guest_page+0x58/0xa0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1771 kvm_vcpu_read_guest_page+0x44/0x60 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1791 kvm_read_guest_virt_helper+0x76/0x140 arch/x86/kvm/x86.c:4407 kvm_read_guest_virt_system+0x3c/0x50 arch/x86/kvm/x86.c:4466 segmented_read_std+0x10c/0x180 arch/x86/kvm/emulate.c:819 em_fxrstor+0x27b/0x410 arch/x86/kvm/emulate.c:4022 x86_emulate_insn+0x55d/0x3c50 arch/x86/kvm/emulate.c:5471 x86_emulate_instruction+0x411/0x1ca0 arch/x86/kvm/x86.c:5698 kvm_mmu_page_fault+0x18b/0x2c0 arch/x86/kvm/mmu.c:4854 handle_ept_violation+0x1fc/0x5e0 arch/x86/kvm/vmx.c:6400 vmx_handle_exit+0x281/0x1ab0 arch/x86/kvm/vmx.c:8718 vcpu_enter_guest arch/x86/kvm/x86.c:6999 [inline] vcpu_run arch/x86/kvm/x86.c:7061 [inline] kvm_arch_vcpu_ioctl_run+0x1cee/0x58b0 arch/x86/kvm/x86.c:7222 kvm_vcpu_ioctl+0x64c/0x1010 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2591 vfs_ioctl fs/ioctl.c:45 [inline] do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685 SYSC_ioctl fs/ioctl.c:700 [inline] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691 entry_SYSCALL_64_fastpath+0x1f/0xbe
        RIP: 0033:0x437fc9 RSP: 002b:00007ffc7b4d5ab8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00000000004002b0 RCX: 0000000000437fc9 RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000005 RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000020ae8000 R10: 0000000000009120 R11: 0000000000000206 R12: 0000000000000000 R13: 0000000000000004 R14: 0000000000000004 R15: 0000000020077000
    Fixes: 9d643f63128b ("KVM: x86: avoid large stack allocations in em_fxrstor")
    Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* | KVM: X86: Fix operand/address-size during instruction decoding
    Wanpeng Li, 2017-11-17 (1 file, -0/+7)
    Pedro reported: "During tests that we conducted on KVM, we noticed that executing a "PUSH %ES" instruction under KVM produces different results on both memory and the SP register depending on whether EPT support is enabled. With EPT the SP is reduced by 4 bytes (and the written value is 0-padded) but without EPT support it is only reduced by 2 bytes. The difference can be observed when the CS.DB field is 1 (32-bit) but not when it's 0 (16-bit)."
    The internal segment descriptor cache exists even in real/vm8086 mode. CS.D should also be respected, instead of just the default operand/address size and the 66H/67H prefixes, during instruction decoding. This patch fixes it by also adjusting the operand/address size according to CS.D.
    Reported-by: Pedro Fonseca <pfonseca@cs.washington.edu> Tested-by: Pedro Fonseca <pfonseca@cs.washington.edu> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Pedro Fonseca <pfonseca@cs.washington.edu>
    Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
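    Example: conceptually, decode now consults the cached CS descriptor's D bit instead of relying only on the mode and prefixes; a rough sketch under that assumption, not the literal patch.

        case X86EMUL_MODE_REAL:     /* the segment descriptor cache still exists here */
        case X86EMUL_MODE_VM86:
        case X86EMUL_MODE_PROT16:
            def_op_bytes = def_ad_bytes = 2;
            ctxt->ops->get_segment(ctxt, &dummy_sel, &cs_desc, NULL, VCPU_SREG_CS);
            if (cs_desc.d)
                def_op_bytes = def_ad_bytes = 4;
            break;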
* KVM: x86: introduce ISA specific SMM entry/exit callbacks
  Ladi Prosek, 2017-10-12 (1 file, -0/+9)
  Entering and exiting SMM may require ISA specific handling under certain circumstances. This commit adds two new callbacks with empty implementations. Actual functionality will be added in following commits.
  - pre_enter_smm() is to be called when injecting an SMM, before any SMM related vcpu state has been changed
  - pre_leave_smm() is to be called when emulating the RSM instruction, when the vcpu is in real mode and before any SMM related vcpu state has been restored
  Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* x86/kvm: Move kvm_fastop_exception to .fixup section
  Josh Poimboeuf, 2017-10-05 (1 file, -2/+4)
  When compiling the kernel with the '-frecord-gcc-switches' flag, objtool complains:
      arch/x86/kvm/emulate.o: warning: objtool: .GCC.command.line+0x0: special: can't find new instruction
  And also the kernel fails to link. The problem is that the 'kvm_fastop_exception' code gets placed into the throwaway '.GCC.command.line' section instead of '.text'.
  Exception fixup code is conventionally placed in the '.fixup' section, so put it there where it belongs.
  Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
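  Example: a sketch of what placing such a stub in '.fixup' looks like; treat it as an illustration of the mechanism rather than the exact patch.

      asm(".pushsection .fixup, \"ax\"\n"
          ".global kvm_fastop_exception\n"
          "kvm_fastop_exception:\n"
          "    xor %esi, %esi\n"     /* signal failure to the fastop caller */
          "    ret\n"
          ".popsection");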
* Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
  Linus Torvalds, 2017-09-24 (1 file, -2/+1)
  Pull x86 fixes from Ingo Molnar: "Another round of CR3/PCID related fixes (I think this addresses all but one of the known problems with PCID support), an objtool fix plus a Clang fix that (finally) solves all Clang quirks to build a bootable x86 kernel as-is"
  * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/asm: Fix inline asm call constraints for Clang
    objtool: Handle another GCC stack pointer adjustment bug
    x86/mm/32: Load a sane CR3 before cpu_init() on secondary CPUs
    x86/mm/32: Move setup_clear_cpu_cap(X86_FEATURE_PCID) earlier
    x86/mm/64: Stop using CR3.PCID == 0 in ASID-aware code
    x86/mm: Factor out CR3-building code
| * x86/asm: Fix inline asm call constraints for Clang
    Josh Poimboeuf, 2017-09-23 (1 file, -2/+1)
    For inline asm statements which have a CALL instruction, we list the stack pointer as a constraint to convince GCC to ensure the frame pointer is set up first:
        static inline void foo()
        {
            register void *__sp asm(_ASM_SP);
            asm("call bar" : "+r" (__sp))
        }
    Unfortunately, that pattern causes Clang to corrupt the stack pointer. The fix is easy: convert the stack pointer register variable to a global variable.
    It should be noted that the end result is different based on the GCC version. With GCC 6.4, this patch has exactly the same result as before:
                  defconfig   defconfig-nofp   distro    distro-nofp
        before    9820389     9491555          8816046   8516940
        after     9820389     9491555          8816046   8516940
    With GCC 7.2, however, GCC's behavior has changed. It now changes its behavior based on the conversion of the register variable to a global. That somehow convinces it to *always* set up the frame pointer before inserting *any* inline asm. (Therefore, listing the variable as an output constraint is a no-op and is no longer necessary.) It's a bit overkill, but the performance impact should be negligible. And in fact, there's a nice improvement with frame pointers disabled:
                  defconfig   defconfig-nofp   distro    distro-nofp
        before    9796316     9468236          9076191   8790305
        after     9796957     9464267          9076381   8785949
    So in summary, while listing the stack pointer as an output constraint is no longer necessary for newer versions of GCC, it's still needed for older versions.
    Suggested-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reported-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de>
    Link: http://lkml.kernel.org/r/3db862e970c432ae823cf515c52b54fec8270e0e.1505942196.git.jpoimboe@redhat.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | KVM: x86: Fix the NULL pointer parameter in check_cr_write()
    Yu Zhang, 2017-09-19 (1 file, -3/+5)
    Routine check_cr_write() will trigger emulator_get_cpuid()->kvm_cpuid() to get maxphyaddr, and NULL is passed as values for ebx/ecx/edx. This is problematic because kvm_cpuid() will dereference these pointers.
    Fixes: d1cd3ce90044 ("KVM: MMU: check guest CR3 reserved bits based on its physical address width.")
    Reported-by: Jim Mattson <jmattson@google.com> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
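    Example: the fix is simply to hand kvm_cpuid() real storage for every output; a sketch of the call site in check_cr_write() (variable names and the CPUID leaf are assumptions consistent with the description above).

        u32 eax = 0x80000008, ebx, ecx = 0, edx;
        u64 maxphyaddr;

        /* kvm_cpuid() writes through all four pointers, so none may be NULL. */
        if (ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx, false))
            maxphyaddr = eax & 0xff;    /* CPUID 0x80000008: EAX[7:0] = phys addr width */
        else
            maxphyaddr = 36;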
* KVM: MMU: Expose the LA57 feature to VM.
  Yu Zhang, 2017-08-24 (1 file, -7/+9)
  This patch exposes the 5-level page table feature to the VM. At the same time, the canonical virtual address checking is extended to support both 48-bit and 57-bit address widths.
  Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: MMU: check guest CR3 reserved bits based on its physical address width.
  Yu Zhang, 2017-08-24 (1 file, -2/+12)
  Currently, KVM uses CR3_L_MODE_RESERVED_BITS to check the reserved bits in CR3. Yet the length of reserved bits in guest CR3 should be based on the physical address width exposed to the VM. This patch changes CR3 check logic to calculate the reserved bits at runtime.
  Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Add return value to kvm_cpuid().
  Yu Zhang, 2017-08-24 (1 file, -6/+6)
  Return false in kvm_cpuid() when it fails to find the cpuid entry. Also, this routine (and its caller) is optimized with a new argument - check_limit, so that the check_cpuid_limit() fallback can be avoided.
  Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
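  Example: the resulting interface shape, roughly (a sketch of the signature implied by the description above).

      /* Returns true if a matching CPUID entry was found; check_limit selects
       * whether out-of-range leaves may fall back to the highest basic leaf. */
      bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
                     u32 *ecx, u32 *edx, bool check_limit);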
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
  Linus Torvalds, 2017-07-06 (1 file, -46/+38)
  Pull KVM updates from Paolo Bonzini:
  "PPC:
   - Better machine check handling for HV KVM
   - Ability to support guests with threads=2, 4 or 8 on POWER9
   - Fix for a race that could cause delayed recognition of signals
   - Fix for a bug where POWER9 guests could sleep with interrupts pending.
  ARM:
   - VCPU request overhaul
   - allow timer and PMU to have their interrupt number selected from userspace
   - workaround for Cavium erratum 30115
   - handling of memory poisoning
   - the usual crop of fixes and cleanups
  s390:
   - initial machine check forwarding
   - migration support for the CMMA page hinting information
   - cleanups and fixes
  x86:
   - nested VMX bugfixes and improvements
   - more reliable NMI window detection on AMD
   - APIC timer optimizations
  Generic:
   - VCPU request overhaul + documentation of common code patterns
   - kvm_stat improvements"
  * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (124 commits)
    Update my email address
    kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS
    x86: kvm: mmu: use ept a/d in vmcs02 iff used in vmcs12
    kvm: x86: mmu: allow A/D bits to be disabled in an mmu
    x86: kvm: mmu: make spte mmio mask more explicit
    x86: kvm: mmu: dead code thanks to access tracking
    KVM: PPC: Book3S: Fix typo in XICS-on-XIVE state saving code
    KVM: PPC: Book3S HV: Close race with testing for signals on guest entry
    KVM: PPC: Book3S HV: Simplify dynamic micro-threading code
    KVM: x86: remove ignored type attribute
    KVM: LAPIC: Fix lapic timer injection delay
    KVM: lapic: reorganize restart_apic_timer
    KVM: lapic: reorganize start_hv_timer
    kvm: nVMX: Check memory operand to INVVPID
    KVM: s390: Inject machine check into the nested guest
    KVM: s390: Inject machine check into the guest
    tools/kvm_stat: add new interactive command 'b'
    tools/kvm_stat: add new command line switch '-i'
    tools/kvm_stat: fix error on interactive command 'g'
    KVM: SVM: suppress unnecessary NMI singlestep on GIF=0 and nested exit
    ...
| * KVM: x86: remove ignored type attribute
    Nick Desaulniers, 2017-06-30 (1 file, -1/+1)
    The macro insn_fetch marks the 'type' argument as having a specified alignment. Type attributes can only be applied to structs, unions, or enums, but insn_fetch is only ever invoked with integral types, so Clang produces 19 -Wignored-attributes warnings for this source file.
    Signed-off-by: Nick Desaulniers <nick.desaulniers@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>