summaryrefslogtreecommitdiffstats
path: root/arch/x86
Commit message (Collapse)AuthorAgeFilesLines
* KVM: nVMX: Detect shadow-vmcs capabilityAbel Gordon2013-04-221-1/+24
| | | | | | | | | | Add logic required to detect if shadow-vmcs is supported by the processor. Introduce a new kernel module parameter to specify if L0 should use shadow vmcs (or not) to run L1. Signed-off-by: Abel Gordon <abelg@il.ibm.com> Reviewed-by: Orit Wasserman <owasserm@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* KVM: nVMX: Shadow-vmcs control fields/bitsAbel Gordon2013-04-222-0/+5
| | | | | | | | | Add definitions for all the vmcs control fields/bits required to enable vmcs-shadowing Signed-off-by: Abel Gordon <abelg@il.ibm.com> Reviewed-by: Orit Wasserman <owasserm@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* KVM: x86: Fix posted interrupt with CONFIG_SMP=nZhang, Yang Z2013-04-171-0/+2
| | | | | | | ->send_IPI_mask is not defined on UP. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: Fix check guest state validity if a guest is in VM86 modeGleb Natapov2013-04-161-1/+1
| | | | | | | | If guest vcpu is in VM86 mode the vcpu state should be checked as if in real mode. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: nVMX: check vmcs12 for valid activity statePaolo Bonzini2013-04-161-2/+5
| | | | | | | | | | | | | | | | KVM does not use the activity state VMCS field, and does not support it in nested VMX either (the corresponding bits in the misc VMX feature MSR are zero). Fail entry if the activity state is set to anything but "active". Since the value will always be the same for L1 and L2, we do not need to read and write the corresponding VMCS field on L1/L2 transitions, either. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: Use posted interrupt to deliver virtual interruptYang Zhang2013-04-163-12/+21
| | | | | | | | | If posted interrupt is avaliable, then uses it to inject virtual interrupt to guest. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: Add the deliver posted interrupt algorithmYang Zhang2013-04-165-1/+85
| | | | | | | | | Only deliver the posted interrupt when target vcpu is running and there is no previous interrupt pending in pir. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Set TMR when programming ioapic entryYang Zhang2013-04-163-7/+14
| | | | | | | | | | We already know the trigger mode of a given interrupt when programming the ioapice entry. So it's not necessary to set it in each interrupt delivery. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Call common update function when ioapic entry changed.Yang Zhang2013-04-163-5/+11
| | | | | | | | | | Both TMR and EOI exit bitmap need to be updated when ioapic changed or vcpu's id/ldr/dfr changed. So use common function instead eoi exit bitmap specific function. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: Check the posted interrupt capabilityYang Zhang2013-04-162-20/+66
| | | | | | | | Detect the posted interrupt feature. If it exists, then set it in vmcs_config. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: Register a new IPI for posted interruptYang Zhang2013-04-167-0/+44
| | | | | | | | | | | | | | | Posted Interrupt feature requires a special IPI to deliver posted interrupt to guest. And it should has a high priority so the interrupt will not be blocked by others. Normally, the posted interrupt will be consumed by vcpu if target vcpu is running and transparent to OS. But in some cases, the interrupt will arrive when target vcpu is scheduled out. And host will see it. So we need to register a dump handler to handle it. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: Enable acknowledge interupt on vmexitYang Zhang2013-04-164-5/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | The "acknowledge interrupt on exit" feature controls processor behavior for external interrupt acknowledgement. When this control is set, the processor acknowledges the interrupt controller to acquire the interrupt vector on VM exit. After enabling this feature, an interrupt which arrived when target cpu is running in vmx non-root mode will be handled by vmx handler instead of handler in idt. Currently, vmx handler only fakes an interrupt stack and jump to idt table to let real handler to handle it. Further, we will recognize the interrupt and only delivery the interrupt which not belong to current vcpu through idt table. The interrupt which belonged to current vcpu will be handled inside vmx handler. This will reduce the interrupt handle cost of KVM. Also, interrupt enable logic is changed if this feature is turnning on: Before this patch, hypervior call local_irq_enable() to enable it directly. Now IF bit is set on interrupt stack frame, and will be enabled on a return from interrupt handler if exterrupt interrupt exists. If no external interrupt, still call local_irq_enable() to enable it. Refer to Intel SDM volum 3, chapter 33.2. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Let ioapic know the irq line statusYang Zhang2013-04-152-4/+6
| | | | | | | | | Userspace may deliver RTC interrupt without query the status. So we want to track RTC EOI for this case. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Add reset/restore rtc_status supportYang Zhang2013-04-152-0/+11
| | | | | | | | restore rtc_status from migration or save/restore Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Return destination vcpu on interrupt injectionYang Zhang2013-04-152-11/+19
| | | | | | | | Add a new parameter to know vcpus who received the interrupt. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Add vcpu info to ioapic_update_eoi()Yang Zhang2013-04-151-1/+1
| | | | | | | | | Add vcpu info to ioapic_update_eoi, so we can know which vcpu issued this EOI. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: nVMX: Avoid reading VM_EXIT_INTR_ERROR_CODE needlessly on nested exitsJan Kiszka2013-04-141-1/+5
| | | | | | | | We only need to update vm_exit_intr_error_code if there is a valid exit interruption information and it comes with a valid error code. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* KVM: nVMX: Fix conditions for interrupt injectionJan Kiszka2013-04-141-8/+12
| | | | | | | | | | | | | | | | | | If we are entering guest mode, we do not want L0 to interrupt this vmentry with all its side effects on the vmcs. Therefore, injection shall be disallowed during L1->L2 transitions, as in the previous version. However, this check is conceptually independent of nested_exit_on_intr, so decouple it. If L1 traps external interrupts, we can kick the guest from L2 to L1, also just like the previous code worked. But we no longer need to consider L1's idt_vectoring_info_field. It will always be empty at this point. Instead, if L2 has pending events, those are now found in the architectural queues and will, thus, prevent vmx_interrupt_allowed from being called at all. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* KVM: nVMX: Rework event injection and recoveryJan Kiszka2013-04-141-35/+64
| | | | | | | | | | | | | | | | | | | | | The basic idea is to always transfer the pending event injection on vmexit into the architectural state of the VCPU and then drop it from there if it turns out that we left L2 to enter L1, i.e. if we enter prepare_vmcs12. vmcs12_save_pending_events takes care to transfer pending L0 events into the queue of L1. That is mandatory as L1 may decide to switch the guest state completely, invalidating or preserving the pending events for later injection (including on a different node, once we support migration). This concept is based on the rule that a pending vmlaunch/vmresume is not canceled. Otherwise, we would risk to lose injected events or leak them into the wrong queues. Encode this rule via a WARN_ON_ONCE at the entry of nested_vmx_vmexit. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* KVM: nVMX: Fix injection of PENDING_INTERRUPT and NMI_WINDOW exits to L1Jan Kiszka2013-04-141-7/+2
| | | | | | | | | | Check if the interrupt or NMI window exit is for L1 by testing if it has the corresponding controls enabled. This is required when we allow direct injection from L0 to L2 Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* KVM: emulator: mark 0xff 0x7d opcode as undefined.Gleb Natapov2013-04-141-1/+1
| | | | Signed-off-by: Gleb Natapov <gleb@redhat.com>
* KVM: emulator: Do not fail on emulation of undefined opcodeGleb Natapov2013-04-141-2/+3
| | | | | | | | Emulation of undefined opcode should inject #UD instead of causing emulation failure. Do that by moving Undefined flag check to emulation stage and injection #UD there. Signed-off-by: Gleb Natapov <gleb@redhat.com>
* KVM: VMX: do not try to reexecute failed instruction while emulating invalid ↵Gleb Natapov2013-04-143-5/+11
| | | | | | | | | | | guest state During invalid guest state emulation vcpu cannot enter guest mode to try to reexecute instruction that emulator failed to emulate, so emulation will happen again and again. Prevent that by telling the emulator that instruction reexecution should not be attempted. Signed-off-by: Gleb Natapov <gleb@redhat.com>
* KVM: emulator: fix unimplemented instruction detectionGleb Natapov2013-04-141-3/+4
| | | | | | | | | Unimplemented instruction detection is broken for group instructions since it relies on "flags" field of opcode to be zero, but all instructions in a group inherit flags from a group encoding. Fix that by having a separate flag for unimplemented instructions. Signed-off-by: Gleb Natapov <gleb@redhat.com>
* KVM: x86 emulator: Fix segment loading in VM86Kevin Wolf2013-04-111-3/+12
| | | | | | | | | | | | | | | | | | | | | | | This fixes a regression introduced in commit 03ebebeb1 ("KVM: x86 emulator: Leave segment limit and attributs alone in real mode"). The mentioned commit changed the segment descriptors for both real mode and VM86 to only update the segment base instead of creating a completely new descriptor with limit 0xffff so that unreal mode keeps working across a segment register reload. This leads to an invalid segment descriptor in the eyes of VMX, which seems to be okay for real mode because KVM will fix it up before the next VM entry or emulate the state, but it doesn't do this if the guest is in VM86, so we end up with: KVM: entry failed, hardware error 0x80000021 Fix this by effectively reverting commit 03ebebeb1 for VM86 and leaving it only in place for real mode, which is where it's really needed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* KVM: Move kvm_rebooting declaration out of x86Geoff Levand2013-04-081-1/+0
| | | | | | | | | | | | | The variable kvm_rebooting is a common kvm variable, so move its declaration from arch/x86/include/asm/kvm_host.h to include/asm/kvm_host.h. Fixes this sparse warning when building on arm64: virt/kvm/kvm_main.c:warning: symbol 'kvm_rebooting' was not declared. Should it be static? Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* KVM: Move kvm_spurious_fault to x86.cGeoff Levand2013-04-081-0/+7
| | | | | | | | | | | | The routine kvm_spurious_fault() is an x86 specific routine, so move it from virt/kvm/kvm_main.c to arch/x86/kvm/x86.c. Fixes this sparse warning when building on arm64: virt/kvm/kvm_main.c:warning: symbol 'kvm_spurious_fault' was not declared. Should it be static? Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* KVM: Move vm_list kvm_lock declarations out of x86Geoff Levand2013-04-081-3/+0
| | | | | | | | | | | | | | The variables vm_list and kvm_lock are common to all architectures, so move the declarations from arch/x86/include/asm/kvm_host.h to include/linux/kvm_host.h. Fixes sparse warnings like these when building for arm64: virt/kvm/kvm_main.c: warning: symbol 'kvm_lock' was not declared. Should it be static? virt/kvm/kvm_main.c: warning: symbol 'vm_list' was not declared. Should it be static? Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* KVM: VMX: Add missing braces to avoid redundant error checkJan Kiszka2013-04-081-1/+2
| | | | | | | | | The code was already properly aligned, now also add the braces to avoid that err is checked even if alloc_apic_access_page didn't run and change it. Found via Coccinelle by Fengguang Wu. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* KVM: x86: fix memory leak in vmx_initYang Zhang2013-04-081-1/+3
| | | | | | | | Free vmx_msr_bitmap_longmode_x2apic and vmx_msr_bitmap_longmode if kvm_init() fails. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* KVM: nVMX: Check exit control for VM_EXIT_SAVE_IA32_PAT, not entry controlsJan Kiszka2013-04-071-1/+1
| | | | | | | | Obviously a copy&paste mistake: prepare_vmcs12 has to check L1's exit controls for VM_EXIT_SAVE_IA32_PAT. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* KVM: Call kvm_apic_match_dest() to check destination vcpuYang Zhang2013-04-072-51/+0
| | | | | | | | | For a given vcpu, kvm_apic_match_dest() will tell you whether the vcpu in the destination list quickly. Drop kvm_calculate_eoi_exitmap() and use kvm_apic_match_dest() instead. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* Revert "KVM: MMU: Move kvm_mmu_free_some_pages() into kvm_mmu_alloc_page()"Takuya Yoshikawa2013-04-072-4/+8
| | | | | | | | | | | | With the following commit, shadow pages can be zapped at random during a shadow page talbe walk: KVM: MMU: Move kvm_mmu_free_some_pages() into kvm_mmu_alloc_page() 7ddca7e43c8f28f9419da81a0e7730b66aa60fe9 This patch reverts it and fixes __direct_map() and FNAME(fetch)(). Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* pmu: prepare for migration supportPaolo Bonzini2013-04-023-6/+14
| | | | | | | | | | | | | | | In order to migrate the PMU state correctly, we need to restore the values of MSR_CORE_PERF_GLOBAL_STATUS (a read-only register) and MSR_CORE_PERF_GLOBAL_OVF_CTRL (which has side effects when written). We also need to write the full 40-bit value of the performance counter, which would only be possible with a v3 architectural PMU's full-width counter MSRs. To distinguish host-initiated writes from the guest's, pass the full struct msr_data to kvm_pmu_set_msr. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* KVM: MMU: Rename kvm_mmu_free_some_pages() to make_mmu_pages_available()Takuya Yoshikawa2013-03-212-8/+7
| | | | | | | | | | | | | | The current name "kvm_mmu_free_some_pages" should be used for something that actually frees some shadow pages, as we expect from the name, but what the function is doing is to make some, KVM_MIN_FREE_MMU_PAGES, shadow pages available: it does nothing when there are enough. This patch changes the name to reflect this meaning better; while doing this renaming, the code in the wrapper function is inlined into the main body since the whole function will be inlined into the only caller now. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: MMU: Move kvm_mmu_free_some_pages() into kvm_mmu_alloc_page()Takuya Yoshikawa2013-03-212-7/+3
| | | | | | | | | | | | | | | | | | | | | What this function is doing is to ensure that the number of shadow pages does not exceed the maximum limit stored in n_max_mmu_pages: so this is placed at every code path that can reach kvm_mmu_alloc_page(). Although it might have some sense to spread this function in each such code path when it could be called before taking mmu_lock, the rule was changed not to do so. Taking this background into account, this patch moves it into kvm_mmu_alloc_page() and simplifies the code. Note: the unlikely hint in kvm_mmu_free_some_pages() guarantees that the overhead of this function is almost zero except when we actually need to allocate some shadow pages, so we do not need to care about calling it multiple times in one path by doing kvm_mmu_get_page() a few times. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* Merge remote-tracking branch 'upstream/master' into queueMarcelo Tosatti2013-03-2110-45/+82
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge reason: From: Alexander Graf <agraf@suse.de> "Just recently this really important patch got pulled into Linus' tree for 3.9: commit 1674400aaee5b466c595a8fc310488263ce888c7 Author: Anton Blanchard <anton <at> samba.org> Date: Tue Mar 12 01:51:51 2013 +0000 Without that commit, I can not boot my G5, thus I can't run automated tests on it against my queue. Could you please merge kvm/next against linus/master, so that I can base my trees against that?" * upstream/master: (653 commits) PCI: Use ROM images from firmware only if no other ROM source available sparc: remove unused "config BITS" sparc: delete "if !ULTRA_HAS_POPULATION_COUNT" KVM: Fix bounds checking in ioapic indirect register reads (CVE-2013-1798) KVM: x86: Convert MSR_KVM_SYSTEM_TIME to use gfn_to_hva_cache functions (CVE-2013-1797) KVM: x86: fix for buffer overflow in handling of MSR_KVM_SYSTEM_TIME (CVE-2013-1796) arm64: Kconfig.debug: Remove unused CONFIG_DEBUG_ERRORS arm64: Do not select GENERIC_HARDIRQS_NO_DEPRECATED inet: limit length of fragment queue hash table bucket lists qeth: Fix scatter-gather regression qeth: Fix invalid router settings handling qeth: delay feature trace sgy-cts1000: Remove __dev* attributes KVM: x86: fix deadlock in clock-in-progress request handling KVM: allow host header to be included even for !CONFIG_KVM hwmon: (lm75) Fix tcn75 prefix hwmon: (lm75.h) Update header inclusion MAINTAINERS: Remove Mark M. Hoffman xfs: ensure we capture IO errors correctly xfs: fix xfs_iomap_eof_prealloc_initial_size type ... Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * Merge git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2013-03-192-35/+33
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull kvm fixes from Marcelo Tosatti. * git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: Fix bounds checking in ioapic indirect register reads (CVE-2013-1798) KVM: x86: Convert MSR_KVM_SYSTEM_TIME to use gfn_to_hva_cache functions (CVE-2013-1797) KVM: x86: fix for buffer overflow in handling of MSR_KVM_SYSTEM_TIME (CVE-2013-1796) KVM: x86: fix deadlock in clock-in-progress request handling KVM: allow host header to be included even for !CONFIG_KVM
| | * KVM: x86: Convert MSR_KVM_SYSTEM_TIME to use gfn_to_hva_cache functions ↵Andy Honig2013-03-192-29/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (CVE-2013-1797) There is a potential use after free issue with the handling of MSR_KVM_SYSTEM_TIME. If the guest specifies a GPA in a movable or removable memory such as frame buffers then KVM might continue to write to that address even after it's removed via KVM_SET_USER_MEMORY_REGION. KVM pins the page in memory so it's unlikely to cause an issue, but if the user space component re-purposes the memory previously used for the guest, then the guest will be able to corrupt that memory. Tested: Tested against kvmclock unit test Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | * KVM: x86: fix for buffer overflow in handling of MSR_KVM_SYSTEM_TIME ↵Andy Honig2013-03-191-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (CVE-2013-1796) If the guest sets the GPA of the time_page so that the request to update the time straddles a page then KVM will write onto an incorrect page. The write is done byusing kmap atomic to get a pointer to the page for the time structure and then performing a memcpy to that page starting at an offset that the guest controls. Well behaved guests always provide a 32-byte aligned address, however a malicious guest could use this to corrupt host kernel memory. Tested: Tested against kvmclock unit test. Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | * KVM: x86: fix deadlock in clock-in-progress request handlingMarcelo Tosatti2013-03-181-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a deadlock in pvclock handling: cpu0: cpu1: kvm_gen_update_masterclock() kvm_guest_time_update() spin_lock(pvclock_gtod_sync_lock) local_irq_save(flags) spin_lock(pvclock_gtod_sync_lock) kvm_make_mclock_inprogress_request(kvm) make_all_cpus_request() smp_call_function_many() Now if smp_call_function_many() called by cpu0 tries to call function on cpu1 there will be a deadlock. Fix by moving pvclock_gtod_sync_lock protected section outside irq disabled section. Analyzed by Gleb Natapov <gleb@redhat.com> Acked-by: Gleb Natapov <gleb@redhat.com> Reported-and-Tested-by: Yongjie Ren <yongjie.ren@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | perf,x86: fix wrmsr_on_cpu() warning on suspend/resumeLinus Torvalds2013-03-171-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 1d9d8639c063 ("perf,x86: fix kernel crash with PEBS/BTS after suspend/resume") fixed a crash when doing PEBS performance profiling after resuming, but in using init_debug_store_on_cpu() to restore the DS_AREA mtrr it also resulted in a new WARN_ON() triggering. init_debug_store_on_cpu() uses "wrmsr_on_cpu()", which in turn uses CPU cross-calls to do the MSR update. Which is not really valid at the early resume stage, and the warning is quite reasonable. Now, it all happens to _work_, for the simple reason that smp_call_function_single() ends up just doing the call directly on the CPU when the CPU number matches, but we really should just do the wrmsr() directly instead. This duplicates the wrmsr() logic, but hopefully we can just remove the wrmsr_on_cpu() version eventually. Reported-and-tested-by: Parag Warudkar <parag.lkml@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | perf,x86: fix kernel crash with PEBS/BTS after suspend/resumeStephane Eranian2013-03-152-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a kernel crash when using precise sampling (PEBS) after a suspend/resume. Turns out the CPU notifier code is not invoked on CPU0 (BP). Therefore, the DS_AREA (used by PEBS) is not restored properly by the kernel and keeps it power-on/resume value of 0 causing any PEBS measurement to crash when running on CPU0. The workaround is to add a hook in the actual resume code to restore the DS Area MSR value. It is invoked for all CPUS. So for all but CPU0, the DS_AREA will be restored twice but this is harmless. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | Select VIRT_TO_BUS directly where neededStephen Rothwell2013-03-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 887cbce0adea ("arch Kconfig: centralise ARCH_NO_VIRT_TO_BUS") I introduced the config sybmol HAVE_VIRT_TO_BUS and selected that where needed. I am not sure what I was thinking. Instead, just directly select VIRT_TO_BUS where it is needed. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | x86: Do not try to sync identity map for non-mapped pagesDave Hansen2013-03-071-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kernel_map_sync_memtype() is called from a variety of contexts. The pat.c code that calls it seems to ensure that it is not called for non-ram areas by checking via pat_pagerange_is_ram(). It is important that it only be called on the actual identity map because there *IS* no map to sync for highmem pages, or for memory holes. The ioremap.c uses are not as careful as those from pat.c, and call kernel_map_sync_memtype() on PCI space which is in the middle of the kernel identity map _range_, but is not actually mapped. This patch adds a check to kernel_map_sync_memtype() which probably duplicates some of the checks already in pat.c. But, it is necessary for the ioremap.c uses and shouldn't hurt other callers. I have reproduced this bug and this patch fixes it for me and the original bug reporter: https://lkml.org/lkml/2013/2/5/396 Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20130307163151.D9B58C4E@kernel.stglabs.ibm.com Signed-off-by: Dave Hansen <dave@sr71.net> Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | x86, doc: Be explicit about what the x86 struct boot_params requiresPeter Jones2013-03-061-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the sentinel triggers, we do not want the boot loader authors to just poke it and make the error go away, we want them to actually fix the problem. This should help avoid making the incorrect change in non-compliant bootloaders. [ hpa: dropped the Documentation/x86/boot.txt hunk pending clarifications ] Signed-off-by: Peter Jones <pjones@redhat.com> Link: http://lkml.kernel.org/r/1362592823-28967-1-git-send-email-pjones@redhat.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | x86: Don't clear efi_info even if the sentinel hitsJosh Boyer2013-03-061-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When boot_params->sentinel is set, all we really know is that some undefined set of fields in struct boot_params contain garbage. In the particular case of efi_info, however, there is a private magic for that substructure, so it is generally safe to leave it even if the bootloader is broken. kexec (for which we did the initial analysis) did not initialize this field, but of course all the EFI bootloaders do, and most EFI bootloaders are broken in this respect (and should be fixed.) Reported-by: Robin Holt <holt@sgi.com> Link: http://lkml.kernel.org/r/CA%2B5PVA51-FT14p4CRYKbicykugVb=PiaEycdQ57CK2km_OQuRQ@mail.gmail.com Tested-by: Josh Boyer <jwboyer@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | x86, mm: Make sure to find a 2M free block for the first mapped areaYinghai Lu2013-03-061-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Henrik reported that his MacAir 3.1 would not boot with | commit 8d57470d8f859635deffe3919d7d4867b488b85a | Date: Fri Nov 16 19:38:58 2012 -0800 | | x86, mm: setup page table in top-down It turns out that we do not calculate the real_end properly: We try to get 2M size with 4K alignment, and later will round down to 2M, so we will get less then 2M for first mapping, in extreme case could be only 4K only. In Henrik's system it has (1M-32K) as last usable rage is [mem 0x7f9db000-0x7fef8fff]. The problem is exposed when EFI booting have several holes and it will force mapping to use PTE instead as we only map usable areas. To fix it, just make it be 2M aligned, so we can be guaranteed to be able to use large pages to map it. Reported-by: Henrik Rydberg <rydberg@euromail.se> Bisected-by: Henrik Rydberg <rydberg@euromail.se> Tested-by: Henrik Rydberg <rydberg@euromail.se> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/CAE9FiQX4nQ7_1kg5RL_vh56rmcSHXUi1ExrZX7CwED4NGMnHfg@mail.gmail.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | x86: Fix 32-bit *_cpu_data initializersKrzysztof Mazur2013-03-061-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The commit 27be457000211a6903968dfce06d5f73f051a217 ('x86 idle: remove 32-bit-only "no-hlt" parameter, hlt_works_ok flag') removed the hlt_works_ok flag from struct cpuinfo_x86, but boot_cpu_data and new_cpu_data initializers were not changed causing setting f00f_bug flag, instead of fdiv_bug. If CONFIG_X86_F00F_BUG is not set the f00f_bug flag is never cleared. To avoid such problems in future C99-style initialization is now used. Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net> Acked-by: Borislav Petkov <bp@suse.de> Cc: len.brown@intel.com Link: http://lkml.kernel.org/r/1362266082-2227-1-git-send-email-krzysiek@podlesie.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | x86, smpboot: Remove unused variableBorislav Petkov2013-03-051-2/+1
| |/ | | | | | | | | | | | | | | | | | | | | | | | | The cpuinfo_x86 ptr is unused now. Drop it. Got obsolete by 69fb3676df33 ("x86 idle: remove mwait_idle() and "idle=mwait" cmdline param") removing its only user. [ hpa: fixes gcc warning ] Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1362428180-8865-2-git-send-email-bp@alien8.de Cc: Len Brown <len.brown@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
OpenPOWER on IntegriCloud