summaryrefslogtreecommitdiffstats
path: root/arch/x86/include/asm/kvm_host.h
Commit message (Collapse)AuthorAgeFilesLines
* KVM: x86: Detect and Initialize AVIC supportSuravee Suthikulpanit2016-05-181-0/+4
| | | | | | | | | | | | | | | | | | This patch introduces AVIC-related data structure, and AVIC initialization code. There are three main data structures for AVIC: * Virtual APIC (vAPIC) backing page (per-VCPU) * Physical APIC ID table (per-VM) * Logical APIC ID table (per-VM) Currently, AVIC is disabled by default. Users can manually enable AVIC via kernel boot option kvm-amd.avic=1 or during kvm-amd module loading with parameter avic=1. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> [Avoid extra indentation (Boris). - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Introducing kvm_x86_ops VCPU blocking/unblocking hooksSuravee Suthikulpanit2016-05-181-2/+16
| | | | | | | | | Adding new function pointer in struct kvm_x86_ops, and calling them from the kvm_arch_vcpu[blocking/unblocking]. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Introducing kvm_x86_ops VM init/destroy hooksSuravee Suthikulpanit2016-05-181-0/+3
| | | | | | | | Adding function pointers in struct kvm_x86_ops for processor-specific layer to provide hooks for when KVM initialize and destroy VM. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: halt_polling: provide a way to qualify wakeups during pollChristian Borntraeger2016-05-131-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some wakeups should not be considered a sucessful poll. For example on s390 I/O interrupts are usually floating, which means that _ALL_ CPUs would be considered runnable - letting all vCPUs poll all the time for transactional like workload, even if one vCPU would be enough. This can result in huge CPU usage for large guests. This patch lets architectures provide a way to qualify wakeups if they should be considered a good/bad wakeups in regard to polls. For s390 the implementation will fence of halt polling for anything but known good, single vCPU events. The s390 implementation for floating interrupts does a wakeup for one vCPU, but the interrupt will be delivered by whatever CPU checks first for a pending interrupt. We prefer the woken up CPU by marking the poll of this CPU as "good" poll. This code will also mark several other wakeup reasons like IPI or expired timers as "good". This will of course also mark some events as not sucessful. As KVM on z runs always as a 2nd level hypervisor, we prefer to not poll, unless we are really sure, though. This patch successfully limits the CPU usage for cases like uperf 1byte transactional ping pong workload or wakeup heavy workload like OLTP while still providing a proper speedup. This also introduced a new vcpu stat "halt_poll_no_tuning" that marks wakeups that are considered not good for polling. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Radim Krčmář <rkrcmar@redhat.com> (for an earlier version) Cc: David Matlack <dmatlack@google.com> Cc: Wanpeng Li <kernellwp@gmail.com> [Rename config symbol. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: optimize steal time calculationLiang Chen2016-04-201-1/+0
| | | | | | | | | | | | | | Since accumulate_steal_time is now only called in record_steal_time, it doesn't quite make sense to put the delta calculation in a separate function. The function could be called thousands of times before guest enables the steal time MSR (though the compiler may optimize out this function call). And after it's enabled, the MSR enable bit is tested twice every time. Removing the accumulate_steal_time function also avoids the necessity of having the accum_steal field. Signed-off-by: Liang Chen <liangchen.linux@gmail.com> Signed-off-by: Gavin Guo <gavin.guo@canonical.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: reduce default value of halt_poll_ns parameterPaolo Bonzini2016-04-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Windows lets applications choose the frequency of the timer tick, and in Windows 10 the maximum rate was changed from 1024 Hz to 2048 Hz. Unfortunately, because of the way the Windows API works, most applications who need a higher rate than the default 64 Hz will just do timeGetDevCaps(&tc, sizeof(tc)); timeBeginPeriod(tc.wPeriodMin); and pick the maximum rate. This causes very high CPU usage when playing media or games on Windows 10, even if the guest does not actually use the CPU very much, because the frequent timer tick causes halt_poll_ns to kick in. There is no really good solution, especially because Microsoft could sooner or later bump the limit to 4096 Hz, but for now the best we can do is lower a bit the upper limit for halt_poll_ns. :-( Reported-by: Jon Panozzo <jonp@lime-technology.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM, pkeys: expose CPUID/CR4 to guestHuaitong Han2016-03-221-1/+2
| | | | | | | | | | | | | | X86_FEATURE_PKU is referred to as "PKU" in the hardware documentation: CPUID.7.0.ECX[3]:PKU. X86_FEATURE_OSPKE is software support for pkeys, enumerated with CPUID.7.0.ECX[4]:OSPKE, and it reflects the setting of CR4.PKE(bit 22). This patch disables CPUID:PKU without ept, because pkeys is not yet implemented for shadow paging. Signed-off-by: Huaitong Han <huaitong.han@intel.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM, pkeys: add pkeys support for permission_faultHuaitong Han2016-03-221-0/+3
| | | | | | | | | | | | | | | | | Protection keys define a new 4-bit protection key field (PKEY) in bits 62:59 of leaf entries of the page tables, the PKEY is an index to PKRU register(16 domains), every domain has 2 bits(write disable bit, access disable bit). Static logic has been produced in update_pkru_bitmask, dynamic logic need read pkey from page table entries, get pkru value, and deduce the correct result. [ Huaitong: Xiao helps to modify many sections. ] Signed-off-by: Huaitong Han <huaitong.han@intel.com> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM, pkeys: introduce pkru_mask to cache conditionsHuaitong Han2016-03-221-0/+8
| | | | | | | | | | | | | PKEYS defines a new status bit in the PFEC. PFEC.PK (bit 5), if some conditions is true, the fault is considered as a PKU violation. pkru_mask indicates if we need to check PKRU.ADi and PKRU.WDi, and does cache some conditions for permission_fault. [ Huaitong: Xiao helps to modify many sections. ] Signed-off-by: Huaitong Han <huaitong.han@intel.com> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: remove eager_fpu field of struct kvm_vcpu_archPaolo Bonzini2016-03-091-1/+0
| | | | | | It is now equal to use_eager_fpu(), which simply tests a cpufeature bit. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: MMU: simplify last_pte_bitmapPaolo Bonzini2016-03-081-6/+2
| | | | | | | | | | | | | | Branch-free code is fun and everybody knows how much Avi loves it, but last_pte_bitmap takes it a bit to the extreme. Since the code is simply doing a range check, like (level == 1 || ((gpte & PT_PAGE_SIZE_MASK) && level < N) we can make it branch-free without storing the entire truth table; it is enough to cache N. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: MMU: apply page track notifierXiao Guangrong2016-03-031-2/+3
| | | | | | | | | | | Register the notifier to receive write track event so that we can update our shadow page table It makes kvm_mmu_pte_write() be the callback of the notifier, no function is changed Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: page track: add notifier supportXiao Guangrong2016-03-031-0/+1
| | | | | | | | | | | | | | | | | | Notifier list is introduced so that any node wants to receive the track event can register to the list Two APIs are introduced here: - kvm_page_track_register_notifier(): register the notifier to receive track event - kvm_page_track_unregister_notifier(): stop receiving track event by unregister the notifier The callback, node->track_write() is called when a write access on the write tracked page happens Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: MMU: clear write-flooding on the fast path of tracked pageXiao Guangrong2016-03-031-1/+1
| | | | | | | | | | | | | If the page fault is caused by write access on write tracked page, the real shadow page walking is skipped, we lost the chance to clear write flooding for the page structure current vcpu is using Fix it by locklessly waking shadow page table to clear write flooding on the shadow page structure out of mmu-lock. So that we change the count to atomic_t Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: page track: add the framework of guest page trackingXiao Guangrong2016-03-031-0/+10
| | | | | | | | | | | | | | | | | | | | | | The array, gfn_track[mode][gfn], is introduced in memory slot for every guest page, this is the tracking count for the gust page on different modes. If the page is tracked then the count is increased, the page is not tracked after the count reaches zero We use 'unsigned short' as the tracking count which should be enough as shadow page table only can use 2^14 (2^3 for level, 2^1 for cr4_pae, 2^2 for quadrant, 2^3 for access, 2^1 for nxe, 2^1 for cr0_wp, 2^1 for smep_andnot_wp, 2^1 for smap_andnot_wp, and 2^1 for smm) at most, there is enough room for other trackers Two callbacks, kvm_page_track_create_memslot() and kvm_page_track_free_memslot() are implemented in this patch, they are internally used to initialize and reclaim the memory of the array Currently, only write track mode is supported Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: MMU: rename has_wrprotected_page to mmu_gfn_lpage_is_disallowedXiao Guangrong2016-03-031-1/+1
| | | | | | | | | | | | | | | kvm_lpage_info->write_count is used to detect if the large page mapping for the gfn on the specified level is allowed, rename it to disallow_lpage to reflect its purpose, also we rename has_wrprotected_page() to mmu_gfn_lpage_is_disallowed() to make the code more clearer Later we will extend this mechanism for page tracking: if the gfn is tracked then large mapping for that gfn on any level is not allowed. The new name is more straightforward Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Use vector-hashing to deliver lowest-priority interruptsFeng Wu2016-02-091-0/+2
| | | | | | | | | | Use vector-hashing to deliver lowest-priority interrupts, As an example, modern Intel CPUs in server platform use this method to handle lowest-priority interrupts. Signed-off-by: Feng Wu <feng.wu@intel.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: move architecture-dependent requests to arch/Paolo Bonzini2016-01-081-0/+28
| | | | | | | | | Since the numbers now overlap, it makes sense to enumerate them in asm/kvm_host.h rather than linux/kvm_host.h. Functions that refer to architecture-specific requests are also moved to arch/. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* kvm/x86: Hyper-V SynIC timersAndrey Smetanin2015-12-161-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Per Hyper-V specification (and as required by Hyper-V-aware guests), SynIC provides 4 per-vCPU timers. Each timer is programmed via a pair of MSRs, and signals expiration by delivering a special format message to the configured SynIC message slot and triggering the corresponding synthetic interrupt. Note: as implemented by this patch, all periodic timers are "lazy" (i.e. if the vCPU wasn't scheduled for more than the timer period the timer events are lost), regardless of the corresponding configuration MSR. If deemed necessary, the "catch up" mode (the timer period is shortened until the timer catches up) will be implemented later. Changes v2: * Use remainder to calculate periodic timer expiration time Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Vitaly Kuznetsov <vkuznets@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new structTakuya Yoshikawa2015-11-251-2/+6
| | | | | | | New struct kvm_rmap_head makes the code type-safe to some extent. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* kvm/x86: Hyper-V kvm exitAndrey Smetanin2015-11-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | A new vcpu exit is introduced to notify the userspace of the changes in Hyper-V SynIC configuration triggered by guest writing to the corresponding MSRs. Changes v4: * exit into userspace only if guest writes into SynIC MSR's Changes v3: * added KVM_EXIT_HYPERV types and structs notes into docs Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* kvm/x86: Hyper-V synthetic interrupt controllerAndrey Smetanin2015-11-251-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SynIC (synthetic interrupt controller) is a lapic extension, which is controlled via MSRs and maintains for each vCPU - 16 synthetic interrupt "lines" (SINT's); each can be configured to trigger a specific interrupt vector optionally with auto-EOI semantics - a message page in the guest memory with 16 256-byte per-SINT message slots - an event flag page in the guest memory with 16 2048-bit per-SINT event flag areas The host triggers a SINT whenever it delivers a new message to the corresponding slot or flips an event flag bit in the corresponding area. The guest informs the host that it can try delivering a message by explicitly asserting EOI in lapic or writing to End-Of-Message (EOM) MSR. The userspace (qemu) triggers interrupts and receives EOM notifications via irqfd with resampler; for that, a GSI is allocated for each configured SINT, and irq_routing api is extended to support GSI-SINT mapping. Changes v4: * added activation of SynIC by vcpu KVM_ENABLE_CAP * added per SynIC active flag * added deactivation of APICv upon SynIC activation Changes v3: * added KVM_CAP_HYPERV_SYNIC and KVM_IRQ_ROUTING_HV_SINT notes into docs Changes v2: * do not use posted interrupts for Hyper-V SynIC AutoEOI vectors * add Hyper-V SynIC vectors into EOI exit bitmap * Hyper-V SyniIC SINT msr write logic simplified Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* kvm/x86: per-vcpu apicv deactivation supportAndrey Smetanin2015-11-251-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | The decision on whether to use hardware APIC virtualization used to be taken globally, based on the availability of the feature in the CPU and the value of a module parameter. However, under certain circumstances we want to control it on per-vcpu basis. In particular, when the userspace activates HyperV synthetic interrupt controller (SynIC), APICv has to be disabled as it's incompatible with SynIC auto-EOI behavior. To achieve that, introduce 'apicv_active' flag on struct kvm_vcpu_arch, and kvm_vcpu_deactivate_apicv() function to turn APICv off. The flag is initialized based on the module parameter and CPU capability, and consulted whenever an APICv-specific action is performed. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* kvm/x86: split ioapic-handled and EOI exit bitmapsAndrey Smetanin2015-11-251-2/+2
| | | | | | | | | | | | | | | | | | | | | | The function to determine if the vector is handled by ioapic used to rely on the fact that only ioapic-handled vectors were set up to cause vmexits when virtual apic was in use. We're going to break this assumption when introducing Hyper-V synthetic interrupts: they may need to cause vmexits too. To achieve that, introduce a new bitmap dedicated specifically for ioapic-handled vectors, and populate EOI exit bitmap from it for now. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: rename update_db_bp_intercept to update_bp_interceptPaolo Bonzini2015-11-101-1/+1
| | | | | | | Because #DB is now intercepted unconditionally, this callback only operates on #BP for both VMX and SVM. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Move TSC scaling logic out of call-back read_l1_tsc()Haozhong Zhang2015-11-101-0/+1
| | | | | | | | | Both VMX and SVM scales the host TSC in the same way in call-back read_l1_tsc(), so this patch moves the scaling logic from call-back read_l1_tsc() to a common function kvm_read_l1_tsc(). Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Move TSC scaling logic out of call-back adjust_tsc_offset()Haozhong Zhang2015-11-101-12/+1
| | | | | | | | | | | For both VMX and SVM, if the 2nd argument of call-back adjust_tsc_offset() is the host TSC, then adjust_tsc_offset() will scale it first. This patch moves this common TSC scaling logic to its caller adjust_tsc_offset_host() and rename the call-back adjust_tsc_offset() to adjust_tsc_offset_guest(). Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Replace call-back compute_tsc_offset() with a common functionHaozhong Zhang2015-11-101-1/+0
| | | | | | | | | Both VMX and SVM calculate the tsc-offset in the same way, so this patch removes the call-back compute_tsc_offset() and replaces it with a common function kvm_compute_tsc_offset(). Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Replace call-back set_tsc_khz() with a common functionHaozhong Zhang2015-11-101-1/+0
| | | | | | | | | Both VMX and SVM propagate virtual_tsc_khz in the same way, so this patch removes the call-back set_tsc_khz() and replaces it with a common function. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Add a common TSC scaling functionHaozhong Zhang2015-11-101-0/+2
| | | | | | | | | | VMX and SVM calculate the TSC scaling ratio in a similar logic, so this patch generalizes it to a common TSC scaling function. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> [Inline the multiplication and shift steps into mul_u64_u64_shr. Remove BUG_ON. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Add a common TSC scaling ratio field in kvm_vcpu_archHaozhong Zhang2015-11-101-0/+1
| | | | | | | | This patch moves the field of TSC scaling ratio from the architecture struct vcpu_svm to the common struct kvm_vcpu_arch. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Collect information for setting TSC scaling ratioHaozhong Zhang2015-11-101-0/+4
| | | | | | | | | | The number of bits of the fractional part of the 64-bit TSC scaling ratio in VMX and SVM is different. This patch makes the architecture code to collect the number of fractional bits and other related information into variables that can be accessed in the common code. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: declare a few variables as __read_mostlyPaolo Bonzini2015-11-101-2/+0
| | | | | | | These include module parameters and variables that are set by kvm_x86_ops->hardware_setup. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Merge tag 'kvm-arm-for-4.4' of ↵Paolo Bonzini2015-11-041-0/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/ARM Changes for v4.4-rc1 Includes a number of fixes for the arch-timer, introducing proper level-triggered semantics for the arch-timers, a series of patches to synchronously halt a guest (prerequisite for IRQ forwarding), some tracepoint improvements, a tweak for the EL2 panic handlers, some more VGIC cleanups getting rid of redundant state, and finally a stylistic change that gets rid of some ctags warnings. Conflicts: arch/x86/include/asm/kvm_host.h
| * KVM: Add kvm_arch_vcpu_{un}blocking callbacksChristoffer Dall2015-10-221-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | Some times it is useful for architecture implementations of KVM to know when the VCPU thread is about to block or when it comes back from blocking (arm/arm64 needs to know this to properly implement timers, for example). Therefore provide a generic architecture callback function in line with what we do elsewhere for KVM generic-arch interactions. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
* | Merge branch 'kvm-master' into HEADPaolo Bonzini2015-10-131-4/+2
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | This merge brings in a couple important SMM fixes, which makes it easier to test latest KVM with unrestricted_guest=0 and to test the in-progress work on SMM support in the firmware. Conflicts: arch/x86/kvm/x86.c
| * | KVM: x86: build kvm_userspace_memory_region in x86_set_memory_regionPaolo Bonzini2015-10-131-4/+2
| |/ | | | | | | | | | | | | | | | | | | | | | | | | The next patch will make x86_set_memory_region fill the userspace_addr. Since the struct is not used untouched anymore, it makes sense to build it in x86_set_memory_region directly; it also simplifies the callers. Reported-by: Alexandre DERUMIER <aderumier@odiso.com> Cc: stable@vger.kernel.org Fixes: 9da0e4d5ac969909f6b435ce28ea28135a9cbd69 Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: Update Posted-Interrupts Descriptor when vCPU is blockedFeng Wu2015-10-011-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch updates the Posted-Interrupts Descriptor when vCPU is blocked. pre-block: - Add the vCPU to the blocked per-CPU list - Set 'NV' to POSTED_INTR_WAKEUP_VECTOR post-block: - Remove the vCPU from the per-CPU list Signed-off-by: Feng Wu <feng.wu@intel.com> [Concentrate invocation of pre/post-block hooks to vcpu_block. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: x86: select IRQ_BYPASS_MANAGERFeng Wu2015-10-011-0/+1
| | | | | | | | | | | | | | | | Select IRQ_BYPASS_MANAGER for x86 when CONFIG_KVM is set Signed-off-by: Feng Wu <feng.wu@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: x86: Update IRTE for posted-interruptsFeng Wu2015-10-011-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | This patch adds the routine to update IRTE for posted-interrupts when guest changes the interrupt configuration. Signed-off-by: Feng Wu <feng.wu@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> [Squashed in automatically generated patch from the build robot "KVM: x86: vcpu_to_pi_desc() can be static" - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: make kvm_set_msi_irq() publicFeng Wu2015-10-011-0/+4
| | | | | | | | | | | | | | | | | | Make kvm_set_msi_irq() public, we can use this function outside. Signed-off-by: Feng Wu <feng.wu@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: Define a new interface kvm_intr_is_single_vcpu()Feng Wu2015-10-011-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch defines a new interface kvm_intr_is_single_vcpu(), which can returns whether the interrupt is for single-CPU or not. It is used by VT-d PI, since now we only support single-CPU interrupts, For lowest-priority interrupts, if user configures it via /proc/irq or uses irqbalance to make it single-CPU, we can use PI to deliver the interrupts to it. Full functionality of lowest-priority support will be added later. Signed-off-by: Feng Wu <feng.wu@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | kvm/x86: Hyper-V HV_X64_MSR_VP_RUNTIME supportAndrey Smetanin2015-10-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | HV_X64_MSR_VP_RUNTIME msr used by guest to get "the time the virtual processor consumes running guest code, and the time the associated logical processor spends running hypervisor code on behalf of that guest." Calculation of this time is performed by task_cputime_adjusted() for vcpu task. Necessary to support loading of winhv.sys in guest, which in turn is required to support Windows VMBus. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Gleb Natapov <gleb@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: x86: Add support for local interrupt requests from userspaceSteve Rutherford2015-10-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to enable userspace PIC support, the userspace PIC needs to be able to inject local interrupts even when the APICs are in the kernel. KVM_INTERRUPT now supports sending local interrupts to an APIC when APICs are in the kernel. The ready_for_interrupt_request flag is now only set when the CPU/APIC will immediately accept and inject an interrupt (i.e. APIC has not masked the PIC). When the PIC wishes to initiate an INTA cycle with, say, CPU0, it kicks CPU0 out of the guest, and renedezvous with CPU0 once it arrives in userspace. When the CPU/APIC unmasks the PIC, a KVM_EXIT_IRQ_WINDOW_OPEN is triggered, so that userspace has a chance to inject a PIC interrupt if it had been pending. Overall, this design can lead to a small number of spurious userspace renedezvous. In particular, whenever the PIC transistions from low to high while it is masked and whenever the PIC becomes unmasked while it is low. Note: this does not buffer more than one local interrupt in the kernel, so the VMM needs to enter the guest in order to complete interrupt injection before injecting an additional interrupt. Compiles for x86. Can pass the KVM Unit Tests. Signed-off-by: Steve Rutherford <srutherford@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: x86: Add EOI exit bitmap inferenceSteve Rutherford2015-10-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to support a userspace IOAPIC interacting with an in kernel APIC, the EOI exit bitmaps need to be configurable. If the IOAPIC is in userspace (i.e. the irqchip has been split), the EOI exit bitmaps will be set whenever the GSI Routes are configured. In particular, for the low MSI routes are reservable for userspace IOAPICs. For these MSI routes, the EOI Exit bit corresponding to the destination vector of the route will be set for the destination VCPU. The intention is for the userspace IOAPICs to use the reservable MSI routes to inject interrupts into the guest. This is a slight abuse of the notion of an MSI Route, given that MSIs classically bypass the IOAPIC. It might be worthwhile to add an additional route type to improve clarity. Compile tested for Intel x86. Signed-off-by: Steve Rutherford <srutherford@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: x86: Add KVM exit for IOAPIC EOIsSteve Rutherford2015-10-011-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adds KVM_EXIT_IOAPIC_EOI which allows the kernel to EOI level-triggered IOAPIC interrupts. Uses a per VCPU exit bitmap to decide whether or not the IOAPIC needs to be informed (which is identical to the EOI_EXIT_BITMAP field used by modern x86 processors, but can also be used to elide kvm IOAPIC EOI exits on older processors). [Note: A prototype using ResampleFDs found that decoupling the EOI from the VCPU's thread made it possible for the VCPU to not see a recent EOI after reentering the guest. This does not match real hardware.] Compile tested for Intel x86. Signed-off-by: Steve Rutherford <srutherford@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: x86: Split the APIC from the rest of IRQCHIP.Steve Rutherford2015-10-011-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | First patch in a series which enables the relocation of the PIC/IOAPIC to userspace. Adds capability KVM_CAP_SPLIT_IRQCHIP; KVM_CAP_SPLIT_IRQCHIP enables the construction of LAPICs without the rest of the irqchip. Compile tested for x86. Signed-off-by: Steve Rutherford <srutherford@google.com> Suggested-by: Andrew Honig <ahonig@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: x86: replace vm_has_apicv hook with cpu_uses_apicvPaolo Bonzini2015-10-011-1/+1
| | | | | | | | | | | | This will avoid an unnecessary trip to ->kvm and from there to the VPIC. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: x86: store IOAPIC-handled vectors in each VCPUPaolo Bonzini2015-10-011-1/+2
|/ | | | | | | | | | | | | | | We can reuse the algorithm that computes the EOI exit bitmap to figure out which vectors are handled by the IOAPIC. The only difference between the two is for edge-triggered interrupts other than IRQ8 that have no notifiers active; however, the IOAPIC does not have to do anything special for these interrupts anyway. This again limits the interactions between the IOAPIC and the LAPIC, making it easier to move the former to userspace. Inspired by a patch from Steve Rutherford. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: disable halt_poll_ns as default for s390xDavid Hildenbrand2015-09-251-0/+1
| | | | | | | | | | | | We observed some performance degradation on s390x with dynamic halt polling. Until we can provide a proper fix, let's enable halt_poll_ns as default only for supported architectures. Architectures are now free to set their own halt_poll_ns default value. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
OpenPOWER on IntegriCloud