summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'stable/for-linus-3.9-rc0-tag' of ↵Linus Torvalds2013-02-2423-85/+1331
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen update from Konrad Rzeszutek Wilk: "This has two new ACPI drivers for Xen - a physical CPU offline/online and a memory hotplug. The way this works is that ACPI kicks the drivers and they make the appropiate hypercall to the hypervisor to tell it that there is a new CPU or memory. There also some changes to the Xen ARM ABIs and couple of fixes. One particularly nasty bug in the Xen PV spinlock code was fixed by Stefan Bader - and has been there since the 2.6.32! Features: - Xen ACPI memory and CPU hotplug drivers - allowing Xen hypervisor to be aware of new CPU and new DIMMs - Cleanups Bug-fixes: - Fixes a long-standing bug in the PV spinlock wherein we did not kick VCPUs that were in a tight loop. - Fixes in the error paths for the event channel machinery" Fix up a few semantic conflicts with the ACPI interface changes in drivers/xen/xen-acpi-{cpu,mem}hotplug.c. * tag 'stable/for-linus-3.9-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: event channel arrays are xen_ulong_t and not unsigned long xen: Send spinlock IPI to all waiters xen: introduce xen_remap, use it instead of ioremap xen: close evtchn port if binding to irq fails xen-evtchn: correct comment and error output xen/tmem: Add missing %s in the printk statement. xen/acpi: move xen_acpi_get_pxm under CONFIG_XEN_DOM0 xen/acpi: ACPI cpu hotplug xen/acpi: Move xen_acpi_get_pxm to Xen's acpi.h xen/stub: driver for CPU hotplug xen/acpi: ACPI memory hotplug xen/stub: driver for memory hotplug xen: implement updated XENMEM_add_to_physmap_range ABI xen/smp: Move the common CPU init code a bit to prep for PVH patch.
| * xen: event channel arrays are xen_ulong_t and not unsigned longIan Campbell2013-02-204-52/+96
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On ARM we want these to be the same size on 32- and 64-bit. This is an ABI change on ARM. X86 does not change. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Keir (Xen.org) <keir@xen.org> Cc: Tim Deegan <tim@xen.org> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: linux-arm-kernel@lists.infradead.org Cc: xen-devel@lists.xen.org Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen: Send spinlock IPI to all waitersStefan Bader2013-02-191-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a loophole between Xen's current implementation of pv-spinlocks and the scheduler. This was triggerable through a testcase until v3.6 changed the TLB flushing code. The problem potentially is still there just not observable in the same way. What could happen was (is): 1. CPU n tries to schedule task x away and goes into a slow wait for the runq lock of CPU n-# (must be one with a lower number). 2. CPU n-#, while processing softirqs, tries to balance domains and goes into a slow wait for its own runq lock (for updating some records). Since this is a spin_lock_irqsave in softirq context, interrupts will be re-enabled for the duration of the poll_irq hypercall used by Xen. 3. Before the runq lock of CPU n-# is unlocked, CPU n-1 receives an interrupt (e.g. endio) and when processing the interrupt, tries to wake up task x. But that is in schedule and still on_cpu, so try_to_wake_up goes into a tight loop. 4. The runq lock of CPU n-# gets unlocked, but the message only gets sent to the first waiter, which is CPU n-# and that is busily stuck. 5. CPU n-# never returns from the nested interruption to take and release the lock because the scheduler uses a busy wait. And CPU n never finishes the task migration because the unlock notification only went to CPU n-#. To avoid this and since the unlocking code has no real sense of which waiter is best suited to grab the lock, just send the IPI to all of them. This causes the waiters to return from the hyper- call (those not interrupted at least) and do active spinlocking. BugLink: http://bugs.launchpad.net/bugs/1011792 Acked-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Cc: stable@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen: introduce xen_remap, use it instead of ioremapStefano Stabellini2013-02-195-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | ioremap can't be used to map ring pages on ARM because it uses device memory caching attributes (MT_DEVICE*). Introduce a Xen specific abstraction to map ring pages, called xen_remap, that is defined as ioremap on x86 (no behavioral changes). On ARM it explicitly calls __arm_ioremap with the right caching attributes: MT_MEMORY. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen: close evtchn port if binding to irq failsWei Liu2013-02-191-0/+10
| | | | | | | | | | | | CC: stable@vger.kernel.org Signed-off-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen-evtchn: correct comment and error outputWei Liu2013-02-191-2/+2
| | | | | | | | | | | | | | | | The evtchn device has been moved to /dev/xen. Also change log level to KERN_ERR as other xen drivers. Signed-off-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen/tmem: Add missing %s in the printk statement.Konrad Rzeszutek Wilk2013-02-191-1/+1
| | | | | | | | | | | | Seems that it got lost. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen/acpi: move xen_acpi_get_pxm under CONFIG_XEN_DOM0Liu Jinsong2013-02-191-15/+15
| | | | | | | | | | | | | | | | | | | | | | | | To avoid compile issue and it's meanigfull only under CONFIG_XEN_DOM0. In file included from linux/arch/x86/xen/enlighten.c:47:0: linux/include/xen/acpi.h:75:76: error: unknown type name ‘acpi_handle’ make[3]: *** [arch/x86/xen/enlighten.o] Error 1 Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> [v1: Fixed spelling mistakes] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen/acpi: ACPI cpu hotplugLiu Jinsong2013-02-196-0/+530
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implement real Xen ACPI cpu hotplug driver as module. When loaded, it replaces Xen stub driver. For booting existed cpus, the driver enumerates them. For hotadded cpus, which added at runtime and notify OS via device or container event, the driver is invoked to add them, parsing cpu information, hypercalling to Xen hypervisor to add them, and finally setting up new /sys interface for them. Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen/acpi: Move xen_acpi_get_pxm to Xen's acpi.hLiu Jinsong2013-02-192-18/+18
| | | | | | | | | | | | | | So that it could be reused by Xen CPU hotplug logic. Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen/stub: driver for CPU hotplugLiu Jinsong2013-02-192-2/+44
| | | | | | | | | | | | | | | | Add Xen stub driver for CPU hotplug, early occupy to block native, will be replaced later by real Xen processor driver module. Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen/acpi: ACPI memory hotplugLiu Jinsong2013-02-194-4/+522
| | | | | | | | | | | | | | | | | | | | | | | | This patch implements real Xen acpi memory hotplug driver as module. When loaded, it replaces Xen stub driver. When an acpi memory device hotadd event occurs, it notifies OS and invokes notification callback, adding related memory device and parsing memory information, finally hypercall to xen hypervisor to add memory. Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen/stub: driver for memory hotplugLiu Jinsong2013-02-194-0/+85
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch create a file (xen-stub.c) for Xen stub drivers. Xen stub drivers are used to reserve space for Xen drivers, i.e. memory hotplug and cpu hotplug, and to block native drivers loaded, so that real Xen drivers can be modular and loaded on demand. This patch is specific for Xen memory hotplug (other Xen logic can add stub drivers on their own). The xen stub driver will occupied earlier via subsys_initcall (than native memory hotplug driver via module_init and so blocking native). Later real Xen memory hotplug logic will unregister the stub driver and register itself to take effect on demand. Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen: implement updated XENMEM_add_to_physmap_range ABIIan Campbell2013-02-192-3/+11
| | | | | | | | | | | | | | | | | | | | Allows for more fine grained error reporting. Only used by PVH and ARM both of which are marked EXPERIMENTAL precisely because the ABI is not yet stable Signed-off-by: Ian Campbell <ian.campbell@citrix.com> [v1: Rebased without PVH patches] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen/smp: Move the common CPU init code a bit to prep for PVH patch.Konrad Rzeszutek Wilk2013-02-191-19/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | The PV and PVH code CPU init code share some functionality. The PVH code ("xen/pvh: Extend vcpu_guest_context, p2m, event, and XenBus") sets some of these up, but not all. To make it easier to read, this patch removes the PV specific out of the generic way. No functional change - just code movement. Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> [v2: Fixed compile errors noticed by Fengguang Wu build system] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* | Merge tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2013-02-2465-1671/+4359
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM updates from Marcelo Tosatti: "KVM updates for the 3.9 merge window, including x86 real mode emulation fixes, stronger memory slot interface restrictions, mmu_lock spinlock hold time reduction, improved handling of large page faults on shadow, initial APICv HW acceleration support, s390 channel IO based virtio, amongst others" * tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (143 commits) Revert "KVM: MMU: lazily drop large spte" x86: pvclock kvm: align allocation size to page size KVM: nVMX: Remove redundant get_vmcs12 from nested_vmx_exit_handled_msr x86 emulator: fix parity calculation for AAD instruction KVM: PPC: BookE: Handle alignment interrupts booke: Added DBCR4 SPR number KVM: PPC: booke: Allow multiple exception types KVM: PPC: booke: use vcpu reference from thread_struct KVM: Remove user_alloc from struct kvm_memory_slot KVM: VMX: disable apicv by default KVM: s390: Fix handling of iscs. KVM: MMU: cleanup __direct_map KVM: MMU: remove pt_access in mmu_set_spte KVM: MMU: cleanup mapping-level KVM: MMU: lazily drop large spte KVM: VMX: cleanup vmx_set_cr0(). KVM: VMX: add missing exit names to VMX_EXIT_REASONS array KVM: VMX: disable SMEP feature when guest is in non-paging mode KVM: Remove duplicate text in api.txt Revert "KVM: MMU: split kvm_mmu_free_page" ...
| * | Revert "KVM: MMU: lazily drop large spte"Marcelo Tosatti2013-02-201-7/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit caf6900f2d8aaebe404c976753f6813ccd31d95e. It is causing migration failures, reference https://bugzilla.kernel.org/show_bug.cgi?id=54061. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | x86: pvclock kvm: align allocation size to page sizeMarcelo Tosatti2013-02-181-5/+6
| | | | | | | | | | | | | | | | | | | | | To match whats mapped via vsyscalls to userspace. Reported-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | Merge commit 'origin/next' into kvm-ppc-nextAlexander Graf2013-02-158-87/+66
| |\ \
| | * | KVM: nVMX: Remove redundant get_vmcs12 from nested_vmx_exit_handled_msrJan Kiszka2013-02-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We already pass vmcs12 as argument. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
| | * | x86 emulator: fix parity calculation for AAD instructionGleb Natapov2013-02-131-8/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Reported-by: Paolo Bonzini <pbonzini@redhat.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
| | * | KVM: Remove user_alloc from struct kvm_memory_slotTakuya Yoshikawa2013-02-113-23/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This field was needed to differentiate memory slots created by the new API, KVM_SET_USER_MEMORY_REGION, from those by the old equivalent, KVM_SET_MEMORY_REGION, whose support was dropped long before: commit b74a07beed0e64bfba413dcb70dd6749c57f43dc KVM: Remove kernel-allocated memory regions Although we also have private memory slots to which KVM allocates memory with vm_mmap(), !user_alloc slots in other words, the slot id should be enough for differentiating them. Note: corresponding function parameters will be removed later. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com>
| | * | KVM: VMX: disable apicv by defaultYang Zhang2013-02-111-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Without Posted Interrupt, current code is broken. Just disable by default until Posted Interrupt is ready. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
| | * | KVM: s390: Fix handling of iscs.Cornelia Huck2013-02-111-3/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two ways to express an interruption subclass: - As a bitmask, as used in cr6. - As a number, as used in the I/O interruption word. Unfortunately, we have treated the I/O interruption word as if it contained the bitmask as well, which went unnoticed so far as - (not-yet-released) qemu made the same mistake, and - Linux guest kernels don't check the isc value in the I/O interruption word for subchannel interrupts. Make sure that we treat the I/O interruption word correctly. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
| | * | KVM: MMU: cleanup __direct_mapXiao Guangrong2013-02-061-8/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use link_shadow_page to link the sp to the spte in __direct_map Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | * | KVM: MMU: remove pt_access in mmu_set_spteXiao Guangrong2013-02-062-15/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is only used in debug code, so drop it Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | * | KVM: MMU: cleanup mapping-levelXiao Guangrong2013-02-061-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use min() to cleanup mapping_level Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | * | KVM: MMU: lazily drop large spteXiao Guangrong2013-02-061-16/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, kvm zaps the large spte if write-protected is needed, the later read can fault on that spte. Actually, we can make the large spte readonly instead of making them not present, the page fault caused by read access can be avoided The idea is from Avi: | As I mentioned before, write-protecting a large spte is a good idea, | since it moves some work from protect-time to fault-time, so it reduces | jitter. This removes the need for the return value. Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | * | KVM: VMX: cleanup vmx_set_cr0().Gleb Natapov2013-02-061-9/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When calculating hw_cr0 teh current code masks bits that should be always on and re-adds them back immediately after. Cleanup the code by masking only those bits that should be dropped from hw_cr0. This allow us to get rid of some defines. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | | KVM: PPC: BookE: Handle alignment interruptsAlexander Graf2013-02-132-3/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the guest triggers an alignment interrupt, we don't handle it properly today and instead BUG_ON(). This really shouldn't happen. Instead, we should just pass the interrupt back into the guest so it can deal with it. Reported-by: Gao Guanhua-B22826 <B22826@freescale.com> Tested-by: Gao Guanhua-B22826 <B22826@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
| * | | booke: Added DBCR4 SPR numberBharat Bhushan2013-02-131-0/+1
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
| * | | KVM: PPC: booke: Allow multiple exception typesBharat Bhushan2013-02-135-16/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current kvmppc_booke_handlers uses the same macro (KVM_HANDLER) and all handlers are considered to be the same size. This will not be the case if we want to use different macros for different handlers. This patch improves the kvmppc_booke_handler so that it can support different macros for different handlers. Signed-off-by: Liu Yu <yu.liu@freescale.com> [bharat.bhushan@freescale.com: Substantial changes] Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
| * | | KVM: PPC: booke: use vcpu reference from thread_structBharat Bhushan2013-02-133-7/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Like other places, use thread_struct to get vcpu reference. Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
| * | | Merge commit 'origin/next' into kvm-ppc-nextAlexander Graf2013-02-1323-166/+803
| |\ \ \ | | |/ /
| | * | KVM: VMX: add missing exit names to VMX_EXIT_REASONS arrayGleb Natapov2013-02-051-1/+6
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | * | KVM: VMX: disable SMEP feature when guest is in non-paging modeDongxiao Xu2013-02-051-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SMEP is disabled if CPU is in non-paging mode in hardware. However KVM always uses paging mode to emulate guest non-paging mode with TDP. To emulate this behavior, SMEP needs to be manually disabled when guest switches to non-paging mode. We met an issue that, SMP Linux guest with recent kernel (enable SMEP support, for example, 3.5.3) would crash with triple fault if setting unrestricted_guest=0. This is because KVM uses an identity mapping page table to emulate the non-paging mode, where the page table is set with USER flag. If SMEP is still enabled in this case, guest will meet unhandlable page fault and then crash. Reviewed-by: Gleb Natapov <gleb@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com> Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | * | KVM: Remove duplicate text in api.txtGeoff Levand2013-02-051-13/+0
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | * | Revert "KVM: MMU: split kvm_mmu_free_page"Gleb Natapov2013-02-051-18/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit bd4c86eaa6ff10abc4e00d0f45d2a28b10b09df4. There is not user for kvm_mmu_isolate_page() any more. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | * | KVM: MMU: drop superfluous is_present_gpte() check.Gleb Natapov2013-02-041-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Gust page walker puts only present ptes into ptes[] array. No need to check it again. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | * | KVM: MMU: drop superfluous min() call.Gleb Natapov2013-02-041-1/+1
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | * | KVM: MMU: set base_role.nxe during mmu initialization.Gleb Natapov2013-02-042-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Move base_role.nxe initialisation to where all other roles are initialized. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | * | KVM: MMU: drop unneeded checks.Gleb Natapov2013-02-041-3/+2
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | * | KVM: MMU: make spte_is_locklessly_modifiable() more clearGleb Natapov2013-02-041-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | spte_is_locklessly_modifiable() checks that both SPTE_HOST_WRITEABLE and SPTE_MMU_WRITEABLE are present on spte. Make it more explicit. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | * | KVM: set_memory_region: Disallow changing read-only attribute laterTakuya Yoshikawa2013-02-042-29/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As Xiao pointed out, there are a few problems with it: - kvm_arch_commit_memory_region() write protects the memory slot only for GET_DIRTY_LOG when modifying the flags. - FNAME(sync_page) uses the old spte value to set a new one without checking KVM_MEM_READONLY flag. Since we flush all shadow pages when creating a new slot, the simplest fix is to disallow such problematic flag changes: this is safe because no one is doing such things. Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | * | KVM: set_memory_region: Identify the requested change explicitlyTakuya Yoshikawa2013-02-041-20/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KVM_SET_USER_MEMORY_REGION forces __kvm_set_memory_region() to identify what kind of change is being requested by checking the arguments. The current code does this checking at various points in code and each condition being used there is not easy to understand at first glance. This patch consolidates these checks and introduces an enum to name the possible changes to clean up the code. Although this does not introduce any functional changes, there is one change which optimizes the code a bit: if we have nothing to change, the new code returns 0 immediately. Note that the return value for this case cannot be changed since QEMU relies on it: we noticed this when we changed it to -EINVAL and got a section mismatch error at the final stage of live migration. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | * | s390/kvm: Fix instruction decodingChristian Borntraeger2013-01-301-11/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instructions with long displacement have a signed displacement. Currently the sign bit is interpreted as 2^20: Lets fix it by doing the sign extension from 20bit to 32bit and then use it as a signed variable in the addition (see kvm_s390_get_base_disp_rsy). Furthermore, there are lots of "int" in that code. This is problematic, because shifting on a signed integer is undefined/implementation defined if the bit value happens to be negative. Fortunately the promotion rules will make the right hand side unsigned anyway, so there is no real problem right now. Let's convert them anyway to unsigned where appropriate to avoid problems if the code is changed or copy/pasted later on. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
| | * | s390/virtio-ccw: Fix setup_vq error handling.Cornelia Huck2013-01-301-9/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | virtio_ccw_setup_vq() failed to unwind correctly on errors. In particular, it failed to delete the virtqueue on errors, leading to list corruption when virtio_ccw_del_vqs() iterated over a virtqueue that had not been added to the vcdev's list. Fix this with redoing the error unwinding in virtio_ccw_setup_vq(), using a single path for all errors. Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
| | * | s390/kvm: Fix store status for ACRS/FPRSChristian Borntraeger2013-01-301-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On store status we need to copy the current state of registers into a save area. Currently we might save stale versions: The sie state descriptor doesnt have fields for guest ACRS,FPRS, those registers are simply stored in the host registers. The host program must copy these away if needed. We do that in vcpu_put/load. If we now do a store status in KVM code between vcpu_put/load, the saved values are not up-to-date. Lets collect the ACRS/FPRS before saving them. This also fixes some strange problems with hotplug and virtio-ccw, since the low level machine check handler (on hotplug a machine check will happen) will revalidate all registers with the content of the save area. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> CC: stable@vger.kernel.org Signed-off-by: Gleb Natapov <gleb@redhat.com>
| | * | kvm: Handle yield_to failure return code for potential undercommit caseRaghavendra K T2013-01-291-10/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | yield_to returns -ESRCH, When source and target of yield_to run queue length is one. When we see three successive failures of yield_to we assume we are in potential undercommit case and abort from PLE handler. The assumption is backed by low probability of wrong decision for even worst case scenarios such as average runqueue length between 1 and 2. More detail on rationale behind using three tries: if p is the probability of finding rq length one on a particular cpu, and if we do n tries, then probability of exiting ple handler is: p^(n+1) [ because we would have come across one source with rq length 1 and n target cpu rqs with length 1 ] so num tries: probability of aborting ple handler (1.5x overcommit) 1 1/4 2 1/8 3 1/16 We can increase this probability with more tries, but the problem is the overhead. Also, If we have tried three times that means we would have iterated over 3 good eligible vcpus along with many non-eligible candidates. In worst case if we iterate all the vcpus, we reduce 1x performance and overcommit performance get hit. note that we do not update last boosted vcpu in failure cases. Thank Avi for raising question on aborting after first fail from yield_to. Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Tested-by: Chegu Vinod <chegu_vinod@hp.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
| | * | sched: Bail out of yield_to when source and target runqueue has one taskPeter Zijlstra2013-01-291-6/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case of undercomitted scenarios, especially in large guests yield_to overhead is significantly high. when run queue length of source and target is one, take an opportunity to bail out and return -ESRCH. This return condition can be further exploited to quickly come out of PLE handler. (History: Raghavendra initially worked on break out of kvm ple handler upon seeing source runqueue length = 1, but it had to export rq length). Peter came up with the elegant idea of return -ESRCH in scheduler core. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Raghavendra, Checking the rq length of target vcpu condition added.(thanks Avi) Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: Andrew Jones <drjones@redhat.com> Tested-by: Chegu Vinod <chegu_vinod@hp.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
OpenPOWER on IntegriCloud