summaryrefslogtreecommitdiffstats
path: root/arch/s390/include/asm/kvm_host.h
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2016-03-161-15/+26
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM updates from Paolo Bonzini: "One of the largest releases for KVM... Hardly any generic changes, but lots of architecture-specific updates. ARM: - VHE support so that we can run the kernel at EL2 on ARMv8.1 systems - PMU support for guests - 32bit world switch rewritten in C - various optimizations to the vgic save/restore code. PPC: - enabled KVM-VFIO integration ("VFIO device") - optimizations to speed up IPIs between vcpus - in-kernel handling of IOMMU hypercalls - support for dynamic DMA windows (DDW). s390: - provide the floating point registers via sync regs; - separated instruction vs. data accesses - dirty log improvements for huge guests - bugfixes and documentation improvements. x86: - Hyper-V VMBus hypercall userspace exit - alternative implementation of lowest-priority interrupts using vector hashing (for better VT-d posted interrupt support) - fixed guest debugging with nested virtualizations - improved interrupt tracking in the in-kernel IOAPIC - generic infrastructure for tracking writes to guest memory - currently its only use is to speedup the legacy shadow paging (pre-EPT) case, but in the future it will be used for virtual GPUs as well - much cleanup (LAPIC, kvmclock, MMU, PIT), including ubsan fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (217 commits) KVM: x86: remove eager_fpu field of struct kvm_vcpu_arch KVM: x86: disable MPX if host did not enable MPX XSAVE features arm64: KVM: vgic-v3: Only wipe LRs on vcpu exit arm64: KVM: vgic-v3: Reset LRs at boot time arm64: KVM: vgic-v3: Do not save an LR known to be empty arm64: KVM: vgic-v3: Save maintenance interrupt state only if required arm64: KVM: vgic-v3: Avoid accessing ICH registers KVM: arm/arm64: vgic-v2: Make GICD_SGIR quicker to hit KVM: arm/arm64: vgic-v2: Only wipe LRs on vcpu exit KVM: arm/arm64: vgic-v2: Reset LRs at boot time KVM: arm/arm64: vgic-v2: Do not save an LR known to be empty KVM: arm/arm64: vgic-v2: Move GICH_ELRSR saving to its own function KVM: arm/arm64: vgic-v2: Save maintenance interrupt state only if required KVM: arm/arm64: vgic-v2: Avoid accessing GICH registers KVM: s390: allocate only one DMA page per VM KVM: s390: enable STFLE interpretation only if enabled for the guest KVM: s390: wake up when the VCPU cpu timer expires KVM: s390: step the VCPU timer while in enabled wait KVM: s390: protect VCPU cpu timer with a seqcount KVM: s390: step VCPU cpu timer during kvm_run ioctl ...
| * KVM: s390: allocate only one DMA page per VMDavid Hildenbrand2016-03-081-8/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can fit the 2k for the STFLE interpretation and the crypto control block into one DMA page. As we now only have to allocate one DMA page, we can clean up the code a bit. As a nice side effect, this also fixes a problem with crycbd alignment in case special allocation debug options are enabled, debugged by Sascha Silbe. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| * KVM: s390: protect VCPU cpu timer with a seqcountDavid Hildenbrand2016-03-081-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For now, only the owning VCPU thread (that has loaded the VCPU) can get a consistent cpu timer value when calculating the delta. However, other threads might also be interested in a more recent, consistent value. Of special interest will be the timer callback of a VCPU that executes without having the VCPU loaded and could run in parallel with the VCPU thread. The cpu timer has a nice property: it is only updated by the owning VCPU thread. And speaking about accounting, a consistent value can only be calculated by looking at cputm_start and the cpu timer itself in one shot, otherwise the result might be wrong. As we only have one writing thread at a time (owning VCPU thread), we can use a seqcount instead of a seqlock and retry if the VCPU refreshed its cpu timer. This avoids any heavy locking and only introduces a counter update/check plus a handful of smp_wmb(). The owning VCPU thread should never have to retry on reads, and also for other threads this might be a very rare scenario. Please note that we have to use the raw_* variants for locking the seqcount as lockdep will produce false warnings otherwise. The rq->lock held during vcpu_load/put is also acquired from hardirq context. Lockdep cannot know that we avoid potential deadlocks by disabling preemption and thereby disable concurrent write locking attempts (via vcpu_put/load). Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| * KVM: s390: step VCPU cpu timer during kvm_run ioctlDavid Hildenbrand2016-03-081-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Architecturally we should only provide steal time if we are scheduled away, and not if the host interprets a guest exit. We have to step the guest CPU timer in these cases. In the first shot, we will step the VCPU timer only during the kvm_run ioctl. Therefore all time spent e.g. in interception handlers or on irq delivery will be accounted for that VCPU. We have to take care of a few special cases: - Other VCPUs can test for pending irqs. We can only report a consistent value for the VCPU thread itself when adding the delta. - We have to take care of STP sync, therefore we have to extend kvm_clock_sync() and disable preemption accordingly - During any call to disable/enable/start/stop we could get premeempted and therefore get start/stop calls. Therefore we have to make sure we don't get into an inconsistent state. Whenever a VCPU is scheduled out, sleeping, in user space or just about to enter the SIE, the guest cpu timer isn't stepped. Please note that all primitives are prepared to be called from both environments (cpu timer accounting enabled or not), although not completely used in this patch yet (e.g. kvm_s390_set_cpu_timer() will never be called while cpu timer accounting is enabled). Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| * KVM: s390: remove old fragment of vector registersDavid Hildenbrand2016-02-101-7/+1
| | | | | | | | | | | | | | | | | | Since commit 9977e886cbbc ("s390/kernel: lazy restore fpu registers"), vregs in struct sie_page is unsed. We can safely remove the field and the definition. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* | KVM: Use simple waitqueue for vcpu->wqMarcelo Tosatti2016-02-251-1/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The problem: On -rt, an emulated LAPIC timer instances has the following path: 1) hard interrupt 2) ksoftirqd is scheduled 3) ksoftirqd wakes up vcpu thread 4) vcpu thread is scheduled This extra context switch introduces unnecessary latency in the LAPIC path for a KVM guest. The solution: Allow waking up vcpu thread from hardirq context, thus avoiding the need for ksoftirqd to be scheduled. Normal waitqueues make use of spinlocks, which on -RT are sleepable locks. Therefore, waking up a waitqueue waiter involves locking a sleeping lock, which is not allowed from hard interrupt context. cyclictest command line: This patch reduces the average latency in my tests from 14us to 11us. Daniel writes: Paolo asked for numbers from kvm-unit-tests/tscdeadline_latency benchmark on mainline. The test was run 1000 times on tip/sched/core 4.4.0-rc8-01134-g0905f04: ./x86-run x86/tscdeadline_latency.flat -cpu host with idle=poll. The test seems not to deliver really stable numbers though most of them are smaller. Paolo write: "Anything above ~10000 cycles means that the host went to C1 or lower---the number means more or less nothing in that case. The mean shows an improvement indeed." Before: min max mean std count 1000.000000 1000.000000 1000.000000 1000.000000 mean 5162.596000 2019270.084000 5824.491541 20681.645558 std 75.431231 622607.723969 89.575700 6492.272062 min 4466.000000 23928.000000 5537.926500 585.864966 25% 5163.000000 1613252.750000 5790.132275 16683.745433 50% 5175.000000 2281919.000000 5834.654000 23151.990026 75% 5190.000000 2382865.750000 5861.412950 24148.206168 max 5228.000000 4175158.000000 6254.827300 46481.048691 After min max mean std count 1000.000000 1000.00000 1000.000000 1000.000000 mean 5143.511000 2076886.10300 5813.312474 21207.357565 std 77.668322 610413.09583 86.541500 6331.915127 min 4427.000000 25103.00000 5529.756600 559.187707 25% 5148.000000 1691272.75000 5784.889825 17473.518244 50% 5160.000000 2308328.50000 5832.025000 23464.837068 75% 5172.000000 2393037.75000 5853.177675 24223.969976 max 5222.000000 3922458.00000 6186.720500 42520.379830 [Patch was originaly based on the swait implementation found in the -rt tree. Daniel ported it to mainline's version and gathered the benchmark numbers for tscdeadline_latency test.] Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: linux-rt-users@vger.kernel.org Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1455871601-27484-4-git-send-email-wagi@monom.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* KVM: s390: fix memory overwrites when vx is disabledDavid Hildenbrand2016-01-261-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | The kernel now always uses vector registers when available, however KVM has special logic if support is really enabled for a guest. If support is disabled, guest_fpregs.fregs will only contain memory for the fpu. The kernel, however, will store vector registers into that area, resulting in crazy memory overwrites. Simply extending that area is not enough, because the format of the registers also changes. We would have to do additional conversions, making the code even more complex. Therefore let's directly use one place for the vector/fpu registers + fpc (in kvm_run). We just have to convert the data properly when accessing it. This makes current code much easier. Please note that vector/fpu registers are now always stored to vcpu->run->s.regs.vrs. Although this data is visible to QEMU and used for migration, we only guarantee valid values to user space when KVM_SYNC_VRS is set. As that is only the case when we have vector register support, we are on the safe side. Fixes: b5510d9b68c3 ("s390/fpu: always enable the vector facility if it is available") Cc: stable@vger.kernel.org # v4.4 d9a3a09af54d s390/kvm: remove dependency on struct save_area definition Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> [adopt to d9a3a09af54d]
* KVM: move architecture-dependent requests to arch/Paolo Bonzini2016-01-081-0/+4
| | | | | | | | | Since the numbers now overlap, it makes sense to enumerate them in asm/kvm_host.h rather than linux/kvm_host.h. Functions that refer to architecture-specific requests are also moved to arch/. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: s390: implement the RI support of guestFan Zhang2016-01-071-1/+2
| | | | | | | | | | | | | | | | This patch adds runtime instrumentation support for KVM guest. We need to setup a save area for the runtime instrumentation-controls control block(RICCB) and implement the necessary interfaces to live migrate the guest settings. We setup the sie control block in a way, that the runtime instrumentation instructions of a guest are handled by hardware. We also add a capability KVM_CAP_S390_RI to make this feature opt-in as it needs migration support. Signed-off-by: Fan Zhang <zhangfan@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: fix mismatch between user and in-kernel guest limitDominik Dingel2015-12-151-0/+1
| | | | | | | | | | | | | | | | | | | | While the userspace interface requests the maximum size the gmap code expects to get a maximum address. This error resulted in bigger page tables than necessary for some guest sizes, e.g. a 2GB guest used 3 levels instead of 2. At the same time we introduce KVM_S390_NO_MEM_LIMIT, which allows in a bright future that a guest spans the complete 64 bit address space. We also switch to TASK_MAX_SIZE for the initial memory size, this is a cosmetic change as the previous size also resulted in a 4 level pagetable creation. Reported-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: Enable up to 248 VCPUs per VMEugene (jno) Dvurechenski2015-11-301-1/+1
| | | | | | | | | This patch allows s390 to have more than 64 VCPUs for a guest (up to 248 for memory usage considerations), if supported by the underlaying hardware (sclp.has_esca). Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: Introduce switching codeEugene (jno) Dvurechenski2015-11-301-0/+1
| | | | | | | | | | | | | | This patch adds code that performs transparent switch to Extended SCA on addition of 65th VCPU in a VM. Disposal of ESCA is added too. The entier ESCA functionality, however, is still not enabled. The enablement will be provided in a separate patch. This patch also uses read/write lock protection of SCA and its subfields for possible disposal at the BSCA-to-ESCA transition. While only Basic SCA needs such a protection (for the swap), any SCA access is now guarded. Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: Make provisions for ESCA utilizationEugene (jno) Dvurechenski2015-11-301-1/+2
| | | | | | | | | | | This patch updates the routines (sca_*) to provide transparent access to and manipulation on the data for both Basic and Extended SCA in use. The kvm.arch.sca is generalized to (void *) to handle BSCA/ESCA cases. Also the kvm.arch.use_esca flag is provided. The actual functionality is kept the same. Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: Introduce new structuresEugene (jno) Dvurechenski2015-11-301-6/+41
| | | | | | | | | | | | | This patch adds new structures and updates some existing ones to provide the base for Extended SCA functionality. The old sca_* structures were renamed to bsca_* to keep things uniform. The access to fields of SIGP controls were turned into bitfields instead of hardcoded bitmasks. Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2015-11-051-0/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM updates from Paolo Bonzini: "First batch of KVM changes for 4.4. s390: A bunch of fixes and optimizations for interrupt and time handling. PPC: Mostly bug fixes. ARM: No big features, but many small fixes and prerequisites including: - a number of fixes for the arch-timer - introducing proper level-triggered semantics for the arch-timers - a series of patches to synchronously halt a guest (prerequisite for IRQ forwarding) - some tracepoint improvements - a tweak for the EL2 panic handlers - some more VGIC cleanups getting rid of redundant state x86: Quite a few changes: - support for VT-d posted interrupts (i.e. PCI devices can inject interrupts directly into vCPUs). This introduces a new component (in virt/lib/) that connects VFIO and KVM together. The same infrastructure will be used for ARM interrupt forwarding as well. - more Hyper-V features, though the main one Hyper-V synthetic interrupt controller will have to wait for 4.5. These will let KVM expose Hyper-V devices. - nested virtualization now supports VPID (same as PCID but for vCPUs) which makes it quite a bit faster - for future hardware that supports NVDIMM, there is support for clflushopt, clwb, pcommit - support for "split irqchip", i.e. LAPIC in kernel + IOAPIC/PIC/PIT in userspace, which reduces the attack surface of the hypervisor - obligatory smattering of SMM fixes - on the guest side, stable scheduler clock support was rewritten to not require help from the hypervisor" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (123 commits) KVM: VMX: Fix commit which broke PML KVM: x86: obey KVM_X86_QUIRK_CD_NW_CLEARED in kvm_set_cr0() KVM: x86: allow RSM from 64-bit mode KVM: VMX: fix SMEP and SMAP without EPT KVM: x86: move kvm_set_irq_inatomic to legacy device assignment KVM: device assignment: remove pointless #ifdefs KVM: x86: merge kvm_arch_set_irq with kvm_set_msi_inatomic KVM: x86: zero apic_arb_prio on reset drivers/hv: share Hyper-V SynIC constants with userspace KVM: x86: handle SMBASE as physical address in RSM KVM: x86: add read_phys to x86_emulate_ops KVM: x86: removing unused variable KVM: don't pointlessly leave KVM_COMPAT=y in non-KVM configs KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr() KVM: arm/arm64: Clean up vgic_retire_lr() and surroundings KVM: arm/arm64: Optimize away redundant LR tracking KVM: s390: use simple switch statement as multiplexer KVM: s390: drop useless newline in debugging data KVM: s390: SCA must not cross page boundaries KVM: arm: Do not indent the arguments of DECLARE_BITMAP ...
| * KVM: Add kvm_arch_vcpu_{un}blocking callbacksChristoffer Dall2015-10-221-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | Some times it is useful for architecture implementations of KVM to know when the VCPU thread is about to block or when it comes back from blocking (arm/arm64 needs to know this to properly implement timers, for example). Therefore provide a generic architecture callback function in line with what we do elsewhere for KVM generic-arch interactions. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
* | s390/fpu: split fpu-internal.h into fpu internals, api, and type headersHendrik Brueckner2015-10-161-1/+1
|/ | | | | | | | | | | | | | | | | | | | | | Split the API and FPU type definitions into separate header files similar to "x86/fpu: Rename fpu-internal.h to fpu/internal.h" (78f7f1e54b). The new header files and their meaning are: asm/fpu/types.h: FPU related data types, needed for 'struct thread_struct' and 'struct task_struct'. asm/fpu/api.h: FPU related 'public' functions for other subsystems and device drivers. asm/fpu/internal.h: FPU internal functions mainly used to convert FPU register contents in signal handling. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* KVM: disable halt_poll_ns as default for s390xDavid Hildenbrand2015-09-251-0/+1
| | | | | | | | | | | | We observed some performance degradation on s390x with dynamic halt polling. Until we can provide a proper fix, let's enable halt_poll_ns as default only for supported architectures. Architectures are now free to set their own halt_poll_ns default value. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: add halt_attempted_poll to VCPU statsPaolo Bonzini2015-09-161-0/+1
| | | | | | | | | | | | | | | | | This new statistic can help diagnosing VCPUs that, for any reason, trigger bad behavior of halt_poll_ns autotuning. For example, say halt_poll_ns = 480000, and wakeups are spaced exactly like 479us, 481us, 479us, 481us. Then KVM always fails polling and wastes 10+20+40+80+160+320+480 = 1110 microseconds out of every 479+481+479+481+479+481+479 = 3359 microseconds. The VCPU then is consuming about 30% more CPU than it would use without polling. This would show as an abnormally high number of attempted polling compared to the successful polls. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com< Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Merge branch 'for-linus' of ↵Linus Torvalds2015-08-311-3/+3
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: "The big one is support for fake NUMA, splitting a really large machine in more manageable piece improves performance in some cases, e.g. for a KVM host. The FICON Link Incident handling has been improved, this helps the operator to identify degraded or non-operational FICON connections. The save and restore of floating point and vector registers has been overhauled to allow the future use of vector registers in the kernel. A few small enhancement, magic sys-requests for the vt220 console via SCLP, some more assembler code has been converted to C, the PCI error handling is improved. And the usual cleanup and bug fixing" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (59 commits) s390/jump_label: Use %*ph to print small buffers s390/sclp_vt220: support magic sysrequests s390/ctrlchar: improve handling of magic sysrequests s390/numa: remove superfluous ARCH_WANT defines s390/3270: redraw screen on unsolicited device end s390/dcssblk: correct out of bounds array indexes s390/mm: simplify page table alloc/free code s390/pci: move debug messages to debugfs s390/nmi: initialize control register 0 earlier s390/zcrypt: use msleep() instead of mdelay() s390/hmcdrv: fix interrupt registration s390/setup: fix novx parameter s390/uaccess: remove uaccess_primary kernel parameter s390: remove unneeded sizeof(void *) comparisons s390/facilities: remove transactional-execution bits s390/numa: re-add DIE sched_domain_topology_level s390/dasd: enhance CUIR scope detection s390/dasd: fix failing path verification s390/vdso: emit a GNU hash s390/numa: make core to node mapping data dynamic ...
| * s390/kernel: lazy restore fpu registersHendrik Brueckner2015-07-221-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Improve the save and restore behavior of FPU register contents to use the vector extension within the kernel. The kernel does not use floating-point or vector registers and, therefore, saving and restoring the FPU register contents are performed for handling signals or switching processes only. To prepare for using vector instructions and vector registers within the kernel, enhance the save behavior and implement a lazy restore at return to user space from a system call or interrupt. To implement the lazy restore, the save_fpu_regs() sets a CPU information flag, CIF_FPU, to indicate that the FPU registers must be restored. Saving and setting CIF_FPU is performed in an atomic fashion to be interrupt-safe. When the kernel wants to use the vector extension or wants to change the FPU register state for a task during signal handling, the save_fpu_regs() must be called first. The CIF_FPU flag is also set at process switch. At return to user space, the FPU state is restored. In particular, the FPU state includes the floating-point or vector register contents, as well as, vector-enablement and floating-point control. The FPU state restore and clearing CIF_FPU is also performed in an atomic fashion. For KVM, the restore of the FPU register state is performed when restoring the general-purpose guest registers before the SIE instructions is started. Because the path towards the SIE instruction is interruptible, the CIF_FPU flag must be checked again right before going into SIE. If set, the guest registers must be reloaded again by re-entering the outer SIE loop. This is the same behavior as if the SIE critical section is interrupted. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* | KVM: s390: Provide global debug logChristian Borntraeger2015-07-291-1/+0
| | | | | | | | | | | | | | | | In addition to the per VM debug logs, let's provide a global one for KVM-wide events, like new guests or fatal errors. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
* | KVM: s390: add kvm stat counter for all diagnosesChristian Borntraeger2015-07-291-0/+3
|/ | | | | | | | | | | | Sometimes kvm stat counters are the only performance metric to check after something went wrong. Let's add additional counters for some diagnoses. In addition do the count for diag 10 all the time, even if we inject a program interrupt. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
* KVM: add memslots argument to kvm_arch_memslots_updatedPaolo Bonzini2015-05-261-1/+1
| | | | | | | Prepare for the case of multiple address spaces. Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: s390: make exit_sie_sync more robustChristian Borntraeger2015-05-081-1/+2
| | | | | | | | | | | | | | | | | exit_sie_sync is used to kick CPUs out of SIE and prevent reentering at any point in time. This is used to reload the prefix pages and to set the IBS stuff in a way that guarantees that after this function returns we are no longer in SIE. All current users trigger KVM requests. The request must be set before we block the CPUs to avoid races. Let's make this implicit by adding the request into a new function kvm_s390_sync_requests that replaces exit_sie_sync and split out s390_vcpu_block and s390_vcpu_unblock, that can be used to keep CPUs out of SIE independent of requests. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
* KVM: s390: Enable guest EDAT2 supportGuenther Hutzl2015-05-081-0/+1
| | | | | | | | | | | | | | | | | | 1. Enable EDAT2 in the list of KVM facilities 2. Handle 2G frames in pfmf instruction If we support EDAT2, we may enable handling of 2G frames if not in 24 bit mode. 3. Enable EDAT2 in sie_block If the EDAT2 facility is available we enable GED2 mode control in the sie_block. Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Guenther Hutzl <hutzl@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: deliver floating interrupts in order of priorityJens Freimann2015-03-311-3/+27
| | | | | | | | | | | | | | | | | | | | This patch makes interrupt handling compliant to the z/Architecture Principles of Operation with regard to interrupt priorities. Add a bitmap for pending floating interrupts. Each bit relates to a interrupt type and its list. A turned on bit indicates that a list contains items (interrupts) which need to be delivered. When delivering interrupts on a cpu we can merge the existing bitmap for cpu-local interrupts and floating interrupts and have a single mechanism for delivery. Currently we have one list for all kinds of floating interrupts and a corresponding spin lock. This patch adds a separate list per interrupt type. An exception to this are service signal and machine check interrupts, as there can be only one pending interrupt at a time. Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
* KVM: s390: represent SIMD cap in kvm facilityMichael Mueller2015-03-171-1/+0
| | | | | | | | | | | | | | The patch represents capability KVM_CAP_S390_VECTOR_REGISTERS by means of the SIMD facility bit. This allows to a) disable the use of SIMD when used in conjunction with a not-SIMD-aware QEMU, b) to enable SIMD when used with a SIMD-aware version of QEMU and c) finally by means of a QEMU version using the future cpu model ioctls. Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com> Reviewed-by: Eric Farman <farman@linux.vnet.ibm.com> Tested-by: Eric Farman <farman@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: introduce post handlers for STSIEkaterina Tumanova2015-03-171-0/+1
| | | | | | | | | | | | | | | | | | The Store System Information (STSI) instruction currently collects all information it relays to the caller in the kernel. Some information, however, is only available in user space. An example of this is the guest name: The kernel always sets "KVMGuest", but user space knows the actual guest name. This patch introduces a new exit, KVM_EXIT_S390_STSI, guarded by a capability that can be enabled by user space if it wants to be able to insert such data. User space will be provided with the target buffer and the requested STSI function code. Reviewed-by: Eric Farman <farman@linux.vnet.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: Enable vector support for capable guestEric Farman2015-03-061-1/+3
| | | | | | | | | | | | | We finally have all the pieces in place, so let's include the vector facility bit in the mask of available hardware facilities for the guest to recognize. Also, enable the vector functionality in the guest control blocks, to avoid a possible vector data exception that would otherwise occur when a vector instruction is issued by the guest operating system. Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com> Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: Add new SIGP order to kernel countersEric Farman2015-03-061-0/+1
| | | | | | | | | | | The new SIGP order Store Additional Status at Address is totally handled by user space, but we should still record the occurrence of this order in the kernel code. Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com> Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: Vector exceptionsEric Farman2015-03-061-0/+1
| | | | | | | | | | | A new exception type for vector instructions is introduced with the new processor, but is handled exactly like a Data Exception which is already handled by the system. Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: Allocate and save/restore vector registersEric Farman2015-03-061-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | Define and allocate space for both the host and guest views of the vector registers for a given vcpu. The 32 vector registers occupy 128 bits each (512 bytes total), but architecturally are paired with 512 additional bytes of reserved space for future expansion. The kvm_sync_regs structs containing the registers are union'ed with 1024 bytes of padding in the common kvm_run struct. The addition of 1024 bytes of new register information clearly exceeds the existing union, so an expansion of that padding is required. When changing environments, we need to appropriately save and restore the vector registers viewed by both the host and guest, into and out of the sync_regs space. The floating point registers overlay the upper half of vector registers 0-15, so there's a bit of data duplication here that needs to be carefully avoided. Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com> Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: include guest facilities in kvm facility testMichael Mueller2015-03-041-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | Most facility related decisions in KVM have to take into account: - the facilities offered by the underlying run container (LPAR/VM) - the facilities supported by the KVM code itself - the facilities requested by a guest VM This patch adds the KVM driver requested facilities to the test routine. It additionally renames struct s390_model_fac to kvm_s390_fac and its field names to be more meaningful. The semantics of the facilities stored in the KVM architecture structure is changed. The address arch.model.fac->list now points to the guest facility list and arch.model.fac->mask points to the KVM facility mask. This patch fixes the behaviour of KVM for some facilities for guests that ignore the guest visible facility bits, e.g. guests could use transactional memory intructions on hosts supporting them even if the chosen cpu model would not offer them. The userspace interface is not affected by this change. Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: add cpu model supportMichael Mueller2015-02-091-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | This patch enables cpu model support in kvm/s390 via the vm attribute interface. During KVM initialization, the host properties cpuid, IBC value and the facility list are stored in the architecture specific cpu model structure. During vcpu setup, these properties are taken to initialize the related SIE state. This mechanism allows to adjust the properties from user space and thus to implement different selectable cpu models. This patch uses the IBC functionality to block instructions that have not been implemented at the requested CPU type and GA level compared to the full host capability. Userspace has to initialize the cpu model before vcpu creation. A cpu model change of running vcpus is not possible. Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: use facilities and cpu_id per KVMMichael Mueller2015-02-091-0/+21
| | | | | | | | | | | The patch introduces facilities and cpu_ids per virtual machine. Different virtual machines may want to expose different facilities and cpu ids to the guest, so let's make them per-vm instead of global. Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390/CPACF: Choose crypto control block formatTony Krowiak2015-02-091-0/+2
| | | | | | | | | | | We need to specify a different format for the crypto control block depending on whether the APXA facility is installed or not. Let's test for it by executing the PQAP(QCI) function and use either a format-1 or a format-2 crypto control block accordingly. This is a host only change for z13 and does not affect the guest view. Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* kvm: add halt_poll_ns module parameterPaolo Bonzini2015-02-061-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces a new module parameter for the KVM module; when it is present, KVM attempts a bit of polling on every HLT before scheduling itself out via kvm_vcpu_block. This parameter helps a lot for latency-bound workloads---in particular I tested it with O_DSYNC writes with a battery-backed disk in the host. In this case, writes are fast (because the data doesn't have to go all the way to the platters) but they cannot be merged by either the host or the guest. KVM's performance here is usually around 30% of bare metal, or 50% if you use cache=directsync or cache=writethrough (these parameters avoid that the guest sends pointless flush requests, and at the same time they are not slow because of the battery-backed cache). The bad performance happens because on every halt the host CPU decides to halt itself too. When the interrupt comes, the vCPU thread is then migrated to a new physical CPU, and in general the latency is horrible because the vCPU thread has to be scheduled back in. With this patch performance reaches 60-65% of bare metal and, more important, 99% of what you get if you use idle=poll in the guest. This means that the tunable gets rid of this particular bottleneck, and more work can be done to improve performance in the kernel or QEMU. Of course there is some price to pay; every time an otherwise idle vCPUs is interrupted by an interrupt, it will poll unnecessarily and thus impose a little load on the host. The above results were obtained with a mostly random value of the parameter (500000), and the load was around 1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU. The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll, that can be used to tune the parameter. It counts how many HLT instructions received an interrupt during the polling period; each successful poll avoids that Linux schedules the VCPU thread out and back in, and may also avoid a likely trip to C1 and back for the physical CPU. While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second. Of these halts, almost all are failed polls. During the benchmark, instead, basically all halts end within the polling period, except a more or less constant stream of 50 per second coming from vCPUs that are not running the benchmark. The wasted time is thus very low. Things may be slightly different for Windows VMs, which have a ~10 ms timer tick. The effect is also visible on Marcelo's recently-introduced latency test for the TSC deadline timer. Though of course a non-RT kernel has awful latency bounds, the latency of the timer is around 8000-10000 clock cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC deadline timer, thus, the effect is both a smaller average latency and a smaller variance. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: s390/cpacf: Enable/disable protected key functions for kvm guestTony Krowiak2015-01-231-2/+8
| | | | | | | | | | | | | | | | Created new KVM device attributes for indicating whether the AES and DES/TDES protected key functions are available for programs running on the KVM guest. The attributes are used to set up the controls in the guest SIE block that specify whether programs running on the guest will be given access to the protected key functions available on the s390 hardware. Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Michael Mueller <mimu@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> [split MSA4/protected key into two patches]
* KVM: s390: Provide guest TOD Clock Get/Set ControlsJason J. Herne2015-01-231-0/+1
| | | | | | | | | | | | | | | | Provide controls for setting/getting the guest TOD clock based on the VM attribute interface. Provide TOD and TOD_HIGH vm attributes on s390 for managing guest Time Of Day clock value. TOD_HIGH is presently always set to 0. In the future it will contain a high order expansion of the tod clock value after it overflows the 64-bits of the TOD. Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: forward most SIGP orders to user spaceDavid Hildenbrand2015-01-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Most SIGP orders are handled partially in kernel and partially in user space. In order to: - Get a correct SIGP SET PREFIX handler that informs user space - Avoid race conditions between concurrently executed SIGP orders - Serialize SIGP orders per VCPU We need to handle all "slow" SIGP orders in user space. The remaining ones to be handled completely in kernel are: - SENSE - SENSE RUNNING - EXTERNAL CALL - EMERGENCY SIGNAL - CONDITIONAL EMERGENCY SIGNAL According to the PoP, they have to be fast. They can be executed without conflicting to the actions of other pending/concurrently executing orders (e.g. STOP vs. START). This patch introduces a new capability that will - when enabled - forward all but the mentioned SIGP orders to user space. The instruction counters in the kernel are still updated. Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: clear the pfault queue if user space sets the invalid tokenDavid Hildenbrand2015-01-231-1/+0
| | | | | | | | | | | | | We need a way to clear the async pfault queue from user space (e.g. for resets and SIGP SET ARCHITECTURE). This patch simply clears the queue as soon as user space sets the invalid pfault token. The definition of the invalid token is moved to uapi. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: only one external call may be pending at a timeDavid Hildenbrand2015-01-231-3/+5
| | | | | | | | | | | | | | | | | | | | Only one external call may be pending at a vcpu at a time. For this reason, we have to detect whether the SIGP externcal call interpretation facility is available. If so, all external calls have to be injected using this mechanism. SIGP EXTERNAL CALL orders have to return whether another external call is already pending. This check was missing until now. SIGP SENSE hasn't returned yet in all conditions whether an external call was pending. If a SIGP EXTERNAL CALL irq is to be injected and one is already pending, -EBUSY is returned. Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: handle stop irqs without action_bitsDavid Hildenbrand2015-01-231-5/+0
| | | | | | | | | | | | | | | | | | | | | This patch removes the famous action_bits and moves the handling of SIGP STOP AND STORE STATUS directly into the SIGP STOP interrupt. The new local interrupt infrastructure is used to track pending stop requests. STOP irqs are the only irqs that don't get actively delivered. They remain pending until the stop function is executed (=stop intercept). If another STOP irq is already pending, -EBUSY will now be returned (needed for the SIGP handling code). Migration of pending SIGP STOP (AND STORE STATUS) orders should now be supported out of the box. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: new parameter for SIGP STOP irqsDavid Hildenbrand2015-01-231-0/+2
| | | | | | | | | | | | | | | In order to get rid of the action_flags and to properly migrate pending SIGP STOP irqs triggered e.g. by SIGP STOP AND STORE STATUS, we need to remember whether to store the status when stopping. For this reason, a new parameter (flags) for the SIGP STOP irq is introduced. These flags further define details of the requested STOP and can be easily migrated. Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: handle pending local interrupts via bitmapJens Freimann2014-11-281-2/+0
| | | | | | | | | | | | | | | | | | | This patch adapts handling of local interrupts to be more compliant with the z/Architecture Principles of Operation and introduces a data structure which allows more efficient handling of interrupts. * get rid of li->active flag, use bitmap instead * Keep interrupts in a bitmap instead of a list * Deliver interrupts in the order of their priority as defined in the PoP * Use a second bitmap for sigp emergency requests, as a CPU can have one request pending from every other CPU in the system. Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: add bitmap for handling cpu-local interruptsJens Freimann2014-11-281-0/+86
| | | | | | | | | | | | Adds a bitmap to the vcpu structure which is used to keep track of local pending interrupts. Also add enum with all interrupt types sorted in order of priority (highest to lowest) Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: Fix rewinding of the PSW pointing to an EXECUTE instructionThomas Huth2014-11-281-1/+1
| | | | | | | | | | | | | | | | | A couple of our interception handlers rewind the PSW to the beginning of the instruction to run the intercepted instruction again during the next SIE entry. This normally works fine, but there is also the possibility that the instruction did not get run directly but via an EXECUTE instruction. In this case, the PSW does not point to the instruction that caused the interception, but to the EXECUTE instruction! So we've got to rewind the PSW to the beginning of the EXECUTE instruction instead. This is now accomplished with a new helper function kvm_s390_rewind_psw(). Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: sigp: instruction counters for all sigp ordersDavid Hildenbrand2014-10-281-0/+7
| | | | | | | | | This patch introduces instruction counters for all known sigp orders and also a separate one for unknown orders that are passed to user space. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: Make the simple ipte mutex specific to a VM instead of globalThomas Huth2014-10-281-0/+2
| | | | | | | | | | | | The ipte-locking should be done for each VM seperately, not globally. This way we avoid possible congestions when the simple ipte-lock is used and multiple VMs are running. Suggested-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
OpenPOWER on IntegriCloud