summaryrefslogtreecommitdiffstats
path: root/arch/x86/kvm/paging_tmpl.h
Commit message (Collapse)AuthorAgeFilesLines
* KVM: fix typo in copyright noticeNicolas Kaiser2010-10-241-1/+1
| | | | | | | Fix typo in copyright notice. Signed-off-by: Nicolas Kaiser <nikai@nikai.net> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: MMU: move access code parsing to FNAME(walk_addr) functionXiao Guangrong2010-10-241-24/+16
| | | | | | | Move access code parsing from caller site to FNAME(walk_addr) function Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: cleanup for error mask set while walk guest page tableXiao Guangrong2010-10-241-10/+7
| | | | | | | Small cleanup for set page fault error code Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: Track NX state in struct kvm_mmuJoerg Roedel2010-10-241-2/+2
| | | | | | | | | | | With Nested Paging emulation the NX state between the two MMU contexts may differ. To make sure that always the right fault error code is recorded this patch moves the NX state into struct kvm_mmu so that the code can distinguish between L1 and L2 NX state. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: Introduce kvm_pdptr_read_mmuJoerg Roedel2010-10-241-1/+1
| | | | | | | | | | This function is implemented to load the pdptr pointers of the currently running guest (l1 or l2 guest). Therefore it takes care about the current paging mode and can read pdptrs out of l2 guest physical memory. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: Make walk_addr_generic capable for two-level walkingJoerg Roedel2010-10-241-7/+23
| | | | | | | | | This patch uses kvm_read_guest_page_tdp to make the walk_addr_generic functions suitable for two-level page table walking. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: Implement nested gva_to_gpa functionsJoerg Roedel2010-10-241-0/+31
| | | | | | | | This patch adds the functions to do a nested l2_gva to l1_gpa page table walk. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: Introduce generic walk_addr functionJoerg Roedel2010-10-241-8/+18
| | | | | | | | | | | This is the first patch in the series towards a generic walk_addr implementation which could walk two-dimensional page tables in the end. In this first step the walk_addr function is renamed into walk_addr_generic which takes a mmu context as an additional parameter. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: Track page fault data in struct vcpuJoerg Roedel2010-10-241-1/+5
| | | | | | | | | | | | | | | | | | This patch introduces a struct with two new fields in vcpu_arch for x86: * fault.address * fault.error_code This will be used to correctly propagate page faults back into the guest when we could have either an ordinary page fault or a nested page fault. In the case of a nested page fault the fault-address is different from the original address that should be walked. So we need to keep track about the real fault-address. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: Let is_rsvd_bits_set take mmu context instead of vcpuJoerg Roedel2010-10-241-3/+4
| | | | | | | | This patch changes is_rsvd_bits_set() function prototype to take only a kvm_mmu context instead of a full vcpu. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: Introduce get_cr3 function pointerJoerg Roedel2010-10-241-2/+2
| | | | | | | | This function pointer in the MMU context is required to implement Nested Nested Paging. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: Check for root_level instead of long modeJoerg Roedel2010-10-241-2/+2
| | | | | | | | | | The walk_addr function checks for !is_long_mode in its 64 bit version. But what is meant here is a check for pae paging. Change the condition to really check for pae paging so that it also works with nested nested paging. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: support disable/enable mmu audit dynamiclyXiao Guangrong2010-10-241-2/+2
| | | | | | | | | | | | | | | | Add a r/w module parameter named 'mmu_audit', it can control audit enable/disable: enable: echo 1 > /sys/module/kvm/parameters/mmu_audit disable: echo 0 > /sys/module/kvm/parameters/mmu_audit This patch not change the logic Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: fix wrong not write protected sp reportXiao Guangrong2010-10-241-1/+2
| | | | | | | | | | | | The audit code reports some sp not write protected in current code, it's just the bug in audit_write_protection(), since: - the invalid sp not need write protected - using uninitialize local variable('gfn') - call kvm_mmu_audit() out of mmu_lock's protection Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: combine guest pte read between fetch and pte prefetchXiao Guangrong2010-10-241-19/+21
| | | | | | | Combine guest pte read between guest pte check in the fetch path and pte prefetch Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: MMU: prefetch ptes when intercepted guest #PFXiao Guangrong2010-10-241-0/+72
| | | | | | | | | | | Support prefetch ptes when intercept guest #PF, avoid to #PF by later access If we meet any failure in the prefetch path, we will exit it and not try other ptes to avoid become heavy path Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: MMU: add missing reserved bits check in speculative pathXiao Guangrong2010-08-021-2/+3
| | | | | | | | | In the speculative path, we should check guest pte's reserved bits just as the real processor does Reported-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: Eliminate redundant temporaries in FNAME(fetch)Avi Kivity2010-08-021-35/+24
| | | | | | | | 'level' and 'sptep' are aliases for 'interator.level' and 'iterator.sptep', no need for them. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: MMU: Validate all gptes during fetch, not just those used for new pagesAvi Kivity2010-08-021-9/+24
| | | | | | | | | | | | | | | | | | | | | | Currently, when we fetch an spte, we only verify that gptes match those that the walker saw if we build new shadow pages for them. However, this misses the following race: vcpu1 vcpu2 walk change gpte walk instantiate sp fetch existing sp Fix by validating every gpte, regardless of whether it is used for building a new sp or not. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: MMU: Simplify spte fetch() functionAvi Kivity2010-08-021-44/+49
| | | | | | | | | | | | | | | | | | | | Partition the function into three sections: - fetching indirect shadow pages (host_level > guest_level) - fetching direct shadow pages (page_level < host_level <= guest_level) - the final spte (page_level == host_level) Instead of the current spaghetti. A slight change from the original code is that we call validate_direct_spte() more often: previously we called it only for gw->level, now we also call it for lower levels. The change should have no effect. [xiao: fix regression caused by validate_direct_spte() called too late] Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: MMU: Add gpte_valid() helperAvi Kivity2010-08-021-8/+17
| | | | | | | | Move the code to check whether a gpte has changed since we fetched it into a helper. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: MMU: Add validate_direct_spte() helperAvi Kivity2010-08-021-21/+6
| | | | | | | | | Add a helper to verify that a direct shadow page is valid wrt the required access permissions; drop the page if it is not valid. Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: MMU: Add drop_large_spte() helperAvi Kivity2010-08-021-4/+1
| | | | | | | To clarify spte fetching code, move large spte handling into a helper. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: MMU: Add link_shadow_page() helperAvi Kivity2010-08-021-5/+2
| | | | | | | | | To simplify the process of fetching an spte, add a helper that links a shadow page to an spte. Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: MMU: Keep going on permission errorAvi Kivity2010-08-021-22/+30
| | | | | | | | Real hardware disregards permission errors when computing page fault error code bit 0 (page present). Do the same. Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: Only indicate a fetch fault in page fault error code if nx is enabledAvi Kivity2010-08-021-1/+1
| | | | | | | Bit 4 of the page fault error code is set only if EFER.NX is set. Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: Introduce drop_spte()Avi Kivity2010-08-021-7/+6
| | | | | | | | | | | When we call rmap_remove(), we (almost) always immediately follow it by an __set_spte() to a nonpresent pte. Since we need to perform the two operations atomically, to avoid losing the dirty and accessed bits, introduce a helper drop_spte() and convert all call sites. The operation is still nonatomic at this point. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: cleanup FNAME(fetch)() functionsXiao Guangrong2010-08-011-11/+8
| | | | | | | Cleanup this function that we are already get the direct sp's access Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: MMU: fix direct sp's access corruptedXiao Guangrong2010-08-011-2/+26
| | | | | | | | | | | | | | If the mapping is writable but the dirty flag is not set, we will find the read-only direct sp and setup the mapping, then if the write #PF occur, we will mark this mapping writable in the read-only direct sp, now, other real read-only mapping will happily write it without #PF. It may hurt guest's COW Fixed by re-install the mapping when write #PF occur. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: MMU: fix conflict access permissions in direct spXiao Guangrong2010-08-011-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In no-direct mapping, we mark sp is 'direct' when we mapping the guest's larger page, but its access is encoded form upper page-struct entire not include the last mapping, it will cause access conflict. For example, have this mapping: [W] / PDE1 -> |---| P[W] | | LPA \ PDE2 -> |---| [R] P have two children, PDE1 and PDE2, both PDE1 and PDE2 mapping the same lage page(LPA). The P's access is WR, PDE1's access is WR, PDE2's access is RO(just consider read-write permissions here) When guest access PDE1, we will create a direct sp for LPA, the sp's access is from P, is W, then we will mark the ptes is W in this sp. Then, guest access PDE2, we will find LPA's shadow page, is the same as PDE's, and mark the ptes is RO. So, if guest access PDE1, the incorrect #PF is occured. Fixed by encode the last mapping access into direct shadow page Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Remove memory alias supportAvi Kivity2010-08-011-2/+1
| | | | | | | As advertised in feature-removal-schedule.txt. Equivalent support is provided by overlapping memory regions. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: don't mark pte notrap if it's just sync transientXiao Guangrong2010-08-011-2/+3
| | | | | | | If the sync-sp just sync transient, don't mark its pte notrap Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: MMU: cleanup for dirty page judgmentXiao Guangrong2010-08-011-2/+2
| | | | | | | Using wrap function to cleanup page dirty judgment Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: MMU: rename 'page' and 'shadow_page' to 'sp'Xiao Guangrong2010-08-011-11/+11
| | | | | | | Rename 'page' and 'shadow_page' to 'sp' to better fit the context Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Fix unused but set warningsAndi Kleen2010-08-011-0/+1
| | | | | | | No real bugs in this one. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: calculate correct gfn for small host pages backing large guest pagesLai Jiangshan2010-08-011-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In Documentation/kvm/mmu.txt: gfn: Either the guest page table containing the translations shadowed by this page, or the base page frame for linear translations. See role.direct. But in function FNAME(fetch)(), sp->gfn is incorrect when one of following situations occurred: 1) guest is 32bit paging and the guest PDE maps a 4-MByte page (backed by 4k host pages), FNAME(fetch)() miss handling the quadrant. And if guest use pse-36, "table_gfn = gpte_to_gfn(gw->ptes[level - delta]);" is incorrect. 2) guest is long mode paging and the guest PDPTE maps a 1-GByte page (backed by 4k or 2M host pages). So we fix it to suit to the document and suit to the code which requires sp->gfn correct when sp->role.direct=1. We use the goal mapping gfn(gw->gfn) to calculate the base page frame for linear translations, it is simple and easy to be understood. Reported-by: Marcelo Tosatti <mtosatti@redhat.com> Reported-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: Don't allocate gfns page for direct mmu pagesLai Jiangshan2010-08-011-0/+3
| | | | | | | | | | | | | | | | | | | When sp->role.direct is set, sp->gfns does not contain any essential information, leaf sptes reachable from this sp are for a continuous guest physical memory range (a linear range). So sp->gfns[i] (if it was set) equals to sp->gfn + i. (PT_PAGE_TABLE_LEVEL) Obviously, it is not essential information, we can calculate it when need. It means we don't need sp->gfns when sp->role.direct=1, Thus we can save one page usage for every kvm_mmu_page. Note: Access to sp->gfns must be wrapped by kvm_mmu_page_get_gfn() or kvm_mmu_page_set_gfn(). It is only exposed in FNAME(sync_page). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Update Red Hat copyrightsAvi Kivity2010-08-011-0/+1
| | | | Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: only update unsync page in invlpg pathXiao Guangrong2010-08-011-2/+6
| | | | | | | | Only unsync pages need updated at invlpg time since other shadow pages are write-protected Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: unalias gfn before sp->gfns[] comparison in sync_pageXiao Guangrong2010-08-011-3/+4
| | | | | | | | sp->gfns[] contain unaliased gfns, but gpte might contain pointer to aliased region. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: MMU: Fix debug output error in walk_addr()Gui Jianfeng2010-08-011-1/+1
| | | | | | | Fix a debug output error in walk_addr Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: mark page table dirty when a pte is actually modifiedGui Jianfeng2010-08-011-2/+2
| | | | | | | | Sometime cmpxchg_gpte doesn't modify gpte, in such case, don't mark page table page as dirty. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Avoid killing userspace through guest SRAO MCE on unmapped pagesHuang Ying2010-08-011-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | In common cases, guest SRAO MCE will cause corresponding poisoned page be un-mapped and SIGBUS be sent to QEMU-KVM, then QEMU-KVM will relay the MCE to guest OS. But it is reported that if the poisoned page is accessed in guest after unmapping and before MCE is relayed to guest OS, userspace will be killed. The reason is as follows. Because poisoned page has been un-mapped, guest access will cause guest exit and kvm_mmu_page_fault will be called. kvm_mmu_page_fault can not get the poisoned page for fault address, so kernel and user space MMIO processing is tried in turn. In user MMIO processing, poisoned page is accessed again, then userspace is killed by force_sig_info. To fix the bug, kvm_mmu_page_fault send HWPOISON signal to QEMU-KVM and do not try kernel and user space MMIO processing for poisoned page. [xiao: fix warning introduced by avi] Reported-by: Max Asbock <masbock@linux.vnet.ibm.com> Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: fix conflict access permissions in direct spXiao Guangrong2010-07-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In no-direct mapping, we mark sp is 'direct' when we mapping the guest's larger page, but its access is encoded form upper page-struct entire not include the last mapping, it will cause access conflict. For example, have this mapping: [W] / PDE1 -> |---| P[W] | | LPA \ PDE2 -> |---| [R] P have two children, PDE1 and PDE2, both PDE1 and PDE2 mapping the same lage page(LPA). The P's access is WR, PDE1's access is WR, PDE2's access is RO(just consider read-write permissions here) When guest access PDE1, we will create a direct sp for LPA, the sp's access is from P, is W, then we will mark the ptes is W in this sp. Then, guest access PDE2, we will find LPA's shadow page, is the same as PDE's, and mark the ptes is RO. So, if guest access PDE1, the incorrect #PF is occured. Fixed by encode the last mapping access into direct shadow page Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: cleanup invlpg codeXiao Guangrong2010-05-191-3/+1
| | | | | | | Using is_last_spte() to cleanup invlpg code Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: MMU: fix for calculating gpa in invlpg codeXiao Guangrong2010-05-191-1/+6
| | | | | | | | If the guest is 32-bit, we should use 'quadrant' to adjust gpa offset Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: MMU: Make use of is_large_pte() in walkerGui Jianfeng2010-05-171-2/+2
| | | | | | | | Make use of is_large_pte() instead of checking PT_PAGE_SIZE_MASK bit directly. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: Move sync_page() first pte address calculation out of loopGui Jianfeng2010-05-171-2/+4
| | | | | | | Move first pte address calculation out of loop to save some cycles. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: remove unnecessary NX check in walk_addrXiao Guangrong2010-05-171-1/+1
| | | | | | | After is_rsvd_bits_set() checks, EFER.NXE must be enabled if NX bit is seted Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: MMU: Reinstate pte prefetch on invlpgAvi Kivity2010-05-171-0/+15
| | | | | | | | | | | | | | | | | | | | Commit fb341f57 removed the pte prefetch on guest invlpg, citing guest races. However, the SDM is adamant that prefetch is allowed: "The processor may create entries in paging-structure caches for translations required for prefetches and for accesses that are a result of speculative execution that would never actually occur in the executed code path." And, in fact, there was a race in the prefetch code: we picked up the pte without the mmu lock held, so an older invlpg could install the pte over a newer invlpg. Reinstate the prefetch logic, but this time note whether another invlpg has executed using a counter. If a race occured, do not install the pte. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
OpenPOWER on IntegriCloud