diff options
author | Kirill A. Shutemov <kirill.shutemov@linux.intel.com> | 2017-07-17 01:59:54 +0300 |
---|---|---|
committer | Ingo Molnar <mingo@kernel.org> | 2017-07-21 10:05:19 +0200 |
commit | 77ef56e4f0fbb350d93289aa025c7d605af012d4 (patch) | |
tree | 74562d80774df19f6be3a40304b0e77d65d6f9b4 /Documentation/x86/x86_64 | |
parent | ee00f4a32a76ef631394f31d5b6028d50462b357 (diff) | |
download | talos-op-linux-77ef56e4f0fbb350d93289aa025c7d605af012d4.tar.gz talos-op-linux-77ef56e4f0fbb350d93289aa025c7d605af012d4.zip |
x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y
Most of things are in place and we can enable support for 5-level paging.
The patch makes XEN_PV and XEN_PVH dependent on !X86_5LEVEL. Both are
not ready to work with 5-level paging.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170716225954.74185-9-kirill.shutemov@linux.intel.com
[ Minor readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Diffstat (limited to 'Documentation/x86/x86_64')
-rw-r--r-- | Documentation/x86/x86_64/5level-paging.txt | 64 |
1 files changed, 64 insertions, 0 deletions
diff --git a/Documentation/x86/x86_64/5level-paging.txt b/Documentation/x86/x86_64/5level-paging.txt new file mode 100644 index 000000000000..087251a0d99c --- /dev/null +++ b/Documentation/x86/x86_64/5level-paging.txt @@ -0,0 +1,64 @@ +== Overview == + +Original x86-64 was limited by 4-level paing to 256 TiB of virtual address +space and 64 TiB of physical address space. We are already bumping into +this limit: some vendors offers servers with 64 TiB of memory today. + +To overcome the limitation upcoming hardware will introduce support for +5-level paging. It is a straight-forward extension of the current page +table structure adding one more layer of translation. + +It bumps the limits to 128 PiB of virtual address space and 4 PiB of +physical address space. This "ought to be enough for anybody" ©. + +QEMU 2.9 and later support 5-level paging. + +Virtual memory layout for 5-level paging is described in +Documentation/x86/x86_64/mm.txt + +== Enabling 5-level paging == + +CONFIG_X86_5LEVEL=y enables the feature. + +So far, a kernel compiled with the option enabled will be able to boot +only on machines that supports the feature -- see for 'la57' flag in +/proc/cpuinfo. + +The plan is to implement boot-time switching between 4- and 5-level paging +in the future. + +== User-space and large virtual address space == + +On x86, 5-level paging enables 56-bit userspace virtual address space. +Not all user space is ready to handle wide addresses. It's known that +at least some JIT compilers use higher bits in pointers to encode their +information. It collides with valid pointers with 5-level paging and +leads to crashes. + +To mitigate this, we are not going to allocate virtual address space +above 47-bit by default. + +But userspace can ask for allocation from full address space by +specifying hint address (with or without MAP_FIXED) above 47-bits. + +If hint address set above 47-bit, but MAP_FIXED is not specified, we try +to look for unmapped area by specified address. If it's already +occupied, we look for unmapped area in *full* address space, rather than +from 47-bit window. + +A high hint address would only affect the allocation in question, but not +any future mmap()s. + +Specifying high hint address on older kernel or on machine without 5-level +paging support is safe. The hint will be ignored and kernel will fall back +to allocation from 47-bit address space. + +This approach helps to easily make application's memory allocator aware +about large address space without manually tracking allocated virtual +address space. + +One important case we need to handle here is interaction with MPX. +MPX (without MAWA extension) cannot handle addresses above 47-bit, so we +need to make sure that MPX cannot be enabled we already have VMA above +the boundary and forbid creating such VMAs once MPX is enabled. + |