summaryrefslogtreecommitdiffstats
path: root/kernel
diff options
context:
space:
mode:
authorMel Gorman <mel@csn.ul.ie>2008-07-23 21:27:23 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2008-07-24 10:47:16 -0700
commita1e78772d72b2616ed20e54896e68e0e7044854e (patch)
treed752dd96c2a4fcc555779a7aa99f95069c9b95ae /kernel
parentfc1b8a73dd71226902a11928dd5500326e101df9 (diff)
downloadtalos-op-linux-a1e78772d72b2616ed20e54896e68e0e7044854e.tar.gz
talos-op-linux-a1e78772d72b2616ed20e54896e68e0e7044854e.zip
hugetlb: reserve huge pages for reliable MAP_PRIVATE hugetlbfs mappings until fork()
This patch reserves huge pages at mmap() time for MAP_PRIVATE mappings in a similar manner to the reservations taken for MAP_SHARED mappings. The reserve count is accounted both globally and on a per-VMA basis for private mappings. This guarantees that a process that successfully calls mmap() will successfully fault all pages in the future unless fork() is called. The characteristics of private mappings of hugetlbfs files behaviour after this patch are; 1. The process calling mmap() is guaranteed to succeed all future faults until it forks(). 2. On fork(), the parent may die due to SIGKILL on writes to the private mapping if enough pages are not available for the COW. For reasonably reliable behaviour in the face of a small huge page pool, children of hugepage-aware processes should not reference the mappings; such as might occur when fork()ing to exec(). 3. On fork(), the child VMAs inherit no reserves. Reads on pages already faulted by the parent will succeed. Successful writes will depend on enough huge pages being free in the pool. 4. Quotas of the hugetlbfs mount are checked at reserve time for the mapper and at fault time otherwise. Before this patch, all reads or writes in the child potentially needs page allocations that can later lead to the death of the parent. This applies to reads and writes of uninstantiated pages as well as COW. After the patch it is only a write to an instantiated page that causes problems. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'kernel')
-rw-r--r--kernel/fork.c9
1 files changed, 9 insertions, 0 deletions
diff --git a/kernel/fork.c b/kernel/fork.c
index adefc1131f27..552c8d8e77ad 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -33,6 +33,7 @@
#include <linux/cpu.h>
#include <linux/cgroup.h>
#include <linux/security.h>
+#include <linux/hugetlb.h>
#include <linux/swap.h>
#include <linux/syscalls.h>
#include <linux/jiffies.h>
@@ -307,6 +308,14 @@ static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
}
/*
+ * Clear hugetlb-related page reserves for children. This only
+ * affects MAP_PRIVATE mappings. Faults generated by the child
+ * are not guaranteed to succeed, even if read-only
+ */
+ if (is_vm_hugetlb_page(tmp))
+ reset_vma_resv_huge_pages(tmp);
+
+ /*
* Link in the new vma and copy the page table entries.
*/
*pprev = tmp;
OpenPOWER on IntegriCloud