hugetlb: introduce nr_overcommit_hugepages sysctl

While examining the code to support /proc/sys/vm/hugetlb_dynamic_pool, I
became convinced that having a boolean sysctl was insufficient:

1) To support per-node control of hugepages, I have previously submitted
patches to add a sysfs attribute related to nr_hugepages. However, with
a boolean global value and per-mount quota enforcement constraining the
dynamic pool, adding corresponding control of the dynamic pool on a
per-node basis seems inconsistent to me.

2) Administration of the hugetlb dynamic pool with multiple hugetlbfs
mount points is, arguably, more arduous than it needs to be. Each quota
would need to be set separately, and the sum would need to be monitored.

To ease the administration, and to help pave the way for per-node
control of the static & dynamic hugepage pool, I added a separate
sysctl, nr_overcommit_hugepages. This value serves as a high watermark
for the overall hugepage pool, while nr_hugepages serves as a low
watermark. The boolean sysctl can then be removed, as the condition

	nr_overcommit_hugepages > 0

indicates the same administrative setting as

	hugetlb_dynamic_pool == 1

Quotas still serve as local enforcement of the size of the pool on a
per-mount basis.

A few caveats:

1) There is a race whereby the global surplus huge page counter is
incremented before a hugepage has been allocated. Another process could
then try to grow the pool, fail to convert a surplus huge page to a
normal huge page, and instead allocate a fresh huge page. I believe this
is benign, as no memory is leaked (the actual pages are still tracked
correctly) and the counters won't go out of sync.

2) Shrinking the static pool while a surplus is in effect will allow the
number of surplus huge pages to exceed the overcommit value. As long as
this condition holds, however, no more surplus huge pages will be
allowed on the system until one of the two sysctls is increased
sufficiently, or the surplus huge pages go out of use and are freed.

Successfully tested on x86_64 with the current libhugetlbfs snapshot,
modified to use the new sysctl.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Acked-by: Adam Litke <agl@us.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
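
The accounting described in the commit message can be illustrated with a small userspace model. This is only a sketch, not the kernel code: the struct and the helpers dynamic_pool_enabled(), try_add_surplus() and convert_busy_to_surplus() are hypothetical names invented for illustration, and the comparison of the surplus counter against the overcommit value is an assumption read from caveat 2 above.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace model of the counters discussed above.
 * None of these names are taken from the kernel sources.
 */
struct pool_model {
	unsigned long surplus_huge_pages;       /* pages beyond the static pool */
	unsigned long nr_overcommit_huge_pages; /* administrator-set limit */
};

/* The old boolean is implied: a non-zero limit means the pool may grow. */
static bool dynamic_pool_enabled(const struct pool_model *p)
{
	return p->nr_overcommit_huge_pages > 0;
}

/*
 * Try to account for one more surplus page. Assumed rule: refuse once
 * the surplus count has reached (or already exceeds) the limit.
 */
static bool try_add_surplus(struct pool_model *p)
{
	if (p->surplus_huge_pages >= p->nr_overcommit_huge_pages)
		return false;
	p->surplus_huge_pages++;
	return true;
}

/*
 * Model of caveat 2: when the static pool shrinks, pages still in use
 * cannot be freed and are counted as surplus instead, which may push
 * the surplus count above the limit.
 */
static void convert_busy_to_surplus(struct pool_model *p, unsigned long busy)
{
	p->surplus_huge_pages += busy;
}
```

In this model, with a limit of 2, two calls to try_add_surplus() succeed and a third fails. If convert_busy_to_surplus() then pushes the surplus count past the limit, further growth keeps failing until the limit is raised above the count or the count drops again, which mirrors the behaviour caveat 2 describes.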
author: Nishanth Aravamudan <nacc@us.ibm.com> 2007-12-17 16:20:12 -0800
committer: Linus Torvalds <torvalds@woody.linux-foundation.org> 2007-12-17 19:28:17 -0800
commit: d1c3fb1f8f29c41b0d098d7cfb3c32939043631f (patch)
tree: b91983662da7ec4c28ac0788e835c2d51eea20e1 /include/linux/hugetlb.h
parent: 7a3f595cc8298df14a7c71b0d876bafd8e9e1cbf (diff)
1 files changed, 1 insertions, 0 deletions
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 24968790bc3e..f7bc869a29b8 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -34,6 +34,7 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed);
 extern unsigned long max_huge_pages;
 extern unsigned long hugepages_treat_as_movable;
 extern int hugetlb_dynamic_pool;
+extern unsigned long nr_overcommit_huge_pages;
 extern const unsigned long hugetlb_zero, hugetlb_infinity;
 extern int sysctl_hugetlb_shm_group;