talos-obmc-linux/include/linux/mempolicy.h, branch dev-4.13-fsi

mm, mempolicy: simplify rebinding mempolicies when updating cpusets

2017-07-06T23:24:34+00:00

Commit c0ff7453bb5c ("cpuset,mm: fix no node to alloc memory when changing cpuset's mems") has introduced a two-step protocol when rebinding task's mempolicy due to cpuset update, in order to avoid a parallel allocation seeing an empty effective nodemask and failing. Later, commit cc9a6c877661 ("cpuset: mm: reduce large amounts of memory barrier related damage v3") introduced a seqlock protection and removed the synchronization point between the two update steps. At that point (or perhaps later), the two-step rebinding became unnecessary. Currently it only makes sure that the update first adds new nodes in step 1 and then removes nodes in step 2. Without memory barriers the effects are questionable, and even then this cannot prevent a parallel zonelist iteration checking the nodemask at each step to observe all nodes as unusable for allocation. We now fully rely on the seqlock to prevent premature OOMs and allocation failures. We can thus remove the two-step update parts and simplify the code. Link: http://lkml.kernel.org/r/20170517081140.30654-5-vbabka@suse.cz Signed-off-by: Vlastimil Babka Acked-by: Michal Hocko Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Anshuman Khandual Cc: Christoph Lameter Cc: David Rientjes Cc: Dimitri Sivanich Cc: Hugh Dickins Cc: Li Zefan Cc: Mel Gorman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mm, page_alloc: pass preferred nid instead of zonelist to allocator

2017-07-06T23:24:34+00:00

The main allocator function __alloc_pages_nodemask() takes a zonelist pointer as one of its parameters. All of its callers directly or indirectly obtain the zonelist via node_zonelist() using a preferred node id and gfp_mask. We can make the code a bit simpler by doing the zonelist lookup in __alloc_pages_nodemask(), passing it a preferred node id instead (gfp_mask is already another parameter). There are some code size benefits thanks to removal of inlined node_zonelist(): bloat-o-meter add/remove: 2/2 grow/shrink: 4/36 up/down: 399/-1351 (-952) This will also make things simpler if we proceed with converting cpusets to zonelists. Link: http://lkml.kernel.org/r/20170517081140.30654-4-vbabka@suse.cz Signed-off-by: Vlastimil Babka Reviewed-by: Christoph Lameter Acked-by: Michal Hocko Cc: Dimitri Sivanich Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Anshuman Khandual Cc: David Rientjes Cc: Hugh Dickins Cc: Li Zefan Cc: Mel Gorman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mm: disable numa migration faults for dax vmas

2016-12-13T02:55:07+00:00

Mark dax vmas as not migratable to exclude them from task_numa_work(). This is especially relevant for device-dax which wants to ensure predictable access latency and not incur periodic faults. [akpm@linux-foundation.org: add comment] Link: http://lkml.kernel.org/r/147892450132.22062.16875659431109209179.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams Reported-by: Aneesh Kumar K.V Cc: Michal Hocko Cc: Vlastimil Babka Cc: "Kirill A. Shutemov" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mm, mempolicy: task->mempolicy must be NULL before dropping final reference

2016-09-02T00:52:01+00:00

KASAN allocates memory from the page allocator as part of kmem_cache_free(), and that can reference current->mempolicy through any number of allocation functions. It needs to be NULL'd out before the final reference is dropped to prevent a use-after-free bug: BUG: KASAN: use-after-free in alloc_pages_current+0x363/0x370 at addr ffff88010b48102c CPU: 0 PID: 15425 Comm: trinity-c2 Not tainted 4.8.0-rc2+ #140 ... Call Trace: dump_stack kasan_object_err kasan_report_error __asan_report_load2_noabort alloc_pages_current <-- use after free depot_save_stack save_stack kasan_slab_free kmem_cache_free __mpol_put <-- free do_exit This patch sets current->mempolicy to NULL before dropping the final reference. Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1608301442180.63329@chino.kir.corp.google.com Fixes: cd11016e5f52 ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB") Signed-off-by: David Rientjes Reported-by: Vegard Nossum Acked-by: Andrey Ryabinin Cc: Alexander Potapenko Cc: Dmitry Vyukov Cc: [4.6+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

tmpfs: preliminary minor tidyups

2016-05-20T02:12:14+00:00

Make a few cleanups in mm/shmem.c, before going on to complicate it. shmem_alloc_page() will become more complicated: we can't afford to to have that complication duplicated between a CONFIG_NUMA version and a !CONFIG_NUMA version, so rearrange the #ifdef'ery there to yield a single shmem_swapin() and a single shmem_alloc_page(). Yes, it's a shame to inflict the horrid pseudo-vma on non-NUMA configurations, but eliminating it is a larger cleanup: I have an alloc_pages_mpol() patchset not yet ready - mpol handling is subtle and bug-prone, and changed yet again since my last version. Move __SetPageLocked, __SetPageSwapBacked from shmem_getpage_gfp() to shmem_alloc_page(): that SwapBacked flag will be useful in future, to help to distinguish different cases appropriately. And the SGP_DIRTY variant of SGP_CACHE is hard to understand and of little use (IIRC it dates back to when shmem_getpage() returned the page unlocked): kill it and do the necessary in shmem_file_read_iter(). But an arm64 build then complained that info may be uninitialized (where shmem_getpage_gfp() deletes a freshly alloced page beyond eof), and advancing to an "sgp <= SGP_CACHE" test jogged it back to reality. Signed-off-by: Hugh Dickins Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Andres Lagar-Cavilla Cc: Yang Shi Cc: Ning Qu Cc: Mel Gorman Cc: Konstantin Khlebnikov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mm/mempolicy.c: vma_migratable() can return bool

2016-05-20T02:12:14+00:00

Make vma_migratable() return bool due to this particular function only using either one or zero as its return value. Signed-off-by: Yaowei Bai Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mm/mempolicy.c: convert the shared_policy lock to a rwlock

2016-01-15T00:00:49+00:00

When running the SPECint_rate gcc on some very large boxes it was noticed that the system was spending lots of time in mpol_shared_policy_lookup(). The gamess benchmark can also show it and is what I mostly used to chase down the issue since the setup for that I found to be easier. To be clear the binaries were on tmpfs because of disk I/O requirements. We then used text replication to avoid icache misses and having all the copies banging on the memory where the instruction code resides. This results in us hitting a bottleneck in mpol_shared_policy_lookup() since lookup is serialised by the shared_policy lock. I have only reproduced this on very large (3k+ cores) boxes. The problem starts showing up at just a few hundred ranks getting worse until it threatens to livelock once it gets large enough. For example on the gamess benchmark at 128 ranks this area consumes only ~1% of time, at 512 ranks it consumes nearly 13%, and at 2k ranks it is over 90%. To alleviate the contention in this area I converted the spinlock to an rwlock. This allows a large number of lookups to happen simultaneously. The results were quite good reducing this consumtion at max ranks to around 2%. [akpm@linux-foundation.org: tidy up code comments] Signed-off-by: Nathan Zimmer Acked-by: David Rientjes Acked-by: Vlastimil Babka Cc: Nadia Yvette Chambers Cc: Naoya Horiguchi Cc: Mel Gorman Cc: "Aneesh Kumar K.V" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mempolicy: unexport get_vma_policy() and remove its "task" arg

2014-10-10T02:25:56+00:00

- get_vma_policy(task) is not safe if task != current, remove this argument. - get_vma_policy() no longer has callers outside of mempolicy.c, make it static. Signed-off-by: Oleg Nesterov Cc: KAMEZAWA Hiroyuki Cc: David Rientjes Cc: KOSAKI Motohiro Cc: Alexander Viro Cc: Cyrill Gorcunov Cc: "Eric W. Biederman" Cc: "Kirill A. Shutemov" Cc: Peter Zijlstra Cc: Hugh Dickins Cc: Andi Kleen Cc: Naoya Horiguchi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mempolicy: introduce __get_vma_policy(), export get_task_policy()

2014-10-10T02:25:56+00:00

Extract the code which looks for vma's policy from get_vma_policy() into the new helper, __get_vma_policy(). Export get_task_policy(). Signed-off-by: Oleg Nesterov Cc: KAMEZAWA Hiroyuki Cc: David Rientjes Cc: KOSAKI Motohiro Cc: Alexander Viro Cc: Cyrill Gorcunov Cc: "Eric W. Biederman" Cc: "Kirill A. Shutemov" Cc: Peter Zijlstra Cc: Hugh Dickins Cc: Andi Kleen Cc: Naoya Horiguchi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mempolicy: remove the "task" arg of vma_policy_mof() and simplify it

2014-10-10T02:25:56+00:00

1. vma_policy_mof(task) is simply not safe unless task == current, it can race with do_exit()->mpol_put(). Remove this arg and update its single caller. 2. vma can not be NULL, remove this check and simplify the code. Signed-off-by: Oleg Nesterov Cc: KAMEZAWA Hiroyuki Cc: David Rientjes Cc: KOSAKI Motohiro Cc: Alexander Viro Cc: Cyrill Gorcunov Cc: "Eric W. Biederman" Cc: "Kirill A. Shutemov" Cc: Peter Zijlstra Cc: Hugh Dickins Cc: Andi Kleen Cc: Naoya Horiguchi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds