| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Because of false positives, the memory scanning thread may print too
much information. This patch changes the scanning thread to only print
the number of newly suspected leaks. Further information can be read
from the /sys/kernel/debug/kmemleak file.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This is to reduce the number of false positives reported.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
| | |/
| |/|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
(feature suggested by Sergey Senozhatsky)
Kmemleak needs to track all the memory allocations but some of these
happen before kmemleak is initialised. These are stored in an internal
buffer which may be exceeded in some kernel configurations. This patch
adds a configuration option with a default value of 400 and also removes
the stack dump when the early log buffer is exceeded.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@mail.by>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Nathan reported that
| commit 73d60b7f747176dbdff826c4127d22e1fd3f9f74
| Author: Yinghai Lu <yinghai@kernel.org>
| Date: Tue Jun 16 15:33:00 2009 -0700
|
| page-allocator: clear N_HIGH_MEMORY map before we set it again
|
| SRAT tables may contains nodes of very small size. The arch code may
| decide to not activate such a node. However, currently the early boot
| code sets N_HIGH_MEMORY for such nodes. These nodes therefore seem to be
| active although these nodes have no present pages.
|
| For 64bit N_HIGH_MEMORY == N_NORMAL_MEMORY, so that works for 64 bit too
unintentionally and incorrectly clears the cpuset.mems cgroup attribute on
an i386 kvm guest, meaning that cpuset.mems can not be used.
Fix this by only clearing node_states[N_NORMAL_MEMORY] for 64bit only.
and need to do save/restore for that in find_zone_movable_pfn
Reported-by: Nathan Lynch <ntl@pobox.com>
Tested-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>,
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
balance_dirty_pages can overreact and move all of the dirty pages to
writeback unnecessarily.
balance_dirty_pages makes its decision to throttle based on the number of
dirty plus writeback pages that are over the calculated limit,so it will
continue to move pages even when there are plenty of pages in writeback
and less than the threshold still dirty.
This allows it to overshoot its limits and move all the dirty pages to
writeback while waiting for the drives to catch up and empty the writeback
list.
A simple fio test easily demonstrates this problem.
fio --name=f1 --directory=/disk1 --size=2G -rw=write --name=f2 --directory=/disk2 --size=1G --rw=write --startdelay=10
This is the simplest fix I could find, but I'm not entirely sure that it
alone will be enough for all cases. But it certainly is an improvement on
my desktop machine writing to 2 disks.
Do we need something more for machines with large arrays where
bdi_threshold * number_of_drives is greater than the dirty_ratio ?
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
show_pools() walks the page_list of a pool w/o protection against the list
modifications in alloc/free. Take pool->lock to avoid stomping into
nirvana.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, delay: tsc based udelay should have rdtsc_barrier
x86, setup: correct include file in <asm/boot.h>
x86, setup: Fix typo "CONFIG_x86_64" in <asm/boot.h>
x86, mce: percpu mcheck_timer should be pinned
x86: Add sysctl to allow panic on IOCK NMI error
x86: Fix uv bau sending buffer initialization
x86, mce: Fix mce resume on 32bit
x86: Move init_gbpages() to setup_arch()
x86: ensure percpu lpage doesn't consume too much vmalloc space
x86: implement percpu_alloc kernel parameter
x86: fix pageattr handling for lpage percpu allocator and re-enable it
x86: reorganize cpa_process_alias()
x86: prepare setup_pcpu_lpage() for pageattr fix
x86: rename remap percpu first chunk allocator to lpage
x86: fix duplicate free in setup_pcpu_remap() failure path
percpu: fix too lazy vunmap cache flushing
x86: Set cpu_llc_id on AMD CPUs
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
According to Andi, it isn't clear whether lpage allocator is worth the
trouble as there are many processors where PMD TLB is far scarcer than
PTE TLB. The advantage or disadvantage probably depends on the actual
size of percpu area and specific processor. As performance
degradation due to TLB pressure tends to be highly workload specific
and subtle, it is difficult to decide which way to go without more
data.
This patch implements percpu_alloc kernel parameter to allow selecting
which first chunk allocator to use to ease debugging and testing.
While at it, make sure all the failure paths report why something
failed to help determining why certain allocator isn't working. Also,
kill the "Great future plan" comment which had already been realized
quite some time ago.
[ Impact: allow explicit percpu first chunk allocator selection ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jan Beulich <JBeulich@novell.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
In pcpu_unmap(), flushing virtual cache on vunmap can't be delayed as
the page is going to be returned to the page allocator. Only TLB
flushing can be put off such that vmalloc code can handle it lazily.
Fix it.
[ Impact: fix subtle virtual cache flush bug ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
|
| |/ /
|/| |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Currently the 4th parameter of get_user_pages() is called len, but its
in pages, not bytes. Rename the thing to nr_pages to avoid future
confusion.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | |
| \ \ | |
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/viro/{vfs-2.6,audit-current}
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
another race fix in jfs_check_acl()
Get "no acls for this inode" right, fix shmem breakage
inline functions left without protection of ifdef (acl)
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
audit: inode watches depend on CONFIG_AUDIT not CONFIG_AUDIT_SYSCALL
|
| |/ / /
| | | |
| | | |
| | | | |
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
SLUB uses higher order allocations by default but falls back to small
orders under memory pressure. Make sure the GFP mask used in the initial
allocation doesn't include __GFP_NOFAIL.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|/ / /
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Traditionally, we never failed small orders (even regardless of any
__GFP_NOFAIL flags), and slab will allocate order-1 allocations even for
small allocations that could fit in a single page (in order to avoid
excessive fragmentation).
Maybe we should remove this warning entirely, but before making that
judgement, at least limit it to bigger allocations.
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |/
|/|
| |
| | |
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
After downing/upping a cpu, an attempt to set
/proc/sys/vm/percpu_pagelist_fraction results in an oops in
percpu_pagelist_fraction_sysctl_handler().
If a processor is downed then we need to set the pageset pointer back to
the boot pageset.
Updates of the high water marks should not access pagesets of unpopulated
zones (those pointer go to the boot pagesets which would be no longer
functional if their size would be increased beyond zero).
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If a kthread happens to use get_user_pages() on an mm (as KSM does),
there's a chance that it will end up trying to read in a swap page, then
oops in grab_swap_token() because the kthread has no mm: GUP passes down
the right mm, so grab_swap_token() ought to be using it.
We have not identified a stronger case than KSM's daemon (not yet in
mainline), but the issue must have come up before, since RHEL has included
a fix for this for years (though a different fix, they just back out of
grab_swap_token if current->mm is unset: which is what we first proposed,
but using the right mm here seems more correct).
Reported-by: Izik Eidus <ieidus@redhat.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \
| | |
| | |
| | |
| | |
| | | |
* 'kmemleak' of git://linux-arm.org/linux-2.6:
kmemleak: Do not force the slab debugging Kconfig options
kmemleak: use pr_fmt
|
| |/
| |
| |
| |
| | |
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Indeed FOLL_WRITE matches FAULT_FLAG_WRITE, matches GUP_FLAGS_WRITE,
and it's tempting to devise a set of Grand Unified Paging flags;
but not today. So until then, let's rely upon the compiler to spot
the coincidence, "rather than have that subtle dependency and a
comment for it" - as you remarked in another context yesterday.
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
handle_mm_fault() is now passing fault flags rather than write_access
down to hugetlb_fault(), so better recognize that in hugetlb_fault(),
and in hugetlb_no_page().
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
| |
The isolated page is "cursor_page" not "page".
This could cause LRU list corruption under memory pressure, caught by
CONFIG_DEBUG_LIST.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Tested-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
| |
This allows the callers to now pass down the full set of FAULT_FLAG_xyz
flags to handle_mm_fault(). All callers have been (mechanically)
converted to the new calling convention, there's almost certainly room
for architectures to clean up their code and then add FAULT_FLAG_RETRY
when that support is added.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The fault handling routines really want more fine-grained flags than a
single "was it a write fault" boolean - the callers will want to set
flags like "you can return a retry error" etc.
And that's actually how the VM works internally, but right now the
top-level fault handling functions in mm/memory.c all pass just the
'write_access' boolean around.
This switches them over to pass around the FAULT_FLAG_xyzzy 'flags'
variable instead. The 'write_access' calling convention still exists
for the exported 'handle_mm_fault()' function, but that is next.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
da456f1 "page allocator: do not disable interrupts in free_page_mlock()" moved
the PG_mlocked clearing after the flag sanity checking which makes mlocked
pages always trigger 'bad page'. Fix this by clearing the bit up front.
Reported--and-debugged-by: Peter Chubb <peter.chubb@nicta.com.au>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Tested-by: Maxim Levitsky <maximlevitsky@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
| |
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
| |
The page allocator also needs the masking of gfp flags during boot,
so this moves it out of slab/slub and uses it with the page allocator
as well.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Try to fix memcg's lru rotation sanity: make memcg use the same logic as
the global LRU does.
Now, at __isolate_lru_page() retruns -EBUSY, the page is rotated to the
tail of LRU in global LRU's isolate LRU pages. But in memcg, it's not
handled. This makes memcg do the same behavior as global LRU and rotate
LRU in the page is busy.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A user can set memcg.limit_in_bytes == memcg.memsw.limit_in_bytes when the
user just want to limit the total size of applications, in other words,
not very interested in memory usage itself. In this case, swap-out will
be done only by global-LRU.
But, under current implementation, memory.limit_in_bytes is checked at
first and try_to_free_page() may do swap-out. But, that swap-out is
useless for memsw.limit_in_bytes and the thread may hit limit again.
This patch tries to fix the current behavior at memory.limit ==
memsw.limit case. And documentation is updated to explain the behavior of
this special case.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes mis-accounting of swap usage in memcg.
In the current implementation, memcg's swap account is uncharged only when
swap is completely freed. But there are several cases where swap cannot
be freed cleanly. For handling that, this patch changes that memcg
uncharges swap account when swap has no references other than cache.
By this, memcg's swap entry accounting can be fully synchronous with the
application's behavior.
This patch also changes memcg's hooks for swap-out.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: Balbir Singh <balbir@in.ibm.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
| |
We don't need to check do_swap_account in the case that the function which
checks do_swap_account will never get called if do_swap_account == 0.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add file RSS tracking per memory cgroup
We currently don't track file RSS, the RSS we report is actually anon RSS.
All the file mapped pages, come in through the page cache and get
accounted there. This patch adds support for accounting file RSS pages.
It should
1. Help improve the metrics reported by the memory resource controller
2. Will form the basis for a future shared memory accounting heuristic
that has been proposed by Kamezawa.
Unfortunately, we cannot rename the existing "rss" keyword used in
memory.stat to "anon_rss". We however, add "mapped_file" data and hope to
educate the end user through documentation.
[hugh.dickins@tiscali.co.uk: fix mem_cgroup_update_mapped_file_stat oops]
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.cn>
Cc: Paul Menage <menage@google.com>
Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
| |
Fix some cgroup messages to read better.
Update MAINTAINERS to include mm/*cgroup* files.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\
| |
| |
| |
| |
| |
| | |
* 'kmemleak' of git://linux-arm.org/linux-2.6:
kmemleak: Fix some typos in comments
kmemleak: Rename kmemleak_panic to kmemleak_stop
kmemleak: Only use GFP_KERNEL|GFP_ATOMIC for the internal allocations
|
| |
| |
| |
| | |
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
| |
| |
| |
| |
| |
| |
| | |
This is to avoid the confusion created by the "panic" word.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Kmemleak allocates memory for pointer tracking and it tries to avoid
using GFP_ATOMIC if the caller doesn't require it. However other gfp
flags may be passed by the caller which aren't required by kmemleak.
This patch filters the gfp flags so that only GFP_KERNEL | GFP_ATOMIC
are used.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
|
|\ \
| | |
| | |
| | |
| | | |
Conflicts:
mm/slub.c
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Recent change to use slab allocations earlier exposed a bug where
SLUB can call schedule_work and try to call sysfs before it is
safe to do so.
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
|
| | | | |
| \ \ | |
| \ \ | |
| \ \ | |
| \ \ | |
| \ \ | |
|\ \ \ \ \ \
| |_|_|_|_|/
|/| | | | |
| | | | | | |
'slub/fixes' into for-linus
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
We must check for __GFP_NOFAIL like the page allocator does; otherwise we end
up with false positives. While at it, add the printk_ratelimit() check in SLUB
as well.
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Fix this build error when CONFIG_SLUB_DEBUG is not set:
mm/slub.c: In function 'slab_out_of_memory':
mm/slub.c:1551: error: 'struct kmem_cache_node' has no member named 'nr_slabs'
mm/slub.c:1552: error: 'struct kmem_cache_node' has no member named 'total_objects'
[ penberg@cs.helsinki.fi: cleanups ]
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
As suggested by Mel Gorman, add out-of-memory diagnostics to the SLUB allocator
to make debugging OOM conditions easier. This patch helped hunt down a nasty
OOM issue that popped up every now that was caused by SLUB debugging code which
forced 4096 byte allocations to use order 1 pages even in the fallback case.
An example print out looks like this:
<snip page allocator out-of-memory message>
SLUB: Unable to allocate memory on node -1 (gfp=20)
cache: kmalloc-4096, object size: 4096, buffer size: 4168, default order: 3, min order: 1
node 0: slabs: 95, objs: 665, free: 0
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
For the sake of consistency.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andi Kleen <andi@firstfloor.org>
Acked-by: Matt Mackall <mpm@selenic.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
A generic page poisoning mechanism was added with commit:
6a11f75b6a17b5d9ac5025f8d048382fd1f47377
which destructively poisons full pages with a bitpattern.
On arches where PAGE_POISONING is used, this conflicts with the slab
redzone checking enabled by DEBUG_SLAB, scribbling bits all over its
magic words and making it complain about that quite emphatically.
On x86 (and I presume at present all the other arches which set
ARCH_SUPPORTS_DEBUG_PAGEALLOC too), the kernel_map_pages() operation
is non destructive so it can coexist with the other DEBUG_SLAB
mechanisms just fine.
This patch favours the expensive full page destruction test for
cases where there is a collision and it is explicitly selected.
Signed-off-by: Ron Lee <ron@debian.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
|
| | |_|/ /
| |/| | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
As suggested by Alan Cox, document the fact that kzfree() can zero out a great
deal more memory than the what the user requested from kmalloc().
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
* akpm: (182 commits)
fbdev: bf54x-lq043fb: use kzalloc over kmalloc/memset
fbdev: *bfin*: fix __dev{init,exit} markings
fbdev: *bfin*: drop unnecessary calls to memset
fbdev: bfin-t350mcqb-fb: drop unused local variables
fbdev: blackfin has __raw I/O accessors, so use them in fb.h
fbdev: s1d13xxxfb: add accelerated bitblt functions
tcx: use standard fields for framebuffer physical address and length
fbdev: add support for handoff from firmware to hw framebuffers
intelfb: fix a bug when changing video timing
fbdev: use framebuffer_release() for freeing fb_info structures
radeon: P2G2CLK_ALWAYS_ONb tested twice, should 2nd be P2G2CLK_DAC_ALWAYS_ONb?
s3c-fb: CPUFREQ frequency scaling support
s3c-fb: fix resource releasing on error during probing
carminefb: fix possible access beyond end of carmine_modedb[]
acornfb: remove fb_mmap function
mb862xxfb: use CONFIG_OF instead of CONFIG_PPC_OF
mb862xxfb: restrict compliation of platform driver to PPC
Samsung SoC Framebuffer driver: add Alpha Channel support
atmel-lcdc: fix pixclock upper bound detection
offb: use framebuffer_alloc() to allocate fb_info struct
...
Manually fix up conflicts due to kmemcheck in mm/slab.c
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
At lumpy reclaim, a page failed to be taken by __isolate_lru_page() can be
pushed back to "src" list by list_move(). But the page may not be from
"src" list. This pushes the page back to wrong LRU. And list_move()
itself is unnecessary because the page is not on top of LRU. Then, leave
it as it is if __isolate_lru_page() fails.
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
On NUMA machines, the administrator can configure zone_reclaim_mode that
is a more targetted form of direct reclaim. On machines with large NUMA
distances for example, a zone_reclaim_mode defaults to 1 meaning that
clean unmapped pages will be reclaimed if the zone watermarks are not
being met.
There is a heuristic that determines if the scan is worthwhile but it is
possible that the heuristic will fail and the CPU gets tied up scanning
uselessly. Detecting the situation requires some guesswork and
experimentation so this patch adds a counter "zreclaim_failed" to
/proc/vmstat. If during high CPU utilisation this counter is increasing
rapidly, then the resolution to the problem may be to set
/proc/sys/vm/zone_reclaim_mode to 0.
[akpm@linux-foundation.org: name things consistently]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
On NUMA machines, the administrator can configure zone_reclaim_mode that
is a more targetted form of direct reclaim. On machines with large NUMA
distances for example, a zone_reclaim_mode defaults to 1 meaning that
clean unmapped pages will be reclaimed if the zone watermarks are not
being met. The problem is that zone_reclaim() failing at all means the
zone gets marked full.
This can cause situations where a zone is usable, but is being skipped
because it has been considered full. Take a situation where a large tmpfs
mount is occuping a large percentage of memory overall. The pages do not
get cleaned or reclaimed by zone_reclaim(), but the zone gets marked full
and the zonelist cache considers them not worth trying in the future.
This patch makes zone_reclaim() return more fine-grained information about
what occured when zone_reclaim() failued. The zone only gets marked full
if it really is unreclaimable. If it's a case that the scan did not occur
or if enough pages were not reclaimed with the limited reclaim_mode, then
the zone is simply skipped.
There is a side-effect to this patch. Currently, if zone_reclaim()
successfully reclaimed SWAP_CLUSTER_MAX, an allocation attempt would go
ahead. With this patch applied, zone watermarks are rechecked after
zone_reclaim() does some work.
This bug was introduced by commit 9276b1bc96a132f4068fdee00983c532f43d3a26
("memory page_alloc zonelist caching speedup") way back in 2.6.19 when the
zonelist_cache was introduced. It was not intended that zone_reclaim()
aggressively consider the zone to be full when it failed as full direct
reclaim can still be an option. Due to the age of the bug, it should be
considered a -stable candidate.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|