summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
...
* sysctl.c: remove unused variableSukanto Ghosh2009-06-181-3/+1
| | | | | | | | | | | | | | Remoce the unused variable 'val' from __do_proc_dointvec() The integer has been declared and used as 'val = -val' and there is no reference to it anywhere. Signed-off-by: Sukanto Ghosh <sukanto.cse.iitb@gmail.com> Cc: Jaswinder Singh Rajput <jaswinder@kernel.org> Cc: Sukanto Ghosh <sukanto.cse.iitb@gmail.com> Cc: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ppdev: reduce kernel log spamMichael Buesch2009-06-181-18/+11
| | | | | | | | | | | | | | | One of my programs frequently grabs the parport, does something with it and then drops it again. This results in spamming of the kernel log with "... registered pardevice" "... unregistered pardevice" These messages are completely useless, except for debugging ppdev, probably. So put them under DEBUG (or dynamic debug). Signed-off-by: Michael Buesch <mb@bu3sch.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Char: isicom: fix build warningJiri Slaby2009-06-181-1/+1
| | | | | | | | | | | | | Fix this: isicom.c: In function `isicom_probe': isicom.c:1587: warning: `signature' may be used uninitialized in this function by uninitialized_var(), because if the signature is not initialized in reset_card(), we won't use it. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* drivers/char/mem.c: memory_open() cleanup: lookup minor device number from ↵Adriano dos Santos Fernandes2009-06-181-70/+45
| | | | | | | | | | | | | | | devlist memory_open() ignores devlist and does a switch for each item, duplicating code and conditional definitions. Clean it up by adding backing_dev_info to devlist and use it to lookup for the minor device. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Adriano dos Santos Fernandes <adrianosf@uol.com.br> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ipc: use __ARCH_WANT_IPC_PARSE_VERSION in ipc/util.hArnd Bergmann2009-06-181-1/+1
| | | | | | | | | | | | | The definition of ipc_parse_version depends on __ARCH_WANT_IPC_PARSE_VERSION, but the header file declares it conditionally based on the architecture. Use the macro consistently to make it easier to add new architectures. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* kthreads: simplify migration_thread() exit pathOleg Nesterov2009-06-181-10/+4
| | | | | | | | | | | | | | | | | | | | | Now that kthread_stop() can be used even if the task has already exited, we can kill the "wait_to_die:" loop in migration_thread(). But we must pin rq->migration_thread after creation. Actually, I don't think CPU_UP_CANCELED or CPU_DEAD should wait for ->migration_thread exit. Perhaps we can simplify this code a bit more. migration_call() can set ->should_stop and forget about this thread. But we need a new helper in kthred.c for that. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Vitaliy Gusev <vgusev@openvz.org Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* kthreads: rework kthread_stop()Oleg Nesterov2009-06-181-41/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Based on Eric's patch which in turn was based on my patch. kthread_stop() has the nasty problems: - it runs unpredictably long with the global semaphore held. - it deadlocks if kthread itself does kthread_stop() before it obeys the kthread_should_stop() request. - it is not useable if kthread exits on its own, see for example the ugly "wait_to_die:" hack in migration_thread() - it is not possible to just tell kthread it should stop, we must always wait for its exit. With this patch kthread() allocates all neccesary data (struct kthread) on its own stack, globals kthread_stop_xxx are deleted. ->vfork_done is used as a pointer into "struct kthread", this means kthread_stop() can easily wait for kthread's exit. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Vitaliy Gusev <vgusev@openvz.org Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* kthreads: simplify the startup synchronizationOleg Nesterov2009-06-181-7/+4
| | | | | | | | | | | | | | | | | | | | | | | | We use two completions two create the kernel thread, this is a bit ugly. kthread() wakes up create_kthread() via ->started, then create_kthread() wakes up the caller kthread_create() via ->done. But kthread() does not need to wait for kthread(), it can just return. Instead kthread() itself can wake up the caller of kthread_create(). Kill kthread_create_info->started, ->done is enough. This improves the scalability a bit and sijmplifies the code. The only problem if kernel_thread() fails, in that case create_kthread() must do complete(&create->done). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Vitaliy Gusev <vgusev@openvz.org Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: exit.c reorder wait_opts to remove padding on 64 bit buildsRichard Kennedy2009-06-181-1/+1
| | | | | | | | | | | | Reorder struct wait_opts to remove 8 bytes of alignment padding on 64 bit builds. Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* do_wait: fix the theoretical race with stop/trace/contOleg Nesterov2009-06-181-2/+2
| | | | | | | | | | | | | | | | | | | | | | do_wait: current->state = TASK_INTERRUPTIBLE; read_lock(&tasklist_lock); ... search for the task to reap ... In theory, the ->state changing can leak into the critical section. Since the child can change its status under read_lock(tasklist) in parallel (finish_stop/ptrace_stop), we can miss the wakeup if __wake_up_parent() sees us in TASK_RUNNING state. Add the barrier. Also, use __set_current_state() to set TASK_RUNNING. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* do_wait: kill the old BUG_ON, use while_each_thread()Oleg Nesterov2009-06-181-3/+1
| | | | | | | | | | | | | | do_wait() does BUG_ON(tsk->signal != current->signal), this looks like a raher obsolete check. At least, I don't think do_wait() is the best place to verify that all threads have the same ->signal. Remove it. Also, change the code to use while_each_thread(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* do_wait: simplify retval/tsk_result/notask_error messOleg Nesterov2009-06-181-12/+8
| | | | | | | | | | | | | | | | | | | | Now that we don't pass &retval down to other helpers we can simplify the code more. - kill tsk_result, just use retval - add the "notask" label right after the main loop, and s/got end/goto notask/ after the fastpath pid check. This way we don't need to initialize retval before this check and the code becomes a bit more clean, if this pid has no attached tasks we should just skip the list search. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* introduce "struct wait_opts" to simplify do_wait() patchesOleg Nesterov2009-06-181-97/+110
| | | | | | | | | | | | | | | | | Introduce "struct wait_opts" which holds the parameters for misc helpers in do_wait() pathes. This adds 13 lines to kernel/exit.c, but saves 256 bytes from .o and imho makes the code much more readable. This patch temporary uglifies rusage/siginfo code a little bit, will be addressed by further cleanups. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Ingo Molnar <mingo@elte.hu> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* shift "ptrace implies WUNTRACED" from ptrace_do_wait() to wait_task_stopped()Oleg Nesterov2009-06-181-6/+4
| | | | | | | | | | | | | | | | | No functional changes, preparation for the next patch. ptrace_do_wait() adds WUNTRACED to options for wait_task_stopped() which should always accept the stopped tracee, even if do_wait() was called without WUNTRACED. Change wait_task_stopped() to check "ptrace || WUNTRACED" instead. This makes the code more explicit, and "int options" argument becomes const in do_wait() pathes. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* elf_core_dump: use rcu_read_lock() to access ->real_parentOleg Nesterov2009-06-182-4/+12
| | | | | | | | | | | In theory it is not safe to dereference ->parent/real_parent without tasklist or rcu lock, we can race with re-parenting. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* copy_process(): remove the unneeded clear_tsk_thread_flag(TIF_SIGPENDING)Oleg Nesterov2009-06-181-1/+0
| | | | | | | | | | | The forked child can have TIF_SIGPENDING if it was copied from parent's ti->flags. But this is harmless and actually almost never happens, because copy_process() can't succeed if signal_pending() == T. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* wait_task_zombie: do not use thread_group_cputime()Oleg Nesterov2009-06-181-10/+6
| | | | | | | | | | | | | | | | There is no reason for thread_group_cputime() in wait_task_zombie(), there must be no other threads. This call was previously needed to collect the per-cpu data which we do not have any longer. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Roland McGrath <roland@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Vitaly Mayatskikh <vmayatsk@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ptrace: don't take tasklist to get/set ->last_siginfoOleg Nesterov2009-06-181-10/+6
| | | | | | | | | | | Change ptrace_getsiginfo/ptrace_setsiginfo to use lock_task_sighand() without tasklist_lock. Perhaps it makes sense to make a single helper with "bool rw" argument. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ptrace: do_notify_parent_cldstop: fix the wrong ->nsproxy usageOleg Nesterov2009-06-181-1/+1
| | | | | | | | | | | | | | | | If the non-traced sub-thread calls do_notify_parent_cldstop(), we send the notification to group_leader->real_parent and we report group_leader's pid. But, if group_leader is traced we use the wrong ->parent->nsproxy->pid_ns, the tracer and parent can live in different namespaces. Change the code to use "parent" instead of tsk->parent. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Acked-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ptrace: wait_task_zombie: s/->parent/->real_parent/Oleg Nesterov2009-06-181-4/+4
| | | | | | | | | | | | | Change wait_task_zombie() to use ->real_parent instead of ->parent. We could even use current afaics, but ->real_parent is more clean. We know that the child is not ptrace_reparented() and thus they are equal. But we should avoid using task_struct->parent, we are going to remove it. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ptrace_get_task_struct: s/tasklist/rcu/, make it staticOleg Nesterov2009-06-182-14/+3
| | | | | | | | | | | | | | | | - Use rcu_read_lock() instead of tasklist_lock to find/get the task in ptrace_get_task_struct(). - Make it static, it has no callers outside of ptrace.c. - The comment doesn't match the reality, this helper does not do any checks. Beacuse it is really trivial and static I removed the whole comment. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ptrace: do not use task_lock() for attachOleg Nesterov2009-06-181-46/+13
| | | | | | | | | | | | | | | | | | | | | | | | | Remove the "Nasty, nasty" lock dance in ptrace_attach()/ptrace_traceme() - from now task_lock() has nothing to do with ptrace at all. With the recent changes nobody uses task_lock() to serialize with ptrace, but in fact it was never needed and it was never used consistently. However ptrace_attach() calls __ptrace_may_access() and needs task_lock() to pin task->mm for get_dumpable(). But we can call __ptrace_may_access() before we take tasklist_lock, ->cred_exec_mutex protects us against do_execve() path which can change creds and MMF_DUMP* flags. (ugly, but we can't use ptrace_may_access() because it hides the error code, so we have to take task_lock() and use __ptrace_may_access()). NOTE: this change assumes that LSM hooks, security_ptrace_may_access() and security_ptrace_traceme(), can be called without task_lock() held. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Chris Wright <chrisw@sous-sol.org> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ptrace: cleanup check/set of PT_PTRACED during attachOleg Nesterov2009-06-181-50/+51
| | | | | | | | | | | | | | | | | | ptrace_attach() and ptrace_traceme() are the last functions which look as if the untraced task can have task->ptrace != 0, this must not be possible. Change the code to just check ->ptrace != 0 and s/|=/=/ to set PT_PTRACED. Also, a couple of trivial whitespace cleanups in ptrace_attach(). And move ptrace_traceme() up near ptrace_attach() to keep them close to each other. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Chris Wright <chrisw@sous-sol.org> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ptrace: ptrace_attach: check PF_KTHREAD + exit_state instead of ->mmOleg Nesterov2009-06-181-9/+10
| | | | | | | | | | | | | | | | | | | | | | | - Add PF_KTHREAD check to prevent attaching to the kernel thread with a borrowed ->mm. With or without this change we can race with daemonize() which can set PF_KTHREAD or clear ->mm after ptrace_attach() does the check, but this doesn't matter because reparent_to_kthreadd() does ptrace_unlink(). - Kill "!task->mm" check. We don't really care about ->mm != NULL, and the task can call exit_mm() right after we drop task_lock(). What we need is to make sure we can't attach after exit_notify(), check task->exit_state != 0 instead. Also, move the "already traced" check down for cosmetic reasons. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Chris Wright <chrisw@sous-sol.org> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ptrace: do not use task->ptrace directly in core kernelOleg Nesterov2009-06-182-8/+8
| | | | | | | | | | | | | | | No functional changes. - Nobody except ptrace.c & co should use ptrace flags directly, we have task_ptrace() for that. - No need to specially check PT_PTRACED, we must not have other PT_ bits set without PT_PTRACED. And no need to know this flag exists. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ptrace: tracehook_unsafe_exec(): remove the stale commentOleg Nesterov2009-06-181-1/+1
| | | | | | | | | tracehook_unsafe_exec() doesn't need task_lock(), remove the old comment. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ptrace: mm_need_new_owner: use ->real_parent to search in the siblingsOleg Nesterov2009-06-181-1/+1
| | | | | | | | | | | "Search in the siblings" should use ->real_parent, not ->parent. If the task is traced then ->parent == tracer, while the task's parent is always ->real_parent. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ptrace: remove PT_DTRACE from arch/m32rOleg Nesterov2009-06-182-9/+0
| | | | | | | | | | | m32r: PTRACE_SINGLESTEP sets PT_DTRACE, it is never used except cleared after do_execve(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Hirokazu Takata <takata@linux-m32r.org> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ptrace: remove PT_DTRACE from m68k, m68knommuOleg Nesterov2009-06-183-3/+0
| | | | | | | | | | | | m68k sets PT_DTRACE in trap_c() but never uses it. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Greg Ungerer <gerg@snapgear.com> Cc: Roman Zippel <zippel@linux-m68k.org> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ptrace: remove PT_DTRACE from avr32, mn10300, parisc, s390, sh, xtensaOleg Nesterov2009-06-1810-41/+0
| | | | | | | | | | | | | | | | | | | | avr32, mn10300, parisc, s390, sh, xtensa: They never set PT_DTRACE, but clear it after do_execve(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: David Howells <dhowells@redhat.com> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: Matthew Wilcox <matthew@wil.cx> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Chris Zankel <chris@zankel.net> Acked-by: Roland McGrath <roland@redhat.com> Acked-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ptrace: remove PT_DTRACE from arch/h8300Oleg Nesterov2009-06-181-1/+0
| | | | | | | | | | | | h8300 defines PT_DTRACE for asm but never uses it. DEFINE(PT_PTRACED, PT_PTRACED) seems to be unused too. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Yoshinori Sato <ysato@users.sourceforge.jp> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* allow_signal: kill the bogus ->mm check, add a note about CLONE_SIGHANDOleg Nesterov2009-06-181-10/+9
| | | | | | | | | | | | | | | allow_signal() checks ->mm == NULL. Not sure why. Perhaps to make sure current is the kernel thread. But this helper must not be used unless we are the kernel thread, kill this check. Also, document the fact that the CLONE_SIGHAND kthread must not use allow_signal(), unless the caller really wants to change the parent's ->sighand->action as well. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* memcg: fix lru rotation in isolate_pagesKAMEZAWA Hiroyuki2009-06-182-2/+15
| | | | | | | | | | | | | | | | | | | Try to fix memcg's lru rotation sanity: make memcg use the same logic as the global LRU does. Now, at __isolate_lru_page() retruns -EBUSY, the page is rotated to the tail of LRU in global LRU's isolate LRU pages. But in memcg, it's not handled. This makes memcg do the same behavior as global LRU and rotate LRU in the page is busy. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* memcg: add interface to reset limitsDaisuke Nishimura2009-06-183-1/+14
| | | | | | | | | | | | | | | | We don't have an interface to reset mem.limit or memsw.limit now. This patch allows to reset mem.limit or memsw.limit when they are being set to -1. Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com> Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* memcg: fix behavior under memory.limit equals to memsw.limitKAMEZAWA Hiroyuki2009-06-182-6/+30
| | | | | | | | | | | | | | | | | | | | | | | | A user can set memcg.limit_in_bytes == memcg.memsw.limit_in_bytes when the user just want to limit the total size of applications, in other words, not very interested in memory usage itself. In this case, swap-out will be done only by global-LRU. But, under current implementation, memory.limit_in_bytes is checked at first and try_to_free_page() may do swap-out. But, that swap-out is useless for memsw.limit_in_bytes and the thread may hit limit again. This patch tries to fix the current behavior at memory.limit == memsw.limit case. And documentation is updated to explain the behavior of this special case. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com> Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* memcg: fix swap accountingKAMEZAWA Hiroyuki2009-06-183-11/+27
| | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes mis-accounting of swap usage in memcg. In the current implementation, memcg's swap account is uncharged only when swap is completely freed. But there are several cases where swap cannot be freed cleanly. For handling that, this patch changes that memcg uncharges swap account when swap has no references other than cache. By this, memcg's swap entry accounting can be fully synchronous with the application's behavior. This patch also changes memcg's hooks for swap-out. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Acked-by: Balbir Singh <balbir@in.ibm.com> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com> Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* memcg: remove unneeded forward declaration from sched.hLi Zefan2009-06-181-1/+0
| | | | | | | | | | This forward declaration seems pointless. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* memcg: remove some redundant checksLi Zefan2009-06-182-15/+4
| | | | | | | | | | | We don't need to check do_swap_account in the case that the function which checks do_swap_account will never get called if do_swap_account == 0. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* memcg: remove mem_cgroup_cache_charge_swapin()Daisuke Nishimura2009-06-181-6/+0
| | | | | | | | | | | mem_cgroup_cache_charge_swapin() isn't used any more, so remove no-op definition of it in header file. Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* memcg: add file-based RSS accountingBalbir Singh2009-06-184-3/+77
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add file RSS tracking per memory cgroup We currently don't track file RSS, the RSS we report is actually anon RSS. All the file mapped pages, come in through the page cache and get accounted there. This patch adds support for accounting file RSS pages. It should 1. Help improve the metrics reported by the memory resource controller 2. Will form the basis for a future shared memory accounting heuristic that has been proposed by Kamezawa. Unfortunately, we cannot rename the existing "rss" keyword used in memory.stat to "anon_rss". We however, add "mapped_file" data and hope to educate the end user through documentation. [hugh.dickins@tiscali.co.uk: fix mem_cgroup_update_mapped_file_stat oops] Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Li Zefan <lizf@cn.fujitsu.cn> Cc: Paul Menage <menage@google.com> Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* devcgroup: skip superfluous checks when found the DEV_ALL elemLi Zefan2009-06-181-4/+6
| | | | | | | | | | While walking through the whitelist, if the DEV_ALL item is found, no more check is needed. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cgroups: forbid noprefix if mounting more than just cpuset subsystemLi Zefan2009-06-181-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | The 'noprefix' option was introduced for backwards-compatibility of cpuset, but actually it can be used when mounting other subsystems. This results in possibility of name collision, and now the collision can really happen, because we have 'stat' file in both memory and cpuacct subsystem: # mount -t cgroup -o noprefix,memory,cpuacct xxx /mnt Cgroup will happily mount the 2 subsystems, but only 'stat' file of memory subsys can be seen. We don't want users to use nopreifx, and also want to avoid name collision, so we change to allow noprefix only if mounting just the cpuset subsystem. [akpm@linux-foundation.org: fix shift for cpuset_subsys_id >= 32] Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Paul Menage <menage@google.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cgroups: make messages more readableRandy Dunlap2009-06-182-7/+8
| | | | | | | | | | | Fix some cgroup messages to read better. Update MAINTAINERS to include mm/*cgroup* files. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Paul Menage <menage@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Documentation/connector/cn_test.c comment unused cn_test_want_notify()Jaswinder Singh Rajput2009-06-181-0/+7
| | | | | | | | | | | Currently cn_test_want_notify() has no user. So add an ifdef and a comment which tells us to not remove it. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Acked-by: Evgeniy Polyakov <zbr@ioremap.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Documentation/Changes: perl is needed to build the kernelJose Luis Perez Diez2009-06-181-0/+7
| | | | | | | | | | | | Perl is used on the kernel Makefile to generate documentation, firmwares in c source form, sources, graphs, and some headers and this fact is undocumented. [akpm@linux-foundation.org: 80-columns, please] Signed-off-by: Jose Luis Perez Diez <jluis@escomposlinux.org> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* reiserfs: fix warnings with gcc 4.4Jeff Mahoney2009-06-183-15/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | Several code paths in reiserfs have a construct like: if (is_direntry_le_ih(ih = B_N_PITEM_HEAD(src, item_num))) ... which, in addition to being ugly, end up causing compiler warnings with gcc 4.4.0. Previous compilers didn't issue a warning. fs/reiserfs/do_balan.c:1273: warning: operation on `aux_ih' may be undefined fs/reiserfs/lbalance.c:393: warning: operation on `ih' may be undefined fs/reiserfs/lbalance.c:421: warning: operation on `ih' may be undefined fs/reiserfs/lbalance.c:777: warning: operation on `ih' may be undefined I believe this is due to the ih being passed to macros which evaluate the argument more than once. This is old code and we haven't seen any problems with it, but this patch eliminates the warnings. It converts the multiple evaluation macros to static inlines and does a preassignment for the cases that were causing the warnings because that code is just ugly. Reported-by: Chris Mason <mason@oracle.com> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ufs: sector_t cannot be negativeRoel Kluin2009-06-181-9/+1
| | | | | | | | | unsigned i_block,fragment cannot be negative. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* isofs: cleanup mount option processingJan Kara2009-06-184-45/+40
| | | | | | | | | | Remove unused variables from isofs_sb_info (used to be some mount options), unify variables for option to use 0/1 (some options used 'y'/'n'), use bit fields for option flags in superblock. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* isofs: fix setting of uid and gid to 0Jan Kara2009-06-182-14/+14
| | | | | | | | | | | | isofs allows setting of default uid and gid of files but value 0 was used to indicate that user did not specify any uid/gid mount option. Since this option also overrides uid/gid set in Rock Ridge extension, it makes sense to allow forcing uid/gid 0. Fix option processing to allow this. Cc: <Hans-Joachim.Baader@cjt.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* isofs: let mode and dmode mount options override rock ridge mode settingJan Kara2009-06-183-14/+48
| | | | | | | | | | | | | | So far, permissions set via 'mode' and/or 'dmode' mount options were effective only if the medium had no rock ridge extensions (or was mounted without them). Add 'overriderockmode' mount option to indicate that these options should override permissions set in rock ridge extensions. Maybe this should be default but the current behavior is there since mount options were created so I think we should not change how they behave. Cc: <Hans-Joachim.Baader@cjt.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
OpenPOWER on IntegriCloud