summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* f2fs: fix wrong memory footprint statistics in debugfsChao Yu2015-02-111-4/+11
| | | | | | | | Our value of memory footprint statistics showed in debugfs is not calculated correctly. Fix it in this patch. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: avoid infinite loop on cp_errorJaegeuk Kim2015-02-111-0/+4
| | | | | | | If cp_error is set, we should avoid all the infinite loop. In f2fs_sync_file, there is a hole, and this patch fixes that. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: pids_lock can be statickbuild test robot2015-01-091-1/+1
| | | | | | | fs/f2fs/trace.c:19:12: sparse: symbol 'pids_lock' was not declared. Should it be static? Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: add f2fs_destroy_trace_ios to free radix treeJaegeuk Kim2015-01-093-0/+40
| | | | | | This patch removes radix tree after finishing tracing IOs. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: add spin_lock to cover radix operations in IO tracerJaegeuk Kim2015-01-093-3/+19
| | | | | | This patch adds spin_lock to cover radix tree operations in IO tracer. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: add nat/sit entries into statusJaegeuk Kim2015-01-092-4/+6
| | | | | | This patch adds NAT/SIT entry informations. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: free radix_tree_nodes used by nat_set entriesJaegeuk Kim2015-01-092-2/+20
| | | | | | | In the normal case, the radix_tree_nodes are freed successfully. But, when cp_error was detected, we should destroy them forcefully. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix wrong unlock_page callJaegeuk Kim2015-01-091-1/+0
| | | | | | This patch removes wrongly called unlock_page. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: get rid of kzalloc in __recover_inline_statusChao Yu2015-01-091-21/+14
| | | | | | | | | | | We use kzalloc to allocate memory in __recover_inline_status, and use this all-zero memory to check the inline date content of inode page by comparing them. This is low effective and not needed, let's check inline date content directly. Signed-off-by: Chao Yu <chao2.yu@samsung.com> [Jaegeuk Kim: make the code more neat] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: align direct_io'ed data to sectionJaegeuk Kim2015-01-093-11/+26
| | | | | | | | | | | | | | This patch aligns the start block address of a file for direct io to the f2fs's section size. Some flash devices manage an over 4KB-sized page as a write unit, and if the direct_io'ed data are written but not aligned to that unit, the performance can be degraded due to the partial page copies. Thus, since f2fs has a section that is well aligned to FTL units, we can align the block address to the section size so that f2fs avoids this misalignment. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: remove uncovered code pathJaegeuk Kim2015-01-091-13/+3
| | | | | | This patch removes unnecessary function calls. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: avoid potential unnecessary codesJaegeuk Kim2015-01-092-3/+5
| | | | | | This patch relocates some operations to avoid unnecessary execution. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: clean up to remove parameterJaegeuk Kim2015-01-096-17/+21
| | | | | | | This patch uses dn->data_blkaddr as a parameter for the destination block address. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: reuse inode_entry_slab in gc procedure for using slab more effectivelyChao Yu2015-01-096-45/+24
| | | | | | | | | | | | | | | | | | | | | | | | | There are two slab cache inode_entry_slab and winode_slab using the same structure as below: struct dir_inode_entry { struct list_head list; /* list head */ struct inode *inode; /* vfs inode pointer */ }; struct inode_entry { struct list_head list; struct inode *inode; }; It's a little waste that the two cache can not share their memory space for each other. So in this patch we remove one redundant winode_slab slab cache, then use more universal name struct inode_entry as remaining data structure name of slab, finally we reuse the inode_entry_slab to store dirty dir item and gc item for more effective. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: cleanup parameters for ↵Chao Yu2015-01-092-23/+24
| | | | | | | | | | | trace_f2fs_submit_{read_,write_,page_,page_m}bio with fio Cleanup parameters for trace_f2fs_submit_{read_,write_,page_,page_m}bio with fio as one parameter. Suggested-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: cleanup trace event of f2fs_submit_page_{m,}bio with DECLARE_EVENT_CLASSChao Yu2015-01-092-66/+53
| | | | | | | | | | | | | | This patch adds missing parameter _type_ for trace_f2fs_submit_page_bio, then use DECLARE_EVENT_CLASS/DEFINE_EVENT_CONDITION pair to cleanup some trace event code related to f2fs_submit_page_{m,}bio. Additionally, after we remove redundant code, size of code can be reduced: text data bss dec hex filename 176787 8712 56 185555 2d4d3 f2fs.ko.org 174408 8648 56 183112 2cb48 f2fs.ko Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix missing cold bit during recoveryJaegeuk Kim2015-01-092-1/+11
| | | | | | | | | | In do_recover_data, we find and update previous node pages after updating its new block addresses. After then, we call fill_node_footer without reset field, we erase its cold bit so that this new cold node block is written to wrong log area. This patch fixes not to miss its old flag. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: add block count by in-place-update in stat infoChangman Lee2015-01-093-1/+10
| | | | | | | | This patch adds block count by in-place-update in stat. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: avoid double lock for cp_rwsemJaegeuk Kim2015-01-091-2/+2
| | | | | | | | | | | | | | | | | | The __f2fs_add_link is covered by cp_rwsem all the time. This calls init_inode_metadata, which conducts some acl operations including memory allocation with GFP_KERNEL previously. But, under memory pressure, f2fs_write_data_page can be called, which also grabs cp_rwsem too. In this case, this incurs a deadlock pointed by Chao. Thread #1 Thread #2 down_read down_write down_read -> here down_read should wait forever. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: activate f2fs_trace_iosJaegeuk Kim2015-01-093-0/+7
| | | | | | This patch activates f2fs_trace_ios. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: activate f2fs_trace_pidJaegeuk Kim2015-01-093-0/+7
| | | | | | This patch activates f2fs_trace_pid. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: add key functions for f2fs_io_tracerJaegeuk Kim2015-01-092-0/+104
| | | | | | | | | | | This patch adds two key functions to trace process ids and IOs. The basic idea is to 1. remain process ids, pids, in page->private. 2. show pids in IO traces. So, later we can retrieve process information according to IO traces. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: add f2fs_io_tracer supportJaegeuk Kim2015-01-094-0/+59
| | | | | | | | This patch adds: o initial trace.c and trace.h with skeleton functions o Kconfig and Makefile to activate this feature Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: use f2fs_io_info to clean up messy parameters during IO pathJaegeuk Kim2015-01-096-66/+87
| | | | | | | | | This patch cleans up parameters on IO paths. The key idea is to use f2fs_io_info adding a parameter, block address, and then use this structure as parameters. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: use ra_meta_pages to simplify readahead code in restore_node_summaryChao Yu2015-01-091-52/+13
| | | | | | | | | | | | | | | | Use more common function ra_meta_pages() with META_POR to readahead node blocks in restore_node_summary() instead of ra_sum_pages(), hence we can simplify the readahead code there, and also we can remove unused function ra_sum_pages(). changes from v2: o use invalidate_mapping_pages as before suggested by Changman Lee. changes from v1: o fix one bug when using truncate_inode_pages_range which is pointed out by Jaegeuk Kim. Reviewed-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: merge two uchar variable in struct node_info to reduce memory costChao Yu2015-01-092-13/+24
| | | | | | | | | | | | | | | | | | | This patch moves one member of struct nat_entry: _flag_ to struct node_info, so _version_ in struct node_info and _flag_ which are unsigned char type will merge to one 32-bit space in register/memory. So the size of nat_entry will be reduced from 28 bytes to 24 bytes (for 64-bit machine, reduce its size from 40 bytes to 32 bytes) and then slab memory using by f2fs will be reduced. changes from v2: o update description of memory usage gain for 64-bit machine suggested by Changman Lee. changes from v1: o introduce inline copy_node_info() to copy valid data from node info suggested by Jaegeuk Kim, it can avoid bug. Reviewed-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: readahead contiguous current summary blocks in checkpointChao Yu2015-01-093-5/+20
| | | | | | | | | | | Let's add readahead code for reading contiguous compact/normal summary blocks in checkpoint, then we will gain better performance in mount procedure. Changes from v1 o remove inappropriate 'unlikely' in npages_for_summary_flush. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: use missing the use of f2fs_kunmap_pageJaegeuk Kim2015-01-091-2/+1
| | | | | | This patch calls f2fs_kunmap_page which I missed before. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: remove unnecessary call to invalidate inmemory pagesJaegeuk Kim2015-01-093-21/+0
| | | | | | | Now we use inmemory pages for atomic write only and provide abort procedure, we don't need to truncate them explicitly. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix small discards not to issue redundantlyJaegeuk Kim2015-01-091-3/+5
| | | | | | | | | | | The ckpt_valid_map and cur_valid_map are synced by seg_info_to_raw_sit. In the case of small discards, the candidates are selected before sync, while fitrim selects candidates after sync. So, for small discards, we need to add candidates only just being obsoleted. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: change atomic and volatile write policiesJaegeuk Kim2015-01-097-14/+88
| | | | | | | | | | | | | | | This patch adds two new ioctls to release inmemory pages grabbed by atomic writes. o f2fs_ioc_abort_volatile_write - If transaction was failed, all the grabbed pages and data should be written. o f2fs_ioc_release_volatile_write - This is to enhance the performance of PERSIST mode in sqlite. In order to avoid huge memory consumption which causes OOM, this patch changes volatile writes to use normal dirty pages, instead blocked flushing to the disk as long as system does not suffer from memory pressure. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: don't need to call lock_op and lock_page for abortJaegeuk Kim2015-01-091-15/+20
| | | | | | We don't need to call lock_op and lock_page at the aborting path. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix wrong condition check to trigger f2fs_sync_fsJaegeuk Kim2015-01-091-1/+1
| | | | | | If there is not enough available memory, we need to trigger f2fs_sync_fs. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: remove checking dirty_exceedJaegeuk Kim2015-01-091-2/+0
| | | | | | | We don't need to force to write dirty_exceeded for f2fs_balance_fs_bg. This flag was only meaningful to write bypassing conditions. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* Merge branch 'akpm' (patches from Andrew)Linus Torvalds2015-01-0917-101/+155
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge misc fixes from Andrew Morton: "12 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed memcg: fix destination cgroup leak on task charges migration mm: memcontrol: switch soft limit default back to infinity mm/debug_pagealloc: remove obsolete Kconfig options vfs: renumber FMODE_NONOTIFY and add to uniqueness check arch/blackfin/mach-bf533/boards/stamp.c: add linux/delay.h ocfs2: fix the wrong directory passed to ocfs2_lookup_ino_from_name() when link file MAINTAINERS: update rydberg's addresses mm: protect set_page_dirty() from ongoing truncation mm: prevent endless growth of anon_vma hierarchy exit: fix race between wait_consider_task() and wait_task_zombie() ocfs2: remove bogus check in dlm_process_recovery_data
| * mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process ↵Vlastimil Babka2015-01-081-11/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | being killed Charles Shirron and Paul Cassella from Cray Inc have reported kswapd stuck in a busy loop with nothing left to balance, but kswapd_try_to_sleep() failing to sleep. Their analysis found the cause to be a combination of several factors: 1. A process is waiting in throttle_direct_reclaim() on pgdat->pfmemalloc_wait 2. The process has been killed (by OOM in this case), but has not yet been scheduled to remove itself from the waitqueue and die. 3. kswapd checks for throttled processes in prepare_kswapd_sleep(): if (waitqueue_active(&pgdat->pfmemalloc_wait)) { wake_up(&pgdat->pfmemalloc_wait); return false; // kswapd will not go to sleep } However, for a process that was already killed, wake_up() does not remove the process from the waitqueue, since try_to_wake_up() checks its state first and returns false when the process is no longer waiting. 4. kswapd is running on the same CPU as the only CPU that the process is allowed to run on (through cpus_allowed, or possibly single-cpu system). 5. CONFIG_PREEMPT_NONE=y kernel is used. If there's nothing to balance, kswapd encounters no voluntary preemption points and repeatedly fails prepare_kswapd_sleep(), blocking the process from running and removing itself from the waitqueue, which would let kswapd sleep. So, the source of the problem is that we prevent kswapd from going to sleep until there are processes waiting on the pfmemalloc_wait queue, and a process waiting on a queue is guaranteed to be removed from the queue only when it gets scheduled. This was done to make sure that no process is left sleeping on pfmemalloc_wait when kswapd itself goes to sleep. However, it isn't necessary to postpone kswapd sleep until the pfmemalloc_wait queue actually empties. To prevent processes from being left sleeping, it's actually enough to guarantee that all processes waiting on pfmemalloc_wait queue have been woken up by the time we put kswapd to sleep. This patch therefore fixes this issue by substituting 'wake_up' with 'wake_up_all' and removing 'return false' in the code snippet from prepare_kswapd_sleep() above. Note that if any process puts itself in the queue after this waitqueue_active() check, or after the wake up itself, it means that the process will also wake up kswapd - and since we are under prepare_to_wait(), the wake up won't be missed. Also we update the comment prepare_kswapd_sleep() to hopefully more clearly describe the races it is preventing. Fixes: 5515061d22f0 ("mm: throttle direct reclaimers if PF_MEMALLOC reserves are low and swap is backed by network storage") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Rik van Riel <riel@redhat.com> Cc: <stable@vger.kernel.org> [3.6+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * memcg: fix destination cgroup leak on task charges migrationVladimir Davydov2015-01-081-12/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are supposed to take one css reference per each memory page and per each swap entry accounted to a memory cgroup. However, during task charges migration we take a reference to the destination cgroup twice per each swap entry: first in mem_cgroup_do_precharge()->try_charge() and then in mem_cgroup_move_swap_account(), permanently leaking the destination cgroup. The hunk taking the second reference seems to be a leftover from the pre-00501b531c472 ("mm: memcontrol: rewrite charge API") era. Remove it to fix the leak. Fixes: e8ea14cc6ead (mm: memcontrol: take a css reference for each charged page) Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * mm: memcontrol: switch soft limit default back to infinityJohannes Weiner2015-01-081-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 3e32cb2e0a12 ("mm: memcontrol: lockless page counters") accidentally switched the soft limit default from infinity to zero, which turns all memcgs with even a single page into soft limit excessors and engages soft limit reclaim on all of them during global memory pressure. This makes global reclaim generally more aggressive, but also inverts the meaning of existing soft limit configurations where unset soft limits are usually more generous than set ones. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Vladimir Davydov <vdavydov@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * mm/debug_pagealloc: remove obsolete Kconfig optionsJoonsoo Kim2015-01-081-9/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These are obsolete since commit e30825f1869a ("mm/debug-pagealloc: prepare boottime configurable") was merged. So remove them. [pebolle@tiscali.nl: find obsolete Kconfig options] Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Paul Bolle <pebolle@tiscali.nl> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Dave Hansen <dave@sr71.net> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Jungsoo Son <jungsoo.son@lge.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * vfs: renumber FMODE_NONOTIFY and add to uniqueness checkDavid Drysdale2015-01-083-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix clashing values for O_PATH and FMODE_NONOTIFY on sparc. The clashing O_PATH value was added in commit 5229645bdc35 ("vfs: add nonconflicting values for O_PATH") but this can't be changed as it is user-visible. FMODE_NONOTIFY is only used internally in the kernel, but it is in the same numbering space as the other O_* flags, as indicated by the comment at the top of include/uapi/asm-generic/fcntl.h (and its use in fs/notify/fanotify/fanotify_user.c). So renumber it to avoid the clash. All of this has happened before (commit 12ed2e36c98a: "fanotify: FMODE_NONOTIFY and __O_SYNC in sparc conflict"), and all of this will happen again -- so update the uniqueness check in fcntl_init() to include __FMODE_NONOTIFY. Signed-off-by: David Drysdale <drysdale@google.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Jan Kara <jack@suse.cz> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * arch/blackfin/mach-bf533/boards/stamp.c: add linux/delay.hOleg Nesterov2015-01-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | build error arch/blackfin/mach-bf533/boards/stamp.c:834:2: error: implicit declaration of function 'mdelay' Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Wu Fengguang <fengguang.wu@intel.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * ocfs2: fix the wrong directory passed to ocfs2_lookup_ino_from_name() when ↵Xue jiufei2015-01-081-8/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | link file In ocfs2_link(), the parent directory inode passed to function ocfs2_lookup_ino_from_name() is wrong. Parameter dir is the parent of new_dentry not old_dentry. We should get old_dir from old_dentry and lookup old_dentry in old_dir in case another node remove the old dentry. With this change, hard linking works again, when paths are relative with at least one subdirectory. This is how the problem was reproducable: # mkdir a # mkdir b # touch a/test # ln a/test b/test ln: failed to create hard link `b/test' => `a/test': No such file or directory However when creating links in the same dir, it worked well. Now the link gets created. Fixes: 0e048316ff57 ("ocfs2: check existence of old dentry in ocfs2_link()") Signed-off-by: joyce.xue <xuejiufei@huawei.com> Reported-by: Szabo Aron - UBIT <aron@ubit.hu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Tested-by: Aron Szabo <aron@ubit.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * MAINTAINERS: update rydberg's addressesHenrik Rydberg2015-01-082-6/+7
| | | | | | | | | | | | | | | | | | | | My ISP finally gave up on the old mail address, so I am moving things over to bitmath.org instead. Also change the status fields to better reflect reality. Signed-off-by: Henrik Rydberg <rydberg@bitmath.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * mm: protect set_page_dirty() from ongoing truncationJohannes Weiner2015-01-083-42/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tejun, while reviewing the code, spotted the following race condition between the dirtying and truncation of a page: __set_page_dirty_nobuffers() __delete_from_page_cache() if (TestSetPageDirty(page)) page->mapping = NULL if (PageDirty()) dec_zone_page_state(page, NR_FILE_DIRTY); dec_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE); if (page->mapping) account_page_dirtied(page) __inc_zone_page_state(page, NR_FILE_DIRTY); __inc_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE); which results in an imbalance of NR_FILE_DIRTY and BDI_RECLAIMABLE. Dirtiers usually lock out truncation, either by holding the page lock directly, or in case of zap_pte_range(), by pinning the mapcount with the page table lock held. The notable exception to this rule, though, is do_wp_page(), for which this race exists. However, do_wp_page() already waits for a locked page to unlock before setting the dirty bit, in order to prevent a race where clear_page_dirty() misses the page bit in the presence of dirty ptes. Upgrade that wait to a fully locked set_page_dirty() to also cover the situation explained above. Afterwards, the code in set_page_dirty() dealing with a truncation race is no longer needed. Remove it. Reported-by: Tejun Heo <tj@kernel.org> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * mm: prevent endless growth of anon_vma hierarchyKonstantin Khlebnikov2015-01-082-1/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Constantly forking task causes unlimited grow of anon_vma chain. Each next child allocates new level of anon_vmas and links vma to all previous levels because pages might be inherited from any level. This patch adds heuristic which decides to reuse existing anon_vma instead of forking new one. It adds counter anon_vma->degree which counts linked vmas and directly descending anon_vmas and reuses anon_vma if counter is lower than two. As a result each anon_vma has either vma or at least two descending anon_vmas. In such trees half of nodes are leafs with alive vmas, thus count of anon_vmas is no more than two times bigger than count of vmas. This heuristic reuses anon_vmas as few as possible because each reuse adds false aliasing among vmas and rmap walker ought to scan more ptes when it searches where page is might be mapped. Link: http://lkml.kernel.org/r/20120816024610.GA5350@evergreen.ssec.wisc.edu Fixes: 5beb49305251 ("mm: change anon_vma linking to fix multi-process server scalability issue") [akpm@linux-foundation.org: fix typo, per Rik] Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Reported-by: Daniel Forrest <dan.forrest@ssec.wisc.edu> Tested-by: Michal Hocko <mhocko@suse.cz> Tested-by: Jerome Marchand <jmarchan@redhat.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: <stable@vger.kernel.org> [2.6.34+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * exit: fix race between wait_consider_task() and wait_task_zombie()Oleg Nesterov2015-01-081-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | wait_consider_task() checks EXIT_ZOMBIE after EXIT_DEAD/EXIT_TRACE and both checks can fail if we race with EXIT_ZOMBIE -> EXIT_DEAD/EXIT_TRACE change in between, gcc needs to reload p->exit_state after security_task_wait(). In this case ->notask_error will be wrongly cleared and do_wait() can hang forever if it was the last eligible child. Many thanks to Arne who carefully investigated the problem. Note: this bug is very old but it was pure theoretical until commit b3ab03160dfa ("wait: completely ignore the EXIT_DEAD tasks"). Before this commit "-O2" was probably enough to guarantee that compiler won't read ->exit_state twice. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Arne Goedeke <el@laramies.com> Tested-by: Arne Goedeke <el@laramies.com> Cc: <stable@vger.kernel.org> [3.15+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * ocfs2: remove bogus check in dlm_process_recovery_dataJoseph Qi2015-01-081-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | In dlm_process_recovery_data, only when dlm_new_lock failed the ret will be set to -ENOMEM. And in this case, newlock is definitely NULL. So test newlock is meaningless, remove it. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Alex Chen <alex.chen@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'rc-fixes' of ↵Linus Torvalds2015-01-081-8/+8
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild Pull kbuild fix from Michal Marek: "make mrproper / distclean stopped removing the generated debian/ directory in v3.16. This fixes it" * 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: kbuild: Fix removal of the debian/ directory
| * | kbuild: Fix removal of the debian/ directoryMichal Marek2015-01-021-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | scripts/Makefile.clean treats absolute path specially, but $(objtree)/debian is no longer an absolute path since 7e1c0477 (kbuild: Use relative path for $(objtree). Work around this by checking if the path starts with $(objtree)/. Reported-and-tested-by: Sedat Dilek <sedat.dilek@gmail.com> Fixes: 7e1c0477 (kbuild: Use relative path for $(objtree) Signed-off-by: Michal Marek <mmarek@suse.cz>
* | | Makefile: include arch/*/include/generated/uapi before .../generatedMichal Marek2015-01-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The introduction of the uapi directories in v3.7-rc1 moved some of the generated headers from arch/*/include/generated to the uapi directory, keeping the #include directives intact. This creates a problem when bisecting, because the unversioned files are not cleaned automatically by git and the compiler might include stale headers as a result. Instead of cleaning them in the Makefiles, promote arch/*/include/generated/uapi in the search path. Under normal circumstances, there is no overlap between this uapi subdirectory and its parent, so the include choices remain the same. We keep arch/*/include/generated/uapi in the USERINCLUDE variable so that it is usable standalone. Note that we cannot completely swap the order of the uapi and kernel-only directories, since the headers in include/uapi/asm-generic are meant to be wrapped by their include/asm-generic counterparts when building kernel code. Reported-by: "Nicholas A. Bellinger" <nab@linux-iscsi.org> Reported-by: David Drysdale <dmd@lurklurk.org> Signed-off-by: Michal Marek <mmarek@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
OpenPOWER on IntegriCloud