summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'lockdep-for-linus' of ↵Linus Torvalds2009-07-221-2/+2
|\ | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-lockdep * 'lockdep-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-lockdep: lockdep: Fix lockdep annotation for pipe_double_lock()
| * lockdep: Fix lockdep annotation for pipe_double_lock()Peter Zijlstra2009-07-221-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | The presumed use of the pipe_double_lock() routine is to lock 2 locks in a deadlock free way by ordering the locks by their address. However it fails to keep the specified lock classes in order and explicitly annotates a deadlock. Rectify this. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Miklos Szeredi <mszeredi@suse.cz> LKML-Reference: <1248163763.15751.11098.camel@twins>
* | Merge branch 'for-linus' of ↵Linus Torvalds2009-07-222-26/+26
|\ \ | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: fs/Kconfig: move nilfs2 out
| * | fs/Kconfig: move nilfs2 outRyusuke Konishi2009-07-142-26/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | fs/Kconfig file was split into individual fs/*/Kconfig files before nilfs was merged. I've found the current config entry of nilfs is tainting the work. Sorry, I didn't notice. This fixes the violation. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Alexey Dobriyan <adobriyan@gmail.com>
* | | NFSv4: Fix a problem whereby a buggy server can oops the kernelTrond Myklebust2009-07-212-5/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We just had a case in which a buggy server occasionally returns the wrong attributes during an OPEN call. While the client does catch this sort of condition in nfs4_open_done(), and causes the nfs4_atomic_open() to return -EISDIR, the logic in nfs_atomic_lookup() is broken, since it causes a fallback to an ordinary lookup instead of just returning the error. When the buggy server then returns a regular file for the fallback lookup, the VFS allows the open, and bad things start to happen, since the open file doesn't have any associated NFSv4 state. The fix is firstly to return the EISDIR/ENOTDIR errors immediately, and secondly to ensure that we are always careful when dereferencing the nfs_open_context state pointer. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | | NFSv4: Fix an NFSv4 mount regressionTrond Myklebust2009-07-213-22/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 008f55d0e019943323c20a03493a2ba5672a4cc8 (nfs41: recover lease in _nfs4_lookup_root) forces the state manager to always run on mount. This is a bug in the case of NFSv4.0, which doesn't require us to send a setclientid until we want to grab file state. In any case, this is completely the wrong place to be doing state management. Moving that code into nfs4_init_session... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | | NFSv4: Fix an Oops in nfs4_free_lock_stateTrond Myklebust2009-07-211-1/+1
| |/ |/| | | | | | | | | | | | | | | | | | | The oops http://www.kerneloops.org/raw.php?rawid=537858&msgid= appears to be due to the nfs4_lock_state->ls_state field being uninitialised. This happens if the call to nfs4_free_lock_state() is triggered at the end of nfs4_get_lock_state(). The fix is to move the initialisation of ls_state into the allocator. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | Merge branch 'for-linus' of ↵Linus Torvalds2009-07-201-1/+1
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: 9p: Fix incorrect parameters to v9fs_file_readn. 9p: Possible regression in p9_client_stat 9p: default 9p transport module fix
| * | 9p: Fix incorrect parameters to v9fs_file_readn.Abhishek Kulkarni2009-07-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Fix v9fs_vfs_readpage. The offset and size parameters to v9fs_file_readn were interchanged and hence passed incorrectly. Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* | | cifs: free nativeFileSystem field before allocating a new oneJeff Layton2009-07-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | ...otherwise, we'll leak this memory if we have to reconnect (e.g. after network failure). Signed-off-by: Jeff Layton <jlayton@redhat.com> CC: Stable <stable@kernel.org> Signed-off-by: Steve French <sfrench@us.ibm.com>
* | | Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6Steve French2009-07-1687-559/+432
|\ \ \ | |/ /
| * | Merge branch 'for-linus' of ↵Linus Torvalds2009-07-143-9/+14
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: dlm: free socket in error exit path dlm: fix plock use-after-free dlm: Fix uninitialised variable warning in lock.c
| | * | dlm: free socket in error exit pathCasey Dahlin2009-07-141-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the tcp_connect_to_sock() error exit path, the socket allocated at the top of the function was not being freed. Signed-off-by: Casey Dahlin <cdahlin@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
| | * | dlm: fix plock use-after-freeDavid Teigland2009-06-181-7/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a regression from the original addition of nfs lock support 586759f03e2e9031ac5589912a51a909ed53c30a. When a synchronous (non-nfs) plock completes, the waiting thread will wake up and free the op struct. This races with the user thread in dev_write() which goes on to read the op's callback field to check if the lock is async and needs a callback. This check can happen on the freed op. The fix is to note the callback value before the op can be freed. Signed-off-by: David Teigland <teigland@redhat.com>
| | * | dlm: Fix uninitialised variable warning in lock.cSteven Whitehouse2009-06-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CC [M] fs/dlm/lock.o fs/dlm/lock.c: In function ‘find_rsb’: fs/dlm/lock.c:438: warning: ‘r’ may be used uninitialized in this function Since r is used on the error path to set r_ret, set it to NULL. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
| * | | Merge branch 'tracing-fixes-for-linus' of ↵Linus Torvalds2009-07-141-4/+4
| |\ \ \ | | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tracing/function-profiler: do not free per cpu variable stat tracing/events: Move TRACE_SYSTEM outside of include guard
| | * | tracing/events: Move TRACE_SYSTEM outside of include guardLi Zefan2009-07-131-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If TRACE_INCLDUE_FILE is defined, <trace/events/TRACE_INCLUDE_FILE.h> will be included and compiled, otherwise it will be <trace/events/TRACE_SYSTEM.h> So TRACE_SYSTEM should be defined outside of #if proctection, just like TRACE_INCLUDE_FILE. Imaging this scenario: #include <trace/events/foo.h> -> TRACE_SYSTEM == foo ... #include <trace/events/bar.h> -> TRACE_SYSTEM == bar ... #define CREATE_TRACE_POINTS #include <trace/events/foo.h> -> TRACE_SYSTEM == bar !!! and then bar.h will be included and compiled. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A5A9CF1.2010007@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | Merge branch 'for_linus' of ↵Linus Torvalds2009-07-1310-348/+232
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: jbd2: fix race between write_metadata_buffer and get_write_access ext4: Fix ext4_mb_initialize_context() to initialize all fields ext4: fix null handler of ioctls in no journal mode ext4: Fix buffer head reference leak in no-journal mode ext4: Move __ext4_journalled_writepage() to avoid forward declaration ext4: Fix mmap/truncate race when blocksize < pagesize && !nodellaoc ext4: Fix mmap/truncate race when blocksize < pagesize && delayed allocation ext4: Don't look at buffer_heads outside i_size. ext4: Fix goal inum check in the inode allocator ext4: fix no journal corruption with locale-gen ext4: Calculate required journal credits for inserting an extent properly ext4: Fix truncation of symlinks after failed write jbd2: Fix a race between checkpointing code and journal_get_write_access() ext4: Use rcu_barrier() on module unload. ext4: naturally align struct ext4_allocation_request ext4: mark several more functions in mballoc.c as noinline ext4: Fix potential reclaim deadlock when truncating partial block jbd2: Remove GFP_ATOMIC kmalloc from inside spinlock critical region ext4: Fix type warning on 64-bit platforms in tracing events header
| | * | | jbd2: fix race between write_metadata_buffer and get_write_accessdingdinghua2009-07-131-9/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function jbd2_journal_write_metadata_buffer() calls jbd_unlock_bh_state(bh_in) too early; this could potentially allow another thread to call get_write_access on the buffer head, modify the data, and dirty it, and allowing the wrong data to be written into the journal. Fortunately, if we lose this race, the only time this will actually cause filesystem corruption is if there is a system crash or other unclean shutdown of the system before the next commit can take place. Signed-off-by: dingdinghua <dingdinghua85@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * | | ext4: Fix ext4_mb_initialize_context() to initialize all fieldsTheodore Ts'o2009-07-131-18/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pavel Roskin pointed out that kmemcheck indicated that ext4_mb_store_history() was accessing uninitialized values of ac->ac_tail and ac->ac_buddy leading to garbage in the mballoc history. Fix this by initializing the entire structure to all zeros first. Also, two fields were getting doubly initialized by the caller of ext4_mb_initialize_context, so remove them for efficiency's sake. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * | | ext4: fix null handler of ioctls in no journal modePeng Tao2009-07-131-8/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The EXT4_IOC_GROUP_ADD and EXT4_IOC_GROUP_EXTEND ioctls should not flush the journal in no_journal mode. Otherwise, running resize2fs on a mounted no_journal partition triggers the following error messages: BUG: unable to handle kernel NULL pointer dereference at 00000014 IP: [<c039d282>] _spin_lock+0x8/0x19 *pde = 00000000 Oops: 0002 [#1] SMP Signed-off-by: Peng Tao <bergwolf@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * | | ext4: Fix buffer head reference leak in no-journal modeCurt Wohlgemuth2009-07-133-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We found a problem with buffer head reference leaks when using an ext4 partition without a journal. In particular, calls to ext4_forget() would not to a brelse() on the input buffer head, which will cause pages they belong to to not be reclaimable. Further investigation showed that all places where ext4_journal_forget() and ext4_journal_revoke() are called are subject to the same problem. The patch below changes __ext4_journal_forget/__ext4_journal_revoke to do an explicit release of the buffer head when the journal handle isn't valid. Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * | | ext4: Move __ext4_journalled_writepage() to avoid forward declarationAneesh Kumar K.V2009-06-141-58/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In addition, fix two unused variable warnings. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * | | ext4: Fix mmap/truncate race when blocksize < pagesize && !nodellaocAneesh Kumar K.V2009-06-141-177/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes the mmap/truncate race that was fixed for delayed allocation by merging ext4_{journalled,normal,da}_writepage() into ext4_writepage(). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * | | ext4: Fix mmap/truncate race when blocksize < pagesize && delayed allocationAneesh Kumar K.V2009-06-141-15/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is possible to see buffer_heads which are not mapped in the writepage callback in the following scneario (where the fs blocksize is 1k and the page size is 4k): 1) truncate(f, 1024) 2) mmap(f, 0, 4096) 3) a[0] = 'a' 4) truncate(f, 4096) 5) writepage(...) Now if we get a writepage callback immediately after (4) and before an attempt to write at any other offset via mmap address (which implies we are yet to get a pagefault and do a get_block) what we would have is the page which is dirty have first block allocated and the other three buffer_heads unmapped. In the above case the writepage should go ahead and try to write the first blocks and clear the page_dirty flag. Further attempts to write to the page will again create a fault and result in allocating blocks and marking page dirty. If we don't write any other offset via mmap address we would still have written the first block to the disk and rest of the space will be considered as a hole. So to address this, we change all of the places where we look for delayed, unmapped, or unwritten buffer heads, and only check for delayed or unwritten buffer heads instead. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * | | ext4: Don't look at buffer_heads outside i_size.Aneesh Kumar K.V2009-06-041-12/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Buffer heads outside i_size will be unmapped. So when we are doing "walk_page_buffers" limit ourself to i_size. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Josef Bacik <jbacik@redhat.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> ----
| | * | | ext4: Fix goal inum check in the inode allocatorJohann Lombardi2009-07-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The goal inode is specificed by inode number which belongs to [1; s_inodes_count]. Signed-off-by: Johann Lombardi <johann@sun.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * | | ext4: fix no journal corruption with locale-genTheodore Ts'o2009-07-081-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If there is no journal, ext4_should_writeback_data() should return TRUE. This will fix ext4_set_aops() to set ext4_da_ops in the case of delayed allocation; otherwise ext4_journaled_aops gets used by default, which doesn't handle delayed allocation properly. The advantage of using ext4_should_writeback_data() approach is that it should handle nobh better as well. Thanks to Curt Wohlgemuth for investigating this problem, and Aneesh Kumar for suggesting this approach. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * | | ext4: Calculate required journal credits for inserting an extent properlyAneesh Kumar K.V2009-07-051-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we have space in the extent tree leaf node we should be able to insert the extent with much less journal credits. The code was doing proper calculation but missed a return statement. Reported-by: Andreas Dilger <adilger@sun.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * | | ext4: Fix truncation of symlinks after failed writeJan Kara2009-07-131-13/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Contents of long symlinks is written via standard write methods. So when the write fails, we add inode to orphan list. But symlinks don't have .truncate method defined so nobody properly removes them from the on disk orphan list. Fix this by calling ext4_truncate() directly instead of calling vmtruncate() (which is saner anyway since we don't need anything vmtruncate() does except from calling .truncate in these paths). We also add inode to orphan list only if ext4_can_truncate() is true (currently, it can be false for symlinks when there are no blocks allocated) - otherwise orphan list processing will complain and ext4_truncate() will not remove inode from on-disk orphan list. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * | | jbd2: Fix a race between checkpointing code and journal_get_write_access()Jan Kara2009-07-131-33/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following race can happen: CPU1 CPU2 checkpointing code checks the buffer, adds it to an array for writeback do_get_write_access() ... lock_buffer() unlock_buffer() flush_batch() submits the buffer for IO __jbd2_journal_file_buffer() So a buffer under writeout is returned from do_get_write_access(). Since the filesystem code relies on the fact that journaled buffers cannot be written out, it does not take the buffer lock and so it can modify buffer while it is under writeout. That can lead to a filesystem corruption if we crash at the right moment. We fix the problem by clearing the buffer dirty bit under buffer_lock even if the buffer is on BJ_None list. Actually, we clear the dirty bit regardless the list the buffer is in and warn about the fact if the buffer is already journalled. Thanks for spotting the problem goes to dingdinghua <dingdinghua85@gmail.com>. Reported-by: dingdinghua <dingdinghua85@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * | | ext4: Use rcu_barrier() on module unload.Jesper Dangaard Brouer2009-07-051-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ext4 module uses rcu_call() thus it should use rcu_barrier()on module unload. The kmem cache ext4_pspace_cachep is sometimes free'ed using call_rcu() callbacks. Thus, we must wait for completion of call_rcu() before doing kmem_cache_destroy(). Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * | | ext4: naturally align struct ext4_allocation_requestEric Sandeen2009-07-131-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As Ted noted, the ext4_allocation_request isn't well aligned. Looking at it with pahole we're wasting space on 64-bit arches: struct ext4_allocation_request { struct inode * inode; /* 0 8 */ ext4_lblk_t logical; /* 8 4 */ /* XXX 4 bytes hole, try to pack */ ext4_fsblk_t goal; /* 16 8 */ ext4_lblk_t lleft; /* 24 4 */ /* XXX 4 bytes hole, try to pack */ ext4_fsblk_t pleft; /* 32 8 */ ext4_lblk_t lright; /* 40 4 */ /* XXX 4 bytes hole, try to pack */ ext4_fsblk_t pright; /* 48 8 */ unsigned int len; /* 56 4 */ unsigned int flags; /* 60 4 */ /* --- cacheline 1 boundary (64 bytes) --- */ /* size: 64, cachelines: 1, members: 9 */ /* sum members: 52, holes: 3, sum holes: 12 */ }; Grouping 32-bit members together closes these holes and shrinks the structure by 12 bytes. which is important since ext4 can get on the hairy edge of stack overruns. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * | | ext4: mark several more functions in mballoc.c as noinlineEric Sandeen2009-07-051-8/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ted noticed a stack-deep callchain through writepages->ext4_mb_regular_allocator->ext4_mb_init_cache->submit_bh ... With all the static functions in mballoc.c, gcc helpfully inlines for us, and we get something like this: ext4_mb_regular_allocator (232 bytes stack) ext4_mb_init_cache (232 bytes stack) submit_bh (starts 464 deeper) the 2 ext4 functions here get several others inlined; by telling gcc not to inline them, we can save stack space for when we head off into submit_bh land and associated block layer callchains. The following noinlined functions are only called once, so this won't impact any other callchains: ext4_mb_regular_allocator (104) (was 232) ext4_mb_find_by_goal (56) (noinlined) ext4_mb_init_group (24) (noinlined) ext4_mb_init_cache (136) (was 232) ext4_mb_generate_buddy (88) (noinlined) ext4_mb_generate_from_pa (40) (noinlined) submit_bh ext4_mb_simple_scan_group (24) (noinlined) ext4_mb_scan_aligned (56) (noinlined) ext4_mb_complex_scan_group (40) (noinlined) ext4_mb_try_best_found (24) (noinlined) now when we head off into submit_bh() we're only 264 bytes deeper in stack than when we entered ext4_mb_regular_allocator() (vs. 464 bytes before). Every 200 bytes helps. :) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * | | ext4: Fix potential reclaim deadlock when truncating partial blockTheodore Ts'o2009-07-051-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ext4_block_truncate_page() function previously called grab_cache_page(), which called find_or_create_page() with the __GFP_FS flag potentially set. This could cause a deadlock if the system is low on memory and it attempts a memory reclaim, which could potentially call back into ext4. So we need to call find_or_create_page() directly, and remove the __GFP_FP flag to avoid this potential deadlock. Thanks to Roland Dreier for reporting a lockdep warning which showed this problem. [20786.363249] ================================= [20786.363257] [ INFO: inconsistent lock state ] [20786.363265] 2.6.31-2-generic #14~rbd4gitd960eea9 [20786.363270] --------------------------------- [20786.363276] inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage. [20786.363285] http/8397 [HC0[0]:SC0[0]:HE1:SE1] takes: [20786.363291] (jbd2_handle){+.+.?.}, at: [<ffffffff812008bb>] jbd2_journal_start+0xdb/0x150 [20786.363314] {IN-RECLAIM_FS-W} state was registered at: [20786.363320] [<ffffffff8108bef6>] mark_irqflags+0xc6/0x1a0 [20786.363334] [<ffffffff8108d347>] __lock_acquire+0x287/0x430 [20786.363345] [<ffffffff8108d595>] lock_acquire+0xa5/0x150 [20786.363355] [<ffffffff812008da>] jbd2_journal_start+0xfa/0x150 [20786.363365] [<ffffffff811d98a8>] ext4_journal_start_sb+0x58/0x90 [20786.363377] [<ffffffff811cce85>] ext4_delete_inode+0xc5/0x2c0 [20786.363389] [<ffffffff81146fa3>] generic_delete_inode+0xd3/0x1a0 [20786.363401] [<ffffffff81147095>] generic_drop_inode+0x25/0x30 [20786.363411] [<ffffffff81145ce2>] iput+0x62/0x70 [20786.363420] [<ffffffff81142878>] dentry_iput+0x98/0x110 [20786.363429] [<ffffffff81142a00>] d_kill+0x50/0x80 [20786.363438] [<ffffffff811444c5>] dput+0x95/0x180 [20786.363447] [<ffffffff8120de4b>] ecryptfs_d_release+0x2b/0x70 [20786.363459] [<ffffffff81142978>] d_free+0x28/0x60 [20786.363468] [<ffffffff81142a18>] d_kill+0x68/0x80 [20786.363477] [<ffffffff81142ad3>] prune_one_dentry+0xa3/0xc0 [20786.363487] [<ffffffff81142d61>] __shrink_dcache_sb+0x271/0x290 [20786.363497] [<ffffffff81142e89>] prune_dcache+0x109/0x1b0 [20786.363506] [<ffffffff81142f6f>] shrink_dcache_memory+0x3f/0x50 [20786.363516] [<ffffffff810f6d3d>] shrink_slab+0x12d/0x190 [20786.363527] [<ffffffff810f97d7>] balance_pgdat+0x4d7/0x640 [20786.363537] [<ffffffff810f9a57>] kswapd+0x117/0x170 [20786.363546] [<ffffffff810773ce>] kthread+0x9e/0xb0 [20786.363558] [<ffffffff8101430a>] child_rip+0xa/0x20 [20786.363569] [<ffffffffffffffff>] 0xffffffffffffffff [20786.363598] irq event stamp: 15997 [20786.363603] hardirqs last enabled at (15997): [<ffffffff81125f9d>] kmem_cache_alloc+0xfd/0x1a0 [20786.363617] hardirqs last disabled at (15996): [<ffffffff81125f01>] kmem_cache_alloc+0x61/0x1a0 [20786.363628] softirqs last enabled at (15966): [<ffffffff810631ea>] __do_softirq+0x14a/0x220 [20786.363641] softirqs last disabled at (15861): [<ffffffff8101440c>] call_softirq+0x1c/0x30 [20786.363651] [20786.363653] other info that might help us debug this: [20786.363660] 3 locks held by http/8397: [20786.363665] #0: (&sb->s_type->i_mutex_key#8){+.+.+.}, at: [<ffffffff8112ed24>] do_truncate+0x64/0x90 [20786.363685] #1: (&sb->s_type->i_alloc_sem_key#5){+++++.}, at: [<ffffffff81147f90>] notify_change+0x250/0x350 [20786.363707] #2: (jbd2_handle){+.+.?.}, at: [<ffffffff812008bb>] jbd2_journal_start+0xdb/0x150 [20786.363724] [20786.363726] stack backtrace: [20786.363734] Pid: 8397, comm: http Tainted: G C 2.6.31-2-generic #14~rbd4gitd960eea9 [20786.363741] Call Trace: [20786.363752] [<ffffffff8108ad7c>] print_usage_bug+0x18c/0x1a0 [20786.363763] [<ffffffff8108b0c0>] ? check_usage_backwards+0x0/0xb0 [20786.363773] [<ffffffff8108bad2>] mark_lock_irq+0xf2/0x280 [20786.363783] [<ffffffff8108bd97>] mark_lock+0x137/0x1d0 [20786.363793] [<ffffffff8108c03c>] mark_held_locks+0x6c/0xa0 [20786.363803] [<ffffffff8108c11f>] lockdep_trace_alloc+0xaf/0xe0 [20786.363813] [<ffffffff810efbac>] __alloc_pages_nodemask+0x7c/0x180 [20786.363824] [<ffffffff810e9411>] ? find_get_page+0x91/0xf0 [20786.363835] [<ffffffff8111d3b7>] alloc_pages_current+0x87/0xd0 [20786.363845] [<ffffffff810e9827>] __page_cache_alloc+0x67/0x70 [20786.363856] [<ffffffff810eb7df>] find_or_create_page+0x4f/0xb0 [20786.363867] [<ffffffff811cb3be>] ext4_block_truncate_page+0x3e/0x460 [20786.363876] [<ffffffff812008da>] ? jbd2_journal_start+0xfa/0x150 [20786.363885] [<ffffffff812008bb>] ? jbd2_journal_start+0xdb/0x150 [20786.363895] [<ffffffff811c6415>] ? ext4_meta_trans_blocks+0x75/0xf0 [20786.363905] [<ffffffff811e8d8b>] ext4_ext_truncate+0x1bb/0x1e0 [20786.363916] [<ffffffff811072c5>] ? unmap_mapping_range+0x75/0x290 [20786.363926] [<ffffffff811ccc28>] ext4_truncate+0x498/0x630 [20786.363938] [<ffffffff8129b4ce>] ? _raw_spin_unlock+0x5e/0xb0 [20786.363947] [<ffffffff81107306>] ? unmap_mapping_range+0xb6/0x290 [20786.363957] [<ffffffff8108c3ad>] ? trace_hardirqs_on+0xd/0x10 [20786.363966] [<ffffffff811ffe58>] ? jbd2_journal_stop+0x1f8/0x2e0 [20786.363976] [<ffffffff81107690>] vmtruncate+0xb0/0x110 [20786.363986] [<ffffffff81147c05>] inode_setattr+0x35/0x170 [20786.363995] [<ffffffff811c9906>] ext4_setattr+0x186/0x370 [20786.364005] [<ffffffff81147eab>] notify_change+0x16b/0x350 [20786.364014] [<ffffffff8112ed30>] do_truncate+0x70/0x90 [20786.364021] [<ffffffff8112f48b>] T.657+0xeb/0x110 [20786.364021] [<ffffffff8112f4be>] sys_ftruncate+0xe/0x10 [20786.364021] [<ffffffff81013132>] system_call_fastpath+0x16/0x1b Reported-by: Roland Dreier <roland@digitalvampire.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | * | | jbd2: Remove GFP_ATOMIC kmalloc from inside spinlock critical regionTheodore Ts'o2009-06-201-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix jbd2_dev_to_name(), a function used when pretty-printting jbd2 and ext4 tracepoints. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6Linus Torvalds2009-07-131-1/+1
| |\ \ \ \ | | |_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: wm97xx_batery: replace driver_data with dev_get_drvdata() omap: video: remove direct access of driver_data Sound: remove direct access of driver_data driver model: fix show/store prototypes in doc. Firmware: firmware_class, fix lock imbalance Driver Core: remove BUS_ID_SIZE sparc: remove driver-core BUS_ID_SIZE partitions: fix broken uevent_suppress conversion devres: WARN() and return, don't crash on device_del() of uninitialized device
| | * | | partitions: fix broken uevent_suppress conversionHeiko Carstens2009-07-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git commit f67f129e "Driver core: implement uevent suppress in kobject" contains this chunk for fs/partitions/check.c: /* suppress uevent if the disk supresses it */ - if (!ddev->uevent_suppress) + if (!dev_get_uevent_suppress(pdev)) kobject_uevent(&pdev->kobj, KOBJ_ADD); However that should have been - if (!ddev->uevent_suppress) + if (!dev_get_uevent_suppress(ddev)) Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Ming Lei <tom.leiming@gmail.com> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * | | | AFS: Fix compilation warningArtem Bityutskiy2009-07-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the following warning: fs/afs/dir.c: In function 'afs_d_revalidate': fs/afs/dir.c:567: warning: 'fid.vnode' may be used uninitialized in this function fs/afs/dir.c:567: warning: 'fid.unique' may be used uninitialized in this function by marking the 'fid' variable as an uninitialized_var. The problem is that gcc doesn't always manage to work out that fid is always set on the path through the function that uses it. Cc: linux-afs@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | headers: smp_lock.h reduxAlexey Dobriyan2009-07-1246-29/+17
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Remove smp_lock.h from files which don't need it (including some headers!) * Add smp_lock.h to files which do need it * Make smp_lock.h include conditional in hardirq.h It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT This will make hardirq.h inclusion cheaper for every PREEMPT=n config (which includes allmodconfig/allyesconfig, BTW) Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | Revert "fuse: Fix build error" as unnecessaryLinus Torvalds2009-07-112-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 097041e576ee3a50d92dd643ee8ca65bf6a62e21. Trond had a better fix, which is the parent of this one ("Fix compile error due to congestion_wait() changes") Requested-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | isofs: fix Joliet regressionBartlomiej Zolnierkiewicz2009-07-101-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 5404ac8e4418ab3d254950ee4f9bcafc1da20b4a ("isofs: cleanup mount option processing") missed conversion of joliet option flag resulting in non-working Joliet support. CC: walt <w41ter@gmail.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6Linus Torvalds2009-07-106-76/+92
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'linux-next' of git://git.infradead.org/ubifs-2.6: UBIFS: fix corruption dump UBIFS: clean up free space checking UBIFS: small amendments in the LEB scanning code UBIFS: dump a little more in case of corruptions MAINTAINERS: update ahunter's e-mail address UBIFS: allow more than one volume to be mounted UBIFS: fix assertion warning UBIFS: minor spelling and grammar fixes UBIFS: fix 64-bit divisions in debug print UBIFS: few spelling fixes UBIFS: set write-buffer timout to 3-5 seconds UBIFS: slightly optimize write-buffer timer usage UBIFS: improve debugging messaged UBIFS: fix integer overflow warning
| | * | | UBIFS: fix corruption dumpArtem Bityutskiy2009-07-091-2/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the 'ubifs_recover_leb()' function, when we find corrupted empty space, we dump 8K starting from the offset where the last node ends. This is OK if the corrupted empty space is somewhere near that offset. But if the corruption is far at the end of the LEB, we will dump all 0xFF bytes and complitely ignore the interesting data. This is observed on a PPC ("kilauea") with NOR flash. This patch changes the behavior and teaches UBIFS to print only interesting data. I.e., now we find where corruption starts and start dumping from that offset. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reviewed-by: Adrian Hunter <Adrian.Hunter@nokia.com>
| | * | | UBIFS: clean up free space checkingArtem Bityutskiy2009-07-091-19/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | recovery.c has 'is_empty()' helper and it is better to use this helper instead of re-implementing it in several places. This patch does this and removes some amount of unneeded code. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reviewed-by: Adrian Hunter <Adrian.Hunter@nokia.com>
| | * | | UBIFS: small amendments in the LEB scanning codeArtem Bityutskiy2009-07-093-11/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes few minor things I've spotted while going through code: 1. Better document return codes 2. If 'ubifs_scan_a_node()' returns some thing we do not expect, treat this as an error. 3. Try to do recovery only when 'ubifs_scan()' returns %-EUCLEAN, not on any error. 4. If empty space starts at a non-aligned address, print a message. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reviewed-by: Adrian Hunter <Adrian.Hunter@nokia.com>
| | * | | UBIFS: dump a little more in case of corruptionsArtem Bityutskiy2009-07-091-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case of corruptions, dump 8192 bytes instead of 4096. The largest node is 4096+ bytes, so it is better to see a node boundary, which is not always possible when only 4096 bytes are printed. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reviewed-by: Adrian Hunter <Adrian.Hunter@nokia.com>
| | * | | UBIFS: allow more than one volume to be mountedDaniel Mack2009-07-051-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | UBIFS uses a bdi device per volume, but does not care to hand out unique names to each of them. This causes an error when trying to mount more than one volumes. Append the UBI volume and device ID to avoid that. [Amended a bit by Artem Bityutskiy] Signed-off-by: Daniel Mack <daniel@caiaq.de> Cc: Artem Bityutskiy <dedekind@infradead.org> Cc: Adrian Hunter <ext-adrian.hunter@nokia.com> Cc: linux-mtd@lists.infradead.org Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
| | * | | UBIFS: fix assertion warningArtem Bityutskiy2009-07-051-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When debugging is enabled and an unclean file-system is mounter, the following assertion is triggered: UBIFS assert failed in ubifs_tnc_start_commit at 805 (pid 1081) Call Trace: [cfaffbd0] [c0006cf8] show_stack+0x44/0x16c (unreliable) [cfaffc10] [c011b738] ubifs_tnc_start_commit+0xbb8/0xd18 [cfaffc90] [c0112670] do_commit+0x150/0xa44 [cfaffd10] [c0125234] ubifs_rcvry_gc_commit+0xd8/0x544 [cfaffd60] [c0100e9c] ubifs_fill_super+0xe78/0x15f8 [cfaffdf0] [c0102118] ubifs_get_sb+0x20c/0x320 [cfaffe70] [c007f764] vfs_kern_mount+0x58/0xe0 [cfaffe90] [c007f83c] do_kern_mount+0x40/0xf8 [cfaffeb0] [c0095c24] do_mount+0x550/0x758 [cfafff10] [c0095ebc] sys_mount+0x90/0xe0 [cfafff40] [c000ed4c] ret_from_syscall+0x0/0x3c The reason is that we initialize 'c->min_leb_idx' early, and do not re-calculate it after journal replay. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
| | * | | UBIFS: minor spelling and grammar fixesAdrian Hunter2009-07-052-3/+3
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>
OpenPOWER on IntegriCloud