| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
The ocfs2 directory index updates two blocks when we remove an entry -
the dx root and the dx leaf. OCFS2_DELETE_INODE_CREDITS was only
accounting for the dx leaf. This shows up when ocfs2_delete_inode()
runs out of credits in jbd2_journal_dirty_metadata() at
"J_ASSERT_JH(jh, handle->h_buffer_credits > 0);".
The test that caught this was running dirop_file_racer from the
ocfs2-test suite with a 250-character filename PREFIX. Run on a 512B
blocksize, it forces the orphan dir index to grow large enough to
trigger.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
With indexed dir enabled, now we use ocfs2_dir_lookup_result to
wrap all the bh used for dir. So remove the 2 unused variables.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
In ocfs2_dentry_attach_lock(), if unable to get the dentry lock, we need to
call iput(inode) because a failure here means no d_instantiate(), which means
the normally matching iput() will not be called during dput(dentry).
This patch fixes the oops that accompanies the following message:
(3996,1):dlm_empty_lockres:2708 ERROR: lockres W00000000000000000a1046b06a4382 still has local locks!
kernel BUG in dlm_empty_lockres at /rpmbuild/smushran/BUILD/ocfs2-1.4.2/fs/ocfs2/dlm/dlmmaster.c:2709!
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
The old %llu vs u64 battle. Cast them correctly.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
fs/ocfs2/dir.c: In function ‘ocfs2_extend_dir’:
fs/ocfs2/dir.c:2700: warning: ‘ret’ may be used uninitialized in this function
fs/ocfs2/suballoc.c: In function ‘ocfs2_get_suballoc_slot_bit’:
fs/ocfs2/suballoc.c:2216: warning: comparison is always true due to limited range of data type
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
In ocfs2_expand_inline_dir, we calculate whether we need 1 extra
cluster if we can't store the dx inline the root and save it in
dx_alloc. So add it when we call ocfs2_reserve_clusters.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
->real_parent is the parent. ->parent may be the tracer.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: Greg Ungerer <gerg@snapgear.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
The Committed_AS field can underflow in certain situations:
> # while true; do cat /proc/meminfo | grep _AS; sleep 1; done | uniq -c
> 1 Committed_AS: 18446744073709323392 kB
> 11 Committed_AS: 18446744073709455488 kB
> 6 Committed_AS: 35136 kB
> 5 Committed_AS: 18446744073709454400 kB
> 7 Committed_AS: 35904 kB
> 3 Committed_AS: 18446744073709453248 kB
> 2 Committed_AS: 34752 kB
> 9 Committed_AS: 18446744073709453248 kB
> 8 Committed_AS: 34752 kB
> 3 Committed_AS: 18446744073709320960 kB
> 7 Committed_AS: 18446744073709454080 kB
> 3 Committed_AS: 18446744073709320960 kB
> 5 Committed_AS: 18446744073709454080 kB
> 6 Committed_AS: 18446744073709320960 kB
Because NR_CPUS can be greater than 1000 and meminfo_proc_show() does
not check for underflow.
But NR_CPUS proportional isn't good calculation. In general,
possibility of lock contention is proportional to the number of online
cpus, not theorical maximum cpus (NR_CPUS).
The current kernel has generic percpu-counter stuff. using it is right
way. it makes code simplify and percpu_counter_read_positive() don't
make underflow issue.
Reported-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Eric B Munson <ebmunson@us.ibm.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: <stable@kernel.org> [All kernel versions]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
This fixes the problem introduced by commit 3bfacef412 (get rid of
special-casing the /sbin/loader on alpha): osf/1 ecoff binary segfaults
when binfmt_aout built as module. That happens because aout binary
handler gets on the top of the binfmt list due to late registration, and
kernel attempts to execute the binary without preparatory work that must
be done by binfmt_loader.
Fixed by changing the registration order of the default binfmt handlers
using list_add_tail() and introducing insert_binfmt() function which
places new handler on the top of the binfmt list. This might be generally
useful for installing arch-specific frontends for default handlers or just
for overriding them.
Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Richard Henderson <rth@twiddle.net
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
The intention of commit aae8679b0ebcaa92f99c1c3cb0cd651594a43915
("pagemap: fix bug in add_to_pagemap, require aligned-length reads of
/proc/pid/pagemap") was to force reads of /proc/pid/pagemap to be a
multiple of 8 bytes, but now it allows to read 0 bytes, which actually
puts some data to user's buffer. According to POSIX, if count is zero,
read() should return zero and has no other results.
Signed-off-by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
Cc: Thomas Tuttle <ttuttle@google.com>
Acked-by: Matt Mackall <mpm@selenic.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Change page_mkwrite to allow implementations to return with the page
locked, and also change it's callers (in page fault paths) to hold the
lock until the page is marked dirty. This allows the filesystem to have
full control of page dirtying events coming from the VM.
Rather than simply hold the page locked over the page_mkwrite call, we
call page_mkwrite with the page unlocked and allow callers to return with
it locked, so filesystems can avoid LOR conditions with page lock.
The problem with the current scheme is this: a filesystem that wants to
associate some metadata with a page as long as the page is dirty, will
perform this manipulation in its ->page_mkwrite. It currently then must
return with the page unlocked and may not hold any other locks (according
to existing page_mkwrite convention).
In this window, the VM could write out the page, clearing page-dirty. The
filesystem has no good way to detect that a dirty pte is about to be
attached, so it will happily write out the page, at which point, the
filesystem may manipulate the metadata to reflect that the page is no
longer dirty.
It is not always possible to perform the required metadata manipulation in
->set_page_dirty, because that function cannot block or fail. The
filesystem may need to allocate some data structure, for example.
And the VM cannot mark the pte dirty before page_mkwrite, because
page_mkwrite is allowed to fail, so we must not allow any window where the
page could be written to if page_mkwrite does fail.
This solution of holding the page locked over the 3 critical operations
(page_mkwrite, setting the pte dirty, and finally setting the page dirty)
closes out races nicely, preventing page cleaning for writeout being
initiated in that window. This provides the filesystem with a strong
synchronisation against the VM here.
- Sage needs this race closed for ceph filesystem.
- Trond for NFS (http://bugzilla.kernel.org/show_bug.cgi?id=12913).
- I need it for fsblock.
- I suspect other filesystems may need it too (eg. btrfs).
- I have converted buffer.c to the new locking. Even simple block allocation
under dirty pages might be susceptible to i_size changing under partial page
at the end of file (we also have a buffer.c-side problem here, but it cannot
be fixed properly without this patch).
- Other filesystems (eg. NFS, maybe btrfs) will need to change their
page_mkwrite functions themselves.
[ This also moves page_mkwrite another step closer to fault, which should
eventually allow page_mkwrite to be moved into ->fault, and thus avoiding a
filesystem calldown and page lock/unlock cycle in __do_fault. ]
[akpm@linux-foundation.org: fix derefs of NULL ->mapping]
Cc: Sage Weil <sage@newdream.net>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |_|_|_|/
|/| | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Fix an obvious incorrect return status in autofs4_mount_busy().
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This warning shows up on 64 bit builds:
fs/ecryptfs/inode.c:693: warning: comparison of distinct pointer types
lacks a cast
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
fs/ecryptfs/inode.c:670: warning: format '%d' expects type 'int', but argument 3 has type 'size_t'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Cc: Dustin Kirkland <kirkland@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
st driver uses blk_rq_map_user() in order to just build a request out
of page frames. In this case, map_data->offset is a non zero value and
iov[0].iov_base is NULL. We need to increase nr_pages for that.
Cc: stable@kernel.org
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: look for acls during btrfs_read_locked_inode
Btrfs: fix acl caching
Btrfs: Fix a bunch of printk() warnings.
Btrfs: Fix a trivial warning using max() of u64 vs ULL.
Btrfs: remove unused btrfs_bit_radix slab
Btrfs: ratelimit IO error printks
Btrfs: remove #if 0 code
Btrfs: When shrinking, only update disk size on success
Btrfs: fix deadlocks and stalls on dead root removal
Btrfs: fix fallocate deadlock on inode extent lock
Btrfs: kill btrfs_cache_create
Btrfs: don't export symbols
Btrfs: simplify makefile
Btrfs: try to keep a healthy ratio of metadata vs data block groups
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
This changes btrfs_read_locked_inode() to peek ahead in the btree for acl items.
If it is certain a given inode has no acls, it will set the in memory acl
fields to null to avoid acl lookups completely.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Linus noticed the btrfs code to cache acls wasn't properly caching
a NULL acl when the inode didn't have any acls. This meant the common
case of no acls resulted in expensive btree searches every time the
kernel checked permissions (which is quite often).
This is a modified version of Linus' original patch:
Properly set initial acl fields to BTRFS_ACL_NOT_CACHED in the inode.
This forces an acl lookup when permission checks are done.
Fix btrfs_get_acl to avoid lookups and locking when the inode acls fields
are set to null.
Fix btrfs_get_acl to use the right return value from __btrfs_getxattr
when deciding to cache a NULL acl. It was storing a NULL acl when
__btrfs_getxattr return -ENOENT, but __btrfs_getxattr was actually returning
-ENODATA for this case.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Just happened to notice a bunch of %llu vs u64 warnings. Here's a patch
to cast them all.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
A small warning popped up on ia64 because inode-map.c was comparing a
u64 object id with the ULL FIRST_FREE_OBJECTID. My first thought was
that all the OBJECTID constants should contain the u64 cast because
btrfs code deals entirely in u64s. But then I saw how large that was,
and figured I'd just fix the max() call.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Btrfs has printks for various IO errors, including bad checksums and
mismatches between what we expect the block headers to contain and what
we actually find on the disk.
Longer term we need a real reporting mechanism for this, but for now
printk is going to have to do.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Btrfs had some old code sitting around under #if 0, this drops it.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Previously, we updated a device's size prior to attempting a shrink
operation. This patch moves the device resizing logic to only happen if
the shrink completes successfully. In the process, it introduces a new
field to btrfs_device -- disk_total_bytes -- to track the on-disk size.
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
After a transaction commit, the old root of the subvol btrees are sent through
snapshot removal. This is what actually frees up any blocks replaced by
COW, and anything the old blocks pointed to.
Snapshot deletion will pause when a transaction commit has started, which
helps to avoid a huge amount of delayed reference count updates piling up
as the transaction is trying to close.
But, this pause happens after the snapshot deletion process has asked other
procs on the system to throttle back a bit so that it can make progress.
We don't want to throttle everyone while we're waiting for the transaction
commit, it leads to deadlocks in the user transaction ioctls used by Ceph
and makes things slower in general.
This patch changes things to avoid the throttling while we sleep.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
The btrfs fallocate call takes an extent lock on the entire range
being fallocated, and then runs through insert_reserved_extent on each
extent as they are allocated.
The problem with this is that btrfs_drop_extents may decide to try
and take the same extent lock fallocate was already holding. The solution
used here is to push down knowledge of the range that is already locked
going into btrfs_drop_extents.
It turns out that at least one other caller had the same bug.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Just use kmem_cache_create directly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Currently the extent_map code is only for btrfs so don't export it's
symbols.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Get rid of the hacks for building out of tree, and always use += for
assigning to the object lists.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
This patch makes the chunk allocator keep a good ratio of metadata vs data
block groups. By default for every 8 data block groups, we'll allocate 1
metadata chunk, or about 12% of the disk will be allocated for metadata. This
can be changed by specifying the metadata_ratio mount option.
This is simply the number of data block groups that have to be allocated to
force a metadata chunk allocation. By making sure we allocate metadata chunks
more often, we are less likely to get into situations where the whole disk
has been allocated as data block groups.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|\ \ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6:
ext2: missing unlock in ext2_quota_write()
quota: remove obsolete comments in fs/quota/Makefile
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
The inode->i_mutex should be unlocked.
Found by smatch (http://repo.or.cz/w/smatch.git). Compile tested.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Get rid of useless comments and the equally useless obj-y
initialization.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
The EXTENTS_FL flag should never be set on special files, but if it
is, don't bother trying to validate that the extents tree is valid,
since only files, directories, and non-fast symlinks will ever have an
extent data structure. We perhaps should flag the filesystem as being
corrupted if we see a special file (named pipes, device nodes, Unix
domain sockets, etc.) with the EXTENTS_FL flag, but e2fsck doesn't
currently check this case, so we'll just ignore this for now, since
it's harmless.
Without this fix, a special device with the extents flag is flagged as
an error by the kernel, so it is impossible to access or delete the
inode, but e2fsck doesn't see it as a problem, leading to
confused/frustrated users.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Don't try to look at i_file_acl_high unless the INCOMPAT_64BIT feature
bit is set. The field is normally zero, but older versions of e2fsck
didn't automatically check to make sure of this, so in the spirit of
"be liberal in what you accept", don't look at i_file_acl_high unless
we are using a 64-bit filesystem.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|/ / / / / /
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
If the block containing external extended attributes (which is stored
in i_file_acl and i_file_acl_high) is larger than the on-disk
filesystem, the process which tried to access the extended attributes
will endlessly issue kernel printks complaining that
"__find_get_block_slow() failed", locking up that CPU until the system
is forcibly rebooted.
So when we read in the inode, make sure the i_file_acl value is legal,
and if not, flag the filesystem as being corrupted.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
RomFS should advance the destination buffer pointer when reading data from a
blockdev source (the data may be split over multiple blocks, each requiring its
own sb_read() call). Without this, all the data is copied to the beginning of
the output buffer.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Michal Simek <monstr@monstr.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
romfs_lookup() should be using a routine akin to strcmp() on the backing store,
rather than one akin to strncmp(). If it uses the latter, it's liable to match
/bin/shutdown when looking up /bin/sh.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Michal Simek <monstr@monstr.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: Fix potential inode allocation soft lockup in Orlov allocator
ext4: Make the extent validity check more paranoid
jbd: use SWRITE_SYNC_PLUG when writing synchronous revoke records
jbd2: use SWRITE_SYNC_PLUG when writing synchronous revoke records
ext4: really print the find_group_flex fallback warning only once
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
If the Orlov allocator is having trouble finding an appropriate block
group, the fallback code could loop forever, causing a soft lockup
warning in find_group_orlov():
BUG: soft lockup - CPU#0 stuck for 61s! [cp:11728]
...
Pid: 11728, comm: cp Not tainted (2.6.30-rc1-dirty #77) Lenovo
EIP: 0060:[<c021650e>] EFLAGS: 00000246 CPU: 0
EIP is at ext4_get_group_desc+0x54/0x9d
...
Call Trace:
[<c0218021>] find_group_orlov+0x2ee/0x334
[<c0120a5f>] ? sched_clock+0x8/0xb
[<c02188e3>] ext4_new_inode+0x2cf/0xb1a
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Instead of just checking that the extent block number is greater or
equal than s_first_data_block, make sure it it is not pointing into
the block group descriptors, since that is clearly wrong. This helps
prevent filesystem from getting very badly corrupted in case an extent
block is corrupted.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
The revoke records must be written using the same way as the rest of
the blocks during the commit process; that is, either marked as
synchronous writes or as asynchornous writes.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
The revoke records must be written using the same way as the rest of
the blocks during the commit process; that is, either marked as
synchronous writes or as asynchornous writes.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Missing braces caused the warning to print more than once.
Signed-Off-By: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|\ \ \ \ \ \ \
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6:
eCryptfs: Larger buffer for encrypted symlink targets
eCryptfs: Lock lower directory inode mutex during lookup
eCryptfs: Remove ecryptfs_unlink_sigs warnings
eCryptfs: Fix data corruption when using ecryptfs_passthrough
eCryptfs: Print FNEK sig properly in /proc/mounts
eCryptfs: NULL pointer dereference in ecryptfs_send_miscdev()
eCryptfs: Copy lower inode attrs before dentry instantiation
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
When using filename encryption with eCryptfs, the value of the symlink
in the lower filesystem is encrypted and stored as a Tag 70 packet.
This results in a longer symlink target than if the target value wasn't
encrypted.
Users were reporting these messages in their syslog:
[ 45.653441] ecryptfs_parse_tag_70_packet: max_packet_size is [56]; real
packet size is [51]
[ 45.653444] ecryptfs_decode_and_decrypt_filename: Could not parse tag
70 packet from filename; copying through filename as-is
This was due to bufsiz, one the arguments in readlink(), being used to
when allocating the buffer passed to the lower inode's readlink().
That symlink target may be very large, but when decoded and decrypted,
could end up being smaller than bufsize.
To fix this, the buffer passed to the lower inode's readlink() will
always be PATH_MAX in size when filename encryption is enabled. Any
necessary truncation occurs after the decoding and decrypting.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
This patch locks the lower directory inode's i_mutex before calling
lookup_one_len() to find the appropriate dentry in the lower filesystem.
This bug was found thanks to the warning set in commit 2f9092e1.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
A feature was added to the eCryptfs umount helper to automatically
unlink the keys used for an eCryptfs mount from the kernel keyring upon
umount. This patch keeps the unrecognized mount option warnings for
ecryptfs_unlink_sigs out of the logs.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
ecryptfs_passthrough is a mount option that allows eCryptfs to allow
data to be written to non-eCryptfs files in the lower filesystem. The
passthrough option was causing data corruption due to it not always
being treated as a non-eCryptfs file.
The first 8 bytes of an eCryptfs file contains the decrypted file size.
This value was being written to the non-eCryptfs files, too. Also,
extra 0x00 characters were being written to make the file size a
multiple of PAGE_CACHE_SIZE.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
The filename encryption key signature is not properly displayed in
/proc/mounts. The "ecryptfs_sig=" mount option name is displayed for
all global authentication tokens, included those for filename keys.
This patch checks the global authentication token flags to determine if
the key is a FEKEK or FNEK and prints the appropriate mount option name
before the signature.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
|