| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground
* 'compat-ioctl-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground:
usbdevfs: move compat_ioctl handling to devio.c
lp: move compat_ioctl handling into lp.c
compat_ioctl: pass compat pointer directly to handlers
compat_ioctl: simplify lookup table
compat_ioctl: simplify calling of handlers
compat_ioctl: inline all conversion handlers
compat_ioctl: Remove BKL
compat_ioctl: remove all VT ioctl handling
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Half the compat_ioctl handling is in devio.c, the other
half is in fs/compat_ioctl.c. This moves everything into
one place for consistency.
As a positive side-effect, push down the BKL into the
ioctl methods.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Oliver Neukum <oliver@neukum.org>
Cc: Alon Bar-Lev <alon.barlev@gmail.com>
Cc: David Vrabel <david.vrabel@csr.com>
Cc: linux-usb@vger.kernel.org
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Handling for LPSETTIMEOUT can easily be done in lp_ioctl, which
is the only user. As a positive side-effect, push the BKL
into the ioctl methods.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Instead of having each handler call compat_ptr, we can now
convert the pointer once and pass that to each handler.
This saves a little bit of both source and object code size.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The compat_ioctl table now only contains entries for
COMPATIBLE_IOCTL, so we only need to know if a number
is listed in it or now.
As an optimization, we hash the table entries with a
reversible transformation to get a more uniform distribution
over it, sort the table at startup and then guess the
position in the table when an ioctl number gets called
to do a linear search from there.
With the current set of ioctl numbers and the chosen
transformation function, we need an average of four
steps to find if a number is in the set, all of the
accesses within one or two cache lines.
This at least as good as the previous hash table
approach but saves 8.5 kb of kernel memory.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The compat_ioctl array now contains only entries for ioctl numbers
that do not require a separate handler. By special-casing the
ULONG_IOCTL case in the do_ioctl_trans function, we can kill the
final use of a function pointer in the array.
text data bss dec hex filename
7539 13352 2080 22971 59bb before/fs/compat_ioctl.o
7910 8552 2080 18542 486e after/fs/compat_ioctl.o
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This makes all ioctl conversion handlers called from
a single switch statement, leaving only COMPATIBLE_IOCTL
and ULONG_IOCTL statements in the table. This is somewhat
more space efficient and also lets us simplify the
handling of the lookup table significantly.
before:
text data bss dec hex filename
7619 14024 2080 23723 5cab obj/fs/compat_ioctl.o
after:
7567 13352 2080 22999 59d7 obj/fs/compat_ioctl.o
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We have always called ioctl conversion handlers under the big kernel lock,
although that is generally not necessary. In particular it is not needed
for conversion of data structures and for calling sys_ioctl or
do_vfs_ioctl, which will get the BKL again if needed.
Handlers doing more than those two have been moved out, so we can kill off
the BKL from compat_sys_ioctl. This may significantly improve latencies
with 32 bit applications, and it avoids a common scenario where a thread
acquires the BKL twice.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The VT driver now handles all of these ioctls directly, so we can remove
the handlers from common code.
These are the only handlers that require the BKL because they directly
perform the ioctl action rather than just converting the data structures.
Once they are gone, we can remove the BKL from the remaining ioctl
conversion handlers.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: (58 commits)
tty: split the lock up a bit further
tty: Move the leader test in disassociate
tty: Push the bkl down a bit in the hangup code
tty: Push the lock down further into the ldisc code
tty: push the BKL down into the handlers a bit
tty: moxa: split open lock
tty: moxa: Kill the use of lock_kernel
tty: moxa: Fix modem op locking
tty: moxa: Kill off the throttle method
tty: moxa: Locking clean up
tty: moxa: rework the locking a bit
tty: moxa: Use more tty_port ops
tty: isicom: fix deadlock on shutdown
tty: mxser: Use the new locking rules to fix setserial properly
tty: mxser: use the tty_port_open method
tty: isicom: sort out the board init logic
tty: isicom: switch to the new tty_port_open helper
tty: tty_port: Add a kref object to the tty port
tty: istallion: tty port open/close methods
tty: stallion: Convert to the tty_port_open/close methods
...
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
devpts_get_tty() assumes that the inode passed in is associated with a valid
pty. But if the only reference to the pty is via a bind-mount, the inode
passed to devpts_get_tty() while valid, would refer to a pty that no longer
exists.
With a lot of debug effort, Grzegorz Nosek developed a small program (see
below) to reproduce a crash on recent kernels. This crash is a regression
introduced by the commit:
commit 527b3e4773628b30d03323a2cb5fb0d84441990f
Author: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Date: Mon Oct 13 10:43:08 2008 +0100
To fix, ensure that the dentry associated with the inode has not yet been
deleted/unhashed by devpts_pty_kill().
See also:
https://lists.linux-foundation.org/pipermail/containers/2009-July/019273.html
tty-bug.c:
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdlib.h>
#include <sys/mount.h>
#include <sys/signal.h>
#include <unistd.h>
#include <stdio.h>
#include <linux/fs.h>
void dummy(int sig)
{
}
static int child(void *unused)
{
int fd;
signal(SIGINT, dummy); signal(SIGHUP, dummy);
pause(); /* cheesy synchronisation to wait for /dev/pts/0 to appear */
mount("/dev/pts/0", "/dev/console", NULL, MS_BIND, NULL);
sleep(2);
fd = open("/dev/console", O_RDWR);
dup(0); dup(0);
write(1, "Hello world!\n", sizeof("Hello world!\n")-1);
return 0;
}
int main(void)
{
pid_t pid;
char *stack;
stack = malloc(16384);
pid = clone(child, stack+16384, CLONE_NEWNS|SIGCHLD, NULL);
open("/dev/ptmx", O_RDWR|O_NOCTTY|O_NONBLOCK);
unlockpt(fd); grantpt(fd);
sleep(2);
kill(pid, SIGHUP);
sleep(1);
return 0; /* exit before child opens /dev/console */
}
Reported-by: Grzegorz Nosek <root@localdomain.pl>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Tested-by: Serge Hallyn <serue@us.ibm.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (21 commits)
ext3: PTR_ERR return of wrong pointer in setup_new_group_blocks()
ext3: Fix data / filesystem corruption when write fails to copy data
ext4: Support for 64-bit quota format
ext3: Support for vfsv1 quota format
quota: Implement quota format with 64-bit space and inode limits
quota: Move definition of QFMT_OCFS2 to linux/quota.h
ext2: fix comment in ext2_find_entry about return values
ext3: Unify log messages in ext3
ext2: clear uptodate flag on super block I/O error
ext2: Unify log messages in ext2
ext3: make "norecovery" an alias for "noload"
ext3: Don't update the superblock in ext3_statfs()
ext3: journal all modifications in ext3_xattr_set_handle
ext2: Explicitly assign values to on-disk enum of filetypes
quota: Fix WARN_ON in lookup_one_len
const: struct quota_format_ops
ubifs: remove manual O_SYNC handling
afs: remove manual O_SYNC handling
kill wait_on_page_writeback_range
vfs: Implement proper O_SYNC semantics
...
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Return the PTR_ERR of the correct pointer.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When ext3_write_begin fails after allocating some blocks or
generic_perform_write fails to copy data to write, we truncate blocks already
instantiated beyond i_size. Although these blocks were never inside i_size, we
have to truncate pagecache of these blocks so that corresponding buffers get
unmapped. Otherwise subsequent __block_prepare_write (called because we are
retrying the write) will find the buffers mapped, not call ->get_block, and
thus the page will be backed by already freed blocks leading to filesystem and
data corruption.
Reported-by: James Y Knight <foom@fuhm.net>
Signed-off-by: Jan Kara <jack@suse.cz>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add support for new 64-bit quota format. It is enough to add proper
mount options handling. The rest is done by the generic code.
Signed-off-by: Jan Kara <jack@suse.cz>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We just have to add proper mount options handling. The rest is handled by
the generic quota code.
CC: linux-ext4@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
So far the maximum quota space limit was 4TB. Apparently this isn't enough
for Lustre guys anymore. So implement new quota format which raises block
limits to 2^64 bytes. Also store number of inodes and inode limits in
64-bit variables as 2^32 files isn't that insanely high anymore.
The first version of the patch has been developed by Andrew Perepechko
<Andrew.Perepechko@Sun.COM>.
CC: Andrew.Perepechko@Sun.COM
Signed-off-by: Jan Kara <jack@suse.cz>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Move definition of this constant to linux/quota.h so that it
cannot clash with other format IDs.
CC: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
| | |
| | |
| | |
| | |
| | | |
Signed-off-by: Jérémy Cochoy <jeremy.cochoy@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Make messages produced by ext3 more unified. It should be
easy to parse.
dmesg before patch:
[ 4893.684892] reservations ON
[ 4893.684896] xip option not supported
[ 4893.684964] EXT3-fs warning: maximal mount count reached, running
e2fsck is recommended
dmesg after patch:
[ 873.300792] EXT3-fs (loop0): using internal journaln
[ 873.300796] EXT3-fs (loop0): mounted filesystem with writeback data mode
[ 924.163657] EXT3-fs (loop0): error: can't find ext3 filesystem on dev loop0.
[ 723.755642] EXT3-fs (loop0): error: bad blocksize 8192
[ 357.874687] EXT3-fs (loop0): error: no journal found. mounting ext3 over ext2?
[ 873.300764] EXT3-fs (loop0): warning: maximal mount count reached, running e2fsck is recommended
[ 924.163657] EXT3-fs (loop0): error: can't find ext3 filesystem on dev loop0.
Signed-off-by: Alexey Fisher <bug-track@fisher-privat.net>
Signed-off-by: Jan Kara <jack@suse.cz>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This fixes a WARN backtrace in mark_buffer_dirty() that occurs during
unmount when a USB or floppy device is removed. I reported this a kernel
regression, but looks like it might have been there for longer
than that.
The super block update from a previous operation has marked the buffer
as in error, and the flag has to be cleared before doing the update.
(Similar code already exists in ext4).
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
make messages produced by ext2 more unified. It should be
easy to parse.
dmesg before patch:
[ 4893.684892] reservations ON
[ 4893.684896] xip option not supported
[ 4893.684961] EXT2-fs warning: mounting ext3 filesystem as ext2
[ 4893.684964] EXT2-fs warning: maximal mount count reached, running
e2fsck is recommended
[ 4893.684990] EXT II FS: 0.5b, 95/08/09, bs=1024, fs=1024, gc=2,
bpg=8192, ipg=1280, mo=80010]
dmesg after patch:
[ 4893.684892] EXT2-fs (loop0): reservations ON
[ 4893.684896] EXT2-fs (loop0): xip option not supported
[ 4893.684961] EXT2-fs (loop0): warning: mounting ext3 filesystem as
ext2
[ 4893.684964] EXT2-fs (loop0): warning: maximal mount count reached,
running e2fsck is recommended
[ 4893.684990] EXT2-fs (loop0): 0.5b, 95/08/09, bs=1024, fs=1024, gc=2,
bpg=8192, ipg=1280, mo=80010]
Signed-off-by: Alexey Fisher <bug-track@fisher-privat.net>
Reviewed-by: Andreas Dilger <adilger@sun.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Users on the list recently complained about differences across
filesystems w.r.t. how to mount without a journal replay.
In the discussion it was noted that xfs's "norecovery" option is
perhaps more descriptively accurate than "noload," so let's make
that an alias for ext3.
Also show this status in /proc/mounts
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
commit a71ce8c6c9bf269b192f352ea555217815cf027e updated ext3_statfs()
to update the on-disk superblock counters, but modified this buffer
directly without any journaling of the change. This is one of the
accesses that was causing the crc errors in journal replay as seen in
kernel.org bugzilla #14354.
The modifications were originally to keep the sb "more" in sync,
so that a readonly fsck of the device didn't flag this as an
error (as often), but apparently e2fsprogs deals with this differently
now, anyway.
Based on Ted's patch for ext4, which was in turn based on my
work on that bug and another preliminary patch...
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
ext3_xattr_set_handle() was zeroing out an inode outside
of journaling constraints; this is one of the accesses that
was causing the crc errors in journal replay as seen in
kernel.org bugzilla #14354.
Although ext3 doesn't have the crc issue, modifications
out of journal control are a Bad Thing.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We should hold i_mutex when looking up quota files for journaled quotas,
otherwise a WARN_ON in lookup_one_len triggers. The fact that we didn't
hold i_mutex previously probably could not lead to a real bug since the
filesystem is just being mounted / remounted read-write and thus the
root directory cannot change anyway but it's definitely cleaner with
i_mutex.
Reported-by: Bastien ROUCARIES <roucaries.bastien@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
| | |
| | |
| | |
| | |
| | | |
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
generic_file_aio_write already calls into ->fsync to handle O_SYNC/O_DSYNC.
Remove the duplicate call to ubifs_sync_wbufs_by_inode which is already
covered by ubifs_fsync.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
generic_file_aio_write already calls into ->fsync to handle O_SYNC/O_DSYNC.
Remove the duplicate manual invocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
All callers really want the more logical filemap_fdatawait_range interface,
so convert them to use it and merge wait_on_page_writeback_range into
filemap_fdatawait_range.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
While Linux provided an O_SYNC flag basically since day 1, it took until
Linux 2.4.0-test12pre2 to actually get it implemented for filesystems,
since that day we had generic_osync_around with only minor changes and the
great "For now, when the user asks for O_SYNC, we'll actually give
O_DSYNC" comment. This patch intends to actually give us real O_SYNC
semantics in addition to the O_DSYNC semantics. After Jan's O_SYNC
patches which are required before this patch it's actually surprisingly
simple, we just need to figure out when to set the datasync flag to
vfs_fsync_range and when not.
This patch renames the existing O_SYNC flag to O_DSYNC while keeping it's
numerical value to keep binary compatibility, and adds a new real O_SYNC
flag. To guarantee backwards compatiblity it is defined as expanding to
both the O_DSYNC and the new additional binary flag (__O_SYNC) to make
sure we are backwards-compatible when compiled against the new headers.
This also means that all places that don't care about the differences can
just check O_DSYNC and get the right behaviour for O_SYNC, too - only
places that actuall care need to check __O_SYNC in addition. Drivers and
network filesystems have been updated in a fail safe way to always do the
full sync magic if O_DSYNC is set. The few places setting O_SYNC for
lower layers are kept that way for now to stay failsafe.
We enforce that O_DSYNC is set when __O_SYNC is set early in the open path
to make sure we always get these sane options.
Note that parisc really screwed up their headers as they already define a
O_DSYNC that has always been a no-op. We try to repair it by using it for
the new O_DSYNC and redefinining O_SYNC to send both the traditional
O_SYNC numerical value _and_ the O_DSYNC one.
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger@sun.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jan Kara <jack@suse.cz>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
compress block size
Also split and cleanup zisofs_readpage() when we are changing it anyway.
Signed-off-by: Jan Kara <jack@suse.cz>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: Fix error return for fallocate() on XFS
xfs: cleanup dmapi macros in the umount path
xfs: remove incorrect sparse annotation for xfs_iget_cache_miss
xfs: kill the STATIC_INLINE macro
xfs: uninline xfs_get_extsz_hint
xfs: rename xfs_attr_fetch to xfs_attr_get_int
xfs: simplify xfs_buf_get / xfs_buf_read interfaces
xfs: remove IO_ISAIO
xfs: Wrapped journal record corruption on read at recovery
xfs: cleanup data end I/O handlers
xfs: use WRITE_SYNC_PLUG for synchronous writeout
xfs: reset the i_iolock lock class in the reclaim path
xfs: I/O completion handlers must use NOFS allocations
xfs: fix mmap_sem/iolock inversion in xfs_free_eofblocks
xfs: simplify inode teardown
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Noticed that through glibc fallocate would return 28 rather than -1
and errno = 28 for ENOSPC. The xfs routines uses XFS_ERROR format
positive return error codes while the syscalls use negative return
codes. Fixup the two cases in xfs_vn_fallocate syscall to convert to
negative.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Stop the flag saving as we never mangle those in the unmount path, and
hide all the weird arguents to the dmapi code inside the
XFS_SEND_PREUNMOUNT / XFS_SEND_UNMOUNT macros.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
xfs_iget_cache_miss does not get called with the pag_ici_lock held, so
the __releases annotation is incorrect.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Remove our own STATIC_INLINE macro. For small function inside
implementation files just use STATIC and let gcc inline it, and for
those in headers do the normal static inline - they are all small
enough to be inlined for debug builds, too.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This function is too large to efficiently be inlined.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Using a totally different name for the low-level get operation does
not fit the _int convention used in the rest of the attr code, so
rename it.
While we're at it also fix the prototype to use the normal convention
and mark it static as it's never used outside of xfs_attr.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Currently the low-level buffer cache interfaces are highly confusing
as we have a _flags variant of each that does actually respect the
flags, and one without _flags which has a flags argument that gets
ignored and overriden with a default set. Given that very few places
use the default arguments get rid of the duplication and convert all
callers to pass the flags explicitly. Also remove the now confusing
_flags postfix.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
We set the IO_ISAIO flag for all read/write I/O since early Linux
2.6.x. Remove it as it has lost it's purpose long ago.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Summary of problem:
If a journal record wraps at the physical end of the journal, it has to be
read in two parts in xlog_do_recovery_pass(): a read at the physical end and a
read at the physical beginning. If xlog_bread() has to re-align the first
read, the second read request does not take that re-alignment into account.
If the first read was re-aligned, the second read over-writes the end of the
data from the first read, effectively corrupting it. This can happen either
when reading the record header or reading the record data.
The first sanity check in xlog_recover_process_data() is to check for a valid
clientid, so that is the error reported.
Summary of fix:
If there was a first read at the physical end, XFS_BUF_PTR() returns where the
data was requested to begin. Conversely, because it is the result of
xlog_align(), offset indicates where the requested data for the first read
actually begins - whether or not xlog_bread() has re-aligned it.
Using offset as the base for the calculation of where to place the second read
data ensures that it will be correctly placed immediately following the data
from the first read instead of sometimes over-writing the end of it.
The attached patch has resolved the reported problem of occasional inability
to recover the journal (reporting "bad clientid").
Signed-off-by: Andy Poling <andy@realbig.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Currently we have different end I/O handlers for read vs the different
types of write I/O. But they are all very similar so we could just
use one with a few conditionals and reduce code size a lot.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The VM and I/O schedulers now expect us to use WRITE_SYNC_PLUG for
synchronous writeout. Right now I can't see any changes in performance
numbers with this, but we're getting some beating for not using it,
and the knowledge definitely could help the block code to make better
decisions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The iolock is used for protecting reads, writes and block truncates
against each other. We have two classes of callers, the first one is
induced by a file operation and requires a reference to the inode be
held and not dropped after the operation is done:
- xfs_vm_vmap, xfs_vn_fallocate, xfs_read, xfs_write, xfs_splice_read,
xfs_splice_write and xfs_setattr are all implementations of VFS
methods that require a live inode
- xfs_getbmap and xfs_swap_extents are ioctl subcommand for which the
same is true
- xfs_truncate_file is only called on quota inodes just returned from
xfs_iget
- xfs_sync_inode_data does the lock just after an igrab()
- xfs_filestream_associate and xfs_filestream_new_ag take the iolock
on the parent inode of an inode which by VFS rules must be referenced
And we have various calls to truncate blocks past EOF or the whole
file when dropping the last reference to an inode. Unfortunately
lockdep complains when we do memory allocations that can recurse into
the filesystem in the first class because the second class happens to
take the same lock. To avoid this re-init the iolock in the beginning
of xfs_fs_clear_inode to get a new lock class.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
When completing I/O requests we must not allow the memory allocator to
recurse into the filesystem, as we might deadlock on waiting for the
I/O completion otherwise. The only thing currently allocating normal
GFP_KERNEL memory is the allocation of the transaction structure for
the unwritten extent conversion. Add a memflags argument to
_xfs_trans_alloc to allow controlling the allocator behaviour.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Thomas Neumann <tneumann@users.sourceforge.net>
Tested-by: Thomas Neumann <tneumann@users.sourceforge.net>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
When xfs_free_eofblocks is called from ->release the VM might already
hold the mmap_sem, but in the write path we take the iolock before
taking the mmap_sem in the generic write code.
Switch xfs_free_eofblocks to only trylock the iolock if called from
->release and skip trimming the prellocated blocks in that case.
We'll still free them later on the final iput.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Currently the reclaim code for the case where we don't reclaim the
final reclaim is overly complicated. We know that the inode is clean
but instead of just directly reclaiming the clean inode we go through
the whole process of marking the inode reclaimable just to directly
reclaim it from the calling context. Besides being overly complicated
this introduces a race where iget could recycle an inode between
marked reclaimable and actually being reclaimed leading to panics.
This patch gets rid of the existing reclaim path, and replaces it with
a simple call to xfs_ireclaim if the inode was clean. While we're at
it we also use the slightly more lax xfs_inode_clean check we'd use
later to determine if we need to flush the inode here.
Finally get rid of xfs_reclaim function and place the remaining small
bits of reclaim code directly into xfs_fs_destroy_inode.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Patrick Schreurs <patrick@news-service.com>
Reported-by: Tommy van Leeuwen <tommy@news-service.com>
Tested-by: Patrick Schreurs <patrick@news-service.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (27 commits)
Driver core: fix race in dev_driver_string
Driver Core: Early platform driver buffer
sysfs: sysfs_setattr remove unnecessary permission check.
sysfs: Factor out sysfs_rename from sysfs_rename_dir and sysfs_move_dir
sysfs: Propagate renames to the vfs on demand
sysfs: Gut sysfs_addrm_start and sysfs_addrm_finish
sysfs: In sysfs_chmod_file lazily propagate the mode change.
sysfs: Implement sysfs_getattr & sysfs_permission
sysfs: Nicely indent sysfs_symlink_inode_operations
sysfs: Update s_iattr on link and unlink.
sysfs: Fix locking and factor out sysfs_sd_setattr
sysfs: Simplify iattr time assignments
sysfs: Simplify sysfs_chmod_file semantics
sysfs: Use dentry_ops instead of directly playing with the dcache
sysfs: Rename sysfs_d_iput to sysfs_dentry_iput
sysfs: Update sysfs_setxattr so it updates secdata under the sysfs_mutex
debugfs: fix create mutex racy fops and private data
Driver core: Don't remove kobjects in device_shutdown.
firmware_class: make request_firmware_nowait more useful
Driver-Core: devtmpfs - set root directory mode to 0755
...
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
inode_change_ok already clears the SGID bit when necessary
so there is no reason for sysfs_setattr to carry code to do
the same, and it is good to kill the extra copy because when
I moved the code last in certain corner cases the code will
look at the wrong gid.
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|