blackbird-op-linux - Blackbird™ Linux sources for OpenPOWER

	Commit message	Author	Age	Files	Lines
*	fs: Limit sys_mount to only request filesystem modules.	Eric W. Biederman	2013-03-03	1	-0/+1
	Modify the request_module to prefix the file system type with "fs-" and add aliases to all of the filesystems that can be built as modules to match.
	A common practice is to build all of the kernel code and leave code that is not commonly needed as modules, with the result that many users are exposed to any bug anywhere in the kernel.
	Looking for filesystems with a fs- prefix limits the pool of possible modules that can be loaded by mount to just filesystems, trivially making things safer with no real cost.
	Using aliases means user space can control the policy of which filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf with blacklist and alias directives, allowing simple, safe, well understood work-arounds to known problematic software.
	This also addresses a rare but unfortunate problem where the filesystem name is not the same as its module name and module auto-loading would not work. While writing this patch I saw a handful of such cases, the most significant being autofs, which lives in the module autofs4.
	This is relevant to user namespaces because we can reach the request module in get_fs_type() without having any special permissions, and people get uncomfortable when a user specified string (in this case the filesystem type) goes all of the way to request_module.
	After having looked at this issue I don't think there is any particular reason to perform any filtering or permission checks beyond making it clear in the module request that we want a filesystem module. The common pattern in the kernel is to call request_module() without regard to the user's permissions.
	In general all a filesystem module does once loaded is call register_filesystem() and go to sleep, which means there is not much attack surface exposed by loading a filesystem module unless the filesystem is mounted. In a user namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT, which most filesystems do not set today.
	Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
	Acked-by: Kees Cook <keescook@chromium.org>
	Reported-by: Kees Cook <keescook@google.com>
	Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
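The prefixing described in this commit message can be illustrated with a small userspace sketch. This is not kernel code: `fs_module_alias()` is a hypothetical helper that only shows how a filesystem type such as "coda" maps to the alias "fs-coda" that a module loader would then be asked for.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical userspace helper (not a kernel function): build the module
 * alias that mount would request for a filesystem type, e.g. "coda" ->
 * "fs-coda". Returns 0 on success, -1 if the alias does not fit in buf.
 */
static int fs_module_alias(const char *fstype, char *buf, size_t buflen)
{
    static const char prefix[] = "fs-";

    /* Room is needed for the prefix, the type name and the trailing NUL. */
    if (strlen(prefix) + strlen(fstype) + 1 > buflen)
        return -1;

    snprintf(buf, buflen, "%s%s", prefix, fstype);
    return 0;
}
```

Because only names of the form `fs-<type>` are requested, a type whose module is named differently (autofs living in autofs4 is the case the message mentions) can be handled by an alias rather than by special-casing the lookup.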
*	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	Linus Torvalds	2013-02-26	4	-9/+9
	Pull vfs pile (part one) from Al Viro:
	"Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent locking violations, etc. The most visible changes here are death of FS_REVAL_DOT (replaced with "has ->d_weak_revalidate()") and a new helper getting from struct file to inode. Some bits of preparation to xattr method interface changes. Misc patches by various people sent this cycle and ocfs2 fixes from several cycles ago that should've been upstream right then. PS: the next vfs pile will be xattr stuff."
	* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
	  saner proc_get_inode() calling conventions
	  proc: avoid extra pde_put() in proc_fill_super()
	  fs: change return values from -EACCES to -EPERM
	  fs/exec.c: make bprm_mm_init() static
	  ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
	  ocfs2: fix possible use-after-free with AIO
	  ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
	  get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
	  target: writev() on single-element vector is pointless
	  export kernel_write(), convert open-coded instances
	  fs: encode_fh: return FILEID_INVALID if invalid fid_type
	  kill f_vfsmnt
	  vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
	  nfsd: handle vfs_getattr errors in acl protocol
	  switch vfs_getattr() to struct path
	  default SET_PERSONALITY() in linux/elf.h
	  ceph: prepopulate inodes only when request is aborted
	  d_hash_and_lookup(): export, switch open-coded instances
	  9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
	  9p: split dropping the acls from v9fs_set_create_acl()
	  ...
\| *	new helper: file_inode(file)	Al Viro	2013-02-22	4	-9/+9
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
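A rough sketch of what a `file_inode(file)` helper does, assuming it simply wraps the open-coded walk from a file to its inode through the dentry. The structures below are minimal userspace stand-ins with only the fields needed for the illustration, not the kernel definitions.

```c
/* Minimal stand-ins for the kernel structures; only the fields used here. */
struct inode  { unsigned long i_ino; };
struct dentry { struct inode *d_inode; };
struct path   { struct dentry *dentry; };
struct file   { struct path f_path; };

/* One helper replaces each open-coded f->f_path.dentry->d_inode chain. */
static inline struct inode *file_inode(const struct file *f)
{
    return f->f_path.dentry->d_inode;
}
```

Funnelling callers through one accessor means the way a file finds its inode can later change in a single place.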
* \|	coda: Cache permisions in struct coda_inode_info in a kuid_t.	Eric W. Biederman	2013-02-13	3	-4/+4
	- Change c_uid in struct coda_inode_info from a vuid_t to a kuid_t.
	- Initialize c_uid to GLOBAL_ROOT_UID instead of 0.
	- Use uid_eq to compare cached kuids.
	Cc: Jan Harkes <jaharkes@cs.cmu.edu>
	Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
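The three changes listed above follow from treating a uid as an opaque type. A minimal userspace sketch, assuming `kuid_t` is a one-member struct wrapper (the definitions below are simplified stand-ins, not copied from kernel headers):

```c
#include <stdbool.h>

/* Wrapping the number in a struct stops it from mixing with plain ints. */
typedef struct { unsigned int val; } kuid_t;

#define KUIDT_INIT(v)   ((kuid_t){ (v) })
#define GLOBAL_ROOT_UID KUIDT_INIT(0)

/* Comparison goes through a helper because == is not defined on structs. */
static inline bool uid_eq(kuid_t left, kuid_t right)
{
    return left.val == right.val;
}
```

With this shape, an expression such as `cached_uid == 0` no longer compiles, which is why the cached value is initialized from `GLOBAL_ROOT_UID` and compared with `uid_eq()`.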
* \|	coda: Restrict coda messages to the initial user namespace	Eric W. Biederman	2013-02-13	3	-7/+10
	Remove the slight chance that uids and gids in coda messages will be interpreted in the wrong user namespace.
	- Only allow processes in the initial user namespace to open the coda character device to communicate with coda filesystems.
	- Explicitly convert the uids in the coda header into the initial user namespace.
	- In coda_vattr_to_attr make kuids and kgids from the initial user namespace uids and gids in struct coda_vattr that just came from userspace.
	- In coda_iattr_to_vattr convert kuids and kgids into uids and gids in the initial user namespace and store them in struct coda_vattr for sending to coda userspace programs.
	Nothing needs to be changed with mounts as coda does not support being mounted in anything other than the initial user namespace.
	Cc: Jan Harkes <jaharkes@cs.cmu.edu>
	Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
* \|	coda: Restrict coda messages to the initial pid namespace	Eric W. Biederman	2013-02-13	3	-2/+10
	Remove the slight chance that pids in coda messages will be interpreted in the wrong pid namespace.
	- Explicitly send all pids in coda messages in the initial pid namespace.
	- Only allow mounts from processes in the initial pid namespace.
	- Only allow processes in the initial pid namespace to open the coda character device to communicate with coda.
	Cc: Jan Harkes <jaharkes@cs.cmu.edu>
	Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
*	fs: push rcu_barrier() from deactivate_locked_super() to filesystems	Kirill A. Shutemov	2012-10-02	1	-0/+5
	There's no reason to call rcu_barrier() on every deactivate_locked_super(). We only need to make sure that all delayed rcu free inodes are flushed before we destroy related cache.
	Removing rcu_barrier() from deactivate_locked_super() affects some fast paths. E.g. on my machine exit_group() of a last process in IPC namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.
	Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
	Cc: Al Viro <viro@zeniv.linux.org.uk>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	switch simple cases of fget_light to fdget	Al Viro	2012-09-26	1	-7/+7
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	switch coda get_device_index() to fget_light()	Al Viro	2012-09-26	1	-17/+15
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	don't pass nameidata to ->create()	Al Viro	2012-07-14	1	-2/+2
	A boolean "does it have to be exclusive?" flag is passed instead. Local filesystems should just ignore it - the object is guaranteed not to be there yet.
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	stop passing nameidata to ->lookup()	Al Viro	2012-07-14	1	-2/+2
	Just the flags; only NFS cares even about that, but there are legitimate uses for such argument. And getting rid of that completely would require splitting ->lookup() into a couple of methods (at least), so let's leave that alone for now...
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	stop passing nameidata * to ->d_revalidate()	Al Viro	2012-07-14	1	-3/+3
	Just the lookup flags. Die, bastard, die...
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	coda: use list_for_each_entry	Al Viro	2012-07-14	1	-7/+3
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
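For readers unfamiliar with the idiom this commit switches to, here is a userspace sketch of an intrusive list walked with a `list_for_each_entry`-style macro. The macros are simplified re-implementations written for this illustration (they rely on the GNU `__typeof__` extension), and `struct request` with its `id` and `chain` fields is a hypothetical payload type, not a coda structure.

```c
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Recover the enclosing structure from a pointer to its embedded node. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Visit every structure on the list instead of every raw list node. */
#define list_for_each_entry(pos, head, member)                          \
    for (pos = container_of((head)->next, __typeof__(*pos), member);    \
         &pos->member != (head);                                        \
         pos = container_of(pos->member.next, __typeof__(*pos), member))

/* Link node in just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Hypothetical payload type with an embedded list node. */
struct request { int id; struct list_head chain; };

static int sum_ids(struct list_head *head)
{
    struct request *req;
    int sum = 0;

    list_for_each_entry(req, head, chain)
        sum += req->id;
    return sum;
}
```

The entry-based form removes the separate node cursor and the explicit conversion inside the loop body, which is consistent with the net reduction in lines shown for this commit.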
*	vfs: Rename end_writeback() to clear_inode()	Jan Kara	2012-05-06	1	-1/+1
	After we moved inode_sync_wait() from end_writeback() it doesn't make sense to call the function end_writeback() anymore. Rename it to clear_inode(), which describes well what the function really does: set the I_CLEAR flag.
	Signed-off-by: Jan Kara <jack@suse.cz>
	Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
*	Remove all #inclusions of asm/system.h	David Howells	2012-03-28	3	-3/+0
	Remove all #inclusions of asm/system.h preparatory to splitting and killing it. Performed with the following command:
	perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *`
	Signed-off-by: David Howells <dhowells@redhat.com>
*	switch open-coded instances of d_make_root() to new helper	Al Viro	2012-03-20	1	-2/+1
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	coda: clean failure exits in coda_fill_super()	Al Viro	2012-03-20	1	-4/+1
	same as for cifs, move iput() to the right place, make it unconditional
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	coda: switch coda_cnode_make() to sane API as well, clean coda_lookup()	Al Viro	2012-01-10	4	-31/+27
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	coda: deal correctly with allocation failure from coda_cnode_makectl()	Al Viro	2012-01-10	3	-15/+12
	lookup should fail with ENOMEM, not silently make dentry negative. Switched to saner calling conventions, while we are at it.
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	switch ->create() to umode_t	Al Viro	2012-01-03	1	-2/+2
	vfs_create() ignores everything outside of 16bit subset of its mode argument; switching it to umode_t is obviously equivalent and it's the only caller of the method
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	switch vfs_mkdir() and ->mkdir() to umode_t	Al Viro	2012-01-03	1	-2/+2
	vfs_mkdir() gets int, but immediately drops everything that might not fit into umode_t and that's the only caller of ->mkdir()...
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	vfs: fix the stupidity with i_dentry in inode destructors	Al Viro	2012-01-03	1	-1/+0
	Seeing that just about every destructor got that INIT_LIST_HEAD() copied into it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once(); the cost of taking it into inode_init_always() will be negligible for pipes and sockets and negative for everything else. Not to mention the removal of boilerplate code from ->destroy_inode() instances...
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	filesystems: add set_nlink()	Miklos Szeredi	2011-11-02	1	-1/+1
	Replace remaining direct i_nlink updates with a new set_nlink() updater function.
	Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
	Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
	Signed-off-by: Christoph Hellwig <hch@lst.de>
*	filesystems: add missing nlink wrappers	Miklos Szeredi	2011-11-02	1	-1/+1
	Replace direct i_nlink updates with the respective updater function (inc_nlink, drop_nlink, clear_nlink, inode_dec_link_count).
	Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
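The two nlink commits above route every link-count update through a named function. A simplified userspace sketch of that pattern follows; `struct inode` here is a one-field stand-in and the bodies are illustrative, not the kernel implementations.

```c
/* One-field stand-in for the kernel inode. */
struct inode { unsigned int i_nlink; };

/* Set the link count to an explicit value. */
static inline void set_nlink(struct inode *inode, unsigned int nlink)
{
    inode->i_nlink = nlink;
}

/* Add one link. */
static inline void inc_nlink(struct inode *inode)
{
    inode->i_nlink++;
}

/* Remove one link. */
static inline void drop_nlink(struct inode *inode)
{
    inode->i_nlink--;
}

/* Drop the count straight to zero. */
static inline void clear_nlink(struct inode *inode)
{
    inode->i_nlink = 0;
}
```

Once no caller writes `i_nlink` directly, checks or bookkeeping can be added inside these functions without touching each filesystem again.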
*	fs: Convert vmalloc/memset to vzalloc	Joe Perches	2011-09-15	1	-3/+2
	Signed-off-by: Joe Perches <joe@perches.com>
	Acked-by: Alex Elder <aelder@sgi.com>
	Signed-off-by: Jiri Kosina <jkosina@suse.cz>
*	fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers	Josef Bacik	2011-07-20	2	-2/+8
	Btrfs needs to be able to control how filemap_write_and_wait_range() is called in fsync to make it less of a painful operation, so push down taking i_mutex and the calling of filemap_write_and_wait() down into the ->fsync() handlers. Some file systems can drop taking the i_mutex altogether it seems, like ext3 and ocfs2. For correctness' sake I just pushed everything down in all cases to make sure that we keep the current behavior the same for everybody, and then each individual fs maintainer can make up their mind about what to do from there. Thanks,
	Acked-by: Jan Kara <jack@suse.cz>
	Signed-off-by: Josef Bacik <josef@redhat.com>
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	don't open-code parent_ino() in assorted ->readdir()	Al Viro	2011-07-20	1	-1/+1
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	coda_venus_readdir(): use offsetof()	Al Viro	2011-07-20	1	-2/+1
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
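As an illustration of why `offsetof()` is the natural tool when parsing directory records, the sketch below computes the size of a record's fixed header as the offset of its name field. `struct demo_dirent` and both helper functions are hypothetical and only loosely modelled on a directory entry with a trailing name buffer.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAXNAMLEN 255

/* Hypothetical directory record: fixed header followed by the name bytes. */
struct demo_dirent {
    uint32_t d_fileno;
    uint16_t d_reclen;
    uint8_t  d_type;
    uint8_t  d_namlen;
    char     d_name[DEMO_MAXNAMLEN + 1];
};

/* Size of the fixed header, i.e. where the name starts in the record. */
static size_t dirent_header_size(void)
{
    return offsetof(struct demo_dirent, d_name);
}

/* Minimum number of bytes a record with a namlen-byte name occupies. */
static size_t dirent_min_size(size_t namlen)
{
    return dirent_header_size() + namlen;
}
```

Subtracting `sizeof(d_name)` from `sizeof(struct demo_dirent)` gives the same number only when the structure has no tail padding, whereas `offsetof()` states the intent directly.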
*	->permission() sanitizing: don't pass flags to ->permission()	Al Viro	2011-07-20	3	-5/+5
	not used by the instances anymore.
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	coda_ioctl_permission() is safe in RCU mode	Al Viro	2011-06-20	1	-2/+0
	return (mask & MAY_EXEC) ? -EACCES : 0; is non-blocking...
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
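The one-line check quoted in this message can be exercised in isolation. The sketch below is a userspace stand-in: `demo_ioctl_permission()` is a hypothetical name, and the `MAY_*` values are defined locally for the example rather than taken from kernel headers.

```c
#include <errno.h>

/* Permission mask bits, defined locally for this example. */
#define MAY_EXEC  0x1
#define MAY_WRITE 0x2
#define MAY_READ  0x4

/*
 * Refuse execute access and allow everything else. The function only
 * tests a bit and returns, so it never sleeps or takes a lock.
 */
static int demo_ioctl_permission(int mask)
{
    return (mask & MAY_EXEC) ? -EACCES : 0;
}
```

A check of this shape does no blocking work, which is the property the commit relies on when it says the function is safe in RCU mode.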
*	coda: remove unnecessary dentry_unhash on rmdir, dir rename	Sage Weil	2011-05-28	1	-5/+0
	Coda has no problems with references to unlinked directories.
	CC: Jan Harkes <jaharkes@cs.cmu.edu>
	CC: coda@cs.cmu.edu
	CC: codalist@coda.cs.cmu.edu
	Signed-off-by: Sage Weil <sage@newdream.net>
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	vfs: push dentry_unhash on rename_dir into file systems	Sage Weil	2011-05-26	1	-0/+3
	Only a few file systems need this. Start by pushing it down into each rename method (except gfs2 and xfs) so that it can be dealt with on a per-fs basis.
	Acked-by: Christoph Hellwig <hch@lst.de>
	Signed-off-by: Sage Weil <sage@newdream.net>
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	vfs: push dentry_unhash on rmdir into file systems	Sage Weil	2011-05-26	1	-0/+2
	Only a few file systems need this. Start by pushing it down into each fs rmdir method (except gfs2 and xfs) so it can be dealt with on a per-fs basis. This does not change behavior for any in-tree file systems.
	Acked-by: Christoph Hellwig <hch@lst.de>
	Signed-off-by: Sage Weil <sage@newdream.net>
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	codafs: fix build break when CONFIG_PROC_SYSCTL=n	Rakib Mullick	2011-03-25	1	-0/+9
	Commit 0bc825d240ab ("codafs: fix compile warning when CONFIG_SYSCTL=n") introduces build breakage, when CONFIG_PROC_SYSCTL=n and CONFIG_CODA_FS=y:
	fs/built-in.o: In function `init_coda':
	psdev.c:(.init.text+0xc02): undefined reference to `coda_sysctl_init'
	psdev.c:(.init.text+0xc7c): undefined reference to `coda_sysctl_clean'
	fs/built-in.o: In function `exit_coda':
	psdev.c:(.exit.text+0xa9): undefined reference to `coda_sysctl_clean'
	make: *** [.tmp_vmlinux1] Error 1
	Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
	Reported-by: Ingo Molnar <mingo@elte.hu>
	Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	codafs: fix compile warning when CONFIG_SYSCTL=n	Rakib Mullick	2011-03-22	1	-7/+1
	When CONFIG_SYSCTL=n, we get the following warning:
	fs/coda/sysctl.c:18: warning: `coda_table' defined but not used
	Fix the warning by making sure coda_table and its callee function are in the same context. Also clean up the code by removing extra #ifdef.
	[akpm@linux-foundation.org: remove unneeded stub macros]
	Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
	Cc: Jan Harkes <jaharkes@cs.cmu.edu>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	fs: change to new flag variable	matt mooney	2011-03-17	1	-1/+1
	Replace EXTRA_CFLAGS with ccflags-y. And change ntfs-objs to ntfs-y for cleaner conditional inclusion.
	Signed-off-by: matt mooney <mfm@muteddisk.com>
	Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
	Signed-off-by: Michal Marek <mmarek@suse.cz>
*	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6	Linus Torvalds	2011-01-13	13	-27/+200
	* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (41 commits)
	  fs: add documentation on fallocate hole punching
	  Gfs2: fail if we try to use hole punch
	  Btrfs: fail if we try to use hole punch
	  Ext4: fail if we try to use hole punch
	  Ocfs2: handle hole punching via fallocate properly
	  XFS: handle hole punching via fallocate properly
	  fs: add hole punching to fallocate
	  vfs: pass struct file to do_truncate on O_TRUNC opens (try #2)
	  fix signedness mess in rw_verify_area() on 64bit architectures
	  fs: fix kernel-doc for dcache::prepend_path
	  fs: fix kernel-doc for dcache::d_validate
	  sanitize ecryptfs ->mount()
	  switch afs
	  move internal-only parts of ncpfs headers to fs/ncpfs
	  switch ncpfs
	  switch 9p
	  pass default dentry_operations to mount_pseudo()
	  switch hostfs
	  switch affs
	  switch configfs
	  ...
\| *	take coda-private headers out of include/linux	Al Viro	2011-01-12	13	-24/+198
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| *	switch coda	Al Viro	2011-01-12	2	-3/+2
	Coda ->d_revalidate() actually checks for root, ->d_delete() is irrelevant. So we can use the same d_op for all coda dentries
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* \|	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial	Linus Torvalds	2011-01-13	1	-1/+1
	* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
	  Documentation/trace/events.txt: Remove obsolete sched_signal_send.
	  writeback: fix global_dirty_limits comment runtime -> real-time
	  ppc: fix comment typo singal -> signal
	  drivers: fix comment typo diable -> disable.
	  m68k: fix comment typo diable -> disable.
	  wireless: comment typo fix diable -> disable.
	  media: comment typo fix diable -> disable.
	  remove doc for obsolete dynamic-printk kernel-parameter
	  remove extraneous 'is' from Documentation/iostats.txt
	  Fix spelling milisec -> ms in snd_ps3 module parameter description
	  Fix spelling mistakes in comments
	  Revert conflicting V4L changes
	  i7core_edac: fix typos in comments
	  mm/rmap.c: fix comment
	  sound, ca0106: Fix assignment to 'channel'.
	  hrtimer: fix a typo in comment
	  init/Kconfig: fix typo
	  anon_inodes: fix wrong function name in comment
	  fix comment typos concerning "consistent"
	  poll: fix a typo in comment
	  ...
	Fix up trivial conflicts in:
	- drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c)
	- fs/ext4/ext4.h
	Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.
\| *	coda: kill redundant cast in coda_alloc_inode()	Jesper Juhl	2010-12-10	1	-1/+1
	kmem_cache_alloc() returns a void pointer, which there is no need to cast.
	Signed-off-by: Jesper Juhl <jj@chaosbits.net>
	Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* \|	fs: provide rcu-walk aware permission i_ops	Nick Piggin	2011-01-07	2	-3/+8
	Signed-off-by: Nick Piggin <npiggin@kernel.dk>
* \|	fs: rcu-walk aware d_revalidate method	Nick Piggin	2011-01-07	1	-1/+6
	Require filesystems be aware of .d_revalidate being called in rcu-walk mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning -ECHILD from all implementations.
	Signed-off-by: Nick Piggin <npiggin@kernel.dk>
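The "simple push down" described here amounts to an early return at the top of each revalidate method. A userspace sketch under stated assumptions: the `LOOKUP_RCU` value is defined locally for the example, and `demo_d_revalidate()` and `demo_revalidate_blocking()` are hypothetical names, with the latter standing in for whatever slow work a real filesystem would do.

```c
#include <errno.h>

/* Flag bit defined locally for this example. */
#define LOOKUP_RCU 0x0040

/* Stand-in for the existing check, which may sleep; 1 means "still valid". */
static int demo_revalidate_blocking(void)
{
    return 1;
}

/*
 * In rcu-walk mode the method must not block, so it bails out with
 * -ECHILD and lets the caller retry the lookup in the ordinary mode.
 */
static int demo_d_revalidate(unsigned int flags)
{
    if (flags & LOOKUP_RCU)
        return -ECHILD;

    return demo_revalidate_blocking();
}
```

Returning `-ECHILD` unconditionally keeps every filesystem correct from the start; individual methods can later handle the rcu-walk case themselves where that is safe.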
* \|	fs: dcache reduce branches in lookup path	Nick Piggin	2011-01-07	1	-1/+1
	Reduce some branches and memory accesses in dcache lookup by adding dentry flags to indicate common d_ops are set, rather than having to check them. This saves a pointer memory access (dentry->d_op) in common path lookup situations, and saves another pointer load and branch in cases where we have d_op but not the particular operation.
	Patched with:
	git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i
	Signed-off-by: Nick Piggin <npiggin@kernel.dk>
* \|	fs: icache RCU free inodes	Nick Piggin	2011-01-07	1	-1/+8
	RCU free the struct inode. This will allow:
	- Subsequent store-free path walking patch. The inode must be consulted for permissions when walking, so an RCU inode reference is a must.
	- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want to take i_lock no longer need to take sb_inode_list_lock to walk the list in the first place. This will simplify and optimize locking.
	- Could remove some nested trylock loops in dcache code
	- Could potentially simplify things a bit in VM land. Do not need to take the page lock to follow page->mapping.
	The downside of this is the performance cost of using RCU. In a simple creat/unlink microbenchmark, performance drops by about 10% due to inability to reuse cache-hot slab objects. As iterations increase and RCU freeing starts kicking over, this increases to about 20%.
	In cases where inode lifetimes are longer (i.e. many inodes may be allocated during the average life span of a single inode), a lot of this cache reuse is not applicable, so the regression caused by this patch is smaller.
	The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU, however this adds some complexity to list walking and store-free path walking, so I prefer to implement this at a later date, if it is shown to be a win in real situations. I haven't found a regression in any non-micro benchmark so I doubt it will be a problem.
	Signed-off-by: Nick Piggin <npiggin@kernel.dk>
* \|	fs: dcache remove dcache_lock	Nick Piggin	2011-01-07	1	-2/+0
	dcache_lock no longer protects anything. Remove it.
	Signed-off-by: Nick Piggin <npiggin@kernel.dk>
* \|	fs: dcache scale subdirs	Nick Piggin	2011-01-07	1	-0/+2
	Protect d_subdirs and d_child with d_lock, except in filesystems that aren't using dcache_lock for these anyway (eg. using i_mutex).
	Note: if we change the locking rule in future so that ->d_child protection is provided only with ->d_parent->d_lock, it may allow us to reduce some locking. But it would be an exception to an otherwise regular locking scheme, so we'd have to see some good results. Probably not worthwhile.
	Signed-off-by: Nick Piggin <npiggin@kernel.dk>
* \|	fs: dcache scale dentry refcount	Nick Piggin	2011-01-07	1	-1/+1
	Make d_count non-atomic and protect it with d_lock. This allows us to ensure a 0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when we start protecting many other dentry members with d_lock.
	Signed-off-by: Nick Piggin <npiggin@kernel.dk>
* \|	fs: change d_delete semantics	Nick Piggin	2011-01-07	1	-2/+2
	Change d_delete from a dentry deletion notification to dentry caching advice, more like ->drop_inode. Require it to be constant and idempotent, and not take d_lock. This is how all existing filesystems use the callback anyway.
	This makes fine grained dentry locking of dput and dentry lru scanning much simpler.
	Signed-off-by: Nick Piggin <npiggin@kernel.dk>
*	convert get_sb_nodev() users	Al Viro	2010-10-29	1	-4/+4
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>