path: root/fs
Commit message (Author, Age, Files, Lines -/+)
* switch xfs to generic acl caching helpers (Al Viro, 2009-06-24, 4 files, -75/+9)
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* helpers for acl caching + switch to those (Al Viro, 2009-06-24, 9 files, -333/+91)
    helpers: get_cached_acl(inode, type), set_cached_acl(inode, type, acl), forget_cached_acl(inode, type).
    ubifs/xattr.c needed includes reordered, the rest is a plain switchover.
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
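The three helpers named above can be pictured with a small userspace model. This is only a sketch under stated assumptions: every `toy_` name is hypothetical, locking is omitted, and the use of a sentinel distinct from NULL for "nothing cached yet" is an assumption of the sketch rather than something the commit message spells out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names come from the kernel. */
enum { TOY_ACL_ACCESS, TOY_ACL_DEFAULT, TOY_ACL_TYPES };

struct toy_acl { int refcount; int mode; };

/* Sentinel for "nothing cached yet", distinct from NULL ("no ACL"). */
static struct toy_acl toy_not_cached_obj;
#define TOY_NOT_CACHED (&toy_not_cached_obj)

struct toy_inode { struct toy_acl *acl[TOY_ACL_TYPES]; };

static int toy_is_real(const struct toy_acl *acl)
{
    return acl != NULL && acl != TOY_NOT_CACHED;
}

struct toy_acl *toy_acl_alloc(int mode)
{
    struct toy_acl *acl = malloc(sizeof(*acl));
    if (acl) {
        acl->refcount = 1;
        acl->mode = mode;
    }
    return acl;
}

/* Drops one reference; frees the ACL when the last one goes away. */
void toy_acl_release(struct toy_acl *acl)
{
    if (toy_is_real(acl) && --acl->refcount == 0)
        free(acl);
}

void toy_inode_init(struct toy_inode *inode)
{
    for (int t = 0; t < TOY_ACL_TYPES; t++)
        inode->acl[t] = TOY_NOT_CACHED;
}

/* Returns TOY_NOT_CACHED, NULL, or a new reference the caller must release. */
struct toy_acl *toy_get_cached_acl(struct toy_inode *inode, int type)
{
    assert(type >= 0 && type < TOY_ACL_TYPES);
    struct toy_acl *acl = inode->acl[type];
    if (toy_is_real(acl))
        acl->refcount++;
    return acl;
}

/* Caches acl (NULL means "known to have no ACL"); the cache holds its own reference. */
void toy_set_cached_acl(struct toy_inode *inode, int type, struct toy_acl *acl)
{
    assert(type >= 0 && type < TOY_ACL_TYPES);
    struct toy_acl *old = inode->acl[type];
    if (toy_is_real(acl))
        acl->refcount++;
    inode->acl[type] = acl;
    toy_acl_release(old);
}

/* Returns the slot to the "not cached" state. */
void toy_forget_cached_acl(struct toy_inode *inode, int type)
{
    assert(type >= 0 && type < TOY_ACL_TYPES);
    struct toy_acl *old = inode->acl[type];
    inode->acl[type] = TOY_NOT_CACHED;
    toy_acl_release(old);
}
```

A filesystem-side caller would typically try `toy_get_cached_acl()` first, fall back to reading the ACL from disk when it gets `TOY_NOT_CACHED`, and then store the result with `toy_set_cached_acl()`.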
* switch reiserfs to inode->i_acl (Al Viro, 2009-06-24, 3 files, -34/+4)
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch reiserfs to usual conventions for caching ACLs (Al Viro, 2009-06-24, 2 files, -19/+14)
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* reiserfs: minimal fix for ACL caching (Al Viro, 2009-06-24, 1 file, -1/+1)
    reiserfs uses NULL as "unknown" and ERR_PTR(-ENODATA) as "no ACL"; several codepaths store the former instead of the latter.
    All those codepaths go through iset_acl() and all cases when it's called with NULL acl are for the second variety, so the minimal fix is to teach iset_acl() to deal with that.
    Proper fix is to switch to more usual conventions and avoid back and forth between internally used ERR_PTR(-ENODATA) and NULL expected by the rest of the kernel.
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
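The two-sentinel convention and the minimal fix can be illustrated in userspace. This is a hedged sketch, not the reiserfs code: `toy_err_ptr()` and `toy_is_err()` are hypothetical stand-ins for the kernel's ERR_PTR()/IS_ERR(), `TOY_ENODATA` is a local constant, and `toy_cache_lookup()` is an invented helper that shows the translation back to the "NULL means no ACL" convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ENODATA 61      /* local constant for the sketch */
#define TOY_MAX_ERRNO 4095  /* error pointers occupy the top 4095 addresses */

struct toy_acl { int mode; };

/* Userspace stand-ins for ERR_PTR() and IS_ERR(). */
static inline void *toy_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

static inline int toy_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

/*
 * Internal slot convention being modelled:
 *   NULL                  -> state unknown, must be read from disk
 *   error pointer         -> known to have no ACL
 *   anything else         -> the cached ACL
 * The "minimal fix": a caller passing NULL means "no ACL", so store the
 * error pointer instead of NULL (which would read back as "unknown").
 */
void toy_iset_acl(struct toy_acl **slot, struct toy_acl *acl)
{
    assert(slot != NULL);
    *slot = acl ? acl : toy_err_ptr(-TOY_ENODATA);
}

/*
 * Returns 0 if the cache has no answer; otherwise returns 1 and stores the
 * answer in *out using the external convention (NULL = no ACL).
 */
int toy_cache_lookup(struct toy_acl *const *slot, struct toy_acl **out)
{
    if (*slot == NULL)
        return 0;
    *out = toy_is_err(*slot) ? NULL : *slot;
    return 1;
}
```

The translation in `toy_cache_lookup()` is the "back and forth" that the message says a proper fix would avoid.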
* switch nilfs2 to inode->i_acl (Al Viro, 2009-06-24, 3 files, -22/+0)
    Actually, get rid of private analog, since nothing in there is using ACLs at all so far.
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch btrfs to inode->i_acl (Al Viro, 2009-06-24, 4 files, -27/+9)
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch jffs2 to inode->i_acl (Al Viro, 2009-06-24, 5 files, -48/+19)
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch jfs to inode->i_acl (Al Viro, 2009-06-24, 4 files, -41/+15)
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch ext4 to inode->i_acl (Al Viro, 2009-06-24, 5 files, -40/+10)
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch ext3 to inode->i_acl (Al Viro, 2009-06-24, 4 files, -36/+10)
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch ext2 to inode->i_acl (Al Viro, 2009-06-24, 5 files, -41/+11)
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* add caching of ACLs in struct inode (Al Viro, 2009-06-24, 1 file, -0/+10)
    No helpers, no conversions yet.
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fs: Add new pre-allocation ioctls to vfs for compatibility with legacy xfs ioctls (Ankit Jain, 2009-06-24, 3 files, -29/+112)
    This patch adds ioctls to vfs for compatibility with legacy XFS pre-allocation ioctls (XFS_IOC_*RESVP*). The implementation effectively invokes sys_fallocate for the new ioctls. Also handles the compat_ioctl case.
    Note: These legacy ioctls are also implemented by OCFS2.
    [AV: folded fixes from hch]
    Signed-off-by: Ankit Jain <me@ankitjain.org>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* cleanup __writeback_single_inode (Christoph Hellwig, 2009-06-24, 1 file, -50/+50)
    There is no reason for the split between __writeback_single_inode and __sync_single_inode; the former just does a couple of checks before tail-calling the latter. So merge the two, and while we're at it split out the I_SYNC waiting case for data integrity writers, as it's a logically separate function. Finally rename __writeback_single_inode to writeback_single_inode.
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* ... and the same for vfsmount id/mount group id (Al Viro, 2009-06-24, 2 files, -6/+26)
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Make allocation of anon devices cheaper (Al Viro, 2009-06-24, 1 file, -1/+6)
    Standard trick - add a new variable (start) such that for each n < start n is known to be busy. Allocation can skip checking everything in [0..start) and if it returns n, we can set start to n + 1. Freeing below start sets start to what we'd just freed.
    Of course, it still sucks if we do something like
        free 0
        allocate
        allocate
    in a loop - still O(n^2) time. However, on saner loads it improves things a lot and the entire thing is not worth the trouble of switching to something with better worst-case behaviour.
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
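The `start` hint described in this message can be sketched over a plain busy-array. This is an illustrative model only: the fixed-size array, `TOY_MAX_IDS`, and the `toy_` function names are assumptions of the sketch and say nothing about the allocator the kernel actually uses underneath.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_IDS 64  /* hypothetical capacity for the sketch */

static bool toy_busy[TOY_MAX_IDS];
static int toy_start;   /* invariant: every n < toy_start is busy */

/* Returns the lowest free id, or -1 if all ids are busy. */
int toy_alloc_id(void)
{
    /* Everything in [0, toy_start) is known busy, so begin scanning there. */
    for (int n = toy_start; n < TOY_MAX_IDS; n++) {
        if (!toy_busy[n]) {
            toy_busy[n] = true;
            /* [toy_start, n) was just seen busy and n is now busy too. */
            toy_start = n + 1;
            return n;
        }
    }
    return -1;
}

void toy_free_id(int n)
{
    assert(n >= 0 && n < TOY_MAX_IDS && toy_busy[n]);
    toy_busy[n] = false;
    /* Freeing below the hint breaks the invariant unless the hint moves down. */
    if (n < toy_start)
        toy_start = n;
}
```

The worst case from the message is visible here: after freeing id 0 the hint drops to 0, the first allocation returns 0 and sets the hint to 1, and the second allocation has to rescan every busy id above it.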
* devpts: remove module-related code (H. Peter Anvin, 2009-06-24, 1 file, -10/+0)
    These days, the devpts filesystem is closely integrated with the pty memory management, and cannot be built as a module, even less removed from the kernel. Accordingly, remove all module-related stuff from this filesystem.
    [ v2: only remove code that's actually dead ]
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* VFS: Switch init_mount_tree() to use the new create_mnt_ns() helper (Trond Myklebust, 2009-06-24, 1 file, -9/+2)
    Eliminates some duplicated code...
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: fix nd->root leak in do_filp_open() (J. R. Okajima, 2009-06-24, 1 file, -1/+10)
    commit 2a737871108de9ba8930f7650d549f1383767f8b "Cache root in nameidata" introduced a new member nd->root, but forgot to put it in do_filp_open().
    Signed-off-by: J. R. Okajima <hooanon05@yahoo.co.jp>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* reiserfs: remove stray unlock_super in reiserfs_resize (Christoph Hellwig, 2009-06-24, 1 file, -1/+0)
    Reiserfs doesn't use lock_super anywhere internally, and ->remount_fs which calls reiserfs_resize does have it currently but also expects it to be held on return, so there's no business for the unlock_super here.
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Edward Shishkin <edward.shishkin@gmail.com>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* NFS: Correct the NFS mount path when following a referral (Trond Myklebust, 2009-06-22, 1 file, -0/+24)
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* NFS: Fix nfs_path() to always return a '/' at the beginning of the path (Trond Myklebust, 2009-06-22, 1 file, -0/+5)
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private namespace (Trond Myklebust, 2009-06-22, 1 file, -21/+157)
    As noted in the previous patch, the NFSv4 client mount code currently has several limitations. If the mount path contains symlinks, or referrals, or even if it just contains a '..', then the client code in nfs4_path_walk() will fail with an error.
    This patch replaces the nfs4_path_walk()-based lookup with a helper function that sets up a private namespace to represent the namespace on the server, then uses the ordinary VFS and NFS path lookup code to walk down the mount path in that namespace.
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* VFS: Add VFS helper functions for setting up private namespaces (Trond Myklebust, 2009-06-22, 1 file, -8/+37)
    The purpose of this patch is to improve the remote mount path lookup support for distributed filesystems such as the NFSv4 client.
    When given a mount command of the form "mount server:/foo/bar /mnt", the NFSv4 client is required to look up the filehandle for "server:/", and then look up each component of the remote mount path "foo/bar" in order to find the directory that is actually going to be mounted on /mnt. Following that remote mount path may involve following symlinks, crossing server-side mount points and even following referrals to filesystem volumes on other servers.
    Since the standard VFS path lookup code already supports walking paths that contain all these features (using in-kernel automounts for following referrals) we would like to be able to reuse that rather than duplicate the full path traversal functionality in the NFSv4 client code.
    This patch therefore defines a VFS helper function create_mnt_ns(), that sets up a temporary filesystem namespace and attaches a root filesystem to it. It exports the create_mnt_ns() and put_mnt_ns() functions for use by filesystem modules.
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* VFS: Uninline the function put_mnt_ns() (Trond Myklebust, 2009-06-22, 1 file, -2/+6)
    In order to allow modules to use it without having to export vfsmount_lock.
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge git://git.infradead.org/mtd-2.6 (Linus Torvalds, 2009-06-22, 2 files, -54/+2)
|\
| |   * git://git.infradead.org/mtd-2.6: (63 commits)
| |     mtd: OneNAND: Allow setting of boundary information when built as module
| |     jffs2: leaking jffs2_summary in function jffs2_scan_medium
| |     mtd: nand: Fix memory leak on txx9ndfmc probe failure.
| |     mtd: orion_nand: use burst reads with double word accesses
| |     mtd/nand: s3c6400 support for s3c2410 driver
| |     [MTD] [NAND] S3C2410: Use DIV_ROUND_UP
| |     [MTD] [NAND] S3C2410: Deal with unaligned lengths in S3C2440 buffer read/write
| |     [MTD] [NAND] S3C2410: Allow the machine code to get the BBT table from NAND
| |     [MTD] [NAND] S3C2410: Added a kerneldoc for s3c2410_nand_set
| |     mtd: physmap_of: Add multiple regions and concatenation support
| |     mtd: nand: max_retries off by one in mxc_nand
| |     mtd: nand: s3c2410_nand_setrate(): use correct macros for 2412/2440
| |     mtd: onenand: add bbt_wait & unlock_all as replaceable for some platform
| |     mtd: Flex-OneNAND support
| |     mtd: nand: add OMAP2/OMAP3 NAND driver
| |     mtd: maps: Blackfin async: fix memory leaks in probe/remove funcs
| |     mtd: uclinux: mark local stuff static
| |     mtd: uclinux: do not allow to be built as a module
| |     mtd: uclinux: allow systems to override map addr/size
| |     mtd: blackfin NFC: fix hang when using NAND on BF527-EZKITs
| |     ...
| * jffs2: leaking jffs2_summary in function jffs2_scan_medium (Christian Engelmayer, 2009-06-15, 1 file, -2/+2)
|     In case of an error returned by file_dirty() 's' is not freed as the cleanup path is skipped. Reported by Coverity.
|     Signed-off-by: Christian Engelmayer <christian.engelmayer@frequentis.com>
|     Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
| * Merge branch 'next-mtd' of git://aeryn.fluff.org.uk/bjdooks/linux (David Woodhouse, 2009-06-08, 70 files, -719/+831)
| |\
| * | mtd: Handle compat ioctls directly; remove all trace from compat_ioctl.c (Kevin Cernekee, 2009-05-29, 1 file, -22/+0)
|     Remove all references to MTD ioctls from fs/compat_ioctl.c and let them all be handled by mtd_compat_ioctl().
|     Signed-off-by: Kevin Cernekee <kpc.mtd@gmail.com>
|     Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
|     Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
| * | mtd: add OOB ioctls for >4GiB devices (Kevin Cernekee, 2009-05-29, 1 file, -0/+2)
|     Signed-off-by: Kevin Cernekee <kpc.mtd@gmail.com>
|     Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
|     Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
| * | mtd: compat_ioctl cleanup (Kevin Cernekee, 2009-05-29, 1 file, -42/+9)
|     1) Move the MEMREADOOB/MEMWRITEOOB compat_ioctl wrappers from fs/compat_ioctl.c into mtdchar.c. Original request was here: http://lkml.org/lkml/2009/4/1/295
|     2) Add missing COMPATIBLE_IOCTL lines, so that mtd-utils does not error out when running in 64/32 compatibility mode.
|     LKML-Reference: <200904011650.22928.arnd@arndb.de>
|     Signed-off-by: Kevin Cernekee <kpc.mtd@gmail.com>
|     Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
|     Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
| * | mtd: add MEMERASE64 ioctl for >4GiB devices (Kevin Cernekee, 2009-05-29, 1 file, -0/+1)
|     New MEMERASE/MEMREADOOB/MEMWRITEOOB ioctls are needed in order to support 64-bit offsets into large NAND flash devices.
|     Signed-off-by: Kevin Cernekee <kpc.mtd@gmail.com>
|     Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
|     Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* | | Merge branch 'for-2.6.31' of git://fieldses.org/git/linux-nfsd (Linus Torvalds, 2009-06-22, 17 files, -597/+1142)
|\ \ \
| |   * 'for-2.6.31' of git://fieldses.org/git/linux-nfsd: (60 commits)
| |     SUNRPC: Fix the TCP server's send buffer accounting
| |     nfsd41: Backchannel: minorversion support for the back channel
| |     nfsd41: Backchannel: cleanup nfs4.0 callback encode routines
| |     nfsd41: Remove ip address collision detection case
| |     nfsd: optimise the starting of zero threads when none are running.
| |     nfsd: don't take nfsd_mutex twice when setting number of threads.
| |     nfsd41: sanity check client drc maxreqs
| |     nfsd41: move channel attributes from nfsd4_session to a nfsd4_channel_attr struct
| |     NFS: kill off complicated macro 'PROC'
| |     sunrpc: potential memory leak in function rdma_read_xdr
| |     nfsd: minor nfsd_vfs_write cleanup
| |     nfsd: Pull write-gathering code out of nfsd_vfs_write
| |     nfsd: track last inode only in use_wgather case
| |     sunrpc: align cache_clean work's timer
| |     nfsd: Use write gathering only with NFSv2
| |     NFSv4: kill off complicated macro 'PROC'
| |     NFSv4: do exact check about attribute specified
| |     knfsd: remove unreported filehandle stats counters
| |     knfsd: fix reply cache memory corruption
| |     knfsd: reply cache cleanups
| |     ...
| * | | nfsd41: Backchannel: minorversion support for the back channel (Andy Adamson, 2009-06-18, 2 files, -1/+3)
|     Prepare to share backchannel code with NFSv4.1.
|     Signed-off-by: Andy Adamson <andros@netapp.com>
|     Signed-off-by: Benny Halevy <bhalevy@panasas.com>
|     Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
|     [nfsd41: use nfsd4_cb_sequence for callback minorversion]
|     Signed-off-by: Benny Halevy <bhalevy@panasas.com>
|     Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | nfsd41: Backchannel: cleanup nfs4.0 callback encode routines (Andy Adamson, 2009-06-18, 1 file, -8/+16)
|     Mimic the client and prepare to share the back channel xdr with NFSv4.1. Bump the number of operations in each encode routine, then backfill the number of operations.
|     Signed-off-by: Andy Adamson <andros@netapp.com>
|     Signed-off-by: Benny Halevy <bhalevy@panasas.com>
|     Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
|     Signed-off-by: Benny Halevy <bhalevy@panasas.com>
|     Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
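The "bump, then backfill" pattern mentioned in this message can be shown on a toy XDR-style buffer: reserve a 32-bit slot for the operation count, let each encode routine increment a counter, and write the final count into the reserved slot at the end. Everything here is hypothetical (the `toy_xdr` struct, the function names, the buffer size and the opcode values used by a caller); it is not the nfsd encoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_xdr {
    uint8_t buf[64];  /* hypothetical fixed-size output buffer */
    size_t len;       /* bytes used so far */
};

/* XDR integers are 32-bit big-endian. */
void toy_put_u32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

uint32_t toy_get_u32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Reserves a zeroed 32-bit slot and returns its offset for later backfill. */
size_t toy_reserve_u32(struct toy_xdr *x)
{
    assert(x->len + 4 <= sizeof(x->buf));
    size_t off = x->len;
    toy_put_u32(x->buf + off, 0);
    x->len += 4;
    return off;
}

/* Each encode routine appends its operation and bumps the shared count. */
void toy_encode_op(struct toy_xdr *x, uint32_t opcode, uint32_t *nops)
{
    assert(x->len + 4 <= sizeof(x->buf));
    toy_put_u32(x->buf + x->len, opcode);
    x->len += 4;
    (*nops)++;
}

/* Once all operations are encoded, write the real count into the slot. */
void toy_backfill_nops(struct toy_xdr *x, size_t off, uint32_t nops)
{
    assert(off + 4 <= x->len);
    toy_put_u32(x->buf + off, nops);
}
```

The benefit of this shape is that no encode routine needs to know in advance how many operations the compound will contain; adding another operation only means calling one more encoder before the backfill.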
| * | | nfsd41: Remove ip address collision detection case (Mike Sager, 2009-06-18, 1 file, -12/+6)
|     Verified that cthon and pynfs exchange id tests pass (except for the two expected fails: EID8 and EID50).
|     Signed-off-by: Mike Sager <sager@netapp.com>
|     Signed-off-by: Benny Halevy <bhalevy@panasas.com>
|     Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | nfsd: optimise the starting of zero threads when none are running. (NeilBrown, 2009-06-18, 1 file, -2/+4)
|     Currently, if we ask to set the number of nfsd threads to zero when there are none running, we set up all the sockets and register the service, and then tear it all down again. This is pointless.
|     So detect that case and exit promptly. (Also remove an assignment to 'error' which was never used.)
|     Signed-off-by: NeilBrown <neilb@suse.de>
|     Acked-by: Jeff Layton <jlayton@redhat.com>
| * | | nfsd: don't take nfsd_mutex twice when setting number of threads. (NeilBrown, 2009-06-18, 2 files, -5/+15)
|     Currently when we write a number to 'threads' in nfsdfs, we take the nfsd_mutex, update the number of threads, then take the mutex again to read the number of threads.
|     Mostly this isn't a big deal. However if we write '0', and portmap happens to be dead, then we can get unpredictable behaviour. If the nfsd threads all got killed quickly and the last thread is waiting for portmap to respond, then the second time we take the mutex we will block waiting for the last thread. However if the nfsd threads didn't die quite that fast, then there will be no contention when we try to take the mutex again.
|     Unpredictability isn't fun, and waiting for the last thread to exit is pointless, so avoid taking the lock twice. To achieve this, make nfsd_svc return a non-negative number of active threads when not returning a negative error.
|     Signed-off-by: NeilBrown <neilb@suse.de>
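The return convention described above (a non-negative thread count on success, a negative error otherwise) lets a caller get both answers from a single locked call. The sketch below models that idea in userspace; `toy_service`, its fields and both functions are hypothetical, and the lock is represented only by a counter so the single acquisition can be observed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_service {
    int nthreads;           /* threads currently running */
    int lock_acquisitions;  /* how many times the (modelled) mutex was taken */
};

/*
 * Sets the thread count under the lock.
 * Returns the number of active threads (>= 0) or a negative errno value.
 */
int toy_set_threads(struct toy_service *svc, int wanted)
{
    int rv;

    svc->lock_acquisitions++;       /* stands in for mutex_lock() */
    if (wanted < 0) {
        rv = -EINVAL;
    } else {
        svc->nthreads = wanted;
        rv = svc->nthreads;         /* count read while still holding the lock */
    }
    /* the matching unlock would go here */
    return rv;
}

/* Caller: one call yields both the error status and the count to report. */
int toy_write_threads(struct toy_service *svc, int wanted, int *active_out)
{
    assert(active_out != NULL);
    int rv = toy_set_threads(svc, wanted);
    if (rv < 0)
        return rv;
    *active_out = rv;
    return 0;
}
```

Because the count comes back from the same call that changed it, the caller never needs a second lock round-trip just to read the value.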
| * | | nfsd41: sanity check client drc maxreqs (Andy Adamson, 2009-06-16, 1 file, -0/+5)
|     Ensure the client-requested maximum number of requests is between 1 and NFSD_MAX_SLOTS_PER_SESSION.
|     Signed-off-by: Andy Adamson <andros@netapp.com>
|     Signed-off-by: Benny Halevy <bhalevy@panasas.com>
|     Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
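A range check of this kind might look like the sketch below. The message does not say whether out-of-range values are rejected or clamped, so the policy here (reject zero, clamp anything above the ceiling) is an assumption, as are the `toy_` names and the placeholder limit of 128, which is not taken from the kernel headers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_SLOTS_PER_SESSION 128u  /* placeholder value for the sketch */

/*
 * Validates a client-requested slot count in place.
 * Assumed policy: 0 is an error; values above the ceiling are clamped.
 * Returns 0 on success or -EINVAL.
 */
int toy_check_maxreqs(unsigned int *maxreqs)
{
    assert(maxreqs != NULL);
    if (*maxreqs < 1)
        return -EINVAL;
    if (*maxreqs > TOY_MAX_SLOTS_PER_SESSION)
        *maxreqs = TOY_MAX_SLOTS_PER_SESSION;
    return 0;
}
```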
| * | | nfsd41: move channel attributes from nfsd4_session to a nfsd4_channel_attr struct (Alexandros Batsakis, 2009-06-16, 2 files, -14/+16)
|     The change is valid for both the forechannel and the backchannel (currently dummy).
|     Signed-off-by: Alexandros Batsakis <Alexandros.Batsakis@netapp.com>
|     Signed-off-by: Benny Halevy <bhalevy@panasas.com>
|     Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | NFS: kill off complicated macro 'PROC' (Yu Zhiguo, 2009-06-15, 2 files, -56/+379)
|     Kill off the obscure macro 'PROC' of NFSv2&3 in order to make the code clearer.
|     Among other things, this makes it simpler to grep for callers of these functions--something which has frequently caused confusion among nfs developers.
|     Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com>
|     Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | nfsd: minor nfsd_vfs_write cleanup (J. Bruce Fields, 2009-06-15, 1 file, -7/+8)
|     There's no need to check host_err >= 0 every time here when we could check host_err < 0 once, following the usual kernel style.
|     Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | nfsd: Pull write-gathering code out of nfsd_vfs_write (J. Bruce Fields, 2009-06-15, 1 file, -30/+39)
|     This is a relatively self-contained piece of code that handles a special case--move it to its own function.
|     Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | nfsd: track last inode only in use_wgather case (J. Bruce Fields, 2009-06-15, 1 file, -15/+10)
|     Updating last_ino and last_dev probably isn't useful in the !use_wgather case.
|     Also remove some pointless ifdef'd-out code.
|     Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | nfsd: Use write gathering only with NFSv2 (Trond Myklebust, 2009-06-15, 1 file, -3/+5)
|     NFSv3 and above can use unstable writes whenever they are sending more than one write, rather than relying on the flaky write gathering heuristics. More often than not, write gathering is currently getting it wrong when the NFSv3 clients are sending a single write with FILE_SYNC for efficiency reasons.
|     This patch turns off write gathering for NFSv3/v4, and ensures that it only applies to the one case that can actually benefit: namely NFSv2.
|     Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|     Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | Merge commit 'v2.6.30' into for-2.6.31 (J. Bruce Fields, 2009-06-15, 144 files, -3168/+2531)
| |\ \ \
| * | | | NFSv4: kill off complicated macro 'PROC' (Yu Zhiguo, 2009-06-01, 1 file, -17/+17)
|     J. Bruce Fields wrote:
|     ...
|     > (This is extremely confusing code to track down: note that
|     > proc->pc_decode is set to nfs4svc_decode_compoundargs() by the PROC()
|     > macro at the end of fs/nfsd/nfs4proc.c. Which means, for example, that
|     > grepping for nfs4svc_decode_compoundargs() gets you nowhere. Patches to
|     > kill off that macro would be welcomed....)
|     The macro 'PROC' is complicated and obscure; it is better killed off to make the code clearer.
|     Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com>
|     Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | | NFSv4: do exact check about attribute specified (Yu Zhiguo, 2009-06-01, 2 files, -38/+103)
|     The server should return NFS4ERR_ATTRNOTSUPP if a specified attribute is not supported in the current environment. Operations CREATE, NVERIFY, OPEN, SETATTR and VERIFY should do this check.
|     This bug was found when running the newpynfs tests. The names of the tests that failed are the following:
|       CR12 NVF7a NVF7b NVF7c NVF7d NVF7f NVF7r NVF7s OPEN15 VF7a VF7b VF7c VF7d VF7f VF7r VF7s
|     Add function do_check_fattr() to do the exact check:
|       1. Check whether the specified attribute is supported by the NFSv4 server.
|       2. Check whether FATTR4_WORD0_ACL & FATTR4_WORD0_FS_LOCATIONS are supported in the current environment.
|       3. Check whether the specified attribute is writable.
|     Steps 1 and 3 were done in function nfsd4_decode_fattr() but are moved to this function now.
|     Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com>
|     Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | | knfsd: remove unreported filehandle stats counters (Greg Banks, 2009-05-27, 1 file, -6/+0)
|     The file nfsfh.c contains two static variables nfsd_nr_verified and nfsd_nr_put. These are counters which are incremented as a side effect of the fh_verify(), fh_compose() and fh_put() operations, i.e. at least twice per NFS call for any non-trivial workload. Needless to say this makes the cacheline that contains them (and any other innocent victims) a very hot contention point indeed under high call-rate workloads on a multiprocessor NFS server. It also turns out that these counters are not used anywhere. They're not reported to userspace, they're not used in logic, they're not even exported from the object file (let alone the module). All they do is waste CPU time.
|     So this patch removes them.
|     Tests on a 16 CPU Altix A4700 with 2 10gige Myricom cards, configured separately (no bonding). Workload is 640 client threads doing directory traversals with random small reads, from server RAM.
|
|     Before
|     ======
|     Kernel profile:
|
|       %   cumulative    self               self    total
|      time   samples    samples    calls   1/call  1/call  name
|      6.05    2716.00   2716.00    30406     0.09    1.02  svc_process
|      4.44    4706.00   1990.00     1975     1.01    1.01  spin_unlock_irqrestore
|      3.72    6376.00   1670.00     1666     1.00    1.00  svc_export_put
|      3.41    7907.00   1531.00     1786     0.86    1.02  nfsd_ofcache_lookup
|      3.25    9363.00   1456.00    10965     0.13    1.01  nfsd_dispatch
|      3.10   10752.00   1389.00     1376     1.01    1.01  nfsd_cache_lookup
|      2.57   11907.00   1155.00     4517     0.26    1.03  svc_tcp_recvfrom
|      ...
|      2.21   15352.00   1003.00     1081     0.93    1.00  nfsd_choose_ofc  <----
|      ^^^^
|
|     Here the function nfsd_choose_ofc() reads a global variable which by accident happened to be located in the same cacheline as nfsd_nr_verified.
|
|     Call rate:
|     nullarbor:~ # pmdumptext nfs3.server.calls
|     ...
|     Thu Dec 13 00:15:27 184780.663
|     Thu Dec 13 00:15:28 184885.881
|     Thu Dec 13 00:15:29 184449.215
|     Thu Dec 13 00:15:30 184971.058
|     Thu Dec 13 00:15:31 185036.052
|     Thu Dec 13 00:15:32 185250.475
|     Thu Dec 13 00:15:33 184481.319
|     Thu Dec 13 00:15:34 185225.737
|     Thu Dec 13 00:15:35 185408.018
|     Thu Dec 13 00:15:36 185335.764
|
|     After
|     =====
|     Kernel profile:
|
|       %   cumulative    self               self    total
|      time   samples    samples    calls   1/call  1/call  name
|      6.33    2813.00   2813.00    29979     0.09    1.01  svc_process
|      4.66    4883.00   2070.00     2065     1.00    1.00  spin_unlock_irqrestore
|      4.06    6687.00   1804.00     2182     0.83    1.00  nfsd_ofcache_lookup
|      3.20    8110.00   1423.00    10932     0.13    1.00  nfsd_dispatch
|      3.03    9456.00   1346.00     1343     1.00    1.00  nfsd_cache_lookup
|      2.62   10622.00   1166.00     4645     0.25    1.01  svc_tcp_recvfrom
|      [...]
|      0.10   42586.00     44.00       74     0.59    1.00  nfsd_choose_ofc  <--- HA!!
|      ^^^^
|
|     Call rate:
|     nullarbor:~ # pmdumptext nfs3.server.calls
|     ...
|     Thu Dec 13 01:45:28 194677.118
|     Thu Dec 13 01:45:29 193932.692
|     Thu Dec 13 01:45:30 194294.364
|     Thu Dec 13 01:45:31 194971.276
|     Thu Dec 13 01:45:32 194111.207
|     Thu Dec 13 01:45:33 194999.635
|     Thu Dec 13 01:45:34 195312.594
|     Thu Dec 13 01:45:35 195707.293
|     Thu Dec 13 01:45:36 194610.353
|     Thu Dec 13 01:45:37 195913.662
|     Thu Dec 13 01:45:38 194808.675
|
|     i.e. about a 5.3% improvement in call rate.
|     Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
|     Reviewed-by: David Chinner <dgc@sgi.com>
|     Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
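The quoted improvement can be checked from the call-rate samples in the message. The sketch below averages the ten "before" and eleven "after" samples and computes the relative change; using a plain arithmetic mean is an assumption about how the figure was derived, and the function and array names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* nfs3.server.calls samples copied from the commit message. */
const double calls_before[] = {
    184780.663, 184885.881, 184449.215, 184971.058, 185036.052,
    185250.475, 184481.319, 185225.737, 185408.018, 185335.764,
};
const double calls_after[] = {
    194677.118, 193932.692, 194294.364, 194971.276, 194111.207, 194999.635,
    195312.594, 195707.293, 194610.353, 195913.662, 194808.675,
};

/* Arithmetic mean of n samples. */
double toy_mean(const double *v, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Relative change of the "after" mean over the "before" mean, in percent. */
double toy_pct_improvement(const double *before, size_t nb,
                           const double *after, size_t na)
{
    double mb = toy_mean(before, nb);
    double ma = toy_mean(after, na);
    return (ma - mb) / mb * 100.0;
}
```

With these samples the means are roughly 184982 and 194849 calls per second, which is consistent with the "about 5.3%" stated in the message.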