summaryrefslogtreecommitdiffstats
path: root/Documentation/filesystems
Commit message (Collapse)AuthorAgeFilesLines
* NTFS: writev() fix and maintenance/contact details updateAnton Altaparmakov2011-01-121-0/+3
| | | | | | | | | Fix writev() to not keep writing the first segment over and over again instead of moving onto subsequent segments and update the NTFS entry in MAINTAINERS to reflect that Tuxera Inc. now supports the NTFS driver. Signed-off-by: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'tty-next' of ↵Linus Torvalds2011-01-071-0/+24
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6 * 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: (36 commits) serial: apbuart: Fixup apbuart_console_init() TTY: Add tty ioctl to figure device node of the system console. tty: add 'active' sysfs attribute to tty0 and console device drivers: serial: apbuart: Handle OF failures gracefully Serial: Avoid unbalanced IRQ wake disable during resume tty: fix typos/errors in tty_driver.h comments pch_uart : fix warnings for 64bit compile 8250: fix uninitialized FIFOs ip2: fix compiler warning on ip2main_pci_tbl specialix: fix compiler warning on specialix_pci_tbl rocket: fix compiler warning on rocket_pci_ids 8250: add a UPIO_DWAPB32 for 32 bit accesses 8250: use container_of() instead of casting serial: omap-serial: Add support for kernel debugger serial: fix pch_uart kconfig & build drivers: char: hvc: add arm JTAG DCC console support RS485 documentation: add 16C950 UART description serial: ifx6x60: fix memory leak serial: ifx6x60: free IRQ on error Serial: EG20T: add PCH_UART driver ... Fixed up conflicts in drivers/serial/apbuart.c with evil merge that makes the code look fairly sane (unlike either side).
| * console: add /proc/consolesJiri Slaby2010-11-161-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It allows users to see what consoles are currently known to the system and with what flags. It is based on Werner's patch, the part about traversing fds was removed, the code was moved to kernel/printk.c, where consoles are handled and it makes more sense to me. Signed-off-by: Jiri Slaby <jslaby@suse.cz> [cleanups] Signed-off-by: "Dr. Werner Fink" <werner@suse.de> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* | fs: provide rcu-walk aware permission i_opsNick Piggin2011-01-074-7/+58
| | | | | | | | Signed-off-by: Nick Piggin <npiggin@kernel.dk>
* | fs: rcu-walk aware d_revalidate methodNick Piggin2011-01-074-12/+40
| | | | | | | | | | | | | | | | Require filesystems be aware of .d_revalidate being called in rcu-walk mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning -ECHILD from all implementations. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
* | fs: rcu-walk for path lookupNick Piggin2011-01-072-172/+345
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Perform common cases of path lookups without any stores or locking in the ancestor dentry elements. This is called rcu-walk, as opposed to the current algorithm which is a refcount based walk, or ref-walk. This results in far fewer atomic operations on every path element, significantly improving path lookup performance. It also avoids cacheline bouncing on common dentries, significantly improving scalability. The overall design is like this: * LOOKUP_RCU is set in nd->flags, which distinguishes rcu-walk from ref-walk. * Take the RCU lock for the entire path walk, starting with the acquiring of the starting path (eg. root/cwd/fd-path). So now dentry refcounts are not required for dentry persistence. * synchronize_rcu is called when unregistering a filesystem, so we can access d_ops and i_ops during rcu-walk. * Similarly take the vfsmount lock for the entire path walk. So now mnt refcounts are not required for persistence. Also we are free to perform mount lookups, and to assume dentry mount points and mount roots are stable up and down the path. * Have a per-dentry seqlock to protect the dentry name, parent, and inode, so we can load this tuple atomically, and also check whether any of its members have changed. * Dentry lookups (based on parent, candidate string tuple) recheck the parent sequence after the child is found in case anything changed in the parent during the path walk. * inode is also RCU protected so we can load d_inode and use the inode for limited things. * i_mode, i_uid, i_gid can be tested for exec permissions during path walk. * i_op can be loaded. When we reach the destination dentry, we lock it, recheck lookup sequence, and increment its refcount and mountpoint refcount. RCU and vfsmount locks are dropped. This is termed "dropping rcu-walk". If the dentry refcount does not match, we can not drop rcu-walk gracefully at the current point in the lokup, so instead return -ECHILD (for want of a better errno). This signals the path walking code to re-do the entire lookup with a ref-walk. Aside from the final dentry, there are other situations that may be encounted where we cannot continue rcu-walk. In that case, we drop rcu-walk (ie. take a reference on the last good dentry) and continue with a ref-walk. Again, if we can drop rcu-walk gracefully, we return -ECHILD and do the whole lookup using ref-walk. But it is very important that we can continue with ref-walk for most cases, particularly to avoid the overhead of double lookups, and to gain the scalability advantages on common path elements (like cwd and root). The cases where rcu-walk cannot continue are: * NULL dentry (ie. any uncached path element) * parent with d_inode->i_op->permission or ACLs * dentries with d_revalidate * Following links In future patches, permission checks and d_revalidate become rcu-walk aware. It may be possible eventually to make following links rcu-walk aware. Uncached path elements will always require dropping to ref-walk mode, at the very least because i_mutex needs to be grabbed, and objects allocated. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
* | fs: icache RCU free inodesNick Piggin2011-01-071-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RCU free the struct inode. This will allow: - Subsequent store-free path walking patch. The inode must be consulted for permissions when walking, so an RCU inode reference is a must. - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want to take i_lock no longer need to take sb_inode_list_lock to walk the list in the first place. This will simplify and optimize locking. - Could remove some nested trylock loops in dcache code - Could potentially simplify things a bit in VM land. Do not need to take the page lock to follow page->mapping. The downsides of this is the performance cost of using RCU. In a simple creat/unlink microbenchmark, performance drops by about 10% due to inability to reuse cache-hot slab objects. As iterations increase and RCU freeing starts kicking over, this increases to about 20%. In cases where inode lifetimes are longer (ie. many inodes may be allocated during the average life span of a single inode), a lot of this cache reuse is not applicable, so the regression caused by this patch is smaller. The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU, however this adds some complexity to list walking and store-free path walking, so I prefer to implement this at a later date, if it is shown to be a win in real situations. I haven't found a regression in any non-micro benchmark so I doubt it will be a problem. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
* | fs: dcache remove dcache_lockNick Piggin2011-01-073-30/+34
| | | | | | | | | | | | dcache_lock no longer protects anything. remove it. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
* | fs: change d_hash for rcu-walkNick Piggin2011-01-073-4/+16
| | | | | | | | | | | | | | | | | | Change d_hash so it may be called from lock-free RCU lookups. See similar patch for d_compare for details. For in-tree filesystems, this is just a mechanical change. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
* | fs: change d_compare for rcu-walkNick Piggin2011-01-073-4/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | Change d_compare so it may be called from lock-free RCU lookups. This does put significant restrictions on what may be done from the callback, however there don't seem to have been any problems with in-tree fses. If some strange use case pops up that _really_ cannot cope with the rcu-walk rules, we can just add new rcu-unaware callbacks, which would cause name lookup to drop out of rcu-walk mode. For in-tree filesystems, this is just a mechanical change. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
* | fs: change d_delete semanticsNick Piggin2011-01-072-14/+21
| | | | | | | | | | | | | | | | | | | | | | | | Change d_delete from a dentry deletion notification to a dentry caching advise, more like ->drop_inode. Require it to be constant and idempotent, and not take d_lock. This is how all existing filesystems use the callback anyway. This makes fine grained dentry locking of dput and dentry lru scanning much simpler. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
* | remove trim_fs method from Documentation/filesystems/LockingChristoph Hellwig2011-01-041-2/+0
| | | | | | | | | | | | | | | | | | The ->trim_fs has been removed meanwhile, so remove it from the documentation as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | update Documentation/filesystems/LockingChristoph Hellwig2010-12-301-112/+102
| | | | | | | | | | | | | | | | Mostly inspired by all the recent BKL removal changes, but a lot of older updates also weren't properly recorded. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6Linus Torvalds2010-12-142-1/+13
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFS: Fix panic after nfs_umount() nfs: remove extraneous and problematic calls to nfs_clear_request nfs: kernel should return EPROTONOSUPPORT when not support NFSv4 NFS: Fix fcntl F_GETLK not reporting some conflicts nfs: Discard ACL cache on mode update NFS: Readdir cleanups NFS: nfs_readdir_search_for_cookie() don't mark as eof if cookie not found NFS: Fix a memory leak in nfs_readdir Call the filesystem back whenever a page is removed from the page cache NFS: Ensure we use the correct cookie in nfs_readdir_xdr_filler
| * | Call the filesystem back whenever a page is removed from the page cacheLinus Torvalds2010-12-022-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NFS needs to be able to release objects that are stored in the page cache once the page itself is no longer visible from the page cache. This patch adds a callback to the address space operations that allows filesystems to perform page cleanups once the page has been removed from the page cache. Original patch by: Linus Torvalds <torvalds@linux-foundation.org> [trondmy: cover the cases of invalidate_inode_pages2() and truncate_inode_pages()] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | | Documentation/filesystems/vfs.txt: fix ->repeasepage() descriptionAndrew Morton2010-12-021-5/+4
|/ / | | | | | | | | | | | | | | | | ->releasepage() does not remove the page from the mapping. Acked-by: Neil Brown <neilb@suse.de> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Documentation: make configfs example code simpler, clearerDan Carpenter2010-11-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | If "p" is NULL then it will cause an oops when we pass it to simple_strtoul(). In this case "p" can not be NULL so I removed the check. I also changed the check a little to make it more explicit that we are testing whether p points to the NUL char. Signed-off-by: Dan Carpenter <error27@gmail.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | xfs: remove experimental tag from the delaylog optionChristoph Hellwig2010-11-101-11/+0
|/ | | | | | | | | We promised to do this for 2.6.37, and the code looks stable enough to keep that promise. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
* locks: remove fl_copy_lock lock_manager operationChristoph Hellwig2010-10-311-2/+0
| | | | | | | | This one was only used for a nasty hack in nfsd, which has recently been removed. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6Linus Torvalds2010-10-282-10/+0
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6: (841 commits) Staging: brcm80211: fix usage of roundup in structures Staging: bcm: fix up network device reference counting Staging: keucr: fix up US_ macro change staging: brcm80211: brcmfmac: Removed codeversion from firmware filenames. staging: brcm80211: Remove unnecessary header files. staging: brcm80211: Remove unnecessary includes from bcmutils.c staging: brcm80211: Removed unnecessary pktsetprio() function. Staging: brcm80211: remove typedefs.h Staging: brcm80211: remove uintptr typedef usage Staging: hv: remove struct vmbus_channel_interface Staging: hv: remove Open from struct vmbus_channel_interface Staging: hv: storvsc: call vmbus_open directly Staging: hv: netvsc: call vmbus_open directly Staging: hv: channel: export vmbus_open to modules Staging: hv: remove Close from struct vmbus_channel_interface Staging: hv: netvsc: call vmbus_close directly Staging: hv: storvsc: call vmbus_close directly Staging: hv: channel: export vmbus_close to modules Staging: hv: remove SendPacket from struct vmbus_channel_interface Staging: hv: storvsc: call vmbus_sendpacket directly ... Fix up conflicts in drivers/staging/cx25821/cx25821-audio-upstream.c drivers/staging/cx25821/cx25821-audio.h due to warring whitespace cleanups (neither of which were all that great)
| * Merge 'staging-next' to Linus's treeGreg Kroah-Hartman2010-10-282-10/+0
| |\ | | | | | | | | | | | | | | | | | | | | | This merges the staging-next tree to Linus's tree and resolves some conflicts that were present due to changes in other trees that were affected by files here. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| | * smbfs: move to drivers/stagingArnd Bergmann2010-10-052-10/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | smbfs has been scheduled for removal in 2.6.27, so maybe we can now move it to drivers/staging on the way out. smbfs still uses the big kernel lock and nobody is going to fix that, so we should be getting rid of it soon. This removes the 32 bit compat mount and ioctl handling code, which is implemented in common fs code, and moves all smbfs related files into drivers/staging/smbfs. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* | | fs/9p: Add access = client option to opt in acl evaluation.Aneesh Kumar K.V2010-10-281-1/+3
|/ / | | | | | | | | | | Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* | Merge branch 'next' into upstream-mergeTheodore Ts'o2010-10-271-0/+14
|\ \ | | | | | | | | | | | | | | | | | | Conflicts: fs/ext4/inode.c fs/ext4/mballoc.c include/trace/events/ext4.h
| * | ext4: add support for lazy inode table initializationLukas Czerner2010-10-271-0/+14
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the lazy_itable_init extended option is passed to mke2fs, it considerably speeds up filesystem creation because inode tables are not zeroed out. The fact that parts of the inode table are uninitialized is not a problem so long as the block group descriptors, which contain information regarding how much of the inode table has been initialized, has not been corrupted However, if the block group checksums are not valid, e2fsck must scan the entire inode table, and the the old, uninitialized data could potentially cause e2fsck to report false problems. Hence, it is important for the inode tables to be initialized as soon as possble. This commit adds this feature so that mke2fs can safely use the lazy inode table initialization feature to speed up formatting file systems. This is done via a new new kernel thread called ext4lazyinit, which is created on demand and destroyed, when it is no longer needed. There is only one thread for all ext4 filesystems in the system. When the first filesystem with inititable mount option is mounted, ext4lazyinit thread is created, then the filesystem can register its request in the request list. This thread then walks through the list of requests picking up scheduled requests and invoking ext4_init_inode_table(). Next schedule time for the request is computed by multiplying the time it took to zero out last inode table with wait multiplier, which can be set with the (init_itable=n) mount option (default is 10). We are doing this so we do not take the whole I/O bandwidth. When the thread is no longer necessary (request list is empty) it frees the appropriate structures and exits (and can be created later later by another filesystem). We do not disturb regular inode allocations in any way, it just do not care whether the inode table is, or is not zeroed. But when zeroing, we have to skip used inodes, obviously. Also we should prevent new inode allocations from the group, while zeroing is on the way. For that we take write alloc_sem lock in ext4_init_inode_table() and read alloc_sem in the ext4_claim_inode, so when we are unlucky and allocator hits the group which is currently being zeroed, it just has to wait. This can be suppresed using the mount option no_init_itable. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* | /proc/pid/pagemap: document in Documentation/filesystems/proc.txtNikanth Karthikesan2010-10-271-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | Document /proc/pid/pagemap in Documentation/filesystems/proc.txt Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Cc: Richard Guenther <rguenther@suse.de> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | /proc/pid/smaps: export amount of anonymous memory in a mappingNikanth Karthikesan2010-10-271-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Export the number of anonymous pages in a mapping via smaps. Even the private pages in a mapping backed by a file, would be marked as anonymous, when they are modified. Export this information to user-space via smaps. Exporting this count will help gdb to make a better decision on which areas need to be dumped in its coredump; and should be useful to others studying the memory usage of a process. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Acked-by: Hugh Dickins <hughd@google.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'for-linus' of ↵Linus Torvalds2010-10-262-10/+25
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits) split invalidate_inodes() fs: skip I_FREEING inodes in writeback_sb_inodes fs: fold invalidate_list into invalidate_inodes fs: do not drop inode_lock in dispose_list fs: inode split IO and LRU lists fs: switch bdev inode bdi's correctly fs: fix buffer invalidation in invalidate_list fsnotify: use dget_parent smbfs: use dget_parent exportfs: use dget_parent fs: use RCU read side protection in d_validate fs: clean up dentry lru modification fs: split __shrink_dcache_sb fs: improve DCACHE_REFERENCED usage fs: use percpu counter for nr_dentry and nr_dentry_unused fs: simplify __d_free fs: take dcache_lock inside __d_path fs: do not assign default i_ino in new_inode fs: introduce a per-cpu last_ino allocator new helper: ihold() ...
| * | update block_device_operations documentationChristoph Hellwig2010-10-251-8/+23
| | | | | | | | | | | | | | | | | | Updated Documentation/filesystems/Locking to match the code. Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | Documentation: Fix trivial typo in filesystems/sharedsubtree.txtValerie Aurora2010-10-251-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Documentation: Fix trivial typo in filesystems/sharedsubtree.txt This typo is easy to ignore unless you have spent a great deal of time thinking about how to eliminate duplicate dentries in unions. Signed-off-by: Valerie Aurora <vaurora@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | Merge branch 'nfs-for-2.6.37' of ↵Linus Torvalds2010-10-261-14/+14
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFS: rename nfs.upcall -> nfs.idmap NFS: Fix a compile issue in nfs_root
| * | | NFS: rename nfs.upcall -> nfs.idmapBryan Schumaker2010-10-261-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch renames the idmapper upcall program from nfs.upcall to nfs.idmap in the NFS documentation. This is because the program has been renamed in the nfs-utils source. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | | | Documentation/filesystems/proc.txt: improve smaps field documentationMatt Mackall2010-10-261-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Nikanth Karthikesan <knikanth@suse.de> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge branch 'nfs-for-2.6.37' of ↵Linus Torvalds2010-10-262-0/+50
|\ \ \ \ | |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: net/sunrpc: Use static const char arrays nfs4: fix channel attribute sanity-checks NFSv4.1: Use more sensible names for 'initialize_mountpoint' NFSv4.1: pnfs: filelayout: add driver's LAYOUTGET and GETDEVICEINFO infrastructure NFSv4.1: pnfs: add LAYOUTGET and GETDEVICEINFO infrastructure NFS: client needs to maintain list of inodes with active layouts NFS: create and destroy inode's layout cache NFSv4.1: pnfs: filelayout: introduce minimal file layout driver NFSv4.1: pnfs: full mount/umount infrastructure NFS: set layout driver NFS: ask for layouttypes during v4 fsinfo call NFS: change stateid to be a union NFSv4.1: pnfsd, pnfs: protocol level pnfs constants SUNRPC: define xdr_decode_opaque_fixed NFSD: remove duplicate NFS4_STATEID_SIZE
| * | | NFSv4.1: pnfs: full mount/umount infrastructureFred Isaman2010-10-242-0/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow a module implementing a layout type to register, and have its mount/umount routines called for filesystems that the server declares support it. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Bian Naimeng <biannm@cn.fujitsu.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | | | Merge branch 'nfs-for-2.6.37' of ↵Linus Torvalds2010-10-253-0/+91
|\ \ \ \ | |/ / / | | / / | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (67 commits) SUNRPC: Cleanup duplicate assignment in rpcauth_refreshcred nfs: fix unchecked value Ask for time_delta during fsinfo probe Revalidate caches on lock SUNRPC: After calling xprt_release(), we must restart from call_reserve NFSv4: Fix up the 'dircount' hint in encode_readdir NFSv4: Clean up nfs4_decode_dirent NFSv4: nfs4_decode_dirent must clear entry->fattr->valid NFSv4: Fix a regression in decode_getfattr NFSv4: Fix up decode_attr_filehandle() to handle the case of empty fh pointer NFS: Ensure we check all allocation return values in new readdir code NFS: Readdir plus in v4 NFS: introduce generic decode_getattr function NFS: check xdr_decode for errors NFS: nfs_readdir_filler catch all errors NFS: readdir with vmapped pages NFS: remove page size checking code NFS: decode_dirent should use an xdr_stream SUNRPC: Add a helper function xdr_inline_peek NFS: remove readdir plus limit ...
| * | NFS: new idmapperBryan Schumaker2010-10-072-0/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch creates a new idmapper system that uses the request-key function to place a call into userspace to map user and group ids to names. The old idmapper was single threaded, which prevented more than one request from running at a single time. This means that a user would have to wait for an upcall to finish before accessing a cached result. The upcall result is stored on a keyring of type id_resolver. See the file Documentation/filesystems/nfs/idmapper.txt for instructions. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> [Trond: fix up the return value of nfs_idmap_lookup_name and clean up code] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFS: Allow NFSROOT debugging messages to be enabled dynamicallyChuck Lever2010-09-171-0/+22
| |/ | | | | | | | | | | | | | | As a convenience, introduce a kernel command line option to enable NFSROOT debugging messages. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | Revert "tty: Add a new file /proc/tty/consoles"Linus Torvalds2010-10-231-32/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit f4a3e0bceb57466c31757f25e4e0ed108d1299ec. Jiri Sladby points out that the tty structure we're using may already be gone, and Al Viro doesn't hold back in complaining about the random loading of 'filp->private_data' which doesn't have to be a pointer at all, nor does checking the magic field for TTY_MAGIC prove anything. Belated review by Al: "a) global variable depending on stdin of the last opener? Affecting output of read(2)? Really? b) iterator is broken; list should be locked in ->start(), unlocked in ->stop() and *NOT* unlocked/relocked in ->next() c) ->show() ought to do nothing in case of ->device == NULL, instead of skipping those in ->next()/->start() d) regardless of the merits of the bright idea about asterisk at that line in output *and* regardless of (a), the implementation is not only atrociously ugly, it's actually very likely to be a roothole. Verifying that Cthulhu knows what number happens to be address of a tty_struct by blindly dereferencing memory at that address... Ouch. Please revert that crap." And Christoph pipes in and NAK's the approach of walking fd tables etc too. So it's pretty unanimous. Noticed-by: Jri Slaby <jslaby@suse.cz> Requested-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Werner Fink <werner@suse.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6Linus Torvalds2010-10-221-0/+32
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: (49 commits) serial8250: ratelimit "too much work" error serial: bfin_sport_uart: speed up sport RX sample rate to be 3% faster serial: abstraction for 8250 legacy ports serial/imx: check that the buffer is non-empty before sending it out serial: mfd: add more baud rates support jsm: Remove the uart port on errors Alchemy: Add UART PM methods. 8250: allow platforms to override PM hook. altera_uart: Don't use plain integer as NULL pointer altera_uart: Fix missing prototype for registering an early console altera_uart: Fixup type usage of port flags altera_uart: Make it possible to use Altera UART and 8250 ports together altera_uart: Add support for different address strides altera_uart: Add support for getting mapbase and IRQ from resources altera_uart: Add support for polling mode (IRQ-less) serial: Factor out uart_poll_timeout() from 8250 driver serial: mark the 8250 driver as maintained serial: 8250: Don't delay after transmitter is ready. tty: MAINTAINERS: add drivers/serial/jsm/ as maintained driver vcs: invoke the vt update callback when /dev/vcs* is written to ...
| * | tty: Add a new file /proc/tty/consolesDr. Werner Fink2010-10-221-0/+32
| |/ | | | | | | | | | | | | | | | | | | | | | | | | Add a new file /proc/tty/consoles to be able to determine the registered system console lines. If the reading process holds /dev/console open at the regular standard input stream the active device will be marked by an asterisk. Show possible operations and also decode the used flags of the listed console lines. Signed-off-by: Werner Fink <werner@suse.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* | ocfs2: Add a mount option "coherency=*" to handle cluster coherency for ↵Tristan Ye2010-10-111-0/+7
|/ | | | | | | | | | | | | | | | | | | | | | | | O_DIRECT writes. Currently, the default behavior of O_DIRECT writes was allowing concurrent writing among nodes to the same file, with no cluster coherency guaranteed (no EX lock held). This can leave stale data in the cache for buffered reads on other nodes. The new mount option introduce a chance to choose two different behaviors for O_DIRECT writes: * coherency=full, as the default value, will disallow concurrent O_DIRECT writes by taking EX locks. * coherency=buffered, allow concurrent O_DIRECT writes without EX lock among nodes, which gains high performance at risk of getting stale data on other nodes. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
* bkl: Remove locked .ioctl file operationArnd Bergmann2010-08-142-12/+2
| | | | | | | | | | The last user is gone, so we can safely remove this Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: John Kacur <jkacur@redhat.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linusLinus Torvalds2010-08-111-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus: Squashfs: fix checkpatch.pl warnings Squashfs: fix filename typo Squashfs: update Kconfig and documentation for LZO Squashfs: fix block size use in LZO decompressor Squashfs: Add LZO compression support squashfs: fix filename in header comment Squashfs: Make XATTR config name consistent with other file systems squashfs: fix compiler inline warning
| * Squashfs: update Kconfig and documentation for LZOPhillip Lougher2010-08-051-1/+1
| | | | | | | | | | | | | | | | | | | | Update compression types supported and add some help text for the LZO Kconfig option. Also add missing "default n" line and make some trivial whitespace cleanups too. Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
* | Merge branch 'for-linus' of ↵Linus Torvalds2010-08-102-10/+57
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits) no need for list_for_each_entry_safe()/resetting with superblock list Fix sget() race with failing mount vfs: don't hold s_umount over close_bdev_exclusive() call sysv: do not mark superblock dirty on remount sysv: do not mark superblock dirty on mount btrfs: remove junk sb_dirt change BFS: clean up the superblock usage AFFS: wait for sb synchronization when needed AFFS: clean up dirty flag usage cifs: truncate fallout mbcache: fix shrinker function return value mbcache: Remove unused features add f_flags to struct statfs(64) pass a struct path to vfs_statfs update VFS documentation for method changes. All filesystems that need invalidate_inode_buffers() are doing that explicitly convert remaining ->clear_inode() to ->evict_inode() Make ->drop_inode() just return whether inode needs to be dropped fs/inode.c:clear_inode() is gone fs/inode.c:evict() doesn't care about delete vs. non-delete paths now ... Fix up trivial conflicts in fs/nilfs2/super.c
| * | update VFS documentation for method changes.Al Viro2010-08-092-10/+39
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | update documentation for the new truncate sequenceChristoph Hellwig2010-08-091-0/+18
| | | | | | | | | | | | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | oom: deprecate oom_adj tunableDavid Rientjes2010-08-091-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | /proc/pid/oom_adj is now deprecated so that that it may eventually be removed. The target date for removal is August 2012. A warning will be printed to the kernel log if a task attempts to use this interface. Future warning will be suppressed until the kernel is rebooted to prevent spamming the kernel log. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Nick Piggin <npiggin@suse.de> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | oom: badness heuristic rewriteDavid Rientjes2010-08-091-37/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This a complete rewrite of the oom killer's badness() heuristic which is used to determine which task to kill in oom conditions. The goal is to make it as simple and predictable as possible so the results are better understood and we end up killing the task which will lead to the most memory freeing while still respecting the fine-tuning from userspace. Instead of basing the heuristic on mm->total_vm for each task, the task's rss and swap space is used instead. This is a better indication of the amount of memory that will be freeable if the oom killed task is chosen and subsequently exits. This helps specifically in cases where KDE or GNOME is chosen for oom kill on desktop systems instead of a memory hogging task. The baseline for the heuristic is a proportion of memory that each task is currently using in memory plus swap compared to the amount of "allowable" memory. "Allowable," in this sense, means the system-wide resources for unconstrained oom conditions, the set of mempolicy nodes, the mems attached to current's cpuset, or a memory controller's limit. The proportion is given on a scale of 0 (never kill) to 1000 (always kill), roughly meaning that if a task has a badness() score of 500 that the task consumes approximately 50% of allowable memory resident in RAM or in swap space. The proportion is always relative to the amount of "allowable" memory and not the total amount of RAM systemwide so that mempolicies and cpusets may operate in isolation; they shall not need to know the true size of the machine on which they are running if they are bound to a specific set of nodes or mems, respectively. Root tasks are given 3% extra memory just like __vm_enough_memory() provides in LSMs. In the event of two tasks consuming similar amounts of memory, it is generally better to save root's task. Because of the change in the badness() heuristic's baseline, it is also necessary to introduce a new user interface to tune it. It's not possible to redefine the meaning of /proc/pid/oom_adj with a new scale since the ABI cannot be changed for backward compatability. Instead, a new tunable, /proc/pid/oom_score_adj, is added that ranges from -1000 to +1000. It may be used to polarize the heuristic such that certain tasks are never considered for oom kill while others may always be considered. The value is added directly into the badness() score so a value of -500, for example, means to discount 50% of its memory consumption in comparison to other tasks either on the system, bound to the mempolicy, in the cpuset, or sharing the same memory controller. /proc/pid/oom_adj is changed so that its meaning is rescaled into the units used by /proc/pid/oom_score_adj, and vice versa. Changing one of these per-task tunables will rescale the value of the other to an equivalent meaning. Although /proc/pid/oom_adj was originally defined as a bitshift on the badness score, it now shares the same linear growth as /proc/pid/oom_score_adj but with different granularity. This is required so the ABI is not broken with userspace applications and allows oom_adj to be deprecated for future removal. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Nick Piggin <npiggin@suse.de> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
OpenPOWER on IntegriCloud