summaryrefslogtreecommitdiffstats
path: root/fs/fuse/fuse_i.h
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'for-linus' of ↵Linus Torvalds2012-01-121-1/+9
|\ | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: FUSE: Notifying the kernel of deletion. fuse: support ioctl on directories fuse: Use kcalloc instead of kzalloc to allocate array fuse: llseek optimize SEEK_CUR and SEEK_SET
| * FUSE: Notifying the kernel of deletion.John Muir2011-12-131-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allows a FUSE file-system to tell the kernel when a file or directory is deleted. If the specified dentry has the specified inode number, the kernel will unhash it. The current 'fuse_notify_inval_entry' does not cause the kernel to clean up directories that are in use properly, and as a result the users of those directories see incorrect semantics from the file-system. The error condition seen when 'fuse_notify_inval_entry' is used to notify of a deleted directory is avoided when 'fuse_notify_delete' is used instead. The following scenario demonstrates the difference: 1. User A chdirs into 'testdir' and starts reading 'testfile'. 2. User B rm -rf 'testdir'. 3. User B creates 'testdir'. 4. User C chdirs into 'testdir'. If you run the above within the same machine on any file-system (including fuse file-systems), there is no problem: user C is able to chdir into the new testdir. The old testdir is removed from the dentry tree, but still open by user A. If operations 2 and 3 are performed via the network such that the fuse file-system uses one of the notify functions to tell the kernel that the nodes are gone, then the following error occurs for user C while user A holds the original directory open: muirj@empacher:~> ls /test/testdir ls: cannot access /test/testdir: No such file or directory The issue here is that the kernel still has a dentry for testdir, and so it is requesting the attributes for the old directory, while the file-system is responding that the directory no longer exists. If on the other hand, if the file-system can notify the kernel that the directory is deleted using the new 'fuse_notify_delete' function, then the above ls will find the new directory as expected. Signed-off-by: John Muir <john@jmuir.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
| * fuse: support ioctl on directoriesMiklos Szeredi2011-12-131-0/+2
| | | | | | | | | | | | | | | | | | | | Multiplexing filesystems may want to support ioctls on the underlying files and directores (e.g. FS_IOC_{GET,SET}FLAGS). Ioctl support on directories was missing so add it now. Reported-by: Antonio SJ Musumeci <bile@landofbile.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* | fuse: propagate umode_tAl Viro2012-01-031-1/+1
|/ | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'for-linus' of ↵Linus Torvalds2011-08-241-1/+7
|\ | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: check size of FUSE_NOTIFY_INVAL_ENTRY message fuse: mark pages accessed when written to fuse: delete dead .write_begin and .write_end aops fuse: fix flock fuse: fix non-ANSI void function notation
| * fuse: fix flockMiklos Szeredi2011-08-081-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit a9ff4f87 "fuse: support BSD locking semantics" overlooked a number of issues with supporing flock locks over existing POSIX locking infrastructure: - it's not backward compatible, passing flock(2) calls to userspace unconditionally (if userspace sets FUSE_POSIX_LOCKS) - it doesn't cater for the fact that flock locks are automatically unlocked on file release - it doesn't take into account the fact that flock exclusive locks (write locks) don't need an fd opened for write. The last one invalidates the original premise of the patch that flock locks can be emulated with POSIX locks. This patch fixes the first two issues. The last one needs to be fixed in userspace if the filesystem assumed that a write lock will happen only on a file operned for write (as in the case of the current fuse library). Reported-by: Sebastian Pipping <webmaster@hartwork.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* | fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlersJosef Bacik2011-07-201-1/+2
|/ | | | | | | | | | | | | | | Btrfs needs to be able to control how filemap_write_and_wait_range() is called in fsync to make it less of a painful operation, so push down taking i_mutex and the calling of filemap_write_and_wait() down into the ->fsync() handlers. Some file systems can drop taking the i_mutex altogether it seems, like ext3 and ocfs2. For correctness sake I just pushed everything down in all cases to make sure that we keep the current behavior the same for everybody, and then each individual fs maintainer can make up their mind about what to do from there. Thanks, Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fuse: reduce size of struct fuse_requestMiklos Szeredi2011-03-211-1/+0
| | | | | | | | Reduce the size of struct fuse_request by removing cuse_init_out from the request structure and allocating it dinamically instead. CC: Tejun Heo <tj@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* fuse: fix hang of single threaded fuseblk filesystemMiklos Szeredi2011-02-251-1/+5
| | | | | | | | | | | | | | | | | Single threaded NTFS-3G could get stuck if a delayed RELEASE reply triggered a DESTROY request via path_put(). Fix this by a) making RELEASE requests synchronous, whenever possible, on fuseblk filesystems b) if not possible (triggered by an asynchronous read/write) then do the path_put() in a separate thread with schedule_work(). Reported-by: Oliver Neukum <oneukum@suse.de> Cc: stable@kernel.org Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* fuse: allow batching of FORGET requestsMiklos Szeredi2010-12-071-2/+1
| | | | | | | | | | | | | | | Terje Malmedal reports that a fuse filesystem with 32 million inodes on a machine with lots of memory can take up to 30 minutes to process FORGET requests when all those inodes are evicted from the icache. To solve this, create a BATCH_FORGET request that allows up to about 8000 FORGET requests to be sent in a single message. This request is only sent if userspace supports interface version 7.16 or later, otherwise fall back to sending individual FORGET messages. Reported-by: Terje Malmedal <terje.malmedal@usit.uio.no> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* fuse: separate queue for FORGET requestsMiklos Szeredi2010-12-071-9/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | Terje Malmedal reports that a fuse filesystem with 32 million inodes on a machine with lots of memory can go unresponsive for up to 30 minutes when all those inodes are evicted from the icache. The reason is that FORGET messages, sent when the inode is evicted, are queued up together with regular filesystem requests, and while the huge queue of FORGET messages are processed no other filesystem operation can proceed. Since a full fuse request structure is allocated for each inode, these take up quite a bit of memory as well. To solve these issues, create a slim 'fuse_forget_link' structure containing just the minimum of information required to send the FORGET request and chain these on a separate queue. When userspace is asking for a request make sure that FORGET and non-FORGET requests are selected fairly: for each 8 non-FORGET allow 16 FORGET requests. This will make sure FORGETs do not pile up, yet other requests are also allowed to proceed while the queued FORGETs are processed. Reported-by: Terje Malmedal <terje.malmedal@usit.uio.no> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* fuse: add retrieve requestMiklos Szeredi2010-07-121-0/+1
| | | | | | | | | | | | | | | | Userspace filesystem can request data to be retrieved from the inode's mapping. This request is synchronous and the retrieved data is queued as a new request. If the write to the fuse device returns an error then the retrieve request was not completed and a reply will not be sent. Only present pages are returned in the retrieve reply. Retrieving stops when it finds a non-present page and only data prior to that is returned. This request doesn't change the dirty state of pages. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* fuse: add store requestMiklos Szeredi2010-07-121-0/+2
| | | | | | | | | | | | | | Userspace filesystem can request data to be stored in the inode's mapping. This request is synchronous and has no reply. If the write to the fuse device returns an error then the store request was not fully completed (but may have updated some pages). If the stored data overflows the current file size, then the size is extended, similarly to a write(2) on the filesystem. Pages which have been completely stored are marked uptodate. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* Merge branch 'for-linus' of ↵Linus Torvalds2010-05-301-0/+3
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: mm: export generic_pipe_buf_*() to modules fuse: support splice() reading from fuse device fuse: allow splice to move pages mm: export remove_from_page_cache() to modules mm: export lru_cache_add_*() to modules fuse: support splice() writing to fuse device fuse: get page reference for readpages fuse: use get_user_pages_fast() fuse: remove unneeded variable
| * fuse: allow splice to move pagesMiklos Szeredi2010-05-251-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When splicing buffers to the fuse device with SPLICE_F_MOVE, try to move pages from the pipe buffer into the page cache. This allows populating the fuse filesystem's cache without ever touching the page contents, i.e. zero copy read capability. The following steps are performed when trying to move a page into the page cache: - buf->ops->confirm() to make sure the new page is uptodate - buf->ops->steal() to try to remove the new page from it's previous place - remove_from_page_cache() on the old page - add_to_page_cache_locked() on the new page If any of the above steps fail (non fatally) then the code falls back to copying the page. In particular ->steal() will fail if there are external references (other than the page cache and the pipe buffer) to the page. Also since the remove_from_page_cache() + add_to_page_cache_locked() are non-atomic it is possible that the page cache is repopulated in between the two and add_to_page_cache_locked() will fail. This could be fixed by creating a new atomic replace_page_cache_page() function. fuse_readpages_end() needed to be reworked so it works even if page->mapping is NULL for some or all pages which can happen if the add_to_page_cache_locked() failed. A number of sanity checks were added to make sure the stolen pages don't have weird flags set, etc... These could be moved into generic splice/steal code. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* | drop unused dentry argument to ->fsyncChristoph Hellwig2010-05-271-2/+1
|/ | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* truncate: use new helpersnpiggin@suse.de2009-09-241-2/+0
| | | | | | | | | | | | | | | Update some fs code to make use of new helper functions introduced in the previous patch. Should be no significant change in behaviour (except CIFS now calls send_sig under i_lock, via inode_newsize_ok). Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Miklos Szeredi <miklos@szeredi.hu> Cc: linux-nfs@vger.kernel.org Cc: Trond.Myklebust@netapp.com Cc: linux-cifs-client@lists.samba.org Cc: sfrench@samba.org Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fuse: add fusectl interface to max_backgroundCsaba Henk2009-09-161-1/+5
| | | | | | | | | Make the max_background and congestion_threshold parameters of a FUSE mount tunable at runtime by adding the respective knobs to its directory within the fusectl filesystem. Signed-off-by: Csaba Henk <csaba@gluster.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* fuse: make the number of max background requests and congestion threshold ↵Csaba Henk2009-07-071-6/+6
| | | | | | | | | | tunable The practical values for these limits depend on the design of the filesystem server so let userspace set them at initialization time. Signed-off-by: Csaba Henk <csaba@gluster.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* fuse: invalidation reverse callsJohn Muir2009-06-301-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | Add notification messages that allow the filesystem to invalidate VFS caches. Two notifications are added: 1) inode invalidation - invalidate cached attributes - invalidate a range of pages in the page cache (this is optional) 2) dentry invalidation - try to invalidate a subtree in the dentry cache Care must be taken while accessing the 'struct super_block' for the mount, as it can go away while an invalidation is in progress. To prevent this, introduce a rw-semaphore, that is taken for read during the invalidation and taken for write in the ->kill_sb callback. Cc: Csaba Henk <csaba@gluster.com> Cc: Anand Avati <avati@zresearch.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* fuse: allow umask processing in userspaceMiklos Szeredi2009-06-301-0/+3
| | | | | | | | | | | | | | This patch lets filesystems handle masking the file mode on creation. This is needed if filesystem is using ACLs. - The CREATE, MKDIR and MKNOD requests are extended with a "umask" parameter. - A new FUSE_DONT_MASK flag is added to the INIT request/reply. With this the filesystem may request that the create mode is not masked. CC: Jean-Pierre André <jean-pierre.andre@wanadoo.fr> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* fuse: export symbols to be used by CUSETejun Heo2009-04-281-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | Export the following symbols for CUSE. fuse_conn_put() fuse_conn_get() fuse_conn_kill() fuse_send_init() fuse_do_open() fuse_sync_release() fuse_direct_io() fuse_do_ioctl() fuse_file_poll() fuse_request_alloc() fuse_get_req() fuse_put_request() fuse_request_send() fuse_abort_conn() fuse_dev_release() fuse_dev_operations Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* fuse: update fuse_conn_init() and separate out fuse_conn_kill()Tejun Heo2009-04-281-1/+4
| | | | | | | | | | | Update fuse_conn_init() such that it doesn't take @sb and move bdi registration into a separate function. Also separate out fuse_conn_kill() from fuse_put_super(). These will be used to implement cuse. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* fuse: don't use inode in fuse_sync_release()Miklos Szeredi2009-04-281-3/+2
| | | | | | | | | Make fuse_sync_release() a generic helper function that doesn't need a struct inode pointer. This makes it suitable for use by CUSE. Change return value of fuse_release_common() from int to void. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* fuse: create fuse_do_open() helper for CUSEMiklos Szeredi2009-04-281-1/+1
| | | | | | | Create a helper for sending an OPEN request that doesn't need a struct inode pointer. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* fuse: clean up args in fuse_finish_open() and fuse_release_fill()Miklos Szeredi2009-04-281-3/+6
| | | | | | | | | | | Move setting ff->fh, ff->nodeid and file->private_data outside fuse_finish_open(). Add ->open_flags member to struct fuse_file. This simplifies the argument passing to fuse_finish_open() and fuse_release_fill(), and paves the way for creating an open helper that doesn't need an inode pointer. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* fuse: don't use inode in helpers called by fuse_direct_io()Miklos Szeredi2009-04-281-1/+1
| | | | | | | | | Use ff->fc and ff->nodeid instead of passing down the inode. This prepares this function for use by CUSE, where the inode is not owned by a fuse filesystem. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* fuse: add members to struct fuse_fileMiklos Szeredi2009-04-281-2/+8
| | | | | | | | Add new members ->fc and ->nodeid to struct fuse_file. This will aid in converting functions for use by CUSE, where the inode is not owned by a fuse filesystem. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* fuse: use struct path in release structureMiklos Szeredi2009-04-281-2/+1
| | | | | | | Use struct path instead of separate dentry and vfsmount in req->misc.release. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* constify dentry_operations: FUSEAl Viro2009-03-271-1/+1
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fuse: add fuse_conn->release()Tejun Heo2008-11-261-0/+3
| | | | | | | | Add fuse_conn->release() so that fuse_conn can be embedded in other structures. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* fuse: separate out fuse_conn_init() from new_conn()Tejun Heo2008-11-261-0/+5
| | | | | | | | | | Separate out fuse_conn_init() from new_conn() and while at it initialize fuse_conn->entry during conn initialization. This will be used by CUSE. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* fuse: add fuse_ prefix to several functionsTejun Heo2008-11-261-4/+5
| | | | | | | | | | Add fuse_ prefix to request_send*() and get_root_inode() as some of those functions will be exported for CUSE. With or without CUSE export, having the function names scoped is a good idea for debuggability. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* fuse: implement poll supportTejun Heo2008-11-261-0/+20
| | | | | | | | | | | | | | | | | | Implement poll support. Polled files are indexed using kh in a RB tree rooted at fuse_conn->polled_files. Client should send FUSE_NOTIFY_POLL notification once after processing FUSE_POLL which has FUSE_POLL_SCHEDULE_NOTIFY set. Sending notification unconditionally after the latest poll or everytime file content might have changed is inefficient but won't cause malfunction. fuse_file_poll() can sleep and requires patches from the following thread which allows f_op->poll() to sleep. http://thread.gmane.org/gmane.linux.kernel/726176 Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* fuse: add file kernel handleTejun Heo2008-11-261-1/+7
| | | | | | | | | | | | | | The file handle, fuse_file->fh, is opaque value supplied by userland FUSE server and uniqueness is not guaranteed. Add file kernel handle, fuse_file->kh, which is allocated by the kernel on file allocation and guaranteed to be unique. This will be used by poll to match notification to the respective file but can be used for other purposes where unique file handle is necessary. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* fuse: style fixesMiklos Szeredi2008-11-261-19/+19
| | | | | | | Fix coding style errors reported by checkpatch and others. Uptdate copyright date to 2008. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* fuse: add include protectorsTejun Heo2008-10-161-0/+5
| | | | | | | Add include protectors to include/linux/fuse.h and fs/fuse/fuse_i.h. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* fuse: nfs export special lookupsMiklos Szeredi2008-07-251-0/+6
| | | | | | | | | | | | | | | | | | | | | | | Implement the get_parent export operation by sending a LOOKUP request with ".." as the name. Implement looking up an inode by node ID after it has been evicted from the cache. This is done by seding a LOOKUP request with "." as the name (for all file types, not just directories). The filesystem can set the FUSE_EXPORT_SUPPORT flag in the INIT reply, to indicate that it supports these special lookups. Thanks to John Muir for the original implementation of this feature. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Matthew Wilcox <matthew@wil.cx> Cc: David Teigland <teigland@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fuse: add export operationsMiklos Szeredi2008-07-251-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement export_operations, to allow fuse filesystems to be exported to NFS. This feature has been in the out-of-tree fuse module, and is widely used and tested. It has not been originally merged into mainline, because doing the NFS export in userspace was thought to be a cleaner and more efficient way of doing it, than through the kernel. While that is true, it would also have involved a lot of duplicated effort at reimplementing NFS exporting (all the different versions of the protocol). This effort was unfortunately not undertaken by anyone, so we are left with doing it the easy but less efficient way. If this feature goes in, the out-of-tree fuse module can go away, which would have several advantages: - not having to maintain two versions - less confusion for users - no bugs due to kernel API changes Comment from hch: - Use the same fh_type values as XFS, since we use the same fh encoding. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fuse: add flag to turn on big writesMiklos Szeredi2008-05-131-0/+3
| | | | | | | | | | | | | | | | | Prior to 2.6.26 fuse only supported single page write requests. In theory all fuse filesystem should be able support bigger than 4k writes, as there's nothing in the API to prevent it. Unfortunately there's a known case in NTFS-3G where big writes cause filesystem corruption. There could also be other filesystems, where the lack of testing with big write requests would result in bugs. To prevent such problems on a kernel upgrade, disable big writes by default, but let filesystems set a flag to turn it on. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Szabolcs Szakacsits <szaka@ntfs-3g.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fuse: fix node ID typeMiklos Szeredi2008-04-301-2/+2
| | | | | | | | | Node ID is 64bit but it is passed as unsigned long to some functions. This breakage wasn't noticed, because libfuse uses unsigned long too. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fuse: update file size on short readMiklos Szeredi2008-04-301-1/+6
| | | | | | | | | | | | | | If the READ request returned a short count, then either - cached size is incorrect - filesystem is buggy, as short reads are only allowed on EOF So assume that the size is wrong and refresh it, so that cached read() doesn't zero fill the missing chunk. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fuse: support writable mmapMiklos Szeredi2008-04-301-0/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Quoting Linus (3 years ago, FUSE inclusion discussions): "User-space filesystems are hard to get right. I'd claim that they are almost impossible, unless you limit them somehow (shared writable mappings are the nastiest part - if you don't have those, you can reasonably limit your problems by limiting the number of dirty pages you accept through normal "write()" calls)." Instead of attempting the impossible, I've just waited for the dirty page accounting infrastructure to materialize (thanks to Peter Zijlstra and others). This nicely solved the biggest problem: limiting the number of pages used for write caching. Some small details remained, however, which this largish patch attempts to address. It provides a page writeback implementation for fuse, which is completely safe against VM related deadlocks. Performance may not be very good for certain usage patterns, but generally it should be acceptable. It has been tested extensively with fsx-linux and bash-shared-mapping. Fuse page writeback design -------------------------- fuse_writepage() allocates a new temporary page with GFP_NOFS|__GFP_HIGHMEM. It copies the contents of the original page, and queues a WRITE request to the userspace filesystem using this temp page. The writeback is finished instantly from the MM's point of view: the page is removed from the radix trees, and the PageDirty and PageWriteback flags are cleared. For the duration of the actual write, the NR_WRITEBACK_TEMP counter is incremented. The per-bdi writeback count is not decremented until the actual write completes. On dirtying the page, fuse waits for a previous write to finish before proceeding. This makes sure, there can only be one temporary page used at a time for one cached page. This approach is wasteful in both memory and CPU bandwidth, so why is this complication needed? The basic problem is that there can be no guarantee about the time in which the userspace filesystem will complete a write. It may be buggy or even malicious, and fail to complete WRITE requests. We don't want unrelated parts of the system to grind to a halt in such cases. Also a filesystem may need additional resources (particularly memory) to complete a WRITE request. There's a great danger of a deadlock if that allocation may wait for the writepage to finish. Currently there are several cases where the kernel can block on page writeback: - allocation order is larger than PAGE_ALLOC_COSTLY_ORDER - page migration - throttle_vm_writeout (through NR_WRITEBACK) - sync(2) Of course in some cases (fsync, msync) we explicitly want to allow blocking. So for these cases new code has to be added to fuse, since the VM is not tracking writeback pages for us any more. As an extra safetly measure, the maximum dirty ratio allocated to a single fuse filesystem is set to 1% by default. This way one (or several) buggy or malicious fuse filesystems cannot slow down the rest of the system by hogging dirty memory. With appropriate privileges, this limit can be raised through '/sys/class/bdi/<bdi>/max_ratio'. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: bdi: expose the BDI object in sysfs for FUSEMiklos Szeredi2008-04-301-2/+2
| | | | | | | | | | | | | Register FUSE's backing_dev_info under sysfs with the name "fuse-MAJOR:MINOR" Make the fuse control filesystem use s_dev instead of a fuse specific ID. This makes it easier to match directories under /sys/fs/fuse/connections/ with directories under /sys/class/bdi, and with actual mounts. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fuse: limit queued background requestsMiklos Szeredi2008-02-061-0/+6
| | | | | | | | | | | | | | | | Libfuse basically creates a new thread for each new request. This is fine for synchronous requests, which are naturally limited. However background requests (especially writepage) can cause a thread creation storm. To avoid this, limit the number of background requests available to userspace. This is done by introducing another queue for background requests, and a counter for the number of "active" requests, which are currently available for userspace. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fuse: save space in struct fuse_reqMiklos Szeredi2008-02-061-7/+5
| | | | | | | | | Move the fields 'dentry' and 'vfsmount' into the request specific union, since these are only used for the RELEASE request. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fuse: pass open flags to read and writeMiklos Szeredi2007-11-291-1/+1
| | | | | | | | | | | | Some open flags (O_APPEND, O_DIRECT) can be changed with fcntl(F_SETFL, ...) after open, but fuse currently only sends the flags to userspace in open. To make it possible to correcly handle changing flags, send the current value to userspace in each read and write. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fuse: fix reading past EOFMiklos Szeredi2007-11-291-0/+3
| | | | | | | | | | | | | | | Currently reading a fuse file will stop at cached i_size and return EOF, even though the file might have grown since the attributes were last updated. So detect if trying to read past EOF, and refresh the attributes before continuing with the read. Thanks to mpb for the report. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fuse: add support for mandatory lockingMiklos Szeredi2007-10-181-0/+2
| | | | | | | | | | | For mandatory locking the userspace filesystem needs to know the lock ownership for read, write and truncate operations. This patch adds the necessary fields to the protocol. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fuse: add helper for asynchronous writesMiklos Szeredi2007-10-181-0/+4
| | | | | | | | | | | | | | This patch adds a new helper function fuse_write_fill() which makes it possible to send WRITE requests asynchronously. A new flag for WRITE requests is also added which indicates that this a write from the page cache, and not a "normal" file write. This patch is in preparation for writable mmap support. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
OpenPOWER on IntegriCloud