summaryrefslogtreecommitdiffstats
path: root/fs/ceph
Commit message (Collapse)AuthorAgeFilesLines
...
| * | ceph: fix minor typo in unsafe_request_waitJeff Layton2016-12-121-1/+1
| | | | | | | | | | | | | | | Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
| * | ceph: record truncate size/seq for snap data writebackYan, Zheng2016-12-123-13/+22
| | | | | | | | | | | | | | | | | | | | | | | | Dirty snapshot data needs to be flushed unconditionally. If they were created before truncation, writeback should use old truncate size/seq. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * | ceph: check availability of mds cluster on mountYan, Zheng2016-12-124-11/+182
| | | | | | | | | | | | Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * | ceph: fix splice read for no Fc capability caseYan, Zheng2016-12-121-54/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When iov_iter type is ITER_PIPE, copy_page_to_iter() increases the page's reference and add the page to a pipe_buffer. It also set the pipe_buffer's ops to page_cache_pipe_buf_ops. The comfirm callback in page_cache_pipe_buf_ops expects the page is from page cache and uptodate, otherwise it return error. For ceph_sync_read() case, pages are not from page cache. So we can't call copy_page_to_iter() when iov_iter type is ITER_PIPE. The fix is using iov_iter_get_pages_alloc() to allocate pages for the pipe. (the code is similar to default_file_splice_read) Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * | ceph: try getting buffer capability for readahead/fadviseYan, Zheng2016-12-124-11/+73
| | | | | | | | | | | | | | | | | | | | | | | | | | | For readahead/fadvise cases, caller of ceph_readpages does not hold buffer capability. Pages can be added to page cache while there is no buffer capability. This can cause data integrity issue. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * | ceph: fix scheduler warning due to nested blockingNikolay Borisov2016-12-121-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | try_get_cap_refs can be used as a condition in a wait_event* calls. This is all fine until it has to call __ceph_do_pending_vmtruncate, which in turn acquires the i_truncate_mutex. This leads to a situation in which a task's state is !TASK_RUNNING and at the same time it's trying to acquire a sleeping primitive. In essence a nested sleeping primitives are being used. This causes the following warning: WARNING: CPU: 22 PID: 11064 at kernel/sched/core.c:7631 __might_sleep+0x9f/0xb0() do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff8109447d>] prepare_to_wait_event+0x5d/0x110 ipmi_msghandler tcp_scalable ib_qib dca ib_mad ib_core ib_addr ipv6 CPU: 22 PID: 11064 Comm: fs_checker.pl Tainted: G O 4.4.20-clouder2 #6 Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1a 10/16/2015 0000000000000000 ffff8838b416fa88 ffffffff812f4409 ffff8838b416fad0 ffffffff81a034f2 ffff8838b416fac0 ffffffff81052b46 ffffffff81a0432c 0000000000000061 0000000000000000 0000000000000000 ffff88167bda54a0 Call Trace: [<ffffffff812f4409>] dump_stack+0x67/0x9e [<ffffffff81052b46>] warn_slowpath_common+0x86/0xc0 [<ffffffff81052bcc>] warn_slowpath_fmt+0x4c/0x50 [<ffffffff8109447d>] ? prepare_to_wait_event+0x5d/0x110 [<ffffffff8109447d>] ? prepare_to_wait_event+0x5d/0x110 [<ffffffff8107767f>] __might_sleep+0x9f/0xb0 [<ffffffff81612d30>] mutex_lock+0x20/0x40 [<ffffffffa04eea14>] __ceph_do_pending_vmtruncate+0x44/0x1a0 [ceph] [<ffffffffa04fa692>] try_get_cap_refs+0xa2/0x320 [ceph] [<ffffffffa04fd6f5>] ceph_get_caps+0x255/0x2b0 [ceph] [<ffffffff81094370>] ? wait_woken+0xb0/0xb0 [<ffffffffa04f2c11>] ceph_write_iter+0x2b1/0xde0 [ceph] [<ffffffff81613f22>] ? schedule_timeout+0x202/0x260 [<ffffffff8117f01a>] ? kmem_cache_free+0x1ea/0x200 [<ffffffff811b46ce>] ? iput+0x9e/0x230 [<ffffffff81077632>] ? __might_sleep+0x52/0xb0 [<ffffffff81156147>] ? __might_fault+0x37/0x40 [<ffffffff8119e123>] ? cp_new_stat+0x153/0x170 [<ffffffff81198cfa>] __vfs_write+0xaa/0xe0 [<ffffffff81199369>] vfs_write+0xa9/0x190 [<ffffffff811b6d01>] ? set_close_on_exec+0x31/0x70 [<ffffffff8119a056>] SyS_write+0x46/0xa0 This happens since wait_event_interruptible can interfere with the mutex locking code, since they both fiddle with the task state. Fix the issue by using the newly-added nested blocking infrastructure in 61ada528dea0 ("sched/wait: Provide infrastructure to deal with nested blocking") Link: https://lwn.net/Articles/628628/ Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * | ceph: fix printing wrong return variable in ceph_direct_read_write()Zhi Zhang2016-12-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Fix printing wrong return variable for invalidate_inode_pages2_range in ceph_direct_read_write(). Signed-off-by: Zhi Zhang <zhang.david2011@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | libceph: drop len argument of *verify_authorizer_reply()Ilya Dryomov2016-12-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | The length of the reply is protocol-dependent - for cephx it's ceph_x_authorize_reply. Nothing sensible can be passed from the messenger layer anyway. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
* | | Merge branch 'for-linus' of ↵Linus Torvalds2016-12-166-100/+15
|\ \ \ | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: - more ->d_init() stuff (work.dcache) - pathname resolution cleanups (work.namei) - a few missing iov_iter primitives - copy_from_iter_full() and friends. Either copy the full requested amount, advance the iterator and return true, or fail, return false and do _not_ advance the iterator. Quite a few open-coded callers converted (and became more readable and harder to fuck up that way) (work.iov_iter) - several assorted patches, the big one being logfs removal * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: logfs: remove from tree vfs: fix put_compat_statfs64() does not handle errors namei: fold should_follow_link() with the step into not-followed link namei: pass both WALK_GET and WALK_MORE to should_follow_link() namei: invert WALK_PUT logics namei: shift interpretation of LOOKUP_FOLLOW inside should_follow_link() namei: saner calling conventions for mountpoint_last() namei.c: get rid of user_path_parent() switch getfrag callbacks to ..._full() primitives make skb_add_data,{_nocache}() and skb_copy_to_page_nocache() advance only on success [iov_iter] new primitives - copy_from_iter_full() and friends don't open-code file_inode() ceph: switch to use of ->d_init() ceph: unify dentry_operations instances lustre: switch to use of ->d_init()
| * | Merge branches 'work.namei', 'work.dcache' and 'work.iov_iter' into for-linusAl Viro2016-12-156-100/+15
| |\ \ | | |/ | |/|
| | * ceph: switch to use of ->d_init()Al Viro2016-10-286-67/+5
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| | * ceph: unify dentry_operations instancesAl Viro2016-10-284-33/+10
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | ceph: don't set req->r_locked_dir in ceph_d_revalidateJeff Layton2016-12-081-10/+14
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This function sets req->r_locked_dir which is supposed to indicate to ceph_fill_trace that the parent's i_rwsem is locked for write. Unfortunately, there is no guarantee that the dir will be locked when d_revalidate is called, so we really don't want ceph_fill_trace to do any dcache manipulation from this context. Clear req->r_locked_dir since it's clearly not safe to do that. What we really want to know with d_revalidate is whether the dentry still points to the same inode. ceph_fill_trace installs a pointer to the inode in req->r_target_inode, so we can just compare that to d_inode(dentry) to see if it's the same one after the lookup. Also, since we aren't generally interested in the parent here, we can switch to using a GETATTR to hint that to the MDS, which also means that we only need to reserve one cap. Finally, just remove the d_unhashed check. That's really outside the purview of a filesystem's d_revalidate. If the thing became unhashed while we're checking it, then that's up to the VFS to handle anyway. Fixes: 200fd27c8fa2 ("ceph: use lookup request to revalidate dentry") Link: http://tracker.ceph.com/issues/18041 Reported-by: Donatas Abraitis <donatas.abraitis@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* | ceph: use default file splice read callbackYan, Zheng2016-11-101-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Splice read/write implementation changed recently. When using generic_file_splice_read(), iov_iter with type == ITER_PIPE is passed to filesystem's read_iter callback. But ceph_sync_read() can't serve ITER_PIPE iov_iter correctly (ITER_PIPE iov_iter expects pages from page cache). Fixing ceph_sync_read() requires a big patch. So use default splice read callback for now. Signed-off-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* | ceph: fix non static symbol warningWei Yongjun2016-10-181-2/+2
| | | | | | | | | | | | | | | | | | | | Fixes the following sparse warning: fs/ceph/xattr.c:19:28: warning: symbol 'ceph_other_xattr_handler' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* | ceph: fix uninitialized dentry pointer in ceph_real_mount()Geert Uytterhoeven2016-10-181-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | fs/ceph/super.c: In function ‘ceph_real_mount’: fs/ceph/super.c:818: warning: ‘root’ may be used uninitialized in this function If s_root is already valid, dentry pointer root is never initialized, and returned by ceph_real_mount(). This will cause a crash later when the caller dereferences the pointer. Fixes: ce2728aaa82bbeba ("ceph: avoid accessing / when mounting a subpath") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Yan, Zheng <zyan@redhat.com>
* | ceph: fix readdir vs fragmentation raceYan, Zheng2016-10-181-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | following sequence of events tigger the race - client readdir frag 0* -> got item 'A' - MDS merges frag 0* and frag 1* - client send readdir request (frag 1*, offset 2, readdir_start 'A') - MDS reply items (that are after item 'A') in frag * Link: http://tracker.ceph.com/issues/17286 Signed-off-by: Yan, Zheng <zyan@redhat.com>
* | ceph: fix error handling in ceph_read_iterNikolay Borisov2016-10-151-1/+2
|/ | | | | | | | | | | | | | | In case __ceph_do_getattr returns an error and the retry_op in ceph_read_iter is not READ_INLINE, then it's possible to invoke __free_page on a page which is NULL, this naturally leads to a crash. This can happen when, for example, a process waiting on a MDS reply receives sigterm. Fix this by explicitly checking whether the page is set or not. Cc: stable@vger.kernel.org # 3.19+ Signed-off-by: Nikolay Borisov <kernel@kyup.com> Reviewed-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* Merge branch 'for-linus' of ↵Linus Torvalds2016-10-104-5/+9
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: ">rename2() work from Miklos + current_time() from Deepa" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: Replace current_fs_time() with current_time() fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps fs: Replace CURRENT_TIME with current_time() for inode timestamps fs: proc: Delete inode time initializations in proc_alloc_inode() vfs: Add current_time() api vfs: add note about i_op->rename changes to porting fs: rename "rename2" i_op to "rename" vfs: remove unused i_op->rename fs: make remaining filesystems use .rename2 libfs: support RENAME_NOREPLACE in simple_rename() fs: support RENAME_NOREPLACE for local filesystems ncpfs: fix unused variable warning
| * Merge remote-tracking branch 'ovl/rename2' into for-linusAl Viro2016-10-101-1/+5
| |\
| | * fs: rename "rename2" i_op to "rename"Miklos Szeredi2016-09-271-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | Generated patch: sed -i "s/\.rename2\t/\.rename\t\t/" `git grep -wl rename2` sed -i "s/\brename2\b/rename/g" `git grep -wl rename2` Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| | * fs: make remaining filesystems use .rename2Miklos Szeredi2016-09-271-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is trivial to do: - add flags argument to foo_rename() - check if flags is zero - assign foo_rename() to .rename2 instead of .rename This doesn't mean it's impossible to support RENAME_NOREPLACE for these filesystems, but it is not trivial, like for local filesystems. RENAME_NOREPLACE must guarantee atomicity (i.e. it shouldn't be possible for a file to be created on one host while it is overwritten by rename on another host). Filesystems converted: 9p, afs, ceph, coda, ecryptfs, kernfs, lustre, ncpfs, nfs, ocfs2, orangefs. After this, we can get rid of the duplicate interfaces for rename. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: David Howells <dhowells@redhat.com> [AFS] Acked-by: Mike Marshall <hubcap@omnibond.com> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jan Harkes <jaharkes@cs.cmu.edu> Cc: Tyler Hicks <tyhicks@canonical.com> Cc: Oleg Drokin <oleg.drokin@intel.com> Cc: Trond Myklebust <trond.myklebust@primarydata.com> Cc: Mark Fasheh <mfasheh@suse.com>
| * | fs: Replace current_fs_time() with current_time()Deepa Dinamani2016-09-273-4/+4
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | current_fs_time() uses struct super_block* as an argument. As per Linus's suggestion, this is changed to take struct inode* as a parameter instead. This is because the function is primarily meant for vfs inode timestamps. Also the function was renamed as per Arnd's suggestion. Change all calls to current_fs_time() to use the new current_time() function instead. current_fs_time() will be deleted. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge branch 'work.xattr' of ↵Linus Torvalds2016-10-102-9/+0
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs xattr updates from Al Viro: "xattr stuff from Andreas This completes the switch to xattr_handler ->get()/->set() from ->getxattr/->setxattr/->removexattr" * 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vfs: Remove {get,set,remove}xattr inode operations xattr: Stop calling {get,set,remove}xattr inode operations vfs: Check for the IOP_XATTR flag in listxattr xattr: Add __vfs_{get,set,remove}xattr helpers libfs: Use IOP_XATTR flag for empty directory handling vfs: Use IOP_XATTR flag for bad-inode handling vfs: Add IOP_XATTR inode operations flag vfs: Move xattr_resolve_name to the front of fs/xattr.c ecryptfs: Switch to generic xattr handlers sockfs: Get rid of getxattr iop sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names kernfs: Switch to generic xattr handlers hfs: Switch to generic xattr handlers jffs2: Remove jffs2_{get,set,remove}xattr macros xattr: Remove unnecessary NULL attribute name check
| * | vfs: Remove {get,set,remove}xattr inode operationsAndreas Gruenbacher2016-10-072-9/+0
| |/ | | | | | | | | | | | | These inode operations are no longer used; remove them. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge tag 'ceph-for-4.9-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds2016-10-107-53/+61
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull Ceph updates from Ilya Dryomov: "The big ticket item here is support for rbd exclusive-lock feature, with maintenance operations offloaded to userspace (Douglas Fuller, Mike Christie and myself). Another block device bullet is a series fixing up layering error paths (myself). On the filesystem side, we've got patches that improve our handling of buffered vs dio write races (Neil Brown) and a few assorted fixes from Zheng. Also included a couple of random cleanups and a minor CRUSH update" * tag 'ceph-for-4.9-rc1' of git://github.com/ceph/ceph-client: (39 commits) crush: remove redundant local variable crush: don't normalize input of crush_ln iteratively libceph: ceph_build_auth() doesn't need ceph_auth_build_hello() libceph: use CEPH_AUTH_UNKNOWN in ceph_auth_build_hello() ceph: fix description for rsize and rasize mount options rbd: use kmalloc_array() in rbd_header_from_disk() ceph: use list_move instead of list_del/list_add ceph: handle CEPH_SESSION_REJECT message ceph: avoid accessing / when mounting a subpath ceph: fix mandatory flock check ceph: remove warning when ceph_releasepage() is called on dirty page ceph: ignore error from invalidate_inode_pages2_range() in direct write ceph: fix error handling of start_read() rbd: add rbd_obj_request_error() helper rbd: img_data requests don't own their page array rbd: don't call rbd_osd_req_format_read() for !img_data requests rbd: rework rbd_img_obj_exists_submit() error paths rbd: don't crash or leak on errors in rbd_img_obj_parent_read_full_callback() rbd: move bumping img_request refcount into rbd_obj_request_submit() rbd: mark the original request as done if stat request fails ...
| * | ceph: use list_move instead of list_del/list_addWei Yongjun2016-10-031-2/+1
| | | | | | | | | | | | | | | | | | | | | Using list_move() instead of list_del() + list_add(). Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | ceph: handle CEPH_SESSION_REJECT messageYan, Zheng2016-10-033-5/+25
| | | | | | | | | | | | Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * | ceph: avoid accessing / when mounting a subpathYan, Zheng2016-10-031-29/+20
| | | | | | | | | | | | | | | | | | Accessing / causes failuire if the client has caps that restrict path Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * | ceph: fix mandatory flock checkYan, Zheng2016-10-031-2/+2
| | | | | | | | | | | | Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * | ceph: remove warning when ceph_releasepage() is called on dirty pageNeilBrown2016-10-031-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If O_DIRECT writes are racing with buffered writes, then the call to invalidate_inode_pages2_range() can call ceph_releasepage() on dirty pages. Most filesystems hold inode_lock() across O_DIRECT writes so they do not suffer this race, but cephfs deliberately drops the lock, and opens a window for the race. This race can be triggered with the generic/036 test from the xfstests test suite. It doesn't happen every time, but it does happen often. As the possibilty is expected, remove the warning, and instead include the PageDirty() status in the debug message. Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
| * | ceph: ignore error from invalidate_inode_pages2_range() in direct writeNeilBrown2016-10-031-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This call can fail if there are dirty pages. The preceding call to filemap_write_and_wait_range() will normally remove dirty pages, but as inode_lock() is not held over calls to ceph_direct_read_write(), it could race with non-direct writes and pages could be dirtied immediately after filemap_write_and_wait_range() returns If there are dirty pages, they will be removed by the subsequent call to truncate_inode_pages_range(), so having them here is not a problem. If the 'ret' value is left holding an error, then in the async IO case (aio_req is not NULL) the loop that would normally call ceph_osdc_start_request() will see the error in 'ret' and abort all requests. This doesn't seem like correct behaviour. So use separate 'ret2' instead of overloading 'ret'. Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
| * | ceph: fix error handling of start_read()Yan, Zheng2016-10-031-10/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If start_page() fails to add a page to page cache or fails to send OSD request. It should cal put_page() (instead of free_page()) for relevant pages. Besides, start_page() need to cancel fscache readpage if it fails to send OSD request. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reported-by: Zhi Zhang <zhang.david2011@gmail.com>
* | | Merge remote-tracking branch 'jk/vfs' into work.miscAl Viro2016-10-082-12/+18
|\ \ \ | |_|/ |/| |
| * | fs: Give dentry to inode_change_ok() instead of inodeJan Kara2016-09-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | inode_change_ok() will be resposible for clearing capabilities and IMA extended attributes and as such will need dentry. Give it as an argument to inode_change_ok() instead of an inode. Also rename inode_change_ok() to setattr_prepare() to better relect that it does also some modifications in addition to checks. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
| * | ceph: Propagate dentry down to inode_change_ok()Jan Kara2016-09-222-8/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To avoid clearing of capabilities or security related extended attributes too early, inode_change_ok() will need to take dentry instead of inode. ceph_setattr() has the dentry easily available but __ceph_setattr() is also called from ceph_set_acl() where dentry is not easily available. Luckily that call path does not need inode_change_ok() to be called anyway. So reorganize functions a bit so that inode_change_ok() is called only from paths where dentry is available. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>
| * | posix_acl: Clear SGID bit when setting file permissionsJan Kara2016-09-221-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When file permissions are modified via chmod(2) and the user is not in the owning group or capable of CAP_FSETID, the setgid bit is cleared in inode_change_ok(). Setting a POSIX ACL via setxattr(2) sets the file permissions as well as the new ACL, but doesn't clear the setgid bit in a similar way; this allows to bypass the check in chmod(2). Fix that. References: CVE-2016-7097 Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* | | ceph: do not modify fi->frag in need_reset_readdir()Nicolas Iooss2016-09-051-1/+1
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit f3c4ebe65ea1 ("ceph: using hash value to compose dentry offset") modified "if (fpos_frag(new_pos) != fi->frag)" to "if (fi->frag |= fpos_frag(new_pos))" in need_reset_readdir(), thus replacing a comparison operator with an assignment one. This looks like a typo which is reported by clang when building the kernel with some warning flags: fs/ceph/dir.c:600:22: error: using the result of an assignment as a condition without parentheses [-Werror,-Wparentheses] } else if (fi->frag |= fpos_frag(new_pos)) { ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~ fs/ceph/dir.c:600:22: note: place parentheses around the assignment to silence this warning } else if (fi->frag |= fpos_frag(new_pos)) { ^ ( ) fs/ceph/dir.c:600:22: note: use '!=' to turn this compound assignment into an inequality comparison } else if (fi->frag |= fpos_frag(new_pos)) { ^~ != Fixes: f3c4ebe65ea1 ("ceph: using hash value to compose dentry offset") Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* | ceph: initialize pathbase in the !dentry case in encode_caps_cb()Ilya Dryomov2016-08-091-0/+1
| | | | | | | | | | | | | | | | pathbase is the base inode; set it to 0 if we've got no path. Coverity-id: 146348 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
* | ceph: fix null pointer dereference in ceph_flush_snaps()Yan, Zheng2016-08-081-1/+4
|/ | | | Signed-off-by: Yan, Zheng <zyan@redhat.com>
* Merge tag 'ceph-for-4.8-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds2016-08-0213-752/+999
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull Ceph updates from Ilya Dryomov: "The highlights are: - RADOS namespace support in libceph and CephFS (Zheng Yan and myself). The stopgaps added in 4.5 to deny access to inodes in namespaces are removed and CEPH_FEATURE_FS_FILE_LAYOUT_V2 feature bit is now fully supported - A large rework of the MDS cap flushing code (Zheng Yan) - Handle some of ->d_revalidate() in RCU mode (Jeff Layton). We were overly pessimistic before, bailing at the first sight of LOOKUP_RCU On top of that we've got a few CephFS bug fixes, a couple of cleanups and Arnd's workaround for a weird genksyms issue" * tag 'ceph-for-4.8-rc1' of git://github.com/ceph/ceph-client: (34 commits) ceph: fix symbol versioning for ceph_monc_do_statfs ceph: Correctly return NXIO errors from ceph_llseek ceph: Mark the file cache as unreclaimable ceph: optimize cap flush waiting ceph: cleanup ceph_flush_snaps() ceph: kick cap flushes before sending other cap message ceph: introduce an inode flag to indicates if snapflush is needed ceph: avoid sending duplicated cap flush message ceph: unify cap flush and snapcap flush ceph: use list instead of rbtree to track cap flushes ceph: update types of some local varibles ceph: include 'follows' of pending snapflush in cap reconnect message ceph: update cap reconnect message to version 3 ceph: mount non-default filesystem by name libceph: fsmap.user subscription support ceph: handle LOOKUP_RCU in ceph_d_revalidate ceph: allow dentry_lease_is_valid to work under RCU walk ceph: clear d_fsinfo pointer under d_lock ceph: remove ceph_mdsc_lease_release ceph: don't use ->d_time ...
| * ceph: Correctly return NXIO errors from ceph_llseekPhil Turnbull2016-07-281-7/+5
| | | | | | | | | | | | | | | | | | ceph_llseek does not correctly return NXIO errors because the 'out' path always returns 'offset'. Fixes: 06222e491e66 ("fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek") Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: Mark the file cache as unreclaimableNikolay Borisov2016-07-281-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ceph creates multiple caches with the SLAB_RECLAIMABLE flag set, so that it can satisfy its internal needs. Inspecting the code shows that most of the caches are indeed reclaimable since they are directly related to the generic inode/dentry shrinkers. However, one of the cache used to satisfy struct file is not reclaimable since its entries are freed only when the last reference to the file is dropped. If a heavily loaded node opens a lot of files it can introduce non-trivial discrepancies between memory shown as reclaimable and what is actually reclaimed when drop_caches is used. Fix this by removing the reclaimable flag for the file's cache. Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: optimize cap flush waitingYan, Zheng2016-07-283-27/+73
| | | | | | | | | | | | | | | | | | | | | | | | Add a 'wake' flag to ceph_cap_flush struct, which indicates if there is someone waiting for it to finish. When getting flush ack message, we check the 'wake' flag in corresponding ceph_cap_flush struct to decide if we should wake up waiters. One corner case is that the acked cap flush has 'wake' flags is set, but it is not the first one on the flushing list. We do not wake up waiters in this case, set 'wake' flags of preceding ceph_cap_flush struct instead Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: cleanup ceph_flush_snaps()Yan, Zheng2016-07-283-88/+105
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch devide __ceph_flush_snaps() into two stags. In the first stage, __ceph_flush_snaps() assign snapcaps flush TIDs and add them to cap flush lists. __ceph_flush_snaps() keeps holding the i_ceph_lock in this stagge. So inode's auth cap can not change. In the second stage, __ceph_flush_snaps() send flushsnap cap messages. i_ceph_lock is unlocked before sending each cap message. If auth cap changes in the middle, __ceph_flush_snaps() just stops. This is OK because kick_flushing_inode_caps() will re-send flushsnap cap messages to inode's new auth MDS. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: kick cap flushes before sending other cap messageYan, Zheng2016-07-281-9/+34
| | | | | | | | | | | | | | If ceph_check_caps() wants to send cap message to a recovering MDS, make sure it kicks cap flushes first. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: introduce an inode flag to indicates if snapflush is neededYan, Zheng2016-07-283-15/+36
| | | | | | | | Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: avoid sending duplicated cap flush messageYan, Zheng2016-07-282-1/+18
| | | | | | | | | | | | | | make ceph_kick_flushing_caps() ignore inodes whose cap flushes have already been re-sent by ceph_early_kick_flushing_caps() Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: unify cap flush and snapcap flushYan, Zheng2016-07-285-222/+175
| | | | | | | | | | | | | | | | | | | | | | | | This patch includes following changes - Assign flush tid to snapcap flush - Remove session's s_cap_snaps_flushing list. Add inode to session's s_cap_flushing list instead. Inode is removed from the list when there is no pending snapcap flush or cap flush. - make __kick_flushing_caps() re-send both snapcap flushes and cap flushes. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: use list instead of rbtree to track cap flushesYan, Zheng2016-07-285-118/+56
| | | | | | | | | | | | | | | | We don't have requirement of searching cap flush by TID. In most cases, we just need to know TID of the oldest cap flush. List is ideal for this usage. Signed-off-by: Yan, Zheng <zyan@redhat.com>
OpenPOWER on IntegriCloud