summaryrefslogtreecommitdiffstats
path: root/fs/ceph/mds_client.c
Commit message (Collapse)AuthorAgeFilesLines
...
* ceph: use i_release_count to indicate dir's completenessYan, Zheng2013-05-011-7/+3
| | | | | | | | | | | | | Current ceph code tracks directory's completeness in two places. ceph_readdir() checks i_release_count to decide if it can set the I_COMPLETE flag in i_ceph_flags. All other places check the I_COMPLETE flag. This indirection introduces locking complexity. This patch adds a new variable i_complete_count to ceph_inode_info. Set i_release_count's value to it when marking a directory complete. By comparing the two variables, we know if a directory is complete Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
* ceph: only set message data pointers if non-emptyAlex Elder2013-05-011-3/+10
| | | | | | | | | | | | | | Change it so we only assign outgoing data information for messages if there is outgoing data to send. This then allows us to add a few more (currently commented-out) assertions. This is related to: http://tracker.ceph.com/issues/4284 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com>
* libceph: isolate other message data fieldsAlex Elder2013-05-011-1/+1
| | | | | | | | | | | | | Define ceph_msg_data_set_pagelist(), ceph_msg_data_set_bio(), and ceph_msg_data_set_trail() to clearly abstract the assignment of the remaining data-related fields in a ceph message structure. Use the new functions in the osd client and mds client. This partially resolves: http://tracker.ceph.com/issues/4263 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: set page info with byte lengthAlex Elder2013-05-011-1/+1
| | | | | | | | When setting page array information for message data, provide the byte length rather than the page count ceph_msg_data_set_pages(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: isolate message page field manipulationAlex Elder2013-05-011-2/+2
| | | | | | | | | | | | | | | | | | | | | | | Define a function ceph_msg_data_set_pages(), which more clearly abstracts the assignment page-related fields for data in a ceph message structure. Use this new function in the osd client and mds client. Ideally, these fields would never be set more than once (with BUG_ON() calls to guarantee that). At the moment though the osd client sets these every time it receives a message, and in the event of a communication problem this can happen more than once. (This will be resolved shortly, but setting up these helpers first makes it all a bit easier to work with.) Rearrange the field order in a ceph_msg structure to group those that are used to define the possible data payloads. This partially resolves: http://tracker.ceph.com/issues/4263 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: no need for alignment for mds messageAlex Elder2013-05-011-1/+0
| | | | | | | | Currently, incoming mds messages never use page data, which means there is no need to set the page_alignment field in the message. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com>
* libceph: define mds_alloc_msg() methodAlex Elder2013-05-011-0/+23
| | | | | | | | | | | | | The only user of the ceph messenger that doesn't define an alloc_msg method is the mds client. Define one, such that it works just like it did before, and simplify ceph_con_in_msg_alloc() by assuming the alloc_msg method is always present. This and the next patch resolve: http://tracker.ceph.com/issues/4322 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com>
* libceph: kill ceph_msg->pagelist_countAlex Elder2013-05-011-1/+0
| | | | | | | The pagelist_count field is never actually used, so get rid of it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* ceph: revert commit 22cddde104Sage Weil2013-05-011-0/+1
| | | | | | | | | | | commit 22cddde104 breaks the atomicity of write operation, it also introduces a deadlock between write and truncate. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com> Conflicts: fs/ceph/addr.c
* ceph: use I_COMPLETE inode flag instead of D_COMPLETE flagYan, Zheng2013-05-011-3/+3
| | | | | | | | | | | | | | | | | commit c6ffe10015 moved the flag that tracks if the dcache contents for a directory are complete to dentry. The problem is there are lots of places that use ceph_dir_{set,clear,test}_complete() while holding i_ceph_lock. but ceph_dir_{set,clear,test}_complete() may sleep because they call dput(). This patch basically reverts that commit. For ceph_d_prune(), it's called with both the dentry to prune and the parent dentry are locked. So it's safe to access the parent dentry's d_inode and clear I_COMPLETE flag. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
* ceph: queue cap release when trimming capYan, Zheng2013-05-011-0/+2
| | | | | | | So the client will later send cap release message to MDS Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com>
* ceph: fix LSSNAP regressionYan, Zheng2013-05-011-1/+2
| | | | | | | commit 6e8575faa8 makes parse_reply_info_extra() return -EIO for LSSNAP Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com>
* libceph: distinguish page array and pagelist countAlex Elder2013-05-011-2/+2
| | | | | | | | | Use distinct fields for tracking the number of pages in a message's page array and in a message's page list. Currently only one or the other is used at a time, but that will be changing soon. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* Merge branch 'for-linus' of ↵Linus Torvalds2013-02-281-2/+31
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates from Sage Weil: "A few groups of patches here. Alex has been hard at work improving the RBD code, layout groundwork for understanding the new formats and doing layering. Most of the infrastructure is now in place for the final bits that will come with the next window. There are a few changes to the data layout. Jim Schutt's patch fixes some non-ideal CRUSH behavior, and a set of patches from me updates the client to speak a newer version of the protocol and implement an improved hashing strategy across storage nodes (when the server side supports it too). A pair of patches from Sam Lang fix the atomicity of open+create operations. Several patches from Yan, Zheng fix various mds/client issues that turned up during multi-mds torture tests. A final set of patches expose file layouts via virtual xattrs, and allow the policies to be set on directories via xattrs as well (avoiding the awkward ioctl interface and providing a consistent interface for both kernel mount and ceph-fuse users)." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (143 commits) libceph: add support for HASHPSPOOL pool flag libceph: update osd request/reply encoding libceph: calculate placement based on the internal data types ceph: update support for PGID64, PGPOOL3, OSDENC protocol features ceph: update "ceph_features.h" libceph: decode into cpu-native ceph_pg type libceph: rename ceph_pg -> ceph_pg_v1 rbd: pass length, not op for osd completions rbd: move rbd_osd_trivial_callback() libceph: use a do..while loop in con_work() libceph: use a flag to indicate a fault has occurred libceph: separate non-locked fault handling libceph: encapsulate connection backoff libceph: eliminate sparse warnings ceph: eliminate sparse warnings in fs code rbd: eliminate sparse warnings libceph: define connection flag helpers rbd: normalize dout() calls rbd: barriers are hard rbd: ignore zero-length requests ...
| * ceph: Check for created flag in response from mdsSam Lang2013-01-171-2/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The mds now sends back a created inode if the create request performed the create. If the file already existed, no inode is returned in the reply. This allows ceph to set the created flag in atomic_open so that permissions are properly checked in the case that the file wasn't created by the create call to the mds. To ensure compability with previous kernels, a feature for sending back the inode in the create reply was added, so that the mds will only send back the inode if the client indicates it supports the feature. Signed-off-by: Sam Lang <sam.lang@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
* | ceph: Convert struct ceph_mds_request to use kuid_t and kgid_tEric W. Biederman2013-02-121-2/+2
|/ | | | | | | | | | Hold the uid and gid for a pending ceph mds request using the types kuid_t and kgid_t. When a request message is finally created convert the kuid_t and kgid_t values into uids and gids in the initial user namespace. Cc: Sage Weil <sage@inktank.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
* ceph: Fix infinite loop in __wake_requestsYan, Zheng2012-12-131-2/+7
| | | | | | | | | | __wake_requests() will enter infinite loop if we use it to wake requests in the session->s_waiting list. __wake_requests() deletes requests from the list and __do_request() adds requests back to the list. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Sage Weil <sage@inktank.com>
* ceph: Fix NULL ptr crash in strlen()David Zafman2012-10-261-1/+1
| | | | | | | | | set_request_path_attr() checks for NULL ptr before calling strlen() This fixes http://tracker.newdream.net/issues/3404 Signed-off-by: David Zafman <david.zafman@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
* ceph: Fix oops when handling mdsmap that decreases max_mdsYan, Zheng2012-10-011-1/+2
| | | | | | | | When i >= newmap->m_max_mds, ceph_mdsmap_get_addr(newmap, i) return NULL. Passing NULL to memcmp() triggers oops. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Sage Weil <sage@inktank.com>
* ceph: close old con before reopening on mds reconnectSage Weil2012-07-301-0/+1
| | | | | | | | | | When we detect a mds session reset, close the old ceph_connection before reopening it. This ensures we clean up the old socket properly and keep the ceph_connection state correct. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
* libceph: move feature bits to separate headerSage Weil2012-07-301-0/+1
| | | | | | | | | This is simply cleanup that will keep things more closely synced with the userland code. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
* ceph: clean up useless d_parent checksSage Weil2012-07-301-11/+0
| | | | | | | | d_parent is never NULL, and IS_ROOT() is the proper way to check for a (non-self-referential) parent. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Sage Weil <sage@inktank.com>
* libceph: set peer name on con_open, not initSage Weil2012-07-051-3/+4
| | | | | | | The peer name may change on each open attempt, even when the connection is reused. Signed-off-by: Sage Weil <sage@inktank.com>
* libceph: fully initialize connection in con_init()Alex Elder2012-06-061-5/+2
| | | | | | | | | | | Move the initialization of a ceph connection's private pointer, operations vector pointer, and peer name information into ceph_con_init(). Rearrange the arguments so the connection pointer is first. Hide the byte-swapping of the peer entity number inside ceph_con_init() Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
* libceph: embed ceph messenger structure in ceph_clientAlex Elder2012-06-011-1/+1
| | | | | | | | | | | | | | A ceph client has a pointer to a ceph messenger structure in it. There is always exactly one ceph messenger for a ceph client, so there is no need to allocate it separate from the ceph client structure. Switch the ceph_client structure to embed its ceph_messenger structure. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
* ceph: use info returned by get_authorizerAlex Elder2012-05-171-8/+1
| | | | | | | | | | Rather than passing a bunch of arguments to be filled in with the content of the ceph_auth_handshake buffer now returned by the get_authorizer method, just use the returned information in the caller, and drop the unnecessary arguments. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
* ceph: have get_authorizer methods return pointersAlex Elder2012-05-171-7/+13
| | | | | | | | | | Have the get_authorizer auth_client method return a ceph_auth pointer rather than an integer, pointer-encoding any returned error value. This is to pave the way for making use of the returned value in an upcoming patch. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
* ceph: ensure auth ops are defined before useAlex Elder2012-05-171-8/+6
| | | | | | | | | | | | | | | | In the create_authorizer method for both the mds and osd clients, the auth_client->ops pointer is blindly dereferenced. There is no obvious guarantee that this pointer has been assigned. And furthermore, even if the ops pointer is non-null there is definitely no guarantee that the create_authorizer or destroy_authorizer methods are defined. Add checks in both routines to make sure they are defined (non-null) before use. Add similar checks in a few other spots in these files while we're at it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
* ceph: messenger: reduce args to create_authorizerAlex Elder2012-05-171-15/+12
| | | | | | | | | | Make use of the new ceph_auth_handshake structure in order to reduce the number of arguments passed to the create_authorizor method in ceph_auth_client_ops. Use a local variable of that type as a shorthand in the get_authorizer method definitions. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
* ceph: define ceph_auth_handshake typeAlex Elder2012-05-171-16/+16
| | | | | | | | | | | | | The definitions for the ceph_mds_session and ceph_osd both contain five fields related only to "authorizers." Encapsulate those fields into their own struct type, allowing for better isolation in some upcoming patches. Fix the #includes in "linux/ceph/osd_client.h" to lay out their more complete canonical path. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
* ceph: don't reset s_cap_ttl to zeroAlex Elder2012-03-221-4/+3
| | | | | | | | | Avoid the need to check for a special zero s_cap_ttl value by just using (jiffies - 1) as the value assigned to indicate "sometime in the past." Signed-off-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Sage Weil <sage@newdream.net>
* Merge branch 'for-linus' of ↵Linus Torvalds2012-02-021-3/+7
|\ | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: fix safety of rbd_put_client() rbd: fix a memory leak in rbd_get_client() ceph: create a new session lock to avoid lock inversion ceph: fix length validation in parse_reply_info() ceph: initialize client debugfs outside of monc->mutex ceph: change "ceph.layout" xattr to be "ceph.file.layout"
| * ceph: create a new session lock to avoid lock inversionAlex Elder2012-02-021-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lockdep was reporting a possible circular lock dependency in dentry_lease_is_valid(). That function needs to sample the session's s_cap_gen and and s_cap_ttl fields coherently, but needs to do so while holding a dentry lock. The s_cap_lock field was being used to protect the two fields, but that can't be taken while holding a lock on a dentry within the session. In most cases, the s_cap_gen and s_cap_ttl fields only get operated on separately. But in three cases they need to be updated together. Implement a new lock to protect the spots updating both fields atomically is required. Signed-off-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Sage Weil <sage@newdream.net>
| * ceph: fix length validation in parse_reply_info()Xi Wang2012-02-021-0/+2
| | | | | | | | | | | | | | | | | | "len" is read from network and thus needs validation. Otherwise, given a bogus "len" value, p+len could be an out-of-bounds pointer, which is used in further parsing. Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
* | ceph: remove unnecessary d_fsdata conditional checksSage Weil2012-01-101-2/+2
|/ | | | | | | We now set d_fsdata unconditionally on all dentries prior to setting up the d_ops, so all of these checks are unnecessary. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: add missing spin_unlock at ceph_mdsc_build_path()Yehuda Sadeh2011-12-131-0/+1
| | | | | | one of the paths was missing spin_unlock Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
* ceph: use i_ceph_lock instead of i_lockSage Weil2011-12-071-16/+16
| | | | | | | | | | | | | | | | | We have been using i_lock to protect all kinds of data structures in the ceph_inode_info struct, including lists of inodes that we need to iterate over while avoiding races with inode destruction. That requires grabbing a reference to the inode with the list lock protected, but igrab() now takes i_lock to check the inode flags. Changing the list lock ordering would be a painful process. However, using a ceph-specific i_ceph_lock in the ceph inode instead of i_lock is a simple mechanical change and avoids the ordering constraints imposed by igrab(). Reported-by: Amon Ott <a.ott@m-privacy.de> Signed-off-by: Sage Weil <sage@newdream.net>
* ceph/mds_client.c: quiet sparse noiseH Hartley Sweeten2011-11-051-2/+2
| | | | | | | | | | | | | Quiet the following sparse noise: warning: symbol 'get_nonsnap_parent' was not declared. Should it be static? warning: symbol 'done_closing_sessions' was not declared. Should it be static? Local functions don't need external visability. Make them static. Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Cc: Sage Weil <sage@newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: use new D_COMPLETE dentry flagSage Weil2011-11-051-3/+3
| | | | | | | | We used to use a flag on the directory inode to track whether the dcache contents for a directory were a complete cached copy. Switch to a dentry flag CEPH_D_COMPLETE that is safely updated by ->d_prune(). Signed-off-by: Sage Weil <sage@newdream.net>
* libceph: don't complain on msgpool alloc failuresSage Weil2011-10-251-5/+6
| | | | | | | | | | The pool allocation failures are masked by the pool; there is no need to spam the console about them. (That's the whole point of having the pool in the first place.) Mark msg allocations whose failure is safely handled as such. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix encoding of ino only (not relative) pathsSage Weil2011-08-151-1/+1
| | | | | | | | A 'path' consists of a starting ino and relative component. Encode even when there is no relative component. This is primarily needed by the NFS reexport code. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: document unlocked d_parent accessesSage Weil2011-07-261-3/+10
| | | | | | | | | For the most part we don't care about racing with rename when directing MDS requests; either the old or new parent is fine. Document that, and do some minor cleanup. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: explicitly reference rename old_dentry parent dir in requestSage Weil2011-07-261-10/+13
| | | | | | | | | | | We carry a pin on the parent directory for the rename source and dest dentries. For the source it's r_locked_dir; we need to explicitly reference the old_dentry parent as well, since the dentry's d_parent may change between when the request was created and pinned and when it is freed. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: avoid d_parent in ceph_dentry_hash; fix ceph_encode_fh() hashing bugSage Weil2011-07-261-1/+1
| | | | | | | | | | | | | | Have caller pass in a safely-obtained reference to the parent directory for calculating a dentry's hash valud. While we're here, simpify the flow through ceph_encode_fh() so that there is a single exit point and cleanup. Also fix a bug with the dentry hash calculation: calculate the hash for the dentry we were given, not its parent. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: ignore lease maskSage Weil2011-07-261-11/+7
| | | | | | | | The lease mask is no longer used (and it changed a while back). Instead, use a non-zero duration to indicate that there is a lease being issued. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
* ceph analog of cifs build_path_from_dentry() race fixAl Viro2011-07-161-3/+16
| | | | | | ... unfortunately, cifs bug got copied. Fix is essentially the same. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* ceph: fix cap flush race reentrancySage Weil2011-05-241-0/+1
| | | | | | | | | | | | | | | | | | | | | In e9964c10 we change cap flushing to do a delicate dance because some inodes on the cap_dirty list could be in a migrating state (got EXPORT but not IMPORT) in which we couldn't actually flush and move from dirty->flushing, breaking the while (!empty) { process first } loop structure. It worked for a single sync thread, but was not reentrant and triggered infinite loops when multiple syncers came along. Instead, move inodes with dirty to a separate cap_dirty_migrating list when in the limbo export-but-no-import state, allowing us to go back to the simple loop structure (which was reentrant). This is cleaner and more robust. Audited the cap_dirty users and this looks fine: list_empty(&ci->i_dirty_item) is still a reliable indicator of whether we have dirty caps (which list we're on is irrelevant) and list_del_init() calls still do the right thing. Signed-off-by: Sage Weil <sage@newdream.net>
* libceph: remove unused variableSage Weil2011-05-191-2/+0
| | | | Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: take reference on mds request r_unsafe_dirSage Weil2011-05-191-0/+4
| | | | | | | | | | We put ourselves on an inode list for the parent directory of metadata operations so that an fsync on the directory will wait for metadata updates to commit to disk. We weren't holding a reference to that directory, however, and under certain workloads (fsstress in this case) the directory can go away. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: print debug message before put mds sessionHenry C Chang2011-05-111-1/+1
| | | | | | | | The mds session, s, could be freed during ceph_put_mds_session. Move dout before ceph_put_mds_session. Signed-off-by: Henry C Chang <henry.cy.chang@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
OpenPOWER on IntegriCloud