summaryrefslogtreecommitdiffstats
path: root/fs/ceph/mds_client.c
Commit message (Collapse)AuthorAgeFilesLines
...
* ceph: fix use after free on mds __unregister_requestSage Weil2010-03-281-1/+2
| | | | | | | | There was a use after free in __unregister_request that would trigger whenever the request map held the last reference. This appears to have triggered an oops during 'umount -f' when requests are being torn down. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix session check on mds replySage Weil2010-03-231-1/+1
| | | | | | | | | Fix a broken check that a reply came back from the same MDS we sent the request to. I don't think a case that actually triggers this would ever come up in practice, but it's clearly wrong and easy to fix. Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: handle kmalloc() failureDan Carpenter2010-03-231-0/+2
| | | | | | | | Return ERR_PTR(-ENOMEM) if kmalloc() fails. We handle allocation failures the same way later in the function. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: propagate mds session allocation failures to callerSage Weil2010-03-231-1/+6
| | | | | | Return error to original caller if register_session() fails. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: prevent dup stale messages to console for restarting mdsSage Weil2010-03-231-1/+1
| | | | | | | | Prevent duplicate 'mds0 caps stale' message from spamming the console every few seconds while the MDS restarts. Set s_renew_requested earlier, so that we only print the message once, even if we don't send an actual request. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix mds sync() race with completing requestsSage Weil2010-03-231-7/+20
| | | | | | | | | | | | | | | | | | | | The wait_unsafe_requests() helper dropped the mdsc mutex to wait for each request to complete, and then examined r_node to get the next request after retaking the lock. But the request completion removes the request from the tree, so r_node was always undefined at this point. Since it's a small race, it usually led to a valid request, but not always. The result was an occasional crash in rb_next() while dereferencing node->rb_left. Fix this by clearing the rb_node when removing the request from the request tree, and not walking off into the weeds when we are done waiting for a request. Since the request we waited on will _always_ be out of the request tree, take a ref on the next request, in the hopes that it won't be. But if it is, it's ok: we can start over from the beginning (and traverse over older read requests again). Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: remove bogus mds forward warningSage Weil2010-02-261-5/+1
| | | | | | | The must_resend flag is always true, not false. In any case, we can just ignore it anyway. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix client_request_forward decodingSage Weil2010-02-231-8/+2
| | | | | | | | | The tid is in the message header, not body. Broken since 6df058c0. No need to look at next mds session; just mark the request and be done. (The old error path was broken too, but now it's gone.) Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: drop messages on unregistered mds sessions; cleanupSage Weil2010-02-231-43/+42
| | | | | | | | | | Verify the mds session is currently registered before handling incoming messages. Clean up message handlers to pull mds out of session->s_mds instead of less trustworthy src field. Clean up con_{get,put} debug output. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix iterate_caps removal raceSage Weil2010-02-171-9/+42
| | | | | | | | | | | | | | | | | | | | | We need to be able to iterate over all caps on a session with a possibly slow callback on each cap. To allow this, we used to prevent cap reordering while we were iterating. However, we were not safe from races with removal: removing the 'next' cap would make the next pointer from list_for_each_entry_safe be invalid, and cause a lock up or similar badness. Instead, we keep an iterator pointer in the session pointing to the current cap. As before, we avoid reordering. For removal, if the cap isn't the current cap we are iterating over, we are fine. If it is, we clear cap->ci (to mark the cap as pending removal) but leave it in the session list. In iterate_caps, we can safely finish removal and get the next cap pointer. While we're at it, clean up put_cap to not take a cap reservation context, as it was never used. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: use rbtree for snap_realmsSage Weil2010-02-161-11/+5
| | | | | | | Switch from radix tree to rbtree for snap realms. This is much more appropriate given that realm keys are few and far between. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: use rbtree for mds requestsSage Weil2010-02-161-59/+90
| | | | | | | | | | The rbtree is a more appropriate data structure than a radix_tree. It avoids extra memory usage and simplifies the code. It also fixes a bug where the debugfs 'mdsc' file wasn't including the most recent mds request. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: allow renewal of auth credentialsSage Weil2010-02-101-0/+13
| | | | | | | | | Add infrastructure to allow the mon_client to periodically renew its auth credentials. Also add a messenger callback that will force such a renewal if a peer rejects our authenticator. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: include type in ceph_entity_addr, filepathSage Weil2010-01-291-1/+1
| | | | | | | Include a type/version in ceph_entity_addr and filepath. Include extra byte in filepath encoding as necessary. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: allocate middle of message before stating to readYehuda Sadeh2010-01-251-2/+0
| | | | | | | Both front and middle parts of the message are now being allocated at the ceph_alloc_msg(). Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
* ceph: properly handle aborted mds requestsSage Weil2010-01-251-5/+23
| | | | | | | | | | | | | | | | | | Previously, if the MDS request was interrupted, we would unregister the request and ignore any reply. This could cause the caps or other cache state to become out of sync. (For instance, aborting dbench and doing rm -r on clients would complain about a non-empty directory because the client didn't realize it's aborted file create request completed.) Even we don't unregister, we still can't process the reply normally because we are no longer holding the caller's locks (like the dir i_mutex). So, mark aborted operations with r_aborted, and in the reply handler, be sure to process all the caps. Do not process the namespace changes, though, since we no longer will hold the dir i_mutex. The dentry lease state can also be ignored as it's more forgiving. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: use ceph_pagelist for mds reconnect message; change encoding (protocol ↵Sage Weil2009-12-231-100/+56
| | | | | | | | | | | | change) Use the ceph_pagelist to encode the MDS reconnect message. We change the message encoding (protocol change!) at the same time to make our life easier (we don't know how many snaprealms we have when we start encoding). An empty message implies the session is closed/does not exist. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: include transaction id in ceph_msg_header (protocol change)Sage Weil2009-12-231-2/+3
| | | | | | | | | | Many (most?) message types include a transaction id. By including it in the fixed size header, we always have it available even when we are unable to allocate memory for the (larger, variable sized) message body. This will allow us to error out the appropriate request instead of (silently) dropping the reply. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: do not touch_caps while iterating over caps listSage Weil2009-12-231-4/+10
| | | | | | | | | | Avoid confusing iterate_session_caps(), flag the session while we are iterating so that __touch_cap does not rearrange items on the list. All other modifiers of session->s_caps do so under the protection of s_mutex. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: make mds ops interruptibleSage Weil2009-12-211-6/+9
| | | | Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: hex dump corrupt server data to KERN_DEBUGSage Weil2009-12-211-0/+4
| | | | | | Also, print fsid using standard format, NOT hex dump. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: use kref for struct ceph_mds_requestSage Weil2009-12-071-35/+34
| | | | Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: reset requested max_size after mds reconnectSage Weil2009-11-201-4/+17
| | | | | | | | | | | | The max_size increase request to the MDS can get lost during an MDS restart and reconnect. Reset our requested value after the MDS recovers, so that any blocked writes will re-request a larger max_size upon waking. Also, explicit wake session caps after the reconnect. Normally the cap renewal catches this, but not in the cases where the caps didn't go stale in the first place, which would leave writers waiting on max_size asleep. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix debugfs entry, simplify fsid checksSage Weil2009-11-201-10/+2
| | | | | | | | | | | We may first learn our fsid from any of the mon, osd, or mds maps (whichever the monitor sends first). Consolidate checks in a single helper. Initialize the client debugfs entry then, since we need the fsid (and global_id) for the directory name. Also remove dead mount code. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: negotiate authentication protocol; implement AUTH_NONE protocolSage Weil2009-11-181-4/+65
| | | | | | | | | | | | | | | | When we open a monitor session, we send an initial AUTH message listing the auth protocols we support, our entity name, and (possibly) a previously assigned global_id. The monitor chooses a protocol and responds with an initial message. Initially implement AUTH_NONE, a dummy protocol that provides no security, but works within the new framework. It generates 'authorizers' that are used when connecting to (mds, osd) services that simply state our entity name and global_id. This is a wire protocol change. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: handle errors during osd client initSage Weil2009-11-181-1/+2
| | | | | | Unwind initializing if we get ENOMEM during client initialization. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: remove bad calls to ceph_con_shutdownSage Weil2009-11-181-12/+18
| | | | | | | | | We want to ceph_con_close when we're done with the connection, before the ref count reaches 0. Once it does, do not call ceph_con_shutdown, as that takes the con mutex and may sleep, and besides that is unnecessary. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: pr_info when mds reconnect completesSage Weil2009-11-111-0/+1
| | | | | | | This helps the user know what's going on during the (involved) reconnect process. They already see when the mds fails and reconnect starts. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: remove recon_gen logicSage Weil2009-11-101-14/+1
| | | | | | | | We don't get an explicit affirmative confirmation that our caps reconnect, nor do we necessarily want to pay that cost. So, take all this code out for now. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: do not confuse stale and dead (unreconnected) capsSage Weil2009-11-091-3/+6
| | | | | | | | | | | | We were using the cap_gen to track both stale caps (caps that timed out due to temporarily losing touch with the mds) and dead caps that did not reconnect after an MDS failure. Introduce a recon_gen counter to track reconnections to restarted MDSs and kill dead caps based on that instead. Rename gen to cap_gen while we're at it to make it more clear which is which. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: allocate and parse mount args before client instanceSage Weil2009-10-271-3/+3
| | | | | | | | This simplifies much of the error handling during mount. It also means that we have the mount args before client creation, and we can initialize based on those options. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: flush dirty caps via the cap_dirty listSage Weil2009-10-151-3/+3
| | | | | | | | | | | | | | | | | | Previously we were flushing dirty caps by passing an extra flag when traversing the delayed caps list. Besides being a bit ugly, that can also miss caps that are dirty but didn't result in a cap requeue: notably, mark_caps_dirty(). Separate the flushing into a separate helper, and traverse the cap_dirty list. This also brings i_dirty_item in line with i_dirty_caps: we are on the list IFF caps != 0. We carry an inode ref IFF dirty_caps|flushing_caps != 0. Lose the unused return value from __ceph_mark_caps_dirty(). Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: convert encode/decode macros to inlinesSage Weil2009-10-141-10/+10
| | | | | | | This avoids the fugly pass by reference and makes the code a bit easier to read. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: MDS clientSage Weil2009-10-061-0/+2912
The MDS (metadata server) client is responsible for submitting requests to the MDS cluster and parsing the response. We decide which MDS to submit each request to based on cached information about the current partition of the directory hierarchy across the cluster. A stateful session is opened with each MDS before we submit requests to it, and a mutex is used to control the ordering of messages within each session. An MDS request may generate two responses. The first indicates the operation was a success and returns any result. A second reply is sent when the operation commits to disk. Note that locking on the MDS ensures that the results of updates are visible only to the updating client before the operation commits. Requests are linked to the containing directory so that an fsync will wait for them to commit. If an MDS fails and/or recovers, we resubmit requests as needed. We also reconnect existing capabilities to a recovering MDS to reestablish that shared session state. Old dentry leases are invalidated. Signed-off-by: Sage Weil <sage@newdream.net>
OpenPOWER on IntegriCloud