path: root/net/ceph
Commit message (Author, Age, Files, Lines)
...
* libceph: make ceph_msgr_wq private (Alex Elder, 2012-03-22, 1 file, -1/+1)
| | | | | | | | The messenger workqueue has no need to be public. So give it static scope. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
* libceph: encapsulate connection kvec operations (Alex Elder, 2012-03-22, 1 file, -61/+56)
| | | | | | | | | | | | | Encapsulate the operation of adding a new chunk of data to the next open slot in a ceph_connection's out_kvec array. Also add a "reset" operation to make subsequent add operations start at the beginning of the array again. Use these routines throughout, avoiding duplicate code and ensuring all calls are handled consistently. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
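The add and reset operations described above might look roughly like the following userspace sketch. It uses struct iovec in place of the kernel's struct kvec; the struct name out_queue, the capacity OUT_KVEC_MAX, and the helper names are assumptions for illustration, not the identifiers used in the messenger code.

```c
#include <stddef.h>
#include <sys/uio.h>	/* struct iovec, standing in for the kernel's struct kvec */

#define OUT_KVEC_MAX 8	/* hypothetical capacity of the outgoing array */

/* Hypothetical stand-in for the out_kvec fields of a connection. */
struct out_queue {
	struct iovec kvec[OUT_KVEC_MAX];
	int kvec_left;		/* number of slots currently filled */
	size_t kvec_bytes;	/* total bytes queued across all slots */
};

/* Make subsequent add operations start at the beginning of the array. */
static void out_kvec_reset(struct out_queue *q)
{
	q->kvec_left = 0;
	q->kvec_bytes = 0;
}

/* Place one chunk in the next open slot; returns -1 if the array is full. */
static int out_kvec_add(struct out_queue *q, size_t size, void *data)
{
	if (q->kvec_left >= OUT_KVEC_MAX)
		return -1;
	q->kvec[q->kvec_left].iov_base = data;
	q->kvec[q->kvec_left].iov_len = size;
	q->kvec_left++;
	q->kvec_bytes += size;
	return 0;
}
```

Keeping the slot index and byte total inside the two helpers means callers cannot update one without the other, which is presumably the consistency benefit the message refers to.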
* libceph: move prepare_write_banner() (Alex Elder, 2012-03-22, 1 file, -3/+4)
| | | | | | | | | | | | | | | One of the arguments to prepare_write_connect() indicates whether it is being called immediately after a call to prepare_write_banner(). Move the prepare_write_banner() call inside prepare_write_connect(), and reinterpret (and rename) the "after_banner" argument so it indicates that prepare_write_connect() should *make* the call rather than should know it has already been made. This was split out from the next patch to highlight this change in logic. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
* rbd: make ceph_parse_options() return a pointer (Alex Elder, 2012-03-22, 1 file, -8/+8)
| | | | | | | | | | | | ceph_parse_options() takes the address of a pointer as an argument and uses it to return the address of an allocated structure if successful. With this interface it is not evident at call sites that the pointer is always initialized. Change the interface to return the address instead (or a pointer-coded error code) to make the validity of the returned pointer obvious. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
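A "pointer-coded error code" can be illustrated with a small userspace imitation of the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() convention. The lower-case helpers, struct demo_options, and demo_parse_options() below are hypothetical stand-ins written for this sketch; they are not the kernel's definitions or the real ceph_parse_options() signature.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095	/* errors live in the top 4095 values of the address space */

/* Encode a negative errno value as a pointer. */
static inline void *err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* Recover the negative errno value from an error pointer. */
static inline long ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* True if the pointer falls in the reserved error range. */
static inline int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)(-MAX_ERRNO);
}

struct demo_options {
	int flags;
};

/* Returns a valid allocation on success, or an encoded error on failure. */
static struct demo_options *demo_parse_options(const char *s)
{
	struct demo_options *opt;

	if (!s)
		return err_ptr(-EINVAL);
	opt = calloc(1, sizeof(*opt));
	if (!opt)
		return err_ptr(-ENOMEM);
	return opt;
}
```

With this shape a caller writes `opt = demo_parse_options(str); if (is_err(opt)) return ptr_err(opt);`, so the pointer is visibly assigned on every path.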
* ceph: eliminate some abusive casts (Alex Elder, 2012-03-22, 1 file, -4/+4)
| | | | | | | | | This fixes some spots where a type cast to (void *) was used as a universal type hiding mechanism. Instead, properly cast the type to the intended target type. Signed-off-by: Alex Elder <elder@newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: eliminate some needless casts (Alex Elder, 2012-03-22, 1 file, -11/+10)
| | | | | | | | This eliminates type casts in some places where they are not required. Signed-off-by: Alex Elder <elder@newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: kill addr_str_lock spinlock; use atomic instead (Alex Elder, 2012-03-22, 1 file, -11/+10)
| | | | | | | | | | | | | | | | | | | | | | | | | | A spinlock is used to protect a value used for selecting an array index for a string used for formatting a socket address for human consumption. The index is reset to 0 if it ever reaches the maximum index value. Instead, use an ever-increasing atomic variable as a sequence number, and compute the array index by masking off all but the sequence number's lowest bits. Make the number of entries in the array a power of two to allow the use of such a mask (to avoid jumps in the index value when the sequence number wraps). The length of these strings is somewhat arbitrarily set at 60 bytes. The worst-case length of a string produced is 54 bytes, for an IPv6 address that can't be shortened, e.g.: [1234:5678:9abc:def0:1111:2222:123.234.210.100]:32767 Change it so we arbitrarily use 64 bytes instead; if nothing else it will make the array of these line up better in hex dumps. Rename a few things to reinforce the distinction between the number of strings in the array and the length of individual strings. Signed-off-by: Alex Elder <elder@newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
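The sequence-number-plus-mask scheme described above can be sketched in portable C11 as follows. The macro names, the choice of 32 buffers, and the use of atomic_fetch_add() are assumptions made for this illustration; only the 64-byte string length comes from the message.

```c
#include <stdatomic.h>

#define ADDR_STR_COUNT_LOG	5			/* assumed: 2^5 = 32 buffers */
#define ADDR_STR_COUNT		(1 << ADDR_STR_COUNT_LOG)
#define ADDR_STR_COUNT_MASK	(ADDR_STR_COUNT - 1)
#define MAX_ADDR_STR_LEN	64			/* length of each string */

static char addr_str[ADDR_STR_COUNT][MAX_ADDR_STR_LEN];
static atomic_uint addr_str_seq;	/* ever-increasing sequence number */

/*
 * Hand out the next formatting buffer without taking a lock. Because
 * the buffer count is a power of two, masking the low bits yields a
 * contiguous index sequence even when the counter wraps to zero.
 */
static char *next_addr_str(void)
{
	unsigned int i = atomic_fetch_add(&addr_str_seq, 1) & ADDR_STR_COUNT_MASK;

	return addr_str[i];
}
```

If the count were not a power of two, a modulo would be needed instead of a mask, and the index would jump when the unsigned counter wraps.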
* ceph: make use of "else" where appropriate (Alex Elder, 2012-03-22, 1 file, -7/+4)
| | | | | | | | | | Rearrange ceph_tcp_connect() a bit, making use of "else" rather than re-testing a value with consecutive "if" statements. Don't record a connection's socket pointer unless the connect operation is successful. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: use a shared zero page rather than one per messenger (Alex Elder, 2012-03-22, 1 file, -14/+29)
| | | | | | | | | Each messenger allocates a page to be used when writing zeroes out in the event of error or other abnormal condition. Instead, use the kernel ZERO_PAGE() for that purpose. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
* libceph: fix overflow check in crush_decode() (Xi Wang, 2012-03-22, 1 file, -1/+2)
| | | | | | | | | | | The existing overflow check (n > ULONG_MAX / b) didn't work, because n = ULONG_MAX / b would both bypass the check and still overflow the allocation size a + n * b. The correct check should be (n > (ULONG_MAX - a) / b). Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
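The corrected check can be written as a standalone helper. The function name and the zero-divisor guard are additions for this sketch; the comparison itself is the one given in the message.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Returns true if a + n * b would exceed ULONG_MAX. Subtracting a
 * before dividing accounts for the fixed header size, which the old
 * check (n > ULONG_MAX / b) ignored.
 */
static bool alloc_size_overflows(unsigned long a, unsigned long n,
				 unsigned long b)
{
	if (b == 0)
		return false;	/* a + 0 cannot overflow */
	return n > (ULONG_MAX - a) / b;
}
```

For example, with a = 16 and b = 8, the value n = ULONG_MAX / 8 passes the old check but is rejected here, because 16 + n * 8 wraps around.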
* net/ceph: Only clear SOCK_NOSPACE when there is sufficient space in the socket buffer (Jim Schutt, 2012-03-22, 1 file, -6/+12)
| | | | | | | | | | | | | | The Ceph messenger would sometimes queue multiple work items to write data to a socket when the socket buffer was full. Fix this problem by making ceph_write_space() use SOCK_NOSPACE in the same way that net/core/stream.c:sk_stream_write_space() does, i.e., clearing it only when sufficient space is available in the socket buffer. Signed-off-by: Jim Schutt <jaschut@sandia.gov> Reviewed-by: Alex Elder <elder@dreamhost.com>
* Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client (Linus Torvalds, 2012-02-02, 2 files, -3/+12)
|\ | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: fix safety of rbd_put_client() rbd: fix a memory leak in rbd_get_client() ceph: create a new session lock to avoid lock inversion ceph: fix length validation in parse_reply_info() ceph: initialize client debugfs outside of monc->mutex ceph: change "ceph.layout" xattr to be "ceph.file.layout"
| * ceph: initialize client debugfs outside of monc->mutex (Sage Weil, 2012-02-02, 2 files, -3/+12)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Initializing debugfs under monc->mutex introduces a lock dependency for sb->s_type->i_mutex_key, which (combined with several other dependencies) leads to an annoying lockdep warning. There's no particular reason to do the debugfs setup under this lock, so move it out. It used to be the case that our first monmap could come from the OSD; that is no longer the case with recent servers, so we will reliably set up the client entry during the initial authentication. We don't have to worry about racing with debugfs teardown by ceph_debugfs_client_cleanup() because ceph_destroy_client() calls ceph_msgr_flush() first, which will wait for the message dispatch work to complete (and the debugfs init to complete). Fixes: #1940 Signed-off-by: Sage Weil <sage@newdream.net>
* | libceph: remove useless return value for osd_client __send_request() (Sage Weil, 2012-01-10, 1 file, -15/+6)
| | | | | | | | Signed-off-by: Sage Weil <sage@newdream.net>
* | crush: fix force for non-root TAKE (Sage Weil, 2012-01-10, 1 file, -3/+8)
| | | | | | | | Signed-off-by: Sage Weil <sage@newdream.net>
* | ceph: Use kmemdup rather than duplicating its implementation (Thomas Meyer, 2012-01-10, 1 file, -2/+1)
|/ | | | | | | | | | Use kmemdup rather than duplicating its implementation The semantic patch that makes this change is available in scripts/coccinelle/api/memdup.cocci. Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Sage Weil <sage@newdream.net>
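The pattern that kmemdup() replaces is an allocation followed by a copy of the same length. A userspace analogue, using malloc() in place of the kernel allocator, might look like this; demo_memdup() is a hypothetical name chosen for the sketch.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Allocate len bytes and copy src into them in one step.
 * Returns NULL if the allocation fails; the caller frees the result.
 */
static void *demo_memdup(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;
}
```

Folding the two calls into one helper removes the chance of the allocation size and the copy size drifting apart at a call site.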
* Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client (Linus Torvalds, 2011-12-13, 1 file, -22/+13)
|\ | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: add missing spin_unlock at ceph_mdsc_build_path() ceph: fix SEEK_CUR, SEEK_SET regression crush: fix mapping calculation when force argument doesn't exist ceph: use i_ceph_lock instead of i_lock rbd: remove buggy rollback functionality rbd: return an error when an invalid header is read ceph: fix rasize reporting by ceph_show_options
| * crush: fix mapping calculation when force argument doesn't exist (Sage Weil, 2011-12-12, 1 file, -22/+13)
| | | | | | | | | | | | | | If the force argument isn't valid, we should continue calculating a mapping as if it weren't specified. Signed-off-by: Sage Weil <sage@newdream.net>
* | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client (Linus Torvalds, 2011-11-21, 1 file, -1/+1)
|\ \ | |/ | | | | | | | | | | | | | | * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: libceph: Allocate larger oid buffer in request msgs ceph: initialize root dentry ceph: fix iput race when queueing inode work
| * libceph: Allocate larger oid buffer in request msgs (Stratos Psomadakis, 2011-11-11, 1 file, -1/+1)
| | | | | | | | | | | | | | | | | | | | | | | | | | ceph_osd_request struct allocates a 40-byte buffer for object names. RBD image names can be up to 96 chars long (100 with the .rbd suffix), which results in the object name for the image being truncated, and a subsequent map failure. Increase the oid buffer in request messages, in order to avoid the truncation. Signed-off-by: Stratos Psomadakis <psomas@grnet.gr> Signed-off-by: Sage Weil <sage@newdream.net>
* | net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules (Paul Gortmaker, 2011-10-31, 1 file, -0/+1)
|/ | | | | | | | | These files are non modular, but need to export symbols using the macros now living in export.h -- call out the include so that things won't break when we remove the implicit presence of module.h from everywhere. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* libceph: force resend of osd requests if we skip an osdmap (Sage Weil, 2011-10-25, 1 file, -10/+16)
| | | | | | | If we skip over one or more map epochs, we need to resend all osd requests because it is possible they remapped to other servers and then back. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: use kernel DNS resolver (Noah Watkins, 2011-10-25, 2 files, -12/+116)
| | | | | | | | | | | | | Change ceph_parse_ips to take either names given as IP addresses or standard hostnames (e.g. localhost). The DNS lookup is done using the dns_resolver facility similar to its use in AFS, NFS, and CIFS. This patch defines CONFIG_CEPH_LIB_USE_DNS_RESOLVER that controls if this feature is on or off. Signed-off-by: Noah Watkins <noahwatkins@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix ceph_monc_init memory leak (Noah Watkins, 2011-10-25, 1 file, -3/+7)
| | | | | | | failure clean up does not consider ceph_auth_init. Signed-off-by: Noah Watkins <noahwatkins@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
* libceph: warn on msg allocation failures (Sage Weil, 2011-10-25, 1 file, -0/+1)
| | | | | | | | Any non-masked msg allocation failure should generate a warning and stack trace to the console. All of these need to eventually be replaced by safe preallocation or msgpools. Signed-off-by: Sage Weil <sage@newdream.net>
* libceph: don't complain on msgpool alloc failures (Sage Weil, 2011-10-25, 4 files, -19/+32)
| | | | | | | | | | The pool allocation failures are masked by the pool; there is no need to spam the console about them. (That's the whole point of having the pool in the first place.) Mark msg allocations whose failure is safely handled as such. Signed-off-by: Sage Weil <sage@newdream.net>
* libceph: always preallocate mon connection (Sage Weil, 2011-10-25, 1 file, -25/+22)
| | | | | | | Allocate the mon connection on init. We already reuse it across reconnects. Remove now unnecessary (and incomplete) NULL checks. Signed-off-by: Sage Weil <sage@newdream.net>
* libceph: create messenger with client (Sage Weil, 2011-10-25, 1 file, -23/+24)
| | | | | | | This simplifies the init/shutdown paths, and makes client->msgr available during the rest of the setup process. Signed-off-by: Sage Weil <sage@newdream.net>
* Merge branch 'for-linus' of git://github.com/NewDreamNetwork/ceph-client (Linus Torvalds, 2011-09-29, 4 files, -42/+48)
|\ | | | | | | | | | | | | | | | | * 'for-linus' of git://github.com/NewDreamNetwork/ceph-client: libceph: fix pg_temp mapping update libceph: fix pg_temp mapping calculation libceph: fix linger request requeuing libceph: fix parse options memory leak libceph: initialize ack_stamp to avoid unnecessary connection reset
| * libceph: fix pg_temp mapping update (Sage Weil, 2011-09-28, 1 file, -26/+24)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The incremental map updates have a record for each pg_temp mapping that is to be added/updated (len > 0) or removed (len == 0). The old code was written as if the updates were a complete enumeration; that was just wrong. Update the code to remove 0-length entries and drop the rbtree traversal. This avoids misdirected (and hung) requests that manifest as server errors like [WRN] client4104 10.0.1.219:0/275025290 misdirected client4104.1:129 0.1 to osd0 not [1,0] in e11/11 Signed-off-by: Sage Weil <sage@newdream.net>
| * libceph: fix pg_temp mapping calculation (Sage Weil, 2011-09-28, 1 file, -13/+21)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to apply the modulo pg_num calculation before looking up a pgid in the pg_temp mapping rbtree. This fixes pg_temp mappings, and fixes (some) misdirected requests that result in messages like [WRN] client4104 10.0.1.219:0/275025290 misdirected client4104.1:129 0.1 to osd0 not [1,0] in e11/11 on the server and make the client block without getting a reply (at least until the pg_temp mapping goes away, but that can take a long long time). Reorder calc_pg_raw() a bit to make more sense. Signed-off-by: Sage Weil <sage@newdream.net>
| * libceph: fix linger request requeuing (Sage Weil, 2011-09-16, 1 file, -3/+1)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The r_req_lru_item list node moves between several lists, and that cycle is not directly related (and does not begin) with __register_request(). Initialize it in the request constructor, not __register_request(). This fixes later badness (below) when OSDs restart underneath an rbd mount. Crashes we've seen due to this include: [ 213.974288] kernel BUG at net/ceph/messenger.c:2193! and [ 144.035274] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048 [ 144.035278] IP: [<ffffffffa036c053>] con_work+0x1463/0x2ce0 [libceph] Signed-off-by: Sage Weil <sage@newdream.net>
| * libceph: fix parse options memory leak (Noah Watkins, 2011-09-16, 1 file, -0/+1)
| | | | | | | | | | | | | | | | ceph_destroy_options does not free opt->mon_addr that is allocated in ceph_parse_options. Signed-off-by: Noah Watkins <noahwatkins@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
| * libceph: initialize ack_stamp to avoid unnecessary connection reset (Jim Schutt, 2011-09-16, 1 file, -0/+1)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 4cf9d544631c recorded when an outgoing ceph message was ACKed, in order to avoid unnecessary connection resets when an OSD is busy. However, ack_stamp is uninitialized, so there is a window between when the message is sent and when it is ACKed in which handle_timeout() interprets the uninitialized value as an expired timeout, and resets the connection unnecessarily. Close the window by initializing ack_stamp. Signed-off-by: Jim Schutt <jaschut@sandia.gov> Signed-off-by: Sage Weil <sage@newdream.net>
* | Merge branch 'for-linus' of git://ceph.newdream.net/git/ceph-client (Linus Torvalds, 2011-09-09, 2 files, -16/+46)
|\ \ | |/ | | | | | | | | | | | | * 'for-linus' of git://ceph.newdream.net/git/ceph-client: libceph: fix leak of osd structs during shutdown ceph: fix memory leak ceph: fix encoding of ino only (not relative) paths libceph: fix msgpool
| * libceph: fix leak of osd structs during shutdown (Sage Weil, 2011-08-31, 1 file, -5/+17)
| | | | | | | | | | | | We want to remove all OSDs, not just those on the idle LRU. Signed-off-by: Sage Weil <sage@newdream.net>
| * libceph: fix msgpool (Sage Weil, 2011-08-09, 1 file, -11/+29)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There were several problems here: 1- we weren't tagging allocations with the pool, so they were never returned to the pool. 2- msgpool_put didn't add back to the mempool, even if it were called. 3- msgpool_release didn't clear the pool pointer, so it would have looped had #1 not been broken. These may or may not have been responsible for #1136 or #1381 (BUG due to non-empty mempool on umount). I can't seem to trigger the crash now using the method I was using before. Signed-off-by: Sage Weil <sage@newdream.net>
* | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client (Linus Torvalds, 2011-07-26, 2 files, -7/+11)
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (23 commits) ceph: document unlocked d_parent accesses ceph: explicitly reference rename old_dentry parent dir in request ceph: document locking for ceph_set_dentry_offset ceph: avoid d_parent in ceph_dentry_hash; fix ceph_encode_fh() hashing bug ceph: protect d_parent access in ceph_d_revalidate ceph: protect access to d_parent ceph: handle racing calls to ceph_init_dentry ceph: set dir complete frag after adding capability rbd: set blk_queue request sizes to object size ceph: set up readahead size when rsize is not passed rbd: cancel watch request when releasing the device ceph: ignore lease mask ceph: fix ceph_lookup_open intent usage ceph: only link open operations to directory unsafe list if O_CREAT|O_TRUNC ceph: fix bad parent_inode calc in ceph_lookup_open ceph: avoid carrying Fw cap during write into page cache libceph: don't time out osd requests that haven't been received ceph: report f_bfree based on kb_avail rather than diffing. ceph: only queue capsnap if caps are dirty ceph: fix snap writeback when racing with writes ...
| * libceph: don't time out osd requests that haven't been received (Sage Weil, 2011-07-26, 2 files, -7/+11)
| | | | | | | | | | | | | | | | | | | | | | | | Keep track of when an outgoing message is ACKed (i.e., the server fully received it and, presumably, queued it for processing). Time out OSD requests only if it's been too long since they've been received. This prevents timeouts and connection thrashing when the OSDs are simply busy and are throttling the requests they read off the network. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
* | Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 (David S. Miller, 2011-07-21, 1 file, -7/+10)
|\ \ | |/ | | | | | | | | | | Conflicts: net/bluetooth/l2cap_core.c
| * ceph: fix file mode calculation (Sage Weil, 2011-07-19, 1 file, -7/+10)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | open(2) must always include one of O_RDONLY, O_WRONLY, or O_RDWR. No need for any O_APPEND special case. Passing O_WRONLY|O_RDWR is undefined according to the man page, but the Linux VFS interprets this as O_RDWR, so we'll do the same. This fixes open(2) with flags O_RDWR|O_APPEND, which was incorrectly being translated to readonly. Reported-by: Fyodor Ustinov <ufm@ufm.su> Signed-off-by: Sage Weil <sage@newdream.net>
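The mapping described above depends only on the access-mode bits of the open(2) flags. A minimal sketch follows; the enum and function names are hypothetical, and the last branch assumes the Linux convention that O_WRONLY|O_RDWR equals O_ACCMODE.

```c
#include <fcntl.h>

/* Hypothetical file modes; the real constants live in the ceph headers. */
enum demo_file_mode {
	DEMO_MODE_RD,
	DEMO_MODE_WR,
	DEMO_MODE_RDWR,
};

/*
 * Derive the file mode from the access bits alone. O_APPEND and other
 * flags are masked off, so they cannot change the result.
 */
static enum demo_file_mode demo_flags_to_mode(int flags)
{
	int acc = flags & O_ACCMODE;

	if (acc == O_RDONLY)
		return DEMO_MODE_RD;
	if (acc == O_WRONLY)
		return DEMO_MODE_WR;
	/* O_RDWR, and O_WRONLY|O_RDWR which the VFS treats as O_RDWR */
	return DEMO_MODE_RDWR;
}
```

Under this scheme O_RDWR|O_APPEND maps to read-write, which is the case the message reports as previously broken.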
* | Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 (David S. Miller, 2011-07-14, 1 file, -4/+6)
|\ \ | |/ | | | | | | | | | | Conflicts: net/bluetooth/l2cap_core.c
| * libceph: fix page calculation for non-page-aligned io (Sage Weil, 2011-06-13, 1 file, -4/+6)
| | | | | | | | | | | | | | | | Set the page count correctly for non-page-aligned IO. We were already doing this correctly for alignment, but not the page count. Fixes DIRECT_IO writes from unaligned pages. Signed-off-by: Sage Weil <sage@newdream.net>
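To see why alignment affects the page count, consider a helper that counts the pages touched by a byte range. This is a sketch with an assumed 4096-byte page size and hypothetical names, not the code changed by the commit.

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT	12				/* assumed 4 KiB pages */
#define DEMO_PAGE_SIZE	(1ULL << DEMO_PAGE_SHIFT)

/*
 * Number of pages spanned by len bytes starting at byte offset off:
 * the index one past the last page touched, minus the index of the
 * first page touched.
 */
static uint64_t demo_pages_for(uint64_t off, uint64_t len)
{
	return ((off + len + DEMO_PAGE_SIZE - 1) >> DEMO_PAGE_SHIFT) -
	       (off >> DEMO_PAGE_SHIFT);
}
```

A 4096-byte write at offset 0 touches one page, but the same write at offset 1 spills into a second page, so a count derived from the length alone comes up one short.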
* | Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 (David S. Miller, 2011-06-20, 1 file, -5/+10)
|\ \ | |/ | | | | | | | | | | | | | | Conflicts: drivers/net/wireless/iwlwifi/iwl-agn-rxon.c drivers/net/wireless/rtlwifi/pci.c net/netfilter/ipvs/ip_vs_core.c
| * ceph: fix sync vs canceled write (Sage Weil, 2011-06-07, 1 file, -5/+10)
| | | | | | | | | | | | | | If we cancel a write, trigger the safe completions to prevent a sync from blocking indefinitely in ceph_osdc_sync(). Signed-off-by: Sage Weil <sage@newdream.net>
* | net: Remove casts of void * (Joe Perches, 2011-06-16, 1 file, -1/+1)
|/ | | | | | | | | | | | | | | | | | | | | | | Unnecessary casts of void * clutter the code. These are the remainder casts after several specific patches to remove netdev_priv and dev_priv. Done via coccinelle script: $ cat cast_void_pointer.cocci @@ type T; T *pt; void *pv; @@ - pt = (T *)pv; + pt = pv; Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@conan.davemloft.net>
* libceph: subscribe to osdmap when cluster is full (Sage Weil, 2011-05-24, 1 file, -0/+9)
| | | | | | | | When the cluster is marked full, subscribe to subsequent map updates to ensure we find out promptly when it is no longer full. This will prevent us from spewing ENOSPC for (much) longer than necessary. Signed-off-by: Sage Weil <sage@newdream.net>
* libceph: handle new osdmap down/state change encoding (Sage Weil, 2011-05-24, 1 file, -3/+8)
| | | | | | | | Old incrementals encode a 0 value (nearly always) when an osd goes down. Change that to allow any state bit(s) to be flipped. Special case 0 to mean flip the CEPH_OSD_UP bit to mimic the old behavior. Signed-off-by: Sage Weil <sage@newdream.net>
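The encoding rule above amounts to an XOR with a special case for zero. In the sketch below the bit values and names are assumptions chosen for illustration; the real CEPH_OSD_UP constant is defined in the ceph headers.

```c
#define DEMO_OSD_EXISTS	1u	/* assumed bit value */
#define DEMO_OSD_UP	2u	/* assumed bit value */

/*
 * Apply one incremental state record to an OSD's state word.
 * A record of 0 is what old encoders wrote for "osd went down",
 * so it is treated as a request to flip the UP bit.
 */
static unsigned int demo_apply_state(unsigned int state, unsigned int xorstate)
{
	if (xorstate == 0)
		xorstate = DEMO_OSD_UP;
	return state ^ xorstate;
}
```

An OSD that exists and is up therefore goes down on a 0 record, while a newer encoder could clear both bits in one record by sending both of them.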
* ceph: check return value for start_request in writepages (Sage Weil, 2011-05-19, 1 file, -1/+7)
| | | | | | | Since we pass the nofail arg, we should never get an error; BUG if we do. (And fix the function to not return an error if __map_request fails.) Signed-off-by: Sage Weil <sage@newdream.net>
* libceph: add missing breaks in addr_set_port (Sage Weil, 2011-05-19, 1 file, -0/+2)
| | | | Signed-off-by: Sage Weil <sage@newdream.net>
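The commit above carries no description beyond its title. As an illustration of the kind of switch it refers to, here is a userspace sketch of setting a port on a generic socket address; the function name and body are a guess at the general shape, not the kernel source.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>

/*
 * Store a port, in network byte order, in whichever address family
 * the storage currently holds. Each case ends with a break so that
 * handling one family never falls through into the next.
 */
static void demo_addr_set_port(struct sockaddr_storage *ss, int port)
{
	switch (ss->ss_family) {
	case AF_INET:
		((struct sockaddr_in *)ss)->sin_port = htons((uint16_t)port);
		break;
	case AF_INET6:
		((struct sockaddr_in6 *)ss)->sin6_port = htons((uint16_t)port);
		break;
	}
}
```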