summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* SUNRPC: remove redundant "linux/nsproxy.h" includesStanislav Kinsbursky2012-12-102-2/+0
| | | | | | | This is a cleanup patch. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd: make NFSv4 recovery client tracking options per netStanislav Kinsbursky2012-12-102-20/+30
| | | | | | | | | Pointer to client tracking operations - client_tracking_ops - have to be containerized, because different environment can support different trackers (for example, legacy tracker currently is not suported in container). Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd4: lockt, release_lockowner should renew clientsJ. Bruce Fields2012-12-041-18/+23
| | | | | | | | | | Fix nfsd4_lockt and release_lockowner to lookup the referenced client, so that it can renew it, or correctly return "expired", as appropriate. Also share some code while we're here. Reported-by: Frank Filz <ffilzlnx@us.ibm.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* svcrpc: support multiple-fragment rpc'sJ. Bruce Fields2012-12-041-25/+25
| | | | | | | | | | | | | | | Over TCP, RPC's are preceded by a single 4-byte field telling you how long the rpc is (in bytes). The spec also allows you to send an RPC in multiple such records (the high bit of the length field is used to tell you whether this is the final record). We've survived for years without supporting this because in practice the clients we care about don't use it. But the userland rpc libraries do, and every now and then an experimental client will run into this. (Most recently I noticed it while trying to write a pynfs check.) And we're really on the wrong side of the spec here--let's fix this. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* svcrpc: track rpc data length separately from sk_tcplenJ. Bruce Fields2012-12-042-9/+21
| | | | | | | | | | | Keep a separate field, sk_datalen, that tracks only the data contained in a fragment, not including the fragment header. For now, this is always just max(0, sk_tcplen - 4), but after we allow multiple fragments sk_datalen will accumulate the total rpc data size while sk_tcplen only tracks progress receiving the current fragment. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* svcrpc: fix off-by-4 error in "incomplete TCP record" dprintkJ. Bruce Fields2012-12-041-1/+2
| | | | | | | The full reclen doesn't include the fragment header, but sk_tcplen does. Fix this to make it an apples-to-apples comparison. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* svcrpc: delay minimum-rpc-size check till laterJ. Bruce Fields2012-12-041-9/+7
| | | | | | | | | | Soon we want to support multiple fragments, in which case it may be legal for a single fragment to be smaller than 8 bytes, so we'll want to delay this check till we've reached the last fragment. Also fix an outdated comment. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* svcrpc: don't byte-swap sk_reclen in placeJ. Bruce Fields2012-12-042-16/+22
| | | | | | | | | Byte-swapping in place is always a little dubious. Let's instead define this field to always be big-endian, and do the swapping on demand where we need it. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* NFSD: Forget state for a specific clientBryan Schumaker2012-12-033-4/+49
| | | | | | | | Write the client's ip address to any state file and all appropriate state for that client will be forgotten. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* NFSD: Add a custom file operations structure for fault injectionBryan Schumaker2012-12-031-7/+49
| | | | | | | | | Controlling the read and write functions allows me to add in "forget client w.x.y.z", since we won't be limited to reading and writing only u64 values. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* NFSD: Reading a fault injection file prints a state countBryan Schumaker2012-12-033-2/+58
| | | | | | | | | | I also log basic information that I can figure out about the type of state (such as number of locks for each client IP address). This can be useful for checking that state was actually dropped and later for checking if the client was able to recover. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* NFSD: Fault injection operations take a per-client forget functionBryan Schumaker2012-12-033-42/+16
| | | | | | | | | | The eventual goal is to forget state based on ip address, so it makes sense to call this function in a for-each-client loop until the correct amount of state is forgotten. I also use this patch as an opportunity to rename the forget function from "func()" to "forget()". Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* NFSD: Clean up forgetting and recalling delegationsBryan Schumaker2012-12-031-44/+50
| | | | | | | | Once I have a client, I can easily use its delegation list rather than searching the file hash table for delegations to remove. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* NFSD: Clean up forgetting openownersBryan Schumaker2012-12-031-27/+22
| | | | | | | | Using "forget_n_state()" forces me to implement the code needed to forget a specific client's openowners. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* NFSD: Clean up forgetting locksBryan Schumaker2012-12-031-8/+28
| | | | | | | | | | I use the new "forget_n_state()" function to iterate through each client first when searching for locks. This may slow down forgetting locks a little bit, but it implements most of the code needed to forget a specified client's locks. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* NFSD: Clean up forgetting clientsBryan Schumaker2012-12-032-5/+23
| | | | | | | | | | I added in a generic for-each loop that takes a pass over the client_lru list for the current net namespace and calls some function. The next few patches will update other operations to use this function as well. A value of 0 still means "forget everything that is found". Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* NFSD: Lock state before calling fault injection functionBryan Schumaker2012-12-032-16/+4
| | | | | | | | Each function touches state in some way, so getting the lock earlier can help simplify code. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd4: discard some unused nfsd4_verify xdr codeJ. Bruce Fields2012-12-031-19/+2
| | | | Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* NFSD: Fold fault_inject.h into state.hBryan Schumaker2012-11-285-31/+16
| | | | | | | | | | There were only a small number of functions in this file and since they all affect stored state I think it makes sense to put them in state.h instead. I also dropped most static inline declarations since there are no callers when fault injection is not enabled. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd: make NFSv4 grace time per netStanislav Kinsbursky2012-11-284-9/+7
| | | | | | | | Grace time is a part of NFSv4 state engine, which is constructed per network namespace. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd: make NFSv4 lease time per netStanislav Kinsbursky2012-11-286-12/+19
| | | | | | | | Lease time is a part of NFSv4 state engine, which is constructed per network namespace. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd: remove redundant declarationsStanislav Kinsbursky2012-11-281-3/+0
| | | | | | | | This is a cleanup patch. Functions nfsd_pool_stats_open() and nfsd_pool_stats_release() are declared in fs/nfsd/nfsd.h. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd: recovery - make in_grace per netStanislav Kinsbursky2012-11-282-5/+5
| | | | | | | | Flag in_grace is a part of client tracking state, which is network namesapce aware. So let'a replace global static variable with per-net one. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd: recovery - make rec_file per netStanislav Kinsbursky2012-11-282-35/+37
| | | | | | | | | | | Opening and closing of this file is done in client tracking init and exit operations. Client tracking is done in network namespace context already. So let's make this file opened and closed per network context - this will simlify it's management. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd: call state init and shutdown twiceStanislav Kinsbursky2012-11-283-20/+29
| | | | | | | | | | | Split NFSv4 state init and shutdown into two different calls: per-net one and generic one. Per-net cwinit/shutdown pair have to be called for any namespace, generic pair - only once on NSFd kthreads start and shutdown respectively. Refresh of diff-nfsd-call-state-init-twice Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd: cleanup NFSd state start a bitStanislav Kinsbursky2012-11-281-24/+35
| | | | | | | | | | | This patch renames nfs4_state_start_net() into nfs4_state_create_net(), where get_net() now performed. Also it introduces new nfs4_state_start_net(), which is now responsible for state creation and initializing all per-net data and which is now called from nfs4_state_start(). Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd: cleanup NFSd state shutdown a bitStanislav Kinsbursky2012-11-281-10/+9
| | | | | | | | | This patch renames __nfs4_state_shutdown_net() into nfs4_state_shutdown_net(), __nfs4_state_shutdown() into nfs4_state_shutdown_net() and moves all network related shutdown operations to nfs4_state_shutdown_net(). Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd: make delegations shutdown network namespace awareStanislav Kinsbursky2012-11-281-0/+4
| | | | | | | | | | NFSv4 delegations are stored in global list. But they are nfs4_client dependent, which is network namespace aware already. State shutdown and laundromat are done per network namespace as well. So, delegations unhash have to be done in network namespace context. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd: make client_lock per netStanislav Kinsbursky2012-11-282-30/+46
| | | | | | | | This lock protects the client lru list and session hash table, which are allocated per network namespace already. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd4: remove state lock from nfs4_state_shutdownStanislav Kinsbursky2012-11-281-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Protection of __nfs4_state_shutdown() with nfs4_lock_state() looks redundant. This function is called by the last NFSd thread on it's exit and state lock protects actually two functions (del_recall_lru is protected by recall_lock): 1) nfsd4_client_tracking_exit 2) __nfs4_state_shutdown_net "nfsd4_client_tracking_exit" doesn't require state lock protection, because it's state can be modified only by tracker callbacks. Here a re they: 1) create: is called only from nfsd4_proc_compound. 2) remove: is called from either nfsd4_proc_compound or nfs4_laundromat. 3) check: is called only from nfsd4_proc_compound. 4) grace_done; called only from nfs4_laundromat. nfsd4_proc_compound is called onll by NFSd kthread, which is exiting right now. nfs4_laundromat is called by laundry_wq. But laundromat_work was canceled already. "__nfs4_state_shutdown_net" also doesn't require state lock protection, because all NFSd kthreads are dead, and no race can happen with NFSd start, because "nfsd_up" flag is still set. Moreover, all Nfsd shutdown is protected with global nfsd_mutex. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd4: remove state lock from nfsd4_load_reboot_recovery_dataJ. Bruce Fields2012-11-281-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | That function is only called under nfsd_mutex: we know that because the only caller is nfsd_svc, via nfsd_svc nfsd_startup nfs4_state_start nfsd4_client_tracking_init client_tracking_ops->init == nfsd4_load_reboot_recovery_data The shared state accessed here includes: - user_recovery_dirname: used here, modified only by nfs4_reset_recoverydir, which can be verified to only be called under nfsd_mutex. - filesystem state, protected by i_mutex (handwaving slightly here) - rec_file, reclaim_str_hashtbl, reclaim_str_hashtbl_size: other than here, used only from code called from nfsd or laundromat threads, both of which should be started only after this runs (see nfsd_svc) and stopped before this could run again (see nfsd_shutdown, called from nfsd_last_thread). Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd4: return badname, not inval, on "." or "..", or "/"J. Bruce Fields2012-11-271-13/+12
| | | | | | | | | The spec requires badname, not inval, in these cases. Some callers want us to return enoent, but I can see no justification for that. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd4: downgrade some fs/nfsd/nfs4state.c BUG'sJ. Bruce Fields2012-11-261-11/+15
| | | | | | | | | | | | | | | Linus has pointed out that indiscriminate use of BUG's can make it harder to diagnose bugs because they can bring a machine down, often before we manage to get any useful debugging information to the logs. (Consider, for example, a BUG() that fires in a workqueue, or while holding a spinlock). Most of these BUG's won't do much more than kill an nfsd thread, but it would still probably be safer to get out the warning without dying. There's still more of this to do in nfsd/. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd4: delay filling in write iovec array till after xdr decodingJ. Bruce Fields2012-11-264-24/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Our server rejects compounds containing more than one write operation. It's unclear whether this is really permitted by the spec; with 4.0, it's possibly OK, with 4.1 (which has clearer limits on compound parameters), it's probably not OK. No client that we're aware of has ever done this, but in theory it could be useful. The source of the limitation: we need an array of iovecs to pass to the write operation. In the worst case that array of iovecs could have hundreds of elements (the maximum rwsize divided by the page size), so it's too big to put on the stack, or in each compound op. So we instead keep a single such array in the compound argument. We fill in that array at the time we decode the xdr operation. But we decode every op in the compound before executing any of them. So once we've used that array we can't decode another write. If we instead delay filling in that array till the time we actually perform the write, we can reuse it. Another option might be to switch to decoding compound ops one at a time. I considered doing that, but it has a number of other side effects, and I'd rather fix just this one problem for now. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd4: move more write parameters into xdr argumentJ. Bruce Fields2012-11-262-11/+11
| | | | | | In preparation for moving some of this elsewhere. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd4: reorganize write decodingJ. Bruce Fields2012-11-261-21/+41
| | | | | | In preparation for moving some of it elsewhere. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd4: simplify reading of opnumJ. Bruce Fields2012-11-261-32/+2
| | | | | | | | | | | | | | | The comment here is totally bogus: - OP_WRITE + 1 is RELEASE_LOCKOWNER. Maybe there was some older version of the spec in which that served as a sort of OP_ILLEGAL? No idea, but it's clearly wrong now. - In any case, I can't see that the spec says anything about what to do if the client sends us less ops than promised. It's clearly nutty client behavior, and we should do whatever's easiest: returning an xdr error (even though it won't be consistent with the error on the last op returned) seems fine to me. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd4: no, we're not going to check tags for utf8J. Bruce Fields2012-11-261-6/+0
| | | | Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd: fix v4 reply cachingJ. Bruce Fields2012-11-261-1/+1
| | | | | | | | | | | Very embarassing: 1091006c5eb15cba56785bd5b498a8d0b9546903 "nfsd: turn on reply cache for NFSv4" missed a line, effectively leaving the reply cache off in the v4 case. I thought I'd tested that, but I guess not. This time, wrote a pynfs test to confirm it works. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd: make laundromat network namespace awareStanislav Kinsbursky2012-11-152-8/+15
| | | | | | | | This patch moves laundromat_work to nfsd per-net context, thus allowing to run multiple laundries. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd: pass nfsd_net instead of net to grace endersStanislav Kinsbursky2012-11-153-14/+10
| | | | | | | Passing net context looks as overkill. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd: use service net instead of hard-coded init_netStanislav Kinsbursky2012-11-154-31/+49
| | | | | | | | This patch replaces init_net by SVC_NET(), where possible and also passes proper context to nested functions where required. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd: make close_lru list per netStanislav Kinsbursky2012-11-152-13/+13
| | | | | | | | | This list holds nfs4 clients (open) stateowner queue for last close replay, which are network namespace aware. So let's make this list per network namespace too. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd: make client_lru list per netStanislav Kinsbursky2012-11-152-8/+13
| | | | | | | | This list holds nfs4 clients queue for lease renewal, which are network namespace aware. So let's make this list per network namespace too. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd: make sessionid_hashtbl allocated per netStanislav Kinsbursky2012-11-152-11/+20
| | | | | | | | | | | | | This hash holds established sessions state and closely associated with nfs4_clients info, which are network namespace aware. So let's make it allocated per network namespace too. Note: this hash can be allocated in per-net operations. But it looks better to allocate it on nfsd state start and thus don't waste resources if server is not running. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd: make lockowner_ino_hashtbl allocated per netStanislav Kinsbursky2012-11-152-11/+20
| | | | | | | | | | | | | This hash holds file lock owners and closely associated with nfs4_clients info, which are network namespace aware. So let's make it allocated per network namespace too. Note: this hash can be allocated in per-net operations. But it looks better to allocate it on nfsd state start and thus don't waste resources if server is not running. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd: make ownerstr_hashtbl allocated per netStanislav Kinsbursky2012-11-152-15/+27
| | | | | | | | | | | | | This hash holds open owner state and closely associated with nfs4_clients info, which are network namespace aware. So let's make it allocated per network namespace too. Note: this hash can be allocated in per-net operations. But it looks better to allocate it on nfsd state start and thus don't waste resources if server is not running. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd: make unconf_name_tree per netStanislav Kinsbursky2012-11-152-23/+23
| | | | | | | | This hash holds nfs4_clients info, which are network namespace aware. So let's make it allocated per network namespace. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd: make unconf_id_hashtbl allocated per netStanislav Kinsbursky2012-11-152-10/+16
| | | | | | | | | | | | This hash holds nfs4_clients info, which are network namespace aware. So let's make it allocated per network namespace. Note: this hash can be allocated in per-net operations. But it looks better to allocate it on nfsd state start and thus don't waste resources if server is not running. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd: make conf_name_tree per netStanislav Kinsbursky2012-11-152-15/+20
| | | | | | | | This tree holds nfs4_clients info, which are network namespace aware. So let's make it per network namespace. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
OpenPOWER on IntegriCloud