summaryrefslogtreecommitdiffstats
path: root/include/linux/sunrpc
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'nfs-for-4.8-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2016-08-122-1/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client bugfixes from Trond Myklebust: "Highlights include: - Stable patch from Olga to fix RPCSEC_GSS upcalls when the same user needs multiple different security services (e.g. krb5i and krb5p). - Stable patch to fix a regression introduced by the use of SO_REUSEPORT, and that prevented the use of multiple different NFS versions to the same server. - TCP socket reconnection timer fixes. - Patch from Neil to disable the use of IPv6 temporary addresses" * tag 'nfs-for-4.8-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4: Cap the transport reconnection timer at 1/2 lease period NFSv4: Cleanup the setting of the nfs4 lease period SUNRPC: Limit the reconnect backoff timer to the max RPC message timeout SUNRPC: Fix reconnection timeouts NFSv4.2: LAYOUTSTATS may return NFS4ERR_ADMIN/DELEG_REVOKED SUNRPC: disable the use of IPv6 temporary addresses. SUNRPC: allow for upcalls for same uid but different gss service SUNRPC: Fix up socket autodisconnect SUNRPC: Handle EADDRNOTAVAIL on connection failures
| * NFSv4: Cap the transport reconnection timer at 1/2 lease periodTrond Myklebust2016-08-051-0/+2
| | | | | | | | | | | | | | | | | | We don't want to miss a lease period renewal due to the TCP connection failing to reconnect in a timely fashion. To ensure this doesn't happen, cap the reconnection timer so that we retry the connection attempt at least every 1/2 lease period. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * SUNRPC: Limit the reconnect backoff timer to the max RPC message timeoutTrond Myklebust2016-08-051-1/+2
| | | | | | | | | | | | | | ...and ensure that we propagate it to new transports on the same client. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* | Merge tag 'nfsd-4.8' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2016-08-043-3/+2
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd updates from Bruce Fields: "Highlights: - Trond made a change to the server's tcp logic that allows a fast client to better take advantage of high bandwidth networks, but may increase the risk that a single client could starve other clients; a new sunrpc.svc_rpc_per_connection_limit parameter should help mitigate this in the (hopefully unlikely) event this becomes a problem in practice. - Tom Haynes added a minimal flex-layout pnfs server, which is of no use in production for now--don't build it unless you're doing client testing or further server development" * tag 'nfsd-4.8' of git://linux-nfs.org/~bfields/linux: (32 commits) nfsd: remove some dead code in nfsd_create_locked() nfsd: drop unnecessary MAY_EXEC check from create nfsd: clean up bad-type check in nfsd_create_locked nfsd: remove unnecessary positive-dentry check nfsd: reorganize nfsd_create nfsd: check d_can_lookup in fh_verify of directories nfsd: remove redundant zero-length check from create nfsd: Make creates return EEXIST instead of EACCES SUNRPC: Detect immediate closure of accepted sockets SUNRPC: accept() may return sockets that are still in SYN_RECV nfsd: allow nfsd to advertise multiple layout types nfsd: Close race between nfsd4_release_lockowner and nfsd4_lock nfsd/blocklayout: Make sure calculate signature/designator length aligned xfs: abstract block export operations from nfsd layouts SUNRPC: Remove unused callback xpo_adjust_wspace() SUNRPC: Change TCP socket space reservation SUNRPC: Add a server side per-connection limit SUNRPC: Micro optimisation for svc_data_ready SUNRPC: Call the default socket callbacks instead of open coding SUNRPC: lock the socket while detaching it ...
| * | SUNRPC: Remove unused callback xpo_adjust_wspace()Trond Myklebust2016-07-131-1/+0
| | | | | | | | | | | | | | | Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | SUNRPC: Add a server side per-connection limitTrond Myklebust2016-07-132-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | Allow the user to limit the number of requests serviced through a single connection, to help prevent faster clients from starving slower clients. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | sunrpc: remove 'inuse' flag from struct cache_detail.NeilBrown2016-07-131-2/+0
| | | | | | | | | | | | | | | | | | | | | This field is not currently in use. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | | Merge tag 'nfs-for-4.8-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2016-07-304-2/+15
|\ \ \ | | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client updates from Trond Myklebust: "Highlights include: Stable bugfixes: - nfs: don't create zero-length requests - several LAYOUTGET bugfixes Features: - several performance related features - more aggressive caching when we can rely on close-to-open cache consistency - remove serialisation of O_DIRECT reads and writes - optimise several code paths to not flush to disk unnecessarily. However allow for the idiosyncracies of pNFS for those layout types that need to issue a LAYOUTCOMMIT before the metadata can be updated on the server. - SUNRPC updates to the client data receive path - pNFS/SCSI support RH/Fedora dm-mpath device nodes - pNFS files/flexfiles can now use unprivileged ports when the generic NFS mount options allow it. Bugfixes: - Don't use RDMA direct data placement together with data integrity or privacy security flavours - Remove the RDMA ALLPHYSICAL memory registration mode as it has potential security holes. - Several layout recall fixes to improve NFSv4.1 protocol compliance. - Fix an Oops in the pNFS files and flexfiles connection setup to the DS - Allow retry of operations that used a returned delegation stateid - Don't mark the inode as revalidated if a LAYOUTCOMMIT is outstanding - Fix writeback races in nfs4_copy_range() and nfs42_proc_deallocate()" * tag 'nfs-for-4.8-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (104 commits) pNFS: Actively set attributes as invalid if LAYOUTCOMMIT is outstanding NFSv4: Clean up lookup of SECINFO_NO_NAME NFSv4.2: Fix warning "variable ‘stateids’ set but not used" NFSv4: Fix warning "no previous prototype for ‘nfs4_listxattr’" SUNRPC: Fix a compiler warning in fs/nfs/clnt.c pNFS: Remove redundant smp_mb() from pnfs_init_lseg() pNFS: Cleanup - do layout segment initialisation in one place pNFS: Remove redundant stateid invalidation pNFS: Remove redundant pnfs_mark_layout_returned_if_empty() pNFS: Clear the layout metadata if the server changed the layout stateid pNFS: Cleanup - don't open code pnfs_mark_layout_stateid_invalid() NFS: pnfs_mark_matching_lsegs_return() should match the layout sequence id pNFS: Do not set plh_return_seq for non-callback related layoutreturns pNFS: Ensure layoutreturn acts as a completion for layout callbacks pNFS: Fix CB_LAYOUTRECALL stateid verification pNFS: Always update the layout barrier seqid on LAYOUTGET pNFS: Always update the layout stateid if NFS_LAYOUT_INVALID_STID is set pNFS: Clear the layout return tracking on layout reinitialisation pNFS: LAYOUTRETURN should only update the stateid if the layout is valid nfs: don't create zero-length requests ...
| * | Merge branch 'nfs-rdma'Trond Myklebust2016-07-242-0/+5
| |\ \
| | * | xprtrdma: No direct data placement with krb5i and krb5pChuck Lever2016-07-112-0/+5
| | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Direct data placement is not allowed when using flavors that guarantee integrity or privacy. When such security flavors are in effect, don't allow the use of Read and Write chunks for moving individual data items. All messages larger than the inline threshold are sent via Long Call or Long Reply. On my systems (CX-3 Pro on FDR), for small I/O operations, the use of Long messages adds only around 5 usecs of latency in each direction. Note that when integrity or encryption is used, the host CPU touches every byte in these messages. Even if it could be used, data movement offload doesn't buy much in this case. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | Merge branch 'sunrpc'Trond Myklebust2016-07-242-0/+6
| |\ \
| | * | SUNRPC: Reduce latency when send queue is congestedTrond Myklebust2016-06-131-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the low latency transport workqueue to process the task that is next in line on the xprt->sending queue. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| | * | SUNRPC: RPC transport queue must be low latencyTrond Myklebust2016-06-131-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rpciod can easily get congested due to the long list of queued rpc_tasks. Having the receive queue wait in turn for those tasks to complete can therefore be a bottleneck. Address the problem by separating the workqueues into: - rpciod: manages rpc_tasks - xprtiod: manages transport related work. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| | * | SUNRPC: Small optimisation of client receiveTrond Myklebust2016-06-131-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | Do not queue the client receive work if we're still processing. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | sunrpc: move NO_CRKEY_TIMEOUT to the auth->au_flagsScott Mayhew2016-07-191-2/+4
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A generic_cred can be used to look up a unx_cred or a gss_cred, so it's not really safe to use the the generic_cred->acred->ac_flags to store the NO_CRKEY_TIMEOUT flag. A lookup for a unx_cred triggered while the KEY_EXPIRE_SOON flag is already set will cause both NO_CRKEY_TIMEOUT and KEY_EXPIRE_SOON to be set in the ac_flags, leaving the user associated with the auth_cred to be in a state where they're perpetually doing 4K NFS_FILE_SYNC writes. This can be reproduced as follows: 1. Mount two NFS filesystems, one with sec=krb5 and one with sec=sys. They do not need to be the same export, nor do they even need to be from the same NFS server. Also, v3 is fine. $ sudo mount -o v3,sec=krb5 server1:/export /mnt/krb5 $ sudo mount -o v3,sec=sys server2:/export /mnt/sys 2. As the normal user, before accessing the kerberized mount, kinit with a short lifetime (but not so short that renewing the ticket would leave you within the 4-minute window again by the time the original ticket expires), e.g. $ kinit -l 10m -r 60m 3. Do some I/O to the kerberized mount and verify that the writes are wsize, UNSTABLE: $ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1 4. Wait until you're within 4 minutes of key expiry, then do some more I/O to the kerberized mount to ensure that RPC_CRED_KEY_EXPIRE_SOON gets set. Verify that the writes are 4K, FILE_SYNC: $ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1 5. Now do some I/O to the sec=sys mount. This will cause RPC_CRED_NO_CRKEY_TIMEOUT to be set: $ dd if=/dev/zero of=/mnt/sys/file bs=1M count=1 6. Writes for that user will now be permanently 4K, FILE_SYNC for that user, regardless of which mount is being written to, until you reboot the client. Renewing the kerberos ticket (assuming it hasn't already expired) will have no effect. Grabbing a new kerberos ticket at this point will have no effect either. Move the flag to the auth->au_flags field (which is currently unused) and rename it slightly to reflect that it's no longer associated with the auth_cred->ac_flags. Add the rpc_auth to the arg list of rpcauth_cred_key_to_expire and check the au_flags there too. Finally, add the inode to the arg list of nfs_ctx_key_to_expire so we can determine the rpc_auth to pass to rpcauth_cred_key_to_expire. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* | | Merge branch 'salted-string-hash'Linus Torvalds2016-07-281-2/+2
|\ \ \ | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This changes the vfs dentry hashing to mix in the parent pointer at the _beginning_ of the hash, rather than at the end. That actually improves both the hash and the code generation, because we can move more of the computation to the "static" part of the dcache setup, and do less at lookup runtime. It turns out that a lot of other hash users also really wanted to mix in a base pointer as a 'salt' for the hash, and so the slightly extended interface ends up working well for other cases too. Users that want a string hash that is purely about the string pass in a 'salt' pointer of NULL. * merge branch 'salted-string-hash': fs/dcache.c: Save one 32-bit multiply in dcache lookup vfs: make the string hashes salt the hash
| * | vfs: make the string hashes salt the hashLinus Torvalds2016-06-101-2/+2
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We always mixed in the parent pointer into the dentry name hash, but we did it late at lookup time. It turns out that we can simplify that lookup-time action by salting the hash with the parent pointer early instead of late. A few other users of our string hashes also wanted to mix in their own pointers into the hash, and those are updated to use the same mechanism. Hash users that don't have any particular initial salt can just use the NULL pointer as a no-salt. Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: George Spelvin <linux@sciencehorizons.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | rpc: share one xps between all backchannelsJ. Bruce Fields2016-06-152-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | The spec allows backchannels for multiple clients to share the same tcp connection. When that happens, we need to use the same xprt for all of them. Similarly, we need the same xps. This fixes list corruption introduced by the multipath code. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com> Acked-by: Trond Myklebust <trondmy@primarydata.com>
* | nfsd4/rpc: move backchannel create logic into rpc codeJ. Bruce Fields2016-06-151-2/+0
|/ | | | | | | | Also simplify the logic a bit. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com> Acked-by: Trond Myklebust <trondmy@primarydata.com>
* Merge branch 'hash' of git://ftp.sciencehorizons.net/linuxLinus Torvalds2016-05-281-31/+9
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull string hash improvements from George Spelvin: "This series does several related things: - Makes the dcache hash (fs/namei.c) useful for general kernel use. (Thanks to Bruce for noticing the zero-length corner case) - Converts the string hashes in <linux/sunrpc/svcauth.h> to use the above. - Avoids 64-bit multiplies in hash_64() on 32-bit platforms. Two 32-bit multiplies will do well enough. - Rids the world of the bad hash multipliers in hash_32. This finishes the job started in commit 689de1d6ca95 ("Minimal fix-up of bad hashing behavior of hash_64()") The vast majority of Linux architectures have hardware support for 32x32-bit multiply and so derive no benefit from "simplified" multipliers. The few processors that do not (68000, h8/300 and some models of Microblaze) have arch-specific implementations added. Those patches are last in the series. - Overhauls the dcache hash mixing. The patch in commit 0fed3ac866ea ("namei: Improve hash mixing if CONFIG_DCACHE_WORD_ACCESS") was an off-the-cuff suggestion. Replaced with a much more careful design that's simultaneously faster and better. (My own invention, as there was noting suitable in the literature I could find. Comments welcome!) - Modify the hash_name() loop to skip the initial HASH_MIX(). This would let us salt the hash if we ever wanted to. - Sort out partial_name_hash(). The hash function is declared as using a long state, even though it's truncated to 32 bits at the end and the extra internal state contributes nothing to the result. And some callers do odd things: - fs/hfs/string.c only allocates 32 bits of state - fs/hfsplus/unicode.c uses it to hash 16-bit unicode symbols not bytes - Modify bytemask_from_count to handle inputs of 1..sizeof(long) rather than 0..sizeof(long)-1. This would simplify users other than full_name_hash" Special thanks to Bruce Fields for testing and finding bugs in v1. (I learned some humbling lessons about "obviously correct" code.) On the arch-specific front, the m68k assembly has been tested in a standalone test harness, I've been in contact with the Microblaze maintainers who mostly don't care, as the hardware multiplier is never omitted in real-world applications, and I haven't heard anything from the H8/300 world" * 'hash' of git://ftp.sciencehorizons.net/linux: h8300: Add <asm/hash.h> microblaze: Add <asm/hash.h> m68k: Add <asm/hash.h> <linux/hash.h>: Add support for architecture-specific functions fs/namei.c: Improve dcache hash function Eliminate bad hash multipliers from hash_32() and hash_64() Change hash_64() return value to 32 bits <linux/sunrpc/svcauth.h>: Define hash_str() in terms of hashlen_string() fs/namei.c: Add hashlen_string() function Pull out string hash to <linux/stringhash.h>
| * <linux/sunrpc/svcauth.h>: Define hash_str() in terms of hashlen_string()George Spelvin2016-05-281-31/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Finally, the first use of previous two patches: eliminate the separate ad-hoc string hash functions in the sunrpc code. Now hash_str() is a wrapper around hash_string(), and hash_mem() is likewise a wrapper around full_name_hash(). Note that sunrpc code *does* call hash_mem() with a zero length, which is why the previous patch needed to handle that in full_name_hash(). (Thanks, Bruce, for finding that!) This also eliminates the only caller of hash_long which asks for more than 32 bits of output. The comment about the quality of hashlen_string() and full_name_hash() is jumping the gun by a few patches; they aren't very impressive now, but will be improved greatly later in the series. Signed-off-by: George Spelvin <linux@sciencehorizons.net> Tested-by: J. Bruce Fields <bfields@redhat.com> Acked-by: J. Bruce Fields <bfields@redhat.com> Cc: Jeff Layton <jlayton@poochiereds.net> Cc: linux-nfs@vger.kernel.org
* | Merge tag 'nfs-for-4.7-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds2016-05-265-6/+30
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client updates from Anna Schumaker: "Highlights include: Features: - Add support for the NFS v4.2 COPY operation - Add support for NFS/RDMA over IPv6 Bugfixes and cleanups: - Avoid race that crashes nfs_init_commit() - Fix oops in callback path - Fix LOCK/OPEN race when unlinking an open file - Choose correct stateids when using delegations in setattr, read and write - Don't send empty SETATTR after OPEN_CREATE - xprtrdma: Prevent server from writing a reply into memory client has released - xprtrdma: Support using Read list and Reply chunk in one RPC call" * tag 'nfs-for-4.7-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (61 commits) pnfs: pnfs_update_layout needs to consider if strict iomode checking is on nfs/flexfiles: Use the layout segment for reading unless it a IOMODE_RW and reading is disabled nfs/flexfiles: Helper function to detect FF_FLAGS_NO_READ_IO nfs: avoid race that crashes nfs_init_commit NFS: checking for NULL instead of IS_ERR() in nfs_commit_file() pnfs: make pnfs_layout_process more robust pnfs: rework LAYOUTGET retry handling pnfs: lift retry logic from send_layoutget to pnfs_update_layout pnfs: fix bad error handling in send_layoutget flexfiles: add kerneldoc header to nfs4_ff_layout_prepare_ds flexfiles: remove pointless setting of NFS_LAYOUT_RETURN_REQUESTED pnfs: only tear down lsegs that precede seqid in LAYOUTRETURN args pnfs: keep track of the return sequence number in pnfs_layout_hdr pnfs: record sequence in pnfs_layout_segment when it's created pnfs: don't merge new ff lsegs with ones that have LAYOUTRETURN bit set pNFS/flexfiles: When initing reads or writes, we might have to retry connecting to DSes pNFS/flexfiles: When checking for available DSes, conditionally check for MDS io pNFS/flexfile: Fix erroneous fall back to read/write through the MDS NFS: Reclaim writes via writepage are opportunistic NFSv4: Use the right stateid for delegations in setattr, read and write ...
| * | SUNRPC: Ensure get_rpccred() and put_rpccred() can take NULL argumentsTrond Myklebust2016-05-171-1/+2
| | | | | | | | | | | | | | | Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | xprtrdma: Bound the inline threshold valuesChuck Lever2016-05-171-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the sysctls that allow setting the inline threshold allow any value to be set. Small values only make the transport run slower. The default 1KB setting is as low as is reasonable. And the logic that decides how to divide a Send buffer between RPC-over-RDMA header and RPC message assumes (but does not check) that the lower bound is not crazy (say, 57 bytes). Send and receive buffers share a page with some control information. Values larger than about 3KB can't be supported, currently. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | sunrpc: Advertise maximum backchannel payload sizeChuck Lever2016-05-172-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RPC-over-RDMA transports have a limit on how large a backward direction (backchannel) RPC message can be. Ensure that the NFSv4.x CREATE_SESSION operation advertises this limit to servers. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | sunrpc: Update RPCBIND_MAXNETIDLENChuck Lever2016-05-171-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 176e21ee2ec8 ("SUNRPC: Support for RPC over AF_LOCAL transports") added a 5-character netid, but did not bump RPCBIND_MAXNETIDLEN from 4 to 5. Fixes: 176e21ee2ec8 ("SUNRPC: Support for RPC over AF_LOCAL ...") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | sunrpc: add a get_rpccred_rcu inlineJeff Layton2016-05-091-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sometimes we might have a RCU managed credential pointer and don't want to use locking to handle it. Add a function that will take a reference to the cred iff the refcount is not already zero. Callers can dereference the pointer under the rcu_read_lock and use that function to take a reference only if the cred is not on its way to destruction. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | sunrpc: add rpc_lookup_generic_credWeston Andros Adamson2016-05-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add function rpc_lookup_generic_cred, which allows lookups of a generic credential that's not current_cred(). [jlayton: add gfp_t parm] Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | sunrpc: plumb gfp_t parm into crcreate operationJeff Layton2016-05-091-2/+2
| |/ | | | | | | | | | | | | | | | | | | We need to be able to call the generic_cred creator from different contexts. Add a gfp_t parm to the crcreate operation and to rpcauth_lookup_credcache. For now, we just push the gfp_t parms up one level to the *_lookup_cred functions. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* | svcrdma: Generalize svc_rdma_xdr_decode_req()Chuck Lever2016-05-131-1/+1
|/ | | | | | | | | | Clean up: Pass in just the piece of the svc_rqst that is needed here. While we're in the area, add an informative documenting comment. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usageKirill A. Shutemov2016-04-041-1/+1
| | | | | | | | | Mostly direct substitution with occasional adjustment or removing outdated comments. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'nfsd-4.6' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2016-03-242-7/+20
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd updates from Bruce Fields: "Various bugfixes, a RDMA update from Chuck Lever, and support for a new pnfs layout type from Christoph Hellwig. The new layout type is a variant of the block layout which uses SCSI features to offer improved fencing and device identification. (Also: note this pull request also includes the client side of SCSI layout, with Trond's permission.)" * tag 'nfsd-4.6' of git://linux-nfs.org/~bfields/linux: sunrpc/cache: drop reference when sunrpc_cache_pipe_upcall() detects a race nfsd: recover: fix memory leak nfsd: fix deadlock secinfo+readdir compound nfsd4: resfh unused in nfsd4_secinfo svcrdma: Use new CQ API for RPC-over-RDMA server send CQs svcrdma: Use new CQ API for RPC-over-RDMA server receive CQs svcrdma: Remove close_out exit path svcrdma: Hook up the logic to return ERR_CHUNK svcrdma: Use correct XID in error replies svcrdma: Make RDMA_ERROR messages work rpcrdma: Add RPCRDMA_HDRLEN_ERR svcrdma: svc_rdma_post_recv() should close connection on error svcrdma: Close connection when a send error occurs nfsd: Lower NFSv4.1 callback message size limit svcrdma: Do not send Write chunk XDR pad with inline content svcrdma: Do not write xdr_buf::tail in a Write chunk svcrdma: Find client-provided write and reply chunks once per reply nfsd: Update NFS server comments related to RDMA support nfsd: Fix a memory leak when meeting unsupported state_protect_how4 nfsd4: fix bad bounds checking
| * svcrdma: Use new CQ API for RPC-over-RDMA server send CQsChuck Lever2016-03-011-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Calling ib_poll_cq() to sort through WCs during a completion is a common pattern amongst RDMA consumers. Since commit 14d3a3b2498e ("IB: add a proper completion queue abstraction"), WC sorting can be handled by the IB core. By converting to this new API, svcrdma is made a better neighbor to other RDMA consumers, as it allows the core to schedule the delivery of completions more fairly amongst all active consumers. This new API also aims each completion at a function that is specific to the WR's opcode. Thus the ctxt->wr_op field and the switch in process_context is replaced by a set of methods that handle each completion type. Because each ib_cqe carries a pointer to a completion method, the core can now post operations on a consumer's QP, and handle the completions itself. The server's rdma_stat_sq_poll and rdma_stat_sq_prod metrics are no longer updated. As a clean up, the cq_event_handler, the dto_tasklet, and all associated locking is removed, as they are no longer referenced or used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * svcrdma: Use new CQ API for RPC-over-RDMA server receive CQsChuck Lever2016-03-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Calling ib_poll_cq() to sort through WCs during a completion is a common pattern amongst RDMA consumers. Since commit 14d3a3b2498e ("IB: add a proper completion queue abstraction"), WC sorting can be handled by the IB core. By converting to this new API, svcrdma is made a better neighbor to other RDMA consumers, as it allows the core to schedule the delivery of completions more fairly amongst all active consumers. Because each ib_cqe carries a pointer to a completion method, the core can now post operations on a consumer's QP, and handle the completions itself. svcrdma receive completions no longer use the dto_tasklet. Each polled Receive WC is now handled individually in soft IRQ context. The server transport's rdma_stat_rq_poll and rdma_stat_rq_prod metrics are no longer updated. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * svcrdma: Use correct XID in error repliesChuck Lever2016-03-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When constructing an error reply, svc_rdma_xdr_encode_error() needs to view the client's request message so it can get the failing request's XID. svc_rdma_xdr_decode_req() is supposed to return a pointer to the client's request header. But if it fails to decode the client's message (and thus an error reply is needed) it does not return the pointer. The server then sends a bogus XID in the error reply. Instead, unconditionally generate the pointer to the client's header in svc_rdma_recvfrom(), and pass that pointer to both functions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Tested-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * svcrdma: Make RDMA_ERROR messages workChuck Lever2016-03-011-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix several issues with svc_rdma_send_error(): - Post a receive buffer to replace the one that was consumed by the incoming request - Posting a send should use DMA_TO_DEVICE, not DMA_FROM_DEVICE - No need to put_page _and_ free pages in svc_rdma_put_context - Make sure the sge is set up completely in case the error path goes through svc_rdma_unmap_dma() - Replace the use of ENOSYS, which has a reserved meaning Related fixes in svc_rdma_recvfrom(): - Don't leak the ctxt associated with the incoming request - Don't close the connection after sending an error reply - Let svc_rdma_send_error() figure out the right header error code As a last clean up, move svc_rdma_send_error() to svc_rdma_sendto.c with other similar functions. There is some common logic in these functions that could someday be combined to reduce code duplication. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Tested-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * rpcrdma: Add RPCRDMA_HDRLEN_ERRChuck Lever2016-03-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | Error headers are shorter than either RDMA_MSG or RDMA_NOMSG. Since HDRLEN_MIN is already used in several other places that would be annoying to change, add RPCRDMA_HDRLEN_ERR for the one or two spots where the shorter length is needed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * svcrdma: svc_rdma_post_recv() should close connection on errorChuck Lever2016-03-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | Clean up: Most svc_rdma_post_recv() call sites close the transport connection when a receive cannot be posted. Wrap that in a common helper. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Tested-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * nfsd: Lower NFSv4.1 callback message size limitChuck Lever2016-03-011-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The maximum size of a backchannel message on RPC-over-RDMA depends on the connection's inline threshold. Today that threshold is typically 1024 bytes, making the maximum message size 996 bytes. The Linux server's CREATE_SESSION operation checks that the size of callback Calls can be as large as 1044 bytes, to accommodate RPCSEC_GSS. Thus CREATE_SESSION fails if a client advertises the true message size maximum of 996 bytes. But the server's backchannel currently does not support RPCSEC_GSS. The actual maximum size it needs is much smaller. It is safe to reduce the limit to enable NFSv4.1 on RDMA backchannel operation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * svcrdma: Do not send Write chunk XDR pad with inline contentChuck Lever2016-03-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The NFS server's XDR encoders adds an XDR pad for content in the xdr_buf page list at the beginning of the xdr_buf's tail buffer. On RDMA transports, Write chunks are sent separately and without an XDR pad. If a Write chunk is being sent, strip off the pad in the tail buffer so that inline content following the Write chunk remains XDR-aligned when it is sent to the client. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=294 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | Merge tag 'nfs-for-4.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2016-03-226-35/+120
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client updates from Trond Myklebust: "Highlights include: Features: - Add support for multiple NFSv4.1 callbacks in flight - Initial patchset for RPC multipath support - Adapt RPC/RDMA to use the new completion queue API Bugfixes and cleanups: - nfs4: nfs4_ff_layout_prepare_ds should return NULL if connection failed - Cleanups to remove nfs_inode_dio_wait and nfs4_file_fsync - Fix RPC/RDMA credit accounting - Properly handle RDMA_ERROR replies - xprtrdma: Do not wait if ib_post_send() fails - xprtrdma: Segment head and tail XDR buffers on page boundaries - xprtrdma cleanups for dprintk, physical_op_map and unused macros" * tag 'nfs-for-4.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (35 commits) nfs/blocklayout: make sure making a aligned read request nfs4: nfs4_ff_layout_prepare_ds should return NULL if connection failed nfs: remove nfs_inode_dio_wait nfs: remove nfs4_file_fsync xprtrdma: Use new CQ API for RPC-over-RDMA client send CQs xprtrdma: Use an anonymous union in struct rpcrdma_mw xprtrdma: Use new CQ API for RPC-over-RDMA client receive CQs xprtrdma: Serialize credit accounting again xprtrdma: Properly handle RDMA_ERROR replies rpcrdma: Add RPCRDMA_HDRLEN_ERR xprtrdma: Do not wait if ib_post_send() fails xprtrdma: Segment head and tail XDR buffers on page boundaries xprtrdma: Clean up dprintk format string containing a newline xprtrdma: Clean up physical_op_map() xprtrdma: Clean up unused RPCRDMA_INLINE_PAD_THRESH macro NFS add callback_ops to nfs4_proc_bind_conn_to_session_callback pnfs/NFSv4.1: Add multipath capabilities to pNFS flexfiles servers over NFSv3 SUNRPC: Allow addition of new transports to a struct rpc_clnt NFSv4.1: nfs4_proc_bind_conn_to_session must iterate over all connections SUNRPC: Make NFS swap work with multipath ...
| * \ Merge tag 'nfs-rdma-4.6-1' of git://git.linux-nfs.org/projects/anna/nfs-rdmaTrond Myklebust2016-03-162-7/+7
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NFS: NFSoRDMA Client Side Changes These patches include several bugfixes and cleanups for the NFSoRDMA client. This includes bugfixes for NFS v4.1, proper RDMA_ERROR handling, and fixes from the recent workqueue swicchover. These patches also switch xprtrdma to use the new CQ API Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> * tag 'nfs-rdma-4.6-1' of git://git.linux-nfs.org/projects/anna/nfs-rdma: (787 commits) xprtrdma: Use new CQ API for RPC-over-RDMA client send CQs xprtrdma: Use an anonymous union in struct rpcrdma_mw xprtrdma: Use new CQ API for RPC-over-RDMA client receive CQs xprtrdma: Serialize credit accounting again xprtrdma: Properly handle RDMA_ERROR replies rpcrdma: Add RPCRDMA_HDRLEN_ERR xprtrdma: Do not wait if ib_post_send() fails xprtrdma: Segment head and tail XDR buffers on page boundaries xprtrdma: Clean up dprintk format string containing a newline xprtrdma: Clean up physical_op_map() xprtrdma: Clean up unused RPCRDMA_INLINE_PAD_THRESH macro
| | * | xprtrdma: Properly handle RDMA_ERROR repliesChuck Lever2016-03-141-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These are shorter than RPCRDMA_HDRLEN_MIN, and they need to complete the waiting RPC. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| | * | rpcrdma: Add RPCRDMA_HDRLEN_ERRChuck Lever2016-03-141-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Error headers are shorter than either RDMA_MSG or RDMA_NOMSG. Since HDRLEN_MIN is already used in several other places that would be annoying to change, add RPCRDMA_HDRLEN_ERR for the one or two spots where the shorter length is needed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| | * | xprtrdma: Clean up unused RPCRDMA_INLINE_PAD_THRESH macroChuck Lever2016-03-141-2/+0
| | |/ | | | | | | | | | | | | | | | | | | Fixes: b3221d6a53c4 ('xprtrdma: Remove logic that constructs...') Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | SUNRPC: Allow addition of new transports to a struct rpc_clntTrond Myklebust2016-02-051-0/+11
| | | | | | | | | | | | | | | | | | | | | Add a function to allow creation and addition of a new transport to an existing rpc_clnt Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | SUNRPC: Add a helper to apply a function to all the rpc_clnt's transportsTrond Myklebust2016-02-051-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | Add a helper for tasks that require us to apply a function to all the transports in an rpc_clnt. An example of a usecase would be BIND_CONN_TO_SESSION, where we want to send one RPC call down each transport. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | SUNRPC: Allow caller to specify the transport to useTrond Myklebust2016-02-051-0/+1
| | | | | | | | | | | | | | | | | | | | | This is needed in order to allow the NFSv4.1 backchannel and BIND_CONN_TO_SESSION function to work. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | SUNRPC: Use the multipath iterator to assign a transport to each taskTrond Myklebust2016-02-051-0/+2
| | | | | | | | | | | | Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | SUNRPC: Make rpc_clnt store the multipath iteratorsTrond Myklebust2016-02-051-0/+2
| | | | | | | | | | | | | | | | | | | | | This is a pre-patch for the RPC multipath code. It sets up the storage in struct rpc_clnt for the multipath code. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
OpenPOWER on IntegriCloud