path: root/fs/nfs
Commit message | Author | Age | Files | Lines
* NFS: Move nfs_idmap.h into fs/nfs/ | Anna Schumaker | 2015-04-23 | 9 | -8/+76

    This file is only used internally to the NFS v4 module, so it doesn't need to be in the global include path. I also renamed it from nfs_idmap.h to nfs4idmap.h to emphasize that it's an NFSv4-only include file.

    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Remove CONFIG_NFS_V4 checks from nfs_idmap.h | Anna Schumaker | 2015-04-23 | 2 | -2/+0

    The idmapper is completely internal to the NFS v4 module, so this macro will always evaluate to true. This patch also removes unnecessary includes of this file from the generic NFS client.

    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Add a stub for GETDEVICELIST | Anna Schumaker | 2015-04-23 | 1 | -0/+6

    d4b18c3e (pnfs: remove GETDEVICELIST implementation) removed the GETDEVICELIST operation from the NFS client, but left a "hole" in the nfs4_procedures array. This caused /proc/self/mountstats to report an operation named "51" where GETDEVICELIST used to be. This patch adds a stub to fix mountstats.

    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    Fixes: d4b18c3e (pnfs: remove GETDEVICELIST implementation)
    Cc: stable@vger.kernel.org # 3.17+
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
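The "hole" effect described above can be illustrated with a toy model. This is only a sketch in Python, not kernel code: it assumes that the statistics printer falls back to the numeric slot index when a table entry has no name, and the three-slot table and the `op_names` helper are hypothetical.

```python
# Toy model (not kernel code): a sparse procedure table and a stats printer
# that falls back to the slot index when an entry is missing (assumption).
def op_names(procedures):
    """Return display names, using the slot index for empty slots."""
    return [name if name is not None else str(index)
            for index, name in enumerate(procedures)]

# Hypothetical three-slot excerpt; the real array is much longer.
with_hole = ["LAYOUTRETURN", None, "GETDEVICEINFO"]
with_stub = ["LAYOUTRETURN", "GETDEVICELIST", "GETDEVICEINFO"]

print(op_names(with_hole))  # the empty slot shows up as a bare number
print(op_names(with_stub))  # the stub restores a readable name
```

Filling the slot with a named stub keeps later indices stable while giving the statistics output a proper label.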
* nfs: remove WARN_ON_ONCE from nfs_direct_good_bytes | Peng Tao | 2015-04-23 | 1 | -2/+0

    For flexfiles driver, we might choose to read from mirror index other than 0 while mirror_count is always 1 for read.

    Reported-by: Jean Spector <jean@primarydata.com>
    Cc: <stable@vger.kernel.org> # v3.19+
    Cc: Weston Andros Adamson <dros@primarydata.com>
    Signed-off-by: Peng Tao <tao.peng@primarydata.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* nfs: fix DIO good bytes calculation | Peng Tao | 2015-04-23 | 1 | -12/+17

    For direct read that has IO size larger than rsize, we'll split it into several READ requests and nfs_direct_good_bytes() would count completed bytes incorrectly by eating last zero count reply.

    Fix it by handling mirror and non-mirror cases differently such that we only count mirrored writes differently.

    This fixes 5fadeb47 ("nfs: count DIO good bytes correctly with mirroring").

    Reported-by: Jean Spector <jean@primarydata.com>
    Cc: <stable@vger.kernel.org> # v3.19+
    Signed-off-by: Peng Tao <tao.peng@primarydata.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* nfs: Fetch MOUNTED_ON_FILEID when updating an inode | Anna Schumaker | 2015-04-23 | 2 | -2/+16

    2ef47eb1 (NFS: Fix use of nfs_attr_use_mounted_on_fileid()) was a good start to fixing a circular directory structure warning for NFS v4 "junctioned" mountpoints. Unfortunately, further testing continued to generate this error.

    My server is configured like this:

        anna@nfsd ~ % df
        Filesystem      Size  Used Avail Use% Mounted on
        /dev/vda1       9.1G  2.0G  6.5G  24% /
        /dev/vdc1      1014M   33M  982M   4% /exports
        /dev/vdc2      1014M   33M  982M   4% /exports/vol1
        /dev/vdc3      1014M   33M  982M   4% /exports/vol1/vol2

        anna@nfsd ~ % cat /etc/exports
        /exports/          *(rw,async,no_subtree_check,no_root_squash)
        /exports/vol1/     *(rw,async,no_subtree_check,no_root_squash)
        /exports/vol1/vol2 *(rw,async,no_subtree_check,no_root_squash)

    I've been running chown across the entire mountpoint twice in a row to hit this problem. The first run succeeds, but the second one fails with the circular directory warning along with:

        anna@client ~ % dmesg
        [Apr 3 14:28] NFS: server 192.168.100.204 error: fileid changed
        fsid 0:39: expected fileid 0x100080, got 0x80

    Where 0x80 is the mountpoint's fileid and 0x100080 is the mounted-on fileid. This patch fixes the issue by requesting an updated mounted-on fileid from the server during nfs_update_inode(), and then checking that the fileid stored in the nfs_inode matches either the fileid or mounted-on fileid returned by the server.

    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
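The check described in the last paragraph can be sketched as follows. This is a minimal Python model under stated assumptions, not the kernel implementation: the function name `fileid_matches` and its parameters are hypothetical, and only the two fileid values come from the dmesg output above.

```python
# Toy model of the comparison: accept the server's reply when the cached
# fileid equals either the fileid or the mounted-on fileid it returned.
def fileid_matches(cached_fileid, fileid, mounted_on_fileid=None):
    """Return True when the cached fileid is consistent with the reply."""
    if cached_fileid == fileid:
        return True
    # Only consult the mounted-on fileid when the server actually sent one.
    return mounted_on_fileid is not None and cached_fileid == mounted_on_fileid

# Values from the commit message: 0x80 is the mountpoint's fileid and
# 0x100080 is the mounted-on fileid.
old_check = fileid_matches(0x100080, 0x80)
new_check = fileid_matches(0x100080, 0x80, mounted_on_fileid=0x100080)
print(old_check, new_check)
```

Without the mounted-on fileid in the reply, the comparison fails and would be reported as a changed fileid; with it, the same reply is accepted.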
* nfs: fix high load average due to callback thread sleeping | Jeff Layton | 2015-04-23 | 1 | -3/+3

    Chuck pointed out a problem that crept in with commit 6ffa30d3f734 (nfs: don't call blocking operations while !TASK_RUNNING). Linux counts tasks in uninterruptible sleep against the load average, so this caused the system's load average to be pinned at at least 1 when there was a NFSv4.1+ mount active.

    Not a huge problem, but it's probably worth fixing before we get too many complaints about it. This patch converts the code back to use TASK_INTERRUPTIBLE sleep, and simply has it flush any signals on each loop iteration. In practice no one should really be signalling this thread at all, so I think this is reasonably safe.

    With this change, there's also no need to game the hung task watchdog so we can also convert the schedule_timeout call back to a normal schedule.

    Cc: <stable@vger.kernel.org>
    Reported-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
    Tested-by: Chuck Lever <chuck.lever@oracle.com>
    Fixes: commit 6ffa30d3f734 (“nfs: don't call blocking . . .”)
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Reduce time spent holding the i_mutex during fallocate() | Anna Schumaker | 2015-04-23 | 2 | -7/+10

    At the very least, we should not be taking the i_mutex until after checking if the server even supports ALLOCATE or DEALLOCATE, allowing v4.0 or v4.1 to exit without potentially waiting on a lock.

    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
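The ordering change can be pictured with a small userspace sketch. It is written in Python with a `threading.Lock` standing in for the i_mutex; the `fallocate` helper, its parameters, and the choice of `EOPNOTSUPP` as the error are assumptions made for illustration only.

```python
import errno
import threading

# Toy model: check the capability first and only then take the lock, so an
# unsupported call returns immediately even while the lock is contended.
def fallocate(inode_lock, server_supports_op, do_operation):
    if not server_supports_op:
        # Early exit: no lock is taken on this path.
        raise OSError(errno.EOPNOTSUPP, "operation not supported by server")
    with inode_lock:
        return do_operation()

lock = threading.Lock()
print(fallocate(lock, True, lambda: "allocated"))
```

With the check placed after the lock acquisition instead, an unsupported call would still have to wait for whoever currently holds the lock.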
* NFS: Don't zap caches on fallocate() | Anna Schumaker | 2015-04-23 | 4 | -10/+35

    This patch adds a GETATTR to the end of ALLOCATE and DEALLOCATE operations so we can set the updated inode size and change attribute directly. DEALLOCATE will still need to release pagecache pages, so nfs42_proc_deallocate() now calls truncate_pagecache_range() before contacting the server.

    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Block new writes while syncing data in nfs_getattr() | Trond Myklebust | 2015-03-27 | 1 | -0/+2

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFSv4.1/pnfs: Separate out metadata and data consistency for pNFS | Trond Myklebust | 2015-03-27 | 9 | -8/+47

    The LAYOUTCOMMIT operation means different things to different layout types. For blocks and objects, it is both a data and metadata consistency operation. For files and flexfiles, it is only a metadata consistency operation. This patch separates out the 2 cases, allowing the files/flexfiles layout drivers to optimise away the data consistency calls to layoutcommit.

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
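The per-layout distinction above amounts to a small lookup table. The following Python sketch only restates the commit message; the dictionary and the `needs_layoutcommit` helper are hypothetical names, not kernel interfaces.

```python
# What LAYOUTCOMMIT guarantees for each layout type, per the commit message.
LAYOUTCOMMIT_SEMANTICS = {
    "blocks": {"data", "metadata"},
    "objects": {"data", "metadata"},
    "files": {"metadata"},
    "flexfiles": {"metadata"},
}

def needs_layoutcommit(layout_type, consistency_kind):
    """True if LAYOUTCOMMIT is required for this kind of consistency point."""
    return consistency_kind in LAYOUTCOMMIT_SEMANTICS[layout_type]

for layout in LAYOUTCOMMIT_SEMANTICS:
    print(layout, needs_layoutcommit(layout, "data"))
```

Under this model, a data-only synchronisation point can skip the layoutcommit for the files and flexfiles layout types, which is the optimisation the patch enables.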
* NFSv4.1/pnfs: Ensure we send layoutcommit before return-on-close | Trond Myklebust | 2015-03-27 | 1 | -1/+4

    We must not send a close or delegreturn that would result in a return-on-close of the layout without ensuring that we've also sent the necessary layoutcommit.

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFSv4.1/pnfs: Ensure that writes respect the O_SYNC flag when doing O_DIRECT | Trond Myklebust | 2015-03-27 | 3 | -0/+3

    If the caller does not specify the O_SYNC flag, then it is legitimate to return from O_DIRECT without doing a pNFS layoutcommit operation. However if the file is opened O_DIRECT|O_SYNC then we'd better get it right.

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFSv4: Truncating file opens should also sync O_DIRECT writes | Trond Myklebust | 2015-03-27 | 2 | -2/+3

    We don't just want to sync out buffered writes, but also O_DIRECT ones.

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: File unlock needs to be a metadata synchronisation point | Trond Myklebust | 2015-03-27 | 1 | -1/+1

    File unlock needs to update both data and metadata on the NFS server in order to act as a synchronisation point for other clients.

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Add a helper to sync both O_DIRECT and buffered writes | Trond Myklebust | 2015-03-27 | 1 | -6/+9

    Then apply it to nfs_setattr() and nfs_getattr().

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFSv4.1/pnfs: Refactor pnfs_set_layoutcommit() | Trond Myklebust | 2015-03-27 | 4 | -42/+14

    pnfs_set_layoutcommit() and pnfs_commit_set_layoutcommit() are 100% identical except for the function arguments. Refactor to eliminate the difference.

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFSv4.1/pnfs: Fix setting of layoutcommit last write byte | Trond Myklebust | 2015-03-27 | 1 | -9/+8

    If the NFS_INO_LAYOUTCOMMIT flag was unset, then we _must_ ensure that we also reset the last write byte (lwb) for that layout. The current code depends on us clearing the lwb when we clear NFS_INO_LAYOUTCOMMIT, which is not the case when we call pnfs_clear_layoutcommit().

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFSv4: Return the delegation before returning the layout in evict_inode() | Trond Myklebust | 2015-03-27 | 1 | -2/+3

    Minor optimisation for the case where the layout has return-on-close enabled.

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFSv4: Allow tracing of NFSv4 fsync calls | Trond Myklebust | 2015-03-27 | 2 | -0/+8

    I appear to have missed this when adding the ftrace probes.

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Fix free_deveiceid -> free_deviceid | Trond Myklebust | 2015-03-27 | 2 | -4/+4

    Make it easier to grep for these functions by name.

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFSv4.1: Don't cache deviceids that have no notifications | Trond Myklebust | 2015-03-27 | 3 | -0/+13

    The spec says that once all layouts that reference a given deviceid have been returned, then we are only allowed to continue to cache the deviceid if the metadata server supports notifications.

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
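The caching rule quoted from the spec reduces to a one-line predicate. This Python sketch is a simplified model: the function name and the idea of passing a plain count of referencing layouts are assumptions for illustration.

```python
# Toy model of the rule: a deviceid may stay cached while layouts still
# reference it, or afterwards only if the server sends notifications.
def may_keep_deviceid_cached(referencing_layouts, server_sends_notifications):
    if referencing_layouts > 0:
        return True
    return server_sends_notifications

print(may_keep_deviceid_cached(0, False))
print(may_keep_deviceid_cached(0, True))
```

The interesting case is the first call: with no layouts left and no notifications promised, the cached entry has to be dropped.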
* NFSv4.1: Allow getdeviceinfo to return notification info back to caller | Trond Myklebust | 2015-03-27 | 2 | -9/+10

    We are only allowed to cache deviceinfo if the server supports notifications and actually promises to call us back when changes occur. Right now, we request those notifications, but then we don't check the server's reply.

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFSv4.1: Cleanup - don't opencode nfs4_put_deviceid_node() | Trond Myklebust | 2015-03-27 | 1 | -4/+2

    There really is no reason to do so.

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFSv4.1: Convert pNFS deviceid to use kfree_rcu() | Trond Myklebust | 2015-03-27 | 6 | -8/+7

    Use of synchronize_rcu() when unmounting and potentially freeing a lot of deviceids is problematic. There really is no reason why we can't just use kfree_rcu() here.

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* nfs: clean up nfs_direct_IO | Peng Tao | 2015-03-13 | 1 | -7/+0

    This follows up "nfs: fix dio deadlock when O_DIRECT flag is flipped" and removes the unnecessary CONFIG_NFS_SWAP switch.

    Signed-off-by: Peng Tao <tao.peng@primarydata.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFSv4: Append delegations to the per-client list instead of prepending | Trond Myklebust | 2015-03-12 | 1 | -1/+1

    Do so on the assumption that for most use cases, that list will turn into a more or less LRU-ordered list, and so the list traversals in nfs_client_return_marked_delegations() are likely to be shorter before hitting a candidate to return.

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFSv4.1: Clear the old state by our client id before establishing a new lease | Trond Myklebust | 2015-03-03 | 3 | -5/+17

    If the call to exchange-id returns with the EXCHGID4_FLAG_CONFIRMED_R flag set, then that means our lease was established by a previous mount instance. Ensure that we detect this situation, and that we clear the state held by that mount.

    Reported-by: Jorge Mora <Jorge.Mora@netapp.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFSv4: Fix a race in NFSv4.1 server trunking discovery | Trond Myklebust | 2015-03-03 | 3 | -8/+17

    We do not want to allow a race with another NFS mount to cause nfs41_walk_client_list() to establish a lease on our nfs_client before we're done checking for trunking.

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Don't write enable new pages while an invalidation is proceeding | Trond Myklebust | 2015-03-03 | 2 | -0/+4

    nfs_vm_page_mkwrite() should wait until the page cache invalidation is finished. This is the second patch in a 2 patch series to deprecate the NFS client's reliance on nfs_release_page() in the context of nfs_invalidate_mapping().

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Fix a regression in the read() syscall | Trond Myklebust | 2015-03-03 | 2 | -5/+36

    When invalidating the page cache for a regular file, we want to first sync all dirty data to disk and then call invalidate_inode_pages2(). The latter relies on nfs_launder_page() and nfs_release_page() to deal respectively with dirty pages, and unstable written pages.

    When commit 9590544694bec ("NFS: avoid deadlocks with loop-back mounted NFS filesystems.") changed the behaviour of nfs_release_page(), then it made it possible for invalidate_inode_pages2() to fail with an EBUSY. Unfortunately, that error is then propagated back to read().

    Let's therefore work around the problem for now by protecting the call to sync the data and invalidate_inode_pages2() so that they are atomic w.r.t. the addition of new writes. Later on, we can revisit whether or not we still need nfs_launder_page() and nfs_release_page().

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFSv4: Ensure we skip delegations that are already being returned | Trond Myklebust | 2015-03-02 | 1 | -0/+6

    In nfs_client_return_marked_delegations() and nfs_delegation_reap_unclaimed() we want to optimise the loop traversal by skipping delegations that are already in the process of being returned.

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFSv4: Pin the superblock while we're returning the delegation | Trond Myklebust | 2015-03-02 | 1 | -4/+16

    This patch ensures that the superblock doesn't go ahead and disappear underneath us while the state manager thread is returning delegations.

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFSv4: Ensure we honour NFS_DELEGATION_RETURNING in nfs_inode_set_delegation() | Trond Myklebust | 2015-03-02 | 1 | -1/+4

    Ensure that nfs_inode_set_delegation() doesn't inadvertently detach a delegation that is already in the process of being returned.

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFSv4: Ensure that we don't reap a delegation that is being returned | Trond Myklebust | 2015-03-02 | 1 | -5/+7

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Fix stateid used for NFS v4 closes | Anna Schumaker | 2015-03-02 | 1 | -2/+2

    After 566fcec60 the client uses the "current stateid" from the nfs4_state structure to close a file. This could potentially contain a delegation stateid, which is disallowed by the protocol and causes servers to return NFS4ERR_BAD_STATEID. This patch restores the (correct) behavior of sending the open stateid to close a file.

    Reported-by: Olga Kornievskaia <kolga@netapp.com>
    Fixes: 566fcec60 (NFSv4: Fix an atomicity problem in CLOSE)
    Signed-off-by: Anna Schumaker <Anna.Schumaker@netapp.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFSv4: Don't call put_rpccred() under the rcu_read_lock() | Trond Myklebust | 2015-03-01 | 1 | -1/+1

    put_rpccred() can sleep.

    Fixes: 8f649c3762547 ("NFSv4: Fix the locking in nfs_inode_reclaim_delegation()")
    Cc: stable@vger.kernel.org # 2.6.35+
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Don't require a filehandle to refresh the inode in nfs_prime_dcache() | Trond Myklebust | 2015-03-01 | 1 | -3/+13

    If the server does not return a valid set of attributes that we can use to either create a file or refresh the inode, then there is no value in calling nfs_prime_dcache().

    However if we're just refreshing the inode using the attributes that the server returned, then it shouldn't matter whether or not we have a filehandle, as long as we check the fsid+fileid combination.

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFSv3: Use the readdir fileid as the mounted-on-fileid | Trond Myklebust | 2015-03-01 | 1 | -0/+5

    When we call readdirplus, set the fileid normally returned by readdir as the mounted-on-fileid, since that is commonly the case if there is a mountpoint. To ensure that we get it right, we only set the flag if the readdir fileid differs from the one returned in the readdirplus attributes.

    This again means that we can avoid the issues described in commit 2ef47eb1aee17 ("NFS: Fix use of nfs_attr_use_mounted_on_fileid()"), which only fixed NFSv4.

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
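The rule in the first paragraph can be expressed as a short helper. This is a Python sketch under stated assumptions: the function name is hypothetical, `None` stands in for "flag not set", and the example fileid values are made up.

```python
# Toy model: treat the readdir fileid as the mounted-on fileid only when it
# differs from the fileid in the readdirplus attributes.
def mounted_on_fileid_from_readdir(readdir_fileid, attr_fileid):
    """Return the mounted-on fileid, or None when no mountpoint is implied."""
    if readdir_fileid != attr_fileid:
        return readdir_fileid
    return None

# Hypothetical values: the entry is a mountpoint in the first call only.
print(mounted_on_fileid_from_readdir(4242, 2))
print(mounted_on_fileid_from_readdir(17, 17))
```

When both fileids agree there is nothing to record, so ordinary directory entries are unaffected.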
* NFS: Don't invalidate a submounted dentry in nfs_prime_dcache() | Trond Myklebust | 2015-03-01 | 1 | -0/+6

    If we're traversing a directory which contains a submounted filesystem, or one that has a referral, the NFS server that is processing the READDIR request will often return information for the underlying (mounted-on) directory. It may, or may not, also return filehandle information.

    If this happens, and the lookup in nfs_prime_dcache() returns the dentry for the submounted directory, the filehandle comparison will fail, and we call d_invalidate(). Post-commit 8ed936b5671bf ("vfs: Lazily remove mounts on unlinked files and directories."), this means the entire subtree is unmounted.

    The following minimal patch addresses this problem by punting on the invalidation if there is a submount.

    Kudos to Neil Brown <neilb@suse.de> for having tracked down this issue (see link).

    Reported-by: Nix <nix@esperi.org.uk>
    Link: http://lkml.kernel.org/r/87iofju9ht.fsf@spindle.srvr.nix
    Cc: stable@vger.kernel.org # 3.18+
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFSv4: Set a barrier in the update_changeattr() helper | Trond Myklebust | 2015-03-01 | 2 | -0/+2

    Ensure that we don't regress the changes that were made to the directory.

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    Tested-by: Chuck Lever <chuck.lever@oracle.com>
* NFS: Fix nfs_post_op_update_inode() to set an attribute barrier | Trond Myklebust | 2015-03-01 | 1 | -0/+1

    nfs_post_op_update_inode() is called after a self-induced attribute update. Ensure that it also sets the barrier.

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    Tested-by: Chuck Lever <chuck.lever@oracle.com>
* NFS: Remove size hack in nfs_inode_attrs_need_update() | Trond Myklebust | 2015-03-01 | 1 | -8/+0

    Prior to this patch, we used to always OK attribute updates that extended the file size on the assumption that we might be performing writeback. Now that we have attribute barriers to protect the writeback related updates, we should remove this hack, as it can cause truncate() operations to apparently be reverted if/when a readahead or getattr RPC call races with our on-the-wire SETATTR.

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    Tested-by: Chuck Lever <chuck.lever@oracle.com>
* NFSv4: Add attribute update barriers to delegreturn and pNFS layoutcommit | Trond Myklebust | 2015-03-01 | 1 | -0/+1

    Ensure that other operations that race with delegreturn and layoutcommit cannot revert the attribute updates that were made on the server.

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    Tested-by: Chuck Lever <chuck.lever@oracle.com>
* NFS: Add attribute update barriers to NFS writebacks | Trond Myklebust | 2015-03-01 | 6 | -8/+56

    Ensure that other operations that race with our write RPC calls cannot revert the file size updates that were made on the server.

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    Tested-by: Chuck Lever <chuck.lever@oracle.com>
* NFS: Set an attribute barrier on all updates | Trond Myklebust | 2015-03-01 | 1 | -0/+4

    Ensure that we update the attribute barrier even if there were no invalidations, provided that this value is newer than the old one.

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    Tested-by: Chuck Lever <chuck.lever@oracle.com>
* NFS: Add attribute update barriers to nfs_setattr_update_inode() | Trond Myklebust | 2015-03-01 | 4 | -10/+17

    Ensure that other operations which raced with our setattr RPC call cannot revert the file attribute changes that were made on the server. To do so, we artificially bump the attribute generation counter on the inode so that all calls to nfs_fattr_init() that precede ours will be dropped.

    The motivation for the patch came from Chuck Lever's reports of readaheads racing with truncate operations and causing the file size to be reverted.

    Reported-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    Tested-by: Chuck Lever <chuck.lever@oracle.com>
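The generation-counter barrier described above can be modelled in a few lines. This Python sketch is a simplified illustration, not the kernel's data structures: the `Inode` class, the dictionary used for attributes, and the helper names are assumptions; only the idea of dropping attribute sets initialised before the barrier comes from the commit message.

```python
import itertools

# Toy model: one global generation counter shared by all attribute sets.
_generation = itertools.count(1)

class Inode:
    def __init__(self, size):
        self.size = size
        self.attr_gencount = 0  # the barrier: newest generation applied

def fattr_init(size):
    """Stamp a new attribute set with the current generation."""
    return {"gencount": next(_generation), "size": size}

def refresh_inode(inode, fattr):
    """Apply attributes unless they were initialised before the barrier."""
    if fattr["gencount"] <= inode.attr_gencount:
        return False  # stale reply, dropped
    inode.size = fattr["size"]
    inode.attr_gencount = fattr["gencount"]
    return True

def setattr_update_inode(inode, new_size):
    """Apply our own change and bump the barrier past all earlier inits."""
    inode.size = new_size
    inode.attr_gencount = next(_generation)

inode = Inode(size=4096)
stale = fattr_init(size=4096)       # e.g. a readahead that started earlier
setattr_update_inode(inode, 0)      # truncate completes and sets the barrier
applied_stale = refresh_inode(inode, stale)
applied_fresh = refresh_inode(inode, fattr_init(size=0))
print(applied_stale, applied_fresh, inode.size)
```

In this model the late readahead reply carries the old size but an older generation, so it cannot revert the truncate.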
* NFS: Add a helper to set attribute barriers | Trond Myklebust | 2015-03-01 | 1 | -0/+16

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    Tested-by: Chuck Lever <chuck.lever@oracle.com>
* NFS: Ensure that buffered writes wait for O_DIRECT writes to complete | Trond Myklebust | 2015-03-01 | 1 | -0/+4

    The O_DIRECT code will grab the inode->i_mutex and flush out buffered writes, before scheduling a read or a write. However there is no equivalent in the buffered write code to wait for O_DIRECT to complete.

    Fixes a reported issue in xfstests generic/133, when first performing an O_DIRECT write followed by a buffered write.

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    Tested-by: Chuck Lever <chuck.lever@oracle.com>
* NFSv4: nfs4_open_recover_helper() must set share access | Trond Myklebust | 2015-02-27 | 1 | -0/+3

    The share access mode is now specified as an argument in the nfs4_opendata, and so nfs4_open_recover_helper() needs to call nfs4_map_atomic_open_share() in order to set it.

    Fixes: 6ae373394c42 ("NFSv4.1: Ask for no delegation on OPEN if using O_DIRECT")
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>