path: root/fs/nfs
Commit message (Author, Age, Files, Lines)
...
| * | NFS: Remove unused argument from nfs_create_request() (Trond Myklebust, 2019-04-25, 4 files, -13/+13)
      All the callers of nfs_create_request() are now creating page group heads, so we can remove the redundant 'last' page argument.

      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | NFS: Fix up NFS I/O subrequest creation (Trond Myklebust, 2019-04-25, 1 file, -38/+55)
      We require all NFS I/O subrequests to duplicate the lock context as well as the open context.

      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | NFS: Replace custom error reporting mechanism with generic one (Trond Myklebust, 2019-04-25, 3 files, -45/+26)
      Replace the NFS custom error reporting mechanism with the generic mapping_set_error().

      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | NFS: Don't inadvertently clear writeback errors (Trond Myklebust, 2019-04-25, 2 files, -4/+4)
      vfs_fsync() has the side effect of clearing unreported writeback errors, so we need to make sure that we do not abuse it in situations where applications might not normally expect us to report those errors. The solution is to replace calls to vfs_fsync() with calls to nfs_wb_all().

      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | NFS: Don't call generic_error_remove_page() while holding locks (Trond Myklebust, 2019-04-25, 1 file, -2/+1)
      The NFS read code can trigger writeback while holding the page lock. If an error then triggers a call to nfs_write_error_remove_page(), we can deadlock.

      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | NFS: Don't interrupt file writeout due to fatal errors (Trond Myklebust, 2019-04-25, 1 file, -1/+1)
      When flushing out dirty pages, the fact that we may hit fatal errors is not a reason to stop writeback. Those errors are reported through fsync(), not through the flush mechanism.

      Fixes: a6598813a4c5b ("NFS: Don't write back further requests if there...")
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | NFS: Add a mount option "softerr" to allow clients to see ETIMEDOUT errors (Trond Myklebust, 2019-04-25, 2 files, -3/+14)
      Add a mount option that exposes the ETIMEDOUT errors that occur during soft timeouts to the application. This allows aware applications to distinguish between server disk IO errors and client timeout errors.

      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | NFS: Consider ETIMEDOUT to be a fatal error (Trond Myklebust, 2019-04-25, 1 file, -0/+1)
      When we introduce the 'softerr' mount option, we will see the RPC layer returning ETIMEDOUT errors if the server is unresponsive. We want to consider those errors to be fatal on par with the EIO errors that are returned by ordinary 'soft' timeouts.

      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | SUNRPC: Add function rpc_sleep_on_timeout() (Trond Myklebust, 2019-04-25, 1 file, -6/+12)
      Clean up the RPC task sleep interfaces by replacing the task->tk_timeout 'hidden parameter' to rpc_sleep_on() with a new function that takes an absolute timeout.

      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | SUNRPC: Remove unused argument 'action' from rpc_sleep_on_priority() (Trond Myklebust, 2019-04-25, 1 file, -1/+1)
      None of the callers set the 'action' argument, so let's just remove it.

      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* | | Merge tag 'for-5.2/block-20190507' of git://git.kernel.dk/linux-block (Linus Torvalds, 2019-05-07, 1 file, -1/+0)
      Pull block updates from Jens Axboe:
       "Nothing major in this series, just fixes and improvements all over the map. This contains:

        - Series of fixes for sed-opal (David, Jonas)
        - Fixes and performance tweaks for BFQ (via Paolo)
        - Set of fixes for bcache (via Coly)
        - Set of fixes for md (via Song)
        - Enabling multi-page for passthrough requests (Ming)
        - Queue release fix series (Ming)
        - Device notification improvements (Martin)
        - Propagate underlying device rotational status in loop (Holger)
        - Removal of mtip32xx trim support, which has been disabled for years (Christoph)
        - Improvement and cleanup of nvme command handling (Christoph)
        - Add block SPDX tags (Christoph)
        - Cleanup/hardening of bio/bvec iteration (Christoph)
        - A few NVMe pull requests (Christoph)
        - Removal of CONFIG_LBDAF (Christoph)
        - Various little fixes here and there"

      * tag 'for-5.2/block-20190507' of git://git.kernel.dk/linux-block: (164 commits)
        block: fix mismerge in bvec_advance
        block: don't drain in-progress dispatch in blk_cleanup_queue()
        blk-mq: move cancel of hctx->run_work into blk_mq_hw_sysfs_release
        blk-mq: always free hctx after request queue is freed
        blk-mq: split blk_mq_alloc_and_init_hctx into two parts
        blk-mq: free hw queue's resource in hctx's release handler
        blk-mq: move cancel of requeue_work into blk_mq_release
        blk-mq: grab .q_usage_counter when queuing request from plug code path
        block: fix function name in comment
        nvmet: protect discovery change log event list iteration
        nvme: mark nvme_core_init and nvme_core_exit static
        nvme: move command size checks to the core
        nvme-fabrics: check more command sizes
        nvme-pci: check more command sizes
        nvme-pci: remove an unneeded variable initialization
        nvme-pci: unquiesce admin queue on shutdown
        nvme-pci: shutdown on timeout during deletion
        nvme-pci: fix psdt field for single segment sgls
        nvme-multipath: don't print ANA group state by default
        nvme-multipath: split bios with the ns_head bio_set before submitting
        ...
| * | | Merge tag 'v5.1-rc6' into for-5.2/block (Jens Axboe, 2019-04-22, 4 files, -7/+7)
        Pull in v5.1-rc6 to resolve two conflicts. One is in BFQ, in just a comment, and is trivial. The other one is a conflict due to a later fix in the bio multi-page work, and needs a bit more care.

        * tag 'v5.1-rc6': (770 commits)
          Linux 5.1-rc6
          block: make sure that bvec length can't be overflow
          block: kill all_q_node in request_queue
          x86/cpu/intel: Lower the "ENERGY_PERF_BIAS: Set to normal" message's log priority
          coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping
          mm/kmemleak.c: fix unused-function warning
          init: initialize jump labels before command line option parsing
          kernel/watchdog_hld.c: hard lockup message should end with a newline
          kcov: improve CONFIG_ARCH_HAS_KCOV help text
          mm: fix inactive list balancing between NUMA nodes and cgroups
          mm/hotplug: treat CMA pages as unmovable
          proc: fixup proc-pid-vm test
          proc: fix map_files test on F29
          mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n
          mm/memory_hotplug: do not unlock after failing to take the device_hotplug_lock
          mm: swapoff: shmem_unuse() stop eviction without igrab()
          mm: swapoff: take notice of completion sooner
          mm: swapoff: remove too limiting SWAP_UNUSE_MAX_TRIES
          mm: swapoff: shmem_find_swap_entries() filter out other types
          slab: store tagged freelist for off-slab slabmgmt
          ...

        Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | block: remove CONFIG_LBDAF (Christoph Hellwig, 2019-04-06, 1 file, -1/+0)
        Currently support for 64-bit sector_t and blkcnt_t is optional on 32-bit architectures. These types are required to support block device and/or file sizes larger than 2 TiB, and have generally defaulted to on for a long time. Enabling the option only increases the i386 tinyconfig size by 145 bytes, and many data structures already always use 64-bit values for their in-core and on-disk data structures anyway, so there should not be a large change in dynamic memory usage either.

        Dropping this option removes a somewhat weird non-default config that has caused various bugs or compiler warnings when actually used.

        Signed-off-by: Christoph Hellwig <hch@lst.de>
        Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | Merge branch 'work.icache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs (Linus Torvalds, 2019-05-07, 4 files, -11/+5)
      Pull vfs inode freeing updates from Al Viro:
       "Introduction of separate method for RCU-delayed part of ->destroy_inode() (if any).

        Pretty much as posted, except that destroy_inode() stashes ->free_inode into the victim (anon-unioned with ->i_fops) before scheduling i_callback() and the last two patches (sockfs conversion and folding struct socket_wq into struct socket) are excluded - that pair should go through netdev once davem reopens his tree"

      * 'work.icache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (58 commits)
        orangefs: make use of ->free_inode()
        shmem: make use of ->free_inode()
        hugetlb: make use of ->free_inode()
        overlayfs: make use of ->free_inode()
        jfs: switch to ->free_inode()
        fuse: switch to ->free_inode()
        ext4: make use of ->free_inode()
        ecryptfs: make use of ->free_inode()
        ceph: use ->free_inode()
        btrfs: use ->free_inode()
        afs: switch to use of ->free_inode()
        dax: make use of ->free_inode()
        ntfs: switch to ->free_inode()
        securityfs: switch to ->free_inode()
        apparmor: switch to ->free_inode()
        rpcpipe: switch to ->free_inode()
        bpf: switch to ->free_inode()
        mqueue: switch to ->free_inode()
        ufs: switch to ->free_inode()
        coda: switch to ->free_inode()
        ...
| * | nfs{,4}: switch to ->free_inode() (Al Viro, 2019-05-01, 4 files, -11/+5)
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | NFSv4.1 fix incorrect return value in copy_file_range (Olga Kornievskaia, 2019-04-11, 2 files, -4/+3)
    According to the NFSv4.2 spec if the input and output file is the same file, operation should fail with EINVAL. However, linux copy_file_range() system call has no such restrictions. Therefore, in such case let's return EOPNOTSUPP and allow VFS to fallback to doing do_splice_direct(). Also when copy_file_range is called on an NFSv4.0 or 4.1 mount (ie., a server that doesn't support COPY functionality), we also need to return EOPNOTSUPP and fallback to a regular copy.

    Fixes xfstest generic/075, generic/091, generic/112, generic/263 for all NFSv4.x versions.

    Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
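The return-value policy described in this commit message can be sketched in plain userspace C. This is only an illustration of the decision, not the kernel code: demo_copy_offload_check() and its two flag parameters are hypothetical names, and the real patch works on struct file pointers and server capability bits.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether a server-side COPY may be attempted.
 * Returns 0 if the offload can proceed, or -EOPNOTSUPP so that the caller
 * (the VFS, in the commit message) falls back to an ordinary copy.
 */
int demo_copy_offload_check(bool same_file, bool server_supports_copy)
{
    /* The NFSv4.2 COPY operation rejects source == destination. */
    if (same_file)
        return -EOPNOTSUPP;

    /* NFSv4.0 and v4.1 mounts have no COPY operation at all. */
    if (!server_supports_copy)
        return -EOPNOTSUPP;

    return 0;
}
```

Both refusal cases map to the same error on purpose: per the commit message, EOPNOTSUPP is the value that lets the VFS fall back to do_splice_direct().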
* | NFS: Fix handling of reply page vector (Chuck Lever, 2019-04-11, 1 file, -2/+2)
    NFSv4 GETACL and FS_LOCATIONS requests stopped working in v5.1-rc. These two need the extra padding to be added directly to the reply length.

    Reported-by: Olga Kornievskaia <aglo@umich.edu>
    Fixes: 02ef04e432ba ("NFS: Account for XDR pad of buf->pages")
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Tested-by: Olga Kornievskaia <aglo@umich.edu>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family. (Tetsuo Handa, 2019-04-11, 1 file, -1/+2)
    syzbot is reporting uninitialized value at rpc_sockaddr2uaddr() [1]. This is because syzbot is setting AF_INET6 to "struct sockaddr_in"->sin_family (which is embedded into user-visible "struct nfs_mount_data" structure) despite nfs23_validate_mount_data() cannot pass sizeof(struct sockaddr_in6) bytes of AF_INET6 address to rpc_sockaddr2uaddr().

    Since "struct nfs_mount_data" structure is user-visible, we can't change "struct nfs_mount_data" to use "struct sockaddr_storage". Therefore, assuming that everybody is using AF_INET family when passing address via "struct nfs_mount_data"->addr, reject if its sin_family is not AF_INET.

    [1] https://syzkaller.appspot.com/bug?id=599993614e7cbbf66bc2656a919ab2a95fb5d75c

    Reported-by: syzbot <syzbot+047a11c361b872896a4f@syzkaller.appspotmail.com>
    Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
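The check this message describes, rejecting any address whose family is not AF_INET, might look roughly like the userspace sketch below. The function name demo_validate_mount_addr() and the choice of -EINVAL as the error are assumptions made for illustration; the commit message only says the mount data is rejected.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Hypothetical validator: a struct sockaddr_in only has room for an IPv4
 * address, so any other family value cannot be trusted to describe the
 * bytes that follow it.
 */
int demo_validate_mount_addr(const struct sockaddr_in *addr)
{
    if (addr->sin_family != AF_INET)
        return -EINVAL;   /* assumed error code, see lead-in */
    return 0;
}
```

Validating the family at the boundary keeps later code from reading sizeof(struct sockaddr_in6) bytes out of a smaller structure.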
* pNFS/flexfiles: Fix layoutstats handling during read failovers (Trond Myklebust, 2019-03-23, 1 file, -1/+4)
  During a read failover, we may end up changing the value of the pgio_mirror_idx, so make sure that we record the layout stats before that update.

  Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Fix a typo in nfs_init_timeout_values() (Trond Myklebust, 2019-03-23, 1 file, -1/+1)
  Specifying a retrans=0 mount parameter to a NFS/TCP mount, is inadvertently causing the NFS client to rewrite any specified timeout parameter to the default of 60 seconds.

  Fixes: a956beda19a6 ("NFS: Allow the mount option retrans=0")
  Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFSv4.1 don't free interrupted slot on open (Olga Kornievskaia, 2019-03-19, 1 file, -1/+2)
  Allow the async rpc task to finish and update the open state if needed, then free the slot. Otherwise, the async rpc is unable to decode the reply.

  Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
  Fixes: ae55e59da0e4 ("pnfs: Don't release the sequence slot...")
  Cc: stable@vger.kernel.org # v4.18+
  Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS: Fix nfs4_lock_state refcounting in nfs4_alloc_{lock,unlock}data() (Catalin Marinas, 2019-03-18, 1 file, -2/+0)
  Commit 7b587e1a5a6c ("NFS: use locks_copy_lock() to copy locks.") changed the lock copying from memcpy() to the dedicated locks_copy_lock() function. The latter correctly increments the nfs4_lock_state.ls_count via nfs4_fl_copy_lock(), however, this refcount has already been incremented in the nfs4_alloc_{lock,unlock}data(). Kmemleak subsequently reports an unreferenced nfs4_lock_state object as below (arm64 platform):

  unreferenced object 0xffff8000fce0b000 (size 256):
    comm "systemd-sysuser", pid 1608, jiffies 4294892825 (age 32.348s)
    hex dump (first 32 bytes):
      20 57 4c fb 00 80 ff ff 20 57 4c fb 00 80 ff ff   WL..... WL.....
      00 57 4c fb 00 80 ff ff 01 00 00 00 00 00 00 00  .WL.............
    backtrace:
      [<000000000d15010d>] kmem_cache_alloc+0x178/0x208
      [<00000000d7c1d264>] nfs4_set_lock_state+0x124/0x1f0
      [<000000009c867628>] nfs4_proc_lock+0x90/0x478
      [<000000001686bd74>] do_setlk+0x64/0xe8
      [<00000000e01500d4>] nfs_lock+0xe8/0x1f0
      [<000000004f387d8d>] vfs_lock_file+0x18/0x40
      [<00000000656ab79b>] do_lock_file_wait+0x68/0xf8
      [<00000000f17c4a4b>] fcntl_setlk+0x224/0x280
      [<0000000052a242c6>] do_fcntl+0x418/0x730
      [<000000004f47291a>] __arm64_sys_fcntl+0x84/0xd0
      [<00000000d6856e01>] el0_svc_common+0x80/0xf0
      [<000000009c4bd1df>] el0_svc_handler+0x2c/0x80
      [<00000000b1a0d479>] el0_svc+0x8/0xc
      [<0000000056c62a0f>] 0xffffffffffffffff

  This patch removes the original refcount_inc(&lsp->ls_count) that was paired with the memcpy() lock copying.

  Fixes: 7b587e1a5a6c ("NFS: use locks_copy_lock() to copy locks.")
  Cc: <stable@vger.kernel.org> # 5.0.x-
  Cc: NeilBrown <neilb@suse.com>
  Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
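The leak mechanism in this message, one reference taken explicitly and a second taken by the copy helper, can be reproduced with a toy counter. Everything below is a hypothetical model: struct demo_lock_state and the demo_* functions stand in for nfs4_lock_state and the kernel helpers, and a plain int replaces refcount_t.

```c
/* Toy stand-in for nfs4_lock_state; only the reference count is modelled. */
struct demo_lock_state {
    int ls_count;
};

void demo_get(struct demo_lock_state *lsp)
{
    lsp->ls_count++;
}

/* Drops one reference; returns 1 when the last reference went away. */
int demo_put(struct demo_lock_state *lsp)
{
    return --lsp->ls_count == 0;
}

/*
 * Models the allocation path. explicit_inc != 0 mimics the old code, which
 * took a reference by hand and then took another through the lock copy.
 */
void demo_alloc_lockdata(struct demo_lock_state *lsp, int explicit_inc)
{
    if (explicit_inc)
        demo_get(lsp);   /* the increment the patch removes */
    demo_get(lsp);       /* reference taken by the lock copy */
}

/* Returns 1 if the object is freed once every holder has dropped its ref. */
int demo_lifecycle_frees(int explicit_inc)
{
    struct demo_lock_state ls = { 1 };   /* owner's initial reference */

    demo_alloc_lockdata(&ls, explicit_inc);
    demo_put(&ls);                       /* lock data released */
    return demo_put(&ls);                /* owner drops its reference */
}
```

With the extra increment the count never reaches zero, which corresponds to the unreferenced object that kmemleak reports above.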
* pNFS: Fix a typo in pnfs_update_layout (Trond Myklebust, 2019-03-12, 1 file, -1/+1)
  We're supposed to wait for the outstanding layout count to go to zero, but that got lost somehow.

  Fixes: d03360aaf5cca ("pNFS: Ensure we return the error if someone...")
  Reported-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
  Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFSv4.1: Bump the default callback session slot count to 16 (Trond Myklebust, 2019-03-02, 1 file, -1/+1)
  Users can still control this value explicitly using the max_session_cb_slots module parameter, but let's bump the default up to 16 for now.

  Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/flexfiles: Clean up mirror DS initialisation (Trond Myklebust, 2019-03-01, 1 file, -35/+31)
  Get rid of the redundant parameter and rename the function ff_layout_mirror_valid() to ff_layout_init_mirror_ds() for clarity.

  Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/flexfiles: Remove dead code in ff_layout_mirror_valid() (Trond Myklebust, 2019-03-01, 1 file, -15/+0)
  nfs4_ff_alloc_deviceid_node() guarantees that if mirror->mirror_ds is a valid pointer, then so is mirror->mirror_ds->ds.

  Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/flexfile: Simplify nfs4_ff_layout_select_ds_stateid() (Trond Myklebust, 2019-03-01, 3 files, -26/+10)
  Pass in a pointer to the mirror rather than forcing another array access.

  Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/flexfile: Simplify nfs4_ff_layout_ds_version() (Trond Myklebust, 2019-03-01, 2 files, -5/+5)
  Pass in a pointer to the mirror rather than forcing another array access.

  Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/flexfiles: Simplify ff_layout_get_ds_cred() (Trond Myklebust, 2019-03-01, 3 files, -8/+9)
  Pass in a pointer to the mirror rather than forcing another array access.

  Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/flexfiles: Simplify nfs4_ff_find_or_create_ds_client() (Trond Myklebust, 2019-03-01, 3 files, -10/+6)
  Pass in a pointer to the mirror rather than forcing another array access.

  Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/flexfiles: Simplify nfs4_ff_layout_select_ds_fh() (Trond Myklebust, 2019-03-01, 3 files, -16/+5)
  Pass in a pointer to the mirror rather than having to retrieve it from the array and then verify the resulting pointer.

  Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/flexfiles: Speed up read failover when DSes are down (Trond Myklebust, 2019-03-01, 3 files, -12/+73)
  If we notice that a DS may be down, we should attempt to read from the other mirrors first before we go back to retry the dead DS.

  Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/flexfiles: Don't invalidate DS deviceids for being unresponsive (Trond Myklebust, 2019-03-01, 2 files, -21/+3)
  If the DS is unresponsive, we want to just mark it as such, while reporting the errors. If the server later returns the same deviceid in a new layout, then we don't want to have to look it up again.

  Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/flexfiles: Remove bogus checks for invalid deviceids (Trond Myklebust, 2019-03-01, 1 file, -20/+0)
  We already check the deviceids before we start the RPC call.

  Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/flexfiles: Avoid unnecessary layout invalidations (Trond Myklebust, 2019-03-01, 1 file, -3/+3)
  In ff_layout_mirror_valid() we may not want to invalidate the layout segment despite the call to GETDEVICEINFO failing. The reason is that a read may still be able to make progress on another mirror. So instead we let the caller (in this case nfs4_ff_layout_prepare_ds()) decide whether or not it needs to invalidate.

  Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/flexfiles: refactor calls to fs4_ff_layout_prepare_ds() (Trond Myklebust, 2019-03-01, 3 files, -22/+28)
  While we may want to skip attempting to connect to a downed mirror when we're deciding which mirror to select for a read, we do not want to do so once we've committed to attempting the I/O in ff_layout_read/write_pagelist(), or ff_layout_initiate_commit()

  Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFSv4: Handle early exit in layoutget by returning an error (Trond Myklebust, 2019-03-01, 1 file, -2/+4)
  If the LAYOUTGET rpc call exits early without an error, convert it to EAGAIN.

  Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/flexfiles: Send LAYOUTERROR when failing over mirrored reads (Trond Myklebust, 2019-03-01, 3 files, -6/+57)
  When a read to the preferred mirror returns an error, the flexfiles driver records the error in the inode list and currently marks the layout for return before failing over the attempted read to the next mirror. What we actually want to do is fire off a LAYOUTERROR to notify the MDS that there is an issue with the preferred mirror, then we fail over. Only once we've failed to read from all mirrors should we return the layout.

  Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFSv4.2: Add client support for the generic 'layouterror' RPC call (Trond Myklebust, 2019-03-01, 5 files, -1/+269)
  Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFSv4/flexfiles: Abort I/O early if the layout segment was invalidated (Trond Myklebust, 2019-03-01, 1 file, -0/+17)
  If a layout segment gets invalidated while a pNFS I/O operation is queued for transmission, then we ideally want to abort immediately. This is particularly the case when there is a large number of I/O related RPCs queued in the RPC layer, and the layout segment gets invalidated due to an ENOSPC error, or an EACCES (because the client was fenced). We may end up forced to spam the MDS with a lot of otherwise unnecessary LAYOUTERRORs after that I/O fails.

  Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFSv4/pnfs: Fix barriers in nfs4_mark_deviceid_unavailable() (Trond Myklebust, 2019-03-01, 1 file, -0/+3)
  Fix the memory barriers in nfs4_mark_deviceid_unavailable() and nfs4_test_deviceid_unavailable().

  Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFS/flexfiles: Fix up sparse RCU annotations (Trond Myklebust, 2019-03-01, 1 file, -2/+2)
  Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFSv4/flexfiles: Fix invalid deref in FF_LAYOUT_DEVID_NODE() (Trond Myklebust, 2019-03-01, 1 file, -13/+19)
  If the attempt to instantiate the mirror's layout DS pointer failed, then that pointer may hold a value of type ERR_PTR(), so we need to check that before we dereference it.

  Fixes: 65990d1afbd2d ("pNFS/flexfiles: Fix a deadlock on LAYOUTGET")
  Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
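For readers unfamiliar with the ERR_PTR() convention this fix relies on, here is a userspace approximation. The helpers err_ptr(), ptr_err() and is_err() imitate the kernel macros of similar names, and struct demo_mirror, struct demo_ds and demo_mirror_ds_id() are hypothetical; none of this is the flexfiles code itself.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer (imitates ERR_PTR()). */
void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* Recover the errno value from such a pointer (imitates PTR_ERR()). */
long ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True if the pointer lies in the top 4095 addresses (imitates IS_ERR()). */
int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_ds {
    int id;
};

struct demo_mirror {
    struct demo_ds *mirror_ds;   /* may be NULL, valid, or an error pointer */
};

/* Returns the DS id, or -1 when there is no usable DS to dereference. */
int demo_mirror_ds_id(const struct demo_mirror *mirror)
{
    if (mirror->mirror_ds == NULL || is_err(mirror->mirror_ds))
        return -1;
    return mirror->mirror_ds->id;
}
```

A NULL test alone would not catch the failure case, because an error pointer is non-NULL; that is why the check has to come before the dereference.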
* NFS: Add missing encode / decode sequence_maxsz to v4.2 operations (Anna Schumaker, 2019-03-01, 1 file, -0/+10)
  These really should have been there from the beginning, but we never noticed because there was enough slack in the RPC request for the extra bytes. Chuck's recent patch to use au_cslack and au_rslack to compute buffer size shrunk the buffer enough that this was now a problem for SEEK operations on my test client.

  Fixes: f4ac1674f5da4 ("nfs: Add ALLOCATE support")
  Fixes: 2e72448b07dc3 ("NFS: Add COPY nfs operation")
  Fixes: cb95deea0b4aa ("NFS OFFLOAD_CANCEL xdr")
  Fixes: 624bd5b7b683c ("nfs: Add DEALLOCATE support")
  Fixes: 1c6dcbe5ceff8 ("NFS: Implement SEEK")
  Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
  Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFSv4.1: Don't process the sequence op more than once. (Trond Myklebust, 2019-03-01, 1 file, -1/+1)
  Ensure that if we call nfs41_sequence_process() a second time for the same rpc_task, then we only process the results once.

  Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFSv4.1: Reinitialise sequence results before retransmitting a request (Trond Myklebust, 2019-03-01, 1 file, -4/+8)
  If we have to retransmit a request, we should ensure that we reinitialise the sequence results structure, since in the event of a signal we need to treat the request as if it had not been sent.

  Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
  Cc: stable@vger.kernel.org
* Merge tag 'nfs-rdma-for-5.1-1' of git://git.linux-nfs.org/projects/anna/linux-nfs (Trond Myklebust, 2019-02-25, 9 files, -654/+406)
  NFSoRDMA client updates for 5.1

  New features:
  - Convert rpc auth layer to use xdr_streams
  - Config option to disable insecure enctypes
  - Reduce size of RPC receive buffers

  Bugfixes and cleanups:
  - Fix sparse warnings
  - Check inline size before providing a write chunk
  - Reduce the receive doorbell rate
  - Various tracepoint improvements

  [Trond: Fix up merge conflicts]
  Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * NFS: Account for XDR pad of buf->pages (Chuck Lever, 2019-02-14, 3 files, -15/+16)
    Certain NFS results (eg. READLINK) might expect a data payload that is not an exact multiple of 4 bytes. In this case, XDR encoding is required to pad that payload so its length on the wire is a multiple of 4 bytes. The constants that define the maximum size of each NFS result do not appear to account for this extra word.

    In each case where the data payload is to be received into pages:

    - 1 word is added to the size of the receive buffer allocated by call_allocate
    - rpc_inline_rcv_pages subtracts 1 word from @hdrsize so that the extra buffer space falls into the rcv_buf's tail iovec
    - If buf->pagelen is word-aligned, an XDR pad is not needed and is thus removed from the tail

    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
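The padding arithmetic behind this message is small enough to show directly. The sketch below computes how many pad bytes XDR adds after a payload; demo_xdr_pad_bytes() and demo_xdr_padded_len() are hypothetical names and are not the SUNRPC helpers the patch touches.

```c
#include <stddef.h>

/* XDR encodes everything in units of 4 bytes. */
#define DEMO_XDR_UNIT 4u

/* Number of zero bytes appended after a payload of len bytes. */
size_t demo_xdr_pad_bytes(size_t len)
{
    return (DEMO_XDR_UNIT - (len % DEMO_XDR_UNIT)) % DEMO_XDR_UNIT;
}

/* Payload length as it appears on the wire, rounded up to a 4-byte multiple. */
size_t demo_xdr_padded_len(size_t len)
{
    return len + demo_xdr_pad_bytes(len);
}
```

For example, a 5-byte READLINK target needs 3 pad bytes and occupies 8 bytes on the wire, while an 8-byte target needs none, which matches the third bullet above: a word-aligned page length requires no pad.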
| * SUNRPC: Introduce rpc_prepare_reply_pages() (Chuck Lever, 2019-02-14, 3 files, -73/+34)
    prepare_reply_buffer() and its NFSv4 equivalents expose the details of the RPC header and the auth slack values to upper layer consumers, creating a layering violation, and duplicating code.

    Remedy these issues by adding a new RPC client API that hides those details from upper layers in a common helper function.

    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * NFS: Add trace events to report non-zero NFS status codes (Chuck Lever, 2019-02-13, 6 files, -4/+133)
    These can help field troubleshooting without needing the overhead of a full network capture (ie, tcpdump).

    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>