summaryrefslogtreecommitdiffstats
path: root/fs/nfs
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | security/selinux: allow security_sb_clone_mnt_opts to enable/disable native ↵Scott Mayhew2017-06-091-1/+16
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | labeling behavior When an NFSv4 client performs a mount operation, it first mounts the NFSv4 root and then does path walk to the exported path and performs a submount on that, cloning the security mount options from the root's superblock to the submount's superblock in the process. Unless the NFS server has an explicit fsid=0 export with the "security_label" option, the NFSv4 root superblock will not have SBLABEL_MNT set, and neither will the submount superblock after cloning the security mount options. As a result, setxattr's of security labels over NFSv4.2 will fail. In a similar fashion, NFSv4.2 mounts mounted with the context= mount option will not show the correct labels because the nfs_server->caps flags of the cloned superblock will still have NFS_CAP_SECURITY_LABEL set. Allowing the NFSv4 client to enable or disable SECURITY_LSM_NATIVE_LABELS behavior will ensure that the SBLABEL_MNT flag has the correct value when the client traverses from an exported path without the "security_label" option to one with the "security_label" option and vice versa. Similarly, checking to see if SECURITY_LSM_NATIVE_LABELS is set upon return from security_sb_clone_mnt_opts() and clearing NFS_CAP_SECURITY_LABEL if necessary will allow the correct labels to be displayed for NFSv4.2 mounts mounted with the context= mount option. Resolves: https://github.com/SELinuxProject/selinux-kernel/issues/35 Signed-off-by: Scott Mayhew <smayhew@redhat.com> Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov> Tested-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>
* | | Merge branch 'sched-core-for-linus' of ↵Linus Torvalds2017-07-032-2/+3
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main changes in this cycle were: - Add the SYSTEM_SCHEDULING bootup state to move various scheduler debug checks earlier into the bootup. This turns silent and sporadically deadly bugs into nice, deterministic splats. Fix some of the splats that triggered. (Thomas Gleixner) - A round of restructuring and refactoring of the load-balancing and topology code (Peter Zijlstra) - Another round of consolidating ~20 of incremental scheduler code history: this time in terms of wait-queue nomenclature. (I didn't get much feedback on these renaming patches, and we can still easily change any names I might have misplaced, so if anyone hates a new name, please holler and I'll fix it.) (Ingo Molnar) - sched/numa improvements, fixes and updates (Rik van Riel) - Another round of x86/tsc scheduler clock code improvements, in hope of making it more robust (Peter Zijlstra) - Improve NOHZ behavior (Frederic Weisbecker) - Deadline scheduler improvements and fixes (Luca Abeni, Daniel Bristot de Oliveira) - Simplify and optimize the topology setup code (Lauro Ramos Venancio) - Debloat and decouple scheduler code some more (Nicolas Pitre) - Simplify code by making better use of llist primitives (Byungchul Park) - ... plus other fixes and improvements" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits) sched/cputime: Refactor the cputime_adjust() code sched/debug: Expose the number of RT/DL tasks that can migrate sched/numa: Hide numa_wake_affine() from UP build sched/fair: Remove effective_load() sched/numa: Implement NUMA node level wake_affine() sched/fair: Simplify wake_affine() for the single socket case sched/numa: Override part of migrate_degrades_locality() when idle balancing sched/rt: Move RT related code from sched/core.c to sched/rt.c sched/deadline: Move DL related code from sched/core.c to sched/deadline.c sched/cpuset: Only offer CONFIG_CPUSETS if SMP is enabled sched/fair: Spare idle load balancing on nohz_full CPUs nohz: Move idle balancer registration to the idle path sched/loadavg: Generalize "_idle" naming to "_nohz" sched/core: Drop the unused try_get_task_struct() helper function sched/fair: WARN() and refuse to set buddy when !se->on_rq sched/debug: Fix SCHED_WARN_ON() to return a value on !CONFIG_SCHED_DEBUG as well sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list naming sched/wait: Move bit_wait_table[] and related functionality from sched/core.c to sched/wait_bit.c sched/wait: Split out the wait_bit*() APIs from <linux/wait.h> into <linux/wait_bit.h> sched/wait: Re-adjust macro line continuation backslashes in <linux/wait.h> ...
| * | | sched/wait: Split out the wait_bit*() APIs from <linux/wait.h> into ↵Ingo Molnar2017-06-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | <linux/wait_bit.h> The wait_bit*() types and APIs are mixed into wait.h, but they are a pretty orthogonal extension of wait-queues. Furthermore, only about 50 kernel files use these APIs, while over 1000 use the regular wait-queue functionality. So clean up the main wait.h by moving the wait-bit functionality out of it, into a separate .h and .c file: include/linux/wait_bit.h for types and APIs kernel/sched/wait_bit.c for the implementation Update all header dependencies. This reduces the size of wait.h rather significantly, by about 30%. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | sched/wait: Rename wait_queue_t => wait_queue_entry_tIngo Molnar2017-06-201-2/+2
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rename: wait_queue_t => wait_queue_entry_t 'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue", but in reality it's a queue *entry*. The 'real' queue is the wait queue head, which had to carry the name. Start sorting this out by renaming it to 'wait_queue_entry_t'. This also allows the real structure name 'struct __wait_queue' to lose its double underscore and become 'struct wait_queue_entry', which is the more canonical nomenclature for such data types. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | Merge branch 'for-4.13/block' of git://git.kernel.dk/linux-blockLinus Torvalds2017-07-031-2/+2
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull core block/IO updates from Jens Axboe: "This is the main pull request for the block layer for 4.13. Not a huge round in terms of features, but there's a lot of churn related to some core cleanups. Note this depends on the UUID tree pull request, that Christoph already sent out. This pull request contains: - A series from Christoph, unifying the error/stats codes in the block layer. We now use blk_status_t everywhere, instead of using different schemes for different places. - Also from Christoph, some cleanups around request allocation and IO scheduler interactions in blk-mq. - And yet another series from Christoph, cleaning up how we handle and do bounce buffering in the block layer. - A blk-mq debugfs series from Bart, further improving on the support we have for exporting internal information to aid debugging IO hangs or stalls. - Also from Bart, a series that cleans up the request initialization differences across types of devices. - A series from Goldwyn Rodrigues, allowing the block layer to return failure if we will block and the user asked for non-blocking. - Patch from Hannes for supporting setting loop devices block size to that of the underlying device. - Two series of patches from Javier, fixing various issues with lightnvm, particular around pblk. - A series from me, adding support for write hints. This comes with NVMe support as well, so applications can help guide data placement on flash to improve performance, latencies, and write amplification. - A series from Ming, improving and hardening blk-mq support for stopping/starting and quiescing hardware queues. - Two pull requests for NVMe updates. Nothing major on the feature side, but lots of cleanups and bug fixes. From the usual crew. - A series from Neil Brown, greatly improving the bio rescue set support. Most notably, this kills the bio rescue work queues, if we don't really need them. - Lots of other little bug fixes that are all over the place" * 'for-4.13/block' of git://git.kernel.dk/linux-block: (217 commits) lightnvm: pblk: set line bitmap check under debug lightnvm: pblk: verify that cache read is still valid lightnvm: pblk: add initialization check lightnvm: pblk: remove target using async. I/Os lightnvm: pblk: use vmalloc for GC data buffer lightnvm: pblk: use right metadata buffer for recovery lightnvm: pblk: schedule if data is not ready lightnvm: pblk: remove unused return variable lightnvm: pblk: fix double-free on pblk init lightnvm: pblk: fix bad le64 assignations nvme: Makefile: remove dead build rule blk-mq: map all HWQ also in hyperthreaded system nvmet-rdma: register ib_client to not deadlock in device removal nvme_fc: fix error recovery on link down. nvmet_fc: fix crashes on bad opcodes nvme_fc: Fix crash when nvme controller connection fails. nvme_fc: replace ioabort msleep loop with completion nvme_fc: fix double calls to nvme_cleanup_cmd() nvme-fabrics: verify that a controller returns the correct NQN nvme: simplify nvme_dev_attrs_are_visible ...
| * | | Merge tag 'v4.12-rc5' into for-4.13/blockJens Axboe2017-06-128-15/+33
| |\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We've already got a few conflicts and upcoming work depends on some of the changes that have gone into mainline as regression fixes for this series. Pull in 4.12-rc5 to resolve these conflicts and make it easier on down stream trees to continue working on 4.13 changes. Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | block: switch bios to blk_status_tChristoph Hellwig2017-06-091-2/+2
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | Replace bi_error with a new bi_status to allow for a clear conversion. Note that device mapper overloaded bi_error with a private value, which we'll have to keep arround at least for now and thus propagate to a proper blk_status_t value. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
* | | NFSv4.1: nfs4_callback_free_slot() cannot call nfs4_slot_tbl_drain_complete()Trond Myklebust2017-06-271-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | The current code works only for the case where we have exactly one slot, which is no longer true. nfs4_free_slot() will automatically declare the callback channel to be drained when all slots have been returned. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* | | Revert "NFS: nfs_rename() handle -ERESTARTSYS dentry left behind"Benjamin Coddington2017-06-271-27/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 920b4530fb80430ff30ef83efe21ba1fa5623731 which could call d_move() without holding the directory's i_mutex, and reverts commit d4ea7e3c5c0e341c15b073016dbf3ab6c65f12f3 "NFS: Fix old dentry rehash after move", which was a follow-up fix. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Fixes: 920b4530fb80 ("NFS: nfs_rename() handle -ERESTARTSYS dentry left behind") Cc: stable@vger.kernel.org # v4.10+ Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* | | NFSv4.1: Fix a race in nfs4_proc_layoutgetTrond Myklebust2017-06-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the task calling layoutget is signalled, then it is possible for the calls to nfs4_sequence_free_slot() and nfs4_layoutget_prepare() to race, in which case we leak a slot. The fix is to move the call to nfs4_sequence_free_slot() into the nfs4_layoutget_release() so that it gets called at task teardown time. Fixes: 2e80dbe7ac51 ("NFSv4.1: Close callback races for OPEN, LAYOUTGET...") Cc: stable@vger.kernel.org # v4.8+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* | | NFS: Trunking detection should handle ERESTARTSYS/EINTRTrond Myklebust2017-06-271-0/+2
| | | | | | | | | | | | | | | | | | Currently, it will return EIO in those cases. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* | | NFSv4.2: Don't send mode again in post-EXCLUSIVE4_1 SETATTR with umaskBenjamin Coddington2017-06-051-1/+2
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | Now that we have umask support, we shouldn't re-send the mode in a SETATTR following an exclusive CREATE, or we risk having the same problem fixed in commit 5334c5bdac92 ("NFS: Send attributes in OPEN request for NFS4_CREATE_EXCLUSIVE4_1"), which is that files with S_ISGID will have that bit stripped away. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Fixes: dff25ddb4808 ("nfs: add support for the umask attribute") Cc: stable@vger.kernel.org # v4.10+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* | Merge tag 'nfs-for-4.12-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2017-06-047-14/+32
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client bugfixes from Trond Myklebust: "Bugfixes include: - Fix a typo in commit e092693443b ("NFS append COMMIT after synchronous COPY") that breaks copy offload - Fix the connect error propagation in xs_tcp_setup_socket() - Fix a lock leak in nfs40_walk_client_list - Verify that pNFS requests lie within the offset range of the layout segment" * tag 'nfs-for-4.12-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: nfs: Mark unnecessarily extern functions as static SUNRPC: ensure correct error is reported by xs_tcp_setup_socket() NFSv4.0: Fix a lock leak in nfs40_walk_client_list pnfs: Fix the check for requests in range of layout segment xprtrdma: Delete an error message for a failed memory allocation in xprt_rdma_bc_setup() pNFS/flexfiles: missing error code in ff_layout_alloc_lseg() NFS fix COMMIT after COPY
| * | nfs: Mark unnecessarily extern functions as staticJan Kara2017-06-032-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nfs_initialise_sb() and nfs_clone_super() are declared as extern even though they are used only in fs/nfs/super.c. Mark them as static. Also remove explicit 'inline' directive from nfs_initialise_sb() and leave it upto compiler to decide whether inlining is worth it. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFSv4.0: Fix a lock leak in nfs40_walk_client_listTrond Myklebust2017-05-241-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Xiaolong Ye's kernel test robot detected the following Oops: [ 299.158991] BUG: scheduling while atomic: mount.nfs/9387/0x00000002 [ 299.169587] 2 locks held by mount.nfs/9387: [ 299.176165] #0: (nfs_clid_init_mutex){......}, at: [<ffffffff8130cc92>] nfs4_discover_server_trunking+0x47/0x1fc [ 299.201802] #1: (&(&nn->nfs_client_lock)->rlock){......}, at: [<ffffffff813125fa>] nfs40_walk_client_list+0x2e9/0x338 [ 299.221979] CPU: 0 PID: 9387 Comm: mount.nfs Not tainted 4.11.0-rc7-00021-g14d1bbb #45 [ 299.235584] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014 [ 299.251176] Call Trace: [ 299.255192] dump_stack+0x61/0x7e [ 299.260416] __schedule_bug+0x65/0x74 [ 299.266208] __schedule+0x5d/0x87c [ 299.271883] schedule+0x89/0x9a [ 299.276937] schedule_timeout+0x232/0x289 [ 299.283223] ? detach_if_pending+0x10b/0x10b [ 299.289935] schedule_timeout_uninterruptible+0x2a/0x2c [ 299.298266] ? put_rpccred+0x3e/0x115 [ 299.304327] ? schedule_timeout_uninterruptible+0x2a/0x2c [ 299.312851] msleep+0x1e/0x22 [ 299.317612] nfs4_discover_server_trunking+0x102/0x1fc [ 299.325644] nfs4_init_client+0x13f/0x194 It looks as if we recently added a spin_lock() leak to nfs40_walk_client_list() when cleaning up the code. Reported-by: kernel test robot <xiaolong.ye@intel.com> Fixes: 14d1bbb0ca42 ("NFS: Create a common nfs4_match_client() function") Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | pnfs: Fix the check for requests in range of layout segmentBenjamin Coddington2017-05-242-8/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's possible and acceptable for NFS to attempt to add requests beyond the range of the current pgio->pg_lseg, a case which should be caught and limited by the pg_test operation. However, the current handling of this case replaces pgio->pg_lseg with a new layout segment (after a WARN) within that pg_test operation. That will cause all the previously added requests to be submitted with this new layout segment, which may not be valid for those requests. Fix this problem by only returning zero for the number of bytes to coalesce from pg_test for this case which allows any previously added requests to complete on the current layout segment. The check for requests starting out of range of the layout segment moves to pg_init, so that the replacement of pgio->pg_lseg will be done when the next request is added. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | pNFS/flexfiles: missing error code in ff_layout_alloc_lseg()Dan Carpenter2017-05-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If xdr_inline_decode() fails then we end up returning ERR_PTR(0). The caller treats NULL returns as -ENOMEM so it doesn't really hurt runtime, but obviously we intended to set an error code here. Fixes: d67ae825a59d ("pnfs/flexfiles: Add the FlexFile Layout Driver") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFS fix COMMIT after COPYOlga Kornievskaia2017-05-241-1/+1
| |/ | | | | | | | | | | | | | | | | | | | | Fix a typo in the commit e092693443b995c8e3a565a73b5fdb05f1260f9b "NFS append COMMIT after synchronous COPY" Reported-by: Eryu Guan <eguan@redhat.com> Fixes: e092693443b ("NFS append COMMIT after synchronous COPY") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Tested-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* / NFS: Use ERR_CAST() to avoid cross-structure castKees Cook2017-05-281-1/+1
|/ | | | | | | | | | | | | | | When the call to nfs_devname() fails, the error path attempts to retain the error via the mnt variable, but this requires a cast across very different types (char * to struct vfsmount *), which the upcoming structure layout randomization plugin flags as being potentially dangerous in the face of randomization. This is a false positive, but what this code actually wants to do is retain the error value, so this patch explicitly sets it, instead of using what seems to be an unexpected cast. Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
* Merge tag 'nfsd-4.12' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2017-05-101-9/+17
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd updates from Bruce Fields: "Another RDMA update from Chuck Lever, and a bunch of miscellaneous bugfixes" * tag 'nfsd-4.12' of git://linux-nfs.org/~bfields/linux: (26 commits) nfsd: Fix up the "supattr_exclcreat" attributes nfsd: encoders mustn't use unitialized values in error cases nfsd: fix undefined behavior in nfsd4_layout_verify lockd: fix lockd shutdown race NFSv4: Fix callback server shutdown SUNRPC: Refactor svc_set_num_threads() NFSv4.x/callback: Create the callback service through svc_create_pooled lockd: remove redundant check on block svcrdma: Clean out old XDR encoders svcrdma: Remove the req_map cache svcrdma: Remove unused RDMA Write completion handler svcrdma: Reduce size of sge array in struct svc_rdma_op_ctxt svcrdma: Clean up RPC-over-RDMA backchannel reply processing svcrdma: Report Write/Reply chunk overruns svcrdma: Clean up RDMA_ERROR path svcrdma: Use rdma_rw API in RPC reply path svcrdma: Introduce local rdma_rw API helpers svcrdma: Clean up svc_rdma_get_inv_rkey() svcrdma: Add helper to save pages under I/O svcrdma: Eliminate RPCRDMA_SQ_DEPTH_MULT ...
| * NFSv4: Fix callback server shutdownTrond Myklebust2017-04-271-8/+16
| | | | | | | | | | | | | | | | | | | | | | We want to use kthread_stop() in order to ensure the threads are shut down before we tear down the nfs_callback_info in nfs_callback_down. Tested-and-reviewed-by: Kinglong Mee <kinglongmee@gmail.com> Reported-by: Kinglong Mee <kinglongmee@gmail.com> Fixes: bb6aeba736ba9 ("NFSv4.x: Switch to using svc_set_num_threads()...") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * NFSv4.x/callback: Create the callback service through svc_create_pooledKinglong Mee2017-04-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As the comments for svc_set_num_threads() said, " Destroying threads relies on the service threads filling in rqstp->rq_task, which only the nfs ones do. Assumes the serv has been created using svc_create_pooled()." If creating service through svc_create(), the svc_pool_map_put() will be called in svc_destroy(), but the pool map isn't used. So that, the reference of pool map will be drop, the next using of pool map will get a zero npools. [ 137.992130] divide error: 0000 [#1] SMP [ 137.992148] Modules linked in: nfsd(E) nfsv4 nfs fscache fuse tun bridge stp llc ip_set nfnetlink vmw_vsock_vmci_transport vsock snd_seq_midi snd_seq_midi_event vmw_balloon coretemp crct10dif_pclmul crc32_pclmul ppdev ghash_clmulni_intel intel_rapl_perf joydev snd_ens1371 gameport snd_ac97_codec ac97_bus snd_seq snd_pcm snd_rawmidi snd_timer snd_seq_device snd soundcore parport_pc parport nfit acpi_cpufreq tpm_tis tpm_tis_core tpm vmw_vmci i2c_piix4 shpchp auth_rpcgss nfs_acl lockd(E) grace sunrpc(E) xfs libcrc32c vmwgfx drm_kms_helper ttm crc32c_intel drm e1000 mptspi scsi_transport_spi serio_raw mptscsih mptbase ata_generic pata_acpi [last unloaded: nfsd] [ 137.992336] CPU: 0 PID: 4514 Comm: rpc.nfsd Tainted: G E 4.11.0-rc8+ #536 [ 137.992777] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015 [ 137.993757] task: ffff955984101d00 task.stack: ffff9873c2604000 [ 137.994231] RIP: 0010:svc_pool_for_cpu+0x2b/0x80 [sunrpc] [ 137.994768] RSP: 0018:ffff9873c2607c18 EFLAGS: 00010246 [ 137.995227] RAX: 0000000000000000 RBX: ffff95598376f000 RCX: 0000000000000002 [ 137.995673] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9559944aec00 [ 137.996156] RBP: ffff9873c2607c18 R08: ffff9559944aec28 R09: 0000000000000000 [ 137.996609] R10: 0000000001080002 R11: 0000000000000000 R12: ffff95598376f010 [ 137.997063] R13: ffff95598376f018 R14: ffff9559944aec28 R15: ffff9559944aec00 [ 137.997584] FS: 00007f755529eb40(0000) GS:ffff9559bb600000(0000) knlGS:0000000000000000 [ 137.998048] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 137.998548] CR2: 000055f3aecd9660 CR3: 0000000084290000 CR4: 00000000001406f0 [ 137.999052] Call Trace: [ 137.999517] svc_xprt_do_enqueue+0xef/0x260 [sunrpc] [ 138.000028] svc_xprt_received+0x47/0x90 [sunrpc] [ 138.000487] svc_add_new_perm_xprt+0x76/0x90 [sunrpc] [ 138.000981] svc_addsock+0x14b/0x200 [sunrpc] [ 138.001424] ? recalc_sigpending+0x1b/0x50 [ 138.001860] ? __getnstimeofday64+0x41/0xd0 [ 138.002346] ? do_gettimeofday+0x29/0x90 [ 138.002779] write_ports+0x255/0x2c0 [nfsd] [ 138.003202] ? _copy_from_user+0x4e/0x80 [ 138.003676] ? write_recoverydir+0x100/0x100 [nfsd] [ 138.004098] nfsctl_transaction_write+0x48/0x80 [nfsd] [ 138.004544] __vfs_write+0x37/0x160 [ 138.004982] ? selinux_file_permission+0xd7/0x110 [ 138.005401] ? security_file_permission+0x3b/0xc0 [ 138.005865] vfs_write+0xb5/0x1a0 [ 138.006267] SyS_write+0x55/0xc0 [ 138.006654] entry_SYSCALL_64_fastpath+0x1a/0xa9 [ 138.007071] RIP: 0033:0x7f7554b9dc30 [ 138.007437] RSP: 002b:00007ffc9f92c788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 138.007807] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f7554b9dc30 [ 138.008168] RDX: 0000000000000002 RSI: 00005640cd536640 RDI: 0000000000000003 [ 138.008573] RBP: 00007ffc9f92c780 R08: 0000000000000001 R09: 0000000000000002 [ 138.008918] R10: 0000000000000064 R11: 0000000000000246 R12: 0000000000000004 [ 138.009254] R13: 00005640cdbf77a0 R14: 00005640cdbf7720 R15: 00007ffc9f92c238 [ 138.009610] Code: 0f 1f 44 00 00 48 8b 87 98 00 00 00 55 48 89 e5 48 83 78 08 00 74 10 8b 05 07 42 02 00 83 f8 01 74 40 83 f8 02 74 19 31 c0 31 d2 <f7> b7 88 00 00 00 5d 89 d0 48 c1 e0 07 48 03 87 90 00 00 00 c3 [ 138.010664] RIP: svc_pool_for_cpu+0x2b/0x80 [sunrpc] RSP: ffff9873c2607c18 [ 138.011061] ---[ end trace b3468224cafa7d11 ]--- Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | Merge tag 'nfs-for-4.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2017-05-1035-2761/+590
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client updates from Trond Myklebust: "Highlights include: Stable bugfixes: - Fix use after free in write error path - Use GFP_NOIO for two allocations in writeback - Fix a hang in OPEN related to server reboot - Check the result of nfs4_pnfs_ds_connect - Fix an rcu lock leak Features: - Removal of the unmaintained and unused OSD pNFS layout - Cleanup and removal of lots of unnecessary dprintk()s - Cleanup and removal of some memory failure paths now that GFP_NOFS is guaranteed to never fail. - Remove the v3-only data server limitation on pNFS/flexfiles Bugfixes: - RPC/RDMA connection handling bugfixes - Copy offload: fixes to ensure the copied data is COMMITed to disk. - Readdir: switch back to using the ->iterate VFS interface - File locking fixes from Ben Coddington - Various use-after-free and deadlock issues in pNFS - Write path bugfixes" * tag 'nfs-for-4.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (89 commits) pNFS/flexfiles: Always attempt to call layoutstats when flexfiles is enabled NFSv4.1: Work around a Linux server bug... NFS append COMMIT after synchronous COPY NFSv4: Fix exclusive create attributes encoding NFSv4: Fix an rcu lock leak nfs: use kmap/kunmap directly NFS: always treat the invocation of nfs_getattr as cache hit when noac is on Fix nfs_client refcounting if kmalloc fails in nfs4_proc_exchange_id and nfs4_proc_async_renew NFSv4.1: RECLAIM_COMPLETE must handle NFS4ERR_CONN_NOT_BOUND_TO_SESSION pNFS: Fix NULL dereference in pnfs_generic_alloc_ds_commits pNFS: Fix a typo in pnfs_generic_alloc_ds_commits pNFS: Fix a deadlock when coalescing writes and returning the layout pNFS: Don't clear the layout return info if there are segments to return pNFS: Ensure we commit the layout if it has been invalidated pNFS: Don't send COMMITs to the DSes if the server invalidated our layout pNFS/flexfiles: Fix up the ff_layout_write_pagelist failure path pNFS: Ensure we check layout validity before marking it for return NFS4.1 handle interrupted slot reuse from ERR_DELAY NFSv4: check return value of xdr_inline_decode nfs/filelayout: fix NULL pointer dereference in fl_pnfs_update_layout() ...
| * | pNFS/flexfiles: Always attempt to call layoutstats when flexfiles is enabledTrond Myklebust2017-05-091-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | Layoutstats is always desirable when using the flexfiles driver, so we should enable it if that driver is being loaded. It is safe to do so, because even when the mount specifies NFSv4.1, we will turn it off if the server tells us it is unsupported. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFSv4.1: Work around a Linux server bug...Trond Myklebust2017-05-091-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It turns out the Linux server has a bug in its implementation of supattr_exclcreat; it returns the set of all attributes, whether or not they are supported by minor version 1. In order to avoid a regression, we therefore apply the supported_attrs as a mask on top of whatever the server sent us. Reported-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFS append COMMIT after synchronous COPYOlga Kornievskaia2017-05-084-39/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of messing with the commit path which has been causing issues, add a COMMIT op after the COPY and ask for stable copies in the first space. It saves a round trip, since after the COPY, the client sends a COMMIT anyway. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFSv4: Fix exclusive create attributes encodingTrond Myklebust2017-05-081-40/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When using NFS4_CREATE_EXCLUSIVE4_1 mode, the client will overestimate the amount of space that it needs for the attributes because it does so before checking whether or not the server supports a given attribute. Fix by checking the attribute mask earlier. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFSv4: Fix an rcu lock leakTrond Myklebust2017-05-081-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The intention in the original patch was to release the lock when we put the inode, however something got screwed up. Reported-by: Jason Yan <yanaijie@huawei.com> Fixes: 7b410d9ce460f ("pNFS: Delay getting the layout header in..") Cc: stable@vger.kernel.org # v4.10+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | nfs: use kmap/kunmap directlyFabian Frederick2017-05-051-55/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes useless nfs_readdir_get_array() and nfs_readdir_release_array() as suggested by Trond Myklebust nfs_readdir() calls nfs_revalidate_mapping() before readdir_search_pagecache() , nfs_do_filldir(), uncached_readdir() so mapping should be correct. While kmap() can't fail, all subsequent error checks were removed as well as unused labels. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFS: always treat the invocation of nfs_getattr as cache hit when noac is onHou Tao2017-05-051-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When using 'ls -l' to display a large directory, if noac option is used, in function nfs_getattr() nfs_need_revalidate_inode() will always be true for NFSv3 and the nfs_entry cache of the directory will be flushed. The flush will lead to a fully reread of the directory entries from server. To prevent the unnecessary RPCs, we need to check whether or not the noac option is used, and always report the invocation of nfs_getattr() as cache hit instead cache miss when it's on. Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | Fix nfs_client refcounting if kmalloc fails in nfs4_proc_exchange_id and ↵Dave Wysochanski2017-05-051-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nfs4_proc_async_renew If memory allocation fails for the callback data, we need to put the nfs_client or we end up with an elevated refcount. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFSv4.1: RECLAIM_COMPLETE must handle NFS4ERR_CONN_NOT_BOUND_TO_SESSIONTrond Myklebust2017-05-052-4/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the server returns NFS4ERR_CONN_NOT_BOUND_TO_SESSION because we are trunking, then RECLAIM_COMPLETE must handle that by calling nfs4_schedule_session_recovery() and then retrying. Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
| * | pNFS: Fix NULL dereference in pnfs_generic_alloc_ds_commitsFred Isaman2017-05-031-1/+1
| | | | | | | | | | | | | | | Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | pNFS: Fix a typo in pnfs_generic_alloc_ds_commitsTrond Myklebust2017-05-021-1/+1
| | | | | | | | | | | | | | | | | | | | | If the layout segment is invalid, we want to just resend the remaining writes. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | pNFS: Fix a deadlock when coalescing writes and returning the layoutTrond Myklebust2017-05-021-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Consider the following deadlock: Process P1 Process P2 Process P3 ========== ========== ========== lock_page(page) lseg = pnfs_update_layout(inode) lo = NFS_I(inode)->layout pnfs_error_mark_layout_for_return(lo) lock_page(page) lseg = pnfs_update_layout(inode) In this scenario, - P1 has declared the layout to be in error, but P2 holds a reference to a layout segment on that inode, so the layoutreturn is deferred. - P2 is waiting for a page lock held by P3. - P3 is asking for a new layout segment, but is blocked waiting for the layoutreturn. The fix is to ensure that pnfs_error_mark_layout_for_return() does not set the NFS_LAYOUT_RETURN flag, which blocks P3. Instead, we allow the latter to call LAYOUTGET so that it can make progress and unblock P2. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | pNFS: Don't clear the layout return info if there are segments to returnTrond Myklebust2017-05-021-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | In pnfs_clear_layoutreturn_info, ensure that we don't clear the layout return info if there are new segments queued for return due to, for instance, a race between a LAYOUTRETURN and a failed I/O attempt. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | pNFS: Ensure we commit the layout if it has been invalidatedTrond Myklebust2017-04-293-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | If the layout is being invalidated on the server, then we must invoke nfs_commit_inode() to ensure any commits to the DS get cleared out. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | pNFS: Don't send COMMITs to the DSes if the server invalidated our layoutTrond Myklebust2017-04-291-0/+7
| | | | | | | | | | | | | | | | | | | | | If the layout was invalidated, then assume we should requeue all the pending writes for the DS in question. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | pNFS/flexfiles: Fix up the ff_layout_write_pagelist failure pathTrond Myklebust2017-04-292-4/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the attempt to write through pNFS fails, we need to use the same failure semantics as for the read path: If the FF_FLAGS_NO_IO_THRU_MDS flag is set or we have sufficient valid DSes, then we must retry through pNFS Fixes: d67ae825a59d ("pnfs/flexfiles: Add the FlexFile Layout Driver") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | pNFS: Ensure we check layout validity before marking it for returnTrond Myklebust2017-04-281-0/+4
| | | | | | | | | | | | | | | | | | | | | pnfs_error_mark_layout_for_return needs to check that the layout is valid before calling pnfs_set_plh_return_info(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFS4.1 handle interrupted slot reuse from ERR_DELAYOlga Kornievskaia2017-04-281-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | If the RPC slot was interrupted and server replied to the next operation on the "reused" slot with ERR_DELAY, don't clear out the "interrupted" flag until we properly recover. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFSv4: check return value of xdr_inline_decodePan Bian2017-04-281-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Function xdr_inline_decode() will return a NULL pointer if the input buffer does not have long enough buffer to decode nbytes of data. However, in function decode_op_map(), the return value of xdr_inline_decode() is not validated before it is used. This patch adds a check to the return value of xdr_inline_decode(). Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | nfs/filelayout: fix NULL pointer dereference in fl_pnfs_update_layout()Artem Savkov2017-04-281-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Calling pnfs_put_lset on an IS_ERR pointer results in a NULL pointer dereference like the one below. At the same time the check of retvalue of filelayout_check_deviceid() sets lseg to error, but does not free it before that. [ 3000.636161] BUG: unable to handle kernel NULL pointer dereference at 000000000000003c [ 3000.636970] IP: pnfs_put_lseg+0x29/0x100 [nfsv4] [ 3000.637420] PGD 4f23b067 [ 3000.637421] PUD 4a0f4067 [ 3000.637679] PMD 0 [ 3000.637937] [ 3000.638287] Oops: 0000 [#1] SMP [ 3000.638591] Modules linked in: nfs_layout_nfsv41_files nfsv3 nfnetlink_queue nfnetlink_log nfnetlink bluetooth rfkill rpcsec_gss_krb5 nfsv4 nfs fscache binfmt_misc arc4 md4 nls_utf8 cifs ccm dns_resolver rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr virtio_balloon ppdev virtio_rng parport_pc i2c_piix4 parport acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c ata_generic pata_acpi virtio_blk virtio_net cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops crc32c_intel ata_piix ttm libata drm serio_raw [ 3000.645245] i2c_core virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: xt_u32] [ 3000.646360] CPU: 1 PID: 26402 Comm: date Not tainted 4.11.0-rc7.1.el7.test.x86_64 #1 [ 3000.647092] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 3000.647638] task: ffff8800415ada00 task.stack: ffffc90000ff0000 [ 3000.648207] RIP: 0010:pnfs_put_lseg+0x29/0x100 [nfsv4] [ 3000.648696] RSP: 0018:ffffc90000ff39b8 EFLAGS: 00010246 [ 3000.649193] RAX: 0000000000000000 RBX: fffffffffffffff4 RCX: 00000000000d43be [ 3000.649859] RDX: 00000000000d43bd RSI: 0000000000000000 RDI: fffffffffffffff4 [ 3000.650530] RBP: ffffc90000ff39d8 R08: 000000000001e320 R09: ffffffffa05c35ce [ 3000.651203] R10: ffff88007fd1e320 R11: ffffea0001283d80 R12: 0000000001400040 [ 3000.651875] R13: ffff88004f77d9f0 R14: ffffc90000ff3cd8 R15: ffff8800417ade00 [ 3000.652546] FS: 00007fac4d5cd740(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000 [ 3000.653304] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3000.653849] CR2: 000000000000003c CR3: 000000004f080000 CR4: 00000000000406e0 [ 3000.654527] Call Trace: [ 3000.654771] fl_pnfs_update_layout.constprop.20+0x10c/0x150 [nfs_layout_nfsv41_files] [ 3000.655505] filelayout_pg_init_write+0x21d/0x270 [nfs_layout_nfsv41_files] [ 3000.656195] __nfs_pageio_add_request+0x11c/0x490 [nfs] [ 3000.656698] nfs_pageio_add_request+0xac/0x260 [nfs] [ 3000.657180] nfs_do_writepage+0x109/0x2e0 [nfs] [ 3000.657616] nfs_writepages_callback+0x16/0x30 [nfs] [ 3000.658096] write_cache_pages+0x26f/0x510 [ 3000.658495] ? nfs_do_writepage+0x2e0/0x2e0 [nfs] [ 3000.658946] ? _raw_spin_unlock_bh+0x1e/0x20 [ 3000.659357] ? wb_wakeup_delayed+0x5f/0x70 [ 3000.659748] ? __mark_inode_dirty+0x2eb/0x360 [ 3000.660170] nfs_writepages+0x84/0xd0 [nfs] [ 3000.660575] ? nfs_updatepage+0x571/0xb70 [nfs] [ 3000.661012] do_writepages+0x1e/0x30 [ 3000.661358] __filemap_fdatawrite_range+0xc6/0x100 [ 3000.661819] filemap_write_and_wait_range+0x41/0x90 [ 3000.662292] nfs_file_fsync+0x34/0x1f0 [nfs] [ 3000.662704] vfs_fsync_range+0x3d/0xb0 [ 3000.663065] vfs_fsync+0x1c/0x20 [ 3000.663385] nfs4_file_flush+0x57/0x80 [nfsv4] [ 3000.663813] filp_close+0x2f/0x70 [ 3000.664132] __close_fd+0x9a/0xc0 [ 3000.664453] SyS_close+0x23/0x50 [ 3000.664785] do_syscall_64+0x67/0x180 [ 3000.665162] entry_SYSCALL64_slow_path+0x25/0x25 [ 3000.665600] RIP: 0033:0x7fac4d0e1e90 [ 3000.665946] RSP: 002b:00007ffd54e90c88 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 [ 3000.666679] RAX: ffffffffffffffda RBX: 00007fac4d3b5400 RCX: 00007fac4d0e1e90 [ 3000.667349] RDX: 0000000000000000 RSI: 00007fac4d5d9000 RDI: 0000000000000001 [ 3000.668031] RBP: 0000000000000000 R08: 00007fac4d3b6a00 R09: 00007fac4d5cd740 [ 3000.668709] R10: 00007ffd54e909e0 R11: 0000000000000246 R12: 0000000000000000 [ 3000.669385] R13: 00007fac4d3b5e80 R14: 0000000000000000 R15: 0000000000000000 [ 3000.670061] Code: 00 00 66 66 66 66 90 55 48 85 ff 48 89 e5 41 56 41 55 41 54 53 48 89 fb 0f 84 97 00 00 00 f6 05 16 8f bc ff 10 0f 85 a6 00 00 00 <4c> 8b 63 48 48 8d 7b 38 49 8b 84 24 90 00 00 00 4c 8d a8 88 00 [ 3000.671831] RIP: pnfs_put_lseg+0x29/0x100 [nfsv4] RSP: ffffc90000ff39b8 [ 3000.672462] CR2: 000000000000003c Signed-off-by: Artem Savkov <asavkov@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFSv4: Don't special case "launder"Trond Myklebust2017-04-262-17/+12
| | | | | | | | | | | | | | | | | | | | | If the client receives a fatal server error from nfs_pageio_add_request(), then we should always truncate the page on which the error occurred. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFS: Add a few more fatal I/O errors to nfs_error_is_fatal()Trond Myklebust2017-04-261-0/+4
| | | | | | | | | | | | | | | | | | | | | EACCES, EDQUOT, EFBIG and ESTALE are all fatal errors as far as NFS I/O is concerned. They need to be reported back to the application. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFSv3: nfs3_nlm_alloc_call should be declared staticTrond Myklebust2017-04-251-3/+3
| | | | | | | | | | | | | | | | | | Fix compiler warnings. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFS: Don't write back further requests if there is a pending write errorTrond Myklebust2017-04-251-4/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | If the server has already returned a fatal write error that the user has not yet received on this file, then don't write back the other pages. Instead, act as if they have been sent, and have returned with the same error. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | pNFS: Fix use after free issues in pnfs_do_read()Trond Myklebust2017-04-251-3/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | The assumption should be that if the caller returns PNFS_ATTEMPTED, then hdr has been consumed, and so we should not be testing hdr->task.tk_status. If the caller returns PNFS_TRY_AGAIN, then we need to recoalesce and free hdr. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | pNFS: Ensure we check layout segment validity in the pg_init() callbackTrond Myklebust2017-04-254-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | If we have a layout segment cached in pgio->pg_lseg, we should check it for validity before reusing it in a new RPC request. Otherwise, if we recoalesce, we can end up looping forever. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFS: Always wait for I/O completion before unlockBenjamin Coddington2017-04-213-6/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NFS attempts to wait for read and write completion before unlocking in order to ensure that the data returned was protected by the lock. When this waiting is interrupted by a signal, the unlock may be skipped, and messages similar to the following are seen in the kernel ring buffer: [20.167876] Leaked locks on dev=0x0:0x2b ino=0x8dd4c3: [20.168286] POSIX: fl_owner=ffff880078b06940 fl_flags=0x1 fl_type=0x0 fl_pid=20183 [20.168727] POSIX: fl_owner=ffff880078b06680 fl_flags=0x1 fl_type=0x0 fl_pid=20185 For NFSv3, the missing unlock will cause the server to refuse conflicting locks indefinitely. For NFSv4, the leftover lock will be removed by the server after the lease timeout. This patch fixes this issue by skipping the usual wait in nfs_iocounter_wait if the FL_CLOSE flag is set when signaled. Instead, the wait happens in the unlock RPC task on the NFS UOC rpc_waitqueue. For NFSv3, use lockd's new nlmclnt_operations along with nfs_async_iocounter_wait to defer NLM's unlock task until the lock context's iocounter reaches zero. For NFSv4, call nfs_async_iocounter_wait() directly from unlock's current rpc_call_prepare. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
OpenPOWER on IntegriCloud