From a6bbce54efa9145dbcf3029c885549f7ebc40a3b Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Wed, 29 Oct 2014 08:22:18 +1100 Subject: xfs: bulkstat doesn't release AGI buffer on error The recent refactoring of the bulkstat code left a small landmine in the code. If a inobt read fails, then the tree walk is aborted and returns without releasing the AGI buffer or freeing the cursor. This can lead to a subsequent bulkstat call hanging trying to grab the AGI buffer again. cc: Signed-off-by: Dave Chinner Reviewed-by: Brian Foster Reviewed-by: Eric Sandeen Signed-off-by: Dave Chinner --- fs/xfs/xfs_itable.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) (limited to 'fs') diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c index f1deb961a296..ef8ea0589780 100644 --- a/fs/xfs/xfs_itable.c +++ b/fs/xfs/xfs_itable.c @@ -427,7 +427,7 @@ xfs_bulkstat( error = xfs_bulkstat_grab_ichunk(cur, agino, &icount, &r); if (error) - break; + goto del_cursor; if (icount) { irbp->ir_startino = r.ir_startino; irbp->ir_freecount = r.ir_freecount; @@ -442,7 +442,7 @@ xfs_bulkstat( error = xfs_inobt_lookup(cur, 0, XFS_LOOKUP_GE, &tmp); } if (error) - break; + goto del_cursor; /* * Loop through inode btree records in this ag, @@ -454,7 +454,7 @@ xfs_bulkstat( error = xfs_inobt_get_rec(cur, &r, &i); if (error || i == 0) { end_of_ag = 1; - break; + goto del_cursor; } /* @@ -476,13 +476,17 @@ xfs_bulkstat( error = xfs_btree_increment(cur, 0, &tmp); cond_resched(); } + /* - * Drop the btree buffers and the agi buffer. - * We can't hold any of the locks these represent - * when calling iget. + * Drop the btree buffers and the agi buffer as we can't hold any + * of the locks these represent when calling iget. If there is a + * pending error, then we are done. */ +del_cursor: xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR); xfs_buf_relse(agbp); + if (error) + break; /* * Now format all the good inodes into the user's buffer. */ -- cgit v1.2.1 From 7a19dee116c8fae7ba7a778043c245194289f5a2 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Thu, 30 Oct 2014 10:34:52 +1100 Subject: xfs: Check error during inode btree iteration in xfs_bulkstat() xfs_bulkstat() doesn't check error return from xfs_btree_increment(). In case of specific fs corruption that could result in xfs_bulkstat() entering an infinite loop because we would be looping over the same chunk over and over again. Fix the problem by checking the return value and terminating the loop properly. Coverity-id: 1231338 cc: Signed-off-by: Jan Kara Reviewed-by: Jie Liu Signed-off-by: Dave Chinner --- fs/xfs/xfs_itable.c | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'fs') diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c index ef8ea0589780..7765ff743e91 100644 --- a/fs/xfs/xfs_itable.c +++ b/fs/xfs/xfs_itable.c @@ -474,6 +474,10 @@ xfs_bulkstat( */ agino = r.ir_startino + XFS_INODES_PER_CHUNK; error = xfs_btree_increment(cur, 0, &tmp); + if (error) { + end_of_ag = 1; + goto del_cursor; + } cond_resched(); } -- cgit v1.2.1 From 5d11fb4b9a1d90983452c029b5e1377af78fda49 Mon Sep 17 00:00:00 2001 From: Brian Foster Date: Thu, 30 Oct 2014 10:35:11 +1100 Subject: xfs: rework zero range to prevent invalid i_size updates The zero range operation is analogous to fallocate with the exception of converting the range to zeroes. E.g., it attempts to allocate zeroed blocks over the range specified by the caller. 
The XFS implementation kills all delalloc blocks currently over the aligned range, converts the range to allocated zero blocks (unwritten extents) and handles the partial pages at the ends of the range by sending writes through the pagecache. The current implementation suffers from several problems associated with inode size. If the aligned range covers an extending I/O, said I/O is discarded and an inode size update from a previous write never makes it to disk. Further, if an unaligned zero range extends beyond eof, the page write induced for the partial end page can itself increase the inode size, even if the zero range request is not supposed to update i_size (via KEEP_SIZE, similar to an fallocate beyond EOF). The latter behavior not only incorrectly increases the inode size, but can lead to stray delalloc blocks on the inode. Typically, post-eof preallocation blocks are either truncated on release or inode eviction or explicitly written to by xfs_zero_eof() on natural file size extension. If the inode size increases due to zero range, however, associated blocks leak into the address space having never been converted or mapped to pagecache pages. A direct I/O to such an uncovered range cannot convert the extent via writeback and will BUG(). For example: $ xfs_io -fc "pwrite 0 128k" -c "fzero -k 1m 54321" ... $ xfs_io -d -c "pread 128k 128k" If the entire delalloc extent happens to not have page coverage whatsoever (e.g., delalloc conversion couldn't find a large enough free space extent), even a full file writeback won't convert what's left of the extent and we'll assert on inode eviction. Rework xfs_zero_file_space() to avoid buffered I/O for partial pages. Use the existing hole punch and prealloc mechanisms as primitives for zero range. This implementation is not efficient nor ideal as we writeback dirty data over the range and remove existing extents rather than convert to unwrittern. The former writeback, however, is currently the only mechanism available to ensure consistency between pagecache and extent state. Even a pagecache truncate/delalloc punch prior to hole punch has lead to inconsistencies due to racing with writeback. This provides a consistent, correct implementation of zero range that survives fsstress/fsx testing without assert failures. The implementation can be optimized from this point forward once the fundamental issue of pagecache and delalloc extent state consistency is addressed. Signed-off-by: Brian Foster Reviewed-by: Dave Chinner Signed-off-by: Dave Chinner --- fs/xfs/xfs_bmap_util.c | 72 ++++++++++++++------------------------------------ 1 file changed, 20 insertions(+), 52 deletions(-) (limited to 'fs') diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 92e8f99a5857..281002689d64 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -1338,7 +1338,10 @@ xfs_free_file_space( goto out; } - +/* + * Preallocate and zero a range of a file. This mechanism has the allocation + * semantics of fallocate and in addition converts data in the range to zeroes. + */ int xfs_zero_file_space( struct xfs_inode *ip, @@ -1346,65 +1349,30 @@ xfs_zero_file_space( xfs_off_t len) { struct xfs_mount *mp = ip->i_mount; - uint granularity; - xfs_off_t start_boundary; - xfs_off_t end_boundary; + uint blksize; int error; trace_xfs_zero_file_space(ip); - granularity = max_t(uint, 1 << mp->m_sb.sb_blocklog, PAGE_CACHE_SIZE); + blksize = 1 << mp->m_sb.sb_blocklog; /* - * Round the range of extents we are going to convert inwards. 
If the - * offset is aligned, then it doesn't get changed so we zero from the - * start of the block offset points to. + * Punch a hole and prealloc the range. We use hole punch rather than + * unwritten extent conversion for two reasons: + * + * 1.) Hole punch handles partial block zeroing for us. + * + * 2.) If prealloc returns ENOSPC, the file range is still zero-valued + * by virtue of the hole punch. */ - start_boundary = round_up(offset, granularity); - end_boundary = round_down(offset + len, granularity); - - ASSERT(start_boundary >= offset); - ASSERT(end_boundary <= offset + len); - - if (start_boundary < end_boundary - 1) { - /* - * Writeback the range to ensure any inode size updates due to - * appending writes make it to disk (otherwise we could just - * punch out the delalloc blocks). - */ - error = filemap_write_and_wait_range(VFS_I(ip)->i_mapping, - start_boundary, end_boundary - 1); - if (error) - goto out; - truncate_pagecache_range(VFS_I(ip), start_boundary, - end_boundary - 1); - - /* convert the blocks */ - error = xfs_alloc_file_space(ip, start_boundary, - end_boundary - start_boundary - 1, - XFS_BMAPI_PREALLOC | XFS_BMAPI_CONVERT); - if (error) - goto out; - - /* We've handled the interior of the range, now for the edges */ - if (start_boundary != offset) { - error = xfs_iozero(ip, offset, start_boundary - offset); - if (error) - goto out; - } - - if (end_boundary != offset + len) - error = xfs_iozero(ip, end_boundary, - offset + len - end_boundary); - - } else { - /* - * It's either a sub-granularity range or the range spanned lies - * partially across two adjacent blocks. - */ - error = xfs_iozero(ip, offset, len); - } + error = xfs_free_file_space(ip, offset, len); + if (error) + goto out; + error = xfs_alloc_file_space(ip, round_down(offset, blksize), + round_up(offset + len, blksize) - + round_down(offset, blksize), + XFS_BMAPI_PREALLOC); out: return error; -- cgit v1.2.1 From 6e5aafb27419f32575b27ef9d6a31e5d54661aca Mon Sep 17 00:00:00 2001 From: Chris Mason Date: Tue, 4 Nov 2014 06:59:04 -0800 Subject: Btrfs: fix kfree on list_head in btrfs_lookup_csums_range error cleanup If we hit any errors in btrfs_lookup_csums_range, we'll loop through all the csums we allocate and free them. But the code was using list_entry incorrectly, and ended up trying to free the on-stack list_head instead. 
This bug came from commit 0678b6185 btrfs: Don't BUG_ON kzalloc error in btrfs_lookup_csums_range() Signed-off-by: Chris Mason Reported-by: Erik Berg cc: stable@vger.kernel.org # 3.3 or newer --- fs/btrfs/file-item.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c index 783a94355efd..84a2d1868271 100644 --- a/fs/btrfs/file-item.c +++ b/fs/btrfs/file-item.c @@ -413,7 +413,7 @@ int btrfs_lookup_csums_range(struct btrfs_root *root, u64 start, u64 end, ret = 0; fail: while (ret < 0 && !list_empty(&tmplist)) { - sums = list_entry(&tmplist, struct btrfs_ordered_sum, list); + sums = list_entry(tmplist.next, struct btrfs_ordered_sum, list); list_del(&sums->list); kfree(sums); } -- cgit v1.2.1 From 809fd143de8805970eec02c27c0bc2622a6ecbda Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Thu, 23 Oct 2014 19:33:14 +0300 Subject: NFSv4: Ensure nfs_atomic_open set the dentry verifier on ENOENT If the OPEN rpc call to the server fails with an ENOENT call, nfs_atomic_open will create a negative dentry for that file, however it currently fails to call nfs_set_verifier(), thus causing the dentry to be immediately revalidated on the next call to nfs_lookup_revalidate() instead of following the usual lookup caching rules. Signed-off-by: Trond Myklebust --- fs/nfs/dir.c | 1 + 1 file changed, 1 insertion(+) (limited to 'fs') diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index 06e8cfcbb670..6e62155abf26 100644 --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -1527,6 +1527,7 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry, case -ENOENT: d_drop(dentry); d_add(dentry, NULL); + nfs_set_verifier(dentry, nfs_save_change_attribute(dir)); break; case -EISDIR: case -ENOTDIR: -- cgit v1.2.1 From 7488cbc2568391d5e0b2bda8902a96b5dd7b1ea7 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Thu, 23 Oct 2014 19:22:31 +0300 Subject: Revert "NFS: remove BUG possibility in nfs4_open_and_get_state" This reverts commit f39c01047994e66e7f3d89ddb4c6141f23349d8d. --- fs/nfs/nfs4proc.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) (limited to 'fs') diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 405bd95c1f58..8026197e2b9f 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -2233,13 +2233,9 @@ static int _nfs4_open_and_get_state(struct nfs4_opendata *opendata, ret = _nfs4_proc_open(opendata); if (ret != 0) { if (ret == -ENOENT) { - dentry = opendata->dentry; - if (dentry->d_inode) - d_delete(dentry); - else if (d_unhashed(dentry)) - d_add(dentry, NULL); - - nfs_set_verifier(dentry, + d_drop(opendata->dentry); + d_add(opendata->dentry, NULL); + nfs_set_verifier(opendata->dentry, nfs_save_change_attribute(opendata->dir->d_inode)); } goto out; -- cgit v1.2.1 From dca780016dab84d6ac500b1d84fdfe1628802a59 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Thu, 23 Oct 2014 19:23:03 +0300 Subject: Revert "NFS: nfs4_do_open should add negative results to the dcache." This reverts commit 4fa2c54b5198d09607a534e2fd436581064587ed. 
--- fs/nfs/nfs4proc.c | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-) (limited to 'fs') diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 8026197e2b9f..41b8fcbfdadd 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -2231,15 +2231,8 @@ static int _nfs4_open_and_get_state(struct nfs4_opendata *opendata, seq = raw_seqcount_begin(&sp->so_reclaim_seqcount); ret = _nfs4_proc_open(opendata); - if (ret != 0) { - if (ret == -ENOENT) { - d_drop(opendata->dentry); - d_add(opendata->dentry, NULL); - nfs_set_verifier(opendata->dentry, - nfs_save_change_attribute(opendata->dir->d_inode)); - } + if (ret != 0) goto out; - } state = nfs4_opendata_to_nfs4_state(opendata); ret = PTR_ERR(state); -- cgit v1.2.1 From afa947cb52a8e73fe71915a0b0af6fcf98dfbe1a Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Fri, 7 Nov 2014 08:29:57 +1100 Subject: xfs: bulkstat btree walk doesn't terminate The bulkstat code has several different ways of detecting the end of an AG when doing a walk. They are not consistently detected, and the code that checks for the end of AG conditions is not consistently coded. Hence the are conditions where the walk code can get stuck in an endless loop making no progress and not triggering any termination conditions. Convert all the "tmp/i" status return codes from btree operations to a common name (stat) and apply end-of-ag detection to these operations consistently. cc: # 3.17 Signed-off-by: Dave Chinner Reviewed-by: Brian Foster Signed-off-by: Dave Chinner --- fs/xfs/xfs_itable.c | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) (limited to 'fs') diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c index 7765ff743e91..16737cbbee17 100644 --- a/fs/xfs/xfs_itable.c +++ b/fs/xfs/xfs_itable.c @@ -356,7 +356,6 @@ xfs_bulkstat( int end_of_ag; /* set if we've seen the ag end */ int error; /* error code */ int fmterror;/* bulkstat formatter result */ - int i; /* loop index */ int icount; /* count of inodes good in irbuf */ size_t irbsize; /* size of irec buffer in bytes */ xfs_ino_t ino; /* inode number (filesystem) */ @@ -366,11 +365,11 @@ xfs_bulkstat( xfs_ino_t lastino; /* last inode number returned */ int nirbuf; /* size of irbuf */ int rval; /* return value error code */ - int tmp; /* result value from btree calls */ int ubcount; /* size of user's buffer */ int ubleft; /* bytes left in user's buffer */ char __user *ubufp; /* pointer into user's buffer */ int ubelem; /* spaces used in user's buffer */ + int stat; /* * Get the last inode value, see if there's nothing to do. @@ -436,13 +435,15 @@ xfs_bulkstat( agino = r.ir_startino + XFS_INODES_PER_CHUNK; } /* Increment to the next record */ - error = xfs_btree_increment(cur, 0, &tmp); + error = xfs_btree_increment(cur, 0, &stat); } else { /* Start of ag. Lookup the first inode chunk */ - error = xfs_inobt_lookup(cur, 0, XFS_LOOKUP_GE, &tmp); + error = xfs_inobt_lookup(cur, 0, XFS_LOOKUP_GE, &stat); } - if (error) + if (error || stat == 0) { + end_of_ag = 1; goto del_cursor; + } /* * Loop through inode btree records in this ag, @@ -451,8 +452,8 @@ xfs_bulkstat( while (irbp < irbufend && icount < ubcount) { struct xfs_inobt_rec_incore r; - error = xfs_inobt_get_rec(cur, &r, &i); - if (error || i == 0) { + error = xfs_inobt_get_rec(cur, &r, &stat); + if (error || stat == 0) { end_of_ag = 1; goto del_cursor; } @@ -473,8 +474,8 @@ xfs_bulkstat( * Set agino to after this chunk and bump the cursor. 
*/ agino = r.ir_startino + XFS_INODES_PER_CHUNK; - error = xfs_btree_increment(cur, 0, &tmp); - if (error) { + error = xfs_btree_increment(cur, 0, &stat); + if (error || stat == 0) { end_of_ag = 1; goto del_cursor; } -- cgit v1.2.1 From bf4a5af20d25ecc8876978ad34b8db83b4235f3c Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Fri, 7 Nov 2014 08:30:30 +1100 Subject: xfs: bulkstat chunk formatting cursor is broken The xfs_bulkstat_agichunk formatting cursor takes buffer values from the main loop and passes them via the structure to the chunk formatter, and the writes the changed values back into the main loop local variables. Unfortunately, this complex dance is full of corner cases that aren't handled correctly. The biggest problem is that it is double handling the information in both the main loop and the chunk formatting function, leading to inconsistent updates and endless loops where progress is not made. To fix this, push the struct xfs_bulkstat_agichunk outwards to be the primary holder of user buffer information. this removes the double handling in the main loop. Also, pass the last inode processed by the chunk formatter as a separate parameter as it purely an output variable and is not related to the user buffer consumption cursor. Finally, the chunk formatting code is not shared by anyone, so make it local to xfs_itable.c. cc: # 3.17 Signed-off-by: Dave Chinner Reviewed-by: Brian Foster Signed-off-by: Dave Chinner --- fs/xfs/xfs_itable.c | 59 +++++++++++++++++++++++++---------------------------- fs/xfs/xfs_itable.h | 16 --------------- 2 files changed, 28 insertions(+), 47 deletions(-) (limited to 'fs') diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c index 16737cbbee17..50a3e5995dd9 100644 --- a/fs/xfs/xfs_itable.c +++ b/fs/xfs/xfs_itable.c @@ -262,20 +262,26 @@ xfs_bulkstat_grab_ichunk( #define XFS_BULKSTAT_UBLEFT(ubleft) ((ubleft) >= statstruct_size) +struct xfs_bulkstat_agichunk { + char __user **ac_ubuffer;/* pointer into user's buffer */ + int ac_ubleft; /* bytes left in user's buffer */ + int ac_ubelem; /* spaces used in user's buffer */ +}; + /* * Process inodes in chunk with a pointer to a formatter function * that will iget the inode and fill in the appropriate structure. 
*/ -int +static int xfs_bulkstat_ag_ichunk( struct xfs_mount *mp, xfs_agnumber_t agno, struct xfs_inobt_rec_incore *irbp, bulkstat_one_pf formatter, size_t statstruct_size, - struct xfs_bulkstat_agichunk *acp) + struct xfs_bulkstat_agichunk *acp, + xfs_ino_t *lastino) { - xfs_ino_t lastino = acp->ac_lastino; char __user **ubufp = acp->ac_ubuffer; int ubleft = acp->ac_ubleft; int ubelem = acp->ac_ubelem; @@ -295,7 +301,7 @@ xfs_bulkstat_ag_ichunk( /* Skip if this inode is free */ if (XFS_INOBT_MASK(chunkidx) & irbp->ir_free) { - lastino = ino; + *lastino = ino; continue; } @@ -313,7 +319,7 @@ xfs_bulkstat_ag_ichunk( ubleft = 0; break; } - lastino = ino; + *lastino = ino; continue; } if (fmterror == BULKSTAT_RV_GIVEUP) { @@ -325,10 +331,9 @@ xfs_bulkstat_ag_ichunk( *ubufp += ubused; ubleft -= ubused; ubelem++; - lastino = ino; + *lastino = ino; } - acp->ac_lastino = lastino; acp->ac_ubleft = ubleft; acp->ac_ubelem = ubelem; @@ -355,7 +360,6 @@ xfs_bulkstat( xfs_btree_cur_t *cur; /* btree cursor for ialloc btree */ int end_of_ag; /* set if we've seen the ag end */ int error; /* error code */ - int fmterror;/* bulkstat formatter result */ int icount; /* count of inodes good in irbuf */ size_t irbsize; /* size of irec buffer in bytes */ xfs_ino_t ino; /* inode number (filesystem) */ @@ -366,10 +370,8 @@ xfs_bulkstat( int nirbuf; /* size of irbuf */ int rval; /* return value error code */ int ubcount; /* size of user's buffer */ - int ubleft; /* bytes left in user's buffer */ - char __user *ubufp; /* pointer into user's buffer */ - int ubelem; /* spaces used in user's buffer */ int stat; + struct xfs_bulkstat_agichunk ac; /* * Get the last inode value, see if there's nothing to do. @@ -386,11 +388,13 @@ xfs_bulkstat( } ubcount = *ubcountp; /* statstruct's */ - ubleft = ubcount * statstruct_size; /* bytes */ - *ubcountp = ubelem = 0; + ac.ac_ubuffer = &ubuffer; + ac.ac_ubleft = ubcount * statstruct_size; /* bytes */; + ac.ac_ubelem = 0; + + *ubcountp = 0; *done = 0; - fmterror = 0; - ubufp = ubuffer; + irbuf = kmem_zalloc_greedy(&irbsize, PAGE_SIZE, PAGE_SIZE * 4); if (!irbuf) return -ENOMEM; @@ -402,7 +406,7 @@ xfs_bulkstat( * inode returned; 0 means start of the allocation group. */ rval = 0; - while (XFS_BULKSTAT_UBLEFT(ubleft) && agno < mp->m_sb.sb_agcount) { + while (XFS_BULKSTAT_UBLEFT(ac.ac_ubleft) && agno < mp->m_sb.sb_agcount) { cond_resched(); error = xfs_ialloc_read_agi(mp, NULL, agno, &agbp); if (error) @@ -497,28 +501,21 @@ del_cursor: */ irbufend = irbp; for (irbp = irbuf; - irbp < irbufend && XFS_BULKSTAT_UBLEFT(ubleft); irbp++) { - struct xfs_bulkstat_agichunk ac; - - ac.ac_lastino = lastino; - ac.ac_ubuffer = &ubuffer; - ac.ac_ubleft = ubleft; - ac.ac_ubelem = ubelem; + irbp < irbufend && XFS_BULKSTAT_UBLEFT(ac.ac_ubleft); + irbp++) { error = xfs_bulkstat_ag_ichunk(mp, agno, irbp, - formatter, statstruct_size, &ac); + formatter, statstruct_size, &ac, + &lastino); if (error) rval = error; - lastino = ac.ac_lastino; - ubleft = ac.ac_ubleft; - ubelem = ac.ac_ubelem; - cond_resched(); } + /* * Set up for the next loop iteration. */ - if (XFS_BULKSTAT_UBLEFT(ubleft)) { + if (XFS_BULKSTAT_UBLEFT(ac.ac_ubleft)) { if (end_of_ag) { agno++; agino = 0; @@ -531,11 +528,11 @@ del_cursor: * Done, we're either out of filesystem or space to put the data. */ kmem_free(irbuf); - *ubcountp = ubelem; + *ubcountp = ac.ac_ubelem; /* * Found some inodes, return them now and return the error next time. 
*/ - if (ubelem) + if (ac.ac_ubelem) rval = 0; if (agno >= mp->m_sb.sb_agcount) { /* diff --git a/fs/xfs/xfs_itable.h b/fs/xfs/xfs_itable.h index aaed08022eb9..6ea8b3912fa4 100644 --- a/fs/xfs/xfs_itable.h +++ b/fs/xfs/xfs_itable.h @@ -30,22 +30,6 @@ typedef int (*bulkstat_one_pf)(struct xfs_mount *mp, int *ubused, int *stat); -struct xfs_bulkstat_agichunk { - xfs_ino_t ac_lastino; /* last inode returned */ - char __user **ac_ubuffer;/* pointer into user's buffer */ - int ac_ubleft; /* bytes left in user's buffer */ - int ac_ubelem; /* spaces used in user's buffer */ -}; - -int -xfs_bulkstat_ag_ichunk( - struct xfs_mount *mp, - xfs_agnumber_t agno, - struct xfs_inobt_rec_incore *irbp, - bulkstat_one_pf formatter, - size_t statstruct_size, - struct xfs_bulkstat_agichunk *acp); - /* * Values for stat return value. */ -- cgit v1.2.1 From 2b831ac6bc87d3cbcbb1a8816827b6923403e461 Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Fri, 7 Nov 2014 08:30:58 +1100 Subject: xfs: bulkstat chunk-formatter has issues The loop construct has issues: - clustidx is completely unused, so remove it. - the loop tries to be smart by terminating when the "freecount" tells it that all inodes are free. Just drop it as in most cases we have to scan all inodes in the chunk anyway. - move the "user buffer left" condition check to the only point where we consume space int eh user buffer. - move the initialisation of agino out of the loop, leaving just a simple loop control logic using the clusteridx. Also, double handling of the user buffer variables leads to problems tracking the current state - use the cursor variables directly rather than keeping local copies and then having to update the cursor before returning. cc: # 3.17 Signed-off-by: Dave Chinner Reviewed-by: Brian Foster Signed-off-by: Dave Chinner --- fs/xfs/xfs_itable.c | 58 ++++++++++++++++++++++------------------------------- 1 file changed, 24 insertions(+), 34 deletions(-) (limited to 'fs') diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c index 50a3e5995dd9..7ea2b113db1b 100644 --- a/fs/xfs/xfs_itable.c +++ b/fs/xfs/xfs_itable.c @@ -283,59 +283,49 @@ xfs_bulkstat_ag_ichunk( xfs_ino_t *lastino) { char __user **ubufp = acp->ac_ubuffer; - int ubleft = acp->ac_ubleft; - int ubelem = acp->ac_ubelem; - int chunkidx, clustidx; + int chunkidx; int error = 0; xfs_agino_t agino; - for (agino = irbp->ir_startino, chunkidx = clustidx = 0; - XFS_BULKSTAT_UBLEFT(ubleft) && - irbp->ir_freecount < XFS_INODES_PER_CHUNK; - chunkidx++, clustidx++, agino++) { - int fmterror; /* bulkstat formatter result */ + agino = irbp->ir_startino; + for (chunkidx = 0; chunkidx < XFS_INODES_PER_CHUNK; + chunkidx++, agino++) { + int fmterror; int ubused; xfs_ino_t ino = XFS_AGINO_TO_INO(mp, agno, agino); - ASSERT(chunkidx < XFS_INODES_PER_CHUNK); - /* Skip if this inode is free */ if (XFS_INOBT_MASK(chunkidx) & irbp->ir_free) { *lastino = ino; continue; } - /* - * Count used inodes as free so we can tell when the - * chunk is used up. 
- */ - irbp->ir_freecount++; - /* Get the inode and fill in a single buffer */ ubused = statstruct_size; - error = formatter(mp, ino, *ubufp, ubleft, &ubused, &fmterror); - if (fmterror == BULKSTAT_RV_NOTHING) { - if (error && error != -ENOENT && error != -EINVAL) { - ubleft = 0; - break; - } - *lastino = ino; - continue; - } - if (fmterror == BULKSTAT_RV_GIVEUP) { - ubleft = 0; + error = formatter(mp, ino, *ubufp, acp->ac_ubleft, + &ubused, &fmterror); + if (fmterror == BULKSTAT_RV_GIVEUP || + (error && error != -ENOENT && error != -EINVAL)) { + acp->ac_ubleft = 0; ASSERT(error); break; } - if (*ubufp) - *ubufp += ubused; - ubleft -= ubused; - ubelem++; + + /* be careful not to leak error if at end of chunk */ + if (fmterror == BULKSTAT_RV_NOTHING || error) { + *lastino = ino; + error = 0; + continue; + } + + *ubufp += ubused; + acp->ac_ubleft -= ubused; + acp->ac_ubelem++; *lastino = ino; - } - acp->ac_ubleft = ubleft; - acp->ac_ubelem = ubelem; + if (acp->ac_ubleft < statstruct_size) + break; + } return error; } -- cgit v1.2.1 From 6e57c542cb7e0e580eb53ae76a77875c7d92b4b1 Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Fri, 7 Nov 2014 08:31:13 +1100 Subject: xfs: bulkstat main loop logic is a mess There are a bunch of variables tha tare more wildy scoped than they need to be, obfuscated user buffer checks and tortured "next inode" tracking. This all needs cleaning up to expose the real issues that need fixing. cc: # 3.17 Signed-off-by: Dave Chinner Reviewed-by: Brian Foster Signed-off-by: Dave Chinner --- fs/xfs/xfs_itable.c | 56 +++++++++++++++++++++++------------------------------ 1 file changed, 24 insertions(+), 32 deletions(-) (limited to 'fs') diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c index 7ea2b113db1b..acae3355ab22 100644 --- a/fs/xfs/xfs_itable.c +++ b/fs/xfs/xfs_itable.c @@ -348,30 +348,23 @@ xfs_bulkstat( xfs_agino_t agino; /* inode # in allocation group */ xfs_agnumber_t agno; /* allocation group number */ xfs_btree_cur_t *cur; /* btree cursor for ialloc btree */ - int end_of_ag; /* set if we've seen the ag end */ - int error; /* error code */ - int icount; /* count of inodes good in irbuf */ size_t irbsize; /* size of irec buffer in bytes */ - xfs_ino_t ino; /* inode number (filesystem) */ - xfs_inobt_rec_incore_t *irbp; /* current irec buffer pointer */ xfs_inobt_rec_incore_t *irbuf; /* start of irec buffer */ - xfs_inobt_rec_incore_t *irbufend; /* end of good irec buffer entries */ xfs_ino_t lastino; /* last inode number returned */ int nirbuf; /* size of irbuf */ int rval; /* return value error code */ int ubcount; /* size of user's buffer */ - int stat; struct xfs_bulkstat_agichunk ac; + int error = 0; /* * Get the last inode value, see if there's nothing to do. */ - ino = (xfs_ino_t)*lastinop; - lastino = ino; - agno = XFS_INO_TO_AGNO(mp, ino); - agino = XFS_INO_TO_AGINO(mp, ino); + lastino = *lastinop; + agno = XFS_INO_TO_AGNO(mp, lastino); + agino = XFS_INO_TO_AGINO(mp, lastino); if (agno >= mp->m_sb.sb_agcount || - ino != XFS_AGINO_TO_INO(mp, agno, agino)) { + lastino != XFS_AGINO_TO_INO(mp, agno, agino)) { *done = 1; *ubcountp = 0; return 0; @@ -396,8 +389,13 @@ xfs_bulkstat( * inode returned; 0 means start of the allocation group. 
*/ rval = 0; - while (XFS_BULKSTAT_UBLEFT(ac.ac_ubleft) && agno < mp->m_sb.sb_agcount) { - cond_resched(); + while (agno < mp->m_sb.sb_agcount) { + struct xfs_inobt_rec_incore *irbp = irbuf; + struct xfs_inobt_rec_incore *irbufend = irbuf + nirbuf; + bool end_of_ag = false; + int icount = 0; + int stat; + error = xfs_ialloc_read_agi(mp, NULL, agno, &agbp); if (error) break; @@ -407,10 +405,6 @@ xfs_bulkstat( */ cur = xfs_inobt_init_cursor(mp, NULL, agbp, agno, XFS_BTNUM_INO); - irbp = irbuf; - irbufend = irbuf + nirbuf; - end_of_ag = 0; - icount = 0; if (agino > 0) { /* * In the middle of an allocation group, we need to get @@ -435,7 +429,7 @@ xfs_bulkstat( error = xfs_inobt_lookup(cur, 0, XFS_LOOKUP_GE, &stat); } if (error || stat == 0) { - end_of_ag = 1; + end_of_ag = true; goto del_cursor; } @@ -448,7 +442,7 @@ xfs_bulkstat( error = xfs_inobt_get_rec(cur, &r, &stat); if (error || stat == 0) { - end_of_ag = 1; + end_of_ag = true; goto del_cursor; } @@ -470,7 +464,7 @@ xfs_bulkstat( agino = r.ir_startino + XFS_INODES_PER_CHUNK; error = xfs_btree_increment(cur, 0, &stat); if (error || stat == 0) { - end_of_ag = 1; + end_of_ag = true; goto del_cursor; } cond_resched(); @@ -491,7 +485,7 @@ del_cursor: */ irbufend = irbp; for (irbp = irbuf; - irbp < irbufend && XFS_BULKSTAT_UBLEFT(ac.ac_ubleft); + irbp < irbufend && ac.ac_ubleft >= statstruct_size; irbp++) { error = xfs_bulkstat_ag_ichunk(mp, agno, irbp, formatter, statstruct_size, &ac, @@ -502,17 +496,15 @@ del_cursor: cond_resched(); } - /* - * Set up for the next loop iteration. - */ - if (XFS_BULKSTAT_UBLEFT(ac.ac_ubleft)) { - if (end_of_ag) { - agno++; - agino = 0; - } else - agino = XFS_INO_TO_AGINO(mp, lastino); - } else + /* If we've run out of space, we are done */ + if (ac.ac_ubleft < statstruct_size) break; + + if (end_of_ag) { + agno++; + agino = 0; + } else + agino = XFS_INO_TO_AGINO(mp, lastino); } /* * Done, we're either out of filesystem or space to put the data. -- cgit v1.2.1 From febe3cbe38b0bc0a925906dc90e8d59048851f87 Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Fri, 7 Nov 2014 08:31:15 +1100 Subject: xfs: bulkstat error handling is broken The error propagation is a horror - xfs_bulkstat() returns a rval variable which is only set if there are formatter errors. Any sort of btree walk error or corruption will cause the bulkstat walk to terminate but will not pass an error back to userspace. Worse is the fact that formatter errors will also be ignored if any inodes were correctly formatted into the user buffer. Hence bulkstat can fail badly yet still report success to userspace. This causes significant issues with xfsdump not dumping everything in the filesystem yet reporting success. It's not until a restore fails that there is any indication that the dump was bad and tha bulkstat failed. This patch now triggers xfsdump to fail with bulkstat errors rather than silently missing files in the dump. This now causes bulkstat to fail when the lastino cookie does not fall inside an existing inode chunk. The pre-3.17 code tolerated that error by allowing the code to move to the next inode chunk as the agino target is guaranteed to fall into the next btree record. With the fixes up to this point in the series, xfsdump now passes on the troublesome filesystem image that exposes all these bugs. 
cc: Signed-off-by: Dave Chinner Reviewed-by: Brian Foster --- fs/xfs/xfs_itable.c | 29 +++++++++++++++++++---------- 1 file changed, 19 insertions(+), 10 deletions(-) (limited to 'fs') diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c index acae3355ab22..ff3f431671b9 100644 --- a/fs/xfs/xfs_itable.c +++ b/fs/xfs/xfs_itable.c @@ -236,8 +236,10 @@ xfs_bulkstat_grab_ichunk( XFS_WANT_CORRUPTED_RETURN(stat == 1); /* Check if the record contains the inode in request */ - if (irec->ir_startino + XFS_INODES_PER_CHUNK <= agino) - return -EINVAL; + if (irec->ir_startino + XFS_INODES_PER_CHUNK <= agino) { + *icount = 0; + return 0; + } idx = agino - irec->ir_startino + 1; if (idx < XFS_INODES_PER_CHUNK && @@ -352,7 +354,6 @@ xfs_bulkstat( xfs_inobt_rec_incore_t *irbuf; /* start of irec buffer */ xfs_ino_t lastino; /* last inode number returned */ int nirbuf; /* size of irbuf */ - int rval; /* return value error code */ int ubcount; /* size of user's buffer */ struct xfs_bulkstat_agichunk ac; int error = 0; @@ -388,7 +389,6 @@ xfs_bulkstat( * Loop over the allocation groups, starting from the last * inode returned; 0 means start of the allocation group. */ - rval = 0; while (agno < mp->m_sb.sb_agcount) { struct xfs_inobt_rec_incore *irbp = irbuf; struct xfs_inobt_rec_incore *irbufend = irbuf + nirbuf; @@ -491,13 +491,16 @@ del_cursor: formatter, statstruct_size, &ac, &lastino); if (error) - rval = error; + break; cond_resched(); } - /* If we've run out of space, we are done */ - if (ac.ac_ubleft < statstruct_size) + /* + * If we've run out of space or had a formatting error, we + * are now done + */ + if (ac.ac_ubleft < statstruct_size || error) break; if (end_of_ag) { @@ -511,11 +514,17 @@ del_cursor: */ kmem_free(irbuf); *ubcountp = ac.ac_ubelem; + /* - * Found some inodes, return them now and return the error next time. + * We found some inodes, so clear the error status and return them. + * The lastino pointer will point directly at the inode that triggered + * any error that occurred, so on the next call the error will be + * triggered again and propagated to userspace as there will be no + * formatted inodes in the buffer. */ if (ac.ac_ubelem) - rval = 0; + error = 0; + if (agno >= mp->m_sb.sb_agcount) { /* * If we ran out of filesystem, mark lastino as off @@ -527,7 +536,7 @@ del_cursor: } else *lastinop = (xfs_ino_t)lastino; - return rval; + return error; } int -- cgit v1.2.1 From 002758992693ae63c04122603ea9261a0a58d728 Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Fri, 7 Nov 2014 08:33:52 +1100 Subject: xfs: track bulkstat progress by agino The bulkstat main loop progress is tracked by the "lastino" variable, which is a full 64 bit inode. However, the loop actually works on agno/agino pairs, and so there's a significant disconnect between the rest of the loop and the main cursor. Convert this to use the agino, and pass the agino into the chunk formatting function and convert it too. This gets rid of the inconsistency in the loop processing, and finally makes it simple for us to skip inodes at any point in the loop simply by incrementing the agino cursor. 
cc: # 3.17 Signed-off-by: Dave Chinner Reviewed-by: Brian Foster Signed-off-by: Dave Chinner --- fs/xfs/xfs_itable.c | 71 +++++++++++++++++++++++++---------------------------- 1 file changed, 34 insertions(+), 37 deletions(-) (limited to 'fs') diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c index ff3f431671b9..894924a5129b 100644 --- a/fs/xfs/xfs_itable.c +++ b/fs/xfs/xfs_itable.c @@ -282,30 +282,31 @@ xfs_bulkstat_ag_ichunk( bulkstat_one_pf formatter, size_t statstruct_size, struct xfs_bulkstat_agichunk *acp, - xfs_ino_t *lastino) + xfs_agino_t *last_agino) { char __user **ubufp = acp->ac_ubuffer; int chunkidx; int error = 0; - xfs_agino_t agino; + xfs_agino_t agino = irbp->ir_startino; - agino = irbp->ir_startino; for (chunkidx = 0; chunkidx < XFS_INODES_PER_CHUNK; chunkidx++, agino++) { int fmterror; int ubused; - xfs_ino_t ino = XFS_AGINO_TO_INO(mp, agno, agino); + + /* inode won't fit in buffer, we are done */ + if (acp->ac_ubleft < statstruct_size) + break; /* Skip if this inode is free */ - if (XFS_INOBT_MASK(chunkidx) & irbp->ir_free) { - *lastino = ino; + if (XFS_INOBT_MASK(chunkidx) & irbp->ir_free) continue; - } /* Get the inode and fill in a single buffer */ ubused = statstruct_size; - error = formatter(mp, ino, *ubufp, acp->ac_ubleft, - &ubused, &fmterror); + error = formatter(mp, XFS_AGINO_TO_INO(mp, agno, agino), + *ubufp, acp->ac_ubleft, &ubused, &fmterror); + if (fmterror == BULKSTAT_RV_GIVEUP || (error && error != -ENOENT && error != -EINVAL)) { acp->ac_ubleft = 0; @@ -315,7 +316,6 @@ xfs_bulkstat_ag_ichunk( /* be careful not to leak error if at end of chunk */ if (fmterror == BULKSTAT_RV_NOTHING || error) { - *lastino = ino; error = 0; continue; } @@ -323,12 +323,18 @@ xfs_bulkstat_ag_ichunk( *ubufp += ubused; acp->ac_ubleft -= ubused; acp->ac_ubelem++; - *lastino = ino; - - if (acp->ac_ubleft < statstruct_size) - break; } + /* + * Post-update *last_agino. At this point, agino will always point one + * inode past the last inode we processed successfully. Hence we + * substract that inode when setting the *last_agino cursor so that we + * return the correct cookie to userspace. On the next bulkstat call, + * the inode under the lastino cookie will be skipped as we have already + * processed it here. + */ + *last_agino = agino - 1; + return error; } @@ -352,7 +358,6 @@ xfs_bulkstat( xfs_btree_cur_t *cur; /* btree cursor for ialloc btree */ size_t irbsize; /* size of irec buffer in bytes */ xfs_inobt_rec_incore_t *irbuf; /* start of irec buffer */ - xfs_ino_t lastino; /* last inode number returned */ int nirbuf; /* size of irbuf */ int ubcount; /* size of user's buffer */ struct xfs_bulkstat_agichunk ac; @@ -361,11 +366,10 @@ xfs_bulkstat( /* * Get the last inode value, see if there's nothing to do. */ - lastino = *lastinop; - agno = XFS_INO_TO_AGNO(mp, lastino); - agino = XFS_INO_TO_AGINO(mp, lastino); + agno = XFS_INO_TO_AGNO(mp, *lastinop); + agino = XFS_INO_TO_AGINO(mp, *lastinop); if (agno >= mp->m_sb.sb_agcount || - lastino != XFS_AGINO_TO_INO(mp, agno, agino)) { + *lastinop != XFS_AGINO_TO_INO(mp, agno, agino)) { *done = 1; *ubcountp = 0; return 0; @@ -420,7 +424,6 @@ xfs_bulkstat( irbp->ir_freecount = r.ir_freecount; irbp->ir_free = r.ir_free; irbp++; - agino = r.ir_startino + XFS_INODES_PER_CHUNK; } /* Increment to the next record */ error = xfs_btree_increment(cur, 0, &stat); @@ -458,10 +461,6 @@ xfs_bulkstat( irbp++; icount += XFS_INODES_PER_CHUNK - r.ir_freecount; } - /* - * Set agino to after this chunk and bump the cursor. 
- */ - agino = r.ir_startino + XFS_INODES_PER_CHUNK; error = xfs_btree_increment(cur, 0, &stat); if (error || stat == 0) { end_of_ag = true; @@ -481,7 +480,9 @@ del_cursor: if (error) break; /* - * Now format all the good inodes into the user's buffer. + * Now format all the good inodes into the user's buffer. The + * call to xfs_bulkstat_ag_ichunk() sets up the agino pointer + * for the next loop iteration. */ irbufend = irbp; for (irbp = irbuf; @@ -489,7 +490,7 @@ del_cursor: irbp++) { error = xfs_bulkstat_ag_ichunk(mp, agno, irbp, formatter, statstruct_size, &ac, - &lastino); + &agino); if (error) break; @@ -506,8 +507,7 @@ del_cursor: if (end_of_ag) { agno++; agino = 0; - } else - agino = XFS_INO_TO_AGINO(mp, lastino); + } } /* * Done, we're either out of filesystem or space to put the data. @@ -525,16 +525,13 @@ del_cursor: if (ac.ac_ubelem) error = 0; - if (agno >= mp->m_sb.sb_agcount) { - /* - * If we ran out of filesystem, mark lastino as off - * the end of the filesystem, so the next call - * will return immediately. - */ - *lastinop = (xfs_ino_t)XFS_AGINO_TO_INO(mp, agno, 0); + /* + * If we ran out of filesystem, lastino will point off the end of + * the filesystem so the next call will return immediately. + */ + *lastinop = XFS_AGINO_TO_INO(mp, agno, agino); + if (agno >= mp->m_sb.sb_agcount) *done = 1; - } else - *lastinop = (xfs_ino_t)lastino; return error; } -- cgit v1.2.1 From 8c393f9a721c30a030049a680e1bf896669bb279 Mon Sep 17 00:00:00 2001 From: Peng Tao Date: Wed, 5 Nov 2014 22:36:50 +0800 Subject: nfs: fix pnfs direct write memory leak For pNFS direct writes, layout driver may dynamically allocate ds_cinfo.buckets. So we need to take care to free them when freeing dreq. Ideally this needs to be done inside layout driver where ds_cinfo.buckets are allocated. But buckets are attached to dreq and reused across LD IO iterations. So I feel it's OK to free them in the generic layer. Cc: stable@vger.kernel.org [v3.4+] Signed-off-by: Peng Tao Signed-off-by: Trond Myklebust --- fs/nfs/direct.c | 1 + 1 file changed, 1 insertion(+) (limited to 'fs') diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index 20cffc830468..10bf07280f4a 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -266,6 +266,7 @@ static void nfs_direct_req_free(struct kref *kref) { struct nfs_direct_req *dreq = container_of(kref, struct nfs_direct_req, kref); + nfs_free_pnfs_ds_cinfo(&dreq->ds_cinfo); if (dreq->l_ctx != NULL) nfs_put_lock_context(dreq->l_ctx); if (dreq->ctx != NULL) -- cgit v1.2.1 From e0d4ed71ca0344494722a041780f004d2bcf0f11 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 26 Sep 2014 16:02:50 +0200 Subject: pnfs/blocklayout: serialize GETDEVICEINFO calls The rpc_pipefs code isn't thread safe, leading to occasional use after frees when running xfstests generic/241 (dbench). 
Signed-off-by: Christoph Hellwig Link: http://lkml.kernel.org/r/1411740170-18611-2-git-send-email-hch@lst.de Cc: stable@vger.kernel.org # 3.17.x Signed-off-by: Trond Myklebust --- fs/nfs/blocklayout/rpc_pipefs.c | 14 +++++++++----- fs/nfs/netns.h | 1 + 2 files changed, 10 insertions(+), 5 deletions(-) (limited to 'fs') diff --git a/fs/nfs/blocklayout/rpc_pipefs.c b/fs/nfs/blocklayout/rpc_pipefs.c index e966c023b1b7..acbf9ca4018c 100644 --- a/fs/nfs/blocklayout/rpc_pipefs.c +++ b/fs/nfs/blocklayout/rpc_pipefs.c @@ -65,17 +65,18 @@ bl_resolve_deviceid(struct nfs_server *server, struct pnfs_block_volume *b, dprintk("%s CREATING PIPEFS MESSAGE\n", __func__); + mutex_lock(&nn->bl_mutex); bl_pipe_msg.bl_wq = &nn->bl_wq; b->simple.len += 4; /* single volume */ if (b->simple.len > PAGE_SIZE) - return -EIO; + goto out_unlock; memset(msg, 0, sizeof(*msg)); msg->len = sizeof(*bl_msg) + b->simple.len; msg->data = kzalloc(msg->len, gfp_mask); if (!msg->data) - goto out; + goto out_free_data; bl_msg = msg->data; bl_msg->type = BL_DEVICE_MOUNT, @@ -87,7 +88,7 @@ bl_resolve_deviceid(struct nfs_server *server, struct pnfs_block_volume *b, rc = rpc_queue_upcall(nn->bl_device_pipe, msg); if (rc < 0) { remove_wait_queue(&nn->bl_wq, &wq); - goto out; + goto out_free_data; } set_current_state(TASK_UNINTERRUPTIBLE); @@ -97,12 +98,14 @@ bl_resolve_deviceid(struct nfs_server *server, struct pnfs_block_volume *b, if (reply->status != BL_DEVICE_REQUEST_PROC) { printk(KERN_WARNING "%s failed to decode device: %d\n", __func__, reply->status); - goto out; + goto out_free_data; } dev = MKDEV(reply->major, reply->minor); -out: +out_free_data: kfree(msg->data); +out_unlock: + mutex_unlock(&nn->bl_mutex); return dev; } @@ -232,6 +235,7 @@ static int nfs4blocklayout_net_init(struct net *net) struct nfs_net *nn = net_generic(net, nfs_net_id); struct dentry *dentry; + mutex_init(&nn->bl_mutex); init_waitqueue_head(&nn->bl_wq); nn->bl_device_pipe = rpc_mkpipe_data(&bl_upcall_ops, 0); if (IS_ERR(nn->bl_device_pipe)) diff --git a/fs/nfs/netns.h b/fs/nfs/netns.h index ef221fb8a183..f0e06e4acbef 100644 --- a/fs/nfs/netns.h +++ b/fs/nfs/netns.h @@ -19,6 +19,7 @@ struct nfs_net { struct rpc_pipe *bl_device_pipe; struct bl_dev_msg bl_mount_reply; wait_queue_head_t bl_wq; + struct mutex bl_mutex; struct list_head nfs_client_list; struct list_head nfs_volume_list; #if IS_ENABLED(CONFIG_NFS_V4) -- cgit v1.2.1 From 16c9914069536c77ed358d94b6e247bdc464b7f0 Mon Sep 17 00:00:00 2001 From: Weston Andros Adamson Date: Mon, 3 Nov 2014 15:19:45 -0500 Subject: nfs: remove spurious WARN_ON_ONCE in write path This WARN_ON_ONCE was supposed to catch reference counting bugs, but can trigger in inappropriate situations. This was reproducible using NFSv2 on an architecture with 64K pages -- we verified that it was not a reference counting bug and the warning was safe to ignore. 
Reported-by: Will Deacon Tested-by: Will Deacon Signed-off-by: Weston Andros Adamson Signed-off-by: Trond Myklebust --- fs/nfs/write.c | 2 -- 1 file changed, 2 deletions(-) (limited to 'fs') diff --git a/fs/nfs/write.c b/fs/nfs/write.c index 12493846a2d3..f83b02dc9166 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -715,8 +715,6 @@ static void nfs_inode_remove_request(struct nfs_page *req) if (test_and_clear_bit(PG_INODE_REF, &req->wb_flags)) nfs_release_request(req); - else - WARN_ON_ONCE(1); } static void -- cgit v1.2.1 From b283f9445214d4d573906f919c70caccd27b74ea Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Tue, 21 Oct 2014 13:32:10 +0200 Subject: nfs: Remove bogus assignment Commit 3a6fd1f004fc (pnfs/blocklayout: remove read-modify-write handling in bl_write_pagelist) introduced a bogus assignment pg_index = pg_index in variable initialization. AFAICS it's just a typo so remove it. Spotted by Coverity (id 1248711). CC: Christoph Hellwig Signed-off-by: Jan Kara Reviewed-by: Christoph Hellwig Signed-off-by: Trond Myklebust --- fs/nfs/blocklayout/blocklayout.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c index 5228f201d3d5..4f46f7a05289 100644 --- a/fs/nfs/blocklayout/blocklayout.c +++ b/fs/nfs/blocklayout/blocklayout.c @@ -378,7 +378,7 @@ bl_write_pagelist(struct nfs_pgio_header *header, int sync) loff_t offset = header->args.offset; size_t count = header->args.count; struct page **pages = header->args.pages; - int pg_index = pg_index = header->args.pgbase >> PAGE_CACHE_SHIFT; + int pg_index = header->args.pgbase >> PAGE_CACHE_SHIFT; unsigned int pg_len; struct blk_plug plug; int i; -- cgit v1.2.1 From 16caf5b6101d03335b386e77e9e14136f989be87 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Thu, 23 Oct 2014 14:02:47 +0200 Subject: nfs: Fix use of uninitialized variable in nfs_getattr() Variable 'err' needn't be initialized when nfs_getattr() uses it to check whether it should call generic_fillattr() or not. That can result in spurious error returns. Initialize 'err' properly. Signed-off-by: Jan Kara Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust --- fs/nfs/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c index 6388a59f2add..00689a8a85e4 100644 --- a/fs/nfs/inode.c +++ b/fs/nfs/inode.c @@ -626,7 +626,7 @@ int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat) { struct inode *inode = dentry->d_inode; int need_atime = NFS_I(inode)->cache_validity & NFS_INO_INVALID_ATIME; - int err; + int err = 0; trace_nfs_getattr_enter(inode); /* Flush out writes to the server in order to update c/mtime. */ -- cgit v1.2.1 From e983120e923aa1c5d2aaf528331c298c88f3ab85 Mon Sep 17 00:00:00 2001 From: Anna Schumaker Date: Wed, 22 Oct 2014 15:53:10 -0400 Subject: NFS: SEEK is an NFS v4.2 feature Somehow the nfs_v4_1_minor_ops had the NFS_CAP_SEEK flag set, enabling SEEK over v4.1. This is wrong, and can make servers crash. Signed-off-by: Anna Schumaker Tested-by: J. 
Bruce Fields Signed-off-by: Trond Myklebust --- fs/nfs/nfs4proc.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'fs') diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 41b8fcbfdadd..dca174ce8309 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -8397,8 +8397,7 @@ static const struct nfs4_minor_version_ops nfs_v4_1_minor_ops = { | NFS_CAP_CHANGE_ATTR | NFS_CAP_POSIX_LOCK | NFS_CAP_STATEID_NFSV41 - | NFS_CAP_ATOMIC_OPEN_V1 - | NFS_CAP_SEEK, + | NFS_CAP_ATOMIC_OPEN_V1, .init_client = nfs41_init_client, .shutdown_client = nfs41_shutdown_client, .match_stateid = nfs41_match_stateid, @@ -8420,7 +8419,8 @@ static const struct nfs4_minor_version_ops nfs_v4_2_minor_ops = { | NFS_CAP_CHANGE_ATTR | NFS_CAP_POSIX_LOCK | NFS_CAP_STATEID_NFSV41 - | NFS_CAP_ATOMIC_OPEN_V1, + | NFS_CAP_ATOMIC_OPEN_V1 + | NFS_CAP_SEEK, .init_client = nfs41_init_client, .shutdown_client = nfs41_shutdown_client, .match_stateid = nfs41_match_stateid, -- cgit v1.2.1 From 4dfd4f7af0afd201706ad186352ca423b0f17d4b Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Fri, 17 Oct 2014 15:10:25 +0300 Subject: NFSv4: Ensure that we remove NFSv4.0 delegations when state has expired NFSv4.0 does not have TEST_STATEID/FREE_STATEID functionality, so unlike NFSv4.1, the recovery procedure when stateids have expired or have been revoked requires us to just forget the delegation. http://lkml.kernel.org/r/CAN-5tyHwG=Cn2Q9KsHWadewjpTTy_K26ee+UnSvHvG4192p-Xw@mail.gmail.com Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust --- fs/nfs/nfs4proc.c | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index dca174ce8309..bdd880bddba4 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -2109,6 +2109,28 @@ static int nfs4_open_expired(struct nfs4_state_owner *sp, struct nfs4_state *sta return ret; } +static void nfs_finish_clear_delegation_stateid(struct nfs4_state *state) +{ + nfs_remove_bad_delegation(state->inode); + write_seqlock(&state->seqlock); + nfs4_stateid_copy(&state->stateid, &state->open_stateid); + write_sequnlock(&state->seqlock); + clear_bit(NFS_DELEGATED_STATE, &state->flags); +} + +static void nfs40_clear_delegation_stateid(struct nfs4_state *state) +{ + if (rcu_access_pointer(NFS_I(state->inode)->delegation) != NULL) + nfs_finish_clear_delegation_stateid(state); +} + +static int nfs40_open_expired(struct nfs4_state_owner *sp, struct nfs4_state *state) +{ + /* NFSv4.0 doesn't allow for delegation recovery on open expire */ + nfs40_clear_delegation_stateid(state); + return nfs4_open_expired(sp, state); +} + #if defined(CONFIG_NFS_V4_1) static void nfs41_clear_delegation_stateid(struct nfs4_state *state) { @@ -8330,7 +8352,7 @@ static const struct nfs4_state_recovery_ops nfs41_reboot_recovery_ops = { static const struct nfs4_state_recovery_ops nfs40_nograce_recovery_ops = { .owner_flag_bit = NFS_OWNER_RECLAIM_NOGRACE, .state_flag_bit = NFS_STATE_RECLAIM_NOGRACE, - .recover_open = nfs4_open_expired, + .recover_open = nfs40_open_expired, .recover_lock = nfs4_lock_expired, .establish_clid = nfs4_init_clientid, }; -- cgit v1.2.1 From 0c116cadd94b16b30b1dd90d38b2784d9b39b01a Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Wed, 12 Nov 2014 14:44:49 -0500 Subject: NFSv4.1: nfs41_clear_delegation_stateid shouldn't trust NFS_DELEGATED_STATE This patch removes the assumption made previously, that we only need to check the delegation stateid when it matches the stateid on a cached open. 
If we believe that we hold a delegation for this file, then we must assume that its stateid may have been revoked or expired too. If we don't test it then our state recovery process may end up caching open/lock state in a situation where it should not. We therefore rename the function nfs41_clear_delegation_stateid as nfs41_check_delegation_stateid, and change it to always run through the delegation stateid test and recovery process as outlined in RFC5661. http://lkml.kernel.org/r/CAN-5tyHwG=Cn2Q9KsHWadewjpTTy_K26ee+UnSvHvG4192p-Xw@mail.gmail.com Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust --- fs/nfs/nfs4proc.c | 42 +++++++++++++++++------------------------- 1 file changed, 17 insertions(+), 25 deletions(-) (limited to 'fs') diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index bdd880bddba4..3b98fe752ef8 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -2132,45 +2132,37 @@ static int nfs40_open_expired(struct nfs4_state_owner *sp, struct nfs4_state *st } #if defined(CONFIG_NFS_V4_1) -static void nfs41_clear_delegation_stateid(struct nfs4_state *state) +static void nfs41_check_delegation_stateid(struct nfs4_state *state) { struct nfs_server *server = NFS_SERVER(state->inode); - nfs4_stateid *stateid = &state->stateid; + nfs4_stateid stateid; struct nfs_delegation *delegation; - struct rpc_cred *cred = NULL; - int status = -NFS4ERR_BAD_STATEID; - - /* If a state reset has been done, test_stateid is unneeded */ - if (test_bit(NFS_DELEGATED_STATE, &state->flags) == 0) - return; + struct rpc_cred *cred; + int status; /* Get the delegation credential for use by test/free_stateid */ rcu_read_lock(); delegation = rcu_dereference(NFS_I(state->inode)->delegation); - if (delegation != NULL && - nfs4_stateid_match(&delegation->stateid, stateid)) { - cred = get_rpccred(delegation->cred); - rcu_read_unlock(); - status = nfs41_test_stateid(server, stateid, cred); - trace_nfs4_test_delegation_stateid(state, NULL, status); - } else + if (delegation == NULL) { rcu_read_unlock(); + return; + } + + nfs4_stateid_copy(&stateid, &delegation->stateid); + cred = get_rpccred(delegation->cred); + rcu_read_unlock(); + status = nfs41_test_stateid(server, &stateid, cred); + trace_nfs4_test_delegation_stateid(state, NULL, status); if (status != NFS_OK) { /* Free the stateid unless the server explicitly * informs us the stateid is unrecognized. */ if (status != -NFS4ERR_BAD_STATEID) - nfs41_free_stateid(server, stateid, cred); - nfs_remove_bad_delegation(state->inode); - - write_seqlock(&state->seqlock); - nfs4_stateid_copy(&state->stateid, &state->open_stateid); - write_sequnlock(&state->seqlock); - clear_bit(NFS_DELEGATED_STATE, &state->flags); + nfs41_free_stateid(server, &stateid, cred); + nfs_finish_clear_delegation_stateid(state); } - if (cred != NULL) - put_rpccred(cred); + put_rpccred(cred); } /** @@ -2214,7 +2206,7 @@ static int nfs41_open_expired(struct nfs4_state_owner *sp, struct nfs4_state *st { int status; - nfs41_clear_delegation_stateid(state); + nfs41_check_delegation_stateid(state); status = nfs41_check_open_stateid(state); if (status != NFS_OK) status = nfs4_open_expired(sp, state); -- cgit v1.2.1 From 869f9dfa4d6d57b79e0afc3af14772c2a023eeb1 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Mon, 10 Nov 2014 18:43:56 -0500 Subject: NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return Any attempt to call nfs_remove_bad_delegation() while a delegation is being returned is currently a no-op. 
This means that we can end up looping forever in nfs_end_delegation_return() if something causes the delegation to be revoked. This patch adds a mechanism whereby the state recovery code can communicate to the delegation return code that the delegation is no longer valid and that it should not be used when reclaiming state. It also changes the return value for nfs4_handle_delegation_recall_error() to ensure that nfs_end_delegation_return() does not reattempt the lock reclaim before state recovery is done. http://lkml.kernel.org/r/CAN-5tyHwG=Cn2Q9KsHWadewjpTTy_K26ee+UnSvHvG4192p-Xw@mail.gmail.com Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust --- fs/nfs/delegation.c | 23 +++++++++++++++++++++-- fs/nfs/delegation.h | 1 + fs/nfs/nfs4proc.c | 2 +- 3 files changed, 23 insertions(+), 3 deletions(-) (limited to 'fs') diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c index 5853f53db732..e5f473d13e24 100644 --- a/fs/nfs/delegation.c +++ b/fs/nfs/delegation.c @@ -193,7 +193,11 @@ static int nfs_do_return_delegation(struct inode *inode, struct nfs_delegation * { int res = 0; - res = nfs4_proc_delegreturn(inode, delegation->cred, &delegation->stateid, issync); + if (!test_bit(NFS_DELEGATION_REVOKED, &delegation->flags)) + res = nfs4_proc_delegreturn(inode, + delegation->cred, + &delegation->stateid, + issync); nfs_free_delegation(delegation); return res; } @@ -380,11 +384,13 @@ static int nfs_end_delegation_return(struct inode *inode, struct nfs_delegation { struct nfs_client *clp = NFS_SERVER(inode)->nfs_client; struct nfs_inode *nfsi = NFS_I(inode); - int err; + int err = 0; if (delegation == NULL) return 0; do { + if (test_bit(NFS_DELEGATION_REVOKED, &delegation->flags)) + break; err = nfs_delegation_claim_opens(inode, &delegation->stateid); if (!issync || err != -EAGAIN) break; @@ -605,10 +611,23 @@ static void nfs_client_mark_return_unused_delegation_types(struct nfs_client *cl rcu_read_unlock(); } +static void nfs_revoke_delegation(struct inode *inode) +{ + struct nfs_delegation *delegation; + rcu_read_lock(); + delegation = rcu_dereference(NFS_I(inode)->delegation); + if (delegation != NULL) { + set_bit(NFS_DELEGATION_REVOKED, &delegation->flags); + nfs_mark_return_delegation(NFS_SERVER(inode), delegation); + } + rcu_read_unlock(); +} + void nfs_remove_bad_delegation(struct inode *inode) { struct nfs_delegation *delegation; + nfs_revoke_delegation(inode); delegation = nfs_inode_detach_delegation(inode); if (delegation) { nfs_inode_find_state_and_recover(inode, &delegation->stateid); diff --git a/fs/nfs/delegation.h b/fs/nfs/delegation.h index 5c1cce39297f..e3c20a3ccc93 100644 --- a/fs/nfs/delegation.h +++ b/fs/nfs/delegation.h @@ -31,6 +31,7 @@ enum { NFS_DELEGATION_RETURN_IF_CLOSED, NFS_DELEGATION_REFERENCED, NFS_DELEGATION_RETURNING, + NFS_DELEGATION_REVOKED, }; int nfs_inode_set_delegation(struct inode *inode, struct rpc_cred *cred, struct nfs_openres *res); diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 3b98fe752ef8..4b7166f4e1cf 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -1654,7 +1654,7 @@ static int nfs4_handle_delegation_recall_error(struct nfs_server *server, struct nfs_inode_find_state_and_recover(state->inode, stateid); nfs4_schedule_stateid_recovery(server, state); - return 0; + return -EAGAIN; case -NFS4ERR_DELAY: case -NFS4ERR_GRACE: set_bit(NFS_DELEGATED_STATE, &state->flags); -- cgit v1.2.1 From c606bb8857921d3ecf4d353942d6cc7e116cc75a Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Fri, 17 Oct 2014 15:15:13 +0300 Subject: NFSv4: 
Ensure that we call FREE_STATEID when NFSv4.x stateids are revoked NFSv4.x (x>0) requires us to call TEST_STATEID+FREE_STATEID if a stateid is revoked. We will currently fail to do this if the stateid is a delegation. http://lkml.kernel.org/r/CAN-5tyHwG=Cn2Q9KsHWadewjpTTy_K26ee+UnSvHvG4192p-Xw@mail.gmail.com Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust --- fs/nfs/filelayout/filelayout.c | 3 --- fs/nfs/nfs4proc.c | 8 -------- 2 files changed, 11 deletions(-) (limited to 'fs') diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c index 46fab1cb455a..7afb52f6a25a 100644 --- a/fs/nfs/filelayout/filelayout.c +++ b/fs/nfs/filelayout/filelayout.c @@ -145,9 +145,6 @@ static int filelayout_async_handle_error(struct rpc_task *task, case -NFS4ERR_DELEG_REVOKED: case -NFS4ERR_ADMIN_REVOKED: case -NFS4ERR_BAD_STATEID: - if (state == NULL) - break; - nfs_remove_bad_delegation(state->inode); case -NFS4ERR_OPENMODE: if (state == NULL) break; diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 4b7166f4e1cf..69dc20a743f9 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -370,11 +370,6 @@ static int nfs4_handle_exception(struct nfs_server *server, int errorcode, struc case -NFS4ERR_DELEG_REVOKED: case -NFS4ERR_ADMIN_REVOKED: case -NFS4ERR_BAD_STATEID: - if (inode != NULL && nfs4_have_delegation(inode, FMODE_READ)) { - nfs_remove_bad_delegation(inode); - exception->retry = 1; - break; - } if (state == NULL) break; ret = nfs4_schedule_stateid_recovery(server, state); @@ -4844,9 +4839,6 @@ nfs4_async_handle_error(struct rpc_task *task, const struct nfs_server *server, case -NFS4ERR_DELEG_REVOKED: case -NFS4ERR_ADMIN_REVOKED: case -NFS4ERR_BAD_STATEID: - if (state == NULL) - break; - nfs_remove_bad_delegation(state->inode); case -NFS4ERR_OPENMODE: if (state == NULL) break; -- cgit v1.2.1 From f8ebf7a8ca35dde321f0cd385fee6f1950609367 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Fri, 17 Oct 2014 23:02:52 +0300 Subject: NFS: Don't try to reclaim delegation open state if recovery failed If state recovery failed, then we should not attempt to reclaim delegated state. http://lkml.kernel.org/r/CAN-5tyHwG=Cn2Q9KsHWadewjpTTy_K26ee+UnSvHvG4192p-Xw@mail.gmail.com Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust --- fs/nfs/delegation.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'fs') diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c index e5f473d13e24..7f3f60641344 100644 --- a/fs/nfs/delegation.c +++ b/fs/nfs/delegation.c @@ -125,6 +125,8 @@ again: continue; if (!test_bit(NFS_DELEGATED_STATE, &state->flags)) continue; + if (!nfs4_valid_open_stateid(state)) + continue; if (!nfs4_stateid_match(&state->stateid, stateid)) continue; get_nfs_open_context(ctx); -- cgit v1.2.1 From 3231300bb986947a6b74e7075d84a2f434e4d788 Mon Sep 17 00:00:00 2001 From: "Yan, Zheng" Date: Wed, 22 Oct 2014 17:13:26 -0700 Subject: ceph: fix flush tid comparison TID of cap flush ack is 64 bits, but ceph_inode_info::flushing_cap_tid is only 16 bits. 16 bits should be plenty to let the cap flush updates pipeline appropriately, but we need to cast in the proper direction when comparing these differently-sized versions. So downcast the 64-bit one to 16 bits. Reflects ceph.git commit a5184cf46a6e867287e24aeb731634828467cd98.
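The need for the cast is easy to reproduce in userspace. The following is a small standalone sketch, not part of the patch: stdint types stand in for the kernel's u64/u16 and the TID values are made up. It shows that the usual arithmetic conversions widen the 16-bit value, so the full-width comparison fails as soon as the 64-bit TID grows past 16 bits, while the explicit downcast compares only the low 16 bits that are actually stored:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t flush_tid = 0x10002;	/* hypothetical 64-bit TID from the ack */
	uint16_t stored_tid = 0x0002;	/* hypothetical 16-bit TID kept per cap bit */

	/* stored_tid is converted up to 64 bits, so this prints 0 (no match). */
	printf("full-width compare: %d\n", flush_tid == stored_tid);

	/* Downcasting keeps only the low 16 bits, so this prints 1 (match). */
	printf("downcast compare:   %d\n", (uint16_t)flush_tid == stored_tid);
	return 0;
}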
Signed-off-by: Yan, Zheng Reviewed-by: Ilya Dryomov --- fs/ceph/caps.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c index 659f2ea9e6f7..cefca661464b 100644 --- a/fs/ceph/caps.c +++ b/fs/ceph/caps.c @@ -2638,7 +2638,7 @@ static void handle_cap_flush_ack(struct inode *inode, u64 flush_tid, for (i = 0; i < CEPH_CAP_BITS; i++) if ((dirty & (1 << i)) && - flush_tid == ci->i_cap_flush_tid[i]) + (u16)flush_tid == ci->i_cap_flush_tid[i]) cleaned |= 1 << i; dout("handle_cap_flush_ack inode %p mds%d seq %d on %s cleaned %s," -- cgit v1.2.1 From 8edc6e1688fc8f02c8c1f53a2ec4928cb1055f4d Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Thu, 13 Nov 2014 15:19:33 -0800 Subject: fanotify: fix notification of groups with inode & mount marks fsnotify() needs to merge inode and mount marks lists when notifying groups about events so that ignore masks from inode marks are reflected in mount mark notifications and groups are notified in proper order (according to priorities). Currently the sorting of the lists done by fsnotify_add_inode_mark() / fsnotify_add_vfsmount_mark() and fsnotify() differed, which resulted in ignore masks not being used in some cases. Fix the problem by always using the same comparison function when sorting / merging the mark lists. Thanks to Heinrich Schuchardt for improvements of my patch. Link: https://bugzilla.kernel.org/show_bug.cgi?id=87721 Signed-off-by: Jan Kara Reported-by: Heinrich Schuchardt Tested-by: Heinrich Schuchardt Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/notify/fsnotify.c | 36 +++++++++++++++++++++--------------- fs/notify/fsnotify.h | 4 ++++ fs/notify/inode_mark.c | 8 +++----- fs/notify/mark.c | 36 ++++++++++++++++++++++++++++++++++++ fs/notify/vfsmount_mark.c | 8 +++----- 5 files changed, 67 insertions(+), 25 deletions(-) (limited to 'fs') diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 9d3e9c50066a..89326acd4561 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -229,8 +229,16 @@ int fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is, &fsnotify_mark_srcu); } + /* + * We need to merge inode & vfsmount mark lists so that inode mark + * ignore masks are properly reflected for mount mark notifications. + * That's why this traversal is so complicated...
+ */ while (inode_node || vfsmount_node) { - inode_group = vfsmount_group = NULL; + inode_group = NULL; + inode_mark = NULL; + vfsmount_group = NULL; + vfsmount_mark = NULL; if (inode_node) { inode_mark = hlist_entry(srcu_dereference(inode_node, &fsnotify_mark_srcu), @@ -244,21 +252,19 @@ int fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is, vfsmount_group = vfsmount_mark->group; } - if (inode_group > vfsmount_group) { - /* handle inode */ - ret = send_to_group(to_tell, inode_mark, NULL, mask, - data, data_is, cookie, file_name); - /* we didn't use the vfsmount_mark */ - vfsmount_group = NULL; - } else if (vfsmount_group > inode_group) { - ret = send_to_group(to_tell, NULL, vfsmount_mark, mask, - data, data_is, cookie, file_name); - inode_group = NULL; - } else { - ret = send_to_group(to_tell, inode_mark, vfsmount_mark, - mask, data, data_is, cookie, - file_name); + if (inode_group && vfsmount_group) { + int cmp = fsnotify_compare_groups(inode_group, + vfsmount_group); + if (cmp > 0) { + inode_group = NULL; + inode_mark = NULL; + } else if (cmp < 0) { + vfsmount_group = NULL; + vfsmount_mark = NULL; + } } + ret = send_to_group(to_tell, inode_mark, vfsmount_mark, mask, + data, data_is, cookie, file_name); if (ret && (mask & ALL_FSNOTIFY_PERM_EVENTS)) goto out; diff --git a/fs/notify/fsnotify.h b/fs/notify/fsnotify.h index 9c0898c4cfe1..3b68b0ae0a97 100644 --- a/fs/notify/fsnotify.h +++ b/fs/notify/fsnotify.h @@ -12,6 +12,10 @@ extern void fsnotify_flush_notify(struct fsnotify_group *group); /* protects reads of inode and vfsmount marks list */ extern struct srcu_struct fsnotify_mark_srcu; +/* compare two groups for sorting of marks lists */ +extern int fsnotify_compare_groups(struct fsnotify_group *a, + struct fsnotify_group *b); + extern void fsnotify_set_inode_mark_mask_locked(struct fsnotify_mark *fsn_mark, __u32 mask); /* add a mark to an inode */ diff --git a/fs/notify/inode_mark.c b/fs/notify/inode_mark.c index e8497144b323..dfbf5447eea4 100644 --- a/fs/notify/inode_mark.c +++ b/fs/notify/inode_mark.c @@ -194,6 +194,7 @@ int fsnotify_add_inode_mark(struct fsnotify_mark *mark, { struct fsnotify_mark *lmark, *last = NULL; int ret = 0; + int cmp; mark->flags |= FSNOTIFY_MARK_FLAG_INODE; @@ -219,11 +220,8 @@ int fsnotify_add_inode_mark(struct fsnotify_mark *mark, goto out; } - if (mark->group->priority < lmark->group->priority) - continue; - - if ((mark->group->priority == lmark->group->priority) && - (mark->group < lmark->group)) + cmp = fsnotify_compare_groups(lmark->group, mark->group); + if (cmp < 0) continue; hlist_add_before_rcu(&mark->i.i_list, &lmark->i.i_list); diff --git a/fs/notify/mark.c b/fs/notify/mark.c index d90deaa08e78..34c38fabf514 100644 --- a/fs/notify/mark.c +++ b/fs/notify/mark.c @@ -209,6 +209,42 @@ void fsnotify_set_mark_ignored_mask_locked(struct fsnotify_mark *mark, __u32 mas mark->ignored_mask = mask; } +/* + * Sorting function for lists of fsnotify marks. + * + * Fanotify supports different notification classes (reflected as priority of + * notification group). Events shall be passed to notification groups in + * decreasing priority order. To achieve this marks in notification lists for + * inodes and vfsmounts are sorted so that priorities of corresponding groups + * are descending. + * + * Furthermore correct handling of the ignore mask requires processing inode + * and vfsmount marks of each group together. 
Using the group address as + * further sort criterion provides a unique sorting order and thus we can + * merge inode and vfsmount lists of marks in linear time and find groups + * present in both lists. + * + * A return value of 1 signifies that b has priority over a. + * A return value of 0 signifies that the two marks have to be handled together. + * A return value of -1 signifies that a has priority over b. + */ +int fsnotify_compare_groups(struct fsnotify_group *a, struct fsnotify_group *b) +{ + if (a == b) + return 0; + if (!a) + return 1; + if (!b) + return -1; + if (a->priority < b->priority) + return 1; + if (a->priority > b->priority) + return -1; + if (a < b) + return 1; + return -1; +} + /* * Attach an initialized mark to a given group and fs object. * These marks may be used for the fsnotify backend to determine which diff --git a/fs/notify/vfsmount_mark.c b/fs/notify/vfsmount_mark.c index ac851e8376b1..faefa72a11eb 100644 --- a/fs/notify/vfsmount_mark.c +++ b/fs/notify/vfsmount_mark.c @@ -153,6 +153,7 @@ int fsnotify_add_vfsmount_mark(struct fsnotify_mark *mark, struct mount *m = real_mount(mnt); struct fsnotify_mark *lmark, *last = NULL; int ret = 0; + int cmp; mark->flags |= FSNOTIFY_MARK_FLAG_VFSMOUNT; @@ -178,11 +179,8 @@ int fsnotify_add_vfsmount_mark(struct fsnotify_mark *mark, goto out; } - if (mark->group->priority < lmark->group->priority) - continue; - - if ((mark->group->priority == lmark->group->priority) && - (mark->group < lmark->group)) + cmp = fsnotify_compare_groups(lmark->group, mark->group); + if (cmp < 0) continue; hlist_add_before_rcu(&mark->m.m_list, &lmark->m.m_list); -- cgit v1.2.1 From ef94b1864d1ed5be54376404bb23d22ed0481feb Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Thu, 20 Nov 2014 16:39:59 +0100 Subject: ovl: rename filesystem type to "overlay" Some distributions carry an "old" format of overlayfs while mainline has a "new" format. The distros will possibly want to keep the old overlayfs alongside the new for compatibility reasons. To make it possible to differentiate the two versions change the name of the new one from "overlayfs" to "overlay". Signed-off-by: Miklos Szeredi Reported-by: Serge Hallyn Cc: Andy Whitcroft --- fs/Makefile | 2 +- fs/overlayfs/Kconfig | 2 +- fs/overlayfs/Makefile | 4 ++-- fs/overlayfs/super.c | 6 +++--- 4 files changed, 7 insertions(+), 7 deletions(-) (limited to 'fs') diff --git a/fs/Makefile b/fs/Makefile index 34a1b9dea6dd..da0bbb456d3f 100644 --- a/fs/Makefile +++ b/fs/Makefile @@ -104,7 +104,7 @@ obj-$(CONFIG_QNX6FS_FS) += qnx6/ obj-$(CONFIG_AUTOFS4_FS) += autofs4/ obj-$(CONFIG_ADFS_FS) += adfs/ obj-$(CONFIG_FUSE_FS) += fuse/ -obj-$(CONFIG_OVERLAYFS_FS) += overlayfs/ +obj-$(CONFIG_OVERLAY_FS) += overlayfs/ obj-$(CONFIG_UDF_FS) += udf/ obj-$(CONFIG_SUN_OPENPROMFS) += openpromfs/ obj-$(CONFIG_OMFS_FS) += omfs/ diff --git a/fs/overlayfs/Kconfig b/fs/overlayfs/Kconfig index e60125976873..34355818a2e0 100644 --- a/fs/overlayfs/Kconfig +++ b/fs/overlayfs/Kconfig @@ -1,4 +1,4 @@ -config OVERLAYFS_FS +config OVERLAY_FS tristate "Overlay filesystem support" help An overlay filesystem combines two filesystems - an 'upper' filesystem diff --git a/fs/overlayfs/Makefile b/fs/overlayfs/Makefile index 8f91889480d0..900daed3e91d 100644 --- a/fs/overlayfs/Makefile +++ b/fs/overlayfs/Makefile @@ -2,6 +2,6 @@ # Makefile for the overlay filesystem. 
# -obj-$(CONFIG_OVERLAYFS_FS) += overlayfs.o +obj-$(CONFIG_OVERLAY_FS) += overlay.o -overlayfs-objs := super.o inode.o dir.o readdir.o copy_up.o +overlay-objs := super.o inode.o dir.o readdir.o copy_up.o diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c index 08b704cebfc4..b92bd1829cf7 100644 --- a/fs/overlayfs/super.c +++ b/fs/overlayfs/super.c @@ -24,7 +24,7 @@ MODULE_AUTHOR("Miklos Szeredi "); MODULE_DESCRIPTION("Overlay filesystem"); MODULE_LICENSE("GPL"); -#define OVERLAYFS_SUPER_MAGIC 0x794c764f +#define OVERLAYFS_SUPER_MAGIC 0x794c7630 struct ovl_config { char *lowerdir; @@ -776,11 +776,11 @@ static struct dentry *ovl_mount(struct file_system_type *fs_type, int flags, static struct file_system_type ovl_fs_type = { .owner = THIS_MODULE, - .name = "overlayfs", + .name = "overlay", .mount = ovl_mount, .kill_sb = kill_anon_super, }; -MODULE_ALIAS_FS("overlayfs"); +MODULE_ALIAS_FS("overlay"); static int __init ovl_init(void) { -- cgit v1.2.1 From a105d685a8483985a01776411de191a726b48132 Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Thu, 20 Nov 2014 16:39:59 +0100 Subject: ovl: fix remove/copy-up race ovl_remove_and_whiteout() needs to check if upper dentry exists or not after having locked upper parent directory. Previously we used a "type" value computed before locking the upper parent directory, which is susceptible to racing with copy-up. There's a similar check in ovl_check_empty_and_clear(). This one is not actually racy, since copy-up doesn't change the "emptyness" property of a directory. Add a comment to this effect, and check the existence of upper dentry locally to make the code cleaner. Signed-off-by: Miklos Szeredi --- fs/overlayfs/dir.c | 31 +++++++++++++++++++------------ 1 file changed, 19 insertions(+), 12 deletions(-) (limited to 'fs') diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c index 15cd91ad9940..8ffc4b980f1b 100644 --- a/fs/overlayfs/dir.c +++ b/fs/overlayfs/dir.c @@ -284,8 +284,7 @@ out: return ERR_PTR(err); } -static struct dentry *ovl_check_empty_and_clear(struct dentry *dentry, - enum ovl_path_type type) +static struct dentry *ovl_check_empty_and_clear(struct dentry *dentry) { int err; struct dentry *ret = NULL; @@ -294,8 +293,17 @@ static struct dentry *ovl_check_empty_and_clear(struct dentry *dentry, err = ovl_check_empty_dir(dentry, &list); if (err) ret = ERR_PTR(err); - else if (type == OVL_PATH_MERGE) - ret = ovl_clear_empty(dentry, &list); + else { + /* + * If no upperdentry then skip clearing whiteouts. + * + * Can race with copy-up, since we don't hold the upperdir + * mutex. Doesn't matter, since copy-up can't create a + * non-empty directory from an empty one. 
+ */ + if (ovl_dentry_upper(dentry)) + ret = ovl_clear_empty(dentry, &list); + } ovl_cache_free(&list); @@ -487,8 +495,7 @@ out: return err; } -static int ovl_remove_and_whiteout(struct dentry *dentry, - enum ovl_path_type type, bool is_dir) +static int ovl_remove_and_whiteout(struct dentry *dentry, bool is_dir) { struct dentry *workdir = ovl_workdir(dentry); struct inode *wdir = workdir->d_inode; @@ -500,7 +507,7 @@ static int ovl_remove_and_whiteout(struct dentry *dentry, int err; if (is_dir) { - opaquedir = ovl_check_empty_and_clear(dentry, type); + opaquedir = ovl_check_empty_and_clear(dentry); err = PTR_ERR(opaquedir); if (IS_ERR(opaquedir)) goto out; @@ -515,9 +522,10 @@ static int ovl_remove_and_whiteout(struct dentry *dentry, if (IS_ERR(whiteout)) goto out_unlock; - if (type == OVL_PATH_LOWER) { + upper = ovl_dentry_upper(dentry); + if (!upper) { upper = lookup_one_len(dentry->d_name.name, upperdir, - dentry->d_name.len); + dentry->d_name.len); err = PTR_ERR(upper); if (IS_ERR(upper)) goto kill_whiteout; @@ -529,7 +537,6 @@ static int ovl_remove_and_whiteout(struct dentry *dentry, } else { int flags = 0; - upper = ovl_dentry_upper(dentry); if (opaquedir) upper = opaquedir; err = -ESTALE; @@ -648,7 +655,7 @@ static int ovl_do_remove(struct dentry *dentry, bool is_dir) cap_raise(override_cred->cap_effective, CAP_CHOWN); old_cred = override_creds(override_cred); - err = ovl_remove_and_whiteout(dentry, type, is_dir); + err = ovl_remove_and_whiteout(dentry, is_dir); revert_creds(old_cred); put_cred(override_cred); @@ -781,7 +788,7 @@ static int ovl_rename2(struct inode *olddir, struct dentry *old, } if (overwrite && (new_type == OVL_PATH_LOWER || new_type == OVL_PATH_MERGE) && new_is_dir) { - opaquedir = ovl_check_empty_and_clear(new, new_type); + opaquedir = ovl_check_empty_and_clear(new); err = PTR_ERR(opaquedir); if (IS_ERR(opaquedir)) { opaquedir = NULL; -- cgit v1.2.1 From 521484639ec19a6f1ed56de6993feb255f5f676c Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Thu, 20 Nov 2014 16:40:00 +0100 Subject: ovl: fix race in private xattr checks Xattr operations can race with copy up. This does not matter as long as we consistently filter out the "trusted.overlay.opaque" attribute on upper directories. Previously we checked parent against OVL_PATH_MERGE. This is too general, and prone to race with copy-up. I.e. we found the parent to be on the lower layer but ovl_dentry_real() would return the copied-up dentry, possibly with the "opaque" attribute. So instead use ovl_path_real() and decide to filter the attributes based on the actual type of the dentry we'll use.
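As an aside, the filtering itself is simple to model outside the kernel. The sketch below assumes the private namespace is the "trusted.overlay." prefix (as with the "trusted.overlay.opaque" attribute above) and strips such names from a listxattr-style buffer of NUL-separated strings; it is a toy illustration of the idea, not the kernel's ovl_listxattr()/ovl_getxattr() code:

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Names under this prefix are overlayfs bookkeeping and should not be shown
 * to userspace when listing an upper directory (assumed prefix). */
#define OVL_XATTR_PREFIX "trusted.overlay."

static bool is_private_xattr(const char *name)
{
	return strncmp(name, OVL_XATTR_PREFIX, strlen(OVL_XATTR_PREFIX)) == 0;
}

/* Filter a listxattr-style buffer of NUL-separated names in place and
 * return the new length. */
static size_t filter_private(char *list, size_t len)
{
	size_t in = 0, out = 0;

	while (in < len) {
		size_t n = strlen(list + in) + 1;	/* include the NUL */

		if (!is_private_xattr(list + in)) {
			memmove(list + out, list + in, n);
			out += n;
		}
		in += n;
	}
	return out;
}

int main(void)
{
	char buf[] = "user.foo\0trusted.overlay.opaque\0security.selinux\0";
	size_t len = filter_private(buf, sizeof(buf) - 1);

	/* Prints user.foo and security.selinux; the private name is gone. */
	for (size_t i = 0; i < len; i += strlen(buf + i) + 1)
		printf("%s\n", buf + i);
	return 0;
}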
Signed-off-by: Miklos Szeredi --- fs/overlayfs/inode.c | 27 ++++++++++++++++++--------- 1 file changed, 18 insertions(+), 9 deletions(-) (limited to 'fs') diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c index af2d18c9fcee..07d74b24913b 100644 --- a/fs/overlayfs/inode.c +++ b/fs/overlayfs/inode.c @@ -235,26 +235,36 @@ out: return err; } +static bool ovl_need_xattr_filter(struct dentry *dentry, + enum ovl_path_type type) +{ + return type == OVL_PATH_UPPER && S_ISDIR(dentry->d_inode->i_mode); +} + ssize_t ovl_getxattr(struct dentry *dentry, const char *name, void *value, size_t size) { - if (ovl_path_type(dentry->d_parent) == OVL_PATH_MERGE && - ovl_is_private_xattr(name)) + struct path realpath; + enum ovl_path_type type = ovl_path_real(dentry, &realpath); + + if (ovl_need_xattr_filter(dentry, type) && ovl_is_private_xattr(name)) return -ENODATA; - return vfs_getxattr(ovl_dentry_real(dentry), name, value, size); + return vfs_getxattr(realpath.dentry, name, value, size); } ssize_t ovl_listxattr(struct dentry *dentry, char *list, size_t size) { + struct path realpath; + enum ovl_path_type type = ovl_path_real(dentry, &realpath); ssize_t res; int off; - res = vfs_listxattr(ovl_dentry_real(dentry), list, size); + res = vfs_listxattr(realpath.dentry, list, size); if (res <= 0 || size == 0) return res; - if (ovl_path_type(dentry->d_parent) != OVL_PATH_MERGE) + if (!ovl_need_xattr_filter(dentry, type)) return res; /* filter out private xattrs */ @@ -279,17 +289,16 @@ int ovl_removexattr(struct dentry *dentry, const char *name) { int err; struct path realpath; - enum ovl_path_type type; + enum ovl_path_type type = ovl_path_real(dentry, &realpath); err = ovl_want_write(dentry); if (err) goto out; - if (ovl_path_type(dentry->d_parent) == OVL_PATH_MERGE && - ovl_is_private_xattr(name)) + err = -ENODATA; + if (ovl_need_xattr_filter(dentry, type) && ovl_is_private_xattr(name)) goto out_drop_write; - type = ovl_path_real(dentry, &realpath); if (type == OVL_PATH_LOWER) { err = vfs_getxattr(realpath.dentry, name, NULL, 0); if (err < 0) -- cgit v1.2.1 From 91c77947133f7aef851b625701e182d3f99d14a9 Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Thu, 20 Nov 2014 16:40:00 +0100 Subject: ovl: allow filenames with comma Allow option separator (comma) to be escaped with backslash. 
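The escaping rule is: a backslash protects the following character, so "\," is an ordinary comma inside a path rather than an option separator. Below is a userspace rendering of the split-and-unescape steps the patch introduces (ovl_next_opt() and ovl_unescape()), run over a hypothetical option string; it is a sketch of the parsing rule, not the kernel code itself:

#include <stdio.h>
#include <string.h>

/* Return the next option, splitting only on commas that are not
 * preceded by a backslash. */
static char *next_opt(char **s)
{
	char *begin = *s, *p;

	if (!begin)
		return NULL;

	for (p = begin; *p; p++) {
		if (*p == '\\') {
			p++;
			if (!*p)
				break;
		} else if (*p == ',') {
			*p = '\0';
			*s = p + 1;
			return begin;
		}
	}
	*s = NULL;
	return begin;
}

/* Strip the protecting backslashes in place. */
static void unescape(char *s)
{
	char *d = s;

	for (;; s++, d++) {
		if (*s == '\\')
			s++;
		*d = *s;
		if (!*s)
			break;
	}
}

int main(void)
{
	char opts[] = "lowerdir=/lower\\,dir,upperdir=/upper,workdir=/work";
	char *rest = opts, *opt;

	/* Prints three options; the first comes out as lowerdir=/lower,dir. */
	while ((opt = next_opt(&rest)) != NULL) {
		unescape(opt);
		printf("option: %s\n", opt);
	}
	return 0;
}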
Signed-off-by: Miklos Szeredi --- fs/overlayfs/super.c | 48 +++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 45 insertions(+), 3 deletions(-) (limited to 'fs') diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c index b92bd1829cf7..eee7a62e1c0e 100644 --- a/fs/overlayfs/super.c +++ b/fs/overlayfs/super.c @@ -462,11 +462,34 @@ static const match_table_t ovl_tokens = { {OPT_ERR, NULL} }; +static char *ovl_next_opt(char **s) +{ + char *sbegin = *s; + char *p; + + if (sbegin == NULL) + return NULL; + + for (p = sbegin; *p; p++) { + if (*p == '\\') { + p++; + if (!*p) + break; + } else if (*p == ',') { + *p = '\0'; + *s = p + 1; + return sbegin; + } + } + *s = NULL; + return sbegin; +} + static int ovl_parse_opt(char *opt, struct ovl_config *config) { char *p; - while ((p = strsep(&opt, ",")) != NULL) { + while ((p = ovl_next_opt(&opt)) != NULL) { int token; substring_t args[MAX_OPT_ARGS]; @@ -554,15 +577,34 @@ out_dput: goto out_unlock; } +static void ovl_unescape(char *s) +{ + char *d = s; + + for (;; s++, d++) { + if (*s == '\\') + s++; + *d = *s; + if (!*s) + break; + } +} + static int ovl_mount_dir(const char *name, struct path *path) { int err; + char *tmp = kstrdup(name, GFP_KERNEL); + + if (!tmp) + return -ENOMEM; - err = kern_path(name, LOOKUP_FOLLOW, path); + ovl_unescape(tmp); + err = kern_path(tmp, LOOKUP_FOLLOW, path); if (err) { - pr_err("overlayfs: failed to resolve '%s': %i\n", name, err); + pr_err("overlayfs: failed to resolve '%s': %i\n", tmp, err); err = -EINVAL; } + kfree(tmp); return err; } -- cgit v1.2.1 From 71d509280f7e92eb60ae6b7c78c20afafff060c7 Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Thu, 20 Nov 2014 16:40:01 +0100 Subject: ovl: use lockless_dereference() for upperdentry Don't open code lockless_dereference() in ovl_upperdentry_dereference(). Signed-off-by: Miklos Szeredi --- fs/overlayfs/super.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) (limited to 'fs') diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c index eee7a62e1c0e..f16d318b71f8 100644 --- a/fs/overlayfs/super.c +++ b/fs/overlayfs/super.c @@ -84,12 +84,7 @@ enum ovl_path_type ovl_path_type(struct dentry *dentry) static struct dentry *ovl_upperdentry_dereference(struct ovl_entry *oe) { - struct dentry *upperdentry = ACCESS_ONCE(oe->__upperdentry); - /* - * Make sure to order reads to upperdentry wrt ovl_dentry_update() - */ - smp_read_barrier_depends(); - return upperdentry; + return lockless_dereference(oe->__upperdentry); } void ovl_path_upper(struct dentry *dentry, struct path *path) -- cgit v1.2.1 From c9f00fdb9ab3999cb2fb582ad82a5db9e70c82f5 Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Thu, 20 Nov 2014 16:40:01 +0100 Subject: ovl: pass dentry into ovl_dir_read_merged() Pass dentry into ovl_dir_read_merged() instead of upperpath and lowerpath. This cleans up callers and paves the way for multi-layer directory reads.
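For readers new to the merged directory listing that ovl_dir_read_merged() produces, here is a toy model of the visibility rule only: upper entries shadow same-named lower entries, and an upper whiteout hides the lower entry entirely. The kernel version also caches entries, marks whiteouts while reading the upper directory and keeps offsets stable, none of which is modelled here; the file names are invented:

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct entry {
	const char *name;
	bool is_whiteout;	/* meaningful for upper entries only */
};

/* Print the merged view: upper entries that are not whiteouts, plus lower
 * entries that are neither shadowed nor whited-out by an upper entry. */
static void print_merged(const struct entry *upper, size_t nupper,
			 const struct entry *lower, size_t nlower)
{
	for (size_t i = 0; i < nupper; i++)
		if (!upper[i].is_whiteout)
			printf("%s\n", upper[i].name);

	for (size_t i = 0; i < nlower; i++) {
		bool hidden = false;

		for (size_t j = 0; j < nupper; j++)
			if (strcmp(lower[i].name, upper[j].name) == 0)
				hidden = true;	/* shadowed or whited-out */
		if (!hidden)
			printf("%s\n", lower[i].name);
	}
}

int main(void)
{
	const struct entry upper[] = {
		{ "edited.txt", false },	/* copied up, shadows the lower copy */
		{ "deleted.txt", true },	/* whiteout hides the lower file */
	};
	const struct entry lower[] = {
		{ "edited.txt", false },
		{ "deleted.txt", false },
		{ "untouched.txt", false },
	};

	/* Prints edited.txt and untouched.txt only. */
	print_merged(upper, sizeof(upper) / sizeof(upper[0]),
		     lower, sizeof(lower) / sizeof(lower[0]));
	return 0;
}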
Signed-off-by: Miklos Szeredi --- fs/overlayfs/readdir.c | 35 ++++++++++++++--------------------- 1 file changed, 14 insertions(+), 21 deletions(-) (limited to 'fs') diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c index 2a7ef4f8e2a6..7299e962f334 100644 --- a/fs/overlayfs/readdir.c +++ b/fs/overlayfs/readdir.c @@ -274,11 +274,11 @@ static int ovl_dir_mark_whiteouts(struct dentry *dir, return 0; } -static inline int ovl_dir_read_merged(struct path *upperpath, - struct path *lowerpath, - struct list_head *list) +static int ovl_dir_read_merged(struct dentry *dentry, struct list_head *list) { int err; + struct path lowerpath; + struct path upperpath; struct ovl_readdir_data rdd = { .ctx.actor = ovl_fill_merge, .list = list, @@ -286,25 +286,28 @@ static inline int ovl_dir_read_merged(struct path *upperpath, .is_merge = false, }; - if (upperpath->dentry) { - err = ovl_dir_read(upperpath, &rdd); + ovl_path_lower(dentry, &lowerpath); + ovl_path_upper(dentry, &upperpath); + + if (upperpath.dentry) { + err = ovl_dir_read(&upperpath, &rdd); if (err) goto out; - if (lowerpath->dentry) { - err = ovl_dir_mark_whiteouts(upperpath->dentry, &rdd); + if (lowerpath.dentry) { + err = ovl_dir_mark_whiteouts(upperpath.dentry, &rdd); if (err) goto out; } } - if (lowerpath->dentry) { + if (lowerpath.dentry) { /* * Insert lowerpath entries before upperpath ones, this allows * offsets to be reasonably constant */ list_add(&rdd.middle, rdd.list); rdd.is_merge = true; - err = ovl_dir_read(lowerpath, &rdd); + err = ovl_dir_read(&lowerpath, &rdd); list_del(&rdd.middle); } out: @@ -329,8 +332,6 @@ static void ovl_seek_cursor(struct ovl_dir_file *od, loff_t pos) static struct ovl_dir_cache *ovl_cache_get(struct dentry *dentry) { int res; - struct path lowerpath; - struct path upperpath; struct ovl_dir_cache *cache; cache = ovl_dir_cache(dentry); @@ -347,10 +348,7 @@ static struct ovl_dir_cache *ovl_cache_get(struct dentry *dentry) cache->refcount = 1; INIT_LIST_HEAD(&cache->entries); - ovl_path_lower(dentry, &lowerpath); - ovl_path_upper(dentry, &upperpath); - - res = ovl_dir_read_merged(&upperpath, &lowerpath, &cache->entries); + res = ovl_dir_read_merged(dentry, &cache->entries); if (res) { ovl_cache_free(&cache->entries); kfree(cache); @@ -538,14 +536,9 @@ const struct file_operations ovl_dir_operations = { int ovl_check_empty_dir(struct dentry *dentry, struct list_head *list) { int err; - struct path lowerpath; - struct path upperpath; struct ovl_cache_entry *p; - ovl_path_upper(dentry, &upperpath); - ovl_path_lower(dentry, &lowerpath); - - err = ovl_dir_read_merged(&upperpath, &lowerpath, list); + err = ovl_dir_read_merged(dentry, list); if (err) return err; -- cgit v1.2.1 From 7676895f4736421ebafc48de5078e25ea69e88ee Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Thu, 20 Nov 2014 16:40:02 +0100 Subject: ovl: ovl_dir_fsync() cleanup Check against !OVL_PATH_LOWER instead of OVL_PATH_MERGE. For a copied up directory the two are currently equivalent. 
Signed-off-by: Miklos Szeredi --- fs/overlayfs/readdir.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'fs') diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c index 7299e962f334..ab1e3dcbed95 100644 --- a/fs/overlayfs/readdir.c +++ b/fs/overlayfs/readdir.c @@ -450,10 +450,10 @@ static int ovl_dir_fsync(struct file *file, loff_t start, loff_t end, /* * Need to check if we started out being a lower dir, but got copied up */ - if (!od->is_upper && ovl_path_type(dentry) == OVL_PATH_MERGE) { + if (!od->is_upper && ovl_path_type(dentry) != OVL_PATH_LOWER) { struct inode *inode = file_inode(file); - realfile =lockless_dereference(od->upperfile); + realfile = lockless_dereference(od->upperfile); if (!realfile) { struct path upperpath; -- cgit v1.2.1