summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'char-misc-4.8-rc1' of ↵Linus Torvalds2016-07-241-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver updates from Greg KH: "Here is the big char/misc driver update for 4.8-rc1. Not a lot of stuff, but it's all over the place, full details are in the shortlog. All of these have been in linux-next with no reported issues for a while" * tag 'char-misc-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (49 commits) lkdtm: silence warnings about function declarations lkdtm: hide unused functions intel_th: pci: Add Kaby Lake PCH-H support intel_th: Fix a deadlock in modprobing dsp56k: prevent a harmless underflow chardev: add missing line break in pr_warn lkdtm: use struct arrays instead of enums lkdtm: move jprobe entry points to start of source lkdtm: reorganize module paramaters lkdtm: rename globals for clarity lkdtm: rename "count" to "crash_count" lkdtm: remove intentional off-by-one array access lkdtm: split remaining logic bug tests to separate file lkdtm: split heap corruption tests to separate file lkdtm: split memory permissions tests to separate file lkdtm: split usercopy tests to separate file lkdtm: drop "alloc_size" parameter lkdtm: add usercopy test for blocking kernel text extcon: adc-jack: add suspend/resume support extcon: add missing of_node_put after calling of_parse_phandle ...
| * chardev: add missing line break in pr_warnFengguang Wu2016-07-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To fix super long dmesg error lines like CHRDEV "dummy_stm.0" major number 224 goes below the dynamic allocation rangeCHRDEV "dummy_stm.1" major number 223 goes below the dynamic allocation rangeswapper: page allocation failure: order:8, mode:0x26040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK) After fix, it should look like CHRDEV "dummy_stm.0" major number 224 goes below the dynamic allocation range CHRDEV "dummy_stm.1" major number 223 goes below the dynamic allocation range swapper: page allocation failure: order:8, mode:0x26040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK) Reported-by: Philip Li <philip.li@intel.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge tag 'gfs2-4.7.fixes' of ↵Linus Torvalds2016-07-2418-116/+179
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Bob Peterson: "We've got ten patches this time, half of which are related to a plethora of nasty outcomes when inodes are transitioned from the unlinked state to the free state. Small file systems are particularly vulnerable to these problems, and it can manifest as mainly hangs, but also file system corruption. The patches have been tested for literally many weeks, with a very gruelling test, so I have a high level of confidence. - Andreas Gruenbacher wrote a series of five patches for various lockups during the transition of inodes from unlinked to free. The main patch is titled "Fix gfs2_lookup_by_inum lock inversion" and the other four are support and cleanup patches related to that. - Ben Marzinski contributed two patches with regard to a recreatable problem when gfs2 tries to write a page to a file that is being truncated, resulting in a BUG() in gfs2_remove_from_journal. Note that Ben had to export vfs function __block_write_full_page to get this to work properly. It's been posted a long time and he talked to various VFS people about it, and nobody seemed to mind. - I contributed 3 patches: o The first one fixes a memory corruptor: a race in which one process can overwrite the gl_object pointer set by another process, causing kernel panic and other symptoms. o The second patch fixes another race that resulted in a false-positive BUG_ON. This occurred when resource group reservations were freed by one process while another process was trying to grab a new reservation in the same resource group. o The third patch fixes a problem with doing journal replay when the journals are not all the same size" * tag 'gfs2-4.7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: GFS2: Fix gfs2_replay_incr_blk for multiple journal sizes GFS2: Check rs_free with rd_rsspin protection gfs2: writeout truncated pages fs: export __block_write_full_page gfs2: Lock holder cleanup gfs2: Large-filesystem fix for 32-bit systems gfs2: Get rid of gfs2_ilookup gfs2: Fix gfs2_lookup_by_inum lock inversion gfs2: Initialize iopen glock holder for new inodes GFS2: don't set rgrp gl_object until it's inserted into rgrp tree
| * | GFS2: Fix gfs2_replay_incr_blk for multiple journal sizesBob Peterson2016-07-213-11/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before this patch, if you used gfs2_jadd to add new journals of a size smaller than the existing journals, replaying those new journals would withdraw. That's because function gfs2_replay_incr_blk was using the number of journal blocks (jd_block) from the superblock's journal pointer. In other words, "My journal's max size" rather than "the journal we're replaying's size." This patch changes the function to use the size of the pertinent journal rather than always using the journal we happen to be using. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
| * | GFS2: Check rs_free with rd_rsspin protectionBob Peterson2016-07-121-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For the last process to close a file opened for write, function gfs2_rsqa_delete was deleting the file's inode's block reservation out of the rgrp reservations tree. Then it was checking to make sure rs_free was 0, but it was performing the check outside the protection of rd_rsspin spin_lock. The rd_rsspin spin_lock protection is needed to prevent a race between the process freeing the reservation and another who is allocating a new set of blocks inside the same rgrp for the same inode, thus changing its value. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
| * | gfs2: writeout truncated pagesBenjamin Marzinski2016-06-271-15/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When gfs2 attempts to write a page to a file that is being truncated, and notices that the page is completely outside of the file size, it tries to invalidate it. However, this may require a transaction for journaled data files to revoke any buffers from the page on the active items list. Unfortunately, this can happen inside a log flush, where a transaction cannot be started. Also, gfs2 may need to be able to remove the buffer from the ail1 list before it can finish the log flush. To deal with this, when writing a page of a file with data journalling enabled gfs2 now skips the check to see if the write is outside the file size, and simply writes it anyway. This situation can only occur when the truncate code still has the file locked exclusively, and hasn't marked this block as free in the metadata (which happens later in truc_dealloc). After gfs2 writes this page out, the truncation code will shortly invalidate it and write out any revokes if necessary. To do this, gfs2 now implements its own version of block_write_full_page without the check, and calls the newly exported __block_write_full_page. It also no longer calls gfs2_writepage_common from gfs2_jdata_writepage. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
| * | fs: export __block_write_full_pageBenjamin Marzinski2016-06-271-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gfs2 needs to be able to skip the check to see if a page is outside of the file size when writing it out. gfs2 can get into a situation where it needs to flush its in-memory log to disk while a truncate is in progress. If the file being trucated has data journaling enabled, it is possible that there are data blocks in the log that are past the end of the file. gfs can't finish the log flush without either writing these blocks out or revoking them. Otherwise, if the node crashed, it could overwrite subsequent changes made by other nodes in the cluster when it's journal was replayed. Unfortunately, there is no way to add log entries to the log during a flush. So gfs2 simply writes out the page instead. This situation can only occur when the truncate code still has the file locked exclusively, and hasn't marked this block as free in the metadata (which happens later in truc_dealloc). After gfs2 writes this page out, the truncation code will shortly invalidate it and write out any revokes if necessary. In order to make this work, gfs2 needs to be able to skip the check for writes outside the file size. Since the check exists in block_write_full_page, this patch exports __block_write_full_page, which doesn't have the check. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
| * | gfs2: Lock holder cleanupAndreas Gruenbacher2016-06-279-35/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make the code more readable by cleaning up the different ways of initializing lock holders and checking for initialized lock holders: mark lock holders as uninitialized by setting the holder's glock to NULL (gfs2_holder_mark_uninitialized) instead of zeroing out the entire object or using a separate flag. Recognize initialized holders by their non-NULL glock (gfs2_holder_initialized). Don't zero out holder objects which are immeditiately initialized via gfs2_holder_init or gfs2_glock_nq_init. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
| * | gfs2: Large-filesystem fix for 32-bit systemsAndreas Gruenbacher2016-06-271-2/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit ff34245d switched from iget5_locked to iget_locked among other things, but iget_locked doesn't work for filesystems larger than 2^32 blocks on 32-bit systems. Switch back to iget5_locked. Filesystems larger than 2^32 blocks are unrealistic to work well on 32-bit systems, so this is mostly a code cleanliness fix. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
| * | gfs2: Get rid of gfs2_ilookupAndreas Gruenbacher2016-06-274-36/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that gfs2_lookup_by_inum only takes the inode glock for new inodes (and not for cached inodes anymore), there no longer is a need to optimize the cached-inode case in gfs2_get_dentry or delete_work_func, and gfs2_ilookup can be removed. In addition, gfs2_get_dentry wasn't checking the GFS2_DIF_SYSTEM flag in i_diskflags in the gfs2_ilookup case (see gfs2_lookup_by_inum); this inconsistency goes away as well. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
| * | gfs2: Fix gfs2_lookup_by_inum lock inversionAndreas Gruenbacher2016-06-275-33/+81
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current gfs2_lookup_by_inum takes the glock of a presumed inode identified by block number, verifies that the block is indeed an inode, and then instantiates and reads the new inode via gfs2_inode_lookup. However, instantiating a new inode may block on freeing a previous instance of that inode (__wait_on_freeing_inode), and freeing an inode requires to take the glock already held, leading to lock inversion and deadlock. Fix this by first instantiating the new inode, then verifying that the block is an inode (if required), and then reading in the new inode, all in gfs2_inode_lookup. If the block we are looking for is not an inode, we discard the new inode via iget_failed, which marks inodes as bad and unhashes them. Other tasks waiting on that inode will get back a bad inode back from ilookup or iget_locked; in that case, retry the lookup. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
| * | gfs2: Initialize iopen glock holder for new inodesAndreas Gruenbacher2016-06-171-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | In gfs2_init_inode_once, initialize inode->i_iopen_gh.gh_gl to NULL: otherwise, when gfs2_inode_lookup fails, the iopen glock holder can remain unset and iget_failed can end up accessing random memory. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
| * | GFS2: don't set rgrp gl_object until it's inserted into rgrp treeBob Peterson2016-06-101-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before this patch, function read_rindex_entry would set a rgrp glock's gl_object pointer to itself before inserting the rgrp into the rgrp rbtree. The problem is: if another process was also reading the rgrp in, and had already inserted its newly created rgrp, then the second call to read_rindex_entry would overwrite that value, then return a bad return code to the caller. Later, other functions would reference the now-freed rgrp memory by way of gl_object. In some cases, that could result in gfs2_rgrp_brelse being called twice for the same rgrp: once for the failed attempt and once for the "real" rgrp release. Eventually the kernel would panic. There are also a number of other things that could go wrong when a kernel module is accessing freed storage. For example, this could result in rgrp corruption because the fake rgrp would point to a fake bitmap in memory too, causing gfs2_inplace_reserve to search some random memory for free blocks, and find some, since we were never setting rgd->rd_bits to NULL before freeing it. This patch fixes the problem by not setting gl_object until we have successfully inserted the rgrp into the rbtree. Also, it sets rd_bits to NULL as it frees them, which will ensure any accidental access to the wrong rgrp will result in a kernel panic rather than file system corruption, which is preferred. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
* | | Merge branch 'overlayfs-linus' of ↵Linus Torvalds2016-07-233-32/+29
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs fixes from Miklos Szeredi: "This contains a fix for a potential crash/corruption issue and another where the suid/sgid bits weren't cleared on write" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: verify upper dentry in ovl_remove_and_whiteout() ovl: Copy up underlying inode's ->i_mode to overlay inode ovl: handle ATTR_KILL*
| * | | ovl: verify upper dentry in ovl_remove_and_whiteout()Maxim Patlasov2016-07-221-30/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The upper dentry may become stale before we call ovl_lock_rename_workdir. For example, someone could (mistakenly or maliciously) manually unlink(2) it directly from upperdir. To ensure it is not stale, let's lookup it after ovl_lock_rename_workdir and and check if it matches the upper dentry. Essentially, it is the same problem and similar solution as in commit 11f3710417d0 ("ovl: verify upper dentry before unlink and rename"). Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org>
| * | | ovl: Copy up underlying inode's ->i_mode to overlay inodeVivek Goyal2016-07-042-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Right now when a new overlay inode is created, we initialize overlay inode's ->i_mode from underlying inode ->i_mode but we retain only file type bits (S_IFMT) and discard permission bits. This patch changes it and retains permission bits too. This should allow overlay to do permission checks on overlay inode itself in task context. [SzM] It also fixes clearing suid/sgid bits on write. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reported-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Cc: <stable@vger.kernel.org>
| * | | ovl: handle ATTR_KILL*Miklos Szeredi2016-07-041-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before 4bacc9c9234c ("overlayfs: Make f_path...") file->f_path pointed to the underlying file, hence suid/sgid removal on write worked fine. After that patch file->f_path pointed to the overlay file, and the file mode bits weren't copied to overlay_inode->i_mode. So the suid/sgid removal simply stopped working. The fix is to copy the mode bits, but then ovl_setattr() needs to clear ATTR_MODE to avoid the BUG() in notify_change(). So do this first, then in the next patch copy the mode. Reported-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Cc: <stable@vger.kernel.org>
* | | | xfs: fix type confusion in xfs_ioc_swapextJann Horn2016-07-161-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Without this check, the following XFS_I invocations would return bad pointers when used on non-XFS inodes (perhaps pointers into preceding allocator chunks). This could be used by an attacker to trick xfs_swap_extents into performing locking operations on attacker-chosen structures in kernel memory, potentially leading to code execution in the kernel. (I have not investigated how likely this is to be usable for an attack in practice.) Signed-off-by: Jann Horn <jann@thejh.net> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge branch 'for-linus' of ↵Linus Torvalds2016-07-127-10/+32
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: posix_acl: de-union a_refcount and a_rcu nfs_atomic_open(): prevent parallel nfs_lookup() on a negative hashed Use the right predicate in ->atomic_open() instances
| * | | | nfs_atomic_open(): prevent parallel nfs_lookup() on a negative hashedAl Viro2016-07-051-3/+25
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | Use the right predicate in ->atomic_open() instancesAl Viro2016-07-057-7/+7
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | ->atomic_open() can be given an in-lookup dentry *or* a negative one found in dcache. Use d_in_lookup() to tell one from another, rather than d_unhashed(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | Merge tag 'ecryptfs-4.7-rc7-fixes' of ↵Linus Torvalds2016-07-084-20/+23
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs Pull eCryptfs fixes from Tyler Hicks: "Provide a more concise fix for CVE-2016-1583: - Additionally fixes linux-stable regressions caused by the cherry-picking of the original fix Some very minor changes that have queued up: - Fix typos in code comments - Remove unnecessary check for NULL before destroying kmem_cache" * tag 'ecryptfs-4.7-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs: ecryptfs: don't allow mmap when the lower fs doesn't support it Revert "ecryptfs: forbid opening files without mmap handler" ecryptfs: fix spelling mistakes eCryptfs: fix typos in comment ecryptfs: drop null test before destroy functions
| * | | | ecryptfs: don't allow mmap when the lower fs doesn't support itJeff Mahoney2016-07-081-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are legitimate reasons to disallow mmap on certain files, notably in sysfs or procfs. We shouldn't emulate mmap support on file systems that don't offer support natively. CVE-2016-1583 Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: stable@vger.kernel.org [tyhicks: clean up f_op check by using ecryptfs_file_to_lower()] Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
| * | | | Revert "ecryptfs: forbid opening files without mmap handler"Jeff Mahoney2016-07-071-11/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 2f36db71009304b3f0b95afacd8eba1f9f046b87. It fixed a local root exploit but also introduced a dependency on the lower file system implementing an mmap operation just to open a file, which is a bit of a heavy hammer. The right fix is to have mmap depend on the existence of the mmap handler instead. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: stable@vger.kernel.org Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
| * | | | ecryptfs: fix spelling mistakesChris J Arges2016-06-202-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Noticed some minor spelling errors when looking through the code. Signed-off-by: Chris J Arges <chris.j.arges@canonical.com> Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
| * | | | eCryptfs: fix typos in commentWei Yuan2016-06-201-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Weiyuan <weiyuan.wei@huawei.com> Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
| * | | | ecryptfs: drop null test before destroy functionsJulia Lawall2016-06-201-2/+1
| | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove unneeded NULL test. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression x; @@ -if (x != NULL) \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x); // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
* | | | Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds2016-07-071-1/+1
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block IO fixes from Jens Axboe: "Three small fixes that have been queued up and tested for this series: - A bug fix for xen-blkfront from Bob Liu, fixing an issue with incomplete requests during migration. - A fix for an ancient issue in retrieving the IO priority of a different PID than self, preventing that task from going away while we access it. From Omar. - A writeback fix from Tahsin, fixing a case where we'd call ihold() with a zero ref count inode" * 'for-linus' of git://git.kernel.dk/linux-block: block: fix use-after-free in sys_ioprio_get() writeback: inode cgroup wb switch should not call ihold() xen-blkfront: save uncompleted reqs in blkfront_resume()
| * | | | writeback: inode cgroup wb switch should not call ihold()Tahsin Erdogan2016-06-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Asynchronous wb switching of inodes takes an additional ref count on an inode to make sure inode remains valid until switchover is completed. However, anyone calling ihold() must already have a ref count on inode, but in this case inode->i_count may already be zero: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 917 at fs/inode.c:397 ihold+0x2b/0x30 CPU: 1 PID: 917 Comm: kworker/u4:5 Not tainted 4.7.0-rc2+ #49 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: writeback wb_workfn (flush-8:16) 0000000000000000 ffff88007ca0fb58 ffffffff805990af 0000000000000000 0000000000000000 ffff88007ca0fb98 ffffffff80268702 0000018d000004e2 ffff88007cef40e8 ffff88007c9b89a8 ffff880079e3a740 0000000000000003 Call Trace: [<ffffffff805990af>] dump_stack+0x4d/0x6e [<ffffffff80268702>] __warn+0xc2/0xe0 [<ffffffff802687d8>] warn_slowpath_null+0x18/0x20 [<ffffffff8035b4ab>] ihold+0x2b/0x30 [<ffffffff80367ecc>] inode_switch_wbs+0x11c/0x180 [<ffffffff80369110>] wbc_detach_inode+0x170/0x1a0 [<ffffffff80369abc>] writeback_sb_inodes+0x21c/0x530 [<ffffffff80369f7e>] wb_writeback+0xee/0x1e0 [<ffffffff8036a147>] wb_workfn+0xd7/0x280 [<ffffffff80287531>] ? try_to_wake_up+0x1b1/0x2b0 [<ffffffff8027bb09>] process_one_work+0x129/0x300 [<ffffffff8027be06>] worker_thread+0x126/0x480 [<ffffffff8098cde7>] ? __schedule+0x1c7/0x561 [<ffffffff8027bce0>] ? process_one_work+0x300/0x300 [<ffffffff80280ff4>] kthread+0xc4/0xe0 [<ffffffff80335578>] ? kfree+0xc8/0x100 [<ffffffff809903cf>] ret_from_fork+0x1f/0x40 [<ffffffff80280f30>] ? __kthread_parkme+0x70/0x70 ---[ end trace aaefd2fd9f306bc4 ]--- Signed-off-by: Tahsin Erdogan <tahsin@google.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
* | | | | Merge tag 'configfs-for-4.7' of git://git.infradead.org/users/hch/configfsLinus Torvalds2016-07-071-2/+0
|\ \ \ \ \ | |_|_|/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull configfs fix from Christoph Hellwig: "A fix from Marek for ppos handling in configfs_write_bin_file, which was introduced in Linux 4.5, but didn't have any users until recently" * tag 'configfs-for-4.7' of git://git.infradead.org/users/hch/configfs: configfs: Remove ppos increment in configfs_write_bin_file
| * | | | configfs: Remove ppos increment in configfs_write_bin_fileMarek Vasut2016-06-301-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The simple_write_to_buffer() already increments the @ppos on success, see fs/libfs.c simple_write_to_buffer() comment: " On success, the number of bytes written is returned and the offset @ppos advanced by this number, or negative value is returned on error. " If the configfs_write_bin_file() is invoked with @count smaller than the total length of the written binary file, it will be invoked multiple times. Since configfs_write_bin_file() increments @ppos on success, after calling simple_write_to_buffer(), the @ppos is incremented twice. Subsequent invocation of configfs_write_bin_file() will result in the next piece of data being written to the offset twice as long as the length of the previous write, thus creating buffer with "holes" in it. The simple testcase using DTO follows: $ mkdir /sys/kernel/config/device-tree/overlays/1 $ dd bs=1 if=foo.dtbo of=/sys/kernel/config/device-tree/overlays/1/dtbo Without this patch, the testcase will result in twice as big buffer in the kernel, which is then passed to the cfs_overlay_item_dtbo_write() . Signed-off-by: Marek Vasut <marex@denx.de> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Christoph Hellwig <hch@lst.de> Cc: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
* | | | | Merge branch 'for-linus' of ↵Linus Torvalds2016-07-033-1/+31
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fix from Miklos Szeredi: "This makes sure userspace filesystems are not broken by the parallel lookups and readdir feature" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: serialize dirops by default
| * | | | | fuse: serialize dirops by defaultMiklos Szeredi2016-06-303-1/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Negotiate with userspace filesystems whether they support parallel readdir and lookup. Disable parallelism by default for fear of breaking fuse filesystems. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 9902af79c01a ("parallel lookups: actual switch to rwsem") Fixes: d9b3dbdcfd62 ("fuse: switch to ->iterate_shared()")
* | | | | | Merge branch 'overlayfs-linus' of ↵Linus Torvalds2016-07-032-8/+33
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs fixes from Miklos Szeredi: "This contains fixes for a dentry leak, a regression in 4.6 noticed by Docker users and missing write access checking in truncate" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: warn instead of error if d_type is not supported ovl: get_write_access() in truncate ovl: fix dentry leak for default_permissions
| * | | | | | ovl: warn instead of error if d_type is not supportedVivek Goyal2016-07-031-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | overlay needs underlying fs to support d_type. Recently I put in a patch in to detect this condition and started failing mount if underlying fs did not support d_type. But this breaks existing configurations over kernel upgrade. Those who are running docker (partially broken configuration) with xfs not supporting d_type, are surprised that after kernel upgrade docker does not run anymore. https://github.com/docker/docker/issues/22937#issuecomment-229881315 So instead of erroring out, detect broken configuration and warn about it. This should allow existing docker setups to continue working after kernel upgrade. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 45aebeaf4f67 ("ovl: Ensure upper filesystem supports d_type") Cc: <stable@vger.kernel.org> 4.6
| * | | | | | ovl: get_write_access() in truncateMiklos Szeredi2016-06-291-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When truncating a file we should check write access on the underlying inode. And we should do so on the lower file as well (before copy-up) for consistency. Original patch and test case by Aihua Zhang. - - >o >o - - test.c - - >o >o - - #include <stdio.h> #include <errno.h> #include <unistd.h> int main(int argc, char *argv[]) { int ret; ret = truncate(argv[0], 4096); if (ret != -1) { fprintf(stderr, "truncate(argv[0]) should have failed\n"); return 1; } if (errno != ETXTBSY) { perror("truncate(argv[0])"); return 1; } return 0; } - - >o >o - - >o >o - - >o >o - - Reported-by: Aihua Zhang <zhangaihua1@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org>
| * | | | | | ovl: fix dentry leak for default_permissionsMiklos Szeredi2016-06-291-3/+5
| |/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When using the 'default_permissions' mount option, ovl_permission() on non-directories was missing a dput(alias), resulting in "BUG Dentry still in use". Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 8d3095f4ad47 ("ovl: default permissions") Cc: <stable@vger.kernel.org> # v4.5+
* | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2016-07-014-48/+78
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "Tmpfs readdir throughput regression fix (this cycle) + some -stable fodder all over the place. One missing bit is Miklos' tonight locks.c fix - NFS folks had already grabbed that one by the time I woke up ;-)" [ The locks.c fix came through the nfsd tree just moments ago ] * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: namespace: update event counter when umounting a deleted dentry 9p: use file_dentry() ceph: fix d_obtain_alias() misuses lockless next_positive() libfs.c: new helper - next_positive() dcache_{readdir,dir_lseek}(): don't bother with nested ->d_lock
| * | | | | | namespace: update event counter when umounting a deleted dentryAndrey Ulanov2016-06-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - m_start() in fs/namespace.c expects that ns->event is incremented each time a mount added or removed from ns->list. - umount_tree() removes items from the list but does not increment event counter, expecting that it's done before the function is called. - There are some codepaths that call umount_tree() without updating "event" counter. e.g. from __detach_mounts(). - When this happens m_start may reuse a cached mount structure that no longer belongs to ns->list (i.e. use after free which usually leads to infinite loop). This change fixes the above problem by incrementing global event counter before invoking umount_tree(). Change-Id: I622c8e84dcb9fb63542372c5dbf0178ee86bb589 Cc: stable@vger.kernel.org Signed-off-by: Andrey Ulanov <andreyu@google.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | 9p: use file_dentry()Miklos Szeredi2016-06-301-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | v9fs may be used as lower layer of overlayfs and accessing f_path.dentry can lead to a crash. In this case it's a NULL pointer dereference in p9_fid_create(). Fix by replacing direct access of file->f_path.dentry with the file_dentry() accessor, which will always return a native object. Reported-by: Alessio Igor Bogani <alessioigorbogani@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Tested-by: Alessio Igor Bogani <alessioigorbogani@gmail.com> Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Cc: <stable@vger.kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | ceph: fix d_obtain_alias() misusesAl Viro2016-06-241-7/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | on failure d_obtain_alias() will have done iput() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | lockless next_positive()Al Viro2016-06-201-5/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | libfs.c: new helper - next_positive()Al Viro2016-06-201-30/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Return nth positive child after given or NULL if there's less than n left. dcache_readdir() and dcache_dir_lseek() switched to it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | dcache_{readdir,dir_lseek}(): don't bother with nested ->d_lockAl Viro2016-06-201-9/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make sure that directory is locked shared in dcache_dir_lseek(); for dcache_readdir() it's already tru, and that's enough to make simple_positive() stable. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | | | | Merge tag 'nfsd-4.7-3' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2016-07-012-4/+11
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull lockd/locks fixes from Bruce Fields: "One fix for lockd soft lookups in an error path, and one fix for file leases on overlayfs" * tag 'nfsd-4.7-3' of git://linux-nfs.org/~bfields/linux: locks: use file_inode() lockd: unregister notifier blocks if the service fails to come up completely
| * | | | | | | locks: use file_inode()Miklos Szeredi2016-07-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (Another one for the f_path debacle.) ltp fcntl33 testcase caused an Oops in selinux_file_send_sigiotask. The reason is that generic_add_lease() used filp->f_path.dentry->inode while all the others use file_inode(). This makes a difference for files opened on overlayfs since the former will point to the overlay inode the latter to the underlying inode. So generic_add_lease() added the lease to the overlay inode and generic_delete_lease() removed it from the underlying inode. When the file was released the lease remained on the overlay inode's lock list, resulting in use after free. Reported-by: Eryu Guan <eguan@redhat.com> Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Cc: <stable@vger.kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | lockd: unregister notifier blocks if the service fails to come up completelyScott Mayhew2016-06-301-3/+10
| | |/ / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the lockd service fails to start up then we need to be sure that the notifier blocks are not registered, otherwise a subsequent start of the service could cause the same notifier to be registered twice, leading to soft lockups. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Cc: stable@vger.kernel.org Fixes: 0751ddf77b6a "lockd: Register callbacks on the inetaddr_chain..." Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | | | | | | Merge branch 'libnvdimm-fixes' of ↵Linus Torvalds2016-07-011-1/+6
|\ \ \ \ \ \ \ | |_|_|/ / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: "1/ Two regression fixes since v4.6: one for the byte order of a sysfs attribute (bz121161) and another for QEMU 2.6's NVDIMM _DSM (ACPI Device Specific Method) implementation that gets tripped up by new auto-probing behavior in the NFIT driver. 2/ A fix tagged for -stable that stops the kernel from clobbering/ignoring changes to the configuration of a 'pfn' instance ("struct page" driver). For example changing the alignment from 2M to 1G may silently revert to 2M if that value is currently stored on media. 3/ A fix from Eric for an xfstests failure in dax. It is not currently tagged for -stable since it requires an 8-exabyte file system to trigger, and there appear to be no user visible side effects" * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: nfit: fix format interface code byte order dax: fix offset overflow in dax_io acpi, nfit: fix acpi_check_dsm() vs zero functions implemented libnvdimm, pfn, dax: fix initialization vs autodetect for mode + alignment
| * | | | | | dax: fix offset overflow in dax_ioEric Sandeen2016-06-271-1/+6
| | |_|_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This isn't functionally apparent for some reason, but when we test io at extreme offsets at the end of the loff_t rang, such as in fstests xfs/071, the calculation of "max" in dax_io() can be wrong due to pos + size overflowing. For example, # xfs_io -c "pwrite 9223372036854771712 512" /mnt/test/file enters dax_io with: start 0x7ffffffffffff000 end 0x7ffffffffffff200 and the rounded up "size" variable is 0x1000. This yields: pos + size 0x8000000000000000 (overflows loff_t) end 0x7ffffffffffff200 Due to the overflow, the min() function picks the wrong value for the "max" variable, and when we send (max - pos) into i.e. copy_from_iter_pmem() it is also the wrong value. This somehow(tm) gets magically absorbed without incident, probably because iter->count is correct. But it seems best to fix it up properly by comparing the two values as unsigned. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* | | | | | Merge tag 'nfs-for-4.7-2' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds2016-06-298-23/+45
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client bugfixes from Anna Schumaker: "Stable bugfixes: - Fix _cancel_empty_pagelist - Fix a double page unlock - Make nfs_atomic_open() call d_drop() on all ->open_context() errors. - Fix another OPEN_DOWNGRADE bug Other bugfixes: - Ensure we handle delegation errors in nfs4_proc_layoutget() - Layout stateids start out as being invalid - Add sparse lock annotations for pnfs_find_alloc_layout - Handle bad delegation stateids in nfs4_layoutget_handle_exception - Fix up O_DIRECT results - Fix potential use after free of state in nfs4_do_reclaim. - Mark the layout stateid invalid when all segments are removed - Don't let readdirplus revalidate an inode that was marked as stale - Fix potential race in nfs_fhget() - Fix an unused variable warning" * tag 'nfs-for-4.7-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFS: Fix another OPEN_DOWNGRADE bug make nfs_atomic_open() call d_drop() on all ->open_context() errors. NFS: Fix an unused variable warning NFS: Fix potential race in nfs_fhget() NFS: Don't let readdirplus revalidate an inode that was marked as stale NFSv4.1/pnfs: Mark the layout stateid invalid when all segments are removed NFS: Fix a double page unlock pnfs_nfs: fix _cancel_empty_pagelist nfs4: Fix potential use after free of state in nfs4_do_reclaim. NFS: Fix up O_DIRECT results NFS/pnfs: handle bad delegation stateids in nfs4_layoutget_handle_exception NFSv4.1/pnfs: Add sparse lock annotations for pnfs_find_alloc_layout NFSv4.1/pnfs: Layout stateids start out as being invalid NFSv4.1/pnfs: Ensure we handle delegation errors in nfs4_proc_layoutget()
OpenPOWER on IntegriCloud