summaryrefslogtreecommitdiffstats
path: root/fs/ext4
Commit message (Collapse)AuthorAgeFilesLines
* ext4: fix possible leak of sbi->s_group_desc_leak in error pathTheodore Ts'o2018-11-071-8/+8
| | | | | | | Fixes: bfe0a5f47ada ("ext4: add more mount time checks of the superblock") Reported-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 4.18
* ext4: remove unneeded brelse call in ext4_xattr_inode_update_ref()Vasily Averin2018-11-061-5/+1
| | | | | Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: avoid possible double brelse() in add_new_gdb() on error pathTheodore Ts'o2018-11-061-0/+1
| | | | | | | Fixes: b40971426a83 ("ext4: add error checking to calls to ...") Reported-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 2.6.38
* ext4: avoid buffer leak in ext4_orphan_add() after prior errorsVasily Averin2018-11-061-1/+3
| | | | | | | | | Fixes: d745a8c20c1f ("ext4: reduce contention on s_orphan_lock") Fixes: 6e3617e579e0 ("ext4: Handle non empty on-disk orphan link") Cc: Dmitry Monakhov <dmonakhov@gmail.com> Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 2.6.34
* ext4: avoid buffer leak on shutdown in ext4_mark_iloc_dirty()Vasily Averin2018-11-061-2/+3
| | | | | | | | | | ext4_mark_iloc_dirty() callers expect that it releases iloc->bh even if it returns an error. Fixes: 0db1ff222d40 ("ext4: add shutdown bit and check for it") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 4.11
* ext4: fix possible inode leak in the retry loop of ext4_resize_fs()Vasily Averin2018-11-061-0/+4
| | | | | | | Fixes: 1c6bd7173d66 ("ext4: convert file system to meta_bg if needed ...") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 3.7
* ext4: fix missing cleanup if ext4_alloc_flex_bg_array() fails while resizingVasily Averin2018-11-061-1/+1
| | | | | | | Fixes: 117fff10d7f1 ("ext4: grow the s_flex_groups array as needed ...") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 3.7
* ext4: add missing brelse() update_backups()'s error pathVasily Averin2018-11-031-1/+3
| | | | | | | Fixes: ac27a0ec112a ("ext4: initial copy of files from ext3") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 2.6.19
* ext4: add missing brelse() add_new_gdb_meta_bg()'s error pathVasily Averin2018-11-031-2/+1
| | | | | | | Fixes: 01f795f9e0d6 ("ext4: add online resizing support for meta_bg ...") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 3.7
* ext4: add missing brelse() in set_flexbg_block_bitmap()'s error pathVasily Averin2018-11-031-2/+4
| | | | | | | Fixes: 33afdcc5402d ("ext4: add a function which sets up group blocks ...") Cc: stable@kernel.org # 3.3 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: avoid potential extra brelse in setup_new_flex_group_blocks() Vasily Averin2018-11-031-6/+2
| | | | | | | | | | | | | | | Currently bh is set to NULL only during first iteration of for cycle, then this pointer is not cleared after end of using. Therefore rollback after errors can lead to extra brelse(bh) call, decrements bh counter and later trigger an unexpected warning in __brelse() Patch moves brelse() calls in body of cycle to exclude requirement of brelse() call in rollback. Fixes: 33afdcc5402d ("ext4: add a function which sets up group blocks ...") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 3.3+
* ext4: fix use-after-free race in ext4_remount()'s error pathTheodore Ts'o2018-10-122-26/+50
| | | | | | | | | | | It's possible for ext4_show_quota_options() to try reading s_qf_names[i] while it is being modified by ext4_remount() --- most notably, in ext4_remount's error path when the original values of the quota file name gets restored. Reported-by: syzbot+a2872d6feea6918008a9@syzkaller.appspotmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 3.2+
* ext4: cache NULL when both default_acl and acl are NULLChengguang Xu2018-10-061-0/+4
| | | | | | | | | | | | | default_acl and acl of newly created inode will be initiated as ACL_NOT_CACHED in vfs function inode_init_always() and later will be updated by calling xxx_init_acl() in specific filesystems. However, when default_acl and acl are NULL then they keep the value of ACL_NOT_CACHED. This patch changes the code to cache NULL for acl / default_acl in this case to save unnecessary ACL lookup attempt. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
* ext4: propagate error from dquot_initialize() in EXT4_IOC_FSSETXATTRWang Shilong2018-10-031-1/+3
| | | | | | | | | | | | | We return most failure of dquota_initialize() except inode evict, this could make a bit sense, for example we allow file removal even quota files are broken? But it dosen't make sense to allow setting project if quota files etc are broken. Signed-off-by: Wang Shilong <wshilong@ddn.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
* ext4: fix setattr project check in fssetxattr ioctlWang Shilong2018-10-031-23/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, project quota could be changed by fssetxattr ioctl, and existed permission check inode_owner_or_capable() is obviously not enough, just think that common users could change project id of file, that could make users to break project quota easily. This patch try to follow same regular of xfs project quota: "Project Quota ID state is only allowed to change from within the init namespace. Enforce that restriction only if we are trying to change the quota ID state. Everything else is allowed in user namespaces." Besides that, check and set project id'state should be an atomic operation, protect whole operation with inode lock, ext4_ioctl_setproject() is only used for ioctl EXT4_IOC_FSSETXATTR, we have held mnt_want_write_file() before ext4_ioctl_setflags(), and ext4_ioctl_setproject() is called after ext4_ioctl_setflags(), we could share codes, so remove it inside ext4_ioctl_setproject(). Signed-off-by: Wang Shilong <wshilong@ddn.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Cc: stable@kernel.org
* ext4: convert fault handler to use vm_fault_t typeSouptick Joarder2018-10-022-16/+17
| | | | | | | | | | | | | Return type of ext4_page_mkwrite and ext4_filemap_fault are changed to use vm_fault_t type. With this patch all the callers of block_page_mkwrite_return() are changed to handle vm_fault_t. So converting the return type of block_page_mkwrite_return() to vm_fault_t. Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Matthew Wilcox <willy@infradead.org>
* ext4: initialize retries variable in ext4_da_write_inline_data_begin()Lukas Czerner2018-10-021-1/+1
| | | | | | | | | | | Variable retries is not initialized in ext4_da_write_inline_data_begin() which can lead to nondeterministic number of retries in case we hit ENOSPC. Initialize retries to zero as we do everywhere else. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Fixes: bc0ca9df3b2a ("ext4: retry allocation when inline->extent conversion failed") Cc: stable@kernel.org
* ext4: fix EXT4_IOC_SWAP_BOOTTheodore Ts'o2018-10-021-6/+27
| | | | | | | | | | | | | | | | | | | | | | | | The code EXT4_IOC_SWAP_BOOT ioctl hasn't been updated in a while, and it's a bit broken with respect to more modern ext4 kernels, especially metadata checksums. Other problems fixed with this commit: * Don't allow installing a DAX, swap file, or an encrypted file as a boot loader. * Respect the immutable and append-only flags. * Wait until any DIO operations are finished *before* calling truncate_inode_pages(). * Don't swap inode->i_flags, since these flags have nothing to do with the inode blocks --- and it will give the IMA/audit code heartburn when the inode is evicted. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Reported-by: syzbot+e81ccd4744c6c4f71354@syzkaller.appspotmail.com
* ext4: fix build error when DX_DEBUG is definedGabriel Krisman Bertazi2018-10-021-1/+1
| | | | | | | | | | | | | | Enabling DX_DEBUG triggers the build error below. info is an attribute of the dxroot structure. linux/fs/ext4/namei.c:2264:12: error: ‘info’ undeclared (first use in this function); did you mean ‘insl’? info->indirect_levels)); Fixes: e08ac99fa2a2 ("ext4: add largedir feature") Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Lukas Czerner <lczerner@redhat.com>
* ext4: fix argument checking in EXT4_IOC_MOVE_EXTTheodore Ts'o2018-10-021-2/+6
| | | | | | | | | | | | | | | | If the starting block number of either the source or destination file exceeds the EOF, EXT4_IOC_MOVE_EXT should return EINVAL. Also fixed the helper function mext_check_coverage() so that if the logical block is beyond EOF, make it return immediately, instead of looping until the block number wraps all the away around. This takes long enough that if there are multiple threads trying to do pound on an the same inode doing non-sensical things, it can end up triggering the kernel's soft lockup detector. Reported-by: syzbot+c61979f6f2cba5cb3c06@syzkaller.appspotmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
* ext4: fix reserved cluster accounting at page invalidation timeEric Whitney2018-10-013-19/+95
| | | | | | | | | | Add new code to count canceled pending cluster reservations on bigalloc file systems and to reduce the cluster reservation count on all file systems using delayed allocation. This replaces old code in ext4_da_page_release_reservations that was incorrect. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: adjust reserved cluster count when removing extentsEric Whitney2018-10-014-114/+198
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Modify ext4_ext_remove_space() and the code it calls to correct the reserved cluster count for pending reservations (delayed allocated clusters shared with allocated blocks) when a block range is removed from the extent tree. Pending reservations may be found for the clusters at the ends of written or unwritten extents when a block range is removed. If a physical cluster at the end of an extent is freed, it's necessary to increment the reserved cluster count to maintain correct accounting if the corresponding logical cluster is shared with at least one delayed and unwritten extent as found in the extents status tree. Add a new function, ext4_rereserve_cluster(), to reapply a reservation on a delayed allocated cluster sharing blocks with a freed allocated cluster. To avoid ENOSPC on reservation, a flag is applied to ext4_free_blocks() to briefly defer updating the freeclusters counter when an allocated cluster is freed. This prevents another thread from allocating the freed block before the reservation can be reapplied. Redefine the partial cluster object as a struct to carry more state information and to clarify the code using it. Adjust the conditional code structure in ext4_ext_remove_space to reduce the indentation level in the main body of the code to improve readability. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: reduce reserved cluster count by number of allocated clustersEric Whitney2018-10-013-160/+207
| | | | | | | | | | | | | | | | | | | | | | | | | | | Ext4 does not always reduce the reserved cluster count by the number of clusters allocated when mapping a delayed extent. It sometimes adds back one or more clusters after allocation if delalloc blocks adjacent to the range allocated by ext4_ext_map_blocks() share the clusters newly allocated for that range. However, this overcounts the number of clusters needed to satisfy future mapping requests (holding one or more reservations for clusters that have already been allocated) and premature ENOSPC and quota failures, etc., result. Ext4 also does not reduce the reserved cluster count when allocating clusters for non-delayed allocated writes that have previously been reserved for delayed writes. This also results in overcounts. To make it possible to handle reserved cluster accounting for fallocated regions in the same manner as used for other non-delayed writes, do the reserved cluster accounting for them at the time of allocation. In the current code, this is only done later when a delayed extent sharing the fallocated region is finally mapped. Address comment correcting handling of unsigned long long constant from Jan Kara's review of RFC version of this patch. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: fix reserved cluster accounting at delayed write timeEric Whitney2018-10-015-18/+206
| | | | | | | | | | | | | | The code in ext4_da_map_blocks sometimes reserves space for more delayed allocated clusters than it should, resulting in premature ENOSPC, exceeded quota, and inaccurate free space reporting. Fix this by checking for written and unwritten blocks shared in the same cluster with the newly delayed allocated block. A cluster reservation should not be made for a cluster for which physical space has already been allocated. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: add new pending reservation mechanismEric Whitney2018-10-014-0/+249
| | | | | | | | | | Add new pending reservation mechanism to help manage reserved cluster accounting. Its primary function is to avoid the need to read extents from the disk when invalidating pages as a result of a truncate, punch hole, or collapse range operation. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: generalize extents status tree search functionsEric Whitney2018-10-015-72/+163
| | | | | | | | | | | | | | | | | | Ext4 contains a few functions that are used to search for delayed extents or blocks in the extents status tree. Rather than duplicate code to add new functions to search for extents with different status values, such as written or a combination of delayed and unwritten, generalize the existing code to search for caller-specified extents status values. Also, move this code into extents_status.c where it is better associated with the data structures it operates upon, and where it can be more readily used to implement new extents status tree functions that might want a broader scope for i_es_lock. Three missing static specifiers in RFC version of patch reported and fixed by Fengguang Wu <fengguang.wu@intel.com>. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* Merge tag 'ext4_for_linus_stable' of ↵Greg Kroah-Hartman2018-09-178-26/+72
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Ted writes: Various ext4 bug fixes; primarily making ext4 more robust against maliciously crafted file systems, and some DAX fixes. * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4, dax: set ext4_dax_aops for dax files ext4, dax: add ext4_bmap to ext4_dax_aops ext4: don't mark mmp buffer head dirty ext4: show test_dummy_encryption mount option in /proc/mounts ext4: close race between direct IO and ext4_break_layouts() ext4: fix online resizing for bigalloc file systems with a 1k block size ext4: fix online resize's handling of a too-small final block group ext4: recalucate superblock checksum after updating free blocks/inodes ext4: avoid arithemetic overflow that can trigger a BUG ext4: avoid divide by zero fault when deleting corrupted inline directories ext4: check to make sure the rename(2)'s destination is not freed ext4: add nonstring annotations to ext4.h
| * ext4, dax: set ext4_dax_aops for dax filesToshi Kani2018-09-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sync syscall to DAX file needs to flush processor cache, but it currently does not flush to existing DAX files. This is because 'ext4_da_aops' is set to address_space_operations of existing DAX files, instead of 'ext4_dax_aops', since S_DAX flag is set after ext4_set_aops() in the open path. New file -------- lookup_open ext4_create __ext4_new_inode ext4_set_inode_flags // Set S_DAX flag ext4_set_aops // Set aops to ext4_dax_aops Existing file ------------- lookup_open ext4_lookup ext4_iget ext4_set_aops // Set aops to ext4_da_aops ext4_set_inode_flags // Set S_DAX flag Change ext4_iget() to initialize i_flags before ext4_set_aops(). Fixes: 5f0663bb4a64 ("ext4, dax: introduce ext4_dax_aops") Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Suggested-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org
| * ext4, dax: add ext4_bmap to ext4_dax_aopsToshi Kani2018-09-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ext4 mount path calls .bmap to the journal inode. This currently works for the DAX mount case because ext4_iget() always set 'ext4_da_aops' to any regular files. In preparation to fix ext4_iget() to set 'ext4_dax_aops' for ext4 DAX files, add ext4_bmap() to 'ext4_dax_aops', since bmap works for DAX inodes. Fixes: 5f0663bb4a64 ("ext4, dax: introduce ext4_dax_aops") Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Suggested-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org
| * ext4: don't mark mmp buffer head dirtyLi Dongyang2018-09-151-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Marking mmp bh dirty before writing it will make writeback pick up mmp block later and submit a write, we don't want the duplicate write as kmmpd thread should have full control of reading and writing the mmp block. Another reason is we will also have random I/O error on the writeback request when blk integrity is enabled, because kmmpd could modify the content of the mmp block(e.g. setting new seq and time) while the mmp block is under I/O requested by writeback. Signed-off-by: Li Dongyang <dongyangli@ddn.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Cc: stable@vger.kernel.org
| * ext4: show test_dummy_encryption mount option in /proc/mountsEric Biggers2018-09-151-0/+2
| | | | | | | | | | | | | | | | | | When in effect, add "test_dummy_encryption" to _ext4_show_options() so that it is shown in /proc/mounts and other relevant procfs files. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
| * ext4: close race between direct IO and ext4_break_layouts()Ross Zwisler2018-09-111-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the refcount of a page is lowered between the time that it is returned by dax_busy_page() and when the refcount is again checked in ext4_break_layouts() => ___wait_var_event(), the waiting function ext4_wait_dax_page() will never be called. This means that ext4_break_layouts() will still have 'retry' set to false, so we'll stop looping and never check the refcount of other pages in this inode. Instead, always continue looping as long as dax_layout_busy_page() gives us a page which it found with an elevated refcount. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
| * ext4: fix online resizing for bigalloc file systems with a 1k block sizeTheodore Ts'o2018-09-031-1/+2
| | | | | | | | | | | | | | | | | | | | An online resize of a file system with the bigalloc feature enabled and a 1k block size would be refused since ext4_resize_begin() did not understand s_first_data_block is 0 for all bigalloc file systems, even when the block size is 1k. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
| * ext4: fix online resize's handling of a too-small final block groupTheodore Ts'o2018-09-031-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid growing the file system to an extent so that the last block group is too small to hold all of the metadata that must be stored in the block group. This problem can be triggered with the following reproducer: umount /mnt mke2fs -F -m0 -b 4096 -t ext4 -O resize_inode,^has_journal \ -E resize=1073741824 /tmp/foo.img 128M mount /tmp/foo.img /mnt truncate --size 1708M /tmp/foo.img resize2fs /dev/loop0 295400 umount /mnt e2fsck -fy /tmp/foo.img Reported-by: Torsten Hilbrich <torsten.hilbrich@secunet.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
| * ext4: recalucate superblock checksum after updating free blocks/inodesTheodore Ts'o2018-09-011-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When mounting the superblock, ext4_fill_super() calculates the free blocks and free inodes and stores them in the superblock. It's not strictly necessary, since we don't use them any more, but it's nice to keep them roughly aligned to reality. Since it's not critical for file system correctness, the code doesn't call ext4_commit_super(). The problem is that it's in ext4_commit_super() that we recalculate the superblock checksum. So if we're not going to call ext4_commit_super(), we need to call ext4_superblock_csum_set() to make sure the superblock checksum is consistent. Most of the time, this doesn't matter, since we end up calling ext4_commit_super() very soon thereafter, and definitely by the time the file system is unmounted. However, it doesn't work in this sequence: mke2fs -Fq -t ext4 /dev/vdc 128M mount /dev/vdc /vdc cp xfstests/git-versions /vdc godown /vdc umount /vdc mount /dev/vdc tune2fs -l /dev/vdc With this commit, the "tune2fs -l" no longer fails. Reported-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
| * ext4: avoid arithemetic overflow that can trigger a BUGTheodore Ts'o2018-09-012-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | A maliciously crafted file system can cause an overflow when the results of a 64-bit calculation is stored into a 32-bit length parameter. https://bugzilla.kernel.org/show_bug.cgi?id=200623 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reported-by: Wen Xu <wen.xu@gatech.edu> Cc: stable@vger.kernel.org
| * ext4: avoid divide by zero fault when deleting corrupted inline directoriesTheodore Ts'o2018-08-272-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A specially crafted file system can trick empty_inline_dir() into reading past the last valid entry in a inline directory, and then run into the end of xattr marker. This will trigger a divide by zero fault. Fix this by using the size of the inline directory instead of dir->i_size. Also clean up error reporting in __ext4_check_dir_entry so that the message is clearer and more understandable --- and avoids the division by zero trap if the size passed in is zero. (I'm not sure why we coded it that way in the first place; printing offset % size is actually more confusing and less useful.) https://bugzilla.kernel.org/show_bug.cgi?id=200933 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reported-by: Wen Xu <wen.xu@gatech.edu> Cc: stable@vger.kernel.org
| * ext4: check to make sure the rename(2)'s destination is not freedTheodore Ts'o2018-08-271-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | If the destination of the rename(2) system call exists, the inode's link count (i_nlinks) must be non-zero. If it is, the inode can end up on the orphan list prematurely, leading to all sorts of hilarity, including a use-after-free. https://bugzilla.kernel.org/show_bug.cgi?id=200931 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reported-by: Wen Xu <wen.xu@gatech.edu> Cc: stable@vger.kernel.org
| * ext4: add nonstring annotations to ext4.hTheodore Ts'o2018-08-271-3/+14
| | | | | | | | | | | | | | | | | | | | This suppresses some false positives in gcc 8's -Wstringop-truncation Suggested by Miguel Ojeda (hopefully the __nonstring definition will eventually get accepted in the compiler-gcc.h header file). Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
* | ext4: readpages() should submit IO as read-aheadJens Axboe2018-08-173-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | a_ops->readpages() is only ever used for read-ahead. Ensure that we pass this information down to the block layer. Link: http://lkml.kernel.org/r/20180621010725.17813-5-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Chris Mason <clm@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | dax: remove VM_MIXEDMAP for fsdax and device daxDave Jiang2018-08-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is reworked from an earlier patch that Dan has posted: https://patchwork.kernel.org/patch/10131727/ VM_MIXEDMAP is used by dax to direct mm paths like vm_normal_page() that the memory page it is dealing with is not typical memory from the linear map. The get_user_pages_fast() path, since it does not resolve the vma, is already using {pte,pmd}_devmap() as a stand-in for VM_MIXEDMAP, so we use that as a VM_MIXEDMAP replacement in some locations. In the cases where there is no pte to consult we fallback to using vma_is_dax() to detect the VM_MIXEDMAP special case. Now that we have explicit driver pfn_t-flag opt-in/opt-out for get_user_pages() support for DAX we can stop setting VM_MIXEDMAP. This also means we no longer need to worry about safely manipulating vm_flags in a future where we support dynamically changing the dax mode of a file. DAX should also now be supported with madvise_behavior(), vma_merge(), and copy_page_range(). This patch has been tested against ndctl unit test. It has also been tested against xfstests commit: 625515d using fake pmem created by memmap and no additional issues have been observed. Link: http://lkml.kernel.org/r/152847720311.55924.16999195879201817653.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-blockLinus Torvalds2018-08-142-4/+7
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block updates from Jens Axboe: "First pull request for this merge window, there will also be a followup request with some stragglers. This pull request contains: - Fix for a thundering heard issue in the wbt block code (Anchal Agarwal) - A few NVMe pull requests: * Improved tracepoints (Keith) * Larger inline data support for RDMA (Steve Wise) * RDMA setup/teardown fixes (Sagi) * Effects log suppor for NVMe target (Chaitanya Kulkarni) * Buffered IO suppor for NVMe target (Chaitanya Kulkarni) * TP4004 (ANA) support (Christoph) * Various NVMe fixes - Block io-latency controller support. Much needed support for properly containing block devices. (Josef) - Series improving how we handle sense information on the stack (Kees) - Lightnvm fixes and updates/improvements (Mathias/Javier et al) - Zoned device support for null_blk (Matias) - AIX partition fixes (Mauricio Faria de Oliveira) - DIF checksum code made generic (Max Gurtovoy) - Add support for discard in iostats (Michael Callahan / Tejun) - Set of updates for BFQ (Paolo) - Removal of async write support for bsg (Christoph) - Bio page dirtying and clone fixups (Christoph) - Set of bcache fix/changes (via Coly) - Series improving blk-mq queue setup/teardown speed (Ming) - Series improving merging performance on blk-mq (Ming) - Lots of other fixes and cleanups from a slew of folks" * tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-block: (190 commits) blkcg: Make blkg_root_lookup() work for queues in bypass mode bcache: fix error setting writeback_rate through sysfs interface null_blk: add lock drop/acquire annotation Blk-throttle: reduce tail io latency when iops limit is enforced block: paride: pd: mark expected switch fall-throughs block: Ensure that a request queue is dissociated from the cgroup controller block: Introduce blk_exit_queue() blkcg: Introduce blkg_root_lookup() block: Remove two superfluous #include directives blk-mq: count the hctx as active before allocating tag block: bvec_nr_vecs() returns value for wrong slab bcache: trivial - remove tailing backslash in macro BTREE_FLAG bcache: make the pr_err statement used for ENOENT only in sysfs_attatch section bcache: set max writeback rate when I/O request is idle bcache: add code comments for bset.c bcache: fix mistaken comments in request.c bcache: fix mistaken code comments in bcache.h bcache: add a comment in super.c bcache: avoid unncessary cache prefetch bch_btree_node_get() bcache: display rate debug parameters to 0 when writeback is not running ...
| * block: Define and use STAT_READ and STAT_WRITEMichael Callahan2018-07-182-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add defines for STAT_READ and STAT_WRITE for indexing the partition stat entries. This clarifies some fs/ code which has hardcoded 1 for STAT_WRITE and will make it easier to extend the stats with additional fields. tj: Refreshed on top of v4.17. Signed-off-by: Michael Callahan <michaelcallahan@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | ext4: remove unneeded variable "err" in ext4_mb_release_inode_pa()zhong jiang2018-08-041-2/+1
| | | | | | | | | | | | | | The err is not used after initalization. So just remove the variable. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* | ext4: improve code readability in ext4_iget()Liu Song2018-08-021-10/+7
| | | | | | | | | | | | | | | | Merge the duplicated complex conditions to improve code readability. Signed-off-by: Liu Song <liu.song11@zte.com.cn> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
* | ext4: fix spectre gadget in ext4_mb_regular_allocator()Jeremy Cline2018-08-021-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'ac->ac_g_ex.fe_len' is a user-controlled value which is used in the derivation of 'ac->ac_2order'. 'ac->ac_2order', in turn, is used to index arrays which makes it a potential spectre gadget. Fix this by sanitizing the value assigned to 'ac->ac2_order'. This covers the following accesses found with the help of smatch: * fs/ext4/mballoc.c:1896 ext4_mb_simple_scan_group() warn: potential spectre issue 'grp->bb_counters' [w] (local cap) * fs/ext4/mballoc.c:445 mb_find_buddy() warn: potential spectre issue 'EXT4_SB(e4b->bd_sb)->s_mb_offsets' [r] (local cap) * fs/ext4/mballoc.c:446 mb_find_buddy() warn: potential spectre issue 'EXT4_SB(e4b->bd_sb)->s_mb_maxs' [r] (local cap) Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Jeremy Cline <jcline@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
* | ext4: check for NUL characters in extended attribute's nameTheodore Ts'o2018-08-011-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extended attribute names are defined to be NUL-terminated, so the name must not contain a NUL character. This is important because there are places when remove extended attribute, the code uses strlen to determine the length of the entry. That should probably be fixed at some point, but code is currently really messy, so the simplest fix for now is to simply validate that the extended attributes are sane. https://bugzilla.kernel.org/show_bug.cgi?id=200401 Reported-by: Wen Xu <wen.xu@gatech.edu> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
* | ext4: use ext4_warning() for sb_getblk failureWang Shilong2018-08-012-6/+6
| | | | | | | | | | | | | | | | | | Out of memory should not be considered as critical errors; so replace ext4_error() with ext4_warnig(). Signed-off-by: Wang Shilong <wshilong@ddn.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
* | ext4: fix race when setting the bitmap corrupted flagWang Shilong2018-07-291-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Whenever we hit block or inode bitmap corruptions we set bit and then reduce this block group free inode/clusters counter to expose right available space. However some of ext4_mark_group_bitmap_corrupted() is called inside group spinlock, some are not, this could make it happen that we double reduce one block group free counters from system. Always hold group spinlock for it could fix it, but it looks a little heavy, we could use test_and_set_bit() to fix race problems here. Signed-off-by: Wang Shilong <wshilong@ddn.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
* | ext4: reset error code in ext4_find_entry in fallbackEric Sandeen2018-07-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When ext4_find_entry() falls back to "searching the old fashioned way" due to a corrupt dx dir, it needs to reset the error code to NULL so that the nonstandard ERR_BAD_DX_DIR code isn't returned to userspace. https://bugzilla.kernel.org/show_bug.cgi?id=199947 Reported-by: Anatoly Trosinenko <anatoly.trosinenko@yandex.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
OpenPOWER on IntegriCloud