In the bio completion routine, we should not be setting
PageUptodate at all -- it is set at sys_write() time and is
unaffected by the success or failure of the write to disk.
Setting it here can cause a page corruption bug when the file
system's block size is less than the architecture's VM page size:
if we have only written a single block, we might end up setting
the page's PageUptodate flag, indicating that the page has been
completely read into memory, which may not be true.
This could cause subsequent reads to get bad data.
This commit also takes the opportunity to clean up error
handling in ext4_end_bio(), and remove some extraneous code:
- fixes ext4_end_bio() to set AS_EIO in the
page->mapping->flags on error, which was left out by
mistake. This is needed so that fsync() will
return an error if there was an I/O error.
- remove the clear_buffer_dirty() call on unmapped
buffers for each page.
- consolidate page/buffer error handling in a single
section.
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Jim Meyering <jim@meyering.net>
Reported-by: Hugh Dickins <hughd@google.com>
Cc: Mingming Cao <cmm@us.ibm.com>
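A small userspace model of why the old behaviour was wrong (names and
sizes here are illustrative only, not ext4 code): with 1 KiB blocks in
a 4 KiB page, completing the write of one block says nothing about the
other three.

#include <stdbool.h>
#include <stdio.h>

#define BLOCKS_PER_PAGE 4	/* e.g. 4 KiB page, 1 KiB blocks */

struct page_model {
	bool block_valid[BLOCKS_PER_PAGE];
	bool uptodate;		/* "the whole page is valid in memory" */
};

static void end_block_write(struct page_model *p, int block)
{
	/* the buggy completion path did the equivalent of:
	 *     p->uptodate = true;
	 * even though only one block was involved */
	p->block_valid[block] = true;

	bool all_valid = true;
	for (int i = 0; i < BLOCKS_PER_PAGE; i++)
		all_valid = all_valid && p->block_valid[i];
	p->uptodate = all_valid;	/* never true after a partial write */
}

int main(void)
{
	struct page_model p = { { false } };

	end_block_write(&p, 2);
	printf("uptodate after writing one of four blocks: %s\n",
	       p.uptodate ? "yes (bug)" : "no");
	return 0;
}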
When CONFIG_EXT4_USE_FOR_EXT23 is defined and ext4 is emulating the
ext[23] file system driver, provide better ext[23] emulation by
enforcing that the file system does not have any file system features
that ext[23] does not support.
This causes the file system type information in /proc/mounts to be
correct for the automatically mounted root file system. This also
means that "mount -t ext2 /dev/sda /mnt" will fail if /dev/sda
contains an ext3 or ext4 file system, just as one would expect if the
original ext2 file system driver were in use.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Add missing page_cache_release in the error path of ext4_mb_load_buddy
Signed-off-by: Yang Ruirui <ruirui.r.yang@tieto.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
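The pattern behind the fix, as a self-contained userspace sketch (the
names are hypothetical, not the mballoc ones): a reference taken before
a possible failure point must be dropped again on the error path.

#include <stdio.h>
#include <stdlib.h>

struct buddy_page { int refcount; };

static void page_get(struct buddy_page *p) { p->refcount++; }
static void page_put(struct buddy_page *p) { p->refcount--; }

static int load_buddy(struct buddy_page *p, int simulate_failure)
{
	page_get(p);			/* reference taken early */
	if (simulate_failure)
		goto err;		/* must not leak the reference */
	return 0;			/* success: caller keeps the ref */
err:
	page_put(p);			/* the release that was missing */
	return -1;
}

int main(void)
{
	struct buddy_page p = { .refcount = 0 };

	load_buddy(&p, 1);
	printf("refcount after failed load: %d\n", p.refcount); /* 0 */
	return 0;
}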
Merge branch 'for_linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix data corruption regression by reverting commit 6de9843dab3f
ext4: Allow indirect-block file to grow the file size to max file size
ext4: allow an active handle to be started when freezing
ext4: sync the directory inode in ext4_sync_parent()
ext4: init timer earlier to avoid a kernel panic in __save_error_info
jbd2: fix potential memory leak on transaction commit
ext4: fix a double free in ext4_register_li_request
ext4: fix credits computing for indirect mapped files
ext4: remove unnecessary [cm]time update of quota file
jbd2: move bdget out of critical section
Revert commit 6de9843dab3f2a1d4d66d80aa9e5782f80977d20, since it
caused a data corruption regression with BitTorrent downloads. Thanks
to Damien for discovering and bisecting to find the problem commit.
https://bugzilla.kernel.org/show_bug.cgi?id=32972
Reported-by: Damien Grassart <damien@grassart.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
We can create a 4402345721856-byte file with indirect block mapping.
However, if we grow an indirect-block file to that size with
ftruncate(), we see an ext4 warning. The following patch fixes this
problem.
How to reproduce:
# dd if=/dev/zero of=/mnt/mp1/hoge bs=1 count=0 seek=4402345721856
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000221428 s, 0.0 kB/s
# tail -n 1 /var/log/messages
Nov 25 15:10:27 test kernel: EXT4-fs warning (device sda8): ext4_block_to_path:345: block 1074791436 > max in inode 12
Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
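For reference, the quoted size is simply the indirect-mapping limit
with 4 KiB blocks and 4-byte block pointers; the small program below
reproduces both numbers that appear in the warning and the reproducer.

#include <stdio.h>

int main(void)
{
	unsigned long long block_size = 4096;
	unsigned long long addrs = block_size / 4;	/* 1024 pointers per block */
	unsigned long long max_blocks =
		12 + addrs + addrs * addrs + addrs * addrs * addrs;

	printf("max blocks: %llu\n", max_blocks);	/* 1074791436 */
	printf("max size  : %llu bytes\n", max_blocks * block_size);
	/* prints 4402345721856, the size used in the reproducer */
	return 0;
}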
ext4_journal_start_sb() should not prevent an active handle from being
started due to s_frozen. Otherwise, a deadlock can easily occur; below
is one such situation.
================================================
      freeze           |       truncate
================================================
                       | ext4_ext_truncate()
freeze_super()         |   starts a handle
  sets s_frozen        |
                       | ext4_ext_truncate()
                       |   holds i_data_sem
ext4_freeze()          |
  waits for updates    |
                       | ext4_free_blocks()
                       |   calls dquot_free_block()
                       |
                       | dquot_free_blocks()
                       |   calls ext4_dirty_inode()
                       |
                       | ext4_dirty_inode()
                       |   tries to start an active
                       |   handle
                       |
                       | blocked due to s_frozen
================================================
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Amir Goldstein <amir73il@users.sf.net>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
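The rule the fix enforces can be stated in a few lines; below is a
tiny, self-contained model of it (names invented for illustration,
this is not the ext4_journal_start_sb() code).

#include <stdbool.h>
#include <stdio.h>

static bool frozen = true;	/* filesystem is being frozen */

/* When frozen, only a task starting its *first* handle should wait;
 * a task already inside an active handle is exactly what the freezer
 * is waiting to drain, so making it wait would deadlock. */
static const char *start_handle(bool has_active_handle)
{
	if (frozen && !has_active_handle)
		return "wait until unfrozen";
	return "proceed";
}

int main(void)
{
	printf("new top-level handle : %s\n", start_handle(false));
	printf("nested active handle : %s\n", start_handle(true));
	return 0;
}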
ext4 has taken the stance that, in the absence of a journal,
when an fsync/fdatasync of an inode is done, the parent
directory should be sync'ed if this inode entry is new.
ext4_sync_parent(), which implements this, does indeed sync
the dirent pages for parent directories, but it does not
sync the directory *inode* itself. This patch fixes that.
It also makes ext4_sync_parent() return an error status.
I tested this using a power-fail test, which panics a
machine running a file server that is handling requests from a
client. Without this patch, on about every other test run,
the server is missing many, many files that had been synced.
With this patch, in more than 6 runs, I see zero files being lost.
Google-Bug-Id: 4179519
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
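For comparison, this is the guarantee applications have to arrange by
hand when the filesystem does not sync the parent for them; a minimal
userspace example follows (the paths are placeholders).

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/mnt/test/newfile", O_CREAT | O_WRONLY, 0644);
	int dirfd = open("/mnt/test", O_RDONLY | O_DIRECTORY);

	if (fd < 0 || dirfd < 0) {
		perror("open");
		return 1;
	}
	if (write(fd, "data\n", 5) != 5 || fsync(fd) != 0) {
		perror("write/fsync file");
		return 1;
	}
	if (fsync(dirfd) != 0) {	/* persist the new directory entry */
		perror("fsync dir");
		return 1;
	}
	close(fd);
	close(dirfd);
	return 0;
}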
During mount, when we fail to open the journal inode or the root
inode, __save_error_info() calls mod_timer(). But s_err_report isn't
initialized yet at that point, so the kernel oopses. The detailed
information can be found at
https://bugzilla.kernel.org/show_bug.cgi?id=32082.
The best fix would be to check whether the s_err_report timer has been
initialized, but include/linux/timer.h does not provide a good
function for checking the status of a timer, so this patch just moves
the initialization of s_err_report earlier so that we can avoid the
kernel panic. The corresponding del_timer() is also added in the
error path.
Reported-by: Sami Liedes <sliedes@cc.hut.fi>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
In ext4_register_li_request, we allocate an ext4_li_request and
insert it into ext4_li_info->li_request_list. In case of any
error later, we free it at the end. But if an error occurs in
ext4_run_lazyinit_thread, the whole li_request_list, including this
request, is dropped and freed there, so we end up freeing this
ext4_li_request twice.
This patch sets elr to NULL after it is inserted into the list
so that the later kfree() does not free it again.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
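A userspace sketch of the ownership rule the one-line fix applies
(names are made up): once a pointer has been handed to a container
that will free it on error, the local copy must be cleared so later
cleanup cannot free it a second time.

#include <stdio.h>
#include <stdlib.h>

struct request { int id; };

static struct request *list_owner;	/* stands in for li_request_list */

static void list_teardown(void)		/* frees everything it owns */
{
	free(list_owner);
	list_owner = NULL;
}

int main(void)
{
	struct request *req = malloc(sizeof(*req));

	if (!req)
		return 1;
	list_owner = req;	/* ownership transferred to the list */
	req = NULL;		/* the added line: forget our copy */

	list_teardown();	/* error path inside the "thread" */
	free(req);		/* free(NULL) is a no-op: no double free */
	return 0;
}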
When writing a contiguous set of blocks, two indirect blocks could be
needed depending on how the blocks are aligned, so we need to increase
the number of credits needed by one.
[ Also fixed another bug which could further underestimate the
  number of journal credits needed by 1; the code was using integer
  division instead of DIV_ROUND_UP() -- tytso ]
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
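The rounding issue in the second fix is easy to see in isolation; the
numbers below are illustrative, not taken from the ext4 code.

#include <stdio.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

int main(void)
{
	/* with 1024 block pointers per indirect block, 1025 contiguous
	 * blocks span two indirect blocks, but 1025 / 1024 == 1 */
	unsigned int blocks = 1025, ptrs_per_block = 1024;

	printf("integer division : %u indirect blocks\n",
	       blocks / ptrs_per_block);		/* 1 (too few) */
	printf("DIV_ROUND_UP     : %u indirect blocks\n",
	       DIV_ROUND_UP(blocks, ptrs_per_block));	/* 2 */
	return 0;
}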
It is not necessary to update the [cm]time of the quota file on each
quota file write, and doing so wastes journal space and I/O throughput
with inode writes. So just remove the update from ext4_quota_write()
and only update the times when quotas are being turned off. Userspace
cannot get anything reliable from quota files while they are in use by
the kernel anyway.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Fixes generated by 'codespell' and manually reviewed.
Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Merge branch 'for_linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (43 commits)
ext4: fix a BUG in mb_mark_used during trim.
ext4: unused variables cleanup in fs/ext4/extents.c
ext4: remove redundant set_buffer_mapped() in ext4_da_get_block_prep()
ext4: add more tracepoints and use dev_t in the trace buffer
ext4: don't kfree uninitialized s_group_info members
ext4: add missing space in printk's in __ext4_grp_locked_error()
ext4: add FITRIM to compat_ioctl.
ext4: handle errors in ext4_clear_blocks()
ext4: unify the ext4_handle_release_buffer() api
ext4: handle errors in ext4_rename
jbd2: add COW fields to struct jbd2_journal_handle
jbd2: add the b_cow_tid field to journal_head struct
ext4: Initialize fsync transaction ids in ext4_new_inode()
ext4: Use single thread to perform DIO unwritten convertion
ext4: optimize ext4_bio_write_page() when no extent conversion is needed
ext4: skip orphan cleanup if fs has unknown ROCOMPAT features
ext4: use the nblocks arg to ext4_truncate_restart_trans()
ext4: fix missing iput of root inode for some mount error paths
ext4: make FIEMAP and delayed allocation play well together
ext4: suppress verbose debugging information if malloc-debug is off
...
Fix up conflicts in fs/ext4/super.c due to workqueue changes
On a volume with bs=4096, if we call FITRIM with the following
fstrim_range parameters (start = 102400, len = 134144000,
minlen = 10240), we will trigger this BUG_ON:
BUG_ON(start + len > (e4b->bd_sb->s_blocksize << 3));
Mar 4 00:55:52 boyu-tm kernel: ------------[ cut here ]------------
Mar 4 00:55:52 boyu-tm kernel: kernel BUG at fs/ext4/mballoc.c:1506!
Mar 4 01:21:09 boyu-tm kernel: Code: d4 00 00 00 00 49 89 fe 8b 56 0c 44 8b 7e 04 89 55 c4 48 8b 4f 28 89 d6 44 01 fe 48 63 d6 48 8b 41 18 48 c1 e0 03 48 39 c2 76 04 <0f> 0b eb fe 48 8b 55 b0 8b 47 34 3b 42 08 74 04 0f 0b eb fe 48
Mar 4 01:21:09 boyu-tm kernel: RIP [<ffffffffa053eb42>] mb_mark_used+0x47/0x26c [ext4]
Mar 4 01:21:09 boyu-tm kernel: RSP <ffff880121e45c38>
Mar 4 01:21:09 boyu-tm kernel: ---[ end trace 9f461696f6a9dcf2 ]---
Fix this bug by doing the accounting correctly.
Cc: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
ext4 extents cleanup:
. remove unused `*ex' from check_eofblocks_fl
. remove unused `*eh' from ext4_ext_map_blocks
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The map_bh() call will have already set the buffer_head to mapped.
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
- Add more ext4 tracepoints.
- Change ext4 tracepoints to use dev_t field with MAJOR/MINOR macros
so that we can save 4 bytes in the ring buffer on some platforms.
- Add sync_mode to ext4_da_writepages, ext4_da_write_pages, and
ext4_da_writepages_result tracepoints. Also remove for_reclaim
field from ext4_da_writepages since it is usually not very useful.
Signed-off-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
We can call kfree on uninitialized members of the s_group_info array
on the error path. We can avoid this by kzalloc'ing the array.
This doesn't entirely solve the oops on mount if we fail down this
path; failed_mount4:, for one, frees the sbi, which gets referenced
later in the failed-mount paths -- I haven't worked that out yet.
https://bugzilla.kernel.org/show_bug.cgi?id=30872
Reported-by: Eugene A. Shatokhin <dame_eugene@mail.ru>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
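A userspace analogue of the kzalloc change (illustrative names only):
a zero-filled pointer array lets a cleanup loop run safely even if
setup stopped partway, because freeing NULL is a no-op.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	size_t n = 8, filled = 3;
	/* calloc plays the role of kzalloc: every slot starts out NULL */
	int **group_info = calloc(n, sizeof(*group_info));

	if (!group_info)
		return 1;
	for (size_t i = 0; i < filled; i++)	/* pretend setup fails here */
		group_info[i] = malloc(sizeof(int));

	for (size_t i = 0; i < n; i++)		/* cleanup of *all* slots */
		free(group_info[i]);		/* safe: unused slots are NULL */
	free(group_info);
	puts("cleanup after partial init: no invalid frees");
	return 0;
}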
| | |
When we do performence-testing on ext4 filesystem, we observed a
warning like this:
EXT4-fs error (device sda7): ext4_mb_generate_buddy:718: group 259825901 blocks in bitmap, 26057 in gd
instead, it should be
"group 2598, 25901 blocks in bitmap, 26057 in gd"
Reviewed-by: Coly Li <bosong.ly@taobao.com>
Cc: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
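The formatting problem can be reproduced with plain printf(); the
first line below runs the two numbers together exactly as in the
message above, the second is the corrected form (the format strings
are illustrative, not the exact __ext4_grp_locked_error() strings).

#include <stdio.h>

int main(void)
{
	unsigned int group = 2598;
	int free_in_bitmap = 25901, free_in_gd = 26057;

	/* missing separator -> "group 259825901 blocks in bitmap" */
	printf("group %u%d blocks in bitmap, %d in gd\n",
	       group, free_in_bitmap, free_in_gd);

	/* with the separator -> "group 2598, 25901 blocks in bitmap" */
	printf("group %u, %d blocks in bitmap, %d in gd\n",
	       group, free_in_bitmap, free_in_gd);
	return 0;
}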
FITRIM isn't handled in ext4's compat_ioctl, so a 32-bit program
cannot issue this ioctl on a 64-bit kernel. Add it to compat_ioctl.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Checking return code from ext4_journal_get_write_access() is important
with snapshots, because this function invokes COW, so may return new
errors, such as ENOSPC.
ext4_clear_blocks() now returns < 0 for fatal errors, in which case,
ext4_free_data() is aborted.
Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
There are two wrapper functions which do exactly the same thing:
ext4_journal_release_buffer(), and ext4_handle_release_buffer(). In
addition, ext4_xattr_block_set() calls jbd2_journal_release_buffer()
directly.
Unify all of the code to use ext4_handle_release_buffer(), and get rid
of ext4_journal_release_buffer().
Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Checking return code from ext4_journal_get_write_access() is important
with snapshots, because this function invokes COW, so may return new
errors, such as ENOSPC.
We move the call to ext4_journal_get_write_access() earlier in the
function, to simplify error handling in the case that this function
returns an error.
Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
When allocating a new inode, we need to make sure i_sync_tid and
i_datasync_tid are initialized. Otherwise, one or both of these two
values could be left at zero, which could potentially result in a
BUG_ON in jbd2_journal_commit_transaction.
(This could happen by having journal->commit_request getting set to
zero, which could wake up the kjournald process even though there is
no running transaction, which then causes a BUG_ON via the
J_ASSERT(j_running_transaction != NULL) statement.)
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
While testing ext4 on a multi-core machine, we found per-CPU
ext4-dio-unwritten threads processing the conversion of unwritten
extents to written extents for I/Os completed by the async direct
I/O path. One thread per filesystem is enough; we don't need per-CPU
threads to work on the conversion.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
If no extent conversion is required, wake up any processes waiting for
the page's writeback to be complete and free the ext4_io_end structure
directly in ext4_end_bio() instead of dropping it on the linked list
(which requires taking a spinlock to queue and dequeue the io_end
structure), and waiting for the workqueue to do this work.
This removes an extra scheduling delay before a process waiting for an
fsync() to complete gets woken up, and it also reduces the CPU
overhead for a random write workload.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Orphan cleanup is currently executed even if the file system has some
number of unknown ROCOMPAT features. It deletes inodes and frees
blocks, which could be very bad for some RO_COMPAT features,
especially the SNAPSHOT feature.
This patch skips the orphan cleanup if the file system contains
read-only compatible features not known by this ext4 implementation,
i.e. features which would prevent the fs from being mounted (or
remounted) read-write.
Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
nblocks is passed into ext4_truncate_restart_trans() from
ext4_ext_truncate_extend_restart() with a value different from the default
blocks_for_truncate(), but is being ignored.
The two other calls to ext4_truncate_restart_trans() already pass the
default value, which is then being recalculated inside the function.
Fix the problem by using the passed argument.
Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
This ensures that the root inode is not leaked, and that sb->s_root is
NULL, which prevents generic_shutdown_super() from doing extra work,
including calling sync_filesystem(); that would ultimately result in
ext4_sync_fs() getting called with an uninitialized struct
super_block, which is the cause of the crash noted in Kernel Bugzilla
#26752.
https://bugzilla.kernel.org/show_bug.cgi?id=26752
Signed-off-by: Manish Katiyar <mkatiyar@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Fix the FIEMAP ioctl so that it returns all of the page ranges which
are still subject to delayed allocation. We were missing some cases
if the file was sparse.
Reported by Chris Mason <chris.mason@oracle.com>:
>We've had reports on btrfs that cp is giving us files full of zeros
>instead of actually copying them. It was tracked down to a bug with
>the btrfs fiemap implementation where it was returning holes for
>delalloc ranges.
>
>Newer versions of cp are trusting fiemap to tell it where the holes
>are, which does seem like a pretty neat trick.
>
>I decided to give xfs and ext4 a shot with a few tests cases too, xfs
>passed with all the ones btrfs was getting wrong, and ext4 got the basic
>delalloc case right.
>$ mkfs.ext4 /dev/xxx
>$ mount /dev/xxx /mnt
>$ dd if=/dev/zero of=/mnt/foo bs=1M count=1
>$ fiemap-test foo
>ext: 0 logical: [ 0.. 255] phys: 0.. 255
>flags: 0x007 tot: 256
>
>Horray! But once we throw a hole in, things go bad:
>$ mkfs.ext4 /dev/xxx
>$ mount /dev/xxx /mnt
>$ dd if=/dev/zero of=/mnt/foo bs=1M count=1 seek=1
>$ fiemap-test foo
>< no output >
>
>We've got a delalloc extent after the hole and ext4 fiemap didn't find
>it. If I run sync to kick the delalloc out:
>$sync
>$ fiemap-test foo
>ext: 0 logical: [ 256.. 511] phys: 34048.. 34303
>flags: 0x001 tot: 256
>
>fiemap-test is sitting in my /usr/local/bin, and I have no idea how it
>got there. It's full of pretty comments so I know it isn't mine, but
>you can grab it here:
>
>http://oss.oracle.com/~mason/fiemap-test.c
>
>xfsqa has a fiemap program too.
After the fix, the test results are as follows:
ext: 0 logical: [ 256.. 511] phys: 0.. 255
flags: 0x007 tot: 256
ext: 0 logical: [ 256.. 511] phys: 33280.. 33535
flags: 0x001 tot: 256
$ mkfs.ext4 /dev/xxx
$ mount /dev/xxx /mnt
$ dd if=/dev/zero of=/mnt/foo bs=1M count=1 seek=1
$ sync
$ dd if=/dev/zero of=/mnt/foo bs=1M count=1 seek=3
$ dd if=/dev/zero of=/mnt/foo bs=1M count=1 seek=5
$ fiemap-test foo
ext: 0 logical: [ 256.. 511] phys: 33280.. 33535
flags: 0x000 tot: 256
ext: 1 logical: [ 768.. 1023] phys: 0.. 255
flags: 0x006 tot: 256
ext: 2 logical: [ 1280.. 1535] phys: 0.. 255
flags: 0x007 tot: 256
Tested-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
If CONFIG_EXT4_DEBUG is enabled, then if a block allocation fails due
to disk being full, a verbose debugging message is printed, even if
the malloc-debug switch has not been enabled. Suppress the debugging
message so that nothing is printed unless malloc-debug has been turned
on.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
In ext4_bio_write_page(), if the memory allocation for the struct
ext4_io_page fails, it returns with the page's PageWriteback flag set.
This will cause the page to be skipped during writeback in
WB_SYNC_NONE mode, and in WB_SYNC_ALL mode (i.e., on a sync, fsync, or
umount) the writeback daemon will get stuck forever on the
wait_on_page_writeback() call in write_cache_pages_da().
Or, if journalling is enabled and the file gets deleted, the
journal thread can get stuck in the
journal_finish_inode_data_buffers() call to filemap_fdatawait().
Another place where things can get hung up is in
truncate_inode_pages(), called out of ext4_evict_inode().
Fix this by not setting PageWriteback until after we have successfully
allocated the struct ext4_io_page.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Move the initialization of all of the fields of the mpd structure to
write_cache_pages_da(). This simplifies the code considerably.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
If we have accumulated a contiguous region of memory to be written
out, and the next page can be added to this region, don't bother
locking (and then unlocking) the page before writing out the memory.
In the unlikely event that the next page was being written back by
some other CPU, we can also skip waiting for that page to finish
writeback.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Because the ext4 page writeback codepath had been prematurely calling
clear_page_dirty_for_io(), if it turned out that a particular page
couldn't be written out during a particular pass of
write_cache_pages_da(), the page would have to get redirtied by
calling redirty_page_for_writeback(). Not only was this wasted work,
but redirty_page_for_writeback() would increment wbc->pages_skipped to
signal to writeback_sb_inodes() that buffers were locked, and that it
should skip this inode until later.
Since this signal was incorrect in ext4's case --- which was caused by
ext4's historically incorrect use of write_cache_pages() ---
ext4_da_writepages() saved and restored wbc->pages_skipped to avoid
confusing writeback_sb_inodes().
Now that we've fixed ext4 to call clear_page_dirty_for_io() right
before initiating the page I/O, we can nuke the pages_skipped
save/restore hackery, and breathe a sigh of relief.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Move when we call clear_page_dirty_for_io() to just before we actually
write the page. This simplifies the code somewhat, and avoids marking
pages as clean and then needing to remark them as dirty later.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Eliminate duplicate code, unneeded variables, etc., to make it easier
to understand the code. No behavioral changes were made in this patch.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Fold the __mpage_da_writepage() function into write_cache_pages_da().
This will give us opportunities to clean up and simplify the resulting
code.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Now that we've fixed the file corruption bug in commit d50bdd5aa55,
it's time to enable mblk_io_submit by default.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
If ext4_da_block_invalidatepages() is called because of a
failure from ext4_map_blocks() in mpage_da_map_and_submit(),
it's supposed to clean up -- including unlock -- all the
pages in the mpd structure. But these values may not match
up, even on a system in which block size == page size:
mpd->b_blocknr != mpd->first_page
mpd->b_size != (mpd->next_page - mpd->first_page)
ext4_da_block_invalidatepages() has been using b_blocknr and
b_size; this patch changes it to use first_page and
next_page.
Tested: I injected a small number (5%) of failures in
ext4_map_blocks() in the case that the flags contain
EXT4_GET_BLOCKS_DELALLOC_RESERVE, and ran fsstress on this
kernel. Without this patch, I got hung tasks every time.
With this patch, I see no hangs in many runs of fsstress.
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
In mpage_da_map_and_submit(), if we have a delayed block
allocation failure from ext4_map_blocks(), we need to mark
the IO as complete, by setting
mpd->io_done = 1;
Otherwise, we could end up submitting the pages in an outer
loop; since they are unlocked on mapping failure in
ext4_da_block_invalidatepages(), this will cause a bug check
in mpage_da_submit_io().
I tested this by injecting failures into ext4_map_blocks().
Without this patch, a simple fsstress run will bug check;
with the patch, it works fine.
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
In ext4_mb_check_group_pa(), the current preallocation space is
replaced with a new preallocation space when the two have the same
distance from the goal block.
This doesn't actually gain us anything, so change things so that the
function only switches to the new preallocation group if its distance
from the goal block is strictly smaller than the current preallocation
group's distance from the goal block.
Signed-off-by: Coly Li <bosong.ly@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
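The tie-breaking change in isolation, as a self-contained sketch
(values are made up): on an equal distance, keep the current
preallocation instead of switching.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	long goal = 1000, current = 1040, candidate = 960;
	long cur_dist = labs(current - goal);	/* 40 */
	long new_dist = labs(candidate - goal);	/* 40: a tie */

	/* before: (new_dist <= cur_dist) switched on ties for no gain */
	if (new_dist < cur_dist)
		current = candidate;
	printf("chosen preallocation start: %ld\n", current);	/* 1040 */
	return 0;
}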
Signed-off-by: Coly Li <bosong.ly@taobao.com>
Cc: Alex Tomas <alex@clusterfs.com>
Cc: Theodore Tso <tytso@google.com>
This patch adds comments to ext4_mb_mark_free_simple to make it more
understandable.
Signed-off-by: Coly Li <bosong.ly@taobao.com>
Cc: Alex Tomas <alex@clusterfs.com>
Cc: Theodore Tso <tytso@google.com>
In __mb_check_buddy(), look at the code below:
591 fstart = -1;
592 buddy = mb_find_buddy(e4b, 0, &max);
593 for (i = 0; i < max; i++) {
594 if (!mb_test_bit(i, buddy)) {
595 MB_CHECK_ASSERT(i >= e4b->bd_info->bb_first_free);
596 if (fstart == -1) {
597 fragments++;
598 fstart = i;
599 }
600 continue;
601 }
602 fstart = -1;
603 /* check used bits only */
604 for (j = 0; j < e4b->bd_blkbits + 1; j++) {
605 buddy2 = mb_find_buddy(e4b, j, &max2);
606 k = i >> j;
607 MB_CHECK_ASSERT(k < max2);
608 MB_CHECK_ASSERT(mb_test_bit(k, buddy2));
609 }
610 }
611 MB_CHECK_ASSERT(!EXT4_MB_GRP_NEED_INIT(e4b->bd_info));
612 MB_CHECK_ASSERT(e4b->bd_info->bb_fragments == fragments);
613
614 grp = ext4_get_group_info(sb, e4b->bd_group);
615 buddy = mb_find_buddy(e4b, 0, &max);
On line 592, buddy is fetched by mb_find_buddy() with order 0. Between
lines 593 and 615, buddy is not changed, so there is no need to fetch
it again with order 0. We can safely remove the second mb_find_buddy()
call on line 615.
Signed-off-by: Coly Li <bosong.ly@taobao.com>
Cc: Alex Tomas <alex@clusterfs.com>
Cc: Theodore Tso <tytso@google.com>
The current code calculates max regardless of whether order is zero,
which is unnecessary. This cleanup patch sets max to
"1 << (e4b->bd_blkbits + 3)" only when order == 0.
Signed-off-by: Coly Li <bosong.ly@taobao.com>
Cc: Alex Tomas <alex@clusterfs.com>
Cc: Theodore Tso <tytso@google.com>
There's no good reason to require the extra step of providing
a mount option for acl or user_xattr once the feature is configured
on; no other filesystem that I know of requires this.
Userspace patches have set these options in default mount options,
and this patch makes them default in the kernel. At some point
we can start to deprecate the options, perhaps.
For now I've removed default mount option checks in show_options()
to be explicit about what's set, since it's changing the default,
but I'm open to alternatives if desired.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Discard granularity tells us the minimum size of extent that can be
discarded by the device. If the user supplies a minimum extent that
should be discarded (range.minlen) which is smaller than the discard
granularity, increase minlen to the discard granularity, since there's
no point submitting trim requests that the device will reject anyway.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
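The clamping rule, shown standalone (values are invented; the real
code reads the granularity from the device's queue limits):

#include <stdio.h>

int main(void)
{
	unsigned long long minlen = 4096;		/* user-supplied */
	unsigned long long granularity = 1 << 20;	/* device: 1 MiB */

	/* requests smaller than the device's discard granularity are
	 * raised to it, since the device would reject them anyway */
	if (minlen < granularity)
		minlen = granularity;
	printf("effective minlen: %llu bytes\n", minlen);
	return 0;
}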
For a device that does not support discard, the FITRIM ioctl returns
-EOPNOTSUPP when blkdev_issue_discard() returns this error code, which
is how the user is informed that the device does not support discard.
If there are no suitable free extents to be trimmed, then FITRIM will
return success even though the device does not support discard, which
could confuse the user. So check explicitly if the device supports
discard and return an error code at the beginning of the FITRIM ioctl
processing.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
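From userspace, the visible effect is that a FITRIM call on such a
device now fails up front with EOPNOTSUPP; a minimal caller that
handles this looks roughly like the following (the mount point is a
placeholder).

#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <linux/fs.h>		/* FITRIM, struct fstrim_range */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
	struct fstrim_range range = {
		.start  = 0,
		.len    = ULLONG_MAX,	/* whole filesystem */
		.minlen = 0,
	};
	int fd = open("/mnt", O_RDONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (ioctl(fd, FITRIM, &range) < 0) {
		if (errno == EOPNOTSUPP)
			fprintf(stderr, "device does not support discard\n");
		else
			perror("FITRIM");
		close(fd);
		return 1;
	}
	/* on success the kernel updates range.len to the bytes trimmed */
	printf("trimmed %llu bytes\n", (unsigned long long)range.len);
	close(fd);
	return 0;
}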