| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
| |
We call hrtimer_enqueue_reprogram() only when we are in high resolution
mode now so we don't need to check that again in hrtimer_enqueue_reprogram().
Once the check is removed, hrtimer_enqueue_reprogram() turns to be an
useless wrapper over hrtimer_reprogram() and can be dropped.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1403393357-2070-6-git-send-email-fweisbec@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In lowres mode, hrtimers are serviced by the tick instead of a clock
event. It works well as long as the tick stays periodic but we must also
make sure that the hrtimers are serviced in dynticks mode targets,
pretty much like timer list timers do.
Note that all dynticks modes are concerned: get_nohz_timer_target()
tries not to return remote idle CPUs but there is nothing to prevent
the elected target from entering dynticks idle mode until we lock its
base. It's also prefectly legal to enqueue hrtimers on full dynticks CPU.
So there are two requirements to correctly handle dynticks:
1) On target's tick stop time, we must not delay the next tick further
the next hrtimer.
2) On hrtimer queue time. If the tick of the target is stopped, we must
wake up that CPU such that it sees the new hrtimer and recalculate
the next tick accordingly.
The point 1 is well handled currently through get_nohz_timer_interrupt() and
cmp_next_hrtimer_event().
But the point 2 isn't handled at all.
Fixing this is easy though as we have the necessary API ready for that.
All we need is to call wake_up_nohz_cpu() on a target when a newly
enqueued hrtimer requires tick rescheduling, like timer list timer do.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/3d7ea08ce008698e26bd39fe10f55949391073ab.1403507178.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In lowres mode, hrtimers are serviced by the tick instead of a clock
event. Now it works well as long as the tick stays periodic but we
must also make sure that the hrtimers are serviced in dynticks mode.
Part of that job consist in kicking a dynticks hrtimer target in order
to make it reconsider the next tick to schedule to correctly handle the
hrtimer's expiring time. And that part isn't handled by the hrtimers
subsystem.
To prepare for fixing this, we need __hrtimer_start_range_ns() to be
able to resolve the CPU target associated to a hrtimer's object
'cpu_base' so that the kick can be centralized there.
So lets store it in the 'struct hrtimer_cpu_base' to resolve the CPU
without overhead. It is set once at CPU's online notification.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1403393357-2070-4-git-send-email-fweisbec@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a timer is enqueued or modified on a dynticks target, that CPU
must re-evaluate the next tick to service that timer.
The tick re-evaluation is performed by an IPI kick on the target.
Now while we correctly call wake_up_nohz_cpu() from add_timer_on(), the
mod_timer*() API family doesn't support so well dynticks targets.
The reason for this is likely that __mod_timer() isn't supposed to
select an idle target for a timer, unless that target is the current
CPU, in which case a dynticks idle kick isn't actually needed.
But there is a small race window lurking behind that assumption: the
elected target has all the time to turn dynticks idle between the call
to get_nohz_timer_target() and the locking of its base. Hence a risk
that we enqueue a timer on a dynticks idle destination without kicking
it. As a result, the timer might be serviced too late in the future.
Also a target elected by __mod_timer() can be in full dynticks mode
and thus require to be kicked as well. And unlike idle dynticks, this
concern both local and remote targets.
To fix this whole issue, lets centralize the dynticks kick to
internal_add_timer() so that it is well handled for all sort of timer
enqueue. Even timer migration is concerned so that a full dynticks target
is correctly kicked as needed when timers are migrating to it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1403393357-2070-3-git-send-email-fweisbec@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Timers are serviced by the tick. But when a timer is enqueued on a
dynticks target, we need to kick it in order to make it reconsider the
next tick to schedule to correctly handle the timer's expiring time.
Now while this kick is correctly performed for add_timer_on(), the
mod_timer*() family has been a bit neglected.
To prepare for fixing this, we need internal_add_timer() to be able to
resolve the CPU target associated to a timer's object 'base' so that the
kick can be centralized there.
This can't be passed as an argument as not all the callers know the CPU
number of a timer's base. So lets store it in the struct tvec_base to
resolve the CPU without much overhead. It is set once for good at every
CPU's first boot.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1403393357-2070-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
|
|
|
|
| |
Except for Kconfig.HZ. That needs a separate treatment.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
|
|
|
|
|
|
|
|
| |
Move the clocksource Kconfig entries into their own menu, so that they
don't pollute the main device driver menu.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: http://lkml.kernel.org/r/20140616114845.343e9960@endymion.delvare
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is no point in calling gettimeofday if only the seconds part of
the timespec is used. Use get_seconds() instead. It's not only the
proper interface it's also faster.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: linux-security-module@vger.kernel.org
Link: http://lkml.kernel.org/r/20140611234607.775273584@linutronix.de
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
do_posix_clock_monotonic_gettime() is a leftover from the initial
posix timer implementation which maps to ktime_get_ts().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20140611234607.427408044@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Oleg Nesterov <oleg@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
do_posix_clock_monotonic_gettime() is a leftover from the initial
posix timer implementation which maps to ktime_get_ts()
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Acked-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Link: http://lkml.kernel.org/r/20140611234607.351283464@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
|
|
|
|
|
|
|
|
|
|
| |
do_posix_clock_monotonic_gettime() is a leftover from the initial
posix timer implementation which maps to ktime_get_ts().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Link: http://lkml.kernel.org/r/20140611234607.261629142@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
|
|
|
|
|
|
|
|
|
| |
do_posix_clock_monotonic_gettime() is a leftover from the initial
posix timer implementation which maps to ktime_get_ts()
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140611234606.840900621@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
|
|
|
|
|
|
|
|
|
|
| |
do_posix_clock_monotonic_gettime() is a leftover from the initial
posix timer implementation which maps to ktime_get_ts(). Remove the
silly wrapper while at it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140611234606.931409215@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
|
|
|
|
|
|
|
|
|
| |
do_posix_clock_monotonic_gettime() is a leftover from the initial
posix timer implementation which maps to ktime_get_ts()
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140611234606.764810535@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
Pull 9p fixes from Eric Van Hensbergen:
"Two bug fixes, one in xattr error path and the other in parsing
major/minor numbers from devices"
* tag 'for-linus-3.16-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
9P: fix return value in v9fs_fid_xattr_set
fs/9p: adjust sscanf parameters accordingly to the variable types
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
v9fs_fid_xattr_set is supposed to return 0 on success.
This corrects the behaviour introduced in commit
bdd5c28dcb8330b9074404cc92a0b83aae5606a
"9p: fix return value in case in v9fs_fid_xattr_set()"
(The function returns a negative error on error, as expected)
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
| |
| |
| |
| |
| | |
Signed-off-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This function is supposed to return true if the new load imbalance is
worse than the old one. It didn't. I can only hope brown paper bags
are in style.
Now things converge much better on both the 4 node and 8 node systems.
I am not sure why this did not seem to impact specjbb performance on the
4 node system, which is the system I have full-time access to.
This bug was introduced recently, with commit e63da03639cc ("sched/numa:
Allow task switch if load imbalance improves")
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
shrink_inactive_list() used to wait 0.1s to avoid congestion when all
the pages that were isolated from the inactive list were dirty but not
under active writeback. That makes no real sense, and apparently causes
major interactivity issues under some loads since 3.11.
The ostensible reason for it was to wait for kswapd to start writing
pages, but that seems questionable as well, since the congestion wait
code seems to trigger for kswapd itself as well. Also, the logic behind
delaying anything when we haven't actually started writeback is not
clear - it only delays actually starting that writeback.
We'll still trigger the congestion waiting if
(a) the process is kswapd, and we hit pages flagged for immediate
reclaim
(b) the process is not kswapd, and the zone backing dev writeback is
actually congested.
This probably needs to be revisited, but as it is this fixes a reported
regression.
Reported-by: Felipe Contreras <felipe.contreras@gmail.com>
Pinpointed-by: Hillf Danton <dhillf@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Clean ups and miscellaneous bug fixes, in particular for the new
collapse_range and zero_range fallocate functions. In addition,
improve the scalability of adding and remove inodes from the orphan
list"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (25 commits)
ext4: handle symlink properly with inline_data
ext4: fix wrong assert in ext4_mb_normalize_request()
ext4: fix zeroing of page during writeback
ext4: remove unused local variable "stored" from ext4_readdir(...)
ext4: fix ZERO_RANGE test failure in data journalling
ext4: reduce contention on s_orphan_lock
ext4: use sbi in ext4_orphan_{add|del}()
ext4: use EXT_MAX_BLOCKS in ext4_es_can_be_merged()
ext4: add missing BUFFER_TRACE before ext4_journal_get_write_access
ext4: remove unnecessary double parentheses
ext4: do not destroy ext4_groupinfo_caches if ext4_mb_init() fails
ext4: make local functions static
ext4: fix block bitmap validation when bigalloc, ^flex_bg
ext4: fix block bitmap initialization under sparse_super2
ext4: find the group descriptors on a 1k-block bigalloc,meta_bg filesystem
ext4: avoid unneeded lookup when xattr name is invalid
ext4: fix data integrity sync in ordered mode
ext4: remove obsoleted check
ext4: add a new spinlock i_raw_lock to protect the ext4's raw inode
ext4: fix locking for O_APPEND writes
...
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This commit tries to fix a bug that we can't read symlink properly with
inline data feature when the length of symlink is greater than 60 bytes
but less than extra space.
The key issue is in ext4_inode_is_fast_symlink() that it doesn't check
whether or not an inode has inline data. When the user creates a new
symlink, an inode will be allocated with MAY_INLINE_DATA flag. Then
symlink will be stored in ->i_block and extended attribute space. In
the mean time, this inode is with inline data flag. After remounting
it, ext4_inode_is_fast_symlink() function thinks that this inode is a
fast symlink so that the data in ->i_block is copied to the user, and
the data in extra space is trimmed. In fact this inode should be as a
normal symlink.
The following script can hit this bug.
#!/bin/bash
cd ${MNT}
filename=ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
rm -rf test
mkdir test
cd test
echo "hello" >$filename
ln -s $filename symlinkfile
cd
sudo umount /mnt/sda1
sudo mount -t ext4 /dev/sda1 /mnt/sda1
readlink /mnt/sda1/test/symlinkfile
After applying this patch, it will break the assumption in e2fsck
because the original implementation doesn't want to support symlink
with inline data.
Reported-by: "Darrick J. Wong" <darrick.wong@oracle.com>
Reported-by: Ian Nartowicz <claws@nartowicz.co.uk>
Cc: Ian Nartowicz <claws@nartowicz.co.uk>
Cc: Tao Ma <tm@tao.ma>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The variable "size" is expressed as number of blocks and not as
number of clusters, this could trigger a kernel panic when using
ext4 with the size of a cluster different from the size of a block.
Cc: stable@vger.kernel.org
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Tail of a page straddling inode size must be zeroed when being written
out due to POSIX requirement that modifications of mmaped page beyond
inode size must not be written to the file. ext4_bio_write_page() did
this only for blocks fully beyond inode size but didn't properly zero
blocks partially beyond inode size. Fix this.
The problem has been uncovered by mmap_11-4 test in openposix test suite
(part of LTP).
Reported-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com>
Fixes: 5a0dc7365c240
Fixes: bd2d0210cf22f
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Remove local variable "stored" from ext4_readdir(...). This variable
gets initialized but is never used inside the function.
Signed-off-by: Giedrius Rekasius <giedrius.rekasius@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
xfstests generic/091 is failing when mounting ext4 with data=journal.
I think that this regression is same problem that occurred prior to collapse
range issue. So ZERO RANGE also need to call ext4_force_commit as
collapse range.
Cc: stable@vger.kernel.org
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Shuffle code around in ext4_orphan_add() and ext4_orphan_del() so that
we avoid taking global s_orphan_lock in some cases and hold it for
shorter time in other cases.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Use sbi pointer consistently in ext4_orphan_del() instead of opencoding
it sometimes. Also ext4_orphan_add() uses EXT4_SB(sb) often so create
sbi variable for it as well and use it.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
In ext4_es_can_be_merged() when checking whether we can merge two
extents we should use EXT_MAX_BLOCKS instead of defining it manually.
Also if it is really the case we should notify userspace because clearly
there is a bug in extent status tree implementation since this should
never happen.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Make them more consistently
Signed-off-by: xieliang <xieliang@xiaomi.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | | |
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Caches from 'ext4_groupinfo_caches' may be in use by other mounts,
which have already existed. So, it is incorrect to destroy them when
newly requested mount fails.
Found by Linux File System Verification project (linuxtesting.org).
Signed-off-by: Andrey Tsyvarev <tsyvarev@ispras.ru>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
I have been running make namespacecheck to look for unneeded globals, and
found these in ext4.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
On a bigalloc,^flex_bg filesystem, the ext4_valid_block_bitmap
function fails to convert from blocks to clusters when spot-checking
the validity of the bitmap block that we've just read from disk. This
causes ext4 to think that the bitmap is garbage, which results in the
block group being taken offline when it's not necessary. Add in the
necessary EXT4_B2C() calls to perform the conversions.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The ext4_bg_has_super() function doesn't know about the new rules for
where backup superblocks go on a sparse_super2 filesystem. Therefore,
block bitmap initialization doesn't know that it shouldn't reserve
space for backups in groups that are never going to contain backups.
The result of this is e2fsck complaining about the block bitmap being
incorrect (fortunately not in a way that results in cross-linked
files), so fix the whole thing.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
On a filesystem with a 1k block size, the group descriptors live in
block 2, not block 1. If the filesystem has bigalloc,meta_bg set,
however, the calculation of the group descriptor table location does
not take this into account and returns the wrong block number. Fix
the calculation to return the correct value for this case.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
In ext4_xattr_set_handle() we have checked the xattr name's length. So
we should also check it in ext4_xattr_get() to avoid unneeded lookup
caused by invalid name.
Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When we perform a data integrity sync we tag all the dirty pages with
PAGECACHE_TAG_TOWRITE at start of ext4_da_writepages. Later we check
for this tag in write_cache_pages_da and creates a struct
mpage_da_data containing contiguously indexed pages tagged with this
tag and sync these pages with a call to mpage_da_map_and_submit. This
process is done in while loop until all the PAGECACHE_TAG_TOWRITE
pages are synced. We also do journal start and stop in each iteration.
journal_stop could initiate journal commit which would call
ext4_writepage which in turn will call ext4_bio_write_page even for
delayed OR unwritten buffers. When ext4_bio_write_page is called for
such buffers, even though it does not sync them but it clears the
PAGECACHE_TAG_TOWRITE of the corresponding page and hence these pages
are also not synced by the currently running data integrity sync. We
will end up with dirty pages although sync is completed.
This could cause a potential data loss when the sync call is followed
by a truncate_pagecache call, which is exactly the case in
collapse_range. (It will cause generic/127 failure in xfstests)
To avoid this issue, we can use set_page_writeback_keepwrite instead of
set_page_writeback, which doesn't clear TOWRITE tag.
Cc: stable@vger.kernel.org
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
BH can not be NULL at this point, ext4_read_dirblock() always return
non null value, and we already have done all necessery checks.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
To avoid potential data races, use a spinlock which protects the raw
(on-disk) inode.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Al Viro pointed out that locking for O_APPEND writes was problematic,
since the location of the write isn't known until after we take the
i_mutex, which impacts the ext4_unaligned_aio() and s_bitmap_maxbytes
check.
For O_APPEND always assume that the write is unaligned so call
ext4_unwritten_wait(). And to solve the second problem, take the
i_mutex earlier before we start the s_bitmap_maxbytes check.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This shouldn't change any logic flow; just delete duplicated code.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This commit doesn't actually change anything; it just moves code
around in preparation for some code simplification work.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Copy generic_file_aio_write() into ext4_file_write(). This is part of
a patch series which allows us to simplify ext4_file_write() and
ext4_file_dio_write(), by calling __generic_file_aio_write() directly.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Currently in ext4 there is quite a mess when it comes to naming
unwritten extents. Sometimes we call it uninitialized and sometimes we
refer to it as unwritten.
The right name for the extent which has been allocated but does not
contain any written data is _unwritten_. Other file systems are
using this name consistently, even the buffer head state refers to it as
unwritten. We need to fix this confusion in ext4.
This commit changes every reference to an uninitialized extent (meaning
allocated but unwritten) to unwritten extent. This includes comments,
function names and variable names. It even covers abbreviation of the
word uninitialized (such as uninit) and some misspellings.
This commit does not change any of the code paths at all. This has been
confirmed by comparing md5sums of the assembly code of each object file
after all the function names were stripped from it.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Currently EXT4_MAP_UNINIT is used in dioread_nolock case to mark the
cases where we're using dioread_nolock and we're writing into either
unallocated, or unwritten extent, because we need to make sure that
any DIO write into that inode will wait for the extent conversion.
However EXT4_MAP_UNINIT is not only entirely misleading name but also
unnecessary because we can check for EXT4_MAP_UNWRITTEN in the
dioread_nolock case instead.
This commit removes EXT4_MAP_UNINIT flag.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.linuxfoundation.org/llvmlinux/kernel
Pull LLVM patches from Behan Webster:
"Next set of patches to support compiling the kernel with clang.
They've been soaking in linux-next since the last merge window.
More still in the works for the next merge window..."
* tag 'llvmlinux-for-v3.16' of git://git.linuxfoundation.org/llvmlinux/kernel:
arm, unwind, LLVMLinux: Enable clang to be used for unwinding the stack
ARM: LLVMLinux: Change "extern inline" to "static inline" in glue-cache.h
all: LLVMLinux: Change DWARF flag to support gcc and clang
net: netfilter: LLVMLinux: vlais-netfilter
crypto: LLVMLinux: aligned-attribute.patch
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Patch to prevent warning of a buggy compiler when using clang and
the ARM_UNWIND option.
Clang defines (at least on the current trunk) GNUC, GNUC_MINOR, and
GNUC_PATCHLEVEL to 4, 2, and 1 respectively.
This version of GCC gets flagged as buggy, but it isn't actually an
issue with clang so the patch will do what it did before unless clang
is defined and then it will not report the GCC version as an issue.
Signed-off-by: Mark Charlebois <charlebm@gmail.com>
Signed-off-by: Behan Webster <behanw@converseincode.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
With compilers which follow the C99 standard (like modern versions of gcc and
clang), "extern inline" does the wrong thing (emits code for an externally
linkable version of the inline function). "static inline" is the correct choice
instead.
Author: Behan Webster <behanw@converseincode.com>
Signed-off-by: Behan Webster <behanw@converseincode.com>
Reviewed-by: Mark Charlebois <charlebm@gmail.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Both gcc (well, actually gnu as) and clang support the "-Wa,-gdwarf-2" option
(though clang does not support "-Wa,--gdwarf-2"). Since these flags are equivalent
in meaning, this patch uses the one which is better supported across compilers.
Signed-off-by: Behan Webster <behanw@converseincode.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Replaced non-standard C use of Variable Length Arrays In Structs (VLAIS) in
xt_repldata.h with a C99 compliant flexible array member and then calculated
offsets to the other struct members. These other members aren't referenced by
name in this code, however this patch maintains the same memory layout and
padding as was previously accomplished using VLAIS.
Had the original structure been ordered differently, with the entries VLA at
the end, then it could have been a flexible member, and this patch would have
been a lot simpler. However since the data stored in this structure is
ultimately exported to userspace, the order of this structure can't be changed.
This patch makes no attempt to change the existing behavior, merely the way in
which the current layout is accomplished using standard C99 constructs. As such
the code can now be compiled with either gcc or clang.
This version of the patch removes the trailing alignment that the VLAIS
structure would allocate in order to simplify the patch.
Author: Mark Charlebois <charlebm@gmail.com>
Signed-off-by: Mark Charlebois <charlebm@gmail.com>
Signed-off-by: Behan Webster <behanw@converseincode.com>
Signed-off-by: Vinícius Tinti <viniciustinti@gmail.com>
|