| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The struct itimerspec is not year 2038 safe on 32bit systems due to
the limitation of the struct timespec members. Introduce itimerspec64
which uses struct timespec64 instead and provide conversion functions.
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The weak update_persistent_clock64() calls update_persistent_clock(),
if the architecture defines an update_persistent_clock64() to replace
and remove its update_persistent_clock() version, when building the
kernel the linker will throw an undefined symbol error, that is, any
arch that switches to update_persistent_clock64() will have this issue.
To solve the issue, we add the common weak update_persistent_clock().
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Two issues were found on an IMX6 development board without an
enabled RTC device(resulting in the boot time and monotonic
time being initialized to 0).
Issue 1:exportfs -a generate:
"exportfs: /opt/nfs/arm does not support NFS export"
Issue 2:cat /proc/stat:
"btime 4294967236"
The same issues can be reproduced on x86 after running the
following code:
int main(void)
{
struct timeval val;
int ret;
val.tv_sec = 0;
val.tv_usec = 0;
ret = settimeofday(&val, NULL);
return 0;
}
Two issues are different symptoms of same problem:
The reason is a positive wall_to_monotonic pushes boot time back
to the time before Epoch, and getboottime will return negative
value.
In symptom 1:
negative boot time cause get_expiry() to overflow time_t
when input expire time is 2147483647, then cache_flush()
always clears entries just added in ip_map_parse.
In symptom 2:
show_stat() uses "unsigned long" to print negative btime
value returned by getboottime.
This patch fix the problem by prohibiting time from being set to a value which
would cause a negative boot time. As a result one can't set the CLOCK_REALTIME
time prior to (1970 + system uptime).
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Wang YanQing <udknight@gmail.com>
[jstultz: reworded commit message]
Signed-off-by: John Stultz <john.stultz@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
timespec_trunc() avoids rounding if granularity <= nanoseconds-per-jiffie
(or TICK_NSEC). This optimization assumes that:
1. current_kernel_time().tv_nsec is already rounded to TICK_NSEC (i.e.
with HZ=1000 you'd get 1000000, 2000000, 3000000... but never 1000001).
This is no longer true (probably since hrtimers introduced in 2.6.16).
2. TICK_NSEC is evenly divisible by all possible granularities. This may
be true for HZ=100, 250, 1000, but obviously not for HZ=300 /
TICK_NSEC=3333333 (introduced in 2.6.20).
Thus, sub-second portions of in-core file times are not rounded to on-disk
granularity. I.e. file times may change when the inode is re-read from disk
or when the file system is remounted.
This affects all file systems with file time granularities > 1 ns and < 1s,
e.g. CEPH (1000 ns), UDF (1000 ns), CIFS (100 ns), NTFS (100 ns) and FUSE
(configurable from user mode via struct fuse_init_out.time_gran).
Steps to reproduce with e.g. UDF:
$ dd if=/dev/zero of=udfdisk count=10000 && mkudffs udfdisk
$ mkdir udf && mount udfdisk udf
$ touch udf/test && stat -c %y udf/test
2015-06-09 10:22:56.130006767 +0200
$ umount udf && mount udfdisk udf
$ stat -c %y udf/test
2015-06-09 10:22:56.130006000 +0200
Remounting truncates the mtime to 1 µs.
Fix the rounding in timespec_trunc() and update the documentation.
timespec_trunc() is exclusively used to calculate inode's [acm]time (mostly
via current_fs_time()), and always with super_block.s_time_gran as second
argument. So this can safely be changed without side effects.
Note: This does _not_ fix the issue for FAT's 2 second mtime resolution,
as super_block.s_time_gran isn't prepared to handle different ctime /
mtime / atime resolutions nor resolutions > 1 second.
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
monotonic timers
I noticed for non-monotonic timers in timer_list, some of the
output looked a little confusing.
For example:
#1: <0000000000000000>, posix_timer_fn, S:01, hrtimer_start_range_ns, leap-a-day/2360
# expires at 1434412800000000000-1434412800000000000 nsecs [in 1434410725062375469 to 1434410725062375469 nsecs]
You'll note the relative time till the expiration "[in xxx to
yyy nsecs]" is incorrect. This is because its printing the delta
between CLOCK_MONOTONIC time to the CLOCK_REALTIME expiration.
This patch fixes this issue by adding the clock offset to the
"now" time which we use to calculate the delta.
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
|
| |
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.infradead.org/users/dvhart/linux-platform-drivers-x86
Pull late x86 platform driver updates from Darren Hart:
"The following came in a bit later and I wanted them to bake in next a
few more days before submitting, thus the second pull.
A new intel_pmc_ipc driver, a symmetrical allocation and free fix in
dell-laptop, a couple minor fixes, and some updated documentation in
the dell-laptop comments.
intel_pmc_ipc:
- Add Intel Apollo Lake PMC IPC driver
tc1100-wmi:
- Delete an unnecessary check before the function call "kfree"
dell-laptop:
- Fix allocating & freeing SMI buffer page
- Show info about WiGig and UWB in debugfs
- Update information about wireless control"
* tag 'platform-drivers-x86-v4.2-2' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86:
intel_pmc_ipc: Add Intel Apollo Lake PMC IPC driver
tc1100-wmi: Delete an unnecessary check before the function call "kfree"
dell-laptop: Fix allocating & freeing SMI buffer page
dell-laptop: Show info about WiGig and UWB in debugfs
dell-laptop: Update information about wireless control
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This driver provides support for PMC control on Apollo Lake platforms.
The PMC is an ARC processor which defines some IPC commands for
communication with other entities in the CPU.
Signed-off-by: qipeng.zha <qipeng.zha@intel.com>
[fengguang.wu@intel.com: Fix Sparse and Cocinelle warnings]
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The kfree() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This commit fix kernel crash when probing for rfkill devices in dell-laptop
driver failed. Function free_page() was incorrectly used on struct page *
instead of virtual address of SMI buffer.
This commit also simplify allocating page for SMI buffer by using
__get_free_page() function instead of sequential call of functions
alloc_page() and page_address().
Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
This commit show additional information about rfkill state in debugfs based
on newly released documentation by Dell.
Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Make sure that all existing SMBIOS calls for wireless control are properly
documented. This commit also add new documentation released by Dell.
Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs updates from Al Viro:
"Assorted VFS fixes and related cleanups (IMO the most interesting in
that part are f_path-related things and Eric's descriptor-related
stuff). UFS regression fixes (it got broken last cycle). 9P fixes.
fs-cache series, DAX patches, Jan's file_remove_suid() work"
[ I'd say this is much more than "fixes and related cleanups". The
file_table locking rule change by Eric Dumazet is a rather big and
fundamental update even if the patch isn't huge. - Linus ]
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
9p: cope with bogus responses from server in p9_client_{read,write}
p9_client_write(): avoid double p9_free_req()
9p: forgetting to cancel request on interrupted zero-copy RPC
dax: bdev_direct_access() may sleep
block: Add support for DAX reads/writes to block devices
dax: Use copy_from_iter_nocache
dax: Add block size note to documentation
fs/file.c: __fget() and dup2() atomicity rules
fs/file.c: don't acquire files->file_lock in fd_install()
fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
vfs: avoid creation of inode number 0 in get_next_ino
namei: make set_root_rcu() return void
make simple_positive() public
ufs: use dir_pages instead of ufs_dir_pages()
pagemap.h: move dir_pages() over there
remove the pointless include of lglock.h
fs: cleanup slight list_entry abuse
xfs: Correctly lock inode when removing suid and file capabilities
fs: Call security_ops->inode_killpriv on truncate
fs: Provide function telling whether file_remove_privs() will do anything
...
|
| | |
| | |
| | |
| | |
| | |
| | | |
if server claims to have written/read more than we'd told it to,
warn and cap the claimed byte count to avoid advancing more than
we are ready to.
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Braino in "9p: switch p9_client_write() to passing it struct iov_iter *";
if response is impossible to parse and we discard the request, get the
out of the loop right there.
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If we'd already sent a request and decide to abort it, we *must*
issue TFLUSH properly and not just blindly reuse the tag, or
we'll get seriously screwed when response eventually arrives
and we confuse it for response to later request that had reused
the same tag.
Cc: stable@vger.kernel.org # v3.2 and later
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The brd driver is the only in-tree driver that may sleep currently.
After some discussion on linux-fsdevel, we decided that any driver
may choose to sleep in its ->direct_access method. To ensure that all
callers of bdev_direct_access() are prepared for this, add a call
to might_sleep().
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If a block device supports the ->direct_access methods, bypass the normal
DIO path and use DAX to go straight to memcpy() instead of allocating
a DIO and a BIO.
Includes support for the DIO_SKIP_DIO_COUNT flag in DAX, as is done in
do_blockdev_direct_IO().
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When userspace does a write, there's no need for the written data to
pollute the CPU cache. This matches the original XIP code.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
For block devices which are small enough, mkfs will default to creating
a filesystem with block sizes smaller than page size.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
__fget() does lockless fetch of pointer from the descriptor
table, attempts to grab a reference and treats "it was already
zero" as "it's already gone from the table, we just hadn't
seen the store, let's fail". Unfortunately, that breaks the
atomicity of dup2() - __fget() might see the old pointer,
notice that it's been already dropped and treat that as
"it's closed". What we should be getting is either the
old file or new one, depending whether we come before or after
dup2().
Dmitry had following test failing sometimes :
int fd;
void *Thread(void *x) {
char buf;
int n = read(fd, &buf, 1);
if (n != 1)
exit(printf("read failed: n=%d errno=%d\n", n, errno));
return 0;
}
int main()
{
fd = open("/dev/urandom", O_RDONLY);
int fd2 = open("/dev/urandom", O_RDONLY);
if (fd == -1 || fd2 == -1)
exit(printf("open failed\n"));
pthread_t th;
pthread_create(&th, 0, Thread, 0);
if (dup2(fd2, fd) == -1)
exit(printf("dup2 failed\n"));
pthread_join(th, 0);
if (close(fd) == -1)
exit(printf("close failed\n"));
if (close(fd2) == -1)
exit(printf("close failed\n"));
printf("DONE\n");
return 0;
}
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Mateusz Guzik reported :
Currently obtaining a new file descriptor results in locking fdtable
twice - once in order to reserve a slot and second time to fill it.
Holding the spinlock in __fd_install() is needed in case a resize is
done, or to prevent a resize.
Mateusz provided an RFC patch and a micro benchmark :
http://people.redhat.com/~mguzik/pipebench.c
A resize is an unlikely operation in a process lifetime,
as table size is at least doubled at every resize.
We can use RCU instead of the spinlock.
__fd_install() must wait if a resize is in progress.
The resize must block new __fd_install() callers from starting,
and wait that ongoing install are finished (synchronize_sched())
resize should be attempted by a single thread to not waste resources.
rcu_sched variant is used, as __fd_install() and expand_fdtable() run
from process context.
It gives us a ~30% speedup using pipebench on a dual Intel(R) Xeon(R)
CPU E5-2696 v2 @ 2.50GHz
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Mateusz Guzik <mguzik@redhat.com>
Acked-by: Mateusz Guzik <mguzik@redhat.com>
Tested-by: Mateusz Guzik <mguzik@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
limitation
Execution of get_anon_bdev concurrently and preemptive kernel all
could bring race condition, it isn't enough to check dev against
its upper limitation with equality operator only.
This patch fix it.
Signed-off-by: Wang YanQing <udknight@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
currently, get_next_ino() is able to create inodes with inode number = 0.
This have a bad impact in the filesystems relying in this function to generate
inode numbers.
While there is no problem at all in having inodes with number 0, userspace tools
which handle file management tasks can have problems handling these files, like
for example, the impossiblity of users to delete these files, since glibc will
ignore them. So, I believe the best way is kernel to avoid creating them.
This problem has been raised previously, but the old thread didn't have any
other update for a year+, and I've seen too many users hitting the same issue
regarding the impossibility to delete files while using filesystems relying on
this function. So, I'm starting the thread again, with the same patch
that I believe is enough to address this problem.
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The only caller that cares about its return value can just
as easily pick it from nd->root_seq itself. We used to just
calculate it and return to caller, but these days we are
storing it in nd->root_seq in all cases.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | | |
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
dir_pages was declared in a lot of filesystems.
Use newly dir_pages() from pagemap.h
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
That function was declared in a lot of filesystems to calculate
directory pages.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | | |
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
list_entry is just a wrapper for container_of, but it is arguably
wrong (and slightly confusing) to use it when the pointed-to struct
member is not a struct list_head. Use container_of directly instead.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| |\ \ |
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Now that the retrieval operation may be disposed of by fscache_put_operation()
before we actually set the context, the retrieval-specific cleanup operation
can produce a NULL-pointer dereference when it tries to unconditionally clean
up the netfs context.
Given that it is expected that we'll get at least as far as the place where we
currently set the context pointer and it is unlikely we'll go through the
error handling paths prior to that point, retain the context right from the
point that the retrieval op is allocated.
Concomitant to this, we need to retain the cookie pointer in the retrieval op
also so that we can call the netfs to release its context in the release
method.
In addition, we might now get into fscache_release_retrieval_op() with the op
only initialised. To this end, set the operation to DEAD only after the
release method has been called and skip the n_pages test upon cleanup if the
op is still in the INITIALISED state.
Without these changes, the following oops might be seen:
BUG: unable to handle kernel NULL pointer dereference at 00000000000000b8
...
RIP: 0010:[<ffffffffa0089c98>] fscache_release_retrieval_op+0xae/0x100
...
Call Trace:
[<ffffffffa0088560>] fscache_put_operation+0x117/0x2e0
[<ffffffffa008b8f5>] __fscache_read_or_alloc_pages+0x351/0x3ac
[<ffffffffa00b761f>] __nfs_readpages_from_fscache+0x59/0xbf [nfs]
[<ffffffffa00b06c5>] nfs_readpages+0x10c/0x185 [nfs]
[<ffffffff81124925>] ? alloc_pages_current+0x119/0x13e
[<ffffffff810ee5fd>] ? __page_cache_alloc+0xfb/0x10a
[<ffffffff810f87f8>] __do_page_cache_readahead+0x188/0x22c
[<ffffffff810f8b3a>] ondemand_readahead+0x29e/0x2af
[<ffffffff810f8c92>] page_cache_sync_readahead+0x38/0x3a
[<ffffffff810ef337>] generic_file_read_iter+0x1a2/0x55a
[<ffffffffa00a9dff>] ? nfs_revalidate_mapping+0xd6/0x288 [nfs]
[<ffffffffa00a6a23>] nfs_file_read+0x49/0x70 [nfs]
[<ffffffff811363be>] new_sync_read+0x78/0x9c
[<ffffffff81137164>] __vfs_read+0x13/0x38
[<ffffffff8113721e>] vfs_read+0x95/0x121
[<ffffffff811372f6>] SyS_read+0x4c/0x8a
[<ffffffff81557a52>] system_call_fastpath+0x12/0x17
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Any time an incomplete operation is cancelled, the operation cancellation
function needs to be called to clean up. This is currently being passed
directly to some of the functions that might want to call it, but not all.
Instead, pass the cancellation method pointer to the fscache_operation_init()
and have that cache it in the operation struct. Further, plug in a dummy
cancellation handler if the caller declines to set one as this allows us to
call the function unconditionally (the extra overhead isn't worth bothering
about as we don't expect to be calling this typically).
The cancellation method must thence be called everywhere the CANCELLED state
is set. Note that we call it *before* setting the CANCELLED state such that
the method can use the old state value to guide its operation.
fscache_do_cancel_retrieval() needs moving higher up in the sources so that
the init function can use it now.
Without this, the following oops may be seen:
FS-Cache: Assertion failed
FS-Cache: 3 == 0 is false
------------[ cut here ]------------
kernel BUG at ../fs/fscache/page.c:261!
...
RIP: 0010:[<ffffffffa0089c1b>] fscache_release_retrieval_op+0x77/0x100
[<ffffffffa008853d>] fscache_put_operation+0x114/0x2da
[<ffffffffa008b8c2>] __fscache_read_or_alloc_pages+0x358/0x3b3
[<ffffffffa00b761f>] __nfs_readpages_from_fscache+0x59/0xbf [nfs]
[<ffffffffa00b06c5>] nfs_readpages+0x10c/0x185 [nfs]
[<ffffffff81124925>] ? alloc_pages_current+0x119/0x13e
[<ffffffff810ee5fd>] ? __page_cache_alloc+0xfb/0x10a
[<ffffffff810f87f8>] __do_page_cache_readahead+0x188/0x22c
[<ffffffff810f8b3a>] ondemand_readahead+0x29e/0x2af
[<ffffffff810f8c92>] page_cache_sync_readahead+0x38/0x3a
[<ffffffff810ef337>] generic_file_read_iter+0x1a2/0x55a
[<ffffffffa00a9dff>] ? nfs_revalidate_mapping+0xd6/0x288 [nfs]
[<ffffffffa00a6a23>] nfs_file_read+0x49/0x70 [nfs]
[<ffffffff811363be>] new_sync_read+0x78/0x9c
[<ffffffff81137164>] __vfs_read+0x13/0x38
[<ffffffff8113721e>] vfs_read+0x95/0x121
[<ffffffff811372f6>] SyS_read+0x4c/0x8a
[<ffffffff81557a52>] system_call_fastpath+0x12/0x17
The assertion is showing that the remaining number of pages (n_pages) is not 0
when the operation is being released.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Call fscache_put_operation() or a wrapper on any op that has gone through
fscache_operation_init() so that the accounting shown in /proc is done
correctly, specifically fscache_n_op_release.
fscache_put_operation() therefore now allows an op in the INITIALISED state as
well as in the CANCELLED and COMPLETE states.
Note that this means that an operation can get put that doesn't have its
->object pointer filled in, so anything that depends on the object needs to be
conditional in fscache_put_operation().
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Cancellation of an in-progress operation needs to update the relevant counters
and start any operations that are pending waiting on this one.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Count and display through /proc/fs/fscache/stats the number of initialised
operations.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Out of line fscache_operation_init() so that it can access internal FS-Cache
features, such as stats, in a later commit.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Currently, fscache_cancel_op() only cancels pending operations - attempts to
cancel in-progress operations are ignored. This leads to a problem in
fscache_wait_for_operation_activation() whereby the wait is terminated, but
the object has been killed.
The check at the end of the function now triggers because it's no longer
contingent on the cache having produced an I/O error since the commit that
fixed the logic error in fscache_object_is_dead().
The result of the check is that it tries to cancel the operation - but since
the object may not be pending by this point, the cancellation request may be
ignored - with the result that the the object is just put by the caller and
fscache_put_operation has an assertion failure because the operation isn't in
either the COMPLETE or the CANCELLED states.
To fix this, we permit in-progress ops to be cancelled under some
circumstances.
The bug results in an oops that looks something like this:
FS-Cache: fscache_wait_for_operation_activation() = -ENOBUFS [obj dead 3]
FS-Cache:
FS-Cache: Assertion failed
FS-Cache: 3 == 5 is false
------------[ cut here ]------------
kernel BUG at ../fs/fscache/operation.c:432!
...
RIP: 0010:[<ffffffffa0088574>] fscache_put_operation+0xf2/0x2cd
Call Trace:
[<ffffffffa008b92a>] __fscache_read_or_alloc_pages+0x2ec/0x3b3
[<ffffffffa00b761f>] __nfs_readpages_from_fscache+0x59/0xbf [nfs]
[<ffffffffa00b06c5>] nfs_readpages+0x10c/0x185 [nfs]
[<ffffffff81124925>] ? alloc_pages_current+0x119/0x13e
[<ffffffff810ee5fd>] ? __page_cache_alloc+0xfb/0x10a
[<ffffffff810f87f8>] __do_page_cache_readahead+0x188/0x22c
[<ffffffff810f8b3a>] ondemand_readahead+0x29e/0x2af
[<ffffffff810f8c92>] page_cache_sync_readahead+0x38/0x3a
[<ffffffff810ef337>] generic_file_read_iter+0x1a2/0x55a
[<ffffffffa00a9dff>] ? nfs_revalidate_mapping+0xd6/0x288 [nfs]
[<ffffffffa00a6a23>] nfs_file_read+0x49/0x70 [nfs]
[<ffffffff811363be>] new_sync_read+0x78/0x9c
[<ffffffff81137164>] __vfs_read+0x13/0x38
[<ffffffff8113721e>] vfs_read+0x95/0x121
[<ffffffff811372f6>] SyS_read+0x4c/0x8a
[<ffffffff81557a52>] system_call_fastpath+0x12/0x17
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
fscache_object_is_dead() returns true only if the object is marked dead and
the cache got an I/O error. This should be a logical OR instead. Since two
of the callers got split up into handling for separate subcases, expand the
other callers and kill the function. This is probably the right thing to do
anyway since one of the subcases isn't about the object at all, but rather
about the cache.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
When an object is being marked as no longer live, do this under the object
spinlock to prevent a race with operation submission targeted on that object.
The problem occurs due to the following pair of intertwined sequences when the
cache tries to create an object that would take it over the hard available
space limit:
NETFS INTERFACE
===============
(A) The netfs calls fscache_acquire_cookie(). object creation is deferred to
the object state machine and the netfs is allowed to continue.
OBJECT STATE MACHINE KTHREAD
============================
(1) The object is looked up on disk by fscache_look_up_object()
calling cachefiles_walk_to_object(). The latter finds that the
object is not yet represented on disk and calls
fscache_object_lookup_negative().
(2) fscache_object_lookup_negative() sets FSCACHE_COOKIE_NO_DATA_YET
and clears FSCACHE_COOKIE_LOOKING_UP, thus allowing the netfs to
start queuing read operations.
(B) The netfs calls fscache_read_or_alloc_pages(). This calls
fscache_wait_for_deferred_lookup() which sees FSCACHE_COOKIE_LOOKING_UP
become clear, allowing the read to begin.
(C) A read operation is set up and passed to fscache_submit_op() to deal
with.
(3) cachefiles_walk_to_object() calls cachefiles_has_space(), which
fails (or one of the file operations to create stuff fails).
cachefiles returns an error to fscache.
(4) fscache_look_up_object() transits to the LOOKUP_FAILURE state,
(5) fscache_lookup_failure() sets FSCACHE_OBJECT_LOOKED_UP and
FSCACHE_COOKIE_UNAVAILABLE and clears FSCACHE_COOKIE_LOOKING_UP
then transits to the KILL_OBJECT state.
(6) fscache_kill_object() clears FSCACHE_OBJECT_IS_LIVE in an attempt
to reject any further requests from the netfs.
(7) object->n_ops is examined and found to be 0.
fscache_kill_object() transits to the DROP_OBJECT state.
(D) fscache_submit_op() locks the object spinlock, sees if it can dispatch
the op immediately by calling fscache_object_is_active() - which fails
since FSCACHE_OBJECT_IS_AVAILABLE has not yet been set.
(E) fscache_submit_op() then tests FSCACHE_OBJECT_LOOKED_UP - which is set.
It then queues the object and increments object->n_ops.
(8) fscache_drop_object() releases the object and eventually
fscache_put_object() calls cachefiles_put_object() which suffers
an assertion failure here:
ASSERTCMP(object->fscache.n_ops, ==, 0);
Locking the object spinlock in step (6) around the clearance of
FSCACHE_OBJECT_IS_LIVE ensures that the the decision trees in
fscache_submit_op() and fscache_submit_exclusive_op() don't see the IS_LIVE
flag being cleared mid-decision: either the op is queued before step (7) - in
which case fscache_kill_object() will see n_ops>0 and will deal with the op -
or the op will be rejected.
This, combined with rejecting op submission if the target object is dying, fix
the problem.
The problem shows up as the following oops:
CacheFiles: Assertion failed
CacheFiles: 1 == 0 is false
------------[ cut here ]------------
kernel BUG at ../fs/cachefiles/interface.c:339!
...
RIP: 0010:[<ffffffffa014fd9c>] [<ffffffffa014fd9c>] cachefiles_put_object+0x2a4/0x301 [cachefiles]
...
Call Trace:
[<ffffffffa008674b>] fscache_put_object+0x18/0x21 [fscache]
[<ffffffffa00883e6>] fscache_object_work_func+0x3ba/0x3c9 [fscache]
[<ffffffff81054dad>] process_one_work+0x226/0x441
[<ffffffff81055d91>] worker_thread+0x273/0x36b
[<ffffffff81055b1e>] ? rescuer_thread+0x2e1/0x2e1
[<ffffffff81059b9d>] kthread+0x10e/0x116
[<ffffffff81059a8f>] ? kthread_create_on_node+0x1bb/0x1bb
[<ffffffff815579ac>] ret_from_fork+0x7c/0xb0
[<ffffffff81059a8f>] ? kthread_create_on_node+0x1bb/0x1bb
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Reject new operations that are being submitted against an object if that
object has failed its lookup or creation states or has been killed by the
cache backend for some other reason, such as having been culled.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
When submitting an operation, prefer to cancel the operation immediately
rather than queuing it for later processing if the object is marked as dying
(ie. the object state machine has reached the KILL_OBJECT state).
Whilst we're at it, change the series of related test_bit() calls into a
READ_ONCE() and bitwise-AND operators to reduce the number of load
instructions (test_bit() has a volatile address).
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Move fscache_report_unexpected_submission() up within operation.c so that it
can be called from fscache_submit_exclusive_op() too.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Count the number of objects that get culled by the cache backend and the
number of objects that the cache backend declines to instantiate due to lack
of space in the cache.
These numbers are made available through /proc/fs/fscache/stats
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Currently XFS calls file_remove_privs() without holding i_mutex. This is
wrong because that function can end up messing with file permissions and
file capabilities stored in xattrs for which we need i_mutex held.
Fix the problem by grabbing iolock exclusively when we will need to
change anything in permissions / xattrs.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Comment in include/linux/security.h says that ->inode_killpriv() should
be called when setuid bit is being removed and that similar security
labels (in fact this applies only to file capabilities) should be
removed at this time as well. However we don't call ->inode_killpriv()
when we remove suid bit on truncate.
We fix the problem by calling ->inode_need_killpriv() and subsequently
->inode_killpriv() on truncate the same way as we do it on file write.
After this patch there's only one user of should_remove_suid() - ocfs2 -
and indeed it's buggy because it doesn't call ->inode_killpriv() on
write. However fixing it is difficult because of special locking
constraints.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Provide function telling whether file_remove_privs() will do anything.
Currently we only have should_remove_suid() and that does something
slightly different.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
file_remove_suid() is a misnomer since it removes also file capabilities
stored in xattrs and sets S_NOSEC flag. Also should_remove_suid() tells
something else than whether file_remove_suid() call is necessary which
leads to bugs.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
file_remove_suid() could mistakenly set S_NOSEC inode bit when root was
modifying the file. As a result following writes to the file by ordinary
user would avoid clearing suid or sgid bits.
Fix the bug by checking actual mode bits before setting S_NOSEC.
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
If posix_acl_create() returns an error code then "*acl" and "*default_acl"
can be uninitialized or point to freed memory. This is a dangerous thing
to do. For example, it causes a problem in ocfs2_reflink():
fs/ocfs2/refcounttree.c:4327 ocfs2_reflink()
error: potentially using uninitialized 'default_acl'.
I've re-written this so we set the pointers to NULL at the start. I've
added a temporary "clone" variable to hold the value of "*acl" until end.
Setting them to NULL means means we don't need the "no_acl" label. We may
as well remove the "apply_umask" stuff forward and remove that label as
well.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|