| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler changes from Ingo Molnar:
"The main changes in this cycle are:
- (much) improved CONFIG_NUMA_BALANCING support from Mel Gorman, Rik
van Riel, Peter Zijlstra et al. Yay!
- optimize preemption counter handling: merge the NEED_RESCHED flag
into the preempt_count variable, by Peter Zijlstra.
- wait.h fixes and code reorganization from Peter Zijlstra
- cfs_bandwidth fixes from Ben Segall
- SMP load-balancer cleanups from Peter Zijstra
- idle balancer improvements from Jason Low
- other fixes and cleanups"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (129 commits)
ftrace, sched: Add TRACE_FLAG_PREEMPT_RESCHED
stop_machine: Fix race between stop_two_cpus() and stop_cpus()
sched: Remove unnecessary iteration over sched domains to update nr_busy_cpus
sched: Fix asymmetric scheduling for POWER7
sched: Move completion code from core.c to completion.c
sched: Move wait code from core.c to wait.c
sched: Move wait.c into kernel/sched/
sched/wait: Fix __wait_event_interruptible_lock_irq_timeout()
sched: Avoid throttle_cfs_rq() racing with period_timer stopping
sched: Guarantee new group-entities always have weight
sched: Fix hrtimer_cancel()/rq->lock deadlock
sched: Fix cfs_bandwidth misuse of hrtimer_expires_remaining
sched: Fix race on toggling cfs_bandwidth_used
sched: Remove extra put_online_cpus() inside sched_setaffinity()
sched/rt: Fix task_tick_rt() comment
sched/wait: Fix build breakage
sched/wait: Introduce prepare_to_wait_event()
sched/wait: Add ___wait_cond_timeout() to wait_event*_timeout() too
sched: Remove get_online_cpus() usage
sched: Fix race in migrate_swap_stop()
...
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Resolve cherry-picking conflicts:
Conflicts:
mm/huge_memory.c
mm/memory.c
mm/mprotect.c
See this upstream merge commit for more details:
52469b4fcd4f Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
It is possible for a task in a numa group to call exec, and
have the new (unrelated) executable inherit the numa group
association from its former self.
This has the potential to break numa grouping, and is trivial
to fix.
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-51-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
It is desirable to model from userspace how the scheduler groups tasks
over time. This patch adds an ID to the numa_group and reports it via
/proc/PID/status.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-45-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw
Pull gfs2 updates from Steven Whitehouse:
"The main feature of interest this time is quota updates. There are
some clean ups and some patches to use the new generic lru list code.
There is still plenty of scope for some further changes in due course -
faster lookups of quota structures is very much on the todo list.
Also, a start has been made towards the more tricky issue of using the
generic lru code with glocks, but that will have to be completed in a
subsequent merge window.
The other, more minor feature, is that there have been a number of
performance patches which relate to block allocation. In particular
they will improve performance when the disk is nearly full"
* tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
GFS2: Use generic list_lru for quota
GFS2: Rename quota qd_lru_lock qd_lock
GFS2: Use reflink for quota data cache
GFS2: Use lockref for glocks
GFS2: Protect quota sync generation
GFS2: Inline qd_trylock into gfs2_quota_unlock
GFS2: Make two similar quota code fragments into a function
GFS2: Remove obsolete quota tunable
GFS2: Move gfs2_icbit_munge into quota.c
GFS2: Speed up starting point selection for block allocation
GFS2: Add allocation parameters structure
GFS2: Clean up reservation removal
GFS2: fix dentry leaks
GFS2: new function gfs2_rbm_incr
GFS2: Introduce rbm field bii
GFS2: Do not reset flags on active reservations
GFS2: introduce bi_blocks for optimization
GFS2: optimize rbm_from_block wrt bi_start
GFS2: d_splice_alias() can't return error
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
By using the generic list_lru code, we can now separate the
per sb quota list locking from the lru locking. The lru
lock is made into the inner-most lock.
As a result of this new lock order, we may occasionally see
items on the per-sb quota list which are "dead" so that the
two places where we traverse that list are updated to take
account of that.
As a result of this patch, the gfs2 quota shrinker is now
NUMA zone aware, and we are also laying the foundations for
further improvments in due course.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Abhijith Das <adas@redhat.com>
Tested-by: Abhijith Das <adas@redhat.com>
Cc: Dave Chinner <dchinner@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This is a straight forward rename which is in preparation for
introducing the generic list_lru infrastructure in the
following patch.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Abhijith Das <adas@redhat.com>
Tested-by: Abhijith Das <adas@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This patch adds reflink support to the quota data cache. It
looks a bit strange because we still don't have a sensible
split in the lookup by id and the lru list. That is coming in
later patches though.
The intent here is just to swap the current ref count for
reflinks in all cases with as little as possible other change.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Abhijith Das <adas@redhat.com>
Tested-by: Abhijith Das <adas@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Currently glocks have an atomic reference count and also a spinlock
which covers various internal fields, such as the state. This intent of
this patch is to replace the spinlock and the atomic reference count
with a lockref structure. This contains a spinlock which we can continue
to use as before, and a reference counter which is used in conjuction
with the spinlock to replace the previous atomic counter.
As a result of this there are some new rules for reference counting on
glocks. We need to distinguish between reference count changes under
gl_spin (which are now just increment or decrement of the new counter,
provided the count cannot hit zero) and those which are outside of
gl_spin, but which now take gl_spin internally.
The conversion is relatively straight forward. There is probably some
further clean up which can be done, but the priority at this stage is to
make the change in as simple a manner as possible.
A consequence of this change is that the reference count is being
decoupled from the lru list processing. This should allow future
adoption of the lru_list code with glocks in due course.
The reason for using the "dead" state and not just relying on 0 being
the "invalid state" is so that in due course 0 ref counts can be
allowable. The intent is to eventually be able to remove the ref count
changes which are currently hidden away in state_change().
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Now that gfs2_quota_sync can be potentially called from multiple
threads, we should protect this bit of code, and the sync generation
number in particular in order to ensure that there are no races
when syncing quotas.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Abhijith Das <adas@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The function qd_trylock was not a trylock despite its name and
can be inlined into gfs2_quota_unlock in order to make the
code a bit clearer. There should be no functional change as a
result of this patch.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Abhijith Das <adas@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
There should be no functional change bar the removal of a
test of the MS_READONLY flag which would never be reachable.
This merges the common code from qd_fish and qd_trylock into
a single function and calls it from both those places.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Abhijith Das <adas@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
There is no need for a paramater which relates to the internals
of quota to be exposed to users. The only possible use would be
to turn it up so large that the memory allocation fails. So lets
remove it and set it to a sensible value which ensures that we
don't ask for multipage allocations.
Currently the size of struct gfs2_holder means that the caluclated
value is identical to the previous default value, so there should
be no functional change.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Abhijith Das <adas@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This function is only called twice, and both callers are
quota related, so lets move this function into quota.c and
make it static.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
When setting the starting point for block allocation, there were calls
to both gfs2_rbm_to_block() and gfs2_rbm_from_block() in the common case
of there being an active reservation. The gfs2_rbm_from_block() function
can be quite slow, and since the two conversions were effectively a
no-op, it makes sense to avoid them entirely in this case.
There is no functional change here, but the code should be a bit more
efficient after this patch.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This patch adds a structure to contain allocation parameters with
the intention of future expansion of this structure. The idea is
that we should be able to add more information about the allocation
in the future in order to allow the allocator to make a better job
of placing the requests on-disk.
There is no functional difference from applying this patch.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The reservation for an inode should be cleared when it is truncated so
that we can start again at a different offset for future allocations.
We could try and do better than that, by resetting the search based on
where the truncation started from, but this is only a first step.
In addition, there are three callers of gfs2_rs_delete() but only one
of those should really be testing the value of i_writecount. While
we get away with that in the other cases currently, I think it would
be better if we made that test specific to the one case which
requires it.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
We need to dput() the result of d_splice_alias(), unless it is passed to
finish_no_open().
Edited by Steven Whitehouse in order to make it apply to the current
GFS2 git tree, and taking account of a prerequisite patch which hasn't
been applied.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: stable@vger.kernel.org
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Since the previous patch eliminated bi in favor of bii, this follow-on
patch needed to be adjusted accordingly. Here is the revised version.
This patch adds a new function, gfs2_rbm_incr, which increments
an rbm structure. This is more efficient than calling gfs2_rbm_to_block,
incrementing, then calling gfs2_rbm_from_block.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This is a respin of the original patch. As Steve pointed out, the
introduction of field bii makes it easy to eliminate bi itself.
This revised patch does just that, replacing bi with bii.
This patch adds a new field to the rbm structure, called bii,
which is an index into the array of bitmaps for an rgrp.
This replaces *bi which was a pointer to the bitmap.
This is being done for further optimizations.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
When we used try locks for rgrps on block allocations, it was important
to clear the flags field so that we used a blocking hold on the glock.
Now that we're not doing try locks, clearing flags is unnecessary, and
a waste of time. In fact, it's probably doing the wrong thing because
it clears the GL_SKIP bit that was set for the lvb tracking purposes.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This patch introduces a new field in the bitmap structure called
bi_blocks. Its purpose is to save us from constantly multiplying
bi_len by the constant GFS2_NBBY. It also paves the way for more
optimization in a future patch.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
In function gfs2_rbm_from_block, it starts by checking if the block
falls within the first bitmap. It does so by checking if the rbm's
offset is less than (rbm->bi->bi_start + rbm->bi->bi_len) * GFS2_NBBY.
However, the first bitmap will always have bi_start==0. Therefore
this is an unnecessary calculation in a function that gets called
billions of times. This patch removes the reference to bi_start.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
unless it was given an IS_ERR(inode), which isn't the case here. So clean
up the unnecessary error handling in gfs2_create_inode().
This paves the way for real fixes (hence the stable Cc).
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: stable@vger.kernel.org
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Pull CIFS updates from Steve French:
"Includes a couple of fixes, plus changes to make multiplex identifiers
easier to read and correlate with network traces, and a set of
enhancements for SMB3 dialect. Also adds support for per-file
compression for both cifs and smb2/smb3 ("chattr +c filename).
Should have at least one other merge request ready by next week with
some new SMB3 security features and copy offload support"
* 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
Query network adapter info at mount time for debugging
Fix unused variable warning when CIFS POSIX disabled
Allow setting per-file compression via CIFS protocol
Query File System Alignment
Query device characteristics at mount time from server on SMB2/3 not just on cifs mounts
cifs: Send a logoff request before removing a smb session
cifs: Make big endian multiplex ID sequences monotonic on the wire
cifs: Remove redundant multiplex identifier check from check_smb_hdr()
Query file system attributes from server on SMB2, not just cifs, mounts
Allow setting per-file compression via SMB2/3
Fix corrupt SMB2 ioctl requests
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
When CONFIG_CIFS_STATS2 enabled query adapter info for debugging
It is easy now in SMB3 to query the information about the server's
network interfaces (and at least Windows 8 and above do this, if not
other clients) there are some useful pieces of information you can get
including:
- all of the network interfaces that the server advertises (not just
the one you are mounting over), and with SMB3 supporting multichannel
this helps with more than just failover (also aggregating multiple
sockets under one mount)
- whether the adapter supports RSS (useful to know if you want to
estimate whether setting up two or more socket connections to the same
address is going to be faster due to RSS offload in the adapter)
- whether the server supports RDMA
- whether the server has IPv6 interfaces (if you connected over IPv4
but prefer IPv6 e.g.)
- what the link speed is (you might want to reconnect over a higher
speed interface if available)
(Of course we could also rerequest this on every mount cheaplly to the
same server, as Windows apparently does, so we can update the adapter
info on new mounts, and also on every reconnect if the network
interface drops temporarily - so we don't have to rely on info from
the first mount to this server)
It is trivial to request this information - and certainly will be useful
when we get to the point of doing multichannel (and eventually RDMA),
but some of this (linkspeed etc.) info may help for debugging in
the meantime. Enable this request when CONFIG_CIFS_STATS2 is on
(only for smb3 mounts since it is an SMB3 or later ioctl).
Signed-off-by: Steve French <smfrench@gmail.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Fix unused variable warning when CONFIG_CIFS_POSIX disabled.
fs/cifs/ioctl.c: In function 'cifs_ioctl':
>> fs/cifs/ioctl.c:40:8: warning: unused variable 'ExtAttrMask' [-Wunused-variable]
__u64 ExtAttrMask = 0;
^
Pointed out by 0-DAY kernel build testing backend
Signed-off-by: Steve French <smfrench@gmail.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
An earlier patch allowed setting the per-file compression flag
"chattr +c filename"
on an smb2 or smb3 mount, and also allowed lsattr to return
whether a file on a cifs, or smb2/smb3 mount was compressed.
This patch extends the ability to set the per-file
compression flag to the cifs protocol, which uses a somewhat
different IOCTL mechanism than SMB2, although the payload
(the flags stored in the compression_state) are the same.
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
In SMB3 it is now possible to query the file system
alignment info, and the preferred (for performance)
sector size and whether the underlying disk
has no seek penalty (like SSD).
Query this information at mount time for SMB3,
and make it visible in /proc/fs/cifs/DebugData
for debugging purposes.
This alignment information and preferred sector
size info will be helpful for the copy offload
patches to setup the right chunks in the CopyChunk
requests. Presumably the knowledge that the
underlying disk is SSD could also help us
make better readahead and writebehind
decisions (something to look at in the future).
Signed-off-by: Steve French <smfrench@gmail.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
cifs mounts
Currently SMB2 and SMB3 mounts do not query the device information at mount time
from the server as is done for cifs. These can be useful for debugging.
This is a minor patch, that extends the previous one (which added ability to
query file system attributes at mount time - this returns the device
characteristics - also via in /proc/fs/cifs/DebugData)
Signed-off-by: Steve French <smfrench@gmail.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Send a smb session logoff request before removing smb session off of the list.
On a signed smb session, remvoing a session off of the list before sending
a logoff request results in server returning an error for lack of
smb signature.
Never seen an error during smb logoff, so as per MS-SMB2 3.2.5.1,
not sure how an error during logoff should be retried. So for now,
if a server returns an error to a logoff request, log the error and
remove the session off of the list.
Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The multiplex identifier (MID) in the SMB header is only
ever used by the client, in conjunction with PID, to match responses
from the server. As such, the endianess of the MID is not important.
However, When tracing packet sequences on the wire, protocol analyzers
such as wireshark display MID as little endian. It is much more informative
for the on-the-wire MID sequences to match debug information emitted by the
CIFS driver. Therefore, one should write and read MID in the SMB header
assuming it is always little endian.
Observed from wireshark during the protocol negotiation
and session setup:
Multiplex ID: 256
Multiplex ID: 256
Multiplex ID: 512
Multiplex ID: 512
Multiplex ID: 768
Multiplex ID: 768
After this patch on-the-wire MID values begin at 1 and increase monotonically.
Introduce get_next_mid64() for the internal consumers that use the full 64 bit
multiplex identifier.
Introduce the helpers get_mid() and compare_mid() to make the endian
translation clear.
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Tim Gardner <timg@tpi.com>
Signed-off-by: Steve French <smfrench@gmail.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The only call site for check_smb_header() assigns 'mid' from the SMB
packet, which is then checked again in check_smb_header(). This seems
like redundant redundancy.
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Tim Gardner <timg@tpi.com>
Signed-off-by: Steve French <smfrench@gmail.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Currently SMB2 and SMB3 mounts do not query the file system attributes
from the server at mount time as is done for cifs. These can be useful for debugging.
Signed-off-by: Steve French <smfrench@gmail.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Allow cifs/smb2/smb3 to return whether or not a file is compressed
via lsattr, and allow SMB2/SMB3 to set the per-file compression
flag ("chattr +c filename" on an smb3 mount).
Windows users often set the compressed flag (it can be
done from the desktop and file manager). David Disseldorp
has patches to Samba server to support this (at least on btrfs)
which are complementary to this
Signed-off-by: Steve French <smfrench@gmail.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
We were off by one calculating the length of ioctls in some cases
because the protocol specification for SMB2 ioctl includes a mininum
one byte payload but not all SMB2 ioctl requests actually have
a data buffer to send. We were also not zeroing out the
return buffer (in case of error this is helpful).
Signed-off-by: Steve French <smfrench@gmail.com>
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Pull NFS client updates from Trond Myklebust:
"Highlights include:
- Changes to the RPC socket code to allow NFSv4 to turn off
timeout+retry:
* Detect TCP connection breakage through the "keepalive" mechanism
- Add client side support for NFSv4.x migration (Chuck Lever)
- Add support for multiple security flavour arguments to the "sec="
mount option (Dros Adamson)
- fs-cache bugfixes from David Howells:
* Fix an issue whereby caching can be enabled on a file that is
open for writing
- More NFSv4 open code stable bugfixes
- Various Labeled NFS (selinux) bugfixes, including one stable fix
- Fix buffer overflow checking in the RPCSEC_GSS upcall encoding"
* tag 'nfs-for-3.13-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (68 commits)
NFSv4.2: Remove redundant checks in nfs_setsecurity+nfs4_label_init_security
NFSv4: Sanity check the server reply in _nfs4_server_capabilities
NFSv4.2: encode_readdir - only ask for labels when doing readdirplus
nfs: set security label when revalidating inode
NFSv4.2: Fix a mismatch between Linux labeled NFS and the NFSv4.2 spec
NFS: Fix a missing initialisation when reading the SELinux label
nfs: fix oops when trying to set SELinux label
nfs: fix inverted test for delegation in nfs4_reclaim_open_state
SUNRPC: Cleanup xs_destroy()
SUNRPC: close a rare race in xs_tcp_setup_socket.
SUNRPC: remove duplicated include from clnt.c
nfs: use IS_ROOT not DCACHE_DISCONNECTED
SUNRPC: Fix buffer overflow checking in gss_encode_v0_msg/gss_encode_v1_msg
SUNRPC: gss_alloc_msg - choose _either_ a v0 message or a v1 message
SUNRPC: remove an unnecessary if statement
nfs: Use PTR_ERR_OR_ZERO in 'nfs/nfs4super.c'
nfs: Use PTR_ERR_OR_ZERO in 'nfs41_callback_up' function
nfs: Remove useless 'error' assignment
sunrpc: comment typo fix
SUNRPC: Add correct rcu_dereference annotation in rpc_clnt_set_transport
...
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
We already check for nfs_server_capable(inode, NFS_CAP_SECURITY_LABEL)
in nfs4_label_alloc()
We check the minor version in _nfs4_server_capabilities before setting
NFS_CAP_SECURITY_LABEL.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
We don't want to be setting capabilities and/or requesting attributes
that are not appropriate for the NFSv4 minor version.
- Ensure that we clear the NFS_CAP_SECURITY_LABEL capability when appropriate
- Ensure that we limit the attribute bitmasks to the mounted_on_fileid
attribute and less for NFSv4.0
- Ensure that we limit the attribute bitmasks to suppattr_exclcreat and
less for NFSv4.1
- Ensure that we limit it to change_sec_label or less for NFSv4.2
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Currently, if the server is doing NFSv4.2 and supports labeled NFS, then
our on-the-wire READDIR request ends up asking for the label information,
which is then ignored unless we're doing readdirplus.
This patch ensures that READDIR doesn't ask the server for label information
at all unless the readdir->bitmask contains the FATTR4_WORD2_SECURITY_LABEL
attribute, and the readdir->plus flag is set.
While we're at it, optimise away the 3rd bitmap field if it is zero.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Currently, we fetch the security label when revalidating an inode's
attributes, but don't apply it. This is in contrast to the readdir()
codepath where we do apply label changes.
Cc: Dave Quigley <dpquigl@davequigley.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Ensure that _nfs4_do_get_security_label() also initialises the
SEQUENCE call correctly, by having it call into nfs4_call_sync().
Reported-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org # 3.11+
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Chao reported the following oops when testing labeled NFS:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffa0568703>] nfs4_xdr_enc_setattr+0x43/0x110 [nfsv4]
PGD 277bbd067 PUD 2777ea067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache sg coretemp kvm_intel kvm crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul iTCO_wdt glue_helper ablk_helper cryptd iTCO_vendor_support bnx2 pcspkr serio_raw i7core_edac cdc_ether microcode usbnet edac_core mii lpc_ich i2c_i801 mfd_core shpchp ioatdma dca acpi_cpufreq mperf nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sr_mod sd_mod cdrom crc_t10dif mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ata_generic ttm pata_acpi drm ata_piix libata megaraid_sas i2c_core dm_mirror dm_region_hash dm_log dm_mod
CPU: 4 PID: 25657 Comm: chcon Not tainted 3.10.0-33.el7.x86_64 #1
Hardware name: IBM System x3550 M3 -[7944OEJ]-/90Y4784 , BIOS -[D6E150CUS-1.11]- 02/08/2011
task: ffff880178397220 ti: ffff8801595d2000 task.ti: ffff8801595d2000
RIP: 0010:[<ffffffffa0568703>] [<ffffffffa0568703>] nfs4_xdr_enc_setattr+0x43/0x110 [nfsv4]
RSP: 0018:ffff8801595d3888 EFLAGS: 00010296
RAX: 0000000000000000 RBX: ffff8801595d3b30 RCX: 0000000000000b4c
RDX: ffff8801595d3b30 RSI: ffff8801595d38e0 RDI: ffff880278b6ec00
RBP: ffff8801595d38c8 R08: ffff8801595d3b30 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801595d38e0
R13: ffff880277a4a780 R14: ffffffffa05686c0 R15: ffff8802765f206c
FS: 00007f2c68486800(0000) GS:ffff88027fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000027651a000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 ffff880277865800 ffff880278b6ec00 ffff880277a4a780
ffff8801595d3948 ffffffffa02ad926 ffff8801595d3b30 ffff8802765f206c
Call Trace:
[<ffffffffa02ad926>] rpcauth_wrap_req+0x86/0xd0 [sunrpc]
[<ffffffffa02a1d40>] ? call_connect+0xb0/0xb0 [sunrpc]
[<ffffffffa02a1d40>] ? call_connect+0xb0/0xb0 [sunrpc]
[<ffffffffa02a1ecb>] call_transmit+0x18b/0x290 [sunrpc]
[<ffffffffa02a1d40>] ? call_connect+0xb0/0xb0 [sunrpc]
[<ffffffffa02aae14>] __rpc_execute+0x84/0x400 [sunrpc]
[<ffffffffa02ac40e>] rpc_execute+0x5e/0xa0 [sunrpc]
[<ffffffffa02a2ea0>] rpc_run_task+0x70/0x90 [sunrpc]
[<ffffffffa02a2f03>] rpc_call_sync+0x43/0xa0 [sunrpc]
[<ffffffffa055284d>] _nfs4_do_set_security_label+0x11d/0x170 [nfsv4]
[<ffffffffa0558861>] nfs4_set_security_label.isra.69+0xf1/0x1d0 [nfsv4]
[<ffffffff815fca8b>] ? avc_alloc_node+0x24/0x125
[<ffffffff815fcd2f>] ? avc_compute_av+0x1a3/0x1b5
[<ffffffffa055897b>] nfs4_xattr_set_nfs4_label+0x3b/0x50 [nfsv4]
[<ffffffff811bc772>] generic_setxattr+0x62/0x80
[<ffffffff811bcfc3>] __vfs_setxattr_noperm+0x63/0x1b0
[<ffffffff811bd1c5>] vfs_setxattr+0xb5/0xc0
[<ffffffff811bd2fe>] setxattr+0x12e/0x1c0
[<ffffffff811a4d22>] ? final_putname+0x22/0x50
[<ffffffff811a4f2b>] ? putname+0x2b/0x40
[<ffffffff811aa1cf>] ? user_path_at_empty+0x5f/0x90
[<ffffffff8119bc29>] ? __sb_start_write+0x49/0x100
[<ffffffff811bd66f>] SyS_lsetxattr+0x8f/0xd0
[<ffffffff8160cf99>] system_call_fastpath+0x16/0x1b
Code: 48 8b 02 48 c7 45 c0 00 00 00 00 48 c7 45 c8 00 00 00 00 48 c7 45 d0 00 00 00 00 48 c7 45 d8 00 00 00 00 48 c7 45 e0 00 00 00 00 <48> 8b 00 48 8b 00 48 85 c0 0f 84 ae 00 00 00 48 8b 80 b8 03 00
RIP [<ffffffffa0568703>] nfs4_xdr_enc_setattr+0x43/0x110 [nfsv4]
RSP <ffff8801595d3888>
CR2: 0000000000000000
The problem is that _nfs4_do_set_security_label calls rpc_call_sync()
directly which fails to do any setup of the SEQUENCE call. Have it use
nfs4_call_sync() instead which does the right thing. While we're at it
change the name of "args" to "arg" to better match the pattern in
_nfs4_do_setattr.
Reported-by: Chao Ye <cye@redhat.com>
Cc: David Quigley <dpquigl@davequigley.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org # 3.11+
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
commit 6686390bab6a0e0 (NFS: remove incorrect "Lock reclaim failed!"
warning.) added a test for a delegation before checking to see if any
reclaimed locks failed. The test however is backward and is only doing
that check when a delegation is held instead of when one isn't.
Cc: NeilBrown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Fixes: 6686390bab6a: NFS: remove incorrect "Lock reclaim failed!" warning.
Cc: stable@vger.kernel.org # 3.12
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| |\ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into linux-next
Pull fs-cache fixes from David Howells:
Can you pull these commits to fix an issue with NFS whereby caching can be
enabled on a file that is open for writing by subsequently opening it for
reading. This can be made to crash by opening it for writing again if you're
quick enough.
The gist of the patchset is that the cookie should be acquired at inode
creation only and subsequently enabled and disabled as appropriate (which
dispenses with the backing objects when they're not needed).
The extra synchronisation that NFS does can then be dispensed with as it is
thenceforth managed by FS-Cache.
Could you send these on to Linus?
This likely will need fixing also in CIFS and 9P also once the FS-Cache
changes are upstream. AFS and Ceph are probably safe.
* 'fscache' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
NFS: Use i_writecount to control whether to get an fscache cookie in nfs_open()
FS-Cache: Provide the ability to enable/disable cookies
FS-Cache: Add use/unuse/wake cookie wrappers
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Use i_writecount to control whether to get an fscache cookie in nfs_open() as
NFS does not do write caching yet. I *think* this is the cause of a problem
encountered by Mark Moseley whereby __fscache_uncache_page() gets a NULL
pointer dereference because cookie->def is NULL:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffff812a1903>] __fscache_uncache_page+0x23/0x160
PGD 0
Thread overran stack, or stack corrupted
Oops: 0000 [#1] SMP
Modules linked in: ...
CPU: 7 PID: 18993 Comm: php Not tainted 3.11.1 #1
Hardware name: Dell Inc. PowerEdge R420/072XWF, BIOS 1.3.5 08/21/2012
task: ffff8804203460c0 ti: ffff880420346640
RIP: 0010:[<ffffffff812a1903>] __fscache_uncache_page+0x23/0x160
RSP: 0018:ffff8801053af878 EFLAGS: 00210286
RAX: 0000000000000000 RBX: ffff8800be2f8780 RCX: ffff88022ffae5e8
RDX: 0000000000004c66 RSI: ffffea00055ff440 RDI: ffff8800be2f8780
RBP: ffff8801053af898 R08: 0000000000000001 R09: 0000000000000003
R10: 0000000000000000 R11: 0000000000000000 R12: ffffea00055ff440
R13: 0000000000001000 R14: ffff8800c50be538 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88042fc60000(0063) knlGS:00000000e439c700
CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 0000000000000010 CR3: 0000000001d8f000 CR4: 00000000000607f0
Stack:
...
Call Trace:
[<ffffffff81365a72>] __nfs_fscache_invalidate_page+0x42/0x70
[<ffffffff813553d5>] nfs_invalidate_page+0x75/0x90
[<ffffffff811b8f5e>] truncate_inode_page+0x8e/0x90
[<ffffffff811b90ad>] truncate_inode_pages_range.part.12+0x14d/0x620
[<ffffffff81d6387d>] ? __mutex_lock_slowpath+0x1fd/0x2e0
[<ffffffff811b95d3>] truncate_inode_pages_range+0x53/0x70
[<ffffffff811b969d>] truncate_inode_pages+0x2d/0x40
[<ffffffff811b96ff>] truncate_pagecache+0x4f/0x70
[<ffffffff81356840>] nfs_setattr_update_inode+0xa0/0x120
[<ffffffff81368de4>] nfs3_proc_setattr+0xc4/0xe0
[<ffffffff81357f78>] nfs_setattr+0xc8/0x150
[<ffffffff8122d95b>] notify_change+0x1cb/0x390
[<ffffffff8120a55b>] do_truncate+0x7b/0xc0
[<ffffffff8121f96c>] do_last+0xa4c/0xfd0
[<ffffffff8121ffbc>] path_openat+0xcc/0x670
[<ffffffff81220a0e>] do_filp_open+0x4e/0xb0
[<ffffffff8120ba1f>] do_sys_open+0x13f/0x2b0
[<ffffffff8126aaf6>] compat_SyS_open+0x36/0x50
[<ffffffff81d7204c>] sysenter_dispatch+0x7/0x24
The code at the instruction pointer was disassembled:
> (gdb) disas __fscache_uncache_page
> Dump of assembler code for function __fscache_uncache_page:
> ...
> 0xffffffff812a18ff <+31>: mov 0x48(%rbx),%rax
> 0xffffffff812a1903 <+35>: cmpb $0x0,0x10(%rax)
> 0xffffffff812a1907 <+39>: je 0xffffffff812a19cd <__fscache_uncache_page+237>
These instructions make up:
ASSERTCMP(cookie->def->type, !=, FSCACHE_COOKIE_TYPE_INDEX);
That cmpb is the faulting instruction (%rax is 0). So cookie->def is NULL -
which presumably means that the cookie has already been at least partway
through __fscache_relinquish_cookie().
What I think may be happening is something like a three-way race on the same
file:
PROCESS 1 PROCESS 2 PROCESS 3
=============== =============== ===============
open(O_TRUNC|O_WRONLY)
open(O_RDONLY)
open(O_WRONLY)
-->nfs_open()
-->nfs_fscache_set_inode_cookie()
nfs_fscache_inode_lock()
nfs_fscache_disable_inode_cookie()
__fscache_relinquish_cookie()
nfs_inode->fscache = NULL
<--nfs_fscache_set_inode_cookie()
-->nfs_open()
-->nfs_fscache_set_inode_cookie()
nfs_fscache_inode_lock()
nfs_fscache_enable_inode_cookie()
__fscache_acquire_cookie()
nfs_inode->fscache = cookie
<--nfs_fscache_set_inode_cookie()
<--nfs_open()
-->nfs_setattr()
...
...
-->nfs_invalidate_page()
-->__nfs_fscache_invalidate_page()
cookie = nfsi->fscache
-->nfs_open()
-->nfs_fscache_set_inode_cookie()
nfs_fscache_inode_lock()
nfs_fscache_disable_inode_cookie()
-->__fscache_relinquish_cookie()
-->__fscache_uncache_page(cookie)
<crash>
<--__fscache_relinquish_cookie()
nfs_inode->fscache = NULL
<--nfs_fscache_set_inode_cookie()
What is needed is something to prevent process #2 from reacquiring the cookie
- and I think checking i_writecount should do the trick.
It's also possible to have a two-way race on this if the file is opened
O_TRUNC|O_RDONLY instead.
Reported-by: Mark Moseley <moseleymark@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Provide the ability to enable and disable fscache cookies. A disabled cookie
will reject or ignore further requests to:
Acquire a child cookie
Invalidate and update backing objects
Check the consistency of a backing object
Allocate storage for backing page
Read backing pages
Write to backing pages
but still allows:
Checks/waits on the completion of already in-progress objects
Uncaching of pages
Relinquishment of cookies
Two new operations are provided:
(1) Disable a cookie:
void fscache_disable_cookie(struct fscache_cookie *cookie,
bool invalidate);
If the cookie is not already disabled, this locks the cookie against other
dis/enablement ops, marks the cookie as being disabled, discards or
invalidates any backing objects and waits for cessation of activity on any
associated object.
This is a wrapper around a chunk split out of fscache_relinquish_cookie(),
but it reinitialises the cookie such that it can be reenabled.
All possible failures are handled internally. The caller should consider
calling fscache_uncache_all_inode_pages() afterwards to make sure all page
markings are cleared up.
(2) Enable a cookie:
void fscache_enable_cookie(struct fscache_cookie *cookie,
bool (*can_enable)(void *data),
void *data)
If the cookie is not already enabled, this locks the cookie against other
dis/enablement ops, invokes can_enable() and, if the cookie is not an
index cookie, will begin the procedure of acquiring backing objects.
The optional can_enable() function is passed the data argument and returns
a ruling as to whether or not enablement should actually be permitted to
begin.
All possible failures are handled internally. The cookie will only be
marked as enabled if provisional backing objects are allocated.
A later patch will introduce these to NFS. Cookie enablement during nfs_open()
is then contingent on i_writecount <= 0. can_enable() checks for a race
between open(O_RDONLY) and open(O_WRONLY/O_RDWR). This simplifies NFS's cookie
handling and allows us to get rid of open(O_RDONLY) accidentally introducing
caching to an inode that's open for writing already.
One operation has its API modified:
(3) Acquire a cookie.
struct fscache_cookie *fscache_acquire_cookie(
struct fscache_cookie *parent,
const struct fscache_cookie_def *def,
void *netfs_data,
bool enable);
This now has an additional argument that indicates whether the requested
cookie should be enabled by default. It doesn't need the can_enable()
function because the caller must prevent multiple calls for the same netfs
object and it doesn't need to take the enablement lock because no one else
can get at the cookie before this returns.
Signed-off-by: David Howells <dhowells@redhat.com
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Add wrapper functions for dealing with cookie->n_active:
(*) __fscache_use_cookie() to increment it.
(*) __fscache_unuse_cookie() to decrement and test against zero.
(*) __fscache_wake_unused_cookie() to wake up anyone waiting for it to reach
zero.
The second and third are split so that the third can be done after cookie->lock
has been released in case the waiter wakes up whilst we're still holding it and
tries to get it.
We will need to wake-on-zero once the cookie disablement patch is applied
because it will then be possible to see n_active become zero without the cookie
being relinquished.
Also move the cookie usement out of fscache_attr_changed_op() and into
fscache_attr_changed() and the operation struct so that cookie disablement
will be able to track it.
Whilst we're at it, only increment n_active if we're about to do
fscache_submit_op() so that we don't have to deal with undoing it if anything
earlier fails. Possibly this should be moved into fscache_submit_op() which
could look at FSCACHE_OP_UNUSE_COOKIE.
Signed-off-by: David Howells <dhowells@redhat.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
This check was added by Al Viro with
d9e80b7de91db05c1c4d2e5ebbfd70b3b3ba0e0f "nfs d_revalidate() is too
trigger-happy with d_drop()", with the explanation that we don't want to
remove the root of a disconnected tree, which will still be included on
the s_anon list.
But DCACHE_DISCONNECTED does *not* actually identify dentries that are
disconnected from the dentry tree or hashed on s_anon. IS_ROOT() is the
way to do that.
Also add a comment from Al's commit to remind us why this check is
there.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Use 'PTR_ERR_OR_ZERO()' rather than 'IS_ERR(...) ? PTR_ERR(...) : 0'.
Signed-off-by: Geyslan G. Bem <geyslan@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|