| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull pwm fixes from Thierry Reding:
"These patches fix the Samsung PWM driver and perform some minor
cleanups like fixing checkpatch and sparse warnings.
Two redundant error messages are removed and the Kconfig help text for
the PWM subsystem is made more descriptive."
* tag 'for-3.6-rc3' of git://gitorious.org/linux-pwm/linux-pwm:
pwm: Improve Kconfig help text
pwm: core: Fix coding style issues
pwm: vt8500: Fix coding style issue
pwm: Remove a redundant error message when devm_request_and_ioremap fails
pwm: samsung: add missing device pointer to struct pwm_chip
pwm: Add missing static storage class specifiers in core.c file
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The Kconfig help text should help the user understand what functionality
is provided by an option. This is especially true for new subsystems. An
improved help text is provided by this commit in the hopes of clarifying
the usefulness of the PWM framework.
Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
Acked-by: Borislav Petkov <bp@alien8.de>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Fixes the following:
WARNING: line over 80 characters
ERROR: spaces required around that ':' (ctx:VxW)
WARNING: Prefer pr_warn(... to pr_warning(...
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Fixes the following:
WARNING: Prefer pr_warn(... to pr_warning(...
pr_warning("Waiting for status bits 0x%x to clear timed out\n",
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The implementation in devm_request_and_ioremap() already shows error message,
so no need to show dev_err again if devm_request_and_ioremap() fails.
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Cc: Philip, Avinash <avinashphilip@ti.com>
Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
|
| |
| |
| |
| |
| |
| |
| |
| | |
This patch adds missing device pointer to struct pwm_chip. If the
device pointer is NULL, pwmchip_add() will return error.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Fixes the following sparse warnings:
drivers/pwm/core.c:152:6: warning:
symbol 'of_pwmchip_add' was not declared. Should it be static?
drivers/pwm/core.c:165:6: warning:
symbol 'of_pwmchip_remove' was not declared. Should it be static?
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph fixes from Sage Weil:
"Jim's fix closes a narrow race introduced with the msgr changes. One
fix resolves problems with debugfs initialization that Yan found when
multiple client instances are created (e.g., two clusters mounted, or
rbd + cephfs), another one fixes problems with mounting a nonexistent
server subdirectory, and the last one fixes a divide by zero error
from unsanitized ioctl input that Dan Carpenter found."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: avoid divide by zero in __validate_layout()
libceph: avoid truncation due to racing banners
ceph: tolerate (and warn on) extraneous dentry from mds
libceph: delay debugfs initialization until we learn global_id
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If "l->stripe_unit" is zero the the mod on the next line will cause a
divide by zero bug. This comes from the copy_from_user() in
ceph_ioctl_set_layout_policy(). Passing 0 is valid, though (it means
"do not change") so avoid the % check in that case.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Because the Ceph client messenger uses a non-blocking connect, it is
possible for the sending of the client banner to race with the
arrival of the banner sent by the peer.
When ceph_sock_state_change() notices the connect has completed, it
schedules work to process the socket via con_work(). During this
time the peer is writing its banner, and arrival of the peer banner
races with con_work().
If con_work() calls try_read() before the peer banner arrives, there
is nothing for it to do, after which con_work() calls try_write() to
send the client's banner. In this case Ceph's protocol negotiation
can complete succesfully.
The server-side messenger immediately sends its banner and addresses
after accepting a connect request, *before* actually attempting to
read or verify the banner from the client. As a result, it is
possible for the banner from the server to arrive before con_work()
calls try_read(). If that happens, try_read() will read the banner
and prepare protocol negotiation info via prepare_write_connect().
prepare_write_connect() calls con_out_kvec_reset(), which discards
the as-yet-unsent client banner. Next, con_work() calls
try_write(), which sends the protocol negotiation info rather than
the banner that the peer is expecting.
The result is that the peer sees an invalid banner, and the client
reports "negotiation failed".
Fix this by moving con_out_kvec_reset() out of
prepare_write_connect() to its callers at all locations except the
one where the banner might still need to be sent.
[elder@inktak.com: added note about server-side behavior]
Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Reviewed-by: Alex Elder <elder@inktank.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If the MDS gives us a dentry and we weren't prepared to handle it,
WARN_ON_ONCE instead of crashing.
Reported-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The debugfs directory includes the cluster fsid and our unique global_id.
We need to delay the initialization of the debug entry until we have
learned both the fsid and our global_id from the monitor or else the
second client can't create its debugfs entry and will fail (and multiple
client instances aren't properly reflected in debugfs).
Reported by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Pull NFS client bugfixes from Trond Myklebust:
- NFSv3 mounts need to fail if the FSINFO rpc call fails
- Ensure that the NFS commit cache gets torn down when we unload the
NFS module.
- Fix memory scribble issues when interrupting a LAYOUTGET rpc call
- Fix NFSv4 legacy idmapper regressions
- Fix issues with the NFSv4 getacl command
- Fix a regression when using the legacy "mount -t nfs4"
* tag 'nfs-for-3.6-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv3: Ensure that do_proc_get_root() reports errors correctly
NFSv4: Ensure that nfs4_alloc_client cleans up on error.
NFS: return -ENOKEY when the upcall fails to map the name
NFS: Clear key construction data if the idmap upcall fails
NFSv4: Don't use private xdr_stream fields in decode_getacl
NFSv4: Fix the acl cache size calculation
NFSv4: Fix pointer arithmetic in decode_getacl
NFS: Alias the nfs module to nfs4
NFS: Fix a regression when loading the NFS v4 module
NFSv4.1: Remove a bogus BUG_ON() in nfs4_layoutreturn_done
pnfs-obj: Better IO pattern in case of unaligned offset
NFS41: add pg_layout_private to nfs_pageio_descriptor
pnfs: nfs4_proc_layoutget returns void
pnfs: defer release of pages in layoutget
nfs: tear down caches in nfs_init_writepagecache when allocation fails
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
If the rpc call to NFS3PROC_FSINFO fails, then we need to report that
error so that the mount fails. Otherwise we can end up with a
superblock with completely unusable values for block sizes, maxfilesize,
etc.
Reported-by: Yuanming Chen <hikvision_linux@163.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Any pointer that was allocated through nfs_alloc_client() needs to be
freed via a call to nfs_free_client().
Reported-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This allows the normal error-paths to handle the error, rather than
making a special call to complete_request_key() just for this instance.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Tested-by: William Dauchy <wdauchy@gmail.com>
Cc: stable@vger.kernel.org [>= 3.4]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
idmap_pipe_downcall already clears this field if the upcall succeeds,
but if it fails (rpc.idmapd isn't running) the field will still be set
on the next call triggering a BUG_ON(). This patch tries to handle all
possible ways that the upcall could fail and clear the idmap key data
for each one.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Tested-by: William Dauchy <wdauchy@gmail.com>
Cc: stable@vger.kernel.org [>= 3.4]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Instead of using the private field xdr->p from struct xdr_stream,
use the public xdr_stream_pos().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Currently, we do not take into account the size of the 16 byte
struct nfs4_cached_acl header, when deciding whether or not we should
cache the acl data. Consequently, we will end up allocating an
8k buffer in order to fit a maximum size 4k acl.
This patch adjusts the calculation so that we limit the cache size
to 4k for the acl header+data.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Resetting the cursor xdr->p to a previous value is not a safe
practice: if the xdr_stream has crossed out of the initial iovec,
then a bunch of other fields would need to be reset too.
Fix this issue by using xdr_enter_page() so that the buffer gets
page aligned at the bitmap _before_ we decode it.
Also fix the confusion of the ACL length with the page buffer length
by not adding the base offset to the ACL length...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This allows distros to remove the line from their modprobe
configuration.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Some systems have a modprobe.d/nfs.conf file that sets an nfs4 alias
pointing to nfs.ko, rather than nfs4.ko. This can prevent the v4 module
from loading on mount, since the kernel sees that something named "nfs4"
has already been loaded. To work around this, I've renamed the modules
to "nfsv2.ko" "nfsv3.ko" and "nfsv4.ko".
I also had to move the nfs4_fs_type back to nfs.ko to ensure that `mount
-t nfs4` still works.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Ever since commit 0a57cdac3f (NFSv4.1 send layoutreturn to fence
disconnected data server) we've been sending layoutreturn calls
while there is potentially still outstanding I/O to the data
servers. The reason we do this is to avoid races between replayed
writes to the MDS and the original writes to the DS.
When this happens, the BUG_ON() in nfs4_layoutreturn_done can
be triggered because it assumes that we would never call
layoutreturn without knowing that all I/O to the DS is
finished. The fix is to remove the BUG_ON() now that the
assumptions behind the test are obsolete.
Reported-by: Boaz Harrosh <bharrosh@panasas.com>
Reported-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>=3.5]
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Depending on layout and ARCH, ORE has some limits on max IO sizes
which is communicated on (what else) ore_layout->max_io_length,
which is always stripe aligned.
This was considered as the pg_test boundary for splitting and starting
a new IO.
But in the case of a long IO where the start offset is not aligned
what would happen is that both end of IO[N] and start of IO[N+1]
would be unaligned, causing each IO boundary parity unit to be
calculated and written twice.
So what we do in this patch is split the very start of an unaligned
IO, up to a stripe boundary, and then next IO's can continue fully
aligned til the end.
We might be sacrificing the case where the full unaligned IO would
fit within a single max_io_length, but the sacrifice is well worth
the elimination of double calculation and parity units IO.
Actually the sacrificing is marginal and is almost unmeasurable.
TODO:
If we know the total expected linear segment that will
be received, at pg_init, we could use that information
in many places:
1. blocks-layout get_layout write segment size
2. Better mds-threshold
3. In above situation for a better clean split
I will do this in future submission.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
To allow layout driver to pass private information around
pg_init/pg_doio.
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
since the only user of nfs4_proc_layoutget is send_layoutget, which
ignores its return value, there is no reason to return any value.
Signed-off-by: Idan Kedar <idank@tonian.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
we have encountered a bug whereby reading a lot of files (copying
fedora's /bin) from a pNFS mount and hitting Ctrl+C in the middle caused
a general protection fault in xdr_shrink_bufhead. this function is
called when decoding the response from LAYOUTGET. the decoding is done
by a worker thread, and the caller of LAYOUTGET waits for the worker
thread to complete.
hitting Ctrl+C caused the synchronous wait to end and the next thing the
caller does is to free the pages, so when the worker thread calls
xdr_shrink_bufhead, the pages are gone. therefore, the cleanup of these
pages has been moved to nfs4_layoutget_release.
Signed-off-by: Idan Kedar <idank@tonian.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
...and ensure that we tear down the nfs_commit_data cache too when
unloading the module.
Cc: Bryan Schumaker <bjschuma@netapp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull assorted fixes - mostly vfs - from Al Viro:
"Assorted fixes, with an unexpected detour into vfio refcounting logics
(fell out when digging in an analog of eventpoll race in there)."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
task_work: add a scheduling point in task_work_run()
fs: fix fs/namei.c kernel-doc warnings
eventpoll: use-after-possible-free in epoll_create1()
vfio: grab vfio_device reference *before* exposing the sucker via fd_install()
vfio: get rid of vfio_device_put()/vfio_group_get_device* races
vfio: get rid of open-coding kref_put_mutex
introduce kref_put_mutex()
vfio: don't dereference after kfree...
mqueue: lift mnt_want_write() outside ->i_mutex, clean up a bit
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
It seems commit 4a9d4b02 (switch fput to task_work_add) reintroduced
the problem addressed in commit 944be0b2 (close_files(): add scheduling
point)
If a server process with a lot of files (say 2 million tcp sockets)
is killed, we can spend a lot of time in task_work_run() and trigger
a soft lockup.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Fix kernel-doc warnings in fs/namei.c:
Warning(fs/namei.c:360): No description found for parameter 'inode'
Warning(fs/namei.c:672): No description found for parameter 'nd'
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
As soon as we'd installed the file into descriptor table, it can
get closed by another thread. Freeing ep in process...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
It's not critical (anymore) since another thread closing the file will block
on ->device_lock before it gets to dropping the final reference, but it's
definitely cleaner that way...
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
we really need to make sure that dropping the last reference happens
under the group->device_lock; otherwise a loop (under device_lock)
might find vfio_device instance that is being freed right now, has
already dropped the last reference and waits on device_lock to exclude
the sucker from the list.
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
equivalent of
mutex_lock(mutex);
if (!kref_put(kref, release))
mutex_unlock(mutex);
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
the way it abuses ->d_fsdata still needs to be killed, but that's
a separate story.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Merge fixes from Andrew Morton.
Random drivers and some VM fixes.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (17 commits)
mm: compaction: Abort async compaction if locks are contended or taking too long
mm: have order > 0 compaction start near a pageblock with free pages
rapidio/tsi721: fix unused variable compiler warning
rapidio/tsi721: fix inbound doorbell interrupt handling
drivers/rtc/rtc-rs5c348.c: fix hour decoding in 12-hour mode
mm: correct page->pfmemalloc to fix deactivate_slab regression
drivers/rtc/rtc-pcf2123.c: initialize dynamic sysfs attributes
mm/compaction.c: fix deferring compaction mistake
drivers/misc/sgi-xp/xpc_uv.c: SGI XPC fails to load when cpu 0 is out of IRQ resources
string: do not export memweight() to userspace
hugetlb: update hugetlbpage.txt
checkpatch: add control statement test to SINGLE_STATEMENT_DO_WHILE_MACRO
mm: hugetlbfs: correctly populate shared pmd
cciss: fix incorrect scsi status reporting
Documentation: update mount option in filesystem/vfat.txt
mm: change nr_ptes BUG_ON to WARN_ON
cs5535-clockevt: typo, it's MFGPT, not MFPGT
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Jim Schutt reported a problem that pointed at compaction contending
heavily on locks. The workload is straight-forward and in his own words;
The systems in question have 24 SAS drives spread across 3 HBAs,
running 24 Ceph OSD instances, one per drive. FWIW these servers
are dual-socket Intel 5675 Xeons w/48 GB memory. I've got ~160
Ceph Linux clients doing dd simultaneously to a Ceph file system
backed by 12 of these servers.
Early in the test everything looks fine
procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
r b swpd free buff cache si so bi bo in cs us sy id wa st
31 15 0 287216 576 38606628 0 0 2 1158 2 14 1 3 95 0 0
27 15 0 225288 576 38583384 0 0 18 2222016 203357 134876 11 56 17 15 0
28 17 0 219256 576 38544736 0 0 11 2305932 203141 146296 11 49 23 17 0
6 18 0 215596 576 38552872 0 0 7 2363207 215264 166502 12 45 22 20 0
22 18 0 226984 576 38596404 0 0 3 2445741 223114 179527 12 43 23 22 0
and then it goes to pot
procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
r b swpd free buff cache si so bi bo in cs us sy id wa st
163 8 0 464308 576 36791368 0 0 11 22210 866 536 3 13 79 4 0
207 14 0 917752 576 36181928 0 0 712 1345376 134598 47367 7 90 1 2 0
123 12 0 685516 576 36296148 0 0 429 1386615 158494 60077 8 84 5 3 0
123 12 0 598572 576 36333728 0 0 1107 1233281 147542 62351 7 84 5 4 0
622 7 0 660768 576 36118264 0 0 557 1345548 151394 59353 7 85 4 3 0
223 11 0 283960 576 36463868 0 0 46 1107160 121846 33006 6 93 1 1 0
Note that system CPU usage is very high blocks being written out has
dropped by 42%. He analysed this with perf and found
perf record -g -a sleep 10
perf report --sort symbol --call-graph fractal,5
34.63% [k] _raw_spin_lock_irqsave
|
|--97.30%-- isolate_freepages
| compaction_alloc
| unmap_and_move
| migrate_pages
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_slowpath
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| do_page_fault
| page_fault
| |
| |--87.39%-- skb_copy_datagram_iovec
| | tcp_recvmsg
| | inet_recvmsg
| | sock_recvmsg
| | sys_recvfrom
| | system_call
| | __recv
| | |
| | --100.00%-- (nil)
| |
| --12.61%-- memcpy
--2.70%-- [...]
There was other data but primarily it is all showing that compaction is
contended heavily on the zone->lock and zone->lru_lock.
commit [b2eef8c0: mm: compaction: minimise the time IRQs are disabled
while isolating pages for migration] noted that it was possible for
migration to hold the lru_lock for an excessive amount of time. Very
broadly speaking this patch expands the concept.
This patch introduces compact_checklock_irqsave() to check if a lock
is contended or the process needs to be scheduled. If either condition
is true then async compaction is aborted and the caller is informed.
The page allocator will fail a THP allocation if compaction failed due
to contention. This patch also introduces compact_trylock_irqsave()
which will acquire the lock only if it is not contended and the process
does not need to schedule.
Reported-by: Jim Schutt <jaschut@sandia.gov>
Tested-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Commit 7db8889ab05b ("mm: have order > 0 compaction start off where it
left") introduced a caching mechanism to reduce the amount work the free
page scanner does in compaction. However, it has a problem. Consider
two process simultaneously scanning free pages
C
Process A M S F
|---------------------------------------|
Process B M FS
C is zone->compact_cached_free_pfn
S is cc->start_pfree_pfn
M is cc->migrate_pfn
F is cc->free_pfn
In this diagram, Process A has just reached its migrate scanner, wrapped
around and updated compact_cached_free_pfn accordingly.
Simultaneously, Process B finishes isolating in a block and updates
compact_cached_free_pfn again to the location of its free scanner.
Process A moves to "end_of_zone - one_pageblock" and runs this check
if (cc->order > 0 && (!cc->wrapped ||
zone->compact_cached_free_pfn >
cc->start_free_pfn))
pfn = min(pfn, zone->compact_cached_free_pfn);
compact_cached_free_pfn is above where it started so the free scanner
skips almost the entire space it should have scanned. When there are
multiple processes compacting it can end in a situation where the entire
zone is not being scanned at all. Further, it is possible for two
processes to ping-pong update to compact_cached_free_pfn which is just
random.
Overall, the end result wrecks allocation success rates.
There is not an obvious way around this problem without introducing new
locking and state so this patch takes a different approach.
First, it gets rid of the skip logic because it's not clear that it
matters if two free scanners happen to be in the same block but with
racing updates it's too easy for it to skip over blocks it should not.
Second, it updates compact_cached_free_pfn in a more limited set of
circumstances.
If a scanner has wrapped, it updates compact_cached_free_pfn to the end
of the zone. When a wrapped scanner isolates a page, it updates
compact_cached_free_pfn to point to the highest pageblock it
can isolate pages from.
If a scanner has not wrapped when it has finished isolated pages it
checks if compact_cached_free_pfn is pointing to the end of the
zone. If so, the value is updated to point to the highest
pageblock that pages were isolated from. This value will not
be updated again until a free page scanner wraps and resets
compact_cached_free_pfn.
This is not optimal and it can still race but the compact_cached_free_pfn
will be pointing to or very near a pageblock with free pages.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Fix unused variable compiler warning when built with CONFIG_RAPIDIO_DEBUG
option off.
This patch is applicable to kernel versions starting from v3.2
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Make sure that there is no doorbell messages left behind due to disabled
interrupts during inbound doorbell processing.
The most common case for this bug is loss of rionet JOIN messages in
systems with three or more rionet participants and MSI or MSI-X enabled.
As result, requests for packet transfers may finish with "destination
unreachable" error message.
This patch is applicable to kernel versions starting from v3.2.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Correct the offset by subtracting 20 from tm_hour before taking the
modulo 12.
[ "Why 20?" I hear you ask. Or at least I did.
Here's the reason why: RS5C348_BIT_PM is 32, and is - stupidly -
included in the RS5C348_HOURS_MASK define. So it's really subtracting
out that bit to get "hour+12". But then because it does things modulo
12, it needs to add the 12 in again afterwards anyway.
This code is confused. It would be much clearer if RS5C348_HOURS_MASK
just didn't include the RS5C348_BIT_PM bit at all, then it wouldn't
need to do the silly subtract either.
Whatever. It's all just math, the end result is the same. - Linus ]
Reported-by: James Nute <newten82@gmail.com>
Tested-by: James Nute <newten82@gmail.com>
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Commit cfd19c5a9ecf ("mm: only set page->pfmemalloc when
ALLOC_NO_WATERMARKS was used") tried to narrow down page->pfmemalloc
setting, but it missed some places the pfmemalloc should be set.
So, in __slab_alloc, the unalignment pfmemalloc and ALLOC_NO_WATERMARKS
cause incorrect deactivate_slab() on our core2 server:
64.73% fio [kernel.kallsyms] [k] _raw_spin_lock
|
--- _raw_spin_lock
|
|---0.34%-- deactivate_slab
| __slab_alloc
| kmem_cache_alloc
| |
That causes our fio sync write performance to have a 40% regression.
Move the checking in get_page_from_freelist() which resolves this issue.
Signed-off-by: Alex Shi <alex.shi@intel.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: David Miller <davem@davemloft.net
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Sage Weil <sage@inktank.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Dynamically allocated sysfs attributes must be initialized using
sysfs_attr_init(), otherwise lockdep complains: BUG: key <address> not in
.data!
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Ilya Shchepetkov <shchepetkov@ispras.ru>
Cc: Chris Verges <chrisv@cyberswitching.com>
Cc: Christian Pellegrin <chripell@fsfe.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Commit aff622495c9a ("vmscan: only defer compaction for failed order and
higher") fixed bad deferring policy but made mistake about checking
compact_order_failed in __compact_pgdat(). So it can't update
compact_order_failed with the new order. This ends up preventing
correct operation of policy deferral. This patch fixes it.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
resources
On many of our larger systems, CPU 0 has had all of its IRQ resources
consumed before XPC loads. Worst cases on machines with multiple 10
GigE cards and multiple IB cards have depleted the entire first socket
of IRQs.
This patch makes selecting the node upon which IRQs are allocated (as
well as all the other GRU Message Queue structures) specifiable as a
module load param and has a default behavior of searching all nodes/cpus
for an available resources.
[akpm@linux-foundation.org: fix build: include cpu.h and module.h]
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Fix the following warning:
usr/include/linux/string.h:8: userspace cannot reference function or variable defined in the kernel
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Acked-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Commit f0f57b2b1488 ("mm: move hugepage test examples to
tools/testing/selftests/vm") moved map_hugetlb.c, hugepage-shm.c and
hugepage-mmap.c tests into tools/testing/selftests/vm/ directory, but it
didn't update hugetlbpage.txt
Signed-off-by: Zhouping Liu <sanweidaying@gmail.com>
Acked-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|