summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* udf: Ignore [ug]id=ignore mount optionsJan Kara2018-02-273-18/+8
| | | | | | | | | | | | | Currently uid=ignore and gid=ignore make no sense without uid=<number> and gid=<number> respectively as they result in all files having invalid uid / gid which then doesn't allow even root to modify files and thus causes confusion. And since commit ca76d2d8031f "UDF: fix UID and GID mount option ignorance" (from over 10 years ago) uid=<number> overrides all uids on disk as uid=ignore does. So just silently ignore uid=ignore mount option. Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
* udf: Fix handling of Partition DescriptorsJan Kara2018-02-271-32/+73
| | | | | | | | | | | | | | | | | Current handling of Partition Descriptors in Volume Descriptor Sequence is buggy in several ways. Firstly, it does not take descriptor sequence numbers into account at all, thus any volume making serious use of them would be unmountable. Secondly, it does not handle Volume Descriptor Pointers or Volume Descriptor Sequence without Terminating Descriptor. Fix these problems by properly remembering all Partition Descriptors in the Volume Descriptor Sequence and their sequence numbers. This is made more complicated by the fact that we don't know number of partitions in advance and sequence numbers have to be tracked on per-partition basis. Reported-by: Pali Rohár <pali.rohar@gmail.com> Acked-by: Pali Rohár <pali.rohar@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
* udf: Unify common handling of descriptorsJan Kara2018-02-271-22/+19
| | | | | | | | When scanning Volume Descriptor Sequence, several descriptors have exactly the same handling. Unify it. Acked-by: Pali Rohár <pali.rohar@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
* udf: Convert descriptor index definitions to enumJan Kara2018-02-161-8/+9
| | | | | | | | | Convert index definitions from defines to enum. It is a shorter description and easier to modify. Also remove VDS_POS_VOL_DESC_PTR since it is unused. Acked-by: Pali Rohár <pali.rohar@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
* udf: Allow volume descriptor sequence to be terminated by unrecorded blockJan Kara2018-02-161-6/+2
| | | | | | | | | | | | According to ECMA-167 3/8.4.2 a volume descriptor sequence can be terminated also by an unrecorded block within the extent of volume descriptor sequence. Currently we errored out in such case making such volumes unmountable. Handle that case by treating any invalid block as a block terminating the sequence. Reported-by: Pali Rohár <pali.rohar@gmail.com> Acked-by: Pali Rohár <pali.rohar@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
* udf: Simplify handling of Volume Descriptor PointersJan Kara2018-02-161-25/+16
| | | | | | | | | | | According to ECMA-167 3/8.4.2 Volume Descriptor Pointer is terminating current extent of Volume Descriptor Sequence. Also according to ECMA-167 3/8.4.3 Volume Descriptor Sequence Number is not significant for Volume Descriptor Pointers. Simplify the handling of Volume Descriptor Pointers to take this into account. Acked-by: Pali Rohár <pali.rohar@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
* udf: Fix off-by-one in volume descriptor sequence lengthJan Kara2018-02-161-3/+3
| | | | | | | | | | | We pass one block beyond end of volume descriptor sequence into process_sequence() as 'lastblock' instead of the last block of the sequence. When the sequence is not terminated with TD descriptor, this could lead to false errors due to invalid blocks in volume descriptor sequence and thus unmountable volumes. Acked-by: Pali Rohár <pali.rohar@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
* inotify: Extend ioctl to allow to request id of new watch descriptorKirill Tkhai2018-02-141-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Watch descriptor is id of the watch created by inotify_add_watch(). It is allocated in inotify_add_to_idr(), and takes the numbers starting from 1. Every new inotify watch obtains next available number (usually, old + 1), as served by idr_alloc_cyclic(). CRIU (Checkpoint/Restore In Userspace) project supports inotify files, and restores watched descriptors with the same numbers, they had before dump. Since there was no kernel support, we had to use cycle to add a watch with specific descriptor id: while (1) { int wd; wd = inotify_add_watch(inotify_fd, path, mask); if (wd < 0) { break; } else if (wd == desired_wd_id) { ret = 0; break; } inotify_rm_watch(inotify_fd, wd); } (You may find the actual code at the below link: https://github.com/checkpoint-restore/criu/blob/v3.7/criu/fsnotify.c#L577) The cycle is suboptiomal and very expensive, but since there is no better kernel support, it was the only way to restore that. Happily, we had met mostly descriptors with small id, and this approach had worked somehow. But recent time containers with inotify with big watch descriptors begun to come, and this way stopped to work at all. When descriptor id is something about 0x34d71d6, the restoring process spins in busy loop for a long time, and the restore hungs and delay of migration from node to node could easily be watched. This patch aims to solve this problem. It introduces new ioctl INOTIFY_IOC_SETNEXTWD, which allows to request the number of next created watch descriptor from userspace. It simply calls idr_set_cursor() primitive to populate idr::idr_next, so that next idr_alloc_cyclic() allocation will return this id, if it is not occupied. This is the way which is used to restore some other resources from userspace. For example, /proc/sys/kernel/ns_last_pid works the same for task pids. The new code is under CONFIG_CHECKPOINT_RESTORE #define, so small system may exclude it. v2: Use INT_MAX instead of custom definition of max id, as IDR subsystem guarantees id is between 0 and INT_MAX. CC: Jan Kara <jack@suse.cz> CC: Matthew Wilcox <willy@infradead.org> CC: Andrew Morton <akpm@linux-foundation.org> CC: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jan Kara <jack@suse.cz>
* vfs: do bulk POLL* -> EPOLL* replacementLinus Torvalds2018-02-1124-80/+80
| | | | | | | | | | | | | | | | | | | | | | | This is the mindless scripted replacement of kernel use of POLL* variables as described by Al, done by this script: for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'` for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done done with de-mangling cleanups yet to come. NOTE! On almost all architectures, the EPOLL* constants have the same values as the POLL* constants do. But they keyword here is "almost". For various bad reasons they aren't the same, and epoll() doesn't actually work quite correctly in some cases due to this on Sparc et al. The next patch from Al will sort out the final differences, and we should be all done. Scripted-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'work.poll2' of ↵Linus Torvalds2018-02-114-21/+26
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more poll annotation updates from Al Viro: "This is preparation to solving the problems you've mentioned in the original poll series. After this series, the kernel is ready for running for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'` for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done done as a for bulk search-and-replace. After that, the kernel is ready to apply the patch to unify {de,}mangle_poll(), and then get rid of kernel-side POLL... uses entirely, and we should be all done with that stuff. Basically, that's what you suggested wrt KPOLL..., except that we can use EPOLL... instead - they already are arch-independent (and equal to what is currently kernel-side POLL...). After the preparations (in this series) switch to returning EPOLL... from ->poll() instances is completely mechanical and kernel-side POLL... can go away. The last step (killing kernel-side POLL... and unifying {de,}mangle_poll() has to be done after the search-and-replace job, since we need userland-side POLL... for unified {de,}mangle_poll(), thus the cherry-pick at the last step. After that we will have: - POLL{IN,OUT,...} *not* in __poll_t, so any stray instances of ->poll() still using those will be caught by sparse. - eventpoll.c and select.c warning-free wrt __poll_t - no more kernel-side definitions of POLL... - userland ones are visible through the entire kernel (and used pretty much only for mangle/demangle) - same behavior as after the first series (i.e. sparc et.al. epoll(2) working correctly)" * 'work.poll2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: annotate ep_scan_ready_list() ep_send_events_proc(): return result via esed->res preparation to switching ->poll() to returning EPOLL... add EPOLLNVAL, annotate EPOLL... and event_poll->event use linux/poll.h instead of asm/poll.h xen: fix poll misannotation smc: missing poll annotations
| * annotate ep_scan_ready_list()Al Viro2018-02-011-11/+13
| | | | | | | | | | | | make it always return __poll_t and have its callbacks do the same Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * ep_send_events_proc(): return result via esed->resAl Viro2018-02-011-7/+10
| | | | | | | | | | | | preparations for not mixing __poll_t and int in ep_scan_ready_list() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * use linux/poll.h instead of asm/poll.hAl Viro2018-02-013-3/+3
| | | | | | | | | | | | | | | | | | The only place that has any business including asm/poll.h is linux/poll.h. Fortunately, asm/poll.h had only been included in 3 places beyond that one, and all of them are trivial to switch to using linux/poll.h. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge branch 'for-linus' of ↵Linus Torvalds2018-02-092-2/+5
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs fixes from Al Viro. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: seq_file: fix incomplete reset on read from zero offset kernfs: fix regression in kernfs_fop_write caused by wrong type
| * | seq_file: fix incomplete reset on read from zero offsetMiklos Szeredi2018-01-201-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When resetting iterator on a zero offset we need to discard any data already in the buffer (count), and private state of the iterator (version). For example this bug results in first line being repeated in /proc/mounts if doing a zero size read before a non-zero size read. Reported-by: Rich Felker <dalias@libc.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: e522751d605d ("seq_file: reset iterator to first record for zero offset") Cc: <stable@vger.kernel.org> # v4.10 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | kernfs: fix regression in kernfs_fop_write caused by wrong typeIvan Vecera2018-01-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit b7ce40cff0b9 ("kernfs: cache atomic_write_len in kernfs_open_file") changes type of local variable 'len' from ssize_t to size_t. This change caused that the *ppos value is updated also when the previous write callback failed. Mentioned snippet: ... len = ops->write(...); <- return value can be negative ... if (len > 0) <- true here in this case *ppos += len; ... Fixes: b7ce40cff0b9 ("kernfs: cache atomic_write_len in kernfs_open_file") Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | Merge tag '4.16-minor-rc-SMB3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds2018-02-094-13/+130
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull cifs fixes from Steve French: "There are a couple additional security fixes that are still being tested that are not in this set." * tag '4.16-minor-rc-SMB3-fixes' of git://git.samba.org/sfrench/cifs-2.6: Add missing structs and defines from recent SMB3.1.1 documentation address lock imbalance warnings in smbdirect.c cifs: silence compiler warnings showing up with gcc-8.0.0 Add some missing debug fields in server and tcon structs
| * | | Add missing structs and defines from recent SMB3.1.1 documentationSteve French2018-02-071-2/+112
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The last two updates to MS-SMB2 protocol documentation added various flags and structs (especially relating to SMB3.1.1 tree connect). Add missing defines and structs to smb2pdu.h Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
| * | | address lock imbalance warnings in smbdirect.cSteve French2018-02-071-7/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Although at least one of these was an overly strict sparse warning in the new smbdirect code, it is cleaner to fix - so no warnings. Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
| * | | cifs: silence compiler warnings showing up with gcc-8.0.0Arnd Bergmann2018-02-071-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This bug was fixed before, but came up again with the latest compiler in another function: fs/cifs/cifssmb.c: In function 'CIFSSMBSetEA': fs/cifs/cifssmb.c:6362:3: error: 'strncpy' offset 8 is out of the bounds [0, 4] [-Werror=array-bounds] strncpy(parm_data->list[0].name, ea_name, name_len); Let's apply the same fix that was used for the other instances. Fixes: b2a3ad9ca502 ("cifs: silence compiler warnings showing up with gcc-4.7.0") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Steve French <smfrench@gmail.com>
| * | | Add some missing debug fields in server and tcon structsSteve French2018-02-071-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow dumping out debug information on dialect, signing, unix extensions and encryption Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
* | | | Merge tag 'nfsd-4.16' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2018-02-087-40/+57
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd update from Bruce Fields: "A fairly small update this time around. Some cleanup, RDMA fixes, overlayfs fixes, and a fix for an NFSv4 state bug. The bigger deal for nfsd this time around was Jeff Layton's already-merged i_version patches" * tag 'nfsd-4.16' of git://linux-nfs.org/~bfields/linux: svcrdma: Fix Read chunk round-up NFSD: hide unused svcxdr_dupstr() nfsd: store stat times in fill_pre_wcc() instead of inode times nfsd: encode stat->mtime for getattr instead of inode->i_mtime nfsd: return RESOURCE not GARBAGE_ARGS on too many ops nfsd4: don't set lock stateid's sc_type to CLOSED nfsd: Detect unhashed stids in nfsd4_verify_open_stid() sunrpc: remove dead code in svc_sock_setbufsize svcrdma: Post Receives in the Receive completion handler nfsd4: permit layoutget of executable-only files lockd: convert nlm_rqst.a_count from atomic_t to refcount_t lockd: convert nlm_lockowner.count from atomic_t to refcount_t lockd: convert nsm_handle.sm_count from atomic_t to refcount_t
| * | | | NFSD: hide unused svcxdr_dupstr()Arnd Bergmann2018-02-081-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is now only one caller left for svcxdr_dupstr() and this is inside of an #ifdef, so we can get a warning when the option is disabled: fs/nfsd/nfs4xdr.c:241:1: error: 'svcxdr_dupstr' defined but not used [-Werror=unused-function] This changes the remaining caller to use a nicer IS_ENABLED() check, which lets the compiler drop the unused code silently. Fixes: e40d99e6183e ("NFSD: Clean up symlink argument XDR decoders") Suggested-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | nfsd: store stat times in fill_pre_wcc() instead of inode timesAmir Goldstein2018-02-083-24/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The time values in stat and inode may differ for overlayfs and stat time values are the correct ones to use. This is also consistent with the fact that fill_post_wcc() also stores stat time values. This means introducing a stat call that could fail, where previously we were just copying values out of the inode. To be conservative about changing behavior, we fall back to copying values out of the inode in the error case. It might be better just to clear fh_pre_saved (though note the BUG_ON in set_change_info). Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | nfsd: encode stat->mtime for getattr instead of inode->i_mtimeAmir Goldstein2018-02-082-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The values of stat->mtime and inode->i_mtime may differ for overlayfs and stat->mtime is the correct value to use when encoding getattr. This is also consistent with the fact that other attr times are also encoded from stat values. Both callers of lease_get_mtime() already have the value of stat->mtime, so the only needed change is that lease_get_mtime() will not overwrite this value with inode->i_mtime in case the inode does not have an exclusive lease. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | nfsd: return RESOURCE not GARBAGE_ARGS on too many opsJ. Bruce Fields2018-02-082-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A client that sends more than a hundred ops in a single compound currently gets an rpc-level GARBAGE_ARGS error. It would be more helpful to return NFS4ERR_RESOURCE, since that gives the client a better idea how to recover (for example by splitting up the compound into smaller compounds). This is all a bit academic since we've never actually seen a reason for clients to send such long compounds, but we may as well fix it. While we're there, just use NFSD4_MAX_OPS_PER_COMPOUND == 16, the constant we already use in the 4.1 case, instead of hard-coding 100. Chances anyone actually uses even 16 ops per compound are small enough that I think there's a neglible risk or any regression. This fixes pynfs test COMP6. Reported-by: "Lu, Xinyu" <luxy.fnst@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | nfsd4: don't set lock stateid's sc_type to CLOSEDJ. Bruce Fields2018-02-051-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's no point I can see to stp->st_stid.sc_type = NFS4_CLOSED_STID; given release_lock_stateid immediately sets sc_type to 0. That set of sc_type to 0 should be enough to prevent it being used where we don't want it to be; NFS4_CLOSED_STID should only be needed for actual open stateid's that are actually closed. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | nfsd: Detect unhashed stids in nfsd4_verify_open_stid()Trond Myklebust2018-02-051-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The state of the stid is guaranteed by 2 locks: - The nfs4_client 'cl_lock' spinlock - The nfs4_ol_stateid 'st_mutex' mutex so it is quite possible for the stid to be unhashed after lookup, but before calling nfsd4_lock_ol_stateid(). So we do need to check for a zero value for 'sc_type' in nfsd4_verify_open_stid(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Checuk Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org Fixes: 659aefb68eca "nfsd: Ensure we don't recognise lock stateids..." Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | nfsd4: permit layoutget of executable-only filesBenjamin Coddington2017-12-211-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clients must be able to read a file in order to execute it, and for pNFS that means the client needs to be able to perform a LAYOUTGET on the file. This behavior for executable-only files was added for OPEN in commit a043226bc140 "nfsd4: permit read opens of executable-only files". This fixes up xfstests generic/126 on block/scsi layouts. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | lockd: convert nlm_rqst.a_count from atomic_t to refcount_tElena Reshetova2017-12-212-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nlm_rqst.a_count is used as pure reference counter. Convert it to refcount_t and fix up the operations. **Important note for maintainers: Some functions from refcount_t API defined in lib/refcount.c have different memory ordering guarantees than their atomic counterparts. The full comparison can be seen in https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon in state to be merged to the documentation tree. Normally the differences should not matter since refcount_t provides enough guarantees to satisfy the refcounting use cases, but in some rare cases it might matter. Please double check that you don't have some undocumented memory guarantees for this variable usage. For the nlm_rqst.a_count it might make a difference in following places: - nlmclnt_release_call() and nlmsvc_release_call(): decrement in refcount_dec_and_test() only provides RELEASE ordering and control dependency on success vs. fully ordered atomic counterpart Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | lockd: convert nlm_lockowner.count from atomic_t to refcount_tElena Reshetova2017-12-211-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nlm_lockowner.count is used as pure reference counter. Convert it to refcount_t and fix up the operations. **Important note for maintainers: Some functions from refcount_t API defined in lib/refcount.c have different memory ordering guarantees than their atomic counterparts. The full comparison can be seen in https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon in state to be merged to the documentation tree. Normally the differences should not matter since refcount_t provides enough guarantees to satisfy the refcounting use cases, but in some rare cases it might matter. Please double check that you don't have some undocumented memory guarantees for this variable usage. For the nlm_lockowner.count it might make a difference in following places: - nlm_put_lockowner(): decrement in refcount_dec_and_lock() only provides RELEASE ordering, control dependency on success and holds a spin lock on success vs. fully ordered atomic counterpart. No changes in spin lock guarantees. Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | lockd: convert nsm_handle.sm_count from atomic_t to refcount_tElena Reshetova2017-12-212-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nsm_handle.sm_count is used as pure reference counter. Convert it to refcount_t and fix up the operations. **Important note for maintainers: Some functions from refcount_t API defined in lib/refcount.c have different memory ordering guarantees than their atomic counterparts. The full comparison can be seen in https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon in state to be merged to the documentation tree. Normally the differences should not matter since refcount_t provides enough guarantees to satisfy the refcounting use cases, but in some rare cases it might matter. Please double check that you don't have some undocumented memory guarantees for this variable usage. For the nsm_handle.sm_count it might make a difference in following places: - nsm_release(): decrement in refcount_dec_and_lock() only provides RELEASE ordering, control dependency on success and holds a spin lock on success vs. fully ordered atomic counterpart. No change for the spin lock guarantees. Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | | | | Merge tag 'for-linus-4.16' of ↵Linus Torvalds2018-02-088-92/+74
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux Pull orangefs updates from Mike Marshall: "Mostly cleanups, but three bug fixes: - don't pass garbage return codes back up the call chain (Mike Marshall) - fix stale inode test (Martin Brandenburg) - fix off-by-one errors (Xiongfeng Wang) Also add Martin as a reviewer in the Maintainers file" * tag 'for-linus-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: orangefs: reverse sense of is-inode-stale test in d_revalidate orangefs: simplify orangefs_inode_is_stale Orangefs: don't propogate whacky error codes orangefs: use correct string length orangefs: make orangefs_make_bad_inode static orangefs: remove ORANGEFS_KERNEL_DEBUG orangefs: remove gossip_ldebug and gossip_lerr orangefs: make orangefs_client_debug_init static MAINTAINERS: update orangefs list and add myself as reviewer
| * | | | | orangefs: reverse sense of is-inode-stale test in d_revalidateMartin Brandenburg2018-02-061-10/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a dentry is deleted, then a dentry is recreated with the same handle but a different type (i.e. it was a file and now it's a symlink), then its a different inode. The check was backwards, so d_revalidate would not have noticed. Due to the design of the OrangeFS server, this is rather unlikely. It's also possible for the dentry to be deleted and recreated with the same type. This would be undetectable. It's a bit of a ship of Theseus. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
| * | | | | orangefs: simplify orangefs_inode_is_staleMartin Brandenburg2018-02-061-24/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Check whether this is a new inode at location of call. Raises the question of what to do with an unknown inode type. Old code would've marked the inode bad and returned ESTALE. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
| * | | | | Orangefs: don't propogate whacky error codesMike Marshall2018-02-061-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we get an error return code from userspace (the client-core) we check to make sure it is a valid code. This patch maps the whacky return code to -EINVAL instead of propagating garbage back up the call chain potentially resulting in a hard-to-find train-wreck. The client-core doesn't have any business returning whacky return codes, but if it does, we don't want the kernel to crash as a result. Signed-off-by: Mike Marshall <hubcap@omnibond.com>
| * | | | | orangefs: use correct string lengthXiongfeng Wang2018-02-063-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gcc-8 reports fs/orangefs/dcache.c: In function 'orangefs_d_revalidate': ./include/linux/string.h:245:9: warning: '__builtin_strncpy' specified bound 256 equals destination size [-Wstringop-truncation] fs/orangefs/namei.c: In function 'orangefs_rename': ./include/linux/string.h:245:9: warning: '__builtin_strncpy' specified bound 256 equals destination size [-Wstringop-truncation] fs/orangefs/super.c: In function 'orangefs_mount': ./include/linux/string.h:245:9: warning: '__builtin_strncpy' specified bound 256 equals destination size [-Wstringop-truncation] We need one less byte or call strlcpy() to make it a nul-terminated string. Signed-off-by: Xiongfeng Wang <xiongfeng.wang@linaro.org> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
| * | | | | orangefs: make orangefs_make_bad_inode staticMartin Brandenburg2018-02-062-21/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
| * | | | | orangefs: remove ORANGEFS_KERNEL_DEBUGMartin Brandenburg2018-02-061-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It wasn't possible to enable it, and it would've had very little effect. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
| * | | | | orangefs: remove gossip_ldebug and gossip_lerrMartin Brandenburg2018-02-063-17/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gossip_ldebug is unused. gossip_lerr is used in two places. The messages are unique so line numbers are unnecessary. Also remove support for compiling gossip messages out. It wasn't possible to enable it anyway. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
| * | | | | orangefs: make orangefs_client_debug_init staticMartin Brandenburg2018-02-062-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
* | | | | | Merge tag 'afs-next-20180208' of ↵Linus Torvalds2018-02-0810-405/+295
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull afs updates from David Howells: "Four fixes: - add a missing put - two fixes to reset the address iteration cursor correctly - fix setting up the fileserver iteration cursor. Two cleanups: - remove some dead code - rearrange a function to be more logically laid out And one new feature: - Support AFS dynamic root. With this one should be able to do, say: mkdir /afs mount -t afs none /afs -o dyn to create a dynamic root and then, provided you have keyutils installed, do: ls /afs/grand.central.org and: ls /afs/umich.edu to list the root volumes of both those organisations' AFS cells without requiring any other setup (the kernel upcall to a program in the keyutils package to do DNS access as does NFS)" * tag 'afs-next-20180208' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Support the AFS dynamic root afs: Rearrange afs_select_fileserver() a little afs: Remove unused code afs: Fix server list handling afs: Need to clear responded flag in addr cursor afs: Fix missing cursor clearance afs: Add missing afs_put_cell()
| * | | | | | afs: Support the AFS dynamic rootDavid Howells2018-02-065-87/+247
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support the AFS dynamic root which is a pseudo-volume that doesn't connect to any server resource, but rather is just a root directory that dynamically creates mountpoint directories where the name of such a directory is the name of the cell. Such a mount can be created thus: mount -t afs none /afs -o dyn Dynamic root superblocks aren't shared except by bind mounts and propagation. Cell root volumes can then be mounted by referring to them by name, e.g.: ls /afs/grand.central.org/ ls /afs/.grand.central.org/ The kernel will upcall to consult the DNS if the address wasn't supplied directly. Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | afs: Rearrange afs_select_fileserver() a littleDavid Howells2018-02-061-22/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rearrange afs_select_fileserver() a little to put the use_server chunk before the next_server chunk so that with the removal of a couple of gotos the main path through the function is all one sequence. Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | afs: Remove unused codeDavid Howells2018-02-061-235/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove some old unused code. Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | afs: Fix server list handlingDavid Howells2018-02-063-48/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix server list handling in the following ways: (1) In afs_alloc_volume(), remove duplicate server list build code. This was already done by afs_alloc_server_list() which afs_alloc_volume() previously called. This just results in twice as many VL RPCs. (2) In afs_deliver_vl_get_entry_by_name_u(), use the number of server records indicated by ->nServers in the UVLDB record returned by the VL.GetEntryByNameU RPC call rather than scanning all NMAXNSERVERS slots. Unused slots may contain garbage. (3) In afs_alloc_server_list(), don't stop converting a UVLDB record into a server list just because we can't look up one of the servers. Just skip that server and go on to the next. If we can't look up any of the servers then we'll fail at the end. Without this patch, an attempt to view the umich.edu root cell using something like "ls /afs/umich.edu" on a dynamic root (future patch) mount or an autocell mount will result in ENOMEDIUM. The failure is due to kafs not stopping after nServers'worth of records have been read, but then trying to access a server with a garbage UUID and getting an error, which aborts the server list build. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Reported-by: Jonathan Billings <jsbillings@jsbillings.org> Signed-off-by: David Howells <dhowells@redhat.com> cc: stable@vger.kernel.org
| * | | | | | afs: Need to clear responded flag in addr cursorDavid Howells2018-02-061-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In afs_select_fileserver(), we need to clear the ->responded flag in the address list when reusing it. We should also clear it in afs_select_current_fileserver(). To this end, just memset() the object before initialising it. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: David Howells <dhowells@redhat.com> cc: stable@vger.kernel.org
| * | | | | | afs: Fix missing cursor clearanceDavid Howells2018-02-062-9/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | afs_select_fileserver() ends the address cursor it is using in the case in which we get some sort of network error and run out of addresses to iterate through, before it jumps to try the next server. This also needs to be done when the server aborts with some sort of error that means we should try the next server. Fix this by: (1) Move the iterate_address afs_end_cursor() call to the next_server case. (2) End the cursor in the failed case. (3) Make afs_end_cursor() clear the ->begun flag and ->addr pointer in the address cursor. (4) Make afs_end_cursor() able to be called on an already cleared cursor. Without this, something like the following oops may occur: AFS: Assertion failed 18446612134397189888 == 0 is false 0xffff88007c279f00 == 0x0 is false ------------[ cut here ]------------ kernel BUG at fs/afs/rotate.c:360! RIP: 0010:afs_select_fileserver+0x79b/0xa30 [kafs] Call Trace: afs_statfs+0xcc/0x180 [kafs] ? p9_client_statfs+0x9e/0x110 [9pnet] ? _cond_resched+0x19/0x40 statfs_by_dentry+0x6d/0x90 vfs_statfs+0x1b/0xc0 user_statfs+0x4b/0x80 SYSC_statfs+0x15/0x30 SyS_statfs+0xe/0x10 entry_SYSCALL_64_fastpath+0x20/0x83 Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: stable@vger.kernel.org
| * | | | | | afs: Add missing afs_put_cell()David Howells2018-02-061-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | afs_alloc_volume() needs to release the cell ref it obtained in the case of an error. Fix this by adding an afs_put_cell() call into the error path. This can triggered when a lookup for a cell in a dynamic root or an autocell mount returns an error whilst trying to look up the server (such as ENOMEDIUM). This results in an assertion failure oops when the module is unloaded due to outstanding refs on a cell record. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: David Howells <dhowells@redhat.com> cc: stable@vger.kernel.org
* | | | | | | Merge tag 'ceph-for-4.16-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds2018-02-089-129/+313
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull ceph updates from Ilya Dryomov: "Things have been very quiet on the rbd side, as work continues on the big ticket items slated for the next merge window. On the CephFS side we have a large number of cap handling improvements, a fix for our long-standing abuse of ->journal_info in ceph_readpages() and yet another dentry pointer management patch" * tag 'ceph-for-4.16-rc1' of git://github.com/ceph/ceph-client: ceph: improving efficiency of syncfs libceph: check kstrndup() return value ceph: try to allocate enough memory for reserved caps ceph: fix race of queuing delayed caps ceph: delete unreachable code in ceph_check_caps() ceph: limit rate of cap import/export error messages ceph: fix incorrect snaprealm when adding caps ceph: fix un-balanced fsc->writeback_count update ceph: track read contexts in ceph_file_info ceph: avoid dereferencing invalid pointer during cached readdir ceph: use atomic_t for ceph_inode_info::i_shared_gen ceph: cleanup traceless reply handling for rename ceph: voluntarily drop Fx cap for readdir request ceph: properly drop caps for setattr request ceph: voluntarily drop Lx cap for link/rename requests ceph: voluntarily drop Ax cap for requests that create new inode rbd: whitelist RBD_FEATURE_OPERATIONS feature bit rbd: don't NULL out ->obj_request in rbd_img_obj_parent_read_full() rbd: use kmem_cache_zalloc() in rbd_img_request_create() rbd: obj_request->completion is unused
OpenPOWER on IntegriCloud