summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
...
| * | ext4: fix online resize's handling of a too-small final block groupTheodore Ts'o2018-09-031-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid growing the file system to an extent so that the last block group is too small to hold all of the metadata that must be stored in the block group. This problem can be triggered with the following reproducer: umount /mnt mke2fs -F -m0 -b 4096 -t ext4 -O resize_inode,^has_journal \ -E resize=1073741824 /tmp/foo.img 128M mount /tmp/foo.img /mnt truncate --size 1708M /tmp/foo.img resize2fs /dev/loop0 295400 umount /mnt e2fsck -fy /tmp/foo.img Reported-by: Torsten Hilbrich <torsten.hilbrich@secunet.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
| * | ext4: recalucate superblock checksum after updating free blocks/inodesTheodore Ts'o2018-09-011-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When mounting the superblock, ext4_fill_super() calculates the free blocks and free inodes and stores them in the superblock. It's not strictly necessary, since we don't use them any more, but it's nice to keep them roughly aligned to reality. Since it's not critical for file system correctness, the code doesn't call ext4_commit_super(). The problem is that it's in ext4_commit_super() that we recalculate the superblock checksum. So if we're not going to call ext4_commit_super(), we need to call ext4_superblock_csum_set() to make sure the superblock checksum is consistent. Most of the time, this doesn't matter, since we end up calling ext4_commit_super() very soon thereafter, and definitely by the time the file system is unmounted. However, it doesn't work in this sequence: mke2fs -Fq -t ext4 /dev/vdc 128M mount /dev/vdc /vdc cp xfstests/git-versions /vdc godown /vdc umount /vdc mount /dev/vdc tune2fs -l /dev/vdc With this commit, the "tune2fs -l" no longer fails. Reported-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
| * | ext4: avoid arithemetic overflow that can trigger a BUGTheodore Ts'o2018-09-012-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A maliciously crafted file system can cause an overflow when the results of a 64-bit calculation is stored into a 32-bit length parameter. https://bugzilla.kernel.org/show_bug.cgi?id=200623 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reported-by: Wen Xu <wen.xu@gatech.edu> Cc: stable@vger.kernel.org
| * | ext4: avoid divide by zero fault when deleting corrupted inline directoriesTheodore Ts'o2018-08-272-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A specially crafted file system can trick empty_inline_dir() into reading past the last valid entry in a inline directory, and then run into the end of xattr marker. This will trigger a divide by zero fault. Fix this by using the size of the inline directory instead of dir->i_size. Also clean up error reporting in __ext4_check_dir_entry so that the message is clearer and more understandable --- and avoids the division by zero trap if the size passed in is zero. (I'm not sure why we coded it that way in the first place; printing offset % size is actually more confusing and less useful.) https://bugzilla.kernel.org/show_bug.cgi?id=200933 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reported-by: Wen Xu <wen.xu@gatech.edu> Cc: stable@vger.kernel.org
| * | ext4: check to make sure the rename(2)'s destination is not freedTheodore Ts'o2018-08-271-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the destination of the rename(2) system call exists, the inode's link count (i_nlinks) must be non-zero. If it is, the inode can end up on the orphan list prematurely, leading to all sorts of hilarity, including a use-after-free. https://bugzilla.kernel.org/show_bug.cgi?id=200931 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reported-by: Wen Xu <wen.xu@gatech.edu> Cc: stable@vger.kernel.org
| * | ext4: add nonstring annotations to ext4.hTheodore Ts'o2018-08-271-3/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This suppresses some false positives in gcc 8's -Wstringop-truncation Suggested by Miguel Ojeda (hopefully the __nonstring definition will eventually get accepted in the compiler-gcc.h header file). Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
* | | Merge tag '4.19-rc3-smb3-cifs' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds2018-09-145-17/+43
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull cifs fixes from Steve French: "Fixes for four CIFS/SMB3 potential pointer overflow issues, one minor build fix, and a build warning cleanup" * tag '4.19-rc3-smb3-cifs' of git://git.samba.org/sfrench/cifs-2.6: cifs: read overflow in is_valid_oplock_break() cifs: integer overflow in in SMB2_ioctl() CIFS: fix wrapping bugs in num_entries() cifs: prevent integer overflow in nxt_dir_entry() fs/cifs: require sha512 fs/cifs: suppress a string overflow warning
| * | | cifs: read overflow in is_valid_oplock_break()Dan Carpenter2018-09-121-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to verify that the "data_offset" is within bounds. Reported-by: Dr Silvio Cesare of InfoSect <silvio.cesare@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>
| * | | cifs: integer overflow in in SMB2_ioctl()Dan Carpenter2018-09-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The "le32_to_cpu(rsp->OutputOffset) + *plen" addition can overflow and wrap around to a smaller value which looks like it would lead to an information leak. Fixes: 4a72dafa19ba ("SMB2 FSCTL and IOCTL worker function") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> CC: Stable <stable@vger.kernel.org>
| * | | CIFS: fix wrapping bugs in num_entries()Dan Carpenter2018-09-121-10/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The problem is that "entryptr + next_offset" and "entryptr + len + size" can wrap. I ended up changing the type of "entryptr" because it makes the math easier when we don't have to do so much casting. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> CC: Stable <stable@vger.kernel.org>
| * | | cifs: prevent integer overflow in nxt_dir_entry()Dan Carpenter2018-09-121-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The "old_entry + le32_to_cpu(pDirInfo->NextEntryOffset)" can wrap around so I have added a check for integer overflow. Reported-by: Dr Silvio Cesare of InfoSect <silvio.cesare@gmail.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org>
| * | | fs/cifs: require sha512Stefan Metzmacher2018-09-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This got lost in commit 0fdfef9aa7ee68ddd508aef7c98630cfc054f8d6, which removed CONFIG_CIFS_SMB311. Signed-off-by: Stefan Metzmacher <metze@samba.org> Fixes: 0fdfef9aa7ee68ddd ("smb3: simplify code by removing CONFIG_CIFS_SMB311") CC: Stable <stable@vger.kernel.org> CC: linux-cifs@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | fs/cifs: suppress a string overflow warningStephen Rothwell2018-09-091-3/+8
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A powerpc build of cifs with gcc v8.2.0 produces this warning: fs/cifs/cifssmb.c: In function ‘CIFSSMBNegotiate’: fs/cifs/cifssmb.c:605:3: warning: ‘strncpy’ writing 16 bytes into a region of size 1 overflows the destination [-Wstringop-overflow=] strncpy(pSMB->DialectsArray+count, protocols[i].name, 16); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Since we are already doing a strlen() on the source, change the strncpy to a memcpy(). Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Steve French <stfrench@microsoft.com>
* | | Merge tag 'nfs-for-4.19-2' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds2018-09-144-24/+39
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client bugfixes from Anna Schumaker: "These are a handful of fixes for problems that Trond found. Patch #1 and #3 have the same name, a second issue was found after applying the first patch. Stable bugfixes: - v4.17+: Fix tracepoint Oops in initiate_file_draining() - v4.11+: Fix an infinite loop on I/O Other fixes: - Return errors if a waiting layoutget is killed - Don't open code clearing of delegation state" * tag 'nfs-for-4.19-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFS: Don't open code clearing of delegation state NFSv4.1 fix infinite loop on I/O. NFSv4: Fix a tracepoint Oops in initiate_file_draining() pNFS: Ensure we return the error if someone kills a waiting layoutget NFSv4: Fix a tracepoint Oops in initiate_file_draining()
| * | | NFS: Don't open code clearing of delegation stateTrond Myklebust2018-09-141-9/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a helper for the case when the nfs4 open state has been set to use a delegation stateid, and we want to revert to using the open stateid. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | | NFSv4.1 fix infinite loop on I/O.Trond Myklebust2018-09-142-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The previous fix broke recovery of delegated stateids because it assumes that if we did not mark the delegation as suspect, then the delegation has effectively been revoked, and so it removes that delegation irrespectively of whether or not it is valid and still in use. While this is "mostly harmless" for ordinary I/O, we've seen pNFS fail with LAYOUTGET spinning in an infinite loop while complaining that we're using an invalid stateid (in this case the all-zero stateid). What we rather want to do here is ensure that the delegation is always correctly marked as needing testing when that is the case. So we want to close the loophole offered by nfs4_schedule_stateid_recovery(), which marks the state as needing to be reclaimed, but not the delegation that may be backing it. Fixes: 0e3d3e5df07dc ("NFSv4.1 fix infinite loop on IO BAD_STATEID error") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v4.11+ Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | | NFSv4: Fix a tracepoint Oops in initiate_file_draining()Trond Myklebust2018-09-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that the value of 'ino' can be NULL or an ERR_PTR(), we need to change the test in the tracepoint. Fixes: ce5624f7e6675 ("NFSv4: Return NFS4ERR_DELAY when a layout fails...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v4.17+ Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | | pNFS: Ensure we return the error if someone kills a waiting layoutgetTrond Myklebust2018-09-141-10/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If someone interrupts a wait on one or more outstanding layoutgets in pnfs_update_layout() then return the ERESTARTSYS/EINTR error. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | | NFSv4: Fix a tracepoint Oops in initiate_file_draining()Trond Myklebust2018-09-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that the value of 'ino' can be NULL or an ERR_PTR(), we need to change the test in the tracepoint. Fixes: ce5624f7e6675 ("NFSv4: Return NFS4ERR_DELAY when a layout fails...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v4.17+ Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* | | | Merge tag 'ovl-fixes-4.19-rc4' of ↵Linus Torvalds2018-09-133-15/+44
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs fixes from Miklos Szeredi: "This fixes a regression in the recent file stacking update, reported and fixed by Amir Goldstein. The fix is fairly trivial, but involves adding a fadvise() f_op and the associated churn in the vfs. As discussed on -fsdevel, there are other possible uses for this method, than allowing proper stacking for overlays. And there's one other fix for a syzkaller detected oops" * tag 'ovl-fixes-4.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: fix oopses in ovl_fill_super() failure paths ovl: add ovl_fadvise() vfs: implement readahead(2) using POSIX_FADV_WILLNEED vfs: add the fadvise() file operation Documentation/filesystems: update documentation of file_operations ovl: fix GPF in swapfile_activate of file from overlayfs over xfs ovl: respect FIEMAP_FLAG_SYNC flag
| * | | | ovl: fix oopses in ovl_fill_super() failure pathsMiklos Szeredi2018-09-101-12/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ovl_free_fs() dereferences ofs->workbasedir and ofs->upper_mnt in cases when those might not have been initialized yet. Fix the initialization order for these fields. Reported-by: syzbot+c75f181dc8429d2eb887@syzkaller.appspotmail.com Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org> # v4.15 Fixes: 95e6d4177cb7 ("ovl: grab reference to workbasedir early") Fixes: a9075cdb467d ("ovl: factor out ovl_free_fs() helper")
| * | | | ovl: add ovl_fadvise()Amir Goldstein2018-09-031-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement stacked fadvise to fix syscalls readahead(2) and fadvise64(2) on an overlayfs file. Suggested-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: d1d04ef8572b ("ovl: stack file ops") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | | | ovl: fix GPF in swapfile_activate of file from overlayfs over xfsAmir Goldstein2018-08-302-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since overlayfs implements stacked file operations, the underlying filesystems are not supposed to be exposed to the overlayfs file, whose f_inode is an overlayfs inode. Assigning an overlayfs file to swap_file results in an attempt of xfs code to dereference an xfs_inode struct from an ovl_inode pointer: CPU: 0 PID: 2462 Comm: swapon Not tainted 4.18.0-xfstests-12721-g33e17876ea4e #3402 RIP: 0010:xfs_find_bdev_for_inode+0x23/0x2f Call Trace: xfs_iomap_swapfile_activate+0x1f/0x43 __se_sys_swapon+0xb1a/0xee9 Fix this by not assigning the real inode mapping to f_mapping, which will cause swapon() to return an error (-EINVAL). Although it makes sense not to allow setting swpafile on an overlayfs file, some users may depend on it, so we may need to fix this up in the future. Keeping f_mapping pointing to overlay inode mapping will cause O_DIRECT open to fail. Fix this by installing ovl_aops with noop_direct_IO in overlay inode mapping. Keeping f_mapping pointing to overlay inode mapping will cause other a_ops related operations to fail (e.g. readahead()). Those will be fixed by follow up patches. Suggested-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: f7c72396d0de ("ovl: add O_DIRECT support") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | | | ovl: respect FIEMAP_FLAG_SYNC flagAmir Goldstein2018-08-301-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Stacked overlayfs fiemap operation broke xfstests that test delayed allocation (with "_test_generic_punch -d"), because ovl_fiemap() failed to write dirty pages when requested. Fixes: 9e142c4102db ("ovl: add ovl_fiemap()") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* | | | | Merge tag 'pstore-v4.19-rc4' of ↵Linus Torvalds2018-09-131-3/+14
|\ \ \ \ \ | |_|_|/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore fix from Kees Cook: "This fixes a 6 year old pstore bug that everyone just got lucky in avoiding, likely due only using page-aligned persistent ram regions: - Handle page-vs-byte offset handling between iomap and vmap (Bin Yang)" * tag 'pstore-v4.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: pstore: Fix incorrect persistent ram buffer mapping
| * | | | pstore: Fix incorrect persistent ram buffer mappingBin Yang2018-09-131-3/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | persistent_ram_vmap() returns the page start vaddr. persistent_ram_iomap() supports non-page-aligned mapping. persistent_ram_buffer_map() always adds offset-in-page to the vaddr returned from these two functions, which causes incorrect mapping of non-page-aligned persistent ram buffer. By default ftrace_size is 4096 and max_ftrace_cnt is nr_cpu_ids. Without this patch, the zone_sz in ramoops_init_przs() is 4096/nr_cpu_ids which might not be page aligned. If the offset-in-page > 2048, the vaddr will be in next page. If the next page is not mapped, it will cause kernel panic: [ 0.074231] BUG: unable to handle kernel paging request at ffffa19e0081b000 ... [ 0.075000] RIP: 0010:persistent_ram_new+0x1f8/0x39f ... [ 0.075000] Call Trace: [ 0.075000] ramoops_init_przs.part.10.constprop.15+0x105/0x260 [ 0.075000] ramoops_probe+0x232/0x3a0 [ 0.075000] platform_drv_probe+0x3e/0xa0 [ 0.075000] driver_probe_device+0x2cd/0x400 [ 0.075000] __driver_attach+0xe4/0x110 [ 0.075000] ? driver_probe_device+0x400/0x400 [ 0.075000] bus_for_each_dev+0x70/0xa0 [ 0.075000] driver_attach+0x1e/0x20 [ 0.075000] bus_add_driver+0x159/0x230 [ 0.075000] ? do_early_param+0x95/0x95 [ 0.075000] driver_register+0x70/0xc0 [ 0.075000] ? init_pstore_fs+0x4d/0x4d [ 0.075000] __platform_driver_register+0x36/0x40 [ 0.075000] ramoops_init+0x12f/0x131 [ 0.075000] do_one_initcall+0x4d/0x12c [ 0.075000] ? do_early_param+0x95/0x95 [ 0.075000] kernel_init_freeable+0x19b/0x222 [ 0.075000] ? rest_init+0xbb/0xbb [ 0.075000] kernel_init+0xe/0xfc [ 0.075000] ret_from_fork+0x3a/0x50 Signed-off-by: Bin Yang <bin.yang@intel.com> [kees: add comments describing the mapping differences, updated commit log] Fixes: 24c3d2f342ed ("staging: android: persistent_ram: Make it possible to use memory outside of bootmem") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
* | | | | afs: Fix cell specification to permit an empty address listDavid Howells2018-09-071-8/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the cell specification mechanism to allow cells to be pre-created without having to specify at least one address (the addresses will be upcalled for). This allows the cell information preload service to avoid the need to issue loads of DNS lookups during boot to get the addresses for each cell (500+ lookups for the 'standard' cell list[*]). The lookups can be done later as each cell is accessed through the filesystem. Also remove the print statement that prints a line every time a new cell is added. [*] There are 144 cells in the list. Each cell is first looked up for an SRV record, and if that fails, for an AFSDB record. These get a list of server names, each of which then has to be looked up to get the addresses for that server. E.g.: dig srv _afs3-vlserver._udp.grand.central.org Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | Merge tag 'ceph-for-4.19-rc3' of https://github.com/ceph/ceph-clientLinus Torvalds2018-09-071-5/+11
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull ceph fixes from Ilya Dryomov: "Two rbd patches to complete support for images within namespaces that went into -rc1 and a use-after-free fix. The rbd changes have been sitting in a branch for quite a while but couldn't be included into the -rc1 pull request because of a pending wire protocol backwards compatibility fixup that only got committed early this week" * tag 'ceph-for-4.19-rc3' of https://github.com/ceph/ceph-client: rbd: support cloning across namespaces rbd: factor out get_parent_info() ceph: avoid a use-after-free in ceph_destroy_options()
| * | | | | ceph: avoid a use-after-free in ceph_destroy_options()Ilya Dryomov2018-09-061-5/+11
| |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzbot reported a use-after-free in ceph_destroy_options(), called from ceph_mount(). The problem was that create_fs_client() consumed the opt pointer on some errors, but not on all of them. Make sure it always consumes both libceph and ceph options. Reported-by: syzbot+8ab6f1042021b4eed062@syzkaller.appspotmail.com Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
* | | | | Merge tag 'for_v4.19-rc3' of ↵Linus Torvalds2018-09-071-10/+3
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify fix from Jan Kara: "A small fsnotify fix from Amir" * tag 'for_v4.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fsnotify: fix ignore mask logic in fsnotify()
| * | | | | fsnotify: fix ignore mask logic in fsnotify()Amir Goldstein2018-09-031-10/+3
| |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 92183a42898d ("fsnotify: fix ignore mask logic in send_to_group()") acknoledges the use case of ignoring an event on an inode mark, because of an ignore mask on a mount mark of the same group (i.e. I want to get all events on this file, except for the events that came from that mount). This change depends on correctly merging the inode marks and mount marks group lists, so that the mount mark ignore mask would be tested in send_to_group(). Alas, the merging of the lists did not take into account the case where event in question is not in the mask of any of the mount marks. To fix this, completely remove the tests for inode and mount event masks from the lists merging code. Fixes: 92183a42898d ("fsnotify: fix ignore mask logic in send_to_group") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
* | | | | Merge tag '4.19-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds2018-09-066-19/+40
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull cifs fixes from Steve French: "Four small SMB3 fixes, three for stable, and one minor debug clarification" * tag '4.19-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: connect to servername instead of IP for IPC$ share smb3: check for and properly advertise directory lease support smb3: minor debugging clarifications in rfc1001 len processing SMB3: Backup intent flag missing for directory opens with backupuid mounts fs/cifs: don't translate SFM_SLASH (U+F026) to backslash
| * | | | | cifs: connect to servername instead of IP for IPC$ shareThomas Werschlein2018-09-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is required allows access to a Microsoft fileserver failover cluster behind a 1:1 NAT firewall. The change also provides stronger context for authentication and share connection (see MS-SMB2 3.3.5.7 and MS-SRVS 3.1.6.8) as noted by Tom Talpey, and addresses comments about the buffer size for the UNC made by Aurélien Aptel. Signed-off-by: Thomas Werschlein <thomas.werschlein@geo.uzh.ch> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Tom Talpey <ttalpey@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> CC: Stable <stable@vger.kernel.org>
| * | | | | smb3: check for and properly advertise directory lease supportSteve French2018-09-022-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Although servers will typically ignore unsupported features, we should advertise the support for directory leases (as Windows e.g. does) in the negotiate protocol capabilities we pass to the server, and should check for the server capability (CAP_DIRECTORY_LEASING) before sending a lease request for an open of a directory. This will prevent us from accidentally sending directory leases to SMB2.1 or SMB2 server for example. Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
| * | | | | smb3: minor debugging clarifications in rfc1001 len processingSteve French2018-09-021-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I ran into some cases where server was returning the wrong length on frames but I couldn't easily match them to the command in the network trace (or server logs) since I need the command and/or multiplex id to find the offending SMB2/SMB3 command. Add these two fields to the log message. In the case of padding too much it may not be a problem in all cases but might have correlated to a network disconnect case in some problems we have been looking at. In the case of frame too short is even more important. Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
| * | | | | SMB3: Backup intent flag missing for directory opens with backupuid mountsSteve French2018-09-022-5/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When "backup intent" is requested on the mount (e.g. backupuid or backupgid mount options), the corresponding flag needs to be set on opens of directories (and files) but was missing in some places causing access denied trying to enumerate and backup servers. Fixes kernel bugzilla #200953 https://bugzilla.kernel.org/show_bug.cgi?id=200953 Reported-and-tested-by: <whh@rubrik.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
| * | | | | fs/cifs: don't translate SFM_SLASH (U+F026) to backslashJon Kuhn2018-09-021-3/+0
| |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a Mac client saves an item containing a backslash to a file server the backslash is represented in the CIFS/SMB protocol as as U+F026. Before this change, listing a directory containing an item with a backslash in its name will return that item with the backslash represented with a true backslash character (U+005C) because convert_sfm_character mapped U+F026 to U+005C when interpretting the CIFS/SMB protocol response. However, attempting to open or stat the path using a true backslash will result in an error because convert_to_sfm_char does not map U+005C back to U+F026 causing the CIFS/SMB request to be made with the backslash represented as U+005C. This change simply prevents the U+F026 to U+005C conversion from happenning. This is analogous to how the code does not do any translation of UNI_SLASH (U+F000). Signed-off-by: Jon Kuhn <jkuhn@barracuda.com> Signed-off-by: Steve French <stfrench@microsoft.com>
* | | | | Merge tag 'for-4.19-rc2-tag' of ↵Linus Torvalds2018-09-069-55/+197
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fix for improper fsync after hardlink - fix for a corruption during file deduplication - use after free fixes - RCU warning fix - fix for buffered write to nodatacow file * tag 'for-4.19-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: Fix suspicious RCU usage warning in btrfs_debug_in_rcu btrfs: use after free in btrfs_quota_enable btrfs: btrfs_shrink_device should call commit transaction at the end btrfs: fix qgroup_free wrong num_bytes in btrfs_subvolume_reserve_metadata Btrfs: fix data corruption when deduplicating between different files Btrfs: sync log after logging new name Btrfs: fix unexpected failure of nocow buffered writes after snapshotting when low on space
| * | | | | btrfs: Fix suspicious RCU usage warning in btrfs_debug_in_rcuMisono Tomohiro2018-08-241-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 672d599041c8 ("btrfs: Use wrapper macro for rcu string to remove duplicate code") replaces some open coded RCU string handling with macro. It turns out that btrfs_debug_in_rcu() is used for the first time and the macro lacks lock/unlock of RCU string for non-debug case (i.e. when the message is not printed), leading to suspicious RCU usage warning when CONFIG_PROVE_RCU is on. Fix this by adding a wrapper to call lock/unlock for the non-debug case too. Fixes: 672d599041c8 ("btrfs: Use wrapper macro for rcu string to remove duplicate code") Reported-by: David Howells <dhowells@redhat.com> Tested-by: David Howells <dhowells@redhat.com> Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | | btrfs: use after free in btrfs_quota_enableDan Carpenter2018-08-231-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The issue here is that btrfs_commit_transaction() frees "trans" on both the error and the success path. So the problem would be if btrfs_commit_transaction() succeeds, and then qgroup_rescan_init() fails. That means that "ret" is non-zero and "trans" is non-NULL and it leads to a use after free inside the btrfs_end_transaction() macro. Fixes: 340f1aa27f36 ("btrfs: qgroups: Move transaction management inside btrfs_quota_enable/disable") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | | btrfs: btrfs_shrink_device should call commit transaction at the endAnand Jain2018-08-231-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Test case btrfs/164 reports use-after-free: [ 6712.084324] general protection fault: 0000 [#1] PREEMPT SMP .. [ 6712.195423] btrfs_update_commit_device_size+0x75/0xf0 [btrfs] [ 6712.201424] btrfs_commit_transaction+0x57d/0xa90 [btrfs] [ 6712.206999] btrfs_rm_device+0x627/0x850 [btrfs] [ 6712.211800] btrfs_ioctl+0x2b03/0x3120 [btrfs] Reason for this is that btrfs_shrink_device adds the resized device to the fs_devices::resized_devices after it has called the last commit transaction. So the list fs_devices::resized_devices is not empty when btrfs_shrink_device returns. Now the parent function btrfs_rm_device calls: btrfs_close_bdev(device); call_rcu(&device->rcu, free_device_rcu); and then does the transactio ncommit. It goes through the fs_devices::resized_devices in btrfs_update_commit_device_size and leads to use-after-free. Fix this by making sure btrfs_shrink_device calls the last needed btrfs_commit_transaction before the return. This is consistent with what the grow counterpart does and this makes sure the on-disk state is persistent when the function returns. Reported-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Tested-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | | btrfs: fix qgroup_free wrong num_bytes in btrfs_subvolume_reserve_metadataLu Fengqi2018-08-231-9/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After btrfs_qgroup_reserve_meta_prealloc(), num_bytes will be assigned again by btrfs_calc_trans_metadata_size(). Once block_rsv fails, we can't properly free the num_bytes of the previous qgroup_reserve. Use a separate variable to store the num_bytes of the qgroup_reserve. Delete the comment for the qgroup_reserved that does not exist and add a comment about use_global_rsv. Fixes: c4c129db5da8 ("btrfs: drop unused parameter qgroup_reserved") CC: stable@vger.kernel.org # 4.18+ Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | | Btrfs: fix data corruption when deduplicating between different filesFilipe Manana2018-08-231-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we deduplicate extents between two different files we can end up corrupting data if the source range ends at the size of the source file, the source file's size is not aligned to the filesystem's block size and the destination range does not go past the size of the destination file size. Example: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ xfs_io -f -c "pwrite -S 0x6b 0 2518890" /mnt/foo # The first byte with a value of 0xae starts at an offset (2518890) # which is not a multiple of the sector size. $ xfs_io -c "pwrite -S 0xae 2518890 102398" /mnt/foo # Confirm the file content is full of bytes with values 0x6b and 0xae. $ od -t x1 /mnt/foo 0000000 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b * 11467540 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b ae ae ae ae ae ae 11467560 ae ae ae ae ae ae ae ae ae ae ae ae ae ae ae ae * 11777540 ae ae ae ae ae ae ae ae 11777550 # Create a second file with a length not aligned to the sector size, # whose bytes all have the value 0x6b, so that its extent(s) can be # deduplicated with the first file. $ xfs_io -f -c "pwrite -S 0x6b 0 557771" /mnt/bar # Now deduplicate the entire second file into a range of the first file # that also has all bytes with the value 0x6b. The destination range's # end offset must not be aligned to the sector size and must be less # then the offset of the first byte with the value 0xae (byte at offset # 2518890). $ xfs_io -c "dedupe /mnt/bar 0 1957888 557771" /mnt/foo # The bytes in the range starting at offset 2515659 (end of the # deduplication range) and ending at offset 2519040 (start offset # rounded up to the block size) must all have the value 0xae (and not # replaced with 0x00 values). In other words, we should have exactly # the same data we had before we asked for deduplication. $ od -t x1 /mnt/foo 0000000 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b * 11467540 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b ae ae ae ae ae ae 11467560 ae ae ae ae ae ae ae ae ae ae ae ae ae ae ae ae * 11777540 ae ae ae ae ae ae ae ae 11777550 # Unmount the filesystem and mount it again. This guarantees any file # data in the page cache is dropped. $ umount /dev/sdb $ mount /dev/sdb /mnt $ od -t x1 /mnt/foo 0000000 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b * 11461300 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 00 11461320 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 * 11470000 ae ae ae ae ae ae ae ae ae ae ae ae ae ae ae ae * 11777540 ae ae ae ae ae ae ae ae 11777550 # The bytes in range 2515659 to 2519040 have a value of 0x00 and not a # value of 0xae, data corruption happened due to the deduplication # operation. So fix this by rounding down, to the sector size, the length used for the deduplication when the following conditions are met: 1) Source file's range ends at its i_size; 2) Source file's i_size is not aligned to the sector size; 3) Destination range does not cross the i_size of the destination file. Fixes: e1d227a42ea2 ("btrfs: Handle unaligned length in extent_same") CC: stable@vger.kernel.org # 4.2+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | | Btrfs: sync log after logging new nameFilipe Manana2018-08-233-19/+131
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we add a new name for an inode which was logged in the current transaction, we update the inode in the log so that its new name and ancestors are added to the log. However when we do this we do not persist the log, so the changes remain in memory only, and as a consequence, any ancestors that were created in the current transaction are updated such that future calls to btrfs_inode_in_log() return true. This leads to a subsequent fsync against such new ancestor directories returning immediately, without persisting the log, therefore after a power failure the new ancestor directories do not exist, despite fsync being called against them explicitly. Example: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ mkdir /mnt/A $ mkdir /mnt/B $ mkdir /mnt/A/C $ touch /mnt/B/foo $ xfs_io -c "fsync" /mnt/B/foo $ ln /mnt/B/foo /mnt/A/C/foo $ xfs_io -c "fsync" /mnt/A <power failure> After the power failure, directory "A" does not exist, despite the explicit fsync on it. Instead of fixing this by changing the behaviour of the explicit fsync on directory "A" to persist the log instead of doing nothing, make the logging of the new file name (which happens when creating a hard link or renaming) persist the log. This approach not only is simpler, not requiring addition of new fields to the inode in memory structure, but also gives us the same behaviour as ext4, xfs and f2fs (possibly other filesystems too). A test case for fstests follows soon. Fixes: 12fcfd22fe5b ("Btrfs: tree logging unlink/rename fixes") Reported-by: Vijay Chidambaram <vvijay03@gmail.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | | Btrfs: fix unexpected failure of nocow buffered writes after snapshotting ↵Robbie Ko2018-08-174-21/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | when low on space Commit e9894fd3e3b3 ("Btrfs: fix snapshot vs nocow writting") forced nocow writes to fallback to COW, during writeback, when a snapshot is created. This resulted in writes made before creating the snapshot to unexpectedly fail with ENOSPC during writeback when success (0) was returned to user space through the write system call. The steps leading to this problem are: 1. When it's not possible to allocate data space for a write, the buffered write path checks if a NOCOW write is possible. If it is, it will not reserve space and success (0) is returned to user space. 2. Then when a snapshot is created, the root's will_be_snapshotted atomic is incremented and writeback is triggered for all inode's that belong to the root being snapshotted. Incrementing that atomic forces all previous writes to fallback to COW during writeback (running delalloc). 3. This results in the writeback for the inodes to fail and therefore setting the ENOSPC error in their mappings, so that a subsequent fsync on them will report the error to user space. So it's not a completely silent data loss (since fsync will report ENOSPC) but it's a very unexpected and undesirable behaviour, because if a clean shutdown/unmount of the filesystem happens without previous calls to fsync, it is expected to have the data present in the files after mounting the filesystem again. So fix this by adding a new atomic named snapshot_force_cow to the root structure which prevents this behaviour and works the following way: 1. It is incremented when we start to create a snapshot after triggering writeback and before waiting for writeback to finish. 2. This new atomic is now what is used by writeback (running delalloc) to decide whether we need to fallback to COW or not. Because we incremented this new atomic after triggering writeback in the snapshot creation ioctl, we ensure that all buffered writes that happened before snapshot creation will succeed and not fallback to COW (which would make them fail with ENOSPC). 3. The existing atomic, will_be_snapshotted, is kept because it is used to force new buffered writes, that start after we started snapshotting, to reserve data space even when NOCOW is possible. This makes these writes fail early with ENOSPC when there's no available space to allocate, preventing the unexpected behaviour of writeback later failing with ENOSPC due to a fallback to COW mode. Fixes: e9894fd3e3b3 ("Btrfs: fix snapshot vs nocow writting") Signed-off-by: Robbie Ko <robbieko@synology.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* | | | | | nilfs2: convert to SPDX license tagsRyusuke Konishi2018-09-0439-390/+39
| |/ / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove the verbose license text from NILFS2 files and replace them with SPDX tags. This does not change the license of any of the code. Link: http://lkml.kernel.org/r/1535624528-5982-1-git-send-email-konishi.ryusuke@lab.ntt.co.jp Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds2018-09-021-1/+0
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core fixes from Thomas Gleixner: "A small set of updates for core code: - Prevent tracing in functions which are called from trace patching via stop_machine() to prevent executing half patched function trace entries. - Remove old GCC workarounds - Remove pointless includes of notifier.h" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Remove workaround for unreachable warnings from old GCC notifier: Remove notifier header file wherever not used watchdog: Mark watchdog touch functions as notrace
| * | | | | notifier: Remove notifier header file wherever not usedMukesh Ojha2018-08-301-1/+0
| | |/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The conversion of the hotplug notifiers to a state machine left the notifier.h includes around in some places. Remove them. Signed-off-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1535114033-4605-1-git-send-email-mojha@codeaurora.org
* | | | | Merge tag 'for_v4.19-rc2' of ↵Linus Torvalds2018-08-294-83/+37
|\ \ \ \ \ | |/ / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull misc fs fixes from Jan Kara: - make UDF to properly mount media created by Win7 - make isofs to properly refuse devices with large physical block size - fix a Spectre gadget in quotactl(2) - fix a warning in fsnotify code hit by syzkaller * tag 'for_v4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf: Fix mounting of Win7 created UDF filesystems udf: Remove dead code from udf_find_fileset() fs/quota: Fix spectre gadget in do_quotactl fs/quota: Replace XQM_MAXQUOTAS usage with MAXQUOTAS isofs: reject hardware sector size > 2048 bytes fsnotify: fix false positive warning on inode delete
| * | | | udf: Fix mounting of Win7 created UDF filesystemsJan Kara2018-08-241-12/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Win7 is creating UDF filesystems with single partition with number 8192. Current partition descriptor scanning code does not handle this well as it incorrectly assumes that partition numbers will form mostly contiguous space of small numbers. This results in unmountable media due to errors like: UDF-fs: error (device dm-1): udf_read_tagged: tag version 0x0000 != 0x0002 || 0x0003, block 0 UDF-fs: warning (device dm-1): udf_fill_super: No fileset found Fix the problem by handling partition descriptors in a way that sparse partition numbering does not matter. Reported-and-tested-by: jean-luc malet <jeanluc.malet@gmail.com> CC: stable@vger.kernel.org Fixes: 7b78fd02fb19530fd101ae137a1f46aa466d9bb6 Signed-off-by: Jan Kara <jack@suse.cz>
OpenPOWER on IntegriCloud