summaryrefslogtreecommitdiffstats
path: root/fs/btrfs
Commit message (Collapse)AuthorAgeFilesLines
* btrfs: fix a possible umount deadlockAnand Jain2016-09-261-6/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | btrfs_show_devname() is using the device_list_mutex, sometimes a call to blkdev_put() leads vfs calling into this func. So call blkdev_put() outside of device_list_mutex, as of now. [ 983.284212] ====================================================== [ 983.290401] [ INFO: possible circular locking dependency detected ] [ 983.296677] 4.8.0-rc5-ceph-00023-g1b39cec2 #1 Not tainted [ 983.302081] ------------------------------------------------------- [ 983.308357] umount/21720 is trying to acquire lock: [ 983.313243] (&bdev->bd_mutex){+.+.+.}, at: [<ffffffff9128ec51>] blkdev_put+0x31/0x150 [ 983.321264] [ 983.321264] but task is already holding lock: [ 983.327101] (&fs_devs->device_list_mutex){+.+...}, at: [<ffffffffc033d6f6>] __btrfs_close_devices+0x46/0x200 [btrfs] [ 983.337839] [ 983.337839] which lock already depends on the new lock. [ 983.337839] [ 983.346024] [ 983.346024] the existing dependency chain (in reverse order) is: [ 983.353512] -> #4 (&fs_devs->device_list_mutex){+.+...}: [ 983.359096] [<ffffffff910dfd0c>] lock_acquire+0x1bc/0x1f0 [ 983.365143] [<ffffffff91823125>] mutex_lock_nested+0x65/0x350 [ 983.371521] [<ffffffffc02d8116>] btrfs_show_devname+0x36/0x1f0 [btrfs] [ 983.378710] [<ffffffff9129523e>] show_vfsmnt+0x4e/0x150 [ 983.384593] [<ffffffff9126ffc7>] m_show+0x17/0x20 [ 983.389957] [<ffffffff91276405>] seq_read+0x2b5/0x3b0 [ 983.395669] [<ffffffff9124c808>] __vfs_read+0x28/0x100 [ 983.401464] [<ffffffff9124eb3b>] vfs_read+0xab/0x150 [ 983.407080] [<ffffffff9124ec32>] SyS_read+0x52/0xb0 [ 983.412609] [<ffffffff91825fc0>] entry_SYSCALL_64_fastpath+0x23/0xc1 [ 983.419617] -> #3 (namespace_sem){++++++}: [ 983.424024] [<ffffffff910dfd0c>] lock_acquire+0x1bc/0x1f0 [ 983.430074] [<ffffffff918239e9>] down_write+0x49/0x80 [ 983.435785] [<ffffffff91272457>] lock_mount+0x67/0x1c0 [ 983.441582] [<ffffffff91272ab2>] do_add_mount+0x32/0xf0 [ 983.447458] [<ffffffff9127363a>] finish_automount+0x5a/0xc0 [ 983.453682] [<ffffffff91259513>] follow_managed+0x1b3/0x2a0 [ 983.459912] [<ffffffff9125b750>] lookup_fast+0x300/0x350 [ 983.465875] [<ffffffff9125d6e7>] path_openat+0x3a7/0xaa0 [ 983.471846] [<ffffffff9125ef75>] do_filp_open+0x85/0xe0 [ 983.477731] [<ffffffff9124c41c>] do_sys_open+0x14c/0x1f0 [ 983.483702] [<ffffffff9124c4de>] SyS_open+0x1e/0x20 [ 983.489240] [<ffffffff91825fc0>] entry_SYSCALL_64_fastpath+0x23/0xc1 [ 983.496254] -> #2 (&sb->s_type->i_mutex_key#3){+.+.+.}: [ 983.501798] [<ffffffff910dfd0c>] lock_acquire+0x1bc/0x1f0 [ 983.507855] [<ffffffff918239e9>] down_write+0x49/0x80 [ 983.513558] [<ffffffff91366237>] start_creating+0x87/0x100 [ 983.519703] [<ffffffff91366647>] debugfs_create_dir+0x17/0x100 [ 983.526195] [<ffffffff911df153>] bdi_register+0x93/0x210 [ 983.532165] [<ffffffff911df313>] bdi_register_owner+0x43/0x70 [ 983.538570] [<ffffffff914080fb>] device_add_disk+0x1fb/0x450 [ 983.544888] [<ffffffff91580226>] loop_add+0x1e6/0x290 [ 983.550596] [<ffffffff91fec358>] loop_init+0x10b/0x14f [ 983.556394] [<ffffffff91002207>] do_one_initcall+0xa7/0x180 [ 983.562618] [<ffffffff91f932e0>] kernel_init_freeable+0x1cc/0x266 [ 983.569370] [<ffffffff918174be>] kernel_init+0xe/0x100 [ 983.575166] [<ffffffff9182620f>] ret_from_fork+0x1f/0x40 [ 983.581131] -> #1 (loop_index_mutex){+.+.+.}: [ 983.585801] [<ffffffff910dfd0c>] lock_acquire+0x1bc/0x1f0 [ 983.591858] [<ffffffff91823125>] mutex_lock_nested+0x65/0x350 [ 983.598256] [<ffffffff9157ed3f>] lo_open+0x1f/0x60 [ 983.603704] [<ffffffff9128eec3>] __blkdev_get+0x123/0x400 [ 983.609757] [<ffffffff9128f4ea>] blkdev_get+0x34a/0x350 [ 983.615639] [<ffffffff9128f554>] blkdev_open+0x64/0x80 [ 983.621428] [<ffffffff9124aff6>] do_dentry_open+0x1c6/0x2d0 [ 983.627651] [<ffffffff9124c029>] vfs_open+0x69/0x80 [ 983.633181] [<ffffffff9125db74>] path_openat+0x834/0xaa0 [ 983.639152] [<ffffffff9125ef75>] do_filp_open+0x85/0xe0 [ 983.645035] [<ffffffff9124c41c>] do_sys_open+0x14c/0x1f0 [ 983.650999] [<ffffffff9124c4de>] SyS_open+0x1e/0x20 [ 983.656535] [<ffffffff91825fc0>] entry_SYSCALL_64_fastpath+0x23/0xc1 [ 983.663541] -> #0 (&bdev->bd_mutex){+.+.+.}: [ 983.668107] [<ffffffff910def43>] __lock_acquire+0x1003/0x17b0 [ 983.674510] [<ffffffff910dfd0c>] lock_acquire+0x1bc/0x1f0 [ 983.680561] [<ffffffff91823125>] mutex_lock_nested+0x65/0x350 [ 983.686967] [<ffffffff9128ec51>] blkdev_put+0x31/0x150 [ 983.692761] [<ffffffffc033481f>] btrfs_close_bdev+0x4f/0x60 [btrfs] [ 983.699699] [<ffffffffc033d77b>] __btrfs_close_devices+0xcb/0x200 [btrfs] [ 983.707178] [<ffffffffc033d8db>] btrfs_close_devices+0x2b/0xa0 [btrfs] [ 983.714380] [<ffffffffc03081c5>] close_ctree+0x265/0x340 [btrfs] [ 983.721061] [<ffffffffc02d7959>] btrfs_put_super+0x19/0x20 [btrfs] [ 983.727908] [<ffffffff91250e2f>] generic_shutdown_super+0x6f/0x100 [ 983.734744] [<ffffffff91250f56>] kill_anon_super+0x16/0x30 [ 983.740888] [<ffffffffc02da97e>] btrfs_kill_super+0x1e/0x130 [btrfs] [ 983.747909] [<ffffffff91250fe9>] deactivate_locked_super+0x49/0x80 [ 983.754745] [<ffffffff912515fd>] deactivate_super+0x5d/0x70 [ 983.760977] [<ffffffff91270a1c>] cleanup_mnt+0x5c/0x80 [ 983.766773] [<ffffffff91270a92>] __cleanup_mnt+0x12/0x20 [ 983.772738] [<ffffffff910aa2fe>] task_work_run+0x7e/0xc0 [ 983.778708] [<ffffffff91081b5a>] exit_to_usermode_loop+0x7e/0xb4 [ 983.785373] [<ffffffff910039eb>] syscall_return_slowpath+0xbb/0xd0 [ 983.792212] [<ffffffff9182605c>] entry_SYSCALL_64_fastpath+0xbf/0xc1 [ 983.799225] [ 983.799225] other info that might help us debug this: [ 983.799225] [ 983.807291] Chain exists of: &bdev->bd_mutex --> namespace_sem --> &fs_devs->device_list_mutex [ 983.816521] Possible unsafe locking scenario: [ 983.816521] [ 983.822489] CPU0 CPU1 [ 983.827043] ---- ---- [ 983.831599] lock(&fs_devs->device_list_mutex); [ 983.836289] lock(namespace_sem); [ 983.842268] lock(&fs_devs->device_list_mutex); [ 983.849478] lock(&bdev->bd_mutex); [ 983.853127] [ 983.853127] *** DEADLOCK *** [ 983.853127] [ 983.859113] 3 locks held by umount/21720: [ 983.863145] #0: (&type->s_umount_key#35){++++..}, at: [<ffffffff912515f5>] deactivate_super+0x55/0x70 [ 983.872713] #1: (uuid_mutex){+.+.+.}, at: [<ffffffffc033d8d3>] btrfs_close_devices+0x23/0xa0 [btrfs] [ 983.882206] #2: (&fs_devs->device_list_mutex){+.+...}, at: [<ffffffffc033d6f6>] __btrfs_close_devices+0x46/0x200 [btrfs] [ 983.893422] [ 983.893422] stack backtrace: [ 983.897824] CPU: 6 PID: 21720 Comm: umount Not tainted 4.8.0-rc5-ceph-00023-g1b39cec2 #1 [ 983.905958] Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 1.0c 09/07/2015 [ 983.913492] 0000000000000000 ffff8c8a53c17a38 ffffffff91429521 ffffffff9260f4f0 [ 983.921018] ffffffff92642760 ffff8c8a53c17a88 ffffffff911b2b04 0000000000000050 [ 983.928542] ffffffff9237d620 ffff8c8a5294aee0 ffff8c8a5294aeb8 ffff8c8a5294aee0 [ 983.936072] Call Trace: [ 983.938545] [<ffffffff91429521>] dump_stack+0x85/0xc4 [ 983.943715] [<ffffffff911b2b04>] print_circular_bug+0x1fb/0x20c [ 983.949748] [<ffffffff910def43>] __lock_acquire+0x1003/0x17b0 [ 983.955613] [<ffffffff910dfd0c>] lock_acquire+0x1bc/0x1f0 [ 983.961123] [<ffffffff9128ec51>] ? blkdev_put+0x31/0x150 [ 983.966550] [<ffffffff91823125>] mutex_lock_nested+0x65/0x350 [ 983.972407] [<ffffffff9128ec51>] ? blkdev_put+0x31/0x150 [ 983.977832] [<ffffffff9128ec51>] blkdev_put+0x31/0x150 [ 983.983101] [<ffffffffc033481f>] btrfs_close_bdev+0x4f/0x60 [btrfs] [ 983.989500] [<ffffffffc033d77b>] __btrfs_close_devices+0xcb/0x200 [btrfs] [ 983.996415] [<ffffffffc033d8db>] btrfs_close_devices+0x2b/0xa0 [btrfs] [ 984.003068] [<ffffffffc03081c5>] close_ctree+0x265/0x340 [btrfs] [ 984.009189] [<ffffffff9126cc5e>] ? evict_inodes+0x15e/0x170 [ 984.014881] [<ffffffffc02d7959>] btrfs_put_super+0x19/0x20 [btrfs] [ 984.021176] [<ffffffff91250e2f>] generic_shutdown_super+0x6f/0x100 [ 984.027476] [<ffffffff91250f56>] kill_anon_super+0x16/0x30 [ 984.033082] [<ffffffffc02da97e>] btrfs_kill_super+0x1e/0x130 [btrfs] [ 984.039548] [<ffffffff91250fe9>] deactivate_locked_super+0x49/0x80 [ 984.045839] [<ffffffff912515fd>] deactivate_super+0x5d/0x70 [ 984.051525] [<ffffffff91270a1c>] cleanup_mnt+0x5c/0x80 [ 984.056774] [<ffffffff91270a92>] __cleanup_mnt+0x12/0x20 [ 984.062201] [<ffffffff910aa2fe>] task_work_run+0x7e/0xc0 [ 984.067625] [<ffffffff91081b5a>] exit_to_usermode_loop+0x7e/0xb4 [ 984.073747] [<ffffffff910039eb>] syscall_return_slowpath+0xbb/0xd0 [ 984.080038] [<ffffffff9182605c>] entry_SYSCALL_64_fastpath+0xbf/0xc1 Reported-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* Btrfs: fix memory leak in do_walk_downLiu Bo2016-09-261-0/+1
| | | | | | | | The extent buffer 'next' needs to be free'd conditionally. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs: btrfs_debug should consume fs_info when DEBUG is not definedJeff Mahoney2016-09-261-4/+10
| | | | | | | | | | | | | | | | | | | We can hit unused variable warnings when btrfs_debug and friends are just aliases for no_printk. This is due to the fs_info not getting consumed by the function call, which can happen if convenenience variables are used. This patch adds a new btrfs_no_printk static inline that consumes the convenience variable and does nothing else. It silences the unused variable warning and has no impact on the generated code: $ size fs/btrfs/extent_io.o* text data bss dec hex filename 44072 152 32 44256 ace0 fs/btrfs/extent_io.o.btrfs_no_printk 44072 152 32 44256 ace0 fs/btrfs/extent_io.o.no_printk Fixes: 27a0dd61a5 (Btrfs: make btrfs_debug match pr_debug handling related to DEBUG) Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs: convert send's verbose_printk to btrfs_debugJeff Mahoney2016-09-261-27/+38
| | | | | | | | This was basically an open-coded, less flexible dynamic printk. We can just use btrfs_debug instead. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs: convert pr_* to btrfs_* where possibleJeff Mahoney2016-09-2620-177/+231
| | | | | | | | | | | | | | For many printks, we want to know which file system issued the message. This patch converts most pr_* calls to use the btrfs_* versions instead. In some cases, this means adding plumbing to allow call sites access to an fs_info pointer. fs/btrfs/check-integrity.c is left alone for another day. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs: convert printk(KERN_* to use pr_* callsJeff Mahoney2016-09-2616-275/+205
| | | | | | | | | | This patch converts printk(KERN_* style messages to use the pr_* versions. One side effect is that anything that was KERN_DEBUG is now automatically a dynamic debug message. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs: unsplit printed stringsJeff Mahoney2016-09-2627-391/+324
| | | | | | | | | | | CodingStyle chapter 2: "[...] never break user-visible strings such as printk messages, because that breaks the ability to grep for them." This patch unsplits user-visible strings. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs: clean the old superblocks before freeing the deviceJeff Mahoney2016-09-261-27/+11
| | | | | | | | | | | | | | | | btrfs_rm_device frees the block device but then re-opens it using the saved device name. A race exists between the close and the re-open that allows the block size to be changed. The result is getting stuck forever in the reclaim loop in __getblk_slow. This patch moves the superblock cleanup before closing the block device, which is also consistent with other callers. We also don't need a private copy of dev_name as the whole routine operates under the uuid_mutex. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* Btrfs: kill BUG_ON in run_delayed_tree_refLiu Bo2016-09-261-1/+7
| | | | | | | | | | In a corrupted btrfs image, we can come across this BUG_ON and get an unreponsive system, but if we return errors instead, its caller can handle everything gracefully by aborting the current transaction. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
* Btrfs: don't leak reloc root nodes on errorJosef Bacik2016-09-261-0/+4
| | | | | | | | | | We don't track the reloc roots in any sort of normal way, so the only way the root/commit_root nodes get free'd is if the relocation finishes successfully and the reloc root is deleted. Fix this by free'ing them in free_reloc_roots. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs: squash lines for simple wrapper functionsMasahiro Yamada2016-09-265-28/+8
| | | | | | | | Remove unneeded variables and assignments. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* Btrfs: improve check_node to avoid reading corrupted nodesLiu Bo2016-09-261-4/+28
| | | | | | | | | | We need to check items in a node to make sure that we're reading a valid one, otherwise we could get various crashes while processing delayed_refs. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* Btrfs: add error handling for extent buffer in print treeLiu Bo2016-09-261-0/+7
| | | | | | | | | | | Somehow we missed btrfs_print_tree when last time we updated error handling for read_extent_block(). This keeps us from getting a NULL pointer panic when btrfs_print_tree's read_extent_block() fails. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
* Btrfs: remove BUG_ON in start_transactionLiu Bo2016-09-261-4/+1
| | | | | | | | | | | | | | | Since we could get errors from the concurrent aborted transaction, the check of this BUG_ON in start_transaction is not true any more. Say, while flushing free space cache inode's dirty pages, btrfs_finish_ordered_io -> btrfs_join_transaction_nolock (the transaction has been aborted.) -> BUG_ON(type == TRANS_JOIN_NOLOCK); Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
* Btrfs: memset to avoid stale content in btree node blockLiu Bo2016-09-261-0/+11
| | | | | | | | | | | | | | | | | | | | | During updating btree, we could push items between sibling nodes/leaves, for leaves data sections starts reversely from the end of the block while for nodes we only have key pairs which are stored one by one from the start of the block. So we could do try to push key pairs from one node to the next node right in the tree, and after that, we update the node's nritems to reflect the correct end while leaving the stale content in the node. One may intentionally corrupt the fs image and access the stale content by bumping the nritems and causes various crashes. This takes the in-memory @nritems as the correct one and gets to memset the unused part of a btree node. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* Btrfs: return gracefully from balance if fs tree is corruptedLiu Bo2016-09-261-6/+17
| | | | | | | | | | | | | | | | When relocating tree blocks, we firstly get block information from back references in the extent tree, we then search fs tree to try to find all parents of a block. However, if fs tree is corrupted, eg. if there're some missing items, we could come across these WARN_ONs and BUG_ONs. This makes us print some error messages and return gracefully from balance. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
* Btrfs: kill BUG_ON()'s in btrfs_mark_extent_writtenJosef Bacik2016-09-261-8/+33
| | | | | | | | No reason to bug on in here, fs corruption could easily cause these things to happen. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
* Btrfs: kill the start argument to read_extent_buffer_pagesJosef Bacik2016-09-263-28/+15
| | | | | | | Nobody uses this, it makes no sense to do partial reads of extent buffers. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
* Btrfs: add a flags field to btrfs_fs_infoJosef Bacik2016-09-2616-109/+99
| | | | | | | | | | | We have a lot of random ints in btrfs_fs_info that can be put into flags. This is mostly equivalent with the exception of how we deal with quota going on or off, now instead we set a flag when we are turning it on or off and deal with that appropriately, rather than just having a pending state that the current quota_enabled gets set to. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs: extend btrfs_set_extent_delalloc and its friends to support in-band ↵Qu Wenruo2016-09-267-25/+37
| | | | | | | | | | | | | | | dedupe and subpage size patchset Extend btrfs_set_extent_delalloc() and extent_clear_unlock_delalloc() parameters for both in-band dedupe and subpage sector size patchset. This should reduce conflict of both patchset and the effort to rebase them. Cc: Chandan Rajendra <chandan@linux.vnet.ibm.com> Cc: David Sterba <dsterba@suse.cz> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs: add dynamic debug supportJeff Mahoney2016-09-261-1/+30
| | | | | | | | | | | | | | We can re-use the dynamic debugging descriptor to make use of the dynamic debugging mechanism but still use our own printk interface. Defining the DEBUG macro works as it did before. When it's defined, all of the messages default to print. We can also enable all debug messages at boot or module-load time using the 'dyndbg' and 'btrfs.dyndbg' options. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs: Fix warning "variable ‘gen’ set but not used"Luis Henriques2016-09-261-2/+0
| | | | | | | | | | Variable 'gen' in reada_for_search() is not used since commit 58dc4ce43251 ("btrfs: remove unused parameter from readahead_tree_block"). This patch simply removes this variable. Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs: Fix warning "variable ‘blocksize’ set but not used"Luis Henriques2016-09-261-2/+0
| | | | | | | | | | Variable 'blocksize' in reada_walk_down() is not used since commit d3e46fea1b1e ("btrfs: sink blocksize parameter to readahead_tree_block"). This patch simply removes this variable. Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs: let btrfs_delete_unused_bgs() to clean relocated bgsNaohiro Aota2016-09-262-15/+11
| | | | | | | | | | | | | | | | | Currently, btrfs_relocate_chunk() is removing relocated BG by itself. But the work can be done by btrfs_delete_unused_bgs() (and it's better since it trim the BG). Let's dedupe the code. While btrfs_delete_unused_bgs() is already hitting the relocated BG, it skip the BG since the BG has "ro" flag set (to keep balancing BG intact). On the other hand, btrfs cannot drop "ro" flag here to prevent additional writes. So this patch make use of "removed" flag. btrfs_delete_unused_bgs() now detect the flag to distinguish whether a read-only BG is relocating or not. Signed-off-by: Naohiro Aota <naohiro.aota@hgst.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
* Btrfs: bail out if block group has different mixed flagLiu Bo2016-09-261-0/+14
| | | | | | | | | | | | | | | | | | | | Currently we allow inconsistence about mixed flag (BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_DATA). We'd get ENOSPC if block group has mixed flag and btrfs doesn't. If that happens, we have one space_info with mixed flag and another space_info only with BTRFS_BLOCK_GROUP_METADATA, and global_block_rsv.space_info points to the latter one, but all bytes from block_group contributes to the mixed space_info, thus all the allocation will fail with ENOSPC. This adds a check for the above case. Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> [ updated message ] Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* Btrfs: fix memory leak in reading btree blocksLiu Bo2016-09-261-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So we can read a btree block via readahead or intentional read, and we can end up with a memory leak when something happens as follows, 1) readahead starts to read block A but does not wait for read completion, 2) btree_readpage_end_io_hook finds that block A is corrupted, and it needs to clear all block A's pages' uptodate bit. 3) meanwhile an intentional read kicks in and checks block A's pages' uptodate to decide which page needs to be read. 4) when some pages have the uptodate bit during 3)'s check so 3) doesn't count them for eb->io_pages, but they are later cleared by 2) so we has to readpage on the page, we get the wrong eb->io_pages which results in a memory leak of this block. This fixes the problem by firstly getting all pages's locking and then checking pages' uptodate bit. t1(readahead) t2(readahead endio) t3(the following read) read_extent_buffer_pages end_bio_extent_readpage for pg in eb: for page 0,1,2 in eb: if pg is uptodate: btree_readpage_end_io_hook(pg) num_reads++ if uptodate: eb->io_pages = num_reads SetPageUptodate(pg) _______________ for pg in eb: for page 3 in eb: read_extent_buffer_pages if pg is NOT uptodate: btree_readpage_end_io_hook(pg) for pg in eb: __extent_read_full_page(pg) sanity check reports something wrong if pg is uptodate: clear_extent_buffer_uptodate(eb) num_reads++ for pg in eb: eb->io_pages = num_reads ClearPageUptodate(page) _______________ for pg in eb: if pg is NOT uptodate: __extent_read_full_page(pg) So t3's eb->io_pages is not consistent with the number of pages it's reading, and during endio(), atomic_dec_and_test(&eb->io_pages) will get a negative number so that we're not able to free the eb. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* Btrfs: remove BUG() in raid56Liu Bo2016-09-261-1/+4
| | | | | | | | | | | | | This BUG() has been triggered by a fuzz testing image, which contains an invalid chunk type, ie. a single stripe chunk has the raid6 type. Btrfs can handle this gracefully by returning -EIO, so besides using btrfs_warn to give us more debugging information rather than a single BUG(), we can return error properly. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs: fix check_shared for fiemap ioctlLu Fengqi2016-09-262-10/+369
| | | | | | | | | | | | | | | | | | | Only in the case of different root_id or different object_id, check_shared identified extent as the shared. However, If a extent was referred by different offset of same file, it should also be identified as shared. In addition, check_shared's loop scale is at least n^3, so if a extent has too many references, even causes soft hang up. First, add all delayed_ref to the ref_tree and calculate the unqiue_refs, if the unique_refs is greater than one, return BACKREF_FOUND_SHARED. Then individually add the on-disk reference(inline/keyed) to the ref_tree and calculate the unique_refs of the ref_tree to check if the unique_refs is greater than one.Because once there are two references to return SHARED, so the time complexity is close to the constant. Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs: create example debugfs file only in debugging buildDavid Sterba2016-09-261-0/+9
| | | | | Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: David Sterba <dsterba@suse.com>
* btrfs: fix perms on demonstration debugfs interfaceEric Sandeen2016-09-261-1/+1
| | | | | | | | | | | | | btrfs provides a helpful demonstration of how to export a global variable via debugfs; however, it is unique among other debugfs files in that it is world-writable, which causes some concern to people who are not familiar with its purpose. Fix it so that it is only user-writable. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* Btrfs: fix memory leak of block group cacheLiu Bo2016-09-263-0/+75
| | | | | | | | | | | | | | | | | While processing delayed refs, we may update block group's statistics and attach it to cur_trans->dirty_bgs, and later writing dirty block groups will process the list, which happens during btrfs_commit_transaction(). For whatever reason, the transaction is aborted and dirty_bgs is not processed in cleanup_transaction(), we end up with memory leak of these dirty block group cache. Since btrfs_start_dirty_block_groups() doesn't make it go to the commit critical section, this also adds the cleanup work inside it. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
* Merge branch 'for-linus-4.8' of ↵Linus Torvalds2016-09-232-6/+15
|\ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "Josef fixed a problem when quotas are enabled with his latest ENOSPC rework, and Jeff added more checks into the subvol ioctls to avoid tripping up lookup_one_len" * 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: btrfs: ensure that file descriptor used with subvol ioctls is a dir Btrfs: handle quota reserve failure properly
| * btrfs: ensure that file descriptor used with subvol ioctls is a dirJeff Mahoney2016-09-211-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | If the subvol/snapshot create/destroy ioctls are passed a regular file with execute permissions set, we'll eventually Oops while trying to do inode->i_op->lookup via lookup_one_len. This patch ensures that the file descriptor refers to a directory. Fixes: cb8e70901d (Btrfs: Fix subvolume creation locking rules) Fixes: 76dda93c6a (Btrfs: add snapshot/subvolume destroy ioctl) Cc: <stable@vger.kernel.org> #v2.6.29+ Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
| * Btrfs: handle quota reserve failure properlyJosef Bacik2016-09-211-6/+3
| | | | | | | | | | | | | | | | | | | | btrfs/022 was spitting a warning for the case that we exceed the quota. If we fail to make our quota reservation we need to clean up our data space reservation. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Tested-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
* | Merge branch 'for-linus-4.8' of ↵Linus Torvalds2016-09-093-8/+17
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "I'm not proud of how long it took me to track down that one liner in btrfs_sync_log(), but the good news is the patches I was trying to blame for these problems were actually fine (sorry Filipe)" * 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: btrfs: introduce tickets_id to determine whether asynchronous metadata reclaim work makes progress btrfs: remove root_log_ctx from ctx list before btrfs_sync_log returns btrfs: do not decrease bytes_may_use when replaying extents
| * Merge branch 'for-chris' of ↵Chris Mason2016-09-072-8/+16
| |\ | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.8
| | * btrfs: introduce tickets_id to determine whether asynchronous metadata ↵Wang Xiaoguang2016-09-062-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | reclaim work makes progress In btrfs_async_reclaim_metadata_space(), we use ticket's address to determine whether asynchronous metadata reclaim work is making progress. ticket = list_first_entry(&space_info->tickets, struct reserve_ticket, list); if (last_ticket == ticket) { flush_state++; } else { last_ticket = ticket; flush_state = FLUSH_DELAYED_ITEMS_NR; if (commit_cycles) commit_cycles--; } But indeed it's wrong, we should not rely on local variable's address to do this check, because addresses may be same. In my test environment, I dd one 168MB file in a 256MB fs, found that for this file, every time wait_reserve_ticket() called, local variable ticket's address is same, For above codes, assume a previous ticket's address is addrA, last_ticket is addrA. Btrfs_async_reclaim_metadata_space() finished this ticket and wake up it, then another ticket is added, but with the same address addrA, now last_ticket will be same to current ticket, then current ticket's flush work will start from current flush_state, not initial FLUSH_DELAYED_ITEMS_NR, which may result in some enospc issues(I have seen this in my test machine). Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
| | * btrfs: do not decrease bytes_may_use when replaying extentsWang Xiaoguang2016-09-051-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When replaying extents, there is no need to update bytes_may_use in btrfs_alloc_logged_file_extent(), otherwise it'll trigger a WARN_ON about bytes_may_use. Fixes: ("btrfs: update btrfs_space_info's bytes_may_use timely") Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | Btrfs: remove root_log_ctx from ctx list before btrfs_sync_log returnsChris Mason2016-09-061-0/+1
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | We use a btrfs_log_ctx structure to pass information into the tree log commit, and get error values out. It gets added to a per log-transaction list which we walk when things go bad. Commit d1433debe added an optimization to skip waiting for the log commit, but didn't take root_log_ctx out of the list. This patch makes sure we remove things before exiting. Signed-off-by: Chris Mason <clm@fb.com> Fixes: d1433debe7f4346cf9fc0dafc71c3137d2a97bc4 cc: stable@vger.kernel.org # 3.15+
* | Merge branch 'for-linus-4.8' of ↵Linus Torvalds2016-09-033-11/+15
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "I'm still prepping a set of fixes for btrfs fsync, just nailing down a hard to trigger memory corruption. For now, these are tested and ready." * 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: btrfs: fix one bug that process may endlessly wait for ticket in wait_reserve_ticket() Btrfs: fix endless loop in balancing block groups Btrfs: kill invalid ASSERT() in process_all_refs()
| * btrfs: fix one bug that process may endlessly wait for ticket in ↵Wang Xiaoguang2016-09-011-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | wait_reserve_ticket() If can_overcommit() in btrfs_calc_reclaim_metadata_size() returns true, btrfs_async_reclaim_metadata_space() will not reclaim metadata space, just return directly and also forget to wake up process which are waiting for their tickets, so these processes will wait endlessly. Fstests case generic/172 with mount option "-o compress=lzo" have revealed this bug in my test machine. Here if we have tickets to handle, we must handle them first. Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * Btrfs: fix endless loop in balancing block groupsLiu Bo2016-09-011-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | Qgroup function may overwrite the saved error 'err' with 0 in case quota is not enabled, and this ends up with a endless loop in balance because we keep going back to balance the same block group. It really should use 'ret' instead. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * Btrfs: kill invalid ASSERT() in process_all_refs()Josef Bacik2016-09-011-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Suppose you have the following tree in snap1 on a file system mounted with -o inode_cache so that inode numbers are recycled └── [ 258] a └── [ 257] b and then you remove b, rename a to c, and then re-create b in c so you have the following tree └── [ 258] c └── [ 257] b and then you try to do an incremental send you will hit ASSERT(pending_move == 0); in process_all_refs(). This is because we assume that any recycling of inodes will not have a pending change in our path, which isn't the case. This is the case for the DELETE side, since we want to remove the old file using the old path, but on the create side we could have a pending move and need to do the normal pending rename dance. So remove this ASSERT() and put a comment about why we ignore pending_move. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
* | Merge branch 'for-linus-4.8' of ↵Linus Torvalds2016-08-2620-181/+473
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "We've queued up a few different fixes in here. These range from enospc corners to fsync and quota fixes, and a few targeted at error handling for corrupt metadata/fuzzing" * 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: fix lockdep warning on deadlock against an inode's log mutex Btrfs: detect corruption when non-root leaf has zero item Btrfs: check btree node's nritems btrfs: don't create or leak aliased root while cleaning up orphans Btrfs: fix em leak in find_first_block_group btrfs: do not background blkdev_put() Btrfs: clarify do_chunk_alloc()'s return value btrfs: fix fsfreeze hang caused by delayed iputs deal btrfs: update btrfs_space_info's bytes_may_use timely btrfs: divide btrfs_update_reserved_bytes() into two functions btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster() btrfs: qgroup: Fix qgroup incorrectness caused by log replay btrfs: relocation: Fix leaking qgroups numbers on data extents btrfs: qgroup: Refactor btrfs_qgroup_insert_dirty_extent() btrfs: waiting on qgroup rescan should not always be interruptible btrfs: properly track when rescan worker is running btrfs: flush_space: treat return value of do_chunk_alloc properly Btrfs: add ASSERT for block group's memory leak btrfs: backref: Fix soft lockup in __merge_refs function Btrfs: fix memory leak of reloc_root
| * Btrfs: fix lockdep warning on deadlock against an inode's log mutexFilipe Manana2016-08-253-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 44f714dae50a ("Btrfs: improve performance on fsync against new inode after rename/unlink"), which landed in 4.8-rc2, introduced a possibility for a deadlock due to double locking of an inode's log mutex by the same task, which lockdep reports with: [23045.433975] ============================================= [23045.434748] [ INFO: possible recursive locking detected ] [23045.435426] 4.7.0-rc6-btrfs-next-34+ #1 Not tainted [23045.436044] --------------------------------------------- [23045.436044] xfs_io/3688 is trying to acquire lock: [23045.436044] (&ei->log_mutex){+.+...}, at: [<ffffffffa038552d>] btrfs_log_inode+0x13a/0xc95 [btrfs] [23045.436044] but task is already holding lock: [23045.436044] (&ei->log_mutex){+.+...}, at: [<ffffffffa038552d>] btrfs_log_inode+0x13a/0xc95 [btrfs] [23045.436044] other info that might help us debug this: [23045.436044] Possible unsafe locking scenario: [23045.436044] CPU0 [23045.436044] ---- [23045.436044] lock(&ei->log_mutex); [23045.436044] lock(&ei->log_mutex); [23045.436044] *** DEADLOCK *** [23045.436044] May be due to missing lock nesting notation [23045.436044] 3 locks held by xfs_io/3688: [23045.436044] #0: (&sb->s_type->i_mutex_key#15){+.+...}, at: [<ffffffffa035f2ae>] btrfs_sync_file+0x14e/0x425 [btrfs] [23045.436044] #1: (sb_internal#2){.+.+.+}, at: [<ffffffff8118446b>] __sb_start_write+0x5f/0xb0 [23045.436044] #2: (&ei->log_mutex){+.+...}, at: [<ffffffffa038552d>] btrfs_log_inode+0x13a/0xc95 [btrfs] [23045.436044] stack backtrace: [23045.436044] CPU: 4 PID: 3688 Comm: xfs_io Not tainted 4.7.0-rc6-btrfs-next-34+ #1 [23045.436044] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 [23045.436044] 0000000000000000 ffff88022f5f7860 ffffffff8127074d ffffffff82a54b70 [23045.436044] ffffffff82a54b70 ffff88022f5f7920 ffffffff81092897 ffff880228015d68 [23045.436044] 0000000000000000 ffffffff82a54b70 ffffffff829c3f00 ffff880228015d68 [23045.436044] Call Trace: [23045.436044] [<ffffffff8127074d>] dump_stack+0x67/0x90 [23045.436044] [<ffffffff81092897>] __lock_acquire+0xcbb/0xe4e [23045.436044] [<ffffffff8109155f>] ? mark_lock+0x24/0x201 [23045.436044] [<ffffffff8109179a>] ? mark_held_locks+0x5e/0x74 [23045.436044] [<ffffffff81092de0>] lock_acquire+0x12f/0x1c3 [23045.436044] [<ffffffff81092de0>] ? lock_acquire+0x12f/0x1c3 [23045.436044] [<ffffffffa038552d>] ? btrfs_log_inode+0x13a/0xc95 [btrfs] [23045.436044] [<ffffffffa038552d>] ? btrfs_log_inode+0x13a/0xc95 [btrfs] [23045.436044] [<ffffffff814a51a4>] mutex_lock_nested+0x77/0x3a7 [23045.436044] [<ffffffffa038552d>] ? btrfs_log_inode+0x13a/0xc95 [btrfs] [23045.436044] [<ffffffffa039705e>] ? btrfs_release_delayed_node+0xb/0xd [btrfs] [23045.436044] [<ffffffffa038552d>] btrfs_log_inode+0x13a/0xc95 [btrfs] [23045.436044] [<ffffffffa038552d>] ? btrfs_log_inode+0x13a/0xc95 [btrfs] [23045.436044] [<ffffffff810a0ed1>] ? vprintk_emit+0x453/0x465 [23045.436044] [<ffffffffa0385a61>] btrfs_log_inode+0x66e/0xc95 [btrfs] [23045.436044] [<ffffffffa03c084d>] log_new_dir_dentries+0x26c/0x359 [btrfs] [23045.436044] [<ffffffffa03865aa>] btrfs_log_inode_parent+0x4a6/0x628 [btrfs] [23045.436044] [<ffffffffa0387552>] btrfs_log_dentry_safe+0x5a/0x75 [btrfs] [23045.436044] [<ffffffffa035f464>] btrfs_sync_file+0x304/0x425 [btrfs] [23045.436044] [<ffffffff811acaf4>] vfs_fsync_range+0x8c/0x9e [23045.436044] [<ffffffff811acb22>] vfs_fsync+0x1c/0x1e [23045.436044] [<ffffffff811acc79>] do_fsync+0x31/0x4a [23045.436044] [<ffffffff811ace99>] SyS_fsync+0x10/0x14 [23045.436044] [<ffffffff814a88e5>] entry_SYSCALL_64_fastpath+0x18/0xa8 [23045.436044] [<ffffffff8108f039>] ? trace_hardirqs_off_caller+0x3f/0xaa An example reproducer for this is: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ mkdir /mnt/dir $ touch /mnt/dir/foo $ sync $ mv /mnt/dir/foo /mnt/dir/bar $ touch /mnt/dir/foo $ xfs_io -c "fsync" /mnt/dir/bar This is because while logging the inode of file bar we end up logging its parent directory (since its inode has an unlink_trans field matching the current transaction id due to the rename operation), which in turn logs the inodes for all its new dentries, so that the new inode for the new file named foo gets logged which in turn triggered another logging attempt for the inode we are fsync'ing, since that inode had an old name that corresponds to the name of the new inode. So fix this by ensuring that when logging the inode for a new dentry that has a name matching an old name of some other inode, we don't log again the original inode that we are fsync'ing. Fixes: 44f714dae50a ("Btrfs: improve performance on fsync against new inode after rename/unlink") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
| * Btrfs: detect corruption when non-root leaf has zero itemLiu Bo2016-08-251-1/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Right now we treat leaf which has zero item as a valid one because we could have an empty tree, that is, a root that is also a leaf without any item, however, in the same case but when the leaf is not a root, we can end up with hitting the BUG_ON(1) in btrfs_extend_item() called by setup_inline_extent_backref(). This makes us check the situation as a corruption if leaf is not its own root. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
| * Btrfs: check btree node's nritemsLiu Bo2016-08-251-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When btree node (level = 1) has nritems which equals to zero, we can end up with panic due to insert_ptr()'s BUG_ON(slot > nritems); where slot is 1 and nritems is 0, as copy_for_split() calls insert_ptr(.., path->slots[1] + 1, ...); A invalid value results in the whole mess, this adds the check for btree's node nritems so that we stop reading block when when something is wrong. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
| * btrfs: don't create or leak aliased root while cleaning up orphansJeff Mahoney2016-08-253-11/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 909c3a22da3 (Btrfs: fix loading of orphan roots leading to BUG_ON) avoids the BUG_ON but can add an aliased root to the dead_roots list or leak the root. Since we've already been loading roots into the radix tree, we should use it before looking the root up on disk. Cc: <stable@vger.kernel.org> # 4.5 Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
| * Btrfs: fix em leak in find_first_block_groupJosef Bacik2016-08-251-0/+1
| | | | | | | | | | | | | | | | | | We need to call free_extent_map() on the em we look up. Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
| * btrfs: do not background blkdev_put()Anand Jain2016-08-251-8/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At the end of unmount/dev-delete, if the device exclusive open is not actually closed, then there might be a race with another program in the userland who is trying to open the device in exclusive mode and it may fail for eg: unmount /btrfs; fsck /dev/x btrfs dev del /dev/x /btrfs; fsck /dev/x so here background blkdev_put() is not a choice Signed-off-by: Anand Jain <Anand.Jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
OpenPOWER on IntegriCloud