<feed xmlns='http://www.w3.org/2005/Atom'>
<title>talos-obmc-linux/fs, branch v4.18.19</title>
<subtitle>Talos™ II Linux sources for OpenBMC</subtitle>
<id>https://git.raptorcs.com/git/talos-obmc-linux/atom?h=v4.18.19</id>
<link rel='self' href='https://git.raptorcs.com/git/talos-obmc-linux/atom?h=v4.18.19'/>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-obmc-linux/'/>
<updated>2018-11-13T19:12:58+00:00</updated>
<entry>
<title>Btrfs: fix use-after-free when dumping free space</title>
<updated>2018-11-13T19:12:58+00:00</updated>
<author>
<name>Filipe Manana</name>
<email>fdmanana@suse.com</email>
</author>
<published>2018-10-22T09:43:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-obmc-linux/commit/?id=928b261cfbe6e7a028311f9d58e0d93024f51d18'/>
<id>urn:sha1:928b261cfbe6e7a028311f9d58e0d93024f51d18</id>
<content type='text'>
commit 9084cb6a24bf5838a665af92ded1af8363f9e563 upstream.

We were iterating a block group's free space cache rbtree without locking
first the lock that protects it (the free_space_ctl-&gt;free_space_offset
rbtree is protected by the free_space_ctl-&gt;tree_lock spinlock).

KASAN reported an use-after-free problem when iterating such a rbtree due
to a concurrent rbtree delete:

[ 9520.359168] ==================================================================
[ 9520.359656] BUG: KASAN: use-after-free in rb_next+0x13/0x90
[ 9520.359949] Read of size 8 at addr ffff8800b7ada500 by task btrfs-transacti/1721
[ 9520.360357]
[ 9520.360530] CPU: 4 PID: 1721 Comm: btrfs-transacti Tainted: G             L    4.19.0-rc8-nbor #555
[ 9520.360990] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 9520.362682] Call Trace:
[ 9520.362887]  dump_stack+0xa4/0xf5
[ 9520.363146]  print_address_description+0x78/0x280
[ 9520.363412]  kasan_report+0x263/0x390
[ 9520.363650]  ? rb_next+0x13/0x90
[ 9520.363873]  __asan_load8+0x54/0x90
[ 9520.364102]  rb_next+0x13/0x90
[ 9520.364380]  btrfs_dump_free_space+0x146/0x160 [btrfs]
[ 9520.364697]  dump_space_info+0x2cd/0x310 [btrfs]
[ 9520.364997]  btrfs_reserve_extent+0x1ee/0x1f0 [btrfs]
[ 9520.365310]  __btrfs_prealloc_file_range+0x1cc/0x620 [btrfs]
[ 9520.365646]  ? btrfs_update_time+0x180/0x180 [btrfs]
[ 9520.365923]  ? _raw_spin_unlock+0x27/0x40
[ 9520.366204]  ? btrfs_alloc_data_chunk_ondemand+0x2c0/0x5c0 [btrfs]
[ 9520.366549]  btrfs_prealloc_file_range_trans+0x23/0x30 [btrfs]
[ 9520.366880]  cache_save_setup+0x42e/0x580 [btrfs]
[ 9520.367220]  ? btrfs_check_data_free_space+0xd0/0xd0 [btrfs]
[ 9520.367518]  ? lock_downgrade+0x2f0/0x2f0
[ 9520.367799]  ? btrfs_write_dirty_block_groups+0x11f/0x6e0 [btrfs]
[ 9520.368104]  ? kasan_check_read+0x11/0x20
[ 9520.368349]  ? do_raw_spin_unlock+0xa8/0x140
[ 9520.368638]  btrfs_write_dirty_block_groups+0x2af/0x6e0 [btrfs]
[ 9520.368978]  ? btrfs_start_dirty_block_groups+0x870/0x870 [btrfs]
[ 9520.369282]  ? do_raw_spin_unlock+0xa8/0x140
[ 9520.369534]  ? _raw_spin_unlock+0x27/0x40
[ 9520.369811]  ? btrfs_run_delayed_refs+0x1b8/0x230 [btrfs]
[ 9520.370137]  commit_cowonly_roots+0x4b9/0x610 [btrfs]
[ 9520.370560]  ? commit_fs_roots+0x350/0x350 [btrfs]
[ 9520.370926]  ? btrfs_run_delayed_refs+0x1b8/0x230 [btrfs]
[ 9520.371285]  btrfs_commit_transaction+0x5e5/0x10e0 [btrfs]
[ 9520.371612]  ? btrfs_apply_pending_changes+0x90/0x90 [btrfs]
[ 9520.371943]  ? start_transaction+0x168/0x6c0 [btrfs]
[ 9520.372257]  transaction_kthread+0x21c/0x240 [btrfs]
[ 9520.372537]  kthread+0x1d2/0x1f0
[ 9520.372793]  ? btrfs_cleanup_transaction+0xb50/0xb50 [btrfs]
[ 9520.373090]  ? kthread_park+0xb0/0xb0
[ 9520.373329]  ret_from_fork+0x3a/0x50
[ 9520.373567]
[ 9520.373738] Allocated by task 1804:
[ 9520.373974]  kasan_kmalloc+0xff/0x180
[ 9520.374208]  kasan_slab_alloc+0x11/0x20
[ 9520.374447]  kmem_cache_alloc+0xfc/0x2d0
[ 9520.374731]  __btrfs_add_free_space+0x40/0x580 [btrfs]
[ 9520.375044]  unpin_extent_range+0x4f7/0x7a0 [btrfs]
[ 9520.375383]  btrfs_finish_extent_commit+0x15f/0x4d0 [btrfs]
[ 9520.375707]  btrfs_commit_transaction+0xb06/0x10e0 [btrfs]
[ 9520.376027]  btrfs_alloc_data_chunk_ondemand+0x237/0x5c0 [btrfs]
[ 9520.376365]  btrfs_check_data_free_space+0x81/0xd0 [btrfs]
[ 9520.376689]  btrfs_delalloc_reserve_space+0x25/0x80 [btrfs]
[ 9520.377018]  btrfs_direct_IO+0x42e/0x6d0 [btrfs]
[ 9520.377284]  generic_file_direct_write+0x11e/0x220
[ 9520.377587]  btrfs_file_write_iter+0x472/0xac0 [btrfs]
[ 9520.377875]  aio_write+0x25c/0x360
[ 9520.378106]  io_submit_one+0xaa0/0xdc0
[ 9520.378343]  __se_sys_io_submit+0xfa/0x2f0
[ 9520.378589]  __x64_sys_io_submit+0x43/0x50
[ 9520.378840]  do_syscall_64+0x7d/0x240
[ 9520.379081]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 9520.379387]
[ 9520.379557] Freed by task 1802:
[ 9520.379782]  __kasan_slab_free+0x173/0x260
[ 9520.380028]  kasan_slab_free+0xe/0x10
[ 9520.380262]  kmem_cache_free+0xc1/0x2c0
[ 9520.380544]  btrfs_find_space_for_alloc+0x4cd/0x4e0 [btrfs]
[ 9520.380866]  find_free_extent+0xa99/0x17e0 [btrfs]
[ 9520.381166]  btrfs_reserve_extent+0xd5/0x1f0 [btrfs]
[ 9520.381474]  btrfs_get_blocks_direct+0x60b/0xbd0 [btrfs]
[ 9520.381761]  __blockdev_direct_IO+0x10ee/0x58a1
[ 9520.382059]  btrfs_direct_IO+0x25a/0x6d0 [btrfs]
[ 9520.382321]  generic_file_direct_write+0x11e/0x220
[ 9520.382623]  btrfs_file_write_iter+0x472/0xac0 [btrfs]
[ 9520.382904]  aio_write+0x25c/0x360
[ 9520.383172]  io_submit_one+0xaa0/0xdc0
[ 9520.383416]  __se_sys_io_submit+0xfa/0x2f0
[ 9520.383678]  __x64_sys_io_submit+0x43/0x50
[ 9520.383927]  do_syscall_64+0x7d/0x240
[ 9520.384165]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 9520.384439]
[ 9520.384610] The buggy address belongs to the object at ffff8800b7ada500
                which belongs to the cache btrfs_free_space of size 72
[ 9520.385175] The buggy address is located 0 bytes inside of
                72-byte region [ffff8800b7ada500, ffff8800b7ada548)
[ 9520.385691] The buggy address belongs to the page:
[ 9520.385957] page:ffffea0002deb680 count:1 mapcount:0 mapping:ffff880108a1d700 index:0x0 compound_mapcount: 0
[ 9520.388030] flags: 0x8100(slab|head)
[ 9520.388281] raw: 0000000000008100 ffffea0002deb608 ffffea0002728808 ffff880108a1d700
[ 9520.388722] raw: 0000000000000000 0000000000130013 00000001ffffffff 0000000000000000
[ 9520.389169] page dumped because: kasan: bad access detected
[ 9520.389473]
[ 9520.389658] Memory state around the buggy address:
[ 9520.389943]  ffff8800b7ada400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 9520.390368]  ffff8800b7ada480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 9520.390796] &gt;ffff8800b7ada500: fb fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc
[ 9520.391223]                    ^
[ 9520.391461]  ffff8800b7ada580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 9520.391885]  ffff8800b7ada600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 9520.392313] ==================================================================
[ 9520.392772] BTRFS critical (device vdc): entry offset 2258497536, bytes 131072, bitmap no
[ 9520.393247] BUG: unable to handle kernel NULL pointer dereference at 0000000000000011
[ 9520.393705] PGD 800000010dbab067 P4D 800000010dbab067 PUD 107551067 PMD 0
[ 9520.394059] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 9520.394378] CPU: 4 PID: 1721 Comm: btrfs-transacti Tainted: G    B        L    4.19.0-rc8-nbor #555
[ 9520.394858] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 9520.395350] RIP: 0010:rb_next+0x3c/0x90
[ 9520.396461] RSP: 0018:ffff8801074ff780 EFLAGS: 00010292
[ 9520.396762] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff81b5ac4c
[ 9520.397115] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000011
[ 9520.397468] RBP: ffff8801074ff7a0 R08: ffffed0021d64ccc R09: ffffed0021d64ccc
[ 9520.397821] R10: 0000000000000001 R11: ffffed0021d64ccb R12: ffff8800b91e0000
[ 9520.398188] R13: ffff8800a3ceba48 R14: ffff8800b627bf80 R15: 0000000000020000
[ 9520.398555] FS:  0000000000000000(0000) GS:ffff88010eb00000(0000) knlGS:0000000000000000
[ 9520.399007] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9520.399335] CR2: 0000000000000011 CR3: 0000000106b52000 CR4: 00000000000006a0
[ 9520.399679] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 9520.400023] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 9520.400400] Call Trace:
[ 9520.400648]  btrfs_dump_free_space+0x146/0x160 [btrfs]
[ 9520.400974]  dump_space_info+0x2cd/0x310 [btrfs]
[ 9520.401287]  btrfs_reserve_extent+0x1ee/0x1f0 [btrfs]
[ 9520.401609]  __btrfs_prealloc_file_range+0x1cc/0x620 [btrfs]
[ 9520.401952]  ? btrfs_update_time+0x180/0x180 [btrfs]
[ 9520.402232]  ? _raw_spin_unlock+0x27/0x40
[ 9520.402522]  ? btrfs_alloc_data_chunk_ondemand+0x2c0/0x5c0 [btrfs]
[ 9520.402882]  btrfs_prealloc_file_range_trans+0x23/0x30 [btrfs]
[ 9520.403261]  cache_save_setup+0x42e/0x580 [btrfs]
[ 9520.403570]  ? btrfs_check_data_free_space+0xd0/0xd0 [btrfs]
[ 9520.403871]  ? lock_downgrade+0x2f0/0x2f0
[ 9520.404161]  ? btrfs_write_dirty_block_groups+0x11f/0x6e0 [btrfs]
[ 9520.404481]  ? kasan_check_read+0x11/0x20
[ 9520.404732]  ? do_raw_spin_unlock+0xa8/0x140
[ 9520.405026]  btrfs_write_dirty_block_groups+0x2af/0x6e0 [btrfs]
[ 9520.405375]  ? btrfs_start_dirty_block_groups+0x870/0x870 [btrfs]
[ 9520.405694]  ? do_raw_spin_unlock+0xa8/0x140
[ 9520.405958]  ? _raw_spin_unlock+0x27/0x40
[ 9520.406243]  ? btrfs_run_delayed_refs+0x1b8/0x230 [btrfs]
[ 9520.406574]  commit_cowonly_roots+0x4b9/0x610 [btrfs]
[ 9520.406899]  ? commit_fs_roots+0x350/0x350 [btrfs]
[ 9520.407253]  ? btrfs_run_delayed_refs+0x1b8/0x230 [btrfs]
[ 9520.407589]  btrfs_commit_transaction+0x5e5/0x10e0 [btrfs]
[ 9520.407925]  ? btrfs_apply_pending_changes+0x90/0x90 [btrfs]
[ 9520.408262]  ? start_transaction+0x168/0x6c0 [btrfs]
[ 9520.408582]  transaction_kthread+0x21c/0x240 [btrfs]
[ 9520.408870]  kthread+0x1d2/0x1f0
[ 9520.409138]  ? btrfs_cleanup_transaction+0xb50/0xb50 [btrfs]
[ 9520.409440]  ? kthread_park+0xb0/0xb0
[ 9520.409682]  ret_from_fork+0x3a/0x50
[ 9520.410508] Dumping ftrace buffer:
[ 9520.410764]    (ftrace buffer empty)
[ 9520.411007] CR2: 0000000000000011
[ 9520.411297] ---[ end trace 01a0863445cf360a ]---
[ 9520.411568] RIP: 0010:rb_next+0x3c/0x90
[ 9520.412644] RSP: 0018:ffff8801074ff780 EFLAGS: 00010292
[ 9520.412932] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff81b5ac4c
[ 9520.413274] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000011
[ 9520.413616] RBP: ffff8801074ff7a0 R08: ffffed0021d64ccc R09: ffffed0021d64ccc
[ 9520.414007] R10: 0000000000000001 R11: ffffed0021d64ccb R12: ffff8800b91e0000
[ 9520.414349] R13: ffff8800a3ceba48 R14: ffff8800b627bf80 R15: 0000000000020000
[ 9520.416074] FS:  0000000000000000(0000) GS:ffff88010eb00000(0000) knlGS:0000000000000000
[ 9520.416536] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9520.416848] CR2: 0000000000000011 CR3: 0000000106b52000 CR4: 00000000000006a0
[ 9520.418477] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 9520.418846] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 9520.419204] Kernel panic - not syncing: Fatal exception
[ 9520.419666] Dumping ftrace buffer:
[ 9520.419930]    (ftrace buffer empty)
[ 9520.420168] Kernel Offset: disabled
[ 9520.420406] ---[ end Kernel panic - not syncing: Fatal exception ]---

Fix this by acquiring the respective lock before iterating the rbtree.

Reported-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>Btrfs: fix use-after-free during inode eviction</title>
<updated>2018-11-13T19:12:58+00:00</updated>
<author>
<name>Filipe Manana</name>
<email>fdmanana@suse.com</email>
</author>
<published>2018-10-12T12:02:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-obmc-linux/commit/?id=b7edab6df938ba3c73fbe9d19c1bb88846444645'/>
<id>urn:sha1:b7edab6df938ba3c73fbe9d19c1bb88846444645</id>
<content type='text'>
commit 421f0922a2cfb0c75acd9746454aaa576c711a65 upstream.

At inode.c:evict_inode_truncate_pages(), when we iterate over the
inode's extent states, we access an extent state record's "state" field
after we unlocked the inode's io tree lock. This can lead to a
use-after-free issue because after we unlock the io tree that extent
state record might have been freed due to being merged into another
adjacent extent state record (a previous inflight bio for a read
operation finished in the meanwhile which unlocked a range in the io
tree and cause a merge of extent state records, as explained in the
comment before the while loop added in commit 6ca0709756710 ("Btrfs: fix
hang during inode eviction due to concurrent readahead")).

Fix this by keeping a copy of the extent state's flags in a local
variable and using it after unlocking the io tree.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201189
Fixes: b9d0b38928e2 ("btrfs: Add handler for invalidate page")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Qu Wenruo &lt;wqu@suse.com&gt;
Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>btrfs: move the dio_sem higher up the callchain</title>
<updated>2018-11-13T19:12:58+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>josef@toxicpanda.com</email>
</author>
<published>2018-10-12T19:32:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-obmc-linux/commit/?id=4fbcf14ceb4cba90de0998990731204ea42e5de2'/>
<id>urn:sha1:4fbcf14ceb4cba90de0998990731204ea42e5de2</id>
<content type='text'>
commit c495144bc6962186feae31d687596d2472000e45 upstream.

We're getting a lockdep splat because we take the dio_sem under the
log_mutex.  What we really need is to protect fsync() from logging an
extent map for an extent we never waited on higher up, so just guard the
whole thing with dio_sem.

======================================================
WARNING: possible circular locking dependency detected
4.18.0-rc4-xfstests-00025-g5de5edbaf1d4 #411 Not tainted
------------------------------------------------------
aio-dio-invalid/30928 is trying to acquire lock:
0000000092621cfd (&amp;mm-&gt;mmap_sem){++++}, at: get_user_pages_unlocked+0x5a/0x1e0

but task is already holding lock:
00000000cefe6b35 (&amp;ei-&gt;dio_sem){++++}, at: btrfs_direct_IO+0x3be/0x400

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-&gt; #5 (&amp;ei-&gt;dio_sem){++++}:
       lock_acquire+0xbd/0x220
       down_write+0x51/0xb0
       btrfs_log_changed_extents+0x80/0xa40
       btrfs_log_inode+0xbaf/0x1000
       btrfs_log_inode_parent+0x26f/0xa80
       btrfs_log_dentry_safe+0x50/0x70
       btrfs_sync_file+0x357/0x540
       do_fsync+0x38/0x60
       __ia32_sys_fdatasync+0x12/0x20
       do_fast_syscall_32+0x9a/0x2f0
       entry_SYSENTER_compat+0x84/0x96

-&gt; #4 (&amp;ei-&gt;log_mutex){+.+.}:
       lock_acquire+0xbd/0x220
       __mutex_lock+0x86/0xa10
       btrfs_record_unlink_dir+0x2a/0xa0
       btrfs_unlink+0x5a/0xc0
       vfs_unlink+0xb1/0x1a0
       do_unlinkat+0x264/0x2b0
       do_fast_syscall_32+0x9a/0x2f0
       entry_SYSENTER_compat+0x84/0x96

-&gt; #3 (sb_internal#2){.+.+}:
       lock_acquire+0xbd/0x220
       __sb_start_write+0x14d/0x230
       start_transaction+0x3e6/0x590
       btrfs_evict_inode+0x475/0x640
       evict+0xbf/0x1b0
       btrfs_run_delayed_iputs+0x6c/0x90
       cleaner_kthread+0x124/0x1a0
       kthread+0x106/0x140
       ret_from_fork+0x3a/0x50

-&gt; #2 (&amp;fs_info-&gt;cleaner_delayed_iput_mutex){+.+.}:
       lock_acquire+0xbd/0x220
       __mutex_lock+0x86/0xa10
       btrfs_alloc_data_chunk_ondemand+0x197/0x530
       btrfs_check_data_free_space+0x4c/0x90
       btrfs_delalloc_reserve_space+0x20/0x60
       btrfs_page_mkwrite+0x87/0x520
       do_page_mkwrite+0x31/0xa0
       __handle_mm_fault+0x799/0xb00
       handle_mm_fault+0x7c/0xe0
       __do_page_fault+0x1d3/0x4a0
       async_page_fault+0x1e/0x30

-&gt; #1 (sb_pagefaults){.+.+}:
       lock_acquire+0xbd/0x220
       __sb_start_write+0x14d/0x230
       btrfs_page_mkwrite+0x6a/0x520
       do_page_mkwrite+0x31/0xa0
       __handle_mm_fault+0x799/0xb00
       handle_mm_fault+0x7c/0xe0
       __do_page_fault+0x1d3/0x4a0
       async_page_fault+0x1e/0x30

-&gt; #0 (&amp;mm-&gt;mmap_sem){++++}:
       __lock_acquire+0x42e/0x7a0
       lock_acquire+0xbd/0x220
       down_read+0x48/0xb0
       get_user_pages_unlocked+0x5a/0x1e0
       get_user_pages_fast+0xa4/0x150
       iov_iter_get_pages+0xc3/0x340
       do_direct_IO+0xf93/0x1d70
       __blockdev_direct_IO+0x32d/0x1c20
       btrfs_direct_IO+0x227/0x400
       generic_file_direct_write+0xcf/0x180
       btrfs_file_write_iter+0x308/0x58c
       aio_write+0xf8/0x1d0
       io_submit_one+0x3a9/0x620
       __ia32_compat_sys_io_submit+0xb2/0x270
       do_int80_syscall_32+0x5b/0x1a0
       entry_INT80_compat+0x88/0xa0

other info that might help us debug this:

Chain exists of:
  &amp;mm-&gt;mmap_sem --&gt; &amp;ei-&gt;log_mutex --&gt; &amp;ei-&gt;dio_sem

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&amp;ei-&gt;dio_sem);
                               lock(&amp;ei-&gt;log_mutex);
                               lock(&amp;ei-&gt;dio_sem);
  lock(&amp;mm-&gt;mmap_sem);

 *** DEADLOCK ***

1 lock held by aio-dio-invalid/30928:
 #0: 00000000cefe6b35 (&amp;ei-&gt;dio_sem){++++}, at: btrfs_direct_IO+0x3be/0x400

stack backtrace:
CPU: 0 PID: 30928 Comm: aio-dio-invalid Not tainted 4.18.0-rc4-xfstests-00025-g5de5edbaf1d4 #411
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
Call Trace:
 dump_stack+0x7c/0xbb
 print_circular_bug.isra.37+0x297/0x2a4
 check_prev_add.constprop.45+0x781/0x7a0
 ? __lock_acquire+0x42e/0x7a0
 validate_chain.isra.41+0x7f0/0xb00
 __lock_acquire+0x42e/0x7a0
 lock_acquire+0xbd/0x220
 ? get_user_pages_unlocked+0x5a/0x1e0
 down_read+0x48/0xb0
 ? get_user_pages_unlocked+0x5a/0x1e0
 get_user_pages_unlocked+0x5a/0x1e0
 get_user_pages_fast+0xa4/0x150
 iov_iter_get_pages+0xc3/0x340
 do_direct_IO+0xf93/0x1d70
 ? __alloc_workqueue_key+0x358/0x490
 ? __blockdev_direct_IO+0x14b/0x1c20
 __blockdev_direct_IO+0x32d/0x1c20
 ? btrfs_run_delalloc_work+0x40/0x40
 ? can_nocow_extent+0x490/0x490
 ? kvm_clock_read+0x1f/0x30
 ? can_nocow_extent+0x490/0x490
 ? btrfs_run_delalloc_work+0x40/0x40
 btrfs_direct_IO+0x227/0x400
 ? btrfs_run_delalloc_work+0x40/0x40
 generic_file_direct_write+0xcf/0x180
 btrfs_file_write_iter+0x308/0x58c
 aio_write+0xf8/0x1d0
 ? kvm_clock_read+0x1f/0x30
 ? __might_fault+0x3e/0x90
 io_submit_one+0x3a9/0x620
 ? io_submit_one+0xe5/0x620
 __ia32_compat_sys_io_submit+0xb2/0x270
 do_int80_syscall_32+0x5b/0x1a0
 entry_INT80_compat+0x88/0xa0

CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>btrfs: don't run delayed_iputs in commit</title>
<updated>2018-11-13T19:12:58+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>josef@toxicpanda.com</email>
</author>
<published>2018-10-11T19:54:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-obmc-linux/commit/?id=904c7dc9dec1c84d550d814fb7ee50200b48e9d9'/>
<id>urn:sha1:904c7dc9dec1c84d550d814fb7ee50200b48e9d9</id>
<content type='text'>
commit 30928e9baac238a7330085a1c5747f0b5df444b4 upstream.

This could result in a really bad case where we do something like

evict
  evict_refill_and_join
    btrfs_commit_transaction
      btrfs_run_delayed_iputs
        evict
          evict_refill_and_join
            btrfs_commit_transaction
... forever

We have plenty of other places where we run delayed iputs that are much
safer, let those do the work.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>btrfs: fix insert_reserved error handling</title>
<updated>2018-11-13T19:12:58+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>josef@toxicpanda.com</email>
</author>
<published>2018-10-11T19:54:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-obmc-linux/commit/?id=13d6628c019d7d3ddd749d09ffb4152ae0ad8e8b'/>
<id>urn:sha1:13d6628c019d7d3ddd749d09ffb4152ae0ad8e8b</id>
<content type='text'>
commit 80ee54bfe8a3850015585ebc84e8d207fcae6831 upstream.

We were not handling the reserved byte accounting properly for data
references.  Metadata was fine, if it errored out the error paths would
free the bytes_reserved count and pin the extent, but it even missed one
of the error cases.  So instead move this handling up into
run_one_delayed_ref so we are sure that both cases are properly cleaned
up in case of a transaction abort.

CC: stable@vger.kernel.org # 4.18+
Reviewed-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>btrfs: only free reserved extent if we didn't insert it</title>
<updated>2018-11-13T19:12:58+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>josef@toxicpanda.com</email>
</author>
<published>2018-10-11T19:54:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-obmc-linux/commit/?id=68c7db7c8f76e80d2128da46249561208f51d988'/>
<id>urn:sha1:68c7db7c8f76e80d2128da46249561208f51d988</id>
<content type='text'>
commit 49940bdd57779c78462da7aa5a8650b2fea8c2ff upstream.

When we insert the file extent once the ordered extent completes we free
the reserved extent reservation as it'll have been migrated to the
bytes_used counter.  However if we error out after this step we'll still
clear the reserved extent reservation, resulting in a negative
accounting of the reserved bytes for the block group and space info.
Fix this by only doing the free if we didn't successfully insert a file
extent for this extent.

CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Omar Sandoval &lt;osandov@fb.com&gt;
Reviewed-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>btrfs: don't use ctl-&gt;free_space for max_extent_size</title>
<updated>2018-11-13T19:12:58+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>jbacik@fb.com</email>
</author>
<published>2018-10-11T19:54:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-obmc-linux/commit/?id=3e9205b1d619280c1b7a1348e2e0180b1561a73c'/>
<id>urn:sha1:3e9205b1d619280c1b7a1348e2e0180b1561a73c</id>
<content type='text'>
commit fb5c39d7a887108087de6ff93d3f326b01b4ef41 upstream.

max_extent_size is supposed to be the largest contiguous range for the
space info, and ctl-&gt;free_space is the total free space in the block
group.  We need to keep track of these separately and _only_ use the
max_free_space if we don't have a max_extent_size, as that means our
original request was too large to search any of the block groups for and
therefore wouldn't have a max_extent_size set.

CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: Josef Bacik &lt;jbacik@fb.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>btrfs: set max_extent_size properly</title>
<updated>2018-11-13T19:12:57+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>jbacik@fb.com</email>
</author>
<published>2018-10-12T19:32:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-obmc-linux/commit/?id=171d45102ea88b69e686d95717f29e8fcffe2e1d'/>
<id>urn:sha1:171d45102ea88b69e686d95717f29e8fcffe2e1d</id>
<content type='text'>
commit ad22cf6ea47fa20fbe11ac324a0a15c0a9a4a2a9 upstream.

We can't use entry-&gt;bytes if our entry is a bitmap entry, we need to use
entry-&gt;max_extent_size in that case.  Fix up all the logic to make this
consistent.

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik &lt;jbacik@fb.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>btrfs: reset max_extent_size properly</title>
<updated>2018-11-13T19:12:57+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>josef@toxicpanda.com</email>
</author>
<published>2018-10-11T19:54:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-obmc-linux/commit/?id=17a7240a9508d8fbb189456960628170bb9b8833'/>
<id>urn:sha1:17a7240a9508d8fbb189456960628170bb9b8833</id>
<content type='text'>
commit 21a94f7acf0f748599ea552af5d9ee7d7e41c72f upstream.

If we use up our block group before allocating a new one we'll easily
get a max_extent_size that's set really really low, which will result in
a lot of fragmentation.  We need to make sure we're resetting the
max_extent_size when we add a new chunk or add new space.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>Btrfs: fix deadlock when writing out free space caches</title>
<updated>2018-11-13T19:12:57+00:00</updated>
<author>
<name>Filipe Manana</name>
<email>fdmanana@suse.com</email>
</author>
<published>2018-10-12T09:03:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-obmc-linux/commit/?id=645dd2f9feab405216397a40b0ed5ef17a6e8aa2'/>
<id>urn:sha1:645dd2f9feab405216397a40b0ed5ef17a6e8aa2</id>
<content type='text'>
commit 5ce555578e0919237fa4bda92b4670e2dd176f85 upstream.

When writing out a block group free space cache we can end deadlocking
with ourselves on an extent buffer lock resulting in a warning like the
following:

  [245043.379979] WARNING: CPU: 4 PID: 2608 at fs/btrfs/locking.c:251 btrfs_tree_lock+0x1be/0x1d0 [btrfs]
  [245043.392792] CPU: 4 PID: 2608 Comm: btrfs-transacti Tainted: G
    W I      4.16.8 #1
  [245043.395489] RIP: 0010:btrfs_tree_lock+0x1be/0x1d0 [btrfs]
  [245043.396791] RSP: 0018:ffffc9000424b840 EFLAGS: 00010246
  [245043.398093] RAX: 0000000000000a30 RBX: ffff8807e20a3d20 RCX: 0000000000000001
  [245043.399414] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff8807e20a3d20
  [245043.400732] RBP: 0000000000000001 R08: ffff88041f39a700 R09: ffff880000000000
  [245043.402021] R10: 0000000000000040 R11: ffff8807e20a3d20 R12: ffff8807cb220630
  [245043.403296] R13: 0000000000000001 R14: ffff8807cb220628 R15: ffff88041fbdf000
  [245043.404780] FS:  0000000000000000(0000) GS:ffff88082fc80000(0000) knlGS:0000000000000000
  [245043.406050] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [245043.407321] CR2: 00007fffdbdb9f10 CR3: 0000000001c09005 CR4: 00000000000206e0
  [245043.408670] Call Trace:
  [245043.409977]  btrfs_search_slot+0x761/0xa60 [btrfs]
  [245043.411278]  btrfs_insert_empty_items+0x62/0xb0 [btrfs]
  [245043.412572]  btrfs_insert_item+0x5b/0xc0 [btrfs]
  [245043.413922]  btrfs_create_pending_block_groups+0xfb/0x1e0 [btrfs]
  [245043.415216]  do_chunk_alloc+0x1e5/0x2a0 [btrfs]
  [245043.416487]  find_free_extent+0xcd0/0xf60 [btrfs]
  [245043.417813]  btrfs_reserve_extent+0x96/0x1e0 [btrfs]
  [245043.419105]  btrfs_alloc_tree_block+0xfb/0x4a0 [btrfs]
  [245043.420378]  __btrfs_cow_block+0x127/0x550 [btrfs]
  [245043.421652]  btrfs_cow_block+0xee/0x190 [btrfs]
  [245043.422979]  btrfs_search_slot+0x227/0xa60 [btrfs]
  [245043.424279]  ? btrfs_update_inode_item+0x59/0x100 [btrfs]
  [245043.425538]  ? iput+0x72/0x1e0
  [245043.426798]  write_one_cache_group.isra.49+0x20/0x90 [btrfs]
  [245043.428131]  btrfs_start_dirty_block_groups+0x102/0x420 [btrfs]
  [245043.429419]  btrfs_commit_transaction+0x11b/0x880 [btrfs]
  [245043.430712]  ? start_transaction+0x8e/0x410 [btrfs]
  [245043.432006]  transaction_kthread+0x184/0x1a0 [btrfs]
  [245043.433341]  kthread+0xf0/0x130
  [245043.434628]  ? btrfs_cleanup_transaction+0x4e0/0x4e0 [btrfs]
  [245043.435928]  ? kthread_create_worker_on_cpu+0x40/0x40
  [245043.437236]  ret_from_fork+0x1f/0x30
  [245043.441054] ---[ end trace 15abaa2aaf36827f ]---

This is because at write_one_cache_group() when we are COWing a leaf from
the extent tree we end up allocating a new block group (chunk) and,
because we have hit a threshold on the number of bytes reserved for system
chunks, we attempt to finalize the creation of new block groups from the
current transaction, by calling btrfs_create_pending_block_groups().
However here we also need to modify the extent tree in order to insert
a block group item, and if the location for this new block group item
happens to be in the same leaf that we were COWing earlier, we deadlock
since btrfs_search_slot() tries to write lock the extent buffer that we
locked before at write_one_cache_group().

We have already hit similar cases in the past and commit d9a0540a79f8
("Btrfs: fix deadlock when finalizing block group creation") fixed some
of those cases by delaying the creation of pending block groups at the
known specific spots that could lead to a deadlock. This change reworks
that commit to be more generic so that we don't have to add similar logic
to every possible path that can lead to a deadlock. This is done by
making __btrfs_cow_block() disallowing the creation of new block groups
(setting the transaction's can_flush_pending_bgs to false) before it
attempts to allocate a new extent buffer for either the extent, chunk or
device trees, since those are the trees that pending block creation
modifies. Once the new extent buffer is allocated, it allows creation of
pending block groups to happen again.

This change depends on a recent patch from Josef which is not yet in
Linus' tree, named "btrfs: make sure we create all new block groups" in
order to avoid occasional warnings at btrfs_trans_release_chunk_metadata().

Fixes: d9a0540a79f8 ("Btrfs: fix deadlock when finalizing block group creation")
CC: stable@vger.kernel.org # 4.4+
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199753
Link: https://lore.kernel.org/linux-btrfs/CAJtFHUTHna09ST-_EEiyWmDH6gAqS6wa=zMNMBsifj8ABu99cw@mail.gmail.com/
Reported-by: E V &lt;eliventer@gmail.com&gt;
Reviewed-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
</feed>
