diff options
author | Josef Bacik <jbacik@fusionio.com> | 2013-04-05 16:51:15 -0400 |
---|---|---|
committer | Josef Bacik <jbacik@fusionio.com> | 2013-05-06 15:54:34 -0400 |
commit | 09a2a8f96e3009273bed1833b3f210e2c68728a5 (patch) | |
tree | f4742a6e962991e9f7b0252805186466f5a005c6 /fs/btrfs/extent_map.c | |
parent | cc95bef635a649d595cf8d1cd4fcff5b6bf13023 (diff) | |
download | talos-obmc-linux-09a2a8f96e3009273bed1833b3f210e2c68728a5.tar.gz talos-obmc-linux-09a2a8f96e3009273bed1833b3f210e2c68728a5.zip |
Btrfs: fix bad extent logging
A user sent me a btrfs-image of a file system that was panicing on mount during
the log recovery. I had originally thought these problems were from a bug in
the free space cache code, but that was just a symptom of the problem. The
problem is if your application does something like this
[prealloc][prealloc][prealloc]
the internal extent maps will merge those all together into one extent map, even
though on disk they are 3 separate extents. So if you go to write into one of
these ranges the extent map will be right since we use the physical extent when
doing the write, but when we log the extents they will use the wrong sizes for
the remainder prealloc space. If this doesn't happen to trip up the free space
cache (which it won't in a lot of cases) then you will get bogus entries in your
extent tree which will screw stuff up later. The data and such will still work,
but everything else is broken. This patch fixes this by not allowing extents
that are on the modified list to be merged. This has the side effect that we
are no longer adding everything to the modified list all the time, which means
we now have to call btrfs_drop_extents every time we log an extent into the
tree. So this allows me to drop all this speciality code I was using to get
around calling btrfs_drop_extents. With this patch the testcase I've created no
longer creates a bogus file system after replaying the log. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Diffstat (limited to 'fs/btrfs/extent_map.c')
-rw-r--r-- | fs/btrfs/extent_map.c | 18 |
1 files changed, 13 insertions, 5 deletions
diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c index 2834ca5768ea..ca968742c3db 100644 --- a/fs/btrfs/extent_map.c +++ b/fs/btrfs/extent_map.c @@ -174,6 +174,14 @@ static int mergable_maps(struct extent_map *prev, struct extent_map *next) test_bit(EXTENT_FLAG_LOGGING, &next->flags)) return 0; + /* + * We don't want to merge stuff that hasn't been written to the log yet + * since it may not reflect exactly what is on disk, and that would be + * bad. + */ + if (!list_empty(&prev->list) || !list_empty(&next->list)) + return 0; + if (extent_map_end(prev) == next->start && prev->flags == next->flags && prev->bdev == next->bdev && @@ -209,9 +217,7 @@ static void try_merge_map(struct extent_map_tree *tree, struct extent_map *em) em->mod_len = (em->mod_len + em->mod_start) - merge->mod_start; em->mod_start = merge->mod_start; em->generation = max(em->generation, merge->generation); - list_move(&em->list, &tree->modified_extents); - list_del_init(&merge->list); rb_erase(&merge->rb_node, &tree->map); free_extent_map(merge); } @@ -227,7 +233,6 @@ static void try_merge_map(struct extent_map_tree *tree, struct extent_map *em) merge->in_tree = 0; em->mod_len = (merge->mod_start + merge->mod_len) - em->mod_start; em->generation = max(em->generation, merge->generation); - list_del_init(&merge->list); free_extent_map(merge); } } @@ -302,7 +307,7 @@ void clear_em_logging(struct extent_map_tree *tree, struct extent_map *em) * reference dropped if the merge attempt was successful. */ int add_extent_mapping(struct extent_map_tree *tree, - struct extent_map *em) + struct extent_map *em, int modified) { int ret = 0; struct rb_node *rb; @@ -324,7 +329,10 @@ int add_extent_mapping(struct extent_map_tree *tree, em->mod_start = em->start; em->mod_len = em->len; - try_merge_map(tree, em); + if (modified) + list_move(&em->list, &tree->modified_extents); + else + try_merge_map(tree, em); out: return ret; } |