blackbird-op-linux - Blackbird™ Linux sources for OpenPOWER

	Commit message (Collapse)	Author	Age	Files	Lines
*	Btrfs: automatic rescan after "quota enable" command	Jan Schmidt	2013-05-06	1	-0/+11
\| \| \| \| \| \| \| \|	When qgroup tracking is enabled, we do an automatic cycle of the new rescan mechanism. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: rescan for qgroups	Jan Schmidt	2013-05-06	4	-34/+389
\| \| \| \| \| \| \| \| \| \| \| \| \|	If qgroup tracking is out of sync, a rescan operation can be started. It iterates the complete extent tree and recalculates all qgroup tracking data. This is an expensive operation and should not be used unless required. A filesystem under rescan can still be umounted. The rescan continues on the next mount. Status information is provided with a separate ioctl while a rescan operation is in progress. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: split btrfs_qgroup_account_ref into four functions	Jan Schmidt	2013-05-06	1	-105/+148
\| \| \| \| \| \| \| \| \| \|	The function is separated into a preparation part and the three accounting steps mentioned in the qgroups documentation. The goal is to make steps two and three usable by the rescan functionality. A side effect is that the function is restructured into readable subunits. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: allocate new chunks if the space is not enough for global rsv	Miao Xie	2013-05-06	1	-8/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When running the 208th of xfstests, the fs returned the enospc error when there was lots of free space in the disk. By bisect debug, we found it was introduced by commit 96f1bb5777. This commit makes the space check for the global reservation in can_overcommit() be inconsistent with should_alloc_chunk(). can_overcommit() requires that the free space is 2 times the size of the global reservation, or we can't do overcommit. And instead, we need reclaim some reserved space, and if we still don't have enough free space, we need allocate a new chunk. But unfortunately, should_alloc_chunk() just requires that the free space is 1 time the size of the global reservation, that is we would not try to allocate a new chunk if the free space size is in the middle of these two requires, and just return the enospc error. Fix it. Cc: Jim Schutt <jaschut@sandia.gov> Cc: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: separate sequence numbers for delayed ref tracking and tree mod log	Jan Schmidt	2013-05-06	7	-19/+63
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Sequence numbers for delayed refs have been introduced in the first version of the qgroup patch set. To solve the problem of find_all_roots on a busy file system, the tree mod log was introduced. The sequence numbers for that were simply shared between those two users. However, at one point in qgroup's quota accounting, there's a statement accessing the previous sequence number, that's still just doing (seq - 1) just as it would have to in the very first version. To satisfy that requirement, this patch makes the sequence number counter 64 bit and splits it into a major part (used for qgroup sequence number counting) and a minor part (incremented for each tree modification in the log). This enables us to go exactly one major step backwards, as required for qgroups, while still incrementing the sequence counter for tree mod log insertions to keep track of their order. Keeping them in a single variable means there's no need to change all the code dealing with comparisons of two sequence numbers. The sequence number is reset to 0 on commit (not new in this patch), which ensures we won't overflow the two 32 bit counters. Without this fix, the qgroup tracking can occasionally go wrong and WARN_ONs from the tree mod log code may happen. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	btrfs: move leak debug code to functions	Eric Sandeen	2013-05-06	3	-56/+72
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Clean up the leak debugging in extent_io.c by moving the debug code into functions. This also removes the list_heads used for debugging from the extent_buffer and extent_state structures when debug is not enabled. Since we need a global debug config to do that last part, implement CONFIG_BTRFS_DEBUG to accommodate. Thanks to Dave Sterba for the Kconfig bit. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: return free space in cow error path	Liu Bo	2013-05-06	1	-3/+9
\| \| \| \| \| \| \| \| \| \| \|	Replace some BUG_ONs with proper handling and take allocated space back to free space cache for later use. We don't have to worry about extent maps since they'd be freed in releasepage path. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: set UUID in root_item for created trees	Stefan Behrens	2013-05-06	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	It is a rare exception that a new tree is created, like the qgroups tree. So far these new trees have an all-zero UUID in their root items. All trees that mkfs.btrfs has created get an UUID during the first mount when btrfs_read_root_item() rewrites the root_item to the v2 structure style. These UUID are never used so far, but anyway, since it is better to have it uniform for all trees, this commit adds some lines that generate and write an UUID for newly created trees. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: delete unused parameter to btrfs_read_root_item()	Stefan Behrens	2013-05-06	3	-8/+6
\| \| \| \| \|	Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: fix error handling in btrfs_ioctl_send()	Tsutomu Itoh	2013-05-06	1	-2/+2
\| \| \| \| \| \| \|	fget() returns NULL if error. So, we should check NULL or not. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: remove unused variable in __process_changed_new_xattr()	Tsutomu Itoh	2013-05-06	1	-2/+0
\| \| \| \| \| \| \|	Variable 'p' is not used any more. So, remove it. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: various abort cleanups	Josef Bacik	2013-05-06	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I have a broken file system that when it aborts leaves all sorts of accounting things wrong and gives you lots of WARN_ON()'s other than the abort. This is because we're not cleaning up various parts of the file system when we abort. The first chunks are specific to mount failures, we weren't cleaning up the block group cached inodes and we weren't cleaning up any transactions that had been aborted, which leaves a bunch of things laying around. The second half of this are related to the cleanup parts. First we don't need to release space for the dirty pages from the trans_block_rsv, that's all handled by the trans handles so this is just plain wrong. The other thing is we need to pin down extents that were set ->must_insert_reserved for delayed refs. This isn't so much for the pinning but more for the cleaning up the cache->reserved counter since we are no longer going to use those reserved bytes. With this patch I no longer see a bunch of WARN_ON()'s when I try to mount this broken file system, just the initial one from the abort. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: cleanup destroy_marked_extents	Josef Bacik	2013-05-06	3	-32/+11
\| \| \| \| \| \| \| \|	We can just look up the extent_buffers for the range and free stuff that way. This makes the cleanup a bit cleaner and we can make sure to evict the extent_buffers pretty quickly by marking them as stale. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: check return value of commit when recovering log	Josef Bacik	2013-05-06	1	-5/+6
\| \| \| \| \| \| \| \| \|	We need to check the return value of the commit in case something goes wrong, otherwise we could end up going down the line and doing more stuff (like orphan cleanup) before we notice we should have errored out. We need to do this before we free up the log_tree_root since the caller will handle all of that. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: don't panic if we're trying to drop too many refs	Josef Bacik	2013-05-06	1	-1/+7
\| \| \| \| \| \| \|	This is just obnoxious. Just print a message, abort the transaction, and return an error. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: cleanup fs roots if we fail to mount	Josef Bacik	2013-05-06	1	-31/+31
\| \| \| \| \| \| \| \| \|	We can run the tree logging recovery or the orphan cleanup on mount, so we'll end up looking up a random fs tree in the meantime. So we need to clean this up so we don't leave extent buffers hanging around on the cache. With this patch we no longer leak extent buffers on failure to mount. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: fix extent logging with O_DIRECT into prealloc	Josef Bacik	2013-05-06	1	-9/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the same as the fix from commit Btrfs: fix bad extent logging but for O_DIRECT. I missed this when I fixed the problem originally, we were still using the em for the orig_start and orig_block_len, which would be the merged extent. We need to use the actual extent from the on disk file extent item, which we have to lookup to make sure it's ok to nocow anyway so just pass in some pointers to hold this info. Thanks, Cc: stable@vger.kernel.org Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: fix all callers of read_tree_block	Josef Bacik	2013-05-06	5	-13/+59
\| \| \| \| \| \| \| \| \| \| \|	We kept leaking extent buffers when mounting a broken file system and it turns out it's because not everybody uses read_tree_block properly. You need to check and make sure the extent_buffer is uptodate before you use it. This patch fixes everybody who calls read_tree_block directly to make sure they check that it is uptodate and free it and return an error if it is not. With this we no longer leak EB's when things go horribly wrong. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: only exclude supers in the range of our block group	Josef Bacik	2013-05-06	1	-3/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we fail to load block groups halfway through we can leave extent_state's on the excluded tree. This is because we just lookup the supers and add them to the excluded tree regardless of which block group we are looking at currently. This is a problem because we remove the excluded extents for the range of the block group only, so if we don't ever load a block group for one of the excluded extents we won't ever free it. This fixes the problem by only adding excluded extents if it falls in the block group range we care about. With this patch we're no longer leaking space when we fail to read all of the block groups. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: add tree block level sanity check	Josef Bacik	2013-05-06	1	-0/+6
\| \| \| \| \| \| \| \|	With a users corrupted fs I was getting weird behavior and panics and it turns out it was because one of his tree blocks had a bogus header level. So add this to the sanity checks in the endio handler for tree blocks. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: don't try and free ebs twice in log replay	Josef Bacik	2013-05-06	1	-7/+0
\| \| \| \| \| \| \|	This work is done by btrfs_free_path() anyway so there's no need for this duplicate work. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: don't BUG_ON() in btrfs_num_copies	Josef Bacik	2013-05-06	1	-2/+18
\| \| \| \| \| \| \| \| \| \|	A user sent me a btrfs-image that was panicing because of some corruption. This is because we pass in a bogus value to btrfs_num_copies, and it panics. Instead just return 1. We only call btrfs_num_copies to see if there are other copies to try and read for things, so if we just return 1 it will make the callers exit out with an appropriate error value. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: don't call readahead hook until we have read the entire eb	Josef Bacik	2013-05-06	1	-3/+2
\| \| \| \| \| \| \| \| \| \| \|	Martin Steigerwald reported a BUG_ON() where we were given a bogus bytenr to map. Turns out he is using > PAGESIZE leafsizes. The readahead stuff is called every time we do a completion, but we may not have finished reading in all the pages, so the bytenr we read off the node could be completely bogus. Fix this by only calling the readahead hook once all pages have been read in. Thanks, Reported-by: Martin Steigerwald <Martin@lichtvoll.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: deal with bad mappings in btrfs_map_block	Josef Bacik	2013-05-06	1	-2/+8
\| \| \| \| \| \| \| \| \| \| \|	Martin Steigerwald reported a BUG_ON() in btrfs_map_block where we didn't find a chunk for a particular block we were trying to map. This happened because the block was bogus. We shouldn't be BUG_ON()'ing in this case, just print a message and return an error. This came from reada_add_block and it appears to deal with an error fine so we should be good there. Thanks, Reported-by: Martin Steigerwald <Martin@lichtvoll.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: use REQ_META for all metadata IO	Josef Bacik	2013-05-06	1	-8/+10
\| \| \| \| \| \| \|	We need to tag metadata io with REQ_META to avoid priority inversion when using io throttling cqroups. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: fix possible infinite loop in slow caching	Josef Bacik	2013-05-06	1	-2/+7
\| \| \| \| \| \| \| \| \| \|	So I noticed there is an infinite loop in the slow caching code. If we return 1 when we hit the end of the tree, so we could end up caching the last block group the slow way and suddenly we're looping forever because we just keep re-searching and trying again. Fix this by only doing btrfs_next_leaf() if we don't need_resched(). Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: fix lockdep warning	Josef Bacik	2013-05-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The locking order for stuff is __sb_start_write ordered_mutex but with sync() we don't do __sb_start_write for some strange reason, which means that our iput in wait_ordered_extents could start a transaction which does the __sb_start_write while we're holding the ordered_mutex. Fix this by using delayed iput in sync. Thanks, Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: add all ioctl checks before user change for quota operations	Wang Shilong	2013-05-06	1	-5/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since all the quota configurations are loaded in memory, and we can have ioctl checks before operating in the disk. It is safe to do such things because qgroup_ioctl_lock is held outside. Without these extra checks firstly, it should be ok to do user change for quota operations. For example: if we want to add an existed qgroup, we will do: ->add_qgroup_item() ->add_qgroup_rb() add_qgroup_item() will return -EEXIST to us, however, qgroups are all in memory, why not check them in memory firstly. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: fix missing check about ulist_add() in qgroup.c	Wang Shilong	2013-05-06	1	-18/+44
\| \| \| \| \| \| \| \|	ulist_add() may return -ENOMEM, fix missing check about return value. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: clear received_uuid field for new writable snapshots	Stefan Behrens	2013-05-06	1	-4/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For created snapshots, the full root_item is copied from the source root and afterwards selectively modified. The current code forgets to clear the field received_uuid. The only problem is that it is confusing when you look at it with 'btrfs subv list', since for writable snapshots, the contents of the snapshot can be completely unrelated to the previously received snapshot. The receiver ignores such snapshots anyway because he also checks the field stransid in the root_item and that value used to be reset to zero for all created snapshots. This commit changes two things: - clear the received_uuid field for new writable snapshots. - don't clear the send/receive related information like the stransid for read-only snapshots (which makes them useable as a parent for the automatic selection of parents in the receive code). Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: don't force pages under writeback to finish when aborting	Josef Bacik	2013-05-06	1	-3/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Dave reported a BUG_ON() that happened in end_page_writeback() after an abort. This happened because we unconditionally call end_page_writeback() in the endio case, which is right. However when we abort the transaction we will call end_page_writeback() on any writeback pages we find, which is wrong. We need to lock the page and wait on page writeback to complete if it is. There is nothing unsafe about this since we are discarding the transaction anyway. Thanks, Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: remove unused variable in the iterate_extent_inodes()	Wang Shilong	2013-05-06	1	-2/+0
\| \| \| \| \|	Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: return error when we specify wrong start to defrag	Liu Bo	2013-05-06	1	-4/+7
\| \| \| \| \| \| \| \| \| \|	We need such a sanity check for wrong start when we defrag a file, otherwise, even with a wrong start that's larger than file size, we can end up changing not only inode's force compress flag but also FS's incompat flags. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: fix reada debug code compilation	Vincent	2013-05-06	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes the following errors: fs/btrfs/reada.c: In function ‘btrfs_reada_wait’: fs/btrfs/reada.c:958:42: error: invalid operands to binary < (have ‘atomic_t’ and ‘int’) fs/btrfs/reada.c:961:41: error: invalid operands to binary < (have ‘atomic_t’ and ‘int’) Signed-off-by: Vincent Stehlé <vincent.stehle@laposte.net> Cc: Chris Mason <chris.mason@fusionio.com> Cc: linux-btrfs@vger.kernel.org Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: cleanup of function where btrfs_extend_item() is called	Tsutomu Itoh	2013-05-06	1	-3/+2
\| \| \| \| \| \| \| \|	Argument 'trans' became unnecessary from setup_inline_extent_backref() that called btrfs_extend_item(). Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: remove unused argument of btrfs_extend_item()	Tsutomu Itoh	2013-05-06	7	-11/+9
\| \| \| \| \| \| \|	Argument 'trans' is not used in btrfs_extend_item(). Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: cleanup of function where fixup_low_keys() is called	Tsutomu Itoh	2013-05-06	10	-51/+38
\| \| \| \| \| \| \| \|	If argument 'trans' is unnecessary in the function where fixup_low_keys() is called, 'trans' is deleted. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: remove unused argument of fixup_low_keys()	Tsutomu Itoh	2013-05-06	1	-10/+8
\| \| \| \| \| \| \|	Argument 'trans' is not used in fixup_low_keys(). So, remove it. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: fix confusing edquot happening case	Wang Shilong	2013-05-06	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Step to reproduce: mkfs.btrfs <disk> mount <disk> <mnt> dd if=/dev/zero of=/<mnt>/data bs=1M count=10 sync btrfs quota enable <mnt> btrfs qgroup create 0/5 <mnt> btrfs qgroup limit 5M 0/5 <mnt> rm -f /<mnt>/data sync btrfs qgroup show <mnt> dd if=/dev/zero of=data bs=1M count=1 >From the perspective of users, qgroup's referenced or exclusive is negative,but user can not continue to write data! a workaround way is to cast u64 to s64 when doing qgroup reservation. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Reviewed-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: do not continue if out of memory happens	Wang Shilong	2013-05-06	1	-2/+4
\| \| \| \| \| \| \| \|	If out of memory happens, we should return -ENOMEM directly to the caller rather than continue the work. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	btrfs: fix minor typo in comment	Nathaniel Yazdani	2013-05-06	1	-1/+1
\| \| \| \| \| \| \| \|	In the comment describing the sync_writers field of the btrfs_inode struct, "fsyncing" was misspelled "fsycing." Signed-off-by: Nathaniel Yazdani <n1ght.4nd.d4y@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: cleanup to remove reduplicate code in transaction.c	Wang Shilong	2013-05-06	1	-12/+2
\| \| \| \| \|	Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: fix unlock after free on rewinded tree blocks	Jan Schmidt	2013-05-06	1	-7/+11
\| \| \| \| \| \| \| \| \|	When tree_mod_log_rewind decides to make a copy of the current tree buffer for its modifications, it subsequently freed the buffer before unlocking it. Obviously, those operations are required in reverse order. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: fix accessing the root pointer in tree mod log functions	Jan Schmidt	2013-05-06	1	-20/+20
\| \| \| \| \| \| \| \| \| \|	The tree mod log functions were accessing root->node->... directly, without use of btrfs_root_node() or explicit rcu locking. This could lead to an extent buffer reference being leaked and another reference being freed too early when preemtion was enabled. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: fix tree mod log regression on root split operations	Jan Schmidt	2013-05-06	1	-26/+29
\| \| \| \| \| \| \| \| \| \| \| \|	Commit d9abbf1c changed tree mod log locking around ROOT_REPLACE operations. When a tree root is split, however, we were logging removal of all elements from the root node before logging removal of half of the elements for the split operation. This leads to a BUG_ON when rewinding. This commit removes the erroneous logging of removal of all elements. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: use a lock to protect incompat/compat flag of the super block	Miao Xie	2013-05-06	3	-11/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The following case will make the incompat/compat flag of the super block be recovered. Task1 \|Task2 flags = btrfs_super_incompat_flags(); \| \|flags = btrfs_super_incompat_flags(); flags \|= new_flag1; \| \|flags \|= new_flag2; btrfs_set_super_incompat_flags(flags); \| \|btrfs_set_super_incompat_flags(flags); the new_flag1 is recovered. In order to avoid this problem, we introduce a lock named super_lock into the btrfs_fs_info structure. If we want to update incompat/compat flags of the super block, we must hold it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: fix unblocked autodefraggers when remount	Miao Xie	2013-05-06	1	-3/+7
\| \| \| \| \| \| \| \| \|	The new mount option is set after parsing the remount arguments, so it is wrong that checking the autodefrag is close or not at btrfs_remount_prepare(). Fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: add a rb_tree to improve performance of ulist search	Wang Shilong	2013-05-06	2	-8/+56
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Walking backref tree and btrfs quota rely on ulist very much. This patch tries to use rb_tree to speed up search time. The original code always checks whether an element exists before adding a new element, however it costs O(n). I try to add a rb_tree in the ulist,this is only used to speed up search. I also do some measurements with quota enabled. fsstress -p 4 -n 10000 Without this path: real 0m51.058s 2m4.745s 1m28.222s 1m5.137s user 0m0.035s 0m0.041s 0m0.105s 0m0.100s sys 0m12.009s 0m11.246s 0m10.901s 0m10.999s 0m11.287s With this path: real 0m55.295s 0m50.960s 1m2.214s 0m48.273s user 0m0.053s 0m0.095s 0m0.135s 0m0.107s sys 0m7.766s 0m6.013s 0m6.319s 0m6.030s 0m6.532s After applying the patch,the execute time is down by ~42%.(11.287s->6.532s) Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: allow omitting stream header and end-cmd for btrfs send	Stefan Behrens	2013-05-06	1	-10/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Two new flags are added to allow omitting the stream header and the end command for btrfs send streams. This is used in cases where you send multiple snapshots back-to-back in one stream. This used to be encoded like this (with 2 snapshots in this example): <stream header> + <sequence of commands> + <end cmd> + <stream header> + <sequence of commands> + <end cmd> + EOF The new format (if the two new flags are used) is this one: <stream header> + <sequence of commands> + <sequence of commands> + <end cmd> Note that the currently existing receivers treat <end cmd> only as an indication that a new <stream header> is following. This means, you can just skip the sequence <end cmd> <stream header> without loosing compatibility. As long as an EOF is following, the currently existing receivers handle the new format (if the two new flags are used) exactly as the old one. So what is the benefit of this change? The goal is to be able to use a single stream (one TCP connection) to multiplex a request/response handshake plus Btrfs send streams, all in the same stream. In this case you cannot evaluate an EOF condition as an end of the Btrfs send stream. You need something else, and the <end cmd> is just perfect for this purpose. The summary is: The format change is driven by the need to send several Btrfs send streams over a single TCP connections, with the ability for a repeated request/response handshake in the middle. And this format change does not break any existing tool, it is completely compatible. You could compare the old behaviour of the Btrfs send stream to the one of ftp where you need a seperate request/response channel and newly opened data transfer channels for each file, while the new behaviour is more like http using a single stream for everything. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: make __merge_refs() return type be void	Wang Shilong	2013-05-06	1	-8/+3
\| \| \| \| \| \| \| \|	__merge_refs() always return 0, it is unnecessary for the caller to check the return value. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>