summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* Btrfs: call the qgroup accounting functionsJan Schmidt2012-07-122-0/+17
| | | | Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
* Btrfs: qgroup implementation and prototypesArne Jansen2012-07-127-1/+1681
| | | | | Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
* Btrfs: Test code to change the order of delayed-ref processingArne Jansen2012-07-101-0/+49
| | | | | | | | | | Normally delayed refs get processed in ascending bytenr order. This correlates in most cases to the order added. To expose dependencies on this order, we start to process the tree in the middle instead of the beginning. This code is only effective when SCRAMBLE_DELAYED_REFS is defined. Signed-off-by: Arne Jansen <sensille@gmx.net>
* Btrfs: qgroup state and initializationArne Jansen2012-07-102-0/+31
| | | | | | Add state to fs_info. Signed-off-by: Arne Jansen <sensille@gmx.net>
* Btrfs: added helper to create new treesArne Jansen2012-07-102-1/+83
| | | | | | | This creates a brand new tree. Will be used to create the quota tree. Signed-off-by: Arne Jansen <sensille@gmx.net>
* Btrfs: check the root passed to btrfs_end_transactionArne Jansen2012-07-102-0/+12
| | | | | | | | This patch only add a consistancy check to validate that the same root is passed to start_transaction and end_transaction. Subvolume quota depends on this. Signed-off-by: Arne Jansen <sensille@gmx.net>
* Btrfs: add helper for tree enumerationArne Jansen2012-07-102-0/+75
| | | | | | | | | Often no exact match is wanted but just the next lower or higher item. There's a lot of duplicated code throughout btrfs to deal with the corner cases. This patch adds a helper function that can facilitate searching. Signed-off-by: Arne Jansen <sensille@gmx.net>
* Btrfs: qgroup on-disk formatArne Jansen2012-07-101-0/+136
| | | | | | | Not all features are in use by the current version and thus may change in the future. Signed-off-by: Arne Jansen <sensille@gmx.net>
* Btrfs: join tree mod log code with the code holding back delayed refsJan Schmidt2012-07-109-219/+240
| | | | | | | | | | | | We've got two mechanisms both required for reliable backref resolving (tree mod log and holding back delayed refs). You cannot make use of one without the other. So instead of requiring the user of this mechanism to setup both correctly, we join them into a single interface. Additionally, we stop inserting non-blockers into fs_info->tree_mod_seq_list as we did before, which was of no value. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
* Btrfs: fix buffer leak in btrfs_next_old_leafJan Schmidt2012-07-101-0/+1
| | | | | | | When calling btrfs_next_old_leaf, we were leaking an extent buffer in the rare case of using the deadlock avoidance code needed for the tree mod log. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
* Btrfs: run delayed directory updates during log replayChris Mason2012-07-021-0/+6
| | | | | | | | | | | | | | While we are resolving directory modifications in the tree log, we are triggering delayed metadata updates to the filesystem btrees. This commit forces the delayed updates to run so the replay code can find any modifications done. It stops us from crashing because the directory deleltion replay expects items to be removed immediately from the tree. Signed-off-by: Chris Mason <chris.mason@fusionio.com> cc: stable@kernel.org
* Btrfs: hold a ref on the inode during writepagesJosef Bacik2012-07-021-0/+14
| | | | | | | | | | | | We can race with unlink and not actually be able to do our igrab in btrfs_add_ordered_extent. This will result in all sorts of problems. Instead of doing the complicated work to try and handle returning an error properly from btrfs_add_ordered_extent, just hold a ref to the inode during writepages. If we cannot grab a ref we know we're freeing this inode anyway and can just drop the dirty pages on the floor, because screw them we're going to invalidate them anyway. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
* Btrfs: fix tree log remove space corner caseJosef Bacik2012-07-021-93/+52
| | | | | | | | | | | | | | The tree log stuff can have allocated space that we end up having split across a bitmap and a real extent. The free space code does not deal with this, it assumes that if it finds an extent or bitmap entry that the entire range must fall within the entry it finds. This isn't necessarily the case, so rework the remove function so it can handle this case properly. This fixed two panics the user hit, first in the case where the space was initially in a bitmap and then in an extent entry, and then the reverse case. Thanks, Reported-and-tested-by: Shaun Reich <sreich@kde.org> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
* Btrfs: fix wrong check during log recoveryLiu Bo2012-07-021-1/+1
| | | | | | | | | | | When we're evicting an inode during log recovery, we need to ensure that the inode is not in orphan state any more, which means inode's run_time flags has _no_ BTRFS_INODE_HAS_ORPHAN_ITEM. Thus, the BUG_ON was triggered because of a wrong check for the flags. Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
* Btrfs: use _IOR for BTRFS_IOC_SUBVOL_GETFLAGSAlexander Block2012-07-021-1/+1
| | | | | | | | | We used the wrong ioctl macro for the getflags ioctl before. As we don't have the set/getflags ioctls in the user space ioctl.h at the moment, it's safe to fix it now. Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Alexander Block <ablock84@googlemail.com>
* Btrfs: resume balance on rw (re)mounts properlyIlya Dryomov2012-07-024-18/+47
| | | | | | | | This introduces btrfs_resume_balance_async(), which, given that restriper state was recovered earlier by btrfs_recover_balance(), resumes balance in btrfs-balance kthread. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* Btrfs: restore restriper state on all mountsIlya Dryomov2012-07-023-25/+26
| | | | | | | | | | Fix a bug that triggered asserts in btrfs_balance() in both normal and resume modes -- restriper state was not properly restored on read-only mounts. This factors out resuming code from btrfs_restore_balance(), which is now also called earlier in the mount sequence to avoid the problem of some early writes getting the old profile. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* Btrfs: fix dio write vs buffered read raceJosef Bacik2012-07-022-18/+50
| | | | | | | | | | | | | | | | | | | | Miao pointed out there's a problem with mixing dio writes and buffered reads. If the read happens between us invalidating the page range and actually locking the extent we can bring in pages into page cache. Then once the write finishes if somebody tries to read again it will just find uptodate pages and we'll read stale data. So we need to lock the extent and check for uptodate bits in the range. If there are uptodate bits we need to unlock and invalidate again. This will keep this race from happening since we will hold the extent locked until we create the ordered extent, and then teh read side always waits for ordered extents. There was also a race in how we updated i_size, previously we were relying on the generic DIO stuff to adjust the i_size after the DIO had completed, but this happens outside of the extent lock which means reads could come in and not see the updated i_size. So instead move this work into where we create the extents, and then this way the update ordered i_size stuff works properly in the endio handlers. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: don't count I/O statistic read errors for missing devicesStefan Behrens2012-07-021-10/+12
| | | | | | | | | | | It is normal behaviour of the low level btrfs function btrfs_map_bio() to complete a bio with -EIO if the device is missing, instead of just preventing the bio creation in an earlier step. This used to cause I/O statistic read error increments and annoying printk_ratelimited messages. This commit fixes the issue. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Reported-by: Carey Underwood <cwillu@cwillu.com>
* Btrfs: resolve tree mod log locking issue in btrfs_next_leafJan Schmidt2012-06-271-0/+12
| | | | | | | | | | | | | | | | | | | With the tree mod log, we may end up with two roots (the current root and a rewinded version of it) both pointing to two leaves, l1 and l2, of which l2 had already been cow-ed in the current transaction. If we don't rewind any tree blocks, we cannot have two roots both pointing to an already cowed tree block. Now there is btrfs_next_leaf, which has a leaf locked and wants a lock on the next (right) leaf. And there is push_leaf_left, which has a (cowed!) leaf locked and wants a lock on the previous (left) leaf. In order to solve this dead lock situation, we use try_lock in btrfs_next_leaf (only in case it's called with a tree mod log time_seq paramter) and if we fail to get a lock on the next leaf, we give up our lock on the current leaf and retry from the very beginning. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
* Btrfs: fix tree mod log rewind of ADD operationsJan Schmidt2012-06-271-5/+1
| | | | | | | | | | | | | | When a MOD_LOG_KEY_ADD operation is rewinded, we remove the key from the tree block. If its not the last key, removal involves a move operation. This move operation was explicitly done before this commit. However, at insertion time, there's a move operation before the actual addition to make room for the new key, which is recorded in the tree mod log as well. This means, we must drop the move operation when rewinding the add operation, because the next operation we'll be rewinding will be the corresponding MOD_LOG_MOVE_KEYS operation. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
* Btrfs: leave critical region in btrfs_find_all_roots as soon as possibleJan Schmidt2012-06-271-2/+1
| | | | | | | | When delayed refs exist, btrfs_find_all_roots used to hold the delayed ref mutex way longer than actually required. We ought to drop it immediately after we're done collecting all the delayed refs. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
* Btrfs: always put insert_ptr modifications into the tree mod logJan Schmidt2012-06-271-7/+7
| | | | | | | | | Several callers of insert_ptr set the tree_mod_log parameter to 0 to avoid addition to the tree mod log. In fact, we need all of those operations. This commit simply removes the additional parameter and makes addition to the tree mod log unconditional. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
* Btrfs: fix tree mod log for root replacements at leaf levelJan Schmidt2012-06-271-13/+15
| | | | | | | | | | | | | | | For the tree mod log, we don't log any operations at leaf level. If the root is at the leaf level (i.e. the tree consists only of the root), then __tree_mod_log_oldest_root will find a ROOT_REPLACE operation in the log (because we always log that one no matter which level), but no other operations. With this patch __tree_mod_log_oldest_root exits cleanly instead of BUGging in this situation. get_old_root checks if its really a root at leaf level in case we don't have any operations and WARNs if this assumption breaks. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
* Btrfs: support root level changes in __resolve_indirect_refJan Schmidt2012-06-271-4/+8
| | | | | | | | | With the tree mod log, we can have a tree that's two levels high, but btrfs_search_old_slot may still return a path with the tree root at level one instead. __resolve_indirect_ref must care for this and accept parents in a lower level than expected. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
* Btrfs: avoid waiting for delayed refs when we must notJan Schmidt2012-06-271-5/+6
| | | | | | | | | | | | | | We track two conditions to decide if we should sleep while waiting for more delayed refs, the number of delayed refs (num_refs) and the first entry in the list of blockers (first_seq). When we suspect staleness, we save num_refs and do one more cycle. If nothing changes, we then save first_seq for later comparison and do wait_event. We ought to save first_seq the very same moment we're saving num_refs. Otherwise we cannot be sure that nothing has changed and we might start waiting when we shouldn't, which could lead to starvation. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
* Btrfs: delay iput with async extentsJosef Bacik2012-06-211-2/+2
| | | | | | | | | | | There is some concern that these iput()'s could be the final iputs and could induce lockups on people waiting on writeback. This would happen in the rare case that we don't create ordered extents because of an error, but it is theoretically possible and we already have a mechanism to deal with this so just make them delayed iputs to negate any worry. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* Btrfs: add a missing spin_lockJosef Bacik2012-06-211-0/+1
| | | | | | | | | | | | | | | When fixing up the locking in the delayed ref destruction work I accidently broke the locking myself ;(. Add back a spin_lock that should be there and we are now all set. Thanks, Btrfs: add a missing spin_lock When fixing up the locking in the delayed ref destruction work I accidently broke the locking myself ;(. Add back a spin_lock that should be there and we are now all set. Thanks, Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* Btrfs: don't assume to be on the correct extent in add_all_parentsAlexander Block2012-06-211-42/+52
| | | | | | | | | | | | | | | | | | | | add_all_parents did assume that path is already at a correct extent data item, which may not be true in case of data extents that were partly rewritten and splitted. We need to check if we're on a matching extent for every item and only for the ones after the first. The loop is changed to do this now. This patch also fixes a bug introduced with commit 3b127fd8 "Btrfs: remove obsolete btrfs_next_leaf call from __resolve_indirect_ref". The removal of next_leaf did sometimes result in slot==nritems when the above described case happens, and thus resulting in invalid values (e.g. wanted_obejctid) in add_all_parents (leading to missed backrefs or even crashes). Signed-off-by: Alexander Block <ablock84@googlemail.com> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* Btrfs: introduce btrfs_next_old_itemAlexander Block2012-06-211-2/+7
| | | | | | | | | | | We introduce btrfs_next_old_item that uses btrfs_next_old_leaf instead of btrfs_next_leaf. btrfs_next_item is also changed to simply call btrfs_next_old_item with time_seq being 0. Signed-off-by: Alexander Block <ablock84@googlemail.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* Btrfs: cast devid to unsigned long long for printk %lluChris Mason2012-06-151-1/+2
| | | | | | Avoid warning in 32 bit machines Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* Btrfs: init old_generation in get_old_rootChris Mason2012-06-151-1/+1
| | | | | | | | gcc was giving an uninit variable warning here. Strictly speaking we don't need to init it, but this will make things much less error prone. Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* Btrfs: destroy the items of the delayed inodes in error handling routineMiao Xie2012-06-153-0/+27
| | | | | | | | the items of the delayed inodes were forgotten to be freed, this patch fixes it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* Btrfs: make sure that we've made everything in pinned tree cleanLiu Bo2012-06-151-0/+11
| | | | | | | | Since we have two trees for recording pinned extents, we need to go through both of them to make sure that we've done everything clean. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* Btrfs: avoid memory leak of extent state in error handling routineLiu Bo2012-06-151-0/+2
| | | | | | | | | | | | | | | We've forgotten to clear extent states in pinned tree, which will results in space counter mismatch and memory leak: WARNING: at fs/btrfs/extent-tree.c:7537 btrfs_free_block_groups+0x1f3/0x2e0 [btrfs]() ... space_info 2 has 8380416 free, is not full space_info total=12582912, used=4096, pinned=4096, reserved=0, may_use=0, readonly=4194304 btrfs state leak: start 29364224 end 29376511 state 1 in tree ffff880075f20090 refs 1 ... Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* Btrfs: do not resize a seeding deviceLiu Bo2012-06-151-0/+7
| | | | | | | Seeding devices are not supposed to change any more. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* Btrfs: fix missing inherited flag in renameLiu Bo2012-06-151-3/+6
| | | | | | | | | | | | | When we move a file into a directory with compression flag, we need to inherite BTRFS_INODE_COMPRESS and clear BTRFS_INODE_NOCOMPRESS as well. But if we move a file into a directory without compression flag, we need to clear both of them. It is the way how our setflags deals with compression flag, so keep the same behaviour here. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* Merge branch 'for-chris' of git://git.jan-o-sch.net/btrfs-unstable into ↵Chris Mason2012-06-153-35/+70
|\ | | | | | | for-linus
| * Btrfs: fix race in tree mod log additionJan Schmidt2012-06-141-4/+19
| | | | | | | | | | | | | | | | | | When adding to the tree modification log, we grab two locks at different stages. We must not drop the outer lock until we're done with section protected by the inner lock. This moves the unlock call for the outer lock to the appropriate position. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * Btrfs: add btrfs_next_old_leafJan Schmidt2012-06-143-4/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | To make sense of the tree mod log, the backref walker not only needs btrfs_search_old_slot, but it also called btrfs_next_leaf, which in turn was calling btrfs_search_slot. This obviously didn't give the correct result. This commit adds btrfs_next_old_leaf, a drop-in replacement for btrfs_next_leaf with a time_seq parameter. If it is zero, it behaves exactly like btrfs_next_leaf. If it is non-zero, it will use btrfs_search_old_slot with this time_seq parameter. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * Btrfs: fix return value for __tree_mod_log_oldest_rootJan Schmidt2012-06-141-13/+20
| | | | | | | | | | | | | | | | | | | | In __tree_mod_log_oldest_root() we must return the found operation even if it's not a ROOT_REPLACE operation. Otherwise, the caller assumes that there are no operations to be rewinded and returns immediately. The code in the caller is modified to improve readability. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * Btrfs: use btrfs_read_lock_root_node in get_old_rootJan Schmidt2012-06-141-4/+16
| | | | | | | | | | | | | | | | | | get_old_root could race with root node updates because we weren't locking the node early enough. Use btrfs_read_lock_root_node to grab the root locked in the very beginning and release the lock as soon as possible (just like btrfs_search_slot does). Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * Btrfs: remove obsolete btrfs_next_leaf call from __resolve_indirect_refJan Schmidt2012-06-141-9/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | When resolving indirect refs, we used to call btrfs_next_leaf in case we didn't find an exact match. While we should find exact matches most of the time, in case we don't, we must continue searching. Treating those matches differently depending on the level we're searching doesn't make sense. Even worse, we might end up searching for a key larger than the largest, in which case there is no next_leaf and subsequent jobs would fail. This commit drops the bogous lines. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * Btrfs: remove call to btrfs_header_nritems with no effectJan Schmidt2012-06-041-3/+0
| | | | | | | | | | | | | | | | This is a leftover from cleanup patch 559af821. Before the cleanup, btrfs_header_nritems was called inside an if condition. As it has no side effects we need to preserve here, it should simply be dropped. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
* | Btrfs: fix incompat flags settingLi Zefan2012-06-141-1/+1
| | | | | | | | | | | | | | It's a bug, but it happens to work, as BTRFS_COMPRESS_LZO == 2, which has only one bit set. Signed-off-by: Li Zefan <lizefan@huawei.com>
* | Btrfs: fix defrag regressionLi Zefan2012-06-141-48/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a file has 3 small extents: | ext1 | ext2 | ext3 | Running "btrfs fi defrag" will only defrag the last two extents, if those extent mappings hasn't been read into memory from disk. This bug was introduced by commit 17ce6ef8d731af5edac8c39e806db4c7e1f6956f ("Btrfs: add a check to decide if we should defrag the range") The cause is, that commit looked into previous and next extents using lookup_extent_mapping() only. While at it, remove the code that checks the previous extent, since it's sufficient to check the next extent. Signed-off-by: Li Zefan <lizefan@huawei.com>
* | Btrfs: call filemap_fdatawrite twice for compressionJosef Bacik2012-06-143-7/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I removed this in an earlier commit and I was wrong. Because compression can return from filemap_fdatawrite() without having actually set any of it's pages as writeback() it can make filemap_fdatawait() do essentially nothing, and then we won't find any ordered extents because they may not have been created yet. So not only does this make fsync() completely useless, but it will also screw up if you truncate on a non-page aligned offset since we zero out the end and then wait on ordered extents and then call drop caches. We can drop the cache before the io completes and then we try to unpin the extent we just wrote we won't find it and everything goes sideways. So fix this by putting it back and put a giant comment there to keep me from trying to remove it in the future. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* | Btrfs: keep inode pinned when compressing writesJosef Bacik2012-06-141-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A user reported lots of problems using compression on the new code and it turns out part of the problem was that igrab() was failing when we added a new ordered extent. This is because when writing out an inode under compression we immediately return without actually doing anything to the pages, and then in another thread at some point down the line actually do the ordered dance. The problem is between the point that we start writeback and we actually add the ordered extent we could be trying to reclaim the inode, which makes igrab() return NULL. So we need to do an igrab() when we create the async extent and then drop it when we are done with it. This makes sure we stay pinned in memory until the ordered extent can get a reference on it and we are good to go. With this patch we no longer panic in btrfs_finish_ordered_io(). Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* | Btrfs: implement ->show_devnameJosef Bacik2012-06-141-0/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because btrfs can remove the device that was mounted we need to have a ->show_devname so that in this case we can print out some other device in the file system to /proc/mount. So if there are multiple devices in a btrfs file system we will just print the device with the lowest devid that we can find. This will make everything consistent and deal with device removal properly. The drawback is if you mount with a device that is higher than the lowest devicd it won't show up as the mounted device in /proc/mounts, but this is a small price to pay. This was inspired by Miao Xie's patch. Thanks, Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <josef@redhat.com>
* | Btrfs: use rcu to protect device->nameJosef Bacik2012-06-148-64/+162
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Al pointed out that we can just toss out the old name on a device and add a new one arbitrarily, so anybody who uses device->name in printk could possibly use free'd memory. Instead of adding locking around all of this he suggested doing it with RCU, so I've introduced a struct rcu_string that does just that and have gone through and protected all accesses to device->name that aren't under the uuid_mutex with rcu_read_lock(). This protects us and I will use it for dealing with removing the device that we used to mount the file system in a later patch. Thanks, Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <josef@redhat.com>
OpenPOWER on IntegriCloud