summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Btrfs: fix missed transaction->aborted checkMiao Xie2013-01-241-0/+16
| | | | | | | | | | | | | | | First, though the current transaction->aborted check can stop the commit early and avoid unnecessary operations, it is too early, and some transaction handles don't end, those handles may set transaction->aborted after the check. Second, when we commit the transaction, we will wake up some worker threads to flush the space cache and inode cache. Those threads also allocate some transaction handles and may set transaction->aborted if some serious error happens. So we need more check for ->aborted when committing the transaction. Fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
* Btrfs: Add ACCESS_ONCE() to transaction->abort accessesMiao Xie2013-01-242-2/+3
| | | | | | | | | We may access and update transaction->aborted on the different CPUs without lock, so we need ACCESS_ONCE() wrapper to prevent the compiler from creating unsolicited accesses and make sure we can get the right value. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
* Btrfs: put csums on the right ordered extentJosef Bacik2013-01-241-2/+2
| | | | | | | | | | | | I noticed a WARN_ON going off when adding csums because we were going over the amount of csum bytes that should have been allowed for an ordered extent. This is a leftover from when we used to hold the csums privately for direct io, but now we use the normal ordered sum stuff so we need to make sure and check if we've moved on to another extent so that the csums are added to the right extent. Without this we could end up with csums for bytenrs that don't have extents to cover them yet. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
* Btrfs: use right range to find checksum for compressed extentsLiu Bo2013-01-241-0/+5
| | | | | | | | | For compressed extents, the range of checksum is covered by disk length, and the disk length is different with ram length, so we need to use disk length instead to get us the right checksum. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
* Btrfs: fix panic when recovering tree logJosef Bacik2013-01-241-8/+12
| | | | | | | | | | | | | | A user reported a BUG_ON(ret) that occured during tree log replay. Ret was -EAGAIN, so what I think happened is that we removed an extent that covered a bitmap entry and an extent entry. We remove the part from the bitmap and return -EAGAIN and then search for the next piece we want to remove, which happens to be an entire extent entry, so we just free the sucker and return. The problem is ret is still set to -EAGAIN so we trip the BUG_ON(). The user used btrfs-zero-log so I'm not 100% sure this is what happened so I've added a WARN_ON() to catch the other possibility. Thanks, Reported-by: Jan Steffens <jan.steffens@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
* Btrfs: do not allow logged extents to be merged or removedJosef Bacik2013-01-243-3/+16
| | | | | | | | | | | We drop the extent map tree lock while we're logging extents, so somebody could come in and merge another extent into this one and screw up our logging, or they could even remove us from the list which would keep us from logging the extent or freeing our ref on it, so we need to make sure to not clear LOGGING until after the extent is logged, and then we can merge it to adjacent extents. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
* Btrfs: fix a regression in balance usage filterIlya Dryomov2013-01-211-1/+8
| | | | | | | | | Commit 3fed40cc ("Btrfs: cleanup duplicated division functions"), which was merged into 3.8-rc1, has introduced a regression by removing logic that was guarding us against bad user input. Bring it back. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* Merge branch 'mutex-ops@next-for-chris' of ↵Chris Mason2013-01-212-31/+86
|\ | | | | | | git://github.com/idryomov/btrfs-unstable into linus
| * Btrfs: reorder locks and sanity checks in btrfs_ioctl_defragIlya Dryomov2013-01-201-8/+9
| | | | | | | | | | | | | | Operation-specific check (whether subvol is readonly or not) should go after the mutual exclusiveness check. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * Btrfs: fix unlock order in btrfs_ioctl_rm_devIlya Dryomov2013-01-201-1/+1
| | | | | | | | | | | | Fix unlock order in btrfs_ioctl_rm_dev(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * Btrfs: fix unlock order in btrfs_ioctl_resizeIlya Dryomov2013-01-201-1/+1
| | | | | | | | | | | | Fix unlock order in btrfs_ioctl_resize(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * Btrfs: fix "mutually exclusive op is running" error codeIlya Dryomov2013-01-201-4/+4
| | | | | | | | | | | | | | | | | | | | The error code that is returned in response to starting a mutually exclusive operation when there is one already running got silently changed from EINVAL to EINPROGRESS by 5ac00add. Returning EINPROGRESS to, say, add_dev, when rm_dev is running is misleading. Furthermore, the operation itself may want to use EINPROGRESS for other purposes. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * Btrfs: bring back balance pause/resume logicIlya Dryomov2013-01-202-17/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | Balance pause/resume logic got broken by 5ac00add (went in into 3.8-rc1 as part of dev-replace merge). Offending commit took a stab at making mutually exclusive volume operations (add_dev, rm_dev, resize, balance, replace_dev) not block behind volume_mutex if another such operation is in progress and instead return an error right away. Balancing front-end relied on the blocking behaviour, so the fix is ugly, but short of a complete rework, it's the best we can do. Reported-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* | Merge branch 'for-chris' of ↵Chris Mason2013-01-216-35/+91
|\ \ | |/ | | | | git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next into linus
| * btrfs: update timestamps on truncate()Eric Sandeen2013-01-141-3/+14
| | | | | | | | | | | | | | | | | | truncate() vs. ftruncate() differ in the VFS; truncate() doesn't set (ATTR_CTIME | ATTR_MTIME), and it's up to the fs to do the timestamp updates if the size changes. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
| * btrfs: fix btrfs_cont_expand() freeing IS_ERR emZach Brown2013-01-141-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | btrfs_cont_expand() tries to free an IS_ERR em as it gets an error from btrfs_get_extent() and breaks out of its loop. An instance of -EEXIST was reported in the wild: https://bugzilla.redhat.com/show_bug.cgi?id=874407 I have no idea if that -EEXIST is surprising, or not. Regardless, this error handling should be cleaned up to handle other reasonable errors (ENOMEM, EIO; whatever). This seemed to be the only buggy freeing of the relatively rare IS_ERR em so I opted to fix the caller rather than teach free_extent_map() to use IS_ERR_OR_NULL(). Signed-off-by: Zach Brown <zab@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
| * Btrfs: fix a bug when llseek for delalloc bytes behind prealloc extentsLiu Bo2013-01-142-6/+14
| | | | | | | | | | | | | | | | | | | | | | xfstests case 285 complains. It it because btrfs did not try to find unwritten delalloc bytes(only dirty pages, not yet writeback) behind prealloc extents, it ends up finding nothing while we're with SEEK_DATA. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * Btrfs: fix off-by-one in lseekLiu Bo2013-01-141-0/+1
| | | | | | | | | | | | | | Lock end is inclusive. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * Btrfs: reset path lock state to zeroLiu Bo2013-01-141-0/+2
| | | | | | | | | | | | | | | | | | We forgot to reset the path lock state to zero after we unlock the path block, and this can lead to the ASSERT checker in tree unlock API. Reported-by: Slava Barinov <rayslava@gmail.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * Btrfs: let allocation start from the right raid typeLiu Bo2013-01-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | This'd avoid us empty looping. Say we have only one disk and the metadata raid type will be defaultly DUP, and we do not need to start from index=0(RAID10) and get over two empty loops to index=2(DUP). Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * Btrfs: add orphan before truncating pagecacheJosef Bacik2013-01-141-15/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | Running xfstests 83 in a loop would sometimes fail the fsck. This happens because if we invalidate a page that already has an ordered extent setup for it we will complete the ordered extent ourselves, assuming that the truncate will clean everything up. The problem with this is there is plenty of time for the truncate to fail after we've done this work. So to fix this we need to add the orphan item first to make sure the cleanup gets done properly, and then we can truncate the pagecache and all that stuff and be safe. This fixes the btrfsck failures I was seeing while running 83 in a loop. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * Btrfs: set flushing if we're limited flushingJosef Bacik2013-01-141-1/+1
| | | | | | | | | | | | | | We still need to say we're flushing if we're limit flushing to keep somebody from coming in and stealing our reservation. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * Btrfs: fix missing write access release in btrfs_ioctl_resize()Miao Xie2013-01-141-0/+1
| | | | | | | | | | | | | | | | We forget to give up the write access after we find some device operation is going on. Fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * Btrfs: fix resize a readonly deviceMiao Xie2013-01-141-2/+4
| | | | | | | | | | | | | | We should not resize a readonly device, fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * Btrfs: do not delete a subvolume which is in a R/O subvolumeMiao Xie2013-01-141-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Step to reproduce: # mkfs.btrfs <disk> # mount <disk> <mnt> # btrfs sub create <mnt>/subv0 # btrfs sub snap <mnt> <mnt>/subv0/snap0 # change <mnt>/subv0 from R/W to R/O # btrfs sub del <mnt>/subv0/snap0 We deleted the snapshot successfully. I think we should not be able to delete the snapshot since the parent subvolume is R/O. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
| * Btrfs: disable qgroup id 0Miao Xie2013-01-141-0/+5
| | | | | | | | | | | | | | Qgroup id 0 is a special number, we should set the id of a qgroup to 0. Fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
| * btrfs: get the device in write mode when deleting itLukas Czerner2013-01-141-1/+1
| | | | | | | | | | | | | | | | When we're deleting the device we should get it in write mode since we're going to re-write the super block magic on that device. And it should fail if the device is read-only. Signed-off-by: Lukas Czerner <lczerner@redhat.com>
| * Btrfs: fix memory leak in name_cache_insert()Tsutomu Itoh2013-01-141-1/+3
| | | | | | | | | | | | | | We should free name_cache_entry before returning from the error handling code. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
* | Btrfs: prevent qgroup destroy when there are still relationsArne Jansen2013-01-211-1/+12
| | | | | | | | | | | | | | | | | | | | Currently you can just destroy a qgroup even though it is in use by other qgroups or has qgroups assigned to it. This patch prevents destruction of qgroups unless they are completely unused. Otherwise destroy will return EBUSY. Reported-by: Eric Hopper <hopper@omnifarious.org> Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* | Btrfs: ignore orphan qgroup relationsArne Jansen2013-01-211-0/+7
|/ | | | | | | | | | If a qgroup that has still assignments is deleted by the user, the corresponding relations are left in the tree. This leads to an unmountable filesystem. With this patch, those relations are simple ignored. Reported-by: Eric Hopper <hopper@omnifarious.org> Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* Revert "Btrfs: reorder tree mod log operations in deleting a pointer"Chris Mason2012-12-181-6/+4
| | | | | | | | | | | | This reverts commit 6a7a665d78c5dd8bc76a010648c4e7d84517ab5a. This was bug was fixed differently in 3.6, so this commit isn't needed. Conflicts: fs/btrfs/ctree.c Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* Revert "Btrfs: MOD_LOG_KEY_REMOVE_WHILE_MOVING never change node's nritems"Chris Mason2012-12-181-2/+2
| | | | | | | | This reverts commit 95c80bb1f6b24b57058d971ed252b2c1c5121b51. The bug addressed by this commit was fixed differently back in 3.6 Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* Btrfs: fix a bug of per-file nocowLiu Bo2012-12-172-3/+5
| | | | | | | | | | | | | | | | | | | | | | Users report a bug, the reproducer is: $ mkfs.btrfs /dev/loop0 $ mount /dev/loop0 /mnt/btrfs/ $ mkdir /mnt/btrfs/dir $ chattr +C /mnt/btrfs/dir/ $ dd if=/dev/zero of=/mnt/btrfs/dir/foo bs=4K count=10; $ lsattr /mnt/btrfs/dir/foo ---------------C- /mnt/btrfs/dir/foo $ filefrag /mnt/btrfs/dir/foo /mnt/btrfs/dir/foo: 1 extent found ---> an extent $ dd if=/dev/zero of=/mnt/btrfs/dir/foo bs=4K count=1 seek=5 conv=notrunc,nocreat; sync $ filefrag /mnt/btrfs/dir/foo /mnt/btrfs/dir/foo: 3 extents found ---> with nocow, btrfs breaks the extent into three parts The new created file should not only inherit the NODATACOW flag, but also honor NODATASUM flag, because we must do COW on a file extent with checksum. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* Btrfs: fix hash overflow handlingChris Mason2012-12-175-2/+95
| | | | | | | | | | | | | | | | | | | | | | | | | | | The handling for directory crc hash overflows was fairly obscure, split_leaf returns EOVERFLOW when we try to extend the item and that is supposed to bubble up to userland. For a while it did so, but along the way we added better handling of errors and forced the FS readonly if we hit IO errors during the directory insertion. Along the way, we started testing only for EEXIST and the EOVERFLOW case was dropped. The end result is that we may force the FS readonly if we catch a directory hash bucket overflow. This fixes a few problem spots. First I add tests for EOVERFLOW in the places where we can safely just return the error up the chain. btrfs_rename is harder though, because it tries to insert the new directory item only after it has already unlinked anything the rename was going to overwrite. Rather than adding very complex logic, I added a helper to test for the hash overflow case early while it is still safe to bail out. Snapshot and subvolume creation had a similar problem, so they are using the new helper now too. Signed-off-by: Chris Mason <chris.mason@fusionio.com> Reported-by: Pascal Junod <pascal@junod.info>
* Btrfs: don't take inode delalloc mutex if we're a free space inodeJosef Bacik2012-12-161-6/+19
| | | | | | | | | This confuses and angers lockdep even though it's ok. We don't really need the lock for free space inodes since only the transaction committer will be reserving space. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* Btrfs: fix autodefrag and umount lockupJosef Bacik2012-12-161-2/+17
| | | | | | | | | | This happens because writeback_inodes_sb_nr_if_idle does down_read. This doesn't work for us and it has not been fixed upstream yet, so do it ourselves and use that instead so we can stop having this stupid long standing lockup. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* Btrfs: fix permissions of empty files not affected by umaskFilipe Brandenburger2012-12-161-0/+4
| | | | | | | | | | | | | | | | When a new file is created with btrfs_create(), the inode will initially be created with permissions 0666 and later on in btrfs_init_acl() it will be adapted to mask out the umask bits. The problem is that this change won't make it into the btrfs_inode unless there's another change to the inode (e.g. writing content changing the size or touching the file changing the mtime.) This fix adds a call to btrfs_update_inode() to btrfs_create() to make sure that the change will not get lost if the in-memory inode is flushed before other changes are made to the file. Signed-off-by: Filipe Brandenburger <filbranden@google.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* Btrfs: put raid properties into global tableLiu Bo2012-12-164-33/+29
| | | | | | | | Raid properties can be shared among raid calculation code, we can put them into a global table to keep it simple. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* Btrfs: fix BUG() in scrub when first superblock reading gives EIOStefan Behrens2012-12-161-0/+11
| | | | | | | | | | | | | | | This fixes a very special case that can be reproduced by just disconnecting a disk at runtime, and without unmounting the filesystem first, start scrub on the filesystem with the disconnected disk. All read and write EIOs are handled correctly, only the first superblock is an exception and gives a BUG() in a subfunction. The BUG() is correct, it would crash later otherwise. The subfunction must not be called for superblocks and this is what the fix changes. Reported-by: Joeri Vanthienen <mail@joerivanthienen.be> Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* Btrfs: do not call file_update_time in aio_writeJosef Bacik2012-12-162-29/+48
| | | | | | | | | | | This starts a transaction and dirties the inode everytime we call it, which is super expensive if you have a write heavy workload. We will be updating the inode when the IO completes and we reserve the space for the inode update when we reserve space for the write, so there is no chance of loss of information or enospc issues. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* Btrfs: only unlock and relock if we have toJosef Bacik2012-12-161-1/+4
| | | | | | | | | | | I noticed while doing fsync tests that we were always dropping the path and re-searching when we first cow the log root even though we've already gotten the write lock on the root. That's because we don't take into account that there might not be a parent node, so fix the check to make sure there is actually a parent node before we undo all of this work for nothing. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* Btrfs: use tokens where we can in the tree logJosef Bacik2012-12-161-54/+73
| | | | | | | | | | | If we are syncing over and over the overhead of doing all those maps in fill_inode_item and log_changed_extents really starts to hurt, so use map tokens so we can avoid all the extra mapping. Since the token maps from our offset to the end of the page make sure to set the first thing in the item first so we really only do one map. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* Btrfs: optimize leaf_space_usedJosef Bacik2012-12-161-2/+9
| | | | | | | | | | This gets called at least 4 times for every level while adding an object, and it involves 3 kmapping calls, which on my box take about 5us a piece. So instead use a token, which brings us down to 1 kmap call and makes this function take 1/3 of the time per call. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* Btrfs: don't memset new tokensJosef Bacik2012-12-161-1/+1
| | | | | | | | | Our token logic depends on token->kaddr being set, and if it is not it sets everything properly as needed. So instead of memsetting just set token->kaddr to NULL. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* Btrfs: only clear dirty on the buffer if it is marked as dirtyJosef Bacik2012-12-161-4/+4
| | | | | | | | No reason to set the path blocking or loop through all of the pages if the extent buffer isn't actually marked dirty. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* Btrfs: move checks in set_page_dirty under DEBUGJosef Bacik2012-12-161-0/+2
| | | | | | | | This is a high traffic function, let's try and do as little as possible during normal operations shall we? Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* Btrfs: log changed inodes based on the extent map treeJosef Bacik2012-12-167-210/+372
| | | | | | | | | | We don't really need to copy extents from the source tree since we have all of the information already available to us in the extent_map tree. So instead just write the extents straight to the log tree and don't bother to copy the extent items from the source tree. Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* Btrfs: add path->really_keep_locksJosef Bacik2012-12-162-2/+6
| | | | | | | | | | | | You'd think path->keep_locks would keep all the locks wouldn't you? You'd be wrong. It only keeps them if the slot is pointing to the last item in the node. This is for use with btrfs_next_leaf, which needs this sort of thing. But the horrible horrible things I'm going to do to the tree log means I really need everything held from root to leaf so I can add and delete items in the same search. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* Btrfs: do not mark ems as prealloc if we are writing to themJosef Bacik2012-12-163-4/+5
| | | | | | | | | | We are going to use EM's to log extents in the future, so we need to not mark them as prealloc if they aren't actually prealloc extents. Instead mark them with FILLING so we know to ammend mod_start/mod_len and that way we don't confuse the extent logging code. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* Btrfs: keep track of the extents original block lengthJosef Bacik2012-12-163-4/+24
| | | | | | | | | | | If we've written to a prealloc extent we need to know the original block len for the extent. We can't figure this out currently since ->block_len is just set to the extent length. So introduce ->orig_block_len so that we know how many bytes were in the original extent for proper extent logging that future patches will need. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
OpenPOWER on IntegriCloud