Commit message  Author  Age  Files  Lines
...
| | * | | | | | Btrfs: inline checksums into the disk free space cache  Josef Bacik  2011-10-19  4  -68/+172
| | | | | | | | Yeah yeah I know this is how we used to do it and then I changed it, but damnit I'm changing it back. The fact is that writing out checksums will modify metadata, which could cause us to dirty a block group we've already written out, so we have to truncate it and all of its checksums and re-write it, which will write new checksums, which could dirty a block group that has already been written, and you see where I'm going with this? This can cause unmount or really anything that depends on a transaction to commit to take its sweet damned time to happen. So go back to the way it was, only this time we're specifically setting NODATACOW because we can't go through the COW pathway anyway and we're doing our own built-in cow'ing by truncating the free space cache. The other new thing is once we truncate the old cache and preallocate the new space, we don't need to do that song and dance at all for the rest of the transaction, we can just overwrite the existing space with the new cache if the block group changes for whatever reason, and the NODATACOW will let us do this fine. So keep track of which transaction we last cleared our cache in and if we cleared it in this transaction just say we're all set up and carry on. This survives xfstests and stress.sh. The inode cache will continue to use the normal csum infrastructure since it only gets written once and there will be no more modifications to the fs tree in a transaction commit. Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: take overflow into account in reserving space  Josef Bacik  2011-10-19  1  -1/+1
| | | | | | | | My overcommit stuff can be a little racy when we're filling up the disk with fs_mark and we overcommit into things that quickly get used up for data. So use num_bytes to see if we have enough available space so we're less likely to overcommit ourselves out of the ability to make reservations. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: check the return value of filemap_write_and_wait in the space cache  Josef Bacik  2011-10-19  1  -2/+5
| | | | | | | | We need to check the return value of filemap_write_and_wait in the space cache writeout code. Also don't set the inode's generation until we're sure nothing else is going to fail. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: add a io_ctl struct and helpers for dealing with the space cache  Josef Bacik  2011-10-19  1  -318/+375
| | | | | | | | In writing and reading the space cache we have one big loop that keeps track of which page we are on and then a bunch of sizeable loops underneath this big loop to try and read/write out properly. Especially in the write case this makes things hugely complicated and hard to follow, and makes our error checking and recovery equally complex. So add an io_ctl struct with a bunch of helpers to keep track of the pages we have, where we are, if we have enough space etc. This unifies how we deal with the pages we're writing and keeps all the messy tracking internal. This allows us to kill the big loops in both the read and write case and makes reviewing and changing the write and read paths much simpler. I've run xfstests and stress.sh on this code and it survives. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: don't skip writing out a empty block groups cache  Josef Bacik  2011-10-19  1  -4/+6
| | | | | | | | I noticed a slight bug where we will not bother writing out the block group cache's space cache if its space tree is empty. Since it could have a cluster or pinned extents that need to be written out this is just not a valid test. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: introduce mount option no_space_cache  Josef Bacik  2011-10-19  3  -10/+22
| | | | | | | | Some users have requested this and I've found I needed a way to disable cache loading without actually clearing the cache, so introduce the no_space_cache option. Previously we checked the super block's cache generation field and, if it was populated, we always turned space caching on. Now we check this and set the space cache option on, and then parse the mount options so that if we want it off it gets turned off. Then we check the mount option in all the places we do the caching work instead of checking the super's cache generation. This makes things more consistent and lets us turn space caching off. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
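The ordering described in the no_space_cache message (default from the super block first, mount options second) can be sketched in userspace C. This is only an illustration: `space_cache_enabled_sketch` and `opt_is` are hypothetical names, a non-zero cache generation is assumed to mean "populated", and the real kernel option parser works differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: does the token p[0..len) equal the option name? */
static bool opt_is(const char *p, size_t len, const char *name)
{
    return len == strlen(name) && strncmp(p, name, len) == 0;
}

/*
 * Sketch of the decision order from the commit message:
 *   1. default to "on" if the super block's cache generation is populated
 *      (assumed here to mean non-zero),
 *   2. then let the comma-separated mount options override that default.
 */
static bool space_cache_enabled_sketch(uint64_t cache_generation,
                                       const char *opts)
{
    bool enabled = cache_generation != 0;
    const char *p = opts;

    assert(opts != NULL);
    while (*p) {
        size_t len = strcspn(p, ",");

        if (opt_is(p, len, "space_cache"))
            enabled = true;
        else if (opt_is(p, len, "no_space_cache"))
            enabled = false;

        p += len;
        if (*p == ',')
            p++;
    }
    return enabled;
}
```

Callers would then test the returned flag everywhere the caching work happens, rather than re-reading the super block's generation field.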
| | * | | | | | Btrfs: only inherit btrfs specific flags when creating files  Josef Bacik  2011-10-19  1  -6/+11
| | | | | | | | Xfstests 79 was failing because we were inheriting the S_APPEND flag when we weren't supposed to. There isn't any specific documentation on this so I'm taking the test as the standard of how things work, and having S_APPEND set on a directory doesn't mean that S_APPEND gets inherited by its children according to this test. So only inherit btrfs specific things. This will let us set compress/nocompress on specific directories and everything in the directories will inherit this flag, same with nodatacow. With this patch test 79 passes. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: allow us to overcommit our enospc reservations  Josef Bacik  2011-10-19  4  -18/+88
| | | | | | | | One of the things that kills us is the fact that our ENOSPC reservations are horribly over the top in most normal cases. There isn't too much that can be done about this because when we are completely full we really need them to work like this so we don't under reserve. However if there are plenty of unallocated chunks on the disk we can use that to gauge how much we can overcommit. So this patch adds chunk free space accounting so we always know how much unallocated space we have. Then if we fail to make a reservation within our allocated space, check to see if we can overcommit. In the normal flushing case (like with delalloc metadata reservations) we'll take the free space and divide it by 2 if our metadata profile is set up for DUP or any of those, and then divide it by 8 to make sure we don't overcommit too much. Then if we're in a non-flushing case (we really need this reservation now!) we only limit ourselves to half of the free space. This makes this fio test
| | | | | | | |   [torrent]
| | | | | | | |   filename=torrent-test
| | | | | | | |   rw=randwrite
| | | | | | | |   size=4g
| | | | | | | |   ioengine=sync
| | | | | | | |   directory=/mnt/btrfs-test
| | | | | | | | go from taking around 45 minutes to 10 seconds on my freshly formatted 3 TiB file system. This doesn't seem to break my other enospc tests, but could really use some more testing as this is a super scary change. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
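The overcommit rule in the message above reduces to a few lines of arithmetic. The following is a rough userspace sketch, not the kernel code: `struct space_sketch`, its fields, and `can_overcommit_sketch` are hypothetical names, and the exact comparison the kernel uses is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the space accounting; not the kernel's struct. */
struct space_sketch {
    uint64_t total_bytes;  /* bytes in chunks that are already allocated */
    uint64_t used_bytes;   /* bytes used or reserved inside those chunks */
    uint64_t unallocated;  /* device space not yet assigned to any chunk */
    bool     dup_profile;  /* metadata is written twice (DUP and similar) */
};

/* Decide whether a reservation of `bytes` may be granted. */
static bool can_overcommit_sketch(const struct space_sketch *s,
                                  uint64_t bytes, bool can_flush)
{
    uint64_t avail;

    assert(s != NULL);

    /* First try to fit inside the space that is already allocated. */
    if (s->used_bytes + bytes <= s->total_bytes)
        return true;

    /* Otherwise gauge the overcommit limit from unallocated space. */
    avail = s->unallocated;
    if (s->dup_profile)
        avail /= 2;        /* every metadata byte costs two on disk */
    if (can_flush)
        avail /= 8;        /* flushing callers: stay conservative */
    else
        avail /= 2;        /* "need it now" callers: up to half */

    return s->used_bytes + bytes <= s->total_bytes + avail;
}
```

With 1600 bytes unallocated and a DUP profile, a flushing caller may overcommit by 100 bytes while a non-flushing caller may overcommit by 400.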
| | * | | | | | Btrfs: break out of orphan cleanup if we can't make progress  Josef Bacik  2011-10-19  1  -0/+11
| | | | | | | | I noticed while running xfstests 83 that if we didn't have enough space to delete our inode the orphan cleanup would just loop. This is because it keeps finding the same orphan item and keeps trying to kill it but can't because we don't get an error back from iput for deleting the inode. So keep track of the last guy we tried to kill, if it's the same as the one we're trying to kill currently we know we are having problems and can just error out. I don't have a way to test this so look hard and make sure it's right. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
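The "remember the last one we tried" guard can be modelled without any filesystem. A minimal sketch, assuming a flat array in place of the orphan items in the tree; `struct orphan_sketch`, `cleanup_orphans_sketch`, and the choice of `-EIO` as the error are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an orphan item. */
struct orphan_sketch {
    uint64_t objectid;
    bool     deletable;  /* would deleting the inode succeed? */
    bool     present;    /* is the orphan item still in the tree? */
};

/* Returns 0 when every orphan was cleaned up, -EIO if we stop making progress. */
static int cleanup_orphans_sketch(struct orphan_sketch *items, size_t n)
{
    uint64_t last_objectid = 0;
    bool have_last = false;

    assert(items != NULL || n == 0);

    for (;;) {
        struct orphan_sketch *cur = NULL;

        /* "Search the tree": find the first orphan that still exists. */
        for (size_t i = 0; i < n; i++) {
            if (items[i].present) {
                cur = &items[i];
                break;
            }
        }
        if (!cur)
            return 0;

        /* Same item as last time: the delete silently failed, so bail out. */
        if (have_last && cur->objectid == last_objectid)
            return -EIO;

        last_objectid = cur->objectid;
        have_last = true;

        /* The delete reports no error either way, as in the message. */
        if (cur->deletable)
            cur->present = false;
    }
}
```

Without the `last_objectid` check the loop would keep finding the same undeletable item forever, which is the hang the message describes.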
| | * | | | | | Btrfs: use the global reserve as a backup for deleting inodes  Josef Bacik  2011-10-19  1  -1/+11
| | | | | | | | Xfstests 83 really stresses our ENOSPC since it uses a 100mb fs which ends up with the mixed block group stuff. Because of this we can run into a situation where we don't have enough space to delete inodes, or even worse we can't free the inodes when we next mount the fs which causes the orphan code to lose its mind. So if we fail to make our reservation, steal from the global reserve. The global reserve will end up taking up the entire rest of the free space on the fs in this worst case so there really is no other option. With this patch test 83 doesn't freak out. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: stop using write_one_page  Josef Bacik  2011-10-19  2  -67/+20
| | | | | | | | While looking for a performance regression a user was complaining about, I noticed that we had a regression with the varmail test of filebench. This was introduced by 0d10ee2e6deb5c8409ae65b970846344897d5e4e which keeps us from calling writepages in writepage. This is a correct change, however it happens to help the varmail test because we write out in larger chunks. This is largely to do with how we write out dirty pages for each transaction. If you run filebench with
| | | | | | | |   load varmail
| | | | | | | |   set $dir=/mnt/btrfs-test
| | | | | | | |   run 60
| | | | | | | | prior to this patch you would get ~1420 ops/second, but with the patch you get ~1200 ops/second. This is a 16% decrease. So since we know the range of dirty pages we want to write out, don't write out in one page chunks, write out in ranges. So to do this we call filemap_fdatawrite_range() on the range of bytes. Then we convert the DIRTY extents to NEED_WAIT extents. When we then call btrfs_wait_marked_extents() we only have to filemap_fdatawait_range() on that range and clear the NEED_WAIT extents. This doesn't get us back to our original speeds, but I've been seeing ~1380 ops/second, which is a <5% regression as opposed to a >15% regression. That is acceptable given that the original commit greatly reduces our latency to begin with. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: introduce convert_extent_bit  Josef Bacik  2011-10-19  2  -0/+190
| | | | | | | | If I have a range where I know a certain bit is set and I want to change it to another bit, the only option I have is to call set and then clear bit, which will result in 2 tree searches. This is inefficient, so introduce convert_extent_bit which will go through and set the bit I want and clear the old bit I don't want. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
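The idea behind convert_extent_bit, setting one bit and clearing another in a single walk over the range, can be shown on a toy model. This sketch assumes a flat array of per-block state words instead of the kernel's tree of extent states; the flag values and the name `convert_bits_sketch` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values; the kernel's extent state bits differ. */
#define SK_DIRTY     (1u << 0)
#define SK_NEED_WAIT (1u << 1)
#define SK_OTHER     (1u << 2)

/*
 * Walk [start, end] (inclusive) once, setting `set_bits` and clearing
 * `clear_bits` on each entry. Doing set and clear as two separate calls
 * would walk the range twice; with a tree that means two searches.
 */
static void convert_bits_sketch(uint32_t *state, size_t start, size_t end,
                                uint32_t set_bits, uint32_t clear_bits)
{
    assert(state != NULL && start <= end);

    for (size_t i = start; i <= end; i++)
        state[i] = (state[i] | set_bits) & ~clear_bits;
}
```

For example, `convert_bits_sketch(state, 0, 2, SK_NEED_WAIT, SK_DIRTY)` mirrors the DIRTY to NEED_WAIT conversion mentioned in the write_one_page commit, leaving unrelated bits untouched.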
| | * | | | | | Btrfs: check unused against how much space we actually want  Josef Bacik  2011-10-19  1  -1/+1
| | | | | | | | There is a bug that may lead to early ENOSPC in our reservation code. We've been checking against num_bytes which may be above and beyond what we want to actually reserve, which could give us a false ENOSPC. Fix this by making sure the unused space is above how much we want to reserve and not how much we're trying to flush. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: fix orphan cleanup regression  Josef Bacik  2011-10-19  1  -19/+17
| | | | | | | | In fixing how we deal with bad inodes, we had a regression in the orphan cleanup code, since it expects to get a bad inode back. So fix it to deal with getting -ESTALE back by deleting the orphan item manually and moving on. Thanks, Reported-by: Simon Kirby <sim@hostway.ca> Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: use the inode's mapping mask for allocating pages  Josef Bacik  2011-10-19  6  -6/+18
| | | | | | | | Johannes pointed out we were allocating only kernel pages for doing writes, which is kind of a big deal if you are on 32bit and have more than a gig of ram. So fix our allocations to use the mapping's gfp but still clear __GFP_FS so we don't re-enter. Thanks, Reported-by: Johannes Weiner <jweiner@redhat.com> Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: delay iput when deleting a block group  Josef Bacik  2011-10-19  1  -1/+1
| | | | | | | | I kept getting warnings from evict because we were calling btrfs_start_transaction() with a transaction already started when doing a balance. This is because we remove a block group, which requires a transaction, and then put the last reference on the cache inode. Instead of doing this we need to delay the iput so it is not done while a transaction is running. This gets rid of our warnings. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: make sure to unset trans->block_rsv before running delayed refs  Josef Bacik  2011-10-19  1  -0/+11
| | | | | | | | Checksums are charged in 2 different ways. The first case is when we're writing to the disk, we account for the new checksums with the delalloc block rsv. In order for this to work we check if we're allocating a block for the csum root and if trans->block_rsv == the delalloc block rsv. But when we're deleting the csums because of cow, this is charged to the global block rsv, and is done when we run the delayed refs. So we need to make sure that trans->block_rsv == NULL when running the delayed refs. So set it to NULL and reset it in should_end_transaction, and set it to NULL in commit_transaction. This got rid of the ridiculous amount of warnings I was seeing when trying to do a balance. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: stop passing a trans handle all around the reservation code  Josef Bacik  2011-10-19  6  -43/+39
| | | | | | | | The only thing that we need to have a trans handle for is in reserve_metadata_bytes and that's to know how much flushing we can do. So instead of passing it around, just check current->journal_info for a trans_handle so we know if we can commit a transaction to try and free up space or not. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: don't get the block_rsv in btrfs_free_tree_block  Josef Bacik  2011-10-19  1  -4/+0
| | | | | | | | Since the durable block rsv stuff has been killed there is no need to get the block_rsv in btrfs_free_tree_block anymore. Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: use the transactions block_rsv for the csum root  Josef Bacik  2011-10-19  2  -6/+10
| | | | | | | | The alloc warnings everybody has been seeing are because we have been reserving space for csums, but we weren't actually using that space. So make get_block_rsv() return the trans->block_rsv if we're modifying the csum root. Also set the trans->block_rsv to NULL so that if we modify the csum root when running delayed refs that comes out of the global reserve like it's supposed to. With this patch I'm not seeing those alloc warnings anymore. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: handle enospc accounting for free space inodes  Josef Bacik  2011-10-19  4  -23/+47
| | | | | | | | Since free space inodes now use normal checksumming we need to make sure to account for their metadata use. So reserve metadata space, and then if we fail to write out the metadata we can just release it, otherwise it will be freed up when the io completes. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: put the block group cache after we commit the super  Josef Bacik  2011-10-19  2  -3/+3
| | | | | | | | In moving some enospc stuff around I noticed that when we unmount we are often evicting the free space cache inodes before we do our last commit. This isn't bad, but it makes us constantly have to re-read the inodes back. So instead don't evict the cache until after we do our last commit, this will make things a little less crappy and makes a future enospc change work properly. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: set truncate block rsv's size  Josef Bacik  2011-10-19  1  -0/+2
| | | | | | | | While debugging a different issue I noticed that we were always reserving space when we tried to use our truncate block rsv's. This is because they didn't have a ->size value, so use_block_rsv just assumes there is nothing reserved and it does a reserve_metadata_bytes. This is because btrfs_check_block_rsv() doesn't actually add to the size of the block rsv. That seems to be the right thing to do so set ->size to the minimum truncate size we need, since we will always only refill to that size anyway, and this way everything works out correctly. Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: don't increase the block_rsv's size when emergency allocating space  Josef Bacik  2011-10-19  1  -3/+0
| | | | | | | | If we have to emergency reserve space we need to not increase the block_rsv size, otherwise we'll leak space. Take for instance delalloc, say we reserve 4k, and we use that 4k, and then we have to emergency allocate another 4k, we bump the size up to 8k, however we've only accounted for 4k in reservations in all of our supporting logic, so we'll go to free the 4k and end up having a size of 4k, which will cause us to later not free as much space. I saw this doing testing where I wasn't reserving enough space for something but was still leaking space, very frustrating. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: fix space leak when we fail to make an allocation  Josef Bacik  2011-10-19  1  -6/+14
| | | | | | | | When changing back to using a spin_lock to protect the extent counters I decided that since we would only be dropping our original extent, it was ok to just drop the extent and return. However since somebody else could have come in and done a reservation, we need to do the normal song and dance to clear the reservation out properly. So calculate how much space we need to free, and then subtract what we just attempted to reserve. If it's more, then we know we need to drop those bytes from the delalloc block rsv. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: fix call to btrfs_search_slot in free space cache  Josef Bacik  2011-10-19  1  -1/+1
| | | | | | | | We are setting ins_len to 1 even though we are just modifying an item that should be there already. This may cause the search stuff to split nodes on the way down needlessly. Set this to 0 since we aren't inserting anything. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: allow callers to specify if flushing can occur for btrfs_block_rsv_check  Josef Bacik  2011-10-19  6  -10/+10
| | | | | | | | If you run xfstests 224 you will get lots of messages about not being able to delete inodes and that they will be cleaned up next mount. This is because btrfs_block_rsv_check was not calling reserve_metadata_bytes with the ability to flush, so if there was not enough space, it simply failed. But in the truncate and evict cases we could easily flush space to try and get enough space to do our work, so make btrfs_block_rsv_check take a flush argument to pass down to reserve_metadata_bytes. Now xfstests 224 runs fine without all those complaints. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: reduce the amount of space needed for truncates  Josef Bacik  2011-10-19  2  -4/+15
| | | | | | | | With btrfs_truncate_inode_items we always return if we have to go to another leaf, which makes us do our reservation again. This means we will only ever modify one leaf at a time, so we only need 1 item's worth of slack space. Also, since we are deleting we will not be creating nodes as we go down, if anything we'll be freeing them as we merge them together, so make a different calculation for truncate which will only have the worst case usage of COW'ing the entire path down to the leaf. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: only reserve space in fallocate if we have to do a preallocate  Josef Bacik  2011-10-19  1  -6/+16
| | | | | | | | Lukas found a problem where if he tries to fallocate over the same region twice and the first fallocate took up all the space we would fail with ENOSPC. This is because we reserve the total space we want to use for fallocate, regardless of whether or not we will have to actually preallocate. So instead move the check into the loop where we actually have to do the preallocate. Thanks, Tested-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: kill btrfs_truncate_reserve_metadata  Josef Bacik  2011-10-19  2  -34/+0
| | | | | | | | Since we've optimized the truncate path, we no longer require this function. Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: optimize how we account for space in truncate  Josef Bacik  2011-10-19  1  -29/+29
| | | | | | | | Currently we're starting and stopping a transaction for no real reason, so kill that and just reserve enough space as if we can truncate all in one transaction. Also use btrfs_block_rsv_check() for our reserve to minimize the amount of space we may have to allocate for our slack space. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: don't try to commit in btrfs_block_rsv_check  Josef Bacik  2011-10-19  1  -25/+4
| | | | | | | | We will try and reserve metadata bytes in btrfs_block_rsv_check and if we cannot because we have a transaction open it will return EAGAIN, so we do not need to try and commit the transaction again. Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: kill unused parts of block_rsv  Josef Bacik  2011-10-19  3  -22/+6
| | | | | | | | The priority and refill_used flags are not used anymore, and neither is the usage counter, so just remove them from btrfs_block_rsv. Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: ratelimit the generation printk for the free space cache  Josef Bacik  2011-10-19  1  -5/+7
| | | | | | | | A user reported getting spammed when moving to 3.0 by this message. Since we switched to the normal checksumming infrastructure all old free space caches will be wrong and need to be regenerated so people are likely to see this message a lot, so ratelimit it so it doesn't fill up their logs and freak them out. Thanks, Reported-by: Andrew Lutomirski <luto@mit.edu> Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: fix how we reserve space for deleting inodes  Josef Bacik  2011-10-19  1  -11/+38
| | | | | | | | I converted btrfs_truncate to do sane reservations for truncate, but didn't convert btrfs_evict_inode. Basically we need to save the orphan_rsv for deleting the orphan item, and do normal reservations for our truncate. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: kill the durable block rsv stuff  Josef Bacik  2011-10-19  5  -101/+17
| | | | | | | | This is confusing code and isn't used by anything anymore, so delete it. Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: kill the orphan space calculation for snapshots  Josef Bacik  2011-10-19  3  -90/+0
| | | | | | | | This patch kills off the calculation for the amount of space needed for the orphan operations during a snapshot. The thing is we only do snapshots on commit, so any space that is in the block_rsv->freed[] isn't going to be in the new snapshot anyway, so there isn't any reason to require that space to be reserved for the snapshot to occur. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: calculate checksum space correctly  Josef Bacik  2011-10-19  3  -8/+118
| | | | | | | | We have not been reserving enough space for checksums. We were just reserving bytes for the checksum items themselves, we were not taking into account having to cow the tree and such. This patch adds a csum_bytes counter to the inode for keeping track of the number of bytes outstanding we have for checksums. Then we calculate how many leaves would be required for the checksums we are given and use that to reserve space. This adds a significant amount of bytes to our reservations, but we will handle this later. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
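The "how many leaves would the checksums need" step can be illustrated as plain ceiling arithmetic. This is a rough sketch under stated assumptions, not the patch's calculation: one checksum per sector of outstanding data is assumed, the usable leaf size is passed in as a parameter, and `csum_leaves_sketch` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Estimate the number of tree leaves needed to hold the checksums for
 * `csum_bytes` of outstanding data.
 *   sectorsize      - bytes of data covered by one checksum (assumed)
 *   csum_size       - bytes per checksum (for example 4 for a 32-bit CRC)
 *   leaf_data_size  - usable bytes per leaf, a hypothetical parameter
 */
static uint64_t csum_leaves_sketch(uint64_t csum_bytes, uint32_t sectorsize,
                                   uint32_t csum_size, uint32_t leaf_data_size)
{
    uint64_t num_csums;
    uint64_t csums_per_leaf;

    assert(sectorsize > 0 && csum_size > 0 && leaf_data_size >= csum_size);

    /* Round up: a partial sector still needs a whole checksum. */
    num_csums = (csum_bytes + sectorsize - 1) / sectorsize;
    csums_per_leaf = leaf_data_size / csum_size;

    /* Round up again: a partially filled leaf is still a leaf to reserve. */
    return (num_csums + csums_per_leaf - 1) / csums_per_leaf;
}
```

The reservation would then be sized from this leaf count rather than from the raw checksum bytes, which is why the message notes that reservations grow noticeably.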
| | * | | | | | Btrfs: skip looking for delalloc if we don't have ->fill_delalloc  Josef Bacik  2011-10-19  1  -1/+5
| | | | | | | | We always look for delalloc bytes in our io_tree so we can fill in delalloc. This is fine in most cases, but if we're writing out the btree_inode this is just a superfluous tree search on the io_tree, and if we have a lot of metadata dirty this could be an expensive check. So instead check to see if our io_tree has a ->fill_delalloc op, and if not don't even bother doing the lookup. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: use bytes_may_use for all ENOSPC reservations  Josef Bacik  2011-10-19  3  -82/+112
| | | | | | | | We have been using bytes_reserved for metadata reservations, which is wrong since we use that to keep track of outstanding reservations from the allocator. This resulted in us doing a lot of silly things to make sure we don't allocate a bunch of metadata chunks since we never had a real view of how much space was actually in use by metadata. This passes Arne's enospc test and xfstests as well as my own enospc tests. Hopefully this will get us moving in the right direction. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: fix how we mount subvol=<whatever>  Josef Bacik  2011-10-19  1  -64/+135
| | | | | | | | We've only been able to mount with subvol=<whatever> where whatever was a subvol within whatever root we had as the default. This allows us to mount -o subvol=path/to/subvol/you/want relative from the normal fs_tree root. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: use d_obtain_alias when mounting subvol/subvolid  Josef Bacik  2011-10-19  1  -24/+1
    Currently what we do is just wrong. We either

    1) Alloc a new "root" dentry with sb->s_root as its parent, which is just wrong as we could walk into this subvol later on via another path and hilarity could ensue. Also we don't check the return value of d_splice_alias, which isn't good either.

    or

    2) Do a d_find_alias(), in which case we could have lost our dentry from cache at this point and found nothing.

    So use d_obtain_alias(). In the case that we already have the inode/dentry in cache we will get the correct dentry. If not we will get a disconnected dentry tree so if we walk into it later on everything will be connected up properly. Thanks,

    Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: kill reserved_bytes in inode  Josef Bacik  2011-10-19  3  -8/+0
    reserved_bytes is not used for anything in the inode, remove it.

    Signed-off-by: Josef Bacik <josef@redhat.com>
| | * | | | | | Btrfs: move stuff around in btrfs_inode to get better packing  Josef Bacik  2011-10-19  1  -3/+3
    Moving things around to give us better packing in the btrfs_inode. This reduces the size of our inode by 8 bytes. Thanks,

    Signed-off-by: Josef Bacik <josef@redhat.com>
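    How reordering members can shrink a struct is easy to see in a small C sketch. The two structs below are hypothetical and are not the real btrfs_inode layout; the exact sizes depend on the ABI, so the code only computes the difference rather than assuming a particular number.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: small members interleaved with 8-byte members,
 * so the compiler inserts padding after each small one. */
struct inode_loose {
    uint8_t  flag_a;
    uint64_t bytes;
    uint8_t  flag_b;
    uint64_t generation;
};

/* Same members, grouped so the small ones share one padded slot. */
struct inode_packed {
    uint64_t bytes;
    uint64_t generation;
    uint8_t  flag_a;
    uint8_t  flag_b;
};

/* Bytes saved purely by reordering; the value is ABI-dependent. */
static size_t bytes_saved(void)
{
    return sizeof(struct inode_loose) - sizeof(struct inode_packed);
}
```

    No member is removed or narrowed here; only the padding the compiler must insert for alignment changes.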
* | | | | | | | Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux  Linus Torvalds  2011-11-06  2057  -606/+2248
|\ \ \ \ \ \ \ \
    * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
      Revert "tracing: Include module.h in define_trace.h"
      irq: don't put module.h into irq.h for tracking irqgen modules.
      bluetooth: macroize two small inlines to avoid module.h
      ip_vs.h: fix implicit use of module_get/module_put from module.h
      nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
      include: replace linux/module.h with "struct module" wherever possible
      include: convert various register fcns to macros to avoid include chaining
      crypto.h: remove unused crypto_tfm_alg_modname() inline
      uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
      pm_runtime.h: explicitly requires notifier.h
      linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
      miscdevice.h: fix up implicit use of lists and types
      stop_machine.h: fix implicit use of smp.h for smp_processor_id
      of: fix implicit use of errno.h in include/linux/of.h
      of_platform.h: delete needless include <linux/module.h>
      acpi: remove module.h include from platform/aclinux.h
      miscdevice.h: delete unnecessary inclusion of module.h
      device_cgroup.h: delete needless include <linux/module.h>
      net: sch_generic remove redundant use of <linux/module.h>
      net: inet_timewait_sock doesnt need <linux/module.h>
      ...

    Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
     - drivers/media/dvb/frontends/dibx000_common.c
     - drivers/media/video/{mt9m111.c,ov6650.c}
     - drivers/mfd/ab3550-core.c
     - include/linux/dmaengine.h
| * | | | | | | | Revert "tracing: Include module.h in define_trace.h"  Paul Gortmaker  2011-10-31  1  -10/+0
    This reverts commit 3a9f987b3141f086de27832514aad9f50a53f754.

    With all the files that are real modules now having module.h explicitly called out for inclusion, and no reliance on any implicit presence of module.h assumed, we should no longer need this workaround.

    Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
| * | | | | | | | irq: don't put module.h into irq.h for tracking irqgen modules.  Paul Gortmaker  2011-10-31  3  -21/+14
    Recent commit "irq: Track the owner of irq descriptor" in commit ID b6873807a7143b7 placed module.h into linux/irq.h but we are trying to limit module.h inclusion to just C files that really need it, due to its size and number of children includes. This targets just reversing that include.

    Add in the basic "struct module" since that is all we really need to ensure things compile. In theory, b687380 should have added the module.h include to the irqdesc.h header as well, but the implicit module.h everywhere presence masked this from showing up. So give it the "struct module" as well.

    As for the C files, irqdesc.c is only using THIS_MODULE, so it does not need module.h - give it export.h instead. The C file irq/manage.c is now (as of b687380) using try_module_get and module_put and so it needs module.h (which it already has).

    Also convert the irq_alloc_descs variants to macros, since all they really do is call the __irq_alloc_descs primitive. This avoids including export.h and no debug info is lost.

    Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
| * | | | | | | | bluetooth: macroize two small inlines to avoid module.h  Paul Gortmaker  2011-10-31  1  -11/+13
    These two small inlines make calls to try_module_get() and module_put() which would force us to keep module.h present within yet another common include header. We can avoid this by turning them into macros. The hci_dev_hold construct is patterned off of raw_spin_trylock_irqsave() in spinlock.h

    Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
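    Why a macro lets a header drop an include can be shown with a user-space sketch. Everything here is hypothetical (`struct owner`, `struct hdev`, `owner_try_get()`, `owner_put()`, `hdev_hold()`, `hdev_put()`) and it is not the actual bluetooth change: the point is only that a macro body is compiled where it is used, so the "header" part needs nothing more than a forward declaration.

```c
#include <stddef.h>

/* "Header" part: only needs to know that struct owner exists. */
struct owner;

struct hdev {
    struct owner *owner;
    int refcnt;
};

/* The macros name owner_try_get()/owner_put(), but those only have to be
 * declared at the point of use, not here. Note that the argument is
 * evaluated more than once, so it must not have side effects. */
#define hdev_hold(d) \
    (owner_try_get((d)->owner) ? (++(d)->refcnt, (d)) : NULL)

#define hdev_put(d) \
    do { (d)->refcnt--; owner_put((d)->owner); } while (0)

/* "C file" part: the full definitions, playing the role of the heavy header. */
struct owner {
    int live;   /* non-zero while the owner may be pinned */
    int users;  /* number of successful gets */
};

static int owner_try_get(struct owner *o)
{
    if (!o->live)
        return 0;
    o->users++;
    return 1;
}

static void owner_put(struct owner *o)
{
    o->users--;
}
```

    An inline function in the header would instead need the callee declared before its own body, which is what pulls the larger include into every user of the header.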
| * | | | | | | | ip_vs.h: fix implicit use of module_get/module_put from module.h  Paul Gortmaker  2011-10-31  1  -8/+7
    This file was using the module get/put functions in two simple inline functions. But module_get/put were only within scope because of the implicit presence of module.h being everywhere.

    Rather than add module.h to another file in include/ -- which is exactly the thing we are trying to avoid, simply convert these one-line functions into a define, as per what was done for the device_schedule_callback() in commit 523ded71de0c5e669733.

    Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
| * | | | | | | | nf_conntrack.h: fix up fallout from implicit moduleparam.h presence  Paul Gortmaker  2011-10-31  1  -0/+2
    The implicit presence of module.h everywhere meant that this header also was getting moduleparam.h which defines struct kernel_param. Since it only needs to know that kernel_param is a struct, call that out instead of adding an include of moduleparam.h -- to get rid of this:

    include/net/netfilter/nf_conntrack.h:316: warning: 'struct kernel_param' declared inside parameter list
    include/net/netfilter/nf_conntrack.h:316: warning: its scope is only this definition or declaration, which is probably not what you want

    Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
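    The forward-declaration fix can be illustrated in isolation. The sketch below uses hypothetical names (`struct kparam`, `set_size()`) rather than the real kernel types, and its field layout is invented for the example. Without the forward declaration, the first mention of the struct would be inside the prototype's parameter list and its scope would end with that prototype, which is what the quoted warning complains about.

```c
#include <stdlib.h>

/* Forward declaration: the prototype only needs to know this is a struct. */
struct kparam;

/* Prototype as a header would carry it; a pointer to an incomplete type is fine. */
int set_size(const char *val, struct kparam *kp);

/* Full definition, as the heavier header would provide it elsewhere.
 * The fields are made up for this sketch. */
struct kparam {
    const char *name;
    unsigned int value;
};

/* Parses a decimal string into kp->value; returns 0 on success, -1 on bad input. */
int set_size(const char *val, struct kparam *kp)
{
    char *end;
    unsigned long n = strtoul(val, &end, 10);

    if (end == val || *end != '\0')
        return -1;

    kp->value = (unsigned int)n;
    return 0;
}
```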