summaryrefslogtreecommitdiffstats
path: root/fs/btrfs/ctree.h
Commit message (Collapse)AuthorAgeFilesLines
...
* Btrfs: Add a thread pool just for submit_bioChris Mason2008-09-251-0/+4
| | | | | | | | If a bio submission is after a lock holder waiting for the bio on the work queue, it is possible to deadlock. Move the bios into their own pool. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: split out ioctl.cChristoph Hellwig2008-09-251-1/+8
| | | | | | | | Split the ioctl handling out of inode.c into a file of it's own. Also fix up checkpatch.pl warnings for the moved code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add a mount option to control worker thread pool sizeChris Mason2008-09-251-0/+1
| | | | | | | | | | | | mount -o thread_pool_size changes the default, which is min(num_cpus + 2, 8). Larger thread pools would make more sense on very large disk arrays. This mount option controls the max size of each thread pool. There are multiple thread pools, so the total worker count will be larger than the mount option. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add async worker threads for pre and post IO checksummingChris Mason2008-09-251-3/+11
| | | | | | | | | | | | | | | | | | | | Btrfs has been using workqueues to spread the checksumming load across other CPUs in the system. But, workqueues only schedule work on the same CPU that queued the work, giving them a limited benefit for systems with higher CPU counts. This code adds a generic facility to schedule work with pools of kthreads, and changes the bio submission code to queue bios up. The queueing is important to make sure large numbers of procs on the system don't turn streaming workloads into random workloads by sending IO down concurrently. The end result of all of this is much higher performance (and CPU usage) when doing checksumming on large machines. Two worker pools are created, one for writes and one for endio processing. The two could deadlock if we tried to service both from a single pool. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* btrfs: sanity mount option parsing and early mount codeChristoph Hellwig2008-09-251-2/+1
| | | | | | | Also adds lots of comments to describe what's going on here. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: transaction ioctlsSage Weil2008-09-251-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These ioctls let a user application hold a transaction open while it performs a series of operations. A final ioctl does a sync on the fs (closing the current transaction). This is the main requirement for Ceph's OSD to be able to keep the data it's storing in a btrfs volume consistent, and AFAICS it works just fine. The application would do something like fd = ::open("some/file", O_RDONLY); ::ioctl(fd, BTRFS_IOC_TRANS_START); /* do a bunch of stuff */ ::ioctl(fd, BTRFS_IOC_TRANS_END); or just ::close(fd); And to ensure it commits to disk, ::ioctl(fd, BTRFS_IOC_SYNC); When a transaction is held open, the trans_handle is attached to the struct file (via private_data) so that it will get cleaned up if the process dies unexpectedly. A held transaction is also ended on fsync() to avoid a deadlock. A misbehaving application could also deliberately hold a transaction open, effectively locking up the FS, so it may make sense to restrict something like this to root or something. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Invalidate dcache entry after creating snapshot andSven Wegener2008-09-251-0/+3
| | | | | | | | | | | | | | We need to invalidate an existing dcache entry after creating a new snapshot or subvolume, because a negative dache entry will stop us from accessing the new snapshot or subvolume. --- ctree.h | 23 +++++++++++++++++++++++ inode.c | 4 ++++ transaction.c | 4 ++++ 3 files changed, 31 insertions(+) Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Allocator fix variety packChris Mason2008-09-251-0/+2
| | | | | | | | | | | | | | * Force chunk allocation when find_free_extent has to do a full scan * Record the max key at the start of defrag so it doesn't run forever * Block groups might not be contiguous, make a forward search for the next block group in extent-tree.c * Get rid of extra checks for total fs size * Fix relocate_one_reference to avoid relocating the same file data block twice when referenced by an older transaction * Use the open device count when allocating chunks so that we don't try to allocate from devices that don't exist Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Change the congestion functions to meter the number of async submits ↵Chris Mason2008-09-251-0/+1
| | | | | | | | | as well The async submit workqueue was absorbing too many requests, leading to long stalls where the async submitters were stalling. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add mount -o degraded to allow mounts to continue with missing devicesChris Mason2008-09-251-0/+3
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Update nodatacow mode to support cloned single files and resizingChris Mason2008-09-251-1/+1
| | | | | | | | | | | Before, nodatacow only checked to make sure multiple roots didn't have references on a single extent. This check makes sure that multiple inodes don't have references. nodatacow needed an extra check to see if the block group was currently readonly. This way cows forced by the chunk relocation code are honored. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Properly find the root for snapshotted blocks during chunk relocationChris Mason2008-09-251-0/+2
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add support for online device removalChris Mason2008-09-251-1/+2
| | | | | | | | | | | | | This required a few structural changes to the code that manages bdev pointers: The VFS super block now gets an anon-bdev instead of a pointer to the lowest bdev. This allows us to avoid swapping the super block bdev pointer around at run time. The code to read in the super block no longer goes through the extent buffer interface. Things got ugly keeping the mapping constant. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Clone file data ioctlSage Weil2008-09-251-2/+2
| | | | | | Add a new ioctl to clone file data Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add balance ioctl to restripe the chunksChris Mason2008-09-251-1/+1
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add new ioctl to add devicesChris Mason2008-09-251-0/+2
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Make the resizer work based on shrinking and growing devicesChris Mason2008-09-251-0/+1
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add support for labels in the super blockChris Mason2008-09-251-0/+2
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Check device uuids along with devidsChris Mason2008-09-251-0/+5
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Write bio checksumming outside the FS mutexChris Mason2008-09-251-1/+3
| | | | | | | This significantly improves streaming write performance by allowing concurrency in the data checksumming. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Create a work queue for bio writesChris Mason2008-09-251-0/+3
| | | | | | | This allows checksumming to happen in parallel among many cpus, and keeps us from bogging down pdflush with the checksumming code. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add RAID10 supportChris Mason2008-09-251-0/+7
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add chunk uuids and update multi-device back referencesChris Mason2008-09-251-13/+67
| | | | | | | | | | | | | | | | | | | | Block headers now store the chunk tree uuid Chunk items records the device uuid for each stripes Device extent items record better back refs to the chunk tree Block groups record better back refs to the chunk tree The chunk tree format has also changed. The objectid of BTRFS_CHUNK_ITEM_KEY used to be the logical offset of the chunk. Now it is a chunk tree id, with the logical offset being stored in the offset field of the key. This allows a single chunk tree to record multiple logical address spaces, upping the number of bytes indexed by a chunk tree from 2^64 to 2^128. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Add a min size parameter to btrfs_alloc_extentChris Mason2008-09-251-1/+2
| | | | | | | | | On huge machines, delayed allocation may try to allocate massive extents. This change allows btrfs_alloc_extent to return something smaller than the caller asked for, and the data allocation routines will loop over the allocations until it fills the whole delayed alloc. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Do metadata checksums for reads via a workqueueChris Mason2008-09-251-0/+4
| | | | | | | | | | | | | | | | Before, metadata checksumming was done by the callers of read_tree_block, which would set EXTENT_CSUM bits in the extent tree to show that a given range of pages was already checksummed and didn't need to be verified again. But, those bits could go away via try_to_releasepage, and the end result was bogus checksum failures on pages that never left the cache. The new code validates checksums when the page is read. It is a little tricky because metadata blocks can span pages and a single read may end up going via multiple bios. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Fix allocation profile initChris Mason2008-09-251-6/+7
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add support for duplicate blocks on a single spindleChris Mason2008-09-251-0/+1
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add support for mirroring across drivesChris Mason2008-09-251-2/+7
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Reorder the flags field in struct btrfs_header and record a flag on writeoutChris Mason2008-09-251-3/+25
| | | | | | | | This allows detection of blocks that have already been written in the running transaction so they can be recowed instead of modified again. It is step one in trusting the transid field of the block pointers. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Create a btrfs backing dev infoChris Mason2008-09-251-0/+2
| | | | | | This allows intelligent versions of unplug and congestion functions Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Implement raid0 when multiple devices are presentChris Mason2008-09-251-0/+3
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add support for device scanning and detection ioctlsChris Mason2008-09-251-2/+19
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Bring back mount -o ssd optimizationsChris Mason2008-09-251-0/+3
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Move device information into the super block so it can be scannedChris Mason2008-09-251-19/+2
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Make the FS tree the last objectid in the tree of tree rootsChris Mason2008-09-251-9/+8
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Dynamic chunk and block group allocationChris Mason2008-09-251-1/+11
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add support for multiple devices per filesystemChris Mason2008-09-251-27/+286
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: checksum file data at bio submission time instead of during writepageChris Mason2008-09-251-5/+3
| | | | | | | | | | | | | | | | When we checkum file data during writepage, the checksumming is done one page at a time, making it difficult to do bulk metadata modifications to insert checksums for large ranges of the file at once. This patch changes btrfs to checksum on a per-bio basis instead. The bios are checksummed before they are handed off to the block layer, so each bio is contiguous and only has pages from the same inode. Checksumming on a bio basis allows us to insert and modify the file checksum items in large groups. It also allows the checksumming to be done more easily by async worker threads. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: unaligned access fixesDavid Miller2008-09-251-11/+6
| | | | | | | | | | | | | Btrfs set/get macros lose type information needed to avoid unaligned accesses on sparc64. ere is a patch for the kernel bits which fixes most of the unaligned accesses on sparc64. btrfs_name_hash is modified to return the hash value instead of getting a return location via a (potentially unaligned) pointer. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Fix i_blocks accountingChris Mason2008-09-251-0/+9
| | | | | | | | | | Now that delayed allocation accounting works, i_blocks accounting is changed to only modify i_blocks when extents inserted or removed. The fillattr call is changed to include the delayed allocation byte count in the i_blocks result. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Update magicChris Mason2008-09-251-1/+1
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add data block hints to SSD mode tooChris Mason2008-09-251-0/+1
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: mount -o max_inline=size to control the maximum inline extent sizeChris Mason2008-09-251-0/+1
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add inode item and backref in one insert, reducing cpu usageChris Mason2008-09-251-3/+14
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: During deletes and truncate, remove many items at once from the treeChris Mason2008-09-251-2/+10
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Split the extent_map code into two partsChris Mason2008-09-251-7/+8
| | | | | | | | | | | | | | There is now extent_map for mapping offsets in the file to disk and extent_io for state tracking, IO submission and extent_bufers. The new extent_map code shifts from [start,end] pairs to [start,len], and pushes the locking out into the caller. This allows a few performance optimizations and is easier to use. A number of extent_map usage bugs were fixed, mostly with failing to remove extent_map entries when changing the file. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Fix hole insertion corner casesChris Mason2008-09-251-0/+1
| | | | | | | | | | There were a few places that could cause duplicate extent insertion, this adjusts the code that creates holes to avoid it. lookup_extent_map is changed to correctly return all of the extents in a range, even when there are none matching at the start of the range. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add mount -o ssd, which includes optimizations for seek free storageChris Mason2008-09-251-0/+2
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Run igrab on data=ordered inodes to prevent deadlocks during writeoutChris Mason2008-09-251-1/+1
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Rework btrfs_drop_inode to avoid schedulingChris Mason2008-09-251-0/+1
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
OpenPOWER on IntegriCloud