blackbird-op-linux - Blackbird™ Linux sources for OpenPOWER

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge branch 'for-linus' of ↵	Linus Torvalds	2015-09-05	18	-1074/+865
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "In this one: - d_move fixes (Eric Biederman) - UFS fixes (me; locking is mostly sane now, a bunch of bugs in error handling ought to be fixed) - switch of sb_writers to percpu rwsem (Oleg Nesterov) - superblock scalability (Josef Bacik and Dave Chinner) - swapon(2) race fix (Hugh Dickins)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (65 commits) vfs: Test for and handle paths that are unreachable from their mnt_root dcache: Reduce the scope of i_lock in d_splice_alias dcache: Handle escaped paths in prepend_path mm: fix potential data race in SyS_swapon inode: don't softlockup when evicting inodes inode: rename i_wb_list to i_io_list sync: serialise per-superblock sync operations inode: convert inode_sb_list_lock to per-sb inode: add hlist_fake to avoid the inode hash lock in evict writeback: plug writeback at a high level change sb_writers to use percpu_rw_semaphore shift percpu_counter_destroy() into destroy_super_work() percpu-rwsem: kill CONFIG_PERCPU_RWSEM percpu-rwsem: introduce percpu_rwsem_release() and percpu_rwsem_acquire() percpu-rwsem: introduce percpu_down_read_trylock() document rwsem_release() in sb_wait_write() fix the broken lockdep logic in __sb_start_write() introduce __sb_writers_{acquired,release}() helpers ufs_inode_get{frag,block}(): get rid of 'phys' argument ufs_getfrag_block(): tidy up a bit ...
\| *	vfs: Test for and handle paths that are unreachable from their mnt_root	Eric W. Biederman	2015-08-21	1	-2/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In rare cases a directory can be renamed out from under a bind mount. In those cases without special handling it becomes possible to walk up the directory tree to the root dentry of the filesystem and down from the root dentry to every other file or directory on the filesystem. Like division by zero .. from an unconnected path can not be given a useful semantic as there is no predicting at which path component the code will realize it is unconnected. We certainly can not match the current behavior as the current behavior is a security hole. Therefore when encounting .. when following an unconnected path return -ENOENT. - Add a function path_connected to verify path->dentry is reachable from path->mnt.mnt_root. AKA to validate that rename did not do something nasty to the bind mount. To avoid races path_connected must be called after following a path component to it's next path component. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| *	dcache: Reduce the scope of i_lock in d_splice_alias	Eric W. Biederman	2015-08-21	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	i_lock is only needed until __d_find_any_alias calls dget on the alias dentry. After that the reference to new ensures that dentry_kill and d_delete will not remove the inode from the dentry, and remove the dentry from the inode->d_entry list. The inode i_lock came to be held over the the __d_move calls in d_splice_alias through a series of introduction of locks with increasing smaller scope. First it was the dcache_lock, then it was the dcache_inode_lock, and finally inode->i_lock. Furthermore inode->i_lock is not held over any other calls to d_move or __d_move so it can not provide any meaningful rename protection. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| *	dcache: Handle escaped paths in prepend_path	Eric W. Biederman	2015-08-21	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A rename can result in a dentry that by walking up d_parent will never reach it's mnt_root. For lack of a better term I call this an escaped path. prepend_path is called by four different functions __d_path, d_absolute_path, d_path, and getcwd. __d_path only wants to see paths are connected to the root it passes in. So __d_path needs prepend_path to return an error. d_absolute_path similarly wants to see paths that are connected to some root. Escaped paths are not connected to any mnt_root so d_absolute_path needs prepend_path to return an error greater than 1. So escaped paths will be treated like paths on lazily unmounted mounts. getcwd needs to prepend "(unreachable)" so getcwd also needs prepend_path to return an error. d_path is the interesting hold out. d_path just wants to print something, and does not care about the weird cases. Which raises the question what should be printed? Given that <escaped_path>/<anything> should result in -ENOENT I believe it is desirable for escaped paths to be printed as empty paths. As there are not really any meaninful path components when considered from the perspective of a mount tree. So tweak prepend_path to return an empty path with an new error code of 3 when it encounters an escaped path. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| *	Merge branch 'superblock-scaling' of ↵	Al Viro	2015-08-21	8	-79/+108
\| \|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next into for-next Conflicts: include/linux/fs.h
\| \| *	inode: don't softlockup when evicting inodes	Josef Bacik	2015-08-18	1	-0/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On a box with a lot of ram (148gb) I can make the box softlockup after running an fs_mark job that creates hundreds of millions of empty files. This is because we never generate enough memory pressure to keep the number of inodes on our unused list low, so when we go to unmount we have to evict ~100 million inodes. This makes one processor a very unhappy person, so add a cond_resched() in dispose_list() and if we need a resched when processing the s_inodes list do that and run dispose_list() on what we've currently culled. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Jan Kara <jack@suse.cz>
\| \| *	inode: rename i_wb_list to i_io_list	Dave Chinner	2015-08-17	3	-28/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There's a small consistency problem between the inode and writeback naming. Writeback calls the "for IO" inode queues b_io and b_more_io, but the inode calls these the "writeback list" or i_wb_list. This makes it hard to an new "under writeback" list to the inode, or call it an "under IO" list on the bdi because either way we'll have writeback on IO and IO on writeback and it'll just be confusing. I'm getting confused just writing this! So, rename the inode "for IO" list variable to i_io_list so we can add a new "writeback list" in a subsequent patch. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Dave Chinner <dchinner@redhat.com>
\| \| *	sync: serialise per-superblock sync operations	Dave Chinner	2015-08-17	2	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When competing sync(2) calls walk the same filesystem, they need to walk the list of inodes on the superblock to find all the inodes that we need to wait for IO completion on. However, when multiple wait_sb_inodes() calls do this at the same time, they contend on the the inode_sb_list_lock and the contention causes system wide slowdowns. In effect, concurrent sync(2) calls can take longer and burn more CPU than if they were serialised. Stop the worst of the contention by adding a per-sb mutex to wrap around wait_sb_inodes() so that we only execute one sync(2) IO completion walk per superblock superblock at a time and hence avoid contention being triggered by concurrent sync(2) calls. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Dave Chinner <dchinner@redhat.com>
\| \| *	inode: convert inode_sb_list_lock to per-sb	Dave Chinner	2015-08-17	8	-51/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The process of reducing contention on per-superblock inode lists starts with moving the locking to match the per-superblock inode list. This takes the global lock out of the picture and reduces the contention problems to within a single filesystem. This doesn't get rid of contention as the locks still have global CPU scope, but it does isolate operations on different superblocks form each other. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Dave Chinner <dchinner@redhat.com>
\| \| *	writeback: plug writeback at a high level	Dave Chinner	2015-08-17	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Doing writeback on lots of little files causes terrible IOPS storms because of the per-mapping writeback plugging we do. This essentially causes imeediate dispatch of IO for each mapping, regardless of the context in which writeback is occurring. IOWs, running a concurrent write-lots-of-small 4k files using fsmark on XFS results in a huge number of IOPS being issued for data writes. Metadata writes are sorted and plugged at a high level by XFS, so aggregate nicely into large IOs. However, data writeback IOs are dispatched in individual 4k IOs, even when the blocks of two consecutively written files are adjacent. Test VM: 8p, 8GB RAM, 4xSSD in RAID0, 100TB sparse XFS filesystem, metadata CRCs enabled. Kernel: 3.10-rc5 + xfsdev + my 3.11 xfs queue (~70 patches) Test: $ ./fs_mark -D 10000 -S0 -n 10000 -s 4096 -L 120 -d /mnt/scratch/0 -d /mnt/scratch/1 -d /mnt/scratch/2 -d /mnt/scratch/3 -d /mnt/scratch/4 -d /mnt/scratch/5 -d /mnt/scratch/6 -d /mnt/scratch/7 Result: wall sys create rate Physical write IO time CPU (avg files/s) IOPS Bandwidth ----- ----- ------------ ------ --------- unpatched 6m56s 15m47s 24,000+/-500 26,000 130MB/s patched 5m06s 13m28s 32,800+/-600 1,500 180MB/s improvement -26.44% -14.68% +36.67% -94.23% +38.46% If I use zero length files, this workload at about 500 IOPS, so plugging drops the data IOs from roughly 25,500/s to 1000/s. 3 lines of code, 35% better throughput for 15% less CPU. The benefits of plugging at this layer are likely to be higher for spinning media as the IO patterns for this workload are going make a much bigger difference on high IO latency devices..... Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Jan Kara <jack@suse.cz> Tested-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
\| * \|	Merge branch 'ufs' into for-next	Al Viro	2015-08-18	6	-882/+644
\| \|\ \
\| \| * \|	ufs_inode_get{frag,block}(): get rid of 'phys' argument	Al Viro	2015-07-06	1	-15/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Just pass NULL as locked_page in case of first block in the indirect chain. Old calling conventions aside, a reason for having 'phys' was that ufs_inode_getfrag() used to be able to do _two_ allocations - indirect block and extending/reallocating a tail. We needed locked_page for the latter (it's a data), but we also needed to figure out that indirect block is metadata. So we used to pass non-NULL locked_page in all cases and used NULL phys as indication of being asked to allocate an indirect. With tail unpacking taken into a separate function we don't need those convolutions anymore. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	ufs_getfrag_block(): tidy up a bit	Al Viro	2015-07-06	1	-33/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	ufs_inode_getblock(): failure to read an indirect block is -EIO	Al Viro	2015-07-06	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	... and not "write to beginning of the disk", TYVM... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	ufs_getfrag_block(): turn following indirects into a loop	Al Viro	2015-07-06	1	-24/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	ufs_inode_getfrag(): pass index instead of 'fragment'	Al Viro	2015-07-06	1	-33/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	same story as with ufs_inode_getblock() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	ufs_inode_getfrag(): split extending the partial blocks off	Al Viro	2015-07-06	1	-63/+65
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ufs_extend_tail() is handling that now. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	ufs_inode_getblock(): pass indirect block number and full index	Al Viro	2015-07-06	1	-46/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	... instead of messing with buffer_head. We can bloody well do sb_bread() in there. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	ufs_inode_getblock(): pass index instead of 'fragment'	Al Viro	2015-07-06	1	-19/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The value passed to ufs_inode_getblock() as the 3rd argument had lower bits ignored; the upper bits were shifted down and used and they actually make sense - those are _lower_ bits of index in indirect block (i.e. they form the index within a fragment within an indirect block). Pass those as argument. Upper bits of index (i.e. the number of fragment within indirect block) will join them shortly. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	ufs_inode_get{frag,block}(): leave sb_getblk() to caller	Al Viro	2015-07-06	1	-33/+55
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	just return the damn block number Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	ufs_getfrag_block(): get rid of macro jungles	Al Viro	2015-07-06	1	-29/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	ufs_inode_get{frag,block}(): consolidate success exits	Al Viro	2015-07-06	1	-28/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	These calling conventions are rudiments of pre-2.3 times; they really need to be sanitized. This is the first step; next will be _always_ returning a block number, instead of this "return a pointer to buffer_head, except when we get to the actual data" crap. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	ufs: use the branch depth in ufs_getfrag_block()	Al Viro	2015-07-06	1	-6/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	we'd already calculated it... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	ufs: move calculation of offsets into ufs_getfrag_block()	Al Viro	2015-07-06	1	-8/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	... and massage ufs_frag_map() to take those instead of fragment number. As it is, we duplicate the damn thing on the write side, open-coded and bloody hard to follow. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	ufs_inode_get{frag,block}(): get rid of retries	Al Viro	2015-07-06	1	-35/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We are holding ->truncate_mutex, so nobody else can alter our block pointers. Rechecks/retries were needed back when we only held BKL there, and had to cope with write_begin/writepage and writepage/truncate races. Can't happen anymore... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	__ufs_truncate_blocks(): avoid excessive dirtying of indirect blocks	Al Viro	2015-07-06	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There's a case when an indirect block gets dirtied for no good reason - when there's a hole starting in the middle of area covered by it and spanning past its end, and truncate() is done precisely to the beginning of the hole. The block is obviously not modified at all - all removals happen beyond it. However, existing code ends up dirtying it just in case. It's trivial to fix and while it's not a real bug by any stretch of imagination, it makes the damn thing harder to follow. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	free_full_branch(): don't bother modifying the block we are going to free	Al Viro	2015-07-06	1	-12/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Note that it's already made unreachable from the inode, so we don't have to worry about ufs_frag_map() walking into something already freed. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	move marking inode dirty to the end of __ufs_truncate_blocks()	Al Viro	2015-07-06	1	-6/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	free_full_branch(): saner calling conventions	Al Viro	2015-07-06	1	-49/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Have caller fetch the block number and remove it from wherever it was. Pass the block number instead. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	ufs_trunc_branch(): kill recursion	Al Viro	2015-07-06	1	-26/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	turn recursion into a pair of loops Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	ufs_trunc_branch(): massage towards killing recursion	Al Viro	2015-07-06	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We always have 0 < depth2 <= depth in there, so if (--depth) { if (--depth2) A B } else { C // not using depth2 } D // not using depth2 is equivalent to if (--depth2) A with s/depth/depth - 1/ if (--depth) B else C D Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	split ufs_truncate_branch() into full- and partial-branch variants	Al Viro	2015-07-06	1	-16/+58
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	ufs: unify the logics for collecting adjacent data blocks to free	Al Viro	2015-07-06	1	-34/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	open-coded in several places... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	ufs_trunc_branch(): separate the calls with non-NULL offsets	Al Viro	2015-07-06	1	-4/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	ufs_trunc_branch(): never call with offsets != NULL && depth2 == 0	Al Viro	2015-07-06	1	-3/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For calls in __ufs_truncate_blocks() it's just a matter of not incrementing offsets[0] and not making that call - immediately following loop will be executed one extra time and we'll be just fine. For recursive call in ufs_trunc_branch() itself, just assing NULL to offsets if we would be about to make such call. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	__ufs_trunc_blocks(): turn the part after switch into a loop	Al Viro	2015-07-06	1	-25/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	... and turn the switch into if (), since all cases with depth != 1 have just become identical. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	__ufs_truncate_blocks(): unify freeing the full branches	Al Viro	2015-07-06	1	-15/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	unify ufs_trunc_..indirect()	Al Viro	2015-07-06	1	-138/+60
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	ufs_trunc_..indirect(): more massage towards unifying	Al Viro	2015-07-06	1	-17/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead of manually checking that the array contains only zeroes, find the position of the last non-zero (in __ufs_truncate(), where we can conveniently do that) and use that to tell if there's any non-zero in the array tail passed to ufs_trunc_...indirect(). The goal of all that clumsiness is to get fold these functions together. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	ufs_trunc_...indirect(): pass the array of indices instead of offsets	Al Viro	2015-07-06	1	-28/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	rather than bitslicing the offset just formed as sum of shifted indices, pass the array of those indices itself. NULL is used as equivalent of "all zeroes" (== free the entire branch). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	__ufs_truncate(); find cutoff distances into branches by offsets[] array	Al Viro	2015-07-06	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	ufs_trunc_dindirect(): pass the number of blocks to keep	Al Viro	2015-07-06	1	-31/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	same as the previous two. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	ufs_trunc_indirect(): pass the index of the first pointer to free	Al Viro	2015-07-06	1	-33/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	... instead of file offset. Same cleanups as in the tindirect conversion in previous commit. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	ufs_trunc_tindirect(): pass the number of blocks to keep	Al Viro	2015-07-06	1	-17/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	IOW, the distance of cutoff from the begining of the branch (in blocks). That (and the fact that block just prior to cutoff is guaranteed to be present) allows to tell whether to free triple indirect block just by looking at the offset. While we are at it, using u64 for index in the block is wrong - those should be unsigned int. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	ufs: beginning of __ufs_truncate_block() massage	Al Viro	2015-07-06	1	-4/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use ufs_block_to_path() to find the cutoff path in the block pointers' tree. For now just use the information about the depth (to bypass the fully preserved subtrees); subsequent commits will use the information about actual path. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	ufs: the offsets ufs_block_to_path() puts into array are not sector_t	Al Viro	2015-07-06	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	type makes no sense - those are indices in block number arrays, not block numbers. And no, UFS is not likely to grow indirect blocks with 4Gpointers in them... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	ufs: move truncate code into inode.c	Al Viro	2015-07-06	4	-533/+470
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It is closely tied to block pointers handling there, can benefit from existing helpers, etc. - no point keeping them apart. Trimmed the trailing whitespaces in inode.c at the same time. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	ufs: no retries are needed on truncate	Al Viro	2015-07-06	1	-40/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	ufs: ufs_trunc_...() has exclusion with everything that might cause allocations	Al Viro	2015-07-06	1	-12/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently - on lock_ufs(), eventually - on per-inode mutex. lock_ufs() used to be mere BKL, which is much weaker, so it needed those rechecks. BKL doesn't provide any exclusion once we lose CPU; its blind replacement, OTOH, _does_. Making that per-filesystem was an atrocity, but at least we can simplify life here. And yes, we certainly need to make that sucker per-inode - these days inode.c and truncate.c uses are needed only to protect the block pointers. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| \| * \|	ufs: ufs_trunc_direct() always returns 0	Al Viro	2015-07-06	1	-6/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	make it return void Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>