talos-obmc-linux - Talos™ II Linux sources for OpenBMC

	Commit message	Author	Age	Files	Lines
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm	Linus Torvalds	2008-07-21	9	-47/+262
\|\	* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm:
\|	dm crypt: add merge
\|	dm table: remove merge_bvec sector restriction
\|	dm: linear add merge
\|	dm: introduce merge_bvec_fn
\|	dm snapshot: use per device mempools
\|	dm snapshot: fix race during exception creation
\|	dm snapshot: track snapshot reads
\|	dm mpath: fix test for reinstate_path
\|	dm mpath: return parameter error
\|	dm io: remove struct padding
\|	dm log: make dm_dirty_log init and exit static
\|	dm mpath: free path selector on invalid args
\| *	dm crypt: add merge	Milan Broz	2008-07-21	1	-1/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch implements biovec merge function for crypt target. If the underlying device has merge function defined, call it. If not, keep precomputed value. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
\| *	dm table: remove merge_bvec sector restriction	Milan Broz	2008-07-21	1	-7/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove max_sector restriction - merge function replaced it. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
\| *	dm: linear add merge	Milan Broz	2008-07-21	1	-5/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch implements biovec merge function for linear target. If the underlying device has merge function defined, call it. If not, keep precomputed value. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
\| *	dm: introduce merge_bvec_fn	Milan Broz	2008-07-21	1	-0/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Introduce a bvec merge function for device mapper devices for dynamic size restrictions. This code ensures the requested biovec lies within a single target and then calls a target-specific function to check against any constraints imposed by underlying devices. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
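The two-step check described above (stay inside one target, then ask the target) can be sketched in user-space C. This is only an illustration under stated assumptions: `struct toy_target`, `toy_merge_bvec` and `toy_limit_4k` are hypothetical names invented here, not the kernel's device-mapper interface, and sizes are plain byte counts.

```c
#include <stddef.h>

/* Hypothetical stand-in for a device-mapper target that covers
 * sectors [begin, begin + len). Not the kernel's struct dm_target. */
struct toy_target {
    unsigned long begin;   /* first sector of the target */
    unsigned long len;     /* length in 512-byte sectors */
    /* Optional callback: given the limit computed so far, return how
     * many bytes the underlying device is willing to accept. */
    int (*merge)(const struct toy_target *t, unsigned long sector,
                 int max_bytes);
};

/* Upper bound, in bytes, for a request starting at `sector` so that it
 * stays inside `t` and respects t->merge when one is defined. */
static int toy_merge_bvec(const struct toy_target *t, unsigned long sector,
                          int precomputed_max_bytes)
{
    unsigned long bytes_left;
    int max_bytes = precomputed_max_bytes;

    if (sector < t->begin || sector >= t->begin + t->len)
        return 0;                       /* sector is outside this target */

    /* Step 1: the request must not cross the end of the target. */
    bytes_left = (t->begin + t->len - sector) * 512UL;
    if (bytes_left < (unsigned long)max_bytes)
        max_bytes = (int)bytes_left;

    /* Step 2: defer to the target's merge function if it has one;
     * otherwise keep the value computed so far. */
    if (t->merge != NULL)
        max_bytes = t->merge(t, sector, max_bytes);

    return max_bytes;
}

/* Example callback: an underlying device that accepts at most 4 KiB. */
static int toy_limit_4k(const struct toy_target *t, unsigned long sector,
                        int max_bytes)
{
    (void)t;
    (void)sector;
    return max_bytes < 4096 ? max_bytes : 4096;
}
```

A target without a callback simply keeps the precomputed limit, which mirrors the "If not, keep precomputed value" wording of the linear and crypt patches.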
\| *	dm snapshot: use per device mempools	Mikulas Patocka	2008-07-21	2	-18/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change snapshot per-module mempool to per-device mempool. Per-module mempools could cause a deadlock if multiple snapshot devices are stacked above each other. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
\| *	dm snapshot: fix race during exception creation	Mikulas Patocka	2008-07-21	1	-0/+28
\|	Fix a race condition that returns incorrect data when a write causes an exception to be allocated whilst a read is still in flight.
\|	The race condition happens as follows:
\|	* A read to non-reallocated sector in the snapshot is submitted so that the read is routed to the original device.
\|	* A write to the original device is submitted. The write causes an exception that reallocates the block. The write proceeds.
\|	* The original read is dequeued and reads the wrong data.
\|	This race can be triggered with CFQ scheduler and one thread writing and multiple threads reading simultaneously.
\|	(This patch relies upon the earlier dm-kcopyd-per-device.patch to avoid a deadlock.)
\|	Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
\|	Signed-off-by: Alasdair G Kergon <agk@redhat.com>
\| *	dm snapshot: track snapshot reads	Mikulas Patocka	2008-07-21	2	-10/+106
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Whenever a snapshot read gets mapped through to the origin, track it in a per-snapshot hash table indexed by chunk number, using memory allocated from a new per-snapshot mempool. We need to track these reads to avoid race conditions which will be fixed by patches that follow. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
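A hash table of in-flight reads indexed by chunk number, as described above, might look roughly like the following user-space sketch. All names (`toy_snapshot`, `toy_track_chunk`, and so on) and the table size are assumptions made for illustration; `malloc` stands in for the per-snapshot mempool, and the locking a real implementation needs is left out.

```c
#include <stdlib.h>

#define TOY_HASH_SIZE 16u   /* hypothetical size; must be a power of two */

/* One in-flight read that was mapped through to the origin. */
struct toy_tracked_chunk {
    unsigned long chunk;
    struct toy_tracked_chunk *next;
};

/* Per-snapshot table of tracked reads. */
struct toy_snapshot {
    struct toy_tracked_chunk *buckets[TOY_HASH_SIZE];
};

static unsigned toy_hash(unsigned long chunk)
{
    return (unsigned)(chunk & (TOY_HASH_SIZE - 1u));
}

/* Start tracking a read of `chunk`; returns NULL if allocation fails. */
static struct toy_tracked_chunk *toy_track_chunk(struct toy_snapshot *s,
                                                 unsigned long chunk)
{
    struct toy_tracked_chunk *c = malloc(sizeof(*c));

    if (c == NULL)
        return NULL;
    c->chunk = chunk;
    c->next = s->buckets[toy_hash(chunk)];
    s->buckets[toy_hash(chunk)] = c;
    return c;
}

/* Stop tracking once the read has completed. */
static void toy_stop_tracking_chunk(struct toy_snapshot *s,
                                    struct toy_tracked_chunk *c)
{
    struct toy_tracked_chunk **pp = &s->buckets[toy_hash(c->chunk)];

    while (*pp != NULL && *pp != c)
        pp = &(*pp)->next;
    if (*pp == c) {
        *pp = c->next;
        free(c);
    }
}

/* A writer would consult this before reallocating `chunk`. */
static int toy_chunk_is_tracked(const struct toy_snapshot *s,
                                unsigned long chunk)
{
    const struct toy_tracked_chunk *c;

    for (c = s->buckets[toy_hash(chunk)]; c != NULL; c = c->next)
        if (c->chunk == chunk)
            return 1;
    return 0;
}
```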
\| *	dm mpath: fix test for reinstate_path	Alasdair G Kergon	2008-07-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix test for reinstate_path method before attempting to use it. Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: Julia Lawall <julia@diku.dk>
\| *	dm mpath: return parameter error	Mikulas Patocka	2008-07-21	1	-1/+3
\|	Return a specific error message if there is an invalid number of multipath arguments.
\|	This invalid command returns an "Unknown error" because the ti->error field is not set:
\|	dmsetup create --table '0 2 multipath 0 0 1 1 round-robin 0 1 1 /dev/sdh' mpath0
\|	Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
\|	Signed-off-by: Alasdair G Kergon <agk@redhat.com>
\| *	dm io: remove struct padding	Richard Kennedy	2008-07-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Rearrange struct dm_io. Shrinks size from 40 -> 32 allowing more objects/slab. Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
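The effect of reordering members can be seen with a small, made-up example. The structs below are hypothetical and are not the real `struct dm_io`; they only illustrate how grouping members by alignment removes padding. On a typical 64-bit Linux ABI the first layout is expected to be larger than the second, but exact sizes depend on the platform, so the sketch only reports the difference.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with small members wedged between large ones:
 * the compiler may insert padding after `flags` and after `count`. */
struct toy_io_padded {
    uint32_t flags;
    uint64_t sector;
    uint32_t count;
    void *private_data;
};

/* Same members, largest alignment first: the two 32-bit fields can
 * share one slot, so less (or no) padding is needed. */
struct toy_io_packed {
    uint64_t sector;
    void *private_data;
    uint32_t flags;
    uint32_t count;
};

/* Bytes saved per object by the reordering on the current platform. */
static size_t toy_padding_saved(void)
{
    return sizeof(struct toy_io_padded) - sizeof(struct toy_io_packed);
}
```

Smaller objects matter for slab-allocated structures because more of them fit in each slab page, which is the benefit the commit message cites.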
\| *	dm log: make dm_dirty_log init and exit static	Adrian Bunk	2008-07-21	2	-8/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	dm_dirty_log_{init,exit}() can now become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
\| *	dm mpath: free path selector on invalid args	Mikulas Patocka	2008-07-21	1	-1/+3
\|	Free path selector if the arguments are invalid.
\|	This command (note that it is invalid) causes a reference leak on module "dm_round_robin" and prevents the module from being removed:
\|	dmsetup create --table '0 2 multipath 0 0 1 1 round-robin /dev/sdh' mpath0
\|	Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
\|	Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* \|	Merge branch 'for-linus' of git://neil.brown.name/md	Linus Torvalds	2008-07-21	9	-761/+752
\|\	* 'for-linus' of git://neil.brown.name/md: (52 commits)
\|	md: Protect access to mddev->disks list using RCU
\|	md: only count actual openers as access which prevent a 'stop'
\|	md: linear: Make array_size sector-based and rename it to array_sectors.
\|	md: Make mddev->array_size sector-based.
\|	md: Make super_type->rdev_size_change() take sector-based sizes.
\|	md: Fix check for overlapping devices.
\|	md: Tidy up rdev_size_store a bit:
\|	md: Remove some unused macros.
\|	md: Turn rdev->sb_offset into a sector-based quantity.
\|	md: Make calc_dev_sboffset() return a sector count.
\|	md: Replace calc_dev_size() by calc_num_sectors().
\|	md: Make update_size() take the number of sectors.
\|	md: Better control of when do_md_stop is allowed to stop the array.
\|	md: get_disk_info(): Don't convert between signed and unsigned and back.
\|	md: Simplify restart_array().
\|	md: alloc_disk_sb(): Return proper error value.
\|	md: Simplify sb_equal().
\|	md: Simplify uuid_equal().
\|	md: sb_equal(): Fix misleading printk.
\|	md: Fix a typo in the comment to cmd_match().
\|	...
\| *	md: Protect access to mddev->disks list using RCU	NeilBrown	2008-07-21	2	-17/+28
\|	All modifications and most access to the mddev->disks list are made under the reconfig_mutex lock. However there are three places where the list is walked without any locking. If a reconfig happens at this time, havoc (and oops) can ensue.
\|	So use RCU to protect these accesses:
\|	- wrap them in rcu_read_{,un}lock()
\|	- use list_for_each_entry_rcu
\|	- add to the list with list_add_rcu
\|	- delete from the list with list_del_rcu
\|	- delay the 'free' with call_rcu rather than schedule_work
\|	Note that export_rdev did a list_del_init on this list. In almost all cases the entry was not in the list anymore so it was a no-op and so safe. It is no longer safe as after list_del_rcu we may not touch the list_head.
\|	An audit shows that export_rdev is called:
\|	- after unbind_rdev_from_array, in which case the delete has already been done,
\|	- after bind_rdev_to_array fails, in which case the delete isn't needed,
\|	- before the device has been put on a list at all (e.g. in add_new_disk where reading the superblock fails),
\|	- and in autorun devices after a failure when the device is on a different list.
\|	So remove the list_del_init call from export_rdev, and add it back immediately before the call to export_rdev for that last case.
\|	Note also that ->same_set is sometimes used for lists other than mddev->list (e.g. candidates). In these cases rcu is not needed.
\|	Signed-off-by: NeilBrown <neilb@suse.de>
\| *	md: only count actual openers as access which prevent a 'stop'	NeilBrown	2008-07-21	1	-3/+6
\|	Open isn't the only thing that increments ->active. e.g. reading /proc/mdstat will increment it briefly. So to avoid false positives in testing for concurrent access, introduce a new counter that counts just the number of times the md device is open.
\|	Signed-off-by: NeilBrown <neilb@suse.de>
\| *	md: linear: Make array_size sector-based and rename it to array_sectors.	Andre Noll	2008-07-21	1	-8/+8
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
\| *	md: Make mddev->array_size sector-based.	Andre Noll	2008-07-21	8	-27/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch renames the array_size field of struct mddev_s to array_sectors and converts all instances to use units of 512 byte sectors instead of 1k blocks. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
\| *	md: Make super_type->rdev_size_change() take sector-based sizes.	Andre Noll	2008-07-21	1	-21/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Also, change the type of the size parameter from unsigned long long to sector_t and rename it to num_sectors. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
\| *	md: Fix check for overlapping devices.	Andre Noll	2008-07-21	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The checks in overlaps() expect all parameters either in block-based or sector-based quantities. However, its single caller passes two rdev->data_offset arguments as well as two rdev->size arguments, the former being sector counts while the latter are measured in 1K blocks. This could cause rdev_size_store() to accept an invalid size from user space. Fix it by passing only sector-based quantities to overlaps(). Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
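The unit mix-up can be reproduced with a small user-space sketch. The function below is an assumption about the general shape of such an interval check, not a copy of the kernel's `overlaps()`; `toy_sector_t` and `toy_kib_to_sectors` are hypothetical helpers, with one 1K block taken as two 512-byte sectors.

```c
typedef unsigned long long toy_sector_t;

/* Returns 1 if [s1, s1 + l1) and [s2, s2 + l2) intersect, else 0.
 * All four arguments must be in the same unit. */
static int toy_overlaps(toy_sector_t s1, toy_sector_t l1,
                        toy_sector_t s2, toy_sector_t l2)
{
    if (s1 + l1 <= s2)
        return 0;   /* first range ends before the second starts */
    if (s2 + l2 <= s1)
        return 0;   /* second range ends before the first starts */
    return 1;
}

/* Convert a size in 1K blocks to 512-byte sectors. */
static toy_sector_t toy_kib_to_sectors(toy_sector_t kib)
{
    return kib * 2;
}
```

For example, take two 100 KiB devices whose data starts at sectors 0 and 150. Passing the sizes in 1K blocks, `toy_overlaps(0, 100, 150, 100)`, reports no overlap, while the sector-based call `toy_overlaps(0, toy_kib_to_sectors(100), 150, toy_kib_to_sectors(100))` correctly reports that the ranges collide.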
\| *	md: Tidy up rdev_size_store a bit:	Neil Brown	2008-07-21	1	-9/+8
\|	- use strict_strtoull in place of simple_strtoull
\|	- use my_mddev in place of rdev->mddev (they have the same value)
\|	and more significantly,
\|	- don't adjust mddev->size to fit, rather reject changes which make rdev->size smaller than mddev->size
\|	Adjusting mddev->size is a hangover from bind_rdev_to_array which does a similar thing. But it really is a better design to insist that mddev->size is set as required, then the rdev->sizes are set to allow for that. The previous way invites confusion.
\|	Signed-off-by: NeilBrown <neilb@suse.de>
\| *	md: Turn rdev->sb_offset into a sector-based quantity.	Andre Noll	2008-07-11	2	-48/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Rename it to sb_start to make sure all users have been converted. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: Neil Brown <neilb@suse.de>
\| *	md: Make calc_dev_sboffset() return a sector count.	Andre Noll	2008-07-11	1	-6/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As BLOCK_SIZE_BITS is 10 and MD_NEW_SIZE_SECTORS(2 * x) = 2 * NEW_SIZE_BLOCKS(x), the return value of calc_dev_sboffset() doubles. Fix up all three callers accordingly. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: Neil Brown <neilb@suse.de>
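The doubling identity quoted above can be checked numerically. The helpers below are modelled on the rounding the message refers to, under the assumption that the reserved area is 64 KiB (128 sectors, or 64 one-KiB blocks); the `toy_` names are hypothetical and the real macros may differ in detail.

```c
/* Assumed size of the reserved area at the end of the device. */
#define TOY_RESERVED_BYTES   (64UL * 1024UL)
#define TOY_RESERVED_SECTORS (TOY_RESERVED_BYTES / 512UL)    /* 128 */
#define TOY_RESERVED_BLOCKS  (TOY_RESERVED_BYTES / 1024UL)   /* 64 */

/* Round a size in sectors down to a multiple of the reserved size,
 * then step back by one reserved area. */
static unsigned long toy_new_size_sectors(unsigned long sectors)
{
    return (sectors & ~(TOY_RESERVED_SECTORS - 1UL)) - TOY_RESERVED_SECTORS;
}

/* The same computation with everything expressed in 1K blocks. */
static unsigned long toy_new_size_blocks(unsigned long blocks)
{
    return (blocks & ~(TOY_RESERVED_BLOCKS - 1UL)) - TOY_RESERVED_BLOCKS;
}
```

For a device of 1000 blocks (2000 sectors), the block-based helper gives 896 and the sector-based helper gives 1792, exactly twice as much, which is why each caller has to be adjusted when the function switches units.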
\| *	md: Replace calc_dev_size() by calc_num_sectors().	Andre Noll	2008-07-11	1	-11/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Number of sectors is the preferred unit for sizes of raid devices, so change calc_dev_size() so that it returns this unit instead of the number of 1K blocks. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: Neil Brown <neilb@suse.de>
\| *	md: Make update_size() take the number of sectors.	Andre Noll	2008-07-11	1	-18/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Changing the internal representations of sizes of raid devices from 1K blocks to sector counts (512B units) is desirable because it allows to get rid of many divisions/multiplications and unnecessary casts that are present in the current code. This patch is a first step in this direction. It replaces the old 1K-based "size" argument of update_size() by "num_sectors" and fixes up its two callers. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: Neil Brown <neilb@suse.de>
\| *	md: Better control of when do_md_stop is allowed to stop the array.	Neil Brown	2008-07-11	1	-14/+15
\|	do_md_stop checks the number of active users before allowing the array to be stopped.
\|	Two problems:
\|	1/ it assumes the request is coming through an open file descriptor (via ioctl) so it allows for that. This is not always the case.
\|	2/ it doesn't do the check if the array hasn't been activated. This is not good for cases when we use an inactive array to hold some devices in a container.
\|	Signed-off-by: Neil Brown <neilb@suse.de>
\| *	md: get_disk_info(): Don't convert between signed and unsigned and back.	Andre Noll	2008-07-11	1	-4/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current code copies a signed int from user space, converts it to unsigned and passes the unsigned value to find_rdev_nr() which expects a signed value. Simply pass the signed value from user space directly. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: Neil Brown <neilb@suse.de>
\| *	md: Simplify restart_array().	Andre Noll	2008-07-11	1	-32/+17
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: Neil Brown <neilb@suse.de>
\| *	md: alloc_disk_sb(): Return proper error value.	Andre Noll	2008-07-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If alloc_page() fails, ENOMEM is a more suitable error value than EINVAL. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: Neil Brown <neilb@suse.de>
\| *	md: Simplify sb_equal().	Andre Noll	2008-07-11	1	-5/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The only caller of sb_equal() tests the return value against zero, so it's OK to return the negated return value of memcmp(). Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: Neil Brown <neilb@suse.de>
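As a rough illustration of the simplification, an equality helper can return the logical negation of `memcmp()` directly. The struct and function below are hypothetical stand-ins (a fixed array of words, so there is no padding to compare), not the md superblock code.

```c
#include <string.h>

/* Hypothetical fixed-size record; an array of words has no padding. */
struct toy_sb {
    unsigned int words[8];
};

/* Returns nonzero when the two records are byte-for-byte equal and
 * zero otherwise, which is all a caller testing against zero needs. */
static int toy_sb_equal(const struct toy_sb *a, const struct toy_sb *b)
{
    return !memcmp(a, b, sizeof(*a));
}
```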
\| *	md: Simplify uuid_equal().	Andre Noll	2008-07-11	1	-9/+4
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: Neil Brown <neilb@suse.de>
\| *	md: sb_equal(): Fix misleading printk.	Andre Noll	2008-07-08	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: Neil Brown <neilb@suse.de>
\| *	md: Fix a typo in the comment to cmd_match().	Andre Noll	2008-07-08	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: Neil Brown <neilb@suse.de>
\| *	md: Fix typo in array_state comment.	Andre Noll	2008-07-08	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: Neil Brown <neilb@suse.de>
\| *	md: sync_speed_show(): Trivial cleanups.	Andre Noll	2008-07-08	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Remove superfluous parentheses. - Make format string match the type of the variable that is printed. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: Neil Brown <neilb@suse.de>
\| *	md: do_md_run(): Fix misleading error message.	Andre Noll	2008-07-08	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In case pers->run() succeeds but creating the bitmap fails, we print an error message stating that pers->run() has failed. Print this message only if pers->run() really failed. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: Neil Brown <neilb@suse.de>
\| *	md: md_getgeo(): Move comment to proper position.	Andre Noll	2008-07-08	1	-6/+6
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: Neil Brown <neilb@suse.de>
\| *	md: md_ioctl(): Fix misleading indentation.	Andre Noll	2008-07-08	1	-3/+1
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: Neil Brown <neilb@suse.de>
\| *	Merge branch 'for-neil' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/md into for-next	Neil Brown	2008-07-08	3	-16/+29
\| \| *	md: resolve external metadata handling deadlock in md_allow_write	Dan Williams	2008-06-30	3	-16/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	md_allow_write() marks the metadata dirty while holding mddev->lock and then waits for the write to complete. For externally managed metadata this causes a deadlock as userspace needs to take the lock to communicate that the metadata update has completed. Change md_allow_write() in the 'external' case to start the 'mark active' operation and then return -EAGAIN. The expected side effects while waiting for userspace to write 'active' to 'array_state' are holding off reshape (code currently handles -ENOMEM), cause some 'stripe_cache_size' change requests to fail, cause some GET_BITMAP_FILE ioctl requests to fall back to GFP_NOIO, and cause updates to 'raid_disks' to fail. Except for 'stripe_cache_size' changes these failures can be mitigated by coordinating with mdmon. md_write_start() still prevents writes from occurring until the metadata handler has had a chance to take action as it unconditionally waits for MD_CHANGE_CLEAN to be cleared. [neilb@suse.de: return -EAGAIN, try GFP_NOIO] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
\| * \|	Merge branch 'master' into for-next	Neil Brown	2008-07-08	1	-0/+1
\| * \|	md: rationalize raid5 function names	Dan Williams	2008-06-28	1	-36/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	From: Dan Williams <dan.j.williams@intel.com> Commit a4456856 refactored some of the deep code paths in raid5.c into separate functions. The names chosen at the time do not consistently indicate what is going to happen to the stripe. So, update the names, and since a stripe is a cache element use cache semantics like fill, dirty, and clean. (also, fix up the indentation in fetch_block5) Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Neil Brown <neilb@suse.de>
\| * \|	md: handle operation chaining in raid5_run_ops	Dan Williams	2008-06-28	1	-8/+7
\|	From: Dan Williams <dan.j.williams@intel.com>
\|	Neil said:
\|	> At the end of ops_run_compute5 you have:
\|	>         /* ack now if postxor is not set to be run */
\|	>         if (tx && !test_bit(STRIPE_OP_POSTXOR, &s->ops_run))
\|	>                 async_tx_ack(tx);
\|	>
\|	> It looks odd having that test there. Would it fit in raid5_run_ops
\|	> better?
\|	The intended global interpretation is that raid5_run_ops can build a chain of xor and memcpy operations. When MD registers the compute-xor it tells async_tx to keep the operation handle around so that another item in the dependency chain can be submitted. If we are just computing a block to satisfy a read then we can terminate the chain immediately. raid5_run_ops gives a better context for this test since it cares about the entire chain.
\|	Signed-off-by: Dan Williams <dan.j.williams@intel.com>
\|	Signed-off-by: Neil Brown <neilb@suse.de>
\| * \|	md: replace R5_WantPrexor with R5_WantDrain, add 'prexor' reconstruct_states	Dan Williams	2008-06-28	1	-60/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	From: Dan Williams <dan.j.williams@intel.com> Currently ops_run_biodrain and other locations have extra logic to determine which blocks are processed in the prexor and non-prexor cases. This can be eliminated if handle_write_operations5 flags the blocks to be processed in all cases via R5_Wantdrain. The presence of the prexor operation is tracked in sh->reconstruct_state. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Neil Brown <neilb@suse.de>
\| * \|	md: replace STRIPE_OP_{BIODRAIN,PREXOR,POSTXOR} with 'reconstruct_states'	Dan Williams	2008-06-28	1	-142/+62
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	From: Dan Williams <dan.j.williams@intel.com> Track the state of reconstruct operations (recalculating the parity block usually due to incoming writes, or as part of array expansion) Reduces the scope of the STRIPE_OP_{BIODRAIN,PREXOR,POSTXOR} flags to only tracking whether a reconstruct operation has been requested via the ops_request field of struct stripe_head_state. This is the final step in the removal of ops.{pending,ack,complete,count}, i.e. the STRIPE_OP_{BIODRAIN,PREXOR,POSTXOR} flags only request an operation and do not track the state of the operation. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Neil Brown <neilb@suse.de>
\| * \|	md: replace STRIPE_OP_COMPUTE_BLK with STRIPE_COMPUTE_RUN	Dan Williams	2008-06-28	1	-47/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	From: Dan Williams <dan.j.williams@intel.com> Track the state of compute operations (recalculating a block from all the other blocks in a stripe) with a state flag. Reduces the scope of the STRIPE_OP_COMPUTE_BLK flag to only tracking whether a compute operation has been requested via the ops_request field of struct stripe_head_state. Note, the compute operation that is performed in the course of doing a 'repair' operation (check the parity block, recalculate it and write it back if the check result is not zero) is tracked separately with the 'check_state' variable. Compute operations are held off while a 'check' is in progress, and moving this check out to handle_issuing_new_read_requests5 the helper routine __handle_issuing_new_read_requests5 can be simplified. This is another step towards the removal of ops.{pending,ack,complete,count}, i.e. STRIPE_OP_COMPUTE_BLK only requests an operation and does not track the state of the operation. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Neil Brown <neilb@suse.de>
\| * \|	md: replace STRIPE_OP_BIOFILL with STRIPE_BIOFILL_RUN	Dan Williams	2008-06-28	1	-21/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	From: Dan Williams <dan.j.williams@intel.com> Track the state of read operations (copying data from the stripe cache to bio buffers outside the lock) with a state flag. Reduce the scope of the STRIPE_OP_BIOFILL flag to only tracking whether a biofill operation has been requested via the ops_request field of struct stripe_head_state. This is another step towards the removal of ops.{pending,ack,complete,count}, i.e. STRIPE_OP_BIOFILL only requests an operation and does not track the state of the operation. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Neil Brown <neilb@suse.de>
\| * \|	md: replace STRIPE_OP_CHECK with 'check_states'	Dan Williams	2008-06-28	1	-89/+83
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	From: Dan Williams <dan.j.williams@intel.com> The STRIPE_OP_* flags record the state of stripe operations which are performed outside the stripe lock. Their use in indicating which operations need to be run is straightforward; however, interpolating what the next state of the stripe should be based on a given combination of these flags is not straightforward, and has led to bugs. An easier to read implementation with minimal degrees of freedom is needed. Towards this goal, this patch introduces explicit states to replace what was previously interpolated from the STRIPE_OP_* flags. For now this only converts the handle_parity_checks5 path, removing a user of the ops.{pending,ack,complete,count} fields of struct stripe_operations. This conversion also found a remaining issue with the current code. There is a small window for a drive to fail between when we schedule a repair and when the parity calculation for that repair completes. When this happens we will writeback to 'failed_num' when we really want to write back to 'pd_idx'. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Neil Brown <neilb@suse.de>
\| * \|	md: unify raid5/6 i/o submission	Dan Williams	2008-06-28	1	-61/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	From: Dan Williams <dan.j.williams@intel.com> Let the raid6 path call ops_run_io to get pending i/o submitted. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Neil Brown <neilb@suse.de>
\| * \|	md: use stripe_head_state in ops_run_io()	Dan Williams	2008-06-28	1	-6/+3
\|	From: Dan Williams <dan.j.williams@intel.com>
\|	In handle_stripe after taking sh->lock we sample some bits into 's' (struct stripe_head_state):
\|	s.syncing = test_bit(STRIPE_SYNCING, &sh->state);
\|	s.expanding = test_bit(STRIPE_EXPAND_SOURCE, &sh->state);
\|	s.expanded = test_bit(STRIPE_EXPAND_READY, &sh->state);
\|	Use these values from 's' in ops_run_io() rather than re-sampling the bits. This ensures a consistent snapshot (as seen under sh->lock) is used.
\|	Signed-off-by: Dan Williams <dan.j.williams@intel.com>
\|	Signed-off-by: Neil Brown <neilb@suse.de>