summaryrefslogtreecommitdiffstats
path: root/block
Commit message (Collapse)AuthorAgeFilesLines
...
* [PATCH] Split struct request ->flags into two partsJens Axboe2006-09-304-83/+52
| | | | | | | | | | Right now ->flags is a bit of a mess: some are request types, and others are just modifiers. Clean this up by splitting it into ->cmd_type and ->cmd_flags. This allows introduction of generic Linux block message types, useful for sending generic Linux commands to block devices. Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] ifdef blktrace debugging fieldsAlexey Dobriyan2006-09-292-4/+5
| | | | | | | Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] block: handle subsystem_register() init errorsRandy Dunlap2006-09-291-2/+7
| | | | | | | | | Check and handle init errors. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] inode_diet: Replace inode.u.generic_ip with inode.i_privateTheodore Ts'o2006-09-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | The following patches reduce the size of the VFS inode structure by 28 bytes on a UP x86. (It would be more on an x86_64 system). This is a 10% reduction in the inode size on a UP kernel that is configured in a production mode (i.e., with no spinlock or other debugging functions enabled; if you want to save memory taken up by in-core inodes, the first thing you should do is disable the debugging options; they are responsible for a huge amount of bloat in the VFS inode structure). This patch: The filesystem or device-specific pointer in the inode is inside a union, which is pretty pointless given that all 30+ users of this field have been using the void pointer. Get rid of the union and rename it to i_private, with a comment to explain who is allowed to use the void pointer. This is just a cleanup, but it allows us to reuse the union 'u' for something something where the union will actually be used. [judith@osdl.org: powerpc build fix] Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Judith Lebzelter <judith@osdl.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Merge mulgrave-w:git/linux-2.6James Bottomley2006-09-231-0/+12
|\ | | | | | | | | | | | | | | Conflicts: include/linux/blkdev.h Trivial merge to incorporate tag prototypes.
| * Add a real API for dealing with blk_congestion_wait()Trond Myklebust2006-09-221-0/+12
| | | | | | | | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | [SCSI] block: add support for shared tag mapsJames Bottomley2006-08-311-21/+88
|/ | | | | | | | | | The current block queue implementation already contains most of the machinery for shared tag maps. The only remaining pieces are a way to allocate and destroy a tag map independently of the queues (so that the maps can be managed on the life cycle of the overseeing entity) Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
* elv_unregister: fix possible crash on module unloadOleg Nesterov2006-08-221-1/+2
| | | | | | | | | | An exiting task or process which didn't do I/O yet have no io context, elv_unregister() should check it is not NULL. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Jens Axboe <axboe@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* [PATCH] cfq_cic_link: fix usage of wrong cfq_io_contextOleg Nesterov2006-08-211-1/+1
| | | | | | | | Obviously, cfq_cic_link() shouldn't free a just allocated cfq_io_context? The dead key is from __cic, so drop that. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] Fix current_io_context() vs set_task_ioprio() raceOleg Nesterov2006-08-211-0/+2
| | | | | | | | | | | I know nothing about io scheduler, but I suspect set_task_ioprio() is not safe. current_io_context() initializes "struct io_context", then sets ->io_context. set_task_ioprio() running on another cpu may see the changes out of order, so ->set_ioprio(ioc) may use io_context which was not initialized properly. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] cfq-iosched: don't use a hard jiffies value, translate from msecsJens Axboe2006-07-251-1/+1
| | | | | | | | | | | | The CIC_SEEKY() test really wants to use the minimum of either: - 2 msecs (not jiffies) - or, the pending slice time So code it like that. Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] blktrace: fix read-ahead bitMilton Miller2006-07-251-1/+1
| | | | | | It should be toggling the same bit on and off, fix it up. Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] lockdep: annotate the BLKPG_DEL_PARTITION ioctlArjan van de Ven2006-07-141-2/+2
| | | | | | | | | | | The delete partition IOCTL takes the bd_mutex for both the disk and the partition; these have an obvious hierarchical relationship and this patch annotates this relationship for lockdep. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Only the first two bits in bio->bi_rw and rq->flags matchJens Axboe2006-07-061-2/+2
| | | | | | | Not three, as assumed. This causes the barrier bit to be needlessly set for some IO. Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] blktrace: readahead supportNathan Scott2006-07-061-1/+4
| | | | | | | | Provide the needed kernel support for distinguishing readahead from regular read requests when tracing block devices. Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] lockdep: annotate on-stack completionsIngo Molnar2006-07-031-1/+1
| | | | | | | | | | | | lockdep needs to have the waitqueue lock initialized for on-stack waitqueues implicitly initialized by DECLARE_COMPLETION(). Annotate on-stack completions accordingly. Has no effect on non-lockdep kernels. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivialLinus Torvalds2006-06-307-7/+0
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: Remove obsolete #include <linux/config.h> remove obsolete swsusp_encrypt arch/arm26/Kconfig typos Documentation/IPMI typos Kconfig: Typos in net/sched/Kconfig v9fs: do not include linux/version.h Documentation/DocBook/mtdnand.tmpl: typo fixes typo fixes: specfic -> specific typo fixes in Documentation/networking/pktgen.txt typo fixes: occuring -> occurring typo fixes: infomation -> information typo fixes: disadvantadge -> disadvantage typo fixes: aquire -> acquire typo fixes: mecanism -> mechanism typo fixes: bandwith -> bandwidth fix a typo in the RTC_CLASS help text smb is no longer maintained Manually merged trivial conflict in arch/um/kernel/vmlinux.lds.S
| * Remove obsolete #include <linux/config.h>Jörn Engel2006-06-307-7/+0
| | | | | | | | | | Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
* | [PATCH] Light weight event countersChristoph Lameter2006-06-301-2/+2
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The remaining counters in page_state after the zoned VM counter patches have been applied are all just for show in /proc/vmstat. They have no essential function for the VM. We use a simple increment of per cpu variables. In order to avoid the most severe races we disable preempt. Preempt does not prevent the race between an increment and an interrupt handler incrementing the same statistics counter. However, that race is exceedingly rare, we may only loose one increment or so and there is no requirement (at least not in kernel) that the vm event counters have to be accurate. In the non preempt case this results in a simple increment for each counter. For many architectures this will be reduced by the compiler to a single instruction. This single instruction is atomic for i386 and x86_64. And therefore even the rare race condition in an interrupt is avoided for both architectures in most cases. The patchset also adds an off switch for embedded systems that allows a building of linux kernels without these counters. The implementation of these counters is through inline code that hopefully results in only a single instruction increment instruction being emitted (i386, x86_64) or in the increment being hidden though instruction concurrency (EPIC architectures such as ia64 can get that done). Benefits: - VM event counter operations usually reduce to a single inline instruction on i386 and x86_64. - No interrupt disable, only preempt disable for the preempt case. Preempt disable can also be avoided by moving the counter into a spinlock. - Handling is similar to zoned VM counters. - Simple and easily extendable. - Can be omitted to reduce memory use for embedded use. References: RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=113512330605497&w=2 RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=114988082814934&w=2 local_t http://marc.theaimsgroup.com/?l=linux-kernel&m=114991748606690&w=2 V2 http://marc.theaimsgroup.com/?t=115014808400007&r=1&w=2 V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767022346&w=2 V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115047968808926&w=2 Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] cpu hotplug: use hotplug version of cpu notifier in appropriate placesChandra Seetharaman2006-06-271-3/+1
| | | | | | | | | | Make use the of newly defined hotplug version of cpu_notifier functionality wherever appropriate. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Cc: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] cpu hotplug: revert initdata patch submitted for 2.6.17Chandra Seetharaman2006-06-271-1/+1
| | | | | | | | | This patch reverts notifier_block changes made in 2.6.17 Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Cc: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* spelling fixesAndreas Mohr2006-06-262-3/+3
| | | | | | | | | | | | acquired (aquired) contiguous (contigious) successful (succesful, succesfull) surprise (suprise) whether (weather) some other misspellings Signed-off-by: Andreas Mohr <andi@lisas.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
* [BLOCK] Fix bounce limit address checkAndi Kleen2006-06-231-1/+1
| | | | | | | | Do a safer check for when to enable DMA. Currently we enable ISA DMA for cases that do not need it, resulting in OOM conditions when ZONE_DMA runs out of space. Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] rbtree: support functions used by the io schedulersJens Axboe2006-06-233-32/+20
| | | | | | | They all duplicate macros to check for empty root and/or node, and clearing a node. So put those in rbtree.h. Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] cfq-iosched: rq update fixesJens Axboe2006-06-231-5/+5
| | | | | | | | | - Remember to set ->last_sector so that the cfq_choose_req() logic works correctly. - Remove redundant call to cfq_choose_req() Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] cfq-iosched: many performance fixesJens Axboe2006-06-231-40/+76
| | | | | | | | | | | | | | | | | | | | This is a collection of patches that greatly improve CFQ performance in some circumstances. - Change the idling logic to only kick in after a request is done and we are deciding what to do. Before the idling included the request service time, so it was hard to adjust. Now it's true think/idle time. - Take advantage of TCQ/NCQ/queueing for seeky sync workloads, but keep it in control for sync and sequential (or close to) workloads. - Expire queues immediately and move on to other busy queues, if we are not going to idle after the current one finishes. - Don't rearm idle timer if there are no busy queues. Just leave the system idle. Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] cfq-iosched: correctly set ioprio on both targetsJens Axboe2006-06-231-3/+2
| | | | | | | | | | | | | | | | | Patch originally from Vasily Tarasov <vtaras@sw.ru> If you set io-priority of process 1 using sys_ioprio_set system call by another process 2 (like ionice do), then cfq_init_prio_data() function sets priority of process 2 (current) on queue of process 1 and clears the flag, that designates change of ioprio. So the process 1 will work like with priority of process 2. I propose not to call cfq_init_prio_data() on io-priority change, but only mark queue as queue with changed prority. Every time when new request comes cfq-scheduler checks for this flag and atomaticaly changes priority of queue to new value. Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] Make CFQ the default IO schedulerJens Axboe2006-06-231-1/+1
| | | | Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] Kill PF_SYNCWRITE flagJens Axboe2006-06-233-4/+5
| | | | | | | | | | | | | | | A process flag to indicate whether we are doing sync io is incredibly ugly. It also causes performance problems when one does a lot of async io and then proceeds to sync it. Part of the io will go out as async, and the other part as sync. This causes a disconnect between the previously submitted io and the synced io. For io schedulers such as CFQ, this will cause us lost merges and suboptimal behaviour in scheduling. Remove PF_SYNCWRITE completely from the fsync/msync paths, and let the O_DIRECT path just directly indicate that the writes are sync by using WRITE_SYNC instead. Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] cfq-iosched: Don't set the queue batching limitsJens Axboe2006-06-231-45/+3
| | | | | | | | We cannot update them if the user changes nr_requests, so don't set it in the first place. The gains are pretty questionable as well. The batching loss has been shown to decrease throughput. Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] remove dead code from elevator switchingDave Jones2006-06-231-3/+0
| | | | | | | | | | We already drop the refcount in elevator_exit(), and as we're setting 'e' to NULL, we'll never take that branch anyway. Finally, as 'e' is a local var that isn't referenced afterwards, setting it to NULL is pointless. Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] blk_start_queue() must be called with irq disabled - add warningPaolo 'Blaisorblade' Giarrusso2006-06-231-1/+4
| | | | | | | | | | | | | | The queue lock can be taken from interrupts so it must always be taken with irq disabling primitives. Some primitives already verify this. blk_start_queue() is called under this lock, so interrupts must be disabled. Also document this requirement clearly in blk_init_queue(), where the queue spinlock is set. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] iosched: use hlist for request hashtableAkinobu Mita2006-06-232-49/+35
| | | | | | | | | | | | Use hlist instead of list_head for request hashtable in deadline-iosched and as-iosched. It also can remove the flag to know hashed or unhashed. Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Jens Axboe <axboe@suse.de> block/as-iosched.c | 45 +++++++++++++++++++-------------------------- block/deadline-iosched.c | 39 ++++++++++++++++----------------------- 2 files changed, 35 insertions(+), 49 deletions(-)
* [PATCH] list: use list_replace_init() instead of list_splice_init()Oleg Nesterov2006-06-231-3/+2
| | | | | | | | | | list_splice_init(list, head) does unneeded job if it is known that list_empty(head) == 1. We can use list_replace_init() instead. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Driver core: add generic "subsystem" link to all devicesKay Sievers2006-06-211-5/+2
| | | | | | | | | | | Like the SUBSYTEM= key we find in the environment of the uevent, this creates a generic "subsystem" link in sysfs for every device. Userspace usually doesn't care at all if its a "class" or a "bus" device. This provides an unified way to determine the subsytem of a device, regardless of the way the driver core has created it. Signed-off-by: Kay Sievers <kay.sievers@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* Fix up CFQ scheduler for recent rbtree node shrinkageLinus Torvalds2006-06-201-1/+0
| | | | | | | | The color is now in the low bits of the parent pointer, and initializing it to 0 happens as part of the whole memset above, so just remove the unnecessary RB_CLEAR_COLOR. Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Merge git://git.infradead.org/~dwmw2/rbtree-2.6Linus Torvalds2006-06-203-13/+5
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.infradead.org/~dwmw2/rbtree-2.6: [RBTREE] Switch rb_colour() et al to en_US spelling of 'color' for consistency Update UML kernel/physmem.c to use rb_parent() accessor macro [RBTREE] Update hrtimers to use rb_parent() accessor macro. [RBTREE] Add explicit alignment to sizeof(long) for struct rb_node. [RBTREE] Merge colour and parent fields of struct rb_node. [RBTREE] Remove dead code in rb_erase() [RBTREE] Update JFFS2 to use rb_parent() accessor macro. [RBTREE] Update eventpoll.c to use rb_parent() accessor macro. [RBTREE] Update key.c to use rb_parent() accessor macro. [RBTREE] Update ext3 to use rb_parent() accessor macro. [RBTREE] Change rbtree off-tree marking in I/O schedulers. [RBTREE] Add accessor macros for colour and parent fields of rb_node
| * [RBTREE] Change rbtree off-tree marking in I/O schedulers.David Woodhouse2006-04-213-13/+5
| | | | | | | | | | | | | | | | | | They were abusing the rb_color field to mark nodes which weren't currently on the tree. Fix that to use the same method as eventpoll did -- setting the parent pointer to point back to itself. And use the appropriate accessor macros for setting and reading the parent. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
* | [PATCH] cfq-iosched: fix crash in do_div()Jens Axboe2006-06-141-8/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | We don't clear the seek stat values in cfq_alloc_io_context(), and if ->seek_mean is unlucky enough to be set to -36 by chance, the first invocation of cfq_update_io_seektime() will oops with a divide by zero in do_div(). Just memset the entire cic instead of filling invididual values independently. Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] elevator switching raceJens Axboe2006-06-085-45/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's a race between shutting down one io scheduler and firing up the next, in which a new io could enter and cause the io scheduler to be invoked with bad or NULL data. To fix this, we need to maintain the queue lock for a bit longer. Unfortunately we cannot do that, since the elevator init requires to be run without the lock held. This isn't easily fixable, without also changing the mempool API. So split the initialization into two parts, and alloc-init operation and an attach operation. Then we can preallocate the io scheduler and related structures, and run the attach inside the lock after we detach the old one. This patch has survived 30 minutes of 1 second io scheduler switching with a very busy io load. Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] cfq-iosched: busy_rr fairness fixJens Axboe2006-06-011-2/+5
| | | | | | | | | | | | | | Now that we select busy_rr for possible service, insert entries at the back of that list instead of at the front. Signed-off-by: Jens Axboe <axboe@suse.de>
* | [PATCH] cfq-iosched: fix bug in timer handling for the idle classJens Axboe2006-06-011-4/+3
| | | | | | | | | | | | | | | | | | There's a small window from when the timer is entered and we grab the queue lock, where cfq_set_active_queue() could be rearming the timer for us. Seen in the wild on a 12-way ppc box. Fix this by just using mod_timer(), which will do the right thing for us. Signed-off-by: Jens Axboe <axboe@suse.de>
* | [PATCH] cfq-iosched: Detect hardware queueingJens Axboe2006-06-011-2/+13
| | | | | | | | | | | | | | | | If the hardware is doing real queueing, decide that it's worthless to idle the hardware. It does reasonable simultaneous io in that case anyways, and the idling hurts some work loads. Signed-off-by: Jens Axboe <axboe@suse.de>
* | [PATCH] cfq-iosched: Detect idle process issuing async requestJens Axboe2006-06-011-3/+13
| | | | | | | | | | | | | | | | If we are anticipating a sync request from this process and we are waiting for that and see an async request come in, expire that slice and move on. Signed-off-by: Jens Axboe <axboe@suse.de>
* | [PATCH] cfq-iosched: check busy queues before deciding we are idleJens Axboe2006-06-011-0/+7
| | | | | | | | | | | | | | | | For just one busy queue (like async write out), we often overlooked that we could queue more io and decided we were idle instead. This causes us quite a bit of performance loss. Signed-off-by: Jens Axboe <axboe@suse.de>
* | [PATCH] cfq-iosched: fixup locking and ->queue_list list managementJens Axboe2006-05-301-12/+13
| | | | | | | | | | | | | | | | - Drop cic from the list when seen as dead. - Fixup the locking, just use a simple spinlock. Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] blk: fix gendisk->in_flight accounting during barrier sequenceJens Axboe2006-05-231-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | While executing barrrier sequence, the bar_rq which carries actual write was accounted as normal IO on completion, while it wasn't on queueing. This caused gendisk->in_flight to be decremented by 1 after each barrier thus messed up statistics. This patch makes bar_rq not accounted as normal IO. As the containing barrier request as a whole is accounted, part of it shouldn't be. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | Revert "[BLOCK] Fix oops on removal of SD/MMC card"Linus Torvalds2006-05-121-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 56cf6504fc1c0c221b82cebc16a444b684140fb7. Both Erik Mouw and Andrew Vasquez independently pinpointed this commit as causing problems, where the slab cache for a driver is never released (most obviously causing problems when immediately re-loading that driver, resulting in a "kmem_cache_create: duplicate cache <xyz>" message, but it can also cause other trouble). James Bottomley dug into it, and reports: "OK, here's the scoop. The problem patch adds a get of driverfs_dev in add_disk(), but doesn't put it again until disk_release() (which occurs on final put_disk() of the gendisk). However, in SCSI, the driverfs_dev is the sdev_gendev. That means there's a reference held on sdev_gendev until final disk put. Unfortunately, we use the driver model driver_remove to trigger del_gendisk (which removes the gendisk from visibility and decrements the refcount), so we've introduced an unbreakable deadlock in the reference counting with this. I suggest simply reversing this patch at the moment. If Russell and Jens can tell me what they're trying to do I'll see if there's another way to do it." so hereby the patch gets reverted, waiting for a better fix. Cc: Jens Axboe <axboe@suse.de> Cc: Russell King <rmk@arm.linux.org.uk> Cc: James Bottomley <James.Bottomley@SteelEye.com> Cc: Erik Mouw <erik@harddisk-recovery.com> Cc: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [BLOCK] limit request_fn recursionJens Axboe2006-05-112-3/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't recurse back into the driver even if the unplug threshold is met, when the driver asks for a requeue. This is both silly from a logical point of view (requeues typically happen due to driver/hardware shortage), and also dangerous since we could hit an endless request_fn -> requeue -> unplug -> request_fn loop and crash on stack overrun. Also limit blk_run_queue() to one level of recursion, similar to how blk_start_queue() works. This patch fixed a real problem with SLES10 and lpfc, and it could hit any SCSI lld that returns non-zero from it's ->queuecommand() handler. Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [BLOCK] Fix oops on removal of SD/MMC cardRussell King2006-05-051-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The block layer keeps a reference (driverfs_dev) to the struct device associated with the block device, and uses it internally for generating uevents in block_uevent. Block device uevents include umounting the partition, which can occur after the backing device has been removed. Unfortunately, this reference is not counted. This means that if the struct device is removed from the device tree, the block layers reference will become stale. Guard against this by holding a reference to the struct device in add_disk(), and only drop the reference when we're releasing the gendisk kobject - in other words when we can be sure that no further uevents will be generated for this block device. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Acked-by: Jens Axboe <axboe@suse.de>
OpenPOWER on IntegriCloud