summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* fsnotify: include pathnames with entries when possibleEric Paris2009-06-114-21/+41
| | | | | | | | | | When inotify wants to send events to a directory about a child it includes the name of the original file. This patch collects that filename and makes it available for notification. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de>
* fsnotify: generic notification queue and waitqEric Paris2009-06-114-7/+278
| | | | | | | | | | | inotify needs to do asyc notification in which event information is stored on a queue until the listener is ready to receive it. This patch implements a generic notification queue for inotify (and later fanotify) to store events to be sent at a later time. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de>
* dnotify: reimplement dnotify using fsnotifyEric Paris2009-06-117-183/+398
| | | | | | | | Reimplement dnotify using fsnotify. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de>
* fsnotify: parent event notificationEric Paris2009-06-116-10/+205
| | | | | | | | | | | inotify and dnotify both use a similar parent notification mechanism. We add a generic parent notification mechanism to fsnotify for both of these to use. This new machanism also adds the dentry flag optimization which exists for inotify to dnotify. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de>
* fsnotify: add marks to inodes so groups can interpret how to handle those inodesEric Paris2009-06-119-3/+483
| | | | | | | | | | | | | | | | | This patch creates a way for fsnotify groups to attach marks to inodes. These marks have little meaning to the generic fsnotify infrastructure and thus their meaning should be interpreted by the group that attached them to the inode's list. dnotify and inotify will make use of these markings to indicate which inodes are of interest to their respective groups. But this implementation has the useful property that in the future other listeners could actually use the marks for the exact opposite reason, aka to indicate which inodes it had NO interest in. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de>
* fsnotify: unified filesystem notification backendEric Paris2009-06-119-35/+705
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fsnotify is a backend for filesystem notification. fsnotify does not provide any userspace interface but does provide the basis needed for other notification schemes such as dnotify. fsnotify can be extended to be the backend for inotify or the upcoming fanotify. fsnotify provides a mechanism for "groups" to register for some set of filesystem events and to then deliver those events to those groups for processing. fsnotify has a number of benefits, the first being actually shrinking the size of an inode. Before fsnotify to support both dnotify and inotify an inode had unsigned long i_dnotify_mask; /* Directory notify events */ struct dnotify_struct *i_dnotify; /* for directory notifications */ struct list_head inotify_watches; /* watches on this inode */ struct mutex inotify_mutex; /* protects the watches list But with fsnotify this same functionallity (and more) is done with just __u32 i_fsnotify_mask; /* all events for this inode */ struct hlist_head i_fsnotify_mark_entries; /* marks on this inode */ That's right, inotify, dnotify, and fanotify all in 64 bits. We used that much space just in inotify_watches alone, before this patch set. fsnotify object lifetime and locking is MUCH better than what we have today. inotify locking is incredibly complex. See 8f7b0ba1c8539 as an example of what's been busted since inception. inotify needs to know internal semantics of superblock destruction and unmounting to function. The inode pinning and vfs contortions are horrible. no fsnotify implementers do allocation under locks. This means things like f04b30de3 which (due to an overabundance of caution) changes GFP_KERNEL to GFP_NOFS can be reverted. There are no longer any allocation rules when using or implementing your own fsnotify listener. fsnotify paves the way for fanotify. In brief fanotify is a notification mechanism that delivers the lisener both an 'event' and an open file descriptor to the object in question. This means that fanotify is pathname agnostic. Some on lkml may not care for the original companies or users that pushed for TALPA, but fanotify was designed with flexibility and input for other users in mind. The readahead group expressed interest in fanotify as it could be used to profile disk access on boot without breaking the audit system. The desktop search groups have also expressed interest in fanotify as it solves a number of the race conditions and problems present with managing inotify when more than a limited number of specific files are of interest. fanotify can provide for a userspace access control system which makes it a clean interface for AV vendors to hook without trying to do binary patching on the syscall table, LSM, and everywhere else they do their things today. With this patch series fanotify can be implemented in less than 1200 lines of easy to review code. Almost all of which is the socket based user interface. This patch series builds fsnotify to the point that it can implement dnotify and inotify_user. Patches exist and will be sent soon after acceptance to finish the in kernel inotify conversion (audit) and implement fanotify. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de>
* Merge branch 'for-2.6.31' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds2009-06-11158-2760/+3790
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block: (153 commits) block: add request clone interface (v2) floppy: fix hibernation ramdisk: remove long-deprecated "ramdisk=" boot-time parameter fs/bio.c: add missing __user annotation block: prevent possible io_context->refcount overflow Add serial number support for virtio_blk, V4a block: Add missing bounce_pfn stacking and fix comments Revert "block: Fix bounce limit setting in DM" cciss: decode unit attention in SCSI error handling code cciss: Remove no longer needed sendcmd reject processing code cciss: change SCSI error handling routines to work with interrupts enabled. cciss: separate error processing and command retrying code in sendcmd_withirq_core() cciss: factor out fix target status processing code from sendcmd functions cciss: simplify interface of sendcmd() and sendcmd_withirq() cciss: factor out core of sendcmd_withirq() for use by SCSI error handling code cciss: Use schedule_timeout_uninterruptible in SCSI error handling code block: needs to set the residual length of a bidi request Revert "block: implement blkdev_readpages" block: Fix bounce limit setting in DM Removed reference to non-existing file Documentation/PCI/PCI-DMA-mapping.txt ... Manually fix conflicts with tracing updates in: block/blk-sysfs.c drivers/ide/ide-atapi.c drivers/ide/ide-cd.c drivers/ide/ide-floppy.c drivers/ide/ide-tape.c include/trace/events/block.h kernel/trace/blktrace.c
| * block: add request clone interface (v2)Kiyoshi Ueda2009-06-112-0/+105
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the following 2 interfaces for request-stacking drivers: - blk_rq_prep_clone(struct request *clone, struct request *orig, struct bio_set *bs, gfp_t gfp_mask, int (*bio_ctr)(struct bio *, struct bio*, void *), void *data) * Clones bios in the original request to the clone request (bio_ctr is called for each cloned bios.) * Copies attributes of the original request to the clone request. The actual data parts (e.g. ->cmd, ->buffer, ->sense) are not copied. - blk_rq_unprep_clone(struct request *clone) * Frees cloned bios from the clone request. Request stacking drivers (e.g. request-based dm) need to make a clone request for a submitted request and dispatch it to other devices. To allocate request for the clone, request stacking drivers may not be able to use blk_get_request() because the allocation may be done in an irq-disabled context. So blk_rq_prep_clone() takes a request allocated by the caller as an argument. For each clone bio in the clone request, request stacking drivers should be able to set up their own completion handler. So blk_rq_prep_clone() takes a callback function which is called for each clone bio, and a pointer for private data which is passed to the callback. NOTE: blk_rq_prep_clone() doesn't copy any actual data of the original request. Pages are shared between original bios and cloned bios. So caller must not complete the original request before the clone request. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * floppy: fix hibernationOndrej Zary2009-06-101-1/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Based on Ingo Molnar's patch from 2006, this makes the floppy work after resume from hibernation, at least on my machine. This fix resets the floppy controller on resume. It was experimentally determined to bring the controller back to life - we don't really know why it works. floppy_init() does the same thing at boot/modprobe time. Signed-off-by: Ondrej Zary <linux@rainbow-software.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * ramdisk: remove long-deprecated "ramdisk=" boot-time parameterRobert P. J. Day2009-06-101-6/+1
| | | | | | | | | | | | | | | | | | | | | | The "ramdisk" parameter was removed from the defunct rd.c file quite some time ago, in favour of the more specific "ramdisk_size" parameter so, for consistency, the same should be done here. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * fs/bio.c: add missing __user annotationMichal Simek2009-06-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As reported by sparse: fs/bio.c:720:13: warning: incorrect type in assignment (different address spaces) fs/bio.c:720:13: expected char *iov_addr fs/bio.c:720:13: got void [noderef] <asn:1>* fs/bio.c:724:36: warning: incorrect type in argument 2 (different address spaces) fs/bio.c:724:36: expected void const [noderef] <asn:1>*from fs/bio.c:724:36: got char *iov_addr Signed-off-by: Michal Simek <monstr@monstr.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * block: prevent possible io_context->refcount overflowNikanth Karthikesan2009-06-103-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently io_context has an atomic_t(32-bit) as refcount. In the case of cfq, for each device against whcih a task does I/O, a reference to the io_context would be taken. And when there are multiple process sharing io_contexts(CLONE_IO) would also have a reference to the same io_context. Theoretically the possible maximum number of processes sharing the same io_context + the number of disks/cfq_data referring to the same io_context can overflow the 32-bit counter on a very high-end machine. Even though it is an improbable case, let us make it atomic_long_t. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * Add serial number support for virtio_blk, V4ajohn cooper2009-06-092-3/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch extracts the opaque data from pci i/o region 0 via the added VIRTIO_BLK_F_IDENTIFY field. By convention this data takes the form of that returned by an ATA IDENTIFY DEVICE command, however the driver (except for structure size) makes no interpretation of the data. The structure data is copied wholesale to userspace via a HDIO_GET_IDENTITY ioctl command (eg: hdparm -i <dev>). Signed-off-by: john cooper <john.cooper@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * block: Add missing bounce_pfn stacking and fix commentsMartin K. Petersen2009-06-091-2/+3
| | | | | | | | | | | | | | | | | | | | DM no longer needs to set limits explicitly when calling blk_stack_limits. Let the latter automatically deal with bounce_pfn scaling. Fix kerneldoc variable names. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * Revert "block: Fix bounce limit setting in DM"Jens Axboe2009-06-093-19/+1
| | | | | | | | | | | | | | | | This reverts commit a05c0205ba031c01bba33a21bf0a35920eb64833. DM doesn't need to access the bounce_pfn directly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * cciss: decode unit attention in SCSI error handling codescameron@beardog.cca.cpqcorp.net2009-06-091-7/+16
| | | | | | | | | | | | | | | | | | Make SCSI reset error handler decode unit attention ASC and after a target reset wait for a unit attention that indicates a reset occurred rather than just for any old unit attention. Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * cciss: Remove no longer needed sendcmd reject processing codescameron@beardog.cca.cpqcorp.net2009-06-092-104/+3
| | | | | | | | | | | | | | | | | | | | | | Now that the cciss SCSI error handling routines operate with interrupts enabled, we no longer need to maintain the list of command completions that sendcmd() might inadvertantly scoop up, since now it only runs at driver init time, and there won't be any other commands for it to scoop up. So we can remove that list and the code that adds to it and processes it. Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * cciss: change SCSI error handling routines to work with interrupts enabled.scameron@beardog.cca.cpqcorp.net2009-06-092-15/+12
| | | | | | | | | | | | | | Change cciss scsi error handling routines to work with interrupts enabled. Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * cciss: separate error processing and command retrying code in ↵scameron@beardog.cca.cpqcorp.net2009-06-092-32/+46
| | | | | | | | | | | | | | | | | | | | | | | | sendcmd_withirq_core() Separate the error processing from sendcmd_withirq_core from the code which retries commands. The rationale for this is that the SCSI error handling code can then be made to use sendcmd_withirq_core, but avoid retrying commands. Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * cciss: factor out fix target status processing code from sendcmd functionsscameron@beardog.cca.cpqcorp.net2009-06-091-22/+34
| | | | | | | | | | | | | | | | | | Factor out code to process target status of completed commands in sendcmd() and sendcmd_withirq_core(), and fix problem that bad target status was ignored in sendcmd_withirq_core. Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * cciss: simplify interface of sendcmd() and sendcmd_withirq()scameron@beardog.cca.cpqcorp.net2009-06-093-84/+59
| | | | | | | | | | | | | | | | Simplify interfaces of sendcmd() and sendcmd_withirq() so that they provide only one way to address commands instead of three ways. Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * cciss: factor out core of sendcmd_withirq() for use by SCSI error handling codescameron@beardog.cca.cpqcorp.net2009-06-091-87/+96
| | | | | | | | | | | | | | | | Factor the core of sendcmd_withirq out to provide a simpler interface which provides access to full error information. Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * cciss: Use schedule_timeout_uninterruptible in SCSI error handling codescameron@beardog.cca.cpqcorp.net2009-06-091-1/+1
| | | | | | | | | | | | | | | | | | Use schedule_timeout_uninterruptible instead of schedule_timeout in the scsi error handling code when waiting between TUR polls since we are not interested in nor want to be interrupted by signals. Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * block: needs to set the residual length of a bidi requestFUJITA Tomonori2009-06-091-0/+3
| | | | | | | | | | | | | | | | | | | | | | Tejun's "block: set rq->resid_len to blk_rq_bytes() on issue" patch seems to be incomplete; It doesn't set rq->resid_len to blk_rq_bytes() for a bidi request (req->next_rq). As a result, all bidi users are broken. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * Revert "block: implement blkdev_readpages"Jens Axboe2009-06-041-7/+0
| | | | | | | | | | | | | | | | | | | | This reverts commit db2dbb12dc47a50c7a4c5678f526014063e486f6. It apparently causes problems with partition table read-ahead on archs with large page sizes. Until that problem is diagnosed further, just drop the readpages support on block devices. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * block: Fix bounce limit setting in DMMartin K. Petersen2009-06-033-1/+19
| | | | | | | | | | | | | | | | | | | | blk_queue_bounce_limit() is more than a wrapper about the request queue limits.bounce_pfn variable. Introduce blk_queue_bounce_pfn() which can be called by stacking drivers that wish to set the bounce limit explicitly. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * Removed reference to non-existing file Documentation/PCI/PCI-DMA-mapping.txtvibi sreenivasan2009-06-021-1/+1
| | | | | | | | | | | | | | | | File Documentation/PCI/PCI-DMA-mapping.txt does not exist. Documentation/DMA-mapping.txt contains DMA Mapping details Signed-off-by: vibi sreenivasan <vibi_sreenivasan@cms.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * cciss: use schedule_timeout_interruptible()Andrew Morton2009-06-021-2/+1
| | | | | | | | | | | | | | | | | | | | Use schedule_timeout_interruptible() instead of open-coding the set and schedule parts. Cc: Mike Miller <mikem@beardog.cca.cpqcorp.net> Cc: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * cciss: add cciss driver sysfs entriesAndrew Patterson2009-06-023-10/+314
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add sysfs entries to the cciss driver needed for the dm/multipath tools. A file for vendor, model, rev, and unique_id is added for each logical drive under directory /sys/bus/pci/devices/<dev>/ccissX/cXdY. Where X = the controller (or host) number and Y is the logical drive number. A link from /sys/bus/pci/devices/<dev>/ccissX/cXdY/block:cciss!cXdY to /sys/block/cciss!cXdY/device is also created. A bus is created in /sys/bus/cciss. A link is created from the pci ccissX entry to /sys/bus/cciss/devices/ccissX. Please consider this for inclusion. Signed-off-by: Mike Miller <mike.miller@hp.com> Cc: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * cciss: fix SCSI device reset handlerStephen M. Cameron2009-06-022-9/+90
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the SCSI reset error handler to send a working, properly addressed reset message to the target device and add code to wait for the target device to become ready by polling it with Test Unit Ready. The existing reset code was broken in that it didn't bother to set the 8-byte LUN address to anything besides zero, so the command was addressed to the controller, which pretended to the driver that the command succeeded, while doing nothing. Ages ago I tested this code, but unbeknownst to me, my test was flawed, and what I thought was a tape drive getting reset was actually nothing of the sort. Unfortunately, there is still lots of Smartarray firmware that doesn't handle doing target resets right, and this code won't help in those cases, but it also shouldn't make things worse in those cases than they already are. Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net> Cc: Mike Miller <mikem@beardog.cca.cpqcorp.net> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * cciss: factor out core of sendcmd() for a more sane interfaceStephen M. Cameron2009-06-021-116/+109
| | | | | | | | | | | | | | | | | | | | | | | | Factor out the core of sendcmd() to provide a simpler interface which exposes all the error information to the caller and make the original sendcmd use this new function. Rationale: The SCSI error handling routines need to send commands with interrupts turned off, but they also need access to the full error information. Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net> Cc: Mike Miller <mikem@beardog.cca.cpqcorp.net> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * block: fix a possible oops on elv_abort_queue()Kiyoshi Ueda2009-06-021-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I found one more mis-conversion to the 'request is always dequeued when completing' model in elv_abort_queue() during code inspection. Although I haven't hit any problem caused by this mis-conversion yet and just done compile/boot test, please apply if you have no problem. Request must be dequeued when it completes. However, elv_abort_queue() completes requests without dequeueing. This will cause oops in the __blk_end_request_all(). This patch fixes the oops. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * block: fix an oops on BLKPREP_KILLJames Bottomley2009-05-301-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Doing a bit of torture testing, I ran across a BUG in the block subsystem (at blk-core.c:2048): the test for if the request is queued. It turns out the trigger was a BLKPREP_KILL coming out of the SCSI prep function. Currently for BLKPREP_KILL requests, we send them straight into __blk_end_request_all() with an error, but they've never been dequeued, so they trip the bug. Fix this by starting requests before killing them. Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * block: export blk_stack_limits()Mike Snitzer2009-05-281-0/+1
| | | | | | | | | | | | | | | | DM needs to use blk_stack_limits(), so it needs to be exported. Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * block: fix no diskstat problemKiyoshi Ueda2009-05-271-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The commit below in 2.6-block/for-2.6.31 causes no diskstat problem because the blk_discard_rq() check was added with '&&'. It should be 'blk_fs_request() || blk_discard_rq()'. This patch does it and fixes the no diskstat problem. Please review and apply. ------ /proc/diskstat without this patch ------------------------------------- 8 0 sda 0 0 0 0 0 0 0 0 0 0 0 ------------------------------------------------------------------------------ ----- /proc/diskstat with this patch applied --------------------------------- 8 0 sda 4186 303 373621 61600 9578 3859 107468 169479 2 89755 231059 ------------------------------------------------------------------------------ -------------------------------------------------------------------------- commit c69d48540c201394d08cb4d48b905e001313d9b8 Author: Jens Axboe <jens.axboe@oracle.com> Date: Fri Apr 24 08:12:19 2009 +0200 block: include discard requests in IO accounting We currently don't do merging on discard requests, but we potentially could. If we do, then we need to include discard requests in the IO accounting, or merging would end up decrementing in_flight IO counters for an IO which never incremented them. So enable accounting for discard requests. <snip> static inline int blk_do_io_stat(struct request *rq) { - return rq->rq_disk && blk_rq_io_stat(rq) && blk_fs_request(rq); + return rq->rq_disk && blk_rq_io_stat(rq) && blk_fs_request(rq) && + blk_discard_rq(rq); } -------------------------------------------------------------------------- Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * block: fix oops with block tag queueingJames Bottomley2009-05-271-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit e8939a50466fd963eb1ba9118c34b9ffb7ff6aa6 Author: Tejun Heo <tj@kernel.org> Date: Fri May 8 11:54:16 2009 +0900 block: implement and enforce request peek/start/fetch Added a BUG_ON(blk_queued_rq(req)) to the top of blk_finish_req(). Unfortunately, this checks whether req->queuelist is empty. This list is doing double duty both as the queue list and the tag list, so tagged requests come in here with this not empty and boom (the tag list is emptied by blk_queue_end_tag() lower down). Fix this by moving the BUG_ON to below the end tag we also seem vulnerable to this in blk_requeue_request() as well. I think all uses of blk_queued_rq() need auditing because the check is clearly wrong in the tagged case. Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * ide-disk: fix missing max_sectors accessor functionMartin K. Petersen2009-05-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | The recent move to accessor functions for querying queue limits missed an entry in ide-disk.c: drivers/ide/ide-disk.c: In function ‘ide_disk_setup’: drivers/ide/ide-disk.c:642: error: ‘struct request_queue’ has no member named ‘max_sectors’ Fix it. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * block: Export I/O topology for block devices and partitionsMartin K. Petersen2009-05-227-0/+347
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To support devices with physical block sizes bigger than 512 bytes we need to ensure proper alignment. This patch adds support for exposing I/O topology characteristics as devices are stacked. logical_block_size is the smallest unit the device can address. physical_block_size indicates the smallest I/O the device can write without incurring a read-modify-write penalty. The io_min parameter is the smallest preferred I/O size reported by the device. In many cases this is the same as the physical block size. However, the io_min parameter can be scaled up when stacking (RAID5 chunk size > physical block size). The io_opt characteristic indicates the optimal I/O size reported by the device. This is usually the stripe width for arrays. The alignment_offset parameter indicates the number of bytes the start of the device/partition is offset from the device's natural alignment. Partition tools and MD/DM utilities can use this to pad their offsets so filesystems start on proper boundaries. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * block: Expose stacked device queues in sysfsMartin K. Petersen2009-05-222-4/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | Currently stacking devices do not have a queue directory in sysfs. However, many of the I/O characteristics like sector size, maximum request size, etc. are queue properties. This patch enables the queue directory for MD/DM devices. The elevator code has been modified to deal with queues that do not have an I/O scheduler. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * block: Move queue limits to an embedded structMartin K. Petersen2009-05-222-39/+60
| | | | | | | | | | | | | | | | To accommodate stacking drivers that do not have an associated request queue we're moving the limits to a separate, embedded structure. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * block: Use accessor functions for queue limitsMartin K. Petersen2009-05-2225-97/+147
| | | | | | | | | | | | | | | | Convert all external users of queue limits to using wrapper functions instead of poking the request queue variables directly. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * block: Do away with the notion of hardsect_sizeMartin K. Petersen2009-05-2254-98/+108
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Until now we have had a 1:1 mapping between storage device physical block size and the logical block sized used when addressing the device. With SATA 4KB drives coming out that will no longer be the case. The sector size will be 4KB but the logical block size will remain 512-bytes. Hence we need to distinguish between the physical block size and the logical ditto. This patch renames hardsect_size to logical_block_size. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * Merge branch 'master' into for-2.6.31Jens Axboe2009-05-2299-384/+559
| |\ | | | | | | | | | | | | | | | | | | Conflicts: drivers/ide/ide-io.c Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * \ Merge branch 'master' into for-2.6.31Jens Axboe2009-05-22976-19430/+23650
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/block/hd.c drivers/block/mg_disk.c Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | | xen-blkfront: beyond ARRAY_SIZE of info->shadowRoel Kluin2009-05-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Do not go beyond ARRAY_SIZE of info->shadow Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | | block: change the tag sync vs async restriction logicJens Axboe2009-05-205-13/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Make them fully share the tag space, but disallow async requests using the last any two slots. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | | scsi_lib: remove unused variableBoaz Harrosh2009-05-191-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The last request completion cleanup in scsi_lib left an unused this_count variable in scsi_io_completion(). (It was used before in a code segment that now uses blk_end_request_all()) Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | | block: add warning to blk_make_request()Jens Axboe2009-05-191-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a note about how one needs to be careful when setting up these bio chains. Extracted from Boaz's updated patch. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | | block: Un-export blk_rq_append_bioBoaz Harrosh2009-05-193-7/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | OSD was the last in-tree user of blk_rq_append_bio(). Now that it is fixed blk_rq_append_bio is un-exported and is only used internally by block layer. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | | libosd: Use of new blk_make_requestBoaz Harrosh2009-05-191-25/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use new blk_make_request() to allocate a request from bio and avoid using deprecated blk_rq_append_bio(). This patch is dependent on a block layer patch titled: [BLOCK] New blk_make_request() takes bio returns request This is the last usage of blk_rq_append_bio in osd, it can now be un-exported. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> CC: Jeff Garzik <jeff@garzik.org> CC: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
OpenPOWER on IntegriCloud