Commit message (Author, Age, Files, Lines)
* Merge branch 'for-2.6.40/drivers' of git://git.kernel.dk/linux-2.6-block (Linus Torvalds, 2011-05-25, 27 files, -307/+2677)
|\
| |     * 'for-2.6.40/drivers' of git://git.kernel.dk/linux-2.6-block: (110 commits)
| |       loop: handle on-demand devices correctly
| |       loop: limit 'max_part' module param to DISK_MAX_PARTS
| |       drbd: fix warning
| |       drbd: fix warning
| |       drbd: Fix spelling
| |       drbd: fix schedule in atomic
| |       drbd: Take a more conservative approach when deciding max_bio_size
| |       drbd: Fixed state transitions after async outdate-peer-handler returned
| |       drbd: Disallow the peer_disk_state to be D_OUTDATED while connected
| |       drbd: Fix for the connection problems on high latency links
| |       drbd: fix potential activity log refcount imbalance in error path
| |       drbd: Only downgrade the disk state in case of disk failures
| |       drbd: fix disconnect/reconnect loop, if ping-timeout == ping-int
| |       drbd: fix potential distributed deadlock
| |       lru_cache.h: fix comments referring to ts_ instead of lc_
| |       drbd: Fix for application IO with the on-io-error=pass-on policy
| |       xen/p2m: Add EXPORT_SYMBOL_GPL to the M2P override functions.
| |       xen/p2m/m2p/gnttab: Support GNTMAP_host_map in the M2P override.
| |       xen/blkback: don't fail empty barrier requests
| |       xen/blkback: fix xenbus_transaction_start() hang caused by double xenbus_transaction_end()
| |       ...
| * loop: handle on-demand devices correctly (Namhyung Kim, 2011-05-24, 1 file, -4/+4)
| |
| |     When finding or allocating a loop device, loop_probe() did not take
| |     partition numbers into account, so it could end up using a different
| |     device. Consider the following example:
| |
| |       $ sudo modprobe loop max_part=15
| |       $ ls -l /dev/loop*
| |       brw-rw---- 1 root disk 7, 0 2011-05-24 22:16 /dev/loop0
| |       brw-rw---- 1 root disk 7, 16 2011-05-24 22:16 /dev/loop1
| |       brw-rw---- 1 root disk 7, 32 2011-05-24 22:16 /dev/loop2
| |       brw-rw---- 1 root disk 7, 48 2011-05-24 22:16 /dev/loop3
| |       brw-rw---- 1 root disk 7, 64 2011-05-24 22:16 /dev/loop4
| |       brw-rw---- 1 root disk 7, 80 2011-05-24 22:16 /dev/loop5
| |       brw-rw---- 1 root disk 7, 96 2011-05-24 22:16 /dev/loop6
| |       brw-rw---- 1 root disk 7, 112 2011-05-24 22:16 /dev/loop7
| |       $ sudo mknod /dev/loop8 b 7 128
| |       $ sudo losetup /dev/loop8 ~/temp/disk-with-3-parts.img
| |       $ sudo losetup -a
| |       /dev/loop128: [0805]:278201 (/home/namhyung/temp/disk-with-3-parts.img)
| |       $ ls -l /dev/loop*
| |       brw-rw---- 1 root disk 7, 0 2011-05-24 22:16 /dev/loop0
| |       brw-rw---- 1 root disk 7, 16 2011-05-24 22:16 /dev/loop1
| |       brw-rw---- 1 root disk 7, 2048 2011-05-24 22:18 /dev/loop128
| |       brw-rw---- 1 root disk 7, 2049 2011-05-24 22:18 /dev/loop128p1
| |       brw-rw---- 1 root disk 7, 2050 2011-05-24 22:18 /dev/loop128p2
| |       brw-rw---- 1 root disk 7, 2051 2011-05-24 22:18 /dev/loop128p3
| |       brw-rw---- 1 root disk 7, 32 2011-05-24 22:16 /dev/loop2
| |       brw-rw---- 1 root disk 7, 48 2011-05-24 22:16 /dev/loop3
| |       brw-rw---- 1 root disk 7, 64 2011-05-24 22:16 /dev/loop4
| |       brw-rw---- 1 root disk 7, 80 2011-05-24 22:16 /dev/loop5
| |       brw-rw---- 1 root disk 7, 96 2011-05-24 22:16 /dev/loop6
| |       brw-rw---- 1 root disk 7, 112 2011-05-24 22:16 /dev/loop7
| |       brw-r--r-- 1 root root 7, 128 2011-05-24 22:17 /dev/loop8
| |
| |     After this patch, /dev/loop8 (instead of /dev/loop128) is accessed
| |     correctly.
| |
| |     In addition, the 'range' passed to blk_register_region() should cover
| |     the whole range of dev_t that LOOP_MAJOR can address. It does not need
| |     to be limited by partition numbers unless the 'max_loop' param was
| |     specified.
| |
| |     Signed-off-by: Namhyung Kim <namhyung@gmail.com>
| |     Cc: Laurent Vivier <Laurent.Vivier@bull.net>
| |     Cc: stable@kernel.org
| |     Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
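The mapping behind the listing in the loop on-demand commit can be sketched in a few lines of userspace C. This is only an illustration, not the kernel's code: the helper names are hypothetical, and the shift of 4 bits is inferred from the listing, where max_part=15 spaces the device minors 16 apart.

```c
#include <assert.h>

/* Hypothetical helpers for illustration; these are not kernel functions. */

/* First minor number owned by loop device `index` when each device
 * reserves 2^part_shift minors (itself plus its partitions). */
static unsigned int loop_first_minor(unsigned int index, unsigned int part_shift)
{
    assert(part_shift < 32);
    return index << part_shift;
}

/* Inverse mapping: which loop device does a given minor belong to?
 * Skipping this step and using the minor as the index is the kind of
 * mistake the commit message describes. */
static unsigned int loop_index_from_minor(unsigned int minor, unsigned int part_shift)
{
    assert(part_shift < 32);
    return minor >> part_shift;
}
```

With a shift of 4, minor 128 maps back to index 8 (/dev/loop8), whereas treating 128 as the index yields first minor 2048, which is the /dev/loop128 node seen in the listing.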
| * loop: limit 'max_part' module param to DISK_MAX_PARTS (Namhyung Kim, 2011-05-24, 1 file, -0/+3)
| |
| |     The 'max_part' parameter controls the maximum number of partitions a
| |     loop block device can have. However, if a user specifies a very large
| |     value, it exceeds the limit of the device minor number and can cause a
| |     kernel panic (or, at least, produce invalid device nodes in some
| |     cases).
| |
| |     On my desktop system, the following command kills the kernel. On qemu,
| |     it triggers a similar oops but the kernel stays alive:
| |
| |       $ sudo modprobe loop max_part0000
| |       ------------[ cut here ]------------
| |       kernel BUG at /media/Linux_Data/project/linux/fs/sysfs/group.c:65!
| |       invalid opcode: 0000 [#1] SMP
| |       last sysfs file:
| |       CPU 0
| |       Modules linked in: loop(+)
| |
| |       Pid: 43, comm: insmod Tainted: G W 2.6.39-qemu+ #155 Bochs Bochs
| |       RIP: 0010:[<ffffffff8113ce61>] [<ffffffff8113ce61>] internal_create_group+0x2a/0x170
| |       RSP: 0018:ffff880007b3fde8 EFLAGS: 00000246
| |       RAX: 00000000ffffffef RBX: ffff880007b3d878 RCX: 00000000000007b4
| |       RDX: ffffffff8152da50 RSI: 0000000000000000 RDI: ffff880007b3d878
| |       RBP: ffff880007b3fe38 R08: ffff880007b3fde8 R09: 0000000000000000
| |       R10: ffff88000783b4a8 R11: ffff880007b3d878 R12: ffffffff8152da50
| |       R13: ffff880007b3d868 R14: 0000000000000000 R15: ffff880007b3d800
| |       FS: 0000000002137880(0063) GS:ffff880007c00000(0000) knlGS:0000000000000000
| |       CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
| |       CR2: 0000000000422680 CR3: 0000000007b50000 CR4: 00000000000006b0
| |       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
| |       DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
| |       Process insmod (pid: 43, threadinfo ffff880007b3e000, task ffff880007afb9c0)
| |       Stack:
| |        ffff880007b3fe58 ffffffff811e66dd ffff880007b3fe58 ffffffff811e570b
| |        0000000000000010 ffff880007b3d800 ffff880007a7b390 ffff880007b3d868
| |        0000000000400920 ffff880007b3d800 ffff880007b3fe48 ffffffff8113cfc8
| |       Call Trace:
| |        [<ffffffff811e66dd>] ? device_add+0x4bc/0x5af
| |        [<ffffffff811e570b>] ? dev_set_name+0x3c/0x3e
| |        [<ffffffff8113cfc8>] sysfs_create_group+0xe/0x12
| |        [<ffffffff810b420e>] blk_trace_init_sysfs+0x14/0x16
| |        [<ffffffff8116a090>] blk_register_queue+0x47/0xf7
| |        [<ffffffff8116f527>] add_disk+0xdf/0x290
| |        [<ffffffffa00060eb>] loop_init+0xeb/0x1b8 [loop]
| |        [<ffffffffa0006000>] ? 0xffffffffa0005fff
| |        [<ffffffff8100020a>] do_one_initcall+0x7a/0x12e
| |        [<ffffffff81096804>] sys_init_module+0x9c/0x1e0
| |        [<ffffffff813329bb>] system_call_fastpath+0x16/0x1b
| |       Code: c3 55 48 89 e5 41 57 41 56 41 89 f6 41 55 41 54 49 89 d4 53 48 89 fb 48 83 ec 28 48 85 ff 74 0b 85 f6 75 0b 48 83 7f 30 00 75 14 <0f> 0b eb fe 48 83 7f 30 00 b9 ea ff ff ff 0f 84 18 01 00 00 49
| |       RIP [<ffffffff8113ce61>] internal_create_group+0x2a/0x170
| |        RSP <ffff880007b3fde8>
| |       ---[ end trace a123eb592043acad ]---
| |
| |     Signed-off-by: Namhyung Kim <namhyung@gmail.com>
| |     Cc: Laurent Vivier <Laurent.Vivier@bull.net>
| |     Cc: stable@kernel.org
| |     Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
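The kind of bound this commit's subject describes can be sketched as a standalone userspace check. This is a sketch under stated assumptions, not the patch itself: the helper names are hypothetical, `fls_sketch()` is a stand-in for the kernel's fls(), and DISK_MAX_PARTS is assumed to be 256 as in kernels of that era.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed value for illustration; check the kernel tree in use. */
#define DISK_MAX_PARTS 256

/* Userspace stand-in for fls(): 1-based position of the highest set
 * bit, or 0 when no bit is set. */
static int fls_sketch(unsigned int x)
{
    int bits = 0;

    while (x) {
        bits++;
        x >>= 1;
    }
    return bits;
}

/* Hypothetical validator: returns true and stores the number of minor
 * bits reserved per device when max_part fits within DISK_MAX_PARTS. */
static bool max_part_is_valid(unsigned int max_part, int *part_shift)
{
    int shift = max_part > 0 ? fls_sketch(max_part) : 0;

    assert(part_shift != NULL);

    /* The per-device minor count is rounded up to a power of two, so
     * compare that rounded value against the limit. */
    if ((1ULL << shift) > DISK_MAX_PARTS)
        return false;

    *part_shift = shift;
    return true;
}
```

Under these assumptions max_part=15 is accepted with a shift of 4, max_part=255 with a shift of 8, and max_part=256 is rejected because it would need 512 minors per device.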
| * drbd: fix warning (Andrew Morton, 2011-05-24, 2 files, -2/+1)
| |
| |     In file included from drivers/block/drbd/drbd_main.c:54:
| |     drivers/block/drbd/drbd_int.h:1190: warning: parameter has incomplete type
| |
| |     Forward declarations of enums do not work. Fix it unpleasantly by
| |     moving the prototype.
| |
| |     Cc: Jens Axboe <axboe@kernel.dk>
| |     Signed-off-by: Lars Ellenberg <drbd-dev@lists.linbit.com>
| |     Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
| |     Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * drbd: fix warning (Philipp Reisner, 2011-05-24, 2 files, -7/+1)
| |
| |     Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
| * drbd: Fix spelling (Bart Van Assche, 2011-05-24, 11 files, -45/+45)
| |
| |     Found these with the help of ispell -l.
| |
| |     Signed-off-by: Bart Van Assche <bvanassche@acm.org>
| |     Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| |     Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
| * drbd: fix schedule in atomic (Lars Ellenberg, 2011-05-24, 3 files, -4/+16)
| |
| |     An administrative detach used to request a state change directly to
| |     D_DISKLESS, first suspending IO to avoid the last put_ldev() occurring
| |     from an endio handler, potentially in irq context.
| |
| |     This is not enough on the receiving side (typically secondary): we may
| |     miss some peer_req on the way to local disk, which then may do the
| |     last put_ldev() from their drbd_peer_request_endio().
| |
| |     This patch makes the detach always go through the intermediate
| |     D_FAILED state. We may consider renaming it D_DETACHING.
| |
| |     An alternative approach would be to create yet another work item to be
| |     scheduled on the worker, do the destructor work from there, and get
| |     the timing right.
| |
| |     Manually picked commit 564040f from the drbd 8.4 branch.
| |
| |     Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
| |     Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: Take a more conservative approach when deciding max_bio_size (Philipp Reisner, 2011-05-24, 4 files, -50/+97)
| |
| |     The old (optimistic) implementation could shrink the bio size on a
| |     primary device.
| |
| |     Shrinking the bio size on a primary device is bad, since we might
| |     still get BIOs with the old (bigger) size shortly after we published
| |     the new size.
| |
| |     The new implementation is more conservative, and eventually increases
| |     the max_bio_size on a primary device (which is valid). It does so
| |     when it knows the local limit AND the remote limit.
| |
| |     We cache the last seen max_bio_size of the peer in the meta data, and
| |     rely on that to make the operation of single nodes more efficient.
| |
| |     Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
| |     Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: Fixed state transitions after async outdate-peer-handler returned (Philipp Reisner, 2011-05-24, 1 file, -1/+14)
| |
| |     Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
| |     Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: Disallow the peer_disk_state to be D_OUTDATED while connected (Philipp Reisner, 2011-05-24, 1 file, -0/+3)
| |
| |     Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
| |     Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: Fix for the connection problems on high latency links (Philipp Reisner, 2011-05-24, 1 file, -1/+1)
| |
| |     It seems that the real cause of all the issues was that we did not
| |     notice in drbd_try_connect() when the other guy closes one socket if
| |     the round trip time gets higher than 100ms.
| |
| |     That 100ms was hard-coded!
| |
| |     Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
| |     Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: fix potential activity log refcount imbalance in error path (Lars Ellenberg, 2011-05-24, 1 file, -1/+1)
| |
| |     It is no longer sufficient to trigger on local WRITE, we need to check
| |     on (rq_state & RQ_IN_ACT_LOG) before calling drbd_al_complete_io also
| |     in the error path.
| |
| |     Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
| |     Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: Only downgrade the disk state in case of disk failures (Philipp Reisner, 2011-05-24, 1 file, -1/+2)
| |
| |     Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
| |     Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: fix disconnect/reconnect loop, if ping-timeout == ping-int (Lars Ellenberg, 2011-05-24, 1 file, -2/+8)
| |
| |     If there is no replication traffic within the idle timeout (ping-int
| |     seconds), DRBD will send a P_PING, and adjust the timeout to
| |     ping-timeout. If there is no P_PING_ACK received within this
| |     ping-timeout, DRBD finally drops the connection, and tries to
| |     re-establish it.
| |
| |     To decide which timeout was active, we compared the current timeout
| |     with the ping-timeout, and dropped the connection, if that was the
| |     case.
| |
| |     By default, ping-int is 10 seconds, ping-timeout is 500 ms.
| |
| |     Unfortunately, if you configure ping-timeout to be the same as
| |     ping-int, expiry of the idle-timeout had been mistaken for a missing
| |     ping ack, and caused an immediate reconnection attempt.
| |
| |     Fix: Allow both timeouts to be equal, use a local variable to store
| |     which timeout is active.
| |
| |     Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
| |     Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
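The ambiguity described in the ping-timeout commit, and the fix of tracking the active timeout explicitly, can be modelled in userspace C. All type and function names below are hypothetical and both timeouts are expressed in milliseconds for simplicity; this is a sketch of the idea, not DRBD's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum timeout_action { SEND_PING, DROP_CONNECTION };

/* Hypothetical configuration, both values in milliseconds. */
struct conn_timeouts {
    unsigned int ping_int_ms;
    unsigned int ping_timeout_ms;
};

/* Old approach: infer which timeout expired by comparing values.
 * This cannot tell the two apart when they are configured equal. */
static enum timeout_action on_timeout_by_value(const struct conn_timeouts *c,
                                               unsigned int current_timeout_ms)
{
    assert(c != NULL);
    return current_timeout_ms == c->ping_timeout_ms ? DROP_CONNECTION
                                                    : SEND_PING;
}

/* New approach: remember explicitly whether we are waiting for an ack. */
struct ping_state {
    bool ping_timeout_active;
};

static enum timeout_action on_timeout_by_flag(struct ping_state *s)
{
    assert(s != NULL);
    if (s->ping_timeout_active)
        return DROP_CONNECTION;      /* ack did not arrive in time */
    s->ping_timeout_active = true;   /* idle timeout: ping goes out now */
    return SEND_PING;
}

/* Receiving the ack returns the connection to the idle timeout. */
static void on_ping_ack(struct ping_state *s)
{
    assert(s != NULL);
    s->ping_timeout_active = false;
}
```

With both timeouts set to the same value, the value comparison reports a dropped connection on the very first idle expiry, while the flag-based version sends a ping first and only drops the connection if a second expiry happens without an ack in between.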
| * drbd: fix potential distributed deadlock (Lars Ellenberg, 2011-05-24, 1 file, -35/+59)
| |
| |     We limit ourselves to a configurable maximum number of pages used as
| |     temporary bio pages.
| |
| |     If the configured "max_buffers" is not big enough to match the
| |     bandwidth of the respective deployment, a distributed deadlock could
| |     be triggered by e.g. fast online verify and heavy application IO.
| |
| |     TCP connections would block on congestion, because both receivers
| |     would wait on pages to become available.
| |
| |     Fortunately the respective senders in this case would be able to give
| |     back some pages already. So do that.
| |
| |     Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
| |     Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * lru_cache.h: fix comments referring to ts_ instead of lc_ (Lars Ellenberg, 2011-05-24, 1 file, -6/+6)
| |
| |     For some time we contemplated calling the "struct lru_cache" a
| |     "struct tracked_set", and some comments kept the ts_ prefix.
| |     Fix those to match the member field names.
| |
| |     Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
| |     Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: Fix for application IO with the on-io-error=pass-on policy (Philipp Reisner, 2011-05-24, 2 files, -0/+5)
| |
| |     In case a write fails on the local disk, go into D_INCONSISTENT disk
| |     state. That causes future reads of that block to be shipped to the
| |     peer.
| |
| |     Read retry remote was already in place.
| |
| |     Actually the documentation needs to get fixed now, since the
| |     application is still shielded from the error (as long as we have only
| |     a single disk failing).
| |
| |     The difference to detach is that we keep the disk, and therefore might
| |     keep all the other, still working sectors up to date.
| |
| |     Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
| |     Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * Merge branches 'for-jens/xen-backend-fixes' and 'for-jens/xen-blkback-v3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-2.6.40/drivers (Jens Axboe, 2011-05-19, 6 files, -0/+1850)
| |\
| | * xen/blkback: don't fail empty barrier requests (Jan Beulich, 2011-05-18, 1 file, -7/+8)
| | |
| | |     The sector number on empty barrier requests may (will?) be -1,
| | |     which, given that it's being treated as unsigned 64-bit quantity,
| | |     will almost always exceed the actual (virtual) disk's size.
| | |
| | |     Inspired by Konrad's "When writting barriers set the sector number
| | |     to zero...".
| | |
| | |     While at it also add overflow checking to the math in
| | |     vbd_translate().
| | |
| | |     Signed-off-by: Jan Beulich <jbeulich@novell.com>
| | |     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
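The two points of the empty-barrier commit (a sector number of -1 seen as a huge unsigned value, and overflow checking on the range arithmetic) can be illustrated with a small userspace sketch. The struct and function names are hypothetical, and this is not the code of vbd_translate(); it only shows one way such a range check can be written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request descriptor for illustration. */
struct blk_request_sketch {
    uint64_t sector_number;  /* may be (uint64_t)-1 on an empty barrier */
    uint64_t nr_sects;       /* 0 for an empty request */
};

/* Returns true when the request lies within a disk of `disk_sectors`
 * sectors. Empty requests touch no sectors, so their sector number is
 * not bounds-checked at all. */
static bool request_in_range(const struct blk_request_sketch *req,
                             uint64_t disk_sectors)
{
    assert(req != NULL);

    if (req->nr_sects == 0)
        return true;

    uint64_t end = req->sector_number + req->nr_sects;

    /* Unsigned wrap-around: the sum came out smaller than the start. */
    if (end < req->sector_number)
        return false;

    return end <= disk_sectors;
}
```

Here an empty request with sector number -1 passes, while a non-empty request starting at the same sector is rejected by the wrap-around test rather than by a misleading comparison against the disk size.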
| | * xen/blkback: fix xenbus_transaction_start() hang caused by double xenbus_transaction_end() (Laszlo Ersek, 2011-05-13, 1 file, -0/+1)
| | |
| | |     vbd_resize() up_read()'s xs_state.suspend_mutex twice in a row via
| | |     double xenbus_transaction_end() calls. The next down_read() in
| | |     xenbus_transaction_start() (at eg. the next resize attempt) hangs.
| | |
| | |     Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=618317
| | |
| | |     Acked-by: Jan Beulich <jbeulich@novell.com>
| | |     Acked-by: Ian Campbell <ian.campbell@citrix.com>
| | |     Signed-off-by: Laszlo Ersek <lersek@redhat.com>
| | |     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen/blkback: Align the tabs on the structure. (Konrad Rzeszutek Wilk, 2011-05-12, 1 file, -1/+1)
| | |
| | |     The recent changes caused this field of the structure to be offset
| | |     a bit.
| | |
| | |     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen/blkback: if log_stats is enabled print out the data. (Konrad Rzeszutek Wilk, 2011-05-12, 1 file, -1/+1)
| | |
| | |     And do not depend on the driver being built with the -DDEBUG flag.
| | |
| | |     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen/blkback: Add the prefix XEN in the common.h. (Konrad Rzeszutek Wilk, 2011-05-12, 1 file, -3/+3)
| | |
| | |     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen/blkback: Prefix 'vbd' with 'xen' in structs and functions. (Konrad Rzeszutek Wilk, 2011-05-12, 3 files, -29/+29)
| | |
| | |     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen/blkback: Change structure name blkif_st to xen_blkif. (Konrad Rzeszutek Wilk, 2011-05-12, 3 files, -27/+27)
| | |
| | |     No need for that '_st' and xen_blkif is more apt.
| | |
| | |     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen/blkback: Remove the unused typedefs. (Konrad Rzeszutek Wilk, 2011-05-12, 1 file, -4/+0)
| | |
| | |     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen/blkback: Move include/xen/blkif.h into drivers/block/xen-blkback/common.h (Konrad Rzeszutek Wilk, 2011-05-12, 2 files, -96/+71)
| | |
| | |     There is no point to the blkif.h file. It is not used by the
| | |     frontend.
| | |
| | |     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen/blkback: Fixing some more of the cleanpatch.pl warnings. (Konrad Rzeszutek Wilk, 2011-05-12, 2 files, -3/+3)
| | |
| | |     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen/blkback: Checkpatch.pl recommend against multiple assigments. (Konrad Rzeszutek Wilk, 2011-05-12, 2 files, -5/+10)
| | |
| | |     CHECK: multiple assignments should be avoided
| | |
| | |     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen/blkback: Fix checkpatch.pl warnings about more than 80 lines. (Konrad Rzeszutek Wilk, 2011-05-12, 1 file, -3/+6)
| | |
| | |     Break up the macro usage.
| | |
| | |     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen/blkback: Flesh out the description in the Kconfig. (Konrad Rzeszutek Wilk, 2011-05-12, 1 file, -0/+13)
| | |
| | |     with more details.
| | |
| | |     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen/blkback: Fix spelling mistakes. (Konrad Rzeszutek Wilk, 2011-05-12, 1 file, -2/+2)
| | |
| | |     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen/blkback: Move blkif_get_x86_[32|64]_req to common.h in block/xen-blkback dir. (Konrad Rzeszutek Wilk, 2011-05-12, 2 files, -30/+32)
| | |
| | |     From the blkif.h header, which was exposed to the frontend.
| | |
| | |     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen/blkback: Removing the debug_lvl option. (Konrad Rzeszutek Wilk, 2011-05-12, 1 file, -7/+0)
| | |
| | |     It is not really used for anything.
| | |
| | |     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen/blkback: Use the DRV_PFX in the pr_.. macros. (Konrad Rzeszutek Wilk, 2011-05-12, 3 files, -22/+23)
| | |
| | |     To make it easier to read.
| | |
| | |     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen/blkback: Make the DPRINTK uniform. (Konrad Rzeszutek Wilk, 2011-05-12, 2 files, -8/+3)
| | |
| | |     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen/blkback: Change printk/DPRINTK to pr_.. type variant. (Konrad Rzeszutek Wilk, 2011-05-12, 2 files, -40/+37)
| | |
| | |     And also make them uniform and prefix the message with
| | |     'xen-blkback'.
| | |
| | |     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen/blkback: Fixed up comments and converted spaces to tabs. (Konrad Rzeszutek Wilk, 2011-05-11, 3 files, -81/+105)
| | |
| | |     Suggested-by: Ian Campbell <Ian.Campbell@eu.citrix.com>
| | |     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen/blkback: Fix up some of the comments. (Konrad Rzeszutek Wilk, 2011-05-05, 1 file, -3/+3)
| | |
| | |     They had the wrong data or were in the wrong spot.
| | |
| | |     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen/blkback: Squash the checking for operation into dispatch_rw_block_io (Konrad Rzeszutek Wilk, 2011-05-05, 1 file, -32/+13)
| | |
| | |     We do a check for the operations right before calling
| | |     dispatch_rw_block_io. And then we do the same check in
| | |     dispatch_rw_block_io. This patch squashes those checks into the
| | |     'dispatch_rw_block_io' function.
| | |
| | |     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen/blkback: Add support for BLKIF_OP_FLUSH_DISKCACHE and drop BLKIF_OP_WRITE_BARRIER. (Konrad Rzeszutek Wilk, 2011-05-05, 3 files, -25/+34)
| | |
| | |     We drop the support for 'feature-barrier' and add in the support
| | |     for the 'feature-flush-cache' if the real backend storage supports
| | |     flushing.
| | |
| | |     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen-blkfront: Provide for 'feature-flush-cache' the BLKIF_OP_WRITE_FLUSH_CACHE operation. (Konrad Rzeszutek Wilk, 2011-05-05, 1 file, -0/+13)
| | |
| | |     The operation BLKIF_OP_WRITE_FLUSH_CACHE has existed in the Xen
| | |     tree header file for years, but it was never present in the Linux
| | |     tree because neither the frontend nor the backend supported this
| | |     interface.
| | |
| | |     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * Revert "xen/blkback: Move the plugging/unplugging to a higher level." (Konrad Rzeszutek Wilk, 2011-04-27, 1 file, -6/+7)
| | |
| | |     This reverts commit 97961ef46b9b5a6a7c918a38b898a7b3e49869f4 b/c we
| | |     lose about 15% performance if we do the unplugging at the end of
| | |     reading the ring buffer.
| | * xen/blkback: Stick REQ_SYNC on WRITEs to deal with CFQ I/O scheduler. (Konrad Rzeszutek Wilk, 2011-04-26, 1 file, -1/+1)
| | |
| | |     If one runs a simple fio request with random read/write with a
| | |     20%/80% ratio, the numbers are incredibly bad when using the CFQ
| | |     scheduler.
| | |
| | |     IOmeter       |       |      |          |
| | |     64K, randrw   | NOOP  | CFQ  | deadline |
| | |     randrwmix=80  |       |      |          |
| | |     --------------+-------+------+----------+
| | |     blkback       |103/27 |32/10 | 102/27   |
| | |     --------------+-------+------+----------+
| | |     QEMU qdisk    |103/27 |102/27| 102/27   |
| | |
| | |     The problem as explained by Vivek Goyal was:
| | |
| | |     ".. that difference is that sync vs async requests. In the case of
| | |     a kernel thread submitting IO, [..] all the WRITES might be being
| | |     considered as async and will go in a different queue. If you mix
| | |     those with some READS, they are always sync and will go in
| | |     different queue. In presence of sync queue, CFQ will idle and choke
| | |     up WRITES in an attempt to improve latencies of READs.
| | |
| | |     In case of AIO [note: this is what QEMU qdisk is doing] , [..] it
| | |     is direct IO and both READS and WRITES will be considered SYNC and
| | |     will go in a single queue and no choking of WRITES will take
| | |     place."
| | |
| | |     The solution is quite simple, tack on REQ_SYNC (which is what the
| | |     WRITE_ODIRECT macro points to) and the numbers go back up.
| | |
| | |     Suggested-by: Vivek Goyal <vgoyal@redhat.com>
| | |     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen/blkback: Move the plugging/unplugging to a higher level. (Konrad Rzeszutek Wilk, 2011-04-26, 1 file, -7/+6)
| | |
| | |     We used to plug/unplug on each submit_bio. But that means that if,
| | |     within a stream of WRITE, WRITE, WRITE,...,WRITE, we have one READ,
| | |     it could stall the pipeline (as the 'submit_bio' could trigger the
| | |     unplug_fnc to be called and stall/sync when doing the READ).
| | |     Instead we want to move the unplugging to when the whole (or as
| | |     much as possible of the) ring buffer has been processed. This also
| | |     eliminates us doing plug/unplug for each request.
| | |
| | |     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen/blkback: Prefix exposed functions with xen_ (Konrad Rzeszutek Wilk, 2011-04-20, 3 files, -66/+68)
| | |
| | |     And also shorten the name if it has blkback to blkbk.
| | |
| | |     This makes the symbol table (if compiled into the kernel) much
| | |     shorter, prettier, and also easier to search.
| | |
| | |     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen-blkback: Inline some of the functions that were moved from vbd/interface.c (Konrad Rzeszutek Wilk, 2011-04-20, 3 files, -93/+65)
| | |
| | |     Shuffling code around.
| | |
| | |     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen-blkback: Remove from the copyright notice the address. (Konrad Rzeszutek Wilk, 2011-04-20, 1 file, -3/+0)
| | |
| | |     There is no need for it, as the address is updated constantly in
| | |     the root of the Linux kernel.
| | |
| | |     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen/blkback: Squash vbd.c,interface.c in blkback.c and xenbus.c respectivly. (Konrad Rzeszutek Wilk, 2011-04-20, 5 files, -348/+287)
| | |
| | |     Daniel Stodden suggested eliminating vbd.c and interface.c,
| | |     inlining the critical bits where they belong, respectively. This
| | |     leaves only blkback.c for the data path and xenbus.c for the
| | |     control path.
| | |
| | |     Suggested-by: Daniel Stodden <daniel.stodden@citrix.com>
| | |     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen/blkback: Move it from drivers/xen to drivers/block (Konrad Rzeszutek Wilk, 2011-04-18, 10 files, -9/+9)
| | |
| | |     .. and modify the Makefile and Kconfig files appropriately.
| | |
| | |     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>