Commit message | Author | Age | Files | Lines
* virtio-blk: use ida to allocate disk index | Michael S. Tsirkin | 2011-10-31 | 1 | -6/+24
      Based on a patch by Mark Wu <dwu@redhat.com>

      Current index allocation in virtio-blk is based on a monotonically
      increasing variable "index". This means we'll run out of numbers
      after a while. It also could cause confusion about the disk name in
      the case of hot-plugging disks. Change virtio-blk to use ida to
      allocate index, instead.

      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
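The index-reuse idea in the virtio-blk commit above can be illustrated with a small userspace sketch. This is not the kernel's ida API: `toy_ida`, `toy_ida_get` and `toy_ida_remove` are hypothetical names, and the fixed 64-slot bitmap is an assumption made for brevity. It only shows why handing out the lowest free index keeps disk numbering stable across hot-unplug, where a monotonically increasing counter would not.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for an ID allocator; NOT the kernel's ida implementation. */
struct toy_ida {
    uint64_t used; /* bit i set => index i is currently allocated */
};

/* Return the lowest free index, or -1 if all 64 slots are taken. */
int toy_ida_get(struct toy_ida *ida)
{
    for (int i = 0; i < 64; i++) {
        if (!(ida->used & (UINT64_C(1) << i))) {
            ida->used |= UINT64_C(1) << i;
            return i;
        }
    }
    return -1;
}

/* Release an index so that a later allocation can reuse it. */
void toy_ida_remove(struct toy_ida *ida, int idx)
{
    assert(idx >= 0 && idx < 64);
    ida->used &= ~(UINT64_C(1) << idx);
}
```

With a plain counter, removing disk 0 and plugging in a new disk would produce index 2; with this scheme the new disk gets index 0 again.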
* hpsa: add small delay when using PCI Power Management to reset for kdump | Mike Miller | 2011-10-21 | 1 | -0/+7
      The P600 requires a small delay when changing states. Otherwise we
      may think the board did not reset and we bail. This is for kdump
      only and is particular to the P600.

      Signed-off-by: Mike Miller <mike.miller@hp.com>
      Cc: stable@kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
* cciss: add small delay when using PCI Power Management to reset for kdump | Mike Miller | 2011-10-20 | 1 | -0/+7
      The P600 requires a small delay when changing states. Otherwise we
      may think the board did not reset and we bail. This is for kdump
      only and is particular to the P600.

      Cc: stable@kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Merge branch 'stable/for-jens-3.2' of git://oss.oracle.com/git/kwilk/xen into for-3.2/drivers | Jens Axboe | 2011-10-20 | 5 | -66/+403
|\
| * xen/blkback: Fix two races in the handling of barrier requests. | Konrad Rzeszutek Wilk | 2011-10-17 | 1 | -5/+5
      There are two windows of opportunity to cause a race when processing
      a barrier request. This patch fixes this.

      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen/blkback: Check for proper operation. | Konrad Rzeszutek Wilk | 2011-10-14 | 1 | -1/+1
      The patch titled: "xen/blkback: Fix the inhibition to map pages when
      discarding sector ranges." had the right idea except that it used
      the wrong comparison operator. It had == instead of !=. This fixes
      the bug where all (except discard) operations would have been
      ignored.

      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen/blkback: Fix the inhibition to map pages when discarding sector ranges. | Konrad Rzeszutek Wilk | 2011-10-13 | 1 | -2/+1
      The 'operation' parameters are the ones provided to the bio layer
      while the req->operation values are the ones passed between the
      backend and frontend. We used the wrong 'operation' value to squash
      the call to map pages when processing the discard operation,
      resulting in a hypercall that did nothing. Let's guard against
      going into the mapping function by checking for the proper operation
      type.

      CC: Li Dongyang <lidongyang@novell.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen/blkback: Report VBD_WSECT (wr_sect) properly. | Konrad Rzeszutek Wilk | 2011-10-13 | 1 | -1/+1
      We did not increment the number of sectors written to disk because
      we tested for == WRITE, which is incorrect, as the operations are
      more often WRITE_FLUSH or WRITE_ODIRECT. This patch fixes it by
      doing a & WRITE check.

      CC: stable@kernel.org
      Reported-by: Andy Burns <xen.lists@burns.me.uk>
      Suggested-by: Ian Campbell <Ian.Campbell@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
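The difference between the equality test and the mask test in the wr_sect fix above can be sketched as follows. The `TOY_*` flag values and both function names are hypothetical and chosen only for illustration; the kernel's actual request-flag definitions differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for illustration; the kernel's definitions differ. */
enum {
    TOY_WRITE = 1 << 0, /* data-direction bit */
    TOY_FLUSH = 1 << 1, /* modifier: flush the cache first */
    TOY_SYNC  = 1 << 2, /* modifier: synchronous I/O */
};

/* Buggy test: true only for a bare write with no modifier bits set. */
bool is_write_exact(unsigned int op)
{
    return op == TOY_WRITE;
}

/* Fixed test: true whenever the write bit is set, whatever else is set. */
bool is_write_masked(unsigned int op)
{
    return (op & TOY_WRITE) != 0;
}
```

A write that carries a modifier, such as `TOY_WRITE | TOY_FLUSH`, is missed by the equality test and counted by the mask test.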
| * xen/blkback: Support 'feature-barrier' aka old-style BARRIER requests. | Konrad Rzeszutek Wilk | 2011-10-13 | 3 | -2/+58
      We emulate the barrier requests by draining the outstanding bios
      and then sending the WRITE_FLUSH command. To drain the I/Os we use
      the refcnt that is used during disconnect to wait for all the I/Os
      before disconnecting from the frontend. We latch on its value, and
      if it reaches either the threshold for disconnect or there are no
      more outstanding I/Os, then we have drained all I/Os.

      Suggested-by: Christopher Hellwig <hch@infradead.org>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen-blkfront: plug device number leak in xlblk_init() error path | Laszlo Ersek | 2011-10-13 | 1 | -1/+9
      ... though after a failed xenbus_register_frontend() all may be
      lost.

      Acked-by: Ian Campbell <ian.campbell@citrix.com>
      Signed-off-by: Laszlo Ersek <lersek@redhat.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen-blkfront: If no barrier or flush is supported, use invalid operation. | Konrad Rzeszutek Wilk | 2011-10-13 | 1 | -1/+3
      Guard against issuing BLKIF_OP_WRITE_BARRIER or BLKIF_OP_FLUSH_CACHE
      by checking whether we successfully negotiated with the backend.

      The negotiation with the backend also sets q->flush_flags, which
      fortunately for us is also used when submitting a bio to us. If we
      don't support barriers or flushes it would be set to zero, so we
      should never end up having to deal with REQ_FLUSH | REQ_FUA.
      However, other third-party implementations of __make_request that
      might be stacked on top of us might not be so smart, so let's fix
      this up.

      Acked-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen-blkback: use kzalloc() in favor of kmalloc()+memset() | Jan Beulich | 2011-10-13 | 1 | -4/+2
      This fixes the problem of three of those four memset()-s having
      improper size arguments passed: sizeof applied to a pointer-typed
      expression returns the size of the pointer, not that of the
      pointed-to data.

      It also reverts to using kmalloc() instead of kzalloc() for the
      allocation of the pending grant handles array, as that array gets
      fully initialized in a subsequent loop.

      Reported-by: Julia Lawall <julia@diku.dk>
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
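The sizeof pitfall described in the kzalloc() commit above can be shown in userspace. This is a sketch under stated assumptions: `struct toy_req` and the three function names are hypothetical, and calloc() stands in for kzalloc() only in the sense that both return zeroed memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical element type, used only for this illustration. */
struct toy_req {
    unsigned long id;
    char payload[56];
};

/* Size of the pointer itself: what memset(p, 0, sizeof(p)) would clear. */
size_t pointer_size(void)
{
    struct toy_req *p = NULL;
    return sizeof(p);
}

/* Size of the pointed-to object: what the buggy memset() meant to clear. */
size_t object_size(void)
{
    struct toy_req *p = NULL;
    return sizeof(*p); /* operand is not evaluated, so NULL is safe here */
}

/* Zeroing allocation in one call, so no size argument can be mistyped. */
struct toy_req *alloc_reqs(size_t n)
{
    return calloc(n, sizeof(struct toy_req));
}
```

Allocating and zeroing in a single call removes the separate memset() and, with it, the chance to pass the pointer's size by mistake.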
| * xen-blkback: fixed indentation and comments | Joe Jin | 2011-10-13 | 2 | -3/+3
      This patch fixes the following:
      1. Fix code style issue.
      2. Fix incorrect function names in comments.

      Signed-off-by: Joe Jin <joe.jin@oracle.com>
      Cc: Jens Axboe <jaxboe@fusionio.com>
      Cc: Ian Campbell <Ian.Campbell@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen-blkfront: fix a deadlock while handling discard response | Li Dongyang | 2011-10-13 | 1 | -2/+0
      When we get an -EOPNOTSUPP response for a discard request, we clear
      the discard flag on the request queue so we won't attempt to send
      discard requests to the backend again, and this should be protected
      under rq->queue_lock. However, when we set up the request queue, we
      pass blkif_io_lock to blk_init_queue, so rq->queue_lock is in fact
      blkif_io_lock, and this lock is already taken when we are in
      blkif_interrupt. So remove the spin_lock/spin_unlock when we clear
      the discard flag, or we will end up with a deadlock here.

      Signed-off-by: Li Dongyang <lidongyang@novell.com>
      [v1: Updated description a bit and removed comment from source]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen-blkfront: Handle discard requests. | Li Dongyang | 2011-10-13 | 1 | -23/+88
      If the backend advertises 'feature-discard', then interrogate the
      backend for alignment and granularity. Set up the request queue
      with the appropriate values and send the discard operation as
      required.

      Signed-off-by: Li Dongyang <lidongyang@novell.com>
      [v1: Amended commit description]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen-blkback: Implement discard requests ('feature-discard') | Li Dongyang | 2011-10-13 | 3 | -31/+206
      ..aka ATA TRIM/SCSI UNMAP command to be passed through the frontend
      and used as appropriate by the backend. We also advertise certain
      granularity parameters to the frontend so it can plug them in. If
      the backend is a real device, we just end up using
      'blkdev_issue_discard', while for loopback devices we just punch a
      hole in the image file.

      Signed-off-by: Li Dongyang <lidongyang@novell.com>
      [v1: Fixed up pr_debug and commit description]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen-blkfront: add BLKIF_OP_DISCARD and discard request struct | Li Dongyang | 2011-10-13 | 1 | -0/+36
      Now we use BLKIF_OP_DISCARD and add blkif_request_discard to the
      blkif_request union. The patch is taken from Owen Smith and Konrad;
      thanks.

      Signed-off-by: Owen Smith <owen.smith@citrix.com>
      Signed-off-by: Li Dongyang <lidongyang@novell.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* | drivers/block/loop.c: remove unnecessary bdev argument from loop_clr_fd() | Ayan George | 2011-09-21 | 1 | -3/+4
      If the loop device is associated (lo->lo_state == Lo_bound), it will
      have a valid bdev pointed to by lo->lo_device. There is no reason to
      ever pass an additional block_device pointer.

      Signed-off-by: Ayan George <ayan.george@canonical.com>
      Cc: Phillip Susi <psusi@cfl.rr.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@google.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | drivers/block/loop.c: emit uevent on auto release | Phillip Susi | 2011-09-21 | 1 | -1/+1
      The loopback driver failed to emit the change uevent when auto
      releasing the device. Fixed lo_release() to pass the bdev to
      loop_clr_fd() so it can emit the event.

      Signed-off-by: Phillip Susi <psusi@cfl.rr.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Ayan George <ayan@ayan.net>
      Signed-off-by: Andrew Morton <akpm@google.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | drivers/block/cpqarray.c: use pci_dev->revision | Sergei Shtylyov | 2011-09-21 | 1 | -1/+1
      This driver uses PCI_CLASS_REVISION instead of PCI_REVISION_ID, so
      it wasn't converted by commit 44c10138fd4bbc4b6 ("PCI: Change all
      drivers to use pci_device->revision").

      Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
      Acked-by: Mike Miller <mike.miller@hp.com>
      Cc: Chirag Kantharia <chirag.kantharia@hp.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@google.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | loop: always allow userspace partitions and optionally support automatic scanning | Kay Sievers | 2011-08-23 | 2 | -4/+46
      Automatic partition scanning can be requested individually per loop
      device during its setup by setting LO_FLAGS_PARTSCAN. By default, no
      partition tables are scanned.

      Userspace can now always add and remove partitions from all loop
      devices, regardless if the in-kernel partition scanner is enabled or
      not.

      The needed partition minor numbers are allocated from the extended
      minors space, the main loop device numbers will continue to match
      the loop minors, regardless of the number of partitions used.

        # grep . /sys/class/block/loop1/loop/*
        /sys/block/loop1/loop/autoclear:0
        /sys/block/loop1/loop/backing_file:/home/kay/data/stuff/part.img
        /sys/block/loop1/loop/offset:0
        /sys/block/loop1/loop/partscan:1
        /sys/block/loop1/loop/sizelimit:0

        # ls -l /dev/loop*
        brw-rw---- 1 root disk   7,   0 Aug 14 20:22 /dev/loop0
        brw-rw---- 1 root disk   7,   1 Aug 14 20:23 /dev/loop1
        brw-rw---- 1 root disk 259,   0 Aug 14 20:23 /dev/loop1p1
        brw-rw---- 1 root disk 259,   1 Aug 14 20:23 /dev/loop1p2
        brw-rw---- 1 root disk   7,  99 Aug 14 20:23 /dev/loop99
        brw-rw---- 1 root disk 259,   2 Aug 14 20:23 /dev/loop99p1
        brw-rw---- 1 root disk 259,   3 Aug 14 20:23 /dev/loop99p2
        crw------T 1 root root  10, 237 Aug 14 20:22 /dev/loop-control

      Cc: Karel Zak <kzak@redhat.com>
      Cc: Davidlohr Bueso <dave@gnu.org>
      Acked-By: Tejun Heo <tj@kernel.org>
      Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* | Merge branch 'for-3.2/core' into for-3.2/drivers | Jens Axboe | 2011-08-23 | 4 | -6/+8
|\ \
| * | block: add GENHD_FL_NO_PART_SCAN | Tejun Heo | 2011-08-23 | 4 | -6/+8
| |/
      There are cases where suppressing partition scan is useful - e.g.
      for lo devices and pseudo SATA devices which advertise to be a disk
      but get upset on partition scan (some port multiplier control
      devices show such behavior).

      This patch adds GENHD_FL_NO_PART_SCAN which suppresses partition
      scan regardless of the number of possible partitions.
      disk_partitionable() is renamed to disk_part_scan_enabled() as
      suppressing partition scan doesn't imply the device can't be
      partitioned using BLKPG_ADD/DEL_PARTITION calls from userland.
      show_partition() now directly tests disk_max_parts() to maintain
      backward-compatibility.

      -v2: Updated to make it clear that only partition scan is suppressed
           not partitioning itself as suggested by Kay Sievers.

      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* | loop: add discard support for loop devices | Lukas Czerner | 2011-08-19 | 1 | -0/+54
      This commit adds discard support for loop devices. Discard is
      usually supported by SSDs and thinly provisioned devices as a method
      for reclaiming unused space. This is no different from trying to
      reclaim space which is not used by the file system on the image but
      still occupies space on the host file system.

      We can do the reclamation on a file system which supports hole
      punching. So when a discard request gets to the loop driver we can
      translate it into punching a hole in the underlying file, hence
      reclaiming the free space.

      This is very useful for trimming down the size of the image to only
      what is really used by the file system on that image. Fstrim may be
      used for that purpose.

      It has been tested on ext4, xfs and btrfs with the image file
      systems ext4, ext3, xfs and btrfs. An ext4 or ext3 image on an ext4
      file system has some problems, but it seems that the ext4 punch hole
      implementation is somewhat flawed and this is unrelated to this
      commit.

      Also, this is a very good method of validating a file system's punch
      hole implementation.

      Note that when encryption is used, discard support is disabled,
      because using it might leak some information useful to a possible
      attacker.

      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* | nbd-replace-some-printk-with-dev_warn-and-dev_info-checkpatch-fixes | Andrew Morton | 2011-08-19 | 1 | -1/+1
      ERROR: code indent should use tabs where possible
      #30: FILE: drivers/block/nbd.c:578:
      +^I dev_info(disk_to_dev(lo->disk), "NBD_DISCONNECT\n");$

      total: 1 errors, 0 warnings, 35 lines checked

      NOTE: whitespace errors detected, you may wish to use
      scripts/cleanpatch or scripts/cleanfile

      ./patches/nbd-replace-some-printk-with-dev_warn-and-dev_info.patch
      has style problems, please review. If any of these errors are false
      positives, please report them to the maintainer, see CHECKPATCH in
      MAINTAINERS.

      Please run checkpatch prior to sending patches

      Cc: Paul Clements <Paul.Clements@steeleye.com>
      Cc: WANG Cong <amwang@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* | nbd: replace some printk with dev_warn() and dev_info() | WANG Cong | 2011-08-19 | 1 | -6/+5
      Signed-off-by: WANG Cong <amwang@redhat.com>
      Cc: Paul Clements <Paul.Clements@steeleye.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* | nbd: lower the loglevel of an error message | WANG Cong | 2011-08-19 | 1 | -1/+1
      This is only an error, no need to use KERN_CRIT log level.

      Signed-off-by: WANG Cong <amwang@redhat.com>
      Cc: Paul Clements <Paul.Clements@steeleye.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* | nbd: replace printk KERN_ERR with dev_err() | WANG Cong | 2011-08-19 | 1 | -25/+25
      Signed-off-by: WANG Cong <amwang@redhat.com>
      Cc: Paul Clements <Paul.Clements@steeleye.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* | nbd: replace sysfs_create_file() with device_create_file() | WANG Cong | 2011-08-19 | 1 | -3/+3
      Signed-off-by: WANG Cong <amwang@redhat.com>
      Cc: Paul Clements <Paul.Clements@steeleye.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* | nbd: use task_pid_nr() to get current pid | WANG Cong | 2011-08-19 | 1 | -1/+1
      Signed-off-by: WANG Cong <amwang@redhat.com>
      Cc: Paul Clements <Paul.Clements@steeleye.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* | cciss: add transport mode attribute to sys | Joe Handzik | 2011-08-08 | 2 | -0/+20
      Signed-off-by: Joseph Handzik <joseph.t.handzik@beardog.cce.hp.com>
      Acked-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* | cciss: Adds simple mode functionality | Joseph Handzik | 2011-08-08 | 3 | -11/+56
|/
      Signed-off-by: Joseph Handzik <joseph.t.handzik@beardog.cce.hp.com>
      Acked-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* Linux 3.1-rc1 (tag: v3.1-rc1) | Linus Torvalds | 2011-08-07 | 1 | -2/+2
|
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc | Linus Torvalds | 2011-08-07 | 1 | -2/+4
|\
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sparc: Fix build with DEBUG_PAGEALLOC enabled.
| * sparc: Fix build with DEBUG_PAGEALLOC enabled. | David S. Miller | 2011-08-06 | 1 | -2/+4
      arch/sparc/mm/init_64.c:1622:22: error: unused variable '__swapper_4m_tsb_phys_patch_end' [-Werror=unused-variable]
      arch/sparc/mm/init_64.c:1621:22: error: unused variable '__swapper_4m_tsb_phys_patch' [-Werror=unused-variable]

      Signed-off-by: David S. Miller <davem@davemloft.net>
* | sh: Fix boot crash related to SCI | Rafael J. Wysocki | 2011-08-07 | 1 | -1/+1
      Commit d006199e72a9 ("serial: sh-sci: Regtype probing doesn't need
      to be fatal.") made sci_init_single() return when sci_probe_regmap()
      succeeds, although it should return when sci_probe_regmap() fails.
      This causes systems using the serial sh-sci driver to crash during
      boot. Fix the problem by using the right return condition.

      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | arm: remove stale export of 'sha_transform' | Linus Torvalds | 2011-08-07 | 1 | -3/+0
      The generic library code already exports the generic function, this
      was left-over from the ARM-specific version that just got removed.

      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | arm: remove "optimized" SHA1 routines | Linus Torvalds | 2011-08-07 | 2 | -212/+1
      Since commit 1eb19a12bd22 ("lib/sha1: use the git implementation of
      SHA-1"), the ARM SHA1 routines no longer work. The reason? They
      depended on the larger 320-byte workspace, and now the sha1
      workspace is just 16 words (64 bytes). So the assembly version would
      overwrite the stack randomly.

      The optimized asm version is also probably slower than the new
      improved C version, so there's no reason to keep it around. At least
      that was the case in git, where what appears to be the same assembly
      language version was removed two years ago because the optimized C
      BLK_SHA1 code was faster.

      Reported-and-tested-by: Joachim Eastwood <manabian@gmail.com>
      Cc: Andreas Schwab <schwab@linux-m68k.org>
      Cc: Nicolas Pitre <nico@fluxnic.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | fix rcu annotations noise in cred.h | Al Viro | 2011-08-07 | 1 | -5/+6
      task->cred is declared as __rcu, and access to other tasks' ->cred
      is, indeed, protected. Access to current->cred does not need
      rcu_dereference() at all, since only the task itself can change its
      ->cred. sparse, of course, has no way of knowing that...

      Add force-cast in current_cred(), make current_fsuid() et al. use
      it.

      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | vfs: rename 'do_follow_link' to 'should_follow_link' | Linus Torvalds | 2011-08-07 | 1 | -2/+2
      Al points out that the do_follow_link() helper function really is
      misnamed - it's about whether we should try to follow a symlink or
      not, not about actually doing the following.

      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Fix POSIX ACL permission check | Ari Savolainen | 2011-08-07 | 1 | -1/+1
      After commit 3567866bf261: "RCUify freeing acls, let check_acl() go
      ahead in RCU mode if acl is cached" posix_acl_permission is being
      called with an unsupported flag and the permission check fails. This
      patch fixes the issue.

      Signed-off-by: Ari Savolainen <ari.m.savolainen@gmail.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd | Linus Torvalds | 2011-08-06 | 9 | -504/+617
|\ \
      * 'for-linus' of git://git.open-osd.org/linux-open-osd:
        ore: Make ore its own module
        exofs: Rename raid engine from exofs/ios.c => ore
        exofs: ios: Move to a per inode components & device-table
        exofs: Move exofs specific osd operations out of ios.c
        exofs: Add offset/length to exofs_get_io_state
        exofs: Fix truncate for the raid-groups case
        exofs: Small cleanup of exofs_fill_super
        exofs: BUG: Avoid sbi realloc
        exofs: Remove pnfs-osd private definitions
        nfs_xdr: Move nfs4_string definition out of #ifdef CONFIG_NFS_V4
| * | ore: Make ore its own module | Boaz Harrosh | 2011-08-06 | 3 | -1/+23
      Export everything from ore that needs exporting. Change Kbuild and
      Kconfig to build ore.ko as an independent module. Import ore from
      exofs.

      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
| * | exofs: Rename raid engine from exofs/ios.c => ore | Boaz Harrosh | 2011-08-06 | 6 | -255/+295
      ORE stands for "Objects Raid Engine".

      This patch is a mechanical rename of everything that was in ios.c
      and its API declaration to an ore.c and an osd_ore.h header. The ore
      engine will later be used by the pnfs objects layout driver.

      * File ios.c => ore.c
      * Declarations of types and API are moved from exofs.h to a new
        osd_ore.h
      * All used types are prefixed by ore_ instead of their exofs_ name.
      * Shift includes from exofs.h to osd_ore.h so osd_ore.h is
        independent, include it from exofs.h.

      Other than a pure rename there are no other changes. The next patch
      will move the ore into its own module and will export the API to be
      used by exofs and later the layout driver.

      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
| * | exofs: ios: Move to a per inode components & device-table | Boaz Harrosh | 2011-08-06 | 4 | -183/+218
      Exofs raid engine was saving on memory space by having a single
      layout-info, single pid, and a single device-table, global to the
      filesystem. Then passing a credential and object_id info at the
      io_state level, private for each inode. It would also devise this
      contraption of rotating the device table view for each inode->ino to
      spread out the device usage.

      This is not compatible with the pnfs-objects standard, which demands
      that each inode can have its own layout-info and device-table, and
      each object component its own pid, oid and creds.

      So: Bring exofs raid engine to be usable for generic pnfs-objects
      use by:

      * Define an exofs_comp structure that holds obj_id and credential
        info.
      * Break up the exofs_layout struct into an exofs_components
        structure that holds a possible array of exofs_comp and the array
        of devices + the size of the arrays.
      * Add a "comps" parameter to get_io_state() that specifies the ids,
        creds and device array to use for each IO.

        This makes it possible to keep the layout global, but the
        device-table view, creds and IDs at the inode level. It only adds
        two 64-bit members to each inode, since some of these members
        already existed in another form.
      * ios raid engine now accesses layout-info and comps-info through
        the passed pointers. Everything is pre-prepared by the caller for
        generic access of these structures and arrays.

      At the exofs level:

      * Super block holds an exofs_components struct that holds the device
        array, previously in layout. The devices there are in device-table
        order. The device-array is twice as big and repeats the
        device-table twice, so now each inode's device array can point to
        a random device and have a round-robin view of the table, making
        it compatible with previous exofs versions.
      * Each inode has an exofs_components struct that is initialized at
        load time, with its own view of the device table IDs and creds.
        When doing IO this gets passed to the io_state together with the
        layout.

      While performing this change, bugs were found where credentials with
      the wrong IDs were used to access the different SB objects
      (super.c), as well as some dead code. It was never noticed because
      the target we use does not check the credentials.

      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
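The doubled device array described in the commit above can be sketched in a few lines. This is an illustration only, not exofs code: `TOY_NUMDEVS`, `toy_devs` and `toy_inode_devs` are hypothetical names, and deriving the starting offset from `ino % TOY_NUMDEVS` is an assumption made for this sketch. Because the table is stored twice back to back, a pointer into the first half always has a full table's worth of consecutive entries after it, which gives each inode a rotated, round-robin view without copying.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NUMDEVS 4 /* hypothetical device count */

/* Device table stored twice back to back: d0 d1 d2 d3 d0 d1 d2 d3. */
static const int toy_devs[2 * TOY_NUMDEVS] = { 0, 1, 2, 3, 0, 1, 2, 3 };

/*
 * Return a view of TOY_NUMDEVS consecutive entries that starts at a
 * per-inode offset. Using ino % TOY_NUMDEVS as the offset is an
 * assumption of this sketch.
 */
const int *toy_inode_devs(unsigned long ino)
{
    return &toy_devs[ino % TOY_NUMDEVS];
}
```

For example, an inode whose offset is 2 sees the devices in the order 2, 3, 0, 1, while an inode with offset 0 sees them in table order.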
| * | exofs: Move exofs specific osd operations out of ios.c | Boaz Harrosh | 2011-08-06 | 4 | -73/+79
      ios.c will be moving to an external library, for use by the
      objects-layout-driver. Remove from it some exofs specific functions.

      Also, g_attr_logical_length is used both by inode.c and ios.c; move
      its definition to the latter, to keep it independent.

      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
| * | exofs: Add offset/length to exofs_get_io_state | Boaz Harrosh | 2011-08-06 | 3 | -16/+38
      In future raid code we will need to know the IO offset/length and
      whether it's a read or a write to determine some of the array sizes
      we'll need.

      So add a new exofs_get_rw_state() API for use when writing/reading.
      All other simple cases are left using the old way.

      The major change here is that we now need to call
      exofs_get_io_state later, at inode.c::read_exec and
      inode.c::write_exec, when we actually know these things. So this
      patch is kept separate so I can test things apart from other
      changes.

      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
| * | exofs: Fix truncate for the raid-groups case | Boaz Harrosh | 2011-08-04 | 1 | -20/+53
      In the general raid-group case the truncate was wrong in that it did
      not also fix the object length of the neighboring groups.

      There are two bad cases in the old code:
      1. Space that should be freed was not.
      2. If a file that was big is truncated small, then made bigger
         again, the holes would not contain zeros but could expose old
         data. (If the growing of the file expands to more than a full
         groups cycle + group size (> S + T))

      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
| * | exofs: Small cleanup of exofs_fill_super | Boaz Harrosh | 2011-08-04 | 1 | -4/+2
      Small cleanup that unifies duplicated code used in both the error
      and success cases

      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
| * | exofs: BUG: Avoid sbi realloc | Boaz Harrosh | 2011-08-04 | 2 | -24/+30
      Since the beginning we realloced the sbi structure when a device
      table bigger than one was specified. (I know that was really
      stupid.)

      Then much later, when "register bdi" was added (by Jens), it was
      registering the pointer to sbi->bdi before the realloc.

      We never saw this problem because up till now the realloc did not do
      anything, since the device table was small enough to fit in the
      original allocation. But once we started testing with large device
      tables (bigger than 28) we noticed the crash of writeback operating
      on a deallocated pointer.

      * Avoid the whole mess by allocating the device-table as a second
        array and get rid of the variable-sized structure and the rest of
        this mess.
      * Take the chance to clean nearby structures and comments.
      * Add a needed dprint on startup to indicate the loaded layout.
      * Also move the bdi registration to the very end, because it will
        only fail in a low-memory condition, which will probably fail
        beforehand. There are many more likely causes to not load before
        that. This way the error handling is made simpler. (Just doing
        this would be enough to fix the BUG.)

      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>