summaryrefslogtreecommitdiffstats
path: root/fs/gfs2/lops.c
Commit message (Collapse)AuthorAgeFilesLines
* GFS2: Misc fixesSteven Whitehouse2011-10-211-2/+0
| | | | | | | Some items picked up through automated code analysis. A few bits of unreachable code and two unchecked return values. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Use rbtree for resource groups and clean up bitmap buffer ref count schemeBob Peterson2011-10-211-38/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here is an update of Bob's original rbtree patch which, in addition, also resolves the rather strange ref counting that was being done relating to the bitmap blocks. Originally we had a dual system for journaling resource groups. The metadata blocks were journaled and also the rgrp itself was added to a list. The reason for adding the rgrp to the list in the journal was so that the "repolish clones" code could be run to update the free space, and potentially send any discard requests when the log was flushed. This was done by comparing the "cloned" bitmap with what had been written back on disk during the transaction commit. Due to this, there was a requirement to hang on to the rgrps' bitmap buffers until the journal had been flushed. For that reason, there was a rather complicated set up in the ->go_lock ->go_unlock functions for rgrps involving both a mutex and a spinlock (the ->sd_rindex_spin) to maintain a reference count on the buffers. However, the journal maintains a reference count on the buffers anyway, since they are being journaled as metadata buffers. So by moving the code which deals with the post-journal accounting for bitmap blocks to the metadata journaling code, we can entirely dispense with the rather strange buffer ref counting scheme and also the requirement to journal the rgrps. The net result of all this is that the ->sd_rindex_spin is left to do exactly one job, and that is to look after the rbtree or rgrps. This patch is designed to be a stepping stone towards using RCU for the rbtree of resource groups, however the reduction in the number of uses of the ->sd_rindex_spin is likely to have benefits for multi-threaded workloads, anyway. The patch retains ->go_lock and ->go_unlock for rgrps, however these maybe also be removed in future in favour of calling the functions directly where required in the code. That will allow locking of resource groups without needing to actually read them in - something that could be useful in speeding up statfs. In the mean time though it is valid to dereference ->bi_bh only when the rgrp is locked. This is basically the same rule as before, modulo the references not being valid until the following journal flush. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Cc: Benjamin Marzinski <bmarzins@redhat.com>
* GFS2: Optimise glock lru and end of life inodesSteven Whitehouse2011-04-201-5/+22
| | | | | | | | | | | | | | | | | | | | | The GLF_LRU flag introduced in the previous patch can be used to check if a glock is on the lru list when a new holder is queued and if so remove it, without having first to get the lru_lock. The main purpose of this patch however is to optimise the glocks left over when an inode at end of life is being evicted. Previously such glocks were left with the GLF_LFLUSH flag set, so that when reclaimed, each one required a log flush. This patch resets the GLF_LFLUSH flag when there is nothing left to flush thus preventing later log flushes as glocks are reused or demoted. In order to do this, we need to keep track of the number of revokes which are outstanding, and also to clear the GLF_LFLUSH bit after a log commit when only revokes have been processed. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Alter point of entry to glock lru list for glocks with an address_spaceSteven Whitehouse2011-04-201-7/+5
| | | | | | | | | | | | | | | | | Rather than allowing the glocks to be scheduled for possible reclaim as soon as they have exited the journal, this patch delays their entry to the list until the glocks in question are no longer in use. This means that we will rely on the vm for writeback of all dirty data and metadata from now on. When glocks are added to the lru list they should be freeable much faster since all the I/O required to free them should have already been completed. This should lead to much better I/O patterns under low memory conditions. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds2011-03-241-6/+6
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits) Documentation/iostats.txt: bit-size reference etc. cfq-iosched: removing unnecessary think time checking cfq-iosched: Don't clear queue stats when preempt. blk-throttle: Reset group slice when limits are changed blk-cgroup: Only give unaccounted_time under debug cfq-iosched: Don't set active queue in preempt block: fix non-atomic access to genhd inflight structures block: attempt to merge with existing requests on plug flush block: NULL dereference on error path in __blkdev_get() cfq-iosched: Don't update group weights when on service tree fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away block: Require subsystems to explicitly allocate bio_set integrity mempool jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging fs: make fsync_buffers_list() plug mm: make generic_writepages() use plugging blk-cgroup: Add unaccounted time to timeslice_used. block: fixup plugging stubs for !CONFIG_BLOCK block: remove obsolete comments for blkdev_issue_zeroout. blktrace: Use rq->cmd_flags directly in blk_add_trace_rq. ... Fix up conflicts in fs/{aio.c,super.c}
| * block: kill off REQ_UNPLUGJens Axboe2011-03-101-6/+6
| | | | | | | | | | | | | | | | | | | | With the plugging now being explicitly controlled by the submitter, callers need not pass down unplugging hints to the block layer. If they want to unplug, it's because they manually plugged on their own - in which case, they should just unplug at will. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* | GFS2: Update to AIL list lockingSteven Whitehouse2011-03-141-0/+2
| | | | | | | | | | | | | | | | | | The previous patch missed a couple of places where the AIL list needed locking, so this fixes up those places, plus a comment is corrected too. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Dave Chinner <dchinner@redhat.com>
* | GFS2: introduce AIL lockDave Chinner2011-03-111-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The log lock is currently used to protect the AIL lists and the movements of buffers into and out of them. The lists are self contained and no log specific items outside the lists are accessed when starting or emptying the AIL lists. Hence the operation of the AIL does not require the protection of the log lock so split them out into a new AIL specific lock to reduce the amount of traffic on the log lock. This will also reduce the amount of serialisation that occurs when the gfs2_logd pushes on the AIL to move it forward. This reduces the impact of log pushing on sequential write throughput. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* | GFS2: Use RCU for glock hash tableSteven Whitehouse2011-01-211-1/+2
|/ | | | | | | | | | | | | | | | | | | This has a number of advantages: - Reduces contention on the hash table lock - Makes the code smaller and simpler - Should speed up glock dumps when under load - Removes ref count changing in examine_bucket - No longer need hash chain lock in glock_put() in common case There are some further changes which this enables and which we may do in the future. One is to look at using SLAB_RCU, and another is to look at using a per-cpu counter for the per-sb glock counter, since that is touched twice in the lifetime of each glock (but only used at umount time). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* GFS2: Various gfs2_logd improvementsBenjamin Marzinski2010-05-051-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch contains various tweaks to how log flushes and active item writeback work. gfs2_logd is now managed by a waitqueue, and gfs2_log_reseve now waits for gfs2_logd to do the log flushing. Multiple functions were rewritten to remove the need to call gfs2_log_lock(). Instead of using one test to see if gfs2_logd had work to do, there are now seperate tests to check if there are two many buffers in the incore log or if there are two many items on the active items list. This patch is a port of a patch Steve Whitehouse wrote about a year ago, with some minor changes. Since gfs2_ail1_start always submits all the active items, it no longer needs to keep track of the first ai submitted, so this has been removed. In gfs2_log_reserve(), the order of the calls to prepare_to_wait_exclusive() and wake_up() when firing off the logd thread has been switched. If it called wake_up first there was a small window for a race, where logd could run and return before gfs2_log_reserve was ready to get woken up. If gfs2_logd ran, but did not free up enough blocks, gfs2_log_reserve() would be left waiting for gfs2_logd to eventualy run because it timed out. Finally, gt_logd_secs, which controls how long to wait before gfs2_logd times out, and flushes the log, can now be set on mount with ar_commit. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: ordered writes are backwardsDave Chinner2010-03-011-2/+2
| | | | | | | | | | | | | | | | | | | | | | | When we queue data buffers for ordered write, the buffers are added to the head of the ordered write list. When the log needs to push these buffers to disk, it also walks the list from the head. The result is that the the ordered buffers are submitted to disk in reverse order. For large writes, this means that whenever the log flushes large streams of reverse sequential order buffers are pushed down into the block layers. The elevators don't handle this particularly well, so IO rates tend to be significantly lower than if the IO was issued in ascending block order. Queue new ordered buffers to the tail of the ordered buffer list to ensure that IO is dispatched in the order it was submitted. This should significantly improve large sequential write speeds. On a disk capable of 85MB/s, speeds increase from 50MB/s to 65MB/s for noop and from 38MB/s to 50MB/s for cfq. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Tag all metadata with jidSteven Whitehouse2009-12-031-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two spare field in the header common to all GFS2 metadata. One is just the right size to fit a journal id in it, and this patch updates the journal code so that each time a metadata block is modified, we tag it with the journal id of the node which is performing the modification. The reason for this is that it should make it much easier to debug issues which arise if we can tell which node was the last to modify a particular metadata block. Since the field is updated before the block is written into the journal, each journal should only contain metadata which is tagged with its own journal id. The one exception to this is the journal header block, which might have a different node's id in it, if that journal was recovered by another node in the cluster. Thus each journal will contain a record of which nodes recovered it, via the journal header. The other field in the metadata header could potentially be used to hold information about what kind of operation was performed, but for the time being we just zero it on each transaction so that if we use it for that in future, we'll know that the information (where it exists) is reliable. I did consider using the other field to hold the journal sequence number, however since in GFS2's journaling we write the modified data into the journal and not the original data, this gives no information as to what action caused the modification, so I think we can probably come up with a better use for those 64 bits in the future. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Add tracepointsSteven Whitehouse2009-06-121-0/+3
| | | | | | | | | | | This patch adds the ability to trace various aspects of the GFS2 filesystem. The trace points are divided into three groups, glocks, logging and bmap. These points have been chosen because they allow inspection of the major internal functions of GFS2 and they are also generic enough that they are unlikely to need any major changes as the filesystem evolves. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Update the rw flagsSteven Whitehouse2009-05-111-6/+8
| | | | | | | | | | | | | | | After Jens recent updates: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a1f242524c3c1f5d40f1c9c343427e34d1aadd6e et al. this is a patch to bring gfs2 uptodate with the core code. Also I've managed to squash another call to ll_rw_block() along the way. There is still one part of the GFS2 I/O paths which are not correctly annotated and that is due to the sharing of the writeback code between the data and metadata address spaces. I would like to change that too, but this patch is still worth doing on its own, I think. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Merge lock_dlm module into GFS2Steven Whitehouse2009-03-241-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | This is the big patch that I've been working on for some time now. There are many reasons for wanting to make this change such as: o Reducing overhead by eliminating duplicated fields between structures o Simplifcation of the code (reduces the code size by a fair bit) o The locking interface is now the DLM interface itself as proposed some time ago. o Fewer lookups of glocks when processing replies from the DLM o Fewer memory allocations/deallocations for each glock o Scope to do further optimisations in the future (but this patch is more than big enough for now!) Please note that (a) this patch relates to the lock_dlm module and not the DLM itself, that is still a separate module; and (b) that we retain the ability to build GFS2 as a standalone single node filesystem with out requiring the DLM. This patch needs a lot of testing, hence my keeping it I restarted my -git tree after the last merge window. That way, this has the maximum exposure before its merged. This is (modulo a few minor bug fixes) the same patch that I've been posting on and off the the last three months and its passed a number of different tests so far. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Plug an unlikely leakBob Peterson2008-03-311-1/+3
| | | | | Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Only do lo_incore_commit onceBob Peterson2008-03-311-17/+0
| | | | | | | | | | | | | This patch is performance related. When we're doing a log flush, I noticed we were calling buf_lo_incore_commit twice: once for data bufs and once for metadata bufs. Since this is the same function and does the same thing in both cases, there should be no reason to call it twice. Since we only need to call it once, we can also make it faster by removing it from the generic "lops" code and making it a stand-along static function. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Don't add glocks to the journalSteven Whitehouse2008-01-251-53/+5
| | | | | | | | | | | The only reason for adding glocks to the journal was to keep track of which locks required a log flush prior to release. We add a flag to the glock to allow this check to be made in a simpler way. This reduces the size of a glock (by 12 bytes on i386, 24 on x86_64) and means that we can avoid extra work during the journal flush. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Split gfs2_writepage into three casesSteven Whitehouse2008-01-251-7/+10
| | | | | | | | | | | | This patch splits gfs2_writepage into separate functions for each of the three cases: writeback, ordered and journalled. As a result it becomes a lot easier to see what each one is doing. The common code is moved into gfs2_writepage_common. This fixes a performance bug where we were doing more work than strictly required in the ordered write case. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Clean up journaled data writingSteven Whitehouse2007-10-101-116/+126
| | | | | | | | | | | This patch cleans up the code for writing journaled data into the log. It also removes the need to allocate a small "tag" structure for each block written into the log. Instead we just keep count of the outstanding I/O so that we can be sure that its all been written at the correct time. Another result of this patch is that a number of ll_rw_block() calls have become submit_bh() calls, closing some races at the same time. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Clean up gfs2_trans_add_revoke()Steven Whitehouse2007-10-101-3/+0
| | | | | | | | | | | | | | | | | The following alters gfs2_trans_add_revoke() to take a struct gfs2_bufdata as an argument. This eliminates the memory allocation which was previously required by making use of the already existing struct gfs2_bufdata. It makes some sanity checks to ensure that the gfs2_bufdata has been removed from all the lists before its recycled as a revoke structure. This saves one memory allocation and one free per revoke structure. Also as a result, and to simplify the locking, since there is no longer any blocking code in gfs2_trans_add_revoke() we must hold the log lock whenever this function is called. This reduces the amount of times we take and unlock the log lock. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Use slab operations for all gfs2_bufdata allocationsSteven Whitehouse2007-10-101-1/+1
| | | | | | | | | | | The old revoke structure was allocated using kalloc/kfree but there is a slab cache for gfs2_bufdata, so we should use that now that the structures have been converted. This is part two of the patch series to merge the revoke and gfs2_bufdata structures. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Replace revoke structure with bufdata structureSteven Whitehouse2007-10-101-5/+5
| | | | | | | | | | | | | | Both the revoke structure and the bufdata structure are quite similar. They are basically small tags which are put on lists. In addition to which the revoke structure is always allocated when there is a bufdata structure which is (or can be) freed. As such it should be possible to reduce the number of frees and allocations by using the same structure for both purposes. This patch is the first step along that path. It replaces existing uses of the revoke structure with the bufdata structure. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Clean up ordered write codeSteven Whitehouse2007-10-101-120/+40
| | | | | | | | | | | | | | The following patch removes the ordered write processing from databuf_lo_before_commit() and moves it to log.c. This has the effect of greatly simplyfying databuf_lo_before_commit() and well as potentially making the ordered write code more efficient. As a side effect of this, its now possible to remove ordered buffers from the ordered buffer list at any time, so we now make use of this in invalidatepage and releasepage to ensure timely release of these buffers. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Move pin/unpin into lops.c, clean up lockingSteven Whitehouse2007-10-101-28/+89
| | | | | | | | | | | | | | | | | gfs2_pin and gfs2_unpin are only used in lops.c, despite being defined in meta_io.c, so this patch moves them into lops.c and makes them static. At the same time, its possible to clean up the locking in the buf and databuf _lo_add() functions so that we only need to grab the spinlock once. Also we have to move lock_buffer() around the _lo_add() functions since we can't do that in gfs2_pin() any more since we hold the spinlock for the duration of that function. As a result, the code shrinks by 12 lines and we do far fewer operations when adding buffers to the log. It also makes the code somewhat easier to read & understand. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Patch to protect sd_log_num_jdataBob Peterson2007-10-101-1/+2
| | | | | | | | | | | | | | | | | This is a patch to GFS2 to protect sd_log_num_jdata with the gfs2_log_lock. Without this patch, there is a timing window where you can get hit the following assert from function gfs2_log_flush(): gfs2_assert_withdraw(sdp, sdp->sd_log_num_buf + sdp->sd_log_num_jdata == sdp->sd_log_commited_buf + sdp->sd_log_commited_databuf); I've tested it on my roth cluster and it fixes the problem. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Move some code inside the log lockBob Peterson2007-10-101-3/+14
| | | | | | | | | | This is the first of five patches for bug #248176: There were still some critical variables being manipulated outside the log_lock spinlock. That usually resulted in a hang. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] soft lockup detected in databuf_lo_before_commitBob Peterson2007-08-141-2/+4
| | | | | | | | | | | | | | | | | | | | | | This is part 2 of the patch for bug #245832, part 1 of which is already in the git tree. The problem was that sdp->sd_log_num_databuf was not always being protected by the gfs2_log_lock spinlock, but the sd_log_le_databuf (which it is supposed to reflect) was protected. That meant there was a timing window during which gfs2_log_flush called databuf_lo_before_commit and the count didn't match what was really on the linked list in that window. So when it ran out of items on the linked list, it decremented total_dbuf from 0 to -1 and thus never left the "while(total_dbuf)" loop. The solution is to protect the variable sdp->sd_log_num_databuf so that the value will always match the contents of the linked list, and therefore the number will never go negative, and therefore, the loop will be exited properly. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Addendum to the journaled file/unmount patchRobert Peterson2007-07-091-2/+4
| | | | | | | | This patch is an addendum to the previous journaled file/unmount patch. It fixes a problem discovered during testing. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] assertion failure after writing to journaled file, umountRobert Peterson2007-07-091-30/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch passes all my nasty tests that were causing the code to fail under one circumstance or another. Here is a complete summary of all changes from today's git tree, in order of appearance: 1. There are now separate variables for metadata buffer accounting. 2. Variable sd_log_num_hdrs is no longer needed, since the header accounting is taken care of by the reserve/refund sequence. 3. Fixed a tiny grammatical problem in a comment. 4. Added a new function "calc_reserved" to calculate the reserved log space. This isn't entirely necessary, but it has two benefits: First, it simplifies the gfs2_log_refund function greatly. Second, it allows for easier debugging because I could sprinkle the code with calls to this function to make sure the accounting is proper (by adding asserts and printks) at strategic point of the code. 5. In log_pull_tail there apparently was a kludge to fix up the accounting based on a "pull" parameter. The buffer accounting is now done properly, so the kludge was removed. 6. File sync operations were making a call to gfs2_log_flush that writes another journal header. Since that header was unplanned for (reserved) by the reserve/refund sequence, the free space had to be decremented so that when log_pull_tail gets called, the free space is be adjusted properly. (Did I hear you call that a kludge? well, maybe, but a lot more justifiable than the one I removed). 7. In the gfs2_log_shutdown code, it optionally syncs the log by specifying the PULL parameter to log_write_header. I'm not sure this is necessary anymore. It just seems to me there could be cases where shutdown is called while there are outstanding log buffers. 8. In the (data)buf_lo_before_commit functions, I changed some offset values from being calculated on the fly to being constants. That simplified some code and we might as well let the compiler do the calculation once rather than redoing those cycles at run time. 9. This version has my rewritten databuf_lo_add function. This version is much more like its predecessor, buf_lo_add, which makes it easier to understand. Again, this might not be necessary, but it seems as if this one works as well as the previous one, maybe even better, so I decided to leave it in. 10. In databuf_lo_before_commit, a previous data corruption problem was caused by going off the end of the buffer. The proper solution is to have the proper limit in place, rather than stopping earlier. (Thus my previous attempt to fix it is wrong). If you don't wrap the buffer, you're stopping too early and that causes more log buffer accounting problems. 11. In lops.h there are two new (previously mentioned) constants for figuring out the data offset for the journal buffers. 12. There are also two new functions, buf_limit and databuf_limit to calculate how many entries will fit in the buffer. 13. In function gfs2_meta_wipe, it needs to distinguish between pinned metadata buffers and journaled data buffers for proper journal buffer accounting. It can't use the JDATA gfs2_inode flag because it's sometimes passed the "real" inode and sometimes the "metadata inode" and the inode flags will be random bits in a metadata gfs2_inode. It needs to base its decision on which was passed in. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Journaled file write/unstuff bugRobert Peterson2007-07-091-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is for bugzilla bug 283162, which uncovered a number of bugs pertaining to writing to files that have the journaled bit on. These bugs happen most often when writing to the meta_fs because the files are always journaled. So operations like gfs2_grow were particularly vulnerable, although many of the problems could be recreated with normal files after setting the journaled bit on. The problems fixed are: -GFS2 wasn't ever writing unstuffed journaled data blocks to their in-place location on disk. Now it does. -If you unmounted too quickly after doing IO to a journaled file, GFS2 was crashing because you would discard a buffer whose bufdata was still on the active items list. GFS2 now deals with this gracefully. -GFS2 was losing track of the bufdata for journaled data blocks, and it wasn't getting freed, causing an error when you tried to unmount the module. GFS2 now frees all the bufdata structures. -There was a memory corruption occurring because GFS2 wrote twice as many log entries for journaled buffers. -It was occasionally trying to write journal headers in buffers that weren't currently mapped. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] fix jdata issuesBenjamin Marzinski2007-07-091-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | This is a patch for the first three issues of RHBZ #238162 The first issue is that when you allocate a new page for a file, it will not start off uptodate. This makes sense, since you haven't written anything to that part of the file yet. Unfortunately, gfs2_pin() checks to make sure that the buffers are uptodate. The solution to this is to mark the buffers uptodate in gfs2_commit_write(), after they have been zeroed out and have the data written into them. I'm pretty confident with this fix, although it's not completely obvious that there is no problem with marking the buffers uptodate here. The second issue is simply that you can try to pin a data buffer that is already on the incore log, and thus, already pinned. This patch checks to see if this buffer is already on the log, and exits databuf_lo_add() if it is, just like buf_lo_add() does. The third issue is that gfs2_log_flush() doesn't do it's block accounting correctly. Both metadata and journaled data are logged, but gfs2_log_flush() only compares the number of metadata blocks with the number of blocks to commit to the ondisk journal. This patch also counts the journaled data blocks. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Fix log entry list corruptionBenjamin Marzinski2007-05-011-9/+11
| | | | | | | | | | | | | When glock_lo_add and rg_lo_add attempt to add an element to the log, they check to see if has already been added before locking the log. If another process adds that element to the log in this window between the check and locking the log, the element will be added to the list twice. This causes the log element list to become corrupted in such a way that the log element can never be successfully removed from the list. This patch pulls the list_empty() check inside the log lock, to remove this window. Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Fix list corruption in lops.cSteven Whitehouse2007-02-051-3/+11
| | | | | | | | | | | | The patch below appears to fix the list corruption that we are seeing on occasion. Although the transaction structure is private to a single thread, when the queued structures are dismantled during an in-core commit, its possible for a different thread to be trying to add the same structure to another, new, transaction at the same time. To avoid this, this patch takes the log spinlock during this operation. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Fix race in logging codeRussell Cattelan2006-11-301-15/+21
| | | | | | | | | | | | The log lock is dropped prior to io submittion, but this exposes a hole in which the log data structures may be going away due to a truncate. Store the buffer head in a local pointer prior to dropping the lock and relay on the buffer_head lock for consitency on the buffer head. Signed-Off-By: Russell Cattelan <cattelan@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] split and annotate gfs2_log_headAl Viro2006-11-301-2/+2
| | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Pass the correct value to kunmap_atomicRussell Cattelan2006-10-121-2/+2
| | | | | | | Pass kaddr rather than (incorrect) struct page to kunmap_atomic. Signed-off-by: Russell Cattelan <cattelan@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2/DLM] Fix trailing whitespaceSteven Whitehouse2006-09-251-1/+1
| | | | | | | As per Andrew Morton's request, removed trailing whitespace. Cc: Andrew Morton <akpm@osdl.org> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Tidy up meta_io codeSteven Whitehouse2006-09-211-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a bug in the directory reading code, where we might have dereferenced a NULL pointer in case of OOM. Updated the directory code to use the new & improved version of gfs2_meta_ra() which now returns the first block that was being read. Previously it was releasing it requiring following code to grab the block again at each point it was called. Also turned off readahead on directory lookups since we are reading a hash table, and therefore reading the entries in order is very unlikely. Readahead is still used for all other calls to the directory reading function (e.g. when growing the hash table). Removed the DIO_START constant. Everywhere this was used, it was used to unconditionally start i/o aside from a couple of places, so I've removed it and made the couple of exceptions to this rule into separate functions. Also hunted through the other DIO flags and removed them as arguments from functions which were always called with the same combination of arguments. Updated gfs2_meta_indirect_buffer to be a bit more efficient and hopefully also be a bit easier to read. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Export lm_interface to kernel headersFabio Massimo Di Nitto2006-09-191-1/+1
| | | | | | | | | | | | | | | lm_interface.h has a few out of the tree clients such as GFS1 and userland tools. Right now, these clients keeps a copy of the file in their build tree that can go out of sync. Move lm_interface.h to include/linux, export it to userland and clean up fs/gfs2 to use the new location. Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Style changes in logging codeSteven Whitehouse2006-09-051-1/+1
| | | | | | | | As per Jan Engelhardt's comments, removed some unused code and removed some brackets which were not required. Cc: Jan Engelhardt <jengelh@linux01.gwdg.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Fix end of multi-line structuresSteven Whitehouse2006-09-051-6/+6
| | | | | | | | | As per Jan Engelhardt's request, I've added a ',' to the end of each of the multi-line structures which didn't already have one (most already did). Cc: Jan Engelhardt <jengelh@linux01.gwdg.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] More style changesSteven Whitehouse2006-09-041-5/+4
| | | | | | | | As per Jan Engelhardt's fourth email, this is the first part of the change set with a few minor style points. Cc: Jan Engelhardt <jengelh@linux01.gwdg.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Change all types to uX styleSteven Whitehouse2006-09-041-9/+9
| | | | | | | This makes all fixed size types have consistent names. Cc: Jan Engelhardt <jengelh@linux01.gwdg.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Update copyright, tidy up incore.hSteven Whitehouse2006-09-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | As per comments from Jan Engelhardt <jengelh@linux01.gwdg.de> this updates the copyright message to say "version" in full rather than "v.2". Also incore.h has been updated to remove forward structure declarations which are not required. The gfs2_quota_lvb structure has now had endianess annotations added to it. Also quota.c has been updated so that we now store the lvb data locally in endian independant format to avoid needing a structure in host endianess too. As a result the endianess conversions are done as required at various points and thus the conversion routines in lvb.[ch] are no longer required. I've moved the one remaining constant in lvb.h thats used into lm.h and removed the unused lvb.[ch]. I have not changed the HIF_ constants. That is left to a later patch which I hope will unify the gh_flags and gh_iflags fields of the struct gfs2_holder. Cc: Jan Engelhardt <jengelh@linux01.gwdg.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Fix releasepage bug (fixes direct i/o writes)Steven Whitehouse2006-08-311-0/+11
| | | | | | | | | | | | | | | | | | | This patch fixes three main bugs. Firstly the direct i/o get_block was returning the wrong return code in certain cases. Secondly, the GFS2's releasepage function was not dealing with cases when clean, ordered buffers were found still queued on a transaction (which can happen depending on the ordering of journal flushes). Thirdly, the journaling code itself needed altering to take account of the after effects of removing the clean ordered buffers from the transactions before a journal flush. The releasepage bug did also show up under "normal" buffered i/o as well, so its not just a fix for direct i/o. In fact its not normally used in the direct i/o path at all, except when flushing existing buffers after performing a direct i/o write, but that was the code path that led us to spot this. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Another list_del bugSteven Whitehouse2006-08-221-1/+1
| | | | | | Another case where list_del should be list_del_init. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Fix to list_del in lops.cSteven Whitehouse2006-08-221-1/+1
| | | | | | | A list_del should have been a list_del_init in lops.c which was resulting in incorrect status returns from list_empty(). Signed-off-by: Steven Whitheouse <swhiteho@redhat.com>
* [GFS2] Fix leak of gfs2_bufdataSteven Whitehouse2006-08-181-3/+3
| | | | | | | | | | | This fixes a memory leak of struct gfs2_bufdata and also some problems in the ordered write handling code. It needs a bit more testing, but I believe that the reference counting of ordered write buffers should now be correct. This is aimed at fixing Red Hat bugzilla: #201028 and #201082 Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Always include glock in transactionSteven Whitehouse2006-06-191-1/+1
| | | | | | | | Include the glock in the transaction, even when not journaling data in order that ordered write data will be correctly flushed when the lock is released. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
OpenPOWER on IntegriCloud