summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* ocfs2: Synchronize feature incompat flags in ocfs2_fs.hMark Fasheh2006-12-071-0/+9
| | | | | | | | These got a little bit out of date with ocfs2-tools, make things consistent again. We reserve a flag for sparse allocation code as that's pretty close to testable at this point. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* ocfs2: local mountsSunil Mushran2006-12-0711-66/+193
| | | | | | | | | | This allows users to format an ocfs2 file system with a special flag, OCFS2_FEATURE_INCOMPAT_LOCAL_MOUNT. When the file system sees this flag, it will not use any cluster services, nor will it require a cluster configuration, thus acting like a 'local' file system. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* Merge branch 'master' of /home/trondmy/kernel/linux-2.6/ into merge_linusTrond Myklebust2006-12-07219-2232/+3451
|\
| * Merge master.kernel.org:/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmwLinus Torvalds2006-12-0756-1190/+2224
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * master.kernel.org:/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (73 commits) [DLM] Clean up lowcomms [GFS2] Change gfs2_fsync() to use write_inode_now() [GFS2] Fix indent in recovery.c [GFS2] Don't flush everything on fdatasync [GFS2] Add a comment about reading the super block [GFS2] Mount problem with the GFS2 code [GFS2] Remove gfs2_check_acl() [DLM] fix format warnings in rcom.c and recoverd.c [GFS2] lock function parameter [DLM] don't accept replies to old recovery messages [DLM] fix size of STATUS_REPLY message [GFS2] fs/gfs2/log.c:log_bmap() fix printk format warning [DLM] fix add_requestqueue checking nodes list [GFS2] Fix recursive locking in gfs2_getattr [GFS2] Fix recursive locking in gfs2_permission [GFS2] Reduce number of arguments to meta_io.c:getbuf() [GFS2] Move gfs2_meta_syncfs() into log.c [GFS2] Fix journal flush problem [GFS2] mark_inode_dirty after write to stuffed file [GFS2] Fix glock ordering on inode creation ...
| | * [DLM] Clean up lowcommsPatrick Caulfield2006-12-074-357/+261
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes up most of the things pointed out by akpm and Pavel Machek with comments below indicating why some things have been left: Andrew Morton wrote: > >> +static struct nodeinfo *nodeid2nodeinfo(int nodeid, gfp_t alloc) >> +{ >> + struct nodeinfo *ni; >> + int r; >> + int n; >> + >> + down_read(&nodeinfo_lock); > > Given that this function can sleep, I wonder if `alloc' is useful. > > I see lots of callers passing in a literal "0" for `alloc'. That's in fact > a secret (GFP_ATOMIC & ~__GFP_HIGH). I doubt if that's what you really > meant. Particularly as the code could at least have used __GFP_WAIT (aka > GFP_NOIO) which is much, much more reliable than "0". In fact "0" is the > least reliable mode possible. > > IOW, this is all bollixed up. When 0 is passed into nodeid2nodeinfo the function does not try to allocate a new structure at all. it's an indication that the caller only wants the nodeinfo struct for that nodeid if there actually is one in existance. I've tidied the function itself so it's more obvious, (and tidier!) >> +/* Data received from remote end */ >> +static int receive_from_sock(void) >> +{ >> + int ret = 0; >> + struct msghdr msg; >> + struct kvec iov[2]; >> + unsigned len; >> + int r; >> + struct sctp_sndrcvinfo *sinfo; >> + struct cmsghdr *cmsg; >> + struct nodeinfo *ni; >> + >> + /* These two are marginally too big for stack allocation, but this >> + * function is (currently) only called by dlm_recvd so static should be >> + * OK. >> + */ >> + static struct sockaddr_storage msgname; >> + static char incmsg[CMSG_SPACE(sizeof(struct sctp_sndrcvinfo))]; > > whoa. This is globally singly-threaded code?? Yes. it is only ever run in the context of dlm_recvd. >> >> +static void initiate_association(int nodeid) >> +{ >> + struct sockaddr_storage rem_addr; >> + static char outcmsg[CMSG_SPACE(sizeof(struct sctp_sndrcvinfo))]; > > Another static buffer to worry about. Globally singly-threaded code? Yes. Only ever called by dlm_sendd. >> + >> +/* Send a message */ >> +static int send_to_sock(struct nodeinfo *ni) >> +{ >> + int ret = 0; >> + struct writequeue_entry *e; >> + int len, offset; >> + struct msghdr outmsg; >> + static char outcmsg[CMSG_SPACE(sizeof(struct sctp_sndrcvinfo))]; > > Singly-threaded? Yep. >> >> +static void dealloc_nodeinfo(void) >> +{ >> + int i; >> + >> + for (i=1; i<=max_nodeid; i++) { >> + struct nodeinfo *ni = nodeid2nodeinfo(i, 0); >> + if (ni) { >> + idr_remove(&nodeinfo_idr, i); > > Didn't that need locking? Not. it's only ever called at DLM shutdown after all the other threads have been stopped. >> >> +static int write_list_empty(void) >> +{ >> + int status; >> + >> + spin_lock_bh(&write_nodes_lock); >> + status = list_empty(&write_nodes); >> + spin_unlock_bh(&write_nodes_lock); >> + >> + return status; >> +} > > This function's return value is meaningless. As soon as the lock gets > dropped, the return value can get out of sync with reality. > > Looking at the caller, this _might_ happen to be OK, but it's a nasty and > dangerous thing. Really the locking should be moved into the caller. It's just an optimisation to allow the caller to schedule if there is no work to do. if something arrives immediately afterwards then it will get picked up when the process re-awakes (and it will be woken by that arrival). The 'accepting' atomic has gone completely. as Andrew pointed out it didn't really achieve much anyway. I suspect it was a plaster over some other startup or shutdown bug to be honest. Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Andrew Morton <akpm@osdl.org> Cc: Pavel Machek <pavel@ucw.cz>
| | * [GFS2] Change gfs2_fsync() to use write_inode_now()Steven Whitehouse2006-12-071-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a bit better than the previous version of gfs2_fsync() although it would be better still if we were able to call a function which only wrote the inode & metadata. Its no big deal though that this will potentially write the data as well since the VFS has already done that before calling gfs2_fsync(). I've also added a comment to explain whats going on here. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Andrew Morton <akpm@osdl.org>
| | * [GFS2] Fix indent in recovery.cSteven Whitehouse2006-12-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As per comments from Andrew Morton and Jan Engelhardt, this fixes the indent and removes the "static" from a variable declaration since its not needed in this case (now allocated on the stack of the function in question). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Jan Engelhardt <jengelh@linux01.gwdg.de> Cc: Andrew Morton <akpm@osdl.org>
| | * [GFS2] Don't flush everything on fdatasyncSteven Whitehouse2006-11-301-3/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The gfs2_fsync() function was doing a journal flush on each and every call. While this is correct, its also a lot of overhead. This patch means that on fdatasync flushes we rely on the VFS to flush the data for us and we don't do a journal flush unless we really need to. We have to do a journal flush for stuffed files though because they have the data and the inode metadata in the same block. Journaled files also need a journal flush too of course. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [GFS2] Add a comment about reading the super blockSteven Whitehouse2006-11-301-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | The comment explains why we use the bio functions to read the super block. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Andrew Morton <akpm@osdl.org> Cc: Srinivasa Ds <srinivasa@in.ibm.com>
| | * [GFS2] Mount problem with the GFS2 codeSrinivasa Ds2006-11-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While mounting the gfs2 filesystem,our test team had a problem and we got this error message. ======================================================= GFS2: fsid=: Trying to join cluster "lock_nolock", "dasde1" GFS2: fsid=dasde1.0: Joined cluster. Now mounting FS... GFS2: not a GFS2 filesystem GFS2: fsid=dasde1.0: can't read superblock: -22 ========================================================================== On debugging further we found that problem is while reading the super block(gfs2_read_super) and comparing the magic number in it. When I replace the submit_bio() call(present in gfs2_read_super) with the sb_getblk() and ll_rw_block(), mount operation succeded. On further analysis we found that before calling submit_bio(), bio->bi_sector was set to "sector" variable. This "sector" variable has the same value of bh->b_blocknr(block number). Hence there is a need to multiply this valuwith (blocksize >> 9)(9 because,sector size 2^9,samething happens in ll_rw_block also, before calling submit_bio()). So I have developed the patch which solves this problem. Please let me know your comments. ================================================================ Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [GFS2] Remove gfs2_check_acl()Steven Whitehouse2006-11-303-19/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | As pointed out by Adrian Bunk, the gfs2_check_acl() function is no longer used. This patch removes it and renamed gfs2_check_acl_locked() to gfs2_check_acl() since we only need one variant of that function now. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Adrian Bunk <bunk@stusta.de>
| | * [DLM] fix format warnings in rcom.c and recoverd.cRyusuke Konishi2006-11-302-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes the following gcc warnings generated on the architectures where uint64_t != unsigned long long (e.g. ppc64). fs/dlm/rcom.c:154: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'uint64_t' fs/dlm/rcom.c:154: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'uint64_t' fs/dlm/recoverd.c:48: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'uint64_t' fs/dlm/recoverd.c:202: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'uint64_t' fs/dlm/recoverd.c:210: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'uint64_t' Signed-off-by: Ryusuke Konishi <ryusuke@osrg.net> Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [GFS2] lock function parameterRandy Dunlap2006-11-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Fix function parameter typing: fs/gfs2/glock.c:100: warning: function declaration isn't a prototype Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [DLM] don't accept replies to old recovery messagesDavid Teigland2006-11-303-11/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We often abort a recovery after sending a status request to a remote node. We want to ignore any potential status reply we get from the remote node. If we get one of these unwanted replies, we've often moved on to the next recovery message and incremented the message sequence counter, so the reply will be ignored due to the seq number. In some cases, we've not moved on to the next message so the seq number of the reply we want to ignore is still correct, causing the reply to be accepted. The next recovery message will then mistake this old reply as a new one. To fix this, we add the flag RCOM_WAIT to indicate when we can accept a new reply. We clear this flag if we abort recovery while waiting for a reply. Before the flag is set again (to allow new replies) we know that any old replies will be rejected due to their sequence number. We also initialize the recovery-message sequence number to a random value when a lockspace is first created. This makes it clear when messages are being rejected from an old instance of a lockspace that has since been recreated. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [DLM] fix size of STATUS_REPLY messageDavid Teigland2006-11-301-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the not_ready routine sends a "fake" status reply with blank status flags, it needs to use the correct size for a normal STATUS_REPLY by including the size of the would-be config parameters. We also fill in the non-existant config parameters with an invalid lvblen value so it's easier to notice if these invalid paratmers are ever being used. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [GFS2] fs/gfs2/log.c:log_bmap() fix printk format warningRyusuke Konishi2006-11-301-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | Fix a printk format warning in fs/gfs2/log.c: fs/gfs2/log.c:322: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'sector_t' Signed-off-by: Ryusuke Konishi <ryusuke@osrg.net> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [DLM] fix add_requestqueue checking nodes listDavid Teigland2006-11-303-11/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Requests that arrive after recovery has started are saved in the requestqueue and processed after recovery is done. Some of these requests are purged during recovery if they are from nodes that have been removed. We move the purging of the requests (dlm_purge_requestqueue) to later in the recovery sequence which allows the routine saving requests (dlm_add_requestqueue) to avoid filtering out requests by nodeid since the same will be done by the purge. The current code has add_requestqueue filtering by nodeid but doesn't hold any locks when accessing the list of current nodes. This also means that we need to call the purge routine when the lockspace is being shut down since the add routine will not be rejecting requests itself any more. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [GFS2] Fix recursive locking in gfs2_getattrSteven Whitehouse2006-11-301-5/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The readdirplus NFS operation can result in gfs2_getattr being called with the glock already held. In this case we do not want to try and grab the lock again. This fixes Red Hat bugzilla #215727 Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [GFS2] Fix recursive locking in gfs2_permissionSteven Whitehouse2006-11-301-6/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since gfs2_permission may be called either from the VFS (in which case we need to obtain a shared glock) or from GFS2 (in which case we already have a glock) we need to test to see whether or not a lock is required. The original test was buggy due to a potential race. This one should be safe. This fixes Red Hat bugzilla #217129 Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [GFS2] Reduce number of arguments to meta_io.c:getbuf()Steven Whitehouse2006-11-301-14/+12
| | | | | | | | | | | | | | | | | | | | | | | | Since the superblock and the address_space are determined by the glock, we might as well just pass that as the argument since all the callers already have that available. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [GFS2] Move gfs2_meta_syncfs() into log.cSteven Whitehouse2006-11-304-20/+21
| | | | | | | | | | | | | | | | | | | | | By moving gfs2_meta_syncfs() into log.c, gfs2_ail1_start() can be made static. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [GFS2] Fix journal flush problemSteven Whitehouse2006-11-307-99/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes a bug which resulted in poor performance due to flushing the journal too often. The code path in question was via the inode_go_sync() function in glops.c. The solution is not to flush the journal immediately when inodes are ejected from memory, but batch up the work for glockd to deal with later on. This means that glocks may now live on beyond the end of the lifetime of their inodes (but not very much longer in the normal case). Also fixed in this patch is a bug (which was hidden by the bug mentioned above) in calculation of the number of free journal blocks. The gfs2_logd process has been altered to be more responsive to the journal filling up. We now wake it up when the number of uncommitted journal blocks has reached the threshold level rather than trying to flush directly at the end of each transaction. This again means doing fewer, but larger, log flushes in general. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [GFS2] mark_inode_dirty after write to stuffed fileSteven Whitehouse2006-11-301-1/+3
| | | | | | | | | | | | | | | | | | Writes to stuffed files were not being marked dirty correctly. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [GFS2] Fix glock ordering on inode creationSteven Whitehouse2006-11-301-27/+4
| | | | | | | | | | | | | | | | | | | | | The lock order here should be parent -> child rather than numeric order. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [GFS2] Simplify glops functionsSteven Whitehouse2006-11-304-51/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The go_sync callback took two flags, but one of them was set on every call, so this patch removes once of the flags and makes the previously conditional operations (on this flag), unconditional. The go_inval callback took three flags, each of which was set on every call to it. This patch removes the flags and makes the operations unconditional, which makes the logic rather more obvious. Two now unused flags are also removed from incore.h. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [GFS2] Fix Kconfig wrt CRC32Steven Whitehouse2006-11-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | GFS2 requires the CRC32 library function. This was reported by Toralf Förster. Cc: Toralf Förster <toralf.foerster@gmx.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [GFS2] Make sentinel dirents compatible with gfs1Steven Whitehouse2006-11-301-10/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When deleting directory entries, we set the inum.no_addr to zero in a dirent when its the first dirent in a block and thus cannot be merged into the previous dirent as is the usual case. In gfs1, inum.no_formal_ino was used instead. This patch changes gfs2 to set both inum.no_addr and inum.no_formal_ino to zero. It also changes the test from just looking at inum.no_addr to look at both inum.no_addr and inum.no_formal_ino and a sentinel is now considered to be a dirent in which _either_ (or both) of them is set to zero. This resolves Red Hat bugzillas: #215809, #211465 Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [GFS2] Remove unused function from inode.cSteven Whitehouse2006-11-304-95/+5
| | | | | | | | | | | | | | | | | | | | | | | | The gfs2_glock_nq_m_atime function is unused in so far as its only ever called with num_gh = 1, and this falls through to the gfs2_glock_nq_atime function, so we might as well call that directly. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [GFS2] Remove unused sysfs filesSteven Whitehouse2006-11-301-8/+0
| | | | | | | | | | | | | | | | | | Four of the sysfs files are unused and can therefore be removed. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [GFS2] Tidy up bmap & fix boundary bugSteven Whitehouse2006-11-301-63/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This moves the locking for bmap into the bmap function itself rather than using a wrapper function. It also fixes a bug where the boundary flag was set on the wrong bh. Also the flags on the mapped bh are reset earlier in the function to ensure that they are 100% correct on the error path. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [GFS2] Fix memory allocation in glock.cSteven Whitehouse2006-11-301-1/+1
| | | | | | | | | | | | | | | | | | | | | Change from GFP_KERNEL to GFP_NOFS as this was causing a slow down when trying to push inodes from cache. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [DLM] Fix DLM configPatrick Caulfield2006-11-301-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The attached patch fixes the DLM config so that it selects the chosen network transport. It should fix the bug where DLM can be left selected when NET gets unselected. This incorporates all the comments received about this patch. Cc: Adrian Bunk <bunk@stusta.de> Cc: Andrew Morton <akpm@osdl.org> Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [DLM] clear sbflags on lock masterDavid Teigland2006-11-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RH BZ 211622 The ALTMODE flag can be set in the lock master's copy of the lock but never cleared, so ALTMODE will also be returned in a subsequent conversion of the lock when it shouldn't be. This results in lock_dlm incorrectly switching to the alternate lock mode when returning the result to gfs which then asserts when it sees the wrong lock state. The fix is to propagate the cleared sbflags value to the master node when the lock is requested. QA's d_rwrandirectlarge test triggers this bug very quickly. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [DLM] do full recover_locks barrierDavid Teigland2006-11-301-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Red Hat BZ 211914 The previous patch "[DLM] fix aborted recovery during node removal" was incomplete as discovered with further testing. It set the bit for the RS_LOCKS barrier but did not then wait for the barrier. This is often ok, but sometimes it will cause yet another recovery hang. If it's a new node that also has the lowest nodeid that skips the barrier wait, then it misses the important step of collecting and reporting the barrier status from the other nodes (which is the job of the low nodeid in the barrier wait routine). Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [DLM] fix stopping unstarted recoveryDavid Teigland2006-11-301-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Red Hat BZ 211914 When many nodes are joining a lockspace simultaneously, the dlm gets a quick sequence of stop/start events, a pair for adding each node. dlm_controld in user space sends dlm_recoverd in the kernel each stop and start event. dlm_controld will sometimes send the stop before dlm_recoverd has had a chance to take up the previously queued start. The stop aborts the processing of the previous start by setting the RECOVERY_STOP flag. dlm_recoverd is erroneously clearing this flag and ignoring the stop/abort if it happens to take up the start after the stop meant to abort it. The fix is to check the sequence number that's incremented for each stop/start before clearing the flag. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [DLM] fix aborted recovery during node removalDavid Teigland2006-11-302-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Red Hat BZ 211914 With the new cluster infrastructure, dlm recovery for a node removal can be aborted and restarted for a node addition. When this happens, the restarted recovery isn't aware that it's doing recovery for the earlier removal as well as the addition. So, it then skips the recovery steps only required when nodes are removed. This can result in locks not being purged for failed/removed nodes. The fix is to check for removed nodes for which recovery has not been completed at the start of a new recovery sequence. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [DLM] fix requestqueue raceDavid Teigland2006-11-303-9/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Red Hat BZ 211914 There's a race between dlm_recoverd (1) enabling locking and (2) clearing out the requestqueue, and dlm_recvd (1) checking if locking is enabled and (2) adding a message to the requestqueue. An order of recoverd(1), recvd(1), recvd(2), recoverd(2) will result in a message being left on the requestqueue. The fix is to have dlm_recvd check if dlm_recoverd has enabled locking after taking the mutex for the requestqueue and if it has processing the message instead of queueing it. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [DLM] status messages ping-pong between unmounted nodesDavid Teigland2006-11-301-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Red Hat BZ 213682 If two nodes leave the lockspace (while unmounting the fs in the case of gfs) after one has sent a STATUS message to the other, STATUS/STATUS_REPLY messages will then ping-pong between the nodes when neither of them can find the lockspace in question any longer. We kill this by not sending another STATUS message when we get a STATUS_REPLY for an unknown lockspace. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [DLM] res_recover_locks_count not reset when recover_locks is abortedDavid Teigland2006-11-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Red Hat BZ 213684 If a node sends an lkb to the new master (RCOM_LOCK message) during recovery and recovery is then aborted on both nodes before it gets a reply, the res_recover_locks_count needs to be reset to 0 so that when the subsequent recovery comes along and sends the lkb to the new master again the assertion doesn't trigger that checks that counter is zero. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [DLM] Add support for tcp communicationsPatrick Caulfield2006-11-304-1/+1283
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following patch adds a TCP based communications layer to the DLM which is compile time selectable. The existing SCTP layer gives the advantage of allowing multihoming, whereas the TCP layer has been heavily tested in previous versions of the DLM and is known to be robust and therefore can be used as a baseline for performance testing. Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [GFS2] Remove unused zero_readpage from stuffed_readpageRussell Cattelan2006-11-301-16/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Stuffed files only consist of a maximum of (gfs2 block size - sizeof(struct gfs2_dinode)) bytes. Since the gfs2 block size is always less than page size, we will never see a call to stuffed_readpage for anything other than the first page in the file. Signed-off-by: Russell Cattelan <cattelan@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [GFS2] Fix race in logging codeRussell Cattelan2006-11-301-15/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The log lock is dropped prior to io submittion, but this exposes a hole in which the log data structures may be going away due to a truncate. Store the buffer head in a local pointer prior to dropping the lock and relay on the buffer_head lock for consitency on the buffer head. Signed-Off-By: Russell Cattelan <cattelan@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [GFS2] Remove gfs2_inode_attr_inSteven Whitehouse2006-11-306-24/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This function wasn't really doing the right thing. There was no need to update the inode size at this point and the updating of the i_blocks field has now been moved to the places where di_blocks is updated. A result of this patch and some those preceeding it is that unlocking a glock is now a much more efficient process, since there is no longer any requirement to copy data from the gfs2 inode into the vfs inode at this point. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [GFS2] Inode number is constantSteven Whitehouse2006-11-301-1/+1
| | | | | | | | | | | | | | | | | | | | | Since the inode number is constant, we don't need to keep updating it everytime we refresh the other inode fields. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [GFS2] Only set inode flags when requiredSteven Whitehouse2006-11-303-11/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We were setting the inode flags from GFS2's flags far too often, even when they couldn't possibly have changed. This patch reduces the amount of flag setting going on so that we do it only when the inode is read in or when the flags have changed. The create case is covered by the "when the inode is read in" case. This also fixes a bug where we didn't set S_SYNC correctly. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [GFS2] Fix page lock/glock deadlockSteven Whitehouse2006-11-302-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes a race between the glock and the page lock encountered during truncate in gfs2_readpage and gfs2_prepare_write. The gfs2_readpages function doesn't need the same fix since it only uses a try lock anyway, so it will fail back to gfs2_readpage in the case of a potential deadlock. This bug was spotted by Russell Cattelan. Cc: Russell Cattelan <cattelan@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [GFS2] Remove unused GL_DUMP flagSteven Whitehouse2006-11-302-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | There is no way to set the GL_DUMP flag, and in any case the same thing can be done with systemtap if required for debugging, so this removes it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [GFS2] Don't copy meta_header for rgrp in and outSteven Whitehouse2006-11-301-11/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The meta_header for an ondisk rgrp never changes, so there is no point copying it in and back out to disk. Also there is no reason to keep a copy for each rgrp in memory. The code already checks to ensure that the header is correct before it calls the routine to copy the data in, so that we don't even need to check whether its correct on disk in the functions in ondisk.c Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [GFS2] Tidy up 0 initialisations in inode.cSteven Whitehouse2006-11-301-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | We don't need to use endian conversions for 0 initialisations when creating a new on-disk inode. Cc: Christoph Hellwig <hch@infradead.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| | * [GFS2] Shrink gfs2_inode (8) - i_vnSteven Whitehouse2006-11-304-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | This shrinks the size of the gfs2_inode by 8 bytes by replacing the version counter with a one bit valid/invalid flag. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
OpenPOWER on IntegriCloud