From c378051177dce4421428fd1691ffdf15ad57c161 Mon Sep 17 00:00:00 2001
From: David Teigland <teigland@redhat.com>
Date: Wed, 6 Dec 2006 11:46:33 -0600
Subject: [GFS2] don't try to lockfs after shutdown

If an fs has already been shut down, a lockfs callback should do nothing.
An fs that's been shut down can't acquire locks or do anything with
respect to the cluster.

Also, remove FIXME comment in withdraw function.  The missing bits of the
withdraw procedure are now all done by user space.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/gfs2/lm.c        | 8 +-------
 fs/gfs2/ops_super.c | 3 +++
 2 files changed, 4 insertions(+), 7 deletions(-)

(limited to 'fs')

diff --git a/fs/gfs2/lm.c b/fs/gfs2/lm.c
index effe4a337c1d..e30673dd37e0 100644
--- a/fs/gfs2/lm.c
+++ b/fs/gfs2/lm.c
@@ -104,15 +104,9 @@ int gfs2_lm_withdraw(struct gfs2_sbd *sdp, char *fmt, ...)
 	vprintk(fmt, args);
 	va_end(args);
 
-	fs_err(sdp, "about to withdraw from the cluster\n");
+	fs_err(sdp, "about to withdraw this file system\n");
 	BUG_ON(sdp->sd_args.ar_debug);
 
-
-	fs_err(sdp, "waiting for outstanding I/O\n");
-
-	/* FIXME: suspend dm device so oustanding bio's complete
-	   and all further io requests fail */
-
 	fs_err(sdp, "telling LM to withdraw\n");
 	gfs2_withdraw_lockproto(&sdp->sd_lockstruct);
 	fs_err(sdp, "withdrawn\n");
diff --git a/fs/gfs2/ops_super.c b/fs/gfs2/ops_super.c
index 7685b46f934b..b2837839eed9 100644
--- a/fs/gfs2/ops_super.c
+++ b/fs/gfs2/ops_super.c
@@ -173,6 +173,9 @@ static void gfs2_write_super_lockfs(struct super_block *sb)
 	struct gfs2_sbd *sdp = sb->s_fs_info;
 	int error;
 
+	if (test_bit(SDF_SHUTDOWN, &sdp->sd_flags))
+		return;
+
 	for (;;) {
 		error = gfs2_freeze_fs(sdp);
 		if (!error)
-- 
cgit v1.2.1


From dc200a8848cca8b0e99012996c66f4b379a390ed Mon Sep 17 00:00:00 2001
From: David Teigland <teigland@redhat.com>
Date: Wed, 13 Dec 2006 10:36:37 -0600
Subject: [DLM] fix resend rcom lock

There's a chance the new master of resource hasn't learned it's the new
master before another node sends it a lock during recovery.  The node
sending the lock needs to resend if this happens.

- A sends a master lookup for resource R to C
- B sends a master lookup for resource R to C
- C receives A's lookup, assigns A to be master of R and
  sends a reply back to A
- C receives B's lookup and sends a reply back to B saying
  that A is the master
- B receives lookup reply from C and sends its lock for R to A
- A receives lock from B, doesn't think it's the master of R
  and sends an error back to B
- A receives lookup reply from C and becomes master of R
- B gets error back from A and resends its lock back to A
  (this resending is what this patch does)
- A receives lock from B, it now sees it's the master of R
  and takes the lock

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/dlm/lock.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

(limited to 'fs')

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 30878defaeb6..69ada5887078 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -3571,6 +3571,14 @@ int dlm_recover_process_copy(struct dlm_ls *ls, struct dlm_rcom *rc)
 	lock_rsb(r);
 
 	switch (error) {
+	case -EBADR:
+		/* There's a chance the new master received our lock before
+		   dlm_recover_master_reply(), this wouldn't happen if we did
+		   a barrier between recover_masters and recover_locks. */
+		log_debug(ls, "master copy not ready %x r %lx %s", lkb->lkb_id,
+			  (unsigned long)r, r->res_name);
+		dlm_send_rcom_lock(r, lkb);
+		goto out;
 	case -EEXIST:
 		log_debug(ls, "master copy exists %x", lkb->lkb_id);
 		/* fall through */
@@ -3585,7 +3593,7 @@ int dlm_recover_process_copy(struct dlm_ls *ls, struct dlm_rcom *rc)
 	/* an ack for dlm_recover_locks() which waits for replies from
 	   all the locks it sends to new masters */
 	dlm_recovered_lock(r);
-
+ out:
 	unlock_rsb(r);
 	put_rsb(r);
 	dlm_put_lkb(lkb);
-- 
cgit v1.2.1


From 38aa8b0c59c35d10d15ebf00ceee641f9ed7acba Mon Sep 17 00:00:00 2001
From: David Teigland <teigland@redhat.com>
Date: Wed, 13 Dec 2006 10:37:16 -0600
Subject: [DLM] fix old rcom messages

A reply to a recovery message will often be received after the relevant
recovery sequence has aborted and the next recovery sequence has begun.
We need to ignore replies to these old messages from the previous
recovery.  There's already a way to do this for synchronous recovery
requests using the rc_id number, but not for async.

Each recovery sequence already has a locally unique sequence number
associated with it.  This patch adds a field to the rcom (recovery
message) structure where this recovery sequence number can be placed,
rc_seq.  When a node sends a reply to a recovery request, it copies the
rc_seq number it received into rc_seq_reply.  When the first node receives
the reply to its recovery message, it will check whether rc_seq_reply
matches the current recovery sequence number, ls_recover_seq, and if not
then it ignores the old reply.

An old, inadequate approach to filtering out old replies (checking if the
current stage of recovery has moved back to the start) has been removed
from two spots.

The protocol version number is changed to reflect the different rcom
structures.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/dlm/dlm_internal.h |  6 +++--
 fs/dlm/rcom.c         | 61 ++++++++++++++++++++++++++++++++-------------------
 fs/dlm/util.c         |  4 ++++
 3 files changed, 46 insertions(+), 25 deletions(-)

(limited to 'fs')

diff --git a/fs/dlm/dlm_internal.h b/fs/dlm/dlm_internal.h
index 1ee8195e6fc0..7185a132a8b5 100644
--- a/fs/dlm/dlm_internal.h
+++ b/fs/dlm/dlm_internal.h
@@ -309,8 +309,8 @@ static inline int rsb_flag(struct dlm_rsb *r, enum rsb_flags flag)
 
 /* dlm_header is first element of all structs sent between nodes */
 
-#define DLM_HEADER_MAJOR	0x00020000
-#define DLM_HEADER_MINOR	0x00000001
+#define DLM_HEADER_MAJOR	0x00030000
+#define DLM_HEADER_MINOR	0x00000000
 
 #define DLM_MSG			1
 #define DLM_RCOM		2
@@ -386,6 +386,8 @@ struct dlm_rcom {
 	uint32_t		rc_type;	/* DLM_RCOM_ */
 	int			rc_result;	/* multi-purpose */
 	uint64_t		rc_id;		/* match reply with request */
+	uint64_t		rc_seq;		/* sender's ls_recover_seq */
+	uint64_t		rc_seq_reply;	/* remote ls_recover_seq */
 	char			rc_buf[0];
 };
 
diff --git a/fs/dlm/rcom.c b/fs/dlm/rcom.c
index 4cc31be9cd9d..521ad9bb47b7 100644
--- a/fs/dlm/rcom.c
+++ b/fs/dlm/rcom.c
@@ -56,6 +56,10 @@ static int create_rcom(struct dlm_ls *ls, int to_nodeid, int type, int len,
 
 	rc->rc_type = type;
 
+	spin_lock(&ls->ls_recover_lock);
+	rc->rc_seq = ls->ls_recover_seq;
+	spin_unlock(&ls->ls_recover_lock);
+
 	*mh_ret = mh;
 	*rc_ret = rc;
 	return 0;
@@ -159,6 +163,7 @@ static void receive_rcom_status(struct dlm_ls *ls, struct dlm_rcom *rc_in)
 	if (error)
 		return;
 	rc->rc_id = rc_in->rc_id;
+	rc->rc_seq_reply = rc_in->rc_seq;
 	rc->rc_result = dlm_recover_status(ls);
 	make_config(ls, (struct rcom_config *) rc->rc_buf);
 
@@ -224,21 +229,7 @@ static void receive_rcom_names(struct dlm_ls *ls, struct dlm_rcom *rc_in)
 {
 	struct dlm_rcom *rc;
 	struct dlm_mhandle *mh;
-	int error, inlen, outlen;
-	int nodeid = rc_in->rc_header.h_nodeid;
-	uint32_t status = dlm_recover_status(ls);
-
-	/*
-	 * We can't run dlm_dir_rebuild_send (which uses ls_nodes) while
-	 * dlm_recoverd is running ls_nodes_reconfig (which changes ls_nodes).
-	 * It could only happen in rare cases where we get a late NAMES
-	 * message from a previous instance of recovery.
-	 */
-
-	if (!(status & DLM_RS_NODES)) {
-		log_debug(ls, "ignoring RCOM_NAMES from %u", nodeid);
-		return;
-	}
+	int error, inlen, outlen, nodeid;
 
 	nodeid = rc_in->rc_header.h_nodeid;
 	inlen = rc_in->rc_header.h_length - sizeof(struct dlm_rcom);
@@ -248,6 +239,7 @@ static void receive_rcom_names(struct dlm_ls *ls, struct dlm_rcom *rc_in)
 	if (error)
 		return;
 	rc->rc_id = rc_in->rc_id;
+	rc->rc_seq_reply = rc_in->rc_seq;
 
 	dlm_copy_master_names(ls, rc_in->rc_buf, inlen, rc->rc_buf, outlen,
 			      nodeid);
@@ -294,6 +286,7 @@ static void receive_rcom_lookup(struct dlm_ls *ls, struct dlm_rcom *rc_in)
 		ret_nodeid = error;
 	rc->rc_result = ret_nodeid;
 	rc->rc_id = rc_in->rc_id;
+	rc->rc_seq_reply = rc_in->rc_seq;
 
 	send_rcom(ls, mh, rc);
 }
@@ -375,20 +368,13 @@ static void receive_rcom_lock(struct dlm_ls *ls, struct dlm_rcom *rc_in)
 
 	memcpy(rc->rc_buf, rc_in->rc_buf, sizeof(struct rcom_lock));
 	rc->rc_id = rc_in->rc_id;
+	rc->rc_seq_reply = rc_in->rc_seq;
 
 	send_rcom(ls, mh, rc);
 }
 
 static void receive_rcom_lock_reply(struct dlm_ls *ls, struct dlm_rcom *rc_in)
 {
-	uint32_t status = dlm_recover_status(ls);
-
-	if (!(status & DLM_RS_DIR)) {
-		log_debug(ls, "ignoring RCOM_LOCK_REPLY from %u",
-			  rc_in->rc_header.h_nodeid);
-		return;
-	}
-
 	dlm_recover_process_copy(ls, rc_in);
 }
 
@@ -415,6 +401,7 @@ static int send_ls_not_ready(int nodeid, struct dlm_rcom *rc_in)
 
 	rc->rc_type = DLM_RCOM_STATUS_REPLY;
 	rc->rc_id = rc_in->rc_id;
+	rc->rc_seq_reply = rc_in->rc_seq;
 	rc->rc_result = -ESRCH;
 
 	rf = (struct rcom_config *) rc->rc_buf;
@@ -426,6 +413,31 @@ static int send_ls_not_ready(int nodeid, struct dlm_rcom *rc_in)
 	return 0;
 }
 
+static int is_old_reply(struct dlm_ls *ls, struct dlm_rcom *rc)
+{
+	uint64_t seq;
+	int rv = 0;
+
+	switch (rc->rc_type) {
+	case DLM_RCOM_STATUS_REPLY:
+	case DLM_RCOM_NAMES_REPLY:
+	case DLM_RCOM_LOOKUP_REPLY:
+	case DLM_RCOM_LOCK_REPLY:
+		spin_lock(&ls->ls_recover_lock);
+		seq = ls->ls_recover_seq;
+		spin_unlock(&ls->ls_recover_lock);
+		if (rc->rc_seq_reply != seq) {
+			log_error(ls, "ignoring old reply %x from %d "
+				      "seq_reply %llx expect %llx",
+				      rc->rc_type, rc->rc_header.h_nodeid,
+				      (unsigned long long)rc->rc_seq_reply,
+				      (unsigned long long)seq);
+			rv = 1;
+		}
+	}
+	return rv;
+}
+
 /* Called by dlm_recvd; corresponds to dlm_receive_message() but special
    recovery-only comms are sent through here. */
 
@@ -454,6 +466,9 @@ void dlm_receive_rcom(struct dlm_header *hd, int nodeid)
 		goto out;
 	}
 
+	if (is_old_reply(ls, rc))
+		goto out;
+
 	if (nodeid != rc->rc_header.h_nodeid) {
 		log_error(ls, "bad rcom nodeid %d from %d",
 			  rc->rc_header.h_nodeid, nodeid);
diff --git a/fs/dlm/util.c b/fs/dlm/util.c
index 767197db9944..963889cf6740 100644
--- a/fs/dlm/util.c
+++ b/fs/dlm/util.c
@@ -134,6 +134,8 @@ void dlm_rcom_out(struct dlm_rcom *rc)
 	rc->rc_type		= cpu_to_le32(rc->rc_type);
 	rc->rc_result		= cpu_to_le32(rc->rc_result);
 	rc->rc_id		= cpu_to_le64(rc->rc_id);
+	rc->rc_seq		= cpu_to_le64(rc->rc_seq);
+	rc->rc_seq_reply	= cpu_to_le64(rc->rc_seq_reply);
 
 	if (type == DLM_RCOM_LOCK)
 		rcom_lock_out((struct rcom_lock *) rc->rc_buf);
@@ -151,6 +153,8 @@ void dlm_rcom_in(struct dlm_rcom *rc)
 	rc->rc_type		= le32_to_cpu(rc->rc_type);
 	rc->rc_result		= le32_to_cpu(rc->rc_result);
 	rc->rc_id		= le64_to_cpu(rc->rc_id);
+	rc->rc_seq		= le64_to_cpu(rc->rc_seq);
+	rc->rc_seq_reply	= le64_to_cpu(rc->rc_seq_reply);
 
 	if (rc->rc_type == DLM_RCOM_LOCK)
 		rcom_lock_in((struct rcom_lock *) rc->rc_buf);
-- 
cgit v1.2.1


From 9e971b715dcc3cd5f4383f2815aaa7e5853d1f7b Mon Sep 17 00:00:00 2001
From: David Teigland <teigland@redhat.com>
Date: Wed, 13 Dec 2006 10:37:55 -0600
Subject: [DLM] add version check

Check if we receive a message from another lockspace member running a
version of the dlm with an incompatible inter-node message protocol.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/dlm/rcom.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

(limited to 'fs')

diff --git a/fs/dlm/rcom.c b/fs/dlm/rcom.c
index 521ad9bb47b7..54fba9bd6740 100644
--- a/fs/dlm/rcom.c
+++ b/fs/dlm/rcom.c
@@ -82,8 +82,17 @@ static void make_config(struct dlm_ls *ls, struct rcom_config *rf)
 	rf->rf_lsflags = ls->ls_exflags;
 }
 
-static int check_config(struct dlm_ls *ls, struct rcom_config *rf, int nodeid)
+static int check_config(struct dlm_ls *ls, struct dlm_rcom *rc, int nodeid)
 {
+	struct rcom_config *rf = (struct rcom_config *) rc->rc_buf;
+
+	if ((rc->rc_header.h_version & 0xFFFF0000) != DLM_HEADER_MAJOR) {
+		log_error(ls, "version mismatch: %x nodeid %d: %x",
+			  DLM_HEADER_MAJOR | DLM_HEADER_MINOR, nodeid,
+			  rc->rc_header.h_version);
+		return -EINVAL;
+	}
+
 	if (rf->rf_lvblen != ls->ls_lvblen ||
 	    rf->rf_lsflags != ls->ls_exflags) {
 		log_error(ls, "config mismatch: %d,%x nodeid %d: %d,%x",
@@ -145,8 +154,7 @@ int dlm_rcom_status(struct dlm_ls *ls, int nodeid)
 		log_debug(ls, "remote node %d not ready", nodeid);
 		rc->rc_result = 0;
 	} else
-		error = check_config(ls, (struct rcom_config *) rc->rc_buf,
-				     nodeid);
+		error = check_config(ls, rc, nodeid);
 	/* the caller looks at rc_result for the remote recovery status */
  out:
 	return error;
-- 
cgit v1.2.1


From da49f36f4f64feb281d7663be99e779b2aecc607 Mon Sep 17 00:00:00 2001
From: David Teigland <teigland@redhat.com>
Date: Wed, 13 Dec 2006 10:38:45 -0600
Subject: [DLM] fix send_args() lvb copying

The send_args() function is used to copy parameters into a message for a
number different message types.  Only some of those types are set up
beforehand (in create_message) to include space for sending lvb data.
send_args was wrongly copying the lvb for all message types as long as the
lock had an lvb.  This means that the lvb data was being written past the
end of the message into unknown space.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/dlm/lock.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

(limited to 'fs')

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 69ada5887078..cdf2cb9297fd 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -2144,12 +2144,24 @@ static void send_args(struct dlm_rsb *r, struct dlm_lkb *lkb,
 	if (lkb->lkb_astaddr)
 		ms->m_asts |= AST_COMP;
 
-	if (ms->m_type == DLM_MSG_REQUEST || ms->m_type == DLM_MSG_LOOKUP)
-		memcpy(ms->m_extra, r->res_name, r->res_length);
+	/* compare with switch in create_message; send_remove() doesn't
+	   use send_args() */
 
-	else if (lkb->lkb_lvbptr)
+	switch (ms->m_type) {
+	case DLM_MSG_REQUEST:
+	case DLM_MSG_LOOKUP:
+		memcpy(ms->m_extra, r->res_name, r->res_length);
+		break;
+	case DLM_MSG_CONVERT:
+	case DLM_MSG_UNLOCK:
+	case DLM_MSG_REQUEST_REPLY:
+	case DLM_MSG_CONVERT_REPLY:
+	case DLM_MSG_GRANT:
+		if (!lkb->lkb_lvbptr)
+			break;
 		memcpy(ms->m_extra, lkb->lkb_lvbptr, r->res_ls->ls_lvblen);
-
+		break;
+	}
 }
 
 static int send_common(struct dlm_rsb *r, struct dlm_lkb *lkb, int mstype)
-- 
cgit v1.2.1


From 8d07fd509e9c82a59e37b8b18a2fd0e8ef8fc837 Mon Sep 17 00:00:00 2001
From: David Teigland <teigland@redhat.com>
Date: Wed, 13 Dec 2006 10:39:20 -0600
Subject: [DLM] fix receive_request() lvb copying

LVB's are not sent as part of new requests, but the code receiving the
request was copying data into the lvb anyway.  The space in the message
where it mistakenly thought the lvb lived actually contained the resource
name, so it wound up incorrectly copying this name data into the lvb.  Fix
is to just create the lvb, not copy junk into it.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/dlm/lock.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

(limited to 'fs')

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index cdf2cb9297fd..d8e919bad41a 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -2430,8 +2430,12 @@ static int receive_request_args(struct dlm_ls *ls, struct dlm_lkb *lkb,
 
 	DLM_ASSERT(is_master_copy(lkb), dlm_print_lkb(lkb););
 
-	if (receive_lvb(ls, lkb, ms))
-		return -ENOMEM;
+	if (lkb->lkb_exflags & DLM_LKF_VALBLK) {
+		/* lkb was just created so there won't be an lvb yet */
+		lkb->lkb_lvbptr = allocate_lvb(ls);
+		if (!lkb->lkb_lvbptr)
+			return -ENOMEM;
+	}
 
 	return 0;
 }
-- 
cgit v1.2.1


From 075529b5e1ffa8c9864d23930b71b5306a13d9f8 Mon Sep 17 00:00:00 2001
From: David Teigland <teigland@redhat.com>
Date: Wed, 13 Dec 2006 10:40:26 -0600
Subject: [DLM] fix lost flags in stub replies

When the dlm fakes an unlock/cancel reply from a failed node using a stub
message struct, it wasn't setting the flags in the stub message.  So, in
the process of receiving the fake message the lkb flags would be updated
and cleared from the zero flags in the message.  The problem observed in
tests was the loss of the USER flag which caused the dlm to think a user
lock was a kernel lock and subsequently fail an assertion checking the
validity of the ast/callback field.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/dlm/lock.c | 3 +++
 1 file changed, 3 insertions(+)

(limited to 'fs')

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index d8e919bad41a..ed52485a86d0 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -3148,6 +3148,7 @@ static void recover_convert_waiter(struct dlm_ls *ls, struct dlm_lkb *lkb)
 	if (middle_conversion(lkb)) {
 		hold_lkb(lkb);
 		ls->ls_stub_ms.m_result = -EINPROGRESS;
+		ls->ls_stub_ms.m_flags = lkb->lkb_flags;
 		_remove_from_waiters(lkb);
 		_receive_convert_reply(lkb, &ls->ls_stub_ms);
 
@@ -3221,6 +3222,7 @@ void dlm_recover_waiters_pre(struct dlm_ls *ls)
 		case DLM_MSG_UNLOCK:
 			hold_lkb(lkb);
 			ls->ls_stub_ms.m_result = -DLM_EUNLOCK;
+			ls->ls_stub_ms.m_flags = lkb->lkb_flags;
 			_remove_from_waiters(lkb);
 			_receive_unlock_reply(lkb, &ls->ls_stub_ms);
 			dlm_put_lkb(lkb);
@@ -3229,6 +3231,7 @@ void dlm_recover_waiters_pre(struct dlm_ls *ls)
 		case DLM_MSG_CANCEL:
 			hold_lkb(lkb);
 			ls->ls_stub_ms.m_result = -DLM_ECANCEL;
+			ls->ls_stub_ms.m_flags = lkb->lkb_flags;
 			_remove_from_waiters(lkb);
 			_receive_cancel_reply(lkb, &ls->ls_stub_ms);
 			dlm_put_lkb(lkb);
-- 
cgit v1.2.1


From 927255f0383342f5d49b82adb6689b9cba52a6f5 Mon Sep 17 00:00:00 2001
From: Adrian Bunk <bunk@stusta.de>
Date: Tue, 19 Dec 2006 13:04:03 -0800
Subject: [DLM] fs/dlm/lowcomms-tcp.c: remove 2 functions

Remove the following unused functions:

- lowcomms_send_message()
- lowcomms_max_buffer_size()

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/dlm/lowcomms-tcp.c | 24 ------------------------
 1 file changed, 24 deletions(-)

(limited to 'fs')

diff --git a/fs/dlm/lowcomms-tcp.c b/fs/dlm/lowcomms-tcp.c
index 9be3a440c42a..3b224733f18b 100644
--- a/fs/dlm/lowcomms-tcp.c
+++ b/fs/dlm/lowcomms-tcp.c
@@ -880,22 +880,6 @@ out:
 	return -1;
 }
 
-/* API send message call, may queue the request */
-/* N.B. This is the old interface - use the new one for new calls */
-int lowcomms_send_message(int nodeid, char *buf, int len, gfp_t allocation)
-{
-	struct writequeue_entry *e;
-	char *b;
-
-	e = dlm_lowcomms_get_buffer(nodeid, len, allocation, &b);
-	if (e) {
-		memcpy(b, buf, len);
-		dlm_lowcomms_commit_buffer(e);
-		return 0;
-	}
-	return -ENOBUFS;
-}
-
 /* Look for activity on active sockets */
 static void process_sockets(void)
 {
@@ -1087,14 +1071,6 @@ static int daemons_start(void)
 	return 0;
 }
 
-/*
- * Return the largest buffer size we can cope with.
- */
-int lowcomms_max_buffer_size(void)
-{
-	return PAGE_CACHE_SIZE;
-}
-
 void dlm_lowcomms_stop(void)
 {
 	int i;
-- 
cgit v1.2.1


From c7b3383437ff41781964d1bf7f40ff8d7dd5bc47 Mon Sep 17 00:00:00 2001
From: Steven Whitehouse <swhiteho@redhat.com>
Date: Thu, 14 Dec 2006 18:24:26 +0000
Subject: [GFS2] Fix DIO deadlock

This patch fixes Red Hat bugzilla #212627 in which a deadlock occurs
due to trying to take the i_mutex while holding a glock. The correct
locking order is defined as i_mutex -> glock in all cases.

I've left dealing with allocating writes. I know that we need to do
that, but for now this should do the trick. We don't need to take the
i_mutex on write, because the VFS has already taken it for us. On read
we don't need it since the glock is enough protection. The reason that
I've made some of the checks into a separate function is that we'll need
to do the checks again in the allocating write case eventually, so this
is partly in preparation for this. Likewise the return value test of !=
1 might look a bit odd and thats because we'll need a third return value
in case of requiring an allocation.

I've made the change to deferred mode on the glock to ensure flushing
read caches on other nodes. I notice that (using blktrace to look at
whats going on) we appear to do a better job of large I/Os than ext3
after this patch (in terms of not splitting up the I/Os).

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Wendy Cheng <wcheng@redhat.com>
---
 fs/gfs2/ops_address.c | 74 +++++++++++++++++++++++++++++++--------------------
 1 file changed, 45 insertions(+), 29 deletions(-)

(limited to 'fs')

diff --git a/fs/gfs2/ops_address.c b/fs/gfs2/ops_address.c
index d8d69a72a10d..0118aa439c1d 100644
--- a/fs/gfs2/ops_address.c
+++ b/fs/gfs2/ops_address.c
@@ -594,6 +594,36 @@ static void gfs2_invalidatepage(struct page *page, unsigned long offset)
 	return;
 }
 
+/**
+ * gfs2_ok_for_dio - check that dio is valid on this file
+ * @ip: The inode
+ * @rw: READ or WRITE
+ * @offset: The offset at which we are reading or writing
+ *
+ * Returns: 0 (to ignore the i/o request and thus fall back to buffered i/o)
+ *          1 (to accept the i/o request)
+ */
+static int gfs2_ok_for_dio(struct gfs2_inode *ip, int rw, loff_t offset)
+{
+	/*
+	 * Should we return an error here? I can't see that O_DIRECT for
+	 * a journaled file makes any sense. For now we'll silently fall
+	 * back to buffered I/O, likewise we do the same for stuffed
+	 * files since they are (a) small and (b) unaligned.
+	 */
+	if (gfs2_is_jdata(ip))
+		return 0;
+
+	if (gfs2_is_stuffed(ip))
+		return 0;
+
+	if (offset > i_size_read(&ip->i_inode))
+		return 0;
+	return 1;
+}
+
+
+
 static ssize_t gfs2_direct_IO(int rw, struct kiocb *iocb,
 			      const struct iovec *iov, loff_t offset,
 			      unsigned long nr_segs)
@@ -604,42 +634,28 @@ static ssize_t gfs2_direct_IO(int rw, struct kiocb *iocb,
 	struct gfs2_holder gh;
 	int rv;
 
-	if (rw == READ)
-		mutex_lock(&inode->i_mutex);
 	/*
-	 * Shared lock, even if its a write, since we do no allocation
-	 * on this path. All we need change is atime.
+	 * Deferred lock, even if its a write, since we do no allocation
+	 * on this path. All we need change is atime, and this lock mode
+	 * ensures that other nodes have flushed their buffered read caches
+	 * (i.e. their page cache entries for this inode). We do not,
+	 * unfortunately have the option of only flushing a range like
+	 * the VFS does.
 	 */
-	gfs2_holder_init(ip->i_gl, LM_ST_SHARED, GL_ATIME, &gh);
+	gfs2_holder_init(ip->i_gl, LM_ST_DEFERRED, GL_ATIME, &gh);
 	rv = gfs2_glock_nq_atime(&gh);
 	if (rv)
-		goto out;
-
-	if (offset > i_size_read(inode))
-		goto out;
-
-	/*
-	 * Should we return an error here? I can't see that O_DIRECT for
-	 * a journaled file makes any sense. For now we'll silently fall
-	 * back to buffered I/O, likewise we do the same for stuffed
-	 * files since they are (a) small and (b) unaligned.
-	 */
-	if (gfs2_is_jdata(ip))
-		goto out;
-
-	if (gfs2_is_stuffed(ip))
-		goto out;
-
-	rv = blockdev_direct_IO_own_locking(rw, iocb, inode,
-					    inode->i_sb->s_bdev,
-					    iov, offset, nr_segs,
-					    gfs2_get_block_direct, NULL);
+		return rv;
+	rv = gfs2_ok_for_dio(ip, rw, offset);
+	if (rv != 1)
+		goto out; /* dio not valid, fall back to buffered i/o */
+
+	rv = blockdev_direct_IO_no_locking(rw, iocb, inode, inode->i_sb->s_bdev,
+					   iov, offset, nr_segs,
+					   gfs2_get_block_direct, NULL);
 out:
 	gfs2_glock_dq_m(1, &gh);
 	gfs2_holder_uninit(&gh);
-	if (rw == READ)
-		mutex_unlock(&inode->i_mutex);
-
 	return rv;
 }
 
-- 
cgit v1.2.1


From e1d5b18ae92d0bbfe66dc2b4bab65006d32c5f7d Mon Sep 17 00:00:00 2001
From: Steven Whitehouse <swhiteho@redhat.com>
Date: Fri, 15 Dec 2006 16:49:51 -0500
Subject: [GFS2] Fail over to readpage for stuffed files

This is partially derrived from a patch written by Russell Cattelan.
It fixes a bug where there is a race between readpages and truncate
by ignoring readpages for stuffed files. This is ok because a stuffed
file will never be more than one block (minus sizeof(struct gfs2_dinode))
in size and block size is always less than page size, so we do not lose
anything efficiency-wise by not doing readahead for stuffed files. They
will have already been "read ahead" by the action of reading the inode
in, in the first place.

This is the remaining part of the fix for Red Hat bugzilla #218966
which had not yet made it upstream.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Russell Cattelan <cattelan@redhat.com>
---
 fs/gfs2/ops_address.c | 28 +++-------------------------
 1 file changed, 3 insertions(+), 25 deletions(-)

(limited to 'fs')

diff --git a/fs/gfs2/ops_address.c b/fs/gfs2/ops_address.c
index 0118aa439c1d..37bfeb961eb3 100644
--- a/fs/gfs2/ops_address.c
+++ b/fs/gfs2/ops_address.c
@@ -256,7 +256,7 @@ out_unlock:
  *    the page lock and the glock) and return having done no I/O. Its
  *    obviously not something we'd want to do on too regular a basis.
  *    Any I/O we ignore at this time will be done via readpage later.
- * 2. We have to handle stuffed files here too.
+ * 2. We don't handle stuffed files here we let readpage do the honours.
  * 3. mpage_readpages() does most of the heavy lifting in the common case.
  * 4. gfs2_get_block() is relied upon to set BH_Boundary in the right places.
  * 5. We use LM_FLAG_TRY_1CB here, effectively we then have lock-ahead as
@@ -269,8 +269,7 @@ static int gfs2_readpages(struct file *file, struct address_space *mapping,
 	struct gfs2_inode *ip = GFS2_I(inode);
 	struct gfs2_sbd *sdp = GFS2_SB(inode);
 	struct gfs2_holder gh;
-	unsigned page_idx;
-	int ret;
+	int ret = 0;
 	int do_unlock = 0;
 
 	if (likely(file != &gfs2_internal_file_sentinel)) {
@@ -289,29 +288,8 @@ static int gfs2_readpages(struct file *file, struct address_space *mapping,
 			goto out_unlock;
 	}
 skip_lock:
-	if (gfs2_is_stuffed(ip)) {
-		struct pagevec lru_pvec;
-		pagevec_init(&lru_pvec, 0);
-		for (page_idx = 0; page_idx < nr_pages; page_idx++) {
-			struct page *page = list_entry(pages->prev, struct page, lru);
-			prefetchw(&page->flags);
-			list_del(&page->lru);
-			if (!add_to_page_cache(page, mapping,
-					       page->index, GFP_KERNEL)) {
-				ret = stuffed_readpage(ip, page);
-				unlock_page(page);
-				if (!pagevec_add(&lru_pvec, page))
-					 __pagevec_lru_add(&lru_pvec);
-			} else {
-				page_cache_release(page);
-			}
-		}
-		pagevec_lru_add(&lru_pvec);
-		ret = 0;
-	} else {
-		/* What we really want to do .... */
+	if (!gfs2_is_stuffed(ip))
 		ret = mpage_readpages(mapping, pages, nr_pages, gfs2_get_block);
-	}
 
 	if (do_unlock) {
 		gfs2_glock_dq_m(1, &gh);
-- 
cgit v1.2.1


From 5509826f1e548d14bb888c1cb6e3bbf23f855770 Mon Sep 17 00:00:00 2001
From: "S. Wendy Cheng" <wcheng@redhat.com>
Date: Thu, 18 Jan 2007 15:56:34 -0500
Subject: [GFS2] Fix change nlink deadlock

Bugzilla 215088

Fix deadlock in gfs2_change_nlink() while installing RHEL5 into GFS2
partition. The gfs2_rename() apparently needs block allocation for the
new name (into the directory) where it requires rg locks. At the same
time, while updating the nlink count for the replaced file,
gfs2_change_nlink() tries to return the inode meta-data back to resource
group where it needs rg locks too. Our logic doesn't allow process to
acquire these locks recursively by the same process  (RHEL installer)
that results a BUG call. This only happens within rename code path and
only if the destination file exists before the rename operation.

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/gfs2/inode.c     | 20 ++++++++++++++++----
 fs/gfs2/inode.h     |  1 +
 fs/gfs2/ops_inode.c | 25 ++++++++++++++++++++++---
 3 files changed, 39 insertions(+), 7 deletions(-)

(limited to 'fs')

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index d122074c45e1..6bc443644c3c 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -281,16 +281,14 @@ out:
 }
 
 /**
- * gfs2_change_nlink - Change nlink count on inode
+ * gfs2_change_nlink_i - Change nlink count on inode
  * @ip: The GFS2 inode
  * @diff: The change in the nlink count required
  *
  * Returns: errno
  */
-
-int gfs2_change_nlink(struct gfs2_inode *ip, int diff)
+int gfs2_change_nlink_i(struct gfs2_inode *ip, int diff)
 {
-	struct gfs2_sbd *sdp = ip->i_inode.i_sb->s_fs_info;
 	struct buffer_head *dibh;
 	u32 nlink;
 	int error;
@@ -322,6 +320,20 @@ int gfs2_change_nlink(struct gfs2_inode *ip, int diff)
 	brelse(dibh);
 	mark_inode_dirty(&ip->i_inode);
 
+	return error;
+}
+
+int gfs2_change_nlink(struct gfs2_inode *ip, int diff)
+{
+	struct gfs2_sbd *sdp = ip->i_inode.i_sb->s_fs_info;
+	int error;
+
+	/* update the nlink */
+	error = gfs2_change_nlink_i(ip, diff);
+	if (error)
+		return error;
+
+	/* return meta data block back to rg */
 	if (ip->i_inode.i_nlink == 0) {
 		struct gfs2_rgrpd *rgd;
 		struct gfs2_holder ri_gh, rg_gh;
diff --git a/fs/gfs2/inode.h b/fs/gfs2/inode.h
index b57f448b15bc..85c67cb568d1 100644
--- a/fs/gfs2/inode.h
+++ b/fs/gfs2/inode.h
@@ -40,6 +40,7 @@ int gfs2_inode_refresh(struct gfs2_inode *ip);
 
 int gfs2_dinode_dealloc(struct gfs2_inode *inode);
 int gfs2_change_nlink(struct gfs2_inode *ip, int diff);
+int gfs2_change_nlink_i(struct gfs2_inode *ip, int diff);
 struct inode *gfs2_lookupi(struct inode *dir, const struct qstr *name,
 			   int is_root, struct nameidata *nd);
 struct inode *gfs2_createi(struct gfs2_holder *ghs, const struct qstr *name,
diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c
index 636dda4c7d38..919e8947e710 100644
--- a/fs/gfs2/ops_inode.c
+++ b/fs/gfs2/ops_inode.c
@@ -553,6 +553,7 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry,
 	int alloc_required;
 	unsigned int x;
 	int error;
+	struct gfs2_rgrpd *rgd;
 
 	if (ndentry->d_inode) {
 		nip = GFS2_I(ndentry->d_inode);
@@ -684,12 +685,12 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry,
 		error = gfs2_trans_begin(sdp, sdp->sd_max_dirres +
 					 al->al_rgd->rd_ri.ri_length +
 					 4 * RES_DINODE + 4 * RES_LEAF +
-					 RES_STATFS + RES_QUOTA, 0);
+					 RES_STATFS + RES_QUOTA + 1, 0);
 		if (error)
 			goto out_ipreserv;
 	} else {
 		error = gfs2_trans_begin(sdp, 4 * RES_DINODE +
-					 5 * RES_LEAF, 0);
+					 5 * RES_LEAF + 1, 0);
 		if (error)
 			goto out_gunlock;
 	}
@@ -703,7 +704,25 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry,
 			error = gfs2_dir_del(ndip, &ndentry->d_name);
 			if (error)
 				goto out_end_trans;
-			error = gfs2_change_nlink(nip, -1);
+			error = gfs2_change_nlink_i(nip, -1);
+			if ((!error) && (nip->i_inode.i_nlink == 0)) {
+				error = -EIO;
+				rgd = gfs2_blk2rgrpd(sdp, nip->i_num.no_addr);
+				if (rgd) {
+					struct gfs2_holder nlink_rg_gh;
+					if (rgd != nip->i_alloc.al_rgd)
+						error = gfs2_glock_nq_init(
+						rgd->rd_gl, LM_ST_EXCLUSIVE,
+						0, &nlink_rg_gh);
+					else
+						error = 0;
+                			if (!error) {
+						gfs2_unlink_di(&nip->i_inode);
+						if (rgd != nip->i_alloc.al_rgd)
+							gfs2_glock_dq_uninit(&nlink_rg_gh);
+					}
+				}
+			}
 		}
 		if (error)
 			goto out_end_trans;
-- 
cgit v1.2.1


From 3fb4a251febe70e4c65ea8250545b391fd414d5a Mon Sep 17 00:00:00 2001
From: Patrick Caulfield <pcaulfie@redhat.com>
Date: Tue, 2 Jan 2007 17:01:05 +0000
Subject: [DLM] Fix schedule() calls

I was a little over-enthusiastic turning schedule() calls int cond_sched() when fixing the DLM for Andrew Morton.

These four should really be calls to schedule() or the dlm can busy-wait.

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/dlm/lowcomms-sctp.c | 4 ++--
 fs/dlm/lowcomms-tcp.c  | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

(limited to 'fs')

diff --git a/fs/dlm/lowcomms-sctp.c b/fs/dlm/lowcomms-sctp.c
index fe158d7a9285..0940a805b3d7 100644
--- a/fs/dlm/lowcomms-sctp.c
+++ b/fs/dlm/lowcomms-sctp.c
@@ -1109,7 +1109,7 @@ static int dlm_recvd(void *data)
 		set_current_state(TASK_INTERRUPTIBLE);
 		add_wait_queue(&lowcomms_recv_wait, &wait);
 		if (!test_bit(CF_READ_PENDING, &sctp_con.flags))
-			cond_resched();
+			schedule();
 		remove_wait_queue(&lowcomms_recv_wait, &wait);
 		set_current_state(TASK_RUNNING);
 
@@ -1141,7 +1141,7 @@ static int dlm_sendd(void *data)
 	while (!kthread_should_stop()) {
 		set_current_state(TASK_INTERRUPTIBLE);
 		if (write_list_empty())
-			cond_resched();
+			schedule();
 		set_current_state(TASK_RUNNING);
 
 		if (sctp_con.eagain_flag) {
diff --git a/fs/dlm/lowcomms-tcp.c b/fs/dlm/lowcomms-tcp.c
index 3b224733f18b..18b91c65bdb3 100644
--- a/fs/dlm/lowcomms-tcp.c
+++ b/fs/dlm/lowcomms-tcp.c
@@ -996,7 +996,7 @@ static int dlm_recvd(void *data)
 	while (!kthread_should_stop()) {
 		set_current_state(TASK_INTERRUPTIBLE);
 		if (read_list_empty())
-			cond_resched();
+			schedule();
 		set_current_state(TASK_RUNNING);
 
 		process_sockets();
@@ -1030,7 +1030,7 @@ static int dlm_sendd(void *data)
 	while (!kthread_should_stop()) {
 		set_current_state(TASK_INTERRUPTIBLE);
 		if (write_and_state_lists_empty())
-			cond_resched();
+			schedule();
 		set_current_state(TASK_RUNNING);
 
 		process_state_queue();
-- 
cgit v1.2.1


From 4edde74eedb8bc4c03adc3602b114b72a7ccd13f Mon Sep 17 00:00:00 2001
From: Patrick Caulfield <pcaulfie@redhat.com>
Date: Tue, 2 Jan 2007 17:08:54 +0000
Subject: [DLM] Fix spin lock already unlocked bug

I just noticed this message when testing some other changes I'd made to
lowcomms (to use workqueues) but the problem seems to be in the current
git trees too. I'm amazed no-one has seen it.

    BUG: spinlock already unlocked on CPU#1, dlm_recoverd/16868

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/dlm/lowcomms-tcp.c | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'fs')

diff --git a/fs/dlm/lowcomms-tcp.c b/fs/dlm/lowcomms-tcp.c
index 18b91c65bdb3..ce5e7cdfb45d 100644
--- a/fs/dlm/lowcomms-tcp.c
+++ b/fs/dlm/lowcomms-tcp.c
@@ -709,6 +709,7 @@ void *dlm_lowcomms_get_buffer(int nodeid, int len,
 	if (!con)
 		return NULL;
 
+	spin_lock(&con->writequeue_lock);
 	e = list_entry(con->writequeue.prev, struct writequeue_entry, list);
 	if ((&e->list == &con->writequeue) ||
 	    (PAGE_CACHE_SIZE - e->end < len)) {
@@ -747,6 +748,7 @@ void dlm_lowcomms_commit_buffer(void *mh)
 	struct connection *con = e->con;
 	int users;
 
+	spin_lock(&con->writequeue_lock);
 	users = --e->users;
 	if (users)
 		goto out;
-- 
cgit v1.2.1


From 49686f71060e342bce6644a5c69fbc6ad0e75a13 Mon Sep 17 00:00:00 2001
From: Steven Whitehouse <swhiteho@redhat.com>
Date: Mon, 8 Jan 2007 14:31:40 +0000
Subject: [GFS2] Fix ordering of page disposal vs. glock_dq

In case of unlinked files with dirty pages GFS2 wasn't clearing
the pages in quite the right order. This patch clears the pages
earlier (before the qlock_dq) to avoid the situation that the
release of the glock results in attempting to write back data that
has already been deallocated.

This fixes Red Hat bugzilla: #220117

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/gfs2/ops_super.c | 6 ++++++
 1 file changed, 6 insertions(+)

(limited to 'fs')

diff --git a/fs/gfs2/ops_super.c b/fs/gfs2/ops_super.c
index b2837839eed9..c22738cdbf2d 100644
--- a/fs/gfs2/ops_super.c
+++ b/fs/gfs2/ops_super.c
@@ -429,6 +429,12 @@ static void gfs2_delete_inode(struct inode *inode)
 	}
 
 	error = gfs2_dinode_dealloc(ip);
+	/*
+	 * Must do this before unlock to avoid trying to write back
+	 * potentially dirty data now that inode no longer exists
+	 * on disk.
+	 */
+	truncate_inode_pages(&inode->i_data, 0);
 
 out_unlock:
 	gfs2_glock_dq(&ip->i_iopen_gh);
-- 
cgit v1.2.1


From 6c93fd1e578669364e026a0d44c669b871e2a8c4 Mon Sep 17 00:00:00 2001
From: Russell Cattelan <cattelan@redhat.com>
Date: Mon, 8 Jan 2007 17:47:51 -0600
Subject: [GFS2] BZ 217008 fsfuzzer fix.

Update the quilt header comments to match the
code changes.

Change gfs2_lookup_simple to return an error in the case
of a NULL inode.
The callers of gfs2_lookup_simple do not check for NULL
in the no entry case and such would end up dereferencing a NULL ptr.

This fixes:
http://projects.info-pull.com/mokb/MOKB-15-11-2006.html

Signed-off-by: Russell Cattelan <cattelan@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/gfs2/inode.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

(limited to 'fs')

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 6bc443644c3c..bab338f6b610 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -361,8 +361,18 @@ out:
 struct inode *gfs2_lookup_simple(struct inode *dip, const char *name)
 {
 	struct qstr qstr;
+	struct inode *inode;
 	gfs2_str2qstr(&qstr, name);
-	return gfs2_lookupi(dip, &qstr, 1, NULL);
+	inode = gfs2_lookupi(dip, &qstr, 1, NULL);
+	/* gfs2_lookupi has inconsistent callers: vfs
+	 * related routines expect NULL for no entry found,
+	 * gfs2_lookup_simple callers expect ENOENT
+	 * and do not check for NULL.
+	 */
+	if (inode == NULL)
+		return ERR_PTR(-ENOENT);
+	else
+		return inode;
 }
 
 
-- 
cgit v1.2.1


From 87d21e07f3880b8d489f0b4a639deb1362101838 Mon Sep 17 00:00:00 2001
From: "S. Wendy Cheng" <wcheng@redhat.com>
Date: Thu, 18 Jan 2007 16:07:03 -0500
Subject: [GFS2] Fix gfs2_rename deadlock

Second round of gfs2_rename lock re-ordering to allow Anaconda adding
root partition on top of gfs2. Previous to this patch the recursive
lock detector in glock.c can be triggered due to attempting to lock
the rgrp twice. This fixes it by checking to see whether the rgrp
is already locked.

This fixes Red Hat bugzilla #221237

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/gfs2/inode.c     | 50 +++++++++++++++++++++++++++++++-------------------
 fs/gfs2/inode.h     |  2 +-
 fs/gfs2/ops_inode.c | 25 +++----------------------
 3 files changed, 35 insertions(+), 42 deletions(-)

(limited to 'fs')

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index bab338f6b610..58c2ce785fed 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -281,13 +281,13 @@ out:
 }
 
 /**
- * gfs2_change_nlink_i - Change nlink count on inode
+ * gfs2_change_nlink - Change nlink count on inode
  * @ip: The GFS2 inode
  * @diff: The change in the nlink count required
  *
  * Returns: errno
  */
-int gfs2_change_nlink_i(struct gfs2_inode *ip, int diff)
+int gfs2_change_nlink(struct gfs2_inode *ip, int diff)
 {
 	struct buffer_head *dibh;
 	u32 nlink;
@@ -320,40 +320,52 @@ int gfs2_change_nlink_i(struct gfs2_inode *ip, int diff)
 	brelse(dibh);
 	mark_inode_dirty(&ip->i_inode);
 
+	if (ip->i_inode.i_nlink == 0)
+		error = gfs2_change_nlink_i(ip);
+
 	return error;
 }
 
-int gfs2_change_nlink(struct gfs2_inode *ip, int diff)
+int gfs2_change_nlink_i(struct gfs2_inode *ip)
 {
 	struct gfs2_sbd *sdp = ip->i_inode.i_sb->s_fs_info;
-	int error;
-
-	/* update the nlink */
-	error = gfs2_change_nlink_i(ip, diff);
-	if (error)
-		return error;
-
-	/* return meta data block back to rg */
-	if (ip->i_inode.i_nlink == 0) {
-		struct gfs2_rgrpd *rgd;
-		struct gfs2_holder ri_gh, rg_gh;
+	struct gfs2_inode *rindex = GFS2_I(sdp->sd_rindex);
+	struct gfs2_glock *ri_gl = rindex->i_gl;
+	struct gfs2_rgrpd *rgd;
+	struct gfs2_holder ri_gh, rg_gh;
+	int existing, error;
 
+	/* if we come from rename path, we could have the lock already */
+	existing = gfs2_glock_is_locked_by_me(ri_gl);
+	if (!existing) {
 		error = gfs2_rindex_hold(sdp, &ri_gh);
 		if (error)
 			goto out;
-		error = -EIO;
-		rgd = gfs2_blk2rgrpd(sdp, ip->i_num.no_addr);
-		if (!rgd)
-			goto out_norgrp;
+	}
+
+	/* find the matching rgd */
+	error = -EIO;
+	rgd = gfs2_blk2rgrpd(sdp, ip->i_num.no_addr);
+	if (!rgd)
+		goto out_norgrp;
+
+	/*
+	 * Eventually we may want to move rgd(s) to a linked list
+	 * and piggyback the free logic into one of gfs2 daemons
+	 * to gain some performance.
+	 */
+	if (!rgd->rd_gl || !gfs2_glock_is_locked_by_me(rgd->rd_gl)) {
 		error = gfs2_glock_nq_init(rgd->rd_gl, LM_ST_EXCLUSIVE, 0, &rg_gh);
 		if (error)
 			goto out_norgrp;
 
 		gfs2_unlink_di(&ip->i_inode); /* mark inode unlinked */
 		gfs2_glock_dq_uninit(&rg_gh);
+	}
+
 out_norgrp:
+	if (!existing)
 		gfs2_glock_dq_uninit(&ri_gh);
-	}
 out:
 	return error;
 }
diff --git a/fs/gfs2/inode.h b/fs/gfs2/inode.h
index 85c67cb568d1..cee281b5326b 100644
--- a/fs/gfs2/inode.h
+++ b/fs/gfs2/inode.h
@@ -40,7 +40,7 @@ int gfs2_inode_refresh(struct gfs2_inode *ip);
 
 int gfs2_dinode_dealloc(struct gfs2_inode *inode);
 int gfs2_change_nlink(struct gfs2_inode *ip, int diff);
-int gfs2_change_nlink_i(struct gfs2_inode *ip, int diff);
+int gfs2_change_nlink_i(struct gfs2_inode *ip);
 struct inode *gfs2_lookupi(struct inode *dir, const struct qstr *name,
 			   int is_root, struct nameidata *nd);
 struct inode *gfs2_createi(struct gfs2_holder *ghs, const struct qstr *name,
diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c
index 919e8947e710..b2a12f44f59d 100644
--- a/fs/gfs2/ops_inode.c
+++ b/fs/gfs2/ops_inode.c
@@ -553,7 +553,6 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry,
 	int alloc_required;
 	unsigned int x;
 	int error;
-	struct gfs2_rgrpd *rgd;
 
 	if (ndentry->d_inode) {
 		nip = GFS2_I(ndentry->d_inode);
@@ -685,12 +684,12 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry,
 		error = gfs2_trans_begin(sdp, sdp->sd_max_dirres +
 					 al->al_rgd->rd_ri.ri_length +
 					 4 * RES_DINODE + 4 * RES_LEAF +
-					 RES_STATFS + RES_QUOTA + 1, 0);
+					 RES_STATFS + RES_QUOTA + 4, 0);
 		if (error)
 			goto out_ipreserv;
 	} else {
 		error = gfs2_trans_begin(sdp, 4 * RES_DINODE +
-					 5 * RES_LEAF + 1, 0);
+					 5 * RES_LEAF + 4, 0);
 		if (error)
 			goto out_gunlock;
 	}
@@ -704,25 +703,7 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry,
 			error = gfs2_dir_del(ndip, &ndentry->d_name);
 			if (error)
 				goto out_end_trans;
-			error = gfs2_change_nlink_i(nip, -1);
-			if ((!error) && (nip->i_inode.i_nlink == 0)) {
-				error = -EIO;
-				rgd = gfs2_blk2rgrpd(sdp, nip->i_num.no_addr);
-				if (rgd) {
-					struct gfs2_holder nlink_rg_gh;
-					if (rgd != nip->i_alloc.al_rgd)
-						error = gfs2_glock_nq_init(
-						rgd->rd_gl, LM_ST_EXCLUSIVE,
-						0, &nlink_rg_gh);
-					else
-						error = 0;
-                			if (!error) {
-						gfs2_unlink_di(&nip->i_inode);
-						if (rgd != nip->i_alloc.al_rgd)
-							gfs2_glock_dq_uninit(&nlink_rg_gh);
-					}
-				}
-			}
+			error = gfs2_change_nlink(nip, -1);
 		}
 		if (error)
 			goto out_end_trans;
-- 
cgit v1.2.1


From 8ec6886748443bec53ce9b9bf50cec92bc417a1b Mon Sep 17 00:00:00 2001
From: David Teigland <teigland@redhat.com>
Date: Tue, 9 Jan 2007 09:38:39 -0600
Subject: [DLM] change some log_error to log_debug

Some common, non-error messages should use log_debug instead of log_error
so they can be turned off.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/dlm/rcom.c     |  4 ++--
 fs/dlm/recoverd.c | 22 +++++++++++-----------
 2 files changed, 13 insertions(+), 13 deletions(-)

(limited to 'fs')

diff --git a/fs/dlm/rcom.c b/fs/dlm/rcom.c
index 54fba9bd6740..2e246af4a62c 100644
--- a/fs/dlm/rcom.c
+++ b/fs/dlm/rcom.c
@@ -435,7 +435,7 @@ static int is_old_reply(struct dlm_ls *ls, struct dlm_rcom *rc)
 		seq = ls->ls_recover_seq;
 		spin_unlock(&ls->ls_recover_lock);
 		if (rc->rc_seq_reply != seq) {
-			log_error(ls, "ignoring old reply %x from %d "
+			log_debug(ls, "ignoring old reply %x from %d "
 				      "seq_reply %llx expect %llx",
 				      rc->rc_type, rc->rc_header.h_nodeid,
 				      (unsigned long long)rc->rc_seq_reply,
@@ -469,7 +469,7 @@ void dlm_receive_rcom(struct dlm_header *hd, int nodeid)
 	}
 
 	if (dlm_recovery_stopped(ls) && (rc->rc_type != DLM_RCOM_STATUS)) {
-		log_error(ls, "ignoring recovery message %x from %d",
+		log_debug(ls, "ignoring recovery message %x from %d",
 			  rc->rc_type, nodeid);
 		goto out;
 	}
diff --git a/fs/dlm/recoverd.c b/fs/dlm/recoverd.c
index 650536aa5139..3cb636d60249 100644
--- a/fs/dlm/recoverd.c
+++ b/fs/dlm/recoverd.c
@@ -77,7 +77,7 @@ static int ls_recover(struct dlm_ls *ls, struct dlm_recover *rv)
 
 	error = dlm_recover_members(ls, rv, &neg);
 	if (error) {
-		log_error(ls, "recover_members failed %d", error);
+		log_debug(ls, "recover_members failed %d", error);
 		goto fail;
 	}
 	start = jiffies;
@@ -89,7 +89,7 @@ static int ls_recover(struct dlm_ls *ls, struct dlm_recover *rv)
 
 	error = dlm_recover_directory(ls);
 	if (error) {
-		log_error(ls, "recover_directory failed %d", error);
+		log_debug(ls, "recover_directory failed %d", error);
 		goto fail;
 	}
 
@@ -99,7 +99,7 @@ static int ls_recover(struct dlm_ls *ls, struct dlm_recover *rv)
 
 	error = dlm_recover_directory_wait(ls);
 	if (error) {
-		log_error(ls, "recover_directory_wait failed %d", error);
+		log_debug(ls, "recover_directory_wait failed %d", error);
 		goto fail;
 	}
 
@@ -129,7 +129,7 @@ static int ls_recover(struct dlm_ls *ls, struct dlm_recover *rv)
 
 		error = dlm_recover_masters(ls);
 		if (error) {
-			log_error(ls, "recover_masters failed %d", error);
+			log_debug(ls, "recover_masters failed %d", error);
 			goto fail;
 		}
 
@@ -139,13 +139,13 @@ static int ls_recover(struct dlm_ls *ls, struct dlm_recover *rv)
 
 		error = dlm_recover_locks(ls);
 		if (error) {
-			log_error(ls, "recover_locks failed %d", error);
+			log_debug(ls, "recover_locks failed %d", error);
 			goto fail;
 		}
 
 		error = dlm_recover_locks_wait(ls);
 		if (error) {
-			log_error(ls, "recover_locks_wait failed %d", error);
+			log_debug(ls, "recover_locks_wait failed %d", error);
 			goto fail;
 		}
 
@@ -166,7 +166,7 @@ static int ls_recover(struct dlm_ls *ls, struct dlm_recover *rv)
 
 		error = dlm_recover_locks_wait(ls);
 		if (error) {
-			log_error(ls, "recover_locks_wait failed %d", error);
+			log_debug(ls, "recover_locks_wait failed %d", error);
 			goto fail;
 		}
 	}
@@ -184,7 +184,7 @@ static int ls_recover(struct dlm_ls *ls, struct dlm_recover *rv)
 	dlm_set_recover_status(ls, DLM_RS_DONE);
 	error = dlm_recover_done_wait(ls);
 	if (error) {
-		log_error(ls, "recover_done_wait failed %d", error);
+		log_debug(ls, "recover_done_wait failed %d", error);
 		goto fail;
 	}
 
@@ -192,19 +192,19 @@ static int ls_recover(struct dlm_ls *ls, struct dlm_recover *rv)
 
 	error = enable_locking(ls, rv->seq);
 	if (error) {
-		log_error(ls, "enable_locking failed %d", error);
+		log_debug(ls, "enable_locking failed %d", error);
 		goto fail;
 	}
 
 	error = dlm_process_requestqueue(ls);
 	if (error) {
-		log_error(ls, "process_requestqueue failed %d", error);
+		log_debug(ls, "process_requestqueue failed %d", error);
 		goto fail;
 	}
 
 	error = dlm_recover_waiters_post(ls);
 	if (error) {
-		log_error(ls, "recover_waiters_post failed %d", error);
+		log_debug(ls, "recover_waiters_post failed %d", error);
 		goto fail;
 	}
 
-- 
cgit v1.2.1


From 68c817a1c4e21b893672ac73d8a498e6647453aa Mon Sep 17 00:00:00 2001
From: David Teigland <teigland@redhat.com>
Date: Tue, 9 Jan 2007 09:41:48 -0600
Subject: [DLM] rename dlm_config_info fields

Add a "ci_" prefix to the fields in the dlm_config_info struct so that we
can use macros to add configfs functions to access them (in a later
patch).  No functional changes in this patch, just naming changes.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/dlm/config.c        | 16 ++++++++--------
 fs/dlm/config.h        | 16 ++++++++--------
 fs/dlm/lock.c          |  2 +-
 fs/dlm/lockspace.c     | 10 +++++-----
 fs/dlm/lowcomms-sctp.c |  6 +++---
 fs/dlm/lowcomms-tcp.c  |  9 +++++----
 fs/dlm/midcomms.c      |  4 ++--
 fs/dlm/rcom.c          |  8 ++++----
 fs/dlm/recover.c       |  4 ++--
 9 files changed, 38 insertions(+), 37 deletions(-)

(limited to 'fs')

diff --git a/fs/dlm/config.c b/fs/dlm/config.c
index 88553054bbfa..958021f91486 100644
--- a/fs/dlm/config.c
+++ b/fs/dlm/config.c
@@ -777,13 +777,13 @@ int dlm_our_addr(struct sockaddr_storage *addr, int num)
 #define DEFAULT_SCAN_SECS          5
 
 struct dlm_config_info dlm_config = {
-	.tcp_port = DEFAULT_TCP_PORT,
-	.buffer_size = DEFAULT_BUFFER_SIZE,
-	.rsbtbl_size = DEFAULT_RSBTBL_SIZE,
-	.lkbtbl_size = DEFAULT_LKBTBL_SIZE,
-	.dirtbl_size = DEFAULT_DIRTBL_SIZE,
-	.recover_timer = DEFAULT_RECOVER_TIMER,
-	.toss_secs = DEFAULT_TOSS_SECS,
-	.scan_secs = DEFAULT_SCAN_SECS
+	.ci_tcp_port = DEFAULT_TCP_PORT,
+	.ci_buffer_size = DEFAULT_BUFFER_SIZE,
+	.ci_rsbtbl_size = DEFAULT_RSBTBL_SIZE,
+	.ci_lkbtbl_size = DEFAULT_LKBTBL_SIZE,
+	.ci_dirtbl_size = DEFAULT_DIRTBL_SIZE,
+	.ci_recover_timer = DEFAULT_RECOVER_TIMER,
+	.ci_toss_secs = DEFAULT_TOSS_SECS,
+	.ci_scan_secs = DEFAULT_SCAN_SECS
 };
 
diff --git a/fs/dlm/config.h b/fs/dlm/config.h
index 9da7839958a9..ce603e1c3bac 100644
--- a/fs/dlm/config.h
+++ b/fs/dlm/config.h
@@ -17,14 +17,14 @@
 #define DLM_MAX_ADDR_COUNT 3
 
 struct dlm_config_info {
-	int tcp_port;
-	int buffer_size;
-	int rsbtbl_size;
-	int lkbtbl_size;
-	int dirtbl_size;
-	int recover_timer;
-	int toss_secs;
-	int scan_secs;
+	int ci_tcp_port;
+	int ci_buffer_size;
+	int ci_rsbtbl_size;
+	int ci_lkbtbl_size;
+	int ci_dirtbl_size;
+	int ci_recover_timer;
+	int ci_toss_secs;
+	int ci_scan_secs;
 };
 
 extern struct dlm_config_info dlm_config;
diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index ed52485a86d0..5bac9827ded3 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -810,7 +810,7 @@ static int shrink_bucket(struct dlm_ls *ls, int b)
 		list_for_each_entry_reverse(r, &ls->ls_rsbtbl[b].toss,
 					    res_hashchain) {
 			if (!time_after_eq(jiffies, r->res_toss_time +
-					   dlm_config.toss_secs * HZ))
+					   dlm_config.ci_toss_secs * HZ))
 				continue;
 			found = 1;
 			break;
diff --git a/fs/dlm/lockspace.c b/fs/dlm/lockspace.c
index 59012b089e8d..f40817b53c6f 100644
--- a/fs/dlm/lockspace.c
+++ b/fs/dlm/lockspace.c
@@ -236,7 +236,7 @@ static int dlm_scand(void *data)
 	while (!kthread_should_stop()) {
 		list_for_each_entry(ls, &lslist, ls_list)
 			dlm_scan_rsbs(ls);
-		schedule_timeout_interruptible(dlm_config.scan_secs * HZ);
+		schedule_timeout_interruptible(dlm_config.ci_scan_secs * HZ);
 	}
 	return 0;
 }
@@ -422,7 +422,7 @@ static int new_lockspace(char *name, int namelen, void **lockspace,
 	ls->ls_count = 0;
 	ls->ls_flags = 0;
 
-	size = dlm_config.rsbtbl_size;
+	size = dlm_config.ci_rsbtbl_size;
 	ls->ls_rsbtbl_size = size;
 
 	ls->ls_rsbtbl = kmalloc(sizeof(struct dlm_rsbtable) * size, GFP_KERNEL);
@@ -434,7 +434,7 @@ static int new_lockspace(char *name, int namelen, void **lockspace,
 		rwlock_init(&ls->ls_rsbtbl[i].lock);
 	}
 
-	size = dlm_config.lkbtbl_size;
+	size = dlm_config.ci_lkbtbl_size;
 	ls->ls_lkbtbl_size = size;
 
 	ls->ls_lkbtbl = kmalloc(sizeof(struct dlm_lkbtable) * size, GFP_KERNEL);
@@ -446,7 +446,7 @@ static int new_lockspace(char *name, int namelen, void **lockspace,
 		ls->ls_lkbtbl[i].counter = 1;
 	}
 
-	size = dlm_config.dirtbl_size;
+	size = dlm_config.ci_dirtbl_size;
 	ls->ls_dirtbl_size = size;
 
 	ls->ls_dirtbl = kmalloc(sizeof(struct dlm_dirtable) * size, GFP_KERNEL);
@@ -489,7 +489,7 @@ static int new_lockspace(char *name, int namelen, void **lockspace,
 	mutex_init(&ls->ls_requestqueue_mutex);
 	mutex_init(&ls->ls_clear_proc_locks);
 
-	ls->ls_recover_buf = kmalloc(dlm_config.buffer_size, GFP_KERNEL);
+	ls->ls_recover_buf = kmalloc(dlm_config.ci_buffer_size, GFP_KERNEL);
 	if (!ls->ls_recover_buf)
 		goto out_dirfree;
 
diff --git a/fs/dlm/lowcomms-sctp.c b/fs/dlm/lowcomms-sctp.c
index 0940a805b3d7..5aeadadd8afd 100644
--- a/fs/dlm/lowcomms-sctp.c
+++ b/fs/dlm/lowcomms-sctp.c
@@ -635,7 +635,7 @@ static int add_bind_addr(struct sockaddr_storage *addr, int addr_len, int num)
 
 	if (result < 0)
 		log_print("Can't bind to port %d addr number %d",
-			  dlm_config.tcp_port, num);
+			  dlm_config.ci_tcp_port, num);
 
 	return result;
 }
@@ -711,7 +711,7 @@ static int init_sock(void)
 	/* Bind to all interfaces. */
 	for (i = 0; i < dlm_local_count; i++) {
 		memcpy(&localaddr, dlm_local_addr[i], sizeof(localaddr));
-		make_sockaddr(&localaddr, dlm_config.tcp_port, &addr_len);
+		make_sockaddr(&localaddr, dlm_config.ci_tcp_port, &addr_len);
 
 		result = add_bind_addr(&localaddr, addr_len, num);
 		if (result)
@@ -863,7 +863,7 @@ static void initiate_association(int nodeid)
 		return;
 	}
 
-	make_sockaddr(&rem_addr, dlm_config.tcp_port, &addrlen);
+	make_sockaddr(&rem_addr, dlm_config.ci_tcp_port, &addrlen);
 
 	outmessage.msg_name = &rem_addr;
 	outmessage.msg_namelen = addrlen;
diff --git a/fs/dlm/lowcomms-tcp.c b/fs/dlm/lowcomms-tcp.c
index ce5e7cdfb45d..b4fb5783ef8a 100644
--- a/fs/dlm/lowcomms-tcp.c
+++ b/fs/dlm/lowcomms-tcp.c
@@ -548,7 +548,7 @@ static void connect_to_sock(struct connection *con)
 	sock->sk->sk_user_data = con;
 	con->rx_action = receive_from_sock;
 
-	make_sockaddr(&saddr, dlm_config.tcp_port, &addr_len);
+	make_sockaddr(&saddr, dlm_config.ci_tcp_port, &addr_len);
 
 	add_sock(sock, con);
 
@@ -616,10 +616,10 @@ static struct socket *create_listen_sock(struct connection *con,
 	con->sock = sock;
 
 	/* Bind to our port */
-	make_sockaddr(saddr, dlm_config.tcp_port, &addr_len);
+	make_sockaddr(saddr, dlm_config.ci_tcp_port, &addr_len);
 	result = sock->ops->bind(sock, (struct sockaddr *) saddr, addr_len);
 	if (result < 0) {
-		printk("dlm: Can't bind to port %d\n", dlm_config.tcp_port);
+		printk("dlm: Can't bind to port %d\n", dlm_config.ci_tcp_port);
 		sock_release(sock);
 		sock = NULL;
 		con->sock = NULL;
@@ -638,7 +638,8 @@ static struct socket *create_listen_sock(struct connection *con,
 
 	result = sock->ops->listen(sock, 5);
 	if (result < 0) {
-		printk("dlm: Can't listen on port %d\n", dlm_config.tcp_port);
+		printk("dlm: Can't listen on port %d\n",
+		       dlm_config.ci_tcp_port);
 		sock_release(sock);
 		sock = NULL;
 		goto create_out;
diff --git a/fs/dlm/midcomms.c b/fs/dlm/midcomms.c
index c9b1c3d535f4..a5126e0c68a6 100644
--- a/fs/dlm/midcomms.c
+++ b/fs/dlm/midcomms.c
@@ -82,7 +82,7 @@ int dlm_process_incoming_buffer(int nodeid, const void *base,
 		if (msglen < sizeof(struct dlm_header))
 			break;
 		err = -E2BIG;
-		if (msglen > dlm_config.buffer_size) {
+		if (msglen > dlm_config.ci_buffer_size) {
 			log_print("message size %d from %d too big, buf len %d",
 				  msglen, nodeid, len);
 			break;
@@ -103,7 +103,7 @@ int dlm_process_incoming_buffer(int nodeid, const void *base,
 
 		if (msglen > sizeof(__tmp) &&
 		    msg == (struct dlm_header *) __tmp) {
-			msg = kmalloc(dlm_config.buffer_size, GFP_KERNEL);
+			msg = kmalloc(dlm_config.ci_buffer_size, GFP_KERNEL);
 			if (msg == NULL)
 				return ret;
 		}
diff --git a/fs/dlm/rcom.c b/fs/dlm/rcom.c
index 2e246af4a62c..6bfbd6153809 100644
--- a/fs/dlm/rcom.c
+++ b/fs/dlm/rcom.c
@@ -138,7 +138,7 @@ int dlm_rcom_status(struct dlm_ls *ls, int nodeid)
 		goto out;
 
 	allow_sync_reply(ls, &rc->rc_id);
-	memset(ls->ls_recover_buf, 0, dlm_config.buffer_size);
+	memset(ls->ls_recover_buf, 0, dlm_config.ci_buffer_size);
 
 	send_rcom(ls, mh, rc);
 
@@ -213,7 +213,7 @@ int dlm_rcom_names(struct dlm_ls *ls, int nodeid, char *last_name, int last_len)
 	if (nodeid == dlm_our_nodeid()) {
 		dlm_copy_master_names(ls, last_name, last_len,
 		                      ls->ls_recover_buf + len,
-		                      dlm_config.buffer_size - len, nodeid);
+		                      dlm_config.ci_buffer_size - len, nodeid);
 		goto out;
 	}
 
@@ -223,7 +223,7 @@ int dlm_rcom_names(struct dlm_ls *ls, int nodeid, char *last_name, int last_len)
 	memcpy(rc->rc_buf, last_name, last_len);
 
 	allow_sync_reply(ls, &rc->rc_id);
-	memset(ls->ls_recover_buf, 0, dlm_config.buffer_size);
+	memset(ls->ls_recover_buf, 0, dlm_config.ci_buffer_size);
 
 	send_rcom(ls, mh, rc);
 
@@ -241,7 +241,7 @@ static void receive_rcom_names(struct dlm_ls *ls, struct dlm_rcom *rc_in)
 
 	nodeid = rc_in->rc_header.h_nodeid;
 	inlen = rc_in->rc_header.h_length - sizeof(struct dlm_rcom);
-	outlen = dlm_config.buffer_size - sizeof(struct dlm_rcom);
+	outlen = dlm_config.ci_buffer_size - sizeof(struct dlm_rcom);
 
 	error = create_rcom(ls, nodeid, DLM_RCOM_NAMES_REPLY, outlen, &rc, &mh);
 	if (error)
diff --git a/fs/dlm/recover.c b/fs/dlm/recover.c
index cf9f6831bab5..a7fa4cb6cd16 100644
--- a/fs/dlm/recover.c
+++ b/fs/dlm/recover.c
@@ -44,7 +44,7 @@
 static void dlm_wait_timer_fn(unsigned long data)
 {
 	struct dlm_ls *ls = (struct dlm_ls *) data;
-	mod_timer(&ls->ls_timer, jiffies + (dlm_config.recover_timer * HZ));
+	mod_timer(&ls->ls_timer, jiffies + (dlm_config.ci_recover_timer * HZ));
 	wake_up(&ls->ls_wait_general);
 }
 
@@ -55,7 +55,7 @@ int dlm_wait_function(struct dlm_ls *ls, int (*testfn) (struct dlm_ls *ls))
 	init_timer(&ls->ls_timer);
 	ls->ls_timer.function = dlm_wait_timer_fn;
 	ls->ls_timer.data = (long) ls;
-	ls->ls_timer.expires = jiffies + (dlm_config.recover_timer * HZ);
+	ls->ls_timer.expires = jiffies + (dlm_config.ci_recover_timer * HZ);
 	add_timer(&ls->ls_timer);
 
 	wait_event(ls->ls_wait_general, testfn(ls) || dlm_recovery_stopped(ls));
-- 
cgit v1.2.1


From 99fc64874aad1ee0aea5c4d8c07e3529f9d03497 Mon Sep 17 00:00:00 2001
From: David Teigland <teigland@redhat.com>
Date: Tue, 9 Jan 2007 09:44:01 -0600
Subject: [DLM] add config entry to enable log_debug

Add a new dlm_config_info field to enable log_debug output and change
log_debug() to use it.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/dlm/config.c       |  4 +++-
 fs/dlm/config.h       |  1 +
 fs/dlm/dlm_internal.h | 13 +++++++------
 3 files changed, 11 insertions(+), 7 deletions(-)

(limited to 'fs')

diff --git a/fs/dlm/config.c b/fs/dlm/config.c
index 958021f91486..7cf20209b127 100644
--- a/fs/dlm/config.c
+++ b/fs/dlm/config.c
@@ -775,6 +775,7 @@ int dlm_our_addr(struct sockaddr_storage *addr, int num)
 #define DEFAULT_RECOVER_TIMER      5
 #define DEFAULT_TOSS_SECS         10
 #define DEFAULT_SCAN_SECS          5
+#define DEFAULT_LOG_DEBUG          0
 
 struct dlm_config_info dlm_config = {
 	.ci_tcp_port = DEFAULT_TCP_PORT,
@@ -784,6 +785,7 @@ struct dlm_config_info dlm_config = {
 	.ci_dirtbl_size = DEFAULT_DIRTBL_SIZE,
 	.ci_recover_timer = DEFAULT_RECOVER_TIMER,
 	.ci_toss_secs = DEFAULT_TOSS_SECS,
-	.ci_scan_secs = DEFAULT_SCAN_SECS
+	.ci_scan_secs = DEFAULT_SCAN_SECS,
+	.ci_log_debug = DEFAULT_LOG_DEBUG
 };
 
diff --git a/fs/dlm/config.h b/fs/dlm/config.h
index ce603e1c3bac..1e978611a96e 100644
--- a/fs/dlm/config.h
+++ b/fs/dlm/config.h
@@ -25,6 +25,7 @@ struct dlm_config_info {
 	int ci_recover_timer;
 	int ci_toss_secs;
 	int ci_scan_secs;
+	int ci_log_debug;
 };
 
 extern struct dlm_config_info dlm_config;
diff --git a/fs/dlm/dlm_internal.h b/fs/dlm/dlm_internal.h
index 7185a132a8b5..ee993c5c2307 100644
--- a/fs/dlm/dlm_internal.h
+++ b/fs/dlm/dlm_internal.h
@@ -41,6 +41,7 @@
 #include <asm/uaccess.h>
 
 #include <linux/dlm.h>
+#include "config.h"
 
 #define DLM_LOCKSPACE_LEN	64
 
@@ -69,12 +70,12 @@ struct dlm_mhandle;
 #define log_error(ls, fmt, args...) \
 	printk(KERN_ERR "dlm: %s: " fmt "\n", (ls)->ls_name , ##args)
 
-#define DLM_LOG_DEBUG
-#ifdef DLM_LOG_DEBUG
-#define log_debug(ls, fmt, args...) log_error(ls, fmt, ##args)
-#else
-#define log_debug(ls, fmt, args...)
-#endif
+#define log_debug(ls, fmt, args...) \
+do { \
+	if (dlm_config.ci_log_debug) \
+		printk(KERN_DEBUG "dlm: %s: " fmt "\n", \
+		       (ls)->ls_name , ##args); \
+} while (0)
 
 #define DLM_ASSERT(x, do) \
 { \
-- 
cgit v1.2.1


From d200778e1257eeb92242355de6f191a0a5ad43c4 Mon Sep 17 00:00:00 2001
From: David Teigland <teigland@redhat.com>
Date: Tue, 9 Jan 2007 09:46:02 -0600
Subject: [DLM] expose dlm_config_info fields in configfs

Make the dlm_config_info values readable and writeable via configfs
entries.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/dlm/config.c | 136 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 132 insertions(+), 4 deletions(-)

(limited to 'fs')

diff --git a/fs/dlm/config.c b/fs/dlm/config.c
index 7cf20209b127..8665c88e5af2 100644
--- a/fs/dlm/config.c
+++ b/fs/dlm/config.c
@@ -54,6 +54,11 @@ static struct config_item *make_node(struct config_group *, const char *);
 static void drop_node(struct config_group *, struct config_item *);
 static void release_node(struct config_item *);
 
+static ssize_t show_cluster(struct config_item *i, struct configfs_attribute *a,
+			    char *buf);
+static ssize_t store_cluster(struct config_item *i,
+			     struct configfs_attribute *a,
+			     const char *buf, size_t len);
 static ssize_t show_comm(struct config_item *i, struct configfs_attribute *a,
 			 char *buf);
 static ssize_t store_comm(struct config_item *i, struct configfs_attribute *a,
@@ -73,6 +78,101 @@ static ssize_t node_nodeid_write(struct node *nd, const char *buf, size_t len);
 static ssize_t node_weight_read(struct node *nd, char *buf);
 static ssize_t node_weight_write(struct node *nd, const char *buf, size_t len);
 
+struct cluster {
+	struct config_group group;
+	unsigned int cl_tcp_port;
+	unsigned int cl_buffer_size;
+	unsigned int cl_rsbtbl_size;
+	unsigned int cl_lkbtbl_size;
+	unsigned int cl_dirtbl_size;
+	unsigned int cl_recover_timer;
+	unsigned int cl_toss_secs;
+	unsigned int cl_scan_secs;
+	unsigned int cl_log_debug;
+};
+
+enum {
+	CLUSTER_ATTR_TCP_PORT = 0,
+	CLUSTER_ATTR_BUFFER_SIZE,
+	CLUSTER_ATTR_RSBTBL_SIZE,
+	CLUSTER_ATTR_LKBTBL_SIZE,
+	CLUSTER_ATTR_DIRTBL_SIZE,
+	CLUSTER_ATTR_RECOVER_TIMER,
+	CLUSTER_ATTR_TOSS_SECS,
+	CLUSTER_ATTR_SCAN_SECS,
+	CLUSTER_ATTR_LOG_DEBUG,
+};
+
+struct cluster_attribute {
+	struct configfs_attribute attr;
+	ssize_t (*show)(struct cluster *, char *);
+	ssize_t (*store)(struct cluster *, const char *, size_t);
+};
+
+static ssize_t cluster_set(struct cluster *cl, unsigned int *cl_field,
+			   unsigned int *info_field, int check_zero,
+			   const char *buf, size_t len)
+{
+	unsigned int x;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EACCES;
+
+	x = simple_strtoul(buf, NULL, 0);
+
+	if (check_zero && !x)
+		return -EINVAL;
+
+	*cl_field = x;
+	*info_field = x;
+
+	return len;
+}
+
+#define __CONFIGFS_ATTR(_name,_mode,_read,_write) {                           \
+	.attr   = { .ca_name = __stringify(_name),                            \
+		    .ca_mode = _mode,                                         \
+		    .ca_owner = THIS_MODULE },                                \
+	.show   = _read,                                                      \
+	.store  = _write,                                                     \
+}
+
+#define CLUSTER_ATTR(name, check_zero)                                        \
+static ssize_t name##_write(struct cluster *cl, const char *buf, size_t len)  \
+{                                                                             \
+	return cluster_set(cl, &cl->cl_##name, &dlm_config.ci_##name,         \
+			   check_zero, buf, len);                             \
+}                                                                             \
+static ssize_t name##_read(struct cluster *cl, char *buf)                     \
+{                                                                             \
+	return snprintf(buf, PAGE_SIZE, "%u\n", cl->cl_##name);               \
+}                                                                             \
+static struct cluster_attribute cluster_attr_##name =                         \
+__CONFIGFS_ATTR(name, 0644, name##_read, name##_write)
+
+CLUSTER_ATTR(tcp_port, 1);
+CLUSTER_ATTR(buffer_size, 1);
+CLUSTER_ATTR(rsbtbl_size, 1);
+CLUSTER_ATTR(lkbtbl_size, 1);
+CLUSTER_ATTR(dirtbl_size, 1);
+CLUSTER_ATTR(recover_timer, 1);
+CLUSTER_ATTR(toss_secs, 1);
+CLUSTER_ATTR(scan_secs, 1);
+CLUSTER_ATTR(log_debug, 0);
+
+static struct configfs_attribute *cluster_attrs[] = {
+	[CLUSTER_ATTR_TCP_PORT] = &cluster_attr_tcp_port.attr,
+	[CLUSTER_ATTR_BUFFER_SIZE] = &cluster_attr_buffer_size.attr,
+	[CLUSTER_ATTR_RSBTBL_SIZE] = &cluster_attr_rsbtbl_size.attr,
+	[CLUSTER_ATTR_LKBTBL_SIZE] = &cluster_attr_lkbtbl_size.attr,
+	[CLUSTER_ATTR_DIRTBL_SIZE] = &cluster_attr_dirtbl_size.attr,
+	[CLUSTER_ATTR_RECOVER_TIMER] = &cluster_attr_recover_timer.attr,
+	[CLUSTER_ATTR_TOSS_SECS] = &cluster_attr_toss_secs.attr,
+	[CLUSTER_ATTR_SCAN_SECS] = &cluster_attr_scan_secs.attr,
+	[CLUSTER_ATTR_LOG_DEBUG] = &cluster_attr_log_debug.attr,
+	NULL,
+};
+
 enum {
 	COMM_ATTR_NODEID = 0,
 	COMM_ATTR_LOCAL,
@@ -152,10 +252,6 @@ struct clusters {
 	struct configfs_subsystem subsys;
 };
 
-struct cluster {
-	struct config_group group;
-};
-
 struct spaces {
 	struct config_group ss_group;
 };
@@ -197,6 +293,8 @@ static struct configfs_group_operations clusters_ops = {
 
 static struct configfs_item_operations cluster_ops = {
 	.release = release_cluster,
+	.show_attribute = show_cluster,
+	.store_attribute = store_cluster,
 };
 
 static struct configfs_group_operations spaces_ops = {
@@ -237,6 +335,7 @@ static struct config_item_type clusters_type = {
 
 static struct config_item_type cluster_type = {
 	.ct_item_ops = &cluster_ops,
+	.ct_attrs = cluster_attrs,
 	.ct_owner = THIS_MODULE,
 };
 
@@ -317,6 +416,16 @@ static struct config_group *make_cluster(struct config_group *g,
 	cl->group.default_groups[1] = &cms->cs_group;
 	cl->group.default_groups[2] = NULL;
 
+	cl->cl_tcp_port = dlm_config.ci_tcp_port;
+	cl->cl_buffer_size = dlm_config.ci_buffer_size;
+	cl->cl_rsbtbl_size = dlm_config.ci_rsbtbl_size;
+	cl->cl_lkbtbl_size = dlm_config.ci_lkbtbl_size;
+	cl->cl_dirtbl_size = dlm_config.ci_dirtbl_size;
+	cl->cl_recover_timer = dlm_config.ci_recover_timer;
+	cl->cl_toss_secs = dlm_config.ci_toss_secs;
+	cl->cl_scan_secs = dlm_config.ci_scan_secs;
+	cl->cl_log_debug = dlm_config.ci_log_debug;
+
 	space_list = &sps->ss_group;
 	comm_list = &cms->cs_group;
 	return &cl->group;
@@ -509,6 +618,25 @@ void dlm_config_exit(void)
  * Functions for user space to read/write attributes
  */
 
+static ssize_t show_cluster(struct config_item *i, struct configfs_attribute *a,
+			    char *buf)
+{
+	struct cluster *cl = to_cluster(i);
+	struct cluster_attribute *cla =
+			container_of(a, struct cluster_attribute, attr);
+	return cla->show ? cla->show(cl, buf) : 0;
+}
+
+static ssize_t store_cluster(struct config_item *i,
+			     struct configfs_attribute *a,
+			     const char *buf, size_t len)
+{
+	struct cluster *cl = to_cluster(i);
+	struct cluster_attribute *cla =
+		container_of(a, struct cluster_attribute, attr);
+	return cla->store ? cla->store(cl, buf, len) : -EINVAL;
+}
+
 static ssize_t show_comm(struct config_item *i, struct configfs_attribute *a,
 			 char *buf)
 {
-- 
cgit v1.2.1


From 70831465646b1fef9bf7b51b64409276411e9746 Mon Sep 17 00:00:00 2001
From: Robert Peterson <rpeterso@redhat.com>
Date: Thu, 11 Jan 2007 13:25:00 -0600
Subject: [GFS2] gfs2 knows of directories which it chooses not to display

This is for Red Hat bugzilla bug bz #222302:

Moving a virtual IP from node to node between two NFS-over-GFS2
servers was causing one of the GFS2 servers to become confused and
reference a deleted inode.  The problem was due to vfs dentries that did
not reference the gfs2_dops and therefore didn't call the gfs2 revalidate
code to revalidate a dentry after a directory had been deleted & recreated.
This patch is a crosswrite from a RHEL4 bug found in GFS1 as
bz #190756 and it is against the latest -nmw git tree.

Signed-off-by: Robert Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/gfs2/ops_export.c | 3 +++
 1 file changed, 3 insertions(+)

(limited to 'fs')

diff --git a/fs/gfs2/ops_export.c b/fs/gfs2/ops_export.c
index b4e7b8775315..6ea979c74b6b 100644
--- a/fs/gfs2/ops_export.c
+++ b/fs/gfs2/ops_export.c
@@ -22,6 +22,7 @@
 #include "glock.h"
 #include "glops.h"
 #include "inode.h"
+#include "ops_dentry.h"
 #include "ops_export.h"
 #include "rgrp.h"
 #include "util.h"
@@ -189,6 +190,7 @@ static struct dentry *gfs2_get_parent(struct dentry *child)
 		return ERR_PTR(-ENOMEM);
 	}
 
+	dentry->d_op = &gfs2_dops;
 	return dentry;
 }
 
@@ -269,6 +271,7 @@ out_inode:
 		return ERR_PTR(-ENOMEM);
 	}
 
+	dentry->d_op = &gfs2_dops;
 	return dentry;
 
 fail_rgd:
-- 
cgit v1.2.1


From 03dc6a538e42bcc8d5dfabcee208b639db85a80c Mon Sep 17 00:00:00 2001
From: Adrian Bunk <bunk@stusta.de>
Date: Sat, 13 Jan 2007 10:56:41 +0100
Subject: [GFS2] make gfs2_change_nlink_i() static

On Thu, Jan 11, 2007 at 10:26:27PM -0800, Andrew Morton wrote:
>...
> Changes since 2.6.20-rc3-mm1:
>...
>  git-gfs2-nmw.patch
>...
>  git trees
>...

This patch makes the needlessly globlal gfs2_change_nlink_i() static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/gfs2/inode.c | 88 ++++++++++++++++++++++++++++-----------------------------
 fs/gfs2/inode.h |  1 -
 2 files changed, 44 insertions(+), 45 deletions(-)

(limited to 'fs')

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 58c2ce785fed..260316954ad7 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -280,6 +280,50 @@ out:
 	return error;
 }
 
+static int gfs2_change_nlink_i(struct gfs2_inode *ip)
+{
+	struct gfs2_sbd *sdp = ip->i_inode.i_sb->s_fs_info;
+	struct gfs2_inode *rindex = GFS2_I(sdp->sd_rindex);
+	struct gfs2_glock *ri_gl = rindex->i_gl;
+	struct gfs2_rgrpd *rgd;
+	struct gfs2_holder ri_gh, rg_gh;
+	int existing, error;
+
+	/* if we come from rename path, we could have the lock already */
+	existing = gfs2_glock_is_locked_by_me(ri_gl);
+	if (!existing) {
+		error = gfs2_rindex_hold(sdp, &ri_gh);
+		if (error)
+			goto out;
+	}
+
+	/* find the matching rgd */
+	error = -EIO;
+	rgd = gfs2_blk2rgrpd(sdp, ip->i_num.no_addr);
+	if (!rgd)
+		goto out_norgrp;
+
+	/*
+	 * Eventually we may want to move rgd(s) to a linked list
+	 * and piggyback the free logic into one of gfs2 daemons
+	 * to gain some performance.
+	 */
+	if (!rgd->rd_gl || !gfs2_glock_is_locked_by_me(rgd->rd_gl)) {
+		error = gfs2_glock_nq_init(rgd->rd_gl, LM_ST_EXCLUSIVE, 0, &rg_gh);
+		if (error)
+			goto out_norgrp;
+
+		gfs2_unlink_di(&ip->i_inode); /* mark inode unlinked */
+		gfs2_glock_dq_uninit(&rg_gh);
+	}
+
+out_norgrp:
+	if (!existing)
+		gfs2_glock_dq_uninit(&ri_gh);
+out:
+	return error;
+}
+
 /**
  * gfs2_change_nlink - Change nlink count on inode
  * @ip: The GFS2 inode
@@ -326,50 +370,6 @@ int gfs2_change_nlink(struct gfs2_inode *ip, int diff)
 	return error;
 }
 
-int gfs2_change_nlink_i(struct gfs2_inode *ip)
-{
-	struct gfs2_sbd *sdp = ip->i_inode.i_sb->s_fs_info;
-	struct gfs2_inode *rindex = GFS2_I(sdp->sd_rindex);
-	struct gfs2_glock *ri_gl = rindex->i_gl;
-	struct gfs2_rgrpd *rgd;
-	struct gfs2_holder ri_gh, rg_gh;
-	int existing, error;
-
-	/* if we come from rename path, we could have the lock already */
-	existing = gfs2_glock_is_locked_by_me(ri_gl);
-	if (!existing) {
-		error = gfs2_rindex_hold(sdp, &ri_gh);
-		if (error)
-			goto out;
-	}
-
-	/* find the matching rgd */
-	error = -EIO;
-	rgd = gfs2_blk2rgrpd(sdp, ip->i_num.no_addr);
-	if (!rgd)
-		goto out_norgrp;
-
-	/*
-	 * Eventually we may want to move rgd(s) to a linked list
-	 * and piggyback the free logic into one of gfs2 daemons
-	 * to gain some performance.
-	 */
-	if (!rgd->rd_gl || !gfs2_glock_is_locked_by_me(rgd->rd_gl)) {
-		error = gfs2_glock_nq_init(rgd->rd_gl, LM_ST_EXCLUSIVE, 0, &rg_gh);
-		if (error)
-			goto out_norgrp;
-
-		gfs2_unlink_di(&ip->i_inode); /* mark inode unlinked */
-		gfs2_glock_dq_uninit(&rg_gh);
-	}
-
-out_norgrp:
-	if (!existing)
-		gfs2_glock_dq_uninit(&ri_gh);
-out:
-	return error;
-}
-
 struct inode *gfs2_lookup_simple(struct inode *dip, const char *name)
 {
 	struct qstr qstr;
diff --git a/fs/gfs2/inode.h b/fs/gfs2/inode.h
index cee281b5326b..b57f448b15bc 100644
--- a/fs/gfs2/inode.h
+++ b/fs/gfs2/inode.h
@@ -40,7 +40,6 @@ int gfs2_inode_refresh(struct gfs2_inode *ip);
 
 int gfs2_dinode_dealloc(struct gfs2_inode *inode);
 int gfs2_change_nlink(struct gfs2_inode *ip, int diff);
-int gfs2_change_nlink_i(struct gfs2_inode *ip);
 struct inode *gfs2_lookupi(struct inode *dir, const struct qstr *name,
 			   int is_root, struct nameidata *nd);
 struct inode *gfs2_createi(struct gfs2_holder *ghs, const struct qstr *name,
-- 
cgit v1.2.1


From 1d6e8131cf0064ef5ab5f3411a82b800afbfadee Mon Sep 17 00:00:00 2001
From: Patrick Caulfield <pcaulfie@redhat.com>
Date: Mon, 15 Jan 2007 14:33:34 +0000
Subject: [DLM] Use workqueues for dlm lowcomms

This patch converts the DLM TCP lowcomms to use workqueues rather than using its
own daemon functions. Simultaneously removing a lot of code and making it more
scalable on multi-processor machines.

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/dlm/lowcomms-sctp.c | 145 +++++++++++++---------------
 fs/dlm/lowcomms-tcp.c  | 254 ++++++++++---------------------------------------
 2 files changed, 115 insertions(+), 284 deletions(-)

(limited to 'fs')

diff --git a/fs/dlm/lowcomms-sctp.c b/fs/dlm/lowcomms-sctp.c
index 5aeadadd8afd..dc83a9d979b5 100644
--- a/fs/dlm/lowcomms-sctp.c
+++ b/fs/dlm/lowcomms-sctp.c
@@ -72,6 +72,8 @@ struct nodeinfo {
 	struct list_head	writequeue; /* outgoing writequeue_entries */
 	spinlock_t		writequeue_lock;
 	int			nodeid;
+	struct work_struct      swork; /* Send workqueue */
+	struct work_struct      lwork; /* Locking workqueue */
 };
 
 static DEFINE_IDR(nodeinfo_idr);
@@ -96,6 +98,7 @@ struct connection {
 	atomic_t		waiting_requests;
 	struct cbuf		cb;
 	int                     eagain_flag;
+	struct work_struct      work; /* Send workqueue */
 };
 
 /* An entry waiting to be sent */
@@ -137,19 +140,23 @@ static void cbuf_eat(struct cbuf *cb, int n)
 static LIST_HEAD(write_nodes);
 static DEFINE_SPINLOCK(write_nodes_lock);
 
+
 /* Maximum number of incoming messages to process before
  * doing a schedule()
  */
 #define MAX_RX_MSG_COUNT 25
 
-/* Manage daemons */
-static struct task_struct *recv_task;
-static struct task_struct *send_task;
-static DECLARE_WAIT_QUEUE_HEAD(lowcomms_recv_wait);
+/* Work queues */
+static struct workqueue_struct *recv_workqueue;
+static struct workqueue_struct *send_workqueue;
+static struct workqueue_struct *lock_workqueue;
 
 /* The SCTP connection */
 static struct connection sctp_con;
 
+static void process_send_sockets(struct work_struct *work);
+static void process_recv_sockets(struct work_struct *work);
+static void process_lock_request(struct work_struct *work);
 
 static int nodeid_to_addr(int nodeid, struct sockaddr *retaddr)
 {
@@ -222,6 +229,8 @@ static struct nodeinfo *nodeid2nodeinfo(int nodeid, gfp_t alloc)
 	spin_lock_init(&ni->lock);
 	INIT_LIST_HEAD(&ni->writequeue);
 	spin_lock_init(&ni->writequeue_lock);
+	INIT_WORK(&ni->lwork, process_lock_request);
+	INIT_WORK(&ni->swork, process_send_sockets);
 	ni->nodeid = nodeid;
 
 	if (nodeid > max_nodeid)
@@ -249,11 +258,8 @@ static struct nodeinfo *assoc2nodeinfo(sctp_assoc_t assoc)
 /* Data or notification available on socket */
 static void lowcomms_data_ready(struct sock *sk, int count_unused)
 {
-	atomic_inc(&sctp_con.waiting_requests);
 	if (test_and_set_bit(CF_READ_PENDING, &sctp_con.flags))
-		return;
-
-	wake_up_interruptible(&lowcomms_recv_wait);
+		queue_work(recv_workqueue, &sctp_con.work);
 }
 
 
@@ -361,10 +367,10 @@ static void init_failed(void)
 				spin_lock_bh(&write_nodes_lock);
 				list_add_tail(&ni->write_list, &write_nodes);
 				spin_unlock_bh(&write_nodes_lock);
+				queue_work(send_workqueue, &ni->swork);
 			}
 		}
 	}
-	wake_up_process(send_task);
 }
 
 /* Something happened to an association */
@@ -446,8 +452,8 @@ static void process_sctp_notification(struct msghdr *msg, char *buf)
 				spin_lock_bh(&write_nodes_lock);
 				list_add_tail(&ni->write_list, &write_nodes);
 				spin_unlock_bh(&write_nodes_lock);
+				queue_work(send_workqueue, &ni->swork);
 			}
-			wake_up_process(send_task);
 		}
 		break;
 
@@ -580,8 +586,8 @@ static int receive_from_sock(void)
 				spin_lock_bh(&write_nodes_lock);
 				list_add_tail(&ni->write_list, &write_nodes);
 				spin_unlock_bh(&write_nodes_lock);
+				queue_work(send_workqueue, &ni->swork);
 			}
-			wake_up_process(send_task);
 		}
 	}
 
@@ -590,6 +596,7 @@ static int receive_from_sock(void)
 		return 0;
 
 	cbuf_add(&sctp_con.cb, ret);
+	// PJC: TODO: Add to node's workqueue....can we ??
 	ret = dlm_process_incoming_buffer(cpu_to_le32(sinfo->sinfo_ppid),
 					  page_address(sctp_con.rx_page),
 					  sctp_con.cb.base, sctp_con.cb.len,
@@ -820,7 +827,8 @@ void dlm_lowcomms_commit_buffer(void *arg)
 		spin_lock_bh(&write_nodes_lock);
 		list_add_tail(&ni->write_list, &write_nodes);
 		spin_unlock_bh(&write_nodes_lock);
-		wake_up_process(send_task);
+
+		queue_work(send_workqueue, &ni->swork);
 	}
 	return;
 
@@ -1088,101 +1096,75 @@ int dlm_lowcomms_close(int nodeid)
 	return 0;
 }
 
-static int write_list_empty(void)
+// PJC: The work queue function for receiving.
+static void process_recv_sockets(struct work_struct *work)
 {
-	int status;
+	if (test_and_clear_bit(CF_READ_PENDING, &sctp_con.flags)) {
+		int ret;
+		int count = 0;
 
-	spin_lock_bh(&write_nodes_lock);
-	status = list_empty(&write_nodes);
-	spin_unlock_bh(&write_nodes_lock);
+		do {
+			ret = receive_from_sock();
 
-	return status;
+			/* Don't starve out everyone else */
+			if (++count >= MAX_RX_MSG_COUNT) {
+				cond_resched();
+				count = 0;
+			}
+		} while (!kthread_should_stop() && ret >=0);
+	}
+	cond_resched();
 }
 
-static int dlm_recvd(void *data)
+// PJC: the work queue function for sending
+static void process_send_sockets(struct work_struct *work)
 {
-	DECLARE_WAITQUEUE(wait, current);
-
-	while (!kthread_should_stop()) {
-		int count = 0;
-
-		set_current_state(TASK_INTERRUPTIBLE);
-		add_wait_queue(&lowcomms_recv_wait, &wait);
-		if (!test_bit(CF_READ_PENDING, &sctp_con.flags))
-			schedule();
-		remove_wait_queue(&lowcomms_recv_wait, &wait);
-		set_current_state(TASK_RUNNING);
-
-		if (test_and_clear_bit(CF_READ_PENDING, &sctp_con.flags)) {
-			int ret;
-
-			do {
-				ret = receive_from_sock();
-
-				/* Don't starve out everyone else */
-				if (++count >= MAX_RX_MSG_COUNT) {
-					cond_resched();
-					count = 0;
-				}
-			} while (!kthread_should_stop() && ret >=0);
-		}
-		cond_resched();
+	if (sctp_con.eagain_flag) {
+		sctp_con.eagain_flag = 0;
+		refill_write_queue();
 	}
-
-	return 0;
+	process_output_queue();
 }
 
-static int dlm_sendd(void *data)
+// PJC: Process lock requests from a particular node.
+// TODO: can we optimise this out on UP ??
+static void process_lock_request(struct work_struct *work)
 {
-	DECLARE_WAITQUEUE(wait, current);
-
-	add_wait_queue(sctp_con.sock->sk->sk_sleep, &wait);
-
-	while (!kthread_should_stop()) {
-		set_current_state(TASK_INTERRUPTIBLE);
-		if (write_list_empty())
-			schedule();
-		set_current_state(TASK_RUNNING);
-
-		if (sctp_con.eagain_flag) {
-			sctp_con.eagain_flag = 0;
-			refill_write_queue();
-		}
-		process_output_queue();
-	}
-
-	remove_wait_queue(sctp_con.sock->sk->sk_sleep, &wait);
-
-	return 0;
 }
 
 static void daemons_stop(void)
 {
-	kthread_stop(recv_task);
-	kthread_stop(send_task);
+	destroy_workqueue(recv_workqueue);
+	destroy_workqueue(send_workqueue);
+	destroy_workqueue(lock_workqueue);
 }
 
 static int daemons_start(void)
 {
-	struct task_struct *p;
 	int error;
+	recv_workqueue = create_workqueue("dlm_recv");
+	error = IS_ERR(recv_workqueue);
+	if (error) {
+		log_print("can't start dlm_recv %d", error);
+		return error;
+	}
 
-	p = kthread_run(dlm_recvd, NULL, "dlm_recvd");
-	error = IS_ERR(p);
+	send_workqueue = create_singlethread_workqueue("dlm_send");
+	error = IS_ERR(send_workqueue);
 	if (error) {
-		log_print("can't start dlm_recvd %d", error);
+		log_print("can't start dlm_send %d", error);
+		destroy_workqueue(recv_workqueue);
 		return error;
 	}
-	recv_task = p;
 
-	p = kthread_run(dlm_sendd, NULL, "dlm_sendd");
-	error = IS_ERR(p);
+	lock_workqueue = create_workqueue("dlm_rlock");
+	error = IS_ERR(lock_workqueue);
 	if (error) {
-		log_print("can't start dlm_sendd %d", error);
-		kthread_stop(recv_task);
+		log_print("can't start dlm_rlock %d", error);
+		destroy_workqueue(send_workqueue);
+		destroy_workqueue(recv_workqueue);
 		return error;
 	}
-	send_task = p;
 
 	return 0;
 }
@@ -1194,6 +1176,8 @@ int dlm_lowcomms_start(void)
 {
 	int error;
 
+	INIT_WORK(&sctp_con.work, process_recv_sockets);
+
 	error = init_sock();
 	if (error)
 		goto fail_sock;
@@ -1224,4 +1208,3 @@ void dlm_lowcomms_stop(void)
 	for (i = 0; i < dlm_local_count; i++)
 		kfree(dlm_local_addr[i]);
 }
-
diff --git a/fs/dlm/lowcomms-tcp.c b/fs/dlm/lowcomms-tcp.c
index b4fb5783ef8a..86e5f81da7cb 100644
--- a/fs/dlm/lowcomms-tcp.c
+++ b/fs/dlm/lowcomms-tcp.c
@@ -115,6 +115,8 @@ struct connection {
 	atomic_t waiting_requests;
 #define MAX_CONNECT_RETRIES 3
 	struct connection *othercon;
+	struct work_struct rwork; /* Receive workqueue */
+	struct work_struct swork; /* Send workqueue */
 };
 #define sock2con(x) ((struct connection *)(x)->sk_user_data)
 
@@ -131,14 +133,9 @@ struct writequeue_entry {
 
 static struct sockaddr_storage dlm_local_addr;
 
-/* Manage daemons */
-static struct task_struct *recv_task;
-static struct task_struct *send_task;
-
-static wait_queue_t lowcomms_send_waitq_head;
-static DECLARE_WAIT_QUEUE_HEAD(lowcomms_send_waitq);
-static wait_queue_t lowcomms_recv_waitq_head;
-static DECLARE_WAIT_QUEUE_HEAD(lowcomms_recv_waitq);
+/* Work queues */
+static struct workqueue_struct *recv_workqueue;
+static struct workqueue_struct *send_workqueue;
 
 /* An array of pointers to connections, indexed by NODEID */
 static struct connection **connections;
@@ -146,17 +143,8 @@ static DECLARE_MUTEX(connections_lock);
 static struct kmem_cache *con_cache;
 static int conn_array_size;
 
-/* List of sockets that have reads pending */
-static LIST_HEAD(read_sockets);
-static DEFINE_SPINLOCK(read_sockets_lock);
-
-/* List of sockets which have writes pending */
-static LIST_HEAD(write_sockets);
-static DEFINE_SPINLOCK(write_sockets_lock);
-
-/* List of sockets which have connects pending */
-static LIST_HEAD(state_sockets);
-static DEFINE_SPINLOCK(state_sockets_lock);
+static void process_recv_sockets(struct work_struct *work);
+static void process_send_sockets(struct work_struct *work);
 
 static struct connection *nodeid2con(int nodeid, gfp_t allocation)
 {
@@ -189,6 +177,8 @@ static struct connection *nodeid2con(int nodeid, gfp_t allocation)
 		init_rwsem(&con->sock_sem);
 		INIT_LIST_HEAD(&con->writequeue);
 		spin_lock_init(&con->writequeue_lock);
+		INIT_WORK(&con->swork, process_send_sockets);
+		INIT_WORK(&con->rwork, process_recv_sockets);
 
 		connections[nodeid] = con;
 	}
@@ -203,41 +193,22 @@ static void lowcomms_data_ready(struct sock *sk, int count_unused)
 {
 	struct connection *con = sock2con(sk);
 
-	atomic_inc(&con->waiting_requests);
-	if (test_and_set_bit(CF_READ_PENDING, &con->flags))
-		return;
-
-	spin_lock_bh(&read_sockets_lock);
-	list_add_tail(&con->read_list, &read_sockets);
-	spin_unlock_bh(&read_sockets_lock);
-
-	wake_up_interruptible(&lowcomms_recv_waitq);
+	if (!test_and_set_bit(CF_READ_PENDING, &con->flags))
+		queue_work(recv_workqueue, &con->rwork);
 }
 
 static void lowcomms_write_space(struct sock *sk)
 {
 	struct connection *con = sock2con(sk);
 
-	if (test_and_set_bit(CF_WRITE_PENDING, &con->flags))
-		return;
-
-	spin_lock_bh(&write_sockets_lock);
-	list_add_tail(&con->write_list, &write_sockets);
-	spin_unlock_bh(&write_sockets_lock);
-
-	wake_up_interruptible(&lowcomms_send_waitq);
+	if (!test_and_set_bit(CF_WRITE_PENDING, &con->flags))
+		queue_work(send_workqueue, &con->swork);
 }
 
 static inline void lowcomms_connect_sock(struct connection *con)
 {
-	if (test_and_set_bit(CF_CONNECT_PENDING, &con->flags))
-		return;
-
-	spin_lock_bh(&state_sockets_lock);
-	list_add_tail(&con->state_list, &state_sockets);
-	spin_unlock_bh(&state_sockets_lock);
-
-	wake_up_interruptible(&lowcomms_send_waitq);
+	if (!test_and_set_bit(CF_CONNECT_PENDING, &con->flags))
+		queue_work(send_workqueue, &con->swork);
 }
 
 static void lowcomms_state_change(struct sock *sk)
@@ -388,7 +359,8 @@ out:
 	return 0;
 
 out_resched:
-	lowcomms_data_ready(con->sock->sk, 0);
+	if (!test_and_set_bit(CF_READ_PENDING, &con->flags))
+		queue_work(recv_workqueue, &con->rwork);
 	up_read(&con->sock_sem);
 	cond_resched();
 	return 0;
@@ -477,6 +449,8 @@ static int accept_from_sock(struct connection *con)
 			othercon->nodeid = nodeid;
 			othercon->rx_action = receive_from_sock;
 			init_rwsem(&othercon->sock_sem);
+			INIT_WORK(&othercon->swork, process_send_sockets);
+			INIT_WORK(&othercon->rwork, process_recv_sockets);
 			set_bit(CF_IS_OTHERCON, &othercon->flags);
 			newcon->othercon = othercon;
 		}
@@ -498,7 +472,8 @@ static int accept_from_sock(struct connection *con)
 	 * beween processing the accept adding the socket
 	 * to the read_sockets list
 	 */
-	lowcomms_data_ready(newsock->sk, 0);
+	if (!test_and_set_bit(CF_READ_PENDING, &newcon->flags))
+		queue_work(recv_workqueue, &newcon->rwork);
 	up_read(&con->sock_sem);
 
 	return 0;
@@ -757,12 +732,8 @@ void dlm_lowcomms_commit_buffer(void *mh)
 	kunmap(e->page);
 	spin_unlock(&con->writequeue_lock);
 
-	if (test_and_set_bit(CF_WRITE_PENDING, &con->flags) == 0) {
-		spin_lock_bh(&write_sockets_lock);
-		list_add_tail(&con->write_list, &write_sockets);
-		spin_unlock_bh(&write_sockets_lock);
-
-		wake_up_interruptible(&lowcomms_send_waitq);
+	if (!test_and_set_bit(CF_WRITE_PENDING, &con->flags)) {
+		queue_work(send_workqueue, &con->swork);
 	}
 	return;
 
@@ -803,6 +774,7 @@ static void send_to_sock(struct connection *con)
 		offset = e->offset;
 		BUG_ON(len == 0 && e->users == 0);
 		spin_unlock(&con->writequeue_lock);
+		kmap(e->page);
 
 		ret = 0;
 		if (len) {
@@ -884,85 +856,29 @@ out:
 }
 
 /* Look for activity on active sockets */
-static void process_sockets(void)
+static void process_recv_sockets(struct work_struct *work)
 {
-	struct list_head *list;
-	struct list_head *temp;
-	int count = 0;
-
-	spin_lock_bh(&read_sockets_lock);
-	list_for_each_safe(list, temp, &read_sockets) {
-
-		struct connection *con =
-			list_entry(list, struct connection, read_list);
-		list_del(&con->read_list);
-		clear_bit(CF_READ_PENDING, &con->flags);
-
-		spin_unlock_bh(&read_sockets_lock);
+	struct connection *con = container_of(work, struct connection, rwork);
+	int err;
 
-		/* This can reach zero if we are processing requests
-		 * as they come in.
-		 */
-		if (atomic_read(&con->waiting_requests) == 0) {
-			spin_lock_bh(&read_sockets_lock);
-			continue;
-		}
-
-		do {
-			con->rx_action(con);
-
-			/* Don't starve out everyone else */
-			if (++count >= MAX_RX_MSG_COUNT) {
-				cond_resched();
-				count = 0;
-			}
-
-		} while (!atomic_dec_and_test(&con->waiting_requests) &&
-			 !kthread_should_stop());
-
-		spin_lock_bh(&read_sockets_lock);
-	}
-	spin_unlock_bh(&read_sockets_lock);
+	clear_bit(CF_READ_PENDING, &con->flags);
+	do {
+		err = con->rx_action(con);
+	} while (!err);
 }
 
-/* Try to send any messages that are pending
- */
-static void process_output_queue(void)
-{
-	struct list_head *list;
-	struct list_head *temp;
-
-	spin_lock_bh(&write_sockets_lock);
-	list_for_each_safe(list, temp, &write_sockets) {
-		struct connection *con =
-			list_entry(list, struct connection, write_list);
-		clear_bit(CF_WRITE_PENDING, &con->flags);
-		list_del(&con->write_list);
-
-		spin_unlock_bh(&write_sockets_lock);
-		send_to_sock(con);
-		spin_lock_bh(&write_sockets_lock);
-	}
-	spin_unlock_bh(&write_sockets_lock);
-}
 
-static void process_state_queue(void)
+static void process_send_sockets(struct work_struct *work)
 {
-	struct list_head *list;
-	struct list_head *temp;
-
-	spin_lock_bh(&state_sockets_lock);
-	list_for_each_safe(list, temp, &state_sockets) {
-		struct connection *con =
-			list_entry(list, struct connection, state_list);
-		list_del(&con->state_list);
-		clear_bit(CF_CONNECT_PENDING, &con->flags);
-		spin_unlock_bh(&state_sockets_lock);
+	struct connection *con = container_of(work, struct connection, swork);
 
+	if (test_and_clear_bit(CF_CONNECT_PENDING, &con->flags)) {
 		connect_to_sock(con);
-		spin_lock_bh(&state_sockets_lock);
 	}
-	spin_unlock_bh(&state_sockets_lock);
+
+	if (test_and_clear_bit(CF_WRITE_PENDING, &con->flags)) {
+		send_to_sock(con);
+	}
 }
 
 
@@ -979,97 +895,29 @@ static void clean_writequeues(void)
 	}
 }
 
-static int read_list_empty(void)
-{
-	int status;
-
-	spin_lock_bh(&read_sockets_lock);
-	status = list_empty(&read_sockets);
-	spin_unlock_bh(&read_sockets_lock);
-
-	return status;
-}
-
-/* DLM Transport comms receive daemon */
-static int dlm_recvd(void *data)
+static void work_stop(void)
 {
-	init_waitqueue_entry(&lowcomms_recv_waitq_head, current);
-	add_wait_queue(&lowcomms_recv_waitq, &lowcomms_recv_waitq_head);
-
-	while (!kthread_should_stop()) {
-		set_current_state(TASK_INTERRUPTIBLE);
-		if (read_list_empty())
-			schedule();
-		set_current_state(TASK_RUNNING);
-
-		process_sockets();
-	}
-
-	return 0;
+	destroy_workqueue(recv_workqueue);
+	destroy_workqueue(send_workqueue);
 }
 
-static int write_and_state_lists_empty(void)
+static int work_start(void)
 {
-	int status;
-
-	spin_lock_bh(&write_sockets_lock);
-	status = list_empty(&write_sockets);
-	spin_unlock_bh(&write_sockets_lock);
-
-	spin_lock_bh(&state_sockets_lock);
-	if (list_empty(&state_sockets) == 0)
-		status = 0;
-	spin_unlock_bh(&state_sockets_lock);
-
-	return status;
-}
-
-/* DLM Transport send daemon */
-static int dlm_sendd(void *data)
-{
-	init_waitqueue_entry(&lowcomms_send_waitq_head, current);
-	add_wait_queue(&lowcomms_send_waitq, &lowcomms_send_waitq_head);
-
-	while (!kthread_should_stop()) {
-		set_current_state(TASK_INTERRUPTIBLE);
-		if (write_and_state_lists_empty())
-			schedule();
-		set_current_state(TASK_RUNNING);
-
-		process_state_queue();
-		process_output_queue();
-	}
-
-	return 0;
-}
-
-static void daemons_stop(void)
-{
-	kthread_stop(recv_task);
-	kthread_stop(send_task);
-}
-
-static int daemons_start(void)
-{
-	struct task_struct *p;
 	int error;
-
-	p = kthread_run(dlm_recvd, NULL, "dlm_recvd");
-	error = IS_ERR(p);
+	recv_workqueue = create_workqueue("dlm_recv");
+	error = IS_ERR(recv_workqueue);
 	if (error) {
-		log_print("can't start dlm_recvd %d", error);
+		log_print("can't start dlm_recv %d", error);
 		return error;
 	}
-	recv_task = p;
 
-	p = kthread_run(dlm_sendd, NULL, "dlm_sendd");
-	error = IS_ERR(p);
+	send_workqueue = create_singlethread_workqueue("dlm_send");
+	error = IS_ERR(send_workqueue);
 	if (error) {
-		log_print("can't start dlm_sendd %d", error);
-		kthread_stop(recv_task);
+		log_print("can't start dlm_send %d", error);
+		destroy_workqueue(recv_workqueue);
 		return error;
 	}
-	send_task = p;
 
 	return 0;
 }
@@ -1086,7 +934,7 @@ void dlm_lowcomms_stop(void)
 			connections[i]->flags |= 0xFF;
 	}
 
-	daemons_stop();
+	work_stop();
 	clean_writequeues();
 
 	for (i = 0; i < conn_array_size; i++) {
@@ -1138,7 +986,7 @@ int dlm_lowcomms_start(void)
 	if (error)
 		goto fail_unlisten;
 
-	error = daemons_start();
+	error = work_start();
 	if (error)
 		goto fail_unlisten;
 
-- 
cgit v1.2.1


From a1bc86e6bddd34362ca08a3a4d898eb4b5c15215 Mon Sep 17 00:00:00 2001
From: David Teigland <teigland@redhat.com>
Date: Mon, 15 Jan 2007 10:34:52 -0600
Subject: [DLM] fix user unlocking

When a user process exits, we clear all the locks it holds.  There is a
problem, though, with locks that the process had begun unlocking before it
exited.  We couldn't find the lkb's that were in the process of being
unlocked remotely, to flag that they are DEAD.  To solve this, we move
lkb's being unlocked onto a new list in the per-process structure that
tracks what locks the process is holding.  We can then go through this
list to flag the necessary lkb's when clearing locks for a process when it
exits.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/dlm/dlm_internal.h |  1 +
 fs/dlm/lock.c         | 30 ++++++++++++++++++------------
 fs/dlm/user.c         |  9 +++++++++
 3 files changed, 28 insertions(+), 12 deletions(-)

(limited to 'fs')

diff --git a/fs/dlm/dlm_internal.h b/fs/dlm/dlm_internal.h
index ee993c5c2307..61d93201e1b2 100644
--- a/fs/dlm/dlm_internal.h
+++ b/fs/dlm/dlm_internal.h
@@ -526,6 +526,7 @@ struct dlm_user_proc {
 	spinlock_t		asts_spin;
 	struct list_head	locks;
 	spinlock_t		locks_spin;
+	struct list_head	unlocking;
 	wait_queue_head_t	wait;
 };
 
diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 5bac9827ded3..6ad2b8eb96a5 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -3772,12 +3772,10 @@ int dlm_user_unlock(struct dlm_ls *ls, struct dlm_user_args *ua_tmp,
 		goto out_put;
 
 	spin_lock(&ua->proc->locks_spin);
-	list_del_init(&lkb->lkb_ownqueue);
+	/* dlm_user_add_ast() may have already taken lkb off the proc list */
+	if (!list_empty(&lkb->lkb_ownqueue))
+		list_move(&lkb->lkb_ownqueue, &ua->proc->unlocking);
 	spin_unlock(&ua->proc->locks_spin);
-
-	/* this removes the reference for the proc->locks list added by
-	   dlm_user_request */
-	unhold_lkb(lkb);
  out_put:
 	dlm_put_lkb(lkb);
  out:
@@ -3817,9 +3815,8 @@ int dlm_user_cancel(struct dlm_ls *ls, struct dlm_user_args *ua_tmp,
 	/* this lkb was removed from the WAITING queue */
 	if (lkb->lkb_grmode == DLM_LOCK_IV) {
 		spin_lock(&ua->proc->locks_spin);
-		list_del_init(&lkb->lkb_ownqueue);
+		list_move(&lkb->lkb_ownqueue, &ua->proc->unlocking);
 		spin_unlock(&ua->proc->locks_spin);
-		unhold_lkb(lkb);
 	}
  out_put:
 	dlm_put_lkb(lkb);
@@ -3880,11 +3877,6 @@ void dlm_clear_proc_locks(struct dlm_ls *ls, struct dlm_user_proc *proc)
 	mutex_lock(&ls->ls_clear_proc_locks);
 
 	list_for_each_entry_safe(lkb, safe, &proc->locks, lkb_ownqueue) {
-		if (lkb->lkb_ast_type) {
-			list_del(&lkb->lkb_astqueue);
-			unhold_lkb(lkb);
-		}
-
 		list_del_init(&lkb->lkb_ownqueue);
 
 		if (lkb->lkb_exflags & DLM_LKF_PERSISTENT) {
@@ -3901,6 +3893,20 @@ void dlm_clear_proc_locks(struct dlm_ls *ls, struct dlm_user_proc *proc)
 
 		dlm_put_lkb(lkb);
 	}
+
+	/* in-progress unlocks */
+	list_for_each_entry_safe(lkb, safe, &proc->unlocking, lkb_ownqueue) {
+		list_del_init(&lkb->lkb_ownqueue);
+		lkb->lkb_flags |= DLM_IFL_DEAD;
+		dlm_put_lkb(lkb);
+	}
+
+	list_for_each_entry_safe(lkb, safe, &proc->asts, lkb_astqueue) {
+		list_del(&lkb->lkb_astqueue);
+		dlm_put_lkb(lkb);
+	}
+
 	mutex_unlock(&ls->ls_clear_proc_locks);
 	unlock_recovery(ls);
 }
+
diff --git a/fs/dlm/user.c b/fs/dlm/user.c
index c37e93e4f2df..d378b7fe2a1e 100644
--- a/fs/dlm/user.c
+++ b/fs/dlm/user.c
@@ -180,6 +180,14 @@ void dlm_user_add_ast(struct dlm_lkb *lkb, int type)
 	    ua->lksb.sb_status == -EAGAIN && !list_empty(&lkb->lkb_ownqueue))
 		remove_ownqueue = 1;
 
+	/* unlocks or cancels of waiting requests need to be removed from the
+	   proc's unlocking list, again there must be a better way...  */
+
+	if (ua->lksb.sb_status == -DLM_EUNLOCK ||
+	    (ua->lksb.sb_status == -DLM_ECANCEL &&
+	     lkb->lkb_grmode == DLM_LOCK_IV))
+		remove_ownqueue = 1;
+
 	/* We want to copy the lvb to userspace when the completion
 	   ast is read if the status is 0, the lock has an lvb and
 	   lvb_ops says we should.  We could probably have set_lvb_lock()
@@ -523,6 +531,7 @@ static int device_open(struct inode *inode, struct file *file)
 	proc->lockspace = ls->ls_local_handle;
 	INIT_LIST_HEAD(&proc->asts);
 	INIT_LIST_HEAD(&proc->locks);
+	INIT_LIST_HEAD(&proc->unlocking);
 	spin_lock_init(&proc->asts_spin);
 	spin_lock_init(&proc->locks_spin);
 	init_waitqueue_head(&proc->wait);
-- 
cgit v1.2.1


From 222d396092acc11b4af03bede309aa066945e920 Mon Sep 17 00:00:00 2001
From: David Teigland <teigland@redhat.com>
Date: Mon, 15 Jan 2007 10:28:22 -0600
Subject: [DLM] fix master recovery

If master recovery happens on an rsb in one recovery sequence, then that
sequence is aborted before lock recovery happens, then in the next
sequence, we rely on the previous master recovery (which may now be
invalid due to another node ignoring a lookup result) and go on do to the
lock recovery where we get stuck due to an invalid master value.

 recovery cycle begins: master of rsb X has left
 nodes A and B send node C an rcom lookup for X to find the new master
 C gets lookup from B first, sets B as new master, and sends reply back to B
 C gets lookup from A next, and sends reply back to A saying B is master
 A gets lookup reply from C and sets B as the new master in the rsb
 recovery cycle on A, B and C is aborted to start a new recovery
 B gets lookup reply from C and ignores it since there's a new recovery
 recovery cycle begins: some other node has joined
 B doesn't think it's the master of X so it doesn't rebuild it in the directory
 C looks up the master of X, no one is master, so it becomes new master
 B looks up the master of X, finds it's C
 A believes that B is the master of X, so it sends its lock to B
 B sends an error back to A
 A resends
 this repeats forever, the incorrect master value on A is never corrected

The fix is to do master recovery on an rsb that still has the NEW_MASTER
flag set from an earlier recovery sequence, and therefore didn't complete
lock recovery.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/dlm/recover.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

(limited to 'fs')

diff --git a/fs/dlm/recover.c b/fs/dlm/recover.c
index a7fa4cb6cd16..c2cc7694cd16 100644
--- a/fs/dlm/recover.c
+++ b/fs/dlm/recover.c
@@ -397,7 +397,9 @@ int dlm_recover_masters(struct dlm_ls *ls)
 
 		if (dlm_no_directory(ls))
 			count += recover_master_static(r);
-		else if (!is_master(r) && dlm_is_removed(ls, r->res_nodeid)) {
+		else if (!is_master(r) &&
+			 (dlm_is_removed(ls, r->res_nodeid) ||
+			  rsb_flag(r, RSB_NEW_MASTER))) {
 			recover_master(r);
 			count++;
 		}
-- 
cgit v1.2.1


From a8d638e30e768adc6956541f79f7bf05139ba475 Mon Sep 17 00:00:00 2001
From: Steven Whitehouse <swhiteho@redhat.com>
Date: Mon, 15 Jan 2007 13:52:17 +0000
Subject: [GFS2] Add writepages for "data=writeback" mounts

It occurred to me that although a gfs2 specific writepages for ordered
writes and journaled data would be tricky, by hooking writepages only
for "data=writeback" mounts we could take advantage of not needing
buffer heads (we don't use them on the read side, nor have we for some
time) and create much larger I/Os for the block layer.

Using blktrace both before and after, its possible to see that for large
I/Os, most of the requests generated through writepages are now 1024
sectors after this patch is applied as opposed to 8 sectors before.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/gfs2/ops_address.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

(limited to 'fs')

diff --git a/fs/gfs2/ops_address.c b/fs/gfs2/ops_address.c
index 37bfeb961eb3..9ddf9753a552 100644
--- a/fs/gfs2/ops_address.c
+++ b/fs/gfs2/ops_address.c
@@ -16,6 +16,7 @@
 #include <linux/pagevec.h>
 #include <linux/mpage.h>
 #include <linux/fs.h>
+#include <linux/writeback.h>
 #include <linux/gfs2_ondisk.h>
 #include <linux/lm_interface.h>
 
@@ -156,6 +157,31 @@ out_ignore:
 	return 0;
 }
 
+/**
+ * gfs2_writepages - Write a bunch of dirty pages back to disk
+ * @mapping: The mapping to write
+ * @wbc: Write-back control
+ *
+ * For journaled files and/or ordered writes this just falls back to the
+ * kernel's default writepages path for now. We will probably want to change
+ * that eventually (i.e. when we look at allocate on flush).
+ *
+ * For the data=writeback case though we can already ignore buffer heads
+ * and write whole extents at once. This is a big reduction in the
+ * number of I/O requests we send and the bmap calls we make in this case.
+ */
+int gfs2_writepages(struct address_space *mapping, struct writeback_control *wbc)
+{
+	struct inode *inode = mapping->host;
+	struct gfs2_inode *ip = GFS2_I(inode);
+	struct gfs2_sbd *sdp = GFS2_SB(inode);
+
+	if (sdp->sd_args.ar_data == GFS2_DATA_WRITEBACK && !gfs2_is_jdata(ip))
+		return mpage_writepages(mapping, wbc, gfs2_get_block_noalloc);
+
+	return generic_writepages(mapping, wbc);
+}
+
 /**
  * stuffed_readpage - Fill in a Linux page with stuffed file data
  * @ip: the inode
@@ -757,6 +783,7 @@ out:
 
 const struct address_space_operations gfs2_file_aops = {
 	.writepage = gfs2_writepage,
+	.writepages = gfs2_writepages,
 	.readpage = gfs2_readpage,
 	.readpages = gfs2_readpages,
 	.sync_page = block_sync_page,
-- 
cgit v1.2.1


From 3699e3a44bf56e0cd58c97e8655f375ad9b65d9d Mon Sep 17 00:00:00 2001
From: Steven Whitehouse <swhiteho@redhat.com>
Date: Wed, 17 Jan 2007 15:09:20 +0000
Subject: [GFS2] Clean up/speed up readdir

This removes the extra filldir callback which gfs2 was using to
enclose an attempt at readahead for inodes during readdir. The
code was too complicated and also hurts performance badly in the
case that the getdents64/readdir call isn't being followed by
stat() and it wasn't even getting it right all the time when it
was.

As a result, on my test box an "ls" of a directory containing 250000
files fell from about 7mins (freshly mounted, so nothing cached) to
between about 15 to 25 seconds. When the directory content was cached,
the time taken fell from about 3mins to about 4 or 5 seconds.

Interestingly in the cached case, running "ls -l" once reduced the time
taken for subsequent runs of "ls" to about 6 secs even without this
patch. Now it turns out that there was a special case of glocks being
used for prefetching the metadata, but because of the timeouts for these
locks (set to 10 secs) the metadata was being timed out before it was
being used and this the prefetch code was constantly trying to prefetch
the same data over and over.

Calling "ls -l" meant that the inodes were brought into memory and once
the inodes are cached, the glocks are not disposed of until the inodes
are pushed out of the cache, thus extending the lifetime of the glocks,
and thus bringing down the time for subsequent runs of "ls"
considerably.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/gfs2/dir.c        | 15 +++++-------
 fs/gfs2/dir.h        | 21 ++---------------
 fs/gfs2/glock.c      | 66 ----------------------------------------------------
 fs/gfs2/glock.h      |  4 ----
 fs/gfs2/incore.h     |  2 --
 fs/gfs2/ops_export.c |  9 ++++---
 fs/gfs2/ops_file.c   | 52 +----------------------------------------
 fs/gfs2/super.c      |  1 -
 fs/gfs2/sys.c        |  2 --
 9 files changed, 13 insertions(+), 159 deletions(-)

(limited to 'fs')

diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
index 0fdcb7713cd9..0eceb051084f 100644
--- a/fs/gfs2/dir.c
+++ b/fs/gfs2/dir.c
@@ -1198,12 +1198,11 @@ static int compare_dents(const void *a, const void *b)
  */
 
 static int do_filldir_main(struct gfs2_inode *dip, u64 *offset,
-			   void *opaque, gfs2_filldir_t filldir,
+			   void *opaque, filldir_t filldir,
 			   const struct gfs2_dirent **darr, u32 entries,
 			   int *copied)
 {
 	const struct gfs2_dirent *dent, *dent_next;
-	struct gfs2_inum_host inum;
 	u64 off, off_next;
 	unsigned int x, y;
 	int run = 0;
@@ -1240,11 +1239,9 @@ static int do_filldir_main(struct gfs2_inode *dip, u64 *offset,
 			*offset = off;
 		}
 
-		gfs2_inum_in(&inum, (char *)&dent->de_inum);
-
 		error = filldir(opaque, (const char *)(dent + 1),
 				be16_to_cpu(dent->de_name_len),
-				off, &inum,
+				off, be64_to_cpu(dent->de_inum.no_addr),
 				be16_to_cpu(dent->de_type));
 		if (error)
 			return 1;
@@ -1262,8 +1259,8 @@ static int do_filldir_main(struct gfs2_inode *dip, u64 *offset,
 }
 
 static int gfs2_dir_read_leaf(struct inode *inode, u64 *offset, void *opaque,
-			      gfs2_filldir_t filldir, int *copied,
-			      unsigned *depth, u64 leaf_no)
+			      filldir_t filldir, int *copied, unsigned *depth,
+			      u64 leaf_no)
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
 	struct buffer_head *bh;
@@ -1343,7 +1340,7 @@ out:
  */
 
 static int dir_e_read(struct inode *inode, u64 *offset, void *opaque,
-		      gfs2_filldir_t filldir)
+		      filldir_t filldir)
 {
 	struct gfs2_inode *dip = GFS2_I(inode);
 	struct gfs2_sbd *sdp = GFS2_SB(inode);
@@ -1402,7 +1399,7 @@ out:
 }
 
 int gfs2_dir_read(struct inode *inode, u64 *offset, void *opaque,
-		  gfs2_filldir_t filldir)
+		  filldir_t filldir)
 {
 	struct gfs2_inode *dip = GFS2_I(inode);
 	struct dirent_gather g;
diff --git a/fs/gfs2/dir.h b/fs/gfs2/dir.h
index b21b33668a5b..48fe89046bba 100644
--- a/fs/gfs2/dir.h
+++ b/fs/gfs2/dir.h
@@ -16,30 +16,13 @@ struct inode;
 struct gfs2_inode;
 struct gfs2_inum;
 
-/**
- * gfs2_filldir_t - Report a directory entry to the caller of gfs2_dir_read()
- * @opaque: opaque data used by the function
- * @name: the name of the directory entry
- * @length: the length of the name
- * @offset: the entry's offset in the directory
- * @inum: the inode number the entry points to
- * @type: the type of inode the entry points to
- *
- * Returns: 0 on success, 1 if buffer full
- */
-
-typedef int (*gfs2_filldir_t) (void *opaque,
-			      const char *name, unsigned int length,
-			      u64 offset,
-			      struct gfs2_inum_host *inum, unsigned int type);
-
 int gfs2_dir_search(struct inode *dir, const struct qstr *filename,
 		    struct gfs2_inum_host *inum, unsigned int *type);
 int gfs2_dir_add(struct inode *inode, const struct qstr *filename,
 		 const struct gfs2_inum_host *inum, unsigned int type);
 int gfs2_dir_del(struct gfs2_inode *dip, const struct qstr *filename);
-int gfs2_dir_read(struct inode *inode, u64 * offset, void *opaque,
-		  gfs2_filldir_t filldir);
+int gfs2_dir_read(struct inode *inode, u64 *offset, void *opaque,
+		  filldir_t filldir);
 int gfs2_dir_mvino(struct gfs2_inode *dip, const struct qstr *filename,
 		   struct gfs2_inum_host *new_inum, unsigned int new_type);
 
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 438146904b58..fb1960b7fdde 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -971,8 +971,6 @@ static void drop_bh(struct gfs2_glock *gl, unsigned int ret)
 	const struct gfs2_glock_operations *glops = gl->gl_ops;
 	struct gfs2_holder *gh = gl->gl_req_gh;
 
-	clear_bit(GLF_PREFETCH, &gl->gl_flags);
-
 	gfs2_assert_warn(sdp, test_bit(GLF_LOCK, &gl->gl_flags));
 	gfs2_assert_warn(sdp, queue_empty(gl, &gl->gl_holders));
 	gfs2_assert_warn(sdp, !ret);
@@ -1227,8 +1225,6 @@ restart:
 		}
 	}
 
-	clear_bit(GLF_PREFETCH, &gl->gl_flags);
-
 	return error;
 }
 
@@ -1320,36 +1316,6 @@ void gfs2_glock_dq(struct gfs2_holder *gh)
 	spin_unlock(&gl->gl_spin);
 }
 
-/**
- * gfs2_glock_prefetch - Try to prefetch a glock
- * @gl: the glock
- * @state: the state to prefetch in
- * @flags: flags passed to go_xmote_th()
- *
- */
-
-static void gfs2_glock_prefetch(struct gfs2_glock *gl, unsigned int state,
-				int flags)
-{
-	const struct gfs2_glock_operations *glops = gl->gl_ops;
-
-	spin_lock(&gl->gl_spin);
-
-	if (test_bit(GLF_LOCK, &gl->gl_flags) || !list_empty(&gl->gl_holders) ||
-	    !list_empty(&gl->gl_waiters1) || !list_empty(&gl->gl_waiters2) ||
-	    !list_empty(&gl->gl_waiters3) ||
-	    relaxed_state_ok(gl->gl_state, state, flags)) {
-		spin_unlock(&gl->gl_spin);
-		return;
-	}
-
-	set_bit(GLF_PREFETCH, &gl->gl_flags);
-	set_bit(GLF_LOCK, &gl->gl_flags);
-	spin_unlock(&gl->gl_spin);
-
-	glops->go_xmote_th(gl, state, flags);
-}
-
 static void greedy_work(struct work_struct *work)
 {
 	struct greedy *gr = container_of(work, struct greedy, gr_work.work);
@@ -1617,34 +1583,6 @@ void gfs2_glock_dq_uninit_m(unsigned int num_gh, struct gfs2_holder *ghs)
 		gfs2_glock_dq_uninit(&ghs[x]);
 }
 
-/**
- * gfs2_glock_prefetch_num - prefetch a glock based on lock number
- * @sdp: the filesystem
- * @number: the lock number
- * @glops: the glock operations for the type of glock
- * @state: the state to acquire the glock in
- * @flags: modifier flags for the aquisition
- *
- * Returns: errno
- */
-
-void gfs2_glock_prefetch_num(struct gfs2_sbd *sdp, u64 number,
-			     const struct gfs2_glock_operations *glops,
-			     unsigned int state, int flags)
-{
-	struct gfs2_glock *gl;
-	int error;
-
-	if (atomic_read(&sdp->sd_reclaim_count) <
-	    gfs2_tune_get(sdp, gt_reclaim_limit)) {
-		error = gfs2_glock_get(sdp, number, glops, CREATE, &gl);
-		if (!error) {
-			gfs2_glock_prefetch(gl, state, flags);
-			gfs2_glock_put(gl);
-		}
-	}
-}
-
 /**
  * gfs2_lvb_hold - attach a LVB from a glock
  * @gl: The glock in question
@@ -1781,15 +1719,11 @@ void gfs2_glock_cb(void *cb_data, unsigned int type, void *data)
 
 static int demote_ok(struct gfs2_glock *gl)
 {
-	struct gfs2_sbd *sdp = gl->gl_sbd;
 	const struct gfs2_glock_operations *glops = gl->gl_ops;
 	int demote = 1;
 
 	if (test_bit(GLF_STICKY, &gl->gl_flags))
 		demote = 0;
-	else if (test_bit(GLF_PREFETCH, &gl->gl_flags))
-		demote = time_after_eq(jiffies, gl->gl_stamp +
-				    gfs2_tune_get(sdp, gt_prefetch_secs) * HZ);
 	else if (glops->go_demote_ok)
 		demote = glops->go_demote_ok(gl);
 
diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h
index fb39108fc05c..bde02a7061ec 100644
--- a/fs/gfs2/glock.h
+++ b/fs/gfs2/glock.h
@@ -103,10 +103,6 @@ int gfs2_glock_nq_m(unsigned int num_gh, struct gfs2_holder *ghs);
 void gfs2_glock_dq_m(unsigned int num_gh, struct gfs2_holder *ghs);
 void gfs2_glock_dq_uninit_m(unsigned int num_gh, struct gfs2_holder *ghs);
 
-void gfs2_glock_prefetch_num(struct gfs2_sbd *sdp, u64 number,
-			     const struct gfs2_glock_operations *glops,
-			     unsigned int state, int flags);
-
 /**
  * gfs2_glock_nq_init - intialize a holder and enqueue it on a glock
  * @gl: the glock
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 734421edae85..807587050384 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -147,7 +147,6 @@ struct gfs2_holder {
 enum {
 	GLF_LOCK		= 1,
 	GLF_STICKY		= 2,
-	GLF_PREFETCH		= 3,
 	GLF_DIRTY		= 5,
 	GLF_SKIP_WAITERS2	= 6,
 	GLF_GREEDY		= 7,
@@ -425,7 +424,6 @@ struct gfs2_tune {
 	unsigned int gt_complain_secs;
 	unsigned int gt_reclaim_limit; /* Max num of glocks in reclaim list */
 	unsigned int gt_entries_per_readdir;
-	unsigned int gt_prefetch_secs; /* Usage window for prefetched glocks */
 	unsigned int gt_greedy_default;
 	unsigned int gt_greedy_quantum;
 	unsigned int gt_greedy_max;
diff --git a/fs/gfs2/ops_export.c b/fs/gfs2/ops_export.c
index 6ea979c74b6b..fbf55063928f 100644
--- a/fs/gfs2/ops_export.c
+++ b/fs/gfs2/ops_export.c
@@ -113,13 +113,12 @@ struct get_name_filldir {
 	char *name;
 };
 
-static int get_name_filldir(void *opaque, const char *name, unsigned int length,
-			    u64 offset, struct gfs2_inum_host *inum,
-			    unsigned int type)
+static int get_name_filldir(void *opaque, const char *name, int length,
+			    loff_t offset, u64 inum, unsigned int type)
 {
-	struct get_name_filldir *gnfd = (struct get_name_filldir *)opaque;
+	struct get_name_filldir *gnfd = opaque;
 
-	if (!gfs2_inum_equal(inum, &gnfd->inum))
+	if (inum != gnfd->inum.no_addr)
 		return 0;
 
 	memcpy(gnfd->name, name, length);
diff --git a/fs/gfs2/ops_file.c b/fs/gfs2/ops_file.c
index faa07e4b97d0..c996aa739a05 100644
--- a/fs/gfs2/ops_file.c
+++ b/fs/gfs2/ops_file.c
@@ -43,15 +43,6 @@
 #include "util.h"
 #include "eaops.h"
 
-/* For regular, non-NFS */
-struct filldir_reg {
-	struct gfs2_sbd *fdr_sbd;
-	int fdr_prefetch;
-
-	filldir_t fdr_filldir;
-	void *fdr_opaque;
-};
-
 /*
  * Most fields left uninitialised to catch anybody who tries to
  * use them. f_flags set to prevent file_accessed() from touching
@@ -127,41 +118,6 @@ static loff_t gfs2_llseek(struct file *file, loff_t offset, int origin)
 	return error;
 }
 
-/**
- * filldir_func - Report a directory entry to the caller of gfs2_dir_read()
- * @opaque: opaque data used by the function
- * @name: the name of the directory entry
- * @length: the length of the name
- * @offset: the entry's offset in the directory
- * @inum: the inode number the entry points to
- * @type: the type of inode the entry points to
- *
- * Returns: 0 on success, 1 if buffer full
- */
-
-static int filldir_func(void *opaque, const char *name, unsigned int length,
-			u64 offset, struct gfs2_inum_host *inum,
-			unsigned int type)
-{
-	struct filldir_reg *fdr = (struct filldir_reg *)opaque;
-	struct gfs2_sbd *sdp = fdr->fdr_sbd;
-	int error;
-
-	error = fdr->fdr_filldir(fdr->fdr_opaque, name, length, offset,
-				 inum->no_addr, type);
-	if (error)
-		return 1;
-
-	if (fdr->fdr_prefetch && !(length == 1 && *name == '.')) {
-		gfs2_glock_prefetch_num(sdp, inum->no_addr, &gfs2_inode_glops,
-				       LM_ST_SHARED, LM_FLAG_TRY | LM_FLAG_ANY);
-		gfs2_glock_prefetch_num(sdp, inum->no_addr, &gfs2_iopen_glops,
-				       LM_ST_SHARED, LM_FLAG_TRY);
-	}
-
-	return 0;
-}
-
 /**
  * gfs2_readdir - Read directory entries from a directory
  * @file: The directory to read from
@@ -175,16 +131,10 @@ static int gfs2_readdir(struct file *file, void *dirent, filldir_t filldir)
 {
 	struct inode *dir = file->f_mapping->host;
 	struct gfs2_inode *dip = GFS2_I(dir);
-	struct filldir_reg fdr;
 	struct gfs2_holder d_gh;
 	u64 offset = file->f_pos;
 	int error;
 
-	fdr.fdr_sbd = GFS2_SB(dir);
-	fdr.fdr_prefetch = 1;
-	fdr.fdr_filldir = filldir;
-	fdr.fdr_opaque = dirent;
-
 	gfs2_holder_init(dip->i_gl, LM_ST_SHARED, GL_ATIME, &d_gh);
 	error = gfs2_glock_nq_atime(&d_gh);
 	if (error) {
@@ -192,7 +142,7 @@ static int gfs2_readdir(struct file *file, void *dirent, filldir_t filldir)
 		return error;
 	}
 
-	error = gfs2_dir_read(dir, &offset, &fdr, filldir_func);
+	error = gfs2_dir_read(dir, &offset, dirent, filldir);
 
 	gfs2_glock_dq_uninit(&d_gh);
 
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 43a24f2e5905..100852a61b5d 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -78,7 +78,6 @@ void gfs2_tune_init(struct gfs2_tune *gt)
 	gt->gt_complain_secs = 10;
 	gt->gt_reclaim_limit = 5000;
 	gt->gt_entries_per_readdir = 32;
-	gt->gt_prefetch_secs = 10;
 	gt->gt_greedy_default = HZ / 10;
 	gt->gt_greedy_quantum = HZ / 40;
 	gt->gt_greedy_max = HZ / 4;
diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c
index 983eaf1e06be..cd28f08b244c 100644
--- a/fs/gfs2/sys.c
+++ b/fs/gfs2/sys.c
@@ -436,7 +436,6 @@ TUNE_ATTR(atime_quantum, 0);
 TUNE_ATTR(max_readahead, 0);
 TUNE_ATTR(complain_secs, 0);
 TUNE_ATTR(reclaim_limit, 0);
-TUNE_ATTR(prefetch_secs, 0);
 TUNE_ATTR(statfs_slow, 0);
 TUNE_ATTR(new_files_jdata, 0);
 TUNE_ATTR(new_files_directio, 0);
@@ -465,7 +464,6 @@ static struct attribute *tune_attrs[] = {
 	&tune_attr_max_readahead.attr,
 	&tune_attr_complain_secs.attr,
 	&tune_attr_reclaim_limit.attr,
-	&tune_attr_prefetch_secs.attr,
 	&tune_attr_statfs_slow.attr,
 	&tune_attr_quota_simul_sync.attr,
 	&tune_attr_quota_cache_secs.attr,
-- 
cgit v1.2.1


From 330005c2b23e71e54931913e9b63d1712a19e444 Mon Sep 17 00:00:00 2001
From: Steven Whitehouse <swhiteho@redhat.com>
Date: Mon, 15 Jan 2007 16:36:26 -0500
Subject: [GFS2] Remove max_atomic_write tunable

This removes an unused sysfs tunable parameter.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/gfs2/incore.h | 1 -
 fs/gfs2/super.c  | 1 -
 fs/gfs2/sys.c    | 2 --
 3 files changed, 4 deletions(-)

(limited to 'fs')

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 807587050384..9114851ac53e 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -417,7 +417,6 @@ struct gfs2_tune {
 	unsigned int gt_atime_quantum; /* Min secs between atime updates */
 	unsigned int gt_new_files_jdata;
 	unsigned int gt_new_files_directio;
-	unsigned int gt_max_atomic_write; /* Split big writes into this size */
 	unsigned int gt_max_readahead; /* Max bytes to read-ahead from disk */
 	unsigned int gt_lockdump_size;
 	unsigned int gt_stall_secs; /* Detects trouble! */
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 100852a61b5d..3e17dcf35a30 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -71,7 +71,6 @@ void gfs2_tune_init(struct gfs2_tune *gt)
 	gt->gt_atime_quantum = 3600;
 	gt->gt_new_files_jdata = 0;
 	gt->gt_new_files_directio = 0;
-	gt->gt_max_atomic_write = 4 << 20;
 	gt->gt_max_readahead = 1 << 18;
 	gt->gt_lockdump_size = 131072;
 	gt->gt_stall_secs = 600;
diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c
index cd28f08b244c..11206118d983 100644
--- a/fs/gfs2/sys.c
+++ b/fs/gfs2/sys.c
@@ -441,7 +441,6 @@ TUNE_ATTR(new_files_jdata, 0);
 TUNE_ATTR(new_files_directio, 0);
 TUNE_ATTR(quota_simul_sync, 1);
 TUNE_ATTR(quota_cache_secs, 1);
-TUNE_ATTR(max_atomic_write, 1);
 TUNE_ATTR(stall_secs, 1);
 TUNE_ATTR(greedy_default, 1);
 TUNE_ATTR(greedy_quantum, 1);
@@ -467,7 +466,6 @@ static struct attribute *tune_attrs[] = {
 	&tune_attr_statfs_slow.attr,
 	&tune_attr_quota_simul_sync.attr,
 	&tune_attr_quota_cache_secs.attr,
-	&tune_attr_max_atomic_write.attr,
 	&tune_attr_stall_secs.attr,
 	&tune_attr_greedy_default.attr,
 	&tune_attr_greedy_quantum.attr,
-- 
cgit v1.2.1


From fee852e374fb367c5436b1226eb93b35f8355ed9 Mon Sep 17 00:00:00 2001
From: Steven Whitehouse <swhiteho@redhat.com>
Date: Wed, 17 Jan 2007 15:33:23 +0000
Subject: [GFS2] Shrink gfs2_inode memory by half

Here is something I spotted (while looking for something entirely
different) the other day.

Rather than using a completion in each and every struct gfs2_holder,
this removes it in favour of hashed wait queues, thus saving a
considerable amount of memory both on the stack (where a number of
gfs2_holder structures are allocated) and in particular in the
gfs2_inode which has 8 gfs2_holder structures embedded within it.

As a result on x86_64 the gfs2_inode shrinks from 2488 bytes to
1912 bytes, a saving of 576 bytes per inode (no thats not a typo!).
In actual practice we get a much better result than that since
now that a gfs2_inode is under the 2048 byte barrier, we get two
per 4k slab page effectively halving the amount of memory required
to store gfs2_inodes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/gfs2/glock.c  | 66 ++++++++++++++++++++++++++++++++++++--------------------
 fs/gfs2/incore.h |  2 +-
 2 files changed, 44 insertions(+), 24 deletions(-)

(limited to 'fs')

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index fb1960b7fdde..5341e03b873f 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -19,6 +19,7 @@
 #include <linux/gfs2_ondisk.h>
 #include <linux/list.h>
 #include <linux/lm_interface.h>
+#include <linux/wait.h>
 #include <asm/uaccess.h>
 
 #include "gfs2.h"
@@ -395,7 +396,6 @@ void gfs2_holder_init(struct gfs2_glock *gl, unsigned int state, unsigned flags,
 	gh->gh_flags = flags;
 	gh->gh_error = 0;
 	gh->gh_iflags = 0;
-	init_completion(&gh->gh_wait);
 
 	if (gh->gh_state == LM_ST_EXCLUSIVE)
 		gh->gh_flags |= GL_LOCAL_EXCL;
@@ -479,6 +479,29 @@ static void gfs2_holder_put(struct gfs2_holder *gh)
 	kfree(gh);
 }
 
+static void gfs2_holder_dispose_or_wake(struct gfs2_holder *gh)
+{
+	if (test_bit(HIF_DEALLOC, &gh->gh_iflags)) {
+		gfs2_holder_put(gh);
+		return;
+	}
+	clear_bit(HIF_WAIT, &gh->gh_iflags);
+	smp_mb();
+	wake_up_bit(&gh->gh_iflags, HIF_WAIT);
+}
+
+static int holder_wait(void *word)
+{
+        schedule();
+        return 0;
+}
+
+static void wait_on_holder(struct gfs2_holder *gh)
+{
+	might_sleep();
+	wait_on_bit(&gh->gh_iflags, HIF_WAIT, holder_wait, TASK_UNINTERRUPTIBLE);
+}
+
 /**
  * rq_mutex - process a mutex request in the queue
  * @gh: the glock holder
@@ -493,7 +516,9 @@ static int rq_mutex(struct gfs2_holder *gh)
 	list_del_init(&gh->gh_list);
 	/*  gh->gh_error never examined.  */
 	set_bit(GLF_LOCK, &gl->gl_flags);
-	complete(&gh->gh_wait);
+	clear_bit(HIF_WAIT, &gh->gh_flags);
+	smp_mb();
+	wake_up_bit(&gh->gh_iflags, HIF_WAIT);
 
 	return 1;
 }
@@ -549,7 +574,7 @@ static int rq_promote(struct gfs2_holder *gh)
 	gh->gh_error = 0;
 	set_bit(HIF_HOLDER, &gh->gh_iflags);
 
-	complete(&gh->gh_wait);
+	gfs2_holder_dispose_or_wake(gh);
 
 	return 0;
 }
@@ -573,10 +598,7 @@ static int rq_demote(struct gfs2_holder *gh)
 		list_del_init(&gh->gh_list);
 		gh->gh_error = 0;
 		spin_unlock(&gl->gl_spin);
-		if (test_bit(HIF_DEALLOC, &gh->gh_iflags))
-			gfs2_holder_put(gh);
-		else
-			complete(&gh->gh_wait);
+		gfs2_holder_dispose_or_wake(gh);
 		spin_lock(&gl->gl_spin);
 	} else {
 		gl->gl_req_gh = gh;
@@ -684,6 +706,8 @@ static void gfs2_glmutex_lock(struct gfs2_glock *gl)
 
 	gfs2_holder_init(gl, 0, 0, &gh);
 	set_bit(HIF_MUTEX, &gh.gh_iflags);
+	if (test_and_set_bit(HIF_WAIT, &gh.gh_iflags))
+		BUG();
 
 	spin_lock(&gl->gl_spin);
 	if (test_and_set_bit(GLF_LOCK, &gl->gl_flags)) {
@@ -691,11 +715,13 @@ static void gfs2_glmutex_lock(struct gfs2_glock *gl)
 	} else {
 		gl->gl_owner = current;
 		gl->gl_ip = (unsigned long)__builtin_return_address(0);
-		complete(&gh.gh_wait);
+		clear_bit(HIF_WAIT, &gh.gh_iflags);
+		smp_mb();
+		wake_up_bit(&gh.gh_iflags, HIF_WAIT);
 	}
 	spin_unlock(&gl->gl_spin);
 
-	wait_for_completion(&gh.gh_wait);
+	wait_on_holder(&gh);
 	gfs2_holder_uninit(&gh);
 }
 
@@ -774,6 +800,7 @@ restart:
 			return;
 		set_bit(HIF_DEMOTE, &new_gh->gh_iflags);
 		set_bit(HIF_DEALLOC, &new_gh->gh_iflags);
+		set_bit(HIF_WAIT, &new_gh->gh_iflags);
 
 		goto restart;
 	}
@@ -908,12 +935,8 @@ static void xmote_bh(struct gfs2_glock *gl, unsigned int ret)
 
 	gfs2_glock_put(gl);
 
-	if (gh) {
-		if (test_bit(HIF_DEALLOC, &gh->gh_iflags))
-			gfs2_holder_put(gh);
-		else
-			complete(&gh->gh_wait);
-	}
+	if (gh)
+		gfs2_holder_dispose_or_wake(gh);
 }
 
 /**
@@ -999,12 +1022,8 @@ static void drop_bh(struct gfs2_glock *gl, unsigned int ret)
 
 	gfs2_glock_put(gl);
 
-	if (gh) {
-		if (test_bit(HIF_DEALLOC, &gh->gh_iflags))
-			gfs2_holder_put(gh);
-		else
-			complete(&gh->gh_wait);
-	}
+	if (gh)
+		gfs2_holder_dispose_or_wake(gh);
 }
 
 /**
@@ -1105,8 +1124,7 @@ static int glock_wait_internal(struct gfs2_holder *gh)
 	if (gh->gh_flags & LM_FLAG_PRIORITY)
 		do_cancels(gh);
 
-	wait_for_completion(&gh->gh_wait);
-
+	wait_on_holder(gh);
 	if (gh->gh_error)
 		return gh->gh_error;
 
@@ -1162,6 +1180,8 @@ static void add_to_queue(struct gfs2_holder *gh)
 	struct gfs2_holder *existing;
 
 	BUG_ON(!gh->gh_owner);
+	if (test_and_set_bit(HIF_WAIT, &gh->gh_iflags))
+		BUG();
 
 	existing = find_holder_by_owner(&gl->gl_holders, gh->gh_owner);
 	if (existing) {
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 9114851ac53e..a24c4af09ce0 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -128,6 +128,7 @@ enum {
 	HIF_HOLDER		= 6,
 	HIF_FIRST		= 7,
 	HIF_ABORTED		= 9,
+	HIF_WAIT		= 10,
 };
 
 struct gfs2_holder {
@@ -140,7 +141,6 @@ struct gfs2_holder {
 
 	int gh_error;
 	unsigned long gh_iflags;
-	struct completion gh_wait;
 	unsigned long gh_ip;
 };
 
-- 
cgit v1.2.1


From e5dab552c82ce416d7be867b1e5a0fa585dcf590 Mon Sep 17 00:00:00 2001
From: Steven Whitehouse <swhiteho@redhat.com>
Date: Thu, 18 Jan 2007 17:44:20 +0000
Subject: [GFS2] Remove the "greedy" function from glock.[ch]

The "greedy" code was an attempt to retain glocks for a minimum length
of time when they relate to mmap()ed files. The current implementation
of this feature is not, however, ideal in that it required allocating
memory in order to do this and its overly complicated.

It also misses the mark by ignoring the other I/O operations which are
just as likely to suffer from the same problem. So the plan is to remove
this now and then add the functionality back as part of the glock state
machine at a later date (and thus take into account all the possible
users of this feature)

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/gfs2/glock.c     | 93 -----------------------------------------------------
 fs/gfs2/glock.h     |  2 --
 fs/gfs2/glops.c     | 34 --------------------
 fs/gfs2/incore.h    |  9 +-----
 fs/gfs2/ops_super.c |  2 --
 fs/gfs2/ops_vm.c    | 24 +-------------
 fs/gfs2/super.c     |  3 --
 fs/gfs2/sys.c       |  6 ----
 8 files changed, 2 insertions(+), 171 deletions(-)

(limited to 'fs')

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 5341e03b873f..90847e0957bb 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -34,11 +34,6 @@
 #include "super.h"
 #include "util.h"
 
-struct greedy {
-	struct gfs2_holder gr_gh;
-	struct delayed_work gr_work;
-};
-
 struct gfs2_gl_hash_bucket {
         struct hlist_head hb_list;
 };
@@ -617,30 +612,6 @@ static int rq_demote(struct gfs2_holder *gh)
 	return 0;
 }
 
-/**
- * rq_greedy - process a queued request to drop greedy status
- * @gh: the glock holder
- *
- * Returns: 1 if the queue is blocked
- */
-
-static int rq_greedy(struct gfs2_holder *gh)
-{
-	struct gfs2_glock *gl = gh->gh_gl;
-
-	list_del_init(&gh->gh_list);
-	/*  gh->gh_error never examined.  */
-	clear_bit(GLF_GREEDY, &gl->gl_flags);
-	spin_unlock(&gl->gl_spin);
-
-	gfs2_holder_uninit(gh);
-	kfree(container_of(gh, struct greedy, gr_gh));
-
-	spin_lock(&gl->gl_spin);
-
-	return 0;
-}
-
 /**
  * run_queue - process holder structures on a glock
  * @gl: the glock
@@ -671,8 +642,6 @@ static void run_queue(struct gfs2_glock *gl)
 
 			if (test_bit(HIF_DEMOTE, &gh->gh_iflags))
 				blocked = rq_demote(gh);
-			else if (test_bit(HIF_GREEDY, &gh->gh_iflags))
-				blocked = rq_greedy(gh);
 			else
 				gfs2_assert_warn(gl->gl_sbd, 0);
 
@@ -1336,68 +1305,6 @@ void gfs2_glock_dq(struct gfs2_holder *gh)
 	spin_unlock(&gl->gl_spin);
 }
 
-static void greedy_work(struct work_struct *work)
-{
-	struct greedy *gr = container_of(work, struct greedy, gr_work.work);
-	struct gfs2_holder *gh = &gr->gr_gh;
-	struct gfs2_glock *gl = gh->gh_gl;
-	const struct gfs2_glock_operations *glops = gl->gl_ops;
-
-	clear_bit(GLF_SKIP_WAITERS2, &gl->gl_flags);
-
-	if (glops->go_greedy)
-		glops->go_greedy(gl);
-
-	spin_lock(&gl->gl_spin);
-
-	if (list_empty(&gl->gl_waiters2)) {
-		clear_bit(GLF_GREEDY, &gl->gl_flags);
-		spin_unlock(&gl->gl_spin);
-		gfs2_holder_uninit(gh);
-		kfree(gr);
-	} else {
-		gfs2_glock_hold(gl);
-		list_add_tail(&gh->gh_list, &gl->gl_waiters2);
-		run_queue(gl);
-		spin_unlock(&gl->gl_spin);
-		gfs2_glock_put(gl);
-	}
-}
-
-/**
- * gfs2_glock_be_greedy -
- * @gl:
- * @time:
- *
- * Returns: 0 if go_greedy will be called, 1 otherwise
- */
-
-int gfs2_glock_be_greedy(struct gfs2_glock *gl, unsigned int time)
-{
-	struct greedy *gr;
-	struct gfs2_holder *gh;
-
-	if (!time || gl->gl_sbd->sd_args.ar_localcaching ||
-	    test_and_set_bit(GLF_GREEDY, &gl->gl_flags))
-		return 1;
-
-	gr = kmalloc(sizeof(struct greedy), GFP_KERNEL);
-	if (!gr) {
-		clear_bit(GLF_GREEDY, &gl->gl_flags);
-		return 1;
-	}
-	gh = &gr->gr_gh;
-
-	gfs2_holder_init(gl, 0, 0, gh);
-	set_bit(HIF_GREEDY, &gh->gh_iflags);
-	INIT_DELAYED_WORK(&gr->gr_work, greedy_work);
-
-	set_bit(GLF_SKIP_WAITERS2, &gl->gl_flags);
-	schedule_delayed_work(&gr->gr_work, time);
-
-	return 0;
-}
-
 /**
  * gfs2_glock_dq_uninit - dequeue a holder from a glock and initialize it
  * @gh: the holder structure
diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h
index bde02a7061ec..ddc56dc4ec9f 100644
--- a/fs/gfs2/glock.h
+++ b/fs/gfs2/glock.h
@@ -92,8 +92,6 @@ int gfs2_glock_poll(struct gfs2_holder *gh);
 int gfs2_glock_wait(struct gfs2_holder *gh);
 void gfs2_glock_dq(struct gfs2_holder *gh);
 
-int gfs2_glock_be_greedy(struct gfs2_glock *gl, unsigned int time);
-
 void gfs2_glock_dq_uninit(struct gfs2_holder *gh);
 int gfs2_glock_nq_num(struct gfs2_sbd *sdp,
 		      u64 number, const struct gfs2_glock_operations *glops,
diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index b068d10bcb6e..e4da26fe406f 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -318,39 +318,6 @@ static void inode_go_unlock(struct gfs2_holder *gh)
 		gfs2_meta_cache_flush(ip);
 }
 
-/**
- * inode_greedy -
- * @gl: the glock
- *
- */
-
-static void inode_greedy(struct gfs2_glock *gl)
-{
-	struct gfs2_sbd *sdp = gl->gl_sbd;
-	struct gfs2_inode *ip = gl->gl_object;
-	unsigned int quantum = gfs2_tune_get(sdp, gt_greedy_quantum);
-	unsigned int max = gfs2_tune_get(sdp, gt_greedy_max);
-	unsigned int new_time;
-
-	spin_lock(&ip->i_spin);
-
-	if (time_after(ip->i_last_pfault + quantum, jiffies)) {
-		new_time = ip->i_greedy + quantum;
-		if (new_time > max)
-			new_time = max;
-	} else {
-		new_time = ip->i_greedy - quantum;
-		if (!new_time || new_time > max)
-			new_time = 1;
-	}
-
-	ip->i_greedy = new_time;
-
-	spin_unlock(&ip->i_spin);
-
-	iput(&ip->i_inode);
-}
-
 /**
  * rgrp_go_demote_ok - Check to see if it's ok to unlock a RG's glock
  * @gl: the glock
@@ -492,7 +459,6 @@ const struct gfs2_glock_operations gfs2_inode_glops = {
 	.go_demote_ok = inode_go_demote_ok,
 	.go_lock = inode_go_lock,
 	.go_unlock = inode_go_unlock,
-	.go_greedy = inode_greedy,
 	.go_type = LM_TYPE_INODE,
 };
 
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index a24c4af09ce0..dc024b18ea96 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -111,7 +111,6 @@ struct gfs2_glock_operations {
 	int (*go_lock) (struct gfs2_holder *gh);
 	void (*go_unlock) (struct gfs2_holder *gh);
 	void (*go_callback) (struct gfs2_glock *gl, unsigned int state);
-	void (*go_greedy) (struct gfs2_glock *gl);
 	const int go_type;
 };
 
@@ -120,7 +119,6 @@ enum {
 	HIF_MUTEX		= 0,
 	HIF_PROMOTE		= 1,
 	HIF_DEMOTE		= 2,
-	HIF_GREEDY		= 3,
 
 	/* States */
 	HIF_ALLOCED		= 4,
@@ -149,7 +147,6 @@ enum {
 	GLF_STICKY		= 2,
 	GLF_DIRTY		= 5,
 	GLF_SKIP_WAITERS2	= 6,
-	GLF_GREEDY		= 7,
 };
 
 struct gfs2_glock {
@@ -166,7 +163,7 @@ struct gfs2_glock {
 	unsigned long gl_ip;
 	struct list_head gl_holders;
 	struct list_head gl_waiters1;	/* HIF_MUTEX */
-	struct list_head gl_waiters2;	/* HIF_DEMOTE, HIF_GREEDY */
+	struct list_head gl_waiters2;	/* HIF_DEMOTE */
 	struct list_head gl_waiters3;	/* HIF_PROMOTE */
 
 	const struct gfs2_glock_operations *gl_ops;
@@ -235,7 +232,6 @@ struct gfs2_inode {
 
 	spinlock_t i_spin;
 	struct rw_semaphore i_rw_mutex;
-	unsigned int i_greedy;
 	unsigned long i_last_pfault;
 
 	struct buffer_head *i_cache[GFS2_MAX_META_HEIGHT];
@@ -423,9 +419,6 @@ struct gfs2_tune {
 	unsigned int gt_complain_secs;
 	unsigned int gt_reclaim_limit; /* Max num of glocks in reclaim list */
 	unsigned int gt_entries_per_readdir;
-	unsigned int gt_greedy_default;
-	unsigned int gt_greedy_quantum;
-	unsigned int gt_greedy_max;
 	unsigned int gt_statfs_quantum;
 	unsigned int gt_statfs_slow;
 };
diff --git a/fs/gfs2/ops_super.c b/fs/gfs2/ops_super.c
index c22738cdbf2d..47369d011214 100644
--- a/fs/gfs2/ops_super.c
+++ b/fs/gfs2/ops_super.c
@@ -452,14 +452,12 @@ out:
 
 static struct inode *gfs2_alloc_inode(struct super_block *sb)
 {
-	struct gfs2_sbd *sdp = sb->s_fs_info;
 	struct gfs2_inode *ip;
 
 	ip = kmem_cache_alloc(gfs2_inode_cachep, GFP_KERNEL);
 	if (ip) {
 		ip->i_flags = 0;
 		ip->i_gl = NULL;
-		ip->i_greedy = gfs2_tune_get(sdp, gt_greedy_default);
 		ip->i_last_pfault = jiffies;
 	}
 	return &ip->i_inode;
diff --git a/fs/gfs2/ops_vm.c b/fs/gfs2/ops_vm.c
index 45a5f11fc39a..14b380fb0602 100644
--- a/fs/gfs2/ops_vm.c
+++ b/fs/gfs2/ops_vm.c
@@ -28,34 +28,13 @@
 #include "trans.h"
 #include "util.h"
 
-static void pfault_be_greedy(struct gfs2_inode *ip)
-{
-	unsigned int time;
-
-	spin_lock(&ip->i_spin);
-	time = ip->i_greedy;
-	ip->i_last_pfault = jiffies;
-	spin_unlock(&ip->i_spin);
-
-	igrab(&ip->i_inode);
-	if (gfs2_glock_be_greedy(ip->i_gl, time))
-		iput(&ip->i_inode);
-}
-
 static struct page *gfs2_private_nopage(struct vm_area_struct *area,
 					unsigned long address, int *type)
 {
 	struct gfs2_inode *ip = GFS2_I(area->vm_file->f_mapping->host);
-	struct page *result;
 
 	set_bit(GIF_PAGED, &ip->i_flags);
-
-	result = filemap_nopage(area, address, type);
-
-	if (result && result != NOPAGE_OOM)
-		pfault_be_greedy(ip);
-
-	return result;
+	return filemap_nopage(area, address, type);
 }
 
 static int alloc_page_backing(struct gfs2_inode *ip, struct page *page)
@@ -167,7 +146,6 @@ static struct page *gfs2_sharewrite_nopage(struct vm_area_struct *area,
 		set_page_dirty(result);
 	}
 
-	pfault_be_greedy(ip);
 out:
 	gfs2_glock_dq_uninit(&i_gh);
 
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 3e17dcf35a30..ce5353ac8f3d 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -77,9 +77,6 @@ void gfs2_tune_init(struct gfs2_tune *gt)
 	gt->gt_complain_secs = 10;
 	gt->gt_reclaim_limit = 5000;
 	gt->gt_entries_per_readdir = 32;
-	gt->gt_greedy_default = HZ / 10;
-	gt->gt_greedy_quantum = HZ / 40;
-	gt->gt_greedy_max = HZ / 4;
 	gt->gt_statfs_quantum = 30;
 	gt->gt_statfs_slow = 0;
 }
diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c
index 11206118d983..d01f9f0fda26 100644
--- a/fs/gfs2/sys.c
+++ b/fs/gfs2/sys.c
@@ -442,9 +442,6 @@ TUNE_ATTR(new_files_directio, 0);
 TUNE_ATTR(quota_simul_sync, 1);
 TUNE_ATTR(quota_cache_secs, 1);
 TUNE_ATTR(stall_secs, 1);
-TUNE_ATTR(greedy_default, 1);
-TUNE_ATTR(greedy_quantum, 1);
-TUNE_ATTR(greedy_max, 1);
 TUNE_ATTR(statfs_quantum, 1);
 TUNE_ATTR_DAEMON(scand_secs, scand_process);
 TUNE_ATTR_DAEMON(recoverd_secs, recoverd_process);
@@ -467,9 +464,6 @@ static struct attribute *tune_attrs[] = {
 	&tune_attr_quota_simul_sync.attr,
 	&tune_attr_quota_cache_secs.attr,
 	&tune_attr_stall_secs.attr,
-	&tune_attr_greedy_default.attr,
-	&tune_attr_greedy_quantum.attr,
-	&tune_attr_greedy_max.attr,
 	&tune_attr_statfs_quantum.attr,
 	&tune_attr_scand_secs.attr,
 	&tune_attr_recoverd_secs.attr,
-- 
cgit v1.2.1


From 6bd9c8c2fb99d1f5af6201db2f063c1d754c230a Mon Sep 17 00:00:00 2001
From: Steven Whitehouse <swhiteho@redhat.com>
Date: Fri, 19 Jan 2007 13:57:36 -0500
Subject: [GFS2] Remove unused go_callback operation

This is never used, so we might as well remove it.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/gfs2/glock.c  | 2 --
 fs/gfs2/incore.h | 1 -
 2 files changed, 3 deletions(-)

(limited to 'fs')

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 90847e0957bb..8e4b55a2bd77 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1568,8 +1568,6 @@ static void blocking_cb(struct gfs2_sbd *sdp, struct lm_lockname *name,
 	if (!gl)
 		return;
 
-	if (gl->gl_ops->go_callback)
-		gl->gl_ops->go_callback(gl, state);
 	handle_callback(gl, state);
 
 	spin_lock(&gl->gl_spin);
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index dc024b18ea96..1acbcc2415ee 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -110,7 +110,6 @@ struct gfs2_glock_operations {
 	int (*go_demote_ok) (struct gfs2_glock *gl);
 	int (*go_lock) (struct gfs2_holder *gh);
 	void (*go_unlock) (struct gfs2_holder *gh);
-	void (*go_callback) (struct gfs2_glock *gl, unsigned int state);
 	const int go_type;
 };
 
-- 
cgit v1.2.1


From 1c0f4872dc4bbeb2223a300517099786211fce83 Mon Sep 17 00:00:00 2001
From: Steven Whitehouse <swhiteho@redhat.com>
Date: Mon, 22 Jan 2007 12:10:39 -0500
Subject: [GFS2] Remove local exclusive glock mode

Here is a patch for GFS2 to remove the local exclusive flag. In
the places it was used, mutex's are always held earlier in the
call path, so it appears redundant in the LM_ST_SHARED case.

Also, the GFS2 holders were setting local exclusive in any case where
the requested lock was LM_ST_EXCLUSIVE. So the other places in the glock
code where the flag was tested have been replaced with tests for the
lock state being LM_ST_EXCLUSIVE in order to ensure the logic is the
same as before (i.e. LM_ST_EXCLUSIVE is always locally exclusive as well
as globally exclusive).

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/gfs2/glock.c      | 16 +++-------------
 fs/gfs2/glock.h      |  1 -
 fs/gfs2/glops.c      |  2 +-
 fs/gfs2/ops_export.c |  3 +--
 fs/gfs2/super.c      | 11 ++++-------
 5 files changed, 9 insertions(+), 24 deletions(-)

(limited to 'fs')

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 8e4b55a2bd77..1345c3d44ede 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -391,10 +391,6 @@ void gfs2_holder_init(struct gfs2_glock *gl, unsigned int state, unsigned flags,
 	gh->gh_flags = flags;
 	gh->gh_error = 0;
 	gh->gh_iflags = 0;
-
-	if (gh->gh_state == LM_ST_EXCLUSIVE)
-		gh->gh_flags |= GL_LOCAL_EXCL;
-
 	gfs2_glock_hold(gl);
 }
 
@@ -412,9 +408,6 @@ void gfs2_holder_reinit(unsigned int state, unsigned flags, struct gfs2_holder *
 {
 	gh->gh_state = state;
 	gh->gh_flags = flags;
-	if (gh->gh_state == LM_ST_EXCLUSIVE)
-		gh->gh_flags |= GL_LOCAL_EXCL;
-
 	gh->gh_iflags &= 1 << HIF_ALLOCED;
 	gh->gh_ip = (unsigned long)__builtin_return_address(0);
 }
@@ -557,11 +550,11 @@ static int rq_promote(struct gfs2_holder *gh)
 		set_bit(GLF_LOCK, &gl->gl_flags);
 	} else {
 		struct gfs2_holder *next_gh;
-		if (gh->gh_flags & GL_LOCAL_EXCL)
+		if (gh->gh_state == LM_ST_EXCLUSIVE)
 			return 1;
 		next_gh = list_entry(gl->gl_holders.next, struct gfs2_holder,
 				     gh_list);
-		if (next_gh->gh_flags & GL_LOCAL_EXCL)
+		if (next_gh->gh_state == LM_ST_EXCLUSIVE)
 			 return 1;
 	}
 
@@ -1363,10 +1356,7 @@ static int glock_compare(const void *arg_a, const void *arg_b)
 		return 1;
 	if (a->ln_number < b->ln_number)
 		return -1;
-	if (gh_a->gh_state == LM_ST_SHARED && gh_b->gh_state == LM_ST_EXCLUSIVE)
-		return 1;
-	if (!(gh_a->gh_flags & GL_LOCAL_EXCL) && (gh_b->gh_flags & GL_LOCAL_EXCL))
-		return 1;
+	BUG_ON(gh_a->gh_gl->gl_ops->go_type == gh_b->gh_gl->gl_ops->go_type);
 	return 0;
 }
 
diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h
index ddc56dc4ec9f..1eaeacdd14a4 100644
--- a/fs/gfs2/glock.h
+++ b/fs/gfs2/glock.h
@@ -20,7 +20,6 @@
 #define LM_FLAG_ANY		0x00000008
 #define LM_FLAG_PRIORITY	0x00000010 */
 
-#define GL_LOCAL_EXCL		0x00000020
 #define GL_ASYNC		0x00000040
 #define GL_EXACT		0x00000080
 #define GL_SKIP			0x00000100
diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index e4da26fe406f..dda68586f757 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -295,7 +295,7 @@ static int inode_go_lock(struct gfs2_holder *gh)
 
 	if ((ip->i_di.di_flags & GFS2_DIF_TRUNC_IN_PROG) &&
 	    (gl->gl_state == LM_ST_EXCLUSIVE) &&
-	    (gh->gh_flags & GL_LOCAL_EXCL))
+	    (gh->gh_state == LM_ST_EXCLUSIVE))
 		error = gfs2_truncatei_resume(ip);
 
 	return error;
diff --git a/fs/gfs2/ops_export.c b/fs/gfs2/ops_export.c
index fbf55063928f..4855e8cca622 100644
--- a/fs/gfs2/ops_export.c
+++ b/fs/gfs2/ops_export.c
@@ -216,8 +216,7 @@ static struct dentry *gfs2_get_dentry(struct super_block *sb, void *inum_obj)
 	}
 
 	error = gfs2_glock_nq_num(sdp, inum->no_addr, &gfs2_inode_glops,
-				  LM_ST_SHARED, LM_FLAG_ANY | GL_LOCAL_EXCL,
-				  &i_gh);
+				  LM_ST_SHARED, LM_FLAG_ANY, &i_gh);
 	if (error)
 		return ERR_PTR(error);
 
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index ce5353ac8f3d..70f424fcf1cd 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -354,8 +354,7 @@ int gfs2_jindex_hold(struct gfs2_sbd *sdp, struct gfs2_holder *ji_gh)
 	mutex_lock(&sdp->sd_jindex_mutex);
 
 	for (;;) {
-		error = gfs2_glock_nq_init(dip->i_gl, LM_ST_SHARED,
-					   GL_LOCAL_EXCL, ji_gh);
+		error = gfs2_glock_nq_init(dip->i_gl, LM_ST_SHARED, 0, ji_gh);
 		if (error)
 			break;
 
@@ -524,8 +523,7 @@ int gfs2_make_fs_rw(struct gfs2_sbd *sdp)
 	struct gfs2_log_header_host head;
 	int error;
 
-	error = gfs2_glock_nq_init(sdp->sd_trans_gl, LM_ST_SHARED,
-				   GL_LOCAL_EXCL, &t_gh);
+	error = gfs2_glock_nq_init(sdp->sd_trans_gl, LM_ST_SHARED, 0, &t_gh);
 	if (error)
 		return error;
 
@@ -578,9 +576,8 @@ int gfs2_make_fs_ro(struct gfs2_sbd *sdp)
 	gfs2_quota_sync(sdp);
 	gfs2_statfs_sync(sdp);
 
-	error = gfs2_glock_nq_init(sdp->sd_trans_gl, LM_ST_SHARED,
-				GL_LOCAL_EXCL | GL_NOCACHE,
-				&t_gh);
+	error = gfs2_glock_nq_init(sdp->sd_trans_gl, LM_ST_SHARED, GL_NOCACHE,
+				   &t_gh);
 	if (error && !test_bit(SDF_SHUTDOWN, &sdp->sd_flags))
 		return error;
 
-- 
cgit v1.2.1


From f2f5095f9e63db57faa7cb082e958910ecdd7ad4 Mon Sep 17 00:00:00 2001
From: Patrick Caulfield <pcaulfie@redhat.com>
Date: Mon, 22 Jan 2007 14:50:10 +0000
Subject: [DLM] lowcomms tidy

This patch removes some redundant fields from the connection structure and adds
some lockdep annotation to remove spurious warnings.

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/dlm/lowcomms-tcp.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

(limited to 'fs')

diff --git a/fs/dlm/lowcomms-tcp.c b/fs/dlm/lowcomms-tcp.c
index 86e5f81da7cb..6e27201f2f95 100644
--- a/fs/dlm/lowcomms-tcp.c
+++ b/fs/dlm/lowcomms-tcp.c
@@ -97,9 +97,6 @@ struct connection {
 	struct socket *sock;	/* NULL if not connected */
 	uint32_t nodeid;	/* So we know who we are in the list */
 	struct rw_semaphore sock_sem; /* Stop connect races */
-	struct list_head read_list;   /* On this list when ready for reading */
-	struct list_head write_list;  /* On this list when ready for writing */
-	struct list_head state_list;  /* On this list when ready to connect */
 	unsigned long flags;	/* bit 1,2 = We are on the read/write lists */
 #define CF_READ_PENDING 1
 #define CF_WRITE_PENDING 2
@@ -391,7 +388,7 @@ static int accept_from_sock(struct connection *con)
 	if (result < 0)
 		return -ENOMEM;
 
-	down_read(&con->sock_sem);
+	down_read_nested(&con->sock_sem, 0);
 
 	result = -ENOTCONN;
 	if (con->sock == NULL)
@@ -434,7 +431,7 @@ static int accept_from_sock(struct connection *con)
 		result = -ENOMEM;
 		goto accept_err;
 	}
-	down_write(&newcon->sock_sem);
+	down_write_nested(&newcon->sock_sem, 1);
 	if (newcon->sock) {
 		struct connection *othercon = newcon->othercon;
 
-- 
cgit v1.2.1


From b5d32bead1578afc5ca817d40c320764d50a8600 Mon Sep 17 00:00:00 2001
From: Steven Whitehouse <swhiteho@redhat.com>
Date: Mon, 22 Jan 2007 12:15:34 -0500
Subject: [GFS2] Tidy up glops calls

This patch doesn't make any changes to the ordering of the various
operations related to glocking, but it does tidy up the calls to the
glops.c functions to make the structure more obvious.

The two functions: gfs2_glock_xmote_th() and gfs2_glock_drop_th() can be
made static within glock.c since they are called by every set of glock
operations. The xmote_th and drop_th glock operations are then made
conditional upon those two routines existing and called from the
previously mentioned functions in glock.c respectively.

Also it can be seen that the go_sync operation isn't needed since it can
easily be replaced by calls to xmote_bh and drop_bh respectively. This
results in no longer (confusingly) calling back into routines in glock.c
from glops.c and also reducing the glock operations by one member.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/gfs2/glock.c  |  29 ++++++++--------
 fs/gfs2/glock.h  |   4 ---
 fs/gfs2/glops.c  | 100 +++++++++++++++++++++++--------------------------------
 fs/gfs2/incore.h |   3 +-
 4 files changed, 59 insertions(+), 77 deletions(-)

(limited to 'fs')

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 1345c3d44ede..5b772bb0210f 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -43,6 +43,8 @@ typedef void (*glock_examiner) (struct gfs2_glock * gl);
 static int gfs2_dump_lockstate(struct gfs2_sbd *sdp);
 static int dump_glock(struct gfs2_glock *gl);
 static int dump_inode(struct gfs2_inode *ip);
+static void gfs2_glock_xmote_th(struct gfs2_holder *gh);
+static void gfs2_glock_drop_th(struct gfs2_glock *gl);
 
 #define GFS2_GL_HASH_SHIFT      15
 #define GFS2_GL_HASH_SIZE       (1 << GFS2_GL_HASH_SHIFT)
@@ -524,7 +526,6 @@ static int rq_promote(struct gfs2_holder *gh)
 {
 	struct gfs2_glock *gl = gh->gh_gl;
 	struct gfs2_sbd *sdp = gl->gl_sbd;
-	const struct gfs2_glock_operations *glops = gl->gl_ops;
 
 	if (!relaxed_state_ok(gl->gl_state, gh->gh_state, gh->gh_flags)) {
 		if (list_empty(&gl->gl_holders)) {
@@ -539,7 +540,7 @@ static int rq_promote(struct gfs2_holder *gh)
 				gfs2_reclaim_glock(sdp);
 			}
 
-			glops->go_xmote_th(gl, gh->gh_state, gh->gh_flags);
+			gfs2_glock_xmote_th(gh);
 			spin_lock(&gl->gl_spin);
 		}
 		return 1;
@@ -577,7 +578,6 @@ static int rq_promote(struct gfs2_holder *gh)
 static int rq_demote(struct gfs2_holder *gh)
 {
 	struct gfs2_glock *gl = gh->gh_gl;
-	const struct gfs2_glock_operations *glops = gl->gl_ops;
 
 	if (!list_empty(&gl->gl_holders))
 		return 1;
@@ -595,9 +595,9 @@ static int rq_demote(struct gfs2_holder *gh)
 
 		if (gh->gh_state == LM_ST_UNLOCKED ||
 		    gl->gl_state != LM_ST_EXCLUSIVE)
-			glops->go_drop_th(gl);
+			gfs2_glock_drop_th(gl);
 		else
-			glops->go_xmote_th(gl, gh->gh_state, gh->gh_flags);
+			gfs2_glock_xmote_th(gh);
 
 		spin_lock(&gl->gl_spin);
 	}
@@ -909,23 +909,26 @@ static void xmote_bh(struct gfs2_glock *gl, unsigned int ret)
  *
  */
 
-void gfs2_glock_xmote_th(struct gfs2_glock *gl, unsigned int state, int flags)
+void gfs2_glock_xmote_th(struct gfs2_holder *gh)
 {
+	struct gfs2_glock *gl = gh->gh_gl;
 	struct gfs2_sbd *sdp = gl->gl_sbd;
+	int flags = gh->gh_flags;
+	unsigned state = gh->gh_state;
 	const struct gfs2_glock_operations *glops = gl->gl_ops;
 	int lck_flags = flags & (LM_FLAG_TRY | LM_FLAG_TRY_1CB |
 				 LM_FLAG_NOEXP | LM_FLAG_ANY |
 				 LM_FLAG_PRIORITY);
 	unsigned int lck_ret;
 
+	if (glops->go_xmote_th)
+		glops->go_xmote_th(gl);
+
 	gfs2_assert_warn(sdp, test_bit(GLF_LOCK, &gl->gl_flags));
 	gfs2_assert_warn(sdp, queue_empty(gl, &gl->gl_holders));
 	gfs2_assert_warn(sdp, state != LM_ST_UNLOCKED);
 	gfs2_assert_warn(sdp, state != gl->gl_state);
 
-	if (gl->gl_state == LM_ST_EXCLUSIVE && glops->go_sync)
-		glops->go_sync(gl);
-
 	gfs2_glock_hold(gl);
 	gl->gl_req_bh = xmote_bh;
 
@@ -994,19 +997,19 @@ static void drop_bh(struct gfs2_glock *gl, unsigned int ret)
  *
  */
 
-void gfs2_glock_drop_th(struct gfs2_glock *gl)
+static void gfs2_glock_drop_th(struct gfs2_glock *gl)
 {
 	struct gfs2_sbd *sdp = gl->gl_sbd;
 	const struct gfs2_glock_operations *glops = gl->gl_ops;
 	unsigned int ret;
 
+	if (glops->go_drop_th)
+		glops->go_drop_th(gl);
+
 	gfs2_assert_warn(sdp, test_bit(GLF_LOCK, &gl->gl_flags));
 	gfs2_assert_warn(sdp, queue_empty(gl, &gl->gl_holders));
 	gfs2_assert_warn(sdp, gl->gl_state != LM_ST_UNLOCKED);
 
-	if (gl->gl_state == LM_ST_EXCLUSIVE && glops->go_sync)
-		glops->go_sync(gl);
-
 	gfs2_glock_hold(gl);
 	gl->gl_req_bh = drop_bh;
 
diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h
index 1eaeacdd14a4..f50e40ceca43 100644
--- a/fs/gfs2/glock.h
+++ b/fs/gfs2/glock.h
@@ -82,10 +82,6 @@ void gfs2_holder_init(struct gfs2_glock *gl, unsigned int state, unsigned flags,
 void gfs2_holder_reinit(unsigned int state, unsigned flags,
 			struct gfs2_holder *gh);
 void gfs2_holder_uninit(struct gfs2_holder *gh);
-
-void gfs2_glock_xmote_th(struct gfs2_glock *gl, unsigned int state, int flags);
-void gfs2_glock_drop_th(struct gfs2_glock *gl);
-
 int gfs2_glock_nq(struct gfs2_holder *gh);
 int gfs2_glock_poll(struct gfs2_holder *gh);
 int gfs2_glock_wait(struct gfs2_holder *gh);
diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index dda68586f757..c4b0391b7aa2 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -117,12 +117,14 @@ static void gfs2_pte_inval(struct gfs2_glock *gl)
 
 static void meta_go_sync(struct gfs2_glock *gl)
 {
+	if (gl->gl_state != LM_ST_EXCLUSIVE)
+		return;
+
 	if (test_and_clear_bit(GLF_DIRTY, &gl->gl_flags)) {
 		gfs2_log_flush(gl->gl_sbd, gl);
 		gfs2_meta_sync(gl);
 		gfs2_ail_empty_gl(gl);
 	}
-
 }
 
 /**
@@ -141,6 +143,37 @@ static void meta_go_inval(struct gfs2_glock *gl, int flags)
 	gl->gl_vn++;
 }
 
+/**
+ * inode_go_sync - Sync the dirty data and/or metadata for an inode glock
+ * @gl: the glock protecting the inode
+ *
+ */
+
+static void inode_go_sync(struct gfs2_glock *gl)
+{
+	struct gfs2_inode *ip = gl->gl_object;
+
+	if (ip && !S_ISREG(ip->i_inode.i_mode))
+		ip = NULL;
+
+	if (test_bit(GLF_DIRTY, &gl->gl_flags)) {
+		gfs2_log_flush(gl->gl_sbd, gl);
+		if (ip)
+			filemap_fdatawrite(ip->i_inode.i_mapping);
+		gfs2_meta_sync(gl);
+		if (ip) {
+			struct address_space *mapping = ip->i_inode.i_mapping;
+			int error = filemap_fdatawait(mapping);
+			if (error == -ENOSPC)
+				set_bit(AS_ENOSPC, &mapping->flags);
+			else if (error)
+				set_bit(AS_EIO, &mapping->flags);
+		}
+		clear_bit(GLF_DIRTY, &gl->gl_flags);
+		gfs2_ail_empty_gl(gl);
+	}
+}
+
 /**
  * inode_go_xmote_th - promote/demote a glock
  * @gl: the glock
@@ -149,12 +182,12 @@ static void meta_go_inval(struct gfs2_glock *gl, int flags)
  *
  */
 
-static void inode_go_xmote_th(struct gfs2_glock *gl, unsigned int state,
-			      int flags)
+static void inode_go_xmote_th(struct gfs2_glock *gl)
 {
 	if (gl->gl_state != LM_ST_UNLOCKED)
 		gfs2_pte_inval(gl);
-	gfs2_glock_xmote_th(gl, state, flags);
+	if (gl->gl_state == LM_ST_EXCLUSIVE)
+		inode_go_sync(gl);
 }
 
 /**
@@ -189,38 +222,8 @@ static void inode_go_xmote_bh(struct gfs2_glock *gl)
 static void inode_go_drop_th(struct gfs2_glock *gl)
 {
 	gfs2_pte_inval(gl);
-	gfs2_glock_drop_th(gl);
-}
-
-/**
- * inode_go_sync - Sync the dirty data and/or metadata for an inode glock
- * @gl: the glock protecting the inode
- *
- */
-
-static void inode_go_sync(struct gfs2_glock *gl)
-{
-	struct gfs2_inode *ip = gl->gl_object;
-
-	if (ip && !S_ISREG(ip->i_inode.i_mode))
-		ip = NULL;
-
-	if (test_bit(GLF_DIRTY, &gl->gl_flags)) {
-		gfs2_log_flush(gl->gl_sbd, gl);
-		if (ip)
-			filemap_fdatawrite(ip->i_inode.i_mapping);
-		gfs2_meta_sync(gl);
-		if (ip) {
-			struct address_space *mapping = ip->i_inode.i_mapping;
-			int error = filemap_fdatawait(mapping);
-			if (error == -ENOSPC)
-				set_bit(AS_ENOSPC, &mapping->flags);
-			else if (error)
-				set_bit(AS_EIO, &mapping->flags);
-		}
-		clear_bit(GLF_DIRTY, &gl->gl_flags);
-		gfs2_ail_empty_gl(gl);
-	}
+	if (gl->gl_state == LM_ST_EXCLUSIVE)
+		inode_go_sync(gl);
 }
 
 /**
@@ -365,8 +368,7 @@ static void rgrp_go_unlock(struct gfs2_holder *gh)
  *
  */
 
-static void trans_go_xmote_th(struct gfs2_glock *gl, unsigned int state,
-			      int flags)
+static void trans_go_xmote_th(struct gfs2_glock *gl)
 {
 	struct gfs2_sbd *sdp = gl->gl_sbd;
 
@@ -375,8 +377,6 @@ static void trans_go_xmote_th(struct gfs2_glock *gl, unsigned int state,
 		gfs2_meta_syncfs(sdp);
 		gfs2_log_shutdown(sdp);
 	}
-
-	gfs2_glock_xmote_th(gl, state, flags);
 }
 
 /**
@@ -428,8 +428,6 @@ static void trans_go_drop_th(struct gfs2_glock *gl)
 		gfs2_meta_syncfs(sdp);
 		gfs2_log_shutdown(sdp);
 	}
-
-	gfs2_glock_drop_th(gl);
 }
 
 /**
@@ -445,8 +443,8 @@ static int quota_go_demote_ok(struct gfs2_glock *gl)
 }
 
 const struct gfs2_glock_operations gfs2_meta_glops = {
-	.go_xmote_th = gfs2_glock_xmote_th,
-	.go_drop_th = gfs2_glock_drop_th,
+	.go_xmote_th = meta_go_sync,
+	.go_drop_th = meta_go_sync,
 	.go_type = LM_TYPE_META,
 };
 
@@ -454,7 +452,6 @@ const struct gfs2_glock_operations gfs2_inode_glops = {
 	.go_xmote_th = inode_go_xmote_th,
 	.go_xmote_bh = inode_go_xmote_bh,
 	.go_drop_th = inode_go_drop_th,
-	.go_sync = inode_go_sync,
 	.go_inval = inode_go_inval,
 	.go_demote_ok = inode_go_demote_ok,
 	.go_lock = inode_go_lock,
@@ -463,9 +460,6 @@ const struct gfs2_glock_operations gfs2_inode_glops = {
 };
 
 const struct gfs2_glock_operations gfs2_rgrp_glops = {
-	.go_xmote_th = gfs2_glock_xmote_th,
-	.go_drop_th = gfs2_glock_drop_th,
-	.go_sync = meta_go_sync,
 	.go_inval = meta_go_inval,
 	.go_demote_ok = rgrp_go_demote_ok,
 	.go_lock = rgrp_go_lock,
@@ -481,33 +475,23 @@ const struct gfs2_glock_operations gfs2_trans_glops = {
 };
 
 const struct gfs2_glock_operations gfs2_iopen_glops = {
-	.go_xmote_th = gfs2_glock_xmote_th,
-	.go_drop_th = gfs2_glock_drop_th,
 	.go_type = LM_TYPE_IOPEN,
 };
 
 const struct gfs2_glock_operations gfs2_flock_glops = {
-	.go_xmote_th = gfs2_glock_xmote_th,
-	.go_drop_th = gfs2_glock_drop_th,
 	.go_type = LM_TYPE_FLOCK,
 };
 
 const struct gfs2_glock_operations gfs2_nondisk_glops = {
-	.go_xmote_th = gfs2_glock_xmote_th,
-	.go_drop_th = gfs2_glock_drop_th,
 	.go_type = LM_TYPE_NONDISK,
 };
 
 const struct gfs2_glock_operations gfs2_quota_glops = {
-	.go_xmote_th = gfs2_glock_xmote_th,
-	.go_drop_th = gfs2_glock_drop_th,
 	.go_demote_ok = quota_go_demote_ok,
 	.go_type = LM_TYPE_QUOTA,
 };
 
 const struct gfs2_glock_operations gfs2_journal_glops = {
-	.go_xmote_th = gfs2_glock_xmote_th,
-	.go_drop_th = gfs2_glock_drop_th,
 	.go_type = LM_TYPE_JOURNAL,
 };
 
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 1acbcc2415ee..12c80fd28db5 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -101,11 +101,10 @@ struct gfs2_bufdata {
 };
 
 struct gfs2_glock_operations {
-	void (*go_xmote_th) (struct gfs2_glock *gl, unsigned int state, int flags);
+	void (*go_xmote_th) (struct gfs2_glock *gl);
 	void (*go_xmote_bh) (struct gfs2_glock *gl);
 	void (*go_drop_th) (struct gfs2_glock *gl);
 	void (*go_drop_bh) (struct gfs2_glock *gl);
-	void (*go_sync) (struct gfs2_glock *gl);
 	void (*go_inval) (struct gfs2_glock *gl, int flags);
 	int (*go_demote_ok) (struct gfs2_glock *gl);
 	int (*go_lock) (struct gfs2_holder *gh);
-- 
cgit v1.2.1


From bd44e2b007bc9024bce3357c185b38c73f87c3dd Mon Sep 17 00:00:00 2001
From: Patrick Caulfield <pcaulfie@redhat.com>
Date: Mon, 22 Jan 2007 14:51:33 +0000
Subject: [DLM] fix lowcomms receiving

This patch fixes a bug whereby data on a newly accepted connection would be
ignored if it arrived soon after the accept.

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/dlm/lowcomms-tcp.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

(limited to 'fs')

diff --git a/fs/dlm/lowcomms-tcp.c b/fs/dlm/lowcomms-tcp.c
index 6e27201f2f95..8e6a76cf1805 100644
--- a/fs/dlm/lowcomms-tcp.c
+++ b/fs/dlm/lowcomms-tcp.c
@@ -327,6 +327,9 @@ static int receive_from_sock(struct connection *con)
 
 	if (ret <= 0)
 		goto out_close;
+	if (ret == -EAGAIN)
+		goto out_resched;
+
 	if (ret == len)
 		call_again_soon = 1;
 	cbuf_add(&con->cb, ret);
@@ -359,8 +362,7 @@ out_resched:
 	if (!test_and_set_bit(CF_READ_PENDING, &con->flags))
 		queue_work(recv_workqueue, &con->rwork);
 	up_read(&con->sock_sem);
-	cond_resched();
-	return 0;
+	return -EAGAIN;
 
 out_close:
 	up_read(&con->sock_sem);
@@ -381,6 +383,7 @@ static int accept_from_sock(struct connection *con)
 	int len;
 	int nodeid;
 	struct connection *newcon;
+	struct connection *addcon;
 
 	memset(&peeraddr, 0, sizeof(peeraddr));
 	result = sock_create_kern(dlm_local_addr.ss_family, SOCK_STREAM,
@@ -454,12 +457,13 @@ static int accept_from_sock(struct connection *con)
 		othercon->sock = newsock;
 		newsock->sk->sk_user_data = othercon;
 		add_sock(newsock, othercon);
+		addcon = othercon;
 	}
 	else {
 		newsock->sk->sk_user_data = newcon;
 		newcon->rx_action = receive_from_sock;
 		add_sock(newsock, newcon);
-
+		addcon = newcon;
 	}
 
 	up_write(&newcon->sock_sem);
@@ -469,8 +473,8 @@ static int accept_from_sock(struct connection *con)
 	 * beween processing the accept adding the socket
 	 * to the read_sockets list
 	 */
-	if (!test_and_set_bit(CF_READ_PENDING, &newcon->flags))
-		queue_work(recv_workqueue, &newcon->rwork);
+	if (!test_and_set_bit(CF_READ_PENDING, &addcon->flags))
+		queue_work(recv_workqueue, &addcon->rwork);
 	up_read(&con->sock_sem);
 
 	return 0;
@@ -610,8 +614,7 @@ static struct socket *create_listen_sock(struct connection *con,
 
 	result = sock->ops->listen(sock, 5);
 	if (result < 0) {
-		printk("dlm: Can't listen on port %d\n",
-		       dlm_config.ci_tcp_port);
+		printk("dlm: Can't listen on port %d\n", dlm_config.ci_tcp_port);
 		sock_release(sock);
 		sock = NULL;
 		goto create_out;
@@ -811,7 +814,7 @@ send_error:
 
 out_connect:
 	up_read(&con->sock_sem);
-	lowcomms_connect_sock(con);
+	connect_to_sock(con);
 	return;
 }
 
@@ -873,9 +876,8 @@ static void process_send_sockets(struct work_struct *work)
 		connect_to_sock(con);
 	}
 
-	if (test_and_clear_bit(CF_WRITE_PENDING, &con->flags)) {
-		send_to_sock(con);
-	}
+	clear_bit(CF_WRITE_PENDING, &con->flags);
+	send_to_sock(con);
 }
 
 
-- 
cgit v1.2.1


From 12132933c4fdeb458195a9388287d550c8476edf Mon Sep 17 00:00:00 2001
From: Steven Whitehouse <swhiteho@redhat.com>
Date: Mon, 22 Jan 2007 13:09:04 -0500
Subject: [GFS2] Remove queue_empty() function

This function is not longer required since we do not do recursive
locking in the glock layer. As a result all its callers can be
replaceed with list_empty() calls.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/gfs2/glock.c | 38 +++++++-------------------------------
 1 file changed, 7 insertions(+), 31 deletions(-)

(limited to 'fs')

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 5b772bb0210f..1509481b8ca6 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -210,30 +210,6 @@ out:
 	return rv;
 }
 
-/**
- * queue_empty - check to see if a glock's queue is empty
- * @gl: the glock
- * @head: the head of the queue to check
- *
- * This function protects the list in the event that a process already
- * has a holder on the list and is adding a second holder for itself.
- * The glmutex lock is what generally prevents processes from working
- * on the same glock at once, but the special case of adding a second
- * holder for yourself ("recursive" locking) doesn't involve locking
- * glmutex, making the spin lock necessary.
- *
- * Returns: 1 if the queue is empty
- */
-
-static inline int queue_empty(struct gfs2_glock *gl, struct list_head *head)
-{
-	int empty;
-	spin_lock(&gl->gl_spin);
-	empty = list_empty(head);
-	spin_unlock(&gl->gl_spin);
-	return empty;
-}
-
 /**
  * search_bucket() - Find struct gfs2_glock by lock number
  * @bucket: the bucket to search
@@ -814,7 +790,7 @@ static void xmote_bh(struct gfs2_glock *gl, unsigned int ret)
 	int op_done = 1;
 
 	gfs2_assert_warn(sdp, test_bit(GLF_LOCK, &gl->gl_flags));
-	gfs2_assert_warn(sdp, queue_empty(gl, &gl->gl_holders));
+	gfs2_assert_warn(sdp, list_empty(&gl->gl_holders));
 	gfs2_assert_warn(sdp, !(ret & LM_OUT_ASYNC));
 
 	state_change(gl, ret & LM_OUT_ST_MASK);
@@ -925,7 +901,7 @@ void gfs2_glock_xmote_th(struct gfs2_holder *gh)
 		glops->go_xmote_th(gl);
 
 	gfs2_assert_warn(sdp, test_bit(GLF_LOCK, &gl->gl_flags));
-	gfs2_assert_warn(sdp, queue_empty(gl, &gl->gl_holders));
+	gfs2_assert_warn(sdp, list_empty(&gl->gl_holders));
 	gfs2_assert_warn(sdp, state != LM_ST_UNLOCKED);
 	gfs2_assert_warn(sdp, state != gl->gl_state);
 
@@ -960,7 +936,7 @@ static void drop_bh(struct gfs2_glock *gl, unsigned int ret)
 	struct gfs2_holder *gh = gl->gl_req_gh;
 
 	gfs2_assert_warn(sdp, test_bit(GLF_LOCK, &gl->gl_flags));
-	gfs2_assert_warn(sdp, queue_empty(gl, &gl->gl_holders));
+	gfs2_assert_warn(sdp, list_empty(&gl->gl_holders));
 	gfs2_assert_warn(sdp, !ret);
 
 	state_change(gl, LM_ST_UNLOCKED);
@@ -1007,7 +983,7 @@ static void gfs2_glock_drop_th(struct gfs2_glock *gl)
 		glops->go_drop_th(gl);
 
 	gfs2_assert_warn(sdp, test_bit(GLF_LOCK, &gl->gl_flags));
-	gfs2_assert_warn(sdp, queue_empty(gl, &gl->gl_holders));
+	gfs2_assert_warn(sdp, list_empty(&gl->gl_holders));
 	gfs2_assert_warn(sdp, gl->gl_state != LM_ST_UNLOCKED);
 
 	gfs2_glock_hold(gl);
@@ -1697,7 +1673,7 @@ void gfs2_reclaim_glock(struct gfs2_sbd *sdp)
 	atomic_inc(&sdp->sd_reclaimed);
 
 	if (gfs2_glmutex_trylock(gl)) {
-		if (queue_empty(gl, &gl->gl_holders) &&
+		if (list_empty(&gl->gl_holders) &&
 		    gl->gl_state != LM_ST_UNLOCKED && demote_ok(gl))
 			handle_callback(gl, LM_ST_UNLOCKED);
 		gfs2_glmutex_unlock(gl);
@@ -1761,7 +1737,7 @@ static void scan_glock(struct gfs2_glock *gl)
 		return;
 
 	if (gfs2_glmutex_trylock(gl)) {
-		if (queue_empty(gl, &gl->gl_holders) &&
+		if (list_empty(&gl->gl_holders) &&
 		    gl->gl_state != LM_ST_UNLOCKED && demote_ok(gl))
 			goto out_schedule;
 		gfs2_glmutex_unlock(gl);
@@ -1810,7 +1786,7 @@ static void clear_glock(struct gfs2_glock *gl)
 	}
 
 	if (gfs2_glmutex_trylock(gl)) {
-		if (queue_empty(gl, &gl->gl_holders) &&
+		if (list_empty(gl, &gl->gl_holders) &&
 		    gl->gl_state != LM_ST_UNLOCKED)
 			handle_callback(gl, LM_ST_UNLOCKED);
 		gfs2_glmutex_unlock(gl);
-- 
cgit v1.2.1


From 90101c31867b7acc44286b425d50e1042aa55b8d Mon Sep 17 00:00:00 2001
From: Steven Whitehouse <swhiteho@redhat.com>
Date: Tue, 23 Jan 2007 13:20:41 -0500
Subject: [GFS2] Compile fix for glock.c

This one liner got missed from the previous patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/gfs2/glock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'fs')

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 1509481b8ca6..f68582ddaa5c 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1786,7 +1786,7 @@ static void clear_glock(struct gfs2_glock *gl)
 	}
 
 	if (gfs2_glmutex_trylock(gl)) {
-		if (list_empty(gl, &gl->gl_holders) &&
+		if (list_empty(&gl->gl_holders) &&
 		    gl->gl_state != LM_ST_UNLOCKED)
 			handle_callback(gl, LM_ST_UNLOCKED);
 		gfs2_glmutex_unlock(gl);
-- 
cgit v1.2.1


From ddfe0627838ca0c0e8babb0dd2bd7f4b35e25bff Mon Sep 17 00:00:00 2001
From: Eric Sandeen <sandeen@redhat.com>
Date: Thu, 18 Jan 2007 16:41:23 -0600
Subject: [GFS2] use CURRENT_TIME_SEC instead of get_seconds in gfs2

I was looking something else up and came across this...

I don't honestly have a good reason to change it other than to make it
like every other Linux filesystem in this regard.  ;-)  It doesn't
functionally change anything, but makes some lines shorter. :)

I'm also curious; why does gfs2 have 64-bits of on-disk timestamps, but
not in timespec_t format, and only stores second resolutions?  Seems like
you're halfway to sub-second resolutions already.

I suppose if that gets implemented then all of the below should
instead be CURRENT_TIME not CURRENT_TIME_SEC.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/gfs2/bmap.c      | 10 +++++-----
 fs/gfs2/dir.c       | 10 +++++-----
 fs/gfs2/inode.c     |  2 +-
 fs/gfs2/ops_inode.c |  2 +-
 4 files changed, 12 insertions(+), 12 deletions(-)

(limited to 'fs')

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 8240c1ff94f4..113f6c9110c7 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -773,7 +773,7 @@ static int do_strip(struct gfs2_inode *ip, struct buffer_head *dibh,
 			gfs2_free_data(ip, bstart, blen);
 	}
 
-	ip->i_inode.i_mtime.tv_sec = ip->i_inode.i_ctime.tv_sec = get_seconds();
+	ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME_SEC;
 
 	gfs2_dinode_out(ip, dibh->b_data);
 
@@ -848,7 +848,7 @@ static int do_grow(struct gfs2_inode *ip, u64 size)
 	}
 
 	ip->i_di.di_size = size;
-	ip->i_inode.i_mtime.tv_sec = ip->i_inode.i_ctime.tv_sec = get_seconds();
+	ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME_SEC;
 
 	error = gfs2_meta_inode_buffer(ip, &dibh);
 	if (error)
@@ -963,7 +963,7 @@ static int trunc_start(struct gfs2_inode *ip, u64 size)
 
 	if (gfs2_is_stuffed(ip)) {
 		ip->i_di.di_size = size;
-		ip->i_inode.i_mtime.tv_sec = ip->i_inode.i_ctime.tv_sec = get_seconds();
+		ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME_SEC;
 		gfs2_trans_add_bh(ip->i_gl, dibh, 1);
 		gfs2_dinode_out(ip, dibh->b_data);
 		gfs2_buffer_clear_tail(dibh, sizeof(struct gfs2_dinode) + size);
@@ -975,7 +975,7 @@ static int trunc_start(struct gfs2_inode *ip, u64 size)
 
 		if (!error) {
 			ip->i_di.di_size = size;
-			ip->i_inode.i_mtime.tv_sec = ip->i_inode.i_ctime.tv_sec = get_seconds();
+			ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME_SEC;
 			ip->i_di.di_flags |= GFS2_DIF_TRUNC_IN_PROG;
 			gfs2_trans_add_bh(ip->i_gl, dibh, 1);
 			gfs2_dinode_out(ip, dibh->b_data);
@@ -1048,7 +1048,7 @@ static int trunc_end(struct gfs2_inode *ip)
 			ip->i_num.no_addr;
 		gfs2_buffer_clear_tail(dibh, sizeof(struct gfs2_dinode));
 	}
-	ip->i_inode.i_mtime.tv_sec = ip->i_inode.i_ctime.tv_sec = get_seconds();
+	ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME_SEC;
 	ip->i_di.di_flags &= ~GFS2_DIF_TRUNC_IN_PROG;
 
 	gfs2_trans_add_bh(ip->i_gl, dibh, 1);
diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
index 0eceb051084f..c93ca8f361b5 100644
--- a/fs/gfs2/dir.c
+++ b/fs/gfs2/dir.c
@@ -131,7 +131,7 @@ static int gfs2_dir_write_stuffed(struct gfs2_inode *ip, const char *buf,
 	memcpy(dibh->b_data + offset + sizeof(struct gfs2_dinode), buf, size);
 	if (ip->i_di.di_size < offset + size)
 		ip->i_di.di_size = offset + size;
-	ip->i_inode.i_mtime.tv_sec = ip->i_inode.i_ctime.tv_sec = get_seconds();
+	ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME_SEC;
 	gfs2_dinode_out(ip, dibh->b_data);
 
 	brelse(dibh);
@@ -229,7 +229,7 @@ out:
 
 	if (ip->i_di.di_size < offset + copied)
 		ip->i_di.di_size = offset + copied;
-	ip->i_inode.i_mtime.tv_sec = ip->i_inode.i_ctime.tv_sec = get_seconds();
+	ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME_SEC;
 
 	gfs2_trans_add_bh(ip->i_gl, dibh, 1);
 	gfs2_dinode_out(ip, dibh->b_data);
@@ -1565,7 +1565,7 @@ int gfs2_dir_add(struct inode *inode, const struct qstr *name,
 				break;
 			gfs2_trans_add_bh(ip->i_gl, bh, 1);
 			ip->i_di.di_entries++;
-			ip->i_inode.i_mtime.tv_sec = ip->i_inode.i_ctime.tv_sec = get_seconds();
+			ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME_SEC;
 			gfs2_dinode_out(ip, bh->b_data);
 			brelse(bh);
 			error = 0;
@@ -1651,7 +1651,7 @@ int gfs2_dir_del(struct gfs2_inode *dip, const struct qstr *name)
 		gfs2_consist_inode(dip);
 	gfs2_trans_add_bh(dip->i_gl, bh, 1);
 	dip->i_di.di_entries--;
-	dip->i_inode.i_mtime.tv_sec = dip->i_inode.i_ctime.tv_sec = get_seconds();
+	dip->i_inode.i_mtime = dip->i_inode.i_ctime = CURRENT_TIME_SEC;
 	gfs2_dinode_out(dip, bh->b_data);
 	brelse(bh);
 	mark_inode_dirty(&dip->i_inode);
@@ -1699,7 +1699,7 @@ int gfs2_dir_mvino(struct gfs2_inode *dip, const struct qstr *filename,
 		gfs2_trans_add_bh(dip->i_gl, bh, 1);
 	}
 
-	dip->i_inode.i_mtime.tv_sec = dip->i_inode.i_ctime.tv_sec = get_seconds();
+	dip->i_inode.i_mtime = dip->i_inode.i_ctime = CURRENT_TIME_SEC;
 	gfs2_dinode_out(dip, bh->b_data);
 	brelse(bh);
 	return 0;
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 260316954ad7..f7c8d31ce41a 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -357,7 +357,7 @@ int gfs2_change_nlink(struct gfs2_inode *ip, int diff)
 	else
 		drop_nlink(&ip->i_inode);
 
-	ip->i_inode.i_ctime.tv_sec = get_seconds();
+	ip->i_inode.i_ctime = CURRENT_TIME_SEC;
 
 	gfs2_trans_add_bh(ip->i_gl, dibh, 1);
 	gfs2_dinode_out(ip, dibh->b_data);
diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c
index b2a12f44f59d..747c7316f5cc 100644
--- a/fs/gfs2/ops_inode.c
+++ b/fs/gfs2/ops_inode.c
@@ -728,7 +728,7 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry,
 		error = gfs2_meta_inode_buffer(ip, &dibh);
 		if (error)
 			goto out_end_trans;
-		ip->i_inode.i_ctime.tv_sec = get_seconds();
+		ip->i_inode.i_ctime = CURRENT_TIME_SEC;
 		gfs2_trans_add_bh(ip->i_gl, dibh, 1);
 		gfs2_dinode_out(ip, dibh->b_data);
 		brelse(dibh);
-- 
cgit v1.2.1


From d043e1900c97f7282b71844c8530279913b6ec5a Mon Sep 17 00:00:00 2001
From: Steven Whitehouse <swhiteho@redhat.com>
Date: Tue, 23 Jan 2007 16:56:36 -0500
Subject: [GFS2] Fix typo in glock.c

This is a one letter typo fix in glock.c, spotted by Rob Kenna.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/gfs2/glock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'fs')

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index f68582ddaa5c..c070ede531c5 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -482,7 +482,7 @@ static int rq_mutex(struct gfs2_holder *gh)
 	list_del_init(&gh->gh_list);
 	/*  gh->gh_error never examined.  */
 	set_bit(GLF_LOCK, &gl->gl_flags);
-	clear_bit(HIF_WAIT, &gh->gh_flags);
+	clear_bit(HIF_WAIT, &gh->gh_iflags);
 	smp_mb();
 	wake_up_bit(&gh->gh_iflags, HIF_WAIT);
 
-- 
cgit v1.2.1


From f1f1c1ccf7848a6e25db30ee9216e1a1e7eb6bef Mon Sep 17 00:00:00 2001
From: Patrick Caulfield <pcaulfie@redhat.com>
Date: Wed, 24 Jan 2007 11:17:59 +0000
Subject: [DLM] Make sock_sem into a mutex

Now that there can be multiple dlm_recv threads running we need to prevent two
recvs running for the same connection - it's unlikely but it can happen and it
causes message corruption.

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/dlm/lowcomms-tcp.c | 44 ++++++++++++++++++++++----------------------
 1 file changed, 22 insertions(+), 22 deletions(-)

(limited to 'fs')

diff --git a/fs/dlm/lowcomms-tcp.c b/fs/dlm/lowcomms-tcp.c
index 8e6a76cf1805..18ade44287e7 100644
--- a/fs/dlm/lowcomms-tcp.c
+++ b/fs/dlm/lowcomms-tcp.c
@@ -96,7 +96,7 @@ static bool cbuf_empty(struct cbuf *cb)
 struct connection {
 	struct socket *sock;	/* NULL if not connected */
 	uint32_t nodeid;	/* So we know who we are in the list */
-	struct rw_semaphore sock_sem; /* Stop connect races */
+	struct mutex sock_mutex;
 	unsigned long flags;	/* bit 1,2 = We are on the read/write lists */
 #define CF_READ_PENDING 1
 #define CF_WRITE_PENDING 2
@@ -171,7 +171,7 @@ static struct connection *nodeid2con(int nodeid, gfp_t allocation)
 			goto finish;
 
 		con->nodeid = nodeid;
-		init_rwsem(&con->sock_sem);
+		mutex_init(&con->sock_mutex);
 		INIT_LIST_HEAD(&con->writequeue);
 		spin_lock_init(&con->writequeue_lock);
 		INIT_WORK(&con->swork, process_send_sockets);
@@ -247,7 +247,7 @@ static void make_sockaddr(struct sockaddr_storage *saddr, uint16_t port,
 /* Close a remote connection and tidy up */
 static void close_connection(struct connection *con, bool and_other)
 {
-	down_write(&con->sock_sem);
+	mutex_lock(&con->sock_mutex);
 
 	if (con->sock) {
 		sock_release(con->sock);
@@ -262,7 +262,7 @@ static void close_connection(struct connection *con, bool and_other)
 		con->rx_page = NULL;
 	}
 	con->retries = 0;
-	up_write(&con->sock_sem);
+	mutex_unlock(&con->sock_mutex);
 }
 
 /* Data received from remote end */
@@ -276,7 +276,7 @@ static int receive_from_sock(struct connection *con)
 	int r;
 	int call_again_soon = 0;
 
-	down_read(&con->sock_sem);
+	mutex_lock(&con->sock_mutex);
 
 	if (con->sock == NULL)
 		goto out;
@@ -355,17 +355,17 @@ static int receive_from_sock(struct connection *con)
 out:
 	if (call_again_soon)
 		goto out_resched;
-	up_read(&con->sock_sem);
+	mutex_unlock(&con->sock_mutex);
 	return 0;
 
 out_resched:
 	if (!test_and_set_bit(CF_READ_PENDING, &con->flags))
 		queue_work(recv_workqueue, &con->rwork);
-	up_read(&con->sock_sem);
+	mutex_unlock(&con->sock_mutex);
 	return -EAGAIN;
 
 out_close:
-	up_read(&con->sock_sem);
+	mutex_unlock(&con->sock_mutex);
 	if (ret != -EAGAIN && !test_bit(CF_IS_OTHERCON, &con->flags)) {
 		close_connection(con, false);
 		/* Reconnect when there is something to send */
@@ -391,7 +391,7 @@ static int accept_from_sock(struct connection *con)
 	if (result < 0)
 		return -ENOMEM;
 
-	down_read_nested(&con->sock_sem, 0);
+	mutex_lock_nested(&con->sock_mutex, 0);
 
 	result = -ENOTCONN;
 	if (con->sock == NULL)
@@ -417,7 +417,7 @@ static int accept_from_sock(struct connection *con)
 	if (dlm_addr_to_nodeid(&peeraddr, &nodeid)) {
 		printk("dlm: connect from non cluster node\n");
 		sock_release(newsock);
-		up_read(&con->sock_sem);
+		mutex_unlock(&con->sock_mutex);
 		return -1;
 	}
 
@@ -434,7 +434,7 @@ static int accept_from_sock(struct connection *con)
 		result = -ENOMEM;
 		goto accept_err;
 	}
-	down_write_nested(&newcon->sock_sem, 1);
+	mutex_lock_nested(&newcon->sock_mutex, 1);
 	if (newcon->sock) {
 		struct connection *othercon = newcon->othercon;
 
@@ -442,13 +442,13 @@ static int accept_from_sock(struct connection *con)
 			othercon = kmem_cache_zalloc(con_cache, GFP_KERNEL);
 			if (!othercon) {
 				printk("dlm: failed to allocate incoming socket\n");
-				up_write(&newcon->sock_sem);
+				mutex_unlock(&newcon->sock_mutex);
 				result = -ENOMEM;
 				goto accept_err;
 			}
 			othercon->nodeid = nodeid;
 			othercon->rx_action = receive_from_sock;
-			init_rwsem(&othercon->sock_sem);
+			mutex_init(&othercon->sock_mutex);
 			INIT_WORK(&othercon->swork, process_send_sockets);
 			INIT_WORK(&othercon->rwork, process_recv_sockets);
 			set_bit(CF_IS_OTHERCON, &othercon->flags);
@@ -466,7 +466,7 @@ static int accept_from_sock(struct connection *con)
 		addcon = newcon;
 	}
 
-	up_write(&newcon->sock_sem);
+	mutex_unlock(&newcon->sock_mutex);
 
 	/*
 	 * Add it to the active queue in case we got data
@@ -475,12 +475,12 @@ static int accept_from_sock(struct connection *con)
 	 */
 	if (!test_and_set_bit(CF_READ_PENDING, &addcon->flags))
 		queue_work(recv_workqueue, &addcon->rwork);
-	up_read(&con->sock_sem);
+	mutex_unlock(&con->sock_mutex);
 
 	return 0;
 
 accept_err:
-	up_read(&con->sock_sem);
+	mutex_unlock(&con->sock_mutex);
 	sock_release(newsock);
 
 	if (result != -EAGAIN)
@@ -501,7 +501,7 @@ static void connect_to_sock(struct connection *con)
 		return;
 	}
 
-	down_write(&con->sock_sem);
+	mutex_lock(&con->sock_mutex);
 	if (con->retries++ > MAX_CONNECT_RETRIES)
 		goto out;
 
@@ -553,7 +553,7 @@ out_err:
 		result = 0;
 	}
 out:
-	up_write(&con->sock_sem);
+	mutex_unlock(&con->sock_mutex);
 	return;
 }
 
@@ -757,7 +757,7 @@ static void send_to_sock(struct connection *con)
 	struct writequeue_entry *e;
 	int len, offset;
 
-	down_read(&con->sock_sem);
+	mutex_lock(&con->sock_mutex);
 	if (con->sock == NULL)
 		goto out_connect;
 
@@ -803,17 +803,17 @@ static void send_to_sock(struct connection *con)
 	}
 	spin_unlock(&con->writequeue_lock);
 out:
-	up_read(&con->sock_sem);
+	mutex_unlock(&con->sock_mutex);
 	return;
 
 send_error:
-	up_read(&con->sock_sem);
+	mutex_unlock(&con->sock_mutex);
 	close_connection(con, false);
 	lowcomms_connect_sock(con);
 	return;
 
 out_connect:
-	up_read(&con->sock_sem);
+	mutex_unlock(&con->sock_mutex);
 	connect_to_sock(con);
 	return;
 }
-- 
cgit v1.2.1


From 8fd3a98f2c22982aff4d29e4ee72959d3032c123 Mon Sep 17 00:00:00 2001
From: David Teigland <teigland@redhat.com>
Date: Wed, 24 Jan 2007 10:11:45 -0600
Subject: [DLM] saved dlm message can be dropped

dlm_receive_message() returns 0 instead of returning 'error'.  What would
happen is that process_requestqueue would take a saved message off the
requestqueue and call receive_message on it.  receive_message would then
see that recovery had been aborted, set error to EINTR, and 'goto out',
expecting that the error would be returned.  Instead, 0 was always
returned, so process_requestqueue would think that the message had been
processed and delete it instead of saving it to process next time.  This
means the message (usually an unlock in my tests) would be lost.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/dlm/lock.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'fs')

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 6ad2b8eb96a5..7c7ac2aaa8b1 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -3018,7 +3018,7 @@ int dlm_receive_message(struct dlm_header *hd, int nodeid, int recovery)
 {
 	struct dlm_message *ms = (struct dlm_message *) hd;
 	struct dlm_ls *ls;
-	int error;
+	int error = 0;
 
 	if (!recovery)
 		dlm_message_in(ms);
@@ -3135,7 +3135,7 @@ int dlm_receive_message(struct dlm_header *hd, int nodeid, int recovery)
  out:
 	dlm_put_lockspace(ls);
 	dlm_astd_wake();
-	return 0;
+	return error;
 }
 
 
-- 
cgit v1.2.1


From b790c3b7c38aae28c497bb363a6fe72f7c96568f Mon Sep 17 00:00:00 2001
From: David Teigland <teigland@redhat.com>
Date: Wed, 24 Jan 2007 10:21:33 -0600
Subject: [DLM] can miss clearing resend flag

A long, complicated sequence of events, beginning with the RESEND flag not
being cleared on an lkb, can result in an unlock never completing.

- lkb on waiters list for remote lookup
- the remote node is both the dir node and the master node, so
  it optimizes the lookup into a request and sends a request
  reply back
- the request reply is saved on the requestqueue to be processed
  after recovery
- recovery runs dlm_recover_waiters_pre() which sets RESEND flag
  so the lookup will be resent after recovery
- end of recovery: process_requestqueue takes saved request reply
  which removes the lkb off the waitesr list, _without_ clearing
  the RESEND flag
- end of recovery: dlm_recover_waiters_post() doesn't do anything
  with the now completed lookup lkb (would usually clear RESEND)
- later, the node unmounts, unlocks this lkb that still has RESEND
  flag set
- the lkb is on the waiters list again, now for unlock, when recovery
  occurs, dlm_recover_waiters_pre() shows the lkb for unlock with RESEND
  set, doesn't do anything since the master still exists
- end of recovery: dlm_recover_waiters_post() takes this lkb off
  the waiters list because it has the RESEND flag set, then reports
  an error because unlocks are never supposed to be handled in
  recover_waiters_post().
- later, the unlock reply is received, doesn't find the lkb on
  the waiters list because recover_waiters_post() has wrongly
  removed it.
- the unlock operation has been lost, and we're left with a
  stray granted lock
- unmount spins waiting for the unlock to complete

The visible evidence of this problem will be a node where gfs umount is
spinning, the dlm waiters list will be empty, and the dlm locks list will
show a granted lock.

The fix is simply to clear the RESEND flag when taking an lkb off the
waiters list.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/dlm/lock.c | 6 ++++++
 1 file changed, 6 insertions(+)

(limited to 'fs')

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 7c7ac2aaa8b1..c10257f10b90 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -754,6 +754,11 @@ static void add_to_waiters(struct dlm_lkb *lkb, int mstype)
 	mutex_unlock(&ls->ls_waiters_mutex);
 }
 
+/* We clear the RESEND flag because we might be taking an lkb off the waiters
+   list as part of process_requestqueue (e.g. a lookup that has an optimized
+   request reply on the requestqueue) between dlm_recover_waiters_pre() which
+   set RESEND and dlm_recover_waiters_post() */
+
 static int _remove_from_waiters(struct dlm_lkb *lkb)
 {
 	int error = 0;
@@ -764,6 +769,7 @@ static int _remove_from_waiters(struct dlm_lkb *lkb)
 		goto out;
 	}
 	lkb->lkb_wait_type = 0;
+	lkb->lkb_flags &= ~DLM_IFL_RESEND;
 	list_del(&lkb->lkb_wait_reply);
 	unhold_lkb(lkb);
  out:
-- 
cgit v1.2.1


From d7c103d0bd29c94f78155a4538faf314e49d9713 Mon Sep 17 00:00:00 2001
From: Steven Whitehouse <swhiteho@redhat.com>
Date: Thu, 25 Jan 2007 17:14:59 +0000
Subject: [GFS2] Fix recursive locking attempt with NFS

In certain cases, its possible for NFS to call the lookup code while
holding the glock (when doing a readdirplus operation) so we need to
check for that and not try and lock the glock twice. This also fixes a
typo in a previous NFS related GFS2 patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/gfs2/inode.c     | 23 +++++++++++++++--------
 fs/gfs2/ops_inode.c |  2 +-
 2 files changed, 16 insertions(+), 9 deletions(-)

(limited to 'fs')

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index f7c8d31ce41a..88fcfb4f5c4d 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -395,8 +395,10 @@ struct inode *gfs2_lookup_simple(struct inode *dip, const char *name)
  * @is_root: If 1, ignore the caller's permissions
  * @i_gh: An uninitialized holder for the new inode glock
  *
- * There will always be a vnode (Linux VFS inode) for the d_gh inode unless
- * @is_root is true.
+ * This can be called via the VFS filldir function when NFS is doing
+ * a readdirplus and the inode which its intending to stat isn't
+ * already in cache. In this case we must not take the directory glock
+ * again, since the readdir call will have already taken that lock.
  *
  * Returns: errno
  */
@@ -409,8 +411,9 @@ struct inode *gfs2_lookupi(struct inode *dir, const struct qstr *name,
 	struct gfs2_holder d_gh;
 	struct gfs2_inum_host inum;
 	unsigned int type;
-	int error = 0;
+	int error;
 	struct inode *inode = NULL;
+	int unlock = 0;
 
 	if (!name->len || name->len > GFS2_FNAMESIZE)
 		return ERR_PTR(-ENAMETOOLONG);
@@ -422,9 +425,12 @@ struct inode *gfs2_lookupi(struct inode *dir, const struct qstr *name,
 		return dir;
 	}
 
-	error = gfs2_glock_nq_init(dip->i_gl, LM_ST_SHARED, 0, &d_gh);
-	if (error)
-		return ERR_PTR(error);
+	if (gfs2_glock_is_locked_by_me(dip->i_gl) == 0) {
+		error = gfs2_glock_nq_init(dip->i_gl, LM_ST_SHARED, 0, &d_gh);
+		if (error)
+			return ERR_PTR(error);
+		unlock = 1;
+	}
 
 	if (!is_root) {
 		error = permission(dir, MAY_EXEC, NULL);
@@ -439,10 +445,11 @@ struct inode *gfs2_lookupi(struct inode *dir, const struct qstr *name,
 	inode = gfs2_inode_lookup(sb, &inum, type);
 
 out:
-	gfs2_glock_dq_uninit(&d_gh);
+	if (unlock)
+		gfs2_glock_dq_uninit(&d_gh);
 	if (error == -ENOENT)
 		return NULL;
-	return inode;
+	return inode ? inode : ERR_PTR(error);
 }
 
 static int pick_formal_ino_1(struct gfs2_sbd *sdp, u64 *formal_ino)
diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c
index 747c7316f5cc..5591f8905cf7 100644
--- a/fs/gfs2/ops_inode.c
+++ b/fs/gfs2/ops_inode.c
@@ -1018,7 +1018,7 @@ static int gfs2_getattr(struct vfsmount *mnt, struct dentry *dentry,
 	}
 
 	generic_fillattr(inode, stat);
-	if (unlock);
+	if (unlock)
 		gfs2_glock_dq_uninit(&gh);
 
 	return 0;
-- 
cgit v1.2.1


From 8bd9572769767c6fd164cff4e1202df12cb34b4a Mon Sep 17 00:00:00 2001
From: Steven Whitehouse <swhiteho@redhat.com>
Date: Thu, 25 Jan 2007 10:04:20 +0000
Subject: [GFS2] Fix list corruption in lops.c

The patch below appears to fix the list corruption that we are seeing on
occasion. Although the transaction structure is private to a single
thread, when the queued structures are dismantled during an in-core
commit, its possible for a different thread to be trying to add the same
structure to another, new, transaction at the same time.

To avoid this, this patch takes the log spinlock during this operation.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/gfs2/lops.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

(limited to 'fs')

diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index 4d7f94d8c7bd..16bb4b4561ae 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -69,13 +69,16 @@ static void buf_lo_add(struct gfs2_sbd *sdp, struct gfs2_log_element *le)
 	struct gfs2_bufdata *bd = container_of(le, struct gfs2_bufdata, bd_le);
 	struct gfs2_trans *tr;
 
-	if (!list_empty(&bd->bd_list_tr))
+	gfs2_log_lock(sdp);
+	if (!list_empty(&bd->bd_list_tr)) {
+		gfs2_log_unlock(sdp);
 		return;
-
+	}
 	tr = current->journal_info;
 	tr->tr_touched = 1;
 	tr->tr_num_buf++;
 	list_add(&bd->bd_list_tr, &tr->tr_list_buf);
+	gfs2_log_unlock(sdp);
 
 	if (!list_empty(&le->le_list))
 		return;
@@ -84,7 +87,6 @@ static void buf_lo_add(struct gfs2_sbd *sdp, struct gfs2_log_element *le)
 
 	gfs2_meta_check(sdp, bd->bd_bh);
 	gfs2_pin(sdp, bd->bd_bh);
-
 	gfs2_log_lock(sdp);
 	sdp->sd_log_num_buf++;
 	list_add(&le->le_list, &sdp->sd_log_le_buf);
@@ -98,11 +100,13 @@ static void buf_lo_incore_commit(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
 	struct list_head *head = &tr->tr_list_buf;
 	struct gfs2_bufdata *bd;
 
+	gfs2_log_lock(sdp);
 	while (!list_empty(head)) {
 		bd = list_entry(head->next, struct gfs2_bufdata, bd_list_tr);
 		list_del_init(&bd->bd_list_tr);
 		tr->tr_num_buf--;
 	}
+	gfs2_log_unlock(sdp);
 	gfs2_assert_warn(sdp, !tr->tr_num_buf);
 }
 
@@ -462,13 +466,17 @@ static void databuf_lo_add(struct gfs2_sbd *sdp, struct gfs2_log_element *le)
 	struct address_space *mapping = bd->bd_bh->b_page->mapping;
 	struct gfs2_inode *ip = GFS2_I(mapping->host);
 
+	gfs2_log_lock(sdp);
 	tr->tr_touched = 1;
 	if (list_empty(&bd->bd_list_tr) &&
 	    (ip->i_di.di_flags & GFS2_DIF_JDATA)) {
 		tr->tr_num_buf++;
 		list_add(&bd->bd_list_tr, &tr->tr_list_buf);
+		gfs2_log_unlock(sdp);
 		gfs2_pin(sdp, bd->bd_bh);
 		tr->tr_num_buf_new++;
+	} else {
+		gfs2_log_unlock(sdp);
 	}
 	gfs2_trans_add_gl(bd->bd_gl);
 	gfs2_log_lock(sdp);
-- 
cgit v1.2.1


From 2f708649baad6350f506e7f5ca6649e32a8e4b49 Mon Sep 17 00:00:00 2001
From: David Teigland <teigland@redhat.com>
Date: Thu, 25 Jan 2007 13:50:52 -0600
Subject: [GFS2] increase default lock limit

Increase the number of locks at which point the dlm begins asking gfs to
reduce its lock usage.  The default value is largely arbitrary, but the
current value of 50,000 ends up limiting performance unnecessarily for too
many users.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/gfs2/locking/dlm/lock_dlm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'fs')

diff --git a/fs/gfs2/locking/dlm/lock_dlm.h b/fs/gfs2/locking/dlm/lock_dlm.h
index 33af707a4d3f..a87c7bf3c568 100644
--- a/fs/gfs2/locking/dlm/lock_dlm.h
+++ b/fs/gfs2/locking/dlm/lock_dlm.h
@@ -36,7 +36,7 @@
 
 #define GDLM_STRNAME_BYTES	24
 #define GDLM_LVB_SIZE		32
-#define GDLM_DROP_COUNT		50000
+#define GDLM_DROP_COUNT		200000
 #define GDLM_DROP_PERIOD	60
 #define GDLM_NAME_LEN		128
 
-- 
cgit v1.2.1


From ee32e4f3d347e4b562de0bd70be99e622d7d1a9f Mon Sep 17 00:00:00 2001
From: David Teigland <teigland@redhat.com>
Date: Thu, 25 Jan 2007 14:24:04 -0600
Subject: [GFS2] make lock_dlm drop_count tunable in sysfs

We want to be able to change or disable the default drop_count (number at
which the dlm asks gfs to limit the the number of locks it's holding).
Add it to the collection of sysfs tunables for an fs.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/gfs2/locking/dlm/main.c  |  6 ------
 fs/gfs2/locking/dlm/mount.c |  6 ++----
 fs/gfs2/locking/dlm/sysfs.c | 13 +++++++++++++
 3 files changed, 15 insertions(+), 10 deletions(-)

(limited to 'fs')

diff --git a/fs/gfs2/locking/dlm/main.c b/fs/gfs2/locking/dlm/main.c
index 2194b1d5b5ec..a0e7eda643ed 100644
--- a/fs/gfs2/locking/dlm/main.c
+++ b/fs/gfs2/locking/dlm/main.c
@@ -11,9 +11,6 @@
 
 #include "lock_dlm.h"
 
-extern int gdlm_drop_count;
-extern int gdlm_drop_period;
-
 extern struct lm_lockops gdlm_ops;
 
 static int __init init_lock_dlm(void)
@@ -40,9 +37,6 @@ static int __init init_lock_dlm(void)
 		return error;
 	}
 
-	gdlm_drop_count = GDLM_DROP_COUNT;
-	gdlm_drop_period = GDLM_DROP_PERIOD;
-
 	printk(KERN_INFO
 	       "Lock_DLM (built %s %s) installed\n", __DATE__, __TIME__);
 	return 0;
diff --git a/fs/gfs2/locking/dlm/mount.c b/fs/gfs2/locking/dlm/mount.c
index cdd1694e889b..1d8faa3da8af 100644
--- a/fs/gfs2/locking/dlm/mount.c
+++ b/fs/gfs2/locking/dlm/mount.c
@@ -9,8 +9,6 @@
 
 #include "lock_dlm.h"
 
-int gdlm_drop_count;
-int gdlm_drop_period;
 const struct lm_lockops gdlm_ops;
 
 
@@ -24,8 +22,8 @@ static struct gdlm_ls *init_gdlm(lm_callback_t cb, struct gfs2_sbd *sdp,
 	if (!ls)
 		return NULL;
 
-	ls->drop_locks_count = gdlm_drop_count;
-	ls->drop_locks_period = gdlm_drop_period;
+	ls->drop_locks_count = GDLM_DROP_COUNT;
+	ls->drop_locks_period = GDLM_DROP_PERIOD;
 	ls->fscb = cb;
 	ls->sdp = sdp;
 	ls->fsflags = flags;
diff --git a/fs/gfs2/locking/dlm/sysfs.c b/fs/gfs2/locking/dlm/sysfs.c
index 29ae06f94944..4746b884662d 100644
--- a/fs/gfs2/locking/dlm/sysfs.c
+++ b/fs/gfs2/locking/dlm/sysfs.c
@@ -116,6 +116,17 @@ static ssize_t recover_status_show(struct gdlm_ls *ls, char *buf)
 	return sprintf(buf, "%d\n", ls->recover_jid_status);
 }
 
+static ssize_t drop_count_show(struct gdlm_ls *ls, char *buf)
+{
+	return sprintf(buf, "%d\n", ls->drop_locks_count);
+}
+
+static ssize_t drop_count_store(struct gdlm_ls *ls, const char *buf, size_t len)
+{
+	ls->drop_locks_count = simple_strtol(buf, NULL, 0);
+	return len;
+}
+
 struct gdlm_attr {
 	struct attribute attr;
 	ssize_t (*show)(struct gdlm_ls *, char *);
@@ -135,6 +146,7 @@ GDLM_ATTR(first_done,     0444, first_done_show,     NULL);
 GDLM_ATTR(recover,        0644, recover_show,        recover_store);
 GDLM_ATTR(recover_done,   0444, recover_done_show,   NULL);
 GDLM_ATTR(recover_status, 0444, recover_status_show, NULL);
+GDLM_ATTR(drop_count,     0644, drop_count_show,     drop_count_store);
 
 static struct attribute *gdlm_attrs[] = {
 	&gdlm_attr_proto_name.attr,
@@ -147,6 +159,7 @@ static struct attribute *gdlm_attrs[] = {
 	&gdlm_attr_recover.attr,
 	&gdlm_attr_recover_done.attr,
 	&gdlm_attr_recover_status.attr,
+	&gdlm_attr_drop_count.attr,
 	NULL,
 };
 
-- 
cgit v1.2.1


From 67f55897ee5ffa16ca00ed39d176dc52b5066679 Mon Sep 17 00:00:00 2001
From: Randy Dunlap <randy.dunlap@oracle.com>
Date: Thu, 25 Jan 2007 18:42:39 -0800
Subject: [GFS2/DLM] use sysfs

With CONFIG_DLM=m, CONFIG_PROC_FS=n, and CONFIG_SYSFS=n, kernel build
fails with:

WARNING: "kernel_subsys" [fs/gfs2/locking/dlm/lock_dlm.ko] undefined!
WARNING: "kernel_subsys" [fs/dlm/dlm.ko] undefined!
WARNING: "kernel_subsys" [fs/configfs/configfs.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2

Since fs/dlm/lockspace.c and fs/gfs2/locking/dlm/sysfs.c use
kernel_subsys, they should either DEPEND on it or SELECT it.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/dlm/Kconfig  | 1 +
 fs/gfs2/Kconfig | 1 +
 2 files changed, 2 insertions(+)

(limited to 'fs')

diff --git a/fs/dlm/Kconfig b/fs/dlm/Kconfig
index b5654a284fef..d1359b91410a 100644
--- a/fs/dlm/Kconfig
+++ b/fs/dlm/Kconfig
@@ -5,6 +5,7 @@ config DLM
 	tristate "Distributed Lock Manager (DLM)"
 	depends on IPV6 || IPV6=n
 	select CONFIGFS_FS
+	select SYSFS
 	select IP_SCTP if DLM_SCTP
 	help
 	A general purpose distributed lock manager for kernel or userspace
diff --git a/fs/gfs2/Kconfig b/fs/gfs2/Kconfig
index 6a2ffa2db14f..2c184a97cb12 100644
--- a/fs/gfs2/Kconfig
+++ b/fs/gfs2/Kconfig
@@ -38,6 +38,7 @@ config GFS2_FS_LOCKING_DLM
 	select IP_SCTP if DLM_SCTP
 	select CONFIGFS_FS
 	select DLM
+	select SYSFS
 	help
 	Multiple node locking module for GFS2
 
-- 
cgit v1.2.1


From 001172778543c6997d3339f43085e43460e5883a Mon Sep 17 00:00:00 2001
From: Adrian Bunk <bunk@stusta.de>
Date: Sun, 28 Jan 2007 17:19:50 +0100
Subject: [GFS2/DLM] fix GFS2 circular dependency

On Sun, Jan 28, 2007 at 11:08:18AM +0100, Jiri Slaby wrote:
> Andrew Morton napsal(a):
> >Temporarily at
> >
> >	http://userweb.kernel.org/~akpm/2.6.20-rc6-mm1/
>
> Unable to select IPV6. Menuconfig doesn't offer it when INET is selected.
> When it's not it appears in the menu, but after state change it gets away.
> The same behaviour in xconfig, gconfig.
>
> $ mkdir ../a/tst
> $ make O=../a/tst menuconfig
>   HOSTCC  scripts/basic/fixdep
> [...]
>   HOSTLD  scripts/kconfig/mconf
> scripts/kconfig/mconf arch/i386/Kconfig
> Warning! Found recursive dependency: INET GFS2_FS_LOCKING_DLM SYSFS
> OCFS2_FS INET
>
> Maybe this is the problem?

Yes, patch below.

> regards,

cu
Adrian

<--  snip  -->

This patch fixes a circular dependency by letting GFS2_FS_LOCKING_DLM
and DLM depend on instead of select SYSFS.

Since SYSFS depends on EMBEDDED this change shouldn't cause any problems
for users.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/dlm/Kconfig  | 3 +--
 fs/gfs2/Kconfig | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

(limited to 'fs')

diff --git a/fs/dlm/Kconfig b/fs/dlm/Kconfig
index d1359b91410a..7e264b72e0c5 100644
--- a/fs/dlm/Kconfig
+++ b/fs/dlm/Kconfig
@@ -3,9 +3,8 @@ menu "Distributed Lock Manager"
 
 config DLM
 	tristate "Distributed Lock Manager (DLM)"
-	depends on IPV6 || IPV6=n
+	depends on SYSFS && (IPV6 || IPV6=n)
 	select CONFIGFS_FS
-	select SYSFS
 	select IP_SCTP if DLM_SCTP
 	help
 	A general purpose distributed lock manager for kernel or userspace
diff --git a/fs/gfs2/Kconfig b/fs/gfs2/Kconfig
index 2c184a97cb12..cbd5f3319321 100644
--- a/fs/gfs2/Kconfig
+++ b/fs/gfs2/Kconfig
@@ -34,11 +34,10 @@ config GFS2_FS_LOCKING_NOLOCK
 
 config GFS2_FS_LOCKING_DLM
 	tristate "GFS2 DLM locking module"
-	depends on GFS2_FS && NET && INET && (IPV6 || IPV6=n)
+	depends on GFS2_FS && SYSFS && NET && INET && (IPV6 || IPV6=n)
 	select IP_SCTP if DLM_SCTP
 	select CONFIGFS_FS
 	select DLM
-	select SYSFS
 	help
 	Multiple node locking module for GFS2
 
-- 
cgit v1.2.1


From bbb28ab7599789740b2233a0805d22aefb97f533 Mon Sep 17 00:00:00 2001
From: Eric Sandeen <sandeen@redhat.com>
Date: Mon, 29 Jan 2007 11:11:51 -0600
Subject: [GFS2] more CURRENT_TIME_SEC

Whoops, quilt user error, missed this one in the previous patch.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/gfs2/eattr.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

(limited to 'fs')

diff --git a/fs/gfs2/eattr.c b/fs/gfs2/eattr.c
index ebebbdcd7057..0c83c7f4dda8 100644
--- a/fs/gfs2/eattr.c
+++ b/fs/gfs2/eattr.c
@@ -301,7 +301,7 @@ static int ea_dealloc_unstuffed(struct gfs2_inode *ip, struct buffer_head *bh,
 
 	error = gfs2_meta_inode_buffer(ip, &dibh);
 	if (!error) {
-		ip->i_inode.i_ctime.tv_sec = get_seconds();
+		ip->i_inode.i_ctime = CURRENT_TIME_SEC;
 		gfs2_trans_add_bh(ip->i_gl, dibh, 1);
 		gfs2_dinode_out(ip, dibh->b_data);
 		brelse(dibh);
@@ -718,7 +718,7 @@ static int ea_alloc_skeleton(struct gfs2_inode *ip, struct gfs2_ea_request *er,
 					    (er->er_mode & S_IFMT));
 			ip->i_inode.i_mode = er->er_mode;
 		}
-		ip->i_inode.i_ctime.tv_sec = get_seconds();
+		ip->i_inode.i_ctime = CURRENT_TIME_SEC;
 		gfs2_trans_add_bh(ip->i_gl, dibh, 1);
 		gfs2_dinode_out(ip, dibh->b_data);
 		brelse(dibh);
@@ -853,7 +853,7 @@ static int ea_set_simple_noalloc(struct gfs2_inode *ip, struct buffer_head *bh,
 			(ip->i_inode.i_mode & S_IFMT) == (er->er_mode & S_IFMT));
 		ip->i_inode.i_mode = er->er_mode;
 	}
-	ip->i_inode.i_ctime.tv_sec = get_seconds();
+	ip->i_inode.i_ctime = CURRENT_TIME_SEC;
 	gfs2_trans_add_bh(ip->i_gl, dibh, 1);
 	gfs2_dinode_out(ip, dibh->b_data);
 	brelse(dibh);
@@ -1134,7 +1134,7 @@ static int ea_remove_stuffed(struct gfs2_inode *ip, struct gfs2_ea_location *el)
 
 	error = gfs2_meta_inode_buffer(ip, &dibh);
 	if (!error) {
-		ip->i_inode.i_ctime.tv_sec = get_seconds();
+		ip->i_inode.i_ctime = CURRENT_TIME_SEC;
 		gfs2_trans_add_bh(ip->i_gl, dibh, 1);
 		gfs2_dinode_out(ip, dibh->b_data);
 		brelse(dibh);
-- 
cgit v1.2.1


From 61be084efcc4451934257350281962595418a33c Mon Sep 17 00:00:00 2001
From: Steven Whitehouse <swhiteho@redhat.com>
Date: Mon, 29 Jan 2007 11:51:45 +0000
Subject: [GFS2] Put back semaphore to avoid umount problem

Dave Teigland fixed this bug a while back, but I managed to mistakenly
remove the semaphore during later development. It is required to avoid
the list of inodes changing during an invalidate_inodes call. I have
made it an rwsem since the read side will be taken frequently during
normal filesystem operation. The write site will only happen during
umount of the file system.

Also the bug only triggers when using the DLM lock manager and only then
under certain conditions as its timing related.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: David Teigland <teigland@redhat.com>
---
 fs/gfs2/glock.c | 6 ++++++
 1 file changed, 6 insertions(+)

(limited to 'fs')

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index c070ede531c5..6618c1190252 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -20,6 +20,7 @@
 #include <linux/list.h>
 #include <linux/lm_interface.h>
 #include <linux/wait.h>
+#include <linux/rwsem.h>
 #include <asm/uaccess.h>
 
 #include "gfs2.h"
@@ -45,6 +46,7 @@ static int dump_glock(struct gfs2_glock *gl);
 static int dump_inode(struct gfs2_inode *ip);
 static void gfs2_glock_xmote_th(struct gfs2_holder *gh);
 static void gfs2_glock_drop_th(struct gfs2_glock *gl);
+static DECLARE_RWSEM(gfs2_umount_flush_sem);
 
 #define GFS2_GL_HASH_SHIFT      15
 #define GFS2_GL_HASH_SIZE       (1 << GFS2_GL_HASH_SHIFT)
@@ -1578,12 +1580,14 @@ void gfs2_glock_cb(void *cb_data, unsigned int type, void *data)
 		struct lm_async_cb *async = data;
 		struct gfs2_glock *gl;
 
+		down_read(&gfs2_umount_flush_sem);
 		gl = gfs2_glock_find(sdp, &async->lc_name);
 		if (gfs2_assert_warn(sdp, gl))
 			return;
 		if (!gfs2_assert_warn(sdp, gl->gl_req_bh))
 			gl->gl_req_bh(gl, async->lc_ret);
 		gfs2_glock_put(gl);
+		up_read(&gfs2_umount_flush_sem);
 		return;
 	}
 
@@ -1828,7 +1832,9 @@ void gfs2_gl_hash_clear(struct gfs2_sbd *sdp, int wait)
 			t = jiffies;
 		}
 
+		down_write(&gfs2_umount_flush_sem);
 		invalidate_inodes(sdp->sd_vfs);
+		up_write(&gfs2_umount_flush_sem);
 		msleep(10);
 	}
 }
-- 
cgit v1.2.1


From ddee76089cc9bcbd8ae9ec6c26e726a8ab2fe675 Mon Sep 17 00:00:00 2001
From: Russell Cattelan <cattelan@redhat.com>
Date: Mon, 29 Jan 2007 17:13:44 -0600
Subject: [GFS2] Fix unlink deadlocks

Move the glock acquisition to outside of the transactions.

Lock odering must be preserved in order to prevent ABBA
deadlocks. The current gfs2_change_nlink code would tries
to grab the glock after having started a transaction and thus is holding
the log lock. This is inconsistent with other code paths in
gfs that grab the resource group glock prior to staring
a tranactions.

One problem with this fix is that the resource group
lock is always grabbed now even if the inode still has
ref count and can not be marked for unlink.

Signed-off-by: Russell Cattelan <cattelan@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/gfs2/inode.c     | 46 +---------------------------------------------
 fs/gfs2/ops_inode.c | 47 +++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 40 insertions(+), 53 deletions(-)

(limited to 'fs')

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 88fcfb4f5c4d..0d6831a40565 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -280,50 +280,6 @@ out:
 	return error;
 }
 
-static int gfs2_change_nlink_i(struct gfs2_inode *ip)
-{
-	struct gfs2_sbd *sdp = ip->i_inode.i_sb->s_fs_info;
-	struct gfs2_inode *rindex = GFS2_I(sdp->sd_rindex);
-	struct gfs2_glock *ri_gl = rindex->i_gl;
-	struct gfs2_rgrpd *rgd;
-	struct gfs2_holder ri_gh, rg_gh;
-	int existing, error;
-
-	/* if we come from rename path, we could have the lock already */
-	existing = gfs2_glock_is_locked_by_me(ri_gl);
-	if (!existing) {
-		error = gfs2_rindex_hold(sdp, &ri_gh);
-		if (error)
-			goto out;
-	}
-
-	/* find the matching rgd */
-	error = -EIO;
-	rgd = gfs2_blk2rgrpd(sdp, ip->i_num.no_addr);
-	if (!rgd)
-		goto out_norgrp;
-
-	/*
-	 * Eventually we may want to move rgd(s) to a linked list
-	 * and piggyback the free logic into one of gfs2 daemons
-	 * to gain some performance.
-	 */
-	if (!rgd->rd_gl || !gfs2_glock_is_locked_by_me(rgd->rd_gl)) {
-		error = gfs2_glock_nq_init(rgd->rd_gl, LM_ST_EXCLUSIVE, 0, &rg_gh);
-		if (error)
-			goto out_norgrp;
-
-		gfs2_unlink_di(&ip->i_inode); /* mark inode unlinked */
-		gfs2_glock_dq_uninit(&rg_gh);
-	}
-
-out_norgrp:
-	if (!existing)
-		gfs2_glock_dq_uninit(&ri_gh);
-out:
-	return error;
-}
-
 /**
  * gfs2_change_nlink - Change nlink count on inode
  * @ip: The GFS2 inode
@@ -365,7 +321,7 @@ int gfs2_change_nlink(struct gfs2_inode *ip, int diff)
 	mark_inode_dirty(&ip->i_inode);
 
 	if (ip->i_inode.i_nlink == 0)
-		error = gfs2_change_nlink_i(ip);
+		gfs2_unlink_di(&ip->i_inode); /* mark inode unlinked */
 
 	return error;
 }
diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c
index 5591f8905cf7..f40a84807d75 100644
--- a/fs/gfs2/ops_inode.c
+++ b/fs/gfs2/ops_inode.c
@@ -264,13 +264,23 @@ static int gfs2_unlink(struct inode *dir, struct dentry *dentry)
 	struct gfs2_inode *dip = GFS2_I(dir);
 	struct gfs2_sbd *sdp = GFS2_SB(dir);
 	struct gfs2_inode *ip = GFS2_I(dentry->d_inode);
-	struct gfs2_holder ghs[2];
+	struct gfs2_holder ghs[3];
+	struct gfs2_rgrpd *rgd;
+	struct gfs2_holder ri_gh;
 	int error;
 
+	error = gfs2_rindex_hold(sdp, &ri_gh);
+	if (error)
+		return error;
+
 	gfs2_holder_init(dip->i_gl, LM_ST_EXCLUSIVE, 0, ghs);
-	gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, ghs + 1);
+	gfs2_holder_init(ip->i_gl,  LM_ST_EXCLUSIVE, 0, ghs + 1);
 
-	error = gfs2_glock_nq_m(2, ghs);
+	rgd = gfs2_blk2rgrpd(sdp, ip->i_num.no_addr);
+	gfs2_holder_init(rgd->rd_gl, LM_ST_EXCLUSIVE, 0, ghs + 2);
+
+
+	error = gfs2_glock_nq_m(3, ghs);
 	if (error)
 		goto out;
 
@@ -291,10 +301,12 @@ static int gfs2_unlink(struct inode *dir, struct dentry *dentry)
 out_end_trans:
 	gfs2_trans_end(sdp);
 out_gunlock:
-	gfs2_glock_dq_m(2, ghs);
+	gfs2_glock_dq_m(3, ghs);
 out:
 	gfs2_holder_uninit(ghs);
 	gfs2_holder_uninit(ghs + 1);
+	gfs2_holder_uninit(ghs + 2);
+	gfs2_glock_dq_uninit(&ri_gh);
 	return error;
 }
 
@@ -449,13 +461,22 @@ static int gfs2_rmdir(struct inode *dir, struct dentry *dentry)
 	struct gfs2_inode *dip = GFS2_I(dir);
 	struct gfs2_sbd *sdp = GFS2_SB(dir);
 	struct gfs2_inode *ip = GFS2_I(dentry->d_inode);
-	struct gfs2_holder ghs[2];
+	struct gfs2_holder ghs[3];
+	struct gfs2_rgrpd *rgd;
+	struct gfs2_holder ri_gh;
 	int error;
 
+
+	error = gfs2_rindex_hold(sdp, &ri_gh);
+	if (error)
+		return error;
 	gfs2_holder_init(dip->i_gl, LM_ST_EXCLUSIVE, 0, ghs);
 	gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, ghs + 1);
 
-	error = gfs2_glock_nq_m(2, ghs);
+	rgd = gfs2_blk2rgrpd(sdp, ip->i_num.no_addr);
+	gfs2_holder_init(rgd->rd_gl, LM_ST_EXCLUSIVE, 0, ghs + 2);
+
+	error = gfs2_glock_nq_m(3, ghs);
 	if (error)
 		goto out;
 
@@ -483,10 +504,12 @@ static int gfs2_rmdir(struct inode *dir, struct dentry *dentry)
 	gfs2_trans_end(sdp);
 
 out_gunlock:
-	gfs2_glock_dq_m(2, ghs);
+	gfs2_glock_dq_m(3, ghs);
 out:
 	gfs2_holder_uninit(ghs);
 	gfs2_holder_uninit(ghs + 1);
+	gfs2_holder_uninit(ghs + 2);
+	gfs2_glock_dq_uninit(&ri_gh);
 	return error;
 }
 
@@ -547,7 +570,8 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry,
 	struct gfs2_inode *ip = GFS2_I(odentry->d_inode);
 	struct gfs2_inode *nip = NULL;
 	struct gfs2_sbd *sdp = GFS2_SB(odir);
-	struct gfs2_holder ghs[4], r_gh;
+	struct gfs2_holder ghs[5], r_gh;
+	struct gfs2_rgrpd *nrgd;
 	unsigned int num_gh;
 	int dir_rename = 0;
 	int alloc_required;
@@ -587,6 +611,13 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry,
 	if (nip) {
 		gfs2_holder_init(nip->i_gl, LM_ST_EXCLUSIVE, 0, ghs + num_gh);
 		num_gh++;
+		/* grab the resource lock for unlink flag twiddling 
+		 * this is the case of the target file already existing
+		 * so we unlink before doing the rename
+		 */
+		nrgd = gfs2_blk2rgrpd(sdp, nip->i_num.no_addr);
+		if (nrgd)
+			gfs2_holder_init(nrgd->rd_gl, LM_ST_EXCLUSIVE, 0, ghs + num_gh++);
 	}
 
 	error = gfs2_glock_nq_m(num_gh, ghs);
-- 
cgit v1.2.1


From 9beeb9f3c5b0401491f59b71521ab2678f584b09 Mon Sep 17 00:00:00 2001
From: Randy Dunlap <randy.dunlap@oracle.com>
Date: Tue, 30 Jan 2007 14:30:08 -0800
Subject: [DLM/GFS2] indent help text

Indent help text as expected.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/dlm/Kconfig  | 16 ++++++++--------
 fs/gfs2/Kconfig | 45 ++++++++++++++++++++++-----------------------
 2 files changed, 30 insertions(+), 31 deletions(-)

(limited to 'fs')

diff --git a/fs/dlm/Kconfig b/fs/dlm/Kconfig
index 7e264b72e0c5..6fa7b0d5c043 100644
--- a/fs/dlm/Kconfig
+++ b/fs/dlm/Kconfig
@@ -7,17 +7,17 @@ config DLM
 	select CONFIGFS_FS
 	select IP_SCTP if DLM_SCTP
 	help
-	A general purpose distributed lock manager for kernel or userspace
-	applications.
+	  A general purpose distributed lock manager for kernel or userspace
+	  applications.
 
 choice
 	prompt "Select DLM communications protocol"
 	depends on DLM
 	default DLM_TCP
 	help
-	The DLM Can use TCP or SCTP for it's network communications.
-	SCTP supports multi-homed operations whereas TCP doesn't.
-	However, SCTP seems to have stability problems at the moment.
+	  The DLM Can use TCP or SCTP for it's network communications.
+	  SCTP supports multi-homed operations whereas TCP doesn't.
+	  However, SCTP seems to have stability problems at the moment.
 
 config DLM_TCP
 	bool "TCP/IP"
@@ -31,8 +31,8 @@ config DLM_DEBUG
 	bool "DLM debugging"
 	depends on DLM
 	help
-	Under the debugfs mount point, the name of each lockspace will
-	appear as a file in the "dlm" directory.  The output is the
-	list of resource and locks the local node knows about.
+	  Under the debugfs mount point, the name of each lockspace will
+	  appear as a file in the "dlm" directory.  The output is the
+	  list of resource and locks the local node knows about.
 
 endmenu
diff --git a/fs/gfs2/Kconfig b/fs/gfs2/Kconfig
index cbd5f3319321..de8e64c03f73 100644
--- a/fs/gfs2/Kconfig
+++ b/fs/gfs2/Kconfig
@@ -4,33 +4,33 @@ config GFS2_FS
 	select FS_POSIX_ACL
 	select CRC32
 	help
-	A cluster filesystem.
+	  A cluster filesystem.
 
-	Allows a cluster of computers to simultaneously use a block device
-	that is shared between them (with FC, iSCSI, NBD, etc...).  GFS reads
-	and writes to the block device like a local filesystem, but also uses
-	a lock module to allow the computers coordinate their I/O so
-	filesystem consistency is maintained.  One of the nifty features of
-	GFS is perfect consistency -- changes made to the filesystem on one
-	machine show up immediately on all other machines in the cluster.
+	  Allows a cluster of computers to simultaneously use a block device
+	  that is shared between them (with FC, iSCSI, NBD, etc...).  GFS reads
+	  and writes to the block device like a local filesystem, but also uses
+	  a lock module to allow the computers coordinate their I/O so
+	  filesystem consistency is maintained.  One of the nifty features of
+	  GFS is perfect consistency -- changes made to the filesystem on one
+	  machine show up immediately on all other machines in the cluster.
 
-	To use the GFS2 filesystem, you will need to enable one or more of
-	the below locking modules. Documentation and utilities for GFS2 can
-	be found here: http://sources.redhat.com/cluster
+	  To use the GFS2 filesystem, you will need to enable one or more of
+	  the below locking modules. Documentation and utilities for GFS2 can
+	  be found here: http://sources.redhat.com/cluster
 
 config GFS2_FS_LOCKING_NOLOCK
 	tristate "GFS2 \"nolock\" locking module"
 	depends on GFS2_FS
 	help
-	Single node locking module for GFS2.
+	  Single node locking module for GFS2.
 
-	Use this module if you want to use GFS2 on a single node without
-	its clustering features. You can still take advantage of the
-	large file support, and upgrade to running a full cluster later on
-	if required.
+	  Use this module if you want to use GFS2 on a single node without
+	  its clustering features. You can still take advantage of the
+	  large file support, and upgrade to running a full cluster later on
+	  if required.
 
-	If you will only be using GFS2 in cluster mode, you do not need this
-	module.
+	  If you will only be using GFS2 in cluster mode, you do not need this
+	  module.
 
 config GFS2_FS_LOCKING_DLM
 	tristate "GFS2 DLM locking module"
@@ -39,9 +39,8 @@ config GFS2_FS_LOCKING_DLM
 	select CONFIGFS_FS
 	select DLM
 	help
-	Multiple node locking module for GFS2
-
-	Most users of GFS2 will require this module. It provides the locking
-	interface between GFS2 and the DLM, which is required to use GFS2
-	in a cluster environment.
+	  Multiple node locking module for GFS2
 
+	  Most users of GFS2 will require this module. It provides the locking
+	  interface between GFS2 and the DLM, which is required to use GFS2
+	  in a cluster environment.
-- 
cgit v1.2.1


From 62a0f62369b0fece37f6652d69b918c89d53c3b3 Mon Sep 17 00:00:00 2001
From: David Teigland <teigland@redhat.com>
Date: Wed, 31 Jan 2007 13:25:00 -0600
Subject: [DLM] zero new user lvbs

A new lvb for a userland lock wasn't being initialized to zero.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/dlm/lock.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'fs')

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index c10257f10b90..e725005fafd0 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -3643,7 +3643,7 @@ int dlm_user_request(struct dlm_ls *ls, struct dlm_user_args *ua,
 	}
 
 	if (flags & DLM_LKF_VALBLK) {
-		ua->lksb.sb_lvbptr = kmalloc(DLM_USER_LVB_LEN, GFP_KERNEL);
+		ua->lksb.sb_lvbptr = kzalloc(DLM_USER_LVB_LEN, GFP_KERNEL);
 		if (!ua->lksb.sb_lvbptr) {
 			kfree(ua);
 			__put_lkb(ls, lkb);
@@ -3712,7 +3712,7 @@ int dlm_user_convert(struct dlm_ls *ls, struct dlm_user_args *ua_tmp,
 	ua = (struct dlm_user_args *)lkb->lkb_astparam;
 
 	if (flags & DLM_LKF_VALBLK && !ua->lksb.sb_lvbptr) {
-		ua->lksb.sb_lvbptr = kmalloc(DLM_USER_LVB_LEN, GFP_KERNEL);
+		ua->lksb.sb_lvbptr = kzalloc(DLM_USER_LVB_LEN, GFP_KERNEL);
 		if (!ua->lksb.sb_lvbptr) {
 			error = -ENOMEM;
 			goto out_put;
-- 
cgit v1.2.1


From a34fbc6363256387372331000462691bc4b3f5a9 Mon Sep 17 00:00:00 2001
From: Patrick Caulfield <pcaulfie@redhat.com>
Date: Thu, 1 Feb 2007 16:46:33 +0000
Subject: [DLM] fix softlockup in dlm_recv

This patch stops the dlm_recv workqueue from busy-waiting when a node
disconnects. This can cause soft lockup errors on debug systems and bad
performance generally.

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/dlm/lowcomms-tcp.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

(limited to 'fs')

diff --git a/fs/dlm/lowcomms-tcp.c b/fs/dlm/lowcomms-tcp.c
index 18ade44287e7..f1efd17b2614 100644
--- a/fs/dlm/lowcomms-tcp.c
+++ b/fs/dlm/lowcomms-tcp.c
@@ -2,7 +2,7 @@
 *******************************************************************************
 **
 **  Copyright (C) Sistina Software, Inc.  1997-2003  All rights reserved.
-**  Copyright (C) 2004-2006 Red Hat, Inc.  All rights reserved.
+**  Copyright (C) 2004-2007 Red Hat, Inc.  All rights reserved.
 **
 **  This copyrighted material is made available to anyone wishing to use,
 **  modify, copy, or redistribute it subject to the terms and conditions
@@ -109,7 +109,6 @@ struct connection {
 	struct page *rx_page;
 	struct cbuf cb;
 	int retries;
-	atomic_t waiting_requests;
 #define MAX_CONNECT_RETRIES 3
 	struct connection *othercon;
 	struct work_struct rwork; /* Receive workqueue */
@@ -278,8 +277,11 @@ static int receive_from_sock(struct connection *con)
 
 	mutex_lock(&con->sock_mutex);
 
-	if (con->sock == NULL)
-		goto out;
+	if (con->sock == NULL) {
+		ret = -EAGAIN;
+		goto out_close;
+	}
+
 	if (con->rx_page == NULL) {
 		/*
 		 * This doesn't need to be atomic, but I think it should
@@ -352,7 +354,6 @@ static int receive_from_sock(struct connection *con)
 		con->rx_page = NULL;
 	}
 
-out:
 	if (call_again_soon)
 		goto out_resched;
 	mutex_unlock(&con->sock_mutex);
@@ -370,6 +371,9 @@ out_close:
 		close_connection(con, false);
 		/* Reconnect when there is something to send */
 	}
+	/* Don't return success if we really got EOF */
+	if (ret == 0)
+		ret = -EAGAIN;
 
 	return ret;
 }
@@ -847,7 +851,6 @@ int dlm_lowcomms_close(int nodeid)
 	if (con) {
 		clean_one_writequeue(con);
 		close_connection(con, true);
-		atomic_set(&con->waiting_requests, 0);
 	}
 	return 0;
 
-- 
cgit v1.2.1


From 549ae0ac3d574a682e82b02e79259a65445a675b Mon Sep 17 00:00:00 2001
From: Wendy Cheng <wcheng@redhat.com>
Date: Tue, 6 Feb 2007 03:52:16 -0500
Subject: [GFS2] nfsd readdirplus assertion failure

Glock assertion failure found in '07 NFS connectathon. One of the NFSDs
is doing a "readdirplus" procedure call. It passes the logic into
gfs2_readdir() where it obtains its directory inode glock. This is then
followed by filehandle construction that invokes lookup code. It hits
the assertion failure while trying to obtain the inode glock again
inside gfs2_drevalidate().

This patch bypasses the recursive glock call if caller already holds the
lock.

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/gfs2/ops_dentry.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

(limited to 'fs')

diff --git a/fs/gfs2/ops_dentry.c b/fs/gfs2/ops_dentry.c
index d355899585d8..9187eb174b43 100644
--- a/fs/gfs2/ops_dentry.c
+++ b/fs/gfs2/ops_dentry.c
@@ -46,6 +46,7 @@ static int gfs2_drevalidate(struct dentry *dentry, struct nameidata *nd)
 	struct gfs2_inum_host inum;
 	unsigned int type;
 	int error;
+	int had_lock=0;
 
 	if (inode && is_bad_inode(inode))
 		goto invalid;
@@ -53,9 +54,12 @@ static int gfs2_drevalidate(struct dentry *dentry, struct nameidata *nd)
 	if (sdp->sd_args.ar_localcaching)
 		goto valid;
 
-	error = gfs2_glock_nq_init(dip->i_gl, LM_ST_SHARED, 0, &d_gh);
-	if (error)
-		goto fail;
+	had_lock = gfs2_glock_is_locked_by_me(dip->i_gl);
+	if (!had_lock) {
+		error = gfs2_glock_nq_init(dip->i_gl, LM_ST_SHARED, 0, &d_gh);
+		if (error)
+			goto fail;
+	} 
 
 	error = gfs2_dir_search(parent->d_inode, &dentry->d_name, &inum, &type);
 	switch (error) {
@@ -82,13 +86,15 @@ static int gfs2_drevalidate(struct dentry *dentry, struct nameidata *nd)
 	}
 
 valid_gunlock:
-	gfs2_glock_dq_uninit(&d_gh);
+	if (!had_lock)
+		gfs2_glock_dq_uninit(&d_gh);
 valid:
 	dput(parent);
 	return 1;
 
 invalid_gunlock:
-	gfs2_glock_dq_uninit(&d_gh);
+	if (!had_lock)
+		gfs2_glock_dq_uninit(&d_gh);
 invalid:
 	if (inode && S_ISDIR(inode->i_mode)) {
 		if (have_submounts(dentry))
-- 
cgit v1.2.1


From 2d72e7101cc7fff5c1eb21bfcbba51c8002418d2 Mon Sep 17 00:00:00 2001
From: Steven Whitehouse <swhiteho@redhat.com>
Date: Wed, 7 Feb 2007 10:25:59 -0500
Subject: [GFS2] Unlock page on prepare_write try lock failure

When the try lock of the glock failed in prepare_write we were
incorrectly exiting this function with the page still locked.
This was resulting in further I/O to this page hanging.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/gfs2/ops_address.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

(limited to 'fs')

diff --git a/fs/gfs2/ops_address.c b/fs/gfs2/ops_address.c
index 9ddf9753a552..5e9653c4d886 100644
--- a/fs/gfs2/ops_address.c
+++ b/fs/gfs2/ops_address.c
@@ -360,8 +360,10 @@ static int gfs2_prepare_write(struct file *file, struct page *page,
 	gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, GL_ATIME|LM_FLAG_TRY_1CB, &ip->i_gh);
 	error = gfs2_glock_nq_atime(&ip->i_gh);
 	if (unlikely(error)) {
-		if (error == GLR_TRYFAILED)
+		if (error == GLR_TRYFAILED) {
+			unlock_page(page);
 			error = AOP_TRUNCATED_PAGE;
+		}
 		goto out_uninit;
 	}
 
-- 
cgit v1.2.1


From a2cf822274b3d58a16a65c8338e299e18b3dc3a4 Mon Sep 17 00:00:00 2001
From: Adrian Bunk <bunk@stusta.de>
Date: Tue, 6 Feb 2007 23:12:49 +0100
Subject: [GFS2] make gfs2_writepages() static

On Mon, Jan 29, 2007 at 08:45:28PM -0800, Andrew Morton wrote:
>...
> Changes since 2.6.20-rc6-mm2:
>...
>  git-gfs2-nmw.patch
>...
>  git trees
>...

This patch makes the needlessly global gfs2_writepages() static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
---
 fs/gfs2/ops_address.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

(limited to 'fs')

diff --git a/fs/gfs2/ops_address.c b/fs/gfs2/ops_address.c
index 5e9653c4d886..56e33590b656 100644
--- a/fs/gfs2/ops_address.c
+++ b/fs/gfs2/ops_address.c
@@ -170,7 +170,8 @@ out_ignore:
  * and write whole extents at once. This is a big reduction in the
  * number of I/O requests we send and the bmap calls we make in this case.
  */
-int gfs2_writepages(struct address_space *mapping, struct writeback_control *wbc)
+static int gfs2_writepages(struct address_space *mapping,
+			   struct writeback_control *wbc)
 {
 	struct inode *inode = mapping->host;
 	struct gfs2_inode *ip = GFS2_I(inode);
-- 
cgit v1.2.1