gfs2: change gfs2 readdir cookie

gfs2 currently returns 31 bits of filename hash as a cookie that readdir uses for an offset into the directory. When there are a large number of directory entries, the likelihood of a collision goes up way too quickly. GFS2 will now return cookies that are guaranteed unique for a while, and then fail back to using 30 bits of filename hash. Specifically, the directory leaf blocks are divided up into chunks based on the minimum size of a gfs2 directory entry (48 bytes). Each entry's cookie is based off the chunk where it starts, in the linked list of leaf blocks that it hashes to (there are 131072 hash buckets). Directory entries will have unique names until they take reach chunk 8192. Assuming the largest filenames possible, and the least efficient spacing possible, this new method will still be able to return unique names when the previous method has statistically more than a 99% chance of a collision. The non-unique names it fails back to are guaranteed to not collide with the unique names. unique cookies will be in this format: - 1 bit "0" to make sure the the returned cookie is positive - 17 bits for the hash table index - 1 bit for the mode "0" - 13 bits for the offset non-unique cookies will be in this format: - 1 bit "0" to make sure the the returned cookie is positive - 17 bits for the hash table index - 1 bit for the mode "1" - 13 more bits of the name hash Another benefit of location based cookies, is that once a directory's exhash table is fully extended (so that multiple hash table indexs do not use the same leaf blocks), gfs2 can skip sorting the directory entries until it reaches the non-unique ones, and then it only needs to sort these. This provides a significant speed up for directory reads of very large directories. The only issue is that for these cookies to continue to point to the correct entry as files are added and removed from the directory, gfs2 must keep the entries at the same offset in the leaf block when they are split (see my previous patch). This means that until all the nodes in a cluster are running with code that will split the directory leaf blocks this way, none of the nodes can use the new cookie code. To deal with this, gfs2 now has the mount option loccookie, which, if set, will make it return these new location based cookies. This option must not be set until all nodes in the cluster are at least running this version of the kernel code, and you have guaranteed that there are no outstanding cookies required by other software, such as NFS. gfs2 uses some of the extra space at the end of the gfs2_dirent structure to store the calculated readdir cookies. This keeps us from needing to allocate a seperate array to hold these values. gfs2 recomputes the cookie stored in de_cookie for every readdir call. The time it takes to do so is small, and if gfs2 expected this value to be saved on disk, the new code wouldn't work correctly on filesystems created with an earlier version of gfs2. One issue with adding de_cookie to the union in the gfs2_dirent structure is that it caused the union to align itself to a 4 byte boundary, instead of its previous 2 byte boundary. This changed the offset of de_rahead. To solve that, I pulled de_rahead out of the union, since it does not need to be there. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
author: Benjamin Marzinski <bmarzins@redhat.com> 2015-12-01 08:46:55 -0600
committer: Bob Peterson <rpeterso@redhat.com> 2015-12-14 12:19:37 -0600
commit: 471f3db2786bc32011d6693413eb93b0c3da2579 (patch)
tree: a3af645fd5dbb014bffa50a715422b954e83564f /fs/gfs2/incore.h
parent: 340174722929d80a107120400bab527cfc7e47f1 (diff)
download: talos-obmc-linux-471f3db2786bc32011d6693413eb93b0c3da2579.tar.gz
talos-obmc-linux-471f3db2786bc32011d6693413eb93b0c3da2579.zip
1 files changed, 3 insertions, 0 deletions
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 921304e1d785..845fb09cc606 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -562,6 +562,8 @@ struct gfs2_args {
 	unsigned int ar_errors:2;               /* errors=withdraw | panic */
 	unsigned int ar_nobarrier:1;            /* do not send barriers */
 	unsigned int ar_rgrplvb:1;		/* use lvbs for rgrp info */
+	unsigned int ar_loccookie:1;		/* use location based readdir
+						   cookies */
 	int ar_commit;				/* Commit interval */
 	int ar_statfs_quantum;			/* The fast statfs interval */
 	int ar_quota_quantum;			/* The quota interval */
@@ -689,6 +691,7 @@ struct gfs2_sbd {
 	u64 sd_heightsize[GFS2_MAX_META_HEIGHT + 1];
 	u32 sd_max_jheight; /* Max height of journaled file's meta tree */
 	u64 sd_jheightsize[GFS2_MAX_META_HEIGHT + 1];
+	u32 sd_max_dents_per_leaf; /* Max number of dirents in a leaf block */
 
 	struct gfs2_args sd_args;	/* Mount arguments */
 	struct gfs2_tune sd_tune;	/* Filesystem tuning structure */
author	Benjamin Marzinski <bmarzins@redhat.com>	2015-12-01 08:46:55 -0600
committer	Bob Peterson <rpeterso@redhat.com>	2015-12-14 12:19:37 -0600
commit	471f3db2786bc32011d6693413eb93b0c3da2579 (patch)
tree	a3af645fd5dbb014bffa50a715422b954e83564f /fs/gfs2/incore.h
parent	340174722929d80a107120400bab527cfc7e47f1 (diff)
download	talos-obmc-linux-471f3db2786bc32011d6693413eb93b0c3da2579.tar.gz talos-obmc-linux-471f3db2786bc32011d6693413eb93b0c3da2579.zip