8 files changed, 41 insertions, 261 deletions
diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX
index f66e748fc5e4..b7bd6c9009cc 100644
--- a/Documentation/filesystems/00-INDEX
+++ b/Documentation/filesystems/00-INDEX
@@ -87,8 +87,6 @@ jfs.txt
 	- info and mount options for the JFS filesystem.
 locks.txt
 	- info on file locking implementations, flock() vs. fcntl(), etc.
-logfs.txt
-	- info on the LogFS flash filesystem.
 mandatory-locking.txt
 	- info on the Linux implementation of Sys V mandatory file locking.
 ncpfs.txt
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 1b5f15653b1b..ace63cd7af8c 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -20,7 +20,7 @@ prototypes:
 	void (*d_iput)(struct dentry *, struct inode *);
 	char *(*d_dname)((struct dentry *dentry, char *buffer, int buflen);
 	struct vfsmount *(*d_automount)(struct path *path);
-	int (*d_manage)(struct dentry *, bool);
+	int (*d_manage)(const struct path *, bool);
 	struct dentry *(*d_real)(struct dentry *, const struct inode *,
 				 unsigned int);
 
@@ -556,7 +556,7 @@ till "end_pgoff". ->map_pages() is called with page table locked and must
 not block.  If it's not possible to reach a page without blocking,
 filesystem should skip it. Filesystem should use do_set_pte() to setup
 page table entry. Pointer to entry associated with the page is passed in
-"pte" field in fault_env structure. Pointers to entries for other offsets
+"pte" field in vm_fault structure. Pointers to entries for other offsets
 should be calculated relative to "pte".
 
 	->page_mkwrite() is called when a previously read-only pte is
diff --git a/Documentation/filesystems/logfs.txt b/Documentation/filesystems/logfs.txt
deleted file mode 100644
index bca42c22a143..000000000000
--- a/Documentation/filesystems/logfs.txt
+++ /dev/null
@@ -1,241 +0,0 @@
-
-The LogFS Flash Filesystem
-==========================
-
-Specification
-=============
-
-Superblocks
------------
-
-Two superblocks exist at the beginning and end of the filesystem.
-Each superblock is 256 Bytes large, with another 3840 Bytes reserved
-for future purposes, making a total of 4096 Bytes.
-
-Superblock locations may differ for MTD and block devices.  On MTD the
-first non-bad block contains a superblock in the first 4096 Bytes and
-the last non-bad block contains a superblock in the last 4096 Bytes.
-On block devices, the first 4096 Bytes of the device contain the first
-superblock and the last aligned 4096 Byte-block contains the second
-superblock.
-
-For the most part, the superblocks can be considered read-only.  They
-are written only to correct errors detected within the superblocks,
-move the journal and change the filesystem parameters through tunefs.
-As a result, the superblock does not contain any fields that require
-constant updates, like the amount of free space, etc.
-
-Segments
---------
-
-The space in the device is split up into equal-sized segments.
-Segments are the primary write unit of LogFS.  Within each segments,
-writes happen from front (low addresses) to back (high addresses.  If
-only a partial segment has been written, the segment number, the
-current position within and optionally a write buffer are stored in
-the journal.
-
-Segments are erased as a whole.  Therefore Garbage Collection may be
-required to completely free a segment before doing so.
-
-Journal
---------
-
-The journal contains all global information about the filesystem that
-is subject to frequent change.  At mount time, it has to be scanned
-for the most recent commit entry, which contains a list of pointers to
-all currently valid entries.
-
-Object Store
-------------
-
-All space except for the superblocks and journal is part of the object
-store.  Each segment contains a segment header and a number of
-objects, each consisting of the object header and the payload.
-Objects are either inodes, directory entries (dentries), file data
-blocks or indirect blocks.
-
-Levels
-------
-
-Garbage collection (GC) may fail if all data is written
-indiscriminately.  One requirement of GC is that data is separated
-roughly according to the distance between the tree root and the data.
-Effectively that means all file data is on level 0, indirect blocks
-are on levels 1, 2, 3 4 or 5 for 1x, 2x, 3x, 4x or 5x indirect blocks,
-respectively.  Inode file data is on level 6 for the inodes and 7-11
-for indirect blocks.
-
-Each segment contains objects of a single level only.  As a result,
-each level requires its own separate segment to be open for writing.
-
-Inode File
-----------
-
-All inodes are stored in a special file, the inode file.  Single
-exception is the inode file's inode (master inode) which for obvious
-reasons is stored in the journal instead.  Instead of data blocks, the
-leaf nodes of the inode files are inodes.
-
-Aliases
--------
-
-Writes in LogFS are done by means of a wandering tree.  A naïve
-implementation would require that for each write or a block, all
-parent blocks are written as well, since the block pointers have
-changed.  Such an implementation would not be very efficient.
-
-In LogFS, the block pointer changes are cached in the journal by means
-of alias entries.  Each alias consists of its logical address - inode
-number, block index, level and child number (index into block) - and
-the changed data.  Any 8-byte word can be changes in this manner.
-
-Currently aliases are used for block pointers, file size, file used
-bytes and the height of an inodes indirect tree.
-
-Segment Aliases
----------------
-
-Related to regular aliases, these are used to handle bad blocks.
-Initially, bad blocks are handled by moving the affected segment
-content to a spare segment and noting this move in the journal with a
-segment alias, a simple (to, from) tupel.  GC will later empty this
-segment and the alias can be removed again.  This is used on MTD only.
-
-Vim
----
-
-By cleverly predicting the life time of data, it is possible to
-separate long-living data from short-living data and thereby reduce
-the GC overhead later.  Each type of distinc life expectency (vim) can
-have a separate segment open for writing.  Each (level, vim) tupel can
-be open just once.  If an open segment with unknown vim is encountered
-at mount time, it is closed and ignored henceforth.
-
-Indirect Tree
--------------
-
-Inodes in LogFS are similar to FFS-style filesystems with direct and
-indirect block pointers.  One difference is that LogFS uses a single
-indirect pointer that can be either a 1x, 2x, etc. indirect pointer.
-A height field in the inode defines the height of the indirect tree
-and thereby the indirection of the pointer.
-
-Another difference is the addressing of indirect blocks.  In LogFS,
-the first 16 pointers in the first indirect block are left empty,
-corresponding to the 16 direct pointers in the inode.  In ext2 (maybe
-others as well) the first pointer in the first indirect block
-corresponds to logical block 12, skipping the 12 direct pointers.
-So where ext2 is using arithmetic to better utilize space, LogFS keeps
-arithmetic simple and uses compression to save space.
-
-Compression
------------
-
-Both file data and metadata can be compressed.  Compression for file
-data can be enabled with chattr +c and disabled with chattr -c.  Doing
-so has no effect on existing data, but new data will be stored
-accordingly.  New inodes will inherit the compression flag of the
-parent directory.
-
-Metadata is always compressed.  However, the space accounting ignores
-this and charges for the uncompressed size.  Failing to do so could
-result in GC failures when, after moving some data, indirect blocks
-compress worse than previously.  Even on a 100% full medium, GC may
-not consume any extra space, so the compression gains are lost space
-to the user.
-
-However, they are not lost space to the filesystem internals.  By
-cheating the user for those bytes, the filesystem gained some slack
-space and GC will run less often and faster.
-
-Garbage Collection and Wear Leveling
-------------------------------------
-
-Garbage collection is invoked whenever the number of free segments
-falls below a threshold.  The best (known) candidate is picked based
-on the least amount of valid data contained in the segment.  All
-remaining valid data is copied elsewhere, thereby invalidating it.
-
-The GC code also checks for aliases and writes then back if their
-number gets too large.
-
-Wear leveling is done by occasionally picking a suboptimal segment for
-garbage collection.  If a stale segments erase count is significantly
-lower than the active segments' erase counts, it will be picked.  Wear
-leveling is rate limited, so it will never monopolize the device for
-more than one segment worth at a time.
-
-Values for "occasionally", "significantly lower" are compile time
-constants.
-
-Hashed directories
-------------------
-
-To satisfy efficient lookup(), directory entries are hashed and
-located based on the hash.  In order to both support large directories
-and not be overly inefficient for small directories, several hash
-tables of increasing size are used.  For each table, the hash value
-modulo the table size gives the table index.
-
-Tables sizes are chosen to limit the number of indirect blocks with a
-fully populated table to 0, 1, 2 or 3 respectively.  So the first
-table contains 16 entries, the second 512-16, etc.
-
-The last table is special in several ways.  First its size depends on
-the effective 32bit limit on telldir/seekdir cookies.  Since logfs
-uses the upper half of the address space for indirect blocks, the size
-is limited to 2^31.  Secondly the table contains hash buckets with 16
-entries each.
-
-Using single-entry buckets would result in birthday "attacks".  At
-just 2^16 used entries, hash collisions would be likely (P >= 0.5).
-My math skills are insufficient to do the combinatorics for the 17x
-collisions necessary to overflow a bucket, but testing showed that in
-10,000 runs the lowest directory fill before a bucket overflow was
-188,057,130 entries with an average of 315,149,915 entries.  So for
-directory sizes of up to a million, bucket overflows should be
-virtually impossible under normal circumstances.
-
-With carefully chosen filenames, it is obviously possible to cause an
-overflow with just 21 entries (4 higher tables + 16 entries + 1).  So
-there may be a security concern if a malicious user has write access
-to a directory.
-
-Open For Discussion
-===================
-
-Device Address Space
---------------------
-
-A device address space is used for caching.  Both block devices and
-MTD provide functions to either read a single page or write a segment.
-Partial segments may be written for data integrity, but where possible
-complete segments are written for performance on simple block device
-flash media.
-
-Meta Inodes
------------
-
-Inodes are stored in the inode file, which is just a regular file for
-most purposes.  At umount time, however, the inode file needs to
-remain open until all dirty inodes are written.  So
-generic_shutdown_super() may not close this inode, but shouldn't
-complain about remaining inodes due to the inode file either.  Same
-goes for mapping inode of the device address space.
-
-Currently logfs uses a hack that essentially copies part of fs/inode.c
-code over.  A general solution would be preferred.
-
-Indirect block mapping
-----------------------
-
-With compression, the block device (or mapping inode) cannot be used
-to cache indirect blocks.  Some other place is required.  Currently
-logfs uses the top half of each inode's address space.  The low 8TB
-(on 32bit) are filled with file data, the high 8TB are used for
-indirect blocks.
-
-One problem is that 16TB files created on 64bit systems actually have
-data in the top 8TB.  But files >16TB would cause problems anyway, so
-only the limit has changed.
diff --git a/Documentation/filesystems/overlayfs.txt b/Documentation/filesystems/overlayfs.txt
index bcbf9710e4af..634d03e20c2d 100644
--- a/Documentation/filesystems/overlayfs.txt
+++ b/Documentation/filesystems/overlayfs.txt
@@ -66,7 +66,7 @@ At mount time, the two directories given as mount options "lowerdir" and
 "upperdir" are combined into a merged directory:
 
   mount -t overlay overlay -olowerdir=/lower,upperdir=/upper,\
-workdir=/work /merged
+  workdir=/work /merged
 
 The "workdir" needs to be an empty directory on the same filesystem
 as upperdir.
@@ -118,6 +118,7 @@ programs.
 
 seek offsets are assigned sequentially when the directories are read.
 Thus if
+
   - read part of a directory
   - remember an offset, and close the directory
   - re-open the directory some time later
@@ -130,6 +131,23 @@ directory.
 Readdir on directories that are not merged is simply handled by the
 underlying directory (upper or lower).
 
+renaming directories
+--------------------
+
+When renaming a directory that is on the lower layer or merged (i.e. the
+directory was not created on the upper layer to start with) overlayfs can
+handle it in two different ways:
+
+1. return EXDEV error: this error is returned by rename(2) when trying to
+   move a file or directory across filesystem boundaries.  Hence
+   applications are usually prepared to hande this error (mv(1) for example
+   recursively copies the directory tree).  This is the default behavior.
+
+2. If the "redirect_dir" feature is enabled, then the directory will be
+   copied up (but not the contents).  Then the "trusted.overlay.redirect"
+   extended attribute is set to the path of the original location from the
+   root of the overlay.  Finally the directory is moved to the new
+   location.
 
 Non-directories
 ---------------
@@ -185,13 +203,13 @@ filesystem, so both st_dev and st_ino of the file may change.
 
 Any open files referring to this inode will access the old data.
 
-Any file locks (and leases) obtained before copy_up will not apply
-to the copied up file.
-
 If a file with multiple hard links is copied up, then this will
 "break" the link.  Changes will not be propagated to other names
 referring to the same inode.
 
+Unless "redirect_dir" feature is enabled, rename(2) on a lower or merged
+directory will fail with EXDEV.
+
 Changes to underlying filesystems
 ---------------------------------
 
diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting
index bdd025ceb763..95280079c0b3 100644
--- a/Documentation/filesystems/porting
+++ b/Documentation/filesystems/porting
@@ -596,3 +596,7 @@ in your dentry operations instead.
 [mandatory]
 	->rename() has an added flags argument.  Any flags not handled by the
         filesystem should result in EINVAL being returned.
+--
+[recommended]
+	->readlink is optional for symlinks.  Don't set, unless filesystem needs
+	to fake something for readlink(2).
diff --git a/Documentation/filesystems/sysfs-pci.txt b/Documentation/filesystems/sysfs-pci.txt
index 74eaac26f8b8..6ea1ceda6f52 100644
--- a/Documentation/filesystems/sysfs-pci.txt
+++ b/Documentation/filesystems/sysfs-pci.txt
@@ -17,6 +17,7 @@ that support it.  For example, a given bus might look like this:
      |   |-- resource0
      |   |-- resource1
      |   |-- resource2
+     |   |-- revision
      |   |-- rom
      |   |-- subsystem_device
      |   |-- subsystem_vendor
@@ -41,6 +42,7 @@ files, each with their own function.
        resource		   PCI resource host addresses (ascii, ro)
        resource0..N	   PCI resource N, if present (binary, mmap, rw[1])
        resource0_wc..N_wc  PCI WC map resource N, if prefetchable (binary, mmap)
+       revision		   PCI revision (ascii, ro)
        rom		   PCI ROM resource, if present (binary, ro)
        subsystem_device	   PCI subsystem device (ascii, ro)
        subsystem_vendor	   PCI subsystem vendor (ascii, ro)
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index b5039a00caaf..b968084eeac1 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -451,9 +451,6 @@ otherwise noted.
 	exist; this is checked by the VFS.  Unlike plain rename,
 	source and target may be of different type.
 
-  readlink: called by the readlink(2) system call. Only required if
-	you want to support reading symbolic links
-
   get_link: called by the VFS to follow a symbolic link to the
 	inode it points to.  Only required if you want to support
 	symbolic links.  This method returns the symlink body
@@ -468,6 +465,12 @@ otherwise noted.
 	argument.  If request can't be handled without leaving RCU mode,
 	have it return ERR_PTR(-ECHILD).
 
+  readlink: this is now just an override for use by readlink(2) for the
+	cases when ->get_link uses nd_jump_link() or object is not in
+	fact a symlink.  Normally filesystems should only implement
+	->get_link for symlinks and readlink(2) will automatically use
+	that.
+
   permission: called by the VFS to check for access rights on a POSIX-like
   	filesystem.
 
@@ -948,7 +951,7 @@ struct dentry_operations {
 	void (*d_iput)(struct dentry *, struct inode *);
 	char *(*d_dname)(struct dentry *, char *, int);
 	struct vfsmount *(*d_automount)(struct path *);
-	int (*d_manage)(struct dentry *, bool);
+	int (*d_manage)(const struct path *, bool);
 	struct dentry *(*d_real)(struct dentry *, const struct inode *,
 				 unsigned int);
 };
diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt
index c2d44e6e117b..3b9b5c149f32 100644
--- a/Documentation/filesystems/xfs.txt
+++ b/Documentation/filesystems/xfs.txt
@@ -51,13 +51,6 @@ default behaviour.
 	CRC enabled filesystems always use the attr2 format, and so
 	will reject the noattr2 mount option if it is set.
 
-  barrier (*)
-  nobarrier
-	Enables/disables the use of block layer write barriers for
-	writes into the journal and for data integrity operations.
-	This allows for drive level write caching to be enabled, for
-	devices that support write barriers.
-
   discard
   nodiscard (*)
 	Enable/disable the issuing of commands to let the block
@@ -228,7 +221,10 @@ default behaviour.
 Deprecated Mount Options
 ========================
 
-None at present.
+  Name				Removal Schedule
+  ----				----------------
+  barrier			no earlier than v4.15
+  nobarrier			no earlier than v4.15
 
 
 Removed Mount Options