summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* fsnotify: put inode specific fields in an fsnotify_mark in a unionEric Paris2010-07-284-22/+22
| | | | | | | | | The addition of marks on vfs mounts will be simplified if the inode specific parts of a mark and the vfsmnt specific parts of a mark are actually in a union so naming can be easy. This patch just implements the inode struct and the union. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: include vfsmount in should_send_event when appropriateEric Paris2010-07-283-23/+23
| | | | | | | | | To ensure that a group will not duplicate events when it receives it based on the vfsmount and the inode should_send_event test we should distinguish those two cases. We pass a vfsmount to this function so groups can make their own determinations. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: mount point listeners list and global maskEric Paris2010-07-284-26/+112
| | | | | | | | | | | | | | | | | | | | | | | currently all of the notification systems implemented select which inodes they care about and receive messages only about those inodes (or the children of those inodes.) This patch begins to flesh out fsnotify support for the concept of listeners that want to hear notification for an inode accessed below a given monut point. This patch implements a second list of fsnotify groups to hold these types of groups and a second global mask to hold the events of interest for this type of group. The reason we want a second group list and mask is because the inode based notification should_send_event support which makes each group look for a mark on the given inode. With one nfsmount listener that means that every group would have to take the inode->i_lock, look for their mark, not find one, and return for every operation. By seperating vfsmount from inode listeners only when there is a inode listener will the inode groups have to look for their mark and take the inode lock. vfsmount listeners will have to grab the lock and look for a mark but there should be fewer of them, and one vfsmount listener won't cause the i_lock to be grabbed and released for every fsnotify group on every io operation. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: add groups to fsnotify_inode_groups when registering inode watchEric Paris2010-07-283-10/+17
| | | | | | | | | | Currently all fsnotify groups are added immediately to the fsnotify_inode_groups list upon creation. This means, even groups with no watches (common for audit) will be on the global tracking list and will get checked for every event. This patch adds groups to the global list on when the first inode mark is added to the group. Signed-of-by: Eric Paris <eparis@redhat.com>
* fsnotify: initialize the group->num_marks in a better placeEric Paris2010-07-281-3/+7
| | | | | | | | | | Currently the comments say that group->num_marks is held because the group is on the fsnotify_group list. This isn't strictly the case, we really just hold the num_marks for the life of the group (any time group->refcnt is != 0) This patch moves the initialization stuff and makes it clear when it is really being held. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: rename fsnotify_groups to fsnotify_inode_groupsEric Paris2010-07-283-18/+26
| | | | | | | | Simple renaming patch. fsnotify is about to support mount point listeners so I am renaming fsnotify_groups and fsnotify_mask to indicate these are lists used only for groups which have watches on inodes. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: drop mask argument from fsnotify_alloc_groupEric Paris2010-07-283-9/+3
| | | | | | | Nothing uses the mask argument to fsnotify_alloc_group. This patch drops that argument. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: fsnotify_obtain_group should be fsnotify_alloc_groupEric Paris2010-07-283-9/+5
| | | | | | | | fsnotify_obtain_group was intended to be able to find an already existing group. Nothing uses that functionality. This just renames it to fsnotify_alloc_group so it is clear what it is doing. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: fsnotify_obtain_group kzalloc cleanupEric Paris2010-07-281-3/+0
| | | | | | | fsnotify_obtain_group uses kzalloc but then proceedes to set things to 0. This patch just deletes those useless lines. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: remove group_num altogetherEric Paris2010-07-283-58/+4
| | | | | | | | The original fsnotify interface has a group-num which was intended to be able to find a group after it was added. I no longer think this is a necessary thing to do and so we remove the group_num. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: lock annotation for event replacementEric Paris2010-07-281-28/+13
| | | | | | | | | fsnotify_replace_event need to lock both the old and the new event. This causes lockdep to get all pissed off since it dosn't know this is safe. It's safe in this case since the new event is impossible to be reached from other places in the kernel. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: replace an event on a listEric Paris2010-07-281-0/+56
| | | | | | | | | fanotify would like to clone events already on its notification list, make changes to the new event, and then replace the old event on the list with the new event. This patch implements the replace functionality of that process. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: clone existing eventsEric Paris2010-07-281-4/+25
| | | | | | | | | | | fsnotify_clone_event will take an event, clone it, and return the cloned event to the caller. Since events may be in use by multiple fsnotify groups simultaneously certain event entries (such as the mask) cannot be changed after the event was created. Since fanotify would like to merge events happening on the same file it needs a new clean event to work with so it can change any fields it wishes. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: per group notification queue merge typesEric Paris2010-07-283-56/+75
| | | | | | | | | | | inotify only wishes to merge a new event with the last event on the notification fifo. fanotify is willing to merge any events including by means of bitwise OR masks of multiple events together. This patch moves the inotify event merging logic out of the generic fsnotify notification.c and into the inotify code. This allows each use of fsnotify to provide their own merge functionality. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: send struct file when sending events to parents when possibleEric Paris2010-07-281-3/+10
| | | | | | | | | fanotify needs a path in order to open an fd to the object which changed. Currently notifications to inode's parents are done using only the inode. For some parental notification we have the entire file, send that so fanotify can use it. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: pass a file instead of an inode to open, read, and writeEric Paris2010-07-285-12/+11
| | | | | | | | | | fanotify, the upcoming notification system actually needs a struct path so it can do opens in the context of listeners, and it needs a file so it can get f_flags from the original process. Close was the only operation that already was passing a struct file to the notification hook. This patch passes a file for access, modify, and open as well as they are easily available to these hooks. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: include data in should_send callsEric Paris2010-07-283-3/+4
| | | | | | | | | fanotify is going to need to look at file->private_data to know if an event should be sent or not. This passes the data (which might be a file, dentry, inode, or none) to the should_send function calls so fanotify can get that information when available Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: provide the data type to should_send_eventEric Paris2010-07-283-3/+5
| | | | | | | | | | fanotify is only interested in event types which contain enough information to open the original file in the context of the fanotify listener. Since fanotify may not want to send events if that data isn't present we pass the data type to the should_send_event function call so fanotify can express its lack of interest. Signed-off-by: Eric Paris <eparis@redhat.com>
* inotify: do not spam console without limitEric Paris2010-07-281-7/+4
| | | | | | | | | inotify was supposed to have a dmesg printk ratelimitor which would cause inotify to only emit one message per boot. The static bool was never set so it kept firing messages. This patch correctly limits warnings in multiple places. Signed-off-by: Eric Paris <eparis@redhat.com>
* inotify: remove inotify in kernel interfaceEric Paris2010-07-285-895/+1
| | | | | | nothing uses inotify in the kernel, drop it! Signed-off-by: Eric Paris <eparis@redhat.com>
* inotify: do not reuse watch descriptorsEric Paris2010-07-281-7/+6
| | | | | | | | | | Prior to 2.6.31 inotify would not reuse watch descriptors until all of them had been used at least once. After the rewrite inotify would reuse watch descriptors. The selinux utility 'restorecond' was found to have problems when watch descriptors were reused. This patch reverts to the pre inotify rewrite behavior to not reuse watch descriptors. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: use kmem_cache_zalloc to simplify event initializationEric Paris2010-07-281-12/+1
| | | | | | | | | fsnotify event initialization is done entry by entry with almost everything set to either 0 or NULL. Use kmem_cache_zalloc and only initialize things that need non-zero initialization. Also means we don't have to change initialization entries based on the config options. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: kzalloc fsnotify groupsEric Paris2010-07-281-1/+1
| | | | | | | Use kzalloc for fsnotify_groups so that none of the fields can leak any information accidentally. Signed-off-by: Eric Paris <eparis@redhat.com>
* inotify: use container_of instead of castingEric Paris2010-07-281-1/+3
| | | | | | | | inotify_free_mark casts directly from an fsnotify_mark_entry to an inotify_inode_mark_entry. This works, but should use container_of instead for future proofing. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: use fsnotify_create_event to allocate the q_overflow eventEric Paris2010-07-281-4/+7
| | | | | | | | | | Currently fsnotify defines a static fsnotify event which is sent when a group overflows its allotted queue length. This patch just allocates that event from the event cache rather than defining it statically. There is no known reason that the current implementation is wrong, but this makes sure the event is initialized and created like any other. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: allow addition of duplicate fsnotify marksEric Paris2010-07-283-5/+7
| | | | | | | | | This patch allows a task to add a second fsnotify mark to an inode for the same group. This mark will be added to the end of the inode's list and this will never be found by the stand fsnotify_find_mark() function. This is useful if a user wants to add a new mark before removing the old one. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: duplicate fsnotify_mark_entry data between 2 marksEric Paris2010-07-281-1/+9
| | | | | | | Simple copy fsnotify information from one mark to another in preparation for the second mark to replace the first. Signed-off-by: Eric Paris <eparis@redhat.com>
* inotify: simplify the inotify idr handlingEric Paris2010-07-281-55/+139
| | | | | | | | This patch moves all of the idr editing operations into their own idr functions. It makes it easier to prove locking correctness and to to understand the code flow. Signed-off-by: Eric Paris <eparis@redhat.com>
* 9p: Pass the correct end of buffer to p9stat_readLatchesar Ionkov2010-07-271-1/+1
| | | | | | | Pass the correct end of the buffer to p9stat_read. Signed-off-by: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* sysfs: allow creating symlinks from untagged to tagged directoriesEric W. Biederman2010-07-261-1/+2
| | | | | | | | | | Supporting symlinks from untagged to tagged directories is reasonable, and needed to support CONFIG_SYSFS_DEPRECATED. So don't fail a prior allowing that case to work. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* sysfs: sysfs_delete_link handle symlinks from untagged to tagged directories.Eric W. Biederman2010-07-261-1/+1
| | | | | | | | This happens for network devices when SYSFS_DEPRECATED is enabled. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* sysfs: Don't allow the creation of symlinks we can't removeEric W. Biederman2010-07-261-5/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | Recently my tagged sysfs support revealed a flaw in the device core that a few rare drivers are running into such that we don't always put network devices in a class subdirectory named net/. Since we are not creating the class directory the network devices wind up in a non-tagged directory, but the symlinks to the network devices from /sys/class/net are in a tagged directory. All of which works until we go to remove or rename the symlink. When we remove or rename a symlink we look in the namespace of the target of the symlink. Since the target of the symlink is in a non-tagged sysfs directory we don't have a namespace to look in, and we fail to remove the symlink. Detect this problem up front and simply don't create symlinks we won't be able to remove later. This prevents symlink leakage and fails in a much clearer and more understandable way. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Maciej W. Rozycki <macro@linux-mips.org> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* CIFS: Fix a malicious redirect problem in the DNS lookup codeDavid Howells2010-07-223-5/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the security problem in the CIFS filesystem DNS lookup code in which a malicious redirect could be installed by a random user by simply adding a result record into one of their keyrings with add_key() and then invoking a CIFS CFS lookup [CVE-2010-2524]. This is done by creating an internal keyring specifically for the caching of DNS lookups. To enforce the use of this keyring, the module init routine creates a set of override credentials with the keyring installed as the thread keyring and instructs request_key() to only install lookup result keys in that keyring. The override is then applied around the call to request_key(). This has some additional benefits when a kernel service uses this module to request a key: (1) The result keys are owned by root, not the user that caused the lookup. (2) The result keys don't pop up in the user's keyrings. (3) The result keys don't come out of the quota of the user that caused the lookup. The keyring can be viewed as root by doing cat /proc/keys: 2a0ca6c3 I----- 1 perm 1f030000 0 0 keyring .dns_resolver: 1/4 It can then be listed with 'keyctl list' by root. # keyctl list 0x2a0ca6c3 1 key in keyring: 726766307: --alswrv 0 0 dns_resolver: foo.bar.com Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-and-Tested-by: Jeff Layton <jlayton@redhat.com> Acked-by: Steve French <smfrench@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Fix up trivial spelling errors ('taht' -> 'that')Linus Torvalds2010-07-211-1/+1
| | | | | | | | | Pointed out by Lucas who found the new one in a comment in setup_percpu.c. And then I fixed the others that I grepped for. Reported-by: Lucas <canolucas@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'for-linus' of ↵Linus Torvalds2010-07-205-39/+72
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: do not include cap/dentry releases in replayed messages ceph: reuse request message when replaying against recovering mds ceph: fix creation of ipv6 sockets ceph: fix parsing of ipv6 addresses ceph: fix printing of ipv6 addrs ceph: add kfree() to error path ceph: fix leak of mon authorizer ceph: fix message revocation
| * ceph: do not include cap/dentry releases in replayed messagesSage Weil2010-07-162-0/+9
| | | | | | | | | | | | | | | | | | Strip the cap and dentry releases from replayed messages. They can cause the shared state to get out of sync because they were generated (with the request message) earlier, and no longer reflect the current client state. Signed-off-by: Sage Weil <sage@newdream.net>
| * ceph: reuse request message when replaying against recovering mdsSage Weil2010-07-161-5/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replayed rename operations (after an mds failure/recovery) were broken because the request paths were regenerated from the dentry names, which get mangled when d_move() is called. Instead, resend the previous request message when replaying completed operations. Just make sure the REPLAY flag is set and the target ino is filled in. This fixes problems with workloads doing renames when the MDS restarts, where the rename operation appears to succeed, but on mds restart then fails (leading to client confusion, app breakage, etc.). Signed-off-by: Sage Weil <sage@newdream.net>
| * ceph: fix creation of ipv6 socketsSage Weil2010-07-091-3/+5
| | | | | | | | | | | | Use the address family from the peer address instead of assuming IPv4. Signed-off-by: Sage Weil <sage@newdream.net>
| * ceph: fix parsing of ipv6 addressesSage Weil2010-07-091-6/+19
| | | | | | | | | | | | | | Check for brackets around the ipv6 address to avoid ambiguity with the port number. Signed-off-by: Sage Weil <sage@newdream.net>
| * ceph: fix printing of ipv6 addrsSage Weil2010-07-081-18/+6
| | | | | | | | | | | | | | | | The buffer was too small. Make it bigger, use snprintf(), put brackets around the ipv6 address to avoid mixing it up with the :port, and use the ever-so-handy %pI[46] formats. Signed-off-by: Sage Weil <sage@newdream.net>
| * ceph: add kfree() to error pathDan Carpenter2010-07-081-0/+1
| | | | | | | | | | | | | | We leak a "pi" on this error path. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
| * ceph: fix leak of mon authorizerSage Weil2010-07-051-0/+3
| | | | | | | | | | | | Fix leak of a struct ceph_buffer on umount. Signed-off-by: Sage Weil <sage@newdream.net>
| * ceph: fix message revocationSage Weil2010-07-051-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | A message can be on a queue (pending or sent), or out_msg (sending), or both. We were assuming that if it's not on a queue it couldn't be out_msg, but that was false in the case of lossy connections like the OSD. Fix ceph_con_revoke() to treat these cases independently. Also, fix the out_kvec_is_message check to only trigger if we are currently sending _this_ message. This fixes a GPF in tcp_sendpage, triggered by OSD restarts. Signed-off-by: Sage Weil <sage@newdream.net>
* | Merge branch 'shrinker' of ↵Linus Torvalds2010-07-1918-74/+103
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev * 'shrinker' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev: xfs: track AGs with reclaimable inodes in per-ag radix tree xfs: convert inode shrinker to per-filesystem contexts mm: add context argument to shrinker callback
| * | xfs: track AGs with reclaimable inodes in per-ag radix treeDave Chinner2010-07-202-7/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://bugzilla.kernel.org/show_bug.cgi?id=16348 When the filesystem grows to a large number of allocation groups, the summing of recalimable inodes gets expensive. In many cases, most AGs won't have any reclaimable inodes and so we are wasting CPU time aggregating over these AGs. This is particularly important for the inode shrinker that gets called frequently under memory pressure. To avoid the overhead, track AGs with reclaimable inodes in the per-ag radix tree so that we can find all the AGs with reclaimable inodes via a simple gang tag lookup. This involves setting the tag when the first reclaimable inode is tracked in the AG, and removing the tag when the last reclaimable inode is removed from the tree. Then the summation process becomes a loop walking the radix tree summing AGs with the reclaim tag set. This significantly reduces the overhead of scanning - a 6400 AG filesystea now only uses about 25% of a cpu in kswapd while slab reclaim progresses instead of being permanently stuck at 100% CPU and making little progress. Clean filesystems filesystems will see no overhead and the overhead only increases linearly with the number of dirty AGs. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
| * | xfs: convert inode shrinker to per-filesystem contextsDave Chinner2010-07-204-53/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now the shrinker passes us a context, wire up a shrinker context per filesystem. This allows us to remove the global mount list and the locking problems that introduced. It also means that a shrinker call does not need to traverse clean filesystems before finding a filesystem with reclaimable inodes. This significantly reduces scanning overhead when lots of filesystems are present. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
| * | mm: add context argument to shrinker callbackDave Chinner2010-07-1914-16/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current shrinker implementation requires the registered callback to have global state to work from. This makes it difficult to shrink caches that are not global (e.g. per-filesystem caches). Pass the shrinker structure to the callback so that users can embed the shrinker structure in the context the shrinker needs to operate on and get back to it in the callback via container_of(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstableLinus Torvalds2010-07-192-23/+126
|\ \ \ | |/ / |/| | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: fix checks in BTRFS_IOC_CLONE_RANGE Btrfs: fix CLONE ioctl destination file size expansion to block boundary Btrfs: fix split_leaf double split corner case
| * | Btrfs: fix checks in BTRFS_IOC_CLONE_RANGEDan Rosenberg2010-07-191-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. The BTRFS_IOC_CLONE and BTRFS_IOC_CLONE_RANGE ioctls should check whether the donor file is append-only before writing to it. 2. The BTRFS_IOC_CLONE_RANGE ioctl appears to have an integer overflow that allows a user to specify an out-of-bounds range to copy from the source file (if off + len wraps around). I haven't been able to successfully exploit this, but I'd imagine that a clever attacker could use this to read things he shouldn't. Even if it's not exploitable, it couldn't hurt to be safe. Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> cc: stable@kernel.org Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * | Btrfs: fix CLONE ioctl destination file size expansion to block boundarySage Weil2010-07-191-3/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The CLONE and CLONE_RANGE ioctls round up the range of extents being cloned to the block size when the range to clone extends to the end of file (this is always the case with CLONE). It was then using that offset when extending the destination file's i_size. Fix this by not setting i_size beyond the originally requested ending offset. This bug was introduced by a22285a6 (2.6.35-rc1). Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
OpenPOWER on IntegriCloud