summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'for-linus' of ↵Linus Torvalds2013-09-2221-175/+364
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "These are mostly bug fixes and a two small performance fixes. The most important of the bunch are Josef's fix for a snapshotting regression and Mark's update to fix compile problems on arm" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (25 commits) Btrfs: create the uuid tree on remount rw btrfs: change extent-same to copy entire argument struct Btrfs: dir_inode_operations should use btrfs_update_time also btrfs: Add btrfs: prefix to kernel log output btrfs: refuse to remount read-write after abort Btrfs: btrfs_ioctl_default_subvol: Revert back to toplevel subvolume when arg is 0 Btrfs: don't leak transaction in btrfs_sync_file() Btrfs: add the missing mutex unlock in write_all_supers() Btrfs: iput inode on allocation failure Btrfs: remove space_info->reservation_progress Btrfs: kill delay_iput arg to the wait_ordered functions Btrfs: fix worst case calculator for space usage Revert "Btrfs: rework the overcommit logic to be based on the total size" Btrfs: improve replacing nocow extents Btrfs: drop dir i_size when adding new names on replay Btrfs: replay dir_index items before other items Btrfs: check roots last log commit when checking if an inode has been logged Btrfs: actually log directory we are fsync()'ing Btrfs: actually limit the size of delalloc range Btrfs: allocate the free space by the existed max extent size when ENOSPC ...
| * Btrfs: create the uuid tree on remount rwJosef Bacik2013-09-211-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | Users have been complaining of the uuid tree stuff warning that there is no uuid root when trying to do snapshot operations. This is because if you mount -o ro we will not create the uuid tree. But then if you mount -o rw,remount we will still not create it and then any subsequent snapshot/subvol operations you try to do will fail gloriously. Fix this by creating the uuid_root on remount rw if it was not already there. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
| * btrfs: change extent-same to copy entire argument structMark Fasheh2013-09-211-31/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | btrfs_ioctl_file_extent_same() uses __put_user_unaligned() to copy some data back to it's argument struct. Unfortunately, not all architectures provide __put_user_unaligned(), so compiles break on them if btrfs is selected. Instead, just copy the whole struct in / out at the start and end of operations, respectively. Signed-off-by: Mark Fasheh <mfasheh@suse.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
| * Btrfs: dir_inode_operations should use btrfs_update_time alsoGuangyu Sun2013-09-211-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 2bc5565286121d2a77ccd728eb3484dff2035b58 (Btrfs: don't update atime on RO subvolumes) ensures that the access time of an inode is not updated when the inode lives in a read-only subvolume. However, if a directory on a read-only subvolume is accessed, the atime is updated. This results in a write operation to a read-only subvolume. I believe that access times should never be updated on read-only subvolumes. To reproduce: # mkfs.btrfs -f /dev/dm-3 (...) # mount /dev/dm-3 /mnt # btrfs subvol create /mnt/sub Create subvolume '/mnt/sub' # mkdir /mnt/sub/dir # echo "abc" > /mnt/sub/dir/file # btrfs subvol snapshot -r /mnt/sub /mnt/rosnap Create a readonly snapshot of '/mnt/sub' in '/mnt/rosnap' # stat /mnt/rosnap/dir File: `/mnt/rosnap/dir' Size: 8 Blocks: 0 IO Block: 4096 directory Device: 16h/22d Inode: 257 Links: 1 Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2013-09-11 07:21:49.389157126 -0400 Modify: 2013-09-11 07:22:02.330156079 -0400 Change: 2013-09-11 07:22:02.330156079 -0400 # ls /mnt/rosnap/dir file # stat /mnt/rosnap/dir File: `/mnt/rosnap/dir' Size: 8 Blocks: 0 IO Block: 4096 directory Device: 16h/22d Inode: 257 Links: 1 Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2013-09-11 07:22:56.797151670 -0400 Modify: 2013-09-11 07:22:02.330156079 -0400 Change: 2013-09-11 07:22:02.330156079 -0400 Reported-by: Koen De Wit <koen.de.wit@oracle.com> Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
| * btrfs: Add btrfs: prefix to kernel log outputFrank Holton2013-09-211-2/+2
| | | | | | | | | | | | | | | | | | | | The kernel log entries for device label %s and device fsid %pU are missing the btrfs: prefix. Add those here. Signed-off-by: Frank Holton <fholton@gmail.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
| * btrfs: refuse to remount read-write after abortDavid Sterba2013-09-211-0/+6
| | | | | | | | | | | | | | | | | | | | | | It's still possible to flip the filesystem into RW mode after it's remounted RO due to an abort. There are lots of places that check for the superblock error bit and will not write data, but we should not let the filesystem appear read-write. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
| * Btrfs: btrfs_ioctl_default_subvol: Revert back to toplevel subvolume when ↵chandan2013-09-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | arg is 0 This patch makes it possible to set BTRFS_FS_TREE_OBJECTID as the default subvolume by passing a subvolume id of 0. Signed-off-by: chandan <chandan@linux.vnet.ibm.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
| * Btrfs: don't leak transaction in btrfs_sync_file()Filipe David Borba Manana2013-09-211-2/+2
| | | | | | | | | | | | | | | | | | In btrfs_sync_file(), if the call to btrfs_log_dentry_safe() returns a negative error (for e.g. -ENOMEM via btrfs_log_inode()), we would return without ending/freeing the transaction. Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
| * Btrfs: add the missing mutex unlock in write_all_supers()Stefan Behrens2013-09-211-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The BUG() was replaced by btrfs_error() and return -EIO with the patch "get rid of one BUG() in write_all_supers()", but the missing mutex_unlock() was overlooked. The 0-DAY kernel build service from Intel reported the missing unlock which was found by the coccinelle tool: fs/btrfs/disk-io.c:3422:2-8: preceding lock on line 3374 Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
| * Btrfs: iput inode on allocation failureJosef Bacik2013-09-211-0/+4
| | | | | | | | | | | | | | | | We don't do the iput when we fail to allocate our delayed delalloc work in __start_delalloc_inodes, fix this. Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
| * Btrfs: remove space_info->reservation_progressJosef Bacik2013-09-212-12/+0
| | | | | | | | | | | | | | This isn't used for anything anymore, just remove it. Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
| * Btrfs: kill delay_iput arg to the wait_ordered functionsJosef Bacik2013-09-218-33/+14
| | | | | | | | | | | | | | | | | | | | | | This is a left over of how we used to wait for ordered extents, which was to grab the inode and then run filemap flush on it. However if we have an ordered extent then we already are holding a ref on the inode, and we just use btrfs_start_ordered_extent anyway, so there is no reason to have an extra ref on the inode to start work on the ordered extent. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
| * Btrfs: fix worst case calculator for space usageJosef Bacik2013-09-211-1/+1
| | | | | | | | | | | | | | | | | | | | Forever ago I made the worst case calculator say that we could potentially split into 3 blocks for every level on the way down, which isn't right. If we split we're only going to get two new blocks, the one we originally cow'ed and the new one we're going to split. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
| * Revert "Btrfs: rework the overcommit logic to be based on the total size"Josef Bacik2013-09-211-12/+3
| | | | | | | | | | | | | | | | | | | | This reverts commit 70afa3998c9baed4186df38988246de1abdab56d. It is causing performance issues and wasn't actually correct. There were problems with the way we flushed delalloc and that was the real cause of the early enospc. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
| * Btrfs: improve replacing nocow extentsJosef Bacik2013-09-211-14/+98
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Various people have hit a deadlock when running btrfs/011. This is because when replacing nocow extents we will take the i_mutex to make sure nobody messes with the file while we are replacing the extent. The problem is we are already holding a transaction open, which is a locking inversion, so instead we need to save these inodes we find and then process them outside of the transaction. Further we can't just lock the inode and assume we are good to go. We need to lock the extent range and then read back the extent cache for the inode to make sure the extent really still points at the physical block we want. If it doesn't we don't have to copy it. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
| * Btrfs: drop dir i_size when adding new names on replayJosef Bacik2013-09-211-0/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | So if we have dir_index items in the log that means we also have the inode item as well, which means that the inode's i_size is correct. However when we process dir_index'es we call btrfs_add_link() which will increase the directory's i_size for the new entry. To fix this we need to just set the dir items i_size to 0, and then as we find dir_index items we adjust the i_size. btrfs_add_link() will do it for new entries, and if the entry already exists we can just add the name_len to the i_size ourselves. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
| * Btrfs: replay dir_index items before other itemsJosef Bacik2013-09-211-3/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A user reported a bug where his log would not replay because he was getting -EEXIST back. This was because he had a file moved into a directory that was logged. What happens is the file had a lower inode number, and so it is processed first when replaying the log, and so we add the inode ref in for the directory it was moved to. But then we process the directories DIR_INDEX item and try to add the inode ref for that inode and it fails because we already added it when we replayed the inode. To solve this problem we need to just process any DIR_INDEX items we have in the log first so this all is taken care of, and then we can replay the rest of the items. With this patch my reproducer can remount the file system properly instead of erroring out. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
| * Btrfs: check roots last log commit when checking if an inode has been loggedJosef Bacik2013-09-211-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Liu introduced a local copy of the last log commit for an inode to make sure we actually log an inode even if a log commit has already taken place. In order to make sure we didn't relog the same inode multiple times he set this local copy to the current trans when we log the inode, because usually we log the inode and then sync the log. The exception to this is during rename, we will relog an inode if the name changed and it is already in the log. The problem with this is then we go to sync the inode, and our check to see if the inode has already been logged is tripped and we don't sync the log. To fix this we need to _also_ check against the roots last log commit, because it could be less than what is in our local copy of the log commit. This fixes a bug where we rename a file into a directory and then fsync the directory and then on remount the directory is no longer there. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
| * Btrfs: actually log directory we are fsync()'ingJosef Bacik2013-09-211-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | If you just create a directory and then fsync that directory and then pull the power plug you will come back up and the directory will not be there. That is because we won't actually create directories if we've logged files inside of them since they will be created on replay, but in this check we will set our logged_trans of our current directory if it happens to be a directory, making us think it doesn't need to be logged. Fix the logic to only do this to parent directories. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
| * Btrfs: actually limit the size of delalloc rangeJosef Bacik2013-09-211-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So forever we have had this thing to limit the amount of delalloc pages we'll setup to be written out to 128mb. This is because we have to lock all the pages in this range, so anything above this gets a bit unweildly, and also without a limit we'll happily allocate gigantic chunks of disk space. Turns out our check for this wasn't quite right, we wouldn't actually limit the chunk we wanted to write out, we'd just stop looking for more space after we went over the limit. So if you do a giant 20gb dd on my box with lots of ram I could get 2gig extents. This is fine normally, except when you go to relocate these extents and we can't find enough space to relocate these moster extents, since we have to be able to allocate exactly the same sized extent to move it around. So fix this by actually enforcing the limit. With this patch I'm no longer seeing giant 1.5gb extents. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
| * Btrfs: allocate the free space by the existed max extent size when ENOSPCMiao Xie2013-09-213-31/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By the current code, if the requested size is very large, and all the extents in the free space cache are small, we will waste lots of the cpu time to cut the requested size in half and search the cache again and again until it gets down to the size the allocator can return. In fact, we can know the max extent size in the cache after the first search, so we needn't cut the size in half repeatedly, and just use the max extent size directly. This way can save lots of cpu time and make the performance grow up when there are only fragments in the free space cache. According to my test, if there are only 4KB free space extents in the fs, and the total size of those extents are 256MB, we can reduce the execute time of the following test from 5.4s to 1.4s. dd if=/dev/zero of=<testfile> bs=1MB count=1 oflag=sync Changelog v2 -> v3: - fix the problem that we skip the block group with the space which is less than we need. Changelog v1 -> v2: - address the problem that we return a wrong start position when searching the free space in a bitmap. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
| * btrfs: add lockdep and tracing annotations for uuid treeDavid Sterba2013-09-212-0/+2
| | | | | | | | | | | | Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
| * btrfs: show compiled-in config features at module load timeStefan Behrens2013-09-211-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | We want to know if there are debugging features compiled in, this may affect performance. The message is printed before the sanity checks. (This commit message is a copy of David Sterba's commit message when he introduced btrfs_print_info()). Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
| * Btrfs: more efficient inode tree replace operationFilipe David Borba Manana2013-09-211-5/+5
| | | | | | | | | | | | | | | | | | | | | | Instead of removing the current inode from the red black tree and then add the new one, just use the red black tree replace operation, which is more efficient. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Reviewed-by: Zach Brown <zab@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
| * Btrfs: do not add replace target to the alloc_listIlya Dryomov2013-09-211-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If replace was suspended by the umount, replace target device is added to the fs_devices->alloc_list during a later mount. This is obviously wrong. ->is_tgtdev_for_dev_replace is supposed to guard against that, but ->is_tgtdev_for_dev_replace is (and can only ever be) initialized *after* everything is opened and fs_devices lists are populated. Fix this by checking the devid instead: for replace targets it's always equal to BTRFS_DEV_REPLACE_DEVID. Cc: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
| * Btrfs: fixup error handling in btrfs_reloc_cowJosef Bacik2013-09-213-22/+32
| | | | | | | | | | | | | | | | | | | | | | If we failed to actually allocate the correct size of the extent to relocate we will end up in an infinite loop because we won't return an error, we'll just move on to the next extent. So fix this up by returning an error, and then fix all the callers to return an error up the stack rather than BUG_ON()'ing. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
| * Merge tag 'v3.11' into for-linusChris Mason2013-09-2199-410/+956
| |\ | | | | | | | | | Linux 3.11
* | \ Merge tag 'nfs-for-3.12-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2013-09-211-0/+11
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client bugfix from Trond Myklebust: "Fix a regression due to incorrect sharing of gss auth caches" * tag 'nfs-for-3.12-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: RPCSEC_GSS: fix crash on destroying gss auth
| * | | RPCSEC_GSS: fix crash on destroying gss authJ. Bruce Fields2013-09-181-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes a regression since eb6dc19d8e72ce3a957af5511d20c0db0a8bd007 "RPCSEC_GSS: Share all credential caches on a per-transport basis" which could cause an occasional oops in the nfsd code (see below). The problem was that an auth was left referencing a client that had been freed. To avoid this we need to ensure that auths are shared only between descendants of a common client; the fact that a clone of an rpc_client takes a reference on its parent then ensures that the parent client will last as long as the auth. Also add a comment explaining what I think was the intention of this code. general protection fault: 0000 [#1] PREEMPT SMP Modules linked in: rpcsec_gss_krb5 nfsd auth_rpcgss oid_registry nfs_acl lockd sunrpc CPU: 3 PID: 4071 Comm: kworker/u8:2 Not tainted 3.11.0-rc2-00182-g025145f #1665 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Workqueue: nfsd4_callbacks nfsd4_do_callback_rpc [nfsd] task: ffff88003e206080 ti: ffff88003c384000 task.ti: ffff88003c384000 RIP: 0010:[<ffffffffa00001f3>] [<ffffffffa00001f3>] rpc_net_ns+0x53/0x70 [sunrpc] RSP: 0000:ffff88003c385ab8 EFLAGS: 00010246 RAX: 6b6b6b6b6b6b6b6b RBX: ffff88003af9a800 RCX: 0000000000000002 RDX: ffffffffa00001a5 RSI: 0000000000000001 RDI: ffffffff81e284e0 RBP: ffff88003c385ad8 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000015 R12: ffff88003c990840 R13: ffff88003c990878 R14: ffff88003c385ba8 R15: ffff88003e206080 FS: 0000000000000000(0000) GS:ffff88003fd80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007fcdf737e000 CR3: 000000003ad2b000 CR4: 00000000000006e0 Stack: ffffffffa00001a5 0000000000000006 0000000000000006 ffff88003af9a800 ffff88003c385b08 ffffffffa00d52a4 ffff88003c385ba8 ffff88003c751bd8 ffff88003c751bc0 ffff88003e113600 ffff88003c385b18 ffffffffa00d530c Call Trace: [<ffffffffa00001a5>] ? rpc_net_ns+0x5/0x70 [sunrpc] [<ffffffffa00d52a4>] __gss_pipe_release+0x54/0x90 [auth_rpcgss] [<ffffffffa00d530c>] gss_pipe_free+0x2c/0x30 [auth_rpcgss] [<ffffffffa00d678b>] gss_destroy+0x9b/0xf0 [auth_rpcgss] [<ffffffffa000de63>] rpcauth_release+0x23/0x30 [sunrpc] [<ffffffffa0001e81>] rpc_release_client+0x51/0xb0 [sunrpc] [<ffffffffa00020d5>] rpc_shutdown_client+0xe5/0x170 [sunrpc] [<ffffffff81098a14>] ? cpuacct_charge+0xa4/0xb0 [<ffffffff81098975>] ? cpuacct_charge+0x5/0xb0 [<ffffffffa019556f>] nfsd4_process_cb_update.isra.17+0x2f/0x210 [nfsd] [<ffffffff819a4ac0>] ? _raw_spin_unlock_irq+0x30/0x60 [<ffffffff819a4acb>] ? _raw_spin_unlock_irq+0x3b/0x60 [<ffffffff810703ab>] ? process_one_work+0x15b/0x510 [<ffffffffa01957dd>] nfsd4_do_callback_rpc+0x8d/0xa0 [nfsd] [<ffffffff8107041e>] process_one_work+0x1ce/0x510 [<ffffffff810703ab>] ? process_one_work+0x15b/0x510 [<ffffffff810712ab>] worker_thread+0x11b/0x370 [<ffffffff81071190>] ? manage_workers.isra.24+0x2b0/0x2b0 [<ffffffff8107854b>] kthread+0xdb/0xe0 [<ffffffff819a4ac0>] ? _raw_spin_unlock_irq+0x30/0x60 [<ffffffff81078470>] ? __init_kthread_worker+0x70/0x70 [<ffffffff819ac7dc>] ret_from_fork+0x7c/0xb0 [<ffffffff81078470>] ? __init_kthread_worker+0x70/0x70 Code: a5 01 00 a0 31 d2 31 f6 48 c7 c7 e0 84 e2 81 e8 f4 91 0a e1 48 8b 43 60 48 c7 c2 a5 01 00 a0 be 01 00 00 00 48 c7 c7 e0 84 e2 81 <48> 8b 98 10 07 00 00 e8 91 8f 0a e1 e8 +3c 4e 07 e1 48 83 c4 18 RIP [<ffffffffa00001f3>] rpc_net_ns+0x53/0x70 [sunrpc] RSP <ffff88003c385ab8> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | | | Merge tag 'pm+acpi-3.12-rc2' of ↵Linus Torvalds2013-09-209-22/+46
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management fixes from Rafael Wysocki: 1) Four fixes for cpufreq regressions introduced by the changes that removed Device Tree parsing for CPU device nodes from cpufreq drivers from Sudeep KarkadaNagesha. 2) Two fixes for recent cpufreq regressions introduced by changes related to the preservation of sysfs attributes over system suspend/resume cycles from Viresh Kumar. 3) Fix for ACPI-based wakeup signaling in the PCI subsystem that fails to stop PME polling for devices put into the D3cold power state from Rafael J Wysocki. 4) Fix for bad interactions between cpufreq and udev on systems supporting intel_pstate where acpi-cpufreq is available as well from Yinghai Lu. * tag 'pm+acpi-3.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: return EEXIST instead of EBUSY for second registering PCI / ACPI / PM: Clear pme_poll for devices in D3cold on wakeup ARM: shmobile: change dev_id to cpu0 while registering cpu clock ARM: i.MX: change dev_id to cpu0 while registering cpu clock cpufreq: imx6q-cpufreq: assign cpu_dev correctly to cpu0 device cpufreq: cpufreq-cpu0: assign cpu_dev correctly to cpu0 device cpufreq: unlock correct rwsem while updating policy->cpu cpufreq: Clear policy->cpus bits in __cpufreq_remove_dev_finish()
| * \ \ \ Merge branch 'pm-cpufreq'Rafael J. Wysocki2013-09-208-19/+43
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * pm-cpufreq: cpufreq: return EEXIST instead of EBUSY for second registering ARM: shmobile: change dev_id to cpu0 while registering cpu clock ARM: i.MX: change dev_id to cpu0 while registering cpu clock cpufreq: imx6q-cpufreq: assign cpu_dev correctly to cpu0 device cpufreq: cpufreq-cpu0: assign cpu_dev correctly to cpu0 device cpufreq: unlock correct rwsem while updating policy->cpu cpufreq: Clear policy->cpus bits in __cpufreq_remove_dev_finish()
| | * | | | cpufreq: return EEXIST instead of EBUSY for second registeringYinghai Lu2013-09-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On systems that support intel_pstate, acpi_cpufreq fails to load, and udev keeps trying until trace gets filled up and kernel crashes. The root cause is driver return ret from cpufreq_register_driver(), because when some other driver takes over before, it will return EBUSY and then udev will keep trying ... cpufreq_register_driver() should return EEXIST instead so that the system can boot without appending intel_pstate=disable and still use intel_pstate. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | | | ARM: shmobile: change dev_id to cpu0 while registering cpu clockSudeep KarkadaNagesha2013-09-192-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently all clkdev registration use "cpufreq-cpu0.0" as dev_id for cpu clock which refers to virtual platform device. It needs to be "cpu0" instead which is actual cpu0 device id. This patch changes the dev_id from "cpufreq-cpu0.0" to "cpu0". Reported-and-tested-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Cc: Shawn Guo <shawn.guo@linaro.org> Cc: Magnus Damm <damm@opensource.se> Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | | | ARM: i.MX: change dev_id to cpu0 while registering cpu clockSudeep KarkadaNagesha2013-09-192-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently all clkdev registration use "cpufreq-cpu0.0" as dev_id for cpu clock which refers to virtual platform device. It needs to be "cpu0" instead which is actual cpu0 device id. This patch changes the dev_id from "cpufreq-cpu0.0" to "cpu0". Reported-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Tested-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | | | cpufreq: imx6q-cpufreq: assign cpu_dev correctly to cpu0 deviceSudeep KarkadaNagesha2013-09-192-3/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit cdc58d602d2e657602a90c190cbf745886c95977 "cpufreq: imx6q-cpufreq: remove device tree parsing for cpu nodes" assumed the pdev->dev is set to cpu0 device in the platform code. But it actually points to the virtual cpufreq-cpu0 platform device which is not present in the device tree. Most of the information needed by cpufreq is stored in cpu0 DT node. So cpu_dev must point to cpu0 device. This patch fixes the wrong assignment to cpu_dev. Reported-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Tested-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | | | cpufreq: cpufreq-cpu0: assign cpu_dev correctly to cpu0 deviceSudeep KarkadaNagesha2013-09-191-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit f837a9b5ab05c52a07108c6f09ca66f2e0aee757 "cpufreq: cpufreq-cpu0: remove device tree parsing for cpu nodes" assumed the pdev->dev is set to cpu0 device in the platform code. But it actually points to the virtual cpufreq-cpu0 platform device which is not present in the device tree. Most of the information needed by cpufreq is stored in cpu0 DT node. So cpu_dev must point to cpu0 device. This patch fixes the wrong assignment to cpu_dev. Reported-and-tested-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Cc: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | | | cpufreq: unlock correct rwsem while updating policy->cpuViresh Kumar2013-09-181-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current code looks like this: WARN_ON(lock_policy_rwsem_write(cpu)); update_policy_cpu(policy, new_cpu); unlock_policy_rwsem_write(cpu); {lock|unlock}_policy_rwsem_write(cpu) takes/releases policy->cpu's rwsem. Because cpu is changing with the call to update_policy_cpu(), the unlock_policy_rwsem_write() will release the incorrect lock. The right solution would be to release the same lock as was taken earlier. Also update_policy_cpu() was also called from cpufreq_add_dev() without any locks and so its better if we move this locking to inside update_policy_cpu(). This patch fixes a regression introduced in 3.12 by commit f9ba680d23 (cpufreq: Extract the handover of policy cpu to a helper function). Reported-and-tested-by: Jon Medhurst<tixy@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | | | cpufreq: Clear policy->cpus bits in __cpufreq_remove_dev_finish()Viresh Kumar2013-09-181-8/+8
| | |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This broke after a recent change "cedb70a cpufreq: Split __cpufreq_remove_dev() into two parts" from Srivatsa. Consider a scenario where we have two CPUs in a policy (0 & 1) and we are removing CPU 1. On the call to __cpufreq_remove_dev_prepare() we have cleared 1 from policy->cpus and now on a call to __cpufreq_remove_dev_finish() we read cpumask_weight of policy->cpus, which will come as 1 and this code will behave as if we are removing the last CPU from policy :) Fix it by clearing the CPU mask in __cpufreq_remove_dev_finish() instead of __cpufreq_remove_dev_prepare(). Tested-by: Stephen Warren <swarren@wwwdotorg.org> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | | | Merge branch 'acpi-pci'Rafael J. Wysocki2013-09-201-3/+3
| |\ \ \ \ | | |/ / / | |/| | | | | | | | | | | | | * acpi-pci: PCI / ACPI / PM: Clear pme_poll for devices in D3cold on wakeup
| | * | | PCI / ACPI / PM: Clear pme_poll for devices in D3cold on wakeupRafael J. Wysocki2013-09-201-3/+3
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 448bd85 (PCI/PM: add PCIe runtime D3cold support) added a piece of code to pci_acpi_wake_dev() causing that function to behave in a special way for devices in D3cold (so that their configuration registers are not accessed before those devices are resumed). However, it didn't take the clearing of the pme_poll flag into account. That has to be done for all devices, even if they are in D3cold, or pci_pme_list_scan() will not know that wakeup has been signaled for the device and will poll its PME Status bit unnecessarily. Fix the problem by moving the clearing of the pme_poll flag in pci_acpi_wake_dev() before the code introduced by commit 448bd85. Reported-and-tested-by: David E. Box <david.e.box@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Cc: 3.6+ <stable@vger.kernel.org> # 3.6+
* | | | Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2013-09-202-16/+31
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull vhost updates from Michael Tsirkin: "vhost: minor changes on top of 3.12-rc1 This fixes module loading for vhost-scsi, and tweaks locking in vhost core a bit. Both of these are not exactly release blockers but it's early in the cycle so I think it's a good idea to apply them now" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost-scsi: whitespace tweak vhost/scsi: use vmalloc for order-10 allocation vhost: wake up worker outside spin_lock
| * | | | vhost-scsi: whitespace tweakMichael S. Tsirkin2013-09-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove space at start of line that sneaked in. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * | | | vhost/scsi: use vmalloc for order-10 allocationMichael S. Tsirkin2013-09-171-14/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As vhost scsi device struct is large, if the device is created on a busy system, kzalloc() might fail, so this patch does a fallback to vzalloc(). As vmalloc() adds overhead on data-path, add __GFP_REPEAT to kzalloc() flags to do this fallback only when really needed. Reviewed-by: Asias He <asias@redhat.com> Reported-by: Dan Aloni <alonid@stratoscale.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * | | | vhost: wake up worker outside spin_lockQin Chuanyu2013-09-171-1/+3
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the wake_up_process func is included by spin_lock/unlock in vhost_work_queue, but it could be done outside the spin_lock. I have test it with kernel 3.0.27 and guest suse11-sp2 using iperf, the num as below. original modified thread_num tp(Gbps) vhost(%) | tp(Gbps) vhost(%) 1 9.59 28.82 | 9.59 27.49 8 9.61 32.92 | 9.62 26.77 64 9.58 46.48 | 9.55 38.99 256 9.6 63.7 | 9.6 52.59 Signed-off-by: Chuanyu Qin <qinchuanyu@huawei.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* | | | CacheFiles: Don't try to dump the index key if the cookie has been clearedDavid Howells2013-09-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't try to dump the index key that distinguishes an object if netfs data in the cookie the object refers to has been cleared (ie. the cookie has passed most of the way through __fscache_relinquish_cookie()). Since the netfs holds the index key, we can't get at it once the ->def and ->netfs_data pointers have been cleared - and a NULL pointer exception will ensue, usually just after a: CacheFiles: Error: Unexpected object collision error is reported. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | CacheFiles: Fix memory leak in cachefiles_check_auxdata error pathsJosh Boyer2013-09-201-14/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In cachefiles_check_auxdata(), we allocate auxbuf but fail to free it if we determine there's an error or that the data is stale. Further, assigning the output of vfs_getxattr() to auxbuf->len gives problems with checking for errors as auxbuf->len is a u16. We don't actually need to set auxbuf->len, so keep the length in a variable for now. We shouldn't need to check the upper limit of the buffer as an overflow there should be indicated by -ERANGE. While we're at it, fscache_check_aux() returns an enum value, not an int, so assign it to an appropriately typed variable rather than to ret. Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org> Signed-off-by: David Howells <dhowells@redhat.com> cc: Hongyi Jia <jiayisuse@gmail.com> cc: Milosz Tanski <milosz@adfin.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | lockref: use cmpxchg64 explicitly for lockless updatesWill Deacon2013-09-201-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The cmpxchg() function tends not to support 64-bit arguments on 32-bit architectures. This could be either due to use of unsigned long arguments (like on ARM) or lack of instruction support (cmpxchgq on x86). However, these architectures may implement a specific cmpxchg64() function to provide 64-bit cmpxchg support instead. Since the lockref code requires a 64-bit cmpxchg and relies on the architecture selecting ARCH_USE_CMPXCHG_LOCKREF, move to using cmpxchg64 instead of cmpxchg and allow 32-bit architectures to make use of the lockless lockref implementation. Cc: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge tag 'arm64-stable' of ↵Linus Torvalds2013-09-205-15/+26
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64 Pull ARM64 fixes from Catalin Marinas: - Compat register fault reporting fix - Documentation clarification on tagged pointers - hwcap widened to 64-bit (user space already reading it as 64-bit) * tag 'arm64-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64: arm64: Widen hwcap to be 64 bit arm64: Correctly report LR and SP for compat tasks arm64: documentation: tighten up tagged pointer documentation arm64: Make do_bad_area() function static
| * | | | arm64: Widen hwcap to be 64 bitSteve Capper2013-09-202-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Under arm64 elf_hwcap is a 32 bit quantity, but it is stored in a 64 bit auxiliary ELF field and glibc reads hwcap as 64 bit. This patch widens elf_hwcap to be 64 bit. Signed-off-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * | | | arm64: Correctly report LR and SP for compat tasksCatalin Marinas2013-09-201-5/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a task crashes and we print debugging information, ensure that compat tasks show the actual AArch32 LR and SP registers rather than the AArch64 ones. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
OpenPOWER on IntegriCloud