summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
...
| | * | | | | | [CIFS] Remove obsolete headerSteve French2010-09-291-37/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We decided not to use connector to do the upcalls so cn_cifs.h is obsolete - remove it. Signed-off-by: Steve French <sfrench@us.ibm.com>
| | * | | | | | cifs: allow matching of tcp sessions in CifsNew stateJeff Layton2010-09-291-9/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With commit 7332f2a6217ee6925f83ef0e725013067ed316ba, cifsd will no longer exit when the socket abends and the tcpStatus is CifsNew. With that change, there's no reason to avoid matching an existing session in this state. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
| | * | | | | | cifs: add tcon field to cifsFileInfo structJeff Layton2010-09-292-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Eventually, we'll have more than one tcon per superblock. At that point, we'll need to know which one is associated with a particular fid. For now, this is just set from the cifs_sb->tcon pointer, but eventually the caller of cifs_new_fileinfo will pass a tcon pointer in. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
| | * | | | | | cifs: add "mfsymlinks" mount optionStefan Metzmacher2010-09-293-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the start for an implementation of "Minshall+French Symlinks" (see http://wiki.samba.org/index.php/UNIX_Extensions#Minshall.2BFrench_symlinks). Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <sfrench@us.ibm.com>
| | * | | | | | cifs: use Minshall+French symlink functionsStefan Metzmacher2010-09-294-4/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If configured, Minshall+French Symlinks are used against all servers. If the server supports UNIX Extensions, we still create Minshall+French Symlinks on write, but on read we fallback to UNIX Extension symlinks. Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <sfrench@us.ibm.com>
| | * | | | | | cifs: implement CIFSCreateMFSymLink()Stefan Metzmacher2010-09-291-0/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <sfrench@us.ibm.com>
| | * | | | | | cifs: implement CIFSFormatMFSymlink()Stefan Metzmacher2010-09-291-0/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <sfrench@us.ibm.com>
| | * | | | | | cifs: implement CIFSQueryMFSymLink()Stefan Metzmacher2010-09-291-0/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <sfrench@us.ibm.com>
| | * | | | | | cifs: implement CIFSCouldBeMFSymlink() and CIFSCheckMFSymlink()Stefan Metzmacher2010-09-292-0/+83
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <sfrench@us.ibm.com>
| | * | | | | | cifs: implement CIFSParseMFSymlink()Stefan Metzmacher2010-09-291-0/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <sfrench@us.ibm.com>
| | * | | | | | cifs: Allow binding to local IP address.Ben Greear2010-09-293-2/+108
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When using multi-homed machines, it's nice to be able to specify the local IP to use for outbound connections. This patch gives cifs the ability to bind to a particular IP address. Usage: mount -t cifs -o srcaddr=192.168.1.50,user=foo, ... Usage: mount -t cifs -o srcaddr=2002::100:1,user=foo, ... Acked-by: Jeff Layton <jlayton@redhat.com> Acked-by: Dr. David Holder <david.holder@erion.co.uk> Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
| | * | | | | | cifs NTLMv2/NTLMSSP ntlmv2 within ntlmssp autentication codeShirish Pargaonkar2010-09-298-53/+225
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Attribue Value (AV) pairs or Target Info (TI) pairs are part of ntlmv2 authentication. Structure ntlmv2_resp had only definition for two av pairs. So removed it, and now allocation of av pairs is dynamic. For servers like Windows 7/2008, av pairs sent by server in challege packet (type 2 in the ntlmssp exchange/negotiation) can vary. Server sends them during ntlmssp negotiation. So when ntlmssp is used as an authentication mechanism, type 2 challenge packet from server has this information. Pluck it and use the entire blob for authenticaiton purpose. If user has not specified, extract (netbios) domain name from the av pairs which is used to calculate ntlmv2 hash. Servers like Windows 7 are particular about the AV pair blob. Servers like Windows 2003, are not very strict about the contents of av pair blob used during ntlmv2 authentication. So when security mechanism such as ntlmv2 is used (not ntlmv2 in ntlmssp), there is no negotiation and so genereate a minimal blob that gets used in ntlmv2 authentication as well as gets sent. Fields tilen and tilbob are session specific. AV pair values are defined. To calculate ntlmv2 response we need ti/av pair blob. For sec mech like ntlmssp, the blob is plucked from type 2 response from the server. From this blob, netbios name of the domain is retrieved, if user has not already provided, to be included in the Target String as part of ntlmv2 hash calculations. For sec mech like ntlmv2, create a minimal, two av pair blob. The allocated blob is freed in case of error. In case there is no error, this blob is used in calculating ntlmv2 response (in CalcNTLMv2_response) and is also copied on the response to the server, and then freed. The type 3 ntlmssp response is prepared on a buffer, 5 * sizeof of struct _AUTHENTICATE_MESSAGE, an empirical value large enough to hold _AUTHENTICATE_MESSAGE plus a blob with max possible 10 values as part of ntlmv2 response and lmv2 keys and domain, user, workstation names etc. Also, kerberos gets selected as a default mechanism if server supports it, over the other security mechanisms. Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
| | * | | | | | cifs NTLMv2/NTLMSSP Change variable name mac_key to session key to reflect ↵Shirish Pargaonkar2010-09-295-23/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the key it holds Change name of variable mac_key to session key. The reason mac_key was changed to session key is, this structure does not hold message authentication code, it holds the session key (for ntlmv2, ntlmv1 etc.). mac is generated as a signature in cifs_calc* functions. Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
| | * | | | | | cifs: fix broken oplock handlingSuresh Jayaraman2010-09-293-10/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cifs_new_fileinfo() does not use the 'oplock' value from the callers. Instead, it sets it to REQ_OPLOCK which seems wrong. We should be using the oplock value obtained from the Server to set the inode's clientCanCacheAll or clientCanCacheRead flags. Fix this by passing oplock from the callers to cifs_new_fileinfo(). This change dates back to commit a6ce4932 (2.6.30-rc3). So, all the affected versions will need this fix. Please Cc stable once reviewed and accepted. Cc: Stable <stable@kernel.org> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Steve French <sfrench@us.ibm.com>
| | * | | | | | cifs: use type __u32 instead of int for the oplock parameterSuresh Jayaraman2010-09-291-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... and avoid implicit casting from a signed type. Also, pass oplock by value instead by reference as we don't intend to change the value in cifs_open_inode_helper(). Thanks to Jeff Layton for spotting this. Reviewed-by: Jeff Layton <jlayton@samba.org> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Steve French <sfrench@us.ibm.com>
* | | | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2010-10-221-0/+3
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: dlm: Fix dlm lock status block comment in dlm.h dlm: Don't send callback to node making lock request when "try 1cb" fails
| * | | | | | | | dlm: Don't send callback to node making lock request when "try 1cb" failsSteven Whitehouse2010-09-031-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When converting a lock, an lkb is in the granted state and also being used to request a new state. In the case that the conversion was a "try 1cb" type which has failed, and if the new state was incompatible with the old state, a callback was being generated to the requesting node. This is incorrect as callbacks should only be sent to all the other nodes holding blocking locks. The requesting node should receive the normal (failed) response to its "try 1cb" conversion request only. This was discovered while debugging a performance problem on GFS2, however this fix also speeds up GFS as well. In the GFS2 case the performance gain is over 10x for cases of write activity to an inode whose glock is cached on another, idle (wrt that glock) node. (comment added, dct) Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Tested-by: Abhijith Das <adas@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
* | | | | | | | | Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds2010-10-2260-1375/+1185
|\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://oss.sgi.com/xfs/xfs: (36 commits) xfs: semaphore cleanup xfs: Extend project quotas to support 32bit project ids xfs: remove xfs_buf wrappers xfs: remove xfs_cred.h xfs: remove xfs_globals.h xfs: remove xfs_version.h xfs: remove xfs_refcache.h xfs: fix the xfs_trans_committed xfs: remove unused t_callback field in struct xfs_trans xfs: fix bogus m_maxagi check in xfs_iget xfs: do not use xfs_mod_incore_sb_batch for per-cpu counters xfs: do not use xfs_mod_incore_sb for per-cpu counters xfs: remove XFS_MOUNT_NO_PERCPU_SB xfs: pack xfs_buf structure more tightly xfs: convert buffer cache hash to rbtree xfs: serialise inode reclaim within an AG xfs: batch inode reclaim lookup xfs: implement batched inode lookups for AG walking xfs: split out inode walk inode grabbing xfs: split inode AG walking into separate code for reclaim ...
| * \ \ \ \ \ \ \ \ Merge branch 'v2.6.36'Alex Elder2010-10-211-0/+2
| |\ \ \ \ \ \ \ \ \ | | | |_|_|/ / / / / | | |/| | | | | | |
| * | | | | | | | | xfs: semaphore cleanupThomas Gleixner2010-10-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Get rid of init_MUTEX[_LOCKED]() and use sema_init() instead. (Ported to current XFS code by <aelder@sgi.com>.) Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Alex Elder <aelder@sgi.com>
| * | | | | | | | | xfs: Extend project quotas to support 32bit project idsArkadiusz Mi?kiewicz2010-10-1816-44/+80
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for 32bit project quota identifiers. On disk format is backward compatible with 16bit projid numbers. projid on disk is now kept in two 16bit values - di_projid_lo (which holds the same position as old 16bit projid value) and new di_projid_hi (takes existing padding) and converts from/to 32bit value on the fly. xfs_admin (for existing fs), mkfs.xfs (for new fs) needs to be used to enable PROJID32BIT support. Signed-off-by: Arkadiusz Miśkiewicz <arekm@maven.pl> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
| * | | | | | | | | xfs: remove xfs_buf wrappersChristoph Hellwig2010-10-1816-48/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Stop having two different names for many buffer functions and use the more descriptive xfs_buf_* names directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
| * | | | | | | | | xfs: remove xfs_cred.hChristoph Hellwig2010-10-1812-58/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We're not actually passing around credentials inside XFS for a while now, so remove all xfs_cred.h with it's cred_t typedef and all instances of it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
| * | | | | | | | | xfs: remove xfs_globals.hChristoph Hellwig2010-10-182-24/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This header only provides one extern that isn't actually declared anywhere, and shadowed by a macro. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
| * | | | | | | | | xfs: remove xfs_version.hChristoph Hellwig2010-10-183-30/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It used to have a place when it contained an automatically generated CVS version, but these days it's entirely superflous. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
| * | | | | | | | | xfs: remove xfs_refcache.hChristoph Hellwig2010-10-181-52/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This header has been completely unused for a couple of years. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
| * | | | | | | | | xfs: fix the xfs_trans_committedChristoph Hellwig2010-10-181-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the correct prototype for xfs_trans_committed instead of casting it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
| * | | | | | | | | xfs: remove unused t_callback field in struct xfs_transChristoph Hellwig2010-10-182-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
| * | | | | | | | | xfs: fix bogus m_maxagi check in xfs_igetChristoph Hellwig2010-10-181-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These days inode64 should only control which AGs we allocate new inodes from, while we still try to support reading all existing inodes. To make this actually work the check ontop of xfs_iget needs to be relaxed to allow inodes in all allocation groups instead of just those that we allow allocating inodes from. Note that we can't simply remove the check - it prevents us from accessing invalid data when fed invalid inode numbers from NFS or bulkstat. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
| * | | | | | | | | xfs: do not use xfs_mod_incore_sb_batch for per-cpu countersChristoph Hellwig2010-10-182-107/+85
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update the per-cpu counters manually in xfs_trans_unreserve_and_mod_sb and remove support for per-cpu counters from xfs_mod_incore_sb_batch to simplify it. And added benefit is that we don't have to take m_sb_lock for transactions that only modify per-cpu counters. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
| * | | | | | | | | xfs: do not use xfs_mod_incore_sb for per-cpu countersChristoph Hellwig2010-10-185-40/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Export xfs_icsb_modify_counters and always use it for modifying the per-cpu counters. Remove support for per-cpu counters from xfs_mod_incore_sb to simplify it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
| * | | | | | | | | xfs: remove XFS_MOUNT_NO_PERCPU_SBChristoph Hellwig2010-10-183-29/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fail the mount if we can't allocate memory for the per-CPU counters. This is consistent with how we handle everything else in the mount path and makes the superblock counter modification a lot simpler. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
| * | | | | | | | | xfs: pack xfs_buf structure more tightlyDave Chinner2010-10-181-11/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pahole reports the struct xfs_buf has quite a few holes in it, so packing the structure better will reduce the size of it by 16 bytes. Also, move all the fields used in cache lookups into the first cacheline. Before on x86_64: /* size: 320, cachelines: 5 */ /* sum members: 298, holes: 6, sum holes: 22 */ After on x86_64: /* size: 304, cachelines: 5 */ /* padding: 6 */ /* last cacheline: 48 bytes */ Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
| * | | | | | | | | xfs: convert buffer cache hash to rbtreeDave Chinner2010-10-184-76/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The buffer cache hash is showing typical hash scalability problems. In large scale testing the number of cached items growing far larger than the hash can efficiently handle. Hence we need to move to a self-scaling cache indexing mechanism. I have selected rbtrees for indexing becuse they can have O(log n) search scalability, and insert and remove cost is not excessive, even on large trees. Hence we should be able to cache large numbers of buffers without incurring the excessive cache miss search penalties that the hash is imposing on us. To ensure we still have parallel access to the cache, we need multiple trees. Rather than hashing the buffers by disk address to select a tree, it seems more sensible to separate trees by typical access patterns. Most operations use buffers from within a single AG at a time, so rather than searching lots of different lists, separate the buffer indexes out into per-AG rbtrees. This means that searches during metadata operation have a much higher chance of hitting cache resident nodes, and that updates of the tree are less likely to disturb trees being accessed on other CPUs doing independent operations. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
| * | | | | | | | | xfs: serialise inode reclaim within an AGDave Chinner2010-10-183-0/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Memory reclaim via shrinkers has a terrible habit of having N+M concurrent shrinker executions (N = num CPUs, M = num kswapds) all trying to shrink the same cache. When the cache they are all working on is protected by a single spinlock, massive contention an slowdowns occur. Wrap the per-ag inode caches with a reclaim mutex to serialise reclaim access to the AG. This will block concurrent reclaim in each AG but still allow reclaim to scan multiple AGs concurrently. Allow shrinkers to move on to the next AG if it can't get the lock, and if we can't get any AG, then start blocking on locks. To prevent reclaimers from continually scanning the same inodes in each AG, add a cursor that tracks where the last reclaim got up to and start from that point on the next reclaim. This should avoid only ever scanning a small number of inodes at the satart of each AG and not making progress. If we have a non-shrinker based reclaim pass, ignore the cursor and reset it to zero once we are done. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com>
| * | | | | | | | | xfs: batch inode reclaim lookupDave Chinner2010-10-181-33/+77
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Batch and optimise the per-ag inode lookup for reclaim to minimise scanning overhead. This involves gang lookups on the radix trees to get multiple inodes during each tree walk, and tighter validation of what inodes can be reclaimed without blocking befor we take any locks. This is based on ideas suggested in a proof-of-concept patch posted by Nick Piggin. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
| * | | | | | | | | xfs: implement batched inode lookups for AG walkingDave Chinner2010-10-182-23/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the reclaim code separated from the generic walking code, it is simple to implement batched lookups for the generic walk code. Separate out the inode validation from the execute operations and modify the tree lookups to get a batch of inodes at a time. Reclaim operations will be optimised separately. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
| * | | | | | | | | xfs: split out inode walk inode grabbingDave Chinner2010-10-182-54/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When doing read side inode cache walks, the code to validate and grab an inode is common to all callers. Split it out of the execute callbacks in preparation for batching lookups. Similarly, split out the inode reference dropping from the execute callbacks into the main lookup look to be symmetric with the grab. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
| * | | | | | | | | xfs: split inode AG walking into separate code for reclaimDave Chinner2010-10-186-115/+122
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The reclaim walk requires different locking and has a slightly different walk algorithm, so separate it out so that it can be optimised separately. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
| * | | | | | | | | xfs: remove buftarg hash for external devicesDave Chinner2010-10-181-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For RT and external log devices, we never use hashed buffers on them now. Remove the buftarg hash tables that are set up for them. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
| * | | | | | | | | xfs: use unhashed buffers for size checksDave Chinner2010-10-183-45/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we are checking we can access the last block of each device, we do not need to use cached buffers as they will be tossed away immediately. Use uncached buffers for size checks so that all IO prior to full in-memory structure initialisation does not use the buffer cache. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
| * | | | | | | | | xfs: kill XBF_FS_MANAGED buffersDave Chinner2010-10-183-59/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Filesystem level managed buffers are buffers that have their lifecycle controlled by the filesystem layer, not the buffer cache. We currently cache these buffers, which makes cleanup and cache walking somewhat troublesome. Convert the fs managed buffers to uncached buffers obtained by via xfs_buf_get_uncached(), and remove the XBF_FS_MANAGED special cases from the buffer cache. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
| * | | | | | | | | xfs: store xfs_mount in the buftarg instead of in the xfs_bufDave Chinner2010-10-185-21/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Each buffer contains both a buftarg pointer and a mount pointer. If we add a mount pointer into the buftarg, we can avoid needing the b_mount field in every buffer and grab it from the buftarg when needed instead. This shrinks the xfs_buf by 8 bytes. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
| * | | | | | | | | xfs: introduced uncached buffer read primitveDave Chinner2010-10-182-0/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To avoid the need to use cached buffers for single-shot or buffers cached at the filesystem level, introduce a new buffer read primitive that bypasses the cache an reads directly from disk. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
| * | | | | | | | | xfs: rename xfs_buf_get_nodaddr to be more appropriateDave Chinner2010-10-186-11/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xfs_buf_get_nodaddr() is really used to allocate a buffer that is uncached. While it is not directly assigned a disk address, the fact that they are not cached is a more important distinction. With the upcoming uncached buffer read primitive, we should be consistent with this disctinction. While there, make page allocation in xfs_buf_get_nodaddr() safe against memory reclaim re-entrancy into the filesystem by allowing a flags parameter to be passed. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
| * | | | | | | | | xfs: don't use vfs writeback for pure metadata modificationsDave Chinner2010-10-1812-86/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Under heavy multi-way parallel create workloads, the VFS struggles to write back all the inodes that have been changed in age order. The bdi flusher thread becomes CPU bound, spending 85% of it's time in the VFS code, mostly traversing the superblock dirty inode list to separate dirty inodes old enough to flush. We already keep an index of all metadata changes in age order - in the AIL - and continued log pressure will do age ordered writeback without any extra overhead at all. If there is no pressure on the log, the xfssyncd will periodically write back metadata in ascending disk address offset order so will be very efficient. Hence we can stop marking VFS inodes dirty during transaction commit or when changing timestamps during transactions. This will keep the inodes in the superblock dirty list to those containing data or unlogged metadata changes. However, the timstamp changes are slightly more complex than this - there are a couple of places that do unlogged updates of the timestamps, and the VFS need to be informed of these. Hence add a new function xfs_trans_ichgtime() for transactional changes, and leave xfs_ichgtime() for the non-transactional changes. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
| * | | | | | | | | xfs: lockless per-ag lookupsDave Chinner2010-10-183-11/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we start taking a reference to the per-ag for every cached buffer in the system, kernel lockstat profiling on an 8-way create workload shows the mp->m_perag_lock has higher acquisition rates than the inode lock and has significantly more contention. That is, it becomes the highest contended lock in the system. The perag lookup is trivial to convert to lock-less RCU lookups because perag structures never go away. Hence the only thing we need to protect against is tree structure changes during a grow. This can be done simply by replacing the locking in xfs_perag_get() with RCU read locking. This removes the mp->m_perag_lock completely from this path. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
| * | | | | | | | | xfs: remove debug assert for per-ag reference countingDave Chinner2010-10-181-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we start taking references per cached buffer to the the perag it is cached on, it will blow the current debug maximum reference count assert out of the water. The assert has never caught a bug, and we have tracing to track changes if there ever is a problem, so just remove it. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
| * | | | | | | | | xfs: reduce the number of CIL lock round trips during commitDave Chinner2010-10-181-105/+127
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When commiting a transaction, we do a lock CIL state lock round trip on every single log vector we insert into the CIL. This is resulting in the lock being as hot as the inode and dcache locks on 8-way create workloads. Rework the insertion loops to bring the number of lock round trips to one per transaction for log vectors, and one more do the busy extents. Also change the allocation of the log vector buffer not to zero it as we copy over the entire allocated buffer anyway. This patch also includes a structural cleanup to the CIL item insertion provided by Christoph Hellwig. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
| * | | | | | | | | xfs: eliminate some newly-reported gcc warningsPoyo VL2010-10-181-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ionut Gabriel Popescu <poyo_vl@yahoo.com> submitted a simple change to eliminate some "may be used uninitialized" warnings when building XFS. The reported condition seems to be something that GCC did not used to recognize or report. The warnings were produced by: gcc version 4.5.0 20100604 [gcc-4_5-branch revision 160292] (SUSE Linux) Signed-off-by: Ionut Gabriel Popescu <poyo_vl@yahoo.com> Signed-off-by: Alex Elder <aelder@sgi.com>
OpenPOWER on IntegriCloud