path: root/fs
Commit message | Author | Age | Files | Lines
...
NFSv4.1: Fix a potential layoutget/layoutrecall deadlock | Trond Myklebust | 2018-07-26 | 1 | -2/+2
    If the client is sending a layoutget, but the server issues a callback to recall what it thinks may be an outstanding layout, then we may find an uninitialised layout attached to the inode due to the layoutget. In that case, it is appropriate to return NFS4ERR_NOMATCHING_LAYOUT rather than NFS4ERR_DELAY, as the latter can end up deadlocking.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
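The decision described above can be modelled as a small pure function. This is a hypothetical sketch, not the Linux client's callback code: `struct layout_model`, `answer_layoutrecall()` and the `RECALL_*` values are illustrative names, and the enum stands in for the RFC 5661 status codes without reproducing their numeric values. The `busy` case that answers DELAY is an assumption added only to contrast the two replies.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the replies named in the commit message;
 * the real numeric status codes are defined by RFC 5661. */
enum recall_status {
    RECALL_OK,          /* layout found, recall accepted */
    RECALL_DELAY,       /* stands in for NFS4ERR_DELAY */
    RECALL_NOMATCHING   /* stands in for NFS4ERR_NOMATCHING_LAYOUT */
};

/* Hypothetical, simplified view of the layout attached to an inode. */
struct layout_model {
    bool initialised;   /* false while only an in-flight LAYOUTGET created it */
    bool busy;          /* segments still referenced by I/O (assumption) */
};

/* Choose the reply to a CB_LAYOUTRECALL for this inode's layout. */
enum recall_status answer_layoutrecall(const struct layout_model *lo)
{
    /* No layout, or an uninitialised one left by a pending LAYOUTGET:
     * nothing matches the recall. Replying DELAY in this state is what
     * the commit message says can deadlock. */
    if (lo == NULL || !lo->initialised)
        return RECALL_NOMATCHING;
    if (lo->busy)
        return RECALL_DELAY;
    return RECALL_OK;
}
```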
pNFS: Parse the results of layoutget on open even if permissions checks fail | Trond Myklebust | 2018-07-26 | 3 | -8/+3
    Even if the permissions checks failed, we should parse the results of the layoutget on open call so that we can return the layout if required.
    Note that we also want to ignore the sequence counter for whether or not a layout recall occurred. If the recall pertained to our OPEN, then the callback will know, and will attempt to wait for us to finish processing anyway.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
NFS: Allow optimisation of lseek(fd, SEEK_CUR, 0) on directories | Trond Myklebust | 2018-07-26 | 1 | -10/+16
    There should be no need to grab the inode lock if we're only reading the file offset.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
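The subject line writes the call as `lseek(fd, SEEK_CUR, 0)`; with the POSIX signature `lseek(fd, offset, whence)`, the offset query is `lseek(fd, 0, SEEK_CUR)`. The userspace sketch below issues exactly that query on a directory, which is the operation this change lets the NFS client answer without taking the inode lock; it does not show the kernel-side change. `dir_offset()` is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a directory and report its current offset.
 * lseek(fd, 0, SEEK_CUR) only reads the offset; it never moves it. */
off_t dir_offset(const char *path)
{
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;
    off_t pos = lseek(fd, 0, SEEK_CUR);
    close(fd);
    return pos;
}
```

A freshly opened directory reports offset 0; a path that cannot be opened as a directory yields -1.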
pNFS: Wait for stale layoutget calls to complete in pnfs_update_layout() | Trond Myklebust | 2018-07-26 | 1 | -5/+31
    If the old layout was recalled, and we returned NFS4ERR_NOMATCHING_LAYOUT, then we need to wait for all outstanding layoutget calls to complete before we can send a new one.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
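A minimal model of the rule above, assuming a per-inode count of in-flight LAYOUTGET calls and a flag set after answering a recall with NFS4ERR_NOMATCHING_LAYOUT. All names are hypothetical; a real implementation would sleep on a wait queue until `lget_may_send()` becomes true rather than polling it.

```c
#include <stdbool.h>

/* Hypothetical per-inode bookkeeping for LAYOUTGET calls. */
struct lget_model {
    unsigned outstanding;   /* LAYOUTGET calls currently in flight */
    bool stale;             /* old layout was recalled and answered
                             * with NFS4ERR_NOMATCHING_LAYOUT */
};

void lget_start(struct lget_model *m)
{
    m->outstanding++;
}

void lget_done(struct lget_model *m)
{
    if (m->outstanding > 0)
        m->outstanding--;
    if (m->outstanding == 0)
        m->stale = false;   /* every stale call has now completed */
}

/* A new LAYOUTGET may be sent unless stale calls are still in flight. */
bool lget_may_send(const struct lget_model *m)
{
    return !m->stale || m->outstanding == 0;
}
```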
pNFS/flexfiles: Ensure we always return a layout if it has layoutstats | Trond Myklebust | 2018-07-26 | 1 | -0/+3
    If a layout segment is carrying layoutstats or layout error information, then we always want to return it rather than using a forgetful model.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
pNFS: Ignore non-recalled layouts in pnfs_layout_need_return() | Trond Myklebust | 2018-07-26 | 1 | -1/+10
    If a layout has been recalled, then we should fire off a layoutreturn as soon as all the layout segments that match the recall have been retired.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
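One way to picture "all the layout segments that match the recall have been retired" is a range-overlap check that ignores segments outside the recalled range. This is a hypothetical sketch, not `pnfs_layout_need_return()` itself; the segment structure and the saturating end-of-range helper are assumptions (the saturation keeps a whole-file length of `UINT64_MAX` from overflowing).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified layout segment. */
struct lseg_model {
    uint64_t offset;
    uint64_t length;
    bool in_use;        /* still referenced by outstanding I/O */
};

/* End of [off, off + len), saturating so UINT64_MAX ("whole file") is safe. */
static uint64_t range_end(uint64_t off, uint64_t len)
{
    return len > UINT64_MAX - off ? UINT64_MAX : off + len;
}

static bool ranges_overlap(uint64_t a, uint64_t alen, uint64_t b, uint64_t blen)
{
    return a < range_end(b, blen) && b < range_end(a, alen);
}

/* True once no in-use segment overlaps the recalled range, i.e. a
 * LAYOUTRETURN for that range can be sent. Segments outside the range
 * are ignored even if they are still busy. */
bool need_return_now(const struct lseg_model *segs, size_t n,
                     uint64_t recall_off, uint64_t recall_len)
{
    for (size_t i = 0; i < n; i++) {
        if (segs[i].in_use &&
            ranges_overlap(segs[i].offset, segs[i].length,
                           recall_off, recall_len))
            return false;
    }
    return true;
}
```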
pNFS: Don't update the stateid when replying NFS4ERR_DELAY to a layout recall | Trond Myklebust | 2018-07-26 | 1 | -1/+1
    RFC 5661 doesn't state directly that the client should update the layout stateid if it returns NFS4ERR_NOMATCHING_LAYOUT in response to a recall; however, it does state that this error will "cleanly indicate completion" on par with returning the layout. For this reason, we assume that the client should update the layout stateid. The Linux pNFS server definitely does expect this behaviour.
    However, if the client replies NFS4ERR_DELAY, then it is stating that the recall was not processed, so it would be very wrong to update the layout stateid.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
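The rule reduces to: update the layout stateid when the reply is NFS4ERR_NOMATCHING_LAYOUT, and leave it alone when the reply is NFS4ERR_DELAY. The sketch below assumes that "update" means adopting the seqid carried by the recall; the types and names are hypothetical stand-ins, not the kernel's pNFS structures.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the two replies discussed above. */
enum recall_reply { REPLY_NOMATCHING_LAYOUT, REPLY_DELAY };

/* Hypothetical, simplified layout stateid: only the seqid is modelled. */
struct layout_stateid_model {
    uint32_t seqid;
};

/* Apply the client's own reply to a recall that carried recall_seqid. */
void apply_recall_reply(struct layout_stateid_model *sid,
                        enum recall_reply reply, uint32_t recall_seqid)
{
    switch (reply) {
    case REPLY_NOMATCHING_LAYOUT:
        /* Recall processed ("cleanly indicate completion"): update. */
        sid->seqid = recall_seqid;
        break;
    case REPLY_DELAY:
        /* Recall not processed: the stateid must not change. */
        break;
    }
}
```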
pNFS: Don't discard layout segments that are marked for return | Trond Myklebust | 2018-07-26 | 2 | -16/+39
    If there are layout segments that are marked for return, then we need to ensure that pnfs_mark_matching_lsegs_return() does not silently discard them, and instead tells the caller that there is a layoutreturn scheduled.
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Merge tag 'nfsd-4.19-1' of git://linux-nfs.org/~bfields/linux | Linus Torvalds | 2018-08-23 | 16 | -103/+143
    Pull nfsd updates from Bruce Fields:
    "Chuck Lever fixed a problem with NFSv4.0 callbacks over GSS from multi-homed servers.
    The only new feature is a minor bit of protocol (change_attr_type) which the client doesn't even use yet.
    Other than that, various bugfixes and cleanup"
    * tag 'nfsd-4.19-1' of git://linux-nfs.org/~bfields/linux: (27 commits)
      sunrpc: Add comment defining gssd upcall API keywords
      nfsd: Remove callback_cred
      nfsd: Use correct credential for NFSv4.0 callback with GSS
      sunrpc: Extract target name into svc_cred
      sunrpc: Enable the kernel to specify the hostname part of service principals
      sunrpc: Don't use stack buffer with scatterlist
      rpc: remove unneeded variable 'ret' in rdma_listen_handler
      nfsd: use true and false for boolean values
      nfsd: constify write_op[]
      fs/nfsd: Delete invalid assignment statements in nfsd4_decode_exchange_id
      NFSD: Handle full-length symlinks
      NFSD: Refactor the generic write vector fill helper
      svcrdma: Clean up Read chunk path
      svcrdma: Avoid releasing a page in svc_xprt_release()
      nfsd: Mark expected switch fall-through
      sunrpc: remove redundant variables 'checksumlen','blocksize' and 'data'
      nfsd: fix leaked file lock with nfs exported overlayfs
      nfsd: don't advertise a SCSI layout for an unsupported request_queue
      nfsd: fix corrupted reply to badly ordered compound
      nfsd: clarify check_op_ordering
      ...
nfsd: Remove callback_cred | Chuck Lever | 2018-08-22 | 3 | -30/+2
    Clean up: The global callback_cred is no longer used, so it can be removed.
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
nfsd: Use correct credential for NFSv4.0 callback with GSS | Chuck Lever | 2018-08-22 | 1 | -1/+8
    I've had trouble when operating a multi-homed Linux NFS server with Kerberos using NFSv4.0. Lately, I've seen my clients reporting this (and then hanging):
        May 9 11:43:26 manet kernel: NFS: NFSv4 callback contains invalid cred
    The client-side commit f11b2a1cfbf5 ("nfs4: copy acceptor name from context to nfs_client") appears to be related, but I suspect this problem has been going on for some time before that.
    RFC 7530 Section 3.3.3 says:
    > For Kerberos V5, nfs/hostname would be a server principal in the
    > Kerberos Key Distribution Center database. This is the same
    > principal the client acquired a GSS-API context for when it issued
    > the SETCLIENTID operation ...
    In other words, an NFSv4.0 client expects that the server will use the same GSS principal for callback that the client used to establish its lease. For example, if the client used the service principal "nfs@server.domain" to establish its lease, the server is required to use "nfs@server.domain" when performing NFSv4.0 callback operations.
    The Linux NFS server currently does not. It uses a common service principal for all callback connections. Sometimes this works as expected, and other times (for example, when the server is accessible via multiple hostnames) it won't work at all.
    This patch scrapes the target name from the client credential, and uses that for the NFSv4.0 callback credential. That should be correct much more often.
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
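A hypothetical sketch of the selection rule: prefer the target name the client used to establish its lease, and fall back to a server-wide principal only when none was recorded. The fallback and the helper name `pick_callback_principal()` are assumptions for illustration; the patch itself takes the target name from the client credential.

```c
#include <stdio.h>

/* Hypothetical helper: choose the GSS principal for an NFSv4.0 callback.
 * target_name is the name the client used when it established its lease
 * (for example "nfs@server.domain"); default_name is a non-NULL
 * server-wide fallback used only when no target name was recorded
 * (assumption). Returns 0 on success, -1 if the result does not fit. */
int pick_callback_principal(char *buf, size_t buflen,
                            const char *target_name, const char *default_name)
{
    const char *src = (target_name != NULL && target_name[0] != '\0')
                          ? target_name : default_name;
    int n = snprintf(buf, buflen, "%s", src);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```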
sunrpc: Extract target name into svc_cred | Chuck Lever | 2018-08-22 | 1 | -2/+5
    NFSv4.0 callback needs to know the GSS target name the client used when it established its lease. That information is available from the GSS context created by gssproxy. Make it available in each svc_cred.
    Note this will also give us access to the real target service principal name (which is typically "nfs", but the spec does not require that).
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
nfsd: use true and false for boolean values | Gustavo A. R. Silva | 2018-08-09 | 1 | -3/+3
    Return statements in functions returning bool should use true or false instead of an integer value.
    This issue was detected with the help of Coccinelle.
    Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
nfsd: constify write_op[] | Eric Biggers | 2018-08-09 | 1 | -1/+1
    write_op[] is never modified, so make it 'const'.
    Signed-off-by: Eric Biggers <ebiggers@google.com>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
fs/nfsd: Delete invalid assignment statements in nfsd4_decode_exchange_id | nixiaoming | 2018-08-09 | 1 | -3/+1
        READ_BUF(8);
        dummy = be32_to_cpup(p++);
        dummy = be32_to_cpup(p++);
        ...
        READ_BUF(4);
        dummy = be32_to_cpup(p++);
    A value is assigned to "dummy" here, but that stored value is overwritten before it can be used. At the same time, READ_BUF() will re-update the pointer p.
    Delete the invalid assignment statements.
    Signed-off-by: nixiaoming <nixiaoming@huawei.com>
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Trond Myklebust <trondmy@hammerspace.com>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
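When decoded words must be consumed but are never used, advancing the cursor is enough; there is no need to store them in a variable that is immediately overwritten. The sketch below shows that idea with a hypothetical bounds-checked XDR cursor; `xdr_cursor`, `xdr_skip()` and `decode_third_word()` are illustrative names, not the nfsd `READ_BUF()` machinery.

```c
#include <arpa/inet.h>   /* ntohl */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bounds-checked cursor over big-endian 32-bit XDR words. */
struct xdr_cursor {
    const uint32_t *p;
    const uint32_t *end;
};

/* Decode one word; returns 0 on success, -1 if the buffer is exhausted. */
static int xdr_get_u32(struct xdr_cursor *c, uint32_t *out)
{
    if (c->p >= c->end)
        return -1;
    *out = ntohl(*c->p++);
    return 0;
}

/* Skip nwords without decoding them: no dead "dummy" stores needed. */
static int xdr_skip(struct xdr_cursor *c, size_t nwords)
{
    if ((size_t)(c->end - c->p) < nwords)
        return -1;
    c->p += nwords;
    return 0;
}

/* Example: ignore two words, then return the third. */
int decode_third_word(const uint32_t *buf, size_t nwords, uint32_t *out)
{
    struct xdr_cursor c = { buf, buf + nwords };
    if (xdr_skip(&c, 2) < 0)
        return -1;
    return xdr_get_u32(&c, out);
}
```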
NFSD: Handle full-length symlinks | Chuck Lever | 2018-08-09 | 2 | -0/+4
    I've given up on the idea of zero-copy handling of SYMLINK on the server side. This is because the Linux VFS symlink API requires the symlink pathname to be in a NUL-terminated kmalloc'd buffer. The NUL-termination is going to be problematic (watching out for landing on a page boundary and dealing with a 4096-byte pathname).
    I don't believe that SYMLINK creation is on a performance path or is requested frequently enough that it will cause noticeable CPU cache pollution due to data copies.
    There will be two places where a transport callout will be necessary to fill in the rqstp: one will be in the svc_fill_symlink_pathname() helper that is used by NFSv2 and NFSv3, and the other will be in nfsd4_decode_create().
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
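The copy the message describes (length-counted pathname in, NUL-terminated heap buffer out) can be sketched in userspace as follows. `copy_symlink_target()` is hypothetical, and rejecting an embedded NUL is an assumption added here; the point it illustrates is that allocating `len + 1` bytes leaves room for the terminator even for a full-length 4096-byte pathname.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy a length-counted (not NUL-terminated)
 * pathname into a freshly allocated, NUL-terminated string. A pathname
 * of exactly max_len bytes still fits because we allocate len + 1.
 * Returns NULL if the name is too long, contains a NUL, or on ENOMEM. */
char *copy_symlink_target(const char *src, size_t len, size_t max_len)
{
    if (len > max_len)
        return NULL;                    /* longer than allowed */
    if (memchr(src, '\0', len) != NULL)
        return NULL;                    /* embedded NUL (assumption) */
    char *dst = malloc(len + 1);
    if (dst == NULL)
        return NULL;
    memcpy(dst, src, len);
    dst[len] = '\0';                    /* terminator the VFS expects */
    return dst;
}
```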
NFSD: Refactor the generic write vector fill helper | Chuck Lever | 2018-08-09 | 3 | -21/+8
    fill_in_write_vector() is nearly the same logic as svc_fill_write_vector(), but there are a few differences so that the former can handle multiple WRITE payloads in a single COMPOUND.
    svc_fill_write_vector() can be adjusted so that it can be used in the NFSv4 WRITE code path too. Instead of assuming the pages are coming from rq_args.pages, have the caller pass in the page list.
    The immediate benefit is a reduction of code duplication. It also prevents the NFSv4 WRITE decoder from passing an empty vector element when the transport has provided the payload in the xdr_buf's page array.
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
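A simplified sketch of the refactored idea: the caller passes the page list in, and the helper emits one iovec entry per page touched, never a zero-length one. It assumes 4 KiB pages and a payload that starts at `first_off` in the first page; the function and its parameters are hypothetical, not the signature of `svc_fill_write_vector()`.

```c
#include <stddef.h>
#include <sys/uio.h>

#define MODEL_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */

/* Hypothetical sketch: describe total_len bytes starting at first_off in
 * pages[0] as an iovec array, one entry per page touched. The caller
 * supplies the page list. Never emits a zero-length entry. Returns the
 * number of entries filled, or 0 on a bad offset or if the payload needs
 * more entries or pages than are available. */
size_t fill_write_vector(struct iovec *iov, size_t max_iov,
                         char *const *pages, size_t npages,
                         size_t first_off, size_t total_len)
{
    size_t n = 0;
    size_t off = first_off;

    if (first_off >= MODEL_PAGE_SIZE)
        return 0;
    while (total_len > 0) {
        if (n >= max_iov || n >= npages)
            return 0;
        size_t chunk = MODEL_PAGE_SIZE - off;
        if (chunk > total_len)
            chunk = total_len;
        iov[n].iov_base = pages[n] + off;
        iov[n].iov_len = chunk;
        total_len -= chunk;
        off = 0;            /* later pages are used from their start */
        n++;
    }
    return n;
}
```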
nfsd: Mark expected switch fall-through | Gustavo A. R. Silva | 2018-08-09 | 1 | -0/+1
    In preparation for enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through.
    Warning level 2 was used: -Wimplicit-fallthrough=2
    Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
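For reference, this is the kind of annotation such a patch adds. GCC's `-Wimplicit-fallthrough=2` accepts a comment such as `/* fall through */` placed immediately before the next case label; the function itself is a made-up example.

```c
/* Made-up example: level N performs steps N, N-1, ..., 1, so each case
 * intentionally falls into the next one. */
int steps_for_level(int level)
{
    int steps = 0;

    switch (level) {
    case 3:
        steps++;
        /* fall through */
    case 2:
        steps++;
        /* fall through */
    case 1:
        steps++;
        break;
    default:
        break;
    }
    return steps;
}
```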
nfsd: fix leaked file lock with nfs exported overlayfs | Amir Goldstein | 2018-08-09 | 5 | -13/+13
    nfsd and lockd call vfs_lock_file() to lock/unlock the inode returned by locks_inode(file).
    Many places in nfsd/lockd code use the inode returned by file_inode(file) for lock manipulation. With Overlayfs, file_inode() (the underlying inode) is not the same object as locks_inode() (the overlay inode). This can result in "Leaked POSIX lock" messages and eventually a kernel crash as reported by Eddie Horng: https://marc.info/?l=linux-unionfs&m=153086643202072&w=2
    Fix all the call sites in nfsd/lockd that should use locks_inode(). This is a correctness bug that manifested when overlayfs gained NFS export support in v4.16.
    Reported-by: Eddie Horng <eddiehorng.tw@gmail.com>
    Tested-by: Eddie Horng <eddiehorng.tw@gmail.com>
    Cc: Jeff Layton <jlayton@kernel.org>
    Fixes: 8383f1748829 ("ovl: wire up NFS export operations")
    Cc: stable@vger.kernel.org
    Signed-off-by: Amir Goldstein <amir73il@gmail.com>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
nfsd: don't advertise a SCSI layout for an unsupported request_queue | Benjamin Coddington | 2018-06-19 | 1 | -9/+2
    Commit 30181faae37f ("nfsd: Check queue type before submitting a SCSI request") did the work of ensuring that we don't send SCSI requests to a request queue that won't support them, but that check is in the GETDEVICEINFO path. Let's not set the SCSI layout in fs_layout_type in the first place, and then we'll have fewer clients sending GETDEVICEINFO for non-SCSI request queues and fewer unnecessary WARN_ONs.
    While we're in here, remove some outdated comments that refer to "overwriting" layout selection because commit 8a4c3926889e ("nfsd: allow nfsd to advertise multiple layout types") changed things to no longer overwrite the layout type.
    Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
nfsd: fix corrupted reply to badly ordered compound | J. Bruce Fields | 2018-06-17 | 1 | -0/+1
    We're encoding a single op in the reply but leaving the number of ops zero, so the reply makes no sense.
    Somewhat academic as this isn't a case any real client will hit, though in theory perhaps that could change in a future protocol extension.
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
nfsd: clarify check_op_ordering | J. Bruce Fields | 2018-06-17 | 1 | -4/+9
    Document a couple things that confused me on a recent reading.
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
nfsd: update obselete comment referencing the BKL | J. Bruce Fields | 2018-06-17 | 1 | -3/+3
    It's inode->i_lock that's now taken in setlease and break_lease, instead of the big kernel lock.
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
nfsd4: cleanup sessionid in nfsd4_destroy_session | J. Bruce Fields | 2018-06-17 | 1 | -4/+4
    The name of this variable doesn't fit the type. And we only ever use one field of it.
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
nfsd4: less confusing nfsd4_compound_in_session | J. Bruce Fields | 2018-06-17 | 1 | -4/+4
    Make the function prototype match the name a little better.
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
nfsd4: support change_attr_type attribute | J. Bruce Fields | 2018-06-17 | 2 | -0/+11
    The change attribute is what is used by clients to revalidate their caches. Our server may use i_version or ctime for that purpose. Those choices behave slightly differently, and it may be useful to the client to know which we're using. This attribute tells the client that.
    The Linux client doesn't use this attribute yet, though.
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
nfsd: fix NFSv4 time_delta attribute | J. Bruce Fields | 2018-06-17 | 1 | -3/+26
    Currently we return the worst-case value of 1 second in the time delta attribute. That's not terribly useful. Instead, return a value calculated from the time granularity supported by the filesystem and the system clock.
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
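A sketch of one plausible calculation, assuming the reported delta is the coarser of the filesystem's timestamp granularity (which Linux filesystems advertise through the superblock's `s_time_gran`) and the resolution of `CLOCK_REALTIME`. Taking the maximum is an assumption about how the two inputs combine, and `time_delta()` is a hypothetical name; the filesystem granularity is passed in rather than read from a real superblock.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical sketch: report the coarser of the filesystem's timestamp
 * granularity (passed in, in nanoseconds) and the resolution of
 * CLOCK_REALTIME. Taking the maximum is an assumption. */
struct timespec time_delta(long fs_gran_ns)
{
    struct timespec res;
    if (clock_getres(CLOCK_REALTIME, &res) != 0) {
        res.tv_sec = 0;     /* fall back to 1 ns if the query fails */
        res.tv_nsec = 1;
    }
    int64_t clock_ns = (int64_t)res.tv_sec * 1000000000 + res.tv_nsec;
    int64_t delta = clock_ns > fs_gran_ns ? clock_ns : fs_gran_ns;
    struct timespec out;
    out.tv_sec = (time_t)(delta / 1000000000);
    out.tv_nsec = (long)(delta % 1000000000);
    return out;
}
```

With a filesystem that only stores whole seconds (granularity 1,000,000,000 ns), this reports 1 second, the same as the old worst-case value; finer filesystems report something smaller.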
nfsd4: return default lease period | J. Bruce Fields | 2018-06-17 | 1 | -2/+2
    I don't have a good rationale for the lease period, but 90 seconds seems long, and as long as we're allowing the server to extend the grace period up to double the lease period, let's halve the default to 45.
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
nfsd4: extend reclaim period for reclaiming clients | J. Bruce Fields | 2018-06-17 | 4 | -1/+36
    If the client is only renewing state a little sooner than once a lease period, then it might not discover the server has restarted till close to the end of the grace period, and might run out of time to do the actual reclaim.
    Extend the grace period by a second each time we notice there are clients still trying to reclaim, up to a limit of another whole lease period.
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
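The extension policy can be simulated in a few lines. This hypothetical model assumes the grace period initially equals one lease period and treats "up to a limit of another whole lease period" as a hard cap of twice the lease; with the 45-second default from "nfsd4: return default lease period", grace can therefore last at most 90 seconds. The struct and function names are illustrative only.

```c
#include <stdbool.h>

/* Hypothetical model of the grace-period extension; all values are in
 * seconds since the server restarted. */
struct grace_model {
    unsigned lease;      /* lease period, e.g. 45 */
    unsigned grace_end;  /* time at which grace currently ends */
};

void grace_init(struct grace_model *g, unsigned lease)
{
    g->lease = lease;
    g->grace_end = lease;   /* assumption: grace starts as one lease */
}

/* Called at time `now`; returns true while grace is still in effect.
 * Once the deadline is reached, each check that still sees reclaiming
 * clients extends grace by one second, capped at two lease periods. */
bool grace_tick(struct grace_model *g, unsigned now, bool clients_reclaiming)
{
    if (now < g->grace_end)
        return true;
    if (clients_reclaiming && g->grace_end < 2 * g->lease) {
        g->grace_end++;
        return true;
    }
    return false;
}
```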
Merge tag 'upstream-4.19-rc1' of git://git.infradead.org/linux-ubifs | Linus Torvalds | 2018-08-23 | 34 | -558/+724
    Pull UBI/UBIFS updates from Richard Weinberger:
     - Year 2038 preparations
     - New UBI feature to skip CRC checks of static volumes
     - A new Kconfig option to disable xattrs in UBIFS
     - Lots of fixes in UBIFS, found by our new test framework
    * tag 'upstream-4.19-rc1' of git://git.infradead.org/linux-ubifs: (21 commits)
      ubifs: Set default assert action to read-only
      ubifs: Allow setting assert action as mount parameter
      ubifs: Rework ubifs_assert()
      ubifs: Pass struct ubifs_info to ubifs_assert()
      ubifs: Turn two ubifs_assert() into a WARN_ON()
      ubi: expose the volume CRC check skip flag
      ubi: provide a way to skip CRC checks
      ubifs: Use kmalloc_array()
      ubifs: Check data node size before truncate
      Revert "UBIFS: Fix potential integer overflow in allocation"
      ubifs: Add comment on c->commit_sem
      ubifs: introduce Kconfig symbol for xattr support
      ubifs: use swap macro in swap_dirty_idx
      ubifs: tnc: use monotonic znode timestamp
      ubifs: use timespec64 for inode timestamps
      ubifs: xattr: Don't operate on deleted inodes
      ubifs: gc: Fix typo
      ubifs: Fix memory leak in lprobs self-check
      ubi: Initialize Fastmap checkmapping correctly
      ubifs: Fix synced_i_size calculation for xattr inodes
      ...
ubifs: Set default assert action to read-only | Richard Weinberger | 2018-08-15 | 1 | -0/+1
    Traditionally UBIFS just reported a failed assertion and moved on. The drawback is that users will notice UBIFS bugs when it is too late, most of the time when it is no longer able to mount. This makes bug hunting problematic since valuable information from failing asserts is long gone when UBIFS is dead.
    The other extreme, panicking on a failing assert, is also not worthwhile; we want to give users and developers a chance to collect as much debugging information as possible if UBIFS hits an assert.
    Therefore go for the third option: switch to read-only mode when an assert fails. That way UBIFS will not write possibly bad data to the MTD and gives users the chance to collect debugging information.
    Signed-off-by: Richard Weinberger <richard@nod.at>
ubifs: Allow setting assert action as mount parameter | Richard Weinberger | 2018-08-15 | 3 | -0/+37
    Expose our three options to userspace.
    Signed-off-by: Richard Weinberger <richard@nod.at>
ubifs: Rework ubifs_assert() | Richard Weinberger | 2018-08-15 | 3 | -4/+42
    With access to struct ubifs_info in ubifs_assert() we can give more information when an assert fails. By using ubifs_err() we can tell which UBIFS instance failed.
    Also, multiple actions can be taken now. We support:
     - report: This is what UBIFS did so far, just report the failure and go on.
     - read-only: Switch to read-only mode.
     - panic: shoot the kernel in the head.
    Signed-off-by: Richard Weinberger <richard@nod.at>
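A userspace analogue of the three actions listed above, plus a parser for the option value. The `report`/`read-only`/`panic` strings come from the list; the `assert=` option spelling, `struct fs_model` and the function names are assumptions, and `abort()` stands in for a kernel panic. Per "ubifs: Set default assert action to read-only", a real mount would default to the read-only action.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* The three assert actions described above. */
enum assert_action { ACT_REPORT, ACT_READ_ONLY, ACT_PANIC };

/* Hypothetical parser for the value of an "assert=<action>" style mount
 * option (spelling assumed). Returns 0 on success, -1 if unknown. */
int parse_assert_action(const char *val, enum assert_action *out)
{
    if (strcmp(val, "report") == 0)
        *out = ACT_REPORT;
    else if (strcmp(val, "read-only") == 0)
        *out = ACT_READ_ONLY;
    else if (strcmp(val, "panic") == 0)
        *out = ACT_PANIC;
    else
        return -1;
    return 0;
}

/* Hypothetical per-instance state standing in for struct ubifs_info. */
struct fs_model {
    const char *name;
    enum assert_action action;
    int read_only;
};

/* A failed assertion always reports which instance failed, then takes
 * the configured action. */
void fs_assert_failed(struct fs_model *fs, const char *expr)
{
    fprintf(stderr, "%s: assertion failed: %s\n", fs->name, expr);
    switch (fs->action) {
    case ACT_REPORT:
        break;
    case ACT_READ_ONLY:
        fs->read_only = 1;   /* stop writing possibly bad data */
        break;
    case ACT_PANIC:
        abort();             /* stands in for a kernel panic */
    }
}
```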
ubifs: Pass struct ubifs_info to ubifs_assert() | Richard Weinberger | 2018-08-15 | 31 | -521/+550
    This allows us to have more context in ubifs_assert() and take different actions depending on the configuration.
    Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | | | | | | | | | ubifs: Turn two ubifs_assert() into a WARN_ON()Richard Weinberger2018-08-151-2/+2
| | | | | | | | | | | | We are going to pass struct ubifs_info to ubifs_assert(), but while unloading the UBIFS module we no longer have the info struct. Therefore replace the asserts with a regular WARN_ON(). Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | | | | | | | | | ubifs: Use kmalloc_array()Richard Weinberger2018-08-151-1/+1
| | | | | | | | | | | | Since commit 6da2ec56059c ("treewide: kmalloc() -> kmalloc_array()") we use kmalloc_array() for kmalloc() calls that compute the length with a multiplication. Cc: Kees Cook <keescook@chromium.org> Suggested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Richard Weinberger <richard@nod.at> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Richard Weinberger <richard@nod.at>
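kmalloc_array(n, size, flags) fails instead of returning an undersized buffer when n * size would overflow. The userspace sketch below applies the same check in front of malloc(); malloc_array() is a hypothetical helper, not a libc function.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n elements of the given size, or return
 * NULL if n * size does not fit in size_t instead of letting it wrap. */
static void *malloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;            /* n * size would overflow */
	return malloc(n * size);
}
```

Without the check, n * size silently wraps modulo SIZE_MAX + 1, so the allocated size no longer matches the number of elements the caller goes on to index.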
| * | | | | | | | | | | | ubifs: Check data node size before truncateRichard Weinberger2018-08-151-1/+10
| | | | | | | | | | | | Check whether the size is within bounds before using it. If the size is not correct, abort and dump the bad data node. Cc: Kees Cook <keescook@chromium.org> Cc: Silvio Cesare <silvio.cesare@gmail.com> Cc: stable@vger.kernel.org Fixes: 1e51764a3c2ac ("UBIFS: add new flash file system") Reported-by: Silvio Cesare <silvio.cesare@gmail.com> Signed-off-by: Richard Weinberger <richard@nod.at> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Richard Weinberger <richard@nod.at>
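A minimal sketch of the validate-before-use pattern, assuming a hypothetical on-flash record whose length field is untrusted and must not exceed a fixed block size. The struct, the function name, the 4096-byte limit and the choice to treat a zero length as corruption are all illustrative assumptions, not taken from UBIFS.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096u   /* assumed maximum payload per block */

/* Hypothetical record as read from flash; 'size' comes from the medium. */
struct data_rec {
	uint32_t size;
	unsigned char data[BLOCK_SIZE];
};

/* Copy the payload into 'out' only after checking that the size is in
 * bounds. Returns the number of bytes copied, or -EINVAL if corrupt. */
static int copy_rec_payload(const struct data_rec *rec,
			    unsigned char out[BLOCK_SIZE])
{
	uint32_t len = rec->size;

	if (len == 0 || len > BLOCK_SIZE)
		return -EINVAL;     /* a real caller would dump the record and abort */
	memcpy(out, rec->data, len);
	return (int)len;
}
```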
| * | | | | | | | | | | | Revert "UBIFS: Fix potential integer overflow in allocation"Richard Weinberger2018-08-151-3/+2
| | | | | | | | | | | | This reverts commit 353748a359f1821ee934afc579cf04572406b420. It bypassed the linux-mtd review process and does not fix the issue the way it should. Cc: Kees Cook <keescook@chromium.org> Cc: Silvio Cesare <silvio.cesare@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | | | | | | | | | ubifs: Add comment on c->commit_semRichard Weinberger2018-08-151-0/+1
| | | | | | | | | | | | Every single time I come across that code, I get confused because it looks like a possible deadlock. Help myself by adding a comment. Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | | | | | | | | | ubifs: introduce Kconfig symbol for xattr supportStefan Agner2018-08-156-3/+29
| | | | | | | | | | | | Allow disabling extended attribute support. This aids in reliability testing, especially since some xattr-related bugs have surfaced. Also, an embedded system might not need it, so this allows for a slightly smaller kernel (about 4KiB). Signed-off-by: Stefan Agner <stefan@agner.ch> Signed-off-by: Richard Weinberger <richard@nod.at>
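A common kernel pattern for an optional feature is to build the real code only when the config symbol is set and provide stubs otherwise, so callers compile either way. A userspace sketch of that pattern, with a hypothetical FS_XATTR macro standing in for the Kconfig symbol and a hypothetical fs_setxattr():

```c
#include <errno.h>
#include <stddef.h>

#ifdef FS_XATTR
/* Feature built in: placeholder for the real implementation. */
static int fs_setxattr(const char *name, const void *value, size_t len)
{
	(void)name; (void)value; (void)len;
	return 0;
}
#else
/* Feature compiled out: keep callers building, report "not supported". */
static int fs_setxattr(const char *name, const void *value, size_t len)
{
	(void)name; (void)value; (void)len;
	return -EOPNOTSUPP;
}
#endif
```

Built without FS_XATTR, none of the real implementation ends up in the binary, which is where the size saving comes from.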
| * | | | | | | | | | | | ubifs: use swap macro in swap_dirty_idxGustavo A. R. Silva2018-08-151-4/+1
| | | | | | | | | | | | Make use of the swap macro and remove unnecessary variable *t*. This makes the code easier to read and maintain. This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Richard Weinberger <richard@nod.at>
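The kernel's swap() helper is a type-generic macro built on typeof. A userspace equivalent, named SWAP here and relying on the GCC/Clang __typeof__ extension, together with a hypothetical sort()-style swap callback over an array of int pointers:

```c
/* Type-generic swap. Each argument is evaluated twice, so pass only
 * side-effect-free lvalues. Requires GCC or Clang (__typeof__). */
#define SWAP(a, b)                        \
	do {                              \
		__typeof__(a) tmp_ = (a); \
		(a) = (b);                \
		(b) = tmp_;               \
	} while (0)

/* Hypothetical swap callback for a sort over an array of int pointers:
 * a and b point at two slots; size is unused because the type is fixed. */
static void swap_int_ptrs(void *a, void *b, int size)
{
	(void)size;
	SWAP(*(int **)a, *(int **)b);
}
```

The macro removes the hand-declared temporary from each call site, which is the readability gain the commit describes.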
| * | | | | | | | | | | | ubifs: tnc: use monotonic znode timestampArnd Bergmann2018-08-154-5/+5
| | | | | | | | | | | | The tnc uses get_seconds()-based timestamps to check the age of a znode, which has two problems: on 32-bit architectures this may overflow in 2038 or 2106, and it gives incorrect information when the system time is updated using settimeofday(). Using monotonic timestamps with ktime_get_seconds() solves both of these problems. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Richard Weinberger <richard@nod.at>
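ktime_get_seconds() is kernel-internal; the closest userspace analogue is clock_gettime() with CLOCK_MONOTONIC, which does not jump when settimeofday() changes the wall clock. A sketch of an age check built on it, with hypothetical helper names:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Whole seconds from the monotonic clock, kept in a 64-bit integer.
 * Wall-clock changes made with settimeofday() do not affect it. */
static int64_t monotonic_seconds(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec;
}

/* Hypothetical age check for an object stamped with monotonic_seconds(). */
static int is_older_than(int64_t stamp, int64_t max_age)
{
	return monotonic_seconds() - stamp > max_age;
}
```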
| * | | | | | | | | | | | ubifs: use timespec64 for inode timestampsArnd Bergmann2018-08-152-9/+8
| | | | | | | | | | | | Both vfs and the on-disk inode structures can deal with fine-grained timestamps now, so this is the last missing piece to make ubifs y2038-safe on 32-bit architectures. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | | | | | | | | | ubifs: xattr: Don't operate on deleted inodesRichard Weinberger2018-08-151-0/+24
| | | | | | | | | | | | xattr operations can race with unlink and the following assert triggers: UBIFS assert failed in ubifs_jnl_change_xattr at 1606 (pid 6256) Fix this by checking i_nlink before working on the host inode. Cc: <stable@vger.kernel.org> Fixes: 1e51764a3c2a ("UBIFS: add new flash file system") Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | | | | | | | | | ubifs: gc: Fix typoRichard Weinberger2018-08-151-1/+1
| | | | | | | | | | | | UBIFS operates on LEBs, not PEBs. Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | | | | | | | | | ubifs: Fix memory leak in lprobs self-checkRichard Weinberger2018-08-151-4/+4
| | | | | | | | | | | | Allocate the buffer after we return early. Otherwise memory is leaked. Cc: <stable@vger.kernel.org> Fixes: 1e51764a3c2a ("UBIFS: add new flash file system") Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | | | | | | | | | ubifs: Fix synced_i_size calculation for xattr inodesRichard Weinberger2018-08-151-0/+5
| | | | | | | | | | | | In ubifs_jnl_update() we sync parent and child inodes to the flash; in case of xattrs, the parent inode (AKA host inode) has a non-zero data_len. Therefore we need to adjust synced_i_size too. This issue was reported by ubifs self tests under an xattr-related workload. UBIFS error (ubi0:0 pid 1896): dbg_check_synced_i_size: ui_size is 4, synced_i_size is 0, but inode is clean UBIFS error (ubi0:0 pid 1896): dbg_check_synced_i_size: i_ino 65, i_mode 0x81a4, i_size 4 Cc: <stable@vger.kernel.org> Fixes: 1e51764a3c2a ("UBIFS: add new flash file system") Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | | | | | | | | | ubifs: Fix directory size calculation for symlinksRichard Weinberger2018-08-151-2/+3
| | |_|_|_|_|_|/ / / / /
| |/| | | | | | | | | |
| | | | | | | | | | | | We have to account for the name of the symlink, not the target length. Fixes: ca7f85be8d6c ("ubifs: Add support for encrypted symlinks") Cc: <stable@vger.kernel.org> Signed-off-by: Richard Weinberger <richard@nod.at>
* | | | | | | | | | | | Merge tag 'f2fs-for-4.19' of ↵Linus Torvalds2018-08-2218-527/+1645
|\ \ \ \ \ \ \ \ \ \ \ \ git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've tuned f2fs to improve general performance by serializing block allocation and enhancing discard flows like fstrim which avoids user IO contention. And we've added fsync_mode=nobarrier which gives an option to user where it skips issuing cache_flush commands to underlying flash storage. And there are many bug fixes related to fuzzed images, revoked atomic writes, quota ops, and minor direct IO.

Enhancements:
 - add fsync_mode=nobarrier which bypasses cache_flush command
 - enhance the discarding flow which avoids user IOs and issues in LBA order
 - readahead some encrypted blocks during GC
 - enable in-memory inode checksum to verify the blocks if F2FS_CHECK_FS is set
 - enhance nat_bits behavior
 - set -o discard by default
 - set REQ_RAHEAD to bio in ->readpages

Bug fixes:
 - fix a corner case to corrupt atomic_writes revoking flow
 - revisit i_gc_rwsem to fix race conditions
 - fix some dio behaviors captured by xfstests
 - correct handling errors given by quota-related failures
 - add many sanity check flows to avoid fuzz test failures
 - add more error number propagation to their callers
 - fix several corner cases to continue fault injection w/ shutdown loop"

* tag 'f2fs-for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (89 commits)
  f2fs: readahead encrypted block during GC
  f2fs: avoid fi->i_gc_rwsem[WRITE] lock in f2fs_gc
  f2fs: fix performance issue observed with multi-thread sequential read
  f2fs: fix to skip verifying block address for non-regular inode
  f2fs: rework fault injection handling to avoid a warning
  f2fs: support fault_type mount option
  f2fs: fix to return success when trimming meta area
  f2fs: fix use-after-free of dicard command entry
  f2fs: support discard submission error injection
  f2fs: split discard command in prior to block layer
  f2fs: wake up gc thread immediately when gc_urgent is set
  f2fs: fix incorrect range->len in f2fs_trim_fs()
  f2fs: refresh recent accessed nat entry in lru list
  f2fs: fix avoid race between truncate and background GC
  f2fs: avoid race between zero_range and background GC
  f2fs: fix to do sanity check with block address in main area v2
  f2fs: fix to do sanity check with inline flags
  f2fs: fix to reset i_gc_failures correctly
  f2fs: fix invalid memory access
  f2fs: fix to avoid broken of dnode block list
  ...
| * | | | | | | | | | | | f2fs: readahead encrypted block during GCChao Yu2018-08-203-22/+134
| | | | | | | | | | | | During GC, for each encrypted block, we read the block synchronously into a meta page and then submit it into the current cold data log area. This block read model with 4k granularity performs poorly, so, as is already done when migrating non-encrypted blocks, let's read ahead encrypted blocks as well to improve migration performance. To implement this, we choose the meta page whose index is the old block address of the encrypted block and read the ciphertext ahead into this page; later, if the read-ahead page is still up to date, we load its data into the target meta page and submit the write IO. Note that for OPU, truncation and deletion, we need to invalidate the meta page after we invalidate the old block address, to make sure we won't load invalid data from the target meta page during encrypted block migration.

for ((i = 0; i < 1000; i++))
do {
    xfs_io -f /mnt/f2fs/dir/$i -c "pwrite 0 128k" -c "fsync";
} done

for ((i = 0; i < 1000; i+=2))
do {
    rm /mnt/f2fs/dir/$i;
} done

ret = ioctl(fd, F2FS_IOC_GARBAGE_COLLECT, 0);

Before:
  gc-6549 [001] d..1 214682.212797: block_rq_insert: 8,32 RA 32768 () 786400 + 64 [gc]
  gc-6549 [001] d..1 214682.212802: block_unplug: [gc] 1
  gc-6549 [001] .... 214682.213892: block_bio_queue: 8,32 R 67494144 + 8 [gc]
  gc-6549 [001] .... 214682.213899: block_getrq: 8,32 R 67494144 + 8 [gc]
  gc-6549 [001] .... 214682.213902: block_plug: [gc]
  gc-6549 [001] d..1 214682.213905: block_rq_insert: 8,32 R 4096 () 67494144 + 8 [gc]
  gc-6549 [001] d..1 214682.213908: block_unplug: [gc] 1
  gc-6549 [001] .... 214682.226405: block_bio_queue: 8,32 R 67494152 + 8 [gc]
  gc-6549 [001] .... 214682.226412: block_getrq: 8,32 R 67494152 + 8 [gc]
  gc-6549 [001] .... 214682.226414: block_plug: [gc]
  gc-6549 [001] d..1 214682.226417: block_rq_insert: 8,32 R 4096 () 67494152 + 8 [gc]
  gc-6549 [001] d..1 214682.226420: block_unplug: [gc] 1
  gc-6549 [001] .... 214682.226904: block_bio_queue: 8,32 R 67494160 + 8 [gc]
  gc-6549 [001] .... 214682.226910: block_getrq: 8,32 R 67494160 + 8 [gc]
  gc-6549 [001] .... 214682.226911: block_plug: [gc]
  gc-6549 [001] d..1 214682.226914: block_rq_insert: 8,32 R 4096 () 67494160 + 8 [gc]
  gc-6549 [001] d..1 214682.226916: block_unplug: [gc] 1

After:
  gc-5678 [003] .... 214327.025906: block_bio_queue: 8,32 R 67493824 + 8 [gc]
  gc-5678 [003] .... 214327.025908: block_bio_backmerge: 8,32 R 67493824 + 8 [gc]
  gc-5678 [003] .... 214327.025915: block_bio_queue: 8,32 R 67493832 + 8 [gc]
  gc-5678 [003] .... 214327.025917: block_bio_backmerge: 8,32 R 67493832 + 8 [gc]
  gc-5678 [003] .... 214327.025923: block_bio_queue: 8,32 R 67493840 + 8 [gc]
  gc-5678 [003] .... 214327.025925: block_bio_backmerge: 8,32 R 67493840 + 8 [gc]
  gc-5678 [003] .... 214327.025932: block_bio_queue: 8,32 R 67493848 + 8 [gc]
  gc-5678 [003] .... 214327.025934: block_bio_backmerge: 8,32 R 67493848 + 8 [gc]
  gc-5678 [003] .... 214327.025941: block_bio_queue: 8,32 R 67493856 + 8 [gc]
  gc-5678 [003] .... 214327.025943: block_bio_backmerge: 8,32 R 67493856 + 8 [gc]
  gc-5678 [003] .... 214327.025953: block_bio_queue: 8,32 R 67493864 + 8 [gc]
  gc-5678 [003] .... 214327.025955: block_bio_backmerge: 8,32 R 67493864 + 8 [gc]
  gc-5678 [003] .... 214327.025962: block_bio_queue: 8,32 R 67493872 + 8 [gc]
  gc-5678 [003] .... 214327.025964: block_bio_backmerge: 8,32 R 67493872 + 8 [gc]
  gc-5678 [003] .... 214327.025970: block_bio_queue: 8,32 R 67493880 + 8 [gc]
  gc-5678 [003] .... 214327.025972: block_bio_backmerge: 8,32 R 67493880 + 8 [gc]
  gc-5678 [003] .... 214327.026000: block_bio_queue: 8,32 WS 34123776 + 2048 [gc]
  gc-5678 [003] .... 214327.026019: block_getrq: 8,32 WS 34123776 + 2048 [gc]
  gc-5678 [003] d..1 214327.026021: block_rq_insert: 8,32 R 131072 () 67493632 + 256 [gc]
  gc-5678 [003] d..1 214327.026023: block_unplug: [gc] 1
  gc-5678 [003] d..1 214327.026026: block_rq_issue: 8,32 R 131072 () 67493632 + 256 [gc]
  gc-5678 [003] .... 214327.026046: block_plug: [gc]

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
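In the After trace, eight-sector reads at consecutive addresses are back-merged into a single 256-sector (131072-byte) request, whereas the Before trace issues one 4096-byte request per block. The hypothetical helper below coalesces sorted, duplicate-free block addresses into contiguous runs, which gives a feel for why reading adjacent encrypted blocks ahead pays off; it is an illustration only, not f2fs or block-layer code.

```c
#include <stddef.h>
#include <stdint.h>

/* A run of contiguous block addresses: [start, start + len). */
struct run {
	uint64_t start;
	uint32_t len;
};

/* Coalesce sorted, duplicate-free block addresses into contiguous runs,
 * similar in spirit to back-merging adjacent bios into one request.
 * 'out' must have room for n entries; returns the number of runs. */
static size_t coalesce_blocks(const uint64_t *blk, size_t n, struct run *out)
{
	size_t nr = 0;

	for (size_t i = 0; i < n; i++) {
		if (nr > 0 && out[nr - 1].start + out[nr - 1].len == blk[i]) {
			out[nr - 1].len++;      /* extends the previous run */
		} else {
			out[nr].start = blk[i]; /* starts a new run */
			out[nr].len = 1;
			nr++;
		}
	}
	return nr;
}
```

Fewer, larger runs translate into fewer I/O requests, which is the effect visible when comparing the two traces.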
OpenPOWER on IntegriCloud