summaryrefslogtreecommitdiffstats
path: root/include/linux/sunrpc/svcsock.h
Commit message (Collapse)AuthorAgeFilesLines
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2014-04-121-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull yet more networking updates from David Miller: 1) Various fixes to the new Redpine Signals wireless driver, from Fariya Fatima. 2) L2TP PPP connect code takes PMTU from the wrong socket, fix from Dmitry Petukhov. 3) UFO and TSO packets differ in whether they include the protocol header in gso_size, account for that in skb_gso_transport_seglen(). From Florian Westphal. 4) If VLAN untagging fails, we double free the SKB in the bridging output path. From Toshiaki Makita. 5) Several call sites of sk->sk_data_ready() were referencing an SKB just added to the socket receive queue in order to calculate the second argument via skb->len. This is dangerous because the moment the skb is added to the receive queue it can be consumed in another context and freed up. It turns out also that none of the sk->sk_data_ready() implementations even care about this second argument. So just kill it off and thus fix all these use-after-free bugs as a side effect. 6) Fix inverted test in tcp_v6_send_response(), from Lorenzo Colitti. 7) pktgen needs to do locking properly for LLTX devices, from Daniel Borkmann. 8) xen-netfront driver initializes TX array entries in RX loop :-) From Vincenzo Maffione. 9) After refactoring, some tunnel drivers allow a tunnel to be configured on top itself. Fix from Nicolas Dichtel. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits) vti: don't allow to add the same tunnel twice gre: don't allow to add the same tunnel twice drivers: net: xen-netfront: fix array initialization bug pktgen: be friendly to LLTX devices r8152: check RTL8152_UNPLUG net: sun4i-emac: add promiscuous support net/apne: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO net: ipv6: Fix oif in TCP SYN+ACK route lookup. drivers: net: cpsw: enable interrupts after napi enable and clearing previous interrupts drivers: net: cpsw: discard all packets received when interface is down net: Fix use after free by removing length arg from sk_data_ready callbacks. Drivers: net: hyperv: Address UDP checksum issues Drivers: net: hyperv: Negotiate suitable ndis version for offload support Drivers: net: hyperv: Allocate memory for all possible per-pecket information bridge: Fix double free and memory leak around br_allowed_ingress bonding: Remove debug_fs files when module init fails i40evf: program RSS LUT correctly i40evf: remove open-coded skb_cow_head ixgb: remove open-coded skb_cow_head igbvf: remove open-coded skb_cow_head ...
| * net: Fix use after free by removing length arg from sk_data_ready callbacks.David S. Miller2014-04-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Several spots in the kernel perform a sequence like: skb_queue_tail(&sk->s_receive_queue, skb); sk->sk_data_ready(sk, skb->len); But at the moment we place the SKB onto the socket receive queue it can be consumed and freed up. So this skb->len access is potentially to freed up memory. Furthermore, the skb->len can be modified by the consumer so it is possible that the value isn't accurate. And finally, no actual implementation of this callback actually uses the length argument. And since nobody actually cared about it's value, lots of call sites pass arbitrary values in such as '0' and even '1'. So just remove the length argument from the callback, that way there is no confusion whatsoever and all of these use-after-free cases get fixed as a side effect. Based upon a patch by Eric Dumazet and his suggestion to audit this issue tree-wide. Signed-off-by: David S. Miller <davem@davemloft.net>
* | nfsd: check passed socket's net matches NFSd superblock's oneStanislav Kinsbursky2014-03-311-0/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | | There could be a case, when NFSd file system is mounted in network, different to socket's one, like below: "ip netns exec" creates new network and mount namespace, which duplicates NFSd mount point, created in init_net context. And thus NFS server stop in nested network context leads to RPCBIND client destruction in init_net. Then, on NFSd start in nested network context, rpc.nfsd process creates socket in nested net and passes it into "write_ports", which leads to RPCBIND sockets creation in init_net context because of the same reason (NFSd monut point was created in init_net context). An attempt to register passed socket in nested net leads to panic, because no RPCBIND client present in nexted network namespace. This patch add check that passed socket's net matches NFSd superblock's one. And returns -EINVAL error to user psace otherwise. v2: Put socket on exit. Reported-by: Weng Meiling <wengmeiling.weng@huawei.com> Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* svcrpc: track rpc data length separately from sk_tcplenJ. Bruce Fields2012-12-041-2/+9
| | | | | | | | | | | Keep a separate field, sk_datalen, that tracks only the data contained in a fragment, not including the fragment header. For now, this is always just max(0, sk_tcplen - 4), but after we allow multiple fragments sk_datalen will accumulate the total rpc data size while sk_tcplen only tracks progress receiving the current fragment. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* svcrpc: don't byte-swap sk_reclen in placeJ. Bruce Fields2012-12-041-1/+11
| | | | | | | | | Byte-swapping in place is always a little dubious. Let's instead define this field to always be big-endian, and do the swapping on demand where we need it. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd: remove unused listener-removal interfacesJ. Bruce Fields2012-09-101-3/+0
| | | | | | | | | | | | | | | | | | | You can use nfsd/portlist to give nfsd additional sockets to listen on. In theory you can also remove listening sockets this way. But nobody's ever done that as far as I can tell. Also this was partially broken in 2.6.25, by a217813f9067b785241cb7f31956e51d2071703a "knfsd: Support adding transports by writing portlist file". (Note that we decide whether to take the "delfd" case by checking for a digit--but what's actually expected in that case is something made by svc_one_sock_name(), which won't begin with a digit.) So, let's just rip out this stuff. Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* SUNRPC: service destruction in network namespace contextStanislav Kinsbursky2012-02-151-1/+1
| | | | | | | | | | | | | | | v2: Added comment to BUG_ON's in svc_destroy() to make code looks clearer. This patch introduces network namespace filter for service destruction function. Nothing special here - just do exactly the same operations, but only for tranports in passed networks namespace context. BTW, BUG_ON() checks for empty service transports lists were returned into svc_destroy() function. This is because of swithing generic svc_close_all() to networks namespace dependable svc_close_net(). Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* svcrpc: destroy server sockets all at onceJ. Bruce Fields2011-12-061-1/+1
| | | | | | | | There's no reason I can see that we need to call sv_shutdown between closing the two lists of sockets. Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* SUNRPC: Don't wait for full record to receive tcp dataJ. Bruce Fields2011-04-071-0/+1
| | | | | | | | | | | Ensure that we immediately read and buffer data from the incoming TCP stream so that we grow the receive window quickly, and don't deadlock on large READ or WRITE requests. Also do some minor exit cleanup. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* rpc: move sk_bc_xprt to svc_xprtJ. Bruce Fields2011-01-111-1/+0
| | | | | | | This seems obviously transport-level information even if it's currently used only by the server socket code. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd41: sunrpc: Added rpc server-side backchannel handlingRahul Iyer2009-09-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the call direction is a reply, copy the xid and call direction into the req->rq_private_buf.head[0].iov_base otherwise rpc_verify_header returns rpc_garbage. Signed-off-by: Rahul Iyer <iyer@netapp.com> Signed-off-by: Mike Sager <sager@netapp.com> Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [get rid of CONFIG_NFSD_V4_1] [sunrpc: refactoring of svc_tcp_recvfrom] [nfsd41: sunrpc: create common send routine for the fore and the back channels] [nfsd41: sunrpc: Use free_page() to free server backchannel pages] [nfsd41: sunrpc: Document server backchannel locking] [nfsd41: sunrpc: remove bc_connect_worker()] [nfsd41: sunrpc: Define xprt_server_backchannel()[ [nfsd41: sunrpc: remove bc_close and bc_init_auto_disconnect dummy functions] [nfsd41: sunrpc: eliminate unneeded switch statement in xs_setup_tcp()] [nfsd41: sunrpc: Don't auto close the server backchannel connection] [nfsd41: sunrpc: Remove unused functions] Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: change bc_sock to bc_xprt] [nfsd41: sunrpc: move struct rpc_buffer def into a common header file] [nfsd41: sunrpc: use rpc_sleep in bc_send_request so not to block on mutex] [removed cosmetic changes] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [sunrpc: add new xprt class for nfsv4.1 backchannel] [sunrpc: v2.1 change handling of auto_close and init_auto_disconnect operations for the nfsv4.1 backchannel] Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> [reverted more cosmetic leftovers] [got rid of xprt_server_backchannel] [separated "nfsd41: sunrpc: add new xprt class for nfsv4.1 backchannel"] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Cc: Trond Myklebust <trond.myklebust@netapp.com> [sunrpc: change idle timeout value for the backchannel] Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Acked-by: Trond Myklebust <trond.myklebust@netapp.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* Merge branch 'for-2.6.31' of git://fieldses.org/git/linux-nfsdLinus Torvalds2009-06-221-2/+5
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-2.6.31' of git://fieldses.org/git/linux-nfsd: (60 commits) SUNRPC: Fix the TCP server's send buffer accounting nfsd41: Backchannel: minorversion support for the back channel nfsd41: Backchannel: cleanup nfs4.0 callback encode routines nfsd41: Remove ip address collision detection case nfsd: optimise the starting of zero threads when none are running. nfsd: don't take nfsd_mutex twice when setting number of threads. nfsd41: sanity check client drc maxreqs nfsd41: move channel attributes from nfsd4_session to a nfsd4_channel_attr struct NFS: kill off complicated macro 'PROC' sunrpc: potential memory leak in function rdma_read_xdr nfsd: minor nfsd_vfs_write cleanup nfsd: Pull write-gathering code out of nfsd_vfs_write nfsd: track last inode only in use_wgather case sunrpc: align cache_clean work's timer nfsd: Use write gathering only with NFSv2 NFSv4: kill off complicated macro 'PROC' NFSv4: do exact check about attribute specified knfsd: remove unreported filehandle stats counters knfsd: fix reply cache memory corruption knfsd: reply cache cleanups ...
| * SUNRPC: pass buffer size to svc_sock_names()Chuck Lever2009-04-281-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | Adjust the synopsis of svc_sock_names() to pass in the size of the output buffer. Add a documenting comment. This is a cosmetic change for now. A subsequent patch will make sure the buffer length is passed to one_sock_name(), where the length will actually be useful. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * SUNRPC: pass buffer size to svc_addsock()Chuck Lever2009-04-281-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | Adjust the synopsis of svc_addsock() to pass in the size of the output buffer. Add a documenting comment. This is a cosmetic change for now. A subsequent patch will make sure the buffer length is passed to one_sock_name(), where the length will actually be useful. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* | nfs41: sunrpc: provide functions to create and destroy a svc_xprt for ↵Benny Halevy2009-06-171-0/+2
|/ | | | | | | | | | | | | | backchannel use For nfs41 callbacks we need an svc_xprt to process requests coming up the backchannel socket as rpc_rqst's that are transformed into svc_rqst's that need a rq_xprt to be processed. The svc_{udp,tcp}_create methods are too heavy for this job as svc_create_socket creates an actual socket to listen on while for nfs41 we're "reusing" the fore channel's socket. Signed-off-by: Benny Halevy <bhalevy@panasas.com>
* NLM: Remove unused argument from svc_addsock() functionChuck Lever2008-10-041-4/+1
| | | | | | | | | Clean up: The svc_addsock() function no longer uses its "proto" argument, so remove it. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Neil Brown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* SUNRPC: Update RPC server's TCP record marker decoderChuck Lever2008-04-231-2/+2
| | | | | | | | Clean up: Update the RPC server's TCP record marker decoder to match the constructs used by the RPC client's TCP socket transport. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* svc: Move the sockaddr information to svc_xprtTom Tucker2008-02-011-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch moves the transport sockaddr to the svc_xprt structure. Convenience functions are added to set and get the local and remote addresses of a transport from the transport provider as well as determine the length of a sockaddr. A transport is responsible for setting the xpt_local and xpt_remote addresses in the svc_xprt structure as part of transport creation and xpo_accept processing. This cannot be done in a generic way and in fact varies between TCP, UDP and RDMA. A set of xpo_ functions (e.g. getlocalname, getremotename) could have been added but this would have resulted in additional caching and copying of the addresses around. Note that the xpt_local address should also be set on listening endpoints; for TCP/RDMA this is done as part of endpoint creation. For connected transports like TCP and RDMA, the addresses never change and can be set once and copied into the rqstp structure for each request. For UDP, however, the local and remote addresses may change for each request. In this case, the address information is obtained from the UDP recvmsg info and copied into the rqstp structure from there. A svc_xprt_local_port function was also added that returns the local port given a transport. This is used by svc_create_xprt when returning the port associated with a newly created transport, and later when creating a generic find transport service to check if a service is already listening on a given port. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* svc: Make deferral processing xprt independentTom Tucker2008-02-011-3/+0
| | | | | | | | | | | | | This patch moves the transport independent sk_deferred list to the svc_xprt structure and updates the svc_deferred_req structure to keep pointers to svc_xprt's directly. The deferral processing code is also moved out of the transport dependent recvfrom functions and into the generic svc_recv path. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* svc: Move the authinfo cache to svc_xprt.Tom Tucker2008-02-011-5/+0
| | | | | | | | | | | | | | | | Move the authinfo cache to svc_xprt. This allows both the TCP and RDMA transports to share this logic. A flag bit is used to determine if auth information is to be cached or not. Previously, this code looked at the transport protocol. I've also changed the spin_lock/unlock logic so that a lock is not taken for transports that are not caching auth info. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* svc: Remove sk_lastrecvTom Tucker2008-02-011-1/+0
| | | | | | | | | | | With the implementation of the new mark and sweep algorithm for shutting down old connections, the sk_lastrecv field is no longer needed. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* svc: Make svc_send transport neutralTom Tucker2008-02-011-1/+0
| | | | | | | | | | | | | Move the sk_mutex field to the transport independent svc_xprt structure. Now all the fields that svc_send touches are transport neutral. Change the svc_send function to use the transport independent svc_xprt directly instead of the transport dependent svc_sock structure. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* svc: Move sk_reserved to svc_xprtTom Tucker2008-02-011-2/+0
| | | | | | | | | | | This functionally trivial patch moves the sk_reserved field to the transport independent svc_xprt structure. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* svc: Make close transport independentTom Tucker2008-02-011-3/+1
| | | | | | | | | | | | | | | | | | | | Move sk_list and sk_ready to svc_xprt. This involves close because these lists are walked by svcs when closing all their transports. So I combined the moving of these lists to svc_xprt with making close transport independent. The svc_force_sock_close has been changed to svc_close_all and takes a list as an argument. This removes some svc internals knowledge from the svcs. This code races with module removal and transport addition. Thanks to Simon Holm Thøgersen for a compile fix. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Simon Holm Thøgersen <odie@cs.aau.dk>
* svc: Move sk_server and sk_pool to svc_xprtTom Tucker2008-02-011-2/+0
| | | | | | | | | | | | This is another incremental change that moves transport independent fields from svc_sock to the svc_xprt structure. The changes should be functionally null. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* svc: Move sk_flags to the svc_xprt structureTom Tucker2008-02-011-13/+0
| | | | | | | | | | | This functionally trivial change moves the transport independent sk_flags field to the transport independent svc_xprt structure. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* svc: Change sk_inuse to a krefTom Tucker2008-02-011-1/+0
| | | | | | | | | | | | Change the atomic_t reference count to a kref and move it to the transport indepenent svc_xprt structure. Change the reference count wrapper names to be generic. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* svc: Change services to use new svc_create_xprt serviceTom Tucker2008-02-011-1/+0
| | | | | | | | | | Modify the various kernel RPC svcs to use the svc_create_xprt service. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* svc: Add xpo_accept transport functionTom Tucker2008-02-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Previously, the accept logic looked into the socket state to determine whether to call accept or recv when data-ready was indicated on an endpoint. Since some transports don't use sockets, this logic now uses a flag bit (SK_LISTENER) to identify listening endpoints. A transport function (xpo_accept) allows each transport to define its own accept processing. A transport's initialization logic is reponsible for setting the SK_LISTENER bit. I didn't see any way to do this in transport independent logic since the passive side of a UDP connection doesn't listen and always recv's. In the svc_recv function, if the SK_LISTENER bit is set, the transport xpo_accept function is called to handle accept processing. Note that all functions are defined even if they don't make sense for a given transport. For example, accept doesn't mean anything for UDP. The function is defined anyway and bug checks if called. The UDP transport should never set the SK_LISTENER bit. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* svc: Move sk_sendto and sk_recvfrom to svc_xprt_classTom Tucker2008-02-011-3/+0
| | | | | | | | | | | | The sk_sendto and sk_recvfrom are function pointers that allow svc_sock to be used for both UDP and TCP. Move these function pointers to the svc_xprt_ops structure. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* svc: Make svc_sock the tcp/udp transportTom Tucker2008-02-011-0/+4
| | | | | | | | | | | | | | Make TCP and UDP svc_sock transports, and register them with the svc transport core. A transport type (svc_sock) has an svc_xprt as its first member, and calls svc_xprt_init to initialize this field. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* SUNRPC server: record the destination address of a requestFrank van Maarseveen2007-07-101-0/+1
| | | | | | | | Save the destination address of an incoming request over TCP like is done already for UDP. It is necessary later for callbacks by the server. Signed-off-by: Frank van Maarseveen <frankvm@frankvm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* knfsd: rename sk_defer_lock to sk_lockNeilBrown2007-05-091-1/+2
| | | | | | | | | | | | Now that sk_defer_lock protects two different things, make the name more generic. Also don't bother with disabling _bh as the lock is only ever taken from process context. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] knfsd: fix recently introduced problem with shutting down a busy NFS ↵NeilBrown2007-03-061-1/+1
| | | | | | | | | | | | | | | | | | | | | server When the last thread of nfsd exits, it shuts down all related sockets. It currently uses svc_close_socket to do this, but that only is immediately effective if the socket is not SK_BUSY. If the socket is busy - i.e. if a request has arrived that has not yet been processes - svc_close_socket is not effective and the shutdown process spins. So create a new svc_force_close_socket which removes the SK_BUSY flag is set and then calls svc_close_socket. Also change some open-codes loops in svc_destroy to use list_for_each_entry_safe. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] knfsd: SUNRPC: Cache remote peer's address in svc_sockChuck Lever2007-02-121-0/+3
| | | | | | | | | | | The remote peer's address won't change after the socket has been accepted. We don't need to call ->getname on every incoming request. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] knfsd: SUNRPC: allow creating an RPC service without registering ↵Chuck Lever2007-02-121-1/+1
| | | | | | | | | | | | | | | | | with portmapper Sometimes we need to create an RPC service but not register it with the local portmapper. NFSv4 delegation callback, for example. Change the svc_makesock() API to allow optionally creating temporary or permanent sockets, optionally registering with the local portmapper, and make it return the ephemeral port of the new socket. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] knfsd: SUNRPC: update internal API: separate pmap register and temp ↵Chuck Lever2007-02-121-0/+7
| | | | | | | | | | | | | | | | sockets Currently in the RPC server, registering with the local portmapper and creating "permanent" sockets are tied together. Expand the internal APIs to allow these two socket characteristics to be separately specified. This will be externalized in the next patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] knfsd: fix a race in closing NFSd connectionsNeilBrown2007-02-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If you lose this race, it can iput a socket inode twice and you get a BUG in fs/inode.c When I added the option for user-space to close a socket, I added some cruft to svc_delete_socket so that I could call that function when closing a socket per user-space request. This was the wrong thing to do. I should have just set SK_CLOSE and let normal mechanisms do the work. Not only wrong, but buggy. The locking is all wrong and it openned up a race where-by a socket could be closed twice. So this patch: Introduces svc_close_socket which sets SK_CLOSE then either leave the close up to a thread, or calls svc_delete_socket if it can get SK_BUSY. Adds a bias to sk_busy which is removed when SK_DEAD is set, This avoid races around shutting down the socket. Changes several 'spin_lock' to 'spin_lock_bh' where the _bh was missing. Bugzilla-url: http://bugzilla.kernel.org/show_bug.cgi?id=7916 Signed-off-by: Neil Brown <neilb@suse.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] knfsd: knfsd: cache ipmap per TCP socketGreg Banks2006-10-041-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Speed up high call-rate workloads by caching the struct ip_map for the peer on the connected struct svc_sock instead of looking it up in the ip_map cache hashtable on every call. This helps workloads using AUTH_SYS authentication over TCP. Testing was on a 4 CPU 4 NIC Altix using 4 IRIX clients, each with 16 synthetic client threads simulating an rsync (i.e. recursive directory listing) workload reading from an i386 RH9 install image (161480 regular files in 10841 directories) on the server. That tree is small enough to fill in the server's RAM so no disk traffic was involved. This setup gives a sustained call rate in excess of 60000 calls/sec before being CPU-bound on the server. Profiling showed strcmp(), called from ip_map_match(), was taking 4.8% of each CPU, and ip_map_lookup() was taking 2.9%. This patch drops both contribution into the profile noise. Note that the above result overstates this value of this patch for most workloads. The synthetic clients are all using separate IP addresses, so there are 64 entries in the ip_map cache hash. Because the kernel measured contained the bug fixed in commit commit 1f1e030bf75774b6a283518e1534d598e14147d4 and was running on 64bit little-endian machine, probably all of those 64 entries were on a single chain, thus increasing the cost of ip_map_lookup(). With a modern kernel you would need more clients to see the same amount of performance improvement. This patch has helped to scale knfsd to handle a deployment with 2000 NFS clients. Signed-off-by: Greg Banks <gnb@melbourne.sgi.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] knfsd: split svc_serv into poolsGreg Banks2006-10-021-0/+1
| | | | | | | | | | | | | | | | | Split out the list of idle threads and pending sockets from svc_serv into a new svc_pool structure, and allocate a fixed number (in this patch, 1) of pools per svc_serv. The new structure contains a lock which takes over several of the duties of svc_serv->sv_lock, which is now relegated to protecting only sv_tempsocks, sv_permsocks, and sv_tmpcnt in svc_serv. The point is to move the hottest fields out of svc_serv and into svc_pool, allowing a following patch to arrange for a svc_pool per NUMA node or per CPU. This is a major step towards making the NFS server NUMA-friendly. Signed-off-by: Greg Banks <gnb@melbourne.sgi.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] knfsd: convert sk_reserved to atomic_tGreg Banks2006-10-021-1/+1
| | | | | | | | | | | Convert the svc_sock->sk_reserved variable from an int protected by svc_serv->sv_lock, to an atomic. This reduces (by 1) the number of places we need to take the (effectively global) svc_serv->sv_lock. Signed-off-by: Greg Banks <gnb@melbourne.sgi.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] knfsd: use new lock for svc_sock deferred listGreg Banks2006-10-021-0/+1
| | | | | | | | | | | Protect the svc_sock->sk_deferred list with a new lock svc_sock->sk_defer_lock instead of svc_serv->sv_lock. Using the more fine-grained lock reduces the number of places we need to take the svc_serv lock. Signed-off-by: Greg Banks <gnb@melbourne.sgi.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] knfsd: convert sk_inuse to atomic_tGreg Banks2006-10-021-1/+1
| | | | | | | | | | | Convert the svc_sock->sk_inuse counter from an int protected by svc_serv->sv_lock, to an atomic. This reduces the number of places we need to take the (effectively global) svc_serv->sv_lock. Signed-off-by: Greg Banks <gnb@melbourne.sgi.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] knfsd: move tempsock aging to a timerGreg Banks2006-10-021-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Following are 11 patches from Greg Banks which combine to make knfsd more Numa-aware. They reduce hitting on 'global' data structures, and create some data-structures that can be node-local. knfsd threads are bound to a particular node, and the thread to handle a new request is chosen from the threads that are attach to the node that received the interrupt. The distribution of threads across nodes can be controlled by a new file in the 'nfsd' filesystem, though the default approach of an even spread is probably fine for most sites. Some (old) numbers that show the efficacy of these patches: N == number of NICs == number of CPUs == nmber of clients. Number of NUMA nodes == N/2 N Throughput, MiB/s CPU usage, % (max=N*100) Before After Before After --- ------ ---- ----- ----- 4 312 435 350 228 6 500 656 501 418 8 562 804 690 589 This patch: Move the aging of RPC/TCP connection sockets from the main svc_recv() loop to a timer which uses a mark-and-sweep algorithm every 6 minutes. This reduces the amount of work that needs to be done in the main RPC loop and the length of time we need to hold the (effectively global) svc_serv->sv_lock. [akpm@osdl.org: cleanup] Signed-off-by: Greg Banks <gnb@melbourne.sgi.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] knfsd: Drop 'serv' option to svc_recv and svc_processNeilBrown2006-10-021-1/+1
| | | | | | | | | | | It isn't needed as it is available in rqstp->rq_server, and dropping it allows some local vars to be dropped. [akpm@osdl.org: build fix] Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] knfsd: allow sockets to be passed to nfsd via 'portlist'NeilBrown2006-10-021-1/+5
| | | | | | | | | | | | Userspace should create and bind a socket (but not connectted) and write the 'fd' to portlist. This will cause the nfs server to listen on that socket. To close a socket, the name of the socket - as read from 'portlist' can be written to 'portlist' with a preceding '-'. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] knfsd: define new nfsdfs file: portlist - contains list of portsNeilBrown2006-10-021-0/+1
| | | | | | | | | | | | | | | | | This file will list all ports that nfsd has open. Default when TCP enabled will be ipv4 udp 0.0.0.0 2049 ipv4 tcp 0.0.0.0 2049 Later, the list of ports will be settable. 'portlist' chosen rather than 'ports', to avoid unnecessary confusion with non-mainline patches which created 'ports' with different semantics. [akpm@osdl.org: cleanups, build fix] Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [NET]: sem2mutex part 2Ingo Molnar2006-03-201-1/+1
| | | | | | | | | | | Semaphore to mutex conversion. The conversion was generated via scripts, and the result was validated automatically via a script as well. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* Linux-2.6.12-rc2v2.6.12-rc2Linus Torvalds2005-04-161-0/+65
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!
OpenPOWER on IntegriCloud