summaryrefslogtreecommitdiffstats
path: root/include/net/sock.h
Commit message (Collapse)AuthorAgeFilesLines
* [INET]: speedup inet (tcp/dccp) lookupsEric Dumazet2005-10-031-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Arnaldo and I agreed it could be applied now, because I have other pending patches depending on this one (Thank you Arnaldo) (The other important patch moves skc_refcnt in a separate cache line, so that the SMP/NUMA performance doesnt suffer from cache line ping pongs) 1) First some performance data : -------------------------------- tcp_v4_rcv() wastes a *lot* of time in __inet_lookup_established() The most time critical code is : sk_for_each(sk, node, &head->chain) { if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif)) goto hit; /* You sunk my battleship! */ } The sk_for_each() does use prefetch() hints but only the begining of "struct sock" is prefetched. As INET_MATCH first comparison uses inet_sk(__sk)->daddr, wich is far away from the begining of "struct sock", it has to bring into CPU cache cold cache line. Each iteration has to use at least 2 cache lines. This can be problematic if some chains are very long. 2) The goal ----------- The idea I had is to change things so that INET_MATCH() may return FALSE in 99% of cases only using the data already in the CPU cache, using one cache line per iteration. 3) Description of the patch --------------------------- Adds a new 'unsigned int skc_hash' field in 'struct sock_common', filling a 32 bits hole on 64 bits platform. struct sock_common { unsigned short skc_family; volatile unsigned char skc_state; unsigned char skc_reuse; int skc_bound_dev_if; struct hlist_node skc_node; struct hlist_node skc_bind_node; atomic_t skc_refcnt; + unsigned int skc_hash; struct proto *skc_prot; }; Store in this 32 bits field the full hash, not masked by (ehash_size - 1) Using this full hash as the first comparison done in INET_MATCH permits us immediatly skip the element without touching a second cache line in case of a miss. Suppress the sk_hashent/tw_hashent fields since skc_hash (aliased to sk_hash and tw_hash) already contains the slot number if we mask with (ehash_size - 1) File include/net/inet_hashtables.h 64 bits platforms : #define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\ (((__sk)->sk_hash == (__hash)) ((*((__u64 *)&(inet_sk(__sk)->daddr)))== (__cookie)) && \ ((*((__u32 *)&(inet_sk(__sk)->dport))) == (__ports)) && \ (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif)))) 32bits platforms: #define TCP_IPV4_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\ (((__sk)->sk_hash == (__hash)) && \ (inet_sk(__sk)->daddr == (__saddr)) && \ (inet_sk(__sk)->rcv_saddr == (__daddr)) && \ (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif)))) - Adds a prefetch(head->chain.first) in __inet_lookup_established()/__tcp_v4_check_established() and __inet6_lookup_established()/__tcp_v6_check_established() and __dccp_v4_check_established() to bring into cache the first element of the list, before the {read|write}_lock(&head->lock); Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Do not protect sysctl_optmem_max with CONFIG_SYSCTLDavid S. Miller2005-09-051-1/+2
| | | | | | | The ipv4 and ipv6 protocols need to access it unconditionally. SYSCTL=n build failure reported by Russell King. Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: Fix sk_forward_alloc underflow in tcp_sendmsgHerbert Xu2005-09-011-3/+2
| | | | | | | | | | | | | | | | I've finally found a potential cause of the sk_forward_alloc underflows that people have been reporting sporadically. When tcp_sendmsg tacks on extra bits to an existing TCP_PAGE we don't check sk_forward_alloc even though a large amount of time may have elapsed since we allocated the page. In the mean time someone could've come along and liberated packets and reclaimed sk_forward_alloc memory. This patch makes tcp_sendmsg check sk_forward_alloc every time as we do in do_tcp_sendpages. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Add sk_stream_wmem_scheduleHerbert Xu2005-09-011-4/+8
| | | | | | | | This patch introduces sk_stream_wmem_schedule as a short-hand for the sk_forward_alloc checking on egress. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: fix PROC_FS=n compileAdrian Bunk2005-08-291-2/+0
| | | | | Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Implement SKB fast cloning.David S. Miller2005-08-291-1/+1
| | | | | | | | | | | | | | | | | Protocols that make extensive use of SKB cloning, for example TCP, eat at least 2 allocations per packet sent as a result. To cut the kmalloc() count in half, we implement a pre-allocation scheme wherein we allocate 2 sk_buff objects in advance, then use a simple reference count to free up the memory at the correct time. Based upon an initial patch by Thomas Graf and suggestions from Herbert Xu. Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Fix sparse warningsArnaldo Carvalho de Melo2005-08-291-0/+12
| | | | | | | | | | | Of this type, mostly: CHECK net/ipv6/netfilter.c net/ipv6/netfilter.c:96:12: warning: symbol 'ipv6_netfilter_init' was not declared. Should it be static? net/ipv6/netfilter.c:101:6: warning: symbol 'ipv6_netfilter_fini' was not declared. Should it be static? Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Store skb->timestamp as offset to a base timestampPatrick McHardy2005-08-291-5/+8
| | | | | | | Reduces skb size by 8 bytes on 64-bit. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Make NETDEBUG pure printk wrappersPatrick McHardy2005-08-291-4/+4
| | | | | Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [ICSK]: Generalise tcp_listen_{start,stop}Arnaldo Carvalho de Melo2005-08-291-0/+1
| | | | | | | | | This also moved inet_iif from tcp to inet_hashtables.h, as it is needed by the inet_lookup callers, perhaps this needs a bit of polishing, but for now seems fine. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Introduce inet_connection_sockArnaldo Carvalho de Melo2005-08-291-3/+0
| | | | | | | | | | | | | | This creates struct inet_connection_sock, moving members out of struct tcp_sock that are shareable with other INET connection oriented protocols, such as DCCP, that in my private tree already uses most of these members. The functions that operate on these members were renamed, using a inet_csk_ prefix while not being moved yet to a new file, so as to ease the review of these changes. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [SOCK]: Introduce sk_cloneArnaldo Carvalho de Melo2005-08-291-0/+2
| | | | | | | | Out of tcp_create_openreq_child, will be used in dccp_create_openreq_child, and is a nice sock function anyway. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [INET]: Generalise the TCP sock ID lookup routinesArnaldo Carvalho de Melo2005-08-291-6/+6
| | | | | | | | | | | | | | | And also some TIME_WAIT functions. [acme@toy net-2.6.14]$ grep built-in /tmp/before.size /tmp/after.size /tmp/before.size: 282955 13122 9312 305389 4a8ed net/ipv4/built-in.o /tmp/after.size: 281566 13122 9312 304000 4a380 net/ipv4/built-in.o [acme@toy net-2.6.14]$ I kept them still inlined, will uninline at some point to see what would be the performance difference. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [INET]: Generalise tcp_tw_bucket, aka TIME_WAIT socketsArnaldo Carvalho de Melo2005-08-291-6/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This paves the way to generalise the rest of the sock ID lookup routines and saves some bytes in TCPv4 TIME_WAIT sockets on distro kernels (where IPv6 is always built as a module): [root@qemu ~]# grep tw_sock /proc/slabinfo tw_sock_TCPv6 0 0 128 31 1 tw_sock_TCP 0 0 96 41 1 [root@qemu ~]# Now if a protocol wants to use the TIME_WAIT generic infrastructure it only has to set the sk_prot->twsk_obj_size field with the size of its inet_timewait_sock derived sock and proto_register will create sk_prot->twsk_slab, for now its only for INET sockets, but we can introduce timewait_sock later if some non INET transport protocolo wants to use this stuff. Next changesets will take advantage of this new infrastructure to generalise even more TCP code. [acme@toy net-2.6.14]$ grep built-in /tmp/before.size /tmp/after.size /tmp/before.size: 188646 11764 5068 205478 322a6 net/ipv4/built-in.o /tmp/after.size: 188144 11764 5068 204976 320b0 net/ipv4/built-in.o [acme@toy net-2.6.14]$ Tested with both IPv4 & IPv6 (::1 (localhost) & ::ffff:172.20.0.1 (qemu host)). Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [INET]: Move tcp_port_rover to inet_hashinfoArnaldo Carvalho de Melo2005-08-291-1/+1
| | | | | | | | | | Also expose all of the tcp_hashinfo members, i.e. killing those tcp_ehash, etc macros, this will more clearly expose already generic functions and some that need just a bit of work to become generic, as we'll see in the upcoming changesets. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [SOCK]: Introduce sk_setup_capsArnaldo Carvalho de Melo2005-08-291-0/+10
| | | | | | | | | From tcp_v4_setup_caps, that always is preceded by a call to __sk_dst_set, so coalesce this sequence into sk_setup_caps, removing one call to a TCP function in the IP layer. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [SOCK]: Rename __tcp_v4_rehash to __sk_prot_rehashArnaldo Carvalho de Melo2005-08-291-0/+9
| | | | | | | This operation was already generic and DCCP will use it. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Cleanup INET_REFCNT_DEBUG codeArnaldo Carvalho de Melo2005-08-291-1/+31
| | | | | Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Fix socket bitop damageRalf Baechle2005-08-231-0/+5
| | | | | | | | | | | | | | | | | | | The socket flag cleanups that went into 2.6.12-rc1 are basically oring the flags of an old socket into the socket just being created. Unfortunately that one was just initialized by sock_init_data(), so already has SOCK_ZAPPED set. As the result zapped sockets are created and all incoming connection will fail due to this bug which again was carefully replicated to at least AX.25, NET/ROM or ROSE. In order to keep the abstraction alive I've introduced sock_copy_flags() to copy the socket flags from one sockets to another and used that instead of the bitwise copy thing. Anyway, the idea here has probably been to copy all flags, so sock_copy_flags() should be the right thing. With this the ham radio protocols are usable again, so I hope this will make it into 2.6.13. Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Fix sparse warningsVictor Fusco2005-07-081-7/+11
| | | | | | | | | | From: Victor Fusco <victor@cetuc.puc-rio.br> Fix the sparse warning "implicit cast to nocast type" Signed-off-by: Victor Fusco <victor@cetuc.puc-rio.br> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: Simplify SKB data portion allocation with NETIF_F_SG.David S. Miller2005-07-051-2/+5
| | | | | | | | | | | | | | | | | | | | The ideal and most optimal layout for an SKB when doing scatter-gather is to put all the headers at skb->data, and all the user data in the page array. This makes SKB splitting and combining extremely simple, especially before a packet goes onto the wire the first time. So, when sk_stream_alloc_pskb() is given a zero size, make sure there is no skb_tailroom(). This is achieved by applying SKB_DATA_ALIGN() to the header length used here. Next, make select_size() in TCP output segmentation use a length of zero when NETIF_F_SG is true on the outgoing interface. Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET] Rename open_request to request_sockArnaldo Carvalho de Melo2005-06-181-2/+2
| | | | | | | | | | | | | | | | Ok, this one just renames some stuff to have a better namespace and to dissassociate it from TCP: struct open_request -> struct request_sock tcp_openreq_alloc -> reqsk_alloc tcp_openreq_free -> reqsk_free tcp_openreq_fastfree -> __reqsk_free With this most of the infrastructure closely resembles a struct sock methods subset. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET] Generalise TCP's struct open_request minisock infrastructureArnaldo Carvalho de Melo2005-06-181-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Kept this first changeset minimal, without changing existing names to ease peer review. Basicaly tcp_openreq_alloc now receives the or_calltable, that in turn has two new members: ->slab, that replaces tcp_openreq_cachep ->obj_size, to inform the size of the openreq descendant for a specific protocol The protocol specific fields in struct open_request were moved to a class hierarchy, with the things that are common to all connection oriented PF_INET protocols in struct inet_request_sock, the TCP ones in tcp_request_sock, that is an inet_request_sock, that is an open_request. I.e. this uses the same approach used for the struct sock class hierarchy, with sk_prot indicating if the protocol wants to use the open_request infrastructure by filling in sk_prot->rsk_prot with an or_calltable. Results? Performance is improved and TCP v4 now uses only 64 bytes per open request minisock, down from 96 without this patch :-) Next changeset will rename some of the structs, fields and functions mentioned above, struct or_calltable is way unclear, better name it struct request_sock_ops, s/struct open_request/struct request_sock/g, etc. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [PATCH] update Ross Biro bouncing email addressJesper Juhl2005-05-051-1/+1
| | | | | | | | | Ross moved. Remove the bad email address so people will find the correct one in ./CREDITS. Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [IPV6]: Fix OOPS when using IPV6_ADDRFORMArnaldo Carvalho de Melo2005-05-051-0/+2
| | | | | | | | | | This causes sk->sk_prot to change, which makes the socket release free the sock into the wrong SLAB cache. Fix this by introducing sk_prot_creator so that we always remember where the sock came from. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [PATCH] DocBook: fix some descriptionsMartin Waitz2005-05-011-0/+1
| | | | | | | | | Some KernelDoc descriptions are updated to match the current code. No code changes. Signed-off-by: Martin Waitz <tali@admingilde.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] DocBook: changes and extensions to the kernel documentationPavel Pisa2005-05-011-67/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I have recompiled Linux kernel 2.6.11.5 documentation for me and our university students again. The documentation could be extended for more sources which are equipped by structured comments for recent 2.6 kernels. I have tried to proceed with that task. I have done that more times from 2.6.0 time and it gets boring to do same changes again and again. Linux kernel compiles after changes for i386 and ARM targets. I have added references to some more files into kernel-api book, I have added some section names as well. So please, check that changes do not break something and that categories are not too much skewed. I have changed kernel-doc to accept "fastcall" and "asmlinkage" words reserved by kernel convention. Most of the other changes are modifications in the comments to make kernel-doc happy, accept some parameters description and do not bail out on errors. Changed <pid> to @pid in the description, moved some #ifdef before comments to correct function to comments bindings, etc. You can see result of the modified documentation build at http://cmp.felk.cvut.cz/~pisa/linux/lkdb-2.6.11.tar.gz Some more sources are ready to be included into kernel-doc generated documentation. Sources has been added into kernel-api for now. Some more section names added and probably some more chaos introduced as result of quick cleanup work. Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz> Signed-off-by: Martin Waitz <tali@admingilde.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Linux-2.6.12-rc2v2.6.12-rc2Linus Torvalds2005-04-161-0/+1297
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!
OpenPOWER on IntegriCloud