summaryrefslogtreecommitdiffstats
path: root/include/net/udp.h
Commit message (Collapse)AuthorAgeFilesLines
* udp: Generalize skb_udp_segmentTom Herbert2014-10-011-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | skb_udp_segment is the function called from udp4_ufo_fragment to segment a UDP tunnel packet. This function currently assumes segmentation is transparent Ethernet bridging (i.e. VXLAN encapsulation). This patch generalizes the function to operate on either Ethertype or IP protocol. The inner_protocol field must be set to the protocol of the inner header. This can now be either an Ethertype or an IP protocol (in a union). A new flag in the skbuff indicates which type is effective. skb_set_inner_protocol and skb_set_inner_ipproto helper functions were added to set the inner_protocol. These functions are called from the point where the tunnel encapsulation is occuring. When skb_udp_tunnel_segment is called, the function to segment the inner packet is selected based on the inner IP or Ethertype. In the case of an IP protocol encapsulation, the function is derived from inet[6]_offloads. In the case of Ethertype, skb->protocol is set to the inner_protocol and skb_mac_gso_segment is called. (GRE currently does this, but it might be possible to lookup the protocol in offload_base and call the appropriate segmenation function directly). Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* udp: additional GRO supportTom Herbert2014-08-241-0/+18
| | | | | | | | Implement GRO for UDPv6. Add UDP checksum verification in gro_receive for both UDP4 and UDP6 calling skb_gro_checksum_validate_zero_check. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* udp: Add function to make source port for UDP tunnelsTom Herbert2014-07-071-0/+29
| | | | | | | | | | | | This patch adds udp_flow_src_port function which is intended to be a common function that UDP tunnel implementations call to set the source port. The source port is chosen so that a hash over the outer headers (IP addresses and UDP ports) acts as suitable hash for the flow of the encapsulated packet. In this manner, UDP encapsulation works with RSS and ECMP based wrt the inner flow. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* udp: call __skb_checksum_complete when doing full checksumTom Herbert2014-06-151-1/+3
| | | | | | | | | | In __udp_lib_checksum_complete check if checksum is being done over all the data (len is equal to skb->len) and if it is call __skb_checksum_complete instead of __skb_checksum_complete_head. This allows checksum to be saved in checksum complete. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* udp: Generic functions to set checksumTom Herbert2014-06-041-0/+9
| | | | | | | | | Added udp_set_csum and udp6_set_csum functions to set UDP checksums in packets. These are for simple UDP packets such as those that might be created in UDP tunnels. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Split sk_no_check into sk_no_check_{rx,tx}Tom Herbert2014-05-231-9/+0
| | | | | | | | | | Define separate fields in the sock structure for configuring disabling checksums in both TX and RX-- sk_no_check_tx and sk_no_check_rx. The SO_NO_CHECK socket option only affects sk_no_check_tx. Also, removed UDP_CSUM_* defines since they are no longer necessary. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* udp: Remove unnecessary semicolon from do{}while (0) macroJoe Perches2013-11-071-7/+7
| | | | | | | | | Just an unnecessary semicolon that should be removed... Whitespace neatening of macro too. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* udp: ipv4: Add udp early demuxShawn Bohrer2013-10-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The removal of the routing cache introduced a performance regression for some UDP workloads since a dst lookup must be done for each packet. This change caches the dst per socket in a similar manner to what we do for TCP by implementing early_demux. For UDP multicast we can only cache the dst if there is only one receiving socket on the host. Since caching only works when there is one receiving socket we do the multicast socket lookup using RCU. For UDP unicast we only demux sockets with an exact match in order to not break forwarding setups. Additionally since the hash chains may be long we only check the first socket to see if it is a match and not waste extra time searching the whole chain when we might not find an exact match. Benchmark results from a netperf UDP_RR test: Before 87961.22 transactions/s After 89789.68 transactions/s Benchmark results from a fio 1 byte UDP multicast pingpong test (Multicast one way unicast response): Before 12.97us RTT After 12.63us RTT Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* udp.h: Remove extern from function prototypesJoe Perches2013-09-231-47/+47
| | | | | | | | | | | | | There are a mix of function prototypes with and without extern in the kernel sources. Standardize on not using extern for function prototypes. Function prototypes don't need to be written with extern. extern is assumed by the compiler. Its use is as unnecessary as using auto to declare automatic/local variables in a block. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* pktgen: Add UDPCSUM flag to support UDP checksumsThomas Graf2013-07-271-0/+1
| | | | | | | | | | | | | UDP checksums are optional, hence pktgen has been omitting them in favour of performance. The optional flag UDPCSUM enables UDP checksumming. If the output device supports hardware checksumming the skb is prepared and marked CHECKSUM_PARTIAL, otherwise the checksum is generated in software. Signed-off-by: Thomas Graf <tgraf@suug.ch> Cc: Eric Dumazet <edumazet@google.com> Cc: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: call udp_push_pending_frames when uncorking a socket with AF_INET ↵Hannes Frederic Sowa2013-07-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pending data We accidentally call down to ip6_push_pending_frames when uncorking pending AF_INET data on a ipv6 socket. This results in the following splat (from Dave Jones): skbuff: skb_under_panic: text:ffffffff816765f6 len:48 put:40 head:ffff88013deb6df0 data:ffff88013deb6dec tail:0x2c end:0xc0 dev:<NULL> ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:126! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: dccp_ipv4 dccp 8021q garp bridge stp dlci mpoa snd_seq_dummy sctp fuse hidp tun bnep nfnetlink scsi_transport_iscsi rfcomm can_raw can_bcm af_802154 appletalk caif_socket can caif ipt_ULOG x25 rose af_key pppoe pppox ipx phonet irda llc2 ppp_generic slhc p8023 psnap p8022 llc crc_ccitt atm bluetooth +netrom ax25 nfc rfkill rds af_rxrpc coretemp hwmon kvm_intel kvm crc32c_intel snd_hda_codec_realtek ghash_clmulni_intel microcode pcspkr snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep usb_debug snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c CPU: 2 PID: 8095 Comm: trinity-child2 Not tainted 3.10.0-rc7+ #37 task: ffff8801f52c2520 ti: ffff8801e6430000 task.ti: ffff8801e6430000 RIP: 0010:[<ffffffff816e759c>] [<ffffffff816e759c>] skb_panic+0x63/0x65 RSP: 0018:ffff8801e6431de8 EFLAGS: 00010282 RAX: 0000000000000086 RBX: ffff8802353d3cc0 RCX: 0000000000000006 RDX: 0000000000003b90 RSI: ffff8801f52c2ca0 RDI: ffff8801f52c2520 RBP: ffff8801e6431e08 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022ea0c800 R13: ffff88022ea0cdf8 R14: ffff8802353ecb40 R15: ffffffff81cc7800 FS: 00007f5720a10740(0000) GS:ffff880244c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000005862000 CR3: 000000022843c000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600 Stack: ffff88013deb6dec 000000000000002c 00000000000000c0 ffffffff81a3f6e4 ffff8801e6431e18 ffffffff8159a9aa ffff8801e6431e90 ffffffff816765f6 ffffffff810b756b 0000000700000002 ffff8801e6431e40 0000fea9292aa8c0 Call Trace: [<ffffffff8159a9aa>] skb_push+0x3a/0x40 [<ffffffff816765f6>] ip6_push_pending_frames+0x1f6/0x4d0 [<ffffffff810b756b>] ? mark_held_locks+0xbb/0x140 [<ffffffff81694919>] udp_v6_push_pending_frames+0x2b9/0x3d0 [<ffffffff81694660>] ? udplite_getfrag+0x20/0x20 [<ffffffff8162092a>] udp_lib_setsockopt+0x1aa/0x1f0 [<ffffffff811cc5e7>] ? fget_light+0x387/0x4f0 [<ffffffff816958a4>] udpv6_setsockopt+0x34/0x40 [<ffffffff815949f4>] sock_common_setsockopt+0x14/0x20 [<ffffffff81593c31>] SyS_setsockopt+0x71/0xd0 [<ffffffff816f5d54>] tracesys+0xdd/0xe2 Code: 00 00 48 89 44 24 10 8b 87 d8 00 00 00 48 89 44 24 08 48 8b 87 e8 00 00 00 48 c7 c7 c0 04 aa 81 48 89 04 24 31 c0 e8 e1 7e ff ff <0f> 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 RIP [<ffffffff816e759c>] skb_panic+0x63/0x65 RSP <ffff8801e6431de8> This patch adds a check if the pending data is of address family AF_INET and directly calls udp_push_ending_frames from udp_v6_push_pending_frames if that is the case. This bug was found by Dave Jones with trinity. (Also move the initialization of fl6 below the AF_INET check, even if not strictly necessary.) Cc: Dave Jones <davej@redhat.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: udp4: move GSO functions to udp_offloadDaniel Borkmann2013-06-121-3/+4
| | | | | | | | | Similarly to TCP offloading and UDPv6 offloading, move all related UDPv4 functions to udp_offload.c to make things more explicit. Also, by this, we can make those functions static. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/ipv6/udp: UDP encapsulation: introduce encap_rcv hook into IPv6Benjamin LaHaise2012-04-281-0/+3
| | | | | | | | Now that the sematics of udpv6_queue_rcv_skb() match IPv4's udp_queue_rcv_skb(), introduce the UDP encap_rcv() hook for IPv6. Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: cleanup unsigned to unsigned intEric Dumazet2012-04-151-1/+1
| | | | | | | Use of "unsigned int" is preferred to bare "unsigned" in net tree. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* udp: intoduce udp_encap_needed static_keyEric Dumazet2012-04-131-0/+1
| | | | | | | | | | | | | | | | | | | | Most machines dont use UDP encapsulation (L2TP) Adds a static_key so that udp_queue_rcv_skb() doesnt have to perform a test if L2TP never setup the encap_rcv on a socket. Idea of this patch came after Simon Horman proposal to add a hook on TCP as well. If static_key is not yet enabled, the fast path does a single JMP . When static_key is enabled, JMP destination is patched to reach the real encap_type/encap_rcv logic, possibly adding cache misses. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Simon Horman <horms@verge.net.au> Cc: dev@openvswitch.org Signed-off-by: David S. Miller <davem@davemloft.net>
* BUG: headers with BUG/BUG_ON etc. need linux/bug.hPaul Gortmaker2012-03-041-0/+1
| | | | | | | | | | | | | If a header file is making use of BUG, BUG_ON, BUILD_BUG_ON, or any other BUG variant in a static inline (i.e. not in a #define) then that header really should be including <linux/bug.h> and not just expecting it to be implicitly present. We can make this change risk-free, since if the files using these headers didn't have exposure to linux/bug.h already, they would have been causing compile failures/warnings. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* net: use IS_ENABLED(CONFIG_IPV6)Eric Dumazet2011-12-111-2/+2
| | | | | | | Instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* udp: Export code sk lookup routinesPavel Emelyanov2011-12-091-0/+6
| | | | | | | | The UDP diag get_exact handler will require them to find a socket by provided net, [sd]addr-s, [sd]ports and device. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: introduce and use netdev_features_t for device features setsMichał Mirosław2011-11-161-1/+2
| | | | | | | | | | v2: add couple missing conversions in drivers split unexporting netdev_fix_features() implemented %pNF convert sock::sk_route_(no?)caps Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: make the tcp and udp file_operations for the /proc stuff constArjan van de Ven2011-11-011-5/+7
| | | | | | | | | | | | | the tcp and udp code creates a set of struct file_operations at runtime while it can also be done at compile time, with the added benefit of then having these file operations be const. the trickiest part was to get the "THIS_MODULE" reference right; the naive method of declaring a struct in the place of registration would not work for this reason. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* udp: Switch to ip_finish_skbHerbert Xu2011-03-011-0/+11
| | | | | | | | | | This patch converts UDP to use the new ip_finish_skb API. This would then allows us to more easily use ip_make_skb which allows UDP to run without a socket lock. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: change netdev->features to u32Michał Mirosław2011-01-241-1/+1
| | | | | | | | | | | | | Quoting Ben Hutchings: we presumably won't be defining features that can only be enabled on 64-bit architectures. Occurences found by `grep -r` on net/, drivers/net, include/ [ Move features and vlan_features next to each other in struct netdev, as per Eric Dumazet's suggestion -DaveM ] Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: avoid limits overflowEric Dumazet2010-11-101-2/+2
| | | | | | | | | | | | | | Robin Holt tried to boot a 16TB machine and found some limits were reached : sysctl_tcp_mem[2], sysctl_udp_mem[2] We can switch infrastructure to use long "instead" of "int", now atomic_long_t primitives are available for free. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: Robin Holt <holt@sgi.com> Reviewed-by: Robin Holt <holt@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* tproxy: added udp6_lib_lookup functionBalazs Scheidler2010-10-211-0/+3
| | | | | | | | | | Just like with IPv4, we need access to the UDP hash table to look up local sockets, but instead of exporting the global udp_table, export a lookup function. Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> Signed-off-by: KOVACS Krisztian <hidden@balabit.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
* udp: add rehash on connect()Eric Dumazet2010-09-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 30fff923 introduced in linux-2.6.33 (udp: bind() optimisation) added a secondary hash on UDP, hashed on (local addr, local port). Problem is that following sequence : fd = socket(...) connect(fd, &remote, ...) not only selects remote end point (address and port), but also sets local address, while UDP stack stored in secondary hash table the socket while its local address was INADDR_ANY (or ipv6 equivalent) Sequence is : - autobind() : choose a random local port, insert socket in hash tables [while local address is INADDR_ANY] - connect() : set remote address and port, change local address to IP given by a route lookup. When an incoming UDP frame comes, if more than 10 sockets are found in primary hash table, we switch to secondary table, and fail to find socket because its local address changed. One solution to this problem is to rehash datagram socket if needed. We add a new rehash(struct socket *) method in "struct proto", and implement this method for UDP v4 & v6, using a common helper. This rehashing only takes care of secondary hash table, since primary hash (based on local port only) is not changed. Reported-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: cleanupsChangli Gao2010-07-121-20/+18
| | | | | | | | | | | | remove useless blanks. Signed-off-by: Changli Gao <xiaosuo@gmail.com> ---- include/net/inet_common.h | 55 ++++------- include/net/tcp.h | 222 +++++++++++++++++----------------------------- include/net/udp.h | 38 +++---- 3 files changed, 123 insertions(+), 192 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>
* udp: bind() optimisationEric Dumazet2009-11-101-1/+2
| | | | | | | | | UDP bind() can be O(N^2) in some pathological cases. Thanks to secondary hash tables, we can make it O(N) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* udp: secondary hash on (local port, local address)Eric Dumazet2009-11-081-2/+20
| | | | | | | | | | | | | | | | | | | | | Extends udp_table to contain a secondary hash table. socket anchor for this second hash is free, because UDP doesnt use skc_bind_node : We define an union to hold both skc_bind_node & a new hlist_nulls_node udp_portaddr_node udp_lib_get_port() inserts sockets into second hash chain (additional cost of one atomic op) udp_lib_unhash() deletes socket from second hash chain (additional cost of one atomic op) Note : No spinlock lockdep annotation is needed, because lock for the secondary hash chain is always get after lock for primary hash chain. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* udp: add a counter into udp_hslotEric Dumazet2009-11-081-0/+8
| | | | | | | | | | | Adds a counter in udp_hslot to keep an accurate count of sockets present in chain. This will permit to upcoming UDP lookup algo to chose the shortest chain when secondary hash is added. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* udp: dynamically size hash tables at boot timeEric Dumazet2009-10-071-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | UDP_HTABLE_SIZE was initialy defined to 128, which is a bit small for several setups. 4000 active UDP sockets -> 32 sockets per chain in average. An incoming frame has to lookup all sockets to find best match, so long chains hurt latency. Instead of a fixed size hash table that cant be perfect for every needs, let UDP stack choose its table size at boot time like tcp/ip route, using alloc_large_system_hash() helper Add an optional boot parameter, uhash_entries=x so that an admin can force a size between 256 and 65536 if needed, like thash_entries and rhash_entries. dmesg logs two new lines : [ 0.647039] UDP hash table entries: 512 (order: 0, 4096 bytes) [ 0.647099] UDP Lite hash table entries: 512 (order: 0, 4096 bytes) Maximal size on 64bit arches would be 65536 slots, ie 1 MBytes for non debugging spinlocks. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Make setsockopt() optlen be unsigned.David S. Miller2009-09-301-1/+1
| | | | | | | | | | | | This provides safety against negative optlen at the type level instead of depending upon (sometimes non-trivial) checks against this sprinkled all over the the place, in each and every implementation. Based upon work done by Arjan van de Ven and feedback from Linus Torvalds. Signed-off-by: David S. Miller <davem@davemloft.net>
* udpv4: Handle large incoming UDP/IPv4 packets and support software UFO.Sridhar Samudrala2009-07-121-0/+3
| | | | | | | | | - validate and forward GSO UDP/IPv4 packets from untrusted sources. - do software UFO if the outgoing device doesn't support UFO. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Fix NULL pointer dereference with time-wait socketsVlad Yasevich2009-04-111-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | Commit b2f5e7cd3dee2ed721bf0675e1a1ddebb849aee6 (ipv6: Fix conflict resolutions during ipv6 binding) introduced a regression where time-wait sockets were not treated correctly. This resulted in the following: BUG: unable to handle kernel NULL pointer dereference at 0000000000000062 IP: [<ffffffff805d7d61>] ipv4_rcv_saddr_equal+0x61/0x70 ... Call Trace: [<ffffffffa033847b>] ipv6_rcv_saddr_equal+0x1bb/0x250 [ipv6] [<ffffffffa03505a8>] inet6_csk_bind_conflict+0x88/0xd0 [ipv6] [<ffffffff805bb18e>] inet_csk_get_port+0x1ee/0x400 [<ffffffffa0319b7f>] inet6_bind+0x1cf/0x3a0 [ipv6] [<ffffffff8056d17c>] ? sockfd_lookup_light+0x3c/0xd0 [<ffffffff8056ed49>] sys_bind+0x89/0x100 [<ffffffff80613ea2>] ? trace_hardirqs_on_thunk+0x3a/0x3c [<ffffffff8020bf9b>] system_call_fastpath+0x16/0x1b Tested-by: Brian Haley <brian.haley@hp.com> Tested-by: Ed Tomlinson <edt@aei.ca> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Fix conflict resolutions during ipv6 bindingVlad Yasevich2009-03-241-0/+2
| | | | | | | | | | The ipv6 version of bind_conflict code calls ipv6_rcv_saddr_equal() which at times wrongly identified intersections between addresses. It particularly broke down under a few instances and caused erroneous bind conflicts. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* udp: Use hlist_nulls in UDP RCU codeEric Dumazet2008-11-161-1/+1
| | | | | | | | | | | | | | | This is a straightforward patch, using hlist_nulls infrastructure. RCUification already done on UDP two weeks ago. Using hlist_nulls permits us to avoid some memory barriers, both at lookup time and delete time. Patch is large because it adds new macros to include/net/sock.h. These macros will be used by TCP & DCCP in next patch. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* udp: introduce struct udp_table and multiple spinlocksEric Dumazet2008-10-291-13/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | UDP sockets are hashed in a 128 slots hash table. This hash table is protected by *one* rwlock. This rwlock is readlocked each time an incoming UDP message is handled. This rwlock is writelocked each time a socket must be inserted in hash table (bind time), or deleted from this table (close time) This is not scalable on SMP machines : 1) Even in read mode, lock() and unlock() are atomic operations and must dirty a contended cache line, shared by all cpus. 2) A writer might be starved if many readers are 'in flight'. This can happen on a machine with some NIC receiving many UDP messages. User process can be delayed a long time at socket creation/dismantle time. This patch prepares RCU migration, by introducing 'struct udp_table and struct udp_hslot', and using one spinlock per chain, to reduce contention on central rwlock. Introducing one spinlock per chain reduces latencies, for port randomization on heavily loaded UDP servers. This also speedup bindings to specific ports. udp_lib_unhash() was uninlined, becoming to big. Some cleanups were done to ease review of following patch (RCUification of UDP Unicast lookups) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netns: make uplitev6 mib per/namespaceDenis V. Lunev2008-10-071-7/+4
| | | | | Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* netns: make udpv6 mib per/namespaceDenis V. Lunev2008-10-071-6/+6
| | | | | Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* udp: Export UDP socket lookup functionKOVACS Krisztian2008-10-011-0/+4
| | | | | | | | The iptables tproxy code has to be able to do UDP socket hash lookups, so we have to provide an exported lookup function for this purpose. Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
* mib: put udplite statistics on struct netPavel Emelyanov2008-07-181-3/+2
| | | | | Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* mib: put udp statistics on struct netPavel Emelyanov2008-07-181-5/+4
| | | | | | | Similar to... ouch, I repeat myself. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* MIB: add struct net to UDP6_INC_STATS_BHPavel Emelyanov2008-07-051-2/+2
| | | | | | Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* MIB: add struct net to UDP6_INC_STATS_USERPavel Emelyanov2008-07-051-1/+1
| | | | | | | | As simple as the patch #1 in this set. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* MIB: add struct net to UDP_INC_STATS_BHPavel Emelyanov2008-07-051-3/+3
| | | | | | | | | | | | Two special cases here - one is rxrpc - I put init_net there explicitly, since we haven't touched this part yet. The second place is in __udp4_lib_rcv - we already have a struct net there, but I have to move its initialization above to make it ready at the "drop" label. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* MIB: add struct net to UDP_INC_STATS_USERPavel Emelyanov2008-07-051-1/+1
| | | | | | | | | Nothing special - all the places already have a struct sock at hands, so use the sock_net() net. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* udp: reorder udp_iter_state to remove padding on 64bit buildsRichard Kennedy2008-06-131-1/+1
| | | | | | | | | | reorder udp_iter_state to remove padding on 64bit builds shrinks from 24 to 16 bytes, moving to a smaller slab when CONFIG_NET_NS is undefined & seq_net_private = {} Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV6]: inet_sk(sk)->cork.opt leakDenis V. Lunev2008-06-051-0/+1
| | | | | | | | | IPv6 UDP sockets wth IPv4 mapped address use udp_sendmsg to send the data actually. In this case ip_flush_pending_frames should be called instead of ip6_flush_pending_frames. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
* [SOCK][NETNS]: Add a struct net argument to sock_prot_inuse_add and _get.Pavel Emelyanov2008-03-311-1/+1
| | | | | | | | | | | | | | This counter is about to become per-proto-and-per-net, so we'll need two arguments to determine which cell in this "table" to work with. All the places, but proc already pass proper net to it - proc will be tuned a bit later. Some indentation with spaces in proc files is done to keep the file coding style consistent. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [UDP]: Remove owner from udp_seq_afinfo.Denis V. Lunev2008-03-281-1/+0
| | | | | | | Move it to udp_seq_afinfo->seq_fops as should be. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [UDP]: Place file operations directly into udp_seq_afinfo.Denis V. Lunev2008-03-281-1/+1
| | | | | | | No need to have separate never-used variable. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
OpenPOWER on IntegriCloud