summaryrefslogtreecommitdiffstats
path: root/net
Commit message (Collapse)AuthorAgeFilesLines
* act_nat: use stack variableChangli Gao2010-06-301-21/+10
| | | | | | | | | | | | act_nat: use stack variable structure tc_nat isn't too big for stack, so we can put it in stack. Signed-off-by: Changli Gao <xiaosuo@gmail.com> ---- net/sched/act_nat.c | 31 ++++++++++--------------------- 1 file changed, 10 insertions(+), 21 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>
* act_mirred: combine duplicate codeChangli Gao2010-06-301-4/+2
| | | | | | | | | | | | | | act_mirred: combine duplicate code tcf_bstats is updated in any way, so we can do it earlier to reduce the size of the code. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> ---- net/sched/act_mirred.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>
* net/core: use ntohs for skb->protocolSebastian Andrzej Siewior2010-06-301-1/+2
| | | | | | | | This is only noticed by people that are not doing everything correct in the first place. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Use interface max_desync_factor instead of static defaultBen Hutchings2010-06-301-4/+4
| | | | | | | | | max_desync_factor can be configured per-interface, but nothing is using the value. Reported-by: Piotr Lewandowski <piotr.lewandowski@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Clamp reported valid_lft to a minimum of 0Ben Hutchings2010-06-301-2/+6
| | | | | | | | | | Since addresses are only revalidated every 2 minutes, the reported valid_lft can underflow shortly before the address is deleted. Clamp it to a minimum of 0, as for prefered_lft. Reported-by: Piotr Lewandowski <piotr.lewandowski@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/Makefile: conditionally descend to wireless and ieee802154Nicolas Kaiser2010-06-291-2/+2
| | | | | | | Don't descend to wireless and ieee802154 unless they are actually used. Signed-off-by: Nicolas Kaiser <nikai@nikai.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* caif: Kconfig and Makefile fixesSjur Braendeland2010-06-292-17/+4
| | | | | | | | Use "depends on" instead of "if" in Kconfig files. Fixed CAIF debug flag, and removed unnecessary clean-* options. Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* act_mirred: don't clone skb when skb isn't sharedChangli Gao2010-06-281-3/+3
| | | | | | | | | | | | | | | | don't clone skb when skb isn't shared When the tcf_action is TC_ACT_STOLEN, and the skb isn't shared, we don't need to clone a new skb. As the skb will be freed after this function returns, we can use it freely once we get a reference to it. Signed-off-by: Changli Gao <xiaosuo@gmail.com> ---- include/net/sch_generic.h | 11 +++++++++-- net/sched/act_mirred.c | 6 +++--- 2 files changed, 12 insertions(+), 5 deletions(-) Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: tso_fragment() might avoid GFP_ATOMICEric Dumazet2010-06-281-3/+3
| | | | | | | | We can pass a gfp argument to tso_fragment() and avoid GFP_ATOMIC allocations sometimes. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* vlan: 64 bit rx countersEric Dumazet2010-06-283-25/+41
| | | | | | | | | | Use u64_stats_sync infrastructure to implement 64bit rx stats. (tx stats are addressed later) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: use this_cpu_ptr()Eric Dumazet2010-06-283-4/+4
| | | | | | | use this_cpu_ptr(p) instead of per_cpu_ptr(p, smp_processor_id()) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* syncookies: add support for ECNFlorian Westphal2010-06-264-12/+16
| | | | | | | | | | | | Allows use of ECN when syncookies are in effect by encoding ecn_ok into the syn-ack tcp timestamp. While at it, remove a uneeded #ifdef CONFIG_SYN_COOKIES. With CONFIG_SYN_COOKIES=nm want_cookie is ifdef'd to 0 and gcc removes the "if (0)". Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* syncookies: do not store rcv_wscale in tcp timestampFlorian Westphal2010-06-262-22/+14
| | | | | | | | | | | | | | | As pointed out by Fernando Gont there is no need to encode rcv_wscale into the cookie. We did not use the restored rcv_wscale anyway; it is recomputed via tcp_select_initial_window(). Thus we can save 4 bits in the ts option space by removing rcv_wscale. In case window scaling was not supported, we set the (invalid) wscale value 0xf. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: remove ipv6_statisticsEric Dumazet2010-06-251-2/+0
| | | | | | | | | | | | | | commit 9261e5370112 (ipv6: making ip and icmp statistics per/namespace) forgot to remove ipv6_statistics variable. commit bc417d99bf27 (ipv6: remove stale MIB definitions) took care of icmpv6_statistics & icmpv6msg_statistics Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Denis V. Lunev <den@openvz.org> CC: Alexey Dobriyan <adobriyan@gmail.com> CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* snmp: add align parameter to snmp_mib_init()Eric Dumazet2010-06-256-21/+39
| | | | | | | | | | | | | | | | In preparation for 64bit snmp counters for some mibs, add an 'align' parameter to snmp_mib_init(), instead of assuming mibs only contain 'unsigned long' fields. Callers can use __alignof__(type) to provide correct alignment. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> CC: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* arp: RCU change in arp_solicit()Eric Dumazet2010-06-251-5/+7
| | | | | | | Avoid two atomic ops in arp_solicit() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* dccp: make implementation of Syn-RTT symmetricGerrit Renker2010-06-252-3/+16
| | | | | | | | | | | | | | | | | | This patch is thanks to Andre Noll who reported the issue and helped testing. The Syn-RTT sampled during the initial handshake currently only works for the client sending the DCCP-Request. TFRC penalizes the absence of an RTT sample with a very slow initial speed (1 packet per second), which delays slow-start significantly, resulting in sluggish performance. This patch mirrors the "Syn RTT" principle by adding a timestamp also onto the DCCP-Response, producing an RTT sample when the (Data)Ack completing the handshake arrives. Also changed the documentation to 'TFRC' since Syn RTTs are also used by CCID-4. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
* dccp: remove unused function argumentGerrit Renker2010-06-254-19/+13
| | | | | | | This removes an unused 'sk' argument from several option-inserting functions. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/core/pktgen.c: Use pr_<level>Joe Perches2010-06-251-78/+64
| | | | | | | | | | | | Add pr_fmt(fmt) KBUILD_MODNAME ": " fmt Remove "pktgen: " from formats Convert printks to pr_<level> Added func_enter() for debugging Moved version to end of string at module_init Coalesced long formats Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: optimize Berkeley Packet Filter (BPF) processingHagen Paul Pfeifer2010-06-251-51/+161
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Gcc is currenlty not in the ability to optimize the switch statement in sk_run_filter() because of dense case labels. This patch replace the OR'd labels with ordered sequenced case labels. The sk_chk_filter() function is modified to patch/replace the original OPCODES in a ordered but equivalent form. gcc is now in the ability to transform the switch statement in sk_run_filter into a jump table of complexity O(1). Until this patch gcc generates a sequence of conditional branches (O(n) of 567 byte .text segment size (arch x86_64): 7ff: 8b 06 mov (%rsi),%eax 801: 66 83 f8 35 cmp $0x35,%ax 805: 0f 84 d0 02 00 00 je adb <sk_run_filter+0x31d> 80b: 0f 87 07 01 00 00 ja 918 <sk_run_filter+0x15a> 811: 66 83 f8 15 cmp $0x15,%ax 815: 0f 84 c5 02 00 00 je ae0 <sk_run_filter+0x322> 81b: 77 73 ja 890 <sk_run_filter+0xd2> 81d: 66 83 f8 04 cmp $0x4,%ax 821: 0f 84 17 02 00 00 je a3e <sk_run_filter+0x280> 827: 77 29 ja 852 <sk_run_filter+0x94> 829: 66 83 f8 01 cmp $0x1,%ax [...] With the modification the compiler translate the switch statement into the following jump table fragment: 7ff: 66 83 3e 2c cmpw $0x2c,(%rsi) 803: 0f 87 1f 02 00 00 ja a28 <sk_run_filter+0x26a> 809: 0f b7 06 movzwl (%rsi),%eax 80c: ff 24 c5 00 00 00 00 jmpq *0x0(,%rax,8) 813: 44 89 e3 mov %r12d,%ebx 816: e9 43 03 00 00 jmpq b5e <sk_run_filter+0x3a0> 81b: 41 89 dc mov %ebx,%r12d 81e: e9 3b 03 00 00 jmpq b5e <sk_run_filter+0x3a0> Furthermore, I reordered the instructions to reduce cache line misses by order the most common instruction to the start. Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: do not send reset to already closed socketsKonstantin Khorenko2010-06-241-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | i've found that tcp_close() can be called for an already closed socket, but still sends reset in this case (tcp_send_active_reset()) which seems to be incorrect. Moreover, a packet with reset is sent with different source port as original port number has been already cleared on socket. Besides that incrementing stat counter for LINUX_MIB_TCPABORTONCLOSE also does not look correct in this case. Initially this issue was found on 2.6.18-x RHEL5 kernel, but the same seems to be true for the current mainstream kernel (checked on 2.6.35-rc3). Please, correct me if i missed something. How that happens: 1) the server receives a packet for socket in TCP_CLOSE_WAIT state that triggers a tcp_reset(): Call Trace: <IRQ> [<ffffffff8025b9b9>] tcp_reset+0x12f/0x1e8 [<ffffffff80046125>] tcp_rcv_state_process+0x1c0/0xa08 [<ffffffff8003eb22>] tcp_v4_do_rcv+0x310/0x37a [<ffffffff80028bea>] tcp_v4_rcv+0x74d/0xb43 [<ffffffff8024ef4c>] ip_local_deliver_finish+0x0/0x259 [<ffffffff80037131>] ip_local_deliver+0x200/0x2f4 [<ffffffff8003843c>] ip_rcv+0x64c/0x69f [<ffffffff80021d89>] netif_receive_skb+0x4c4/0x4fa [<ffffffff80032eca>] process_backlog+0x90/0xec [<ffffffff8000cc50>] net_rx_action+0xbb/0x1f1 [<ffffffff80012d3a>] __do_softirq+0xf5/0x1ce [<ffffffff8001147a>] handle_IRQ_event+0x56/0xb0 [<ffffffff8006334c>] call_softirq+0x1c/0x28 [<ffffffff80070476>] do_softirq+0x2c/0x85 [<ffffffff80070441>] do_IRQ+0x149/0x152 [<ffffffff80062665>] ret_from_intr+0x0/0xa <EOI> [<ffffffff80008a2e>] __handle_mm_fault+0x6cd/0x1303 [<ffffffff80008903>] __handle_mm_fault+0x5a2/0x1303 [<ffffffff80033a9d>] cache_free_debugcheck+0x21f/0x22e [<ffffffff8006a263>] do_page_fault+0x49a/0x7dc [<ffffffff80066487>] thread_return+0x89/0x174 [<ffffffff800c5aee>] audit_syscall_exit+0x341/0x35c [<ffffffff80062e39>] error_exit+0x0/0x84 tcp_rcv_state_process() ... // (sk_state == TCP_CLOSE_WAIT here) ... /* step 2: check RST bit */ if(th->rst) { tcp_reset(sk); goto discard; } ... --------------------------------- tcp_rcv_state_process tcp_reset tcp_done tcp_set_state(sk, TCP_CLOSE); inet_put_port __inet_put_port inet_sk(sk)->num = 0; sk->sk_shutdown = SHUTDOWN_MASK; 2) After that the process (socket owner) tries to write something to that socket and "inet_autobind" sets a _new_ (which differs from the original!) port number for the socket: Call Trace: [<ffffffff80255a12>] inet_bind_hash+0x33/0x5f [<ffffffff80257180>] inet_csk_get_port+0x216/0x268 [<ffffffff8026bcc9>] inet_autobind+0x22/0x8f [<ffffffff80049140>] inet_sendmsg+0x27/0x57 [<ffffffff8003a9d9>] do_sock_write+0xae/0xea [<ffffffff80226ac7>] sock_writev+0xdc/0xf6 [<ffffffff800680c7>] _spin_lock_irqsave+0x9/0xe [<ffffffff8001fb49>] __pollwait+0x0/0xdd [<ffffffff8008d533>] default_wake_function+0x0/0xe [<ffffffff800a4f10>] autoremove_wake_function+0x0/0x2e [<ffffffff800f0b49>] do_readv_writev+0x163/0x274 [<ffffffff80066538>] thread_return+0x13a/0x174 [<ffffffff800145d8>] tcp_poll+0x0/0x1c9 [<ffffffff800c56d3>] audit_syscall_entry+0x180/0x1b3 [<ffffffff800f0dd0>] sys_writev+0x49/0xe4 [<ffffffff800622dd>] tracesys+0xd5/0xe0 3) sendmsg fails at last with -EPIPE (=> 'write' returns -EPIPE in userspace): F: tcp_sendmsg1 -EPIPE: sk=ffff81000bda00d0, sport=49847, old_state=7, new_state=7, sk_err=0, sk_shutdown=3 Call Trace: [<ffffffff80027557>] tcp_sendmsg+0xcb/0xe87 [<ffffffff80033300>] release_sock+0x10/0xae [<ffffffff8016f20f>] vgacon_cursor+0x0/0x1a7 [<ffffffff8026bd32>] inet_autobind+0x8b/0x8f [<ffffffff8003a9d9>] do_sock_write+0xae/0xea [<ffffffff80226ac7>] sock_writev+0xdc/0xf6 [<ffffffff800680c7>] _spin_lock_irqsave+0x9/0xe [<ffffffff8001fb49>] __pollwait+0x0/0xdd [<ffffffff8008d533>] default_wake_function+0x0/0xe [<ffffffff800a4f10>] autoremove_wake_function+0x0/0x2e [<ffffffff800f0b49>] do_readv_writev+0x163/0x274 [<ffffffff80066538>] thread_return+0x13a/0x174 [<ffffffff800145d8>] tcp_poll+0x0/0x1c9 [<ffffffff800c56d3>] audit_syscall_entry+0x180/0x1b3 [<ffffffff800f0dd0>] sys_writev+0x49/0xe4 [<ffffffff800622dd>] tracesys+0xd5/0xe0 tcp_sendmsg() ... /* Wait for a connection to finish. */ if ((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) { int old_state = sk->sk_state; if ((err = sk_stream_wait_connect(sk, &timeo)) != 0) { if (f_d && (err == -EPIPE)) { printk("F: tcp_sendmsg1 -EPIPE: sk=%p, sport=%u, old_state=%d, new_state=%d, " "sk_err=%d, sk_shutdown=%d\n", sk, ntohs(inet_sk(sk)->sport), old_state, sk->sk_state, sk->sk_err, sk->sk_shutdown); dump_stack(); } goto out_err; } } ... 4) Then the process (socket owner) understands that it's time to close that socket and does that (and thus triggers sending reset packet): Call Trace: ... [<ffffffff80032077>] dev_queue_xmit+0x343/0x3d6 [<ffffffff80034698>] ip_output+0x351/0x384 [<ffffffff80251ae9>] dst_output+0x0/0xe [<ffffffff80036ec6>] ip_queue_xmit+0x567/0x5d2 [<ffffffff80095700>] vprintk+0x21/0x33 [<ffffffff800070f0>] check_poison_obj+0x2e/0x206 [<ffffffff80013587>] poison_obj+0x36/0x45 [<ffffffff8025dea6>] tcp_send_active_reset+0x15/0x14d [<ffffffff80023481>] dbg_redzone1+0x1c/0x25 [<ffffffff8025dea6>] tcp_send_active_reset+0x15/0x14d [<ffffffff8000ca94>] cache_alloc_debugcheck_after+0x189/0x1c8 [<ffffffff80023405>] tcp_transmit_skb+0x764/0x786 [<ffffffff8025df8a>] tcp_send_active_reset+0xf9/0x14d [<ffffffff80258ff1>] tcp_close+0x39a/0x960 [<ffffffff8026be12>] inet_release+0x69/0x80 [<ffffffff80059b31>] sock_release+0x4f/0xcf [<ffffffff80059d4c>] sock_close+0x2c/0x30 [<ffffffff800133c9>] __fput+0xac/0x197 [<ffffffff800252bc>] filp_close+0x59/0x61 [<ffffffff8001eff6>] sys_close+0x85/0xc7 [<ffffffff800622dd>] tracesys+0xd5/0xe0 So, in brief: * a received packet for socket in TCP_CLOSE_WAIT state triggers tcp_reset() which clears inet_sk(sk)->num and put socket into TCP_CLOSE state * an attempt to write to that socket forces inet_autobind() to get a new port (but the write itself fails with -EPIPE) * tcp_close() called for socket in TCP_CLOSE state sends an active reset via socket with newly allocated port This adds an additional check in tcp_close() for already closed sockets. We do not want to send anything to closed sockets. Signed-off-by: Konstantin Khorenko <khorenko@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: fix "netpoll: Allow netpoll_setup/cleanup recursion"Andrew Morton2010-06-241-1/+0
| | | | | | | | | Remove rtnl_unlock() which had no corresponding rtnl_lock(). Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2010-06-235-9/+15
|\ | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/ipv4/ip_output.c
| * udp: Fix bogus UFO packet generationHerbert Xu2010-06-211-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | It has been reported that the new UFO software fallback path fails under certain conditions with NFS. I tracked the problem down to the generation of UFO packets that are smaller than the MTU. The software fallback path simply discards these packets. This patch fixes the problem by not generating such packets on the UFO path. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bridge: fdb cleanup runs too oftenstephen hemminger2010-06-171-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | It is common in end-node, non STP bridges to set forwarding delay to zero; which causes the forwarding database cleanup to run every clock tick. Change to run only as soon as needed or at next ageing timer interval which ever is sooner. Use round_jiffies_up macro rather than attempting round up by changing value. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Clear IFF_XMIT_DST_RELEASE for teql interfacesTom Hughes2010-06-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://bugzilla.kernel.org/show_bug.cgi?id=16183 The sch_teql module, which can be used to load balance over a set of underlying interfaces, stopped working after 2.6.30 and has been broken in all kernels since then for any underlying interface which requires the addition of link level headers. The problem is that the transmit routine relies on being able to access the destination address in the skb in order to do address resolution once it has decided which underlying interface it is going to transmit through. In 2.6.31 the IFF_XMIT_DST_RELEASE flag was introduced, and set by default for all interfaces, which causes the destination address to be released before the transmit routine for the interface is called. The solution is to clear that flag for teql interfaces. Signed-off-by: Tom Hughes <tom@compton.nu> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge branch 'master' of ↵David S. Miller2010-06-161-1/+1
| |\ | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
| * | bridge: Fix OOM crash in deliver_cloneHerbert Xu2010-06-151-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | The bridge multicast patches introduced an OOM crash in the forward path, when deliver_clone fails to clone the skb. Reported-by: Mark Wagner <mwagner@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge branch 'master' of ↵David S. Miller2010-06-151-0/+4
| |\ \ | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6
| | * | ipvs: Add missing locking during connection table hashing and unhashingSven Wegener2010-06-091-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The code that hashes and unhashes connections from the connection table is missing locking of the connection being modified, which opens up a race condition and results in memory corruption when this race condition is hit. Here is what happens in pretty verbose form: CPU 0 CPU 1 ------------ ------------ An active connection is terminated and we schedule ip_vs_conn_expire() on this CPU to expire this connection. IRQ assignment is changed to this CPU, but the expire timer stays scheduled on the other CPU. New connection from same ip:port comes in right before the timer expires, we find the inactive connection in our connection table and get a reference to it. We proper lock the connection in tcp_state_transition() and read the connection flags in set_tcp_state(). ip_vs_conn_expire() gets called, we unhash the connection from our connection table and remove the hashed flag in ip_vs_conn_unhash(), without proper locking! While still holding proper locks we write the connection flags in set_tcp_state() and this sets the hashed flag again. ip_vs_conn_expire() fails to expire the connection, because the other CPU has incremented the reference count. We try to re-insert the connection into our connection table, but this fails in ip_vs_conn_hash(), because the hashed flag has been set by the other CPU. We re-schedule execution of ip_vs_conn_expire(). Now this connection has the hashed flag set, but isn't actually hashed in our connection table and has a dangling list_head. We drop the reference we held on the connection and schedule the expire timer for timeouting the connection on this CPU. Further packets won't be able to find this connection in our connection table. ip_vs_conn_expire() gets called again, we think it's already hashed, but the list_head is dangling and while removing the connection from our connection table we write to the memory location where this list_head points to. The result will probably be a kernel oops at some other point in time. This race condition is pretty subtle, but it can be triggered remotely. It needs the IRQ assignment change or another circumstance where packets coming from the same ip:port for the same service are being processed on different CPUs. And it involves hitting the exact time at which ip_vs_conn_expire() gets called. It can be avoided by making sure that all packets from one connection are always processed on the same CPU and can be made harder to exploit by changing the connection timeouts to some custom values. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Cc: stable@kernel.org Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Patrick McHardy <kaber@trash.net>
* | | | net - IP_NODEFRAG option for IPv4 socketJiri Olsa2010-06-233-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | this patch is implementing IP_NODEFRAG option for IPv4 socket. The reason is, there's no other way to send out the packet with user customized header of the reassembly part. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | bridge: 64bit rx/tx countersEric Dumazet2010-06-233-13/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use u64_stats_sync infrastructure to provide 64bit rx/tx counters even on 32bit hosts. It is safe to use a single u64_stats_sync for rx and tx, because BH is disabled on both, and we use per_cpu data. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | net: consolidate netif_needs_gso() checksJohn Fastabend2010-06-231-36/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | netif_needs_gso() is checked twice in the TX path once, before submitting the skb to the qdisc and once after it is dequeued from the qdisc just before calling ndo_hard_start(). This opens a window for a user to change the gso/tso or tx checksum settings that can cause netif_needs_gso to be true in one check and false in the other. Specifically, changing TX checksum setting may cause the warning in skb_gso_segment() to be triggered if the checksum is calculated earlier. This consolidates the netif_needs_gso() calls so that the stack only checks if gso is needed in dev_hard_start_xmit(). Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | caif: Add debug connection type for CAIF.Sjur Braendeland2010-06-201-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Added new CAIF protocol type CAIFPROTO_DEBUG for accessing CAIF debug on the ST Ericsson modems. There are two debug servers on the modem, one for radio related debug (CAIF_RADIO_DEBUG_SERVICE) and the other for communication/application related debug (CAIF_COM_DEBUG_SERVICE). The debug connection can contain trace debug printouts or interactive debug used for debugging and test. Debug connections can be of type STREAM or SEQPACKET. Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | caif: Use link layer MTU instead of fixed MTUSjur Braendeland2010-06-2011-56/+155
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously CAIF supported maximum transfer size of ~4050. The transfer size is now calculated dynamically based on the link layers mtu size. Signed-off-by: Sjur Braendeland@stericsson.com Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | caif: Bugfix - RFM must support segmentation.Sjur Braendeland2010-06-205-65/+271
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CAIF Remote File Manager may send or receive more than 4050 bytes. Due to this The CAIF RFM service have to support segmentation. Signed-off-by: Sjur Braendeland@stericsson.com Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | caif: Bugfix not all services uses flow-ctrl.Sjur Braendeland2010-06-208-16/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Flow control is not used by all CAIF services. The usage of flow control is now part of the gerneal initialization function for CAIF Services. Signed-off-by: Sjur Braendeland@stericsson.com Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | Merge branch 'master' of ↵David S. Miller2010-06-1730-1013/+1242
|\ \ \ \ | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
| * \ \ \ Merge branch 'master' of ↵John W. Linville2010-06-172-2/+39
| |\ \ \ \ | | | |_|/ | | |/| | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 Conflicts: net/mac80211/mlme.c
| | * | | mac80211: fix warn, enum may be used uninitializedChristoph Fritz2010-06-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | regression introduced by b8d92c9c141ee3dc9b3537b1f0ffb4a54ea8d9b2 In function ‘ieee80211_work_rx_queued_mgmt’: warning: ‘rma’ may be used uninitialized in this function this re-adds default value WORK_ACT_NONE back to rma Signed-off-by: Christoph Fritz <chf.fritz@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * | | | mac80211: Use a separate CCMP PN receive counter for management framesJouni Malinen2010-06-155-7/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When management frame protection (IEEE 802.11w) is used, we must use a separate counter for tracking received CCMP packet number for the management frames. The previously used NUM_RX_DATA_QUEUESth queue was shared with data frames when QoS was not used and that can cause problems in detecting replays incorrectly for robust management frames. Add a new counter just for robust management frames to avoid this issue. Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * | | | mac80211: Protect Deauthentication frame when using MFPJouni Malinen2010-06-151-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When management frame protection (IEEE 802.11w) is used, Deauthentication frame needs to be protected when the pairwise key is configured. mac80211 was removing the station entry (and its keys) before actually sending out the Deauthentication frame. Fix this by reordering the code to send the frame before the station entry gets removed. This matches an earlier change that handled the Disassociation frame processing, but missed Deauthentication frames. Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * | | | mac80211: Fix ps-qos network latency handlingJuuso Oikarinen2010-06-154-14/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ps-qos latency handling is broken. It uses predetermined latency values to select specific dynamic PS timeouts. With common AP configurations, these values overlap with beacon interval and are therefore essentially useless (for network latencies less than the beacon interval, PSM is disabled.) This patch remedies the problem by replacing the predetermined network latency values with one high value (1900ms) which is used to go trigger full psm. For backwards compatibility, the value 2000ms is still mapped to a dynamic ps timeout of 100ms. Currently also the mac80211 internal value for storing user space configured dynamic PSM values is incorrectly in the driver visible ieee80211_conf struct. Move it to the ieee80211_local struct. Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * | | | mac80211: Fix circular locking dependency in ARP filter handlingJuuso Oikarinen2010-06-146-74/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a circular locking dependency when configuring the hardware ARP filters on association, occurring when flushing the mac80211 workqueue. This is what happens: [ 92.026800] ======================================================= [ 92.030507] [ INFO: possible circular locking dependency detected ] [ 92.030507] 2.6.34-04781-g2b2c009 #85 [ 92.030507] ------------------------------------------------------- [ 92.030507] modprobe/5225 is trying to acquire lock: [ 92.030507] ((wiphy_name(local->hw.wiphy))){+.+.+.}, at: [<ffffffff8105b5c0>] flush_workq ueue+0x0/0xb0 [ 92.030507] [ 92.030507] but task is already holding lock: [ 92.030507] (rtnl_mutex){+.+.+.}, at: [<ffffffff812b9ce2>] rtnl_lock+0x12/0x20 [ 92.030507] [ 92.030507] which lock already depends on the new lock. [ 92.030507] [ 92.030507] [ 92.030507] the existing dependency chain (in reverse order) is: [ 92.030507] [ 92.030507] -> #2 (rtnl_mutex){+.+.+.}: [ 92.030507] [<ffffffff810761fb>] lock_acquire+0xdb/0x110 [ 92.030507] [<ffffffff81341754>] mutex_lock_nested+0x44/0x300 [ 92.030507] [<ffffffff812b9ce2>] rtnl_lock+0x12/0x20 [ 92.030507] [<ffffffffa022d47c>] ieee80211_assoc_done+0x6c/0xe0 [mac80211] [ 92.030507] [<ffffffffa022f2ad>] ieee80211_work_work+0x31d/0x1280 [mac80211] [ 92.030507] -> #1 ((&local->work_work)){+.+.+.}: [ 92.030507] [<ffffffff810761fb>] lock_acquire+0xdb/0x110 [ 92.030507] [<ffffffff8105a51a>] worker_thread+0x22a/0x370 [ 92.030507] [<ffffffff8105ecc6>] kthread+0x96/0xb0 [ 92.030507] [<ffffffff81003a94>] kernel_thread_helper+0x4/0x10 [ 92.030507] [ 92.030507] -> #0 ((wiphy_name(local->hw.wiphy))){+.+.+.}: [ 92.030507] [<ffffffff81075fdc>] __lock_acquire+0x1c0c/0x1d50 [ 92.030507] [<ffffffff810761fb>] lock_acquire+0xdb/0x110 [ 92.030507] [<ffffffff8105b60e>] flush_workqueue+0x4e/0xb0 [ 92.030507] [<ffffffffa023ff7b>] ieee80211_stop_device+0x2b/0xb0 [mac80211] [ 92.030507] [<ffffffffa0231635>] ieee80211_stop+0x3e5/0x680 [mac80211] The locking in this case is quite complex. Fix the problem by rewriting the way the hardware ARP filter list is handled - i.e. make a copy of the address list to the bss_conf struct, and provide that list to the hardware driver when needed. The current patch will enable filtering also in promiscuous mode. This may need to be changed in the future. Reported-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * | | | mac80211: remove BSS from cfg80211 list when leaving IBSSTeemu Paasikivi2010-06-141-0/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove BSS from cfg80211 BSS list if we are only member in IBSS when leaving it. Signed-off-by: Teemu Paasikivi <ext-teemu.3.paasikivi@nokia.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * | | | mac80211: Set changed basic rates flagTeemu Paasikivi2010-06-141-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add changed basic rates flag to bss_changed while joinig ibss network. This patch is split from the patch containing support for setting basic rates when creating ibss network. Original patch was posted by Johannes Berg on the linux-wireless posting list. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Teemu Paasikivi <ext-teemu.3.paasikivi@nokia.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * | | | mac80211: Set basic rates while joining ibss networkTeemu Paasikivi2010-06-143-1/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support to nl80211 and mac80211 to set basic rates when joining/creating ibss network. Original patch was posted by Johannes Berg on the linux-wireless posting list. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Teemu Paasikivi <ext-teemu.3.paasikivi@nokia.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * | | | mac80211: bracket driver tracingJohannes Berg2010-06-142-122/+156
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, driver tracing is sometimes invoked after and sometimes before the actual driver callback. This is fine as long as the driver has no tracing itself, but as soon as it does it gets confusing. To make traces containing such information easier to read, introduce a return tracer in mac80211 that essentially brackets any driver tracing, and invoke the real trace before the driver's callback, only showing the return value, if any, afterwards. Since tracing records the process, there's no problem with overlapping calls if that should happen. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * | | | mac80211: fix mgmt frame accountingJohannes Berg2010-06-141-12/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The recent change to processing action frames from the management frame queue had already broken action frame accounting, and my rework didn't help either. So add back accounting and simplify the code with a label rather than duplicating it, and also add accounting for management frames. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * | | | mac80211: update aggregation documentationJohannes Berg2010-06-142-16/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Even before the recent changes, the documentation for TX aggregation was somewhat out of date. Update it and also add documentation for the RX side. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
OpenPOWER on IntegriCloud