summaryrefslogtreecommitdiffstats
path: root/net/ipv4
Commit message (Collapse)AuthorAgeFilesLines
* tcp: fix tcp_trim_head()Eric Dumazet2011-12-051-5/+8
| | | | | | | | | | | | | | commit f07d960df3 (tcp: avoid frag allocation for small frames) breaked assumption in tcp stack that skb is either linear (skb->data_len == 0), or fully fragged (skb->data_len == skb->len) tcp_trim_head() made this assumption, we must fix it. Thanks to Vijay for providing a very detailed explanation. Reported-by: Vijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Rename dst_get_neighbour{, _raw} to dst_get_neighbour_noref{, _raw}.David Miller2011-12-053-3/+3
| | | | | | | | To reflect the fact that a refrence is not obtained to the resulting neighbour entry. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Roland Dreier <roland@purestorage.com>
* tcp: tcp_sendmsg() page recyclingEric Dumazet2011-12-041-1/+6
| | | | | | | | | | | | | | | If our TCP_PAGE(sk) is not shared (page_count() == 1), we can set page offset to 0. This permits better filling of the pages on small to medium tcp writes. "tbench 16" results on my dev server (2x4x2 machine) : Before : 3072 MB/s After : 3146 MB/s (2.4 % gain) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: take care of misalignmentsEric Dumazet2011-12-041-1/+9
| | | | | | | | | | | | | | | | | | | | | We discovered that TCP stack could retransmit misaligned skbs if a malicious peer acknowledged sub MSS frame. This currently can happen only if output interface is non SG enabled : If SG is enabled, tcp builds headless skbs (all payload is included in fragments), so the tcp trimming process only removes parts of skb fragments, header stay aligned. Some arches cant handle misalignments, so force a head reallocation and shrink headroom to MAX_TCP_HEADER. Dont care about misaligments on x86 and PPC (or other arches setting NET_IP_ALIGN to 0) This patch introduces __pskb_copy() which can specify the headroom of new head, and pskb_copy() becomes a wrapper on top of __pskb_copy() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: drop SYN+FIN messagesEric Dumazet2011-12-041-0/+2
| | | | | | | | | | | Denys Fedoryshchenko reported that SYN+FIN attacks were bringing his linux machines to their limits. Dont call conn_request() if the TCP flags includes SYN flag Reported-by: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2011-12-024-18/+49
|\
| * ipv4: flush route cache after change accept_localPeter Pan(潘卫平)2011-12-011-0/+5
| | | | | | | | | | | | | | | | | | After reset ipv4_devconf->data[IPV4_DEVCONF_ACCEPT_LOCAL] to 0, we should flush route cache, or it will continue receive packets with local source address, which should be dropped. Signed-off-by: Weiping Pan <panweiping3@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Revert "udp: remove redundant variable"David S. Miller2011-12-011-7/+8
| | | | | | | | | | | | | | | | | | | | | | This reverts commit 81d54ec8479a2c695760da81f05b5a9fb2dbe40a. If we take the "try_again" goto, due to a checksum error, the 'len' has already been truncated. So we won't compute the same values as the original code did. Reported-by: paul bilke <fsmail@conspiracy.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ipv4: Perform peer validation on cached route lookup.David S. Miller2011-12-011-7/+19
| | | | | | | | | | | | | | Otherwise we won't notice the peer GENID change. Reported-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ipv4: fix lockdep splat in rt_cache_seq_showEric Dumazet2011-11-301-2/+6
| | | | | | | | | | | | | | | | | | | | After commit f2c31e32b378 (fix NULL dereferences in check_peer_redir()), dst_get_neighbour() should be guarded by rcu_read_lock() / rcu_read_unlock() section. Reported-by: Miles Lane <miles.lane@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge branch 'nf' of git://1984.lsi.us.es/netDavid S. Miller2011-11-291-1/+2
| |\
| | * netfilter: possible unaligned packet header in ip_route_me_harderPaul Guo2011-11-211-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch tries to fix the following issue in netfilter: In ip_route_me_harder(), we invoke pskb_expand_head() that rellocates new header with additional head room which can break the alignment of the original packet header. In one of my NAT test case, the NIC port for internal hosts is configured with vlan and the port for external hosts is with general configuration. If we ping an external "unknown" hosts from an internal host, an icmp packet will be sent. We find that in icmp_send()->...->ip_route_me_harder()->pskb_expand_head(), hh_len=18 and current headroom (skb_headroom(skb)) of the packet is 16. After calling pskb_expand_head() the packet header becomes to be unaligned and then our system (arch/tile) panics immediately. Signed-off-by: Paul Guo <ggang@tilera.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | inet: add a redirect generation id in inetpeerEric Dumazet2011-11-261-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now inetpeer is the place where we cache redirect information for ipv4 destinations, we must be able to invalidate informations when a route is added/removed on host. As inetpeer is not yet namespace aware, this patch adds a shared redirect_genid, and a per inetpeer redirect_genid. This might be changed later if inetpeer becomes ns aware. Cache information for one inerpeer is valid as long as its redirect_genid has the same value than global redirect_genid. Reported-by: Arkadiusz Miśkiewicz <a.miskiewicz@gmail.com> Tested-by: Arkadiusz Miśkiewicz <a.miskiewicz@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: use a 64bit load/store in output pathEric Dumazet2011-12-011-4/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gcc compiler is smart enough to use a single load/store if we memcpy(dptr, sptr, 8) on x86_64, regardless of CONFIG_CC_OPTIMIZE_FOR_SIZE In IP header, daddr immediately follows saddr, this wont change in the future. We only need to make sure our flowi4 (saddr,daddr) fields wont break the rule. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4 : igmp : Delete useless parameter in ip_mc_add1_src()Jun Zhao2011-11-301-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | Need not to used 'delta' flag when add single-source to interface filter source list. Signed-off-by: Jun Zhao <mypopydev@gmail.com> Signed-off-by: David S. Miller <davem@drr.davemloft.net>
* | | atm: clip: Use device neigh support on top of "arp_tbl".David Miller2011-11-302-13/+2
| | | | | | | | | | | | | | | | | | | | | Instead of instantiating an entire new neigh_table instance just for ATM handling, use the neigh device private facility. Signed-off-by: David S. Miller <davem@davemloft.net>
* | | neigh: Do not set tbl->entry_size in ipv4/ipv6 neigh tables.David Miller2011-11-301-1/+0
| | | | | | | | | | | | | | | | | | Let the core self-size the neigh entry based upon the key length. Signed-off-by: David S. Miller <davem@davemloft.net>
* | | tcp: inherit listener congestion control for passive cnxEric Dumazet2011-11-302-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rick Jones reported that TCP_CONGESTION sockopt performed on a listener was ignored for its children sockets : right after accept() the congestion control for new socket is the system default one. This seems an oversight of the initial design (quoted from Stephen) Based on prior investigation and patch from Rick. Reported-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Stephen Hemminger <shemminger@vyatta.com> CC: Yuchung Cheng <ycheng@google.com> Tested-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: remove useless codes in ipmr_device_event()RongQing.Li2011-11-291-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 7dc00c82 added a 'notify' parameter for vif_delete() to distinguish whether to unregister the device. When notify=1 means we does not need to unregister the device, so calling unregister_netdevice_many is useless. Signed-off-by: RongQing.Li <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | tcp: avoid frag allocation for small framesEric Dumazet2011-11-291-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tcp_sendmsg() uses select_size() helper to choose skb head size when a new skb must be allocated. If GSO is enabled for the socket, current strategy is to force all payload data to be outside of headroom, in PAGE fragments. This strategy is not welcome for small packets, wasting memory. Experiments show that best results are obtained when using 2048 bytes for skb head (This includes the skb overhead and various headers) This patch provides better len/truesize ratios for packets sent to loopback device, and reduce memory needs for in-flight loopback packets, particularly on arches with big pages. If a sender sends many 1-byte packets to an unresponsive application, receiver rmem_alloc will grow faster and will stop queuing these packets sooner, or will collapse its receive queue to free excess memory. netperf -t TCP_RR results are improved by ~4 %, and many workloads are improved as well (tbench, mysql...) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | tcp: do not scale TSO segment size with reordering degreeNeal Cardwell2011-11-292-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since 2005 (c1b4a7e69576d65efc31a8cea0714173c2841244) tcp_tso_should_defer has been using tcp_max_burst() as a target limit for deciding how large to make outgoing TSO packets when not using sysctl_tcp_tso_win_divisor. But since 2008 (dd9e0dda66ba38a2ddd1405ac279894260dc5c36) tcp_max_burst() returns the reordering degree. We should not have tcp_tso_should_defer attempt to build larger segments just because there is more reordering. This commit splits the notion of deferral size used in TSO from the notion of burst size used in cwnd moderation, and returns the TSO deferral limit to its original value. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: dont call jump_label_dec from irq contextEric Dumazet2011-11-291-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Igor Maravic reported an error caused by jump_label_dec() being called from IRQ context : BUG: sleeping function called from invalid context at kernel/mutex.c:271 in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper 1 lock held by swapper/0: #0: (&n->timer){+.-...}, at: [<ffffffff8107ce90>] call_timer_fn+0x0/0x340 Pid: 0, comm: swapper Not tainted 3.2.0-rc2-net-next-mpls+ #1 Call Trace: <IRQ> [<ffffffff8104f417>] __might_sleep+0x137/0x1f0 [<ffffffff816b9a2f>] mutex_lock_nested+0x2f/0x370 [<ffffffff810a89fd>] ? trace_hardirqs_off+0xd/0x10 [<ffffffff8109a37f>] ? local_clock+0x6f/0x80 [<ffffffff810a90a5>] ? lock_release_holdtime.part.22+0x15/0x1a0 [<ffffffff81557929>] ? sock_def_write_space+0x59/0x160 [<ffffffff815e936e>] ? arp_error_report+0x3e/0x90 [<ffffffff810969cd>] atomic_dec_and_mutex_lock+0x5d/0x80 [<ffffffff8112fc1d>] jump_label_dec+0x1d/0x50 [<ffffffff81566525>] net_disable_timestamp+0x15/0x20 [<ffffffff81557a75>] sock_disable_timestamp+0x45/0x50 [<ffffffff81557b00>] __sk_free+0x80/0x200 [<ffffffff815578d0>] ? sk_send_sigurg+0x70/0x70 [<ffffffff815e936e>] ? arp_error_report+0x3e/0x90 [<ffffffff81557cba>] sock_wfree+0x3a/0x70 [<ffffffff8155c2b0>] skb_release_head_state+0x70/0x120 [<ffffffff8155c0b6>] __kfree_skb+0x16/0x30 [<ffffffff8155c119>] kfree_skb+0x49/0x170 [<ffffffff815e936e>] arp_error_report+0x3e/0x90 [<ffffffff81575bd9>] neigh_invalidate+0x89/0xc0 [<ffffffff81578dbe>] neigh_timer_handler+0x9e/0x2a0 [<ffffffff81578d20>] ? neigh_update+0x640/0x640 [<ffffffff81073558>] __do_softirq+0xc8/0x3a0 Since jump_label_{inc|dec} must be called from process context only, we must defer jump_label_dec() if net_disable_timestamp() is called from interrupt context. Reported-by: Igor Maravic <igorm@etf.rs> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | tcp: tcp_sendmsg() wrong access to sk_route_capsEric Dumazet2011-11-281-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now sk_route_caps is u64, its dangerous to use an integer to store result of an AND operator. It wont work if NETIF_F_SG is moved on the upper part of u64. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | tcp: skip cwnd moderation in TCP_CA_Open in tcp_try_to_openNeal Cardwell2011-11-271-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The problem: Senders were overriding cwnd values picked during an undo by calling tcp_moderate_cwnd() in tcp_try_to_open(). The fix: Don't moderate cwnd in tcp_try_to_open() if we're in TCP_CA_Open, since doing so is generally unnecessary and specifically would override a DSACK-based undo of a cwnd reduction made in fast recovery. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | tcp: allow undo from reordered DSACKsNeal Cardwell2011-11-271-13/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, SACK-enabled connections hung around in TCP_CA_Disorder state while snd_una==high_seq, just waiting to accumulate DSACKs and hopefully undo a cwnd reduction. This could and did lead to the following unfortunate scenario: if some incoming ACKs advance snd_una beyond high_seq then we were setting undo_marker to 0 and moving to TCP_CA_Open, so if (due to reordering in the ACK return path) we shortly thereafter received a DSACK then we were no longer able to undo the cwnd reduction. The change: Simplify the congestion avoidance state machine by removing the behavior where SACK-enabled connections hung around in the TCP_CA_Disorder state just waiting for DSACKs. Instead, when snd_una advances to high_seq or beyond we typically move to TCP_CA_Open immediately and allow an undo in either TCP_CA_Open or TCP_CA_Disorder if we later receive enough DSACKs. Other patches in this series will provide other changes that are necessary to fully fix this problem. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | tcp: use SACKs and DSACKs that arrive on ACKs below snd_unaNeal Cardwell2011-11-271-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The bug: When the ACK field is below snd_una (which can happen when ACKs are reordered), senders ignored DSACKs (preventing undo) and did not call tcp_fastretrans_alert, so they did not increment prr_delivered to reflect newly-SACKed sequence ranges, and did not call tcp_xmit_retransmit_queue, thus passing up chances to send out more retransmitted and new packets based on any newly-SACKed packets. The change: When the ACK field is below snd_una (the "old_ack" goto label), call tcp_fastretrans_alert to allow undo based on any newly-arrived DSACKs and try to send out more packets based on newly-SACKed packets. Other patches in this series will provide other changes that are necessary to fully fix this problem. Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | tcp: use DSACKs that arrive when packets_out is 0Neal Cardwell2011-11-271-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The bug: Senders ignored DSACKs after recovery when there were no outstanding packets (a common scenario for HTTP servers). The change: when there are no outstanding packets (the "no_queue" goto label), call tcp_fastretrans_alert() in order to use DSACKs to undo congestion window reductions. Other patches in this series will provide other changes that are necessary to fully fix this problem. Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | tcp: make is_dupack a parameter to tcp_fastretrans_alert()Neal Cardwell2011-11-271-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow callers to decide whether an ACK is a duplicate ACK. This is a prerequisite to allowing fastretrans_alert to be called from new contexts, such as the no_queue and old_ack code paths, from which we have extra info that tells us whether an ACK is a dupack. Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2011-11-266-16/+27
|\ \ \ | |/ / | | | | | | | | | Conflicts: net/ipv4/inet_diag.c
| * | ipv4: Don't use the cached pmtu informations for input routesSteffen Klassert2011-11-261-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The pmtu informations on the inetpeer are visible for output and input routes. On packet forwarding, we might propagate a learned pmtu to the sender. As we update the pmtu informations of the inetpeer on demand, the original sender of the forwarded packets might never notice when the pmtu to that inetpeer increases. So use the mtu of the outgoing device on packet forwarding instead of the pmtu to the final destination. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: Move mtu handling down to the protocol depended handlersSteffen Klassert2011-11-261-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | We move all mtu handling from dst_mtu() down to the protocol layer. So each protocol can implement the mtu handling in a different manner. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: Rename the dst_opt default_mtu method to mtuSteffen Klassert2011-11-261-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | We plan to invoke the dst_opt->default_mtu() method unconditioally from dst_mtu(). So rename the method to dst_opt->mtu() to match the name with the new meaning. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | route: Use the device mtu as the default for blackhole routesSteffen Klassert2011-11-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As it is, we return null as the default mtu of blackhole routes. This may lead to a propagation of a bogus pmtu if the default_mtu method of a blackhole route is invoked. So return dst->dev->mtu as the default mtu instead. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv4: Save nexthop address of LSRR/SSRR option to IPCB.Li Wei2011-11-232-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can not update iph->daddr in ip_options_rcv_srr(), It is too early. When some exception ocurred later (eg. in ip_forward() when goto sr_failed) we need the ip header be identical to the original one as ICMP need it. Add a field 'nexthop' in struct ip_options to save nexthop of LSRR or SSRR option. Signed-off-by: Li Wei <lw@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv4 : igmp : fix error handle in ip_mc_add_src()Jun Zhao2011-11-231-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When add sources to interface failure, need to roll back the sfcount[MODE] to before state. We need to match it corresponding. Acked-by: David L Stevens <dlstevens@us.ibm.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jun Zhao <mypopydev@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | netfilter: Remove NOTRACK/RAW dependency on NETFILTER_ADVANCED.David S. Miller2011-11-231-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | Distributions are using this in their default scripts, so don't hide them behind the advanced setting. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net-netlink: fix diag to export IPv4 tos for dual-stack IPv6 socketsMaciej Żenczykowski2011-11-221-5/+9
| |/ | | | | | | | | Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: remove ipv6_addr_copy()Alexey Dobriyan2011-11-222-14/+8
| | | | | | | | | | | | | | C assignment can handle struct in6_addr copying. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2011-11-216-61/+65
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | The forcedeth changes had a conflict with the conversion over to atomic u64 statistics in net-next. The libertas cfg.c code had a conflict with the bss reference counting fix by John Linville in net-next. Conflicts: drivers/net/ethernet/nvidia/forcedeth.c drivers/net/wireless/libertas/cfg.c
| * ipv4: fix redirect handlingEric Dumazet2011-11-181-51/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit f39925dbde77 (ipv4: Cache learned redirect information in inetpeer.) introduced a regression in ICMP redirect handling. It assumed ipv4_dst_check() would be called because all possible routes were attached to the inetpeer we modify in ip_rt_redirect(), but thats not true. commit 7cc9150ebe (route: fix ICMP redirect validation) tried to fix this but solution was not complete. (It fixed only one route) So we must lookup existing routes (including different TOS values) and call check_peer_redir() on them. Reported-by: Ivan Zahariev <famzah@icdsoft.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ping: dont increment ICMP_MIB_INERRORSEric Dumazet2011-11-181-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ping module incorrectly increments ICMP_MIB_INERRORS if feeded with a frame not belonging to its own sockets. RFC 2011 states that ICMP_MIB_INERRORS should count "the number of ICMP messages which the entiry received but determined as having ICMP-specific errors (bad ICMP checksums, bad length, etc.)." Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Vasiliy Kulikov <segoon@openwall.com> Acked-by: Flavio Leitner <fbl@redhat.com> Acked-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * tcp: clear xmit timers in tcp_v4_syn_recv_sock()Eric Dumazet2011-11-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Simon Kirby reported divides by zero errors in __tcp_select_window() This happens when inet_csk_route_child_sock() returns a NULL pointer : We free new socket while we eventually armed keepalive timer in tcp_create_openreq_child() Fix this by a call to tcp_clear_xmit_timers() [ This is a followup to commit 918eb39962dff (net: add missing bh_unlock_sock() calls) ] Reported-by: Simon Kirby <sim@hostway.ca> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Simon Kirby <sim@hostway.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net-netlink: Add a new attribute to expose TCLASS values via netlinkMaciej Żenczykowski2011-11-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 3ceca749668a52bd795585e0f71c6f0b04814f7b added a TOS attribute. Unfortunately TOS and TCLASS are both present in a dual-stack v6 socket, furthermore they can have different values. As such one cannot in a sane way expose both through a single attribute. Signed-off-by: Maciej Żenczyowski <maze@google.com> CC: Murali Raja <muralira@google.com> CC: Stephen Hemminger <shemminger@vyatta.com> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ah: Don't return NET_XMIT_DROP on input.Nick Bowler2011-11-121-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | When the ahash driver returns -EBUSY, AH4/6 input functions return NET_XMIT_DROP, presumably copied from the output code path. But returning transmit codes on input doesn't make a lot of sense. Since NET_XMIT_DROP is a positive int, this gets interpreted as the next header type (i.e., success). As that can only end badly, remove the check. Signed-off-by: Nick Bowler <nbowler@elliptictech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ipv4: fix for ip_options_rcv_srr() daddr update.Li Wei2011-11-091-0/+1
| | | | | | | | | | | | | | | | | | When opt->srr_is_hit is set skb_rtable(skb) has been updated for 'nexthop' and iph->daddr should always equals to skb_rtable->rt_dst holds, We need update iph->daddr either. Signed-off-by: Li Wei <lw@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ah: Read nexthdr value before overwriting it in ahash input callback.Nick Bowler2011-11-091-2/+2
| | | | | | | | | | | | | | | | | | The AH4/6 ahash input callbacks read out the nexthdr field from the AH header *after* they overwrite that header. This is obviously not going to end well. Fix it up. Signed-off-by: Nick Bowler <nbowler@elliptictech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ah: Correctly pass error codes in ahash output callback.Nick Bowler2011-11-091-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The AH4/6 ahash output callbacks pass nexthdr to xfrm_output_resume instead of the error code. This appears to be a copy+paste error from the input case, where nexthdr is expected. This causes the driver to continuously add AH headers to the datagram until either an allocation fails and the packet is dropped or the ahash driver hits a synchronous fallback and the resulting monstrosity is transmitted. Correct this issue by simply passing the error code unadulterated. Signed-off-by: Nick Bowler <nbowler@elliptictech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ip_gre: Set needed_headroom dynamically againHerbert Xu2011-11-181-0/+2
| | | | | | | | | | | | | | | | | | | | | | ip_gre: Set needed_headroom dynamically again Now that all needed_headroom users have been fixed up so that we can safely increase needed_headroom, this patch restore the dynamic update of needed_headroom. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv4: Remove all uses of LL_ALLOCATED_SPACEHerbert Xu2011-11-184-10/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ipv4: Remove all uses of LL_ALLOCATED_SPACE The macro LL_ALLOCATED_SPACE was ill-conceived. It applies the alignment to the sum of needed_headroom and needed_tailroom. As the amount that is then reserved for head room is needed_headroom with alignment, this means that the tail room left may be too small. This patch replaces all uses of LL_ALLOCATED_SPACE in net/ipv4 with the macro LL_RESERVED_SPACE and direct reference to needed_tailroom. This also fixes the problem with needed_headroom changing between allocating the skb and reserving the head room. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: introduce and use netdev_features_t for device features setsMichał Mirosław2011-11-163-3/+6
| | | | | | | | | | | | | | | | | | | | v2: add couple missing conversions in drivers split unexporting netdev_fix_features() implemented %pNF convert sock::sk_route_(no?)caps Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
OpenPOWER on IntegriCloud