blackbird-op-linux - Blackbird™ Linux sources for OpenPOWER

	Commit message (Collapse)	Author	Age	Files	Lines
*	ethtool: Add support for control of RX flow hash indirection	Ben Hutchings	2010-06-30	1	-0/+80
\| \| \| \| \| \| \| \| \| \| \| \|	Many NICs use an indirection table to map an RX flow hash value to one of an arbitrary number of queues (not necessarily a power of 2). It can be useful to remove some queues from this indirection table so that they are only used for flows that are specifically filtered there. It may also be useful to weight the mapping to account for user processes with the same CPU-affinity as the RX interrupts. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ethtool: Change ethtool_op_set_flags to validate flags	Ben Hutchings	2010-06-30	1	-23/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ethtool_op_set_flags() does not check for unsupported flags, and has no way of doing so. This means it is not suitable for use as a default implementation of ethtool_ops::set_flags. Add a 'supported' parameter specifying the flags that the driver and hardware support, validate the requested flags against this, and change all current callers to pass this parameter. Change some other trivial implementations of ethtool_ops::set_flags to call ethtool_op_set_flags(). Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Reviewed-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net/core: use ntohs for skb->protocol	Sebastian Andrzej Siewior	2010-06-30	1	-1/+2
\| \| \| \| \| \| \| \|	This is only noticed by people that are not doing everything correct in the first place. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: use this_cpu_ptr()	Eric Dumazet	2010-06-28	1	-2/+2
\| \| \| \| \| \| \|	use this_cpu_ptr(p) instead of per_cpu_ptr(p, smp_processor_id()) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net/core/pktgen.c: Use pr_<level>	Joe Perches	2010-06-25	1	-78/+64
\| \| \| \| \| \| \| \| \| \| \| \|	Add pr_fmt(fmt) KBUILD_MODNAME ": " fmt Remove "pktgen: " from formats Convert printks to pr_<level> Added func_enter() for debugging Moved version to end of string at module_init Coalesced long formats Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: optimize Berkeley Packet Filter (BPF) processing	Hagen Paul Pfeifer	2010-06-25	1	-51/+161
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Gcc is currenlty not in the ability to optimize the switch statement in sk_run_filter() because of dense case labels. This patch replace the OR'd labels with ordered sequenced case labels. The sk_chk_filter() function is modified to patch/replace the original OPCODES in a ordered but equivalent form. gcc is now in the ability to transform the switch statement in sk_run_filter into a jump table of complexity O(1). Until this patch gcc generates a sequence of conditional branches (O(n) of 567 byte .text segment size (arch x86_64): 7ff: 8b 06 mov (%rsi),%eax 801: 66 83 f8 35 cmp $0x35,%ax 805: 0f 84 d0 02 00 00 je adb <sk_run_filter+0x31d> 80b: 0f 87 07 01 00 00 ja 918 <sk_run_filter+0x15a> 811: 66 83 f8 15 cmp $0x15,%ax 815: 0f 84 c5 02 00 00 je ae0 <sk_run_filter+0x322> 81b: 77 73 ja 890 <sk_run_filter+0xd2> 81d: 66 83 f8 04 cmp $0x4,%ax 821: 0f 84 17 02 00 00 je a3e <sk_run_filter+0x280> 827: 77 29 ja 852 <sk_run_filter+0x94> 829: 66 83 f8 01 cmp $0x1,%ax [...] With the modification the compiler translate the switch statement into the following jump table fragment: 7ff: 66 83 3e 2c cmpw $0x2c,(%rsi) 803: 0f 87 1f 02 00 00 ja a28 <sk_run_filter+0x26a> 809: 0f b7 06 movzwl (%rsi),%eax 80c: ff 24 c5 00 00 00 00 jmpq *0x0(,%rax,8) 813: 44 89 e3 mov %r12d,%ebx 816: e9 43 03 00 00 jmpq b5e <sk_run_filter+0x3a0> 81b: 41 89 dc mov %ebx,%r12d 81e: e9 3b 03 00 00 jmpq b5e <sk_run_filter+0x3a0> Furthermore, I reordered the instructions to reduce cache line misses by order the most common instruction to the start. Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: fix "netpoll: Allow netpoll_setup/cleanup recursion"	Andrew Morton	2010-06-24	1	-1/+0
\| \| \| \| \| \| \| \| \|	Remove rtnl_unlock() which had no corresponding rtnl_lock(). Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: consolidate netif_needs_gso() checks	John Fastabend	2010-06-23	1	-36/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	netif_needs_gso() is checked twice in the TX path once, before submitting the skb to the qdisc and once after it is dequeued from the qdisc just before calling ndo_hard_start(). This opens a window for a user to change the gso/tso or tx checksum settings that can cause netif_needs_gso to be true in one check and false in the other. Specifically, changing TX checksum setting may cause the warning in skb_gso_segment() to be triggered if the checksum is calculated earlier. This consolidates the netif_needs_gso() calls so that the stack only checks if gso is needed in dev_hard_start_xmit(). Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: Export cred_to_ucred to modules.	David S. Miller	2010-06-16	1	-0/+1
\| \| \| \| \| \| \|	AF_UNIX references this, and can be built as a module, so... Signed-off-by: David S. Miller <davem@davemloft.net>
*	scm: Capture the full credentials of the scm sender.	Eric W. Biederman	2010-06-16	1	-0/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Start capturing not only the userspace pid, uid and gid values of the sending process but also the struct pid and struct cred of the sending process as well. This is in preparation for properly supporting SCM_CREDENTIALS for sockets that have different uid and/or pid namespaces at the different ends. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Serge E. Hallyn <serge@hallyn.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	af_unix: Allow SO_PEERCRED to work across namespaces.	Eric W. Biederman	2010-06-16	1	-6/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use struct pid and struct cred to store the peer credentials on struct sock. This gives enough information to convert the peer credential information to a value relative to whatever namespace the socket is in at the time. This removes nasty surprises when using SO_PEERCRED on socket connetions where the processes on either side are in different pid and user namespaces. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Daniel Lezcano <daniel.lezcano@free.fr> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	sock: Introduce cred_to_ucred	Eric W. Biederman	2010-06-16	1	-0/+14
\| \| \| \| \| \| \| \| \| \|	To keep the coming code clear and to allow both the sock code and the scm code to share the logic introduce a fuction to translate from struct cred to struct ucred. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	bridge: use rx_handler_data pointer to store net_bridge_port pointer	Jiri Pirko	2010-06-15	1	-1/+2
\| \| \| \| \| \| \| \| \| \|	Register net_bridge_port pointer as rx_handler data pointer. As br_port is removed from struct net_device, another netdev priv_flag is added to indicate the device serves as a bridge port. Also rcuized pointers are now correctly dereferenced in br_fdb.c and in netfilter parts. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: add rx_handler data pointer	Jiri Pirko	2010-06-15	1	-1/+5
\| \| \| \| \| \| \|	Add possibility to register rx_handler data pointer along with a rx_handler. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	netpoll: Allow netpoll_setup/cleanup recursion	Herbert Xu	2010-06-15	1	-79/+97
\| \| \| \| \| \| \| \| \| \| \|	This patch adds the functions __netpoll_setup/__netpoll_cleanup which is designed to be called recursively through ndo_netpoll_seutp. They must be called with RTNL held, and the caller must initialise np->dev and ensure that it has a valid reference count. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
*	netpoll: Add ndo_netpoll_setup	Herbert Xu	2010-06-15	1	-0/+10
\| \| \| \| \| \| \| \|	This patch adds ndo_netpoll_setup as the initialisation primitive to complement ndo_netpoll_cleanup. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
*	netpoll: Add locking for netpoll_setup/cleanup	Herbert Xu	2010-06-15	1	-75/+76
\| \| \| \| \| \| \| \| \| \| \| \| \|	As it stands, netpoll_setup and netpoll_cleanup have no locking protection whatsoever. So chaos ensures if two entities try to perform them on the same device. This patch adds RTNL to the equation. The code has been rearranged so that bits that do not need RTNL protection are now moved to the top of netpoll_setup. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
*	netpoll: Fix RCU usage	Herbert Xu	2010-06-15	1	-8/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The use of RCU in netpoll is incorrect in a number of places: 1) The initial setting is lacking a write barrier. 2) The synchronize_rcu is in the wrong place. 3) Read barriers are missing. 4) Some places are even missing rcu_read_lock. 5) npinfo is zeroed after freeing. This patch fixes those issues. As most users are in BH context, this also converts the RCU usage to the BH variant. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	netpoll: Set npinfo to NULL even with ndo_netpoll_cleanup	Herbert Xu	2010-06-15	1	-2/+1
\| \| \| \| \| \| \| \| \| \|	Since we have to NULL npinfo regardless of whether there is a ndo_netpoll_cleanup, it makes sense to do this unconditionally in netpoll_cleanup rather than having every driver do it by themselves. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'master' of ↵	David S. Miller	2010-06-14	1	-1/+1
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/ixgbe/ixgbe_ethtool.c With merge conflict help from Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: rxhash already set in __copy_skb_header	Eric Dumazet	2010-06-13	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	No need to copy rxhash again in __skb_clone() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: fix deliver_no_wcard regression on loopback device	John Fastabend	2010-06-13	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	deliver_no_wcard is not being set in skb_copy_header. In the skb_cloned case it is not being cleared and may cause the skb to be dropped when the loopback device pushes it back up the stack. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net: Enable 64-bit net device statistics on 32-bit architectures	Ben Hutchings	2010-06-12	3	-17/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use struct rtnl_link_stats64 as the statistics structure. On 32-bit architectures, insert 32 bits of padding after/before each field of struct net_device_stats to make its layout compatible with struct rtnl_link_stats64. Add an anonymous union in net_device; move stats into the union and add struct rtnl_link_stats64 stats64. Add net_device_ops::ndo_get_stats64, implementations of which will return a pointer to struct rtnl_link_stats64. Drivers that implement this operation must not update the structure asynchronously. Change dev_get_stats() to call ndo_get_stats64 if available, and to return a pointer to struct rtnl_link_stats64. Change callers of dev_get_stats() accordingly. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	pktgen: increasing transmission granularity	Daniel Turull	2010-06-11	1	-4/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch increases the granularity of the rate generated by pktgen. The previous version of pktgen uses micro seconds (udelay) resolution when it was delayed causing gaps in the rates. It is changed to nanosecond (ndelay). Now any rate is possible. Also it allows to set, the desired rate in Mb/s or packets per second. The documentation has been updated. Signed-off-by: Daniel Turull <daniel.turull@gmail.com> Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	pkt_sched: gen_kill_estimator() rcu fixes	Eric Dumazet	2010-06-11	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	gen_kill_estimator() API is incomplete or not well documented, since caller should make sure an RCU grace period is respected before freeing stats_lock. This was partially addressed in commit 5d944c640b4 (gen_estimator: deadlock fix), but same problem exist for all gen_kill_estimator() users, if lock they use is not already RCU protected. A code review shows xt_RATEEST.c, act_api.c, act_police.c have this problem. Other are ok because they use qdisc lock, already RCU protected. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Merge branch 'master' of ↵	David S. Miller	2010-06-11	3	-12/+30
\|\ \ \| \|/ \| \| \| \|	master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
\| *	pktgen: Fix accuracy of inter-packet delay.	Daniel Turull	2010-06-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch correct a bug in the delay of pktgen. It makes sure the inter-packet interval is accurate. Signed-off-by: Daniel Turull <daniel.turull@gmail.com> Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	pkt_sched: gen_estimator: add a new lock	Eric Dumazet	2010-06-10	1	-3/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	gen_kill_estimator() / gen_new_estimator() is not always called with RTNL held. net/netfilter/xt_RATEEST.c is one user of these API that do not hold RTNL, so random corruptions can occur between "tc" and "iptables". Add a new fine grained lock instead of trying to use RTNL in netfilter. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: deliver skbs on inactive slaves to exact matches	John Fastabend	2010-06-10	1	-3/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, the accelerated receive path for VLAN's will drop packets if the real device is an inactive slave and is not one of the special pkts tested for in skb_bond_should_drop(). This behavior is different then the non-accelerated path and for pkts over a bonded vlan. For example, vlanx -> bond0 -> ethx will be dropped in the vlan path and not delivered to any packet handlers at all. However, bond0 -> vlanx -> ethx and bond0 -> ethx will be delivered to handlers that match the exact dev, because the VLAN path checks the real_dev which is not a slave and netif_recv_skb() doesn't drop frames but only delivers them to exact matches. This patch adds a sk_buff flag which is used for tagging skbs that would previously been dropped and allows the skb to continue to skb_netif_recv(). Here we add logic to check for the deliver_no_wcard flag and if it is set only deliver to handlers that match exactly. This makes both paths above consistent and gives pkt handlers a way to identify skbs that come from inactive slaves. Without this patch in some configurations skbs will be delivered to handlers with exact matches and in others be dropped out right in the vlan path. I have tested the following 4 configurations in failover modes and load balancing modes. # bond0 -> ethx # vlanx -> bond0 -> ethx # bond0 -> vlanx -> ethx # bond0 -> ethx \| vlanx -> -- Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: Print num_rx_queues imbalance warning only when there are allocated queues	Tim Gardner	2010-06-09	1	-5/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	BugLink: http://bugs.launchpad.net/bugs/591416 There are a number of network drivers (bridge, bonding, etc) that are not yet receive multi-queue enabled and use alloc_netdev(), so don't print a num_rx_queues imbalance warning in that case. Also, only print the warning once for those drivers that _are_ multi-queue enabled. Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
* \|	anycast: Some RCU conversions	Eric Dumazet	2010-06-07	1	-9/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- dev_get_by_flags() changed to dev_get_by_flags_rcu() - ipv6_sock_ac_join() dont touch dev & idev refcounts - ipv6_sock_ac_drop() dont touch dev & idev refcounts - ipv6_sock_ac_close() dont touch dev & idev refcounts - ipv6_dev_ac_dec() dount touch idev refcount - ipv6_chk_acast_addr() dont touch idev refcount Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net: Remove unnecessary net action assertion	jamal	2010-06-07	1	-3/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The extra assertion to allow packet munging only when there are no other ptypes listening which may have worked around an old bug is unnecessary. It is sufficient to check if the skb is cloned before trampling on it. Thanks to Herbert Xu for being persistent and patient in getting this across. [Note that cloning checks and assertions are the general rule used by tc actions (documentation/networking/tc-actions-env-rules.txt)]. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net-caif: Added missing lock validator constants	Alex Lorca	2010-06-07	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	CAIF is using "xxx-AF_MAX" strings for the lock validator. It should use its own strings. Signed-off-by: Alex Lorca <alex.lorca@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Merge branch 'master' of ↵	David S. Miller	2010-06-06	2	-6/+32
\|\ \ \| \|/ \| \| \| \| \| \| \| \| \| \| \| \|	master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/sfc/net_driver.h drivers/net/sfc/siena.c
\| *	net: fix conflict between null_or_orig and null_or_bond	John Fastabend	2010-06-02	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a skb is received on an inactive bond that does not meet the special cases checked for by skb_bond_should_drop it should only be delivered to exact matches as the comment in netif_receive_skb() says. However because null_or_bond could also be null this is not always true. This patch renames null_or_bond to orig_or_bond and initializes it to orig_dev. This keeps the intent of null_or_bond to pass frames received on VLAN interfaces stacked on bonding interfaces without invalidating the statement for null_or_orig. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: sock_queue_err_skb() dont mess with sk_forward_alloc	Eric Dumazet	2010-05-31	1	-2/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Correct sk_forward_alloc handling for error_queue would need to use a backlog of frames that softirq handler could not deliver because socket is owned by user thread. Or extend backlog processing to be able to process normal and error packets. Another possibility is to not use mem charge for error queue, this is what I implemented in this patch. Note: this reverts commit 29030374 (net: fix sk_forward_alloc corruptions), since we dont need to lock socket anymore. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	Merge branch 'master' of /home/davem/src/GIT/linux-2.6/	David S. Miller	2010-05-31	5	-63/+79
\| \|\
* \| \|	skbuff: add check for non-linear to warn_if_lro and needs_linearize	Alexander Duyck	2010-06-05	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We can avoid an unecessary cache miss by checking if the skb is non-linear before accessing gso_size/gso_type in skb_warn_if_lro, the same can also be done to avoid a cache miss on nr_frags if data_len is 0. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \|	net: replace hooks in __netif_receive_skb V5	Jiri Pirko	2010-06-02	1	-64/+55
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	What this patch does is it removes two receive frame hooks (for bridge and for macvlan) from __netif_receive_skb. These are replaced them with a single hook for both. It only supports one hook per device because it makes no sense to do bridging and macvlan on the same device. Then a network driver (of virtual netdev like macvlan or bridge) can register an rx_handler for needed net device. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \|	net: add additional lock to qdisc to increase throughput	Eric Dumazet	2010-06-02	1	-4/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When many cpus compete for sending frames on a given qdisc, the qdisc spinlock suffers from very high contention. The cpu owning __QDISC_STATE_RUNNING bit has same priority to acquire the lock, and cannot dequeue packets fast enough, since it must wait for this lock for each dequeued packet. One solution to this problem is to force all cpus spinning on a second lock before trying to get the main lock, when/if they see __QDISC_STATE_RUNNING already set. The owning cpu then compete with at most one other cpu for the main lock, allowing for higher dequeueing rate. Based on a previous patch from Alexander Duyck. I added the heuristic to avoid the atomic in fast path, and put the new lock far away from the cache line used by the dequeue worker. Also try to release the busylock lock as late as possible. Tests with following script gave a boost from ~50.000 pps to ~600.000 pps on a dual quad core machine (E5450 @3.00GHz), tg3 driver. (A single netperf flow can reach ~800.000 pps on this platform) for j in `seq 0 3`; do for i in `seq 0 7`; do netperf -H 192.168.0.1 -t UDP_STREAM -l 60 -N -T $i -- -m 6 & done done Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \|	net: Define accessors to manipulate QDISC_STATE_RUNNING	Eric Dumazet	2010-06-02	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Define three helpers to manipulate QDISC_STATE_RUNNIG flag, that a second patch will move on another location. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \|	net: remove zap_completion_queue	Eric Dumazet	2010-05-31	2	-32/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	netpoll does an interesting work in zap_completion_queue(), but this was before we did skb orphaning before delivering packets to device. It now makes sense to add a test in dev_kfree_skb_irq() to not queue a skb if already orphaned, and to remove netpoll zap_completion_queue() as a bonus. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \|	net: Use __this_cpu_inc() in fast path	Eric Dumazet	2010-05-31	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch saves 224 bytes of text on my machine. __this_cpu_inc() generates a single instruction, using no scratch registers : 65 ff 04 25 a8 30 01 00 incl %gs:0x130a8 instead of : 48 c7 c2 80 30 01 00 mov $0x13080,%rdx 65 48 8b 04 25 88 ea 00 00 mov %gs:0xea88,%rax 83 44 10 28 01 addl $0x1,0x28(%rax,%rdx,1) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \|	Merge branch 'master' of ↵	David S. Miller	2010-05-31	1	-6/+10
\|\ \ \ \| \|/ / \| \| / \| \|/ \|/\|	master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
\| *	net: fix sk_forward_alloc corruptions	Eric Dumazet	2010-05-29	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As David found out, sock_queue_err_skb() should be called with socket lock hold, or we risk sk_forward_alloc corruption, since we use non atomic operations to update this field. This patch adds bh_lock_sock()/bh_unlock_sock() pair to three spots. (BH already disabled) 1) skb_tstamp_tx() 2) Before calling ip_icmp_error(), in __udp4_lib_err() 3) Before calling ipv6_icmp_error(), in __udp6_lib_err() Reported-by: Anton Blanchard <anton@samba.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	skb: make skb_recycle_check() return a bool value	Changli Gao	2010-05-29	1	-6/+6
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	Linus Torvalds	2010-05-28	4	-13/+53
\|\ \ \| \|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (22 commits) netlink: bug fix: wrong size was calculated for vfinfo list blob netlink: bug fix: don't overrun skbs on vf_port dump xt_tee: use skb_dst_drop() netdev/fec: fix ifconfig eth0 down hang issue cnic: Fix context memory init. on 5709. drivers/net: Eliminate a NULL pointer dereference drivers/net/hamradio: Eliminate a NULL pointer dereference be2net: Patch removes redundant while statement in loop. ipv6: Add GSO support on forwarding path net: fix __neigh_event_send() vhost: fix the memory leak which will happen when memory_access_ok fails vhost-net: fix to check the return value of copy_to/from_user() correctly vhost: fix to check the return value of copy_to/from_user() correctly vhost: Fix host panic if ioctl called with wrong index net: fix lock_sock_bh/unlock_sock_bh net/iucv: Add missing spin_unlock net: ll_temac: fix checksum offload logic net: ll_temac: fix interrupt bug when interrupt 0 is used sctp: dubious bitfields in sctp_transport ipmr: off by one in __ipmr_fill_mroute() ...
\| *	netlink: bug fix: wrong size was calculated for vfinfo list blob	Scott Feldman	2010-05-28	1	-5/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The wrong size was being calculated for vfinfo. In one case, it was over- calculating using nlmsg_total_size on attrs, in another case, it was under-calculating by assuming ifla_vf_* structs are packed together, but each struct is it's own attr w/ hdr (and padding). Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	netlink: bug fix: don't overrun skbs on vf_port dump	Scott Feldman	2010-05-28	1	-6/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Noticed by Patrick McHardy: was continuing to fill skb after a nla_put_failure, ignoring the size calculated by upper layer. Now, return -EMSGSIZE on any overruns, but also allow netdev to fail ndo_get_vf_port with error other than -EMSGSIZE, thus unwinding nest. Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: fix __neigh_event_send()	Eric Dumazet	2010-05-28	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit 7fee226ad23 (net: add a noref bit on skb dst) missed one spot where an skb is enqueued, with a possibly not refcounted dst entry. __neigh_event_send() inserts skb into arp_queue, so we must make sure dst entry is refcounted, or dst entry can be freed by garbage collector after caller exits from rcu protected section. Reported-by: Ingo Molnar <mingo@elte.hu> Tested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>