blackbird-op-linux - Blackbird™ Linux sources for OpenPOWER

	Commit message (Collapse)	Author	Age	Files	Lines
*	netns: ipmr: enable namespace support in ipv4 multicast routing code	Benjamin Thery	2009-01-22	2	-113/+141
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This last patch makes the appropriate changes to use and propagate the network namespace where needed in IPv4 multicast routing code. This consists mainly in replacing all the remaining init_net occurrences with the current netns pointer retrieved from sockets, net devices or mfc_caches depending on the routines' contexts. Some routines receive a new 'struct net' parameter to propagate the current netns: * vif_add/vif_delete * ipmr_new_tunnel * mroute_clean_tables * ipmr_cache_find * ipmr_cache_report * ipmr_cache_unresolved * ipmr_mfc_add/ipmr_mfc_delete * ipmr_get_route * rt_fill_info (in route.c) Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
*	netns: ipmr: declare ipmr /proc/net entries per-namespace	Benjamin Thery	2009-01-22	1	-39/+62
\| \| \| \| \| \| \| \| \|	Declare IPv4 multicast forwarding /proc/net entries per-namespace: /proc/net/ip_mr_vif /proc/net/ip_mr_cache Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
*	netns: ipmr: declare reg_vif_num per-namespace	Benjamin Thery	2009-01-22	1	-10/+12
\| \| \| \| \| \| \| \| \| \| \|	Preliminary work to make IPv4 multicast routing netns-aware. Declare variable 'reg_vif_num' per-namespace, move into struct netns_ipv4. At the moment, this variable is only referenced in init_net. Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
*	netns: ipmr: declare mroute_do_assert and mroute_do_pim per-namespace	Benjamin Thery	2009-01-22	1	-13/+11
\| \| \| \| \| \| \| \| \| \| \| \|	Preliminary work to make IPv4 multicast routing netns-aware. Declare IPv4 multicast routing variables 'mroute_do_assert' and 'mroute_do_pim' per-namespace in struct netns_ipv4. At the moment, these variables are only referenced in init_net. Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
*	netns: ipmr: declare counter cache_resolve_queue_len per-namespace	Benjamin Thery	2009-01-22	1	-18/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Preliminary work to make IPv4 multicast routing netns-aware. Declare variable cache_resolve_queue_len per-namespace: move it into struct netns_ipv4. This variable counts the number of unresolved cache entries queued in the list mfc_unres_queue. This list is kept global to all netns as the number of entries per namespace is limited to 10 (hardcoded in routine ipmr_cache_unresolved). Entries belonging to different namespaces in mfc_unres_queue will be identified by matching the mfc_net member introduced previously in struct mfc_cache. Keeping this list global to all netns also allows us to keep a single timer (ipmr_expire_timer) to handle their expiration. In some places the cache_resolve_queue_len value was tested for arming or deleting the timer. These tests were equivalent to testing the mfc_unres_queue value instead and are replaced in this patch. At the moment, cache_resolve_queue_len is only referenced in init_net. Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
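The bookkeeping described in this commit can be pictured with a small Python sketch. It is not the kernel code: `UnresolvedQueue`, `enqueue`, `resolve` and `timer_needed` are hypothetical names chosen for illustration. Only the single global list, the per-namespace counter, the limit of 10 entries per namespace, and the single timer come from the message above.

```python
from collections import defaultdict

# Per-namespace cap on unresolved entries, as stated in the commit message.
MAX_UNRESOLVED_PER_NETNS = 10


class UnresolvedQueue:
    """Toy model: one global queue, one counter per namespace."""

    def __init__(self):
        self.entries = []                  # stands in for mfc_unres_queue
        self.queue_len = defaultdict(int)  # stands in for cache_resolve_queue_len

    def enqueue(self, netns, origin, group):
        """Queue an unresolved entry; refuse once the namespace is at its cap."""
        if self.queue_len[netns] >= MAX_UNRESOLVED_PER_NETNS:
            return False
        # The namespace tag plays the role of the mfc_net member.
        self.entries.append((netns, origin, group))
        self.queue_len[netns] += 1
        return True

    def resolve(self, netns, origin, group):
        """Remove a matching entry, identified by its namespace tag."""
        key = (netns, origin, group)
        if key not in self.entries:
            return False
        self.entries.remove(key)
        self.queue_len[netns] -= 1
        return True

    def timer_needed(self):
        """One shared expiry timer is needed while the global list is non-empty."""
        return bool(self.entries)
```

In this model, testing whether the global list is empty is enough to decide whether the shared timer should be armed, which mirrors the replacement of the counter tests mentioned in the message.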
*	netns: ipmr: dynamically allocate mfc_cache_array	Benjamin Thery	2009-01-22	1	-13/+28
\| \| \| \| \| \| \| \| \| \| \| \|	Preliminary work to make IPv4 multicast routing netns-aware. Dynamically allocate IPv4 multicast forwarding cache, mfc_cache_array, and move it to struct netns_ipv4. At the moment, mfc_cache_array is only referenced in init_net. Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
*	netns: ipmr: store netns in struct mfc_cache	Benjamin Thery	2009-01-22	1	-9/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch stores into struct mfc_cache the network namespace each mfc_cache belongs to. The new member is mfc_net. mfc_net is assigned at cache allocation and doesn't change during the rest of the cache entry life. A new net parameter is added to ipmr_cache_alloc/ipmr_cache_alloc_unres. This will help to retrieve the current netns around the IPv4 multicast routing code. At the moment, all mfc_cache are allocated in init_net. Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
*	netns: ipmr: dynamically allocate vif_table	Benjamin Thery	2009-01-22	1	-41/+68
\| \| \| \| \| \| \| \| \| \| \| \|	Preliminary work to make IPv4 multicast routing netns-aware. Dynamically allocate interface table vif_table and move it to struct netns_ipv4, and update VIF_EXISTS() macro. At the moment, vif_table is only referenced in init_net. Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
*	netns: ipmr: allocate mroute_socket per-namespace.	Benjamin Thery	2009-01-22	1	-15/+13
\| \| \| \| \| \| \| \| \| \| \| \|	Preliminary work to make IPv4 multicast routing netns-aware. Make the IPv4 multicast routing mroute_socket per-namespace and move it into struct netns_ipv4. At the moment, mroute_socket is only referenced in init_net. Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
*	sctp/ipv6.c: use ipv6_addr_copy	Joe Perches	2009-01-22	1	-2/+1
\| \| \| \| \| \|	Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	gre: strict physical device binding	Timo Teras	2009-01-21	1	-40/+88
\| \| \| \| \| \| \| \| \| \| \|	Check the device on receive path and allow otherwise identical devices as long as the physical device differs. This is useful for NBMA tunnels, where you want to use different gre IP for each public IP available via different physical devices. Signed-off-by: Timo Teras <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
*	inet: Allowing more than 64k connections and heavily optimize bind(0) time.	Evgeniy Polyakov	2009-01-21	2	-8/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With a simple extension to the binding mechanism, which allows binding more than 64k sockets (or a smaller amount, depending on sysctl parameters), we have to traverse the whole bind hash table to find an empty bucket. And while it is not a problem for, say, 32k connections, bind() completion time grows exponentially (since after each successful binding we have to traverse one more bucket to find an empty one) even if we start each time from a random offset inside the hash table. So, when the hash table is full and we want to add another socket, we have to traverse the whole table no matter what, so effectively this will be the worst-case performance and it will be constant. The picture attached to the original posting shows bind() time depending on the number of already bound sockets. The green area corresponds to the usual process of binding to port zero, which turns on kernel port selection as described above. The red area is the bind process when the number of reuse-bound sockets is not limited by 64k (or sysctl parameters). It shows the same exponential growth (hidden by the green area) before the number of ports reaches the sysctl limit. At this point the bind hash table has exactly one reuse-enabled socket in each bucket, but it is possible that they have different addresses. Actually the kernel selects the first port to try randomly, so at the beginning bind will take roughly constant time, but over time the number of ports to check after the random start will increase. That gives exponential growth, but because of the above random selection, not every subsequent port selection will necessarily take longer than the previous one. So we have to consider the area below in the graph (if you could zoom in, you would find that there are many different times placed there), so one area can hide another. The blue area corresponds to the port selection optimization.
This is a rather simple design approach: the hash table now maintains an (imprecise and racily updated) count of currently bound sockets, and when that count becomes greater than a predefined value (I use the maximum port range defined by sysctls), we stop traversing the whole bind hash table and just stop at the first matching bucket after the random start. The above limit roughly corresponds to the case when the bind hash table is full and we have turned on the mechanism allowing more reuse-enabled sockets to be bound, so it does not change the behaviour of other sockets. Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net> Tested-by: Denys Fedoryschenko <denys@visp.net.lb> Signed-off-by: David S. Miller <davem@davemloft.net>
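The port selection heuristic described in this message can be modelled with a short Python sketch. This is a simplified illustration under stated assumptions, not the kernel implementation: `pick_port`, `owners`, `bound_total` and `limit` are hypothetical names, and every bucket is treated as a "matching" (reuse-compatible) bucket.

```python
import random


def pick_port(owners, bound_total, limit, start=None):
    """Pick a bucket index in `owners` (sockets bound per port bucket).

    Hypothetical model of the heuristic described in the commit message:
    - while `bound_total` does not exceed `limit`, scan the table from a
      random start and return the first empty bucket (None if there is none);
    - once `bound_total` exceeds `limit`, skip the scan and accept the
      first bucket after the random start.
    Returns (bucket_index, buckets_visited).
    """
    size = len(owners)
    if start is None:
        start = random.randrange(size)
    if bound_total > limit:
        # The table is considered full: no full traversal, constant cost.
        return start, 1
    for visited in range(1, size + 1):
        idx = (start + visited - 1) % size
        if owners[idx] == 0:
            return idx, visited
    # No empty bucket: the whole table was traversed for nothing.
    return None, size
```

The second return value makes the cost visible: below the limit it grows with the table occupancy, while above the limit it stays at one bucket per call.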
*	dccp: Debugging functions for feature negotiation	Gerrit Renker	2009-01-21	4	-59/+109
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since all feature-negotiation processing now takes place in feat.c, functions for producing verbose debugging output are concentrated there. New functions to print out values, entry records, and options are provided, and also a macro is defined to not always have the function name in the output line. Thanks a lot to Wei Yongjun and Giuseppe Galeota for help and discussion with an earlier revision of this patch. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
*	dccp: Initialisation and type-checking of feature sysctls	Gerrit Renker	2009-01-21	5	-23/+46
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch takes care of initialising and type-checking sysctls related to feature negotiation. Type checking is important since some of the sysctls now directly impact the feature-negotiation process. The sysctls are initialised with the known default values for each feature. For the type-checking the value constraints from RFC 4340 are used: * Sequence Window uses the specified Wmin=32, the maximum is ulong (4 bytes), tested and confirmed that it works up to 4294967295 - for Gbps speed; * Ack Ratio is between 0 .. 0xffff (2-byte unsigned integer); * CCIDs are between 0 .. 255; * request_retries, retries1, retries2 also between 0..255 for good measure; * tx_qlen is checked to be non-negative; * sync_ratelimit remains as before. Notes: ------ 1. Die s@sysctl_dccp_feat@sysctl_dccp@g since the sysctls are now in feat.c. 2. As pointed out by Arnaldo, the pattern of type-checking repeats itself in other places, sometimes with exactly the same kind of definitions (e.g. "static int zero;"). It may be a good idea (kernel janitors?) to consolidate type checking. For the sake of keeping the changeset small and in order not to affect other subsystems, I have not strived to generalise here. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
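The value constraints listed in this message amount to a range check per sysctl, which can be sketched as follows. The dictionary keys and the `is_valid` helper are illustrative labels rather than the kernel's identifiers; the bounds themselves are the ones given above (sync_ratelimit is left out because the message only says it remains as before).

```python
# Inclusive bounds taken from the commit message.
# None as an upper bound means the message states no upper limit.
SYSCTL_RANGES = {
    "sequence_window": (32, 4294967295),  # Wmin=32 up to a 4-byte maximum
    "ack_ratio": (0, 0xFFFF),             # 2-byte unsigned integer
    "ccid": (0, 255),
    "request_retries": (0, 255),
    "retries1": (0, 255),
    "retries2": (0, 255),
    "tx_qlen": (0, None),                 # only checked to be non-negative
}


def is_valid(name, value):
    """Return True if `value` lies within the range listed for `name`."""
    low, high = SYSCTL_RANGES[name]
    if value < low:
        return False
    return high is None or value <= high
```

For example, `is_valid("sequence_window", 31)` is rejected because it is below Wmin, while `is_valid("ack_ratio", 0xFFFF)` is accepted.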
*	dccp: Implement both feature-local and feature-remote Sequence Window feature	Gerrit Renker	2009-01-21	4	-24/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds full support for local/remote Sequence Window feature, from which the * sequence-number-validity (W) and * acknowledgment-number-validity (W') windows derive as specified in RFC 4340, 7.5.3. Specifically, the following is contained in this patch: * integrated new socket fields into dccp_sk; * updated the update_gsr/gss routines with regard to these fields; * updated handler code: the Sequence Window feature is located at the TX side, so the local feature is meant if the handler-rx flag is false; * the initialisation of `rcv_wnd' in reqsk is removed, since - rcv_wnd is not used by the code anywhere; - sequence number checks are not done in the LISTEN state (cf. 7.5.3); - dccp_check_req checks the Ack number validity more rigorously; * the `struct dccp_minisock' became empty and is now removed. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
*	dccp: Initialisation framework for feature negotiation	Gerrit Renker	2009-01-21	2	-10/+57
\| \| \| \| \| \| \| \| \| \| \| \|	This initialises feature negotiation from two tables, which are in turn initialised from sysctls. As a novel feature, specifics of the implementation (e.g. that short seqnos and ECN are not yet available) are advertised for robustness. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
*	appletalk: remove unneeded stubs	Stephen Hemminger	2009-01-21	1	-6/+4
\| \| \| \| \| \| \| \|	With net_device_ops, if set_mac_address is NULL, the error returned is -EOPNOTSUPP. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	rose: convert to network_device_ops	Stephen Hemminger	2009-01-21	1	-4/+8
\| \| \| \| \| \|	Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	rose: convert to internal net_device_stats	Stephen Hemminger	2009-01-21	2	-10/+3
\| \| \| \| \| \|	Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	netrom: convert to net_device_ops	Stephen Hemminger	2009-01-21	1	-5/+7
\| \| \| \| \| \|	Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	netrom: convert to internal net_device_stats	Stephen Hemminger	2009-01-21	2	-13/+3
\| \| \| \| \| \|	Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	lec: convert to net_device_ops	Stephen Hemminger	2009-01-21	1	-9/+11
\| \| \| \| \|	Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	lec: convert to internal network_device_stats	Stephen Hemminger	2009-01-21	2	-28/+17
\| \| \| \| \|	Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	clip: convert to internal network_device_stats	Stephen Hemminger	2009-01-21	1	-18/+12
\| \| \| \| \|	Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	br2684: convert to net_device_ops	Stephen Hemminger	2009-01-21	1	-9/+11
\| \| \| \| \|	Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	atm: br2684 internal stats	Stephen Hemminger	2009-01-21	1	-23/+15
\| \| \| \| \| \| \|	Now that stats are in net_device, use them. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	netfilter: ctnetlink: fix scheduling while atomic	Patrick McHardy	2009-01-21	1	-0/+3
\| \| \| \| \| \| \| \|	Caused by call to request_module() while holding nf_conntrack_lock. Reported-and-tested-by: Kövesdi György <kgy@teledigit.hu> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
*	gro: Fix merging of paged packets	Herbert Xu	2009-01-20	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \|	The previous fix to paged packets broke the merging because it reset the skb->len before we added it to the merged packet. This wasn't detected because it simply resulted in the truncation of the packet while the missing bit is subsequently retransmitted. The fix is to store skb->len before we clobber it. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
*	gro: Fix error handling on extremely short frags	Herbert Xu	2009-01-20	1	-0/+1
\| \| \| \| \| \| \| \|	When a frag is shorter than an Ethernet header, we'd return a zeroed packet instead of aborting. This patch fixes that. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
*	gro: Fix handling of complete checksums in IPv6	Herbert Xu	2009-01-20	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \|	We need to perform skb_postpull_rcsum after pulling the IPv6 header in order to maintain the correctness of the complete checksum. This patch also adds a missing iph reload after pulling. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
*	NET: net_namespace, fix lock imbalance	Jiri Slaby	2009-01-20	1	-1/+1
\| \| \| \| \| \| \| \|	register_pernet_gen_subsys omits mutex_unlock in one fail path. Fix it. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'master' of ↵	David S. Miller	2009-01-20	2	-12/+117
\|\ \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
\| *	cfg80211: Fix parsed country IE info for 5 GHz	Luis R. Rodriguez	2009-01-16	1	-2/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The country IE number of channels on 5 GHz specifies the number of 5 GHz channels, not the number of sequential channel numbers. For example, if in a country IE the first channel given is 36 and the number of channels passed is 4, then the individual channel numbers defined for the 5 GHz PHY by these parameters are: 36, 40, 44, 48 not: 36, 37, 38, 39 See: http://tinyurl.com/11d-clarification Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
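The channel numbering in this example can be reproduced with a few lines of Python. The sketch assumes a spacing of 4 channel numbers between consecutive 5 GHz channels, which matches the 36, 40, 44, 48 example, and treats 2.4 GHz channels as sequential. `country_ie_channels` and its parameters are hypothetical names, not functions from cfg80211.

```python
def country_ie_channels(first_channel, num_channels, band_5ghz):
    """Expand a (first channel, number of channels) pair into channel numbers.

    Assumption: consecutive 5 GHz channels are 4 channel numbers apart,
    while 2.4 GHz channel numbers are sequential.
    """
    step = 4 if band_5ghz else 1
    return [first_channel + i * step for i in range(num_channels)]
```

Calling `country_ie_channels(36, 4, band_5ghz=True)` gives the four 5 GHz channels from the message, whereas the pre-fix interpretation corresponds to calling it with `band_5ghz=False`.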
\| *	cfg80211: Fix regression with 11d on bands	Luis R. Rodriguez	2009-01-16	1	-3/+79
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes a regression on disallowing bands introduced with the new 802.11d support. The issue is that IEEE-802.11 allows APs to send a subset of what a country regulatory domain defines. This was clarified in this document: http://tinyurl.com/11d-clarification As such it is possible, and this is what is done in practice, that a single band 2.4 GHz AP will only send 2.4 GHz band regulatory information through the 802.11 country information element and then the current intersection with what CRDA provided yields a regulatory domain with no 5 GHz information -- even though that country may actually allow 5 GHz operation. We correct this by only applying the intersection rules on a channel if the intersection yields a regulatory rule on the same band the channel is on. Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
\| *	cfg80211: make handle_band() and handle_channel() wiphy specific	Luis R. Rodriguez	2009-01-16	1	-6/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This allows us to make more wiphy specific judgements when handling the channels later on. Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
\| *	mac80211: more kernel-doc fixes	Randy Dunlap	2009-01-16	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix (delete) more mac80211 kernel-doc: Warning(linux-2.6.28-git13//include/net/mac80211.h:375): Excess struct/union/enum/typedef member 'retry_count' description in 'ieee80211_tx_info' Warning(linux-2.6.28-git13//net/mac80211/sta_info.h:308): Excess struct/union/enum/typedef member 'last_txrate' description in 'sta_info' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
* \|	Revert "xfrm: For 32/64 compatability wrt. xfrm_usersa_info"	David S. Miller	2009-01-20	1	-9/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit fc8c7dc1b29560c016a67a34ccff32a712b5aa86. As indicated by Jiri Klimes, this won't work. These numbers are not only used for the size validation, they are also used to locate attributes sitting after the message. Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net: Fix data corruption when splicing from sockets.	Jarek Poplawski	2009-01-19	1	-32/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The trick in socket splicing where we try to convert the skb->data into a page based reference using virt_to_page() does not work so well. The idea is to pass the virt_to_page() reference via the pipe buffer, and refcount the buffer using a SKB reference. But if we are splicing from a socket to a socket (via sendpage) this doesn't work. The from side processing will grab the page (and SKB) references. The sendpage() calls will grab page references only, return, and then the from side processing completes and drops the SKB ref. The page based reference to skb->data is not enough to keep the kmalloc() buffer backing it from being reused. Yet, that is all that the socket send side has at this point. This leads to data corruption if the skb->data buffer is reused by SLAB before the send side socket actually gets the TX packet out to the device. The fix employed here is to simply allocate a page and copy the skb->data bytes into that page. This will hurt performance, but there is no clear way to fix this properly without a copy at the present time, and it is important to get rid of the data corruption. With fixes from Herbert Xu. Tested-by: Willy Tarreau <w@1wt.eu> Foreseen-by: Changli Gao <xiaosuo@gmail.com> Diagnosed-by: Willy Tarreau <w@1wt.eu> Reported-by: Willy Tarreau <w@1wt.eu> Fixed-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net: Add debug info to track down GSO checksum bug	Herbert Xu	2009-01-19	1	-1/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I'm trying to track down why people're hitting the checksum warning in skb_gso_segment. As the problem seems to be hitting lots of people and I can't reproduce it or locate the bug, here is a patch to print out more details which hopefully should help us to track this down. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net/9p: fid->fid is used uninitialized	Roel Kluin	2009-01-19	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	Linus Torvalds	2009-01-15	24	-99/+172
\|\ \ \| \|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (95 commits) b44: GFP_DMA skb should not escape from driver korina: do not use IRQF_SHARED with IRQF_DISABLED korina: do not stop queue here korina: fix handling tx_chain_tail korina: do tx at the right position korina: do schedule napi after testing for it korina: rework korina_rx() for use with napi korina: disable napi on close and restart korina: reset resource buffer size to 1536 korina: fix usage of driver_data bnx2x: First slow path interrupt race bnx2x: MTU Filter bnx2x: Indirection table initialization index bnx2x: Missing brackets bnx2x: Fixing the doorbell size bnx2x: Endianness issues bnx2x: VLAN tagged packets without VLAN offload bnx2x: Protecting the link change indication bnx2x: Flow control updated before reporting the link bnx2x: Missing mask when calculating flow control ...
\| *	can: fix slowpath issue in hrtimer callback function	Oliver Hartkopp	2009-01-14	1	-27/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Due to the loopback functionality in can_send() we can not invoke it from hardirq context which was done inside the bcm_tx_timeout_handler() hrtimer callback: [ 700.361154] [<c012228c>] warn_slowpath+0x80/0xb6 [ 700.361163] [<c013d559>] valid_state+0x125/0x136 [ 700.361171] [<c013d858>] mark_lock+0x18e/0x332 [ 700.361180] [<c013e300>] __lock_acquire+0x12e/0xb1e [ 700.361189] [<f8ab5915>] bcm_tx_timeout_handler+0x0/0xbc [can_bcm] [ 700.361198] [<c031e20a>] dev_queue_xmit+0x191/0x479 [ 700.361206] [<c01262a7>] __local_bh_disable+0x2b/0x64 [ 700.361213] [<c031e20a>] dev_queue_xmit+0x191/0x479 [ 700.361225] [<f8aa69a1>] can_send+0xd7/0x11a [can] [ 700.361235] [<f8ab522b>] bcm_can_tx+0x9d/0xd9 [can_bcm] [ 700.361245] [<f8ab597f>] bcm_tx_timeout_handler+0x6a/0xbc [can_bcm] [ 700.361255] [<f8ab5915>] bcm_tx_timeout_handler+0x0/0xbc [can_bcm] [ 700.361263] [<c0134143>] __run_hrtimer+0x5a/0x86 [ 700.361273] [<f8ab5915>] bcm_tx_timeout_handler+0x0/0xbc [can_bcm] [ 700.361282] [<c0134a50>] hrtimer_interrupt+0xb9/0x110 This patch moves the rest of the functionality from the hrtimer callback to the already existing tasklet to fix this slowpath problem. Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: Add init_dummy_netdev() and fix EMAC driver using it	Benjamin Herrenschmidt	2009-01-14	1	-0/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds an init_dummy_netdev() function that takes a network device structure (allocation and lifetime entirely under the caller's control) and initializes the minimum set of fields so it can be used to schedule NAPI polls without registering a full-blown interface. This is to be used by drivers that need to tie several hardware interfaces to a single NAPI poll scheduler due to HW limitations. It also updates the ibm_newemac driver to use it, thus fixing the oops on 2.6.29 due to passing NULL as "dev" to netif_napi_add(). The symbol is exported GPL-only as I don't think we want binary drivers doing that sort of acrobatics (if we want them at all). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Tested-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	gso: Ensure that the packet is long enough	Herbert Xu	2009-01-14	1	-6/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we get a GSO packet from an untrusted source, we need to ensure that it is sufficiently long so that we don't end up crashing. Based on discovery and patch by Ian Campbell. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	gro: Fix page ref count for skbs freed normally	Herbert Xu	2009-01-14	2	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When an skb with page frags is merged into an existing one, we cannibalise its reference count. This is OK when the skb is reused because we set nr_frags to zero in that case. However, for the case where the skb is freed through kfree_skb, we didn't clear nr_frags which causes the page to be freed prematurely. This is fixed by moving the skb resetting into skb_gro_receive. Reported-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	xfrm: For 32/64 compatability wrt. xfrm_usersa_info	David S. Miller	2009-01-14	1	-2/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reported by Jiri Klimes. Fix suggested by Patrick McHardy. Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	gro: Check for GSO packets and packets with frag_list	Herbert Xu	2009-01-14	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As GRO cannot be applied to packets with frag_list we need to make sure that we reject such packets if they are fed to us, e.g., through a tunnel device. Also there is no point in applying GRO on GSO packets so they too should be rejected. This allows GRO to be used in virtio-net which may produce GSO packets directly but may still benefit from GRO if the other end of it doesn't support GSO. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ipv6: Fix fib6_dump_table walker leak	Herbert Xu	2009-01-13	1	-7/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a fib6 table dump is prematurely ended, we won't unlink its walker from the list. This causes all sorts of grief for other users of the list later. Reported-by: Chris Caputo <ccaputo@alt.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tcp: splice as many packets as possible at once	Willy Tarreau	2009-01-13	1	-2/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As spotted by Willy Tarreau, current splice() from tcp socket to pipe is not optimal. It processes at most one segment per call. This results in low performance and very high overhead due to syscall rate when splicing from interfaces which do not support LRO. Willy provided a patch inside tcp_splice_read(), but a better fix is to let tcp_read_sock() process as many segments as possible, so that tcp_rcv_space_adjust() and tcp_cleanup_rbuf() are called less often. With this change, splice() behaves like tcp_recvmsg(), being able to consume many skbs in one system call. With typical 1460 bytes of payload per frame, that means splice(SPLICE_F_NONBLOCK) can return 16*1460 = 23360 bytes. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	Merge branch 'master' of ↵	David S. Miller	2009-01-13	4	-6/+10
\| \|\ \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6