summaryrefslogtreecommitdiffstats
path: root/net
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'for-2.6.30' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2009-05-123-2/+6
|\ | | | | | | | | | | | | | | | | | | * 'for-2.6.30' of git://linux-nfs.org/~bfields/linux: nfsd: silence lockdep warning lockd: fix list corruption on lockd restart nfsd4: check for negative dentry before use in nfsv4 readdir nfsd41: slots are freed with session svcrdma: clean up error paths. svcrdma: Fix dma map direction for rdma read targets
| * svcrdma: clean up error paths.Steve Wise2009-05-032-1/+5
| | | | | | | | | | | | | | | | | | These fixes resolved crashes due to resource leak BUG_ON checks. The resource leaks were detected by introducing asynchronous transport errors. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * svcrdma: Fix dma map direction for rdma read targetsSteve Wise2009-04-251-1/+1
| | | | | | | | | | | | | | | | | | The nfs server rdma transport was mapping rdma read target pages for TO_DEVICE instead of FROM_DEVICE. This causes data corruption on non cache-coherent systems if frmrs are used. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2009-05-1017-93/+124
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (26 commits) bonding: fix panic if initialization fails IXP4xx: complete Ethernet netdev setup before calling register_netdev(). IXP4xx: use "ENODEV" instead of "ENOSYS" in module initialization. ipvs: Fix IPv4 FWMARK virtual services ipv4: Make INET_LRO a bool instead of tristate. net: remove stale reference to fastroute from Kconfig help text net: update skb_recycle_check() for hardware timestamping changes bnx2: Fix panic in bnx2_poll_work(). net-sched: fix bfifo default limit igb: resolve panic on shutdown when SR-IOV is enabled wimax: oops: wimax_dev_add() is the only one that can initialize the state wimax: fix oops if netlink fails to add attribute Bluetooth: Move dev_set_name() to a context that can sleep netfilter: ctnetlink: fix wrong message type in user updates netfilter: xt_cluster: fix use of cluster match with 32 nodes netfilter: ip6t_ipv6header: fix match on packets ending with NEXTHDR_NONE netfilter: add missing linux/types.h include to xt_LED.h mac80211: pid, fix memory corruption mac80211: minstrel, fix memory corruption cfg80211: fix comment on regulatory hint processing ...
| * | ipvs: Fix IPv4 FWMARK virtual servicesSimon Horman2009-05-082-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes the use of fwmarks to denote IPv4 virtual services which was unfortunately broken as a result of the integration of IPv6 support into IPVS, which was included in 2.6.28. The problem arises because fwmarks are stored in the 4th octet of a union nf_inet_addr .all, however in the case of IPv4 only the first octet, corresponding to .ip, is assigned and compared. In other words, using .all = { 0, 0, 0, htonl(svc->fwmark) always results in a value of 0 (32bits) being stored for IPv4. This means that one fwmark can be used, as it ends up being mapped to 0, but things break down when multiple fwmarks are used, as they all end up being mapped to 0. As fwmarks are 32bits a reasonable fix seems to be to just store the fwmark in .ip, and comparing and storing .ip when fwmarks are used. This patch makes the assumption that in calls to ip_vs_ct_in_get() and ip_vs_sched_persist() if the proto parameter is IPPROTO_IP then we are dealing with an fwmark. I believe this is valid as ip_vs_in() does fairly strict filtering on the protocol and IPPROTO_IP should not be used in these calls unless explicitly passed when making these calls for fwmarks in ip_vs_sched_persist(). Tested-by: Fabien Duchêne <fabien.duchene@student.uclouvain.be> Cc: Joseph Mack NA3T <jmack@wm7d.net> Cc: Julius Volz <julius.volz@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv4: Make INET_LRO a bool instead of tristate.David S. Miller2009-05-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This code is used as a library by several device drivers, which select INET_LRO. If some are modules and some are statically built into the kernel, we get build failures if INET_LRO is modular. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: remove stale reference to fastroute from Kconfig help textAshish Karkare2009-05-071-6/+0
| | | | | | | | | | | | | | | | | | Signed-off-by: Ashish Karkare <akarkare@marvell.com> Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: update skb_recycle_check() for hardware timestamping changesLennert Buytenhek2009-05-061-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit ac45f602ee3d1b6f326f68bc0c2591ceebf05ba4 ("net: infrastructure for hardware time stamping") added two skb initialization actions to __alloc_skb(), which need to be added to skb_recycle_check() as well. Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Patrick Ohly <patrick.ohly@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net-sched: fix bfifo default limitPatrick McHardy2009-05-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | When no limit is given, the bfifo uses a default of tx_queue_len * mtu. Packets handled by qdiscs include the link layer header, so this should be taken into account, similar to what other qdiscs do. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge branch 'linux-2.6.30.y' of ↵David S. Miller2009-05-062-7/+21
| |\ \ | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/inaky/wimax
| | * | wimax: oops: wimax_dev_add() is the only one that can initialize the stateInaky Perez-Gonzalez2009-05-061-2/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a new wimax_dev is created, it's state has to be __WIMAX_ST_NULL until wimax_dev_add() is succesfully called. This allows calls into the stack that happen before said time to be rejected. Until now, the state was being set (by mistake) to UNINITIALIZED, which was allowing calls such as wimax_report_rfkill_hw() to go through even when a call to wimax_dev_add() had failed; that was causing an oops when touching uninitialized data. This situation is normal when the device starts reporting state before the whole initialization has been completed. It just has to be dealt with. Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
| | * | wimax: fix oops if netlink fails to add attributeInaky Perez-Gonzalez2009-05-061-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When sending a message to user space using wimax_msg(), if nla_put() fails, correctly interpret the return code from wimax_msg_alloc() as an err ptr and return the error code instead of crashing (as it is assuming than non-NULL means the pointer is ok). Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
| * | | Bluetooth: Move dev_set_name() to a context that can sleepMarcel Holtmann2009-05-051-4/+3
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Setting the name of a sysfs device has to be done in a context that can actually sleep. It allocates its memory with GFP_KERNEL. Previously it was a static (size limited) string and that got changed to accommodate longer device names. So move the dev_set_name() just before calling device_add() which is executed in a work queue. This fixes the following error: [ 110.012125] BUG: sleeping function called from invalid context at mm/slub.c:1595 [ 110.012135] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper [ 110.012141] 2 locks held by swapper/0: [ 110.012145] #0: (hci_task_lock){++.-.+}, at: [<ffffffffa01f822f>] hci_rx_task+0x2f/0x2d0 [bluetooth] [ 110.012173] #1: (&hdev->lock){+.-.+.}, at: [<ffffffffa01fb9e2>] hci_event_packet+0x72/0x25c0 [bluetooth] [ 110.012198] Pid: 0, comm: swapper Tainted: G W 2.6.30-rc4-g953cdaa #1 [ 110.012203] Call Trace: [ 110.012207] <IRQ> [<ffffffff8023eabd>] __might_sleep+0x14d/0x170 [ 110.012228] [<ffffffff802cfbe1>] __kmalloc+0x111/0x170 [ 110.012239] [<ffffffff803c2094>] kvasprintf+0x64/0xb0 [ 110.012248] [<ffffffff803b7a5b>] kobject_set_name_vargs+0x3b/0xa0 [ 110.012257] [<ffffffff80465326>] dev_set_name+0x76/0xa0 [ 110.012273] [<ffffffffa01fb9e2>] ? hci_event_packet+0x72/0x25c0 [bluetooth] [ 110.012289] [<ffffffffa01ffc1d>] hci_conn_add_sysfs+0x3d/0x70 [bluetooth] [ 110.012303] [<ffffffffa01fba2c>] hci_event_packet+0xbc/0x25c0 [bluetooth] [ 110.012312] [<ffffffff80516eb0>] ? sock_def_readable+0x80/0xa0 [ 110.012328] [<ffffffffa01fee0c>] ? hci_send_to_sock+0xfc/0x1c0 [bluetooth] [ 110.012343] [<ffffffff80516eb0>] ? sock_def_readable+0x80/0xa0 [ 110.012347] [<ffffffff805e88c5>] ? _read_unlock+0x75/0x80 [ 110.012354] [<ffffffffa01fee0c>] ? hci_send_to_sock+0xfc/0x1c0 [bluetooth] [ 110.012360] [<ffffffffa01f8403>] hci_rx_task+0x203/0x2d0 [bluetooth] [ 110.012365] [<ffffffff80250ab5>] tasklet_action+0xb5/0x160 [ 110.012369] [<ffffffff8025116c>] __do_softirq+0x9c/0x150 [ 110.012372] [<ffffffff805e850f>] ? _spin_unlock+0x3f/0x80 [ 110.012376] [<ffffffff8020cbbc>] call_softirq+0x1c/0x30 [ 110.012380] [<ffffffff8020f01d>] do_softirq+0x8d/0xe0 [ 110.012383] [<ffffffff80250df5>] irq_exit+0xc5/0xe0 [ 110.012386] [<ffffffff8020e71d>] do_IRQ+0x9d/0x120 [ 110.012389] [<ffffffff8020c3d3>] ret_from_intr+0x0/0xf [ 110.012391] <EOI> [<ffffffff80431832>] ? acpi_idle_enter_bm+0x264/0x2a6 [ 110.012399] [<ffffffff80431828>] ? acpi_idle_enter_bm+0x25a/0x2a6 [ 110.012403] [<ffffffff804f50d5>] ? cpuidle_idle_call+0xc5/0x130 [ 110.012407] [<ffffffff8020a4b4>] ? cpu_idle+0xc4/0x130 [ 110.012411] [<ffffffff805d2268>] ? rest_init+0x88/0xb0 [ 110.012416] [<ffffffff807e2fbd>] ? start_kernel+0x3b5/0x412 [ 110.012420] [<ffffffff807e2281>] ? x86_64_start_reservations+0x91/0xb5 [ 110.012424] [<ffffffff807e2394>] ? x86_64_start_kernel+0xef/0x11b Based on a report by Davide Pesavento <davidepesa@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Tested-by: Hugo Mildenberger <hugo.mildenberger@namir.de> Tested-by: Bing Zhao <bzhao@marvell.com>
| * | Merge branch 'master' of ↵David S. Miller2009-05-053-32/+30
| |\ \ | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6
| | * | netfilter: ctnetlink: fix wrong message type in user updatesPablo Neira Ayuso2009-05-051-28/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes the wrong message type that are triggered by user updates, the following commands: (term1)# conntrack -I -p tcp -s 1.1.1.1 -d 2.2.2.2 -t 10 --sport 10 --dport 20 --state LISTEN (term1)# conntrack -U -p tcp -s 1.1.1.1 -d 2.2.2.2 -t 10 --sport 10 --dport 20 --state SYN_SENT (term1)# conntrack -U -p tcp -s 1.1.1.1 -d 2.2.2.2 -t 10 --sport 10 --dport 20 --state SYN_RECV only trigger event message of type NEW, when only the first is NEW while others should be UPDATE. (term2)# conntrack -E [NEW] tcp 6 10 LISTEN src=1.1.1.1 dst=2.2.2.2 sport=10 dport=20 [UNREPLIED] src=2.2.2.2 dst=1.1.1.1 sport=20 dport=10 mark=0 [NEW] tcp 6 10 SYN_SENT src=1.1.1.1 dst=2.2.2.2 sport=10 dport=20 [UNREPLIED] src=2.2.2.2 dst=1.1.1.1 sport=20 dport=10 mark=0 [NEW] tcp 6 10 SYN_RECV src=1.1.1.1 dst=2.2.2.2 sport=10 dport=20 [UNREPLIED] src=2.2.2.2 dst=1.1.1.1 sport=20 dport=10 mark=0 This patch also removes IPCT_REFRESH from the bitmask since it is not of any use. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | netfilter: xt_cluster: fix use of cluster match with 32 nodesPablo Neira Ayuso2009-05-051-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a problem when you use 32 nodes in the cluster match: % iptables -I PREROUTING -t mangle -i eth0 -m cluster \ --cluster-total-nodes 32 --cluster-local-node 32 \ --cluster-hash-seed 0xdeadbeef -j MARK --set-mark 0xffff iptables: Invalid argument. Run `dmesg' for more information. % dmesg | tail -1 xt_cluster: this node mask cannot be higher than the total number of nodes The problem is related to this checking: if (info->node_mask >= (1 << info->total_nodes)) { printk(KERN_ERR "xt_cluster: this node mask cannot be " "higher than the total number of nodes\n"); return false; } (1 << 32) is 1. Thus, the checking fails. BTW, I said this before but I insist: I have only tested the cluster match with 2 nodes getting ~45% extra performance in an active-active setup. The maximum limit of 32 nodes is still completely arbitrary. I'd really appreciate if people that have more nodes in their setups let me know. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | netfilter: ip6t_ipv6header: fix match on packets ending with NEXTHDR_NONEChristoph Paasch2009-05-051-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As packets ending with NEXTHDR_NONE don't have a last extension header, the check for the length needs to be after the check for NEXTHDR_NONE. Signed-off-by: Christoph Paasch <christoph.paasch@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | Merge branch 'master' of ↵David S. Miller2009-05-055-38/+57
| |\ \ \ | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
| | * | | mac80211: pid, fix memory corruptionJiri Slaby2009-05-041-34/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pid doesn't count with some band having more bitrates than the one associated the first time. Fix that by counting the maximal available bitrate count and allocate big enough space. Secondly, fix touching uninitialized memory which causes panics. Index sucked from this random memory points to the hell. The fix is to sort the rates on each band change. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| | * | | mac80211: minstrel, fix memory corruptionJiri Slaby2009-05-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | minstrel doesn't count max rate count in fact, since it doesn't use a loop variable `i' and hence allocs space only for bitrates found in the first band. Fix it by involving the `i' as an index so that it traverses all the bands now and finds the real max bitrate count. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| | * | | cfg80211: fix comment on regulatory hint processingLuis R. Rodriguez2009-05-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| | * | | cfg80211: fix bug while trying to process beacon hints on initLuis R. Rodriguez2009-05-041-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During initialization we would not have received any beacons so skip processing reg beacon hints, also adds a check to reg_is_world_roaming() for last_request before accessing its fields. This should fix this: BUG: unable to handle kernel NULL pointer dereference at IP: [<e0171332>] wiphy_update_regulatory+0x20f/0x295 *pdpt = 0000000008bf1001 *pde = 0000000000000000 Oops: 0000 [#1] last sysfs file: /sys/class/backlight/eeepc/brightness Modules linked in: ath5k(+) mac80211 led_class cfg80211 go_bit cfbcopyarea cfbimgblt cfbfillrect ipv6 ydev usual_tables(P) snd_hda_codec_realtek snd_hda_intel nd_hwdep uhci_hcd snd_pcm_oss snd_mixer_oss i2c_i801 e serio_raw i2c_core pcspkr atl2 snd_pcm intel_agp re agpgart eeepc_laptop snd_page_alloc ac video backlight rfkill button processor evdev thermal fan ata_generic Pid: 2909, comm: modprobe Tainted: Pc #112) 701 EIP: 0060:[<e0171332>] EFLAGS: 00010246 CPU: 0 EIP is at wiphy_update_regulatory+0x20f/0x295 [cfg80211] EAX: 00000000 EBX: c5da0000 ECX: 00000000 EDX: c5da0060 ESI: 0000001a EDI: c5da0060 EBP: df3bdd70 ESP: df3bdd40 DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 Process modprobe (pid: 2909, ti=df3bc000 task=c5d030000) Stack: df3bdd90 c5da0060 c04277e0 00000001 00000044 c04277e402 00000002 c5da0000 0000001a c5da0060 df3bdda8 e01706a2 02 00000282 000080d0 00000068 c5d53500 00000080 0000028240 Call Trace: [<e01706a2>] ? wiphy_register+0x122/0x1b7 [cfg80211] [<e0328e02>] ? ieee80211_register_hw+0xd8/0x346 [<e06a7c9f>] ? ath5k_hw_set_bssid_mask+0x71/0x78 [ath5k] [<e06b0c52>] ? ath5k_pci_probe+0xa5c/0xd0a [ath5k] [<c01a6037>] ? sysfs_find_dirent+0x16/0x27 [<c01fec95>] ? local_pci_probe+0xe/0x10 [<c01ff526>] ? pci_device_probe+0x48/0x66 [<c024c9fd>] ? driver_probe_device+0x7f/0xf2 [<c024cab3>] ? __driver_attach+0x43/0x5f [<c024c0af>] ? bus_for_each_dev+0x39/0x5a [<c024c8d0>] ? driver_attach+0x14/0x16 [<c024ca70>] ? __driver_attach+0x0/0x5f [<c024c5b3>] ? bus_add_driver+0xd7/0x1e7 [<c024ccb9>] ? driver_register+0x7b/0xd7 [<c01ff827>] ? __pci_register_driver+0x32/0x85 [<e00a8018>] ? init_ath5k_pci+0x18/0x30 [ath5k] [<c0101131>] ? _stext+0x49/0x10b [<e00a8000>] ? init_ath5k_pci+0x0/0x30 [ath5k] [<c012f452>] ? __blocking_notifier_call_chain+0x40/0x4c [<c013a714>] ? sys_init_module+0x87/0x18b [<c0102804>] ? sysenter_do_call+0x12/0x22 Code: b8 da 17 e0 83 c0 04 e8 92 f9 ff ff 84 c0 75 2a 8b 85 c0 74 0c 83 c0 04 e8 7c f9 ff ff 84 c0 75 14 a1 bc da 4 03 74 66 8b 4d d4 80 79 08 00 74 5d a1 e0 d2 17 e0 48 EIP: [<e0171332>] wiphy_update_regulatory+0x20f/0x295 SP 0068:df3bdd40 CR2: 0000000000000004 ---[ end trace 830f2dd2a95fd1a8 ]--- This issue is hard to reproduce, but it was noticed and discussed on this thread: http://marc.info/?t=123938022700005&r=1&w=2 Cc: stable@kernel.org Reported-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk> Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| | * | | cfg80211: fix race condition with wiphy_apply_custom_regulatory()Luis R. Rodriguez2009-05-041-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We forgot to lock using the cfg80211_mutex in wiphy_apply_custom_regulatory(). Without the lock there is possible race between processing a reply from CRDA and a driver calling wiphy_apply_custom_regulatory(). During the processing of the reply from CRDA we free last_request and wiphy_apply_custom_regulatory() eventually accesses an element from last_request in the through freq_reg_info_regd(). This is very difficult to reproduce (I haven't), it takes us 3 hours and you need to be banging hard, but the race is obvious by looking at the code. This should only affect those who use this caller, which currently is ath5k, ath9k, and ar9170. EIP: 0060:[<f8ebec50>] EFLAGS: 00210282 CPU: 1 EIP is at freq_reg_info_regd+0x24/0x121 [cfg80211] EAX: 00000000 EBX: f7ca0060 ECX: f5183d94 EDX: 0024cde0 ESI: f8f56edc EDI: 00000000 EBP: 00000000 ESP: f5183d44 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process modprobe (pid: 14617, ti=f5182000 task=f3934d10 task.ti=f5182000) Stack: c0505300 f7ca0ab4 f5183d94 0024cde0 f8f403a6 f8f63160 f7ca0060 00000000 00000000 f8ebedf8 f5183d90 f8f56edc 00000000 00000004 00000f40 f8f56edc f7ca0060 f7ca1234 00000000 00000000 00000000 f7ca14f0 f7ca0ab4 f7ca1289 Call Trace: [<f8ebedf8>] wiphy_apply_custom_regulatory+0x8f/0x122 [cfg80211] [<f8f3f798>] ath_attach+0x707/0x9e6 [ath9k] [<f8f45e46>] ath_pci_probe+0x18d/0x29a [ath9k] [<c023c7ba>] pci_device_probe+0xa3/0xe4 [<c02a860b>] really_probe+0xd7/0x1de [<c02a87e7>] __driver_attach+0x37/0x55 [<c02a7eed>] bus_for_each_dev+0x31/0x57 [<c02a83bd>] driver_attach+0x16/0x18 [<c02a78e6>] bus_add_driver+0xec/0x21b [<c02a8959>] driver_register+0x85/0xe2 [<c023c9bb>] __pci_register_driver+0x3c/0x69 [<f8e93043>] ath9k_init+0x43/0x68 [ath9k] [<c010112b>] _stext+0x3b/0x116 [<c014a872>] sys_init_module+0x8a/0x19e [<c01049ad>] sysenter_do_call+0x12/0x21 [<ffffe430>] 0xffffe430 ======================= Code: 0f 94 c0 c3 31 c0 c3 55 57 56 53 89 c3 83 ec 14 8b 74 24 2c 89 54 24 0c 89 4c 24 08 85 f6 75 06 8b 35 c8 bb ec f8 a1 cc bb ec f8 <8b> 40 04 83 f8 03 74 3a 48 74 37 8b 43 28 85 c0 74 30 89 c6 8b EIP: [<f8ebec50>] freq_reg_info_regd+0x24/0x121 [cfg80211] SS:ESP 0068:f5183d44 Cc: stable@kernel.org Reported-by: Nataraj Sadasivam <Nataraj.Sadasivam@Atheros.com> Reported-by: Vivek Natarajan <Vivek.Natarajan@Atheros.com> Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| | * | | cfg80211: fix truncated IEsJohannes Berg2009-05-041-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Another bug in the "cfg80211: do not replace BSS structs" patch, a forgotten length update leads to bogus data being stored and passed to userspace, often truncated. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| | * | | mac80211: correct fragmentation threshold checkJohannes Berg2009-05-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The fragmentation threshold is defined to be including the FCS, and the code that sets the TX_FRAGMENTED flag correctly accounts for those four bytes. The code that verifies this doesn't though, which could lead to spurious warnings and frames being dropped although everything is ok. Correct the code by accounting for the FCS. (JWL -- The problem is described here: http://article.gmane.org/gmane.linux.kernel.wireless.general/32205 ) Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
* | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2009-05-059-72/+74
|\ \ \ \ \ | |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (30 commits) e1000: fix virtualization bug bonding: fix alb mode locking regression Bluetooth: Fix issue with sysfs handling for connections usbnet: CDC EEM support (v5) tcp: Fix tcp_prequeue() to get correct rto_min value ehea: fix invalid pointer access ne2k-pci: Do not register device until initialized. Subject: [PATCH] br2684: restore net_dev initialization net: Only store high 16 bits of kernel generated filter priorities virtio_net: Fix function name typo virtio_net: Cleanup command queue scatterlist usage bonding: correct the cleanup in bond_create() virtio: add missing include to virtio_net.h smsc95xx: add support for LAN9512 and LAN9514 smsc95xx: configure LED outputs netconsole: take care of NETDEV_UNREGISTER event xt_socket: checks for the state of nf_conntrack bonding: bond_slave_info_query() fix cxgb3: fixing gcc 4.4 compiler warning: suggest parentheses around operand of ‘!’ netfilter: use likely() in xt_info_rdlock_bh() ...
| * | | | Bluetooth: Fix issue with sysfs handling for connectionsMarcel Holtmann2009-05-042-34/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Due to a semantic changes in flush_workqueue() the current approach of synchronizing the sysfs handling for connections doesn't work anymore. The whole approach is actually fully broken and based on assumptions that are no longer valid. With the introduction of Simple Pairing support, the creation of low-level ACL links got changed. This change invalidates the reason why in the past two independent work queues have been used for adding/removing sysfs devices. The adding of the actual sysfs device is now postponed until the host controller successfully assigns an unique handle to that link. So the real synchronization happens inside the controller and not the host. The only left-over problem is that some internals of the sysfs device handling are not initialized ahead of time. This leaves potential access to invalid data and can cause various NULL pointer dereferences. To fix this a new function makes sure that all sysfs details are initialized when an connection attempt is made. The actual sysfs device is only registered when the connection has been successfully established. To avoid a race condition with the registration, the check if a device is registered has been moved into the removal work. As an extra protection two flush_work() calls are left in place to make sure a previous add/del work has been completed first. Based on a report by Marc Pignat <marc.pignat@hevs.ch> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Tested-by: Justin P. Mattock <justinmattock@gmail.com> Tested-by: Roger Quadros <ext-roger.quadros@nokia.com> Tested-by: Marc Pignat <marc.pignat@hevs.ch>
| * | | | tcp: Fix tcp_prequeue() to get correct rto_min valueSatoru SATOH2009-05-041-10/+0
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tcp_prequeue() refers to the constant value (TCP_RTO_MIN) regardless of the actual value might be tuned. The following patches fix this and make tcp_prequeue get the actual value returns from tcp_rto_min(). Signed-off-by: Satoru SATOH <satoru.satoh@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | Subject: [PATCH] br2684: restore net_dev initializationRabin Vincent2009-05-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 0ba25ff4c669e5395110ba6ab4958a97a9f96922 ("br2684: convert to net_device_ops") inadvertently deleted the initialization of the net_dev pointer in the br2684_dev structure, leading to crashes. This patch adds it back. Reported-by: Mikko Vinni <mmvinni@yahoo.com> Tested-by: Mikko Vinni <mmvinni@yahoo.com> Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: Only store high 16 bits of kernel generated filter prioritiesRobert Love2009-05-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kernel should only be using the high 16 bits of a kernel generated priority. Filter priorities in all other cases only use the upper 16 bits of the u32 'prio' field of 'struct tcf_proto', but when the kernel generates the priority of a filter is saves all 32 bits which can result in incorrect lookup failures when a filter needs to be deleted or modified. Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | xt_socket: checks for the state of nf_conntrackLaszlo Attila Toth2009-05-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | xt_socket can use connection tracking, and checks whether it is a module. Signed-off-by: Laszlo Attila Toth <panther@balabit.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: Fix skb_tx_hash() for forwarding workloads.Eric Dumazet2009-05-011-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When skb_rx_queue_recorded() is true, we dont want to use jash distribution as the device driver exactly told us which queue was selected at RX time. jhash makes a statistical shuffle, but this wont work with 8 static inputs. Later improvements would be to compute reciprocal value of real_num_tx_queues to avoid a divide here. But this computation should be done once, when real_num_tx_queues is set. This needs a separate patch, and a new field in struct net_device. Reported-by: Andrew Dickinson <andrew@whydna.net> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: Fix oops when splicing skbs from a frag_list.Jarek Poplawski2009-04-301-13/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lennert Buytenhek wrote: > Since 4fb669948116d928ae44262ab7743732c574630d ("net: Optimize memory > usage when splicing from sockets.") I'm seeing this oops (e.g. in > 2.6.30-rc3) when splicing from a TCP socket to /dev/null on a driver > (mv643xx_eth) that uses LRO in the skb mode (lro_receive_skb) rather > than the frag mode: My patch incorrectly assumed skb->sk was always valid, but for "frag_listed" skbs we can only use skb->sk of their parent. Reported-by: Lennert Buytenhek <buytenh@wantstofly.org> Debugged-by: Lennert Buytenhek <buytenh@wantstofly.org> Tested-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | Merge branch 'master' of ↵David S. Miller2009-04-291-10/+10
| |\ \ \ | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
| | * | | mac80211: default to automatic power controlJohannes Berg2009-04-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In "mac80211: correct wext transmit power handler" I fixed the wext handler, but forgot to make the default of the user_power_level -1 (aka "auto"), so that now the transmit power is always set to 0, causing associations to time out and similar problems since we're transmitting with very little power. Correct this by correcting the default user_power_level to -1. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Bisected-by: Niel Lambrechts <niel.lambrechts@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| | * | | mac80211: fix modprobe deadlock by not calling wep_init under rtnl_lockAlan Jenkins2009-04-291-10/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - ieee80211_wep_init(), which is called with rtnl_lock held, blocks in request_module() [waiting for modprobe to load a crypto module]. - modprobe blocks in a call to flush_workqueue(), when it closes a TTY [presumably when it exits]. - The workqueue item linkwatch_event() blocks on rtnl_lock. There's no reason for wep_init() to be called with rtnl_lock held, so just move it outside the critical section. Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk> Signed-off-by: John W. Linville <linville@tuxdriver.com>
* | | | | net/9p: handle correctly interrupted 9P requestsLatchesar Ionkov2009-04-054-60/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the 9p code crashes when a operation is interrupted, i.e. for example when the user presses ^C while reading from a file. This patch fixes the code that is responsible for interruption and flushing of 9P operations. Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
* | | | | net/9p: return error when p9_client_stat failsLatchesar Ionkov2009-04-051-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | p9_client_stat function doesn't return correct value if it fails. p9_client_stat should return ERR_PTR of the error value when it fails. Instead, it always returns a value to the allocated p9_wstat struct even when it is not populated correctly. This patch makes p9_client_stat to handle failure correctly. Signed-off-by: Latchesar Ionkov <lucho@ionkov.net> Reviewed-by: Eric Van Hensbergen <ericvh@gmail.com>
* | | | | net/9p: set correct stat size when sending Twstat messagesLatchesar Ionkov2009-04-051-3/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 9P2000 Twstat message requires the size of the stat structure to be specified. Currently the 9p code writes zero instead of the actual size. This behavior confuses some of the file servers that check if the size is correct. This patch adds a new function that calculcates the stat size and puts the value in the appropriate place in the 9P message. Signed-off-by: Latchesar Ionkov <lucho@ionkov.net> Reviewed-by: Eric Van Hensbergen <ericvh@gmail.com>
* | | | | SUNRPC: Fix the problem of EADDRNOTAVAIL syslog floods on reconnectTrond Myklebust2009-05-022-9/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | See http://bugzilla.kernel.org/show_bug.cgi?id=13034 If the port gets into a TIME_WAIT state, then we cannot reconnect without binding to a new port. Tested-by: Petr Vandrovec <petr@vandrovec.name> Tested-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2009-04-2917-333/+246
|\ \ \ \ \ | |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (24 commits) e100: do not go D3 in shutdown unless system is powering off netfilter: revised locking for x_tables Bluetooth: Fix connection establishment with low security requirement Bluetooth: Add different pairing timeout for Legacy Pairing Bluetooth: Ensure that HCI sysfs add/del is preempt safe net: Avoid extra wakeups of threads blocked in wait_for_packet() net: Fix typo in net_device_ops description. ipv4: Limit size of route cache hash table Add reference to CAPI 2.0 standard Documentation/isdn/INTERFACE.CAPI update Documentation/isdn/00-INDEX ixgbe: Fix WoL functionality for 82599 KX4 devices veth: prevent oops caused by netdev destructor xfrm: wrong hash value for temporary SA forcedeth: tx timeout fix net: Fix LL_MAX_HEADER for CONFIG_TR_MODULE mlx4_en: Handle page allocation failure during receive mlx4_en: Fix cleanup flow on cq activation vlan: update vlan carrier state for admin up/down netfilter: xt_recent: fix stack overread in compat code ...
| * | | | Merge branch 'master' of ↵David S. Miller2009-04-283-28/+55
| |\ \ \ \ | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6
| | * | | | Bluetooth: Fix connection establishment with low security requirementMarcel Holtmann2009-04-281-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Bluetooth 2.1 specification introduced four different security modes that can be mapped using Legacy Pairing and Simple Pairing. With the usage of Simple Pairing it is required that all connections (except the ones for SDP) are encrypted. So even the low security requirement mandates an encrypted connection when using Simple Pairing. When using Legacy Pairing (for Bluetooth 2.0 devices and older) this is not required since it causes interoperability issues. To support this properly the low security requirement translates into different host controller transactions depending if Simple Pairing is supported or not. However in case of Simple Pairing the command to switch on encryption after a successful authentication is not triggered for the low security mode. This patch fixes this and actually makes the logic to differentiate between Simple Pairing and Legacy Pairing a lot simpler. Based on a report by Ville Tervo <ville.tervo@nokia.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| | * | | | Bluetooth: Add different pairing timeout for Legacy PairingMarcel Holtmann2009-04-282-1/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Bluetooth stack uses a reference counting for all established ACL links and if no user (L2CAP connection) is present, the link will be terminated to save power. The problem part is the dedicated pairing when using Legacy Pairing (Bluetooth 2.0 and before). At that point no user is present and pairing attempts will be disconnected within 10 seconds or less. In previous kernel version this was not a problem since the disconnect timeout wasn't triggered on incoming connections for the first time. However this caused issues with broken host stacks that kept the connections around after dedicated pairing. When the support for Simple Pairing got added, the link establishment procedure needed to be changed and now causes issues when using Legacy Pairing When using Simple Pairing it is possible to do a proper reference counting of ACL link users. With Legacy Pairing this is not possible since the specification is unclear in some areas and too many broken Bluetooth devices have already been deployed. So instead of trying to deal with all the broken devices, a special pairing timeout will be introduced that increases the timeout to 60 seconds when pairing is triggered. If a broken devices now puts the stack into an unforeseen state, the worst that happens is the disconnect timeout triggers after 120 seconds instead of 4 seconds. This allows successful pairings with legacy and broken devices now. Based on a report by Johan Hedberg <johan.hedberg@nokia.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| | * | | | Bluetooth: Ensure that HCI sysfs add/del is preempt safeRoger Quadros2009-04-281-21/+16
| | |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use a different work_struct variables for add_conn() and del_conn() and use single work queue instead of two for adding and deleting connections. It eliminates the following error on a preemptible kernel: [ 204.358032] Unable to handle kernel NULL pointer dereference at virtual address 0000000c [ 204.370697] pgd = c0004000 [ 204.373443] [0000000c] *pgd=00000000 [ 204.378601] Internal error: Oops: 17 [#1] PREEMPT [ 204.383361] Modules linked in: vfat fat rfcomm sco l2cap sd_mod scsi_mod iphb pvr2d drm omaplfb ps [ 204.438537] CPU: 0 Not tainted (2.6.28-maemo2 #1) [ 204.443664] PC is at klist_put+0x2c/0xb4 [ 204.447601] LR is at klist_put+0x18/0xb4 [ 204.451568] pc : [<c0270f08>] lr : [<c0270ef4>] psr: a0000113 [ 204.451568] sp : cf1b3f10 ip : cf1b3f10 fp : cf1b3f2c [ 204.463104] r10: 00000000 r9 : 00000000 r8 : bf08029c [ 204.468353] r7 : c7869200 r6 : cfbe2690 r5 : c78692c8 r4 : 00000001 [ 204.474945] r3 : 00000001 r2 : cf1b2000 r1 : 00000001 r0 : 00000000 [ 204.481506] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel [ 204.488861] Control: 10c5387d Table: 887fc018 DAC: 00000017 [ 204.494628] Process btdelconn (pid: 515, stack limit = 0xcf1b22e0) Signed-off-by: Roger Quadros <ext-roger.quadros@nokia.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| * | | | netfilter: revised locking for x_tablesStephen Hemminger2009-04-284-291/+136
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The x_tables are organized with a table structure and a per-cpu copies of the counters and rules. On older kernels there was a reader/writer lock per table which was a performance bottleneck. In 2.6.30-rc, this was converted to use RCU and the counters/rules which solved the performance problems for do_table but made replacing rules much slower because of the necessary RCU grace period. This version uses a per-cpu set of spinlocks and counters to allow to table processing to proceed without the cache thrashing of a global reader lock and keeps the same performance for table updates. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: Avoid extra wakeups of threads blocked in wait_for_packet()Eric Dumazet2009-04-281-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In 2.6.25 we added UDP mem accounting. This unfortunatly added a penalty when a frame is transmitted, since we have at TX completion time to call sock_wfree() to perform necessary memory accounting. This calls sock_def_write_space() and utimately scheduler if any thread is waiting on the socket. Thread(s) waiting for an incoming frame was scheduled, then had to sleep again as event was meaningless. (All threads waiting on a socket are using same sk_sleep anchor) This adds lot of extra wakeups and increases latencies, as noted by Christoph Lameter, and slows down softirq handler. Reference : http://marc.info/?l=linux-netdev&m=124060437012283&w=2 Fortunatly, Davide Libenzi recently added concept of keyed wakeups into kernel, and particularly for sockets (see commit 37e5540b3c9d838eb20f2ca8ea2eb8072271e403 epoll keyed wakeups: make sockets use keyed wakeups) Davide goal was to optimize epoll, but this new wakeup infrastructure can help non epoll users as well, if they care to setup an appropriate handler. This patch introduces new DEFINE_WAIT_FUNC() helper and uses it in wait_for_packet(), so that only relevant event can wakeup a thread blocked in this function. Trace of function calls from bnx2 TX completion bnx2_poll_work() is : __kfree_skb() skb_release_head_state() sock_wfree() sock_def_write_space() __wake_up_sync_key() __wake_up_common() receiver_wake_function() : Stops here since thread is waiting for an INPUT Reported-by: Christoph Lameter <cl@linux.com> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | ipv4: Limit size of route cache hash tableAnton Blanchard2009-04-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Right now we have no upper limit on the size of the route cache hash table. On a 128GB POWER6 box it ends up as 32MB: IP route cache hash table entries: 4194304 (order: 9, 33554432 bytes) It would be nice to cap this for memory consumption reasons, but a massive hashtable also causes a significant spike when measuring OS jitter. With a 32MB hashtable and 4 million entries, rt_worker_func is taking 5 ms to complete. On another system with more memory it's taking 14 ms. Even though rt_worker_func does call cond_sched() to limit its impact, in an HPC environment we want to keep all sources of OS jitter to a minimum. With the patch applied we limit the number of entries to 512k which can still be overriden by using the rt_entries boot option: IP route cache hash table entries: 524288 (order: 6, 4194304 bytes) With this patch rt_worker_func now takes 0.460 ms on the same system. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | xfrm: wrong hash value for temporary SANicolas Dichtel2009-04-271-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When kernel inserts a temporary SA for IKE, it uses the wrong hash value for dst list. Two hash values were calcultated before: one with source address and one with a wildcard source address. Bug hinted by Junwei Zhang <junwei.zhang@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | vlan: update vlan carrier state for admin up/downJay Vosburgh2009-04-252-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the VLAN event handler does not adjust the VLAN device's carrier state when the real device or the VLAN device is set administratively up or down. The following patch adds a transfer of operating state from the real device to the VLAN device when the real device is administratively set up or down, and sets the carrier state up or down during init, open and close of the VLAN device. This permits observers above the VLAN device that care about the carrier state (bonding's link monitor, for example) to receive updates for administrative changes by more closely mimicing the behavior of real devices. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
OpenPOWER on IntegriCloud