blackbird-op-linux - Blackbird™ Linux sources for OpenPOWER

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller	2014-08-05	27	-121/+132
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: drivers/net/Makefile net/ipv6/sysctl_net_ipv6.c Two ipv6_table_template[] additions overlap, so the index of the ipv6_table[x] assignments needed to be adjusted. In the drivers/net/Makefile case, we've gotten rid of the garbage whereby we had to list every single USB networking driver in the top-level Makefile, there is just one "USB_NETWORKING" that guards everything. Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tg3: Modify tg3_tso_bug() to handle multiple TX rings	Prashant Sreedharan	2014-08-05	1	-10/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	tg3_tso_bug() was originally designed to handle only HW TX ring 0, Commit d3f6f3a1d818410c17445bce4f4caab52eb102f1 ("tg3: Prevent page allocation failure during TSO workaround") changed the driver logic to use tg3_tso_bug() for all HW TX rings that are enabled. This patch fixes the regression by modifying tg3_tso_bug() to handle multiple HW TX rings. Signed-off-by: Prashant Sreedharan <prashant@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: sun4i-emac: fix memory leak on bad packet	Marc Zyngier	2014-08-05	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Upon reception of a new frame, the emac driver checks for a number of error conditions, and flag the packet as "bad" if any of these are present. It then allocates a skb unconditionally, but only uses it if the packet is "good". On the error path, the skb is just forgotten, and the system leaks memory. The piece of junk I have on my desk seems to encounter such error frequently enough so that the box goes OOM after a couple of days, which makes me grumpy. Fix this by moving the allocation on the "good_packet" path (and convert it to netdev_alloc_skb while we're at it). Tested on a random Allwinner A20 board. Cc: Stefan Roese <sr@denx.de> Cc: Maxime Ripard <maxime.ripard@free-electrons.com> Cc: <stable@vger.kernel.org> # 3.11+ Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	sctp: fix possible seqlock seadlock in sctp_packet_transmit()	Eric Dumazet	2014-08-05	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Dave reported following splat, caused by improper use of IP_INC_STATS_BH() in process context. BUG: using __this_cpu_add() in preemptible [00000000] code: trinity-c117/14551 caller is __this_cpu_preempt_check+0x13/0x20 CPU: 3 PID: 14551 Comm: trinity-c117 Not tainted 3.16.0+ #33 ffffffff9ec898f0 0000000047ea7e23 ffff88022d32f7f0 ffffffff9e7ee207 0000000000000003 ffff88022d32f818 ffffffff9e397eaa ffff88023ee70b40 ffff88022d32f970 ffff8801c026d580 ffff88022d32f828 ffffffff9e397ee3 Call Trace: [<ffffffff9e7ee207>] dump_stack+0x4e/0x7a [<ffffffff9e397eaa>] check_preemption_disabled+0xfa/0x100 [<ffffffff9e397ee3>] __this_cpu_preempt_check+0x13/0x20 [<ffffffffc0839872>] sctp_packet_transmit+0x692/0x710 [sctp] [<ffffffffc082a7f2>] sctp_outq_flush+0x2a2/0xc30 [sctp] [<ffffffff9e0d985c>] ? mark_held_locks+0x7c/0xb0 [<ffffffff9e7f8c6d>] ? _raw_spin_unlock_irqrestore+0x5d/0x80 [<ffffffffc082b99a>] sctp_outq_uncork+0x1a/0x20 [sctp] [<ffffffffc081e112>] sctp_cmd_interpreter.isra.23+0x1142/0x13f0 [sctp] [<ffffffffc081c86b>] sctp_do_sm+0xdb/0x330 [sctp] [<ffffffff9e0b8f1b>] ? preempt_count_sub+0xab/0x100 [<ffffffffc083b350>] ? sctp_cname+0x70/0x70 [sctp] [<ffffffffc08389ca>] sctp_primitive_ASSOCIATE+0x3a/0x50 [sctp] [<ffffffffc083358f>] sctp_sendmsg+0x88f/0xe30 [sctp] [<ffffffff9e0d673a>] ? lock_release_holdtime.part.28+0x9a/0x160 [<ffffffff9e0d62ce>] ? put_lock_stats.isra.27+0xe/0x30 [<ffffffff9e73b624>] inet_sendmsg+0x104/0x220 [<ffffffff9e73b525>] ? inet_sendmsg+0x5/0x220 [<ffffffff9e68ac4e>] sock_sendmsg+0x9e/0xe0 [<ffffffff9e1c0c09>] ? might_fault+0xb9/0xc0 [<ffffffff9e1c0bae>] ? might_fault+0x5e/0xc0 [<ffffffff9e68b234>] SYSC_sendto+0x124/0x1c0 [<ffffffff9e0136b0>] ? syscall_trace_enter+0x250/0x330 [<ffffffff9e68c3ce>] SyS_sendto+0xe/0x10 [<ffffffff9e7f9be4>] tracesys+0xdd/0xe2 This is a followup of commits f1d8cba61c3c4b ("inet: fix possible seqlock deadlocks") and 7f88c6b23afbd315 ("ipv6: fix possible seqlock deadlock in ip6_finish_output2") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Reported-by: Dave Jones <davej@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	Revert "net: phy: Set the driver when registering an MDIO bus device"	Fabio Estevam	2014-08-05	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit a71e3c37960ce5f9 ("net: phy: Set the driver when registering an MDIO bus device") caused the following regression on the fec driver: root@imx6qsabresd:~# echo mem > /sys/power/state PM: Syncing filesystems ... done. Freezing user space processes ... (elapsed 0.003 seconds) done. Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done. Unable to handle kernel NULL pointer dereference at virtual address 0000002c pgd = bcd14000 [0000002c] pgd=4d9e0831, pte=00000000, *ppte=00000000 Internal error: Oops: 17 [#1] SMP ARM Modules linked in: CPU: 0 PID: 617 Comm: sh Not tainted 3.16.0 #17 task: bc0c4e00 ti: bceb6000 task.ti: bceb6000 PC is at fec_suspend+0x10/0x70 LR is at dpm_run_callback.isra.7+0x34/0x6c pc : [<803f8a98>] lr : [<80361f44>] psr: 600f0013 sp : bceb7d70 ip : bceb7d88 fp : bceb7d84 r10: 8091523c r9 : 00000000 r8 : bd88f478 r7 : 803f8a88 r6 : 81165988 r5 : 00000000 r4 : 00000000 r3 : 00000000 r2 : 00000000 r1 : bd88f478 r0 : bd88f478 Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user Control: 10c5387d Table: 4cd1404a DAC: 00000015 Process sh (pid: 617, stack limit = 0xbceb6240) Stack: (0xbceb7d70 to 0xbceb8000) .... The problem with the original commit is explained by Russell King: "It has the effect (as can be seen from the oops) of attaching the MDIO bus device (itself is a bus-less device) to the platform driver, which means that if the platform driver supports power management, it will be called to power manage the MDIO bus device. Moreover, drivers do not expect to be called for power management operations for devices which they haven't probed, and certainly not for devices which aren't part of the same bus that the driver is registered against." This reverts commit a71e3c37960ce5f9c6a519bc1215e3ba9fa83e75. Cc: <stable@vger.kernel.org> #3.16 Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	Merge branch 'qlcnic'	David S. Miller	2014-08-05	3	-11/+19
\| \|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Rajesh Borundia says: ==================== qlcnic: Bug fixes The patch series contains following bug fixes. * Aggregating tx stats in adapter variable was resulting in increase of stats when user runs ifconfig command and no traffic is running. Instead aggregate tx stats in local variable and then assign it to adapter struct variable. * Set_driver_version was called after registering netdev which was resulting in a race between FLR in open handler and set_driver_version command as open handler can be called simulatneously on another cpu even if probe is not complete. So call this command before registering netdev. * dcbnl_ops should be initialized before registering netdev as they are referenced in open handler. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| *	qlcnic: Initialize dcbnl_ops before register_netdev	Rajesh Borundia	2014-08-05	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o Initialization of dcbnl_ops after register netdev may result in dcbnl_ops not getting set before it is being accessed from open. So, moving it before register_netdev. Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| *	qlcnic: Set driver version before registering netdev	Rajesh Borundia	2014-08-05	2	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o Earlier, set_drv_version was getting called after register_netdev. This was resulting in a race between set_drv_version and FLR called from open(). Moving set_drv_version before register_netdev avoids the race. o Log response code in error message on CDRP failure. Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| *	qlcnic: Fix update of ethtool stats.	Rajesh Borundia	2014-08-05	1	-5/+13
\| \|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o Aggregating tx stats in adapter variable was resulting in an increase in stats even after no traffic was run and user runs ifconfig/ethtool command. o qlcnic_update_stats used to accumulate stats in adapter struct at each function call, instead accumulate tx stats in local variable and then assign it to adapter structure. Reported-by: Holger Kiehl <holger.kiehl@dwd.de> Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge	David S. Miller	2014-08-05	1	-3/+7
\| \|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Antonio Quartulli says: ==================== pull request net: batman-adv 2014-08-04 this is a pull request intended for net. It just contains a patch by Sven Eckelmann that fixes the reception of out-of-order fragments. As explained in the commit message, the issue was due to a wrong assumption about hlist_for_each_entry() in batadv_frag_insert_packet(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| *	batman-adv: Fix out-of-order fragmentation support	Sven Eckelmann	2014-08-05	1	-3/+7
\| \|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	batadv_frag_insert_packet was unable to handle out-of-order packets because it dropped them directly. This is caused by the way the fragmentation lists is checked for the correct place to insert a fragmentation entry. The fragmentation code keeps the fragments in lists. The fragmentation entries are kept in descending order of sequence number. The list is traversed and each entry is compared with the new fragment. If the current entry has a smaller sequence number than the new fragment then the new one has to be inserted before the current entry. This ensures that the list is still in descending order. An out-of-order packet with a smaller sequence number than all entries in the list still has to be added to the end of the list. The used hlist has no information about the last entry in the list inside hlist_head and thus the last entry has to be calculated differently. Currently the code assumes that the iterator variable of hlist_for_each_entry can be used for this purpose after the hlist_for_each_entry finished. This is obviously wrong because the iterator variable is always NULL when the list was completely traversed. Instead the information about the last entry has to be stored in a different variable. This problem was introduced in 610bfc6bc99bc83680d190ebc69359a05fc7f605 ("batman-adv: Receive fragmented packets and merge"). Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
\| *	r8152: add missing Makefile rule	JF Le Fillatre	2014-08-02	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add missing Makefile rule for r8152 driver In the current kernel the r8152 driver is never built because of a missing rule in drivers/net/Makefile, despite being selected as built-in or module in the .config file. There is no error message or warning to indicate that the driver isn't built. This change adds the rule and lets the driver build. Tested as built-in and module for 3.15.8. Signed-off by: JF Le Fillatre <jflf-kernel@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf	David S. Miller	2014-08-02	3	-4/+9
\| \|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for your net tree, they are: 1) Maintain all DSCP and ECN bits for IPv6 tun forwarding. This resolves an inconsistency between IPv4 and IPv6 behaviour. Patch from Alex Gartrell via Simon Horman. 2) Fix unnoticeable blink in xt_LED when the led-always-blink option is used, from Jiri Prchal. 3) Add missing return in nft_del_setelem(), otherwise this results in a double call of nft_data_uninit() in the nf_tables code, from Thomas Graf. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| *	netfilter: nf_tables: Avoid duplicate call to nft_data_uninit() for same key	Thomas Graf	2014-08-01	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	nft_del_setelem() currently calls nft_data_uninit() twice on the same key. Once to release the key which is guaranteed to be NFT_DATA_VALUE and a second time in the error path to which it falls through. The second call has been harmless so far though because the type passed is always NFT_DATA_VALUE which is currently a no-op. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
\| \| *	netfilter: xt_LED: fix too short led-always-blink	Jiri Prchal	2014-07-25	1	-3/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If led-always-blink is set, then between switch led OFF and ON is almost zero time. So blink is invisible. This use oneshot led trigger with fixed time 50ms witch is enough to see blink. Signed-off-by: Jiri Prchal <jiri.prchal@aksignal.cz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
\| \| *	ipvs: Maintain all DSCP and ECN bits for ipv6 tun forwarding	Alex Gartrell	2014-07-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, only the four high bits of the tclass were maintained in the ipv6 case. This matches the behavior of ipv4, though whether or not we should reflect ECN bits may be up for debate. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
\| * \|	ipv6: data of fwmark_reflect sysctl needs to be updated on netns construction	Hannes Frederic Sowa	2014-08-02	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fixes: e110861f86094cd ("net: add a sysctl to reflect the fwmark on replies") Cc: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	cdc_subset: deal with a device that needs reset for timeout	Oliver Neukum	2014-08-02	3	-3/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This device needs to be reset to recover from a timeout. Unfortunately this can be handled only at the level of the subdrivers. Signed-off-by: Oliver Neukum <oneukum@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	iovec: make sure the caller actually wants anything in memcpy_fromiovecend	Sasha Levin	2014-08-02	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Check for cases when the caller requests 0 bytes instead of running off and dereferencing potentially invalid iovecs. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net: Correctly set segment mac_len in skb_segment().	Vlad Yasevich	2014-07-31	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When performing segmentation, the mac_len value is copied right out of the original skb. However, this value is not always set correctly (like when the packet is VLAN-tagged) and we'll end up copying a bad value. One way to demonstrate this is to configure a VM which tags packets internally and turn off VLAN acceleration on the forwarding bridge port. The packets show up corrupt like this: 16:18:24.985548 52:54:00:ab:be:25 > 52:54:00:26:ce:a3, ethertype 802.1Q (0x8100), length 1518: vlan 100, p 0, ethertype 0x05e0, 0x0000: 8cdb 1c7c 8cdb 0064 4006 b59d 0a00 6402 ...\|...d@.....d. 0x0010: 0a00 6401 9e0d b441 0a5e 64ec 0330 14fa ..d....A.^d..0.. 0x0020: 29e3 01c9 f871 0000 0101 080a 000a e833)....q.........3 0x0030: 000f 8c75 6e65 7470 6572 6600 6e65 7470 ...unetperf.netp 0x0040: 6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp 0x0050: 6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp 0x0060: 6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp ... This also leads to awful throughput as GSO packets are dropped and cause retransmissions. The solution is to set the mac_len using the values already available in then new skb. We've already adjusted all of the header offset, so we might as well correctly figure out the mac_len using skb_reset_mac_len(). After this change, packets are segmented correctly and performance is restored. CC: Eric Dumazet <edumazet@google.com> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	Merge branch 'xen-netfront'	David S. Miller	2014-07-31	1	-64/+10
\| \|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	David Vrabel says: ==================== xen-netfront: more multiqueue fixes A few more xen-netfront fixes for the multiqueue support added in 3.16-rc1. It would be great if these could make it into 3.16 but I suspect it's a little late for that now. The second patch fixes a significant resource leak that prevents guests from migrating more than a handful of times. These have been tested by repeatedly migrating a guest over 250 times (it would previously fail with this guest after only 8 iterations). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| * \|	xen-netfront: print correct number of queues	David Vrabel	2014-07-31	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When less than the requested number of queues could be created, include the actual number in the warning (instead of the requested number). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| * \|	xen-netfront: release per-queue Tx and Rx resource when disconnecting	David Vrabel	2014-07-31	1	-61/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since netfront may reconnect to a backend with a different number of queues, all per-queue Rx and Tx resources (skbs and grant references) should be freed when disconnecting. Without this fix, the Tx and Rx grant refs are not released and netfront will exhaust them after only a few reconnections. netfront will fail to connect when no free grant references are available. Since all Rx bufs are freed and reallocated instead of reused this will add some additional delay to the reconnection but this is expected to be small compared to the time taken by any backend hotplug scripts etc. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| * \|	xen-netfront: fix locking in connect error path	David Vrabel	2014-07-31	1	-1/+1
\| \|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If no queues could be created when connecting to the backend, one of the error paths would deadlock. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	macvlan: Initialize vlan_features to turn on offload support.	Vlad Yasevich	2014-07-31	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Macvlan devices do not initialize vlan_features. As a result, any vlan devices configured on top of macvlans perform very poorly. Initialize vlan_features based on the vlan features of the lower-level device. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	bridge: Don't include NDA_VLAN for FDB entries with vid 0	Toshiaki Makita	2014-07-31	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	An FDB entry with vlan_id 0 doesn't mean it is used in vlan 0, but used when vlan_filtering is disabled. There is inconsistency around NDA_VLAN whose payload is 0 - even if we add an entry by RTM_NEWNEIGH without any NDA_VLAN, and even though adding an entry with NDA_VLAN 0 is prohibited, we get an entry with NDA_VLAN 0 by RTM_GETNEIGH. Dumping an FDB entry with vlan_id 0 shouldn't include NDA_VLAN. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	bonding: use kobject_put instead of _del after kobject_add	Veaceslav Falico	2014-07-31	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Otherwise the name of the kobject isn't getting freed and other stuff from kobject_cleanup() isn't getting called. kobject_put() will call kobject_del() on its own in kobject_cleanup(). CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	bna: fix performance regression	Ivan Vecera	2014-07-30	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The recent commit "e29aa33 bna: Enable Multi Buffer RX" is causing a performance regression. It does not properly update 'cmpl' pointer at the end of the loop in NAPI handler bnad_cq_process(). The result is only one packet / per NAPI-schedule is processed. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	tcp: Fix integer-overflow in TCP vegas	Christoph Paasch	2014-07-30	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In vegas we do a multiplication of the cwnd and the rtt. This may overflow and thus their result is stored in a u64. However, we first need to cast the cwnd so that actually 64-bit arithmetic is done. Then, we need to do do_div to allow this to be used on 32-bit arches. Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Doug Leith <doug.leith@nuim.ie> Fixes: 8d3a564da34e (tcp: tcp_vegas cong avoid fix) Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	tcp: Fix integer-overflows in TCP veno	Christoph Paasch	2014-07-30	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In veno we do a multiplication of the cwnd and the rtt. This may overflow and thus their result is stored in a u64. However, we first need to cast the cwnd so that actually 64-bit arithmetic is done. A first attempt at fixing 76f1017757aa0 ([TCP]: TCP Veno congestion control) was made by 159131149c2 (tcp: Overflow bug in Vegas), but it failed to add the required cast in tcp_veno_cong_avoid(). Fixes: 76f1017757aa0 ([TCP]: TCP Veno congestion control) Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	ip_tunnel(ipv4): fix tunnels with "local any remote $remote_ip"	Dmitry Popov	2014-07-30	2	-11/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Ipv4 tunnels created with "local any remote $ip" didn't work properly since 7d442fab0 (ipv4: Cache dst in tunnels). 99% of packets sent via those tunnels had src addr = 0.0.0.0. That was because only dst_entry was cached, although fl4.saddr has to be cached too. Every time ip_tunnel_xmit used cached dst_entry (tunnel_rtable_get returned non-NULL), fl4.saddr was initialized with tnl_params->saddr (= 0 in our case), and wasn't changed until iptunnel_xmit(). This patch adds saddr to ip_tunnel->dst_cache, fixing this issue. Reported-by: Sergey Popov <pinkbyte@gentoo.org> Signed-off-by: Dmitry Popov <ixaphire@qrator.net> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	bna: fill the magic in bnad_get_eeprom() instead of validating	Ivan Vecera	2014-07-30	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A driver should fill magic field of ethtool_eeprom struct in .get_eeprom and validate it in .set_eeprom. The bna incorrectly validates it in both and this makes its .get_eeprom interface unusable. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \|	cxgb4 : Disable recursive mailbox commands when enabling vi	Anish Bhatt	2014-08-05	2	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Enabling a Virtual Interface can result in an interrupt during the processing of the VI Enable command and, in some paths, result in an attempt to issue another command in the interrupt context, eventually crashing the system. Thus, we disable interrupts during the course of the VI Enable command and ensure enable doesn't sleep. Signed-off-by: Anish Bhatt <anish@chelsio.com> Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \|	net: reduce USB network driver config options.	Francois Romieu	2014-08-05	2	-13/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	USB network drivers are already handled in drivers/net/usb/Kconfig. Let's save the maintenance burden of dependencies in drivers/net/Makefile. The newly introduced USB_NET_DRIVERS umbrella config option defaults to 'y' so as to minimize the changes of behavior. Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \|	Merge branch 'amd-xgbe'	David S. Miller	2014-08-05	3	-101/+126
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Tom Lendacky says: ==================== amd-xgbe: AMD XGBE driver update 2014-08-05 The following series of patches includes fixes/updates to the driver. - Use dma_set_mask_and_coherent to set the DMA mask - Move the phy connect/disconnect logic to allow for module unloading Changes in V2: - Check the return value of the dma_set_mask_and_coherent call ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	amd-xgbe: Perform phy connect/disconnect at dev open/stop	Lendacky, Thomas	2014-08-05	2	-99/+121
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A change added to the mdiobus/phy api added a module_get/module_put during phy connect/disconnect processing. Currently, the driver performs a phy connect during module probe and a phy disconnect during module remove. With the addition of the module_get during phy connect the amd-xgbe module use count is incremented and can no longer be unloaded. Move the phy connect/disconnect from the driver probe/remove functions to the net_device_ops ndo_open/ndo_stop functions. This allows the module use count to be decremented when the device(s) are brought down and allows the module to be unloaded. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	amd-xgbe: Use dma_set_mask_and_coherent to set DMA mask	Lendacky, Thomas	2014-08-05	1	-2/+5
\|/ / / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use the dma_set_mask_and_coherent function to set the DMA mask rather than setting the DMA mask fields directly. This was originally done to work around a bug in the arm64 DMA support when RAM started above the 4GB boundary which has since been fixed. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \|	cxgb4vf: Turn off SGE RX/TX Callback Timers and interrupts in PCI shutdown ↵	Hariprasad Shenai	2014-08-05	1	-14/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	routine Need to turn off SGE RX/TX Callback Timers & interrupt in cxgb4vf PCI Shutdown routine in order to prevent crashes during reboot/poweroff when traffic is running. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \|	Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge	David S. Miller	2014-08-05	6	-27/+37
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Antonio Quartulli says: ==================== pull request: batman-adv 2014-08-05 this is a pull request intended for net-next/linux-3.17 (yeah..it's really late). Patches 1, 2 and 4 are really minor changes: - kmalloc_array is substituted to kmalloc when possible (as suggested by checkpatch); - net_ratelimited() is now used properly and the "suppressed" message is not printed anymore if not needed; - the internal version number has been increased to reflect our current version. Patch 3 instead is introducing a change in the metric computation function by changing the penalty applied at each mesh hop from 15/255 (~6%) to 30/255 (~11%). This change is introduced by Simon Wunderlich after having observed a performance improvement in several networks when using the new value. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	batman-adv: Start new development cycle	Simon Wunderlich	2014-08-05	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
\| * \| \|	batman-adv: increase default hop penalty	Simon Wunderlich	2014-08-05	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The default hop penalty is currently set to 15, which is applied like that for multi interface devices (e.g. dual band APs). Single band devices will still use an effective penalty of 30 (hop penalty + wifi penalty). After receiving reports of too long paths in mesh networks with dual band APs which were fixed by increasing the hop penalty, we'd like to suggest to increase that default value in the default setting as well. We've evaluated that increase in a handful of medium sized mesh networks (5-20 nodes) with single and dual band devices, with changes for the better (shorter routes, higher throughput) or no change at all. This patch changes the hop penalty to 30, which will give an effective penalty of 60 on single band devices (hop penalty + wifi penalty). Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
\| * \| \|	batman-adv: remove unnecessary logspam	André Gaul	2014-08-05	2	-15/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch removes unnecessary logspam which resulted from superfluous calls to net_ratelimit(). With the supplied patch, net_ratelimit() is called after the loglevel has been checked. Signed-off-by: André Gaul <gaul@web-yard.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
\| * \| \|	batman-adv: prefer kmalloc_array to kmalloc when possible	Antonio Quartulli	2014-08-04	3	-10/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reported by checkpatch with the following warning: WARNING: Prefer kmalloc_array over kmalloc with multiply Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
* \| \| \|	team: Simplify return path of team_newlink	Toshiaki Makita	2014-08-05	1	-7/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The variable "err" is not necessary. Return register_netdevice() directly. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \| \|	bridge: Update outdated comment on promiscuous mode	Toshiaki Makita	2014-08-05	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now bridge ports can be non-promiscuous, vlan_vid_add() is no longer an unnecessary operation. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \| \|	Merge branch 'net-timestamp-next'	David S. Miller	2014-08-05	13	-64/+170
\|\ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Willem de Bruijn says: ==================== net-timestamp: new tx tstamps and tcp Extend socket tx timestamping: - allow multiple types of software timestamps aside from send (1) - add software timestamp on enter packet scheduling (4) - add software timestamp for TCP (5) - add software timestamp for TCP on ACK (6) The sk_flags option space is nearly exhausted. Also move the many timestamp options to a new sk->sk_tstamps (2). To disambiguate data when tstamps may arrive out of order, optionally return a sequential ID assigned at send (3). Extend Linux tx timestamping to monitoring of latency incurred within the kernel stack and to protocols embedded in TCP. Complex kernel setups may have multiple layers of queueing, including multiple instances of packet scheduling, and many classes per layer. Many applications embed discrete payloads into TCP bytestreams for reliability, flow control, etcetera. Detecting application tail latency in such scenarios relies on identifying the exact queue responsible if on the host, or the network latency if otherwise. Changelog: v4->v5 - define SCM_TSTAMP_SND == 0, for legacy behavior - add TCP tstamps without changing the generated byte stream - modify GSO and ACK to find offset: slightly more complex than previous invariant that it is the last byte - consistent naming of packet scheduling - rename SCM_TSTAMP_ENQ to SCM_TSTAMP_SCHED - add unique key in ee_data - add id field in ee_info to disambiguate tstamps - optional, only on new flag SOF_TIMESTAMPING_OPT_ID - for bytestream, in bytes v3->v4 - (v3 review comment) removed skb->mark packet identification (A) - (v3 review comment) fixed indentation - tcp: fixed poll() to return POLLERR on non-zero queue - rebased to work without syststamp - comments: removed all traces of MSG_TSTAMP_.. (B) v2->v3 - extend the SO_TIMESTAMPING API, instead of defining a new one. - add protocol independent support to correlate tstamps with data, based on returning skb->mark. - removed no-payload optimization and documentation (for now): I have a follow-on patch that reintroduces MSG_TSTAMP along with a new socket option SOF_TIMESTAMPING_OPT_ONFLAG. This is equivalent to sequence setsockopt(<enable>); send(..); setsockopt(<disable>), but avoids the need to define a MSG_TSTAMP_<TYPE> for each type. I will leave these three patches as follow-on, as this patchset is large enough as is. v1->v2 - expand timestamping (existing and new) to SOCK_RAW and ping sockets - rename sock_errqueue_timestamping to scm_timestamping - change timestamp data format: do not add fields to scm_timestamping. Doing so could break legacy applications. Instead, communicate through an existing, but unused, field in the error message. - rename SOF_.._OPT_TX_NO_PAYLOAD to shorter SOF_.._OPT_TSONLY - move msg_tstamp test app out of patchset and to github git://github.com/wdebruij/kerneltools.git ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \| \|	net-timestamp: ACK timestamp for bytestreams	Willem de Bruijn	2014-08-05	5	-2/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add SOF_TIMESTAMPING_TX_ACK, a request for a tstamp when the last byte in the send() call is acknowledged. It implements the feature for TCP. The timestamp is generated when the TCP socket cumulative ACK is moved beyond the tracked seqno for the first time. The feature ignores SACK and FACK, because those acknowledge the specific byte, but not necessarily the entire contents of the buffer up to that byte. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \| \|	net-timestamp: TCP timestamping	Willem de Bruijn	2014-08-05	4	-6/+52
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	TCP timestamping extends SO_TIMESTAMPING to bytestreams. Bytestreams do not have a 1:1 relationship between send() buffers and network packets. The feature interprets a send call on a bytestream as a request for a timestamp for the last byte in that send() buffer. The choice corresponds to a request for a timestamp when all bytes in the buffer have been sent. That assumption depends on in-order kernel transmission. This is the common case. That said, it is possible to construct a traffic shaping tree that would result in reordering. The guarantee is strong, then, but not ironclad. This implementation supports send and sendpages (splice). GSO replaces one large packet with multiple smaller packets. This patch also copies the option into the correct smaller packet. This patch does not yet support timestamping on data in an initial TCP Fast Open SYN, because that takes a very different data path. If ID generation in ee_data is enabled, bytestream timestamps return a byte offset, instead of the packet counter for datagrams. The implementation supports a single timestamp per packet. It silenty replaces requests for previous timestamps. To avoid missing tstamps, flush the tcp queue by disabling Nagle, cork and autocork. Missing tstamps can be detected by offset when the ee_data ID is enabled. Implementation details: - On GSO, the timestamping code can be included in the main loop. I moved it into its own loop to reduce the impact on the common case to a single branch. - To avoid leaking the absolute seqno to userspace, the offset returned in ee_data must always be relative. It is an offset between an skb and sk field. The first is always set (also for GSO & ACK). The second must also never be uninitialized. Only allow the ID option on sockets in the ESTABLISHED state, for which the seqno is available. Never reset it to zero (instead, move it to the current seqno when reenabling the option). Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \| \|	net-timestamp: SCHED timestamp on entering packet scheduler	Willem de Bruijn	2014-08-05	6	-7/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Kernel transmit latency is often incurred in the packet scheduler. Introduce a new timestamp on transmission just before entering the scheduler. When data travels through multiple devices (bonding, tunneling, ...) each device will export an individual timestamp. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \| \|	net-timestamp: add key to disambiguate concurrent datagrams	Willem de Bruijn	2014-08-05	7	-4/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Datagrams timestamped on transmission can coexist in the kernel stack and be reordered in packet scheduling. When reading looped datagrams from the socket error queue it is not always possible to unique correlate looped data with original send() call (for application level retransmits). Even if possible, it may be expensive and complex, requiring packet inspection. Introduce a data-independent ID mechanism to associate timestamps with send calls. Pass an ID alongside the timestamp in field ee_data of sock_extended_err. The ID is a simple 32 bit unsigned int that is associated with the socket and incremented on each send() call for which software tx timestamp generation is enabled. The feature is enabled only if SOF_TIMESTAMPING_OPT_ID is set, to avoid changing ee_data for existing applications that expect it 0. The counter is reset each time the flag is reenabled. Reenabling does not change the ID of already submitted data. It is possible to receive out of order IDs if the timestamp stream is not quiesced first. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>