blackbird-op-linux - Blackbird™ Linux sources for OpenPOWER

	Commit message (Collapse)	Author	Age	Files	Lines
*	net: via-rhine: add OF bus binding	Alexey Charkov	2014-04-23	6	-115/+229
\| \| \| \| \| \| \| \| \| \|	This should make the driver usable with VIA/WonderMedia ARM-based Systems-on-Chip integrated Rhine III adapters. Note that these are always in MMIO mode, and don't have any known EEPROM. Signed-off-by: Alexey Charkov <alchark@gmail.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: via-rhine: reduce usage of the PCI-specific struct	Alexey Charkov	2014-04-23	1	-54/+62
\| \| \| \| \| \| \| \|	Use more generic data structures instead of struct pci_dev wherever possible in preparation for OF bus binding Signed-off-by: Alexey Charkov <alchark@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: via-rhine: switch to generic DMA functions	Alexey Charkov	2014-04-23	1	-41/+43
\| \| \| \| \| \| \| \|	Remove legacy PCI DMA wrappers and instead use generic DMA functions directly in preparation for OF bus binding Signed-off-by: Alexey Charkov <alchark@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	macvlan: Fix leak and NULL dereference on error path	Herbert Xu	2014-04-23	1	-4/+7
\| \| \| \| \| \| \| \| \| \|	The recent patch that moved broadcasts to process context added a couple of bugs on the error path where we may dereference NULL or leak an skb. This patch fixes them. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'gre_netns'	David S. Miller	2014-04-23	2	-23/+35
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Nicolas Dichtel says: ==================== gre: allow to switch netns during encap/decap The goal of this serie is to add x-netns support for the module ip_gre and ip6_gre, ie the encapsulation addresses and the network device are not owned by the same namespace. Example to configure an ipv4 gre tunnel: modprobe ip_gre ip netns add netns1 ip netns exec netns1 ip link set lo up ip link add gre1 type gre remote 10.16.0.121 local 10.16.0.249 ikey 10 okey 10 csum ip link set gre1 netns netns1 ip netns exec netns1 ip link set gre1 up ip netns exec netns1 ip addr add dev gre1 192.168.0.249 remote 192.168.0.121 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ip6gre: add x-netns support	Nicolas Dichtel	2014-04-23	1	-20/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch allows to switch the netns when packet is encapsulated or decapsulated. In other word, the encapsulated packet is received in a netns, where the lookup is done to find the tunnel. Once the tunnel is found, the packet is decapsulated and injecting into the corresponding interface which stands to another netns. When one of the two netns is removed, the tunnel is destroyed. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	gre: add x-netns support	Nicolas Dichtel	2014-04-23	1	-3/+3
\|/ \| \| \| \| \| \| \| \| \| \| \| \|	This patch allows to switch the netns when packet is encapsulated or decapsulated. In other word, the encapsulated packet is received in a netns, where the lookup is done to find the tunnel. Once the tunnel is found, the packet is decapsulated and injecting into the corresponding interface which stands to another netns. When one of the two netns is removed, the tunnel is destroyed. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	hyperv: Simplify the send_completion variables	Haiyang Zhang	2014-04-23	4	-16/+11
\| \| \| \| \| \| \| \|	The union contains only one member now, so we use the variables in it directly. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	hyperv: Remove recv_pkt_list and lock	Haiyang Zhang	2014-04-23	4	-198/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Removed recv_pkt_list and lock, and updated related code, so that the locking overhead is reduced especially when multiple channels are in use. The recv_pkt_list isn't actually necessary because the packets are processed sequentially in each channel. It has been replaced by a local variable, and the related lock for this list is also removed. The is_data_pkt field is not used in receive path, so its assignment is cleaned up. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'master' of ↵	David S. Miller	2014-04-22	16	-166/+369
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates This series contains updates to i40e and i40evf. Greg provides two patches for i40e, the first adds the netdev ops to support the addition of static FDB entries in the physical function (PF) MAC/VLAN filter table so that the virtual functions (VFs) can communicate with bridged virtual Ethernet ports such as those provided by the virtio driver. The second is to fix an issue where the assignment of a port VLAN after it is already up and running requires the VF driver to be reloaded, so print a message warning the host administrator about the need to reload the VF driver. In addition, knock the VF offline so that it does not continue to receive traffic not on the port VLAN assigned to it. Jesse provides a patch for i40e and i40evf to unhide and enable the PREFENA field in the receive host memory cache (RX-HMC) for best performance. Mitch provides a i40e patch to implement the net device op for Tx bandwidth setting. Catherine removes a firmware workaround that is no longer needed with the latest firmware for i40e. She also provides some minor cleanups as well bumps the driver versions. Anjali provides a fix for i40e displaying IPv4 flow director filters which needed additional information to be communicated up above in order for it to be displayed correctly. Shannon adds tracking of the NVM busy state so that the driver won't allow a new NVM update command until a completion event is received from the current update. Updates the admin queue API to reflect recent changes in the firmware. Also rearranges the "if netdev" logic to prepare for handling non-netdev VSIs. Lastly rework the fdir setup and tear down to use the newly created i40e_vsi_open() and i40e_vsi_close(), which also fixes a memory leak of the FDIR queue buffer info structs across a reset. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	i40e/i40evf: Bump build versions	Catherine Sullivan	2014-04-22	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Bump i40e to version 0.3.43 and i40evf to version 0.9.21. Change-ID: Ice4c715731bfa1dfc12dd45418675a3ba6e08d57 Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e: Tweak for-loop in i40e_ethtool.c	Catherine Sullivan	2014-04-22	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Tweak a for-loop to make it easier to add conditional stats in the future. Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e: Cleanup if/else statements	Catherine Sullivan	2014-04-22	1	-26/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Simplify some if/else statements in i40e_main.c Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e: rework fdir setup and teardown	Shannon Nelson	2014-04-22	1	-40/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use the newer i40e_vsi_open() and i40e_vsi_close() in the FDIR VSI lifetime. This makes sure we're using standard methods for all the VSI open and close paths. This also fixes a memory leak of the FDIR queue buffer info structs across a reset. Change-ID: I1b60a1b08ab923afe4f49810c2c7844d850e19b9 Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e: use generic vsi_open to unquiesce vsi	Shannon Nelson	2014-04-22	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use the new i40e_vsi_open() for waking VSIs back up in order to be sure all the standard actions happen. Change-ID: Ic3479410dd3079733f4951dcea69f101e69e77df Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e: abstract the close path for better netdev vsis	Shannon Nelson	2014-04-22	1	-15/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Abstract out the vsi close actions into a single function so they can be used correctly for both netdev and non-netdev based VSIs. Change-ID: I59e3d115fcb20e614a09477281b7787dd340d276 Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e: prep vsi_open logic for non-netdev cases	Shannon Nelson	2014-04-22	1	-15/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Rearrange the "if netdev" logic slightly to get ready for handling non-netdev VSIs. Change-ID: Ia0bfe13d4c994a2351a3c31fe725b75caeb397ee Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e/i40evf: update AdminQ API	Shannon Nelson	2014-04-22	2	-58/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reflect recent changes in firmware: - remove storm control - simplify PHY link management values - add partition bandwidth configuration Change-ID: If266ed2f9a89ad176cf8a74aeaef68613af76bc8 Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Acked-by: Greg Rose <gregory.v.rose@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e/i40evf: add tracking to NVM busy state	Shannon Nelson	2014-04-21	4	-0/+47
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The NVM updates take some time and are asynchronous actions that signal their completion with an AdminQ event. This code tracks when there is an NVM update outstanding and won't allow a new update command until a completion event is received from the current update. Change-ID: Ic132fe16bd9dc09b002ed38297a877c1a01553ce Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Acked-by: Mitch Williams <mitch.a.williams@intel.com> Acked-by: Greg Rose <gregory.v.rose@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e: Fix an issue with displaying IPv4 FD filters	Anjali Singhai Jain	2014-04-21	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The flow spec coming in for IPv4 filters is IP_USER_FLOW, which needed some more info to be communicated up above in order for it to be displayed correctly. Change-ID: Ia968238e0d7c4c4df12908ba81f0c4501280f3ec Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e: Remove a FW workaround	Catherine Sullivan	2014-04-21	1	-3/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove the FW workaround to increment the number of msix vectors. Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Acked-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e/i40evf: Bump build versions	Catherine Sullivan	2014-04-21	2	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Bump i40e to 0.3.41 and i40evf to 0.9.20. Change-ID: If49251a1a81a0f25e8f74bc8b7d086befb6df676 Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e: Enable VF Tx bandwidth setting	Mitch Williams	2014-04-21	4	-2/+98
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Implement the net device op for Tx bandwidth setting. Setting the Tx bandwidth is done by 'ip link set <PF device> vf <VF num> rate <Tx rate>', with the rate specified in Mbit/sec. The rate setting is displayed with 'ip link show'. Change-ID: I4d45dda8320632fdb6ec92c87d083e51070b46ab Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Acked-by: Shannon Nelson <shannon.nelson@intel.com> Acked-by: Greg Rose <gregory.v.rose@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e: Reset the VF upon conflicting VLAN configuration	Greg Rose	2014-04-21	1	-1/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a host VMM administrator hoses his VF by assigning a port VLAN after it is already up and running with implicit permission to set local VLANs then we print a message warning the host administrator that the VF driver needs to be reloaded. In addition we need to knock the VF offline so that it does not continue to receive traffic not on the port VLAN assigned to it. So we reset the VF. The VF will cease operation and the administrator will be forced to unload and reload the VF driver to make it work again. Change-ID: Iae1ae006b244e74e30a4ee546b3c5fca5cfb40aa Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e/i40evf: unhide and enable to one prefena field	Jesse Brandeburg	2014-04-21	4	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The PREFENA field in the receive host memory cache (RX-HMC) must be visible in order to be set to 1 at driver init for best performance. Change-ID: I16b0bcd84cf56f4b6c938201ff5e954bee5a1992 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Acked-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e: Add bridge FDB add/del/dump ops	Greg Rose	2014-04-21	1	-0/+97
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add the netdev ops to support addition of static FDB entries in the physical function (PF) MAC/VLAN filter table so that virtual functions (VFs) can communicate with bridged virtual Ethernet ports such as those provided by the virtio driver. Change-ID: Ifbd6817a75074e3b5cdf945a5635f26440bf15df Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* \|	Merge branch 'netlink-bind'	David S. Miller	2014-04-22	8	-35/+135
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Richard Guy Briggs says: ==================== audit: implement multicast socket for journald This is a patch set Eric Paris and I have been working on to add a restricted capability read-only netlink multicast socket to kernel audit to enable userspace clients such as systemd/journald to receive audit logs, in addition to the bidirectional auditd userspace client. Currently, auditd has the CAP_AUDIT_CONTROL and CAP_AUDIT_WRITE capabilities (but uses CAP_NET_ADMIN). The CAP_AUDIT_READ capability will be added for use by read-only AUDIT_NLGRP_READLOG multicast group clients to the kaudit subsystem. This will remove the dependence on CAP_NET_ADMIN for the multicast read-only socket. Patches 1-3 provide a way for per-protocol bind functions to signal an error and to be able to clean up after themselves. The first netfilter cleanup patch has already been accepted by a netfilter maintainer, though I don't see it upstream yet, so it is included for completeness. The second patch adds the per-protocol bind function return code to signal to the netlink code that no further processing should be done and to undo the work already done. V1: This rev fixes a bug introduced by flattening the code in the last posting. *V2: This rev moves the per-protocol bind call above the socket exposure call and refactors out the unbind procedure. The third provides a way per protocol to undo bind actions on DROP. Patches 4-6 implement the audit multicast socket with capability checking. The fourth patch adds the bind function capability check to multicast join requests for audit. The fifth patch adds the audit log read multicast group. An assumption has been made that systemd/journald reside in the initial network namespace. This could be changed to check the actual network namespace of systemd/journald should this assumption no longer be true since audit now supports all network namespaces. This version of the patch now directly sends the broadcast when the packet is ready rather than waiting until it passes the queue. The sixth checks if any clients actually exist before sending. Since the net tree is busier than the audit tree, conflicts are more likely and the audit patches depend on the net patches, it is proposed to have the net tree carry this entire patchset for 3.16. Are the net maintainers ok with this? https://bugzilla.redhat.com/show_bug.cgi?id=887992 First posted: https://www.redhat.com/archives/linux-audit/2013-January/msg00008.html https://lkml.org/lkml/2013/1/27/279 Please find source for a test program at: http://people.redhat.com/rbriggs/audit-multicast-listen/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	audit: send multicast messages only if there are listeners	Richard Guy Briggs	2014-04-22	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Test first to see if there are any userspace multicast listeners bound to the socket before starting the multicast send work. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	audit: add netlink multicast group for log read	Richard Guy Briggs	2014-04-22	2	-4/+55
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a netlink multicast socket with one group to kaudit for "best-effort" delivery to read-only userspace clients such as systemd, in addition to the existing bidirectional unicast auditd userspace client. Currently, auditd is intended to use the CAP_AUDIT_CONTROL and CAP_AUDIT_WRITE capabilities, but actually uses CAP_NET_ADMIN. The CAP_AUDIT_READ capability is added for use by read-only AUDIT_NLGRP_READLOG netlink multicast group clients to the kaudit subsystem. This will safely give access to services such as systemd to consume audit logs while ensuring write access remains restricted for integrity. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	audit: add netlink audit protocol bind to check capabilities on multicast join	Richard Guy Briggs	2014-04-22	3	-2/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Register a netlink per-protocol bind fuction for audit to check userspace process capabilities before allowing a multicast group connection. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	netlink: implement unbind to netlink_setsockopt NETLINK_DROP_MEMBERSHIP	Richard Guy Briggs	2014-04-22	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Call the per-protocol unbind function rather than bind function on NETLINK_DROP_MEMBERSHIP in netlink_setsockopt(). Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	netlink: have netlink per-protocol bind function return an error code.	Richard Guy Briggs	2014-04-22	4	-24/+56
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Have the netlink per-protocol optional bind function return an int error code rather than void to signal a failure. This will enable netlink protocols to perform extra checks including capabilities and permissions verifications when updating memberships in multicast groups. In netlink_bind() and netlink_setsockopt() the call to the per-protocol bind function was moved above the multicast group update to prevent any access to the multicast socket groups before checking with the per-protocol bind function. This will enable the per-protocol bind function to be used to check permissions which could be denied before making them available, and to avoid the messy job of undoing the addition should the per-protocol bind function fail. The netfilter subsystem seems to be the only one currently using the per-protocol bind function. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	netlink: simplify nfnetlink_bind	Richard Guy Briggs	2014-04-22	1	-5/+2
\|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove duplicity and simplify code flow by moving the rcu_read_unlock() above the condition and let the flow control exit naturally at the end of the function. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	filter: added BPF random opcode	Chema Gonzalez	2014-04-22	6	-2/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Added a new ancillary load (bpf call in eBPF parlance) that produces a 32-bit random number. We are implementing it as an ancillary load (instead of an ISA opcode) because (a) it is simpler, (b) allows easy JITing, and (c) seems more in line with generic ISAs that do not have "get a random number" as a instruction, but as an OS call. The main use for this ancillary load is to perform random packet sampling. Signed-off-by: Chema Gonzalez <chema@google.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	vlan: unnecessary to check if vlan_pcpu_stats is NULL	Li RongQing	2014-04-22	1	-30/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	if allocating memory for vlan_pcpu_stats failed, the device can not be operated Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	be2net: Support for configurable RSS hash key	Venkata Duvvuru	2014-04-22	5	-31/+113
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This be2net patch implements the get/set_rxfh() ethtool hooks. RSS_CONFIG device command is invoked to set hashkey and indirection table. It also uses an initial random value for RSS hash key instead of a hard-coded value as hard-coded values for a hash-key are usually considered a security risk. Signed-off-by: Venkat Duvvuru <VenkatKumar.Duvvuru@Emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	ethtool: Support for configurable RSS hash key	Venkata Duvvuru	2014-04-22	3	-14/+252
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This ethtool patch primarily copies the ioctl command data structures from/to the User space and invokes the driver hook. Signed-off-by: Venkat Duvvuru <VenkatKumar.Duvvuru@Emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	neterion/s2io: remove unused s2io_start_tx_queue routine	Ying Xue	2014-04-22	1	-9/+0
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	tcp: avoid retransmits of TCP packets hanging in host queues	Eric Dumazet	2014-04-22	1	-8/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In commit 0e280af026a5 ("tcp: introduce TCPSpuriousRtxHostQueues SNMP counter") we added a logic to detect when a packet was retransmitted while the prior clone was still in a qdisc or driver queue. We are now confident we can do better, and catch the problem before we fragment a TSO packet before retransmit, or in TLP path. This patch fully exploits the logic by simply canceling the spurious retransmit. Original packet is in a queue and will eventually leave the host. This helps to avoid network collapses when some events make the RTO estimations very wrong, particularly when dealing with huge number of sockets with synchronized blast. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	ipv6: support IFA_F_MANAGETEMPADDR for address deletion too	Heiner Kallweit	2014-04-22	1	-4/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Userspace applications can use IFA_F_MANAGETEMPADDR with RTM_NEWADDR already to indicate that the kernel should take care of temporary address management. This patch adds related functionality to RTM_DELADDR. By setting IFA_F_MANAGETEMPADDR a userspace application can indicate that the kernel should delete all related temporary addresses as well. A corresponding patch for the "ip addr del" command has been applied to iproute2 already. Signed-off-by: Heiner Kallweit <heiner.kallweit@web.de> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Merge branch 'tipc-next'	David S. Miller	2014-04-22	14	-205/+215
\|\ \ \| \|/ \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Ying Xue says: ==================== purge tipc_net_lock Now tipc routing hierarchy comprises the structures 'node', 'link'and 'bearer'. The whole hierarchy is protected by a big read/write lock, tipc_net_lock, to ensure that nothing is added or removed while code is accessing any of these structures. Obviously the locking policy makes node, link and bearer components closely bound together so that their relationship becomes unnecessarily complex. In the worst case, such locking policy not only has a negative influence on performance, but also it's prone to lead to deadlock occasionally. In order o decouple the complex relationship between bearer and node as well as link, the locking policy is adjusted as follows: - Bearer level RTNL lock is used on update side, and RCU is used on read side. Meanwhile, all bearer instances including broadcast bearer are saved into bearer_list array. - Node and link level All node instances are saved into two tipc_node_list and node_htable lists. The two lists are protected by node_list_lock on write side, and they are guarded with RCU lock on read side. All members in node structure including link instances are protected by node spin lock. - The relationship between bearer and node When link accesses bearer, it first needs to find the bearer with its bearer identity from the bearer_list array. When bearer accesses node, it can iterate the node_htable hash list with the node address to find the corresponding node. In the new locking policy, every component has its private locking solution and the relationship between bearer and node is very simple, that is, they can find each other with node address or bearer identity from node_htable hash list or bearer_list array. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tipc: fix race in disc create/delete	Ying Xue	2014-04-22	3	-20/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit a21a584d6720ce349b05795b9bcfab3de8e58419 (tipc: fix neighbor detection problem after hw address change) introduces a race condition involving tipc_disc_delete() and tipc_disc_add/remove_dest that can cause TIPC to dereference the pointer to the bearer discovery request structure after it has been freed since a stray pointer is left in the bearer structure. In order to fix the issue, the process of resetting the discovery request handler is optimized: the discovery request handler and request buffer are just reset instead of being freed, allocated and initialized. As the request point is always valid and the request's lock is taken while the request handler is reset, the race doesn't happen any more. Reported-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tipc: use bc_lock to protect node map in bearer structure	Ying Xue	2014-04-22	3	-12/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The node map variable - 'nodes' in bearer structure is only used by bclink. When bclink accesses it, bc_lock is held. But when change it, for instance, in tipc_bearer_add_dest() or tipc_bearer_remove_dest() the bc_lock is not taken at all. To avoid any inconsistent data, we should always grab bc_lock while accessing node map variable. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tipc: use bearer_disable to disable bearer in tipc_l2_device_event	Ying Xue	2014-04-22	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As bearer pointer is known in tipc_l2_device_event(), it's unnecessary to search it again in tipc_disable_bearer(). If tipc_disable_bearer() is replaced with bearer_disable() in tipc_l2_device_event(), this will help us save a bit time when bearer is disabled. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tipc: make media_ptr pointed netdevice valid	Ying Xue	2014-04-22	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The 'media_ptr' pointer in bearer structure which points to network device, is protected by RCU. So, before netdevice is released, synchronize_net() should be involved to prevent no any user of the netdevice on read side from accessing it after it is freed. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tipc: purge tipc_net_lock lock	Ying Xue	2014-04-22	7	-100/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now tipc routing hierarchy comprises the structures 'node', 'link'and 'bearer'. The whole hierarchy is protected by a big read/write lock, tipc_net_lock, to ensure that nothing is added or removed while code is accessing any of these structures. Obviously the locking policy makes node, link and bearer components closely bound together so that their relationship becomes unnecessarily complex. In the worst case, such locking policy not only has a negative influence on performance, but also it's prone to lead to deadlock occasionally. In order o decouple the complex relationship between bearer and node as well as link, the locking policy is adjusted as follows: - Bearer level RTNL lock is used on update side, and RCU is used on read side. Meanwhile, all bearer instances including broadcast bearer are saved into bearer_list array. - Node and link level All node instances are saved into two tipc_node_list and node_htable lists. The two lists are protected by node_list_lock on write side, and they are guarded with RCU lock on read side. All members in node structure including link instances are protected by node spin lock. - The relationship between bearer and node When link accesses bearer, it first needs to find the bearer with its bearer identity from the bearer_list array. When bearer accesses node, it can iterate the node_htable hash list with the node address to find the corresponding node. In the new locking policy, every component has its private locking solution and the relationship between bearer and node is very simple, that is, they can find each other with node address or bearer identity from node_htable hash list or bearer_list array. Until now above all changes have been done, so tipc_net_lock can be removed safely. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tipc: use RCU to protect media_ptr pointer	Ying Xue	2014-04-22	2	-4/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now the media_ptr pointer is protected with tipc_net_lock write lock on write side; tipc_net_lock read lock is used to read side. As the part of effort of eliminating tipc_net_lock, we decide to adjust the locking policy of media_ptr pointer protection: on write side, RTNL lock is use while on read side RCU read lock is applied. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tipc: decouple the relationship between bearer and link	Ying Xue	2014-04-22	7	-46/+88
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently on both paths of message transmission and reception, the read lock of tipc_net_lock must be held before bearer is accessed, while the write lock of tipc_net_lock has to be taken before bearer is configured. Although it can ensure that bearer is always valid on the two data paths, link and bearer is closely bound together. So as the part of effort of removing tipc_net_lock, the locking policy of bearer protection will be adjusted as below: on the two data paths, RCU is used, and on the configuration path of bearer, RTNL lock is applied. Now RCU just covers the path of message reception. To make it possible to protect the path of message transmission with RCU, link should not use its stored bearer pointer to access bearer, but it should use the bearer identity of its attached bearer as index to get bearer instance from bearer_list array, which can help us decouple the relationship between bearer and link. As a result, bearer on the path of message transmission can be safely protected by RCU when we access bearer_list array within RCU lock protection. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tipc: convert bearer_list to RCU list	Ying Xue	2014-04-22	3	-13/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Convert bearer_list to RCU list. It's protected by RTNL lock on update side, and RCU read lock is applied to read side. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tipc: use RTNL lock to protect tipc_net_stop routine	Ying Xue	2014-04-22	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As the tipc network initialization(ie, tipc_net_start routine) is under RTNL protection, its corresponding deinitialization part(ie, tipc_net_stop routine) should be protected by RTNL too. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>