| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
| |
When applicable verify that the caller has permisson to the underlying
network namespace for a newly created network device.
Similary checks exist for the network namespace a network device will
be created in.
Fixes: 317f4810e45e ("rtnl: allow to create device with IFLA_LINK_NETNSID set")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When applicable verify that the caller has permision to create a
network device in another network namespace. This check is already
present when moving a network device between network namespaces in
setlink so all that is needed is to duplicate that check in newlink.
This change almost backports cleanly, but there are context conflicts
as the code that follows was added in v4.0-rc1
Fixes: b51642f6d77b net: Enable a userns root rtnl calls that are safe for unprivilged users
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, all real users of rhashtable default their grow and shrink
decision functions to rht_grow_above_75() and rht_shrink_below_30(),
so that there's currently no need to have this explicitly selectable.
It can/should be generic and private inside rhashtable until a real
use case pops up. Since we can make this private, we'll save us this
additional indirection layer and can improve insertion/deletion time
as well.
Reference: http://patchwork.ozlabs.org/patch/443040/
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before da413eec729d ("packet: Fixed TPACKET V3 to signal poll when block is
closed rather than every packet") poll listening for an af_packet socket was
not signaled if there was no packets to process. After the patch poll is
signaled evety time when block retire timer expires. That happens because
af_packet closes the current block on timeout even if the block is empty.
Passing empty blocks to the user not only wastes CPU but also wastes ring
buffer space increasing probability of packets dropping on small timeouts.
Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com>
Cc: Dan Collins <dan@dcollins.co.nz>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Guy Harris <guy@alum.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
| |
Arrays (when not in a struct) "shall have a value greater than zero".
GCC complains when it's not the case here.
Fixes: ba7d49b1f0 ("rtnetlink: provide api for getting and setting slave info")
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently we don't check if the new MTU is valid or not and this allows
one to configure a smaller than minimum allowed by RFCs or even bigger
than interface own MTU, which is a problem as it may lead to packet
drops.
If you have a daemon like NetworkManager running, this may be exploited
by remote attackers by forging RA packets with an invalid MTU, possibly
leading to a DoS. (NetworkManager currently only validates for values
too small, but not for too big ones.)
The fix is just to make sure the new value is valid. That is, between
IPV6_MIN_MTU and interface's MTU.
Note that similar check is already performed at
ndisc_router_discovery(), for when kernel itself parses the RA.
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With commit a7526eb5d06b (net: Unbreak compat_sys_{send,recv}msg), the
MSG_CMSG_COMPAT flag is blocked at the compat syscall entry points,
changing the kernel compat behaviour from the one before the commit it
was trying to fix (1be374a0518a, net: Block MSG_CMSG_COMPAT in
send(m)msg and recv(m)msg).
On 32-bit kernels (!CONFIG_COMPAT), MSG_CMSG_COMPAT is 0 and the native
32-bit sys_sendmsg() allows flag 0x80000000 to be set (it is ignored by
the kernel). However, on a 64-bit kernel, the compat ABI is different
with commit a7526eb5d06b.
This patch changes the compat_sys_{send,recv}msg behaviour to the one
prior to commit 1be374a0518a.
The problem was found running 32-bit LTP (sendmsg01) binary on an arm64
kernel. Arguably, LTP should not pass 0xffffffff as flags to sendmsg()
but the general rule is not to break user ABI (even when the user
behaviour is not entirely sane).
Fixes: a7526eb5d06b (net: Unbreak compat_sys_{send,recv}msg)
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use helper functions to access current->state.
Direct assignments are prone to races and therefore buggy.
current->state = TASK_RUNNING can be replaced by __set_current_state()
Thanks to Peter Zijlstra for the exact definition of the problem.
Suggested-By: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
| |
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
tcp_should_expand_sndbuf() does not expand the send buffer if we have
filled the congestion window.
However, it should use tcp_packets_in_flight() instead of
tp->packets_out to make this check.
Testing has established that the difference matters a lot if there are
many SACKed packets, causing a needless performance shortfall.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Trying to use burst capability (aka xmit_more) on a virtual device
like bonding is not supported.
For example, skb might be queued multiple times on a qdisc, with
various list corruptions.
Fixes: 38b2cf2982dc ("net: pktgen: packet bursting via skb->xmit_more")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Packets defragmentation was introduced for PACKET_FANOUT_HASH only,
see 7736d33f4262 ("packet: Add pre-defragmentation support for ipv4
fanouts")
It may be useful to have defragmentation enabled regardless of
fanout type. Without that, the AF_PACKET user may have to:
1. Collect fragments from different rings
2. Defragment by itself
Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
| |
colons are used as a separator in netdev device lookup in dev_ioctl.c
Specific functions are SIOCGIFTXQLEN SIOCETHTOOL SIOCSIFNAME
Signed-off-by: Matthew Thode <mthode@mthode.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains updates for your net tree, they are:
1) Fix removal of destination in IPVS when the new mixed family support
is used, from Alexey Andriyanov via Simon Horman.
2) Fix module refcount undeflow in nft_compat when reusing a match /
target.
3) Fix iptables-restore when the recent match is used with a new hitcount
that exceeds threshold, from Florian Westphal.
4) Fix stack corruption in xt_socket due to using stack storage to save
the inner IPv6 header, from Eric Dumazet.
I'll follow up soon with another batch with more fixes that are still
cooking.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
As soon as extract_icmp6_fields() returns, its local storage (automatic
variables) is deallocated and can be overwritten.
Lets add an additional parameter to make sure storage is valid long
enough.
While we are at it, adds some const qualifiers.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: b64c9256a9b76 ("tproxy: added IPv6 support to the socket match")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
given:
-A INPUT -m recent --update --seconds 30 --hitcount 4
and
iptables-save > foo
then
iptables-restore < foo
will fail with:
kernel: xt_recent: hitcount (4) is larger than packets to be remembered (4) for table DEFAULT
Even when the check is fixed, the restore won't work if the hitcount is
increased to e.g. 6, since by the time checkentry runs it will find the
'old' incarnation of the table.
We can avoid this by increasing the maximum threshold silently; we only
have to rm all the current entries of the table (these entries would
not have enough room to handle the increased hitcount).
This even makes (not-very-useful)
-A INPUT -m recent --update --seconds 30 --hitcount 4
-A INPUT -m recent --update --seconds 30 --hitcount 42
work.
Fixes: abc86d0f99242b7f142b (netfilter: xt_recent: relax ip_pkt_list_tot restrictions)
Tracked-down-by: Chris Vine <chris@cvine.freeserve.co.uk>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Feb 12 18:20:42 nfdev kernel: ------------[ cut here ]------------
Feb 12 18:20:42 nfdev kernel: WARNING: CPU: 4 PID: 4359 at kernel/module.c:963 module_put+0x9b/0xba()
Feb 12 18:20:42 nfdev kernel: CPU: 4 PID: 4359 Comm: ebtables-compat Tainted: G W 3.19.0-rc6+ #43
[...]
Feb 12 18:20:42 nfdev kernel: Call Trace:
Feb 12 18:20:42 nfdev kernel: [<ffffffff815fd911>] dump_stack+0x4c/0x65
Feb 12 18:20:42 nfdev kernel: [<ffffffff8103e6f7>] warn_slowpath_common+0x9c/0xb6
Feb 12 18:20:42 nfdev kernel: [<ffffffff8109919f>] ? module_put+0x9b/0xba
Feb 12 18:20:42 nfdev kernel: [<ffffffff8103e726>] warn_slowpath_null+0x15/0x17
Feb 12 18:20:42 nfdev kernel: [<ffffffff8109919f>] module_put+0x9b/0xba
Feb 12 18:20:42 nfdev kernel: [<ffffffff813ecf7c>] nft_match_destroy+0x45/0x4c
Feb 12 18:20:42 nfdev kernel: [<ffffffff813e683f>] nf_tables_rule_destroy+0x28/0x70
Reported-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Tested-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The current code prevents any operation with a mixed-family dest
unless IP_VS_CONN_F_TUNNEL flag is set. The problem is that it's impossible
for the client to follow this rule, because ip_vs_genl_parse_dest does
not even read the destination conn_flags when cmd = IPVS_CMD_DEL_DEST
(need_full_dest = 0).
Also, not every client can pass this flag when removing a dest. ipvsadm,
for example, does not support the "-i" command line option together with
the "-d" option.
This change disables any checks for mixed-family on IPVS_CMD_DEL_DEST command.
Signed-off-by: Alexey Andriyanov <alan@al-an.info>
Fixes: bc18d37f676f ("ipvs: Allow heterogeneous pools now that we support them")
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The cfpkt_iterate() function can return -EPROTO on error, but the
function is a u16 so the negative value gets truncated to a positive
unsigned short. This causes a static checker warning.
The only caller which might care is cffrml_receive(), when it's checking
the frame checksum. I modified cffrml_receive() so that it never says
-EPROTO is a valid checksum.
Also this isn't ever going to be inlined so I removed the "inline".
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
commit aafb3e98b279 (netdev: introduce new NETIF_F_HW_SWITCH_OFFLOAD feature
flag for switch device offloads) add a new feature without adding it to
netdev_features_strings array; this patch fixes this.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Non NAPI drivers can call skb_tstamp_tx() and then sock_queue_err_skb()
from hard IRQ context.
Therefore, sock_dequeue_err_skb() needs to block hard irq or
corruptions or hangs can happen.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 364a9e93243d1 ("sock: deduplicate errqueue dequeue")
Fixes: cb820f8e4b7f7 ("net: Provide a generic socket error queue delivery method for Tx time stamps.")
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Open vSwitch allows moving internal vport to different namespace
while still connected to the bridge. But when namespace deleted
OVS does not detach these vports, that results in dangling
pointer to netdevice which causes kernel panic as follows.
This issue is fixed by detaching all ovs ports from the deleted
namespace at net-exit.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
IP: [<ffffffffa0aadaa5>] ovs_vport_locate+0x35/0x80 [openvswitch]
Oops: 0000 [#1] SMP
Call Trace:
[<ffffffffa0aa6391>] lookup_vport+0x21/0xd0 [openvswitch]
[<ffffffffa0aa65f9>] ovs_vport_cmd_get+0x59/0xf0 [openvswitch]
[<ffffffff8167e07c>] genl_family_rcv_msg+0x1bc/0x3e0
[<ffffffff8167e319>] genl_rcv_msg+0x79/0xc0
[<ffffffff8167d919>] netlink_rcv_skb+0xb9/0xe0
[<ffffffff8167deac>] genl_rcv+0x2c/0x40
[<ffffffff8167cffd>] netlink_unicast+0x12d/0x1c0
[<ffffffff8167d3da>] netlink_sendmsg+0x34a/0x6b0
[<ffffffff8162e140>] sock_sendmsg+0xa0/0xe0
[<ffffffff8162e5e8>] ___sys_sendmsg+0x408/0x420
[<ffffffff8162f541>] __sys_sendmsg+0x51/0x90
[<ffffffff8162f592>] SyS_sendmsg+0x12/0x20
[<ffffffff81764ee9>] system_call_fastpath+0x12/0x17
Reported-by: Assaf Muller <amuller@redhat.com>
Fixes: 46df7b81454("openvswitch: Add support for network namespaces.")
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Reviewed-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In tcf_em_validate(), after calling request_module() to load the
kind-specific module, set em->ops to NULL before returning -EAGAIN, so
that module_put() is not called again by tcf_em_tree_destroy().
Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
skb_copy_bits() returns zero on success and negative value on error,
so it is needed to invert the condition in ip_check_defrag().
Fixes: 1bf3751ec90c ("ipv4: ip_check_defrag must not modify skb before unsharing")
Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The gnet_stats_copy_app() function gets called, more often than not, with its
second argument a pointer to an automatic variable in the caller's stack.
Therefore, to avoid copying garbage afterwards when calling
gnet_stats_finish_copy(), this data is better copied to a dynamically allocated
memory that gets freed after use.
[xiyou.wangcong@gmail.com: remove a useless kfree()]
Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Pull networking updates from David Miller:
1) Missing netlink attribute validation in nft_lookup, from Patrick
McHardy.
2) Restrict ipv6 partial checksum handling to UDP, since that's the
only case it works for. From Vlad Yasevich.
3) Clear out silly device table sentinal macros used by SSB and BCMA
drivers. From Joe Perches.
4) Make sure the remote checksum code never creates a situation where
the remote checksum is applied yet the tunneling metadata describing
the remote checksum transformation is still present. Otherwise an
external entity might see this and apply the checksum again. From
Tom Herbert.
5) Use msecs_to_jiffies() where applicable, from Nicholas Mc Guire.
6) Don't explicitly initialize timer struct fields, use setup_timer()
and mod_timer() instead. From Vaishali Thakkar.
7) Don't invoke tg3_halt() without the tp->lock held, from Jun'ichi
Nomura.
8) Missing __percpu annotation in ipvlan driver, from Eric Dumazet.
9) Don't potentially perform skb_get() on shared skbs, also from Eric
Dumazet.
10) Fix COW'ing of metrics for non-DST_HOST routes in ipv6, from Martin
KaFai Lau.
11) Fix merge resolution error between the iov_iter changes in vhost and
some bug fixes that occurred at the same time. From Jason Wang.
12) If rtnl_configure_link() fails we have to perform a call to
->dellink() before unregistering the device. From WANG Cong.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (39 commits)
net: dsa: Set valid phy interface type
rtnetlink: call ->dellink on failure when ->newlink exists
com20020-pci: add support for eae single card
vhost_net: fix wrong iter offset when setting number of buffers
net: spelling fixes
net/core: Fix warning while make xmldocs caused by dev.c
net: phy: micrel: disable NAND-tree for KSZ8021, KSZ8031, KSZ8051, KSZ8081
ipv6: fix ipv6_cow_metrics for non DST_HOST case
openvswitch: Fix key serialization.
r8152: restore hw settings
hso: fix rx parsing logic when skb allocation fails
tcp: make sure skb is not shared before using skb_get()
bridge: netfilter: Move sysctl-specific error code inside #ifdef
ipv6: fix possible deadlock in ip6_fl_purge / ip6_fl_gc
ipvlan: add a missing __percpu pcpu_stats
tg3: Hold tp->lock before calling tg3_halt() from tg3_init_one()
bgmac: fix device initialization on Northstar SoCs (condition typo)
qlcnic: Delete existing multicast MAC list before adding new
net/mlx5_core: Fix configuration of log_uar_page_sz
sunvnet: don't change gso data on clones
...
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If the phy interface mode is not found in devicetree, or if devicetree
is not configured, of_get_phy_mode returns -ENODEV. The current code
sets the phy interface mode to the return value from of_get_phy_mode
without checking if it is valid.
This invalid phy interface mode is passed as parameter to of_phy_connect
or to phy_connect_direct. This sets the phy interface mode to the invalid
value, which in turn causes problems for any code using phydev->interface.
Fixes: b31f65fb4383 ("net: dsa: slave: Fix autoneg for phys on switch MDIO bus")
Fixes: 0d8bcdd383b8 ("net: dsa: allow for more complex PHY setups")
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Ignacy reported that when eth0 is down and add a vlan device
on top of it like:
ip link add link eth0 name eth0.1 up type vlan id 1
We will get a refcount leak:
unregister_netdevice: waiting for eth0.1 to become free. Usage count = 2
The problem is when rtnl_configure_link() fails in rtnl_newlink(),
we simply call unregister_device(), but for stacked device like vlan,
we almost do nothing when we unregister the upper device, more work
is done when we unregister the lower device, so call its ->dellink().
Reported-by: Ignacy Gawedzki <ignacy.gawedzki@green-communications.fr>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Spelling errors caught by codespell.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch fix following warning wile make xmldocs.
Warning(.//net/core/dev.c:5345): No description found
for parameter 'bonding_info'
Warning(.//net/core/dev.c:5345): Excess function parameter
'netdev_bonding_info' description in 'netdev_bonding_info_change'
This warning starts to appear after following patch was added
into Linus's tree during merger period.
commit 61bd3857ff2c7daf756d49b41e6277bbdaa8f789
net/core: Add event for a change in slave state
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
ipv6_cow_metrics() currently assumes only DST_HOST routes require
dynamic metrics allocation from inetpeer. The assumption breaks
when ndisc discovered router with RTAX_MTU and RTAX_HOPLIMIT metric.
Refer to ndisc_router_discovery() in ndisc.c and note that dst_metric_set()
is called after the route is created.
This patch creates the metrics array (by calling dst_cow_metrics_generic) in
ipv6_cow_metrics().
Test:
radvd.conf:
interface qemubr0
{
AdvLinkMTU 1300;
AdvCurHopLimit 30;
prefix fd00:face:face:face::/64
{
AdvOnLink on;
AdvAutonomous on;
AdvRouterAddr off;
};
};
Before:
[root@qemu1 ~]# ip -6 r show | egrep -v unreachable
fd00:face:face:face::/64 dev eth0 proto kernel metric 256 expires 27sec
fe80::/64 dev eth0 proto kernel metric 256
default via fe80::74df:d0ff:fe23:8ef2 dev eth0 proto ra metric 1024 expires 27sec
After:
[root@qemu1 ~]# ip -6 r show | egrep -v unreachable
fd00:face:face:face::/64 dev eth0 proto kernel metric 256 expires 27sec mtu 1300
fe80::/64 dev eth0 proto kernel metric 256 mtu 1300
default via fe80::74df:d0ff:fe23:8ef2 dev eth0 proto ra metric 1024 expires 27sec mtu 1300 hoplimit 30
Fixes: 8e2ec639173f325 (ipv6: don't use inetpeer to store metrics for routes.)
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Fix typo where mask is used rather than key.
Fixes: 74ed7ab9264("openvswitch: Add support for unique flow IDs.")
Reported-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
IPv6 can keep a copy of SYN message using skb_get() in
tcp_v6_conn_request() so that caller wont free the skb when calling
kfree_skb() later.
Therefore TCP fast open has to clone the skb it is queuing in
child->sk_receive_queue, as all skbs consumed from receive_queue are
freed using __kfree_skb() (ie assuming skb->users == 1)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Fixes: 5b7ed0892f2af ("tcp: move fastopen functions to tcp_fastopen.c")
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If CONFIG_SYSCTL=n:
net/bridge/br_netfilter.c: In function ‘br_netfilter_init’:
net/bridge/br_netfilter.c:996: warning: label ‘err1’ defined but not used
Move the label and the code after it inside the existing #ifdef to get
rid of the warning.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Use spin_lock_bh in ip6_fl_purge() to prevent following potentially
deadlock scenario between ip6_fl_purge() and ip6_fl_gc() timer.
=================================
[ INFO: inconsistent lock state ]
3.19.0 #1 Not tainted
---------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
swapper/5/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
(ip6_fl_lock){+.?...}, at: [<ffffffff8171155d>] ip6_fl_gc+0x2d/0x180
{SOFTIRQ-ON-W} state was registered at:
[<ffffffff810ee9a0>] __lock_acquire+0x4a0/0x10b0
[<ffffffff810efd54>] lock_acquire+0xc4/0x2b0
[<ffffffff81751d2d>] _raw_spin_lock+0x3d/0x80
[<ffffffff81711798>] ip6_flowlabel_net_exit+0x28/0x110
[<ffffffff815f9759>] ops_exit_list.isra.1+0x39/0x60
[<ffffffff815fa320>] cleanup_net+0x100/0x1e0
[<ffffffff810ad80a>] process_one_work+0x20a/0x830
[<ffffffff810adf4b>] worker_thread+0x11b/0x460
[<ffffffff810b42f4>] kthread+0x104/0x120
[<ffffffff81752bfc>] ret_from_fork+0x7c/0xb0
irq event stamp: 84640
hardirqs last enabled at (84640): [<ffffffff81752080>] _raw_spin_unlock_irq+0x30/0x50
hardirqs last disabled at (84639): [<ffffffff81751eff>] _raw_spin_lock_irq+0x1f/0x80
softirqs last enabled at (84628): [<ffffffff81091ad1>] _local_bh_enable+0x21/0x50
softirqs last disabled at (84629): [<ffffffff81093b7d>] irq_exit+0x12d/0x150
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(ip6_fl_lock);
<Interrupt>
lock(ip6_fl_lock);
*** DEADLOCK ***
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Change remote checksum handling to set checksum partial as default
behavior. Added an iflink parameter to configure not using
checksum partial (calling csum_partial to update checksum).
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch adds infrastructure so that remote checksum offload can
set CHECKSUM_PARTIAL instead of calling csum_partial and writing
the modfied checksum field.
Add skb_remcsum_adjust_partial function to set an skb for using
CHECKSUM_PARTIAL with remote checksum offload. Changed
skb_remcsum_process and skb_gro_remcsum_process to take a boolean
argument to indicate if checksum partial can be set or the
checksum needs to be modified using the normal algorithm.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Properly set GSO types and skb->encapsulation in the UDP tunnel GRO
complete so that packets are properly represented for GSO. This sets
SKB_GSO_UDP_TUNNEL or SKB_GSO_UDP_TUNNEL_CSUM depending on whether
non-zero checksums were received, and sets SKB_GSO_TUNNEL_REMCSUM if
the remote checksum option was processed.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Remote checksum offload processing is currently the same for both
the GRO and non-GRO path. When the remote checksum offload option
is encountered, the checksum field referred to is modified in
the packet. So in the GRO case, the packet is modified in the
GRO path and then the operation is skipped when the packet goes
through the normal path based on skb->remcsum_offload. There is
a problem in that the packet may be modified in the GRO path, but
then forwarded off host still containing the remote checksum option.
A remote host will again perform RCO but now the checksum verification
will fail since GRO RCO already modified the checksum.
To fix this, we ensure that GRO restores a packet to it's original
state before returning. In this model, when GRO processes a remote
checksum option it still changes the checksum per the algorithm
but on return from lower layer processing the checksum is restored
to its original value.
In this patch we add define gro_remcsum structure which is passed
to skb_gro_remcsum_process to save offset and delta for the checksum
being changed. After lower layer processing, skb_gro_remcsum_cleanup
is called to restore the checksum before returning from GRO.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
net/openvswitch/flow_netlink.c: In function ‘validate_and_copy_set_tun’:
net/openvswitch/flow_netlink.c:1749: warning: ‘err’ may be used uninitialized in this function
If ipv4_tun_from_nlattr() returns a different positive value than
OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS, err will be uninitialized, and
validate_and_copy_set_tun() may return an undefined value instead of a
zero success indicator. Initialize err to zero to fix this.
Fixes: 1dd144cf5b4b47e1 ("openvswitch: Support VXLAN Group Policy extension")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Userspace packet execute command pass down flow key for given
packet. But userspace can skip some parameter with zero value.
Therefore kernel needs to initialize key metadata to zero.
Fixes: 0714812134 ("openvswitch: Eliminate memset() from flow_extract.")
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When the RDS transport is TCP, we cannot inline the call to rds_send_xmit
from rds_cong_queue_update because
(a) we are already holding the sock_lock in the recv path, and
will deadlock when tcp_setsockopt/tcp_sendmsg try to get the sock
lock
(b) cong_queue_update does an irqsave on the rds_cong_lock, and this
will trigger warnings (for a good reason) from functions called
out of sock_lock.
This patch reverts the change introduced by
2fa57129d ("RDS: Bypass workqueue when queueing cong updates").
The patch has been verified for both RDS/TCP as well as RDS/RDMA
to ensure that there are not regressions for either transport:
- for verification of RDS/TCP a client-server unit-test was used,
with the server blocked in gdb and thus unable to drain its rcvbuf,
eventually triggering a RDS congestion update.
- for RDS/RDMA, the standard IB regression tests were used
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
ip6_append_data is used by other protocols and some of them can't
be partially checksummed. Only partially checksum UDP protocol.
Fixes: 32dce968dd987a (ipv6: Allow for partial checksums on non-ufo packets)
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Tested-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |\ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains two small Netfilter updates for your
net-next tree, they are:
1) Add ebtables support to nft_compat, from Arturo Borrero.
2) Fix missing validation of the SET_ID attribute in the lookup
expressions, from Patrick McHardy.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | | |
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This patch extends nft_compat to support ebtables extensions.
ebtables verdict codes are translated to the ones used by the nf_tables engine,
so we can properly use ebtables target extensions from nft_compat.
This patch extends previous work by Giuseppe Longo <giuseppelng@gmail.com>.
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
printk and friends can now format bitmaps using '%*pb[l]'. cpumask
and nodemask also provide cpumask_pr_args() and nodemask_pr_args()
respectively which can be used to generate the two printf arguments
necessary to format the specified cpu/nodemask.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Move memcg_socket_limit_enabled decrement to tcp_destroy_cgroup (called
from memcg_destroy_kmem -> mem_cgroup_sockets_destroy) and zap a bunch of
wrapper functions.
Although this patch moves static keys decrement from __mem_cgroup_free to
mem_cgroup_css_free, it does not introduce any functional changes, because
the keys are incremented on setting the limit (tcp or kmem), which can
only happen after successful mem_cgroup_css_online.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtisu.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Pull nfsd updates from Bruce Fields:
"The main change is the pNFS block server support from Christoph, which
allows an NFS client connected to shared disk to do block IO to the
shared disk in place of NFS reads and writes. This also requires xfs
patches, which should arrive soon through the xfs tree, barring
unexpected problems. Support for other filesystems is also possible
if there's interest.
Thanks also to Chuck Lever for continuing work to get NFS/RDMA into
shape"
* 'for-3.20' of git://linux-nfs.org/~bfields/linux: (32 commits)
nfsd: default NFSv4.2 to on
nfsd: pNFS block layout driver
exportfs: add methods for block layout exports
nfsd: add trace events
nfsd: update documentation for pNFS support
nfsd: implement pNFS layout recalls
nfsd: implement pNFS operations
nfsd: make find_any_file available outside nfs4state.c
nfsd: make find/get/put file available outside nfs4state.c
nfsd: make lookup/alloc/unhash_stid available outside nfs4state.c
nfsd: add fh_fsid_match helper
nfsd: move nfsd_fh_match to nfsfh.h
fs: add FL_LAYOUT lease type
fs: track fl_owner for leases
nfs: add LAYOUT_TYPE_MAX enum value
nfsd: factor out a helper to decode nfstime4 values
sunrpc/lockd: fix references to the BKL
nfsd: fix year-2038 nfs4 state problem
svcrdma: Handle additional inline content
svcrdma: Move read list XDR round-up logic
...
|
| |\ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Christoph's block pnfs patches have some minor dependencies on these
lock patches.
|