summaryrefslogtreecommitdiffstats
path: root/net/netlink
Commit message (Collapse)AuthorAgeFilesLines
* genetlink: fix genlmsg_multicast() bugJohannes Berg2013-11-211-2/+2
| | | | | | | | | | | | | | | | | Unfortunately, I introduced a tremendously stupid bug into genlmsg_multicast() when doing all those multicast group changes: it adjusts the group number, but then passes it to genlmsg_multicast_netns() which does that again. Somehow, my tests failed to catch this, so add a warning into genlmsg_multicast_netns() and remove the offending group ID adjustment. Also add a warning to the similar code in other functions so people who misuse them are more loudly warned. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: rework recvmsg handler msg_name and msg_namelen logicHannes Frederic Sowa2013-11-201-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch now always passes msg->msg_namelen as 0. recvmsg handlers must set msg_namelen to the proper size <= sizeof(struct sockaddr_storage) to return msg_name to the user. This prevents numerous uninitialized memory leaks we had in the recvmsg handlers and makes it harder for new code to accidentally leak uninitialized memory. Optimize for the case recvfrom is called with NULL as address. We don't need to copy the address at all, so set it to NULL before invoking the recvmsg handler. We can do so, because all the recvmsg handlers must cope with the case a plain read() is called on them. read() also sets msg_name to NULL. Also document these changes in include/linux/net.h as suggested by David Miller. Changes since RFC: Set msg->msg_name = NULL if user specified a NULL in msg_name but had a non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't affect sendto as it would bail out earlier while trying to copy-in the address. It also more naturally reflects the logic by the callers of verify_iovec. With this change in place I could remove " if (!uaddr || msg_sys->msg_namelen == 0) msg->msg_name = NULL ". This change does not alter the user visible error logic as we ignore msg_namelen as long as msg_name is NULL. Also remove two unnecessary curly brackets in ___sys_recvmsg and change comments to netdev style. Cc: David Miller <davem@davemloft.net> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* genetlink: make multicast groups const, prevent abuseJohannes Berg2013-11-191-112/+166
| | | | | | | | | | | | | | | | | | Register generic netlink multicast groups as an array with the family and give them contiguous group IDs. Then instead of passing the global group ID to the various functions that send messages, pass the ID relative to the family - for most families that's just 0 because the only have one group. This avoids the list_head and ID in each group, adding a new field for the mcast group ID offset to the family. At the same time, this allows us to prevent abusing groups again like the quota and dropmon code did, since we can now check that a family only uses a group it owns. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* genetlink: pass family to functions using groupsJohannes Berg2013-11-191-5/+7
| | | | | | | | | This doesn't really change anything, but prepares for the next patch that will change the APIs to pass the group ID within the family, rather than the global group ID. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* genetlink: remove family pointer from genl_multicast_groupJohannes Berg2013-11-191-20/+18
| | | | | | | | There's no reason to have the family pointer there since it can just be passed internally where needed, so remove it. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* genetlink: remove genl_unregister_mc_group()Johannes Berg2013-11-191-23/+0
| | | | | | | | There are no users of this API remaining, and we'll soon change group registration to be static (like ops are now) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* quota/genetlink: use proper genetlink multicast APIsJohannes Berg2013-11-191-2/+8
| | | | | | | | | | | | | | | The quota code is abusing the genetlink API and is using its family ID as the multicast group ID, which is invalid and may belong to somebody else (and likely will.) Make the quota code use the correct API, but since this is already used as-is by userspace, reserve a family ID for this code and also reserve that group ID to not break userspace assumptions. Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* drop_monitor/genetlink: use proper genetlink multicast APIsJohannes Berg2013-11-191-3/+10
| | | | | | | | | | | | | | | | The drop monitor code is abusing the genetlink API and is statically using the generic netlink multicast group 1, even if that group belongs to somebody else (which it invariably will, since it's not reserved.) Make the drop monitor code use the proper APIs to reserve a group ID, but also reserve the group id 1 in generic netlink code to preserve the userspace API. Since drop monitor can be a module, don't clear the bit for it on unregistration. Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* genetlink: only pass array to genl_register_family_with_ops()Johannes Berg2013-11-191-6/+8
| | | | | | | | | | | | | | As suggested by David Miller, make genl_register_family_with_ops() a macro and pass only the array, evaluating ARRAY_SIZE() in the macro, this is a little safer. The openvswitch has some indirection, assing ops/n_ops directly in that code. This might ultimately just assign the pointers in the family initializations, saving the struct genl_family_and_ops and code (once mcast groups are handled differently.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: fix documentation typo in netlink_set_err()Johannes Berg2013-11-191-1/+1
| | | | | | | The parameter is just 'group', not 'groups', fix the documentation typo. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* genetlink: rename shadowed variableJohannes Berg2013-11-181-5/+5
| | | | | | | | | Sparse pointed out that the new flags variable I had added shadowed an existing one, rename the new one to avoid that, making the code clearer. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* genetlink: unify registration functionsJohannes Berg2013-11-151-37/+16
| | | | | | | | | | | | Now that the ops assignment is just two variables rather than a long list iteration etc., there's no reason to separately export __genl_register_family() and __genl_register_family_with_ops(). Unify the two functions into __genl_register_family() and make genl_register_family_with_ops() call it after assigning the ops. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* genetlink: allow making ops constJohannes Berg2013-11-141-15/+21
| | | | | | | | | | | | Allow making the ops array const by not modifying the ops flags on registration but rather only when ops are sent out in the family information. No users are updated yet except for the pre_doit/post_doit calls in wireless (the only ones that exist now.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* genetlink: register family ops as arrayJohannes Berg2013-11-141-45/+33
| | | | | | | | | | | | | | Instead of using a linked list, use an array. This reduces the data size needed by the users of genetlink, for example in wireless (net/wireless/nl80211.c) on 64-bit it frees up over 1K of data space. Remove the attempted sending of CTRL_CMD_NEWOPS ctrl event since genl_ctrl_event(CTRL_CMD_NEWOPS, ...) only returns -EINVAL anyway, therefore no such event could ever be sent. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* genetlink: remove genl_register_ops/genl_unregister_opsJohannes Berg2013-11-141-56/+1
| | | | | | | | genl_register_ops() is still needed for internal registration, but is no longer available to users of the API. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: netlink: filter particular protocols from analyzersDaniel Borkmann2013-09-061-0/+30
| | | | | | | | | | | | Fix finer-grained control and let only a whitelist of allowed netlink protocols pass, in our case related to networking. If later on, other subsystems decide they want to add their protocol as well to the list of allowed protocols they shall simply add it. While at it, we also need to tell what protocol is in use otherwise BPF_S_ANC_PROTOCOL can not pick it up (as it's not filled out). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2013-09-051-12/+55
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c net/bridge/br_multicast.c net/ipv6/sit.c The conflicts were minor: 1) sit.c changes overlap with change to ip_tunnel_xmit() signature. 2) br_multicast.c had an overlap between computing max_delay using msecs_to_jiffies and turning MLDV2_MRC() into an inline function with a name using lowercase instead of uppercase letters. 3) stmmac had two overlapping changes, one which conditionally allocated and hooked up a dma_cfg based upon the presence of the pbl OF property, and another one handling store-and-forward DMA made. The latter of which should not go into the new of_find_property() basic block. Signed-off-by: David S. Miller <davem@davemloft.net>
| * genl: Hold reference on correct module while netlink-dump.Pravin B Shelar2013-08-281-9/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | netlink dump operations take module as parameter to hold reference for entire netlink dump duration. Currently it holds ref only on genl module which is not correct when we use ops registered to genl from another module. Following patch adds module pointer to genl_ops so that netlink can hold ref count on it. CC: Jesse Gross <jesse@nicira.com> CC: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * genl: Fix genl dumpit() locking.Pravin B Shelar2013-08-281-5/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case of genl-family with parallel ops off, dumpif() callback is expected to run under genl_lock, But commit def3117493eafd9df (genl: Allow concurrent genl callbacks.) changed this behaviour where only first dumpit() op was called under genl-lock. For subsequent dump, only nlk->cb_lock was taken. Following patch fixes it by defining locked dumpit() and done() callback which takes care of genl-locking. CC: Jesse Gross <jesse@nicira.com> CC: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2013-08-261-7/+0
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/wireless/iwlwifi/pcie/trans.c include/linux/inetdevice.h The inetdevice.h conflict involves moving the IPV4_DEVCONF values into a UAPI header, overlapping additions of some new entries. The iwlwifi conflict is a context overlap. Signed-off-by: David S. Miller <davem@davemloft.net>
| * Revert "genetlink: fix family dump race"Johannes Berg2013-08-221-7/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 58ad436fcf49810aa006016107f494c9ac9013db. It turns out that the change introduced a potential deadlock by causing a locking dependency with netlink's cb_mutex. I can't seem to find a way to resolve this without doing major changes to the locking, so revert this. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2013-08-161-0/+7
|\ \ | |/
| * genetlink: fix family dump raceJohannes Berg2013-08-131-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When dumping generic netlink families, only the first dump call is locked with genl_lock(), which protects the list of families, and thus subsequent calls can access the data without locking, racing against family addition/removal. This can cause a crash. Fix it - the locking needs to be conditional because the first time around it's already locked. A similar bug was reported to me on an old kernel (3.4.47) but the exact scenario that happened there is no longer possible, on those kernels the first round wasn't locked either. Looking at the current code I found the race described above, which had also existed on the old kernel. Cc: stable@vger.kernel.org Reported-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netlink: Eliminate kmalloc in netlink dump operation.Pravin B Shelar2013-08-152-56/+44
| | | | | | | | | | | | | | | | | | | | | | Following patch stores struct netlink_callback in netlink_sock to avoid allocating and freeing it on every netlink dump msg. Only one dump operation is allowed for a given socket at a time therefore we can safely convert cb pointer to cb struct inside netlink_sock. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2013-08-031-1/+3
|\ \ | |/ | | | | | | | | | | Merge net into net-next to setup some infrastructure Eric Dumazet needs for usbnet changes. Signed-off-by: David S. Miller <davem@davemloft.net>
| * genetlink: fix usage of NLM_F_EXCL or NLM_F_REPLACEPablo Neira2013-07-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, it is not possible to use neither NLM_F_EXCL nor NLM_F_REPLACE from genetlink. This is due to this checking in genl_family_rcv_msg: if (nlh->nlmsg_flags & NLM_F_DUMP) NLM_F_DUMP is NLM_F_MATCH|NLM_F_ROOT. Thus, if NLM_F_EXCL or NLM_F_REPLACE flag is set, genetlink believes that you're requesting a dump and it calls the .dumpit callback. The solution that I propose is to refine this checking to make it stricter: if ((nlh->nlmsg_flags & NLM_F_DUMP) == NLM_F_DUMP) And given the combination NLM_F_REPLACE and NLM_F_EXCL does not make sense to me, it removes the ambiguity. There was a patch that tried to fix this some time ago (0ab03c2 netlink: test for all flags of the NLM_F_DUMP composite) but it tried to resolve this ambiguity in *all* existing netlink subsystems, not only genetlink. That patch was reverted since it broke iproute2, which is using NLM_F_ROOT to request the dump of the routing cache. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * genetlink: release cb_lock before requesting additional moduleStanislaw Gruszka2013-07-271-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Requesting external module with cb_lock taken can result in the deadlock like showed below: [ 2458.111347] Showing all locks held in the system: [ 2458.111347] 1 lock held by NetworkManager/582: [ 2458.111347] #0: (cb_lock){++++++}, at: [<ffffffff8162bc79>] genl_rcv+0x19/0x40 [ 2458.111347] 1 lock held by modprobe/603: [ 2458.111347] #0: (cb_lock){++++++}, at: [<ffffffff8162baa5>] genl_lock_all+0x15/0x30 [ 2461.579457] SysRq : Show Blocked State [ 2461.580103] task PC stack pid father [ 2461.580103] NetworkManager D ffff880034b84500 4040 582 1 0x00000080 [ 2461.580103] ffff8800197ff720 0000000000000046 00000000001d5340 ffff8800197fffd8 [ 2461.580103] ffff8800197fffd8 00000000001d5340 ffff880019631700 7fffffffffffffff [ 2461.580103] ffff8800197ff880 ffff8800197ff878 ffff880019631700 ffff880019631700 [ 2461.580103] Call Trace: [ 2461.580103] [<ffffffff817355f9>] schedule+0x29/0x70 [ 2461.580103] [<ffffffff81731ad1>] schedule_timeout+0x1c1/0x360 [ 2461.580103] [<ffffffff810e69eb>] ? mark_held_locks+0xbb/0x140 [ 2461.580103] [<ffffffff817377ac>] ? _raw_spin_unlock_irq+0x2c/0x50 [ 2461.580103] [<ffffffff810e6b6d>] ? trace_hardirqs_on_caller+0xfd/0x1c0 [ 2461.580103] [<ffffffff81736398>] wait_for_completion_killable+0xe8/0x170 [ 2461.580103] [<ffffffff810b7fa0>] ? wake_up_state+0x20/0x20 [ 2461.580103] [<ffffffff81095825>] call_usermodehelper_exec+0x1a5/0x210 [ 2461.580103] [<ffffffff817362ed>] ? wait_for_completion_killable+0x3d/0x170 [ 2461.580103] [<ffffffff81095cc3>] __request_module+0x1b3/0x370 [ 2461.580103] [<ffffffff810e6b6d>] ? trace_hardirqs_on_caller+0xfd/0x1c0 [ 2461.580103] [<ffffffff8162c5c9>] ctrl_getfamily+0x159/0x190 [ 2461.580103] [<ffffffff8162d8a4>] genl_family_rcv_msg+0x1f4/0x2e0 [ 2461.580103] [<ffffffff8162d990>] ? genl_family_rcv_msg+0x2e0/0x2e0 [ 2461.580103] [<ffffffff8162da1e>] genl_rcv_msg+0x8e/0xd0 [ 2461.580103] [<ffffffff8162b729>] netlink_rcv_skb+0xa9/0xc0 [ 2461.580103] [<ffffffff8162bc88>] genl_rcv+0x28/0x40 [ 2461.580103] [<ffffffff8162ad6d>] netlink_unicast+0xdd/0x190 [ 2461.580103] [<ffffffff8162b149>] netlink_sendmsg+0x329/0x750 [ 2461.580103] [<ffffffff815db849>] sock_sendmsg+0x99/0xd0 [ 2461.580103] [<ffffffff810bb58f>] ? local_clock+0x5f/0x70 [ 2461.580103] [<ffffffff810e96e8>] ? lock_release_non_nested+0x308/0x350 [ 2461.580103] [<ffffffff815dbc6e>] ___sys_sendmsg+0x39e/0x3b0 [ 2461.580103] [<ffffffff810565af>] ? kvm_clock_read+0x2f/0x50 [ 2461.580103] [<ffffffff810218b9>] ? sched_clock+0x9/0x10 [ 2461.580103] [<ffffffff810bb2bd>] ? sched_clock_local+0x1d/0x80 [ 2461.580103] [<ffffffff810bb448>] ? sched_clock_cpu+0xa8/0x100 [ 2461.580103] [<ffffffff810e33ad>] ? trace_hardirqs_off+0xd/0x10 [ 2461.580103] [<ffffffff810bb58f>] ? local_clock+0x5f/0x70 [ 2461.580103] [<ffffffff810e3f7f>] ? lock_release_holdtime.part.28+0xf/0x1a0 [ 2461.580103] [<ffffffff8120fec9>] ? fget_light+0xf9/0x510 [ 2461.580103] [<ffffffff8120fe0c>] ? fget_light+0x3c/0x510 [ 2461.580103] [<ffffffff815dd1d2>] __sys_sendmsg+0x42/0x80 [ 2461.580103] [<ffffffff815dd222>] SyS_sendmsg+0x12/0x20 [ 2461.580103] [<ffffffff81741ad9>] system_call_fastpath+0x16/0x1b [ 2461.580103] modprobe D ffff88000f2c8000 4632 603 602 0x00000080 [ 2461.580103] ffff88000f04fba8 0000000000000046 00000000001d5340 ffff88000f04ffd8 [ 2461.580103] ffff88000f04ffd8 00000000001d5340 ffff8800377d4500 ffff8800377d4500 [ 2461.580103] ffffffff81d0b260 ffffffff81d0b268 ffffffff00000000 ffffffff81d0b2b0 [ 2461.580103] Call Trace: [ 2461.580103] [<ffffffff817355f9>] schedule+0x29/0x70 [ 2461.580103] [<ffffffff81736d4d>] rwsem_down_write_failed+0xed/0x1a0 [ 2461.580103] [<ffffffff810bb200>] ? update_cpu_load_active+0x10/0xb0 [ 2461.580103] [<ffffffff8137b473>] call_rwsem_down_write_failed+0x13/0x20 [ 2461.580103] [<ffffffff8173492d>] ? down_write+0x9d/0xb2 [ 2461.580103] [<ffffffff8162baa5>] ? genl_lock_all+0x15/0x30 [ 2461.580103] [<ffffffff8162baa5>] genl_lock_all+0x15/0x30 [ 2461.580103] [<ffffffff8162cbb3>] genl_register_family+0x53/0x1f0 [ 2461.580103] [<ffffffffa01dc000>] ? 0xffffffffa01dbfff [ 2461.580103] [<ffffffff8162d650>] genl_register_family_with_ops+0x20/0x80 [ 2461.580103] [<ffffffffa01dc000>] ? 0xffffffffa01dbfff [ 2461.580103] [<ffffffffa017fe84>] nl80211_init+0x24/0xf0 [cfg80211] [ 2461.580103] [<ffffffffa01dc000>] ? 0xffffffffa01dbfff [ 2461.580103] [<ffffffffa01dc043>] cfg80211_init+0x43/0xdb [cfg80211] [ 2461.580103] [<ffffffff810020fa>] do_one_initcall+0xfa/0x1b0 [ 2461.580103] [<ffffffff8105cb93>] ? set_memory_nx+0x43/0x50 [ 2461.580103] [<ffffffff810f75af>] load_module+0x1c6f/0x27f0 [ 2461.580103] [<ffffffff810f2c90>] ? store_uevent+0x40/0x40 [ 2461.580103] [<ffffffff810f82c6>] SyS_finit_module+0x86/0xb0 [ 2461.580103] [<ffffffff81741ad9>] system_call_fastpath+0x16/0x1b [ 2461.580103] Sched Debug Version: v0.10, 3.11.0-0.rc1.git4.1.fc20.x86_64 #1 Problem start to happen after adding net-pf-16-proto-16-family-nl80211 alias name to cfg80211 module by below commit (though that commit itself is perfectly fine): commit fb4e156886ce6e8309e912d8b370d192330d19d3 Author: Marcel Holtmann <marcel@holtmann.org> Date: Sun Apr 28 16:22:06 2013 -0700 nl80211: Add generic netlink module alias for cfg80211/nl80211 Reported-and-tested-by: Jeff Layton <jlayton@redhat.com> Reported-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Reviewed-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: netlink: minor: remove unused pointer in alloc_pg_vecDaniel Borkmann2013-08-021-2/+2
|/ | | | | | | Variable ptr is being assigned, but never used, so just remove it. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: fix splat in skb_clone with large messagesPablo Neira2013-06-271-17/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since (c05cdb1 netlink: allow large data transfers from user-space), netlink splats if it invokes skb_clone on large netlink skbs since: * skb_shared_info was not correctly initialized. * skb->destructor is not set in the cloned skb. This was spotted by trinity: [ 894.990671] BUG: unable to handle kernel paging request at ffffc9000047b001 [ 894.991034] IP: [<ffffffff81a212c4>] skb_clone+0x24/0xc0 [...] [ 894.991034] Call Trace: [ 894.991034] [<ffffffff81ad299a>] nl_fib_input+0x6a/0x240 [ 894.991034] [<ffffffff81c3b7e6>] ? _raw_read_unlock+0x26/0x40 [ 894.991034] [<ffffffff81a5f189>] netlink_unicast+0x169/0x1e0 [ 894.991034] [<ffffffff81a601e1>] netlink_sendmsg+0x251/0x3d0 Fix it by: 1) introducing a new netlink_skb_clone function that is used in nl_fib_input, that sets our special skb->destructor in the cloned skb. Moreover, handle the release of the large cloned skb head area in the destructor path. 2) not allowing large skbuffs in the netlink broadcast path. I cannot find any reasonable use of the large data transfer using netlink in that path, moreover this helps to skip extra skb_clone handling. I found two more netlink clients that are cloning the skbs, but they are not in the sendmsg path. Therefore, the sole client cloning that I found seems to be the fib frontend. Thanks to Eric Dumazet for helping to address this issue. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: netlink: virtual tap device managementDaniel Borkmann2013-06-241-0/+107
| | | | | | | | | | | | | Similarly to the networking receive path with ptype_all taps, we add the possibility to register netdevices that are for ARPHRD_NETLINK to the netlink subsystem, so that those can be used for netlink analyzers resp. debuggers. We do not offer a direct callback function as out-of-tree modules could do crap with it. Instead, a netdevice must be registered properly and only receives a clone, managed by the netlink layer. Symbols are exported as GPL-only. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2013-06-191-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/wireless/ath/ath9k/Kconfig drivers/net/xen-netback/netback.c net/batman-adv/bat_iv_ogm.c net/wireless/nl80211.c The ath9k Kconfig conflict was a change of a Kconfig option name right next to the deletion of another option. The xen-netback conflict was overlapping changes involving the handling of the notify list in xen_netbk_rx_action(). Batman conflict resolution provided by Antonio Quartulli, basically keep everything in both conflict hunks. The nl80211 conflict is a little more involved. In 'net' we added a dynamic memory allocation to nl80211_dump_wiphy() to fix a race that Linus reported. Meanwhile in 'net-next' the handlers were converted to use pre and post doit handlers which use a flag to determine whether to hold the RTNL mutex around the operation. However, the dump handlers to not use this logic. Instead they have to explicitly do the locking. There were apparent bugs in the conversion of nl80211_dump_wiphy() in that we were not dropping the RTNL mutex in all the return paths, and it seems we very much should be doing so. So I fixed that whilst handling the overlapping changes. To simplify the initial returns, I take the RTNL mutex after we try to allocate 'tb'. Signed-off-by: David S. Miller <davem@davemloft.net>
| * netlink: fix error propagation in netlink_mmap()Patrick McHardy2013-06-111-1/+1
| | | | | | | | | | | | | | | | Return the error if something went wrong instead of unconditionally returning 0. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netlink: make compare exist all the timeGao feng2013-06-131-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit da12c90e099789a63073fc82a19542ce54d4efb9 "netlink: Add compare function for netlink_table" only set compare at the time we create kernel netlink, and reset compare to NULL at the time we finially release netlink socket, but netlink_lookup wants the compare exist always. So we should set compare after we allocate nl_table, and never reset it. make comapre exist all the time. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netlink: Add compare function for netlink_tableGao feng2013-06-112-8/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As we know, netlink sockets are private resource of net namespace, they can communicate with each other only when they in the same net namespace. this works well until we try to add namespace support for other subsystems which use netlink. Don't like ipv4 and route table.., it is not suited to make these subsytems belong to net namespace, Such as audit and crypto subsystems,they are more suitable to user namespace. So we must have the ability to make the netlink sockets in same user namespace can communicate with each other. This patch adds a new function pointer "compare" for netlink_table, we can decide if the netlink sockets can communicate with each other through this netlink_table self-defined compare function. The behavior isn't changed if we don't provide the compare function for netlink_table. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netlink: allow large data transfers from user-spacePablo Neira Ayuso2013-06-071-2/+35
|/ | | | | | | | | | | | | | | | | | | | | | I can hit ENOBUFS in the sendmsg() path with a large batch that is composed of many netlink messages. Here that limit is 8 MBytes of skbuff data area as kmalloc does not manage to get more than that. While discussing atomic rule-set for nftables with Patrick McHardy, we decided to put all rule-set updates that need to be applied atomically in one single batch to simplify the existing approach. However, as explained above, the existing netlink code limits us to a maximum of ~20000 rules that fit in one single batch without hitting ENOBUFS. iptables does not have such limitation as it is using vmalloc. This patch adds netlink_alloc_large_skb() which is only used in the netlink_sendmsg() path. It uses alloc_skb if the memory requested is <= one memory page, that should be the common case for most subsystems, else vmalloc for higher memory allocations. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: fix sk_buff head without data areaPablo Neira2013-06-041-1/+1
| | | | | | | | | | | | | | | | | Eric Dumazet spotted that we have to check skb->head instead of skb->data as skb->head points to the beginning of the data area of the skbuff. Similarly, we have to initialize the skb->head pointer, not skb->data in __alloc_skb_head. After this fix, netlink crashes in the release path of the sk_buff, so let's fix that as well. This bug was introduced in (0ebd0ac net: add function to allocate sk_buff head without data area). Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: kconfig: move mmap i/o into netlink kconfigDaniel Borkmann2013-05-011-0/+9
| | | | | | | | | | | | | | | | | Currently, in menuconfig, Netlink's new mmaped IO is the very first entry under the ``Networking support'' item and comes even before ``Networking options'': [ ] Netlink: mmaped IO Networking options ---> ... Lets move this into ``Networking options'' under netlink's Kconfig, since this might be more appropriate. Introduced by commit ccdfcc398 (``netlink: mmaped netlink: ring setup''). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: Fix skb ref counting.Pravin B Shelar2013-05-011-1/+0
| | | | | | | | | | | | Commit f9c2288837ba072b21dba955f04a4c97eaa77b1e (netlink: implement memory mapped recvmsg) increamented skb->users ref count twice for a dump op which does not look right. Following patch fixes that. CC: Patrick McHardy <kaber@trash.net> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* genetlink: fix possible memory leak in genl_family_rcv_msg()Wei Yongjun2013-04-261-2/+3
| | | | | | | | | | | | 'attrbuf' is malloced in genl_family_rcv_msg() when family->maxattr && family->parallel_ops, thus should be freed before leaving from the error handling cases, otherwise it will cause memory leak. Introduced by commit def3117493eafd9dfa1f809d861e0031b2cc8a07 (genl: Allow concurrent genl callbacks.) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
* genl: Allow concurrent genl callbacks.Pravin B Shelar2013-04-251-38/+76
| | | | | | | | | | | | | | | | All genl callbacks are serialized by genl-mutex. This can become bottleneck in multi threaded case. Following patch adds an parameter to genl_family so that a particular family can get concurrent netlink callback without genl_lock held. New rw-sem is used to protect genl callback from genl family unregister. in case of parallel_ops genl-family read-lock is taken for callbacks and write lock is taken for register or unregistration for any family. In case of locked genl family semaphore and gel-mutex is locked for any openration. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: fix compilation after memory mapped patchesNicolas Dichtel2013-04-241-4/+4
| | | | | | | | | | | | | | | | | | | Depending of the kernel configuration (CONFIG_UIDGID_STRICT_TYPE_CHECKS), we can get the following errors: net/netlink/af_netlink.c: In function ‘netlink_queue_mmaped_skb’: net/netlink/af_netlink.c:663:14: error: incompatible types when assigning to type ‘__u32’ from type ‘kuid_t’ net/netlink/af_netlink.c:664:14: error: incompatible types when assigning to type ‘__u32’ from type ‘kgid_t’ net/netlink/af_netlink.c: In function ‘netlink_ring_set_copied’: net/netlink/af_netlink.c:693:14: error: incompatible types when assigning to type ‘__u32’ from type ‘kuid_t’ net/netlink/af_netlink.c:694:14: error: incompatible types when assigning to type ‘__u32’ from type ‘kgid_t’ We must use the helpers to get the uid and gid, and also take care of user_ns. Fix suggested by Eric W. Biederman <ebiederm@xmission.com>. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: Fix build with mmap disabled.David S. Miller2013-04-231-0/+7
| | | | | | | | | | | net/netlink/diag.c: In function 'sk_diag_put_rings_cfg': net/netlink/diag.c:28:17: error: 'struct netlink_sock' has no member named 'pg_vec_lock' net/netlink/diag.c:29:29: error: 'struct netlink_sock' has no member named 'rx_ring' net/netlink/diag.c:31:30: error: 'struct netlink_sock' has no member named 'tx_ring' net/netlink/diag.c:33:19: error: 'struct netlink_sock' has no member named 'pg_vec_lock' Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: fix typo in net/netlink/af_netlink.cStephen Rothwell2013-04-231-1/+1
| | | | | Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: add RX/TX-ring support to netlink diagPatrick McHardy2013-04-191-0/+32
| | | | | | | Based on AF_PACKET. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: add flow control for memory mapped I/OPatrick McHardy2013-04-191-27/+61
| | | | | | | | | | Add flow control for memory mapped RX. Since user-space usually doesn't invoke recvmsg() when using memory mapped I/O, flow control is performed in netlink_poll(). Dumps are allowed to continue if at least half of the ring frames are unused. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: implement memory mapped recvmsg()Patrick McHardy2013-04-191-3/+141
| | | | | | | | | | | | | | | | | Add support for mmap'ed recvmsg(). To allow the kernel to construct messages into the mapped area, a dataless skb is allocated and the data pointer is set to point into the ring frame. This means frames will be delivered to userspace in order of allocation instead of order of transmission. This usually doesn't matter since the order is either not determinable by userspace or message creation/transmission is serialized. The only case where this can have a visible difference is nfnetlink_queue. Userspace can't assume mmap'ed messages have ordered IDs anymore and needs to check this if using batched verdicts. For non-mapped sockets, nothing changes. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: implement memory mapped sendmsg()Patrick McHardy2013-04-191-6/+129
| | | | | | | | | | | | Add support for mmap'ed sendmsg() to netlink. Since the kernel validates received messages before processing them, the code makes sure userspace can't modify the message contents after invoking sendmsg(). To do that only a single mapping of the TX ring is allowed to exist and the socket must not be shared. If either of these two conditions does not hold, it falls back to copying. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: add mmap'ed netlink helper functionsPatrick McHardy2013-04-191-2/+183
| | | | | | | | | Add helper functions for looking up mmap'ed frame headers, reading and writing their status, allocating skbs with mmap'ed data areas and a poll function. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: mmaped netlink: ring setupPatrick McHardy2013-04-192-2/+286
| | | | | | | | | Add support for mmap'ed RX and TX ring setup and teardown based on the af_packet.c code. The following patches will use this to add the real mmap'ed receive and transmit functionality. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: add netlink_skb_set_owner_r()Patrick McHardy2013-04-191-3/+17
| | | | | | | | | | | | | | | | For mmap'ed I/O a netlink specific skb destructor needs to be invoked after the final kfree_skb() to clean up state. This doesn't work currently since the skb's ownership is transfered to the receiving socket using skb_set_owner_r(), which orphans the skb, thereby invoking the destructor prematurely. Since netlink doesn't account skbs to the originating socket, there's no need to orphan the skb. Add a netlink specific skb_set_owner_r() variant that does not orphan the skb and use a netlink specific destructor to call sock_rfree(). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
OpenPOWER on IntegriCloud