summaryrefslogtreecommitdiffstats
path: root/net
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | | | | | | | ethtool: fix a memory leak in ethnl_default_start()Dan Carpenter2020-01-081-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If ethnl_default_parse() fails then we need to free a couple memory allocations before returning. Fixes: 728480f12442 ("ethtool: default handlers for GET requests") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | | net: dsa: Get information about stacked DSA protocolFlorian Fainelli2020-01-083-5/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is possible to stack multiple DSA switches in a way that they are not part of the tree (disjoint) but the DSA master of a switch is a DSA slave of another. When that happens switch drivers may have to know this is the case so as to determine whether their tagging protocol has a remove chance of working. This is useful for specific switch drivers such as b53 where devices have been known to be stacked in the wild without the Broadcom tag protocol supporting that feature. This allows b53 to continue supporting those devices by forcing the disabling of Broadcom tags on the outermost switches if necessary. The get_tag_protocol() function is therefore updated to gain an additional enum dsa_tag_protocol argument which denotes the current tagging protocol used by the DSA master we are attached to, else DSA_TAG_PROTO_NONE for the top of the dsa_switch_tree. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | | devlink: add devink notification when reporter update health stateVikas Gupta2020-01-081-17/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | add a devlink notification when reporter update the health state. Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | | devlink: add support for reporter recovery completionVikas Gupta2020-01-081-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is possible that a reporter recovery completion do not finish successfully when recovery is triggered via devlink_health_reporter_recover as recovery could be processed in different context. In such scenario an error is returned by driver when recover hook is invoked and successful recovery completion is intimated later. Expose devlink recover done API to update recovery stats. Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | | socket: fix unused-function warningArnd Bergmann2020-01-081-11/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When procfs is disabled, the fdinfo code causes a harmless warning: net/socket.c:1000:13: error: 'sock_show_fdinfo' defined but not used [-Werror=unused-function] static void sock_show_fdinfo(struct seq_file *m, struct file *f) Move the function definition up so we can use a single #ifdef around it. Fixes: b4653342b151 ("net: Allow to show socket-specific information in /proc/[pid]/fdinfo/[fd]") Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | | hsr: fix dummy hsr_debugfs_rename() declarationArnd Bergmann2020-01-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The hsr_debugfs_rename prototype got an extra 'void' that needs to be removed again: In file included from /git/arm-soc/net/hsr/hsr_main.c:12: net/hsr/hsr_main.h:194:20: error: two or more data types in declaration specifiers static inline void void hsr_debugfs_rename(struct net_device *dev) Fixes: 4c2d5e33dcd3 ("hsr: rename debugfs file when interface name is changed") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | | net/ncsi: Send device address as source addressVijay Khemka2020-01-081-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After receiving device mac address from device, send this as a source address for further commands instead of broadcast address. This will help in multi host NIC cards. Signed-off-by: Vijay Khemka <vijaykhemka@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | | net/rose: remove redundant assignment to variable failedColin Ian King2020-01-071-1/+0
| | |_|_|_|/ / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The variable failed is being assigned a value that is never read, the following goto statement jumps to the end of the function and variable failed is not referenced at all. Remove the redundant assignment. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | net: dsa: Pass pcs_poll flag from driver to PHYLINKVladimir Oltean2020-01-051-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The DSA drivers that implement .phylink_mac_link_state should normally register an interrupt for the PCS, from which they should call phylink_mac_change(). However not all switches implement this, and those who don't should set this flag in dsa_switch in the .setup callback, so that PHYLINK will poll for a few ms until the in-band AN link timer expires and the PCS state settles. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | net: dsa: tag_sja1105: Slightly improve the Xmas tree in sja1105_xmitVladimir Oltean2020-01-051-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a cosmetic patch that makes the dp, tx_vid, queue_mapping and pcp local variable definitions a bit closer in length, so they don't look like an eyesore as much. The 'ds' variable is not used otherwise, except for ds->dp. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | net: dsa: Make deferred_xmit private to sja1105Vladimir Oltean2020-01-053-39/+15
| | |_|_|_|_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are 3 things that are wrong with the DSA deferred xmit mechanism: 1. Its introduction has made the DSA hotpath ever so slightly more inefficient for everybody, since DSA_SKB_CB(skb)->deferred_xmit needs to be initialized to false for every transmitted frame, in order to figure out whether the driver requested deferral or not (a very rare occasion, rare even for the only driver that does use this mechanism: sja1105). That was necessary to avoid kfree_skb from freeing the skb. 2. Because L2 PTP is a link-local protocol like STP, it requires management routes and deferred xmit with this switch. But as opposed to STP, the deferred work mechanism needs to schedule the packet rather quickly for the TX timstamp to be collected in time and sent to user space. But there is no provision for controlling the scheduling priority of this deferred xmit workqueue. Too bad this is a rather specific requirement for a feature that nobody else uses (more below). 3. Perhaps most importantly, it makes the DSA core adhere a bit too much to the NXP company-wide policy "Innovate Where It Doesn't Matter". The sja1105 is probably the only DSA switch that requires some frames sent from the CPU to be routed to the slave port via an out-of-band configuration (register write) rather than in-band (DSA tag). And there are indeed very good reasons to not want to do that: if that out-of-band register is at the other end of a slow bus such as SPI, then you limit that Ethernet flow's throughput to effectively the throughput of the SPI bus. So hardware vendors should definitely not be encouraged to design this way. We do _not_ want more widespread use of this mechanism. Luckily we have a solution for each of the 3 issues: For 1, we can just remove that variable in the skb->cb and counteract the effect of kfree_skb with skb_get, much to the same effect. The advantage, of course, being that anybody who doesn't use deferred xmit doesn't need to do any extra operation in the hotpath. For 2, we can create a kernel thread for each port's deferred xmit work. If the user switch ports are named swp0, swp1, swp2, the kernel threads will be named swp0_xmit, swp1_xmit, swp2_xmit (there appears to be a 15 character length limit on kernel thread names). With this, the user can change the scheduling priority with chrt $(pidof swp2_xmit). For 3, we can actually move the entire implementation to the sja1105 driver. So this patch deletes the generic implementation from the DSA core and adds a new one, more adequate to the requirements of PTP TX timestamping, in sja1105_main.c. Suggested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | l2tp: Remove redundant BUG_ON() check in l2tp_pernetXu Wang2020-01-031-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Passing NULL to l2tp_pernet causes a crash via BUG_ON. Dereferencing net in net_generic() also has the same effect. This patch removes the redundant BUG_ON check on the same parameter. Signed-off-by: Xu Wang <vulab@iscas.ac.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | net: Remove redundant BUG_ON() check in phonet_pernetXu Wang2020-01-031-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Passing NULL to phonet_pernet causes a crash via BUG_ON. Dereferencing net in net_generic() also has the same effect. This patch removes the redundant BUG_ON check on the same parameter. Signed-off-by: Xu Wang <vulab@iscas.ac.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | net: remove the check argument from __skb_gro_checksum_convertLi RongQing2020-01-033-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The argument is always ignored, so remove it. Signed-off-by: Li RongQing <lirongqing@baidu.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | ethtool: remove set but not used variable 'lsettings'YueHaibing2020-01-031-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes gcc '-Wunused-but-set-variable' warning: net/ethtool/linkmodes.c: In function 'ethnl_set_linkmodes': net/ethtool/linkmodes.c:326:32: warning: variable 'lsettings' set but not used [-Wunused-but-set-variable] struct ethtool_link_settings *lsettings; ^ It is never used, so remove it. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | tcp: use REXMIT_NEW instead of magic numberMao Wenan2020-01-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | REXMIT_NEW is a macro for "FRTO-style transmit of unsent/new packets", this patch makes it more readable. Signed-off-by: Mao Wenan <maowenan@huawei.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | net: Add device index to tcp_md5sigDavid Ahern2020-01-022-1/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for userspace to specify a device index to limit the scope of an entry via the TCP_MD5SIG_EXT setsockopt. The existing __tcpm_pad is renamed to tcpm_ifindex and the new field is only checked if the new TCP_MD5SIG_FLAG_IFINDEX is set in tcpm_flags. For now, the device index must point to an L3 master device (e.g., VRF). The API and error handling are setup to allow the constraint to be relaxed in the future to any device index. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | tcp: Add l3index to tcp_md5sig_key and md5 functionsDavid Ahern2020-01-022-39/+101
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add l3index to tcp_md5sig_key to represent the L3 domain of a key, and add l3index to tcp_md5_do_add and tcp_md5_do_del to fill in the key. With the key now based on an l3index, add the new parameter to the lookup functions and consider the l3index when looking for a match. The l3index comes from the skb when processing ingress packets leveraging the helpers created for socket lookups, tcp_v4_sdif and inet_iif (and the v6 variants). When the sdif index is set it means the packet ingressed a device that is part of an L3 domain and inet_iif points to the VRF device. For egress, the L3 domain is determined from the socket binding and sk_bound_dev_if. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | ipv4/tcp: Pass dif and sdif to tcp_v4_inbound_md5_hashDavid Ahern2020-01-021-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The original ingress device index is saved to the cb space of the skb and the cb is moved during tcp processing. Since tcp_v4_inbound_md5_hash can be called before and after the cb move, pass dif and sdif to it so the caller can save both prior to the cb move. Both are used by a later patch. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | ipv6/tcp: Pass dif and sdif to tcp_v6_inbound_md5_hashDavid Ahern2020-01-021-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The original ingress device index is saved to the cb space of the skb and the cb is moved during tcp processing. Since tcp_v6_inbound_md5_hash can be called before and after the cb move, pass dif and sdif to it so the caller can save both prior to the cb move. Both are used by a later patch. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | ipv4/tcp: Use local variable for tcp_md5_addrDavid Ahern2020-01-021-17/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extract the typecast to (union tcp_md5_addr *) to a local variable rather than the current long, inline declaration with function calls. No functional change intended. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | page_pool: help compiler remove code in case CONFIG_NUMA=nJesper Dangaard Brouer2020-01-021-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When kernel is compiled without NUMA support, then page_pool NUMA config setting (pool->p.nid) doesn't make any practical sense. The compiler cannot see that it can remove the code paths. This patch avoids reading pool->p.nid setting in case of !CONFIG_NUMA, in allocation and numa check code, which helps compiler to see the optimisation potential. It leaves update code intact to keep API the same. $ ./scripts/bloat-o-meter net/core/page_pool.o-numa-enabled \ net/core/page_pool.o-numa-disabled add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-113 (-113) Function old new delta page_pool_create 401 398 -3 __page_pool_alloc_pages_slow 439 426 -13 page_pool_refill_alloc_cache 425 328 -97 Total: Before=3611, After=3498, chg -3.13% Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | page_pool: handle page recycle for NUMA_NO_NODE conditionJesper Dangaard Brouer2020-01-021-19/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The check in pool_page_reusable (page_to_nid(page) == pool->p.nid) is not valid if page_pool was configured with pool->p.nid = NUMA_NO_NODE. The goal of the NUMA changes in commit d5394610b1ba ("page_pool: Don't recycle non-reusable pages"), were to have RX-pages that belongs to the same NUMA node as the CPU processing RX-packet during softirq/NAPI. As illustrated by the performance measurements. This patch moves the NAPI checks out of fast-path, and at the same time solves the NUMA_NO_NODE issue. First realize that alloc_pages_node() with pool->p.nid = NUMA_NO_NODE will lookup current CPU nid (Numa ID) via numa_mem_id(), which is used as the the preferred nid. It is only in rare situations, where e.g. NUMA zone runs dry, that page gets doesn't get allocated from preferred nid. The page_pool API allows drivers to control the nid themselves via controlling pool->p.nid. This patch moves the NAPI check to when alloc cache is refilled, via dequeuing/consuming pages from the ptr_ring. Thus, we can allow placing pages from remote NUMA into the ptr_ring, as the dequeue/consume step will check the NUMA node. All current drivers using page_pool will alloc/refill RX-ring from same CPU running softirq/NAPI process. Drivers that control the nid explicitly, also use page_pool_update_nid when changing nid runtime. To speed up transision to new nid the alloc cache is now flushed on nid changes. This force pages to come from ptr_ring, which does the appropate nid check. For the NUMA_NO_NODE case, when a NIC IRQ is moved to another NUMA node, we accept that transitioning the alloc cache doesn't happen immediately. The preferred nid change runtime via consulting numa_mem_id() based on the CPU processing RX-packets. Notice, to avoid stressing the page buddy allocator and avoid doing too much work under softirq with preempt disabled, the NUMA check at ptr_ring dequeue will break the refill cycle, when detecting a NUMA mismatch. This will cause a slower transition, but its done on purpose. Fixes: d5394610b1ba ("page_pool: Don't recycle non-reusable pages") Reported-by: Li RongQing <lirongqing@baidu.com> Reported-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller2019-12-3141-295/+330
| |\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Simple overlapping changes in bpf land wrt. bpf_helper_defs.h handling. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | net/ncsi: Fix gma flag setting after responseVijay Khemka2019-12-302-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gma_flag was set at the time of GMA command request but it should only be set after getting successful response. Movinng this flag setting in GMA response handler. This flag is used mainly for not repeating GMA command once received MAC address. Signed-off-by: Vijay Khemka <vijaykhemka@fb.com> Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | sctp: add enabled check for path tracepoint loop.Kevin Kou2019-12-301-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sctp_outq_sack is the main function handles SACK, it is called very frequently. As the commit "move trace_sctp_probe_path into sctp_outq_sack" added below code to this function, sctp tracepoint is disabled most of time, but the loop of transport list will be always called even though the tracepoint is disabled, this is unnecessary. + /* SCTP path tracepoint for congestion control debugging. */ + list_for_each_entry(transport, transport_list, transports) { + trace_sctp_probe_path(transport, asoc); + } This patch is to add tracepoint enabled check at outside of the loop of transport list, and avoid traversing the loop when trace is disabled, it is a small optimization. Signed-off-by: Kevin Kou <qdkevin.kou@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | tcp_cubic: refactor code to perform a divide only when neededEric Dumazet2019-12-301-23/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Neal Cardwell suggested to not change ca->delay_min and apply the ack delay cushion only when Hystart ACK train is still under consideration. This should avoid a 64bit divide unless needed. Tested: 40Gbit(mlx4) testbed (with sch_fq as packet scheduler) $ echo -n 'file tcp_cubic.c +p' >/sys/kernel/debug/dynamic_debug/control $ nstat -n;for f in {1..10}; do ./super_netperf 1 -H lpaa24 -l -4000000; done;nstat|egrep "Hystart" 14815 16280 15293 15563 11574 15145 14789 18548 16972 12520 TcpExtTCPHystartTrainDetect 10 0.0 TcpExtTCPHystartTrainCwnd 1396 0.0 $ dmesg | tail -10 [ 4873.951350] hystart_ack_train (116 > 93) delay_min 24 (+ ack_delay 69) cwnd 80 [ 4875.155379] hystart_ack_train (55 > 50) delay_min 21 (+ ack_delay 29) cwnd 160 [ 4876.333921] hystart_ack_train (69 > 62) delay_min 23 (+ ack_delay 39) cwnd 130 [ 4877.519037] hystart_ack_train (69 > 60) delay_min 22 (+ ack_delay 38) cwnd 130 [ 4878.701559] hystart_ack_train (87 > 63) delay_min 24 (+ ack_delay 39) cwnd 160 [ 4879.844597] hystart_ack_train (93 > 50) delay_min 21 (+ ack_delay 29) cwnd 216 [ 4880.956650] hystart_ack_train (74 > 67) delay_min 20 (+ ack_delay 47) cwnd 108 [ 4882.098500] hystart_ack_train (61 > 57) delay_min 23 (+ ack_delay 34) cwnd 130 [ 4883.262056] hystart_ack_train (72 > 67) delay_min 21 (+ ack_delay 46) cwnd 130 [ 4884.418760] hystart_ack_train (74 > 67) delay_min 29 (+ ack_delay 38) cwnd 152 10Gbit(bnx2x) testbed (with sch_fq as packet scheduler) $ echo -n 'file tcp_cubic.c +p' >/sys/kernel/debug/dynamic_debug/control $ nstat -n;for f in {1..10}; do ./super_netperf 1 -H lpk52 -l -4000000; done;nstat|egrep "Hystart" 7050 7065 7100 6900 7202 7263 7189 6869 7463 7034 TcpExtTCPHystartTrainDetect 10 0.0 TcpExtTCPHystartTrainCwnd 3199 0.0 $ dmesg | tail -10 [ 176.920012] hystart_ack_train (161 > 141) delay_min 83 (+ ack_delay 58) cwnd 264 [ 179.144645] hystart_ack_train (164 > 159) delay_min 120 (+ ack_delay 39) cwnd 444 [ 181.354527] hystart_ack_train (214 > 168) delay_min 125 (+ ack_delay 43) cwnd 436 [ 183.539565] hystart_ack_train (170 > 147) delay_min 96 (+ ack_delay 51) cwnd 326 [ 185.727309] hystart_ack_train (177 > 160) delay_min 61 (+ ack_delay 99) cwnd 128 [ 187.947142] hystart_ack_train (184 > 167) delay_min 123 (+ ack_delay 44) cwnd 367 [ 190.166680] hystart_ack_train (230 > 153) delay_min 116 (+ ack_delay 37) cwnd 444 [ 192.327285] hystart_ack_train (210 > 206) delay_min 86 (+ ack_delay 120) cwnd 152 [ 194.511392] hystart_ack_train (173 > 151) delay_min 94 (+ ack_delay 57) cwnd 239 [ 196.736023] hystart_ack_train (149 > 146) delay_min 105 (+ ack_delay 41) cwnd 399 Fixes: 42f3a8aaae66 ("tcp_cubic: tweak Hystart detection for short RTT flows") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Neal Cardwell <ncardwell@google.com> Link: https://www.spinics.net/lists/netdev/msg621886.html Link: https://www.spinics.net/lists/netdev/msg621797.html Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller2019-12-305-144/+352
| |\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Remove #ifdef pollution around nf_ingress(), from Lukas Wunner. 2) Document ingress hook in netdevice, also from Lukas. 3) Remove htons() in tunnel metadata port netlink attributes, from Xin Long. 4) Missing erspan netlink attribute validation also from Xin Long. 5) Missing erspan version in tunnel, from Xin Long. 6) Missing attribute nest in NFTA_TUNNEL_KEY_OPTS_{VXLAN,ERSPAN} Patch from Xin Long. 7) Missing nla_nest_cancel() in tunnel netlink dump path, from Xin Long. 8) Remove two exported conntrack symbols with no clients, from Florian Westphal. 9) Add nft_meta_get_eval_time() helper to nft_meta, from Florian. 10) Add nft_meta_pkttype helper for loopback, also from Florian. 11) Add nft_meta_socket uid helper, from Florian Westphal. 12) Add nft_meta_cgroup helper, from Florian. 13) Add nft_meta_ifkind helper, from Florian. 14) Group all interface related meta selector, from Florian. 15) Add nft_prandom_u32() helper, from Florian. 16) Add nft_meta_rtclassid helper, from Florian. 17) Add support for matching on the slave device index, from Florian. This batch, among other things, contains updates for the netfilter tunnel netlink interface: This extension is still incomplete and lacking proper userspace support which is actually my fault, I did not find the time to go back and finish this. This update is breaking tunnel UAPI in some aspects to fix it but do it better sooner than never. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | | | netfilter: nft_meta: add support for slave device ifindex matchingFlorian Westphal2019-12-261-7/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow to match on vrf slave ifindex or name. In case there was no slave interface involved, store 0 in the destination register just like existing iif/oif matching. sdif(name) is restricted to the ipv4/ipv6 input and forward hooks, as it depends on ip(6) stack parsing/storing info in skb->cb[]. Cc: Martin Willi <martin@strongswan.org> Cc: David Ahern <dsahern@kernel.org> Cc: Shrijeet Mukherjee <shrijeet@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * | | | | | | | netfilter: nft_meta: place rtclassid handling in a helperFlorian Westphal2019-12-261-6/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | skb_dst is an inline helper with a WARN_ON(), so this is a bit more code than it looks like. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * | | | | | | | netfilter: nft_meta: place prandom handling in a helperFlorian Westphal2019-12-261-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move this out of the main eval loop, the numgen expression provides a better alternative to meta random. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * | | | | | | | netfilter: nft_meta: move all interface related keys to helperFlorian Westphal2019-12-261-25/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reduces repetiveness and reduces size of meta eval function. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * | | | | | | | netfilter: nft_meta: move interface kind handling to helperFlorian Westphal2019-12-261-6/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | checkpatch complains about == NULL checks in original code, so use !in instead. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * | | | | | | | netfilter: nft_meta: move cgroup handling to helperFlorian Westphal2019-12-261-5/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reduce size of main eval function. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * | | | | | | | netfilter: nft_meta: move sk uid/git handling to helperFlorian Westphal2019-12-261-29/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Not a hot path. Also, both have copy&paste case statements, so use a common helper for both. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * | | | | | | | netfilter: nft_meta: move pkttype handling to helperFlorian Westphal2019-12-261-39/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When pkttype is loopback, nft_meta performs guesswork to detect broad/multicast packets. Place this in a helper, this is hardly a hot path. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * | | | | | | | netfilter: nft_meta: move time handling to helperFlorian Westphal2019-12-261-6/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | reduce size of the (large) meta evaluation function. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * | | | | | | | netfilter: conntrack: remove two export symbolsFlorian Westphal2019-12-172-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Not used anywhere, remove them. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * | | | | | | | netfilter: nft_tunnel: add the missing nla_nest_cancel()Xin Long2019-12-171-12/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When nla_put_xxx() fails under nla_nest_start_noflag(), nla_nest_cancel() should be called, so that the skb can be trimmed properly. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * | | | | | | | netfilter: nft_tunnel: also dump OPTS_ERSPAN/VXLANXin Long2019-12-171-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is to add the nest attr OPTS_ERSPAN/VXLAN when dumping KEY_OPTS, and it would be helpful when parsing in userpace. Also, this is needed for supporting multiple geneve opts in the future patches. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * | | | | | | | netfilter: nft_tunnel: also dump ERSPAN_VERSIONXin Long2019-12-171-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is not necessary, but it'll be easier to parse in userspace, also given that other places like act_tunnel_key, cls_flower and ip_tunnel_core are also doing so. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * | | | | | | | netfilter: nft_tunnel: add the missing ERSPAN_VERSION nla_policyXin Long2019-12-171-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ERSPAN_VERSION is an attribute parsed in kernel side, nla_policy type should be added for it, like other attributes. Fixes: af308b94a2a4 ("netfilter: nf_tables: add tunnel support") Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * | | | | | | | netfilter: nft_tunnel: no need to call htons() when dumping portsXin Long2019-12-171-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | info->key.tp_src and tp_dst are __be16, when using nla_put_be16() to dump them, htons() is not needed, so remove it in this patch. Fixes: af308b94a2a4 ("netfilter: nf_tables: add tunnel support") Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * | | | | | | | netfilter: Clean up unnecessary #ifdefLukas Wunner2019-12-171-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If CONFIG_NETFILTER_INGRESS is not enabled, nf_ingress() becomes a no-op because it solely contains an if-clause calling nf_hook_ingress_active(), for which an empty inline stub exists in <linux/netfilter_ingress.h>. All the symbols used in the if-clause's body are still available even if CONFIG_NETFILTER_INGRESS is not enabled. The additional "#ifdef CONFIG_NETFILTER_INGRESS" in nf_ingress() is thus unnecessary, so drop it. Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | | | | | | | net: dsa: Deny PTP on master if switch supports itVladimir Oltean2019-12-281-0/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is possible to kill PTP on a DSA switch completely and absolutely, until a reboot, with a simple command: tcpdump -i eth2 -j adapter_unsynced where eth2 is the switch's DSA master. Why? Well, in short, the PTP API in place today is a bit rudimentary and relies on applications to retrieve the TX timestamps by polling the error queue and looking at the cmsg structure. But there is no timestamp identification of any sorts (except whether it's HW or SW), you don't know how many more timestamps are there to come, which one is this one, from whom it is, etc. In other words, the SO_TIMESTAMPING API is fundamentally limited in that you can get a single HW timestamp from the stack. And the "-j adapter_unsynced" flag of tcpdump enables hardware timestamping. So let's imagine what happens when the DSA master decides it wants to deliver TX timestamps to the skb's socket too: - The timestamp that the user space sees is taken by the DSA master. Whereas the RX timestamp will eventually be overwritten by the DSA switch. So the RX and TX timestamps will be in different time bases (aka garbage). - The user space applications have no way to deal with the second (real) TX timestamp finally delivered by the DSA switch, or even to know to wait for it. Take ptp4l from the linuxptp project, for example. This is its behavior after running tcpdump, before the patch: ptp4l[172]: [6469.594] Unexpected data on socket err queue: ptp4l[172]: [6469.693] rms 8 max 16 freq -21257 +/- 11 delay 748 +/- 0 ptp4l[172]: [6469.711] Unexpected data on socket err queue: ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 03 aa 05 00 fd ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: [6469.721] Unexpected data on socket err queue: ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 10 02 ptp4l[172]: 0010 00 2c 00 00 02 00 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 01 c6 b1 00 fd ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: [6469.838] Unexpected data on socket err queue: ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 10 02 ptp4l[172]: 0010 00 2c 00 00 02 00 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 03 aa 06 00 fd ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: [6469.848] Unexpected data on socket err queue: ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 13 02 ptp4l[172]: 0010 00 36 00 00 02 00 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 04 1a 45 05 7f ptp4l[172]: 0030 00 00 5e 05 41 32 27 c2 1a 68 00 04 9f ff fe 05 ptp4l[172]: 0040 de 06 00 01 ptp4l[172]: [6469.855] Unexpected data on socket err queue: ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 10 02 ptp4l[172]: 0010 00 2c 00 00 02 00 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 01 c6 b2 00 fd ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: [6469.974] Unexpected data on socket err queue: ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 10 02 ptp4l[172]: 0010 00 2c 00 00 02 00 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 03 aa 07 00 fd ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00 The ptp4l program itself is heavily patched to show this (more details here [0]). Otherwise, by default it just hangs. On the other hand, with the DSA patch to disallow HW timestamping applied: tcpdump -i eth2 -j adapter_unsynced tcpdump: SIOCSHWTSTAMP failed: Device or resource busy So it is a fact of life that PTP timestamping on the DSA master is incompatible with timestamping on the switch MAC, at least with the current API. And if the switch supports PTP, taking the timestamps from the switch MAC is highly preferable anyway, due to the fact that those don't contain the queuing latencies of the switch. So just disallow PTP on the DSA master if there is any PTP-capable switch attached. [0]: https://sourceforge.net/p/linuxptp/mailman/message/36880648/ Fixes: 0336369d3a4d ("net: dsa: forward hardware timestamping ioctls to switch driver") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | | ethtool: provide link state with LINKSTATE_GET requestMichal Kubecek2019-12-277-5/+100
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement LINKSTATE_GET netlink request to get link state information. At the moment, only link up flag as provided by ETHTOOL_GLINK ioctl command is returned. LINKSTATE_GET request can be used with NLM_F_DUMP (without device identification) to request the information for all devices in current network namespace providing the data. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | | ethtool: add LINKMODES_NTF notificationMichal Kubecek2019-12-273-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Send ETHTOOL_MSG_LINKMODES_NTF notification message whenever device link settings or advertised modes are modified using ETHTOOL_MSG_LINKMODES_SET netlink message or ETHTOOL_SLINKSETTINGS or ETHTOOL_SSET ioctl commands. The notification message has the same format as reply to LINKMODES_GET request. ETHTOOL_MSG_LINKMODES_SET netlink request only triggers the notification if there is a change but the ioctl command handlers do not check if there is an actual change and trigger the notification whenever the commands are executed. As all work is done by ethnl_default_notify() handler and callback functions introduced to handle LINKMODES_GET requests, all that remains is adding entries for ETHTOOL_MSG_LINKMODES_NTF into ethnl_notify_handlers and ethnl_default_notify_ops lookup tables and calls to ethtool_notify() where needed. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | | ethtool: set link modes related data with LINKMODES_SET requestMichal Kubecek2019-12-273-0/+241
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement LINKMODES_SET netlink request to set advertised linkmodes and related attributes as ETHTOOL_SLINKSETTINGS and ETHTOOL_SSET commands do. The request allows setting autonegotiation flag, speed, duplex and advertised link modes. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | | ethtool: provide link mode information with LINKMODES_GET requestMichal Kubecek2019-12-274-1/+150
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement LINKMODES_GET netlink request to get link modes related information provided by ETHTOOL_GLINKSETTINGS and ETHTOOL_GSET ioctl commands. This request provides supported, advertised and peer advertised link modes, autonegotiation flag, speed and duplex. LINKMODES_GET request can be used with NLM_F_DUMP (without device identification) to request the information for all devices in current network namespace providing the data. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | | ethtool: add LINKINFO_NTF notificationMichal Kubecek2019-12-273-2/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Send ETHTOOL_MSG_LINKINFO_NTF notification message whenever device link settings are modified using ETHTOOL_MSG_LINKINFO_SET netlink message or ETHTOOL_SLINKSETTINGS or ETHTOOL_SSET ioctl commands. The notification message has the same format as reply to LINKINFO_GET request. ETHTOOL_MSG_LINKINFO_SET netlink request only triggers the notification if there is a change but the ioctl command handlers do not check if there is an actual change and trigger the notification whenever the commands are executed. As all work is done by ethnl_default_notify() handler and callback functions introduced to handle LINKINFO_GET requests, all that remains is adding entries for ETHTOOL_MSG_LINKINFO_NTF into ethnl_notify_handlers and ethnl_default_notify_ops lookup tables and calls to ethtool_notify() where needed. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
OpenPOWER on IntegriCloud