| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Until this patch, the number of UPs was hard coded for eight.
Replace this with a variable in struct "mlx4_en_port_profile".
Currently, the variable will hold the maximum number of UP,
as before.
The patch creates an infrastructure to add an option for dynamic
change of the actual number of TCs.
Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Cc: Tarick Bedeir <tarick@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
| |
Trivial fix to spelling mistake in en_dbg debug message
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
| |
Add support to mlx4 to report bpf_prog ID during XDP_QUERY_PROG.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of having their own NAPIs, XDP TX completion queues get
polled within the corresponding RX NAPI.
This prevents any possible race on TX ring prod/cons indices,
between the context that issues the transmits (RX NAPI) and the
context that handles the completions (was previously done in
a separate NAPI).
This also improves performance, as it decreases the number
of NAPIs running on a CPU, saving the overhead of syncing
and switching between the contexts.
Performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Single queue no-RSS optimization ON.
XDP_TX packet rate:
-------------------------------------
| Before | After | Gain |
IPv4 | 12.0 Mpps | 13.8 Mpps | 15% |
IPv6 | 12.0 Mpps | 13.8 Mpps | 15% |
-------------------------------------
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Avoid touching RX QP RSS context when loading with only
one RX ring, to allow optimized A0 RX steering.
Enable by:
- loading mlx4_core with module param: log_num_mgm_entry_size = -6.
- then: ethtool -L <interface> rx 1
Performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
XDP_DROP packet rate:
-------------------------------------
| Before | After | Gain |
IPv4 | 20.5 Mpps | 28.1 Mpps | 37% |
IPv6 | 18.4 Mpps | 28.1 Mpps | 53% |
-------------------------------------
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
| |
We need to push the chain index down to the drivers, so they have the
information to which chain the rule belongs. For now, no driver supports
multichain offload, so only chain 0 is supported. This is needed to
prevent chain squashes during offload for now. Later this will be used
to implement multichain offload.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Include HWTSTAMP_FILTER_NTP_ALL in net_hwtstamp_validate() as a valid
filter and update drivers which can timestamp all packets, or which
explicitly list unsupported filters instead of using a default case, to
handle the filter.
CC: Richard Cochran <richardcochran@gmail.com>
CC: Willem de Bruijn <willemb@google.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The configurable priority to traffic class mapping and the user specified
queue ranges are used to configure the traffic class, overriding the
hardware defaults when the 'hw' option is set to 0. However, when the 'hw'
option is non-zero, the hardware QOS defaults are used.
This patch makes it so that we can pass the data the user provided to
ndo_setup_tc. This allows us to pull in the queue configuration if the
user requested it as well as any additional hardware offload type
requested by using a value other than 1 for the hw value.
Finally it also provides a means for the device driver to return the level
supported for the offload type via the qopt->hw value. Previously we were
just always assuming the value to be 1, in the future values beyond just 1
may be supported.
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
| |
Spoofcheck can't be enabled if VF MAC is zero.
Vice versa, can't zero MAC if spoofcheck is on.
Fixes: 8f7ba3ca12f6 ('net/mlx4: Add set VF mac address support')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
| |
1) In the case where rate == priv->pkt_rate_low == priv->pkt_rate_high,
mlx4_en_auto_moderation() does a divide by zero.
2) We want to properly change the moderation parameters if rx_frames
was changed (like in ethtool -C eth0 rx-frames 16)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| |
| |
| |
| |
| | |
The conflict was an interaction between a bug fix in the
netvsc driver in 'net' and an optimization of the RX path
in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
After calling mlx4_en_try_alloc_resources (e.g. by changing the
number of rx-queues with ethtool -L), the existing xdp_prog becomes
inactive.
The bug is that the xdp_prog ptr has not been carried over from
the old rx-queues to the new rx-queues
Fixes: 47a38e155037 ("net/mlx4_en: add support for fast rx drop bpf program")
Cc: Brenden Blanco <bblanco@plumgrid.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In mlx4_en_update_priv(), dst->tx_ring[t] and dst->tx_cq[t]
are over-written by src->tx_ring[t] and src->tx_cq[t] without
first calling kfree.
One of the reproducible code paths is by doing 'ethtool -L'.
The fix is to do the kfree in mlx4_en_free_resources().
Here is the kmemleak report:
unreferenced object 0xffff880841211800 (size 2048):
comm "ethtool", pid 3096, jiffies 4294716940 (age 528.353s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff81930718>] kmemleak_alloc+0x28/0x50
[<ffffffff8120b213>] kmem_cache_alloc_trace+0x103/0x260
[<ffffffff8170e0a8>] mlx4_en_try_alloc_resources+0x118/0x1a0
[<ffffffff817065a9>] mlx4_en_set_ringparam+0x169/0x210
[<ffffffff818040c5>] dev_ethtool+0xae5/0x2190
[<ffffffff8181b898>] dev_ioctl+0x168/0x6f0
[<ffffffff817d7a72>] sock_do_ioctl+0x42/0x50
[<ffffffff817d819b>] sock_ioctl+0x21b/0x2d0
[<ffffffff81247a73>] do_vfs_ioctl+0x93/0x6a0
[<ffffffff812480f9>] SyS_ioctl+0x79/0x90
[<ffffffff8193d7ea>] entry_SYSCALL_64_fastpath+0x18/0xad
[<ffffffffffffffff>] 0xffffffffffffffff
unreferenced object 0xffff880841213000 (size 2048):
comm "ethtool", pid 3096, jiffies 4294716940 (age 528.353s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff81930718>] kmemleak_alloc+0x28/0x50
[<ffffffff8120b213>] kmem_cache_alloc_trace+0x103/0x260
[<ffffffff8170e0cb>] mlx4_en_try_alloc_resources+0x13b/0x1a0
[<ffffffff817065a9>] mlx4_en_set_ringparam+0x169/0x210
[<ffffffff818040c5>] dev_ethtool+0xae5/0x2190
[<ffffffff8181b898>] dev_ioctl+0x168/0x6f0
[<ffffffff817d7a72>] sock_do_ioctl+0x42/0x50
[<ffffffff817d819b>] sock_ioctl+0x21b/0x2d0
[<ffffffff81247a73>] do_vfs_ioctl+0x93/0x6a0
[<ffffffff812480f9>] SyS_ioctl+0x79/0x90
[<ffffffff8193d7ea>] entry_SYSCALL_64_fastpath+0x18/0xad
[<ffffffffffffffff>] 0xffffffffffffffff
(gdb) list *mlx4_en_try_alloc_resources+0x118
0xffffffff8170e0a8 is in mlx4_en_try_alloc_resources (drivers/net/ethernet/mellanox/mlx4/en_netdev.c:2145).
2140 if (!dst->tx_ring_num[t])
2141 continue;
2142
2143 dst->tx_ring[t] = kzalloc(sizeof(struct mlx4_en_tx_ring *) *
2144 MAX_TX_RINGS, GFP_KERNEL);
2145 if (!dst->tx_ring[t])
2146 goto err_free_tx;
2147
2148 dst->tx_cq[t] = kzalloc(sizeof(struct mlx4_en_cq *) *
2149 MAX_TX_RINGS, GFP_KERNEL);
(gdb) list *mlx4_en_try_alloc_resources+0x13b
0xffffffff8170e0cb is in mlx4_en_try_alloc_resources (drivers/net/ethernet/mellanox/mlx4/en_netdev.c:2150).
2145 if (!dst->tx_ring[t])
2146 goto err_free_tx;
2147
2148 dst->tx_cq[t] = kzalloc(sizeof(struct mlx4_en_cq *) *
2149 MAX_TX_RINGS, GFP_KERNEL);
2150 if (!dst->tx_cq[t]) {
2151 kfree(dst->tx_ring[t]);
2152 goto err_free_tx;
2153 }
2154 }
Fixes: ec25bc04ed8e ("net/mlx4_en: Add resilience in low memory systems")
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When starting the port, driver will inform Firmware about the actual MTU
which does not include implicit headers, such as FCS or VLAN tags.
Signed-off-by: Shaker Daibes <shakerd@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\| |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Disable BH around the call to napi_schedule() to avoid following warning
[ 52.095499] NOHZ: local_softirq_pending 08
[ 52.421291] NOHZ: local_softirq_pending 08
[ 52.608313] NOHZ: local_softirq_pending 08
Fixes: 8d59de8f7bb3 ("net/mlx4_en: Process all completions in RX rings after port goes up")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Erez Shitrit <erezsh@mellanox.com>
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\|
| |
| |
| |
| |
| |
| | |
Two AF_* families adding entries to the lockdep tables
at the same time.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In commit b45f0674b997 ("mlx4: xdp: Allow raising MTU up to one page minus eth and vlan hdrs"),
it changed EOPNOTSUPP to ENOTSUPP by mistake. This patch fixes it.
Fixes: b45f0674b997 ("mlx4: xdp: Allow raising MTU up to one page minus eth and vlan hdrs")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|/
|
|
|
|
|
|
|
|
|
|
| |
The network device operation for reading statistics is only called
in one place, and it ignores the return value. Having a structure
return value is potentially confusing because some future driver could
incorrectly assume that the return value was used.
Fix all drivers with ndo_get_stats64 to have a void function.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
| |
The user prio field is wrong (and overflows) in the XDP forward
flow.
This is a result of a bad value for num_tx_rings_p_up, which should
account all XDP TX rings, as they operate for the same user prio.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reported-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reserve XDP_PACKET_HEADROOM for packet and enable bpf_xdp_adjust_head()
support. This patch only affects the code path when XDP is active.
After testing, the tx_dropped counter is incremented if the xdp_prog sends
more than wire MTU.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When XDP is active in mlx4, mlx4 is using one page/pkt.
At the same time (i.e. when XDP is active), it is currently
limiting MTU to be FRAG_SZ0 - ETH_HLEN - (2 * VLAN_HLEN)
which is 1514 in x86. AFAICT, we can at least raise the MTU
limit up to PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN) which this
patch is doing. It will be useful in the next patch which
allows XDP program to extend the packet by adding new header(s).
Note: In the earlier XDP patches, there is already existing guard
to ensure the page/pkt scheme only applies when XDP is active
in mlx4.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch allows XDP prog to extend/remove the packet
data at the head (like adding or removing header). It is
done by adding a new XDP helper bpf_xdp_adjust_head().
It also renames bpf_helper_changes_skb_data() to
bpf_helper_changes_pkt_data() to better reflect
that XDP prog does not work on skb.
This patch adds one "xdp_adjust_head" bit to bpf_prog for the
XDP-capable driver to check if the XDP prog requires
bpf_xdp_adjust_head() support. The driver can then decide
to error out during XDP_SETUP_PROG.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Couple conflicts resolved here:
1) In the MACB driver, a bug fix to properly initialize the
RX tail pointer properly overlapped with some changes
to support variable sized rings.
2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix
overlapping with a reorganization of the driver to support
ACPI, OF, as well as PCI variants of the chip.
3) In 'net' we had several probe error path bug fixes to the
stmmac driver, meanwhile a lot of this code was cleaned up
and reorganized in 'net-next'.
4) The cls_flower classifier obtained a helper function in
'net-next' called __fl_delete() and this overlapped with
Daniel Borkamann's bug fix to use RCU for object destruction
in 'net'. It also overlapped with Jiri's change to guard
the rhashtable_remove_fast() call with a check against
tc_skip_sw().
5) In mlx4, a revert bug fix in 'net' overlapped with some
unrelated changes in 'net-next'.
6) In geneve, a stale header pointer after pskb_expand_head()
bug fix in 'net' overlapped with a large reorganization of
the same code in 'net-next'. Since the 'net-next' code no
longer had the bug in question, there was nothing to do
other than to simply take the 'net-next' hunks.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This reverts commit 9d76931180557270796f9631e2c79b9c7bb3c9fb.
Using unregister_netdev at shutdown flow prevents calling
the netdev's ndos or trying to access its freed resources.
This fixes crashes like the following:
Call Trace:
[<ffffffff81587a6e>] dev_get_phys_port_id+0x1e/0x30
[<ffffffff815a36ce>] rtnl_fill_ifinfo+0x4be/0xff0
[<ffffffff815a53f3>] rtmsg_ifinfo_build_skb+0x73/0xe0
[<ffffffff815a5476>] rtmsg_ifinfo.part.27+0x16/0x50
[<ffffffff815a54c8>] rtmsg_ifinfo+0x18/0x20
[<ffffffff8158a6c6>] netdev_state_change+0x46/0x50
[<ffffffff815a5e78>] linkwatch_do_dev+0x38/0x50
[<ffffffff815a6165>] __linkwatch_run_queue+0xf5/0x170
[<ffffffff815a6205>] linkwatch_event+0x25/0x30
[<ffffffff81099a82>] process_one_work+0x152/0x400
[<ffffffff8109a325>] worker_thread+0x125/0x4b0
[<ffffffff8109a200>] ? rescuer_thread+0x350/0x350
[<ffffffff8109fc6a>] kthread+0xca/0xe0
[<ffffffff8109fba0>] ? kthread_park+0x60/0x60
[<ffffffff816a1285>] ret_from_fork+0x25/0x30
Fixes: 9d7693118055 ("net/mlx4_en: Avoid unregister_netdev at shutdown flow")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reported-by: Steve Wise <swise@opengridcomputing.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
My recent commit to get more precise rx/tx counters in ndo_get_stats64()
can lead to crashes at device dismantle, as Jesper found out.
We must prevent mlx4_en_fold_software_stats() trying to access
tx/rx rings if they are deleted.
Fix this by adding a test against priv->port_up in
mlx4_en_fold_software_stats()
Calling mlx4_en_fold_software_stats() from mlx4_en_stop_port()
allows us to eventually broadcast the latest/current counters to
rtnetlink monitors.
Fixes: 40931b85113d ("mlx4: give precise rx/tx bytes/packets counters")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-and-bisected-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Saeed Mahameed <saeedm@dev.mellanox.co.il>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
mlx4 stats are chaotic because a deferred work queue is responsible
to update them every 250 ms.
Even sampling stats every one second with "sar -n DEV 1" gives
variations like the following :
lpaa23:~# sar -n DEV 1 10 | grep eth0 | cut -c1-65
07:39:22 eth0 146877.00 3265554.00 9467.15 4828168.50
07:39:23 eth0 146587.00 3260329.00 9448.15 4820445.98
07:39:24 eth0 146894.00 3259989.00 9468.55 4819943.26
07:39:25 eth0 110368.00 2454497.00 7113.95 3629012.17 <<>>
07:39:26 eth0 146563.00 3257502.00 9447.25 4816266.23
07:39:27 eth0 145678.00 3258292.00 9389.79 4817414.39
07:39:28 eth0 145268.00 3253171.00 9363.85 4809852.46
07:39:29 eth0 146439.00 3262185.00 9438.97 4823172.48
07:39:30 eth0 146758.00 3264175.00 9459.94 4826124.13
07:39:31 eth0 146843.00 3256903.00 9465.44 4815381.97
Average: eth0 142827.50 3179259.70 9206.30 4700578.16
This patch allows rx/tx bytes/packets counters being folded at the
time we need stats.
We now can fetch stats every 1 ms if we want to check NIC behavior
on a small time window. It is also easier to detect anomalies.
lpaa23:~# sar -n DEV 1 10 | grep eth0 | cut -c1-65
07:42:50 eth0 142915.00 3177696.00 9212.06 4698270.42
07:42:51 eth0 143741.00 3200232.00 9265.15 4731593.02
07:42:52 eth0 142781.00 3171600.00 9202.92 4689260.16
07:42:53 eth0 143835.00 3192932.00 9271.80 4720761.39
07:42:54 eth0 141922.00 3165174.00 9147.64 4679759.21
07:42:55 eth0 142993.00 3207038.00 9216.78 4741653.05
07:42:56 eth0 141394.06 3154335.64 9113.85 4663731.73
07:42:57 eth0 141850.00 3161202.00 9144.48 4673866.07
07:42:58 eth0 143439.00 3180736.00 9246.05 4702755.35
07:42:59 eth0 143501.00 3210992.00 9249.99 4747501.84
Average: eth0 142835.66 3182165.93 9206.98 4704874.08
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Per RX ring packets/bytes counters are not protected by global
priv->stats_lock.
Better not confuse the reader, and use READ_ONCE() to show we read
these counters without surrounding synchronization.
Interrupt moderation is best effort, and we do not really care of
ultra precise counters.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
udplite conflict is resolved by taking what 'net-next' did
which removed the backlog receive method assignment, since
it is no longer necessary.
Two entries were added to the non-priv ethtool operations
switch statement, one in 'net' and one in 'net-next, so
simple overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Make sure mlx4_en_free_resources is called under the netdev state lock.
This is needed since RCU dereference of XDP prog should be protected.
Fixes: 326fe02d1ed6 ("net/mlx4_en: protect ring->xdp_prog with rcu_read_lock")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reported-by: Sagi Grimberg <sagi@grimberg.me>
CC: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Goal is to reorganize this critical structure to increase performance.
ndo_start_xmit() should only dirty one cache line, and access as few
cache lines as possible.
Add sp_ (Slow Path) prefix to fields that are not used in fast path,
to make clear what is going on.
After this patch pahole reports something much better, as all
ndo_start_xmit() needed fields are packed into two cache lines instead
of seven or eight
struct mlx4_en_tx_ring {
u32 last_nr_txbb; /* 0 0x4 */
u32 cons; /* 0x4 0x4 */
long unsigned int wake_queue; /* 0x8 0x8 */
struct netdev_queue * tx_queue; /* 0x10 0x8 */
u32 (*free_tx_desc)(struct mlx4_en_priv *, struct mlx4_en_tx_ring *, int, u8, u64, int); /* 0x18 0x8 */
struct mlx4_en_rx_ring * recycle_ring; /* 0x20 0x8 */
/* XXX 24 bytes hole, try to pack */
/* --- cacheline 1 boundary (64 bytes) --- */
u32 prod; /* 0x40 0x4 */
unsigned int tx_dropped; /* 0x44 0x4 */
long unsigned int bytes; /* 0x48 0x8 */
long unsigned int packets; /* 0x50 0x8 */
long unsigned int tx_csum; /* 0x58 0x8 */
long unsigned int tso_packets; /* 0x60 0x8 */
long unsigned int xmit_more; /* 0x68 0x8 */
struct mlx4_bf bf; /* 0x70 0x18 */
/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
__be32 doorbell_qpn; /* 0x88 0x4 */
__be32 mr_key; /* 0x8c 0x4 */
u32 size; /* 0x90 0x4 */
u32 size_mask; /* 0x94 0x4 */
u32 full_size; /* 0x98 0x4 */
u32 buf_size; /* 0x9c 0x4 */
void * buf; /* 0xa0 0x8 */
struct mlx4_en_tx_info * tx_info; /* 0xa8 0x8 */
int qpn; /* 0xb0 0x4 */
u8 queue_index; /* 0xb4 0x1 */
bool bf_enabled; /* 0xb5 0x1 */
bool bf_alloced; /* 0xb6 0x1 */
u8 hwtstamp_tx_type; /* 0xb7 0x1 */
u8 * bounce_buf; /* 0xb8 0x8 */
/* --- cacheline 3 boundary (192 bytes) --- */
long unsigned int queue_stopped; /* 0xc0 0x8 */
struct mlx4_hwq_resources sp_wqres; /* 0xc8 0x58 */
/* --- cacheline 4 boundary (256 bytes) was 32 bytes ago --- */
struct mlx4_qp sp_qp; /* 0x120 0x30 */
/* --- cacheline 5 boundary (320 bytes) was 16 bytes ago --- */
struct mlx4_qp_context sp_context; /* 0x150 0xf8 */
/* --- cacheline 9 boundary (576 bytes) was 8 bytes ago --- */
cpumask_t sp_affinity_mask; /* 0x248 0x20 */
enum mlx4_qp_state sp_qp_state; /* 0x268 0x4 */
u16 sp_stride; /* 0x26c 0x2 */
u16 sp_cqn; /* 0x26e 0x2 */
/* size: 640, cachelines: 10, members: 36 */
/* sum members: 600, holes: 1, sum holes: 24 */
/* padding: 16 */
};
Instead of this silly placement :
struct mlx4_en_tx_ring {
u32 last_nr_txbb; /* 0 0x4 */
u32 cons; /* 0x4 0x4 */
long unsigned int wake_queue; /* 0x8 0x8 */
/* XXX 48 bytes hole, try to pack */
/* --- cacheline 1 boundary (64 bytes) --- */
u32 prod; /* 0x40 0x4 */
/* XXX 4 bytes hole, try to pack */
long unsigned int bytes; /* 0x48 0x8 */
long unsigned int packets; /* 0x50 0x8 */
long unsigned int tx_csum; /* 0x58 0x8 */
long unsigned int tso_packets; /* 0x60 0x8 */
long unsigned int xmit_more; /* 0x68 0x8 */
unsigned int tx_dropped; /* 0x70 0x4 */
/* XXX 4 bytes hole, try to pack */
struct mlx4_bf bf; /* 0x78 0x18 */
/* --- cacheline 2 boundary (128 bytes) was 16 bytes ago --- */
long unsigned int queue_stopped; /* 0x90 0x8 */
cpumask_t affinity_mask; /* 0x98 0x10 */
struct mlx4_qp qp; /* 0xa8 0x30 */
/* --- cacheline 3 boundary (192 bytes) was 24 bytes ago --- */
struct mlx4_hwq_resources wqres; /* 0xd8 0x58 */
/* --- cacheline 4 boundary (256 bytes) was 48 bytes ago --- */
u32 size; /* 0x130 0x4 */
u32 size_mask; /* 0x134 0x4 */
u16 stride; /* 0x138 0x2 */
/* XXX 2 bytes hole, try to pack */
u32 full_size; /* 0x13c 0x4 */
/* --- cacheline 5 boundary (320 bytes) --- */
u16 cqn; /* 0x140 0x2 */
/* XXX 2 bytes hole, try to pack */
u32 buf_size; /* 0x144 0x4 */
__be32 doorbell_qpn; /* 0x148 0x4 */
__be32 mr_key; /* 0x14c 0x4 */
void * buf; /* 0x150 0x8 */
struct mlx4_en_tx_info * tx_info; /* 0x158 0x8 */
struct mlx4_en_rx_ring * recycle_ring; /* 0x160 0x8 */
u32 (*free_tx_desc)(struct mlx4_en_priv *, struct mlx4_en_tx_ring *, int, u8, u64, int); /* 0x168 0x8 */
u8 * bounce_buf; /* 0x170 0x8 */
struct mlx4_qp_context context; /* 0x178 0xf8 */
/* --- cacheline 9 boundary (576 bytes) was 48 bytes ago --- */
int qpn; /* 0x270 0x4 */
enum mlx4_qp_state qp_state; /* 0x274 0x4 */
u8 queue_index; /* 0x278 0x1 */
bool bf_enabled; /* 0x279 0x1 */
bool bf_alloced; /* 0x27a 0x1 */
/* XXX 5 bytes hole, try to pack */
/* --- cacheline 10 boundary (640 bytes) --- */
struct netdev_queue * tx_queue; /* 0x280 0x8 */
int hwtstamp_tx_type; /* 0x288 0x4 */
/* size: 704, cachelines: 11, members: 36 */
/* sum members: 587, holes: 6, sum holes: 65 */
/* padding: 52 */
};
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\|
| |
| |
| |
| |
| |
| | |
Several cases of bug fixes in 'net' overlapping other changes in
'net-next-.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This reverts commit 9d2afba058722d40cc02f430229c91611c0e8d16.
The original issue would possibly exist if an external module
tried calling our "ethtool_ops" without checking if it still
exists.
The right way of solving it is by simply doing the check in
the caller side.
Currently, no action is required as there's no such use case.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Commit 67f8b1dcb9ee ("net/mlx4_en: Refactor the XDP forwarding rings
scheme") added a bug in that the prog's reference count is not dropped
in the error path when mlx4_en_try_alloc_resources() is failing from
mlx4_xdp_set().
We previously took bpf_prog_add(prog, priv->rx_ring_num - 1), that we
need to release again. Earlier in the call path, dev_change_xdp_fd()
itself holds a reference to the prog as well (hence the '- 1' in the
bpf_prog_add()), so a simple atomic_sub() is safe to use here. When
an error is propagated, then bpf_prog_put() is called eventually from
dev_change_xdp_fd()
Fixes: 67f8b1dcb9ee ("net/mlx4_en: Refactor the XDP forwarding rings scheme")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
XDP statistics are reported in ethtool, in total and per ring,
as follows:
- xdp_drop: the number of packets dropped by xdp.
- xdp_tx: the number of packets forwarded by xdp.
- xdp_tx_full: the number of times an xdp forward failed
due to a full tx xdp ring.
In addition, all packets that are dropped/forwarded by XDP
are no longer accounted in rx_packets/rx_bytes of the ring,
so that they count traffic that is passed to the stack.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Separately manage the two types of TX rings: regular ones, and XDP.
Upon an XDP set, do not borrow regular TX rings and convert them
into XDP ones, but allocate new ones, unless we hit the max number
of rings.
Which means that in systems with smaller #cores we will not consume
the current TX rings for XDP, while we are still in the num TX limit.
XDP TX rings counters are not shown in ethtool statistics.
Instead, XDP counters will be added to the respective RX rings
in a downstream patch.
This has no performance implications.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\|
| |
| |
| |
| |
| |
| |
| |
| | |
Mostly simple overlapping changes.
For example, David Ahern's adjacency list revamp in 'net-next'
conflicted with an adjacency list traversal bug fix in 'net'.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Following the previous patch, as an optimization, the slave will
not even bother sending the DUMP_ETH_STATS command over the
comm channel.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Fix a kernel panic that occurs as a result of an asynchronous event
handled in roce_gid_mgmt:
mlx4_en_get_drvinfo is called and accesses freed resources.
This happens in a shutdown flow only, since pci device is destroyed
while netdevice is still alive.
Fixes: c27a02cd94d6 ("mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC")
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Currently there is a race between incoming traffic and
initialization flow. HW is able to receive the packets
after INIT_PORT is done and unicast steering is configured.
Before we set priv->port_up NAPI is not scheduled and
receive queues become full. Therefore we never get
new interrupts about the completions.
This issue could happen if running heavy traffic during
bringing port up.
The resolution is to schedule NAPI once port_up is set.
If receive queues were full this will process all cqes
and release them.
Fixes: c27a02cd94d6 ("mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|/
|
|
|
|
|
|
|
|
|
|
| |
mlx4: min_mtu 46, max_mtu depends on hardware
mlx5: min_mtu 68, max_mtu depends on hardware
CC: netdev@vger.kernel.org
CC: Tariq Toukan <tariqt@mellanox.com>
CC: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move the vf to VST 802.1ad mode (mlx4 VST QinQ mode) by setting vf vlan
protocol to 802.1ad.
VST 802.1ad mode in mlx4, is used for STAG strip/insertion by PF, while
the CTAG is set by the VF.
Read current vlan protocol as part of the vf configuration state.
Upon setting vf vlan protocol to 802.1ad, we use a mechanism of handshake
to verify that both the vf and the pf driver version support it.
The handshake uses the command QUERY_FUNC_CAP:
- The vf sets a pre-defined support bit in input modifier.
- A pf that supports the feature sends the request to the vf through a
pre-defined field in the output mailbox.
- In case vf does not support the feature, the pf will fail the control
command (in this case, IP link tool command to set the vf vlan
protocol to 802.1ad).
No change in VST 802.1Q mode.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce new rtnl UAPI that exposes a list of vlans per VF, giving
the ability for user-space application to specify it for the VF, as an
option to support 802.1ad.
We adjusted IP Link tool to support this option.
For future use cases, the new UAPI supports multiple vlans. For now we
limit the list size to a single vlan in kernel.
Add IFLA_VF_VLAN_LIST in addition to IFLA_VF_VLAN to keep backward
compatibility with older versions of IP Link tool.
Add a vlan protocol parameter to the ndo_set_vf_vlan callback.
We kept 802.1Q as the drivers' default vlan protocol.
Suitable ip link tool command examples:
Set vf vlan protocol 802.1ad:
ip link set eth0 vf 1 vlan 100 proto 802.1ad
Set vf to VST (802.1Q) mode:
ip link set eth0 vf 1 vlan 100 proto 802.1Q
Or by omitting the new parameter
ip link set eth0 vf 1 vlan 100
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
| |
In Ethernet VF, disable vlan HW acceleration on VF
while it is set to VF vlan protocol 802.1ad mode.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Conflicts:
drivers/net/ethernet/mediatek/mtk_eth_soc.c
drivers/net/ethernet/qlogic/qed/qed_dcbx.c
drivers/net/phy/Kconfig
All conflicts were cases of overlapping commits.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch adds a capability check before enabling DCBX.
In addition, it re-organizes the relevant data structures,
and fixes a typo in a define.
Fixes: af7d51852631 ("net/mlx4_en: Add DCB PFC support through CEE netlink commands")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Depending on the preempt mode, the bpf_prog stored in xdp_prog may be
freed despite the use of call_rcu inside bpf_prog_put. The situation is
possible when running in PREEMPT_RCU=y mode, for instance, since the rcu
callback for destroying the bpf prog can run even during the bh handling
in the mlx4 rx path.
Several options were considered before this patch was settled on:
Add a napi_synchronize loop in mlx4_xdp_set, which would occur after all
of the rings are updated with the new program.
This approach has the disadvantage that as the number of rings
increases, the speed of update will slow down significantly due to
napi_synchronize's msleep(1).
Add a new rcu_head in bpf_prog_aux, to be used by a new bpf_prog_put_bh.
The action of the bpf_prog_put_bh would be to then call bpf_prog_put
later. Those drivers that consume a bpf prog in a bh context (like mlx4)
would then use the bpf_prog_put_bh instead when the ring is up. This has
the problem of complexity, in maintaining proper refcnts and rcu lists,
and would likely be harder to review. In addition, this approach to
freeing must be exclusive with other frees of the bpf prog, for instance
a _bh prog must not be referenced from a prog array that is consumed by
a non-_bh prog.
The placement of rcu_read_lock in this patch is functionally the same as
putting an rcu_read_lock in napi_poll. Actually doing so could be a
potentially controversial change, but would bring the implementation in
line with sk_busy_loop (though of course the nature of those two paths
is substantially different), and would also avoid future copy/paste
problems with future supporters of XDP. Still, this patch does not take
that opinionated option.
Testing was done with kernels in either PREEMPT_RCU=y or
CONFIG_PREEMPT_VOLUNTARY=y+PREEMPT_RCU=n modes, with neither exhibiting
any drawback. With PREEMPT_RCU=n, the extra call to rcu_read_lock did
not show up in the perf report whatsoever, and with PREEMPT_RCU=y the
overhead of rcu_read_lock (according to perf) was the same before/after.
In the rx path, rcu_read_lock is eventually called for every packet
from netif_receive_skb_internal, so the napi poll call's rcu_read_lock
is easily amortized.
v2:
Remove extra rcu_read_lock in mlx4_en_process_rx_cq body
Annotate xdp_prog with __rcu, and convert all usages to rcu_assign or
rcu_dereference[_protected] as appropriate.
Add explicit mutex lock around rcu_assign instead of xchg loop.
Fixes: d576acf0a22 ("net/mlx4_en: add page recycle to prepare rx ring for tx support")
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| |
| |
| | |
Just several instances of overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch fixes the lost of Ethernet port on low memory system,
when driver frees its resources and fails to allocate new resources.
Issue could happen while changing number of channels, rings size or
changing the timestamp configuration.
This fix is necessary because of removing vmap use in the code.
When vmap was in use driver could allocate non-contiguous memory
and make it contiguous with vmap. Now it could fail to allocate
a large chunk of contiguous memory and lose the port.
Current code tries to allocate new resources and then upon success
frees the old resources.
Fixes: 73898db04301 ('net/mlx4: Avoid wrong virtual mappings')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Filters cleanup should be done once before destroying net device,
since filters list is contained in the private data.
Fixes: 1eb8c695bda9 ('net/mlx4_en: Add accelerated RFS support')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|