| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As the firmware is slowly running out of command IDs and grouping of
commands is desirable anyway, the firmware is extending the command
header from 4 bytes to 8 bytes to introduce a group (in place of the
former flags field, since that's always 0 on commands and thus can
be easily used to distinguish between the two.
In order to support this most easily in the driver widen the command
command ID used in the command sending functions and encode the new
values (group and version) in the ID. That way existing code doesn't
have to be changed (since the higher bits are 0 automatically) and
newer code can easily use the new ID generation function to create a
value to use in place of just the command ID.
Signed-off-by: Aviya Erenfeld <aviya.erenfeld@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
ToF is a time based method for measurement of the WiFi device
location within a WiFi environment. The driver functionality provided
by this patch is the interface for communication with FW and receiving
location related updates from the FW. The interface provided by this
patch is via debugfs.
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|
|
|
|
|
|
|
|
|
| |
All the supported firmwares support this API.
This includes removing dwell per band, as band is no longer a factor
in calculating the dwell. Only basic dwell is used and FW will calculate
the actual dwell time.
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The 'flags' field really has been reserved in the firmware API for a
very long time, probably since 4965. As a consequence, the field is
always 0 and checking for a IWL_CMD_FAILED_MSK flag makes no sense.
Rename the field to 'reserved', get rid of IWL_CMD_FAILED_MSK and
all the code for it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|
|
|
|
|
|
|
|
|
|
| |
In iwlmvm firmwares, the Byte count written in the scheduler
byte count table is in DWORDs and not in bytes.
We should check that this value fits in the 12 bits and
the value can be either in bits of in DWORD or bytes
depending on the firmware. Check the value after the
translation to DWORDs is done (if needed).
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|
|
|
|
|
|
|
|
|
|
| |
In a few places, we were disabling interrupts but didn't
make sure that the interrupt handler has finished running.
Add calls to synchronize_irq() to ensure we finish handling
the interrupts before we free resources or other things that
could lead to a crash if the interrupt were to be handled
later.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|
|
|
|
|
|
|
| |
When the firmware crashes, we can't expect the Tx queues to
progress. Cancel their timer.
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|
|
|
|
|
|
|
|
| |
Since the time-event is sent with the immediate flag set, there is
no need to sample the device time.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|
|
|
|
|
|
|
|
| |
With the previous patch series, no opmode continues using the
command or handler_status (i.e. the return value from the RX)
so it can be removed now.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|
|
|
|
|
|
|
| |
In the mvm driver, neither the old command nor the return value
are used, so remove them.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|
|
|
|
|
|
|
|
| |
After the previous patches, the command that's passed in nor the
return value are used any more, so can be removed.
While at it, make some functions static.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|
|
|
|
|
|
|
|
| |
This makes the logging a little less useful, but as they're mostly
synchronous commands it won't matter much. It gets rid of the
dependency on the input command, which this is the only user of.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This driver currently has some very confusing ADD_STA response handling
that runs asynchronously in the background for all of the commands, but
is only really necessary for synchronous ones (the really asynchronous
ones can only be done for already existing stations), and for the sync
ones it actually waits for the RX handler to return a status code.
Rework this to keep the debug printing in the handler, but do the code
that's supposed to have an effect only for sync commands in the command
sending function.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current key offset assignment algorithm always uses the lowest
unused key offset, which will potentially lead to issues when the
firmware will change to take the key material for TX from the key
table rather than from the TX command.
In order to avoid those issues (and avoid forgetting about them)
change the key offset allocation algorithm now to avoid reusing key
offsets quickly.
The new algorithm always picks as the next offset the least recently
freed offset, i.e. the offset that has been unused for the longest
amount of time. This is implemented by having a generation counter
for each key offset that is incremented every time a key is deleted,
except for the one that's deleted, which is reset to zero. Thus the
highest counter is the key that's been unused longest.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
During NIC initialization shared HW is reset and this disables the
scheduler. Some HW platforms do not activate the scheduler after it.
Consequently all HCMD sent by the driver stay at the queues which cause
to queue stuck.
Set the scheduler to work on auto active mode so it would be activated upon
change over one of the queues' write pointer.
Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|
|
|
|
|
|
|
|
|
|
| |
This allows to ensure that we don't have races between them.
A user reported that stop_device was called twice upon
rfkill interrupt after suspend. When the interrupts are
enabled, and right after when we directly check the rfkill
state.
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|
|
|
|
|
|
|
|
|
| |
The new locking in PCIe transport requires to start_hw
before start_fw. This uncovered a bug in dvm which failed
to do so.
Fix that.
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|
|
|
|
|
|
|
| |
This firmware is not supported anymore - stop loading this firmware.
Remove code handling older versions.
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|
|
|
|
|
|
|
|
|
| |
There's no need to forward RX MPDUs to notification wait tests, nor
do we need to check them for firmware dump triggers, nor could they
be asynchronous. It's thus more efficient to handle them separately,
before going into the regular RX handlers.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In scenarios where we haven't converged yet to a specific modulation
and rate it could be better to report to userspace the last tx rate
based on the STA capabilities and RSSI. This is important as sometimes
userspace displays the last tx rate as the link speed.
This avoids being presented with low legacy rates when rs just begins
its search or after an idle period in which it resets itself.
Signed-off-by: Eyal Shapira <eyalx.shapira@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull networking updates from David Miller:
1) Add TX fast path in mac80211, from Johannes Berg.
2) Add TSO/GRO support to ibmveth, from Thomas Falcon
3) Move away from cached routes in ipv6, just like ipv4, from Martin
KaFai Lau.
4) Lots of new rhashtable tests, from Thomas Graf.
5) Run ingress qdisc lockless, from Alexei Starovoitov.
6) Allow servers to fetch TCP packet headers for SYN packets of new
connections, for fingerprinting. From Eric Dumazet.
7) Add mode parameter to pktgen, for testing receive. From Alexei
Starovoitov.
8) Cache access optimizations via simplifications of build_skb(), from
Alexander Duyck.
9) Move page frag allocator under mm/, also from Alexander.
10) Add xmit_more support to hv_netvsc, from KY Srinivasan.
11) Add a counter guard in case we try to perform endless reclassify
loops in the packet scheduler.
12) Extern flow dissector to be programmable and use it in new "Flower"
classifier. From Jiri Pirko.
13) AF_PACKET fanout rollover fixes, performance improvements, and new
statistics. From Willem de Bruijn.
14) Add netdev driver for GENEVE tunnels, from John W Linville.
15) Add ingress netfilter hooks and filtering, from Pablo Neira Ayuso.
16) Fix handling of epoll edge triggers in TCP, from Eric Dumazet.
17) Add an ECN retry fallback for the initial TCP handshake, from Daniel
Borkmann.
18) Add tail call support to BPF, from Alexei Starovoitov.
19) Add several pktgen helper scripts, from Jesper Dangaard Brouer.
20) Add zerocopy support to AF_UNIX, from Hannes Frederic Sowa.
21) Favor even port numbers for allocation to connect() requests, and
odd port numbers for bind(0), in an effort to help avoid
ip_local_port_range exhaustion. From Eric Dumazet.
22) Add Cavium ThunderX driver, from Sunil Goutham.
23) Allow bpf programs to access skb_iif and dev->ifindex SKB metadata,
from Alexei Starovoitov.
24) Add support for T6 chips in cxgb4vf driver, from Hariprasad Shenai.
25) Double TCP Small Queues default to 256K to accomodate situations
like the XEN driver and wireless aggregation. From Wei Liu.
26) Add more entropy inputs to flow dissector, from Tom Herbert.
27) Add CDG congestion control algorithm to TCP, from Kenneth Klette
Jonassen.
28) Convert ipset over to RCU locking, from Jozsef Kadlecsik.
29) Track and act upon link status of ipv4 route nexthops, from Andy
Gospodarek.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1670 commits)
bridge: vlan: flush the dynamically learned entries on port vlan delete
bridge: multicast: add a comment to br_port_state_selection about blocking state
net: inet_diag: export IPV6_V6ONLY sockopt
stmmac: troubleshoot unexpected bits in des0 & des1
net: ipv4 sysctl option to ignore routes when nexthop link is down
net: track link-status of ipv4 nexthops
net: switchdev: ignore unsupported bridge flags
net: Cavium: Fix MAC address setting in shutdown state
drivers: net: xgene: fix for ACPI support without ACPI
ip: report the original address of ICMP messages
net/mlx5e: Prefetch skb data on RX
net/mlx5e: Pop cq outside mlx5e_get_cqe
net/mlx5e: Remove mlx5e_cq.sqrq back-pointer
net/mlx5e: Remove extra spaces
net/mlx5e: Avoid TX CQE generation if more xmit packets expected
net/mlx5e: Avoid redundant dev_kfree_skb() upon NOP completion
net/mlx5e: Remove re-assignment of wq type in mlx5e_enable_rq()
net/mlx5e: Use skb_shinfo(skb)->gso_segs rather than counting them
net/mlx5e: Static mapping of netdev priv resources to/from netdev TX queues
net/mlx4_en: Use HW counters for rx/tx bytes/packets in PF device
...
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Add a new argument to br_fdb_delete_by_port which allows to specify a
vid to match when flushing entries and use it in nbp_vlan_delete() to
flush the dynamically learned entries of the vlan/port pair when removing
a vlan from a port. Before this patch only the local mac was being
removed and the dynamically learned ones were left to expire.
Note that the do_all argument is still respected and if specified, the
vid will be ignored.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Add a comment to explain why we're not disabling port's multicast when it
goes in blocking state. Since there's a check in the timer's function which
bypasses the timer if the port's in blocking/disabled state, the timer will
simply expire and stop without sending more queries.
Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Conflicts:
drivers/net/ethernet/mellanox/mlx4/main.c
net/packet/af_packet.c
Both conflicts were cases of simple overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Current implementation of descriptor init procedure only takes
care about setting/clearing ownership flag in "des0"/"des1"
fields while it is perfectly possible to get unexpected bits
set because of the following factors:
[1] On driver probe underlying memory allocated with
dma_alloc_coherent() might not be zeroed and so
it will be filled with garbage.
[2] During driver operation some bits could be set by SD/MMC
controller (for example error flags etc).
And unexpected and/or randomly set flags in "des0"/"des1"
fields may lead to unpredictable behavior of GMAC DMA block.
This change addresses both items above with:
[1] Use of dma_zalloc_coherent() instead of simple
dma_alloc_coherent() to make sure allocated memory is
zeroed. That shouldn't affect performance because
this allocation only happens once on driver probe.
[2] Do explicit zeroing of both "des0" and "des1" fields
of all buffer descriptors during initialization of
DMA transfer.
And while at it fixed identation of dma_free_coherent()
counterpart as well.
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: arc-linux-dev@synopsys.com
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Cc: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
ICMP messages can trigger ICMP and local errors. In this case
serr->port is 0 and starting from Linux 4.0 we do not return
the original target address to the error queue readers.
Add function to define which errors provide addr_offset.
With this fix my ping command is not silent anymore.
Fixes: c247f0534cc5 ("ip: fix error queue empty skb handling")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Use the correct unregister function matching the register
function on the error path.
Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Fixes: c1beeef7a32a791a ("rocker: implement IPv4 fib offloading")
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |\
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
Oliver Hartkopp fixed a bug in the generic CAN frame handling code, which may
lead to loss of CAN frames. It was introduced during v4.1 development.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
gpg: Signature made Sun 21 Jun 2015 09:59:36 AM PDT using RSA key ID C9B5CFC7
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
As reported by Manfred Schlaegl here
http://marc.info/?l=linux-netdev&m=143482089824232&w=2
commit 514ac99c64b "can: fix multiple delivery of a single CAN frame for
overlapping CAN filters" requires the skb->tstamp to be set to check for
identical CAN skbs.
As net timestamping is influenced by several players (netstamp_needed and
netdev_tstamp_prequeue) Manfred missed a proper timestamp which leads to
CAN frame loss.
As skb timestamping became now mandatory for CAN related skbs this patch
makes sure that received CAN skbs always have a proper timestamp set.
Maybe there's a better solution in the future but this patch fixes the
CAN frame loss so far.
Reported-by: Manfred Schlaegl <manfred.schlaegl@gmx.at>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Commit 4fc9b87bae25 ("net: fs_enet: Implement NETIF_F_SG feature")
brings a trouble to Freescale MPC512x: a kernel oops happens
during sending non-linear sk_buff with .data not aligned by 4.
Log quotation:
Unable to handle kernel paging request for data at address 0xe467c000
Faulting instruction address: 0xc000cd44
Oops: Kernel access of bad area, sig: 11 [#1]
MPC512x generic
Modules linked in:
CPU: 0 PID: 984 Comm: kworker/0:1H Not tainted 4.1.0-rc8-00024-gbb16140 #2
Workqueue: rpciod rpc_async_schedule
task: cf364a50 ti: cf362000 task.ti: cf362000
NIP: c000cd44 LR: c000c720 CTR: 00000206
REGS: cf363ac0 TRAP: 0300 Not tainted (4.1.0-rc8-00024-gbb16140)
MSR: 00009032 <EE,ME,IR,DR,RI> CR: 42004082 XER: 00000000
DAR: e467c000 DSISR: 20000000
GPR00: c0279e24 cf363b70 cf364a50 e467c000 00000206 0000001f 00000001 00000001
GPR08: 00000000 e467c000 e46800be 000139a6 82002082 00000000 c002e46c cf3c3680
GPR16: c044cb30 c04b0000 cf363c48 00000000 00000001 fde0315c 00000000 0000000b
GPR24: 0000002c 000040be cf339aa0 0000000b 00000001 cf873210 00282f85 00000000
NIP [c000cd44] clean_dcache_range+0x1c/0x30
LR [c000c720] dma_direct_map_page+0x40/0x94
Call Trace:
[cf363b70] [cf339b60] 0xcf339b60 (unreliable)
[cf363b90] [c0279e24] fs_enet_start_xmit+0x1c8/0x42c
[cf363bd0] [c02ff710] dev_hard_start_xmit+0x2dc/0x3d4
[cf363c40] [c0319c60] sch_direct_xmit+0xcc/0x1cc
[cf363c70] [c02ff9c0] __dev_queue_xmit+0x1b8/0x47c
[cf363ca0] [c032a3e8] ip_finish_output+0x1fc/0x9a8
[cf363ce0] [c032c31c] ip_send_skb+0x1c/0xa4
[cf363cf0] [c035112c] udp_send_skb+0xe4/0x2e8
[cf363d10] [c0351368] udp_push_pending_frames+0x38/0x84
[cf363d20] [c03537b8] udp_sendpage+0x134/0x174
[cf363d70] [c0384fd4] xs_sendpages+0x21c/0x250
[cf363db0] [c03852bc] xs_udp_send_request+0x50/0xf8
[cf363de0] [c0382f08] xprt_transmit+0x64/0x280
[cf363e20] [c038017c] call_transmit+0x168/0x234
[cf363e40] [c0387918] __rpc_execute+0x88/0x2b0
[cf363e80] [c00296f8] process_one_work+0x124/0x2fc
[cf363ea0] [c0029a00] worker_thread+0x130/0x480
[cf363ef0] [c002e528] kthread+0xbc/0xd0
[cf363f40] [c000e4a8] ret_from_kernel_thread+0x5c/0x64
Instruction dump:
7c70faa6 60630800 7c70fba6 4c00012c 4e800020 38a0001f 7c632878 7c832050
7c842a14 5484d97f 4d820020 7c8903a6 <7c00186c> 38630020 4200fff8 7c0004ac
---[ end trace c846c1eceb513c85 ]---
The reason:
MPC5121 FEC requires 4-byte alignment for TX data buffer and calls
tx_skb_align_workaround() for copying sk_buff with not aligned .data to a new
sk_buff with aligned one. But tx_skb_align_workaround() uses
skb_copy_from_linear_data() which doesn't work for non-linear sk_buff:
a new sk_buff has non-zero nr_frags and zero .data_len.
So improve the condition of calling tx_skb_align_workaround() and use
skb_linearize() in it.
Signed-off-by: Alexander Popov <alex.popov@linux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Commit edafc132baac ("xen-netback: making the bandwidth limiter runtime settable")
introduced the capability to change the bandwidth rate limit at runtime.
But it also introduced a possible crashing bug.
If netback receives two XenbusStateConnected without getting the
hotplug-status watch firing in between, then it will try to register the
watches for the rate limiter again. But this triggers a BUG() in the watch
registration code.
The fix modifies connect() to remove the possibly existing packet-rate
watches before trying to install those watches. This behaviour is in line
with how connect() deals with the hotplug-status watch.
Signed-off-by: Imre Palik <imrep@amazon.de>
Cc: Matt Wilson <msw@amazon.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
When a port goes through a link down/up the multicast router configuration
is not restored.
Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: 0909e11758bd ("bridge: Add multicast_router sysfs entries")
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
A ROSE socket doesn't necessarily always have a neighbour pointer so check
if the neighbour pointer is valid before dereferencing it.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Tested-by: Bernard Pidoux <f6bvp@free.fr>
Cc: stable@vger.kernel.org #2.6.11+
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
tcp_fastopen_reset_cipher really cannot be called from interrupt
context. It allocates the tcp_fastopen_context with GFP_KERNEL and
calls crypto_alloc_cipher, which allocates all kind of stuff with
GFP_KERNEL.
Thus, we might sleep when the key-generation is triggered by an
incoming TFO cookie-request which would then happen in interrupt-
context, as shown by enabling CONFIG_DEBUG_ATOMIC_SLEEP:
[ 36.001813] BUG: sleeping function called from invalid context at mm/slub.c:1266
[ 36.003624] in_atomic(): 1, irqs_disabled(): 0, pid: 1016, name: packetdrill
[ 36.004859] CPU: 1 PID: 1016 Comm: packetdrill Not tainted 4.1.0-rc7 #14
[ 36.006085] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[ 36.008250] 00000000000004f2 ffff88007f8838a8 ffffffff8171d53a ffff880075a084a8
[ 36.009630] ffff880075a08000 ffff88007f8838c8 ffffffff810967d3 ffff88007f883928
[ 36.011076] 0000000000000000 ffff88007f8838f8 ffffffff81096892 ffff88007f89be00
[ 36.012494] Call Trace:
[ 36.012953] <IRQ> [<ffffffff8171d53a>] dump_stack+0x4f/0x6d
[ 36.014085] [<ffffffff810967d3>] ___might_sleep+0x103/0x170
[ 36.015117] [<ffffffff81096892>] __might_sleep+0x52/0x90
[ 36.016117] [<ffffffff8118e887>] kmem_cache_alloc_trace+0x47/0x190
[ 36.017266] [<ffffffff81680d82>] ? tcp_fastopen_reset_cipher+0x42/0x130
[ 36.018485] [<ffffffff81680d82>] tcp_fastopen_reset_cipher+0x42/0x130
[ 36.019679] [<ffffffff81680f01>] tcp_fastopen_init_key_once+0x61/0x70
[ 36.020884] [<ffffffff81680f2c>] __tcp_fastopen_cookie_gen+0x1c/0x60
[ 36.022058] [<ffffffff816814ff>] tcp_try_fastopen+0x58f/0x730
[ 36.023118] [<ffffffff81671788>] tcp_conn_request+0x3e8/0x7b0
[ 36.024185] [<ffffffff810e3872>] ? __module_text_address+0x12/0x60
[ 36.025327] [<ffffffff8167b2e1>] tcp_v4_conn_request+0x51/0x60
[ 36.026410] [<ffffffff816727e0>] tcp_rcv_state_process+0x190/0xda0
[ 36.027556] [<ffffffff81661f97>] ? __inet_lookup_established+0x47/0x170
[ 36.028784] [<ffffffff8167c2ad>] tcp_v4_do_rcv+0x16d/0x3d0
[ 36.029832] [<ffffffff812e6806>] ? security_sock_rcv_skb+0x16/0x20
[ 36.030936] [<ffffffff8167cc8a>] tcp_v4_rcv+0x77a/0x7b0
[ 36.031875] [<ffffffff816af8c3>] ? iptable_filter_hook+0x33/0x70
[ 36.032953] [<ffffffff81657d22>] ip_local_deliver_finish+0x92/0x1f0
[ 36.034065] [<ffffffff81657f1a>] ip_local_deliver+0x9a/0xb0
[ 36.035069] [<ffffffff81657c90>] ? ip_rcv+0x3d0/0x3d0
[ 36.035963] [<ffffffff81657569>] ip_rcv_finish+0x119/0x330
[ 36.036950] [<ffffffff81657ba7>] ip_rcv+0x2e7/0x3d0
[ 36.037847] [<ffffffff81610652>] __netif_receive_skb_core+0x552/0x930
[ 36.038994] [<ffffffff81610a57>] __netif_receive_skb+0x27/0x70
[ 36.040033] [<ffffffff81610b72>] process_backlog+0xd2/0x1f0
[ 36.041025] [<ffffffff81611482>] net_rx_action+0x122/0x310
[ 36.042007] [<ffffffff81076743>] __do_softirq+0x103/0x2f0
[ 36.042978] [<ffffffff81723e3c>] do_softirq_own_stack+0x1c/0x30
This patch moves the call to tcp_fastopen_init_key_once to the places
where a listener socket creates its TFO-state, which always happens in
user-context (either from the setsockopt, or implicitly during the
listen()-call)
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Fixes: 222e83d2e0ae ("tcp: switch tcp_fastopen key generation to net_get_random_once")
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The commit 898b2970e2c9 ("mvneta: implement SGMII-based in-band link state
signaling")
changed mvneta_adjust_link() so that it does not clear the auto-negotiation
bits in MVNETA_GMAC_AUTONEG_CONFIG register. This was necessary for
auto-negotiation mode to work.
Unfortunately I haven't checked if these bits are ever initialized.
It appears they are not.
This patch adds the missing initialization of the auto-negotiation bits
in the MVNETA_GMAC_AUTONEG_CONFIG register.
It fixes the following regression:
https://www.mail-archive.com/netdev@vger.kernel.org/msg67928.html
Since the patch was tested to fix a regression, it should be applied to
stable tree.
Tested-by: Arnaud Ebalard <arno@natisbad.org>
CC: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
CC: Florian Fainelli <f.fainelli@gmail.com>
CC: netdev@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: stable@vger.kernel.org
Signed-off-by: Stas Sergeev <stsp@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
PACKET_FANOUT_LB computes f->rr_cur such that it is modulo
f->num_members. It returns the old value unconditionally, but
f->num_members may have changed since the last store. Ensure
that the return value is always < num.
When modifying the logic, simplify it further by replacing the loop
with an unconditional atomic increment.
Fixes: dc99f600698d ("packet: Add fanout support.")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |/
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Unfortunately, Michal's change to fix AP_VLAN crypto tailroom
caused a locking issue that was reported by lockdep, but only
in a few cases - the issue was a classic ABBA deadlock caused
by taking the mtx after the key_mtx, where normally they're
taken the other way around.
As the key mutex protects the field in question (I'm adding a
few annotations to make that clear) only the iteration needs
to be protected, but we can also iterate the interface list
with just RCU protection while holding the key mutex.
Fixes: f9dca80b98ca ("mac80211: fix AP_VLAN crypto tailroom calculation")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Fix an allmodconfig compilation failer on microblaze due to big endian
architectures being apparently unsupported by the NetJet code:
drivers/isdn/hisax/nj_s.c: In function 'setup_netjet_s':
drivers/isdn/hisax/nj_s.c:265:2:
error: #error "not running on big endian machines now"
Modify the relevant Kconfig such that the NetJet code is not built on
microblaze anymore.
Note that endianess on microblaze is not determined through Kconfig,
but by means of a compiler provided CPP macro, namely __MICROBLAZEEL__.
However, gcc defaults to big endianess on that platform.
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Acked-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The lockless lookups can return entry that is unlinked.
Sometimes they get reference before last neigh_cleanup_and_release,
sometimes they do not need reference. Later, any
modification attempts may result in the following problems:
1. entry is not destroyed immediately because neigh_update
can start the timer for dead entry, eg. on change to NUD_REACHABLE
state. As result, entry lives for some time but is invisible
and out of control.
2. __neigh_event_send can run in parallel with neigh_destroy
while refcnt=0 but if timer is started and expired refcnt can
reach 0 for second time leading to second neigh_destroy and
possible crash.
Thanks to Eric Dumazet and Ying Xue for their work and analyze
on the __neigh_event_send change.
Fixes: 767e97e1e0db ("neigh: RCU conversion of struct neighbour")
Fixes: a263b3093641 ("ipv4: Make neigh lookups directly in output packet path.")
Fixes: 6fd6ce2056de ("ipv6: Do not depend on rt->n in ip6_finish_output2().")
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We need to tell compiler it must not read f->num_members multiple
times. Otherwise testing if num is not zero is flaky, and we could
attempt an invalid divide by 0 in fanout_demux_cpu()
Note bug was present in packet_rcv_fanout_hash() and
packet_rcv_fanout_lb() but final 3.1 had a simple location
after commit 95ec3eb417115fb ("packet: Add 'cpu' fanout policy.")
Fixes: dc99f600698dc ("packet: Add fanout support.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
After the ->set() spinlocks were removed br_stp_set_bridge_priority
was left running without any protection when used via sysfs. It can
race with port add/del and could result in use-after-free cases and
corrupted lists. Tested by running port add/del in a loop with stp
enabled while setting priority in a loop, crashes are easily
reproducible.
The spinlocks around sysfs ->set() were removed in commit:
14f98f258f19 ("bridge: range check STP parameters")
There's also a race condition in the netlink priority support that is
fixed by this change, but it was introduced recently and the fixes tag
covers it, just in case it's needed the commit is:
af615762e972 ("bridge: add ageing_time, stp_state, priority over netlink")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Fixes: 14f98f258f19 ("bridge: range check STP parameters")
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Due to firmware bug, under VPI configuration when port1 = IB and
port2 = Eth, Granular QoS per VF isn't working properly. More over,
the whole QP0/QP1 Para-Virtualization in the mlx4 IB driver is
broken on that config.
Hence, we must disable Granular QoS per VF under that configuration
till a fix is introduced. Once that happens, a new device capability
will be used to mark the feature support on that specific configuration.
Reported-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
->auto_asconf_splist is per namespace and mangled by functions like
sctp_setsockopt_auto_asconf() which doesn't guarantee any serialization.
Also, the call to inet_sk_copy_descendant() was backuping
->auto_asconf_list through the copy but was not honoring
->do_auto_asconf, which could lead to list corruption if it was
different between both sockets.
This commit thus fixes the list handling by using ->addr_wq_lock
spinlock to protect the list. A special handling is done upon socket
creation and destruction for that. Error handlig on sctp_init_sock()
will never return an error after having initialized asconf, so
sctp_destroy_sock() can be called without addrq_wq_lock. The lock now
will be take on sctp_close_sock(), before locking the socket, so we
don't do it in inverse order compared to sctp_addr_wq_timeout_handler().
Instead of taking the lock on sctp_sock_migrate() for copying and
restoring the list values, it's preferred to avoid rewritting it by
implementing sctp_copy_descendant().
Issue was found with a test application that kept flipping sysctl
default_auto_asconf on and off, but one could trigger it by issuing
simultaneous setsockopt() calls on multiple sockets or by
creating/destroying sockets fast enough. This is only triggerable
locally.
Fixes: 9f7d653b67ae ("sctp: Add Auto-ASCONF support (core).")
Reported-by: Ji Jianwen <jiji@redhat.com>
Suggested-by: Neil Horman <nhorman@tuxdriver.com>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
For AF_INET6 sockets, the value of struct ipv6_pinfo.ipv6only is
exported to userspace. It indicates whether a socket bound to in6addr_any
listens on IPv4 as well as IPv6. Since the socket is natively IPv6, it is not
listed by e.g. 'ss -l -4'.
This patch is accompanied by an appropriate one for iproute2 to enable
the additional information in 'ss -e'.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |\ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Andy Gospodarek says:
====================
changes to make ipv4 routing table aware of next-hop link status
This series adds the ability to have the Linux kernel track whether or
not a particular route should be used based on the link-status of the
interface associated with the next-hop.
Before this patch any link-failure on an interface that was serving as a
gateway for some systems could result in those systems being isolated
from the rest of the network as the stack would continue to attempt to
send frames out of an interface that is actually linked-down. When the
kernel is responsible for all forwarding, it should also be responsible
for taking action when the traffic can no longer be forwarded -- there
is no real need to outsource link-monitoring to userspace anymore.
This feature is only enabled with the new per-interface or ipv4 global
sysctls called 'ignore_routes_with_linkdown'.
net.ipv4.conf.all.ignore_routes_with_linkdown = 0
net.ipv4.conf.default.ignore_routes_with_linkdown = 0
net.ipv4.conf.lo.ignore_routes_with_linkdown = 0
...
When the above sysctls are set, the kernel will not only report to
userspace that the link is down, but it will also report to userspace
that a route is dead. This will signal to userspace that the route will
not be selected.
With the new sysctls set, the following behavior can be observed
(interface p8p1 is link-down):
default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15
70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1
80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 dead linkdown
90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 dead linkdown
90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2
90.0.0.1 via 70.0.0.2 dev p7p1 src 70.0.0.1
cache
local 80.0.0.1 dev lo src 80.0.0.1
cache <local>
80.0.0.2 via 10.0.5.2 dev p9p1 src 10.0.5.15
cache
While the route does remain in the table (so it can be modified if
needed rather than being wiped away as it would be if IFF_UP was
cleared), the proper next-hop is chosen automatically when the link is
down. Now interface p8p1 is linked-up:
default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15
70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1
80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1
90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1
90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2
192.168.56.0/24 dev p2p1 proto kernel scope link src 192.168.56.2
90.0.0.1 via 80.0.0.2 dev p8p1 src 80.0.0.1
cache
local 80.0.0.1 dev lo src 80.0.0.1
cache <local>
80.0.0.2 dev p8p1 src 80.0.0.1
cache
and the output changes to what one would expect.
If the global or interface sysctl is not set, the following output would
be expected when p8p1 is down:
default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15
70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1
80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 linkdown
90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 linkdown
90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2
If the dead flag does not appear there should be no expectation that the
kernel would skip using this route due to link being down.
v2: Split kernel changes into 2 patches: first to add linkdown flag and
second to add new sysctl settings. Also took suggestion from Alex to
simplify code by only checking sysctl during fib lookup and suggestion
from Scott to add a per-interface sysctl. Added iproute2 patch to
recognize and print linkdown flag.
v3: Code cleanups along with reverse-path checks suggested by Alex and
small fixes related to problems found when multipath was disabled.
v4: Drop binary sysctls
v5: Whitespace and variable declaration fixups suggested by Dave
v6: Style changes noticed by Dave and checkpath suggestions.
v7: Last checkpatch fixup.
Though there were some that preferred not to have a configuration option
and to make this behavior the default when it was discussed in Ottawa
earlier this year since "it was time to do this." I wanted to propose
the config option to preserve the current behavior for those that desire
it. I'll happily remove it if Dave and Linus approve.
An IPv6 implementation is also needed (DECnet too!), but I wanted to
start with the IPv4 implementation to get people comfortable with the
idea before moving forward. If this is accepted the IPv6 implementation
can be posted shortly.
There was also a request for switchdev support for this, but that will
be posted as a followup as switchdev does not currently handle dead
next-hops in a multi-path case and I felt that infra needed to be added
first.
FWIW, we have been running the original version of this series with a
global sysctl and our customers have been happily using a backported
version for IPv4 and IPv6 for >6 months.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This feature is only enabled with the new per-interface or ipv4 global
sysctls called 'ignore_routes_with_linkdown'.
net.ipv4.conf.all.ignore_routes_with_linkdown = 0
net.ipv4.conf.default.ignore_routes_with_linkdown = 0
net.ipv4.conf.lo.ignore_routes_with_linkdown = 0
...
When the above sysctls are set, will report to userspace that a route is
dead and will no longer resolve to this nexthop when performing a fib
lookup. This will signal to userspace that the route will not be
selected. The signalling of a RTNH_F_DEAD is only passed to userspace
if the sysctl is enabled and link is down. This was done as without it
the netlink listeners would have no idea whether or not a nexthop would
be selected. The kernel only sets RTNH_F_DEAD internally if the
interface has IFF_UP cleared.
With the new sysctl set, the following behavior can be observed
(interface p8p1 is link-down):
default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15
70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1
80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 dead linkdown
90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 dead linkdown
90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2
90.0.0.1 via 70.0.0.2 dev p7p1 src 70.0.0.1
cache
local 80.0.0.1 dev lo src 80.0.0.1
cache <local>
80.0.0.2 via 10.0.5.2 dev p9p1 src 10.0.5.15
cache
While the route does remain in the table (so it can be modified if
needed rather than being wiped away as it would be if IFF_UP was
cleared), the proper next-hop is chosen automatically when the link is
down. Now interface p8p1 is linked-up:
default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15
70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1
80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1
90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1
90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2
192.168.56.0/24 dev p2p1 proto kernel scope link src 192.168.56.2
90.0.0.1 via 80.0.0.2 dev p8p1 src 80.0.0.1
cache
local 80.0.0.1 dev lo src 80.0.0.1
cache <local>
80.0.0.2 dev p8p1 src 80.0.0.1
cache
and the output changes to what one would expect.
If the sysctl is not set, the following output would be expected when
p8p1 is down:
default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15
70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1
80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 linkdown
90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 linkdown
90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2
Since the dead flag does not appear, there should be no expectation that
the kernel would skip using this route due to link being down.
v2: Split kernel changes into 2 patches, this actually makes a
behavioral change if the sysctl is set. Also took suggestion from Alex
to simplify code by only checking sysctl during fib lookup and
suggestion from Scott to add a per-interface sysctl.
v3: Code clean-ups to make it more readable and efficient as well as a
reverse path check fix.
v4: Drop binary sysctl
v5: Whitespace fixups from Dave
v6: Style changes from Dave and checkpatch suggestions
v7: One more checkpatch fixup
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |/ /
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add a fib flag called RTNH_F_LINKDOWN to any ipv4 nexthops that are
reachable via an interface where carrier is off. No action is taken,
but additional flags are passed to userspace to indicate carrier status.
This also includes a cleanup to fib_disable_ip to more clearly indicate
what event made the function call to replace the more cryptic force
option previously used.
v2: Split out kernel functionality into 2 patches, this patch simply
sets and clears new nexthop flag RTNH_F_LINKDOWN.
v3: Cleanups suggested by Alex as well as a bug noticed in
fib_sync_down_dev and fib_sync_up when multipath was not enabled.
v5: Whitespace and variable declaration fixups suggested by Dave.
v6: Style fixups noticed by Dave; ran checkpatch to be sure I got them
all.
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
switchdev_port_bridge_getlink() queries SWITCHDEV_ATTR_PORT_BRIDGE_FLAGS
attributes, but a driver doesn't need to implement this in order to get
bridge link information.
So error out only on errors different than -EOPNOTSUPP.
(This is a follow-up patch for 7d4f8d8.)
Fixes: 8793d0a664a8 ("switchdev: add new switchdev_port_bridge_getlink")
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This bug pops up with NetworkManager on Fedora 21. NetworkManager tends to
stop the interface (nicvf_stop() is called) before changing settings. In
stopped state MAC cannot be sent to a PF. However, when the interface is
restarted (nicvf_open() is called), we ping the PF using NIC_MBOX_MSG_READY
message, and the PF replies back with old MAC address, overriding what we
had after MAC setting from userspace. As a result, we cannot set MAC
address using NetworkManager.
This patch introduces special tracking of MAC change in stopped state so
that the correct new MAC address is sent to a PF when interface is reopen.
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | | |
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|