summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* iwlwifi: add wide firmware command infrastructure for TXAviya Erenfeld2015-08-049-27/+121
| | | | | | | | | | | | | | | | | | | As the firmware is slowly running out of command IDs and grouping of commands is desirable anyway, the firmware is extending the command header from 4 bytes to 8 bytes to introduce a group (in place of the former flags field, since that's always 0 on commands and thus can be easily used to distinguish between the two. In order to support this most easily in the driver widen the command command ID used in the command sending functions and encode the new values (group and version) in the ID. That way existing code doesn't have to be changed (since the higher bits are 0 automatically) and newer code can easily use the new ID generation function to create a value to use in place of just the command ID. Signed-off-by: Aviya Erenfeld <aviya.erenfeld@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
* iwlwifi: mvm: add basic Time of Flight (802.11mc FTM) supportGregory Greenman2015-08-0410-0/+1563
| | | | | | | | | | | | ToF is a time based method for measurement of the WiFi device location within a WiFi environment. The driver functionality provided by this patch is the interface for communication with FW and receiving location related updates from the FW. The interface provided by this patch is via debugfs. Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
* iwlwifi: mvm: remove IWL_UCODE_TLV_API_BASIC_DWELLSara Sharon2015-08-042-70/+20
| | | | | | | | | | All the supported firmwares support this API. This includes removing dwell per band, as band is no longer a factor in calculating the dwell. Only basic dwell is used and FW will calculate the actual dwell time. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
* iwlwifi: remove command header flags fieldJohannes Berg2015-08-047-66/+19
| | | | | | | | | | | | The 'flags' field really has been reserved in the firmware API for a very long time, probably since 4965. As a consequence, the field is always 0 and checking for a IWL_CMD_FAILED_MSK flag makes no sense. Rename the field to 'reserved', get rid of IWL_CMD_FAILED_MSK and all the code for it. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
* iwlwifi: pcie: don't warn on long MPDUs when supportedEmmanuel Grumbach2015-08-041-2/+3
| | | | | | | | | | | In iwlmvm firmwares, the Byte count written in the scheduler byte count table is in DWORDs and not in bytes. We should check that this value fits in the 12 bits and the value can be either in bits of in DWORD or bytes depending on the firmware. Check the value after the translation to DWORDs is done (if needed). Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
* iwlwifi: pcie: add missing calls to synchronize_irq()Emmanuel Grumbach2015-08-041-0/+7
| | | | | | | | | | | In a few places, we were disabling interrupts but didn't make sure that the interrupt handler has finished running. Add calls to synchronize_irq() to ensure we finish handling the interrupts before we free resources or other things that could lead to a crash if the interrupt were to be handled later. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
* iwlwifi: pcie: cancel Tx timer upon firmware crashEmmanuel Grumbach2015-08-041-0/+4
| | | | | | | | When the firmware crashes, we can't expect the Tx queues to progress. Cancel their timer. Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
* iwlwifi: mvm: Do not sample the device time for session protectionIlan Peer2015-08-041-2/+1
| | | | | | | | | Since the time-event is sent with the immediate flag set, there is no need to sample the device time. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
* iwlwifi: remove command and return value from opmode RXJohannes Berg2015-08-048-38/+17
| | | | | | | | | With the previous patch series, no opmode continues using the command or handler_status (i.e. the return value from the RX) so it can be removed now. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
* iwlwifi: mvm: remove command/return value from RX handlersJohannes Berg2015-08-0418-232/+139
| | | | | | | | In the mvm driver, neither the old command nor the return value are used, so remove them. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
* iwlwifi: dvm: remove command/return value from RX handlersJohannes Berg2015-08-047-124/+65
| | | | | | | | | After the previous patches, the command that's passed in nor the return value are used any more, so can be removed. While at it, make some functions static. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
* iwlwifi: dvm: remove ADD_STA prints relying on station IDJohannes Berg2015-08-041-28/+6
| | | | | | | | | This makes the logging a little less useful, but as they're mostly synchronous commands it won't matter much. It gets rid of the dependency on the input command, which this is the only user of. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
* iwlwifi: dvm: move ADD_STA response handling to sync commandJohannes Berg2015-08-041-11/+18
| | | | | | | | | | | | | | | This driver currently has some very confusing ADD_STA response handling that runs asynchronously in the background for all of the commands, but is only really necessary for synchronous ones (the really asynchronous ones can only be done for already existing stations), and for the sync ones it actually waits for the RX handler to return a status code. Rework this to keep the debug printing in the handler, but do the code that's supposed to have an effect only for sync commands in the command sending function. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
* iwlwifi: mvm: LRU-assign key offsetsJohannes Berg2015-08-042-6/+27
| | | | | | | | | | | | | | | | | | | | | The current key offset assignment algorithm always uses the lowest unused key offset, which will potentially lead to issues when the firmware will change to take the key material for TX from the key table rather than from the TX command. In order to avoid those issues (and avoid forgetting about them) change the key offset allocation algorithm now to avoid reusing key offsets quickly. The new algorithm always picks as the next offset the least recently freed offset, i.e. the offset that has been unused for the longest amount of time. This is implemented by having a generation counter for each key offset that is incremented every time a key is deleted, except for the one that's deleted, which is reset to zero. Thus the highest counter is the key that's been unused longest. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
* iwlwifi: pcie: Set scheduler to work on auto modeHaim Dreyfuss2015-08-042-0/+2
| | | | | | | | | | | | During NIC initialization shared HW is reset and this disables the scheduler. Some HW platforms do not activate the scheduler after it. Consequently all HCMD sent by the driver stay at the queues which cause to queue stuck. Set the scheduler to work on auto active mode so it would be activated upon change over one of the queues' write pointer. Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
* iwlwifi: pcie: lock start_hw / start_fw / stop_deviceEmmanuel Grumbach2015-06-264-10/+81
| | | | | | | | | | | This allows to ensure that we don't have races between them. A user reported that stop_device was called twice upon rfkill interrupt after suspend. When the interrupts are enabled, and right after when we directly check the rfkill state. Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
* iwlwifi: dvm: start HW before running FWEmmanuel Grumbach2015-06-261-0/+12
| | | | | | | | | | The new locking in PCIe transport requires to start_hw before start_fw. This uncovered a bug in dvm which failed to do so. Fix that. Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
* iwlwifi: deprecate -10.ucode for 3160 / 7260 / 7265Sara Sharon2015-06-263-4/+3
| | | | | | | | This firmware is not supported anymore - stop loading this firmware. Remove code handling older versions. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
* iwlwifi: mvm: handle RX MPDUs separatelyJohannes Berg2015-06-261-1/+3
| | | | | | | | | | There's no need to forward RX MPDUs to notification wait tests, nor do we need to check them for firmware dump triggers, nor could they be asynchronous. It's thus more efficient to handle them separately, before going into the regular RX handlers. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
* iwlwifi: mvm: rs: report last tx rate based on RSSI and capsEyal Shapira2015-06-262-7/+160
| | | | | | | | | | | | | In scenarios where we haven't converged yet to a specific modulation and rate it could be better to report to userspace the last tx rate based on the STA capabilities and RSSI. This is important as sometimes userspace displays the last tx rate as the link speed. This avoids being presented with low legacy rates when rs just begins its search or after an idle period in which it resets itself. Signed-off-by: Eyal Shapira <eyalx.shapira@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds2015-06-241418-27517/+109464
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking updates from David Miller: 1) Add TX fast path in mac80211, from Johannes Berg. 2) Add TSO/GRO support to ibmveth, from Thomas Falcon 3) Move away from cached routes in ipv6, just like ipv4, from Martin KaFai Lau. 4) Lots of new rhashtable tests, from Thomas Graf. 5) Run ingress qdisc lockless, from Alexei Starovoitov. 6) Allow servers to fetch TCP packet headers for SYN packets of new connections, for fingerprinting. From Eric Dumazet. 7) Add mode parameter to pktgen, for testing receive. From Alexei Starovoitov. 8) Cache access optimizations via simplifications of build_skb(), from Alexander Duyck. 9) Move page frag allocator under mm/, also from Alexander. 10) Add xmit_more support to hv_netvsc, from KY Srinivasan. 11) Add a counter guard in case we try to perform endless reclassify loops in the packet scheduler. 12) Extern flow dissector to be programmable and use it in new "Flower" classifier. From Jiri Pirko. 13) AF_PACKET fanout rollover fixes, performance improvements, and new statistics. From Willem de Bruijn. 14) Add netdev driver for GENEVE tunnels, from John W Linville. 15) Add ingress netfilter hooks and filtering, from Pablo Neira Ayuso. 16) Fix handling of epoll edge triggers in TCP, from Eric Dumazet. 17) Add an ECN retry fallback for the initial TCP handshake, from Daniel Borkmann. 18) Add tail call support to BPF, from Alexei Starovoitov. 19) Add several pktgen helper scripts, from Jesper Dangaard Brouer. 20) Add zerocopy support to AF_UNIX, from Hannes Frederic Sowa. 21) Favor even port numbers for allocation to connect() requests, and odd port numbers for bind(0), in an effort to help avoid ip_local_port_range exhaustion. From Eric Dumazet. 22) Add Cavium ThunderX driver, from Sunil Goutham. 23) Allow bpf programs to access skb_iif and dev->ifindex SKB metadata, from Alexei Starovoitov. 24) Add support for T6 chips in cxgb4vf driver, from Hariprasad Shenai. 25) Double TCP Small Queues default to 256K to accomodate situations like the XEN driver and wireless aggregation. From Wei Liu. 26) Add more entropy inputs to flow dissector, from Tom Herbert. 27) Add CDG congestion control algorithm to TCP, from Kenneth Klette Jonassen. 28) Convert ipset over to RCU locking, from Jozsef Kadlecsik. 29) Track and act upon link status of ipv4 route nexthops, from Andy Gospodarek. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1670 commits) bridge: vlan: flush the dynamically learned entries on port vlan delete bridge: multicast: add a comment to br_port_state_selection about blocking state net: inet_diag: export IPV6_V6ONLY sockopt stmmac: troubleshoot unexpected bits in des0 & des1 net: ipv4 sysctl option to ignore routes when nexthop link is down net: track link-status of ipv4 nexthops net: switchdev: ignore unsupported bridge flags net: Cavium: Fix MAC address setting in shutdown state drivers: net: xgene: fix for ACPI support without ACPI ip: report the original address of ICMP messages net/mlx5e: Prefetch skb data on RX net/mlx5e: Pop cq outside mlx5e_get_cqe net/mlx5e: Remove mlx5e_cq.sqrq back-pointer net/mlx5e: Remove extra spaces net/mlx5e: Avoid TX CQE generation if more xmit packets expected net/mlx5e: Avoid redundant dev_kfree_skb() upon NOP completion net/mlx5e: Remove re-assignment of wq type in mlx5e_enable_rq() net/mlx5e: Use skb_shinfo(skb)->gso_segs rather than counting them net/mlx5e: Static mapping of netdev priv resources to/from netdev TX queues net/mlx4_en: Use HW counters for rx/tx bytes/packets in PF device ...
| * bridge: vlan: flush the dynamically learned entries on port vlan deleteNikolay Aleksandrov2015-06-246-7/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | Add a new argument to br_fdb_delete_by_port which allows to specify a vid to match when flushing entries and use it in nbp_vlan_delete() to flush the dynamically learned entries of the vlan/port pair when removing a vlan from a port. Before this patch only the local mac was being removed and the dynamically learned ones were left to expire. Note that the do_all argument is still respected and if specified, the vid will be ignored. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bridge: multicast: add a comment to br_port_state_selection about blocking stateNikolay Aleksandrov2015-06-241-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | Add a comment to explain why we're not disabling port's multicast when it goes in blocking state. Since there's a check in the timer's function which bypasses the timer if the port's in blocking/disabled state, the timer will simply expire and stop without sending more queries. Suggested-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2015-06-2429-72/+186
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/ethernet/mellanox/mlx4/main.c net/packet/af_packet.c Both conflicts were cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
| | * stmmac: troubleshoot unexpected bits in des0 & des1Alexey Brodkin2015-06-244-24/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current implementation of descriptor init procedure only takes care about setting/clearing ownership flag in "des0"/"des1" fields while it is perfectly possible to get unexpected bits set because of the following factors: [1] On driver probe underlying memory allocated with dma_alloc_coherent() might not be zeroed and so it will be filled with garbage. [2] During driver operation some bits could be set by SD/MMC controller (for example error flags etc). And unexpected and/or randomly set flags in "des0"/"des1" fields may lead to unpredictable behavior of GMAC DMA block. This change addresses both items above with: [1] Use of dma_zalloc_coherent() instead of simple dma_alloc_coherent() to make sure allocated memory is zeroed. That shouldn't affect performance because this allocation only happens once on driver probe. [2] Do explicit zeroing of both "des0" and "des1" fields of all buffer descriptors during initialization of DMA transfer. And while at it fixed identation of dma_free_coherent() counterpart as well. Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: arc-linux-dev@synopsys.com Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Cc: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * ip: report the original address of ICMP messagesJulian Anastasov2015-06-242-2/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ICMP messages can trigger ICMP and local errors. In this case serr->port is 0 and starting from Linux 4.0 we do not return the original target address to the error queue readers. Add function to define which errors provide addr_offset. With this fix my ping command is not silent anymore. Fixes: c247f0534cc5 ("ip: fix error queue empty skb handling") Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * rocker: call correct unregister function on errorGilad Ben-Yossef2015-06-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the correct unregister function matching the register function on the error path. Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com> Fixes: c1beeef7a32a791a ("rocker: implement IPv4 fib offloading") Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * Merge tag 'linux-can-fixes-for-4.1-20150621' of ↵David S. Miller2015-06-234-1/+14
| | |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== Oliver Hartkopp fixed a bug in the generic CAN frame handling code, which may lead to loss of CAN frames. It was introduced during v4.1 development. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> gpg: Signature made Sun 21 Jun 2015 09:59:36 AM PDT using RSA key ID C9B5CFC7
| | | * can: fix loss of CAN frames in raw_rcvOliver Hartkopp2015-06-214-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As reported by Manfred Schlaegl here http://marc.info/?l=linux-netdev&m=143482089824232&w=2 commit 514ac99c64b "can: fix multiple delivery of a single CAN frame for overlapping CAN filters" requires the skb->tstamp to be set to check for identical CAN skbs. As net timestamping is influenced by several players (netstamp_needed and netdev_tstamp_prequeue) Manfred missed a proper timestamp which leads to CAN frame loss. As skb timestamping became now mandatory for CAN related skbs this patch makes sure that received CAN skbs always have a proper timestamp set. Maybe there's a better solution in the future but this patch fixes the CAN frame loss so far. Reported-by: Manfred Schlaegl <manfred.schlaegl@gmx.at> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
| | * | net: fs_enet: Fix NETIF_F_SG feature for Freescale MPC5121Alexander Popov2015-06-231-3/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 4fc9b87bae25 ("net: fs_enet: Implement NETIF_F_SG feature") brings a trouble to Freescale MPC512x: a kernel oops happens during sending non-linear sk_buff with .data not aligned by 4. Log quotation: Unable to handle kernel paging request for data at address 0xe467c000 Faulting instruction address: 0xc000cd44 Oops: Kernel access of bad area, sig: 11 [#1] MPC512x generic Modules linked in: CPU: 0 PID: 984 Comm: kworker/0:1H Not tainted 4.1.0-rc8-00024-gbb16140 #2 Workqueue: rpciod rpc_async_schedule task: cf364a50 ti: cf362000 task.ti: cf362000 NIP: c000cd44 LR: c000c720 CTR: 00000206 REGS: cf363ac0 TRAP: 0300 Not tainted (4.1.0-rc8-00024-gbb16140) MSR: 00009032 <EE,ME,IR,DR,RI> CR: 42004082 XER: 00000000 DAR: e467c000 DSISR: 20000000 GPR00: c0279e24 cf363b70 cf364a50 e467c000 00000206 0000001f 00000001 00000001 GPR08: 00000000 e467c000 e46800be 000139a6 82002082 00000000 c002e46c cf3c3680 GPR16: c044cb30 c04b0000 cf363c48 00000000 00000001 fde0315c 00000000 0000000b GPR24: 0000002c 000040be cf339aa0 0000000b 00000001 cf873210 00282f85 00000000 NIP [c000cd44] clean_dcache_range+0x1c/0x30 LR [c000c720] dma_direct_map_page+0x40/0x94 Call Trace: [cf363b70] [cf339b60] 0xcf339b60 (unreliable) [cf363b90] [c0279e24] fs_enet_start_xmit+0x1c8/0x42c [cf363bd0] [c02ff710] dev_hard_start_xmit+0x2dc/0x3d4 [cf363c40] [c0319c60] sch_direct_xmit+0xcc/0x1cc [cf363c70] [c02ff9c0] __dev_queue_xmit+0x1b8/0x47c [cf363ca0] [c032a3e8] ip_finish_output+0x1fc/0x9a8 [cf363ce0] [c032c31c] ip_send_skb+0x1c/0xa4 [cf363cf0] [c035112c] udp_send_skb+0xe4/0x2e8 [cf363d10] [c0351368] udp_push_pending_frames+0x38/0x84 [cf363d20] [c03537b8] udp_sendpage+0x134/0x174 [cf363d70] [c0384fd4] xs_sendpages+0x21c/0x250 [cf363db0] [c03852bc] xs_udp_send_request+0x50/0xf8 [cf363de0] [c0382f08] xprt_transmit+0x64/0x280 [cf363e20] [c038017c] call_transmit+0x168/0x234 [cf363e40] [c0387918] __rpc_execute+0x88/0x2b0 [cf363e80] [c00296f8] process_one_work+0x124/0x2fc [cf363ea0] [c0029a00] worker_thread+0x130/0x480 [cf363ef0] [c002e528] kthread+0xbc/0xd0 [cf363f40] [c000e4a8] ret_from_kernel_thread+0x5c/0x64 Instruction dump: 7c70faa6 60630800 7c70fba6 4c00012c 4e800020 38a0001f 7c632878 7c832050 7c842a14 5484d97f 4d820020 7c8903a6 <7c00186c> 38630020 4200fff8 7c0004ac ---[ end trace c846c1eceb513c85 ]--- The reason: MPC5121 FEC requires 4-byte alignment for TX data buffer and calls tx_skb_align_workaround() for copying sk_buff with not aligned .data to a new sk_buff with aligned one. But tx_skb_align_workaround() uses skb_copy_from_linear_data() which doesn't work for non-linear sk_buff: a new sk_buff has non-zero nr_frags and zero .data_len. So improve the condition of calling tx_skb_align_workaround() and use skb_linearize() in it. Signed-off-by: Alexander Popov <alex.popov@linux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | xen-netback: fix a BUG() during initializationPalik, Imre2015-06-231-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit edafc132baac ("xen-netback: making the bandwidth limiter runtime settable") introduced the capability to change the bandwidth rate limit at runtime. But it also introduced a possible crashing bug. If netback receives two XenbusStateConnected without getting the hotplug-status watch firing in between, then it will try to register the watches for the rate limiter again. But this triggers a BUG() in the watch registration code. The fix modifies connect() to remove the possibly existing packet-rate watches before trying to install those watches. This behaviour is in line with how connect() deals with the hotplug-status watch. Signed-off-by: Imre Palik <imrep@amazon.de> Cc: Matt Wilson <msw@amazon.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | bridge: multicast: restore router configuration on port link down/upSatish Ashok2015-06-231-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a port goes through a link down/up the multicast router configuration is not restored. Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Fixes: 0909e11758bd ("bridge: Add multicast_router sysfs entries") Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | NET: ROSE: Don't dereference NULL neighbour pointer.Ralf Baechle2015-06-231-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A ROSE socket doesn't necessarily always have a neighbour pointer so check if the neighbour pointer is valid before dereferencing it. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Tested-by: Bernard Pidoux <f6bvp@free.fr> Cc: stable@vger.kernel.org #2.6.11+ Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | tcp: Do not call tcp_fastopen_reset_cipher from interrupt contextChristoph Paasch2015-06-233-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tcp_fastopen_reset_cipher really cannot be called from interrupt context. It allocates the tcp_fastopen_context with GFP_KERNEL and calls crypto_alloc_cipher, which allocates all kind of stuff with GFP_KERNEL. Thus, we might sleep when the key-generation is triggered by an incoming TFO cookie-request which would then happen in interrupt- context, as shown by enabling CONFIG_DEBUG_ATOMIC_SLEEP: [ 36.001813] BUG: sleeping function called from invalid context at mm/slub.c:1266 [ 36.003624] in_atomic(): 1, irqs_disabled(): 0, pid: 1016, name: packetdrill [ 36.004859] CPU: 1 PID: 1016 Comm: packetdrill Not tainted 4.1.0-rc7 #14 [ 36.006085] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 [ 36.008250] 00000000000004f2 ffff88007f8838a8 ffffffff8171d53a ffff880075a084a8 [ 36.009630] ffff880075a08000 ffff88007f8838c8 ffffffff810967d3 ffff88007f883928 [ 36.011076] 0000000000000000 ffff88007f8838f8 ffffffff81096892 ffff88007f89be00 [ 36.012494] Call Trace: [ 36.012953] <IRQ> [<ffffffff8171d53a>] dump_stack+0x4f/0x6d [ 36.014085] [<ffffffff810967d3>] ___might_sleep+0x103/0x170 [ 36.015117] [<ffffffff81096892>] __might_sleep+0x52/0x90 [ 36.016117] [<ffffffff8118e887>] kmem_cache_alloc_trace+0x47/0x190 [ 36.017266] [<ffffffff81680d82>] ? tcp_fastopen_reset_cipher+0x42/0x130 [ 36.018485] [<ffffffff81680d82>] tcp_fastopen_reset_cipher+0x42/0x130 [ 36.019679] [<ffffffff81680f01>] tcp_fastopen_init_key_once+0x61/0x70 [ 36.020884] [<ffffffff81680f2c>] __tcp_fastopen_cookie_gen+0x1c/0x60 [ 36.022058] [<ffffffff816814ff>] tcp_try_fastopen+0x58f/0x730 [ 36.023118] [<ffffffff81671788>] tcp_conn_request+0x3e8/0x7b0 [ 36.024185] [<ffffffff810e3872>] ? __module_text_address+0x12/0x60 [ 36.025327] [<ffffffff8167b2e1>] tcp_v4_conn_request+0x51/0x60 [ 36.026410] [<ffffffff816727e0>] tcp_rcv_state_process+0x190/0xda0 [ 36.027556] [<ffffffff81661f97>] ? __inet_lookup_established+0x47/0x170 [ 36.028784] [<ffffffff8167c2ad>] tcp_v4_do_rcv+0x16d/0x3d0 [ 36.029832] [<ffffffff812e6806>] ? security_sock_rcv_skb+0x16/0x20 [ 36.030936] [<ffffffff8167cc8a>] tcp_v4_rcv+0x77a/0x7b0 [ 36.031875] [<ffffffff816af8c3>] ? iptable_filter_hook+0x33/0x70 [ 36.032953] [<ffffffff81657d22>] ip_local_deliver_finish+0x92/0x1f0 [ 36.034065] [<ffffffff81657f1a>] ip_local_deliver+0x9a/0xb0 [ 36.035069] [<ffffffff81657c90>] ? ip_rcv+0x3d0/0x3d0 [ 36.035963] [<ffffffff81657569>] ip_rcv_finish+0x119/0x330 [ 36.036950] [<ffffffff81657ba7>] ip_rcv+0x2e7/0x3d0 [ 36.037847] [<ffffffff81610652>] __netif_receive_skb_core+0x552/0x930 [ 36.038994] [<ffffffff81610a57>] __netif_receive_skb+0x27/0x70 [ 36.040033] [<ffffffff81610b72>] process_backlog+0xd2/0x1f0 [ 36.041025] [<ffffffff81611482>] net_rx_action+0x122/0x310 [ 36.042007] [<ffffffff81076743>] __do_softirq+0x103/0x2f0 [ 36.042978] [<ffffffff81723e3c>] do_softirq_own_stack+0x1c/0x30 This patch moves the call to tcp_fastopen_init_key_once to the places where a listener socket creates its TFO-state, which always happens in user-context (either from the setsockopt, or implicitly during the listen()-call) Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Fixes: 222e83d2e0ae ("tcp: switch tcp_fastopen key generation to net_get_random_once") Signed-off-by: Christoph Paasch <cpaasch@apple.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | mvneta: add forgotten initialization of autonegotiation bitsStas Sergeev2015-06-231-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The commit 898b2970e2c9 ("mvneta: implement SGMII-based in-band link state signaling") changed mvneta_adjust_link() so that it does not clear the auto-negotiation bits in MVNETA_GMAC_AUTONEG_CONFIG register. This was necessary for auto-negotiation mode to work. Unfortunately I haven't checked if these bits are ever initialized. It appears they are not. This patch adds the missing initialization of the auto-negotiation bits in the MVNETA_GMAC_AUTONEG_CONFIG register. It fixes the following regression: https://www.mail-archive.com/netdev@vger.kernel.org/msg67928.html Since the patch was tested to fix a regression, it should be applied to stable tree. Tested-by: Arnaud Ebalard <arno@natisbad.org> CC: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> CC: Florian Fainelli <f.fainelli@gmail.com> CC: netdev@vger.kernel.org CC: linux-kernel@vger.kernel.org CC: stable@vger.kernel.org Signed-off-by: Stas Sergeev <stsp@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | packet: avoid out of bounds read in round robin fanoutWillem de Bruijn2015-06-211-16/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PACKET_FANOUT_LB computes f->rr_cur such that it is modulo f->num_members. It returns the old value unconditionally, but f->num_members may have changed since the last store. Ensure that the return value is always < num. When modifying the logic, simplify it further by replacing the loop with an unconditional atomic increment. Fixes: dc99f600698d ("packet: Add fanout support.") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | mac80211: fix locking in update_vlan_tailroom_need_count()Johannes Berg2015-06-211-3/+10
| | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unfortunately, Michal's change to fix AP_VLAN crypto tailroom caused a locking issue that was reported by lockdep, but only in a few cases - the issue was a classic ABBA deadlock caused by taking the mtx after the key_mtx, where normally they're taken the other way around. As the key mutex protects the field in question (I'm adding a few annotations to make that clear) only the iteration needs to be protected, but we can also iterate the interface list with just RCU protection while holding the key mutex. Fixes: f9dca80b98ca ("mac80211: fix AP_VLAN crypto tailroom calculation") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * isdn: disable HiSax NetJet driver on microblaze archNicolai Stange2015-06-211-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix an allmodconfig compilation failer on microblaze due to big endian architectures being apparently unsupported by the NetJet code: drivers/isdn/hisax/nj_s.c: In function 'setup_netjet_s': drivers/isdn/hisax/nj_s.c:265:2: error: #error "not running on big endian machines now" Modify the relevant Kconfig such that the NetJet code is not built on microblaze anymore. Note that endianess on microblaze is not determined through Kconfig, but by means of a compiler provided CPP macro, namely __MICROBLAZEEL__. However, gcc defaults to big endianess on that platform. Signed-off-by: Nicolai Stange <nicstange@gmail.com> Acked-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * neigh: do not modify unlinked entriesJulian Anastasov2015-06-211-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The lockless lookups can return entry that is unlinked. Sometimes they get reference before last neigh_cleanup_and_release, sometimes they do not need reference. Later, any modification attempts may result in the following problems: 1. entry is not destroyed immediately because neigh_update can start the timer for dead entry, eg. on change to NUD_REACHABLE state. As result, entry lives for some time but is invisible and out of control. 2. __neigh_event_send can run in parallel with neigh_destroy while refcnt=0 but if timer is started and expired refcnt can reach 0 for second time leading to second neigh_destroy and possible crash. Thanks to Eric Dumazet and Ying Xue for their work and analyze on the __neigh_event_send change. Fixes: 767e97e1e0db ("neigh: RCU conversion of struct neighbour") Fixes: a263b3093641 ("ipv4: Make neigh lookups directly in output packet path.") Fixes: 6fd6ce2056de ("ipv6: Do not depend on rt->n in ip6_finish_output2().") Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ying Xue <ying.xue@windriver.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * packet: read num_members once in packet_rcv_fanout()Eric Dumazet2015-06-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to tell compiler it must not read f->num_members multiple times. Otherwise testing if num is not zero is flaky, and we could attempt an invalid divide by 0 in fanout_demux_cpu() Note bug was present in packet_rcv_fanout_hash() and packet_rcv_fanout_lb() but final 3.1 had a simple location after commit 95ec3eb417115fb ("packet: Add 'cpu' fanout policy.") Fixes: dc99f600698dc ("packet: Add fanout support.") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * bridge: fix br_stp_set_bridge_priority race conditionsNikolay Aleksandrov2015-06-182-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After the ->set() spinlocks were removed br_stp_set_bridge_priority was left running without any protection when used via sysfs. It can race with port add/del and could result in use-after-free cases and corrupted lists. Tested by running port add/del in a loop with stp enabled while setting priority in a loop, crashes are easily reproducible. The spinlocks around sysfs ->set() were removed in commit: 14f98f258f19 ("bridge: range check STP parameters") There's also a race condition in the netlink priority support that is fixed by this change, but it was introduced recently and the fixes tag covers it, just in case it's needed the commit is: af615762e972 ("bridge: add ageing_time, stp_state, priority over netlink") Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Fixes: 14f98f258f19 ("bridge: range check STP parameters") Signed-off-by: David S. Miller <davem@davemloft.net>
| | * net/mlx4_core: Disable Granular QoS per VF under IB/Eth VPI configurationOr Gerlitz2015-06-151-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Due to firmware bug, under VPI configuration when port1 = IB and port2 = Eth, Granular QoS per VF isn't working properly. More over, the whole QP0/QP1 Para-Virtualization in the mlx4 IB driver is broken on that config. Hence, we must disable Granular QoS per VF under that configuration till a fix is introduced. Once that happens, a new device capability will be used to mark the feature support on that specific configuration. Reported-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * sctp: fix ASCONF list handlingMarcelo Ricardo Leitner2015-06-143-11/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ->auto_asconf_splist is per namespace and mangled by functions like sctp_setsockopt_auto_asconf() which doesn't guarantee any serialization. Also, the call to inet_sk_copy_descendant() was backuping ->auto_asconf_list through the copy but was not honoring ->do_auto_asconf, which could lead to list corruption if it was different between both sockets. This commit thus fixes the list handling by using ->addr_wq_lock spinlock to protect the list. A special handling is done upon socket creation and destruction for that. Error handlig on sctp_init_sock() will never return an error after having initialized asconf, so sctp_destroy_sock() can be called without addrq_wq_lock. The lock now will be take on sctp_close_sock(), before locking the socket, so we don't do it in inverse order compared to sctp_addr_wq_timeout_handler(). Instead of taking the lock on sctp_sock_migrate() for copying and restoring the list values, it's preferred to avoid rewritting it by implementing sctp_copy_descendant(). Issue was found with a test application that kept flipping sysctl default_auto_asconf on and off, but one could trigger it by issuing simultaneous setsockopt() calls on multiple sockets or by creating/destroying sockets fast enough. This is only triggerable locally. Fixes: 9f7d653b67ae ("sctp: Add Auto-ASCONF support (core).") Reported-by: Ji Jianwen <jiji@redhat.com> Suggested-by: Neil Horman <nhorman@tuxdriver.com> Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: inet_diag: export IPV6_V6ONLY sockoptPhil Sutter2015-06-242-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For AF_INET6 sockets, the value of struct ipv6_pinfo.ipv6only is exported to userspace. It indicates whether a socket bound to in6addr_any listens on IPv4 as well as IPv6. Since the socket is natively IPv6, it is not listed by e.g. 'ss -l -4'. This patch is accompanied by an appropriate one for iproute2 to enable the additional information in 'ss -e'. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge branch 'ipv4-nexthop-link-status'David S. Miller2015-06-2412-47/+130
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Andy Gospodarek says: ==================== changes to make ipv4 routing table aware of next-hop link status This series adds the ability to have the Linux kernel track whether or not a particular route should be used based on the link-status of the interface associated with the next-hop. Before this patch any link-failure on an interface that was serving as a gateway for some systems could result in those systems being isolated from the rest of the network as the stack would continue to attempt to send frames out of an interface that is actually linked-down. When the kernel is responsible for all forwarding, it should also be responsible for taking action when the traffic can no longer be forwarded -- there is no real need to outsource link-monitoring to userspace anymore. This feature is only enabled with the new per-interface or ipv4 global sysctls called 'ignore_routes_with_linkdown'. net.ipv4.conf.all.ignore_routes_with_linkdown = 0 net.ipv4.conf.default.ignore_routes_with_linkdown = 0 net.ipv4.conf.lo.ignore_routes_with_linkdown = 0 ... When the above sysctls are set, the kernel will not only report to userspace that the link is down, but it will also report to userspace that a route is dead. This will signal to userspace that the route will not be selected. With the new sysctls set, the following behavior can be observed (interface p8p1 is link-down): default via 10.0.5.2 dev p9p1 10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15 70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1 80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 dead linkdown 90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 dead linkdown 90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2 90.0.0.1 via 70.0.0.2 dev p7p1 src 70.0.0.1 cache local 80.0.0.1 dev lo src 80.0.0.1 cache <local> 80.0.0.2 via 10.0.5.2 dev p9p1 src 10.0.5.15 cache While the route does remain in the table (so it can be modified if needed rather than being wiped away as it would be if IFF_UP was cleared), the proper next-hop is chosen automatically when the link is down. Now interface p8p1 is linked-up: default via 10.0.5.2 dev p9p1 10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15 70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1 80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2 192.168.56.0/24 dev p2p1 proto kernel scope link src 192.168.56.2 90.0.0.1 via 80.0.0.2 dev p8p1 src 80.0.0.1 cache local 80.0.0.1 dev lo src 80.0.0.1 cache <local> 80.0.0.2 dev p8p1 src 80.0.0.1 cache and the output changes to what one would expect. If the global or interface sysctl is not set, the following output would be expected when p8p1 is down: default via 10.0.5.2 dev p9p1 10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15 70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1 80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 linkdown 90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 linkdown 90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2 If the dead flag does not appear there should be no expectation that the kernel would skip using this route due to link being down. v2: Split kernel changes into 2 patches: first to add linkdown flag and second to add new sysctl settings. Also took suggestion from Alex to simplify code by only checking sysctl during fib lookup and suggestion from Scott to add a per-interface sysctl. Added iproute2 patch to recognize and print linkdown flag. v3: Code cleanups along with reverse-path checks suggested by Alex and small fixes related to problems found when multipath was disabled. v4: Drop binary sysctls v5: Whitespace and variable declaration fixups suggested by Dave v6: Style changes noticed by Dave and checkpath suggestions. v7: Last checkpatch fixup. Though there were some that preferred not to have a configuration option and to make this behavior the default when it was discussed in Ottawa earlier this year since "it was time to do this." I wanted to propose the config option to preserve the current behavior for those that desire it. I'll happily remove it if Dave and Linus approve. An IPv6 implementation is also needed (DECnet too!), but I wanted to start with the IPv4 implementation to get people comfortable with the idea before moving forward. If this is accepted the IPv6 implementation can be posted shortly. There was also a request for switchdev support for this, but that will be posted as a followup as switchdev does not currently handle dead next-hops in a multi-path case and I felt that infra needed to be added first. FWIW, we have been running the original version of this series with a global sysctl and our customers have been happily using a backported version for IPv4 and IPv6 for >6 months. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | net: ipv4 sysctl option to ignore routes when nexthop link is downAndy Gospodarek2015-06-2411-24/+63
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This feature is only enabled with the new per-interface or ipv4 global sysctls called 'ignore_routes_with_linkdown'. net.ipv4.conf.all.ignore_routes_with_linkdown = 0 net.ipv4.conf.default.ignore_routes_with_linkdown = 0 net.ipv4.conf.lo.ignore_routes_with_linkdown = 0 ... When the above sysctls are set, will report to userspace that a route is dead and will no longer resolve to this nexthop when performing a fib lookup. This will signal to userspace that the route will not be selected. The signalling of a RTNH_F_DEAD is only passed to userspace if the sysctl is enabled and link is down. This was done as without it the netlink listeners would have no idea whether or not a nexthop would be selected. The kernel only sets RTNH_F_DEAD internally if the interface has IFF_UP cleared. With the new sysctl set, the following behavior can be observed (interface p8p1 is link-down): default via 10.0.5.2 dev p9p1 10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15 70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1 80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 dead linkdown 90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 dead linkdown 90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2 90.0.0.1 via 70.0.0.2 dev p7p1 src 70.0.0.1 cache local 80.0.0.1 dev lo src 80.0.0.1 cache <local> 80.0.0.2 via 10.0.5.2 dev p9p1 src 10.0.5.15 cache While the route does remain in the table (so it can be modified if needed rather than being wiped away as it would be if IFF_UP was cleared), the proper next-hop is chosen automatically when the link is down. Now interface p8p1 is linked-up: default via 10.0.5.2 dev p9p1 10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15 70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1 80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2 192.168.56.0/24 dev p2p1 proto kernel scope link src 192.168.56.2 90.0.0.1 via 80.0.0.2 dev p8p1 src 80.0.0.1 cache local 80.0.0.1 dev lo src 80.0.0.1 cache <local> 80.0.0.2 dev p8p1 src 80.0.0.1 cache and the output changes to what one would expect. If the sysctl is not set, the following output would be expected when p8p1 is down: default via 10.0.5.2 dev p9p1 10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15 70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1 80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 linkdown 90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 linkdown 90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2 Since the dead flag does not appear, there should be no expectation that the kernel would skip using this route due to link being down. v2: Split kernel changes into 2 patches, this actually makes a behavioral change if the sysctl is set. Also took suggestion from Alex to simplify code by only checking sysctl during fib lookup and suggestion from Scott to add a per-interface sysctl. v3: Code clean-ups to make it more readable and efficient as well as a reverse path check fix. v4: Drop binary sysctl v5: Whitespace fixups from Dave v6: Style changes from Dave and checkpatch suggestions v7: One more checkpatch fixup Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | net: track link-status of ipv4 nexthopsAndy Gospodarek2015-06-244-23/+67
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a fib flag called RTNH_F_LINKDOWN to any ipv4 nexthops that are reachable via an interface where carrier is off. No action is taken, but additional flags are passed to userspace to indicate carrier status. This also includes a cleanup to fib_disable_ip to more clearly indicate what event made the function call to replace the more cryptic force option previously used. v2: Split out kernel functionality into 2 patches, this patch simply sets and clears new nexthop flag RTNH_F_LINKDOWN. v3: Cleanups suggested by Alex as well as a bug noticed in fib_sync_down_dev and fib_sync_up when multipath was not enabled. v5: Whitespace and variable declaration fixups suggested by Dave. v6: Style fixups noticed by Dave; ran checkpatch to be sure I got them all. Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: switchdev: ignore unsupported bridge flagsVivien Didelot2015-06-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | switchdev_port_bridge_getlink() queries SWITCHDEV_ATTR_PORT_BRIDGE_FLAGS attributes, but a driver doesn't need to implement this in order to get bridge link information. So error out only on errors different than -EOPNOTSUPP. (This is a follow-up patch for 7d4f8d8.) Fixes: 8793d0a664a8 ("switchdev: add new switchdev_port_bridge_getlink") Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: Cavium: Fix MAC address setting in shutdown statePavel Fedin2015-06-242-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This bug pops up with NetworkManager on Fedora 21. NetworkManager tends to stop the interface (nicvf_stop() is called) before changing settings. In stopped state MAC cannot be sent to a PF. However, when the interface is restarted (nicvf_open() is called), we ping the PF using NIC_MBOX_MSG_READY message, and the PF replies back with old MAC address, overriding what we had after MAC setting from userspace. As a result, we cannot set MAC address using NetworkManager. This patch introduces special tracking of MAC change in stopped state so that the correct new MAC address is sent to a PF when interface is reopen. Signed-off-by: Pavel Fedin <p.fedin@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | drivers: net: xgene: fix for ACPI support without ACPIStephen Rothwell2015-06-241-0/+2
| | | | | | | | | | | | | | | Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
OpenPOWER on IntegriCloud