summaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet/intel
Commit message (Collapse)AuthorAgeFilesLines
...
* | ice: Cleanup duplicate control queue codeBruce Allan2018-11-201-142/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. Assigning the register offset and mask values contains duplicate code that can easily be replaced with a macro. 2. Separate functions for freeing send queue and receive queue rings are not needed; replace with a single function that uses a pointer to the struct ice_ctl_q_ring structure as a parameter instead of a pointer to the struct ice_ctl_q_info structure. 3. Initializing register settings for both send queue and receive queue contains duplicate code that can easily be replaced with a helper function. 4. Separate functions for freeing send queue and receive queue buffers are not needed; duplicate code can easily be replaced with a macro. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* | ice: Do autoneg based on VSI stateAkeem G Abodunrin2018-11-201-5/+5
| | | | | | | | | | | | | | | | | | | | If VSI state is up, we should do autoneg with link up, otherwise with link down. Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* | Merge branch '40GbE' of ↵David S. Miller2018-11-1511-43/+116
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2018-11-14 This series contains updates to i40e and virtchnl. Lance Roy updates i40e to use lockdep_assert_held() instead of spin_is_locked(), since it is better suited to check locking requirements. Jan improves the code readability in XDP by adding the use of a local variable. Provides protection on methods that create/modify/destroy VF's via locking mechanism to prevent unstable behaviour and potential kernel panics. Krzysztof adds a hardware capability flag to indicate whether firmware supports stopping the LLDP agent. Patryk replaces the use of strncpy() with strlcpy() to ensure the buffer is NULL terminated. Mitch fixes the issue of trying to start nway on devices that do not support auto-negotiation, by checking the autoneg state before attempting to restart nway. Alice updates virtchnl to keep the checks all together for ease of readability and consistency. Also fixed a "off by one" error in the number of traffic classes being calculated. Richard fixed VF port VLANs, where the priority bits were incorrectly set because the incorrect shift and mask bits were being used. Alan adds a bit to set and check if a timeout recovery is already pending to prevent overlapping transmit timeout recovery. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | i40e: prevent overlapping tx_timeout recoverAlan Brady2018-11-142-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a TX hang occurs, we attempt to recover by incrementally resetting. If we're starved for CPU time, it's possible the reset doesn't actually complete (or even fire) before another tx_timeout fires causing us to fly through the different resets without actually doing them. This adds a bit to set and check if a timeout recovery is already pending and, if so, bail out of tx_timeout. The bit will get cleared at the end of i40e_rebuild when reset is complete. Signed-off-by: Alan Brady <alan.brady@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | i40e: suppress bogus error messageMitch Williams2018-11-141-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The i40e driver complains about unprivileged VFs trying to configure promiscuous mode each time a VF reset occurs. This isn't the fault of the poor VF driver - the PF driver itself is making the request. To fix this, skip the privilege check if the request is to disable all promiscuous activity. This gets rid of the bogus message, but doesn't affect privilege checks, since we really only care if the unprivileged VF is trying to enable promiscuous mode. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | i40e: Use correct shift for VLAN priorityRichard Rodriguez2018-11-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When using port VLAN, for VFs, and setting priority bits, the device was sending out incorrect priority bits, and also setting the CFI bit incorrectly. To fix this, changed shift and mask bit definition for this function, to use the correct ones. Signed-off-by: Richard Rodriguez <richard.rodriguez@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | i40e: always set ks->base.speed in i40e_get_settings_link_upJacob Keller2018-11-141-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | In i40e_get_settings_link_up, set ks->base.speed to SPEED_UNKNOWN in the case where we don't know the link speed. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | i40e: don't restart nway if autoneg not supportedMitch Williams2018-11-141-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On link types that do not support autoneg, we cannot attempt to restart nway negotiation. This results in a dead link that requires a power cycle to remedy. Fix this by saving off the autoneg state and checking this value before we try to restart nway. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | i40e: Allow disabling FW LLDP on X722 devicesPatryk Małek2018-11-144-15/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch allows disabling FW LLDP agent on X722 devices. It also changes a source of information for this feature from pf->hw_features to pf->hw.flags which are set in i40e_init_adminq. Signed-off-by: Patryk Małek <patryk.malek@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | i40e: update driver versionAlice Michael2018-11-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | The version numbers have not been kept up to date and this is an effort to ammend that. Signed-off-by: Alice Michael <alice.michael@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | i40e: Protect access to VF control methodsJan Sokolowski2018-11-142-5/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A scenario has been found in which simultaneous addition/removal and modification of VF's might cause unstable behaviour, up to and including kernel panics. Protect the methods that create/modify/destroy VF's by locking them behind an atomically set bit in PF status bitfield. Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | i40e: Replace strncpy with strlcpy to ensure null terminationPatryk Małek2018-11-142-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using strncpy allows destination buffer to be not null terminated after the copying takes place. strlcpy ensures that's not the case by explicitly setting last element in the buffer as '\0'. Signed-off-by: Patryk Małek <patryk.malek@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | i40e: Add capability flag for stopping FW LLDPKrzysztof Galazka2018-11-143-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add HW capability flag to indicate that firmware supports stopping LLDP agent. This feature has been added in FW API 1.7 for XL710 devices and 1.6 for X722. Also raise expected minor version number for X722 FW API to 6. Signed-off-by: Krzysztof Galazka <krzysztof.galazka@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | i40e: Use a local variable for readabilityJan Sokolowski2018-11-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | Use a local variable to make the code a bit more readable. Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | i40e: Replace spin_is_locked() with lockdepLance Roy2018-11-141-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lockdep_assert_held() is better suited to checking locking requirements, since it won't get confused when someone else holds the lock. This is also a step towards possibly removing spin_is_locked(). Signed-off-by: Lance Roy <ldr709@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* | | ice: Remove ICE_MAX_TXQ_PER_TXQG check when configuring Tx queueMd Fahad Iqbal Polash2018-11-131-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes the condition checking of VSI TX queue number to ICE_MAX_TXQ_PER_TXQG. This is an unnecessary check and causes a driver load error on hosts that have more than 128 cores. Signed-off-by: Md Fahad Iqbal Polash <md.fahad.iqbal.polash@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* | | ice: Destroy scheduler tree in reset pathHenry Tieman2018-11-131-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The scheduler tree is is always rebuilt during reset. The existing code adds new scheduler nodes for queues but may not clean up earlier nodes. This patch removed the old scheduler tree during reset before it is rebuilt. Signed-off-by: Henry Tieman <henry.w.tieman@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* | | ice: Fix to make VLAN priority tagged traffic to appear on all TCsUsha Ketineni2018-11-135-51/+81
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch includes below changes to resolve the issue of ETS bandwidth shaping to work. 1. Allocation of Tx queues is accounted for based on the enabled TC's in ice_vsi_setup_q_map() and enabled the Tx queues on those TC's via ice_vsi_cfg_txqs() 2. Get the mapped netdev TC # for the user priority and set the priority to TC mapping for the VSI. Signed-off-by: Usha Ketineni <usha.k.ketineni@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* | | ice: Call pci_disable_sriov before stopping queues for VFBrett Creeley2018-11-131-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previous to this commit the driver was immediately stopping Tx/Rx queues when doing the following "echo 0 > sriov_numvfs" and then it was calling pci_disable_sriov if the VFs are not assigned. This was causing the VIRTCHNL_OP_DISABLE_QUEUES to fail because it was trying to stop the queues for a second time. Fix this by calling pci_disable_sriov before stopping the Tx/Rx queues. This allows the VIRTCHNL_OP_DISABLE_QUEUES to get processed before the driver tries to stop the Rx/Tx queues in ice_free_vfs. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* | | ice: Increase Rx queue disable timeoutPiotr Raczynski2018-11-131-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With much traffic coming into the port, Rx queue disable procedure can take more time until all pending queue requests on PCIe finish. Reuse ICE_Q_WAIT_MAX_RETRY macro and increase the delay itself. Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* | | ice: Fix NVM mask definesLev Faerman2018-11-131-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes bad masks that would break compilation when evaluated. Signed-off-by: Lev Faerman <lev.faerman@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* | | ice: Avoid nested RTNL locking in ice_dis_vsiDave Ertman2018-11-131-5/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ice_dis_vsi() performs an rtnl_lock() if it detects a netdev that is running on the VSI. In cases where the RTNL lock has already been acquired, a deadlock results. Add a boolean to pass to ice_dis_vsi to tell it if the RTNL lock is already held. Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* | | ice: Calculate guaranteed VSIs per function and use itAnirudh Venkataramanan2018-11-135-6/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we are setting the guar_num_vsi to equal to ICE_MAX_VSI which is the device limit of 768. This is incorrect and could have unintended consequences. To fix this use the valid_function's 8-bit bitmap returned from discovering device capabilities to determine the guar_num_vsi per function. guar_num_vsi value is then passed on to pf->num_alloc_vsi. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* | | ice: Remove node before releasing VSIAnirudh Venkataramanan2018-11-133-1/+110
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before releasing the VSI, remove the VSI scheduler node. If not, the node is left in the scheduler tree and, on subsequent load, the scheduler tree contains the node so it does not set it in vsi_ctx. This, later, causes the node to not be found in ice_sched_get_free_qparent which leads to a "Failed to set LAN Tx queue context, error: -1". To remove the scheduler node, this patch introduces ice_rm_vsi_lan_cfg and related helpers. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* | | ice: Check for q_vector when stopping ringsTony Nguyen2018-11-131-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a gap in time between a VF reset, which sets the q_vector to NULL, and the VF requesting mapping of the q_vectors. If ice_vsi_stop_tx_rings() is called during this time, a NULL pointer dereference is encountered. Add a check in ice_vsi_stop_tx_rings() to ensure the q_vector is set to avoid this situation from occurring. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* | | ice: Fix debug print in ice_tx_timeoutBrett Creeley2018-11-132-6/+12
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the debug print in ice_tx_timeout is printing useless and duplicate values. First, head is being assigned to tx_ring->next_to_clean and we are printing both of those values, but naming them HWB and NTC respectively. Also, reading tail always returns 0 so remove that as well. Instead of assigning the SW head (NTC) read to head, use the actual head register and change the debug print to note that this is HW_HEAD. Also reduce the scope of a couple variables. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2018-11-1114-60/+113
|\|
| * i40e: enable NETIF_F_NTUPLE and NETIF_F_HW_TC at driver loadJacob Keller2018-11-071-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The assignment of the feature flag NETIF_F_NTUPLE and NETIF_F_HW_TC occurs prior to the initial setup of the local hw_features variable. This means the features are set as user-changeable, but are not set in the currently active feature list. This results in the features being disabled at the driver's initial load. Move the assignment after the initial assignment of hw_features, and assign to the local variable. This ensures that NETIF_F_NTUPLE and NETIF_F_HW_TC are marked as user-changeable, and also enables them by default when the driver loads. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * i40e: restore NETIF_F_GSO_IPXIP[46] to netdev featuresJacob Keller2018-11-071-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit bacd75cfac8a ("i40e/i40evf: Add capability exchange for outer checksum", 2017-04-06) the i40e driver has not reported support for IP-in-IP offloads. This likely occurred due to a bad rebase, as the commit extracts hw_enc_features into its own variable. As part of this change, it dropped the NETIF_F_FSO_IPXIP flags from the netdev->hw_enc_features. This was unfortunately not caught during code review. Fix this by adding back the missing feature flags. For reference, NETIF_F_GSO_IPXIP4 was added in commit 7e13318daa4a ("net: define gso types for IPx over IPv4 and IPv6", 2016-05-20), replacing NETIF_F_GSO_IPIP and NETIF_F_GSO_SIT. NETIF_F_GSO_IPXIP6 was added in commit bf2d1df39502 ("intel: Add support for IPv6 IP-in-IP offload", 2016-05-20). Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * ice: Change req_speeds to be u16Chinh T Cao2018-11-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Since the req_speeds field in struct ice_link_status is a u8, req_speeds & ICE_AQ_LINK_SPEED_40GB always returns 0. This was caught by a coverity scan. Fix this by changing req_speeds to be u16. Reported-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Chinh T Cao <chinh.t.cao@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * igb: shorten maximum PHC timecounter update intervalMiroslav Lichvar2018-11-061-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The timecounter needs to be updated at least once per ~550 seconds in order to avoid a 40-bit SYSTIM timestamp to be misinterpreted as an old timestamp. Since commit 500462a9de65 ("timers: Switch to a non-cascading wheel"), scheduling of delayed work seems to be less accurate and a requested delay of 540 seconds may actually be longer than 550 seconds. Also, the PHC may be adjusted to run up to 6% faster than real time and the system clock up to 10% slower. Shorten the delay to 360 seconds to be sure the timecounter is updated in time. This fixes an issue with HW timestamps on 82580/I350/I354 being off by ~1100 seconds for few seconds every ~9 minutes. Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Acked-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * ice: Fix the bytecount sent to netdev_tx_sent_queueBrett Creeley2018-11-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently if the driver does a TSO offload the bytecount sent to netdev_tx_sent_queue will be incorrect. This is because in ice_tso we overwrite the initial value that we set in ice_tx_map. This creates a mismatch between the Tx and Tx clean flow. In the Tx clean flow we calculate the bytecount (called total_bytes) as we clean the descriptors so the value used in the Tx clean path is correct. Fix this by using += in ice_tso instead of =. This fixes the mismatch in bytecount mentioned above. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * ice: Fix tx_timeout in PF driverBrett Creeley2018-11-064-6/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prior to this commit the driver was running into tx_timeouts when a queue was stressed enough. This was happening because the HW tail and SW tail (NTU) were incorrectly out of sync. Consequently this was causing the HW head to collide with the HW tail, which to the hardware means that all descriptors posted for Tx have been processed. Due to the Tx logic used in the driver SW tail and HW tail are allowed to be out of sync. This is done as an optimization because it allows the driver to write HW tail as infrequently as possible, while still updating the SW tail index to keep track. However, there are situations where this results in the tail never getting updated, resulting in Tx timeouts. Tx HW tail write condition: if (netif_xmit_stopped(txring_txq(tx_ring) || !skb->xmit_more) writel(sw_tail, tx_ring->tail); An issue was found in the Tx logic that was causing the afore mentioned condition for updating HW tail to never happen, causing tx_timeouts. In ice_xmit_frame_ring we calculate how many descriptors we need for the Tx transaction based on the skb the kernel hands us. This is then passed into ice_maybe_stop_tx along with some extra padding to determine if we have enough descriptors available for this transaction. If we don't then we return -EBUSY to the stack, otherwise we move on and eventually prepare the Tx descriptors accordingly in ice_tx_map and set next_to_watch. In ice_tx_map we make another call to ice_maybe_stop_tx with a value of MAX_SKB_FRAGS + 4. The key here is that this value is possibly less than the value we sent in the first call to ice_maybe_stop_tx in ice_xmit_frame_ring. Now, if the number of unused descriptors is between MAX_SKB_FRAGS + 4 and the value used in the first call to ice_maybe_stop_tx in ice_xmit_frame_ring then we do not update the HW tail because of the "Tx HW tail write condition" above. This is because in ice_maybe_stop_tx we return success from ice_maybe_stop_tx instead of calling __ice_maybe_stop_tx and subsequently calling netif_stop_subqueue, which sets the __QUEUE_STATE_DEV_XOFF bit. This bit is then checked in the "Tx HW tail write condition" by calling netif_xmit_stopped and subsequently updating HW tail if the afore mentioned bit is set. In ice_clean_tx_irq, if next_to_watch is not NULL, we end up cleaning the descriptors that HW sets the DD bit on and we have the budget. The HW head will eventually run into the HW tail in response to the description in the paragraph above. The next time through ice_xmit_frame_ring we make the initial call to ice_maybe_stop_tx with another skb from the stack. This time we do not have enough descriptors available and we return NETDEV_TX_BUSY to the stack and end up setting next_to_watch to NULL. This is where we are stuck. In ice_clean_tx_irq we never clean anything because next_to_watch is always NULL and in ice_xmit_frame_ring we never update HW tail because we already return NETDEV_TX_BUSY to the stack and eventually we hit a tx_timeout. This issue was fixed by making sure that the second call to ice_maybe_stop_tx in ice_tx_map is passed a value that is >= the value that was used on the initial call to ice_maybe_stop_tx in ice_xmit_frame_ring. This was done by adding the following defines to make the logic more clear and to reduce the chance of mucking this up again: ICE_CACHE_LINE_BYTES 64 ICE_DESCS_PER_CACHE_LINE (ICE_CACHE_LINE_BYTES / \ sizeof(struct ice_tx_desc)) ICE_DESCS_FOR_CTX_DESC 1 ICE_DESCS_FOR_SKB_DATA_PTR 1 The ICE_CACHE_LINE_BYTES being 64 is an assumption being made so we don't have to figure this out on every pass through the Tx path. Instead I added a sanity check in ice_probe to verify cache line size and print a message if it's not 64 Bytes. This will make it easier to file issues if they are seen when the cache line size is not 64 Bytes when reading from the GLPCI_CNF2 register. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * ice: Fix napi delete calls for removeDave Ertman2018-11-063-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | In the remove path, the vsi->netdev is being set to NULL before the call to free vectors. This is causing the netif_napi_del call to never be made. Add a call to ice_napi_del to the same location as the calls to unregister_netdev and just prior to them. This will use the reverse flow as the register and netif_napi_add calls. Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * ice: Fix typo in error messageAnirudh Venkataramanan2018-11-061-1/+1
| | | | | | | | | | | | | | | | | | Print should say "Enabling" instead of "Enaabling" Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * ice: Fix flags for port VLANMd Fahad Iqbal Polash2018-11-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | According to the spec, whenever insert PVID field is set, the VLAN driver insertion mode should be set to 01b which isn't done currently. Fix it. Signed-off-by: Md Fahad Iqbal Polash <md.fahad.iqbal.polash@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * ice: Remove duplicate addition of VLANs in replay pathAnirudh Venkataramanan2018-11-063-39/+6
| | | | | | | | | | | | | | | | | | | | | | ice_restore_vlan and active_vlans were originally put in place to reprogram VLAN filters in the replay path. This is now done as part of the much broader VSI rebuild/replay framework. So remove both ice_restore_vlan and active_vlans Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * ice: Free VSI contexts during for unloadVictor Raj2018-11-063-0/+17
| | | | | | | | | | | | | | | | | | | | In the unload path, all VSIs are freed. Also free the related VSI contexts to prevent memory leaks. Signed-off-by: Victor Raj <victor.raj@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * ice: Fix dead device link issue with flow controlAkeem G Abodunrin2018-11-061-1/+6
| | | | | | | | | | | | | | | | | | | | Setting Rx or Tx pause parameter currently results in link loss on the interface, requiring the platform/host to be cold power cycled. Fix it. Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * ice: Check for reset in progress during removeAnirudh Venkataramanan2018-11-062-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The remove path does not currently check to see if a reset is in progress before proceeding. This can cause a resource collision resulting in various types of errors. Check for reset in progress and wait for a reasonable amount of time before allowing the remove to progress. Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * ice: Set carrier state and start/stop queues in rebuildAnirudh Venkataramanan2018-11-061-1/+17
| | | | | | | | | | | | | | | | | | Set the carrier state post rebuild by querying the link status. Also start/stop queues based on link status. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* | ixgbe: extend PTP gettime function to read system clockMiroslav Lichvar2018-11-091-10/+44
| | | | | | | | | | | | | | | | | | | | | | This adds support for the PTP_SYS_OFFSET_EXTENDED ioctl. Cc: Richard Cochran <richardcochran@gmail.com> Cc: Jacob Keller <jacob.e.keller@intel.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | igb: extend PTP gettime function to read system clockMiroslav Lichvar2018-11-091-10/+55
| | | | | | | | | | | | | | | | | | | | | | This adds support for the PTP_SYS_OFFSET_EXTENDED ioctl. Cc: Richard Cochran <richardcochran@gmail.com> Cc: Jacob Keller <jacob.e.keller@intel.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | e1000e: extend PTP gettime function to read system clockMiroslav Lichvar2018-11-093-16/+45
| | | | | | | | | | | | | | | | | | | | | | This adds support for the PTP_SYS_OFFSET_EXTENDED ioctl. Cc: Richard Cochran <richardcochran@gmail.com> Cc: Jacob Keller <jacob.e.keller@intel.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | igc: Clean up codeSasha Neftin2018-11-072-24/+0
| | | | | | | | | | | | | | | | | | | | Address few community comments. Remove unused code, will be added per demand. Remove blank lines and unneeded includes. Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* | e1000e: allow non-monotonic SYSTIM readingsMiroslav Lichvar2018-11-071-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It seems with some NICs supported by the e1000e driver a SYSTIM reading may occasionally be few microseconds before the previous reading and if enabled also pass e1000e_sanitize_systim() without reaching the maximum number of rereads, even if the function is modified to check three consecutive readings (i.e. it doesn't look like a double read error). This causes an underflow in the timecounter and the PHC time jumps hours ahead. This was observed on 82574, I217 and I219. The fastest way to reproduce it is to run a program that continuously calls the PTP_SYS_OFFSET ioctl on the PHC. Modify e1000e_phc_gettime() to use timecounter_cyc2time() instead of timecounter_read() in order to allow non-monotonic SYSTIM readings and prevent the PHC from jumping. Cc: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Acked-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* | igc: Tidy up some white spaceDan Carpenter2018-11-071-5/+5
| | | | | | | | | | | | | | | | | | I just cleaned up a couple small white space issues. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* | igc: fix error return handling from call to netif_set_real_num_tx_queuesColin Ian King2018-11-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The call to netif_set_real_num_tx_queues is not assigning the error return to variable err even though the next line checks err for an error. Fix this by adding the missing err assignment. Detected by CoverityScan, CID#1474551 ("Logically dead code") Fixes: 3df25e4c1e66 ("igc: Add interrupt support") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* | igc: Remove set but not used variable 'pci_using_dac'YueHaibing2018-11-071-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes gcc '-Wunused-but-set-variable' warning: drivers/net/ethernet/intel/igc/igc_main.c: In function 'igc_probe': drivers/net/ethernet/intel/igc/igc_main.c:3535:11: warning: variable 'pci_using_dac' set but not used [-Wunused-but-set-variable] It never used since introduction in commit d89f88419f99 ("igc: Add skeletal frame for Intel(R) 2.5G Ethernet Controller support") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* | igc: Remove set but not used variables 'ctrl_ext, link_mode'YueHaibing2018-11-071-8/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes gcc '-Wunused-but-set-variable' warning: drivers/net/ethernet/intel/igc/igc_base.c: In function 'igc_init_phy_params_base': drivers/net/ethernet/intel/igc/igc_base.c:240:6: warning: variable 'ctrl_ext' set but not used [-Wunused-but-set-variable] u32 ctrl_ext; drivers/net/ethernet/intel/igc/igc_base.c: In function 'igc_get_invariants_base': drivers/net/ethernet/intel/igc/igc_base.c:290:6: warning: variable 'link_mode' set but not used [-Wunused-but-set-variable] u32 link_mode = 0; It never used since introduction in commit c0071c7aa5fe ("igc: Add HW initialization code") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
OpenPOWER on IntegriCloud