summaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet/mellanox/mlx4/en_cq.c
Commit message (Collapse)AuthorAgeFilesLines
* net/mlx4_en: remove napi_hash_del() callEric Dumazet2016-11-171-4/+0
| | | | | | | | | | | | | | | There is no need calling napi_hash_del()+synchronize_rcu() before calling netif_napi_del() netif_napi_del() does this already. Using napi_hash_del() in a driver is useful only when dealing with a batch of NAPI structures, so that a single synchronize_rcu() can be used. mlx4_en_deactivate_cq() is deactivating a single NAPI. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_en: Refactor the XDP forwarding rings schemeTariq Toukan2016-11-021-9/+1
| | | | | | | | | | | | | | | | | | Separately manage the two types of TX rings: regular ones, and XDP. Upon an XDP set, do not borrow regular TX rings and convert them into XDP ones, but allocate new ones, unless we hit the max number of rings. Which means that in systems with smaller #cores we will not consume the current TX rings for XDP, while we are still in the num TX limit. XDP TX rings counters are not shown in ethtool statistics. Instead, XDP counters will be added to the respective RX rings in a downstream patch. This has no performance implications. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_en: Add TX_XDP for CQ typesTariq Toukan2016-11-021-9/+9
| | | | | | | | Support XDP CQ type, and refactor the CQ type enum. Rename the is_tx field to match the change. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_en: fixup xdp tx irq to match rxBrenden Blanco2016-10-141-1/+9
| | | | | | | | | | | | | | | | | | | | | In cases where the number of tx rings is not a multiple of the number of rx rings, the tx completion event will be handled on a different core from the transmit and population of the ring. Races on the ring will lead to a double-free of the page, and possibly other corruption. The rings are initialized by default with a valid multiple of rings, based on the number of cpus, therefore an invalid configuration requires ethtool to change the ring layout. For instance 'ethtool -L eth0 rx 9 tx 8' will cause packets received on rx0, and XDP_TX'd to tx48, to be completed on cpu3 (48 % 9 == 3). Resolve this discrepancy by shifting the irq for the xdp tx queues to start again from 0, modulo rx_ring_num. Fixes: 9ecc2d86171a ("net/mlx4_en: add xdp forwarding and data write support") Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4: Avoid wrong virtual mappingsHaggai Abramovsky2016-05-051-8/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | The dma_alloc_coherent() function returns a virtual address which can be used for coherent access to the underlying memory. On some architectures, like arm64, undefined behavior results if this memory is also accessed via virtual mappings that are not coherent. Because of their undefined nature, operations like virt_to_page() return garbage when passed virtual addresses obtained from dma_alloc_coherent(). Any subsequent mappings via vmap() of the garbage page values are unusable and result in bad things like bus errors (synchronous aborts in ARM64 speak). The mlx4 driver contains code that does the equivalent of: vmap(virt_to_page(dma_alloc_coherent)), this results in an OOPs when the device is opened. Prevent Ethernet driver to run this problematic code by forcing it to allocate contiguous memory. As for the Infiniband driver, at first we are trying to allocate contiguous memory, but in case of failure roll back to work with fragmented memory. Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reported-by: David Daney <david.daney@cavium.com> Tested-by: Sinan Kaya <okaya@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: provide generic busy polling to all NAPI driversEric Dumazet2015-11-181-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | NAPI drivers no longer need to observe a particular protocol to benefit from busy polling (CONFIG_NET_RX_BUSY_POLL=y) napi_hash_add() and napi_hash_del() are automatically called from core networking stack, respectively from netif_napi_add() and netif_napi_del() This patch depends on free_netdev() and netif_napi_del() being called from process context, which seems to be the norm. Drivers might still prefer to call napi_hash_del() on their own, since they might combine all the rcu grace periods into a single one, knowing their NAPI structures lifetime, while core networking stack has no idea of a possible combining. Once this patch proves to not bring serious regressions, we will cleanup drivers to either remove napi_hash_del() or provide appropriate rcu grace periods combining. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: add netif_tx_napi_add()Eric Dumazet2015-11-181-2/+2
| | | | | | | | | | | | | netif_tx_napi_add() is a variant of netif_napi_add() It should be used by drivers that use a napi structure to exclusively poll TX. We do not want to add this kind of napi in napi_hash[] in following patches, adding generic busy polling to all NAPI drivers. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_core: Fix unintialized variable used in error pathCarol L Soto2015-08-271-3/+2
| | | | | | | | | The uninitialized value name in mlx4_en_activate_cq was used in order to print an error message. Fixing it by replacing it with cq->vector. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Carol L Soto <clsoto@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_core: Move affinity hints to mlx4_core ownershipIdo Shamay2015-05-301-9/+1
| | | | | | | | | | | | | | | | | | Now that EQs management is in the sole responsibility of mlx4_core, the IRQ affinity hints configuration should be in its hands as well. request_irq is called only once by the first consumer (maybe mlx4_ib), so mlx4_en passes the affinity mask too late. We also need to request vectors according to the cores we want to run on. mlx4_core distribution of IRQs to cores is straight forward, EQ(i)->IRQ will set affinity hint to core i. Consumers need to request EQ vectors, according to their cores considerations (NUMA). Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4: Add EQ poolMatan Barak2015-05-301-25/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | Previously, mlx4_en allocated EQs and used them exclusively. This affected RoCE performance, as applications which are events sensitive were limited to use only the legacy EQs. Change that by introducing an EQ pool. This pool is managed by mlx4_core. EQs are assigned to ports (when there are limited number of EQs, multiple ports could be assigned to the same EQs). An exception to this rule is the ASYNC EQ which handles various events. Legacy EQs are completely removed as all EQs could be shared. When a consumer (mlx4_ib/mlx4_en) requests an EQ, it asks for EQ serving on a specific port. The core driver calculates which EQ should be assigned to that request. Because IRQs are shared between IB and Ethernet modules, their names only include the PCI device BDF address. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_core: Maintain a persistent memory for mlx4 deviceYishai Hadas2015-01-251-2/+2
| | | | | | | | | Maintain a persistent memory that should survive reset flow/PCI error. This comes as a preparation for coming series to support above flows. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_en: cq->irq_desc wasn't set in legacy EQ'sAmir Vadai2014-07-161-3/+4
| | | | | | | | | | | | Fix a regression introduced by commit 35f6f45 ("net/mlx4_en: Don't use irq_affinity_notifier to track changes in IRQ affinity map"). When core is started in legacy EQ's (number of IRQ's < rx rings), cq->irq_desc was NULL. This caused a kernel crash under heavy traffic - when having more than rx NAPI budget completions. Fixed to have it set for both EQ modes. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_en: IRQ affinity hint is not cleared on port downAmir Vadai2014-07-021-2/+1
| | | | | | | | | Need to remove affinity hint at mlx4_en_deactivate_cq() and not at mlx4_en_destroy_cq() - since affinity_mask might be free'd while still being used by procfs. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_en: Don't use irq_affinity_notifier to track changes in IRQ ↵Amir Vadai2014-07-021-0/+4
| | | | | | | | | | | | | | | affinity map IRQ affinity notifier can only have a single notifier - cpu_rmap notifier. Can't use it to track changes in IRQ affinity map. Detect IRQ affinity changes by comparing CPU to current IRQ affinity map during NAPI poll thread. CC: Thomas Gleixner <tglx@linutronix.de> CC: Ben Hutchings <ben@decadent.org.uk> Fixes: 2eacc23 ("net/mlx4_core: Enforce irq affinity changes immediatly") Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_en: Use affinity hintYuval Atias2014-06-111-1/+11
| | | | | | | | | | | | | | | | | The “affinity hint” mechanism is used by the user space daemon, irqbalancer, to indicate a preferred CPU mask for irqs. Irqbalancer can use this hint to balance the irqs between the cpus indicated by the mask. We wish the HCA to preferentially map the IRQs it uses to numa cores close to it. To accomplish this, we use cpumask_set_cpu_local_first(), that sets the affinity hint according the following policy: First it maps IRQs to “close” numa cores. If these are exhausted, the remaining IRQs are mapped to “far” numa cores. Signed-off-by: Yuval Atias <yuvala@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Revert "net/mlx4_en: Use affinity hint"David S. Miller2014-06-021-5/+1
| | | | | | This reverts commit 70a640d0dae3a9b1b222ce673eb5d92c263ddd61. Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_en: Use affinity hintYuval Atias2014-06-011-1/+5
| | | | | | | | | | | | | | | | | The “affinity hint” mechanism is used by the user space daemon, irqbalancer, to indicate a preferred CPU mask for irqs. Irqbalancer can use this hint to balance the irqs between the cpus indicated by the mask. We wish the HCA to preferentially map the IRQs it uses to numa cores close to it. To accomplish this, we use cpumask_set_cpu_local_first(), that sets the affinity hint according the following policy: First it maps IRQs to “close” numa cores. If these are exhausted, the remaining IRQs are mapped to “far” numa cores. Signed-off-by: Yuval Atias <yuvala@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* mellanox: Logging message cleanupsJoe Perches2014-05-081-2/+1
| | | | | | | | | | | | | | | | | | Use a more current logging style. o Coalesce formats o Add missing spaces for coalesced formats o Align arguments for modified formats o Add missing newlines for some logging messages o Use DRV_NAME as part of format instead of %s, DRV_NAME to reduce overall text. o Use ..., ##__VA_ARGS__ instead of args... in macros o Correct a few format typos o Use a single line message where appropriate Signed-off-by: Joe Perches <joe@perches.com> Acked-By: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* mlx4_en: don't use napi_synchronize inside mlx4_en_netpollChris Mason2014-04-161-1/+0
| | | | | | | | | | | | | | | | | | | The mlx4 driver is triggering schedules while atomic inside mlx4_en_netpoll: spin_lock_irqsave(&cq->lock, flags); napi_synchronize(&cq->napi); ^^^^^ msleep here mlx4_en_process_rx_cq(dev, cq, 0); spin_unlock_irqrestore(&cq->lock, flags); This was part of a patch by Alexander Guller from Mellanox in 2011, but it still isn't upstream. Signed-off-by: Chris Mason <clm@fb.com> cc: stable@vger.kernel.org Acked-By: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_en: Add NAPI support for transmit sideEugenia Emantayev2013-12-191-4/+8
| | | | | | | | | | Add NAPI for TX side, implement its support and provide NAPI callback. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_en: Datapath structures are allocated per NUMA nodeEugenia Emantayev2013-11-071-4/+13
| | | | | | | | | | | | | | | For each RX/TX ring and its CQ, allocation is done on a NUMA node that corresponds to the core that the data structure should operate on. The assumption is that the core number is reflected by the ring index. The affected allocations are the ring/CQ data structures, the TX/RX info and the shared HW/SW buffer. For TX rings, each core has rings of all UPs. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_en: Datapath resources allocated dynamicallyEugenia Emantayev2013-11-071-8/+26
| | | | | | | | | | | | | | Currently all TX/RX rings and completion queues are part of the netdev priv structure and are allocated statically. This patch will change the priv to hold only arrays of pointers and therefore all TX/RX rings and completetion queues will be allocated dynamically. This is in preparation for NUMA aware allocations. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_en: Add Low Latency Socket (LLS) supportAmir Vadai2013-06-191-0/+3
| | | | | | | | Add basic support for LLS. Signed-off-by: Amir Vadai <amirv@mellanox.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_en: Add HW timestamping (TS) supportAmir Vadai2013-04-241-2/+8
| | | | | | | | | | | | | | | | | The patch allows to enable/disable HW timestamping for incoming and/or outgoing packets. It adds and initializes all structs and callbacks needed by kernel TS API. To enable/disable HW timestamping appropriate ioctl should be used. Currently HWTSTAMP_FILTER_ALL/NONE and HWTSAMP_TX_ON/OFF only are supported. When enabling TS on receive flow - VLAN stripping will be disabled. Also were made all relevant changes in RX/TX flows to consider TS request and plant HW timestamps into relevant structures. mlx4_ib was fixed to compile with new mlx4_cq_alloc() signature. Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* mlx4: 64-byte CQE/EQE supportOr Gerlitz2012-11-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ConnectX-3 devices can use either 64- or 32-byte completion queue entries (CQEs) and event queue entries (EQEs). Using 64-byte EQEs/CQEs performs better because each entry is aligned to a complete cacheline. This patch queries the HCA's capabilities, and if it supports 64-byte CQEs and EQES the driver will configure the HW to work in 64-byte mode. The 32-byte vs 64-byte mode is global per HCA and not per CQ or EQ. Since this mode is global, userspace (libmlx4) must be updated to work with the configured CQE size, and guests using SR-IOV virtual functions need to know both EQE and CQE size. In case one of the 64-byte CQE/EQE capabilities is activated, the patch makes sure that older guest drivers that use the QUERY_DEV_FUNC command (e.g as done in mlx4_core of Linux 3.3..3.6) will notice that they need an update to be able to work with the PPF. This is done by changing the returned pf_context_behaviour not to be zero any more. In case none of these capabilities is activated that value remains zero and older guest drivers can run OK. The SRIOV related flow is as follows 1. the PPF does the detection of the new capabilities using QUERY_DEV_CAP command. 2. the PPF activates the new capabilities using INIT_HCA. 3. the VF detects if the PPF activated the capabilities using QUERY_HCA, and if this is the case activates them for itself too. Note that the VF detects that it must be aware to the new PF behaviour using QUERY_FUNC_CAP. Steps 1 and 2 apply also for native mode. User space notification is done through a new field introduced in struct mlx4_ib_ucontext which holds device capabilities for which user space must take action. This changes the binary interface so the ABI towards libmlx4 exposed through uverbs is bumped from 3 to 4 but only when **needed** i.e. only when the driver does use 64-byte CQEs or future device capabilities which must be in sync by user space. This practice allows to work with unmodified libmlx4 on older devices (e.g A0, B0) which don't support 64-byte CQEs. In order to keep existing systems functional when they update to a newer kernel that contains these changes in VF and userspace ABI, a module parameter enable_64b_cqe_eqe must be set to enable 64-byte mode; the default is currently false. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* net/mlx4_en: Add accelerated RFS supportAmir Vadai2012-07-191-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | Use RFS infrastructure and flow steering in HW to keep CPU affinity of rx interrupts and application per TCP stream. A flow steering filter is added to the HW whenever the RFS ndo callback is invoked by core networking code. Because the invocation takes place in interrupt context, the actual setup of HW is done using workqueue. Whenever new filter is added, the driver checks for expiry of existing filters. Since there's window in time between the point where the core RFS code invoked the ndo callback, to the point where the HW is configured from the workqueue context, the 2nd, 3rd etc packets from that stream will cause the net core to invoke the callback again and again. To prevent inefficient/double configuration of the HW, the filters are kept in a database which is indexed using hash function to enable fast access. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* {NET,IB}/mlx4: Add rmap support to mlx4_assign_eqAmir Vadai2012-07-191-1/+2
| | | | | | | | | Enable callers of mlx4_assign_eq to supply a pointer to cpu_rmap. If supplied, the assigned IRQ is tracked using rmap infrastructure. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* mlx4_en: Moving to Interrupts for TX completionsYevgeny Petrilin2012-04-231-11/+3
| | | | | | | | | | | | | | Moving to interrupts instead of polling fpr TX completions Avoiding situations where skb can be held in by the driver for a long time (till timer expires). The change is also necessary for supporting BQL. Removing comp_lock that was required because we could handle TX completions from several contexts: Interrupts, timer, polling. Now there is only interrupts Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2012-01-021-0/+1
|\
| * mlx4_en: nullify cq->vector field when closing completion queueYevgeny Petrilin2011-12-301-0/+1
| | | | | | | | | | | | | | Caused loss of connectivity when changing ring size. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/mlx4_en: using non collapsed CQ on TXYevgeny Petrilin2011-11-271-5/+2
|/ | | | | | | | Moving to regular Completion Queue implementation (not collapsed) Completion for each transmitted packet is written to new entry. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
* mlx4_en: Removing reserve vectorsAlexander Guller2011-10-091-3/+2
| | | | | | | | | Fixed a bug where ring size change caused insufficient memory upon driver restart due to unreleased EQs. Signed-off-by: Alexander Guller <alexg@mellanox.co.il> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
* mlx4_en: Assigning TX irq per ringAlexander Guller2011-10-091-10/+16
| | | | | | | | | | | | | Until now only RX rings used irq per ring and TX used only one per port. >From now on, both of them will use the irq per ring while RX & TX ring[i] will use the same irq. Signed-off-by: Alexander Guller <alexg@mellanox.co.il> Signed-off-by: Sharon Cohen <sharonc@mellanox.co.il> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
* mlx4: Move the Mellanox driverJeff Kirsher2011-08-111-0/+178
Moves the Mellanox driver into drivers/net/ethernet/mellanox/ and make the necessary Kconfig and Makefile changes. CC: Roland Dreier <roland@kernel.org> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
OpenPOWER on IntegriCloud