summaryrefslogtreecommitdiffstats
path: root/drivers/infiniband
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'rdma_mmap' into rdma.git for-nextJason Gunthorpe2019-04-2410-104/+106
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jason Gunthorpe says: ==================== Upon review it turns out there are some long standing problems in BAR mapping area: * BAR pages intended for read-only can be switched to writable via mprotect. * Missing use of rdma_user_mmap_io for the mlx5 clock BAR page. * Disassociate causes SIGBUS when touching the pages. * CPU pages are being mapped through to the process via remap_pfn_range instead of the more appropriate vm_insert_page, causing weird behaviors during disassociation. This series adds the missing VM_* flag manipulation, adds faulting a zero page for disassociation and revises the CPU page mappings to use vm_insert_page. ==================== For dependencies this branch is based on for-rc from git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git * branch 'rdma_mmap': RDMA: Remove rdma_user_mmap_page RDMA/mlx5: Use get_zeroed_page() for clock_info RDMA/ucontext: Fix regression with disassociate RDMA/mlx5: Use rdma_user_map_io for mapping BAR pages RDMA/mlx5: Do not allow the user to write to the clock page Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * RDMA: Remove rdma_user_mmap_pageJason Gunthorpe2019-04-242-57/+17
| | | | | | | | | | | | | | | | | | | | | | Upon further research drivers that want this should simply call the core function vm_insert_page(). The VMA holds a reference on the page and it will be automatically freed when the last reference drops. No need for disassociate to sequence the cleanup. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * RDMA/mlx5: Use get_zeroed_page() for clock_infoJason Gunthorpe2019-04-241-2/+3
| | | | | | | | | | | | | | | | | | | | | | get_zeroed_page() returns a virtual address for the page which is better than allocating a struct page and doing a permanent kmap on it. Cc: stable@vger.kernel.org Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * RDMA/ucontext: Fix regression with disassociateJason Gunthorpe2019-04-242-3/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When this code was consolidated the intention was that the VMA would become backed by anonymous zero pages after the zap_vma_pte - however this very subtly relied on setting the vm_ops = NULL and clearing the VM_SHARED bits to transform the VMA into an anonymous VMA. Since the vm_ops was removed this broke. Now userspace gets a SIGBUS if it touches the vma after disassociation. Instead of converting the VMA to anonymous provide a fault handler that puts a zero'd page into the VMA when user-space touches it after disassociation. Cc: stable@vger.kernel.org Suggested-by: Andrea Arcangeli <aarcange@redhat.com> Fixes: 5f9794dc94f5 ("RDMA/ucontext: Add a core API for mmaping driver IO memory") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * RDMA/mlx5: Use rdma_user_map_io for mapping BAR pagesJason Gunthorpe2019-04-241-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | Since mlx5 supports device disassociate it must use this API for all BAR page mmaps, otherwise the pages can remain mapped after the device is unplugged causing a system crash. Cc: stable@vger.kernel.org Fixes: 5f9794dc94f5 ("RDMA/ucontext: Add a core API for mmaping driver IO memory") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
| * RDMA/mlx5: Do not allow the user to write to the clock pageJason Gunthorpe2019-04-241-0/+2
| | | | | | | | | | | | | | | | | | | | | | The intent of this VMA was to be read-only from user space, but the VM_MAYWRITE masking was missed, so mprotect could make it writable. Cc: stable@vger.kernel.org Fixes: 5c99eaecb1fc ("IB/mlx5: Mmap the HCA's clock info to user-space") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
| * IB/mlx5: Fix scatter to CQE in DCT QP creationGuy Levi2019-04-182-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When scatter to CQE is enabled on a DCT QP it corrupts the mailbox command since it tried to treat it as as QP create mailbox command instead of a DCT create command. The corrupted mailbox command causes userspace to malfunction as the device doesn't create the QP as expected. A new mlx5 capability is exposed to user-space which ensures that it will not enable the feature on DCT without this fix in the kernel. Fixes: 5d6ff1babe78 ("IB/mlx5: Support scatter to CQE for DC transport type") Signed-off-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * IB/rdmavt: Fix frwr memory registrationJosh Collier2019-04-161-7/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current implementation was not properly handling frwr memory registrations. This was uncovered by commit 27f26cec761das ("xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR)") in which xprtrdma, which is used for NFS over RDMA, started failing as it was the first ULP to modify the ib_mr iova resulting in the NFS server getting REMOTE ACCESS ERROR when attempting to perform RDMA Writes to the client. The fix is to properly capture the true iova, offset, and length in the call to ib_map_mr_sg, and then update the iova when processing the IB_WR_REG_MEM on the send queue. Fixes: a41081aa5936 ("IB/rdmavt: Add support for ib_map_mr_sg") Cc: stable@vger.kernel.org Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Josh Collier <josh.d.collier@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * IB/hfi1: Do not flush send queue in the TID RDMA second legKaike Wan2019-04-101-23/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a QP is put into error state, the send queue will be flushed. This mechanism is implemented in both the first and the second leg of the send engine. Since the second leg is only responsible for data transactions in the KDETH space for the TID RDMA WRITE request, it should not perform the flushing of the send queue. This patch removes the flushing function of the second leg, but still keeps the bailing out of the QP if it is put into error state. Fixes: 70dcb2e3dc6a ("IB/hfi1: Add the TID second leg send packet builder") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * RDMA/hns: Bugfix for SCC hem freeYangyang Li2019-04-081-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The method of hem free for SCC context is different from qp context. In the current version, if free SCC hem during the execution of qp free, there may be smmu error as below: arm-smmu-v3 arm-smmu-v3.1.auto: event 0x10 received: arm-smmu-v3 arm-smmu-v3.1.auto: 0x00007d0000000010 arm-smmu-v3 arm-smmu-v3.1.auto: 0x000012000000017c arm-smmu-v3 arm-smmu-v3.1.auto: 0x00000000000009e0 arm-smmu-v3 arm-smmu-v3.1.auto: 0x0000000000000000 As SCC context is still used by hardware after qp free, we can solve this problem by removing SCC hem free from hns_roce_qp_free. Fixes: 6a157f7d1b14 ("RDMA/hns: Add SCC context allocation support for hip08") Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * RDMA/hns: Fix bug that caused srq creation to failLijun Ou2019-04-082-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Due to the incorrect use of the seg and obj information, the position of the mtt is calculated incorrectly, and the free space of the page is not enough to store the entire mtt, resulting in access to the next page. This patch fixes this problem. Unable to handle kernel paging request at virtual address ffff00006e3cd000 ... Call trace: hns_roce_write_mtt+0x154/0x2f0 [hns_roce] hns_roce_buf_write_mtt+0xa8/0xd8 [hns_roce] hns_roce_create_srq+0x74c/0x808 [hns_roce] ib_create_srq+0x28/0xc8 Fixes: 0203b14c4f32 ("RDMA/hns: Unify the calculation for hem index in hip08") Signed-off-by: chenglang <chenglang@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * RDMA/vmw_pvrdma: Fix memory leak on pvrdma_pci_removeKamal Heib2019-04-081-0/+2
| | | | | | | | | | | | | | | | | | Make sure to free the DSR on pvrdma_pci_remove() to avoid the memory leak. Fixes: 29c8d9eba550 ("IB: Add vmw_pvrdma driver") Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Acked-by: Adit Ranadive <aditr@vmware.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | IB/hfi1: Remove reference to RHF.VCRCErrJohn Fleck2019-04-243-5/+4
| | | | | | | | | | | | | | | | | | | | | | The bit VCRCErr in the receive header flag is actually a reserved field. Remove bit operations on this field. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: John Fleck <john.fleck@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | IB/hfi1: Add selected Rcv countersMike Marciniszyn2019-04-243-0/+9
| | | | | | | | | | | | | | | | | | These counters are required for error analysis and debug. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | IB/{rdmavt, qib, hfi1}: Use new routine to release reference countsMike Marciniszyn2019-04-243-11/+6
| | | | | | | | | | | | | | | | | | | | | | | | The reference count adjustments on reference count completion are open coded throughout. Add a routine to do all reference count adjustments and use. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | IB/rdmavt: Use more efficient allowed_opsMike Marciniszyn2019-04-241-11/+4
| | | | | | | | | | | | | | | | | | | | | | | | QP creation already records the allowed_ops. Take advantage of that single field to replace multiple qp_type specific tests. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | IB/rdmavt: Fix ab/ba include issuesMike Marciniszyn2019-04-246-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The currently include file ordering for rdmavt headers has an ab/ba include issue the precludes using inlines from rdma_vt.h in rdmavt_qp.h. At the heart of the issue is that rdma_vt.h includes rdmavt_qp.h. Fix the ordering issue by adjusting rdma_vt.h to not require rdmavt_qp.h and move qp related inlines to rdmavt_qp.h. Additionally, promote rvt_mmap_info to rdma_vt.h since it is shared by rdmavt_cq.h and rdmavt_qp.h. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | IB/hfi1: Make opfn.h self sufficientMike Marciniszyn2019-04-241-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | The opfn.h include file build-ablility depends on the including file having the correct includes. Fix by making opfn.h self sufficient. Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | IB/{rdmavt, hfi1): Miscellaneous comment fixesKaike Wan2019-04-241-1/+1
| | | | | | | | | | | | | | | | | | This patch fixes miscellaneous comment errors. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | IB/hfi1: Add debugfs to control expansion ROM write protectJosh Collier2019-04-241-0/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some kernels now enable CONFIG_IO_STRICT_DEVMEM which prevents multiple handles to PCI resource0. In order to continue to support expansion ROM updates while the driver is loaded, the driver must now provide an interface to control the expansion ROM write protection. This patch adds an exprom_wp debugfs interface that allows the hfi1_eprom user tool to disable the expansion ROM write protection by opening the file and writing a '1'. The write protection is released when writing a '0' or automatically re-enabled when the file handle is closed. The current implementation will only allow one handle to be opened at a time across all hfi1 devices. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Josh Collier <josh.d.collier@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/hns: Remove asynchronic QP destroyLeon Romanovsky2019-04-243-406/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | Verbs destroy callbacks are synchronous operations and can't be delayed. The expectation is that after driver returned from destroy function, the memory can be freed and user won't be able to access it again. Ditch workqueue implementation used in HNS driver. Fixes: d838c481e025 ("IB/hns: Fix the bug when destroy qp") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: oulijun <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/cma: Consider scope_id while binding to ipv6 ll addressParav Pandit2019-04-241-6/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When two netdev have same link local addresses (such as vlan and non vlan), two rdma cm listen id should be able to bind to following different addresses. listener-1: addr=lla, scope_id=A, port=X listener-2: addr=lla, scope_id=B, port=X However while comparing the addresses only addr and port are considered, due to which 2nd listener fails to listen. In below example of two listeners, 2nd listener is failing with address in use error. $ rping -sv -a fe80::268a:7ff:feb3:d113%ens2f1 -p 4545& $ rping -sv -a fe80::268a:7ff:feb3:d113%ens2f1.200 -p 4545 rdma_bind_addr: Address already in use To overcome this, consider the scope_ids as well which forms the accurate IPv6 link local address. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | IB/core: Allow vlan link local address based RoCE GIDsParav Pandit2019-04-241-23/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IPv6 link local address for a VLAN netdevice has nothing to do with its resemblance with the default GID, because VLAN link local GID is in different layer 2 domain. Now that RoCE MAD packet processing and route resolution consider the right GID index, there is no need for an unnecessary check which prevents the addition of vlan based IPv6 link local GIDs. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/mlx5: Don't create IB representors when in multiport RoCE modeMark Bloch2019-04-221-1/+2
| | | | | | | | | | | | | | | | | | | | Switchdev mode and mutiport RoCE mode aren't compatible at this point. Don't create IB reps when a user switches to switchdev mode and the driver operates in that mode. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/mlx5: Initialize roce port info before multiport master initMark Bloch2019-04-221-7/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When working in mutliport RoCE mode it is possible to attach a slave before the master. In that case the slave is waiting for a master to be attached. When the master is attached it goes over the list of waiting slaves, finds a slave that is compatible and tries to bind it to itself. The call stack is: mlx5_ib_init_multiport_master() -> mlx5_ib_bind_slave_port() In the bind function we will create a netdev notifier, but this is done before we initialize the RoCE structure (this is done at a later stage by the master in the ROCE stage). Once events are delivered to that notifier we will use mlx5_ib_get_native_port_mdev() to get the actual port and as the native port is zero we will access an invalid index in the port structure. Move the RoCE structure initialization to an earlier stage. Fixes: 32f69e4be269 ("{net, IB}/mlx5: Manage port association for multiport RoCE") Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/mlx5: Allow DEVX and raw creation flow on repsMark Bloch2019-04-223-8/+5
| | | | | | | | | | | | | | | | | | | | Remove the limitations that were in place and provide support for DEVX and raw flow creation on reps. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/mlx5: Add query e-switch vport context to devx white listMaor Gottlieb2019-04-221-0/+2
| | | | | | | | | | | | | | | | | | | | Add MLX5_OP_QUERY_ESW_VPORT_CONTEXT to devx white list. It will be allowed only if HCA_CAP.eswitch_manager==1. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/mlx5: Allow inserting a steering rule to the FDBMark Bloch2019-04-221-12/+63
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow this only via mlx5 raw create flow API, legacy verbs are not supported. To accommodate that, we add a new attribute to matcher creation to indicate the type of flow table to be used. MLX5_IB_ATTR_FLOW_MATCHER_FT_TYPE With this new attribute MLX5_IB_ATTR_FLOW_MATCHER_FLOW_FLAGS is no longer needed, we keep it for compatibility but at most only a single attribute can be passed of the two. When inserting a flow rule to the FDB we require that a DEVX FT is provided as a destination, no other configuration is allowed. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/mlx5: Create flow table with max size supportedMark Bloch2019-04-221-6/+4
| | | | | | | | | | | | | | | | | | | | Instead of failing the request, just use the supported number of flow entries. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/mlx5: Access the prio bypass inside the FDB flow table namespaceMark Bloch2019-04-222-11/+21
| | | | | | | | | | | | | | | | | | | | Now that we have a specific prio inside the FDB namespace allow retrieving it from the RDMA side. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/core: Add a netlink command to change net namespace of rdma deviceParav Pandit2019-04-223-6/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Provide an option to change the net namespace of a rdma device through a netlink command. When multiple rdma devices exists in a system, and when containers are used, this will limit rdma device visibility to a specified net namespace. An example command to change net namespace of mlx5_1 device to the previously created net namespace 'foo' is: $ ip netns add foo $ rdma dev set mlx5_1 netns foo Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/core: Introduce a helper function to change net namespace of rdma deviceParav Pandit2019-04-221-0/+77
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce a helper function that changes rdma device's net namespace which performs mini disable/enable sequence to have device visible only in assigned net namespace. Device unregistration, device rename and device change net namespace may be invoked concurrently. (a) device unregistration needs to wait if a device change (rename or net namespace change) operation is in progress. (b) device net namespace change should not proceed if the unregistration has started. (c) while one cpu is changing device net namespace, other cpu should not be able to rename or change net namespace. To address above concurrency, (a) Use unreg_mutex to synchronize between ib_unregister_device() and net namespace change operation (b) In cases where unregister_device() has started unregistration before change_netns got chance to acquire unreg_mutex, validate the refcount - if it dropped to zero, abort the net namespace change operation. Finally use the helper function to change net namespace of ib device to move the device back to init_net when such net is deleted. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/core: Avoid freeing netdevs in disable_device()Parav Pandit2019-04-221-3/+4
| | | | | | | | | | | | | | | | So we can use the disable_device() helper while changing the net namespace of the rdma device in a subsequent patch, move free_netdevs() out of it. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | infiniband/qib: Fix typo in commentChengguang Xu2019-04-221-1/+1
| | | | | | | | | | | | | | Fix typo 'faspath' -> 'pastpath'. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/cxgb4: Fix spelling mistake "immedate" -> "immediate"Colin Ian King2019-04-181-1/+1
| | | | | | | | | | | | | | There is a spelling mistake in a module parameter description. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/cxgb4: Fix null pointer dereference on alloc_skb failureColin Ian King2019-04-161-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | Currently if alloc_skb fails to allocate the skb a null skb is passed to t4_set_arp_err_handler and this ends up dereferencing the null skb. Avoid the NULL pointer dereference by checking for a NULL skb and returning early. Addresses-Coverity: ("Dereference null return") Fixes: b38a0ad8ec11 ("RDMA/cxgb4: Set arp error handler for PASS_ACCEPT_RPL messages") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/mlx5: Check for error return in flow_rule rather than errColin Ian King2019-04-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Currently when the call to create_flow_rule_vport_sq fails, the error check is being performed on err rather than on the return pointer flow_rule. The return flow_rule maybe NULL (which is not considered an error) or an error code, so check for the error on flow_rule. Addresses-Coverity: ("Uninitialized scalar variable") Fixes: d5ed8ac34cef ("RDMA/mlx5: Move default representors SQ steering to rule to modify QP") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/ocrdma: Remove use of idr use pci bdf insteadDevesh Sharma2019-04-121-11/+2
| | | | | | | | | | | | | | | | | | | | Removing the use of IDR variable just to name the function ids. Using the PCI_FUNC(pdev->devfn) instead to create the device name, associated resources and to print driver into at various places. Reported-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/mlx5: Remove VF representor profileMark Bloch2019-04-103-86/+16
| | | | | | | | | | | | | | | | | | Now that we have a single IB device with multiple ports we can remove the VF representor profile. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/mlx5: Move to single device multiport ports in switchdev modeMark Bloch2019-04-103-9/+49
| | | | | | | | | | | | | | | | | | | | | | Move from IB device (representor) per virtual function to single IB device with port per virtual function (port 1 represents the uplink). As number of ports is a static property of an IB device, declare the IB device with as many port as the possible according to the PCI bus. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/mlx5: Move SMI caps logicMark Bloch2019-04-101-5/+5
| | | | | | | | | | | | | | | | | | | | We store the SMI information in the core device's struct, make sure we set that information only once (and not per port), while here make the for loop based on the actual size of the array. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/mlx5: Refactor netdev affinity codeMark Bloch2019-04-102-10/+39
| | | | | | | | | | | | | | | | | | | | The design of representors is such that once an IB representor is created, the netdev of representor already exists, we can use that fact to simplify the netdev affinity code. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/mlx5: Move default representors SQ steering to rule to modify QPMark Bloch2019-04-103-30/+48
| | | | | | | | | | | | | | | | | | | | | | Currently the steering for SQs created on representors is done on creation, once we move to representors as ports of an IB device we need the port argument which is given only at the modify QP stage, adjust the code appropriately. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/mlx5: Move rep into port structMark Bloch2019-04-107-20/+26
| | | | | | | | | | | | | | | | | | | | In preparation of moving into a model of single IB device multiple ports move rep to be part of the port structure. We mark a representor device by setting is_rep, no functional change with this patch. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/mlx5: Use correct size for device resourcesMark Bloch2019-04-101-3/+1
| | | | | | | | | | | | | | | | | | | | | | On allocation we use the array size and on destruction num_ports, use the array size of destruction as well, in this context the array corresponds to the native/actual ports on the NIC so no need to adjust this logic for representors. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/mlx5: Move ports allocation to outside of INIT stageMark Bloch2019-04-102-14/+22
| | | | | | | | | | | | | | | | | | In downstream patches we will need access to the ports before doing any stages, in order to set net device per representor. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/mlx5: Free IB device on removeMark Bloch2019-04-102-6/+3
| | | | | | | | | | | | | | | | | | Simplify the code and move the deallocation of the IB device into the remove function. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/mlx5: Move netdev info into the port structMark Bloch2019-04-103-25/+25
| | | | | | | | | | | | | | | | | | Netdev info is stored in a separate array and holds data relevant on a per port basis, move it to be part of the port struct. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | Merge branch 'mlx5-next' into rdma.git for-nextJason Gunthorpe2019-04-104-9/+8
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Required for dependencies on the next series * branch 'mlx5-next': net/mlx5: E-Switch, add a new prio to be used by the RDMA side net/mlx5: E-Switch, don't use hardcoded values for FDB prios net/mlx5: Fix false compilation warning net/mlx5: Expose MPEIN (Management PCIE INfo) register layout net/mlx5: Add rate limit print macros net/mlx5: Add explicit bar address field net/mlx5: Replace dev_err/warn/info by mlx5_core_err/warn/info net/mlx5: Use dev->priv.name instead of dev_name net/mlx5: Make mlx5_core messages independent from mdev->pdev net/mlx5: Break load_one into three stages net/mlx5: Function setup/teardown procedures net/mlx5: Move health and page alloc init to mdev_init net/mlx5: Split mdev init and pci init net/mlx5: Remove redundant init functions parameter net/mlx5: Remove spinlock support from mlx5_write64 net/mlx5: Remove unused MLX5_*_DOORBELL_LOCK macros Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | net/mlx5: Add explicit bar address fieldHuy Nguyen2019-04-023-8/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add bar_addr field to store bar-0 address to avoid calling pci_resource_start with hard-coded bar-0 as parameter. Also note that different mlx5 device types will have bar_addr on different bars. This patch does not change any functionality. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
OpenPOWER on IntegriCloud