summaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/hw
Commit message (Collapse)AuthorAgeFilesLines
* IB/hfi1: Spelling s/statisfied/satisfied/Geert Uytterhoeven2019-06-181-1/+1
| | | | | Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Doug Ledford <dledford@redhat.com>
* RDMA: Report available cdevs through RDMA_NLDEV_CMD_GET_CHARDEVJason Gunthorpe2019-06-183-0/+3
| | | | | | | | | | | | | | Update the struct ib_client for all modules exporting cdevs related to the ibdevice to also implement RDMA_NLDEV_CMD_GET_CHARDEV. All cdevs are now autoloadable and discoverable by userspace over netlink instead of relying on sysfs. uverbs also exposes the DRIVER_ID for drivers that are able to support driver id binding in rdma-core. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* rdma: Remove nesJason Gunthorpe2019-06-1316-19674/+0
| | | | | | | | | | | | | | This driver was first merged over 10 years ago and has not seen major activity by the authors in the last 7 years. However, in that time it has been patched 150 times to adapt it to changing kernel APIs. Further, the hardware has several issues, like not supporting 64 bit DMA, that make it rather uninteresting for use with modern systems and RDMA. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* RDMA: Convert CQ allocations to be under core responsibilityLeon Romanovsky2019-06-1136-341/+223
| | | | | | | | | | Ensure that CQ is allocated and freed by IB/core and not by drivers. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Tested-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* RDMA: Clean destroy CQ in drivers do not return errorsLeon Romanovsky2019-06-1130-151/+74
| | | | | | | | | | | | | Like all other destroy commands, .destroy_cq() call is not supposed to fail. In all flows, the attempt to return earlier caused to memory leaks. This patch converts .destroy_cq() to do not return any errors. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Gal Pressman <galpress@amazon.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* RDMA/nes: Avoid memory allocation during CQ destroyLeon Romanovsky2019-06-111-17/+11
| | | | | | | | | | The memory allocation call can fail and cause to early return from nes_desotroy_cq() function. This situation will cause to memory leak of struct nes_cq. Rewrite function to avoid memory allocation. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* RDMA: Move owner into struct ib_device_opsJason Gunthorpe2019-06-1016-17/+16
| | | | | | | | | This more closely follows how other subsytems work, with owner being a member of the structure containing the function pointers. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA: Move uverbs_abi_ver into struct ib_device_opsJason Gunthorpe2019-06-1015-21/+26
| | | | | | | | | No reason for every driver to emit code to set this, just make it part of the driver's existing static const ops structure. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA: Move driver_id into struct ib_device_opsJason Gunthorpe2019-06-1016-16/+34
| | | | | | | | | No reason for every driver to emit code to set this, just make it part of the driver's existing static const ops structure. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/hns: Bugfix for filling the sge of srqLijun Ou2019-06-071-3/+3
| | | | | | | | | | | | | | When user post recv a srq with multiple sges, the hardware will get the last correct sge and count the sge numbers according to the specific identifier with lkey. For example, when the driver fills the sges with every wr less than the max sge that the user configured when creating srq, the hardware will stop getting the sge according to the specific lkey in the sge. However, it will always end with the first sge in the current post srq recv interface implementation. Fixes: c7bcb13442e1 ("RDMA/hns: Add SRQ support for hip08 kernel mode") Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/hns: fix inverted logic of readl read and shiftColin Ian King2019-06-071-1/+1
| | | | | | | | | | | | A previous change incorrectly changed the inverted logic and logically negated the readl rather than the shifted readl result. Fix this by adding in missing parentheses around the expression that needs to be logically negated. Addresses-Coverity: ("Logically dead code") Fixes: 669cefb654cb ("RDMA/hns: Remove jiffies operation in disable interrupt context") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/hns: Bugfix for posting multiple srq work requestLijun Ou2019-05-313-29/+22
| | | | | | | | | | | | | | When the user submits more than 32 work request to a srq queue at a time, it needs to find the corresponding number of entries in the bitmap in the idx queue. However, the original lookup function named ffs only processes 32 bits of the array element, When the number of srq wqe issued exceeds 32, the ffs will only process the lower 32 bits of the elements, it will not be able to get the correct wqe index for srq wqe. Signed-off-by: Xi Wang <wangxi11@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/hfi1: Use struct_size() helperGustavo A. R. Silva2019-05-301-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes, in particular in the context in which this code is being used. So, replace the following form: sizeof(struct opa_port_status_rsp) + num_vls * sizeof(struct _vls_pctrs) with: struct_size(rsp, vls, num_vls) and so on... Also, notice that variable size is unnecessary, hence it is removed. This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/qib: Use struct_size() helperGustavo A. R. Silva2019-05-301-2/+3
| | | | | | | | | | | | | | | | | | | | | | Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes, in particular in the context in which this code is being used. So, replace the following form: sizeof(*pkt) + sizeof(pkt->addr[0])*n with: struct_size(pkt, addr, n) Also, notice that variable size is unnecessary, hence it is removed. This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/efa: Remove unused includesGal Pressman2019-05-292-3/+0
| | | | | | | | | Remove leftover includes that are no longer used from the driver. Reviewed-by: Firas JahJah <firasj@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/efa: Use rdma block iterator in chunk list creationGal Pressman2019-05-291-14/+11
| | | | | | | | | | | | When creating the chunks list the rdma_for_each_block() iterator is used in order to iterate over the payload in EFA_CHUNK_PAYLOAD_SIZE (device defined) strides. Reviewed-by: Firas JahJah <firasj@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/efa: Remove unneeded admin commands abort flowGal Pressman2019-05-292-74/+1
| | | | | | | | | | | | The admin commands abort flow is buggy (use-after-free) and not really necessary as it is guaranteed that after ib_unregister_device() is called there are no user verbs threads running in parallel, delete it. Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Reviewed-by: Firas JahJah <firasj@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/efa: Use kvzalloc instead of kzalloc with fallbackGal Pressman2019-05-291-25/+21
| | | | | | | | | | | | | | | | Use kvzalloc which attempts to allocate a physically continuous buffer and fallbacks to virtually continuous on failure instead of open coding it in the driver. The is_vmalloc_addr function is used to determine whether the buffer is physically continuous or not (which determines direct vs indirect MR registration mode). Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Reviewed-by: Firas JahJah <firasj@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/hfi1: Remove extra brackets from an ifDennis Dalessandro2019-05-291-2/+1
| | | | | | | | A recent patch to hfi1 left behind a checkpatch error. Fixes: fb24ea52f78e ("drivers: Remove explicit invocations of mmiowb()") Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA: Convert put_page() to put_user_page*()John Hubbard2019-05-275-23/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For infiniband code that retains pages via get_user_pages*(), release those pages via the new put_user_page(), or put_user_pages*(), instead of put_page() This is a tiny part of the second step of fixing the problem described in [1]. The steps are: 1) Provide put_user_page*() routines, intended to be used for releasing pages that were pinned via get_user_pages*(). 2) Convert all of the call sites for get_user_pages*(), to invoke put_user_page*(), instead of put_page(). This involves dozens of call sites, and will take some time. 3) After (2) is complete, use get_user_pages*() and put_user_page*() to implement tracking of these pages. This tracking will be separate from the existing struct page refcounting. 4) Use the tracking and identification of these pages, to implement special handling (especially in writeback paths) when the pages are backed by a filesystem. Again, [1] provides details as to why that is desirable. [1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()" Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jérôme Glisse <jglisse@redhat.com> Acked-by: Jason Gunthorpe <jgg@mellanox.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/hfi1: Remove set but not used variables 'offset' and 'fspsn'YueHaibing2019-05-271-4/+1
| | | | | | | | | | | | | | | | | | Fixes gcc '-Wunused-but-set-variable' warning: drivers/infiniband/hw/hfi1/tid_rdma.c: In function tid_rdma_rcv_error: drivers/infiniband/hw/hfi1/tid_rdma.c:2029:7: warning: variable offset set but not used [-Wunused-but-set-variable] drivers/infiniband/hw/hfi1/tid_rdma.c: In function hfi1_rc_rcv_tid_rdma_ack: drivers/infiniband/hw/hfi1/tid_rdma.c:4555:35: warning: variable fspsn set but not used [-Wunused-but-set-variable] 'offset' is never used since introduction in commit d0d564a1caac ("IB/hfi1: Add functions to receive TID RDMA READ request") 'fspsn' is never used since introduciotn in commit 9e93e967f7b4 ("IB/hfi1: Add a function to receive TID RDMA ACK packet") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/hns: Replace magic numbers with #definesLijun Ou2019-05-278-69/+115
| | | | | | | | This patch makes the code more readable by removing magic numbers. Signed-off-by: Xi Wang <wangxi11@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/hns: Remove jiffies operation in disable interrupt contextLang Cheng2019-05-272-19/+21
| | | | | | | | | | In some functions, the jiffies operation is unnecessary, and we can control delay using mdelay and udelay functions only. Especially, in hns_roce_v1_clear_hem, the function calls spin_lock_irqsave, the context disables interrupt, so we can not use jiffies and msleep functions. Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/hns: Move spin_lock_irqsave to the correct placeLang Cheng2019-05-272-10/+5
| | | | | | | | | | | | | | | When hip08 set gid, it will call spin_unlock_bh when send cmq. if main.ko call spin_lock_irqsave firstly, and the kernel is before commit f71b74bca637 ("irq/softirqs: Use lockdep to assert IRQs are disabled/enabled"), it will cause WARN_ON_ONCE because of calling spin_unlock_bh in disable context. In fact, the spin_lock_irqsave in main.ko is only used for hip06, and should be placed in hns_roce_hw_v1.c. hns_roce_hw_v2.c uses its own spin_unlock_bh and does not need main.ko manage spin_lock. Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/hns: Update CQE specificationsLijun Ou2019-05-271-1/+1
| | | | | | | | According to hip08 UM, the maximum number of CQEs supported by each CQ is 4M. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/hns: Remove unnecessary print message in aeqYixian Liu2019-05-271-1/+0
| | | | | | | | There is no need to print when communication is established, especially while lots of qp used by application. Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* iw_cxgb4: Fix qpid leakNirranjan Kirubaharan2019-05-272-33/+19
| | | | | | | | | | | Add await in destroy_qp() so that all references to qp are dereferenced and qp is freed in destroy_qp() itself. This ensures freeing of all QPs before invocation of dealloc_ucontext(), which prevents loss of in use qpids stored in the ucontext. Signed-off-by: Nirranjan Kirubaharan <nirranjan@chelsio.com> Reviewed-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/cxgb4: Don't expose DMA addressesLeon Romanovsky2019-05-271-3/+3
| | | | | | | | Change unconditional print of DMA address to be printed with special printk format type specifier. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/cxgb4: Use sizeof() notationLeon Romanovsky2019-05-277-53/+53
| | | | | | | | Convert various sizeof call sites to be written in standard format sizeof(). Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/cxgb3: Delete and properly mark unimplemented resize CQ functionLeon Romanovsky2019-05-273-83/+0
| | | | | | | | | | | Resize CQ implementation was guarded by undeclared "notyet" define while cxgb3 was added to the kernel. Twelve years later, this call is still unimplemented, so safely delete it and fix improper return error code when .resize_cq() is not implemented. Fixes: b038ced7b370 ("RDMA/cxgb3: Add driver for Chelsio T3 RNIC") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/cxgb3: Don't expose DMA addressesLeon Romanovsky2019-05-272-9/+10
| | | | | | | | | DMA addresses like all other kernel addresses should be printed with special %p* formatter. It is needed to allow control of exposure of such information through a dedicated knob. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/cxgb3: Use sizeof() notation instead of plain sizeofLeon Romanovsky2019-05-273-16/+14
| | | | | | | | | | | | sizeof(a), sizeof a and sizeof (a) are all valid notations, but first is more readable format recommended by checkpatch.pl. Let's canonize it in cxgb3 drivers, so latter patches won't emit checkpatch warnings. As part of this change, a redundant memset() was removed. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/efa: Remove check that prevents destroy of resources in error flowsLeon Romanovsky2019-05-211-24/+0
| | | | | | | | | | | | | | | | Drivers cannot check the udata for validity when doing destroy as there will be no way to report this error back to the uverbs. Since udata is new for destroy no driver should start to use it - instead drivers should opt for the ioctl interface and define it in a way where it cannot fail due to incorrect data. Remove the checks on udata construction so EFA is consistent with everything else. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/nes: Remove second wait queue initialization callLeon Romanovsky2019-05-211-1/+0
| | | | | | | | The same wait queue is initialized a couple of lines above. Fixes: 3c2d774cad5b ("RDMA/nes: Add a driver for NetEffect RNICs") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/i40iw: Remove useless NULL checksLeon Romanovsky2019-05-211-8/+0
| | | | | | | There is no need to check existence of structures to be destroyed. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/nes: Remove useless NULL checksLeon Romanovsky2019-05-211-6/+0
| | | | | | | | The destroy functions are always called with relevant structs, there is no need to check their existence. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/mlx4: Delete unused func argYuval Shaia2019-05-211-5/+3
| | | | | | | | The function argument virt_addr is not in use - delete it. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/umem: Move page_shift from ib_umem to ib_odp_umemJason Gunthorpe2019-05-218-42/+38
| | | | | | | | | | | This value has always been set to PAGE_SHIFT in the core code, the only thing that does differently was the ODP path. Move the value into the ODP struct and still use it for ODP, but change all the non-ODP things to just use PAGE_SHIFT/PAGE_SIZE/PAGE_MASK directly. Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
* RDMA/qedr: Fix incorrect device rate.Sagiv Ozeri2019-05-211-16/+9
| | | | | | | | | | | | | Use the correct enum value introduced in commit 12113a35ada6 ("IB/core: Add HDR speed enum") Prior to this change a 50Gbps port would show 40Gbps. This patch also cleaned up the redundant redefiniton of ib speeds for qedr. Fixes: 12113a35ada6 ("IB/core: Add HDR speed enum") Signed-off-by: Sagiv Ozeri <sagiv.ozeri@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds2019-05-143-9/+14
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull more rdma updates from Jason Gunthorpe: "This is being sent to get a fix for the gcc 9.1 build warnings, and I've also pulled in some bug fix patches that were posted in the last two weeks. - Avoid the gcc 9.1 warning about overflowing a union member - Fix the wrong callback type for a single response netlink to doit - Bug fixes from more usage of the mlx5 devx interface" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: net/mlx5: Set completion EQs as shared resources IB/mlx5: Verify DEVX general object type correctly RDMA/core: Change system parameters callback from dumpit to doit RDMA: Directly cast the sockaddr union to sockaddr
| * IB/mlx5: Verify DEVX general object type correctlyYishai Hadas2019-05-141-3/+10
| | | | | | | | | | | | | | | | | | | | As the obj_id in the firmware is not globally unique in general_object, the object type must be considered upon checking for a valid object id. Fixes: 2351776e87a1 ("IB/mlx5: Verify DEVX object type") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * RDMA: Directly cast the sockaddr union to sockaddrJason Gunthorpe2019-05-132-6/+4
| | | | | | | | | | | | | | | | | | | | | | gcc 9 now does allocation size tracking and thinks that passing the member of a union and then accessing beyond that member's bounds is an overflow. Instead of using the union member, use the entire union with a cast to get to the sockaddr. gcc will now know that the memory extends the full size of the union. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | IB/mthca: use the new FOLL_LONGTERM flag to get_user_pages_fast()Ira Weiny2019-05-141-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the new FOLL_LONGTERM to get_user_pages_fast() to protect against FS DAX pages being mapped. Link: http://lkml.kernel.org/r/20190328084422.29911-8-ira.weiny@intel.com Link: http://lkml.kernel.org/r/20190317183438.2057-8-ira.weiny@intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Hogan <jhogan@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | IB/qib: use the new FOLL_LONGTERM flag to get_user_pages_fast()Ira Weiny2019-05-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the new FOLL_LONGTERM to get_user_pages_fast() to protect against FS DAX pages being mapped. Link: http://lkml.kernel.org/r/20190328084422.29911-7-ira.weiny@intel.com Link: http://lkml.kernel.org/r/20190317183438.2057-7-ira.weiny@intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Hogan <jhogan@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | IB/hfi1: use the new FOLL_LONGTERM flag to get_user_pages_fast()Ira Weiny2019-05-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the new FOLL_LONGTERM to get_user_pages_fast() to protect against FS DAX pages being mapped. [ira.weiny@intel.com: v3] Link: http://lkml.kernel.org/r/20190328084422.29911-6-ira.weiny@intel.com Link: http://lkml.kernel.org/r/20190328084422.29911-6-ira.weiny@intel.com Link: http://lkml.kernel.org/r/20190317183438.2057-6-ira.weiny@intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Hogan <jhogan@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | mm/gup: change GUP fast to use flags rather than a write 'bool'Ira Weiny2019-05-141-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To facilitate additional options to get_user_pages_fast() change the singular write parameter to be gup_flags. This patch does not change any functionality. New functionality will follow in subsequent patches. Some of the get_user_pages_fast() call sites were unchanged because they already passed FOLL_WRITE or 0 for the write parameter. NOTE: It was suggested to change the ordering of the get_user_pages_fast() arguments to ensure that callers were converted. This breaks the current GUP call site convention of having the returned pages be the final parameter. So the suggestion was rejected. Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Mike Marshall <hubcap@omnibond.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Hogan <jhogan@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | mm/gup: replace get_user_pages_longterm() with FOLL_LONGTERMIra Weiny2019-05-142-8/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pach series "Add FOLL_LONGTERM to GUP fast and use it". HFI1, qib, and mthca, use get_user_pages_fast() due to its performance advantages. These pages can be held for a significant time. But get_user_pages_fast() does not protect against mapping FS DAX pages. Introduce FOLL_LONGTERM and use this flag in get_user_pages_fast() which retains the performance while also adding the FS DAX checks. XDP has also shown interest in using this functionality.[1] In addition we change get_user_pages() to use the new FOLL_LONGTERM flag and remove the specialized get_user_pages_longterm call. [1] https://lkml.org/lkml/2019/3/19/939 "longterm" is a relative thing and at this point is probably a misnomer. This is really flagging a pin which is going to be given to hardware and can't move. I've thought of a couple of alternative names but I think we have to settle on if we are going to use FL_LAYOUT or something else to solve the "longterm" problem. Then I think we can change the flag to a better name. Secondly, it depends on how often you are registering memory. I have spoken with some RDMA users who consider MR in the performance path... For the overall application performance. I don't have the numbers as the tests for HFI1 were done a long time ago. But there was a significant advantage. Some of which is probably due to the fact that you don't have to hold mmap_sem. Finally, architecturally I think it would be good for everyone to use *_fast. There are patches submitted to the RDMA list which would allow the use of *_fast (they reworking the use of mmap_sem) and as soon as they are accepted I'll submit a patch to convert the RDMA core as well. Also to this point others are looking to use *_fast. As an aside, Jasons pointed out in my previous submission that *_fast and *_unlocked look very much the same. I agree and I think further cleanup will be coming. But I'm focused on getting the final solution for DAX at the moment. This patch (of 7): This patch starts a series which aims to support FOLL_LONGTERM in get_user_pages_fast(). Some callers who would like to do a longterm (user controlled pin) of pages with the fast variant of GUP for performance purposes. Rather than have a separate get_user_pages_longterm() call, introduce FOLL_LONGTERM and change the longterm callers to use it. This patch does not change any functionality. In the short term "longterm" or user controlled pins are unsafe for Filesystems and FS DAX in particular has been blocked. However, callers of get_user_pages_fast() were not "protected". FOLL_LONGTERM can _only_ be supported with get_user_pages[_fast]() as it requires vmas to determine if DAX is in use. NOTE: In merging with the CMA changes we opt to change the get_user_pages() call in check_and_migrate_cma_pages() to a call of __get_user_pages_locked() on the newly migrated pages. This makes the code read better in that we are calling __get_user_pages_locked() on the pages before and after a potential migration. As a side affect some of the interfaces are cleaned up but this is not the primary purpose of the series. In review[1] it was asked: <quote> > This I don't get - if you do lock down long term mappings performance > of the actual get_user_pages call shouldn't matter to start with. > > What do I miss? A couple of points. First "longterm" is a relative thing and at this point is probably a misnomer. This is really flagging a pin which is going to be given to hardware and can't move. I've thought of a couple of alternative names but I think we have to settle on if we are going to use FL_LAYOUT or something else to solve the "longterm" problem. Then I think we can change the flag to a better name. Second, It depends on how often you are registering memory. I have spoken with some RDMA users who consider MR in the performance path... For the overall application performance. I don't have the numbers as the tests for HFI1 were done a long time ago. But there was a significant advantage. Some of which is probably due to the fact that you don't have to hold mmap_sem. Finally, architecturally I think it would be good for everyone to use *_fast. There are patches submitted to the RDMA list which would allow the use of *_fast (they reworking the use of mmap_sem) and as soon as they are accepted I'll submit a patch to convert the RDMA core as well. Also to this point others are looking to use *_fast. As an asside, Jasons pointed out in my previous submission that *_fast and *_unlocked look very much the same. I agree and I think further cleanup will be coming. But I'm focused on getting the final solution for DAX at the moment. </quote> [1] https://lore.kernel.org/lkml/20190220180255.GA12020@iweiny-DESK2.sc.intel.com/T/#md6abad2569f3bf6c1f03686c8097ab6563e94965 [ira.weiny@intel.com: v3] Link: http://lkml.kernel.org/r/20190328084422.29911-2-ira.weiny@intel.com Link: http://lkml.kernel.org/r/20190328084422.29911-2-ira.weiny@intel.com Link: http://lkml.kernel.org/r/20190317183438.2057-2-ira.weiny@intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Rich Felker <dalias@libc.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: James Hogan <jhogan@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds2019-05-09148-3283/+9099
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull rdma updates from Jason Gunthorpe: "This has been a smaller cycle than normal. One new driver was accepted, which is unusual, and at least one more driver remains in review on the list. Summary: - Driver fixes for hns, hfi1, nes, rxe, i40iw, mlx5, cxgb4, vmw_pvrdma - Many patches from MatthewW converting radix tree and IDR users to use xarray - Introduction of tracepoints to the MAD layer - Build large SGLs at the start for DMA mapping and get the driver to split them - Generally clean SGL handling code throughout the subsystem - Support for restricting RDMA devices to net namespaces for containers - Progress to remove object allocation boilerplate code from drivers - Change in how the mlx5 driver shows representor ports linked to VFs - mlx5 uapi feature to access the on chip SW ICM memory - Add a new driver for 'EFA'. This is HW that supports user space packet processing through QPs in Amazon's cloud" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (186 commits) RDMA/ipoib: Allow user space differentiate between valid dev_port IB/core, ipoib: Do not overreact to SM LID change event RDMA/device: Don't fire uevent before device is fully initialized lib/scatterlist: Remove leftover from sg_page_iter comment RDMA/efa: Add driver to Kconfig/Makefile RDMA/efa: Add the efa module RDMA/efa: Add EFA verbs implementation RDMA/efa: Add common command handlers RDMA/efa: Implement functions that submit and complete admin commands RDMA/efa: Add the ABI definitions RDMA/efa: Add the com service API definitions RDMA/efa: Add the efa_com.h file RDMA/efa: Add the efa.h header file RDMA/efa: Add EFA device definitions RDMA: Add EFA related definitions RDMA/umem: Remove hugetlb flag RDMA/bnxt_re: Use core helpers to get aligned DMA address RDMA/i40iw: Use core helpers to get aligned DMA address within a supported page size RDMA/verbs: Add a DMA iterator to return aligned contiguous memory blocks RDMA/umem: Add API to find best driver supported page size in an MR ...
| * RDMA/efa: Add driver to Kconfig/MakefileGal Pressman2019-05-073-0/+25
| | | | | | | | | | | | | | | | Add EFA Makefile and Kconfig. Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * RDMA/efa: Add the efa moduleGal Pressman2019-05-071-0/+533
| | | | | | | | | | | | | | | | | | | | Add the main EFA module file which takes care of device probe/initialization/registration/etc. Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
OpenPOWER on IntegriCloud