summaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/core
Commit message (Collapse)AuthorAgeFilesLines
...
| * | RDMA/restrack: Release task struct which was hold by CM_ID objectLeon Romanovsky2018-10-052-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tracking CM_ID resource is performed in two stages: creation of cm_id and connecting it to the cma_dev. It is needed because rdma-cm protocol exports two separate user-visible calls rdma_create_id and rdma_accept. At the time of CM_ID creation, the real owner of that object is unknown yet and we need to grab task_struct. This task_struct is released or reassigned in attach phase later on. but call to rdma_destroy_id left this task_struct unreleased. Such separation is unique to CM_ID and other restrack objects initialize in one shot. It means that it is safe to use "res->valid" check to catch unfinished CM_ID flow and release task_struct for that object. Fixes: 00313983cda6 ("RDMA/nldev: provide detailed CM_ID information") Reported-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/restrack: Consolidate task name updates in one placeLeon Romanovsky2018-10-054-15/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unify task update and kernel name set in one place. Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/restrack: Un-inline set task implementationLeon Romanovsky2018-10-051-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prepare rdma_restrack_set_task() call to accommodate more code by moving its implementation from *.h to *.c. Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/core: Check error status of rdma_find_ndev_for_src_ip_rcuParav Pandit2018-10-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rdma_find_ndev_for_src_ip_rcu() returns either valid netdev pointer or ERR_PTR(). Instead of checking for NULL, check for error. Fixes: caf1e3ae9fa6 ("RDMA/core Introduce and use rdma_find_ndev_for_src_ip_rcu") Reported-by: syzbot+20c32fa6ff84a2d28c36@syzkaller.appspotmail.com Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/netlink: Simplify netlink listener existence checkLeon Romanovsky2018-10-033-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | All users of rdma_nl_chk_listeners() are interested to get boolean answer if netlink socket has listeners, so update all places to boolean function. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA: Remove unused parameter from ib_modify_qp_is_ok()Kamal Heib2018-10-031-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | The ll parameter is not used in ib_modify_qp_is_ok(), so remove it. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/uverbs: Fix RCU annotation for radix slot deferenceJason Gunthorpe2018-10-031-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | The uapi radix tree is a write-once data structure protected by kref. Once we get to the ioctl() fop it is not possible for anything else to be writing to it, so the access should use rcu_dereference_protected. Reported-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/cma: Introduce and use cma_ib_acquire_dev()Parav Pandit2018-09-301-24/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When RDMA CM connect request arrives for IB transport, it already contains device, port, netdevice (optional). Instead of traversing all the cma devices, use the cma device already found by the cma_find_listener() for which a listener id is provided. iWarp devices doesn't need to derive RoCE GIDs, therefore drop RoCE specific checks from cma_acquire_dev() and rename it to cma_iw_acquire_dev(). Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/cma: Introduce and use cma_acquire_dev_by_src_ip()Parav Pandit2018-09-301-18/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Light weight version of cma_acquire_dev() just for binding with rdma device based on source IP(v4/v6) address. This simplifies cma_acquire_dev() to avoid listen_id specific checks and also for subsequent simplification for IB vs iWarp. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/cma: Allow accepting requests for multi port rdma deviceParav Pandit2018-09-301-4/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When IP failover is used between multiple ports of a given rdma device, allow accepting CM requests from either of the ports. This is applicable for IPv4 and IPv6 non link local addressing scheme. IPv6 link local addresses are bound. IP failover requests for listen cm_ids bound to specific netdev interfaces cannot be supported. (Similar to traditional sockets). Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/core: Acquire and release mmap_sem on page rangeParav Pandit2018-09-271-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently mmap_sem is read locked while pinning the memory. In a multi-threaded application of a process, holding mmap_sem lock creates contention with other threads who might be either registering memory, creating QPs or simply doing mmap() as such operations also require to hold the mmap_sem write lock. All such operation cannot make forward progress until one memory pin operation is completed. It becomes more worse if the memory is unpinned and/or memory registration is large (in GB range). Therefore, instead of holding mmap_sem for too long (for whole region pinning), acquire and release the lock for every few pages. For example on x86 with 4K page size, acquire and release mmap_sem for every 2Mbytes memory chunk. This allows other competing threads to make progress who might wish to hold mmap_sem for shorter duration. When memory registration latency is measured using [1] for memory sizes ranging from 4K to 48GB, <= 1% or 0.5% degradation is noticed. In many runs no difference is seen other than run-to-run variance. In other targeted tests of users with large memory, desired improvements are seen due to reduced contention of mmap_sem. [1] https://github.com/paravmellanox/rtool $ rdma_resource_lat -c 1 -s 48G -a -u L -i 500 -A It registers pinned memory from 4K to 48GB size with 500 iterations for each memory size. $ rdma_resource_lat -c 1 -s 12G -a -u L -i 500 -t 4 4 competing threads pin memory, each of 12GB size with 500 iterations. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | IB/sa: simplify return code logic for ib_nl_send_msg()Alex Estrin2018-09-261-11/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rdma_nl_multicast() returns either negative error code or zero if succeeded. Remove unnecessary ret code checks and reassignments. Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/core: Use dev_name instead of ibdev->nameJason Gunthorpe2018-09-2610-17/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These return the same thing but dev_name is a more conventional use of the kernel API. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
| * | RDMA/core: Use dev_err/dbg/etc instead of pr_* + ibdev->nameJason Gunthorpe2018-09-266-50/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Any messages related to a device should be printed with the dev_* formatters. This provides greater consistency for the user. The core does not set pr_fmt so this has no significant change. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
| * | RDMA: Fully setup the device name in ib_register_deviceJason Gunthorpe2018-09-262-19/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current code has two copies of the device name, ibdev->dev and dev_name(&ibdev->dev), and they are setup at different times, which is very confusing. Set them both up at the same time and make dev_name() the lead name, which is the proper use of the driver core APIs. To make it very clear that the name is not valid until registration pass it in to the ib_register_device() call rather than messing with ibdev->name directly. Also the reorganization now checks that dev_name is unique even if it does not contain a %. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Adit Ranadive <aditr@vmware.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Acked-by: Devesh Sharma <devesh.sharma@broadcom.com> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
| * | RDMA/umem: Fix potential addition overflowDoug Ledford2018-09-251-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Given a large enough memory allocation, it is possible to wrap the pinned_vm counter. Check for addition overflow to prevent such eventualities. Fixes: 40ddacf2dda9 ("RDMA/umem: Don't hold mmap_sem for too long") Reported-by: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Doug Ledford <dledford@redhat.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/umem: Minor optimizationsDoug Ledford2018-09-251-8/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Noticed while reviewing commit d4b4dd1b9706 ("RDMA/umem: Do not use current->tgid to track the mm_struct") patch. Why would we take a lock, adjust a protected variable, drop the lock, and *then* check the input into our protected variable adjustment? Then we have to take the lock again on our error unwind. Let's just check the input early and skip taking the locks needlessly if the input isn't valid. It was also noticed that we set mm = current->mm, we then never modify mm, but we still go back and reference current->mm a number of times needlessly. Be consistent in using the stored reference in mm. Signed-off-by: Doug Ledford <dledford@redhat.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/uverbs: Get rid of ucontext->tgidJason Gunthorpe2018-09-212-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | Nothing uses this now, just delete it. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | RDMA/umem: Avoid synchronize_srcu in the ODP MR destruction pathJason Gunthorpe2018-09-211-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | synchronize_rcu is slow enough that it should be avoided on the syscall path when user space is destroying MRs. After all the rework we can now trivially do this by having call_srcu kfree the per_mm. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | RDMA/umem: Handle a half-complete start/end sequenceJason Gunthorpe2018-09-211-13/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mmu_notifier_unregister() can race between a invalidate_start/end and cause the invalidate_end to be skipped. This causes an imbalance in the locking, which lockdep complains about. This is not actually a bug, as we immediately kfree the memory holding the lock, but it simple enough to fix. Mark when the notifier is being destroyed and abort the start callback. This can be done under the lock we already obtained, and can re-purpose the invalidate_range test we already have. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | RDMA/umem: Get rid of per_mm->notifier_countJason Gunthorpe2018-09-211-95/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is intrinsically racy and the scheme is simply unnecessary. New MR registration can wait for any on going invalidation to fully complete. CPU0 CPU1 if (atomic_read()) if (atomic_dec_and_test() && !list_empty()) { /* not taken */ } list_add() Putting the new UMEM into some kind of purgatory until another invalidate rolls through.. Instead hold the read side of the umem_rwsem across the pair'd start/end and get rid of the racy 'deferred add' approach. Since all umem's in the rbt are always ready to go, also get rid of the mn_counters_active stuff. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | RDMA/umem: Use umem->owning_mm inside ODPJason Gunthorpe2018-09-212-147/+162
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since ODP had a single struct mmu_notifier located in the ucontext it could only handle a single MM at a time, and this prevented it from using the new owning_mm system. With the prior rework it is now simple to let ODP track multiple MMs per ucontext, finish the job so that the per_mm is allocated on a mm by mm basis, and freed when the last umem is dropped from the ucontext. As a side effect the new saner locking removes the lockdep splat about nesting the umem_rwsem between mmu_notifier_unregister and ib_umem_odp_release. It also makes ODP work with multiple processes, across, fork, etc. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | RDMA/umem: Move all the ODP related stuff out of ucontext and into per_mmJason Gunthorpe2018-09-212-63/+73
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the first step to make ODP use the owning_mm that is now part of struct ib_umem. Each ODP umem is linked to a single per_mm structure, which in turn, is linked to a single mm, via the embedded mmu_notifier. This first patch introduces the structure and reworks eveything to use it. This also needs to introduce tgid into the ib_ucontext_per_mm, as get_user_pages_remote() requires the originating task for statistics tracking. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | RDMA/umem: Get rid of struct ib_umem.odp_dataJason Gunthorpe2018-09-212-6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This no longer has any use, we can use container_of to get to the umem_odp, and a simple flag to indicate if this is an odp MR. Remove the few remaining references to it. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | RDMA/umem: Make ib_umem_odp into a sub structure of ib_umemJason Gunthorpe2018-09-212-63/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These two structures are linked together, use the container_of pattern instead of a double allocation to make the code simpler and easier to follow. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | RDMA/umem: Use ib_umem_odp in all function signatures connected to ODPJason Gunthorpe2018-09-212-67/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All of these functions already require the ODP version of the umem struct, make this very clear by having the signature require it. This paves the way to using the container_of() pattern to link umem_odp and umem together. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | RDMA/umem: Do not use current->tgid to track the mm_structJason Gunthorpe2018-09-201-41/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is just wrong, the process that calls into the reg_mr is the process associated with the umem, and that does not have to be the same process that created the context. When this code was first written mmgrab() didn't exist, however these days we can just directly hold the mm_struct pointer in the umem and have no ambiguity when it comes to releasing the umem as to which mm it was associated with. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | RDMA/ucontext: Get rid of the old disassociate flowJason Gunthorpe2018-09-201-41/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The disassociate_ucontext function in every driver is now empty, so we don't need this ugly and wrong code that was messing with tgids. rdma_user_mmap_io does this same work in a better way. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | RDMA/ucontext: Add a core API for mmaping driver IO memoryJason Gunthorpe2018-09-204-1/+230
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To support disassociation and PCI hot unplug, we have to track all the VMAs that refer to the device IO memory. When disassociation occurs the VMAs have to be revised to point to the zero page, not the IO memory, to allow the physical HW to be unplugged. The three drivers supporting this implemented three different versions of this algorithm, all leaving something to be desired. This new common implementation has a few differences from the driver versions: - Track all VMAs, including splitting/truncating/etc. Tie the lifetime of the private data allocation to the lifetime of the vma. This avoids any tricks with setting vm_ops which Linus didn't like. (see link) - Support multiple mms, and support properly tracking mmaps triggered by processes other than the one first opening the uverbs fd. This makes fork behavior of disassociation enabled drivers the same as fork support in normal drivers. - Don't use crazy get_task stuff. - Simplify the approach for to racing between vm_ops close and disassociation, fixing the related bugs most of the driver implementations had. Since we are in core code the tracking list can be placed in struct ib_uverbs_ufile, which has a lifetime strictly longer than any VMAs created by mmap on the uverbs FD. Link: https://www.spinics.net/lists/stable/msg248747.html Link: https://lkml.kernel.org/r/CA+55aFxJTV_g46AQPoPXen-UPiqR1HGMZictt7VpC-SMFbm3Cw@mail.gmail.com Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | RDMA/uverbs: Fix error unwind in ib_uverbs_add_oneJason Gunthorpe2018-09-191-13/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The error path has several mistakes - cdev_del should not be called if cdev_device_add fails - We must call put_device on all the goto exit paths as that is what frees the uapi, SRCU and the struct itself. While we are here consolidate all the uvdev_dev init that cannot fail at the top. Fixes: c5c4d92e70f3 ("RDMA/uverbs: Use cdev_device_add() instead of cdev_add()") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com>
| * | RDMA/core: Properly return the error code of rdma_set_src_addr_rcuYueHaibing2018-09-191-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rdma_set_src_addr_rcu should check copy_src_l2_addr fails, rather than always return 0. Also copy_src_l2_addr should return 'ret' as its return value when rdma_translate_ip fails. Fixes: c31d4b2ddf07 ("RDMA/core: Protect against changing dst->dev during destination resolve") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/uverbs: Remove is_closed from ib_uverbs_fileJason Gunthorpe2018-09-192-7/+2
| | | | | | | | | | | | | | | | | | | | | | | | This does nothing but indicate if the uverbs_file is in the device's list, use list_del_init instead. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
| * | RDMA/core: Consider net ns of gid attribute for RoCEParav Pandit2018-09-124-16/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When resolving destination address or route, when net namespace is unavailable, refer to the net namespace of the netdevice of the SGID attribute. This is typically the case for requests arriving from the network for RoCE ports. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/core: Introduce rdma_read_gid_attr_ndev_rcu() to check GID attributeParav Pandit2018-09-122-0/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce an API rdma_read_gid_attr_ndev_rcu() to return GID attribute netdevice which is in UP state for accessing netdevice's fields such as net namespace and ifindex. This is useful for users who intent to access netdevice fields under rcu lock. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/core: Simplify roce_resolve_route_from_path()Parav Pandit2018-09-123-53/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently RoCE route resolve functionality is split between two functions. (a) roce_resolve_route_from_path() and its helper function rdma_resolve_ip_route(). Due to this multiple sockaddr src structures are created in both functions with rdma_dev_addr is an interface between the two for checks. Since there is only one user of rdma_resolve_ip_route() as RoCE, combine the functionality of both functions to roce_resolve_route_from_path() and further reduce the scope of rdma_dev_addr to core/addr.c This also allow to extend addr_resolve() in subsequent patch to consider netdev properties of GID in safer way under rcu lock. Additionally src and dst addresses were always provided, so skip the src addr NULL pointer check as they are present on the stack now. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/core: Protect against changing dst->dev during destination resolveParav Pandit2018-09-121-15/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During resolving address process, during route lookup and while performing src address translation in case of loopback mode, hold the rcu lock so that if netdevice is moving to different net namespace, or being unregistered, it can be synchronized with net/core/dev.c, ie change_net_namespace() ->dev_close_many() ->rt6_uncached_list_flush_dev() who would change dst->dev to loopback device of the given net namespace. Therefore, hold the rcu lock and sync with synchronize_net() of change_net_namespace() to ensure that netdevice cannot get freed while dst->dev is being used. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/core: Refer to network type instead of device typeParav Pandit2018-09-121-19/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Set and refer to rdma_dev_addr network type instead of dst->ndev to reduce dependency on accessing dst netdevice. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/core: Use common code flow for IPv4/6 for addr resolveParav Pandit2018-09-121-17/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use common code flow for resolving neighbour and for finding source addresses. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/core: Rename rdma_copy_addr to rdma_copy_src_l2_addrParav Pandit2018-09-123-11/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that rdma_copy_addr() only copies the source addresses and all callers are interested in copying only source addresses, simplify it to drop the destination address argument. Given that it only copies source layer2 addresses, rename it to rdma_copy_src_l2_addr for better code readability. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/core: Introduce and use rdma_set_src_addr() between IPv4 and IPv6Parav Pandit2018-09-121-42/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rdma_translate_ip() is done while resolving address for the loopback addresses. The current flow is convoluted with resolve neighbor being optional. This patch simplifies the code in following ways. (a) Use common code between IPv4 and IPv6 for address translation, loopback checks and acquiring netdevice. (b) During neigh resolve in addr_resolve_neigh(), only copy destination address. (c) Always resolve the source address before the destination address, because it doesn't depend on resolving neigh being requested or not. This helps to reduce 3 calls of rdma_copy_addr and rdma_translate_ip to one and makes it easier to follow the code flow. Now that ib_nl_fetch_ha() doesn't depend on dst, drop dst argument from ib_nl_fetch_ha(). Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/core: Let protocol specific function typecast sockaddr structureParav Pandit2018-09-121-16/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current code typecasts destination address using extra variable but uses source address as is. Even though the compiler optimizes such code well, just let each protocol specific function typecast for src and dest both and have symmetric code. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/core: Avoid unnecessary sa_family overwriteParav Pandit2018-09-121-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | addr4_resolve() and addr6_resolve() are called by checking the value of sa_family. Both above functions overwrite the value after typecasting, this is not necessary. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/core Introduce and use rdma_find_ndev_for_src_ip_rcuParav Pandit2018-09-121-26/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes two issues: 1. When address family is other than IPv4 or v6, rdma_translate_ip() returns success which is incorrect. 2. When address familty is AF_INET6, and if the source address is not found, it returns success, which is also incorrect. Therefore, introduce and use rdma_find_ndev_for_src_ip_rcu() helper function which returns correct success or error status and is also useful for future code refactor in addr_resolve(). Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/mlx5: Add flow actions support to raw create flowMark Bloch2018-09-111-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support attaching flow actions to a flow rule via raw create flow. For now only NIC RX path is supported. This change requires to export flow resources management functions so we can maintain proper bookkeeping of flow actions. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/uverbs: Move flow resources initializationMark Bloch2018-09-112-23/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | Use ib_set_flow() when initializing flow related resources. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | IB/uverbs: Add IDRs array attribute type to ioctl() interfaceGuy Levi2018-09-112-0/+126
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Methods sometimes need to get a flexible set of IDRs and not a strict set as can be achieved today by the conventional IDR attribute. Add a new IDRS_ARRAY attribute to the generic uverbs ioctl layer. IDRS_ARRAY points to array of idrs of the same object type and same access rights, only write and read are supported. Signed-off-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>`` Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/core: Assign device ifindex before publishing the deviceParav Pandit2018-09-061-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Even though device->ifindex is assigned before adding the device in the list which is read by netlink flow, it is better to assign rdma device index before publishing the device in the system to users and clients. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/core: Follow correct unregister order between sysfs and cgroupParav Pandit2018-09-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During register_device() init sequence is, (a) register with rdma cgroup followed by (b) register with sysfs Therefore, unregister_device() sequence should follow the reverse order. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/umem: Restore lockdep check while downgrading lockLeon Romanovsky2018-09-061-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lockdep engine handles correctly downgrade of locks and it simply incorrect to disable lockdep checks prior to calling mmu_notifier. Remove lockdep_off and ensure locks correctness. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | RDMA/core: Define client_data_lock as rwlock instead of spinlockParav Pandit2018-09-061-15/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Even though device registration/unregistration and client registration/unregistration is not a performance path, define the client_data_lock as rwlock for code clarity. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
OpenPOWER on IntegriCloud