summaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/core
Commit message (Collapse)AuthorAgeFilesLines
...
* | RDMA/devices: Use xarray to store the client_dataJason Gunthorpe2019-02-081-178/+170
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we have a small ID for each client we can use xarray instead of linearly searching linked lists for client data. This will give much faster and scalable client data lookup, and will lets us revise the locking scheme. Since xarray can store 'going_down' using a mark just entirely eliminate the struct ib_client_data and directly store the client_data value in the xarray. However this does require a special iterator as we must still iterate over any NULL client_data values. Also eliminate the client_data_lock in favour of internal xarray locking. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/devices: Use xarray to store the clientsJason Gunthorpe2019-02-081-5/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | This gives each client a unique ID and will let us move client_data to use xarray, and revise the locking scheme. clients have to be add/removed in strict FIFO/LIFO order as they interdepend. To support this the client_ids are assigned to increase in FIFO order. The existing linked list is kept to support reverse iteration until xarray can get a reverse iteration API. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com>
* | RDMA/device: Use an ida instead of a free page in alloc_nameJason Gunthorpe2019-02-081-11/+17
| | | | | | | | | | | | | | | | ida is the proper data structure to hold list of clustered small integers and then allocate an unused integer. Get rid of the convoluted and limited open-coded bitmap. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/device: Get rid of reg_stateJason Gunthorpe2019-02-081-6/+2
| | | | | | | | | | | | | | | | This really has no purpose anymore, refcount can be used to tell if the device is still registered. Keeping it around just invites mis-use. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com>
* | RDMA/device: Call ib_cache_release_one() only from ib_device_release()Jason Gunthorpe2019-02-082-29/+15
| | | | | | | | | | | | | | | | | | | | | | Instead of complicated logic about when this memory is freed, always free it during device release(). All the cache pointers start out as NULL, so it is safe to call this before the cache is initialized. This makes for a simpler error unwind flow, and a simpler understanding of the lifetime of the memory allocations inside the struct ib_device. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/device: Ensure that security memory is always freedJason Gunthorpe2019-02-083-12/+6
| | | | | | | | | | | | | | | | Since this only frees memory it should be done during the release callback. Otherwise there are possible error flows where it might not get called if registration aborts. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/device: Check that the rename is nop under the lockJason Gunthorpe2019-02-081-4/+6
| | | | | | | | | | | | | | | | | | | | Since another rename could be running in parallel it is safer to check that the name is not changing inside the lock, where we already know the device name will not change. Fixes: d21943dd19b5 ("RDMA/core: Implement IB device rename function") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com>
* | RDMA: Handle PD allocations by IB/coreLeon Romanovsky2019-02-084-17/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | The PD allocations in IB/core allows us to simplify drivers and their error flows in their .alloc_pd() paths. The changes in .alloc_pd() go hand in had with relevant update in .dealloc_pd(). We will use this opportunity and convert .dealloc_pd() to don't fail, as it was suggested a long time ago, failures are not happening as we have never seen a WARN_ON print. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/core: Share driver structure size with coreLeon Romanovsky2019-02-081-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | Add new macros to be used in drivers while registering ops structure and IB/core while calling allocation routines, so drivers won't need to perform kzalloc/kfree in their paths. The change in allocation stage allows us to initialize common fields prior to calling to drivers (e.g. restrack). Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | IB/core: Don't register each MAD agent for LSM notifierDaniel Jurgens2019-02-083-22/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When creating many MAD agents in a short period of time, receive packet processing can be delayed long enough to cause timeouts while new agents are being added to the atomic notifier chain with IRQs disabled. Notifier chain registration and unregstration is an O(n) operation. With large numbers of MAD agents being created and destroyed simultaneously the CPUs spend too much time with interrupts disabled. Instead of each MAD agent registering for it's own LSM notification, maintain a list of agents internally and register once, this registration already existed for handling the PKeys. This list is write mostly, so a normal spin lock is used vs a read/write lock. All MAD agents must be checked, so a single list is used instead of breaking them down per device. Notifier calls are done under rcu_read_lock, so there isn't a risk of similar packet timeouts while checking the MAD agents security settings when notified. Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | IB/core: Fix potential memory leak while creating MAD agentsDaniel Jurgens2019-02-081-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | If the MAD agents isn't allowed to manage the subnet, or fails to register for the LSM notifier, the security context is leaked. Free the context in these cases. Fixes: 47a2b338fe63 ("IB/core: Enforce security on management datagrams") Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reported-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | IB/core: Unregister notifier before freeing MAD securityDaniel Jurgens2019-02-081-1/+2
| | | | | | | | | | | | | | | | | | | | | | If the notifier runs after the security context is freed an access of freed memory can occur. Fixes: 47a2b338fe63 ("IB/core: Enforce security on management datagrams") Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/iwcm: add tos_set bool to iw_cm structSteve Wise2019-02-081-0/+2
| | | | | | | | | | | | | | This allows drivers to know the tos was actively set by the application. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/cma: listening device cm_ids should inherit tosSteve Wise2019-02-081-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a user binds to INADDR_ANY and sets the service id, then the device-specific cm_ids should also use this tos. This allows an app to do: rdma_bind_addr(INADDR_ANY) set_service_type() rdma_listen() And connections setup via this listening endpoint will use the correct tos. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | IB/cma: Define option to set ack timeout and pack tos_setDanit Goldberg2019-02-083-1/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | Define new option in 'rdma_set_option' to override calculated QP timeout when requested to provide QP attributes to modify a QP. At the same time, pack tos_set to be bitfield. Signed-off-by: Danit Goldberg <danitg@mellanox.com> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | drivers/IB,core: reduce scope of mmap_semDavidlohr Bueso2019-02-071-39/+2
| | | | | | | | | | | | | | | | | | | | | | ib_umem_get() uses gup_longterm() and relies on the lock to stabilze the vma_list, so we cannot really get rid of mmap_sem altogether, but now that the counter is atomic, we can get of some complexity that mmap_sem brings with only pinned_vm. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | mm: make mm->pinned_vm an atomic64 counterDavidlohr Bueso2019-02-071-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Taking a sleeping lock to _only_ increment a variable is quite the overkill, and pretty much all users do this. Furthermore, some drivers (ie: infiniband and scif) that need pinned semantics can go to quite some trouble to actually delay via workqueue (un)accounting for pinned pages when not possible to acquire it. By making the counter atomic we no longer need to hold the mmap_sem and can simply some code around it for pinned_vm users. The counter is 64-bit such that we need not worry about overflows such as rdma user input controlled from userspace. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Christoph Lameter <cl@linux.com> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/iwpm: move kdoc comments to functionsSteve Wise2019-02-052-33/+123
| | | | | | | | | | | | | | | | Move the iwpm kdoc comments from the prototype declarations to above the function bodies. There are no functional changes in this patch. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/cma: Remove CM_ID statistics provided by rdma-cm moduleLeon Romanovsky2019-02-052-86/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Netlink statistics exported by rdma-cm never had any working user space component published to the mailing list or to any open source project. Canvassing various proprietary users, and the original requester, we find that there are no real users of this interface. This patch simply removes all occurrences of RDMA CM netlink in favour of modern nldev implementation, which provides the same information and accompanied by widely used user space component. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/IWPM: Support no port mapping requirementsSteve Wise2019-02-044-6/+141
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A soft iwarp driver that uses the host TCP stack via a kernel mode socket does not need port mapping. In fact, if the port map daemon, iwpmd, is running, then iwpmd must not try and create/bind a socket to the actual port for a soft iwarp connection, since the driver already has that socket bound. Yet if the soft iwarp driver wants to interoperate with hard iwarp devices that -are- using port mapping, then the soft iwarp driver's mappings still need to be maintained and advertised by the iwpm protocol. This patch enhances the rdma driver<->iwcm interface to allow an iwarp driver to specify that it does not want port mapping. The iwpm kernel<->iwpmd interface is also enhanced to pass up this information on map requests. Care is taken to interoperate with the current iwpmd version (ABI version 3) and only use the new NL attributes if iwpmd supports ABI version 4. The ABI version define has also been created in rdma_netlink.h so both kernel and user code can share it. The iwcm and iwpmd negotiate the ABI version to use with a new HELLO netlink message. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/IWPM: refactor the IWPM message attribute namesSteve Wise2019-02-041-17/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to add new IWPM_NL attributes, the enums for the IWPM commands attributes are refactored such that a new attribute can be added without breaking ABI version 3. Instead of sharing nl attribute enums for both request and response messages, we create separate enums for each IWPM message request and reply. This allows us to extend any given IWPM message by adding new attributes for just that message. These new enums are created, though, in a way to avoid breaking ABI version 3. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | Merge tag 'v5.0-rc5' into rdma.git for-nextJason Gunthorpe2019-02-0410-33/+102
|\| | | | | | | | | | | | | | | Linux 5.0-rc5 Needed to merge the include/uapi changes so we have an up to date single-tree for these files. Patches already posted are also expected to need this for dependencies.
| * IB/uverbs: Fix OOPs in uverbs_user_mmap_disassociateYishai Hadas2019-01-291-5/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The vma->vm_mm can become impossible to get before rdma_umap_close() is called, in this case we must not try to get an mm that is already undergoing process exit. In this case there is no need to wait for anything as the VMA will be destroyed by another thread soon and is already effectively 'unreachable' by userspace. BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 PGD 800000012bc50067 P4D 800000012bc50067 PUD 129db5067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 1 PID: 2050 Comm: bash Tainted: G W OE 4.20.0-rc6+ #3 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:__rb_erase_color+0xb9/0x280 Code: 84 17 01 00 00 48 3b 68 10 0f 84 15 01 00 00 48 89 58 08 48 89 de 48 89 ef 4c 89 e3 e8 90 84 22 00 e9 60 ff ff ff 48 8b 5d 10 <f6> 03 01 0f 84 9c 00 00 00 48 8b 43 10 48 85 c0 74 09 f6 00 01 0f RSP: 0018:ffffbecfc090bab8 EFLAGS: 00010246 RAX: ffff97616346cf30 RBX: 0000000000000000 RCX: 0000000000000101 RDX: 0000000000000000 RSI: ffff97623b6ca828 RDI: ffff97621ef10828 RBP: ffff97621ef10828 R08: ffff97621ef10828 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff97623b6ca838 R13: ffffffffbb3fef50 R14: ffff97623b6ca828 R15: 0000000000000000 FS: 00007f7a5c31d740(0000) GS:ffff97623bb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000011255a000 CR4: 00000000000006e0 Call Trace: unlink_file_vma+0x3b/0x50 free_pgtables+0xa1/0x110 exit_mmap+0xca/0x1a0 ? mlx5_ib_dealloc_pd+0x28/0x30 [mlx5_ib] mmput+0x54/0x140 uverbs_user_mmap_disassociate+0xcc/0x160 [ib_uverbs] uverbs_destroy_ufile_hw+0xf7/0x120 [ib_uverbs] ib_uverbs_remove_one+0xea/0x240 [ib_uverbs] ib_unregister_device+0xfb/0x200 [ib_core] mlx5_ib_remove+0x51/0xe0 [mlx5_ib] mlx5_remove_device+0xc1/0xd0 [mlx5_core] mlx5_unregister_device+0x3d/0xb0 [mlx5_core] remove_one+0x2a/0x90 [mlx5_core] pci_device_remove+0x3b/0xc0 device_release_driver_internal+0x16d/0x240 unbind_store+0xb2/0x100 kernfs_fop_write+0x102/0x180 __vfs_write+0x36/0x1a0 ? __alloc_fd+0xa9/0x170 ? set_close_on_exec+0x49/0x70 vfs_write+0xad/0x1a0 ksys_write+0x52/0xc0 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Cc: <stable@vger.kernel.org> # 4.19 Fixes: 5f9794dc94f5 ("RDMA/ucontext: Add a core API for mmaping driver IO memory") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * IB/uverbs: Fix ioctl query port to consider device disassociationYishai Hadas2019-01-251-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Methods cannot peak into the ufile, the only way to get a ucontext and hence a device is via the ib_uverbs_get_ucontext() call or inspecing a locked uobject. Otherwise during/after disassociation the pointers may be null or free'd. BUG: unable to handle kernel NULL pointer dereference at 0000000000000078 PGD 800000005ece6067 P4D 800000005ece6067 PUD 5ece7067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 0 PID: 10631 Comm: ibv_ud_pingpong Tainted: GW OE 4.20.0-rc6+ #3 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:ib_uverbs_handler_UVERBS_METHOD_QUERY_PORT+0x53/0x191 [ib_uverbs] Code: 80 00 00 00 31 c0 48 8b 47 40 48 8d 5c 24 38 48 8d 6c 24 08 48 89 df 48 8b 40 08 4c 8b a0 18 03 00 00 31 c0 f3 48 ab 48 89 ef <49> 83 7c 24 78 00 b1 06 f3 48 ab 0f 84 89 00 00 00 45 31 c9 31 d2 RSP: 0018:ffffb54802ccfb10 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffb54802ccfb48 RCX:0000000000000000 RDX: fffffffffffffffa RSI: ffffb54802ccfcf8 RDI:ffffb54802ccfb18 RBP: ffffb54802ccfb18 R08: ffffb54802ccfd18 R09:0000000000000000 R10: 0000000000000000 R11: 00000000000000d0 R12:0000000000000000 R13: ffffb54802ccfcb0 R14: ffffb54802ccfc48 R15:ffff9f736e0059a0 FS: 00007f55a6bd7740(0000) GS:ffff9f737ba00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000078 CR3: 0000000064214000 CR4:00000000000006f0 Call Trace: ib_uverbs_cmd_verbs.isra.5+0x94d/0xa60 [ib_uverbs] ? copy_port_attr_to_resp+0x120/0x120 [ib_uverbs] ? arch_tlb_finish_mmu+0x16/0xc0 ? tlb_finish_mmu+0x1f/0x30 ? unmap_region+0xd9/0x120 ib_uverbs_ioctl+0xbc/0x120 [ib_uverbs] do_vfs_ioctl+0xa9/0x620 ? __do_munmap+0x29f/0x3a0 ksys_ioctl+0x60/0x90 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f55a62cb567 Fixes: 641d1207d2ed ("IB/core: Move query port to ioctl") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * IB/uverbs: Fix OOPs upon device disassociationYishai Hadas2019-01-251-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The async_file might be freed before the disassociation has been ended, causing qp shutdown to use after free on it. Since uverbs_destroy_ufile_hw is not a fence, it returns if a disassociation is ongoing in another thread. It has to be written this way to avoid deadlock. However this means that the ufile FD close cannot destroy anything that may still be used by an active kref, such as the the async_file. To fix that move the kref_put() to be in ib_uverbs_release_file(). BUG: unable to handle kernel paging request at ffffffffba682787 PGD bc80e067 P4D bc80e067 PUD bc80f063 PMD 1313df163 PTE 80000000bc682061 Oops: 0003 [#1] SMP PTI CPU: 1 PID: 32410 Comm: bash Tainted: G OE 4.20.0-rc6+ #3 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:__pv_queued_spin_lock_slowpath+0x1b3/0x2a0 Code: 98 83 e2 60 49 89 df 48 8b 04 c5 80 18 72 ba 48 8d ba 80 32 02 00 ba 00 80 00 00 4c 8d 65 14 41 bd 01 00 00 00 48 01 c7 85 d2 <48> 89 2f 48 89 fb 74 14 8b 45 08 85 c0 75 42 84 d2 74 6b f3 90 83 RSP: 0018:ffffc1bbc064fb58 EFLAGS: 00010006 RAX: ffffffffba65f4e7 RBX: ffff9f209c656c00 RCX: 0000000000000001 RDX: 0000000000008000 RSI: 0000000000000000 RDI: ffffffffba682787 RBP: ffff9f217bb23280 R08: 0000000000000001 R09: 0000000000000000 R10: ffff9f209d2c7800 R11: ffffffffffffffe8 R12: ffff9f217bb23294 R13: 0000000000000001 R14: 0000000000000000 R15: ffff9f209c656c00 FS: 00007fac55aad740(0000) GS:ffff9f217bb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffba682787 CR3: 000000012f8e0000 CR4: 00000000000006e0 Call Trace: _raw_spin_lock_irq+0x27/0x30 ib_uverbs_release_uevent+0x1e/0xa0 [ib_uverbs] uverbs_free_qp+0x7e/0x90 [ib_uverbs] destroy_hw_idr_uobject+0x1c/0x50 [ib_uverbs] uverbs_destroy_uobject+0x2e/0x180 [ib_uverbs] __uverbs_cleanup_ufile+0x73/0x90 [ib_uverbs] uverbs_destroy_ufile_hw+0x5d/0x120 [ib_uverbs] ib_uverbs_remove_one+0xea/0x240 [ib_uverbs] ib_unregister_device+0xfb/0x200 [ib_core] mlx5_ib_remove+0x51/0xe0 [mlx5_ib] mlx5_remove_device+0xc1/0xd0 [mlx5_core] mlx5_unregister_device+0x3d/0xb0 [mlx5_core] remove_one+0x2a/0x90 [mlx5_core] pci_device_remove+0x3b/0xc0 device_release_driver_internal+0x16d/0x240 unbind_store+0xb2/0x100 kernfs_fop_write+0x102/0x180 __vfs_write+0x36/0x1a0 ? __alloc_fd+0xa9/0x170 ? set_close_on_exec+0x49/0x70 vfs_write+0xad/0x1a0 ksys_write+0x52/0xc0 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fac551aac60 Cc: <stable@vger.kernel.org> # 4.2 Fixes: 036b10635739 ("IB/uverbs: Enable device removal when there are active user space applications") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * RDMA/umem: Add missing initialization of owning_mmArtemy Kovalyov2019-01-251-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When allocating a umem leaf for implicit ODP MR during page fault the field owning_mm was not set. Initialize and take a reference on this field to avoid kernel panic when trying to access this field. BUG: unable to handle kernel NULL pointer dereference at 0000000000000058 PGD 800000022dfed067 P4D 800000022dfed067 PUD 22dfcf067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 0 PID: 634 Comm: kworker/u33:0 Not tainted 4.20.0-rc6+ #89 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Workqueue: mlx5_ib_page_fault mlx5_ib_eqe_pf_action [mlx5_ib] RIP: 0010:ib_umem_odp_map_dma_pages+0xf3/0x710 [ib_core] Code: 45 c0 48 21 f3 48 89 75 b0 31 f6 4a 8d 04 33 48 89 45 a8 49 8b 44 24 60 48 8b 78 10 e8 66 16 a8 c5 49 8b 54 24 08 48 89 45 98 <8b> 42 58 85 c0 0f 84 8e 05 00 00 8d 48 01 48 8d 72 58 f0 0f b1 4a RSP: 0000:ffffb610813a7c20 EFLAGS: 00010202 RAX: ffff95ace6e8ac80 RBX: 0000000000000000 RCX: 000000000000000c RDX: 0000000000000000 RSI: 0000000000000850 RDI: ffff95aceaadae80 RBP: ffffb610813a7ce0 R08: 0000000000000000 R09: 0000000000080c77 R10: ffff95acfffdbd00 R11: 0000000000000000 R12: ffff95aceaa20a00 R13: 0000000000001000 R14: 0000000000001000 R15: 000000000000000c FS: 0000000000000000(0000) GS:ffff95acf7800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000058 CR3: 000000022c834001 CR4: 00000000001606f0 Call Trace: pagefault_single_data_segment+0x1df/0xc60 [mlx5_ib] mlx5_ib_eqe_pf_action+0x7bc/0xa70 [mlx5_ib] ? __switch_to+0xe1/0x470 process_one_work+0x174/0x390 worker_thread+0x4f/0x3e0 kthread+0x102/0x140 ? drain_workqueue+0x130/0x130 ? kthread_stop+0x110/0x110 ret_from_fork+0x1f/0x30 Fixes: f27a0d50a4bc ("RDMA/umem: Use umem->owning_mm inside ODP") Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * RDMA/device: Expose ib_device_try_get(()Jason Gunthorpe2019-01-212-4/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It turns out future patches need this capability quite widely now, not just for netlink, so provide two global functions to manage the registration lock refcount. This also moves the point the lock becomes 1 to within ib_register_device() so that the semantics of the public API are very sane and clear. Calling ib_device_try_get() will fail on devices that are only allocated but not yet registered. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Parav Pandit <parav@mellanox.com>
| * RDMA/uverbs: Mark ioctl responses with UVERBS_ATTR_F_VALID_OUTPUTJason Gunthorpe2019-01-144-13/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the ioctl interface for the write commands was introduced it did not mark the core response with UVERBS_ATTR_F_VALID_OUTPUT. This causes rdma-core in userspace to not mark the buffers as written for valgrind. Along the same lines it turns out we have always missed marking the driver data. Fixing both of these makes valgrind work properly with rdma-core and ioctl. Fixes: 4785860e04bc ("RDMA/uverbs: Implement an ioctl that can call write and write_ex handlers") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
| * RDMA/cma: Add cm_id restrack resource based on kernel or user cm_id typeSteve Wise2019-01-081-1/+4
| | | | | | | | | | | | | | | | | | | | A recent regression causes a null ptr crash when dumping cm_id resources. The cma is incorrectly adding all cm_id restrack resources as kernel mode. Fixes: af8d70375d56 ("RDMA/restrack: Resource-tracker should not use uobject pointers") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * RDMA/nldev: Don't expose unsafe global rkey to regular userLeon Romanovsky2019-01-071-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | Unsafe global rkey is considered dangerous because it exposes memory registered for all memory in the system. Only users with a QP on the same PD can use the rkey, and generally those QPs will already know the value. However, out of caution, do not expose the value to unprivleged users on the local system. Require CAP_NET_ADMIN instead. Cc: <stable@vger.kernel.org> # 4.16 Fixes: 29cf1351d450 ("RDMA/nldev: provide detailed PD information") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * RDMA/uverbs: Fix post send success return value in case of errorGal Pressman2019-01-071-1/+3
| | | | | | | | | | | | | | | | If get QP object fails 'ret' must be assigned with a proper error code. Fixes: 9a0738575f26 ("RDMA/uverbs: Use uverbs_response() for remaining response copying") Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | IB/core: Remove ib_sg_dma_address() and ib_sg_dma_len()Bart Van Assche2019-02-041-7/+5
| | | | | | | | | | | | | | | | | | Keeping single line wrapper functions is not useful. Hence remove the ib_sg_dma_address() and ib_sg_dma_len() functions. This patch does not change any functionality. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | IB/uverbs: Expose XRC ODP device capabilitiesMoni Shoua2019-02-041-0/+1
| | | | | | | | | | | | | | | | | | Expose XRC ODP capabilities as part of the extended device capabilities. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/core: Use the ops infrastructure to keep all callbacks in one placeLeon Romanovsky2019-01-303-17/+19
| | | | | | | | | | | | | | | | | | As preparation to hide rdma_restrack_root, refactor the code to use the ops structure instead of a special callback which is hidden in rdma_restrack_root. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/restrack: Refactor user/kernel restrack additionsLeon Romanovsky2019-01-301-11/+9
| | | | | | | | | | | | | | | | Since we already know if we are user/kernel before calling restrack_add, move type dependent code into the callers to make the flow more readable. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/core: Simplify restrack interfaceLeon Romanovsky2019-01-303-9/+25
| | | | | | | | | | | | | | | | | | | | In the current implementation, we have one restrack root per-device and all users are simply providing it directly. Let's simplify the interface and have callers provide the ib_device and internally access the restrack_root. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/nldev: Prepare CAP_NET_ADMIN checks for .doit callbacksLeon Romanovsky2019-01-301-10/+12
| | | | | | | | | | | | | | | | | | | | | | | | The .doit callbacks don't have a netlink_callback to check capabilities so in order to use the same fill_res_func for both .dump and .doit, we need to do the capability check outside of those functions. For .doit callbacks, it is possible to check CAP_NET_ADMIN directly on the received sk_buff. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/nldev: Factor out the PID namespace checkLeon Romanovsky2019-01-301-10/+12
| | | | | | | | | | | | | | | | The PID namespace is going to be used in the .doit callback, so generalize its implementation. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/nldev: Dynamically generate restrack dumpit callbacksLeon Romanovsky2019-01-301-28/+11
| | | | | | | | | | | | | | | | | | | | | | | | There is no need to manually write same callbacks, automatically generate them using C-macro language. This macro is going to be extended to generate doit callbacks too, so use general name for this macro. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA: Add indication for in kernel API support to IB deviceGal Pressman2019-01-302-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Drivers that do not provide kernel verbs support should not be used by ib kernel clients at all. In case a device does not implement all mandatory verbs for kverbs usage mark it as a non kverbs provider and prevent its usage for all clients except for uverbs. The device is marked as a non kverbs provider using the 'kverbs_provider' flag which should only be set by the core code. The clients can choose whether kverbs are requested for its usage using the 'no_kverbs_req' flag which is currently set for uverbs only. This patch allows drivers to remove mandatory verbs stubs and simply set the callbacks to NULL. The IB device will be registered as a non-kverbs provider. Note that verbs that are required for the device registration process must be implemented. Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA: Provide safe ib_alloc_device() functionLeon Romanovsky2019-01-301-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | All callers to ib_alloc_device() provide a larger size than struct ib_device and rely on the fact that struct ib_device is embedded in their driver specific structure as the first member. Provide a safer variant of ib_alloc_device() that checks and enforces this approach to make sure the drivers are using it right. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | Merge branch 'devx-async' into k.o/for-nextJason Gunthorpe2019-01-292-5/+11
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Yishai Hadas says: Enable DEVX asynchronous query commands This series enables querying a DEVX object in an asynchronous mode. The userspace application won't block when calling the firmware and it will be able to get the response back once that it will be ready. To enable the above functionality: - DEVX asynchronous command completion FD object was introduced. - The applicable file operations were implemented to enable using it by the user application. - Query asynchronous method was added to the DEVX object, it will call the firmware asynchronously and manages the response on the given input FD. - Hot unplug support was added for the FD to work properly upon unbind/disassociate. - mlx5 core fence for asynchronous commands was implemented and used to prevent racing upon unbind/disassociate. This branch is based on mlx5-next & v5.0-rc2 due to dependencies, from git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux * branch 'devx-async': IB/mlx5: Implement DEVX hot unplug for async command FD IB/mlx5: Implement the file ops of DEVX async command FD IB/mlx5: Introduce async DEVX obj query API IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FDYishai Hadas2019-01-292-5/+11
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD and its initial implementation. This object is from type class FD and will be used to read DEVX async commands completion. The core layer should allow the driver to set object from type FD in a safe mode, this option was added with a matching comment in place. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | IB/core: Destroy QP if XRC QP failsYuval Avnery2019-01-241-17/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The open-coded variant missed destroy of SELinux created QP, reuse already existing ib_detroy_qp() call and use this opportunity to clean ib_create_qp() from double prints and unclear exit paths. Reported-by: Parav Pandit <parav@mellanox.com> Fixes: d291f1a65232 ("IB/core: Enforce PKey security on QPs") Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | IB/mlx5: Ranges in implicit ODP MR inherit its write accessMoni Shoua2019-01-241-2/+3
| | | | | | | | | | | | | | | | | | | | | | A sub-range in ODP implicit MR should take its write permission from the MR and not be set always to allow. Fixes: d07d1d70ce1a ("IB/umem: Update on demand page (ODP) support") Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | IB/core: Declare local functions 'static'Bart Van Assche2019-01-241-1/+1
| | | | | | | | | | | | | | | | | | | | This patch avoids that sparse complains about missing function declarations. Fixes: f27a0d50a4bc ("RDMA/umem: Use umem->owning_mm inside ODP") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | IB/umad: Do not check status of nonseekable_open()Parav Pandit2019-01-241-23/+13
| | | | | | | | | | | | | | | | | | | | | | | | As the comment block of nonseekable_open() describes, nonseekable_open() can never fail. Several places in kernel depend on this behavior. Therefore, simplify the umad module to depend on this basic kernel functionality. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | IB/umad: Avoid additional device reference during open()/close()Parav Pandit2019-01-231-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ib_umad_init_port_dev() holds the reference of a ib_umad_device instance. ib_umad_device contains standard core device and cdev. cdev holds the reference of its parent core device. file ops holds the reference to cdev using core kernel. Therefore, there is no need to hold additional reference while opening umad related char devices. While at it, add comments to bring clarity on releasing references to ib_umd_device. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | IB/core: Simplify rdma cgroup registrationParav Pandit2019-01-183-16/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | RDMA cgroup registration routine always returns success, so simplify function to be void and run clang formatter over whole CONFIG_CGROUP_RDMA art of core_priv.h. This reduces unwinding error path for regular registration and future net namespace change functionality for rdma device. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA/device: Use __ib_device_get_by_name() in ib_device_rename()Jason Gunthorpe2019-01-181-6/+3
| | | | | | | | | | | | | | No reason to open code this loop. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com>
OpenPOWER on IntegriCloud