| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Now that we have a small ID for each client we can use xarray instead of
linearly searching linked lists for client data. This will give much
faster and scalable client data lookup, and will lets us revise the
locking scheme.
Since xarray can store 'going_down' using a mark just entirely eliminate
the struct ib_client_data and directly store the client_data value in the
xarray. However this does require a special iterator as we must still
iterate over any NULL client_data values.
Also eliminate the client_data_lock in favour of internal xarray locking.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This gives each client a unique ID and will let us move client_data to use
xarray, and revise the locking scheme.
clients have to be add/removed in strict FIFO/LIFO order as they
interdepend. To support this the client_ids are assigned to increase in
FIFO order. The existing linked list is kept to support reverse iteration
until xarray can get a reverse iteration API.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
ida is the proper data structure to hold list of clustered small integers
and then allocate an unused integer. Get rid of the convoluted and limited
open-coded bitmap.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
This really has no purpose anymore, refcount can be used to tell if the
device is still registered. Keeping it around just invites mis-use.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Instead of complicated logic about when this memory is freed, always free
it during device release(). All the cache pointers start out as NULL, so
it is safe to call this before the cache is initialized.
This makes for a simpler error unwind flow, and a simpler understanding of
the lifetime of the memory allocations inside the struct ib_device.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Since this only frees memory it should be done during the release
callback. Otherwise there are possible error flows where it might not get
called if registration aborts.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Since another rename could be running in parallel it is safer to check
that the name is not changing inside the lock, where we already know the
device name will not change.
Fixes: d21943dd19b5 ("RDMA/core: Implement IB device rename function")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The PD allocations in IB/core allows us to simplify drivers and their
error flows in their .alloc_pd() paths. The changes in .alloc_pd() go hand
in had with relevant update in .dealloc_pd().
We will use this opportunity and convert .dealloc_pd() to don't fail, as
it was suggested a long time ago, failures are not happening as we have
never seen a WARN_ON print.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Add new macros to be used in drivers while registering ops structure and
IB/core while calling allocation routines, so drivers won't need to
perform kzalloc/kfree in their paths.
The change in allocation stage allows us to initialize common fields prior
to calling to drivers (e.g. restrack).
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When creating many MAD agents in a short period of time, receive packet
processing can be delayed long enough to cause timeouts while new agents
are being added to the atomic notifier chain with IRQs disabled. Notifier
chain registration and unregstration is an O(n) operation. With large
numbers of MAD agents being created and destroyed simultaneously the CPUs
spend too much time with interrupts disabled.
Instead of each MAD agent registering for it's own LSM notification,
maintain a list of agents internally and register once, this registration
already existed for handling the PKeys. This list is write mostly, so a
normal spin lock is used vs a read/write lock. All MAD agents must be
checked, so a single list is used instead of breaking them down per
device.
Notifier calls are done under rcu_read_lock, so there isn't a risk of
similar packet timeouts while checking the MAD agents security settings
when notified.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If the MAD agents isn't allowed to manage the subnet, or fails to register
for the LSM notifier, the security context is leaked. Free the context in
these cases.
Fixes: 47a2b338fe63 ("IB/core: Enforce security on management datagrams")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reported-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If the notifier runs after the security context is freed an access of
freed memory can occur.
Fixes: 47a2b338fe63 ("IB/core: Enforce security on management datagrams")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| | |
This allows drivers to know the tos was actively set by the application.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If a user binds to INADDR_ANY and sets the service id, then the
device-specific cm_ids should also use this tos. This allows an app to
do:
rdma_bind_addr(INADDR_ANY)
set_service_type()
rdma_listen()
And connections setup via this listening endpoint will use the correct
tos.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Define new option in 'rdma_set_option' to override calculated QP timeout
when requested to provide QP attributes to modify a QP.
At the same time, pack tos_set to be bitfield.
Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
ib_umem_get() uses gup_longterm() and relies on the lock to stabilze the
vma_list, so we cannot really get rid of mmap_sem altogether, but now that
the counter is atomic, we can get of some complexity that mmap_sem brings
with only pinned_vm.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Taking a sleeping lock to _only_ increment a variable is quite the
overkill, and pretty much all users do this. Furthermore, some drivers
(ie: infiniband and scif) that need pinned semantics can go to quite
some trouble to actually delay via workqueue (un)accounting for pinned
pages when not possible to acquire it.
By making the counter atomic we no longer need to hold the mmap_sem and
can simply some code around it for pinned_vm users. The counter is 64-bit
such that we need not worry about overflows such as rdma user input
controlled from userspace.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Move the iwpm kdoc comments from the prototype declarations to above
the function bodies. There are no functional changes in this patch.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Netlink statistics exported by rdma-cm never had any working user space
component published to the mailing list or to any open source
project. Canvassing various proprietary users, and the original requester,
we find that there are no real users of this interface.
This patch simply removes all occurrences of RDMA CM netlink in favour of
modern nldev implementation, which provides the same information and
accompanied by widely used user space component.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
A soft iwarp driver that uses the host TCP stack via a kernel mode socket
does not need port mapping. In fact, if the port map daemon, iwpmd, is
running, then iwpmd must not try and create/bind a socket to the actual
port for a soft iwarp connection, since the driver already has that socket
bound.
Yet if the soft iwarp driver wants to interoperate with hard iwarp devices
that -are- using port mapping, then the soft iwarp driver's mappings still
need to be maintained and advertised by the iwpm protocol.
This patch enhances the rdma driver<->iwcm interface to allow an iwarp
driver to specify that it does not want port mapping. The iwpm
kernel<->iwpmd interface is also enhanced to pass up this information on
map requests.
Care is taken to interoperate with the current iwpmd version (ABI version
3) and only use the new NL attributes if iwpmd supports ABI version 4.
The ABI version define has also been created in rdma_netlink.h so both
kernel and user code can share it. The iwcm and iwpmd negotiate the ABI
version to use with a new HELLO netlink message.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In order to add new IWPM_NL attributes, the enums for the IWPM commands
attributes are refactored such that a new attribute can be added without
breaking ABI version 3. Instead of sharing nl attribute enums for both
request and response messages, we create separate enums for each IWPM
message request and reply. This allows us to extend any given IWPM
message by adding new attributes for just that message. These new enums
are created, though, in a way to avoid breaking ABI version 3.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|\|
| |
| |
| |
| |
| |
| |
| | |
Linux 5.0-rc5
Needed to merge the include/uapi changes so we have an up to date
single-tree for these files. Patches already posted are also expected to
need this for dependencies.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The vma->vm_mm can become impossible to get before rdma_umap_close() is
called, in this case we must not try to get an mm that is already
undergoing process exit. In this case there is no need to wait for
anything as the VMA will be destroyed by another thread soon and is
already effectively 'unreachable' by userspace.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
PGD 800000012bc50067 P4D 800000012bc50067 PUD 129db5067 PMD 0
Oops: 0000 [#1] SMP PTI
CPU: 1 PID: 2050 Comm: bash Tainted: G W OE 4.20.0-rc6+ #3
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:__rb_erase_color+0xb9/0x280
Code: 84 17 01 00 00 48 3b 68 10 0f 84 15 01 00 00 48 89
58 08 48 89 de 48 89 ef 4c 89 e3 e8 90 84 22 00 e9 60 ff ff ff 48 8b 5d
10 <f6> 03 01 0f 84 9c 00 00 00 48 8b 43 10 48 85 c0 74 09 f6 00 01 0f
RSP: 0018:ffffbecfc090bab8 EFLAGS: 00010246
RAX: ffff97616346cf30 RBX: 0000000000000000 RCX: 0000000000000101
RDX: 0000000000000000 RSI: ffff97623b6ca828 RDI: ffff97621ef10828
RBP: ffff97621ef10828 R08: ffff97621ef10828 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff97623b6ca838
R13: ffffffffbb3fef50 R14: ffff97623b6ca828 R15: 0000000000000000
FS: 00007f7a5c31d740(0000) GS:ffff97623bb00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000011255a000 CR4: 00000000000006e0
Call Trace:
unlink_file_vma+0x3b/0x50
free_pgtables+0xa1/0x110
exit_mmap+0xca/0x1a0
? mlx5_ib_dealloc_pd+0x28/0x30 [mlx5_ib]
mmput+0x54/0x140
uverbs_user_mmap_disassociate+0xcc/0x160 [ib_uverbs]
uverbs_destroy_ufile_hw+0xf7/0x120 [ib_uverbs]
ib_uverbs_remove_one+0xea/0x240 [ib_uverbs]
ib_unregister_device+0xfb/0x200 [ib_core]
mlx5_ib_remove+0x51/0xe0 [mlx5_ib]
mlx5_remove_device+0xc1/0xd0 [mlx5_core]
mlx5_unregister_device+0x3d/0xb0 [mlx5_core]
remove_one+0x2a/0x90 [mlx5_core]
pci_device_remove+0x3b/0xc0
device_release_driver_internal+0x16d/0x240
unbind_store+0xb2/0x100
kernfs_fop_write+0x102/0x180
__vfs_write+0x36/0x1a0
? __alloc_fd+0xa9/0x170
? set_close_on_exec+0x49/0x70
vfs_write+0xad/0x1a0
ksys_write+0x52/0xc0
do_syscall_64+0x5b/0x180
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Cc: <stable@vger.kernel.org> # 4.19
Fixes: 5f9794dc94f5 ("RDMA/ucontext: Add a core API for mmaping driver IO memory")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Methods cannot peak into the ufile, the only way to get a ucontext and
hence a device is via the ib_uverbs_get_ucontext() call or inspecing a
locked uobject.
Otherwise during/after disassociation the pointers may be null or free'd.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
PGD 800000005ece6067 P4D 800000005ece6067 PUD 5ece7067 PMD 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 10631 Comm: ibv_ud_pingpong Tainted: GW OE 4.20.0-rc6+ #3
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:ib_uverbs_handler_UVERBS_METHOD_QUERY_PORT+0x53/0x191 [ib_uverbs]
Code: 80 00 00 00 31 c0 48 8b 47 40 48 8d 5c 24 38 48 8d 6c 24
08 48 89 df 48 8b 40 08 4c 8b a0 18 03 00 00 31 c0 f3 48 ab 48 89
ef <49> 83 7c 24 78 00 b1 06 f3 48 ab 0f 84 89 00 00 00 45 31 c9 31 d2
RSP: 0018:ffffb54802ccfb10 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffb54802ccfb48 RCX:0000000000000000
RDX: fffffffffffffffa RSI: ffffb54802ccfcf8 RDI:ffffb54802ccfb18
RBP: ffffb54802ccfb18 R08: ffffb54802ccfd18 R09:0000000000000000
R10: 0000000000000000 R11: 00000000000000d0 R12:0000000000000000
R13: ffffb54802ccfcb0 R14: ffffb54802ccfc48 R15:ffff9f736e0059a0
FS: 00007f55a6bd7740(0000) GS:ffff9f737ba00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000078 CR3: 0000000064214000 CR4:00000000000006f0
Call Trace:
ib_uverbs_cmd_verbs.isra.5+0x94d/0xa60 [ib_uverbs]
? copy_port_attr_to_resp+0x120/0x120 [ib_uverbs]
? arch_tlb_finish_mmu+0x16/0xc0
? tlb_finish_mmu+0x1f/0x30
? unmap_region+0xd9/0x120
ib_uverbs_ioctl+0xbc/0x120 [ib_uverbs]
do_vfs_ioctl+0xa9/0x620
? __do_munmap+0x29f/0x3a0
ksys_ioctl+0x60/0x90
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x5b/0x180
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f55a62cb567
Fixes: 641d1207d2ed ("IB/core: Move query port to ioctl")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The async_file might be freed before the disassociation has been ended,
causing qp shutdown to use after free on it.
Since uverbs_destroy_ufile_hw is not a fence, it returns if a
disassociation is ongoing in another thread. It has to be written this way
to avoid deadlock. However this means that the ufile FD close cannot
destroy anything that may still be used by an active kref, such as the the
async_file.
To fix that move the kref_put() to be in ib_uverbs_release_file().
BUG: unable to handle kernel paging request at ffffffffba682787
PGD bc80e067 P4D bc80e067 PUD bc80f063 PMD 1313df163 PTE 80000000bc682061
Oops: 0003 [#1] SMP PTI
CPU: 1 PID: 32410 Comm: bash Tainted: G OE 4.20.0-rc6+ #3
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:__pv_queued_spin_lock_slowpath+0x1b3/0x2a0
Code: 98 83 e2 60 49 89 df 48 8b 04 c5 80 18 72 ba 48 8d
ba 80 32 02 00 ba 00 80 00 00 4c 8d 65 14 41 bd 01 00 00 00 48 01 c7 85
d2 <48> 89 2f 48 89 fb 74 14 8b 45 08 85 c0 75 42 84 d2 74 6b f3 90 83
RSP: 0018:ffffc1bbc064fb58 EFLAGS: 00010006
RAX: ffffffffba65f4e7 RBX: ffff9f209c656c00 RCX: 0000000000000001
RDX: 0000000000008000 RSI: 0000000000000000 RDI: ffffffffba682787
RBP: ffff9f217bb23280 R08: 0000000000000001 R09: 0000000000000000
R10: ffff9f209d2c7800 R11: ffffffffffffffe8 R12: ffff9f217bb23294
R13: 0000000000000001 R14: 0000000000000000 R15: ffff9f209c656c00
FS: 00007fac55aad740(0000) GS:ffff9f217bb00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffba682787 CR3: 000000012f8e0000 CR4: 00000000000006e0
Call Trace:
_raw_spin_lock_irq+0x27/0x30
ib_uverbs_release_uevent+0x1e/0xa0 [ib_uverbs]
uverbs_free_qp+0x7e/0x90 [ib_uverbs]
destroy_hw_idr_uobject+0x1c/0x50 [ib_uverbs]
uverbs_destroy_uobject+0x2e/0x180 [ib_uverbs]
__uverbs_cleanup_ufile+0x73/0x90 [ib_uverbs]
uverbs_destroy_ufile_hw+0x5d/0x120 [ib_uverbs]
ib_uverbs_remove_one+0xea/0x240 [ib_uverbs]
ib_unregister_device+0xfb/0x200 [ib_core]
mlx5_ib_remove+0x51/0xe0 [mlx5_ib]
mlx5_remove_device+0xc1/0xd0 [mlx5_core]
mlx5_unregister_device+0x3d/0xb0 [mlx5_core]
remove_one+0x2a/0x90 [mlx5_core]
pci_device_remove+0x3b/0xc0
device_release_driver_internal+0x16d/0x240
unbind_store+0xb2/0x100
kernfs_fop_write+0x102/0x180
__vfs_write+0x36/0x1a0
? __alloc_fd+0xa9/0x170
? set_close_on_exec+0x49/0x70
vfs_write+0xad/0x1a0
ksys_write+0x52/0xc0
do_syscall_64+0x5b/0x180
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fac551aac60
Cc: <stable@vger.kernel.org> # 4.2
Fixes: 036b10635739 ("IB/uverbs: Enable device removal when there are active user space applications")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When allocating a umem leaf for implicit ODP MR during page fault the
field owning_mm was not set.
Initialize and take a reference on this field to avoid kernel panic when
trying to access this field.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
PGD 800000022dfed067 P4D 800000022dfed067 PUD 22dfcf067 PMD 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 634 Comm: kworker/u33:0 Not tainted 4.20.0-rc6+ #89
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: mlx5_ib_page_fault mlx5_ib_eqe_pf_action [mlx5_ib]
RIP: 0010:ib_umem_odp_map_dma_pages+0xf3/0x710 [ib_core]
Code: 45 c0 48 21 f3 48 89 75 b0 31 f6 4a 8d 04 33 48 89 45 a8 49 8b 44 24 60 48 8b 78 10 e8 66 16 a8 c5 49 8b 54 24 08 48 89 45 98 <8b> 42 58 85 c0 0f 84 8e 05 00 00 8d 48 01 48 8d 72 58 f0 0f b1 4a
RSP: 0000:ffffb610813a7c20 EFLAGS: 00010202
RAX: ffff95ace6e8ac80 RBX: 0000000000000000 RCX: 000000000000000c
RDX: 0000000000000000 RSI: 0000000000000850 RDI: ffff95aceaadae80
RBP: ffffb610813a7ce0 R08: 0000000000000000 R09: 0000000000080c77
R10: ffff95acfffdbd00 R11: 0000000000000000 R12: ffff95aceaa20a00
R13: 0000000000001000 R14: 0000000000001000 R15: 000000000000000c
FS: 0000000000000000(0000) GS:ffff95acf7800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000058 CR3: 000000022c834001 CR4: 00000000001606f0
Call Trace:
pagefault_single_data_segment+0x1df/0xc60 [mlx5_ib]
mlx5_ib_eqe_pf_action+0x7bc/0xa70 [mlx5_ib]
? __switch_to+0xe1/0x470
process_one_work+0x174/0x390
worker_thread+0x4f/0x3e0
kthread+0x102/0x140
? drain_workqueue+0x130/0x130
? kthread_stop+0x110/0x110
ret_from_fork+0x1f/0x30
Fixes: f27a0d50a4bc ("RDMA/umem: Use umem->owning_mm inside ODP")
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
It turns out future patches need this capability quite widely now, not
just for netlink, so provide two global functions to manage the
registration lock refcount.
This also moves the point the lock becomes 1 to within
ib_register_device() so that the semantics of the public API are very sane
and clear. Calling ib_device_try_get() will fail on devices that are only
allocated but not yet registered.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When the ioctl interface for the write commands was introduced it did
not mark the core response with UVERBS_ATTR_F_VALID_OUTPUT. This causes
rdma-core in userspace to not mark the buffers as written for valgrind.
Along the same lines it turns out we have always missed marking the driver
data. Fixing both of these makes valgrind work properly with rdma-core and
ioctl.
Fixes: 4785860e04bc ("RDMA/uverbs: Implement an ioctl that can call write and write_ex handlers")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
A recent regression causes a null ptr crash when dumping cm_id resources.
The cma is incorrectly adding all cm_id restrack resources as kernel mode.
Fixes: af8d70375d56 ("RDMA/restrack: Resource-tracker should not use uobject pointers")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Unsafe global rkey is considered dangerous because it exposes memory
registered for all memory in the system. Only users with a QP on the same
PD can use the rkey, and generally those QPs will already know the
value. However, out of caution, do not expose the value to unprivleged
users on the local system. Require CAP_NET_ADMIN instead.
Cc: <stable@vger.kernel.org> # 4.16
Fixes: 29cf1351d450 ("RDMA/nldev: provide detailed PD information")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
If get QP object fails 'ret' must be assigned with a proper error code.
Fixes: 9a0738575f26 ("RDMA/uverbs: Use uverbs_response() for remaining response copying")
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Keeping single line wrapper functions is not useful. Hence remove the
ib_sg_dma_address() and ib_sg_dma_len() functions. This patch does not
change any functionality.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Expose XRC ODP capabilities as part of the extended device capabilities.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
As preparation to hide rdma_restrack_root, refactor the code to use the
ops structure instead of a special callback which is hidden in
rdma_restrack_root.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Since we already know if we are user/kernel before calling restrack_add,
move type dependent code into the callers to make the flow more readable.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In the current implementation, we have one restrack root per-device and
all users are simply providing it directly. Let's simplify the interface
and have callers provide the ib_device and internally access the
restrack_root.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The .doit callbacks don't have a netlink_callback to check capabilities so
in order to use the same fill_res_func for both .dump and .doit, we need
to do the capability check outside of those functions.
For .doit callbacks, it is possible to check CAP_NET_ADMIN directly on the
received sk_buff.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
The PID namespace is going to be used in the .doit callback, so generalize
its implementation.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
There is no need to manually write same callbacks, automatically generate
them using C-macro language.
This macro is going to be extended to generate doit callbacks too, so use
general name for this macro.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Drivers that do not provide kernel verbs support should not be used by ib
kernel clients at all.
In case a device does not implement all mandatory verbs for kverbs usage
mark it as a non kverbs provider and prevent its usage for all clients
except for uverbs.
The device is marked as a non kverbs provider using the 'kverbs_provider'
flag which should only be set by the core code. The clients can choose
whether kverbs are requested for its usage using the 'no_kverbs_req' flag
which is currently set for uverbs only.
This patch allows drivers to remove mandatory verbs stubs and simply set
the callbacks to NULL. The IB device will be registered as a non-kverbs
provider. Note that verbs that are required for the device registration
process must be implemented.
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
All callers to ib_alloc_device() provide a larger size than struct
ib_device and rely on the fact that struct ib_device is embedded in their
driver specific structure as the first member.
Provide a safer variant of ib_alloc_device() that checks and enforces this
approach to make sure the drivers are using it right.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Yishai Hadas says:
Enable DEVX asynchronous query commands
This series enables querying a DEVX object in an asynchronous mode.
The userspace application won't block when calling the firmware and it will be
able to get the response back once that it will be ready.
To enable the above functionality:
- DEVX asynchronous command completion FD object was introduced.
- The applicable file operations were implemented to enable using it by
the user application.
- Query asynchronous method was added to the DEVX object, it will call the
firmware asynchronously and manages the response on the given input FD.
- Hot unplug support was added for the FD to work properly upon
unbind/disassociate.
- mlx5 core fence for asynchronous commands was implemented and used to
prevent racing upon unbind/disassociate.
This branch is based on mlx5-next & v5.0-rc2 due to dependencies, from
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
* branch 'devx-async':
IB/mlx5: Implement DEVX hot unplug for async command FD
IB/mlx5: Implement the file ops of DEVX async command FD
IB/mlx5: Introduce async DEVX obj query API
IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD and its initial implementation.
This object is from type class FD and will be used to read DEVX async
commands completion.
The core layer should allow the driver to set object from type FD in a
safe mode, this option was added with a matching comment in place.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The open-coded variant missed destroy of SELinux created QP, reuse already
existing ib_detroy_qp() call and use this opportunity to clean
ib_create_qp() from double prints and unclear exit paths.
Reported-by: Parav Pandit <parav@mellanox.com>
Fixes: d291f1a65232 ("IB/core: Enforce PKey security on QPs")
Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
A sub-range in ODP implicit MR should take its write permission from the
MR and not be set always to allow.
Fixes: d07d1d70ce1a ("IB/umem: Update on demand page (ODP) support")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch avoids that sparse complains about missing function
declarations.
Fixes: f27a0d50a4bc ("RDMA/umem: Use umem->owning_mm inside ODP")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
As the comment block of nonseekable_open() describes, nonseekable_open()
can never fail. Several places in kernel depend on this behavior.
Therefore, simplify the umad module to depend on this basic kernel
functionality.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
ib_umad_init_port_dev() holds the reference of a ib_umad_device instance.
ib_umad_device contains standard core device and cdev. cdev holds the
reference of its parent core device. file ops holds the reference to cdev
using core kernel.
Therefore, there is no need to hold additional reference while opening
umad related char devices.
While at it, add comments to bring clarity on releasing references to
ib_umd_device.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
RDMA cgroup registration routine always returns success, so simplify
function to be void and run clang formatter over whole CONFIG_CGROUP_RDMA
art of core_priv.h.
This reduces unwinding error path for regular registration and future net
namespace change functionality for rdma device.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
| |
| |
| |
| |
| |
| |
| | |
No reason to open code this loop.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
|