summaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/hw/mlx5/mlx5_ib.h
Commit message (Collapse)AuthorAgeFilesLines
...
* RDMA: Constify the argument of the work request conversion functionsBart Van Assche2018-07-301-1/+1
| | | | | | | | | | | | When posting a send work request, the work request that is posted is not modified by any of the RDMA drivers. Make this explicit by constifying most ib_send_wr pointers in RDMA transport drivers. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/mlx5: Enable driver uapi commands for flow steeringYishai Hadas2018-07-241-6/+6
| | | | | | | | | Expose the mlx5 flow steering parsing trees, exposing the functionality to user space. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/mlx5: Support adding flow steering rule by raw descriptionYishai Hadas2018-07-241-0/+2
| | | | | | | | | | | | | | | Add support to set a public flow steering rule when its destination is a TIR by using raw specification data. The logic follows the verbs API but instead of using ib_spec(s) the raw, device specific, description is used. This allows supporting specialty matchers without having to define new matches in the verbs struct based language. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/mlx5: Introduce driver create and destroy flow methodsYishai Hadas2018-07-241-0/+15
| | | | | | | | | | | | | | | | | | Introduce driver create and destroy flow methods on the uverbs flow object. This allows the driver to get its specific device attributes to match the underlay specification while still using the generic ib_flow object for cleanup and code sharing. The IB object's attributes are set via the ib_set_flow() helper function. The specific implementation for the given specification is added in downstream patches. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/mlx5: Introduce flow steering matcher uapi objectYishai Hadas2018-07-241-0/+11
| | | | | | | | | | | | | | | Introduce flow steering matcher object and its create and destroy methods. This matcher object holds some mlx5 specific driver properties that matches the underlay device specification when an mlx5 flow steering group is created. It will be used in downstream patches to be part of mlx5 specific create flow method. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/mlx5: Check that supplied blue flame index doesn't overflowLeon Romanovsky2018-07-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | User's supplied index is checked again total number of system pages, but this number already includes num_static_sys_pages, so addition of that value to supplied index causes to below error while trying to access sys_pages[]. BUG: KASAN: slab-out-of-bounds in bfregn_to_uar_index+0x34f/0x400 Read of size 4 at addr ffff880065561904 by task syz-executor446/314 CPU: 0 PID: 314 Comm: syz-executor446 Not tainted 4.18.0-rc1+ #256 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014 Call Trace: dump_stack+0xef/0x17e print_address_description+0x83/0x3b0 kasan_report+0x18d/0x4d0 bfregn_to_uar_index+0x34f/0x400 create_user_qp+0x272/0x227d create_qp_common+0x32eb/0x43e0 mlx5_ib_create_qp+0x379/0x1ca0 create_qp.isra.5+0xc94/0x22d0 ib_uverbs_create_qp+0x21b/0x2a0 ib_uverbs_write+0xc2c/0x1010 vfs_write+0x1b0/0x550 ksys_write+0xc6/0x1a0 do_syscall_64+0xa7/0x590 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x433679 Code: fd ff 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b 91 fd ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fff2b3d8e48 EFLAGS: 00000217 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00000000004002f8 RCX: 0000000000433679 RDX: 0000000000000040 RSI: 0000000020000240 RDI: 0000000000000003 RBP: 00000000006d4018 R08: 00000000004002f8 R09: 00000000004002f8 R10: 00000000004002f8 R11: 0000000000000217 R12: 0000000000000000 R13: 000000000040cb00 R14: 000000000040cb90 R15: 0000000000000006 Allocated by task 314: kasan_kmalloc+0xa0/0xd0 __kmalloc+0x1a9/0x510 mlx5_ib_alloc_ucontext+0x966/0x2620 ib_uverbs_get_context+0x23f/0xa60 ib_uverbs_write+0xc2c/0x1010 __vfs_write+0x10d/0x720 vfs_write+0x1b0/0x550 ksys_write+0xc6/0x1a0 do_syscall_64+0xa7/0x590 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 1: __kasan_slab_free+0x12e/0x180 kfree+0x159/0x630 kvfree+0x37/0x50 single_release+0x8e/0xf0 __fput+0x2d8/0x900 task_work_run+0x102/0x1f0 exit_to_usermode_loop+0x159/0x1c0 do_syscall_64+0x408/0x590 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff880065561100 which belongs to the cache kmalloc-4096 of size 4096 The buggy address is located 2052 bytes inside of 4096-byte region [ffff880065561100, ffff880065562100) The buggy address belongs to the page: page:ffffea0001955800 count:1 mapcount:0 mapping:ffff88006c402480 index:0x0 compound_mapcount: 0 flags: 0x4000000000008100(slab|head) raw: 4000000000008100 ffffea0001a7c000 0000000200000002 ffff88006c402480 raw: 0000000000000000 0000000080070007 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff880065561800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff880065561880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff880065561900: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ^ ffff880065561980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff880065561a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc Cc: <stable@vger.kernel.org> # 4.15 Fixes: 1ee47ab3e8d8 ("IB/mlx5: Enable QP creation with a given blue flame index") Reported-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/mlx5: Melt consecutive calls to alloc_bfreg() in one callLeon Romanovsky2018-07-131-6/+0
| | | | | | | | There is no need for three consecutive calls to alloc_bfreg(). It can be implemented with one function. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/mlx5: Add support for drain SQ & RQYishai Hadas2018-06-251-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch follows the logic from ib_core but considers the internal device state upon executing the involved commands. Specifically, Upon internal error state modify QP to an error state can be assumed to be success as each in-progress WR going to be flushed in error in any case as expected by that modify command. In addition, As the drain should never fail the driver makes sure that post_send/recv will succeed even if the device is already in an internal error state. As such once the driver will supply the simulated/SW CQEs the CQE for the drain WR will be handled as well. In case of an internal error state the CQE for the drain WR may be completed as part of the main task that handled the error state or by the task that issued the drain WR. As the above depends on scheduling the code takes the relevant locks and actions to make sure that the completion handler for that WR will always be called after that the post_send/recv were issued but not in parallel to the other task that handles the error flow. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* Merge branch 'icrc-counter' into rdma.git for-nextJason Gunthorpe2018-06-221-0/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For dependencies, branch based on 'mellanox/mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git Pull RoCE ICRC counters from Leon Romanovsky: ==================== This series exposes RoCE ICRC counter through existing RDMA hw_counters sysfs interface. The first patch has all HW definitions in mlx5_ifc.h file and second patch is the actual counter implementation. ==================== * branch 'icrc-counter': IB/mlx5: Support RoCE ICRC encapsulated error counter net/mlx5: Add RoCE RX ICRC encapsulated counter
| * IB/mlx5: Support RoCE ICRC encapsulated error counterTalat Batheesh2018-06-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support to query the counter that counts the RoCE packets with corrupted ICRC (Invariant Cyclic Redundancy Code). This counter will be under /sys/class/infiniband/<mlx5-dev>/ports/<port>/hw_counters/ rx_icrc_encapsulated - The number of RoCE packets with ICRC error. Signed-off-by: Talat Batheesh <talatb@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | IB/mlx5: Expose DEVX treeYishai Hadas2018-06-191-0/+3
| | | | | | | | | | | | | | | | Expose DEVX tree to be used by upper layers. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | IB/mlx5: Add support for DEVX query UARYishai Hadas2018-06-191-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Return a device UAR index for a given user index via the DEVX interface. Security note: The hardware protection mechanism works like this: Each device object that is subject to UAR doorbells (QP/SQ/CQ) gets a UAR ID (called uar_page in the device specification manual) upon its creation. Then upon doorbell, hardware fetches the object context for which the doorbell was rang, and validates that the UAR through which the DB was rang matches the UAR ID of the object. If no match the doorbell is silently ignored by the hardware. Of course, the user cannot ring a doorbell on a UAR that was not mapped to it. Now in devx, as the devx kernel does not manipulate the QP/SQ/CQ command mailboxes (except tagging them with UID), we expose to the user its UAR ID, so it can embed it in these objects in the expected specification format. So the only thing the user can do is hurt itself by creating a QP/SQ/CQ with a UAR ID other than his, and then in this case other users may ring a doorbell on its objects. The consequence of that will be that another user can schedule a QP/SQ of the buggy user for execution (just insert it to the hardware schedule queue or arm its CQ for event generation), no further harm is expected. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | IB/mlx5: Introduce DEVXYishai Hadas2018-06-191-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce DEVX to enable direct device commands in downstream patches from this series. In that mode of work the firmware manages the isolation between processes' resources and as such a DEVX user id is created and assigned to the given user context upon allocation request. A capability check is done to make sure that this feature is really supported by the firmware prior to creating the DEVX user id. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | RDMA: Convert drivers to use sgid_attr instead of sgid_indexParav Pandit2018-06-181-4/+2
|/ | | | | | | | | | | | | The core code now ensures that all driver callbacks that receive an rdma_ah_attrs will have a sgid_attr's pointer if there is a GRH present. Drivers can use this pointer instead of calling a query function with sgid_index. This simplifies the drivers and also avoids races where a gid_index lookup may return different data if it is changed. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
* IB/mlx5: Add flow counters read supportRaed Salem2018-06-021-1/+12
| | | | | | | | | Implements the flow counters read wrapper. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Raed Salem <raeds@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/mlx5: Add flow counters binding supportRaed Salem2018-06-021-0/+15
| | | | | | | | | | | | | | Associates a counters with a flow when IB_FLOW_SPEC_ACTION_COUNT is part of the flow specifications. The counters user space placements of location and description (index, description) pairs are passed as private data of the counters flow specification. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Raed Salem <raeds@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/mlx5: Add counters create and destroy supportRaed Salem2018-06-021-0/+10
| | | | | | | | | | | | | This patch implements the device counters create and destroy APIs and introducing some internal management structures. Downstream patches in this series will add the functionality to support flow counters binding and reading. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Raed Salem <raeds@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/mlx5: Device memory mr registration supportAriel Levkovich2018-04-051-0/+9
| | | | | | | | | | Adding mlx5_ib driver implementation for reg_dm_mr callback which allows registering device memory (DM) as an MR for local and remote access. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/mlx5: Device memory support in mlx5_ibAriel Levkovich2018-04-051-1/+34
| | | | | | | | | | | | | | | | | | This patch adds the mlx5_ib driver implementation for the device memory allocation API. It implements the ib_device callbacks for allocation and deallocation operations as well as a new mmap command support which allows mapping an allocated device memory to a VMA. The change also adds reporting of device memory maximum size and alignment parameters reported in device capabilities. The allocation/deallocation operations are using new firmware commands to allocate MEMIC memory on the device. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/mlx5: Add IPsec support for egress and ingressAviad Yehezkel2018-04-041-0/+2
| | | | | | | | | | | | This commit introduces support for the esp_aes_gcm flow specification for the Innova device. To that end we add support for egress steering and some validations that an IPsec rule is indeed valid. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/mlx5: Add implementation for create and destroy action_xfrmAviad Yehezkel2018-04-041-0/+16
| | | | | | | | | | | | | | | Adding implementation in mlx5 driver to create and destroy action_xfrm object. This merely call the accel layer. A user may pass MLX5_IB_XFRM_FLAGS_REQUIRE_METADATA flag which states that [s]he expects a metadata header to be added to the payload. This header represents information regarding the transformation's state. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/mlx5: Initialize the parsing tree root without the help of uverbsMatan Barak2018-04-041-0/+1
| | | | | | | | | | | | | | | | In order to have a custom parsing tree, a provider driver needs to assign its parsing tree to ib_device specs_tree field. Otherwise, the uverbs client assigns a common default parsing tree for it. In downstream patches, the mlx5_ib driver gains a custom parsing tree, which contains both the common objects and a new flags field for the UVERBS_FLOW_ACTION_ESP_CREATE command. This patch makes mlx5_ib assign its own tree to specs_root, which later on will be extended. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/mlx5: Packet packing enhancement for RAW QPBodong Wang2018-03-191-1/+1
| | | | | | | | | | | | | | | | | | Enable RAW QP to be able to configure burst control by modify_qp. By using burst control with rate limiting, user can achieve best performance and accuracy. The burst control information is passed by user through udata. This patch also reports burst control capability for mlx5 related hardwares, burst control is only marked as supported when both packet_pacing_burst_bound and packet_pacing_typical_size are supported. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* Merge branch 'k.o/wip/dl-for-rc' into k.o/wip/dl-for-nextDoug Ledford2018-03-141-3/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Due to bug fixes found by the syzkaller bot and taken into the for-rc branch after development for the 4.17 merge window had already started being taken into the for-next branch, there were fairly non-trivial merge issues that would need to be resolved between the for-rc branch and the for-next branch. This merge resolves those conflicts and provides a unified base upon which ongoing development for 4.17 can be based. Conflicts: drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f9524 (IB/mlx5: Fix cleanup order on unload) added to for-rc and commit b5ca15ad7e61 (IB/mlx5: Add proper representors support) add as part of the devel cycle both needed to modify the init/de-init functions used by mlx5. To support the new representors, the new functions added by the cleanup patch needed to be made non-static, and the init/de-init list added by the representors patch needed to be modified to match the init/de-init list changes made by the cleanup patch. Updates: drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function prototypes added by representors patch to reflect new function names as changed by cleanup patch drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init stage list to match new order from cleanup patch Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/mlx5: Fix cleanup order on unloadMark Bloch2018-03-141-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On load we create private CQ/QP/PD in order to be used by UMR, we create those resources after we register ourself as an IB device, and we destroy them after we unregister as an IB device. This was changed by commit 16c1975f1032 ("IB/mlx5: Create profile infrastructure to add and remove stages") which moved the destruction before we unregistration. This allowed to trigger an invalid memory access when unloading mlx5_ib while there are open resources: BUG: unable to handle kernel paging request at 00000001002c012c ... Call Trace: mlx5_ib_post_send_wait+0x75/0x110 [mlx5_ib] __slab_free+0x9a/0x2d0 delay_time_func+0x10/0x10 [mlx5_ib] unreg_umr.isra.15+0x4b/0x50 [mlx5_ib] mlx5_mr_cache_free+0x46/0x150 [mlx5_ib] clean_mr+0xc9/0x190 [mlx5_ib] dereg_mr+0xba/0xf0 [mlx5_ib] ib_dereg_mr+0x13/0x20 [ib_core] remove_commit_idr_uobject+0x16/0x70 [ib_uverbs] uverbs_cleanup_ucontext+0xe8/0x1a0 [ib_uverbs] ib_uverbs_cleanup_ucontext.isra.9+0x19/0x40 [ib_uverbs] ib_uverbs_remove_one+0x162/0x2e0 [ib_uverbs] ib_unregister_device+0xd4/0x190 [ib_core] __mlx5_ib_remove+0x2e/0x40 [mlx5_ib] mlx5_remove_device+0xf5/0x120 [mlx5_core] mlx5_unregister_interface+0x37/0x90 [mlx5_core] mlx5_ib_cleanup+0xc/0x225 [mlx5_ib] SyS_delete_module+0x153/0x230 do_syscall_64+0x62/0x110 entry_SYSCALL_64_after_hwframe+0x21/0x86 ... We restore the original behavior by breaking the UMR stage into two parts, pre and post IB registration stages, this way we can restore the original functionality and maintain clean separation of logic between stages. Fixes: 16c1975f1032 ("IB/mlx5: Create profile infrastructure to add and remove stages") Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* | IB/mlx5: Maintain a single emergency pageIlya Lesokhin2018-03-141-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The mlx5 driver needs to be able to issue invalidation to ODP MRs even if it cannot allocate memory. To this end it preallocates emergency pages to use when the situation arises. This flow should be extremely rare enough, that we don't need to worry about contention and therefore a single emergency page is good enough. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
* | IB/mlx5: Add proper representors supportMark Bloch2018-02-231-0/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit adds full support for IB representor: 1) Representors profile, We add two new profiles: nic_rep_profile - This profile will be used to create an IB device that represents the PF/UPLINK. rep_profile - This profile will be used to create an IB device that represents VFs. Each VF will be its own representor. 2) Proper load/unload callbacks, Those are called by the E-Switch when moving to/from switchdev mode. 3) Different flow DB handling for when we in switchdev mode. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* | IB/mlx5: E-Switch, Add rule to forward traffic to vportMark Bloch2018-02-231-0/+1
| | | | | | | | | | | | | | | | | | | | In order to forward traffic from representor's SQ to the right virtual function, every time an SQ is created also add the corresponding flow rule to the FDB. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* | IB/mlx5: When in switchdev mode, expose only raw packet capabilitiesMark Bloch2018-02-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | Currently in switchdev mode we allow only for raw packet QPs. Expose the right capabilities and set the gid table length to 0, also make sure we don't try to enable RoCE, so split the function to enable RoCE so representors can enable only the notifier needed for net device events. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* | IB/mlx5: Allocate flow DB only on PF IB deviceMark Bloch2018-02-231-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | A flow DB is a shared resource between PF and representors, need to allocate it only when creating the PF IB device. Once we add IB representors, they will use the flow db which was created by the PF. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* | IB/mlx5: Add basic regiser/unregister representors codeMark Bloch2018-02-231-0/+2
| | | | | | | | | | | | | | | | | | | | | | Create the basic infrastructure of registering and unregistering IB representors. The load/unload callbacks are left empty and proper implementation will be introduced in following patches. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* | IB/mlx5: Implement fragmented completion queue (CQ)Yonatan Cohen2018-02-151-3/+3
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current implementation of create CQ requires contiguous memory, such requirement is problematic once the memory is fragmented or the system is low in memory, it causes for failures in dma_zalloc_coherent(). This patch implements new scheme of fragmented CQ to overcome this issue by introducing new type: 'struct mlx5_frag_buf_ctrl' to allocate fragmented buffers, rather than contiguous ones. Base the Completion Queues (CQs) on this new fragmented buffer. It fixes following crashes: kworker/29:0: page allocation failure: order:6, mode:0x80d0 CPU: 29 PID: 8374 Comm: kworker/29:0 Tainted: G OE 3.10.0 Workqueue: ib_cm cm_work_handler [ib_cm] Call Trace: [<>] dump_stack+0x19/0x1b [<>] warn_alloc_failed+0x110/0x180 [<>] __alloc_pages_slowpath+0x6b7/0x725 [<>] __alloc_pages_nodemask+0x405/0x420 [<>] dma_generic_alloc_coherent+0x8f/0x140 [<>] x86_swiotlb_alloc_coherent+0x21/0x50 [<>] mlx5_dma_zalloc_coherent_node+0xad/0x110 [mlx5_core] [<>] ? mlx5_db_alloc_node+0x69/0x1b0 [mlx5_core] [<>] mlx5_buf_alloc_node+0x3e/0xa0 [mlx5_core] [<>] mlx5_buf_alloc+0x14/0x20 [mlx5_core] [<>] create_cq_kernel+0x90/0x1f0 [mlx5_ib] [<>] mlx5_ib_create_cq+0x3b0/0x4e0 [mlx5_ib] Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* RDMA: Move enum ib_cq_creation_flags to uapi headersJason Gunthorpe2018-01-291-2/+2
| | | | | | | | The flags field the enum is used with comes directly from the uapi so it belongs in the uapi headers for clarity and so userspace can use it. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/mlx5: Mmap the HCA's clock info to user-spaceFeras Daoud2018-01-181-10/+0
| | | | | | | | | | | | | | | | | This patch maps the new page to user space applications to allow converting a user space completion timestamp to system wall time at the lowest possible latency cost. By using a versioning scheme we allow compatibility between current and future userspace libraries. The change moves mlx5_ib_mmap_cmd enum from mlx5_ib.h to the abi header file mlx5-abi.h. Reviewed-by: Alex Vesker <valex@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Eitan Rabin <rabin@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/mlx5: Update counter implementation for dual port RoCEDaniel Jurgens2018-01-081-0/+1
| | | | | | | | | | | | | Update the counter interface for multiple ports. Some counter sets always comes from the primary device. Port specific counters should be accessed per mlx5_core_dev not always through the IB master mdev. Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/mlx5: Change debugfs to have per port contentsParav Pandit2018-01-081-3/+4
| | | | | | | | | | When there are multiple ports for single IB(RoCE) device, support debugfs entries to be available for each port. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* {net, IB}/mlx5: Manage port association for multiport RoCEDaniel Jurgens2018-01-081-0/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When mlx5_ib_add is called determine if the mlx5 core device being added is capable of dual port RoCE operation. If it is, determine whether it is a master device or a slave device using the num_vhca_ports and affiliate_nic_vport_criteria capabilities. If the device is a slave, attempt to find a master device to affiliate it with. Devices that can be affiliated will share a system image guid. If none are found place it on a list of unaffiliated ports. If a master is found bind the port to it by configuring the port affiliation in the NIC vport context. Similarly when mlx5_ib_remove is called determine the port type. If it's a slave port, unaffiliate it from the master device, otherwise just remove it from the unaffiliated port list. The IB device is registered as a multiport device, even if a 2nd port is not available for affiliation. When the 2nd port is affiliated later the GID cache must be refreshed in order to get the default GIDs for the 2nd port in the cache. Export roce_rescan_device to provide a mechanism to refresh the cache after a new port is bound. In a multiport configuration all IB object (QP, MR, PD, etc) related commands should flow through the master mlx5_core_dev, other commands must be sent to the slave port mlx5_core_mdev, an interface is provide to get the correct mdev for non IB object commands. Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/mlx5: Make netdev notifications multiport capableDaniel Jurgens2018-01-081-1/+3
| | | | | | | | | | | When multiple RoCE ports are supported registration for events on multiple netdevs is required. Refactor the event registration and handling to support multiple ports. Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/mlx5: Add support for DC target QPMoni Shoua2018-01-081-0/+2
| | | | | | | | | | | | | | | | | | | A DC Target (DCT) QP is represented in the hardware as a unique object. This object is created by CREATE_DCT command and destroyed by DESTROY_DCT command. However, in the driver we describe it as a QP. The hardware command that creates a DCT needs parameters that the verb create_qp() does not provide. Those remaining parameters are provided with the call to the verb modify_qp(). Therefore we delay the actual creation of a DCT in the hardware until the stage of modify_qp() to RTR. A support for query_qp() was added as well. It uses QUERY_DCT command to retrieve the applicable fields. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/mlx5: Handle type IB_QPT_DRIVER when creating a QPMoni Shoua2018-01-081-0/+10
| | | | | | | | | | | | | | | | | | | The QP type IB_QPT_DRIVER doesn't describe the transport or the service that the QP provides but those are known only to the hardware driver. The actual type of the QP is stored in the hardware driver context (i.e. mlx5_qp) under the field qp_sub_type. Take the real QP type and any extra data that is required to create the QP from the driver channel and modify the QP initial attributes before continuing with create_qp(). Downstream patches from this series will add support for both DCI and DCT driver QPs. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/mlx5: Move locks initialization to the corresponding stageMark Bloch2018-01-031-2/+0
| | | | | | | | | | Unconditional locks/list and ODP srcu initialization should be done in the INIT stage. Remove those from the CAPS stage and move them to the proper stage. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/mlx5: Move loopback initialization to the corresponding stageMark Bloch2018-01-031-1/+0
| | | | | | | | | The loopback stage only initializes a lock, move it to be in the CAPS initialization phase and get rid loopback step completely. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/mlx5: Create profile infrastructure to add and remove stagesMark Bloch2018-01-031-0/+31
| | | | | | | | | | | | | | | Today we have single function which is used when we add an IB interface, break this function into multiple functions. Create stages and a generic mechanism to execute each stage. This is in preparation for RDMA/IB representors which might not need all stages or will do things differently in some of the stages. This patch doesn't change any functionality. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/mlx5: Enable QP creation with a given blue flame indexYishai Hadas2017-12-281-0/+1
| | | | | | | | | | This patch enables QP creation with a given BF index, this allows the user space driver to share same BF between few QPs or alternatively have a dedicated BF per QP. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/mlx5: Expose dynamic mmap allocationYishai Hadas2017-12-281-0/+8
| | | | | | | | | | | | | | | | | This patch exposes the option to dynamic allocates a UAR, this functionality will be used in downstream patch in this series as part of QP creation. Specifically, the user space driver asks for a UAR allocation in a given page index, upon success this UAR and its bfregs can be used as part of QP creation by the user space driver. To enable allocating more than 256 UARs the page index is encoded in an extra one byte just after the command byte. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/mlx5: Extend UAR stuff to support dynamic allocationYishai Hadas2017-12-281-3/+3
| | | | | | | | | | | | | | | | | | | | | This patch extends the alloc context flow to be prepared for working with dynamic UAR allocations. Currently upon alloc context there is some fix size of UARs that are allocated (named 'static allocation') and there is no option to user application to ask for more or control which UAR will be used by which QP. In this patch the driver prepares its data structures to manage both the static and the dynamic allocations and let the user driver knows about the max value of dynamic blue-flame registers that are allowed. Downstream patches from this series will enable the dynamic allocation and the association as part of QP creation. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/mlx5: Serialize access to the VMA listMajd Dibbiny2017-12-271-0/+4
| | | | | | | | | | | | | | | | User-space applications can do mmap and munmap directly at any time. Since the VMA list is not protected with a mutex, concurrent accesses to the VMA list from the mmap and munmap can cause data corruption. Add a mutex around the list. Cc: <stable@vger.kernel.org> # v4.7 Fixes: 7c2344c3bbf9 ("IB/mlx5: Implements disassociate_ucontext API") Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/mlx5: Add PCI write end padding supportNoa Osherovich2017-11-101-1/+2
| | | | | | | | | | | | | | | | | Add the PCI write end padding flag to device_cap_flags enum and set it during mlx5_ib_query_device so it will be reported to user-space. During WQ/QP creation, set that capability for WQ/QP if user requested it and HW supports it. PCI write end padding modification is not supported for now. There's no such flag for a QP but for a WQ, create and modify use the same flag. Return an error if PCI write end padding flag is set during modify_wq. Signed-off-by: Noa Osherovich <noaos@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/mlx5: Add tunneling offloads supportMaor Gottlieb2017-10-251-0/+3
| | | | | | | | | | | | | | | | | | | | The device can support receive Stateless Offloads for the inner packet's fields only when the packet is processed by TIR which is enabled to support tunneling. Otherwise, the device treats the packet as an ordinary non-tunneling packet and receive offloads can be done only for the outer packet's field. In order to enable receive Stateless Offloading support for incoming tunneling traffic the TIR should be created with tunneled_offload_en. Tunneling offloads is supported only be raw ethernet QP. This patch includes: * New QP creation flag for tunneling offloads. * Reports device capabilities. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/mlx5: Support padded 128B CQE featureGuy Levi2017-10-251-0/+5
| | | | | | | | | | | | | | In some benchmarks and some CPU architectures, writing the CQE on a full cache line size improves performance by saving memory access operations (read-modify-write) relative to partial cache line change. This patch lets the user to configure the device to pad the CQE up to 128B in case its content is less than 128B. Currently the driver supports only padding for a CQE size of 128B. Signed-off-by: Guy Levi <guyle@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
OpenPOWER on IntegriCloud