<feed xmlns='http://www.w3.org/2005/Atom'>
<title>talos-op-linux/drivers/infiniband, branch master</title>
<subtitle>Talos™ II Linux sources for OpenPOWER</subtitle>
<id>https://git.raptorcs.com/git/talos-op-linux/atom?h=master</id>
<link rel='self' href='https://git.raptorcs.com/git/talos-op-linux/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/'/>
<updated>2020-02-14T19:21:52+00:00</updated>
<entry>
<title>IB/mlx5: Use div64_u64 for num_var_hw_entries calculation</title>
<updated>2020-02-14T19:21:52+00:00</updated>
<author>
<name>Jason Gunthorpe</name>
<email>jgg@mellanox.com</email>
</author>
<published>2020-02-06T14:27:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=685eff513183d6d64a5f413531e683d23b8b198b'/>
<id>urn:sha1:685eff513183d6d64a5f413531e683d23b8b198b</id>
<content type='text'>
On i386:

ERROR: "__udivdi3" [drivers/infiniband/hw/mlx5/mlx5_ib.ko] undefined!
ERROR: "__divdi3" [drivers/infiniband/hw/mlx5/mlx5_ib.ko] undefined!

Fixes: f164be8c0366 ("IB/mlx5: Extend caps stage to handle VAR capabilities")
Reported-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Acked-by: Randy Dunlap &lt;rdunlap@infradead.org&gt; # build-tested
Reported-by: Alexander Lobakin &lt;alobakin@dlink.ru&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@mellanox.com&gt;
</content>
</entry>
<entry>
<title>RDMA/core: Fix protection fault in get_pkey_idx_qp_list</title>
<updated>2020-02-13T16:31:56+00:00</updated>
<author>
<name>Leon Romanovsky</name>
<email>leonro@mellanox.com</email>
</author>
<published>2020-02-12T08:06:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=1dd017882e01d2fcd9c5dbbf1eb376211111c393'/>
<id>urn:sha1:1dd017882e01d2fcd9c5dbbf1eb376211111c393</id>
<content type='text'>
We don't need to set pkey as valid in case that user set only one of pkey
index or port number, otherwise it will be resulted in NULL pointer
dereference while accessing to uninitialized pkey list.  The following
crash from Syzkaller revealed it.

  kasan: CONFIG_KASAN_INLINE enabled
  kasan: GPF could be caused by NULL-ptr deref or user memory access
  general protection fault: 0000 [#1] SMP KASAN PTI
  CPU: 1 PID: 14753 Comm: syz-executor.2 Not tainted 5.5.0-rc5 #2
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
  rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  RIP: 0010:get_pkey_idx_qp_list+0x161/0x2d0
  Code: 01 00 00 49 8b 5e 20 4c 39 e3 0f 84 b9 00 00 00 e8 e4 42 6e fe 48
  8d 7b 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 &lt;0f&gt; b6 04
  02 84 c0 74 08 3c 01 0f 8e d0 00 00 00 48 8d 7d 04 48 b8
  RSP: 0018:ffffc9000bc6f950 EFLAGS: 00010202
  RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff82c8bdec
  RDX: 0000000000000002 RSI: ffffc900030a8000 RDI: 0000000000000010
  RBP: ffff888112c8ce80 R08: 0000000000000004 R09: fffff5200178df1f
  R10: 0000000000000001 R11: fffff5200178df1f R12: ffff888115dc4430
  R13: ffff888115da8498 R14: ffff888115dc4410 R15: ffff888115da8000
  FS:  00007f20777de700(0000) GS:ffff88811b100000(0000)
  knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000001b2f721000 CR3: 00000001173ca002 CR4: 0000000000360ee0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   port_pkey_list_insert+0xd7/0x7c0
   ib_security_modify_qp+0x6fa/0xfc0
   _ib_modify_qp+0x8c4/0xbf0
   modify_qp+0x10da/0x16d0
   ib_uverbs_modify_qp+0x9a/0x100
   ib_uverbs_write+0xaa5/0xdf0
   __vfs_write+0x7c/0x100
   vfs_write+0x168/0x4a0
   ksys_write+0xc8/0x200
   do_syscall_64+0x9c/0x390
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: d291f1a65232 ("IB/core: Enforce PKey security on QPs")
Link: https://lore.kernel.org/r/20200212080651.GB679970@unreal
Signed-off-by: Maor Gottlieb &lt;maorg@mellanox.com&gt;
Signed-off-by: Leon Romanovsky &lt;leonro@mellanox.com&gt;
Message-Id: &lt;20200212080651.GB679970@unreal&gt;
</content>
</entry>
<entry>
<title>RDMA/rxe: Fix soft lockup problem due to using tasklets in softirq</title>
<updated>2020-02-13T14:51:09+00:00</updated>
<author>
<name>Zhu Yanjun</name>
<email>yanjunz@mellanox.com</email>
</author>
<published>2020-02-12T07:26:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=8ac0e6641c7ca14833a2a8c6f13d8e0a435e535c'/>
<id>urn:sha1:8ac0e6641c7ca14833a2a8c6f13d8e0a435e535c</id>
<content type='text'>
When run stress tests with RXE, the following Call Traces often occur

  watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [swapper/2:0]
  ...
  Call Trace:
  &lt;IRQ&gt;
  create_object+0x3f/0x3b0
  kmem_cache_alloc_node_trace+0x129/0x2d0
  __kmalloc_reserve.isra.52+0x2e/0x80
  __alloc_skb+0x83/0x270
  rxe_init_packet+0x99/0x150 [rdma_rxe]
  rxe_requester+0x34e/0x11a0 [rdma_rxe]
  rxe_do_task+0x85/0xf0 [rdma_rxe]
  tasklet_action_common.isra.21+0xeb/0x100
  __do_softirq+0xd0/0x298
  irq_exit+0xc5/0xd0
  smp_apic_timer_interrupt+0x68/0x120
  apic_timer_interrupt+0xf/0x20
  &lt;/IRQ&gt;
  ...

The root cause is that tasklet is actually a softirq. In a tasklet
handler, another softirq handler is triggered. Usually these softirq
handlers run on the same cpu core. So this will cause "soft lockup Bug".

Fixes: 8700e3e7c485 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200212072635.682689-8-leon@kernel.org
Signed-off-by: Zhu Yanjun &lt;yanjunz@mellanox.com&gt;
Signed-off-by: Leon Romanovsky &lt;leonro@mellanox.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@mellanox.com&gt;
</content>
</entry>
<entry>
<title>RDMA/mlx5: Prevent overflow in mmap offset calculations</title>
<updated>2020-02-13T14:39:23+00:00</updated>
<author>
<name>Leon Romanovsky</name>
<email>leonro@mellanox.com</email>
</author>
<published>2020-02-12T07:26:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=9b6d3bbc1335404b331f4f11dc896066bdf1c752'/>
<id>urn:sha1:9b6d3bbc1335404b331f4f11dc896066bdf1c752</id>
<content type='text'>
The cmd and index variables declared as u16 and the result is supposed to
be stored in u64. The C arithmetic rules doesn't promote "(index &gt;&gt; 8) &lt;&lt;
16" to be u64 and leaves the end result to be u16.

Fixes: 7be76bef320b ("IB/mlx5: Introduce VAR object and its alloc/destroy methods")
Link: https://lore.kernel.org/r/20200212072635.682689-10-leon@kernel.org
Signed-off-by: Leon Romanovsky &lt;leonro@mellanox.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@mellanox.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@mellanox.com&gt;
</content>
</entry>
<entry>
<title>IB/umad: Fix kernel crash while unloading ib_umad</title>
<updated>2020-02-13T14:00:50+00:00</updated>
<author>
<name>Yonatan Cohen</name>
<email>yonatanc@mellanox.com</email>
</author>
<published>2020-02-12T07:26:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=9ea04d0df6e6541c6736b43bff45f1e54875a1db'/>
<id>urn:sha1:9ea04d0df6e6541c6736b43bff45f1e54875a1db</id>
<content type='text'>
When disassociating a device from umad we must ensure that the sysfs
access is prevented before blocking the fops, otherwise assumptions in
syfs don't hold:

	    CPU0            	        CPU1
	 ib_umad_kill_port()        ibdev_show()
	    port-&gt;ib_dev = NULL
                                      dev_name(port-&gt;ib_dev)

The prior patch made an error in moving the device_destroy(), it should
have been split into device_del() (above) and put_device() (below). At
this point we already have the split, so move the device_del() back to its
original place.

  kernel stack
  PF: error_code(0x0000) - not-present page
  Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
  RIP: 0010:ibdev_show+0x18/0x50 [ib_umad]
  RSP: 0018:ffffc9000097fe40 EFLAGS: 00010282
  RAX: 0000000000000000 RBX: ffffffffa0441120 RCX: ffff8881df514000
  RDX: ffff8881df514000 RSI: ffffffffa0441120 RDI: ffff8881df1e8870
  RBP: ffffffff81caf000 R08: ffff8881df1e8870 R09: 0000000000000000
  R10: 0000000000001000 R11: 0000000000000003 R12: ffff88822f550b40
  R13: 0000000000000001 R14: ffffc9000097ff08 R15: ffff8882238bad58
  FS:  00007f1437ff3740(0000) GS:ffff888236940000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000000000004e8 CR3: 00000001e0dfc001 CR4: 00000000001606e0
  Call Trace:
   dev_attr_show+0x15/0x50
   sysfs_kf_seq_show+0xb8/0x1a0
   seq_read+0x12d/0x350
   vfs_read+0x89/0x140
   ksys_read+0x55/0xd0
   do_syscall_64+0x55/0x1b0
   entry_SYSCALL_64_after_hwframe+0x44/0xa9:

Fixes: cf7ad3030271 ("IB/umad: Avoid destroying device while it is accessed")
Link: https://lore.kernel.org/r/20200212072635.682689-9-leon@kernel.org
Signed-off-by: Yonatan Cohen &lt;yonatanc@mellanox.com&gt;
Signed-off-by: Leon Romanovsky &lt;leonro@mellanox.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@mellanox.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@mellanox.com&gt;
</content>
</entry>
<entry>
<title>RDMA/mlx5: Fix async events cleanup flows</title>
<updated>2020-02-13T13:48:17+00:00</updated>
<author>
<name>Yishai Hadas</name>
<email>yishaih@mellanox.com</email>
</author>
<published>2020-02-12T07:26:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=a8af8694a5e8ddaaef4bd7b6426c12b7759c846c'/>
<id>urn:sha1:a8af8694a5e8ddaaef4bd7b6426c12b7759c846c</id>
<content type='text'>
As in the prior patch, the devx code is not fully cleaning up its
event_lists before finishing driver_destroy allowing a later read to
trigger user after free conditions.

Re-arrange things so that the event_list is always empty after destroy and
ensure it remains empty until the file is closed.

Fixes: f7c8416ccea5 ("RDMA/core: Simplify destruction of FD uobjects")
Link: https://lore.kernel.org/r/20200212072635.682689-7-leon@kernel.org
Signed-off-by: Yishai Hadas &lt;yishaih@mellanox.com&gt;
Signed-off-by: Leon Romanovsky &lt;leonro@mellanox.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@mellanox.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@mellanox.com&gt;
</content>
</entry>
<entry>
<title>RDMA/core: Add missing list deletion on freeing event queue</title>
<updated>2020-02-13T13:44:49+00:00</updated>
<author>
<name>Michael Guralnik</name>
<email>michaelgur@mellanox.com</email>
</author>
<published>2020-02-12T07:26:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=a0767da7774d91a668f9c223cec3e76172cd833b'/>
<id>urn:sha1:a0767da7774d91a668f9c223cec3e76172cd833b</id>
<content type='text'>
When the uobject file scheme was revised to allow device disassociation
from the file it became possible for read() to still happen the driver
destroys the uobject.

The old clode code was not tolerant to concurrent read, and when it was
moved to the driver destroy it creates a bug.

Ensure the event_list is empty after driver destroy by adding the missing
list_del(). Otherwise read() can trigger a use after free and double
kfree.

Fixes: f7c8416ccea5 ("RDMA/core: Simplify destruction of FD uobjects")
Link: https://lore.kernel.org/r/20200212072635.682689-6-leon@kernel.org
Signed-off-by: Michael Guralnik &lt;michaelgur@mellanox.com&gt;
Reviewed-by: Yishai Hadas &lt;yishaih@mellanox.com&gt;
Signed-off-by: Leon Romanovsky &lt;leonro@mellanox.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@mellanox.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@mellanox.com&gt;
</content>
</entry>
<entry>
<title>RDMA/siw: Remove unwanted WARN_ON in siw_cm_llp_data_ready()</title>
<updated>2020-02-11T18:33:58+00:00</updated>
<author>
<name>Krishnamraju Eraparaju</name>
<email>krishna2@chelsio.com</email>
</author>
<published>2020-02-07T14:14:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=663218a3e715fd9339d143a3e10088316b180f4f'/>
<id>urn:sha1:663218a3e715fd9339d143a3e10088316b180f4f</id>
<content type='text'>
Warnings like below can fill up the dmesg while disconnecting RDMA
connections.
Hence, remove the unwanted WARN_ON.

  WARNING: CPU: 6 PID: 0 at drivers/infiniband/sw/siw/siw_cm.c:1229 siw_cm_llp_data_ready+0xc1/0xd0 [siw]
  RIP: 0010:siw_cm_llp_data_ready+0xc1/0xd0 [siw]
  Call Trace:
   &lt;IRQ&gt;
   tcp_data_queue+0x226/0xb40
   tcp_rcv_established+0x220/0x620
   tcp_v4_do_rcv+0x12a/0x1e0
   tcp_v4_rcv+0xb05/0xc00
   ip_local_deliver_finish+0x69/0x210
   ip_local_deliver+0x6b/0xe0
   ip_rcv+0x273/0x362
   __netif_receive_skb_core+0xb35/0xc30
   netif_receive_skb_internal+0x3d/0xb0
   napi_gro_frags+0x13b/0x200
   t4_ethrx_handler+0x433/0x7d0 [cxgb4]
   process_responses+0x318/0x580 [cxgb4]
   napi_rx_handler+0x14/0x100 [cxgb4]
   net_rx_action+0x149/0x3b0
   __do_softirq+0xe3/0x30a
   irq_exit+0x100/0x110
   do_IRQ+0x7f/0xe0
   common_interrupt+0xf/0xf
   &lt;/IRQ&gt;

Link: https://lore.kernel.org/r/20200207141429.27927-1-krishna2@chelsio.com
Signed-off-by: Krishnamraju Eraparaju &lt;krishna2@chelsio.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@mellanox.com&gt;
</content>
</entry>
<entry>
<title>RDMA/iw_cxgb4: initiate CLOSE when entering TERM</title>
<updated>2020-02-11T18:28:00+00:00</updated>
<author>
<name>Krishnamraju Eraparaju</name>
<email>krishna2@chelsio.com</email>
</author>
<published>2020-02-04T09:12:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=d219face9059f38ad187bde133451a2a308fdb7c'/>
<id>urn:sha1:d219face9059f38ad187bde133451a2a308fdb7c</id>
<content type='text'>
As per draft-hilland-iwarp-verbs-v1.0, sec 6.2.3, always initiate a CLOSE
when entering into TERM state.

In c4iw_modify_qp(), disconnect operation should only be performed when
the modify_qp call is invoked from ib_core. And all other internal
modify_qp calls(invoked within iw_cxgb4) that needs 'disconnect' should
call c4iw_ep_disconnect() explicitly after modify_qp. Otherwise, deadlocks
like below can occur:

 Call Trace:
  schedule+0x2f/0xa0
  schedule_preempt_disabled+0xa/0x10
  __mutex_lock.isra.5+0x2d0/0x4a0
  c4iw_ep_disconnect+0x39/0x430    =&gt; tries to reacquire ep lock again
  c4iw_modify_qp+0x468/0x10d0
  rx_data+0x218/0x570              =&gt; acquires ep lock
  process_work+0x5f/0x70
  process_one_work+0x1a7/0x3b0
  worker_thread+0x30/0x390
  kthread+0x112/0x130
  ret_from_fork+0x35/0x40

Fixes: d2c33370ae73 ("RDMA/iw_cxgb4: Always disconnect when QP is transitioning to TERMINATE state")
Link: https://lore.kernel.org/r/20200204091230.7210-1-krishna2@chelsio.com
Signed-off-by: Krishnamraju Eraparaju &lt;krishna2@chelsio.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@mellanox.com&gt;
</content>
</entry>
<entry>
<title>IB/mlx5: Return failure when rts2rts_qp_counters_set_id is not supported</title>
<updated>2020-02-11T18:23:01+00:00</updated>
<author>
<name>Mark Zhang</name>
<email>markz@mellanox.com</email>
</author>
<published>2020-01-26T17:17:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-op-linux/commit/?id=10189e8e6fe8dcde13435f9354800429c4474fb1'/>
<id>urn:sha1:10189e8e6fe8dcde13435f9354800429c4474fb1</id>
<content type='text'>
When binding a QP with a counter and the QP state is not RESET, return
failure if the rts2rts_qp_counters_set_id is not supported by the
device.

This is to prevent cases like manual bind for Connect-IB devices from
returning success when the feature is not supported.

Fixes: d14133dd4161 ("IB/mlx5: Support set qp counter")
Link: https://lore.kernel.org/r/20200126171708.5167-1-leon@kernel.org
Signed-off-by: Mark Zhang &lt;markz@mellanox.com&gt;
Reviewed-by: Maor Gottlieb &lt;maorg@mellanox.com&gt;
Signed-off-by: Leon Romanovsky &lt;leonro@mellanox.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@mellanox.com&gt;
</content>
</entry>
</feed>
