<feed xmlns='http://www.w3.org/2005/Atom'>
<title>blackbird-op-linux/drivers/nvme, branch master</title>
<subtitle>Blackbird™ Linux sources for OpenPOWER</subtitle>
<id>https://git.raptorcs.com/git/blackbird-op-linux/atom?h=master</id>
<link rel='self' href='https://git.raptorcs.com/git/blackbird-op-linux/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/blackbird-op-linux/'/>
<updated>2020-02-16T20:35:52+00:00</updated>
<entry>
<title>Merge tag 'block-5.6-2020-02-16' of git://git.kernel.dk/linux-block</title>
<updated>2020-02-16T20:35:52+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-02-16T20:35:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/blackbird-op-linux/commit/?id=e29c6a13ddf56217563f03fbc6ba9bb72dcbf2e4'/>
<id>urn:sha1:e29c6a13ddf56217563f03fbc6ba9bb72dcbf2e4</id>
<content type='text'>
Pull block fixes from Jens Axboe:
 "Not a lot here, which is great, basically just three small bcache
  fixes from Coly, and four NVMe fixes via Keith"

* tag 'block-5.6-2020-02-16' of git://git.kernel.dk/linux-block:
  nvme: fix the parameter order for nvme_get_log in nvme_get_fw_slot_info
  nvme/pci: move cqe check after device shutdown
  nvme: prevent warning triggered by nvme_stop_keep_alive
  nvme/tcp: fix bug on double requeue when send fails
  bcache: remove macro nr_to_fifo_front()
  bcache: Revert "bcache: shrink btree node cache after bch_btree_check()"
  bcache: ignore pending signals when creating gc and allocator thread
</content>
</entry>
<entry>
<title>nvme: fix the parameter order for nvme_get_log in nvme_get_fw_slot_info</title>
<updated>2020-02-14T17:12:04+00:00</updated>
<author>
<name>Yi Zhang</name>
<email>yi.zhang@redhat.com</email>
</author>
<published>2020-02-14T10:48:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/blackbird-op-linux/commit/?id=f25372ffc3f6c2684b57fb718219137e6ee2b64c'/>
<id>urn:sha1:f25372ffc3f6c2684b57fb718219137e6ee2b64c</id>
<content type='text'>
nvme fw-activate operation will get bellow warning log,
fix it by update the parameter order

[  113.231513] nvme nvme0: Get FW SLOT INFO log error

Fixes: 0e98719b0e4b ("nvme: simplify the API for getting log pages")
Reported-by: Sujith Pandel &lt;sujith_pandel@dell.com&gt;
Reviewed-by: David Milburn &lt;dmilburn@redhat.com&gt;
Signed-off-by: Yi Zhang &lt;yi.zhang@redhat.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>nvme/pci: move cqe check after device shutdown</title>
<updated>2020-02-14T17:12:04+00:00</updated>
<author>
<name>Keith Busch</name>
<email>kbusch@kernel.org</email>
</author>
<published>2020-02-12T16:41:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/blackbird-op-linux/commit/?id=fa46c6fb5d61b1f17b06d7c6ef75478b576304c7'/>
<id>urn:sha1:fa46c6fb5d61b1f17b06d7c6ef75478b576304c7</id>
<content type='text'>
Many users have reported nvme triggered irq_startup() warnings during
shutdown. The driver uses the nvme queue's irq to synchronize scanning
for completions, and enabling an interrupt affined to only offline CPUs
triggers the alarming warning.

Move the final CQE check to after disabling the device and all
registered interrupts have been torn down so that we do not have any
IRQ to synchronize.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206509
Reviewed-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>nvme: prevent warning triggered by nvme_stop_keep_alive</title>
<updated>2020-02-14T17:12:04+00:00</updated>
<author>
<name>Nigel Kirkland</name>
<email>nigel.kirkland@broadcom.com</email>
</author>
<published>2020-02-11T00:01:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/blackbird-op-linux/commit/?id=97b2512ad000a409b4073dd1a71e4157d76675cb'/>
<id>urn:sha1:97b2512ad000a409b4073dd1a71e4157d76675cb</id>
<content type='text'>
Delayed keep alive work is queued on system workqueue and may be cancelled
via nvme_stop_keep_alive from nvme_reset_wq, nvme_fc_wq or nvme_wq.

Check_flush_dependency detects mismatched attributes between the work-queue
context used to cancel the keep alive work and system-wq. Specifically
system-wq does not have the WQ_MEM_RECLAIM flag, whereas the contexts used
to cancel keep alive work have WQ_MEM_RECLAIM flag.

Example warning:

  workqueue: WQ_MEM_RECLAIM nvme-reset-wq:nvme_fc_reset_ctrl_work [nvme_fc]
	is flushing !WQ_MEM_RECLAIM events:nvme_keep_alive_work [nvme_core]

To avoid the flags mismatch, delayed keep alive work is queued on nvme_wq.

However this creates a secondary concern where work and a request to cancel
that work may be in the same work queue - namely err_work in the rdma and
tcp transports, which will want to flush/cancel the keep alive work which
will now be on nvme_wq.

After reviewing the transports, it looks like err_work can be moved to
nvme_reset_wq. In fact that aligns them better with transition into
RESETTING and performing related reset work in nvme_reset_wq.

Change nvme-rdma and nvme-tcp to perform err_work in nvme_reset_wq.

Signed-off-by: Nigel Kirkland &lt;nigel.kirkland@broadcom.com&gt;
Signed-off-by: James Smart &lt;jsmart2021@gmail.com&gt;
Reviewed-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>nvme/tcp: fix bug on double requeue when send fails</title>
<updated>2020-02-14T17:12:04+00:00</updated>
<author>
<name>Anton Eidelman</name>
<email>anton@lightbitslabs.com</email>
</author>
<published>2020-02-10T18:37:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/blackbird-op-linux/commit/?id=2d570a7c0251c594489a2c16b82b14ae30345c03'/>
<id>urn:sha1:2d570a7c0251c594489a2c16b82b14ae30345c03</id>
<content type='text'>
When nvme_tcp_io_work() fails to send to socket due to
connection close/reset, error_recovery work is triggered
from nvme_tcp_state_change() socket callback.
This cancels all the active requests in the tagset,
which requeues them.

The failed request, however, was ended and thus requeued
individually as well unless send returned -EPIPE.
Another return code to be treated the same way is -ECONNRESET.

Double requeue caused BUG_ON(blk_queued_rq(rq))
in blk_mq_requeue_request() from either the individual requeue
of the failed request or the bulk requeue from
blk_mq_tagset_busy_iter(, nvme_cancel_request, );

Signed-off-by: Anton Eidelman &lt;anton@lightbitslabs.com&gt;
Reviewed-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>Merge tag 'block-5.6-2020-02-05' of git://git.kernel.dk/linux-block</title>
<updated>2020-02-06T06:15:23+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-02-06T06:15:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/blackbird-op-linux/commit/?id=ed535f2c9e00eafdeb57d6310b7c8c5a009a9262'/>
<id>urn:sha1:ed535f2c9e00eafdeb57d6310b7c8c5a009a9262</id>
<content type='text'>
Pull more block updates from Jens Axboe:
 "Some later arrivals, but all fixes at this point:

   - bcache fix series (Coly)

   - Series of BFQ fixes (Paolo)

   - NVMe pull request from Keith with a few minor NVMe fixes

   - Various little tweaks"

* tag 'block-5.6-2020-02-05' of git://git.kernel.dk/linux-block: (23 commits)
  nvmet: update AEN list and array at one place
  nvmet: Fix controller use after free
  nvmet: Fix error print message at nvmet_install_queue function
  brd: check and limit max_part par
  nvme-pci: remove nvmeq-&gt;tags
  nvmet: fix dsm failure when payload does not match sgl descriptor
  nvmet: Pass lockdep expression to RCU lists
  block, bfq: clarify the goal of bfq_split_bfqq()
  block, bfq: get a ref to a group when adding it to a service tree
  block, bfq: remove ifdefs from around gets/puts of bfq groups
  block, bfq: extend incomplete name of field on_st
  block, bfq: get extra ref to prevent a queue from being freed during a group move
  block, bfq: do not insert oom queue into position tree
  block, bfq: do not plug I/O for bfq_queues with no proc refs
  bcache: check return value of prio_read()
  bcache: fix incorrect data type usage in btree_flush_write()
  bcache: add readahead cache policy options via sysfs interface
  bcache: explicity type cast in bset_bkey_last()
  bcache: fix memory corruption in bch_cache_accounting_clear()
  xen/blkfront: limit allocated memory size to actual use case
  ...
</content>
</entry>
<entry>
<title>nvmet: update AEN list and array at one place</title>
<updated>2020-02-04T16:56:10+00:00</updated>
<author>
<name>Daniel Wagner</name>
<email>dwagner@suse.de</email>
</author>
<published>2020-01-30T18:29:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/blackbird-op-linux/commit/?id=0f5be6a4ff7b3f8bf3db15f904e3e76797a43d9a'/>
<id>urn:sha1:0f5be6a4ff7b3f8bf3db15f904e3e76797a43d9a</id>
<content type='text'>
All async events are enqueued via nvmet_add_async_event() which
updates the ctrl-&gt;async_event_cmds[] array and additionally an struct
nvmet_async_event is added to the ctrl-&gt;async_events list.

Under normal operations the nvmet_async_event_work() updates again
the ctrl-&gt;async_event_cmds and removes the corresponding struct
nvmet_async_event from the list again. Though nvmet_sq_destroy() could
be called which calls nvmet_async_events_free() which only updates the
ctrl-&gt;async_event_cmds[] array.

Add new functions nvmet_async_events_process() and
nvmet_async_events_free() to process async events, update an array and
the list.

When we destroy submission queue after clearing the aen present on
the ctrl-&gt;async list we also loop over ctrl-&gt;async_event_cmds[] for
any requests posted by the host for which we don't have the AEN in
the ctrl-&gt;async_events list by calling nvmet_async_event_process()
and nvmet_async_events_free().

Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Daniel Wagner &lt;dwagner@suse.de&gt;
[chaitanya.kulkarni@wdc.com
 * Loop over and clear out outstanding requests
 * Update changelog
]
Signed-off-by: Chaitanya Kulkarni &lt;chaitanya.kulkarni@wdc.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</content>
</entry>
<entry>
<title>nvmet: Fix controller use after free</title>
<updated>2020-02-04T16:13:09+00:00</updated>
<author>
<name>Israel Rukshin</name>
<email>israelr@mellanox.com</email>
</author>
<published>2020-02-04T12:38:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/blackbird-op-linux/commit/?id=1a3f540d63152b8db0a12de508bfa03776217d83'/>
<id>urn:sha1:1a3f540d63152b8db0a12de508bfa03776217d83</id>
<content type='text'>
After nvmet_install_queue() sets sq-&gt;ctrl calling to nvmet_sq_destroy()
reduces the controller refcount. In case nvmet_install_queue() fails,
calling to nvmet_ctrl_put() is done twice (at nvmet_sq_destroy and
nvmet_execute_io_connect/nvmet_execute_admin_connect) instead of once for
the queue which leads to use after free of the controller. Fix this by set
NULL at sq-&gt;ctrl in case of a failure at nvmet_install_queue().

The bug leads to the following Call Trace:

[65857.994862] refcount_t: underflow; use-after-free.
[65858.108304] Workqueue: events nvmet_rdma_release_queue_work [nvmet_rdma]
[65858.115557] RIP: 0010:refcount_warn_saturate+0xe5/0xf0
[65858.208141] Call Trace:
[65858.211203]  nvmet_sq_destroy+0xe1/0xf0 [nvmet]
[65858.216383]  nvmet_rdma_release_queue_work+0x37/0xf0 [nvmet_rdma]
[65858.223117]  process_one_work+0x167/0x370
[65858.227776]  worker_thread+0x49/0x3e0
[65858.232089]  kthread+0xf5/0x130
[65858.235895]  ? max_active_store+0x80/0x80
[65858.240504]  ? kthread_bind+0x10/0x10
[65858.244832]  ret_from_fork+0x1f/0x30
[65858.249074] ---[ end trace f82d59250b54beb7 ]---

Fixes: bb1cc74790eb ("nvmet: implement valid sqhd values in completions")
Fixes: 1672ddb8d691 ("nvmet: Add install_queue callout")
Signed-off-by: Israel Rukshin &lt;israelr@mellanox.com&gt;
Reviewed-by: Max Gurtovoy &lt;maxg@mellanox.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</content>
</entry>
<entry>
<title>nvmet: Fix error print message at nvmet_install_queue function</title>
<updated>2020-02-04T16:13:06+00:00</updated>
<author>
<name>Israel Rukshin</name>
<email>israelr@mellanox.com</email>
</author>
<published>2020-02-04T12:38:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/blackbird-op-linux/commit/?id=0b87a2b795d66be7b54779848ef0f3901c5e46fc'/>
<id>urn:sha1:0b87a2b795d66be7b54779848ef0f3901c5e46fc</id>
<content type='text'>
Place the arguments in the correct order.

Fixes: 1672ddb8d691 ("nvmet: Add install_queue callout")
Signed-off-by: Israel Rukshin &lt;israelr@mellanox.com&gt;
Reviewed-by: Max Gurtovoy &lt;maxg@mellanox.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</content>
</entry>
<entry>
<title>nvme-pci: remove nvmeq-&gt;tags</title>
<updated>2020-02-03T18:00:25+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2020-01-30T18:40:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/blackbird-op-linux/commit/?id=cfa27356f835dc7755192e7b941d4f4851acbcc7'/>
<id>urn:sha1:cfa27356f835dc7755192e7b941d4f4851acbcc7</id>
<content type='text'>
There is no real need to have a pointer to the tagset in
struct nvme_queue, as we only need it in a single place, and that place
can derive the used tagset from the device and qid trivially.  This
fixes a problem with stale pointer exposure when tagsets are reset,
and also shrinks the nvme_queue structure.  It also matches what most
other transports have done since day 1.

Reported-by: Edmund Nadolski &lt;edmund.nadolski@intel.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</content>
</entry>
</feed>
