talos-op-linux - Talos™ II Linux sources for OpenPOWER

	Commit message	Author	Age	Files	Lines
...
\| *	nvme: factor out a few helpers from req_completion	Christoph Hellwig	2015-12-22	1	-10/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We'll need them in other places later. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	nvme: fix admin queue depth	Christoph Hellwig	2015-12-22	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The number in tag_set->queue depth includes the reserved tags. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	NVMe: Remove device management handles on remove	Keith Busch	2015-12-22	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We don't want to allow new references to open on a device that is removed. This ties the lifetime of these handles to the physical device's presence rather than to the open reference count. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	NVMe: Use unbounded work queue for all work	Keith Busch	2015-12-22	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Removes all usage of the global work queue so work can't be scheduled on two different work queues, and removes nvme's work queue singlethreadedness so controllers can be driven in parallel. Signed-off-by: Keith Busch <keith.busch@intel.com> [hch: keep the dead controller removal on the system workqueue to avoid deadlocks] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	NVMe: Implement namespace list scanning	Keith Busch	2015-12-22	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The NVMe 1.1 specification provides an identify mode to return a list of active namespaces. This is more efficient to discover which namespace identifiers are active on a controller, providing potentially significant improvement in scan time for controllers with sparsely populated namespaces. Signed-off-by: Keith Busch <keith.busch@intel.com> [hch: add quirk for the broken Qemu Identify implementation. To be relaxed later] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	nvme: switch abort_limit to an atomic_t	Christoph Hellwig	2015-12-22	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is no lock to synchronize access to the abort_limit field of struct nvme_ctrl, so switch it to an atomic_t. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	nvme: remove dead controllers from a work item	Christoph Hellwig	2015-12-22	1	-13/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Compared to the kthread this gives us multiple call prevention for free. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	nvme: merge probe_work and reset_work	Christoph Hellwig	2015-12-22	1	-35/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we're using two work queues we're always going to run into races where one item is tearing down what the other one is initializing. So instead merge the two work queues, and let the old probe_work also tear the controller down first if it was alive. Together with the better detection of the probe path using a flag this gives us a properly serialized reset/probe path that also doesn't accidentally trigger when two commands time out and the second one tries to reset the controller while the first reset is still in progress. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	nvme: do not restart the request timeout if we're resetting the controller	Keith Busch	2015-12-22	1	-9/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Otherwise we're never going to complete a command when it is restarted just after we completed all other outstanding commands in nvme_clear_queue. The controller must be disabled prior to completing a presumed lost command, do this by directly shutting down the controller before queueing the reset work, and return EH_HANDLED from the timeout handler after we shut the controller down. Signed-off-by: Keith Busch <keith.busch@intel.com> [hch: split and rebase] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	nvme: simplify resets	Christoph Hellwig	2015-12-22	1	-26/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Don't delete the controller from dev_list before queuing a reset, instead just check for it being reset in the polling kthread. This allows us to remove the dev_list_lock in various places, and in addition we can simply rely on checking the queue_work return value to see if we could reset a controller. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	nvme: add NVME_SC_CANCELLED	Christoph Hellwig	2015-12-22	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	To properly document how we are using a negative Linux error value to communicate request cancellations inside the driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	nvme: merge nvme_abort_req and nvme_timeout	Christoph Hellwig	2015-12-22	1	-29/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We want to be able to return better error values from nvme_timeout, which is significantly easier if the two functions are merged. Also clean up and reduce the printk spew so that we only get one message per abort. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	nvme: don't take the I/O queue q_lock in nvme_timeout	Christoph Hellwig	2015-12-22	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is nothing it protects, but it makes lockdep unhappy in many different ways. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	nvme: protect against simultaneous shutdown invocations	Keith Busch	2015-12-22	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Keith Busch <keith.busch@intel.com> [hch: split from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	nvme: only add a controller to dev_list after it's been fully initialized	Christoph Hellwig	2015-12-22	1	-21/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Without this we can easily get bad dereferences on nvmeq->d_db when the nvme kthread tries to poll the CQs for controllers that are in half initialized state. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	nvme: only ignore hardware errors in nvme_create_io_queues	Christoph Hellwig	2015-12-22	1	-15/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Half initialized queues due to kernel error returns or timeout are still a good reason to give up on initializing a controller. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	nvme: refactor set_queue_count	Christoph Hellwig	2015-12-01	1	-21/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Split out a helper that just issues the Set Features and interprets the result which can go to common code, and document why we are ignoring non-timeout error returns in the PCIe driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	nvme: move chardev and sysfs interface to common code	Christoph Hellwig	2015-12-01	1	-180/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For this we need to add a proper controller init routine and a list of all controllers that is in addition to the list of PCIe controllers, which stays in pci.c. Note that we remove the sysfs device when the last reference to a controller is dropped now - the old code would have kept it around longer, which doesn't make much sense. This requires a new ->reset_ctrl operation to implement controller resets, and a new ->write_reg32 operation that is required to implement subsystem resets. We also now store cached copies of the NVMe compliance version and the flag if a controller is attached to a subsystem or not in the generic controller structure. Signed-off-by: Christoph Hellwig <hch@lst.de> [Fixes for pr merge] Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	nvme: move namespace scanning to common code	Christoph Hellwig	2015-12-01	1	-190/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The namespace scanning code has been mostly generic already, we just need to store a pointer to the tagset in the nvme_ctrl structure, and add a method to check if a controller is I/O incapable. The latter will hopefully be replaced by a proper controller state machine soon. Signed-off-by: Christoph Hellwig <hch@lst.de> [Fixed pr conflicts] Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	nvme: move the call to nvme_init_identify earlier	Christoph Hellwig	2015-12-01	1	-6/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We want to record the identify and CAP values even if no I/O queue is available. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	nvme: add a common helper to read Identify Controller data	Christoph Hellwig	2015-12-01	1	-38/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	And add the 64-bit register read operation for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	nvme: move nvme_{enable,disable,shutdown}_ctrl to common code	Christoph Hellwig	2015-12-01	1	-109/+24
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	nvme: move remaining CC setup into nvme_enable_ctrl	Christoph Hellwig	2015-12-01	1	-23/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove the calculation of all the bits written into the CC register into nvme_enable_ctrl, so that they can be moved into the core NVMe driver in the future. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	nvme: add explicit quirk handling	Christoph Hellwig	2015-12-01	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add an enum for all workarounds not in the spec and identify the affected controllers at probe time. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	nvme: move block_device_operations and ns/ctrl freeing to common code	Christoph Hellwig	2015-12-01	1	-400/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This moves the block_device_operations over to common code mostly as-is. The only change is that the ns and ctrl refcounting got some small refactoring to have wrappers around the kref_put operations. A new free_ctrl operation is added to allow the PCI driver to free its resources on the final drop. Signed-off-by: Christoph Hellwig <hch@lst.de> [Moved the integrity and pr changes due to merge conflict] Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	nvme: use the block layer for userspace passthrough metadata	Keith Busch	2015-12-01	1	-34/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use the integrity API to pass through metadata from userspace. For PI enabled devices this means that we now validate the reftag, which seems like an unintentional omission in the old code. Thanks to Keith Busch for testing and fixes. Signed-off-by: Christoph Hellwig <hch@lst.de> [Skip metadata setup on admin commands] Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	nvme: split __nvme_submit_sync_cmd	Christoph Hellwig	2015-12-01	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a separate nvme_submit_user_cmd for commands that directly DMA to or from userspace. We'll add metadata support to that soon and the common version would become too messy. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	nvme: move nvme_setup_flush and nvme_setup_rw to common code	Christoph Hellwig	2015-12-01	1	-49/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	And mark them inline so that we don't slow down the I/O submission path by having to turn it into a forced out of line call. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	nvme: move nvme_error_status to common code	Christoph Hellwig	2015-12-01	1	-12/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	And mark it inline so that we don't slow down the completion path by having to turn it into a forced out of line call. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	nvme: factor out a nvme_unmap_data helper	Christoph Hellwig	2015-12-01	1	-18/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the counterpart to nvme_map_data. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	nvme: refactor nvme_queue_rq	Christoph Hellwig	2015-12-01	1	-122/+97
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This "backports" the structure I've used for the fabrics driver. It mostly started out as a cleanup so that I could actually understand the code, but I think it also qualifies as a micro-optimization due to the reduced time we hold q_lock and disable interrupts. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	nvme: simplify nvme_setup_prps calling convention	Christoph Hellwig	2015-12-01	1	-12/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pass back a true/false value instead of the length which needs a compare with the bytes in the request and drop the pointless gfp_t argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	nvme: split a new struct nvme_ctrl out of struct nvme_dev	Christoph Hellwig	2015-12-01	1	-64/+126
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The new struct nvme_ctrl will be used by the common NVMe code that sits on top of struct request_queue and the new nvme_ctrl_ops abstraction. It only contains the bare minimum required, which consists of values sampled during controller probe, the admin queue pointer and a second struct device pointer at the moment, but more will follow later. Only values that are not used in the I/O fast path should be moved to struct nvme_ctrl so that drivers can optimize their cache line usage easily. That's also the reason why we have two device pointers as the struct device is used for DMA mapping purposes. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	nvme: use offset instead of a struct for registers	Christoph Hellwig	2015-12-01	1	-28/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This makes life easier for future non-PCI drivers where access to the registers might be more complicated. Note that Linux drivers are pretty evenly split between the two versions, and in fact the NVMe driver already uses offsets for the doorbells. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> [Fixed CMBSZ offset] Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	nvme: split command submission helpers out of pci.c	Christoph Hellwig	2015-12-01	1	-154/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Create a new core.c and start by adding the command submission helpers to it, which are already abstracted away from the actual hardware queues by the block layer. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
\| *	nvme: move struct nvme_iod to pci.c	Christoph Hellwig	2015-12-01	1	-0/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This structure is specific to the PCIe driver internals and should be moved to pci.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* \|	Merge branch 'for-4.5/core' of git://git.kernel.dk/linux-block	Linus Torvalds	2016-01-19	1	-5/+6
\|\\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull core block updates from Jens Axboe: "We don't have a lot of core changes this time around, it's mostly in drivers, which will come in a subsequent pull. The core changes include: - blk-mq - Prep patch from Christoph, changing blk_mq_alloc_request() to take flags instead of just using gfp_t for sleep/nosleep. - Doc patch from me, clarifying the difference between legacy and blk-mq for timer usage. - Fixes from Raghavendra for memory-less numa nodes, and a reuse of CPU masks. - Cleanup from Geliang Tang, using offset_in_page() instead of open coding it. - From Ilya, rename request_queue slab so it reflects what it holds, and a fix for proper use of bdgrab/put. - A real fix for the split across stripe boundaries from Keith. We yanked a broken version of this from 4.4-rc final, this one works. - From Mike Krinkin, emit a trace message when we split. - From Wei Tang, two small cleanups, not explicitly clearing memory that is already cleared" * 'for-4.5/core' of git://git.kernel.dk/linux-block: block: use bd{grab,put}() instead of open-coding block: split bios to max possible length block: add call to split trace point blk-mq: Avoid memoryless numa node encoded in hctx numa_node blk-mq: Reuse hardware context cpumask for tags blk-mq: add a flags parameter to blk_mq_alloc_request Revert "blk-flush: Queue through IO scheduler when flush not required" block: clarify blk_add_timer() use case for blk-mq bio: use offset_in_page macro block: do not initialise statics to 0 or NULL block: do not initialise globals to 0 or NULL block: rename request_queue slab cache
\| *	blk-mq: add a flags parameter to blk_mq_alloc_request	Christoph Hellwig	2015-12-01	1	-5/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We already have the reserved flag, and a nowait flag awkwardly encoded as a gfp_t. Add a real flags argument to make the scheme more extensible and allow for a nicer calling convention. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
* \|	NVMe: IO ending fixes on surprise removal	Keith Busch	2015-12-22	1	-1/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes a lost request discovered during IO + hot removal. The driver's pci removal deletes gendisks prior to shutting down the controller to allow dirty data to sync. Dirty data can not be synced on a surprise removal, though, and would potentially block indefinitely. The driver previously had marked the queue as dying in this scenario to prevent new requests from attempting, however it will still block for requests that already entered the queue. This patch fixes this by quiescing IO first, then aborting the requeued requests before deleting disks. Reported-by: Sujith Pandel <sujith_pandel@dell.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Tested-by: Sujith Pandel <sujith_pandel@dell.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* \|	nvme: temporary fix for Apple controller reset	Stephan Günther	2015-12-01	1	-0/+12
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Recent patches added basic support for the Apple NVMe controller but still cause resets and data corruption on that particular controller when a specific pattern of read/flush commands occurs. Limiting the queue depth to 2 works around that issue. This patch enforces that limit only for the Apple controller and is considered a temporary fix until we find the root source of that problem. Signed-off-by: Stephan Günther <guenther@tum.de> Signed-off-by: Maurice Leclaire <leclaire@in.tum.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
*	nvme: add missing unmaps in nvme_queue_rq	Christoph Hellwig	2015-11-24	1	-3/+12
\| \| \| \| \| \| \| \| \| \|	When we fail various metadata related operations in nvme_queue_rq we need to unmap the data SGL. Cc: stable@vger.kernel.org Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
*	NVMe: default to 4k device page size	Nishanth Aravamudan	2015-11-24	1	-9/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We received a bug report recently when DDW (64-bit direct DMA on Power) is not enabled for NVMe devices. In that case, we fall back to 32-bit DMA via the IOMMU, which is always done via 4K TCEs (Translation Control Entries). The NVMe device driver, though, assumes that the DMA alignment for the PRP entries will match the device's page size, and that the DMA alignment matches the kernel's page alignment. On Power, the IOMMU page size, as mentioned above, can be 4K, while the device can have a page size of 8K, while the kernel has a page size of 64K. This eventually trips the BUG_ON in nvme_setup_prps(), as we have a 'dma_len' that is a multiple of 4K but not 8K (e.g., 0xF000). In this particular case of page sizes, we clearly want to use the IOMMU's page size in the driver. And generally, the NVMe driver in this function should be using the IOMMU's page size for the default device page size, rather than the kernel's page size. There is not currently an API to obtain the IOMMU's page size across all architectures and in the interest of a stop-gap fix to this functional issue, default the NVMe device page size to 4K, with the intent of adding such an API and implementation across all architectures in the next merge window. With the functionally equivalent v3 of this patch, our hardware test exerciser survives when using 32-bit DMA; without the patch, the kernel will BUG within a few minutes. Signed-off-by: Nishanth Aravamudan <nacc at linux.vnet.ibm.com> Signed-off-by: Jens Axboe <axboe@fb.com>
*	NVMe: reap completion entries when deleting queue	Keith Busch	2015-11-20	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	Make sure that there are no unprocessed entries on a completion queue before deleting it, and check for validity of the CQ doorbell before writing completions to it. This fixes problems with doing a sysfs reset of the device while it's handling IO. Tested-by: Jon Derrick <jonathan.derrick@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
*	NVMe: Fix possible arithmetic overflow for max segments	Keith Busch	2015-11-19	1	-1/+1
\| \| \| \| \| \|	Reported-by: Paul Grabinar <paul.grabinar@ranbarg.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
*	NVMe: add support for Apple NVMe controller	Stephan Günther	2015-11-11	1	-0/+1
\| \| \| \| \| \| \| \| \|	Add PCI ID of Apple's NVMe controller. Signed-off-by: Stephan Guenther <guenther@tum.de> Signed-off-by: Maurice Leclaire <leclaire@in.tum.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
*	NVMe: use split lo_hi_{read,write}q	Stephan Günther	2015-11-11	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some controllers may require ordered split transfers even on 64bit machines, e.g. Apple's NVMe controller as found in the MacBook8,1 and MacBookAir7,1 (256/512GB models). This patch enforces ordered split transfers on 64bit platforms, which works around that issue for all controllers. As pointed out by Christoph [1] there should be no performance impact due to that modification. [1] http://lists.infradead.org/pipermail/linux-nvme/2015-November/002965.html Signed-off-by: Stephan Guenther <guenther@tum.de> Signed-off-by: Maurice Leclaire <leclaire@in.tum.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Updated by me to explicitly use lo_hi_read/writeq instead of playing define tricks. Signed-off-by: Jens Axboe <axboe@fb.com>
*	NVMe: Increase the max transfer size when mdts is 0	Sathyavathi M	2015-11-11	1	-0/+2
\| \| \| \| \| \| \| \| \| \|	This patch addresses the issue where a 128KB IO from FIO is split into two parts, 124KB and 4KB, due to the max transfer size (127KB). This degrades the device performance. Signed-off-by: Sathyavathi M <sathya.m@samsung.com> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
*	Merge branch 'for-4.4/io-poll' of git://git.kernel.dk/linux-block	Linus Torvalds	2015-11-10	1	-4/+28
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull block IO poll support from Jens Axboe: "Various groups have been doing experimentation around IO polling for (really) fast devices. The code has been reviewed and has been sitting on the side for a few releases, but this is now good enough for coordinated benchmarking and further experimentation. Currently O_DIRECT sync read/write are supported. A framework is in the works that allows scalable stats tracking so we can auto-tune this. And we'll add libaio support as well soon. For now, it's an opt-in feature for test purposes" * 'for-4.4/io-poll' of git://git.kernel.dk/linux-block: direct-io: be sure to assign dio->bio_bdev for both paths directio: add block polling support NVMe: add blk polling support block: add block polling support blk-mq: return tag/queue combo in the make_request_fn handlers block: change ->make_request_fn() and users to return a queue cookie
\| *	NVMe: add blk polling support	Jens Axboe	2015-11-07	1	-4/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add nvme_poll(), which will check a specific completion queue for command completions. Wire that up to the new block layer poll mechanism. Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com>
* \|	Merge branch 'akpm' (patches from Andrew)	Linus Torvalds	2015-11-07	1	-2/+4
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Merge second patch-bomb from Andrew Morton: - most of the rest of MM - procfs - lib/ updates - printk updates - bitops infrastructure tweaks - checkpatch updates - nilfs2 update - signals - various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc, dma-debug, dma-mapping, ... * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits) ipc,msg: drop dst nil validation in copy_msg include/linux/zutil.h: fix usage example of zlib_adler32() panic: release stale console lock to always get the logbuf printed out dma-debug: check nents in dma_sync_sg* dma-mapping: tidy up dma_parms default handling pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode kexec: use file name as the output message prefix fs, seqfile: always allow oom killer seq_file: reuse string_escape_str() fs/seq_file: use seq_* helpers in seq_hex_dump() coredump: change zap_threads() and zap_process() to use for_each_thread() coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT) signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread() signal: turn dequeue_signal_lock() into kernel_dequeue_signal() signals: kill block_all_signals() and unblock_all_signals() nilfs2: fix gcc uninitialized-variable warnings in powerpc build nilfs2: fix gcc unused-but-set-variable warnings MAINTAINERS: nilfs2: add header file for tracing nilfs2: add tracepoints for analyzing reading and writing metadata files ...