11 files changed, 326 insertions, 42 deletions
diff --git a/Documentation/driver-api/basics.rst b/Documentation/driver-api/basics.rst
index 826e85d50a16..e970fadf4d1a 100644
--- a/Documentation/driver-api/basics.rst
+++ b/Documentation/driver-api/basics.rst
@@ -121,6 +121,9 @@ Kernel utility functions
 .. kernel-doc:: kernel/rcu/update.c
    :export:
 
+.. kernel-doc:: include/linux/overflow.h
+   :internal:
+
 Device Resource Management
 --------------------------
 
diff --git a/Documentation/driver-api/firewire.rst b/Documentation/driver-api/firewire.rst
new file mode 100644
index 000000000000..94a2d7f01d99
--- /dev/null
+++ b/Documentation/driver-api/firewire.rst
@@ -0,0 +1,48 @@
+===========================================
+Firewire (IEEE 1394) driver Interface Guide
+===========================================
+
+Introduction and Overview
+=========================
+
+The Linux FireWire subsystem adds some interfaces into the Linux system to
+ use/maintain+any resource on IEEE 1394 bus.
+
+The main purpose of these interfaces is to access address space on each node
+on IEEE 1394 bus by ISO/IEC 13213 (IEEE 1212) procedure, and to control
+isochronous resources on the bus by IEEE 1394 procedure.
+
+Two types of interfaces are added, according to consumers of the interface. A
+set of userspace interfaces is available via `firewire character devices`. A set
+of kernel interfaces is available via exported symbols in `firewire-core` module.
+
+Firewire char device data structures
+====================================
+
+.. include:: /ABI/stable/firewire-cdev
+    :literal:
+
+.. kernel-doc:: include/uapi/linux/firewire-cdev.h
+    :internal:
+
+Firewire device probing and sysfs interfaces
+============================================
+
+.. include:: /ABI/stable/sysfs-bus-firewire
+    :literal:
+
+.. kernel-doc:: drivers/firewire/core-device.c
+    :export:
+
+Firewire core transaction interfaces
+====================================
+
+.. kernel-doc:: drivers/firewire/core-transaction.c
+    :export:
+
+Firewire Isochronous I/O interfaces
+===================================
+
+.. kernel-doc:: drivers/firewire/core-iso.c
+   :export:
+
diff --git a/Documentation/driver-api/gpio/board.rst b/Documentation/driver-api/gpio/board.rst
index 2c112553df84..a0f294e2e250 100644
--- a/Documentation/driver-api/gpio/board.rst
+++ b/Documentation/driver-api/gpio/board.rst
@@ -193,3 +193,27 @@ And the table can be added to the board code as follows::
 
 The line will be hogged as soon as the gpiochip is created or - in case the
 chip was created earlier - when the hog table is registered.
+
+Arrays of pins
+--------------
+In addition to requesting pins belonging to a function one by one, a device may
+also request an array of pins assigned to the function.  The way those pins are
+mapped to the device determines if the array qualifies for fast bitmap
+processing.  If yes, a bitmap is passed over get/set array functions directly
+between a caller and a respective .get/set_multiple() callback of a GPIO chip.
+
+In order to qualify for fast bitmap processing, the array must meet the
+following requirements:
+- pin hardware number of array member 0 must also be 0,
+- pin hardware numbers of consecutive array members which belong to the same
+  chip as member 0 does must also match their array indexes.
+
+Otherwise fast bitmap processing path is not used in order to avoid consecutive
+pins which belong to the same chip but are not in hardware order being processed
+separately.
+
+If the array applies for fast bitmap processing path, pins which belong to
+different chips than member 0 does, as well as those with indexes different from
+their hardware pin numbers, are excluded from the fast path, both input and
+output.  Moreover, open drain and open source pins are excluded from fast bitmap
+output processing.
diff --git a/Documentation/driver-api/gpio/consumer.rst b/Documentation/driver-api/gpio/consumer.rst
index aa03f389d41d..5e4d8aa68913 100644
--- a/Documentation/driver-api/gpio/consumer.rst
+++ b/Documentation/driver-api/gpio/consumer.rst
@@ -109,9 +109,11 @@ For a function using multiple GPIOs all of those can be obtained with one call::
 					   enum gpiod_flags flags)
 
 This function returns a struct gpio_descs which contains an array of
-descriptors::
+descriptors.  It also contains a pointer to a gpiolib private structure which,
+if passed back to get/set array functions, may speed up I/O proocessing::
 
 	struct gpio_descs {
+		struct gpio_array *info;
 		unsigned int ndescs;
 		struct gpio_desc *desc[];
 	}
@@ -323,29 +325,37 @@ The following functions get or set the values of an array of GPIOs::
 
 	int gpiod_get_array_value(unsigned int array_size,
 				  struct gpio_desc **desc_array,
-				  int *value_array);
+				  struct gpio_array *array_info,
+				  unsigned long *value_bitmap);
 	int gpiod_get_raw_array_value(unsigned int array_size,
 				      struct gpio_desc **desc_array,
-				      int *value_array);
+				      struct gpio_array *array_info,
+				      unsigned long *value_bitmap);
 	int gpiod_get_array_value_cansleep(unsigned int array_size,
 					   struct gpio_desc **desc_array,
-					   int *value_array);
+					   struct gpio_array *array_info,
+					   unsigned long *value_bitmap);
 	int gpiod_get_raw_array_value_cansleep(unsigned int array_size,
 					   struct gpio_desc **desc_array,
-					   int *value_array);
-
-	void gpiod_set_array_value(unsigned int array_size,
-				   struct gpio_desc **desc_array,
-				   int *value_array)
-	void gpiod_set_raw_array_value(unsigned int array_size,
-				       struct gpio_desc **desc_array,
-				       int *value_array)
-	void gpiod_set_array_value_cansleep(unsigned int array_size,
-					    struct gpio_desc **desc_array,
-					    int *value_array)
-	void gpiod_set_raw_array_value_cansleep(unsigned int array_size,
-						struct gpio_desc **desc_array,
-						int *value_array)
+					   struct gpio_array *array_info,
+					   unsigned long *value_bitmap);
+
+	int gpiod_set_array_value(unsigned int array_size,
+				  struct gpio_desc **desc_array,
+				  struct gpio_array *array_info,
+				  unsigned long *value_bitmap)
+	int gpiod_set_raw_array_value(unsigned int array_size,
+				      struct gpio_desc **desc_array,
+				      struct gpio_array *array_info,
+				      unsigned long *value_bitmap)
+	int gpiod_set_array_value_cansleep(unsigned int array_size,
+					   struct gpio_desc **desc_array,
+					   struct gpio_array *array_info,
+					   unsigned long *value_bitmap)
+	int gpiod_set_raw_array_value_cansleep(unsigned int array_size,
+					       struct gpio_desc **desc_array,
+					       struct gpio_array *array_info,
+					       unsigned long *value_bitmap)
 
 The array can be an arbitrary set of GPIOs. The functions will try to access
 GPIOs belonging to the same bank or chip simultaneously if supported by the
@@ -356,8 +366,9 @@ accessed sequentially.
 The functions take three arguments:
 	* array_size	- the number of array elements
 	* desc_array	- an array of GPIO descriptors
-	* value_array	- an array to store the GPIOs' values (get) or
-			  an array of values to assign to the GPIOs (set)
+	* array_info	- optional information obtained from gpiod_array_get()
+	* value_bitmap	- a bitmap to store the GPIOs' values (get) or
+			  a bitmap of values to assign to the GPIOs (set)
 
 The descriptor array can be obtained using the gpiod_get_array() function
 or one of its variants. If the group of descriptors returned by that function
@@ -366,16 +377,25 @@ the struct gpio_descs returned by gpiod_get_array()::
 
 	struct gpio_descs *my_gpio_descs = gpiod_get_array(...);
 	gpiod_set_array_value(my_gpio_descs->ndescs, my_gpio_descs->desc,
-			      my_gpio_values);
+			      my_gpio_descs->info, my_gpio_value_bitmap);
 
 It is also possible to access a completely arbitrary array of descriptors. The
 descriptors may be obtained using any combination of gpiod_get() and
 gpiod_get_array(). Afterwards the array of descriptors has to be setup
-manually before it can be passed to one of the above functions.
+manually before it can be passed to one of the above functions.  In that case,
+array_info should be set to NULL.
 
 Note that for optimal performance GPIOs belonging to the same chip should be
 contiguous within the array of descriptors.
 
+Still better performance may be achieved if array indexes of the descriptors
+match hardware pin numbers of a single chip.  If an array passed to a get/set
+array function matches the one obtained from gpiod_get_array() and array_info
+associated with the array is also passed, the function may take a fast bitmap
+processing path, passing the value_bitmap argument directly to the respective
+.get/set_multiple() callback of the chip.  That allows for utilization of GPIO
+banks as data I/O ports without much loss of performance.
+
 The return value of gpiod_get_array_value() and its variants is 0 on success
 or negative on error. Note the difference to gpiod_get_value(), which returns
 0 or 1 on success to convey the GPIO value. With the array functions, the GPIO
diff --git a/Documentation/driver-api/gpio/driver.rst b/Documentation/driver-api/gpio/driver.rst
index cbe0242842d1..a6c14ff0c54f 100644
--- a/Documentation/driver-api/gpio/driver.rst
+++ b/Documentation/driver-api/gpio/driver.rst
@@ -374,7 +374,28 @@ When implementing an irqchip inside a GPIO driver, these two functions should
 typically be called in the .startup() and .shutdown() callbacks from the
 irqchip.
 
-When using the gpiolib irqchip helpers, these callback are automatically
+When using the gpiolib irqchip helpers, these callbacks are automatically
+assigned.
+
+
+Disabling and enabling IRQs
+---------------------------
+When a GPIO is used as an IRQ signal, then gpiolib also needs to know if
+the IRQ is enabled or disabled. In order to inform gpiolib about this,
+a driver should call::
+
+	void gpiochip_disable_irq(struct gpio_chip *chip, unsigned int offset)
+
+This allows drivers to drive the GPIO as an output while the IRQ is
+disabled. When the IRQ is enabled again, a driver should call::
+
+	void gpiochip_enable_irq(struct gpio_chip *chip, unsigned int offset)
+
+When implementing an irqchip inside a GPIO driver, these two functions should
+typically be called in the .irq_disable() and .irq_enable() callbacks from the
+irqchip.
+
+When using the gpiolib irqchip helpers, these callbacks are automatically
 assigned.
 
 Real-Time compliance for GPIO IRQ chips
diff --git a/Documentation/driver-api/gpio/index.rst b/Documentation/driver-api/gpio/index.rst
index 6a374ded1287..c5b8467f9104 100644
--- a/Documentation/driver-api/gpio/index.rst
+++ b/Documentation/driver-api/gpio/index.rst
@@ -38,7 +38,7 @@ Device tree support
 Device-managed API
 ==================
 
-.. kernel-doc:: drivers/gpio/devres.c
+.. kernel-doc:: drivers/gpio/gpiolib-devres.c
    :export:
 
 sysfs helpers
diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index 6d9f2f9fe20e..909f991b4c0d 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -29,7 +29,8 @@ available subsections can be seen below.
    iio/index
    input
    usb/index
-   pci
+   firewire
+   pci/index
    spi
    i2c
    hsi
diff --git a/Documentation/driver-api/mtdnand.rst b/Documentation/driver-api/mtdnand.rst
index c55a6034c397..55447659b81f 100644
--- a/Documentation/driver-api/mtdnand.rst
+++ b/Documentation/driver-api/mtdnand.rst
@@ -180,10 +180,10 @@ by a chip select decoder.
     {
         struct nand_chip *this = mtd_to_nand(mtd);
         switch(cmd){
-            case NAND_CTL_SETCLE: this->IO_ADDR_W |= CLE_ADRR_BIT;  break;
-            case NAND_CTL_CLRCLE: this->IO_ADDR_W &= ~CLE_ADRR_BIT; break;
-            case NAND_CTL_SETALE: this->IO_ADDR_W |= ALE_ADRR_BIT;  break;
-            case NAND_CTL_CLRALE: this->IO_ADDR_W &= ~ALE_ADRR_BIT; break;
+            case NAND_CTL_SETCLE: this->legacy.IO_ADDR_W |= CLE_ADRR_BIT;  break;
+            case NAND_CTL_CLRCLE: this->legacy.IO_ADDR_W &= ~CLE_ADRR_BIT; break;
+            case NAND_CTL_SETALE: this->legacy.IO_ADDR_W |= ALE_ADRR_BIT;  break;
+            case NAND_CTL_CLRALE: this->legacy.IO_ADDR_W &= ~ALE_ADRR_BIT; break;
         }
     }
 
@@ -197,7 +197,7 @@ to read back the state of the pin. The function has no arguments and
 should return 0, if the device is busy (R/B pin is low) and 1, if the
 device is ready (R/B pin is high). If the hardware interface does not
 give access to the ready busy pin, then the function must not be defined
-and the function pointer this->dev_ready is set to NULL.
+and the function pointer this->legacy.dev_ready is set to NULL.
 
 Init function
 -------------
@@ -235,18 +235,18 @@ necessary information about the device.
         }
 
         /* Set address of NAND IO lines */
-        this->IO_ADDR_R = baseaddr;
-        this->IO_ADDR_W = baseaddr;
+        this->legacy.IO_ADDR_R = baseaddr;
+        this->legacy.IO_ADDR_W = baseaddr;
         /* Reference hardware control function */
         this->hwcontrol = board_hwcontrol;
         /* Set command delay time, see datasheet for correct value */
-        this->chip_delay = CHIP_DEPENDEND_COMMAND_DELAY;
+        this->legacy.chip_delay = CHIP_DEPENDEND_COMMAND_DELAY;
         /* Assign the device ready function, if available */
-        this->dev_ready = board_dev_ready;
+        this->legacy.dev_ready = board_dev_ready;
         this->eccmode = NAND_ECC_SOFT;
 
         /* Scan to find existence of the device */
-        if (nand_scan (board_mtd, 1)) {
+        if (nand_scan (this, 1)) {
             err = -ENXIO;
             goto out_ior;
         }
@@ -277,7 +277,7 @@ unregisters the partitions in the MTD layer.
     static void __exit board_cleanup (void)
     {
         /* Release resources, unregister device */
-        nand_release (board_mtd);
+        nand_release (mtd_to_nand(board_mtd));
 
         /* unmap physical address */
         iounmap(baseaddr);
@@ -336,17 +336,17 @@ connected to an address decoder.
         struct nand_chip *this = mtd_to_nand(mtd);
 
         /* Deselect all chips */
-        this->IO_ADDR_R &= ~BOARD_NAND_ADDR_MASK;
-        this->IO_ADDR_W &= ~BOARD_NAND_ADDR_MASK;
+        this->legacy.IO_ADDR_R &= ~BOARD_NAND_ADDR_MASK;
+        this->legacy.IO_ADDR_W &= ~BOARD_NAND_ADDR_MASK;
         switch (chip) {
         case 0:
-            this->IO_ADDR_R |= BOARD_NAND_ADDR_CHIP0;
-            this->IO_ADDR_W |= BOARD_NAND_ADDR_CHIP0;
+            this->legacy.IO_ADDR_R |= BOARD_NAND_ADDR_CHIP0;
+            this->legacy.IO_ADDR_W |= BOARD_NAND_ADDR_CHIP0;
             break;
         ....
         case n:
-            this->IO_ADDR_R |= BOARD_NAND_ADDR_CHIPn;
-            this->IO_ADDR_W |= BOARD_NAND_ADDR_CHIPn;
+            this->legacy.IO_ADDR_R |= BOARD_NAND_ADDR_CHIPn;
+            this->legacy.IO_ADDR_W |= BOARD_NAND_ADDR_CHIPn;
             break;
         }
     }
diff --git a/Documentation/driver-api/pci/index.rst b/Documentation/driver-api/pci/index.rst
new file mode 100644
index 000000000000..c6cf1fef61ce
--- /dev/null
+++ b/Documentation/driver-api/pci/index.rst
@@ -0,0 +1,22 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============================================
+The Linux PCI driver implementer's API guide
+============================================
+
+.. class:: toc-title
+
+	   Table of contents
+
+.. toctree::
+   :maxdepth: 2
+
+   pci
+   p2pdma
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/driver-api/pci/p2pdma.rst b/Documentation/driver-api/pci/p2pdma.rst
new file mode 100644
index 000000000000..4c577fa7bef9
--- /dev/null
+++ b/Documentation/driver-api/pci/p2pdma.rst
@@ -0,0 +1,145 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============================
+PCI Peer-to-Peer DMA Support
+============================
+
+The PCI bus has pretty decent support for performing DMA transfers
+between two devices on the bus. This type of transaction is henceforth
+called Peer-to-Peer (or P2P). However, there are a number of issues that
+make P2P transactions tricky to do in a perfectly safe way.
+
+One of the biggest issues is that PCI doesn't require forwarding
+transactions between hierarchy domains, and in PCIe, each Root Port
+defines a separate hierarchy domain. To make things worse, there is no
+simple way to determine if a given Root Complex supports this or not.
+(See PCIe r4.0, sec 1.3.1). Therefore, as of this writing, the kernel
+only supports doing P2P when the endpoints involved are all behind the
+same PCI bridge, as such devices are all in the same PCI hierarchy
+domain, and the spec guarantees that all transactions within the
+hierarchy will be routable, but it does not require routing
+between hierarchies.
+
+The second issue is that to make use of existing interfaces in Linux,
+memory that is used for P2P transactions needs to be backed by struct
+pages. However, PCI BARs are not typically cache coherent so there are
+a few corner case gotchas with these pages so developers need to
+be careful about what they do with them.
+
+
+Driver Writer's Guide
+=====================
+
+In a given P2P implementation there may be three or more different
+types of kernel drivers in play:
+
+* Provider - A driver which provides or publishes P2P resources like
+  memory or doorbell registers to other drivers.
+* Client - A driver which makes use of a resource by setting up a
+  DMA transaction to or from it.
+* Orchestrator - A driver which orchestrates the flow of data between
+  clients and providers.
+
+In many cases there could be overlap between these three types (i.e.,
+it may be typical for a driver to be both a provider and a client).
+
+For example, in the NVMe Target Copy Offload implementation:
+
+* The NVMe PCI driver is both a client, provider and orchestrator
+  in that it exposes any CMB (Controller Memory Buffer) as a P2P memory
+  resource (provider), it accepts P2P memory pages as buffers in requests
+  to be used directly (client) and it can also make use of the CMB as
+  submission queue entries (orchastrator).
+* The RDMA driver is a client in this arrangement so that an RNIC
+  can DMA directly to the memory exposed by the NVMe device.
+* The NVMe Target driver (nvmet) can orchestrate the data from the RNIC
+  to the P2P memory (CMB) and then to the NVMe device (and vice versa).
+
+This is currently the only arrangement supported by the kernel but
+one could imagine slight tweaks to this that would allow for the same
+functionality. For example, if a specific RNIC added a BAR with some
+memory behind it, its driver could add support as a P2P provider and
+then the NVMe Target could use the RNIC's memory instead of the CMB
+in cases where the NVMe cards in use do not have CMB support.
+
+
+Provider Drivers
+----------------
+
+A provider simply needs to register a BAR (or a portion of a BAR)
+as a P2P DMA resource using :c:func:`pci_p2pdma_add_resource()`.
+This will register struct pages for all the specified memory.
+
+After that it may optionally publish all of its resources as
+P2P memory using :c:func:`pci_p2pmem_publish()`. This will allow
+any orchestrator drivers to find and use the memory. When marked in
+this way, the resource must be regular memory with no side effects.
+
+For the time being this is fairly rudimentary in that all resources
+are typically going to be P2P memory. Future work will likely expand
+this to include other types of resources like doorbells.
+
+
+Client Drivers
+--------------
+
+A client driver typically only has to conditionally change its DMA map
+routine to use the mapping function :c:func:`pci_p2pdma_map_sg()` instead
+of the usual :c:func:`dma_map_sg()` function. Memory mapped in this
+way does not need to be unmapped.
+
+The client may also, optionally, make use of
+:c:func:`is_pci_p2pdma_page()` to determine when to use the P2P mapping
+functions and when to use the regular mapping functions. In some
+situations, it may be more appropriate to use a flag to indicate a
+given request is P2P memory and map appropriately. It is important to
+ensure that struct pages that back P2P memory stay out of code that
+does not have support for them as other code may treat the pages as
+regular memory which may not be appropriate.
+
+
+Orchestrator Drivers
+--------------------
+
+The first task an orchestrator driver must do is compile a list of
+all client devices that will be involved in a given transaction. For
+example, the NVMe Target driver creates a list including the namespace
+block device and the RNIC in use. If the orchestrator has access to
+a specific P2P provider to use it may check compatibility using
+:c:func:`pci_p2pdma_distance()` otherwise it may find a memory provider
+that's compatible with all clients using  :c:func:`pci_p2pmem_find()`.
+If more than one provider is supported, the one nearest to all the clients will
+be chosen first. If more than one provider is an equal distance away, the
+one returned will be chosen at random (it is not an arbitrary but
+truely random). This function returns the PCI device to use for the provider
+with a reference taken and therefore when it's no longer needed it should be
+returned with pci_dev_put().
+
+Once a provider is selected, the orchestrator can then use
+:c:func:`pci_alloc_p2pmem()` and :c:func:`pci_free_p2pmem()` to
+allocate P2P memory from the provider. :c:func:`pci_p2pmem_alloc_sgl()`
+and :c:func:`pci_p2pmem_free_sgl()` are convenience functions for
+allocating scatter-gather lists with P2P memory.
+
+Struct Page Caveats
+-------------------
+
+Driver writers should be very careful about not passing these special
+struct pages to code that isn't prepared for it. At this time, the kernel
+interfaces do not have any checks for ensuring this. This obviously
+precludes passing these pages to userspace.
+
+P2P memory is also technically IO memory but should never have any side
+effects behind it. Thus, the order of loads and stores should not be important
+and ioreadX(), iowriteX() and friends should not be necessary.
+However, as the memory is not cache coherent, if access ever needs to
+be protected by a spinlock then :c:func:`mmiowb()` must be used before
+unlocking the lock. (See ACQUIRES VS I/O ACCESSES in
+Documentation/memory-barriers.txt)
+
+
+P2P DMA Support Library
+=======================
+
+.. kernel-doc:: drivers/pci/p2pdma.c
+   :export:
diff --git a/Documentation/driver-api/pci.rst b/Documentation/driver-api/pci/pci.rst
index ca85e5e78b2c..ca85e5e78b2c 100644
--- a/Documentation/driver-api/pci.rst
+++ b/Documentation/driver-api/pci/pci.rst