Diffstat (limited to 'Documentation')
281 files changed, 9279 insertions, 2704 deletions
diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX index 65bbd2622396..2214f123a976 100644 --- a/Documentation/00-INDEX +++ b/Documentation/00-INDEX @@ -7,8 +7,8 @@ Please try and keep the descriptions small enough to fit on one line. Following translations are available on the WWW: - - Japanese, maintained by the JF Project (JF@linux.or.jp), at - http://www.linux.or.jp/JF/ + - Japanese, maintained by the JF Project (jf@listserv.linux.or.jp), at + http://linuxjf.sourceforge.jp/ 00-INDEX - this file. @@ -104,6 +104,8 @@ cpuidle/ - info on CPU_IDLE, CPU idle state management subsystem. cputopology.txt - documentation on how CPU topology info is exported via sysfs. +crc32.txt + - brief tutorial on CRC computation cris/ - directory with info about Linux on CRIS architecture. crypto/ diff --git a/Documentation/ABI/obsolete/sysfs-class-rfkill b/Documentation/ABI/obsolete/sysfs-class-rfkill index 4201d5b05515..ff60ad9eca4c 100644 --- a/Documentation/ABI/obsolete/sysfs-class-rfkill +++ b/Documentation/ABI/obsolete/sysfs-class-rfkill @@ -7,7 +7,7 @@ Date: 09-Jul-2007 KernelVersion v2.6.22 Contact: linux-wireless@vger.kernel.org Description: Current state of the transmitter. - This file is deprecated and sheduled to be removed in 2014, + This file is deprecated and scheduled to be removed in 2014, because its not possible to express the 'soft and hard block' state of the rfkill driver. Values: A numeric value. 
diff --git a/Documentation/ABI/removed/devfs b/Documentation/ABI/removed/devfs index 8ffd28bf6598..0020c49933c4 100644 --- a/Documentation/ABI/removed/devfs +++ b/Documentation/ABI/removed/devfs @@ -1,6 +1,6 @@ What: devfs Date: July 2005 (scheduled), finally removed in kernel v2.6.18 -Contact: Greg Kroah-Hartman <gregkh@suse.de> +Contact: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Description: devfs has been unmaintained for a number of years, has unfixable races, contains a naming policy within the kernel that is diff --git a/Documentation/ABI/stable/sysfs-driver-usb-usbtmc b/Documentation/ABI/stable/sysfs-driver-usb-usbtmc index 9a75fb22187d..e960cd027e1e 100644 --- a/Documentation/ABI/stable/sysfs-driver-usb-usbtmc +++ b/Documentation/ABI/stable/sysfs-driver-usb-usbtmc @@ -1,7 +1,7 @@ -What: /sys/bus/usb/drivers/usbtmc/devices/*/interface_capabilities -What: /sys/bus/usb/drivers/usbtmc/devices/*/device_capabilities +What: /sys/bus/usb/drivers/usbtmc/*/interface_capabilities +What: /sys/bus/usb/drivers/usbtmc/*/device_capabilities Date: August 2008 -Contact: Greg Kroah-Hartman <gregkh@suse.de> +Contact: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Description: These files show the various USB TMC capabilities as described by the device itself. The full description of the bitfields @@ -12,10 +12,10 @@ Description: The files are read only. -What: /sys/bus/usb/drivers/usbtmc/devices/*/usb488_interface_capabilities -What: /sys/bus/usb/drivers/usbtmc/devices/*/usb488_device_capabilities +What: /sys/bus/usb/drivers/usbtmc/*/usb488_interface_capabilities +What: /sys/bus/usb/drivers/usbtmc/*/usb488_device_capabilities Date: August 2008 -Contact: Greg Kroah-Hartman <gregkh@suse.de> +Contact: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Description: These files show the various USB TMC capabilities as described by the device itself. The full description of the bitfields @@ -27,9 +27,9 @@ Description: The files are read only. 
-What: /sys/bus/usb/drivers/usbtmc/devices/*/TermChar +What: /sys/bus/usb/drivers/usbtmc/*/TermChar Date: August 2008 -Contact: Greg Kroah-Hartman <gregkh@suse.de> +Contact: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Description: This file is the TermChar value to be sent to the USB TMC device as described by the document, "Universal Serial Bus Test @@ -40,9 +40,9 @@ Description: sent to the device or not. -What: /sys/bus/usb/drivers/usbtmc/devices/*/TermCharEnabled +What: /sys/bus/usb/drivers/usbtmc/*/TermCharEnabled Date: August 2008 -Contact: Greg Kroah-Hartman <gregkh@suse.de> +Contact: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Description: This file determines if the TermChar is to be sent to the device on every transaction or not. For more details about @@ -51,11 +51,11 @@ Description: published by the USB-IF. -What: /sys/bus/usb/drivers/usbtmc/devices/*/auto_abort +What: /sys/bus/usb/drivers/usbtmc/*/auto_abort Date: August 2008 -Contact: Greg Kroah-Hartman <gregkh@suse.de> +Contact: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Description: - This file determines if the the transaction of the USB TMC + This file determines if the transaction of the USB TMC device is to be automatically aborted if there is any error. For more details about this, please see the document, "Universal Serial Bus Test and Measurement Class Specification diff --git a/Documentation/ABI/stable/sysfs-module b/Documentation/ABI/stable/sysfs-module index 75be43118335..a0dd21c6db59 100644 --- a/Documentation/ABI/stable/sysfs-module +++ b/Documentation/ABI/stable/sysfs-module @@ -6,7 +6,7 @@ Description: The name of the module that is in the kernel. This module name will show up either if the module is built directly into the kernel, or if it is loaded as a - dyanmic module. + dynamic module. 
/sys/module/MODULENAME/parameters This directory contains individual files that are each diff --git a/Documentation/ABI/testing/debugfs-olpc b/Documentation/ABI/testing/debugfs-olpc new file mode 100644 index 000000000000..bd76cc6d55f9 --- /dev/null +++ b/Documentation/ABI/testing/debugfs-olpc @@ -0,0 +1,16 @@ +What: /sys/kernel/debug/olpc-ec/cmd +Date: Dec 2011 +KernelVersion: 3.4 +Contact: devel@lists.laptop.org +Description: + +A generic interface for executing OLPC Embedded Controller commands and +reading their responses. + +To execute a command, write data with the format: CC:N A A A A +CC is the (hex) command, N is the count of expected reply bytes, and A A A A +are optional (hex) arguments. + +To read the response (if any), read from the generic node after executing +a command. Hex reply bytes will be returned, *whether or not* they came from +the immediately previous command. diff --git a/Documentation/ABI/testing/sysfs-block-dm b/Documentation/ABI/testing/sysfs-block-dm new file mode 100644 index 000000000000..87ca5691e29b --- /dev/null +++ b/Documentation/ABI/testing/sysfs-block-dm @@ -0,0 +1,25 @@ +What: /sys/block/dm-<num>/dm/name +Date: January 2009 +KernelVersion: 2.6.29 +Contact: dm-devel@redhat.com +Description: Device-mapper device name. + Read-only string containing mapped device name. +Users: util-linux, device-mapper udev rules + +What: /sys/block/dm-<num>/dm/uuid +Date: January 2009 +KernelVersion: 2.6.29 +Contact: dm-devel@redhat.com +Description: Device-mapper device UUID. + Read-only string containing DM-UUID or empty string + if DM-UUID is not set. +Users: util-linux, device-mapper udev rules + +What: /sys/block/dm-<num>/dm/suspended +Date: June 2009 +KernelVersion: 2.6.31 +Contact: dm-devel@redhat.com +Description: Device-mapper device suspend state. + Contains the value 1 while the device is suspended. + Otherwise it contains 0. Read-only attribute. 
+Users: util-linux, device-mapper udev rules diff --git a/Documentation/ABI/testing/sysfs-block-rssd b/Documentation/ABI/testing/sysfs-block-rssd new file mode 100644 index 000000000000..d535757799fe --- /dev/null +++ b/Documentation/ABI/testing/sysfs-block-rssd @@ -0,0 +1,18 @@ +What: /sys/block/rssd*/registers +Date: March 2012 +KernelVersion: 3.3 +Contact: Asai Thambi S P <asamymuthupa@micron.com> +Description: This is a read-only file. Dumps below driver information and + hardware registers. + - S ACTive + - Command Issue + - Allocated + - Completed + - PORT IRQ STAT + - HOST IRQ STAT + +What: /sys/block/rssd*/status +Date: April 2012 +KernelVersion: 3.4 +Contact: Asai Thambi S P <asamymuthupa@micron.com> +Description: This is a read-only file. Indicates the status of the device. diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-format b/Documentation/ABI/testing/sysfs-bus-event_source-devices-format new file mode 100644 index 000000000000..079afc71363d --- /dev/null +++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-format @@ -0,0 +1,14 @@ +Where: /sys/bus/event_source/devices/<dev>/format +Date: January 2012 +Kernel Version: 3.3 +Contact: Jiri Olsa <jolsa@redhat.com> +Description: + Attribute group to describe the magic bits that go into + perf_event_attr::config[012] for a particular pmu. + Each attribute of this group defines the 'hardware' bitmask + we want to export, so that userspace can deal with sane + name/value pairs. + + Example: 'config1:1,6-10,44' + Defines contents of attribute that occupies bits 1,6-10,44 of + perf_event_attr::config1. 
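Reviewer note on sysfs-bus-event_source-devices-format: the example 'config1:1,6-10,44' names a perf_event_attr field and a comma-separated list of bits or inclusive bit ranges. A minimal sketch of how userspace might turn such a string into a bitmask follows. It assumes that only the syntax shown in the example is used; the helper name parse_format is hypothetical and is not part of perf or of this patch.

```python
def parse_format(spec):
    """Parse a format attribute string such as 'config1:1,6-10,44'.

    Returns (field_name, bitmask). Assumes the 'field:bits' syntax shown
    in the ABI example, where bits is a comma-separated list of single
    bit numbers or inclusive 'low-high' ranges.
    """
    field, _, bits = spec.strip().partition(":")
    mask = 0
    for part in bits.split(","):
        low, _, high = part.partition("-")
        low = int(low)
        high = int(high) if high else low  # a single bit is a range of one
        for bit in range(low, high + 1):
            mask |= 1 << bit
    return field, mask


if __name__ == "__main__":
    name, mask = parse_format("config1:1,6-10,44")
    print(name, hex(mask))
```

A tool would read the string from a file under /sys/bus/event_source/devices/<dev>/format and use the resulting mask to place a user-supplied value into the named field.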
diff --git a/Documentation/ABI/testing/sysfs-bus-hsi b/Documentation/ABI/testing/sysfs-bus-hsi new file mode 100644 index 000000000000..1b1b282a99e1 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-bus-hsi @@ -0,0 +1,19 @@ +What: /sys/bus/hsi +Date: April 2012 +KernelVersion: 3.4 +Contact: Carlos Chinea <carlos.chinea@nokia.com> +Description: + High Speed Synchronous Serial Interface (HSI) is a + serial interface mainly used for connecting application + engines (APE) with cellular modem engines (CMT) in cellular + handsets. + The bus will be populated with devices (hsi_clients) representing + the protocols available in the system. Bus drivers implement + those protocols. + +What: /sys/bus/hsi/devices/.../modalias +Date: April 2012 +KernelVersion: 3.4 +Contact: Carlos Chinea <carlos.chinea@nokia.com> +Description: Stores the same MODALIAS value emitted by uevent + Format: hsi:<hsi_client device name> diff --git a/Documentation/ABI/testing/sysfs-bus-rpmsg b/Documentation/ABI/testing/sysfs-bus-rpmsg new file mode 100644 index 000000000000..189e419a5a2d --- /dev/null +++ b/Documentation/ABI/testing/sysfs-bus-rpmsg @@ -0,0 +1,75 @@ +What: /sys/bus/rpmsg/devices/.../name +Date: June 2011 +KernelVersion: 3.3 +Contact: Ohad Ben-Cohen <ohad@wizery.com> +Description: + Every rpmsg device is a communication channel with a remote + processor. Channels are identified with a (textual) name, + which is maximum 32 bytes long (defined as RPMSG_NAME_SIZE in + rpmsg.h). + + This sysfs entry contains the name of this channel. + +What: /sys/bus/rpmsg/devices/.../src +Date: June 2011 +KernelVersion: 3.3 +Contact: Ohad Ben-Cohen <ohad@wizery.com> +Description: + Every rpmsg device is a communication channel with a remote + processor. Channels have a local ("source") rpmsg address, + and remote ("destination") rpmsg address. When an entity + starts listening on one end of a channel, it assigns it with + a unique rpmsg address (a 32 bits integer). 
This way when + inbound messages arrive to this address, the rpmsg core + dispatches them to the listening entity (a kernel driver). + + This sysfs entry contains the src (local) rpmsg address + of this channel. If it contains 0xffffffff, then an address + wasn't assigned (can happen if no driver exists for this + channel). + +What: /sys/bus/rpmsg/devices/.../dst +Date: June 2011 +KernelVersion: 3.3 +Contact: Ohad Ben-Cohen <ohad@wizery.com> +Description: + Every rpmsg device is a communication channel with a remote + processor. Channels have a local ("source") rpmsg address, + and remote ("destination") rpmsg address. When an entity + starts listening on one end of a channel, it assigns it with + a unique rpmsg address (a 32 bits integer). This way when + inbound messages arrive to this address, the rpmsg core + dispatches them to the listening entity. + + This sysfs entry contains the dst (remote) rpmsg address + of this channel. If it contains 0xffffffff, then an address + wasn't assigned (can happen if the kernel driver that + is attached to this channel is exposing a service to the + remote processor. This make it a local rpmsg server, + and it is listening for inbound messages that may be sent + from any remote rpmsg client; it is not bound to a single + remote entity). + +What: /sys/bus/rpmsg/devices/.../announce +Date: June 2011 +KernelVersion: 3.3 +Contact: Ohad Ben-Cohen <ohad@wizery.com> +Description: + Every rpmsg device is a communication channel with a remote + processor. Channels are identified by a textual name (see + /sys/bus/rpmsg/devices/.../name above) and have a local + ("source") rpmsg address, and remote ("destination") rpmsg + address. + + A channel is first created when an entity, whether local + or remote, starts listening on it for messages (and is thus + called an rpmsg server). 
+ + When that happens, a "name service" announcement is sent + to the other processor, in order to let it know about the + creation of the channel (this way remote clients know they + can start sending messages). + + This sysfs entry tells us whether the channel is a local + server channel that is announced (values are either + true or false). diff --git a/Documentation/ABI/testing/sysfs-bus-usb b/Documentation/ABI/testing/sysfs-bus-usb index b4f548792e32..7c22a532fdfb 100644 --- a/Documentation/ABI/testing/sysfs-bus-usb +++ b/Documentation/ABI/testing/sysfs-bus-usb @@ -182,3 +182,14 @@ Description: USB2 hardware LPM is enabled for the device. Developer can write y/Y/1 or n/N/0 to the file to enable/disable the feature. + +What: /sys/bus/usb/devices/.../removable +Date: February 2012 +Contact: Matthew Garrett <mjg@redhat.com> +Description: + Some information about whether a given USB device is + physically fixed to the platform can be inferred from a + combination of hub decriptor bits and platform-specific data + such as ACPI. This file will read either "removable" or + "fixed" if the information is available, and "unknown" + otherwise.
\ No newline at end of file diff --git a/Documentation/ABI/testing/sysfs-cfq-target-latency b/Documentation/ABI/testing/sysfs-cfq-target-latency new file mode 100644 index 000000000000..df0f7828c5e3 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-cfq-target-latency @@ -0,0 +1,8 @@ +What: /sys/block/<device>/iosched/target_latency +Date: March 2012 +contact: Tao Ma <boyu.mt@taobao.com> +Description: + The /sys/block/<device>/iosched/target_latency only exists + when the user sets cfq to /sys/block/<device>/scheduler. + It contains an estimated latency time for the cfq. cfq will + use it to calculate the time slice used for every task. diff --git a/Documentation/ABI/testing/sysfs-class b/Documentation/ABI/testing/sysfs-class index 4b0cb891e46e..676530fcf747 100644 --- a/Documentation/ABI/testing/sysfs-class +++ b/Documentation/ABI/testing/sysfs-class @@ -1,6 +1,6 @@ What: /sys/class/ Date: Febuary 2006 -Contact: Greg Kroah-Hartman <gregkh@suse.de> +Contact: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Description: The /sys/class directory will consist of a group of subdirectories describing individual classes of devices diff --git a/Documentation/ABI/testing/sysfs-class-net-mesh b/Documentation/ABI/testing/sysfs-class-net-mesh index b02001488eef..b218e0f8bdb3 100644 --- a/Documentation/ABI/testing/sysfs-class-net-mesh +++ b/Documentation/ABI/testing/sysfs-class-net-mesh @@ -65,6 +65,13 @@ Description: Defines the penalty which will be applied to an originator message's tq-field on every hop. +What: /sys/class/net/<mesh_iface>/mesh/routing_algo +Date: Dec 2011 +Contact: Marek Lindner <lindner_marek@yahoo.de> +Description: + Defines the routing procotol this mesh instance + uses to find the optimal paths through the mesh. 
+ What: /sys/class/net/<mesh_iface>/mesh/vis_mode Date: May 2010 Contact: Marek Lindner <lindner_marek@yahoo.de> diff --git a/Documentation/ABI/testing/sysfs-devices b/Documentation/ABI/testing/sysfs-devices index 6a25671ee5f6..5fcc94358b8d 100644 --- a/Documentation/ABI/testing/sysfs-devices +++ b/Documentation/ABI/testing/sysfs-devices @@ -1,6 +1,6 @@ What: /sys/devices Date: February 2006 -Contact: Greg Kroah-Hartman <gregkh@suse.de> +Contact: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Description: The /sys/devices tree contains a snapshot of the internal state of the kernel device tree. Devices will diff --git a/Documentation/ABI/testing/sysfs-devices-power b/Documentation/ABI/testing/sysfs-devices-power index 8ffbc25376a0..840f7d64d483 100644 --- a/Documentation/ABI/testing/sysfs-devices-power +++ b/Documentation/ABI/testing/sysfs-devices-power @@ -165,3 +165,21 @@ Description: Not all drivers support this attribute. If it isn't supported, attempts to read or write it will yield I/O errors. + +What: /sys/devices/.../power/pm_qos_latency_us +Date: March 2012 +Contact: Rafael J. Wysocki <rjw@sisk.pl> +Description: + The /sys/devices/.../power/pm_qos_resume_latency_us attribute + contains the PM QoS resume latency limit for the given device, + which is the maximum allowed time it can take to resume the + device, after it has been suspended at run time, from a resume + request to the moment the device will be ready to process I/O, + in microseconds. If it is equal to 0, however, this means that + the PM QoS resume latency may be arbitrary. + + Not all drivers support this attribute. If it isn't supported, + it is not present. + + This attribute has no effect on system-wide suspend/resume and + hibernation. 
diff --git a/Documentation/ABI/testing/sysfs-devices-soc b/Documentation/ABI/testing/sysfs-devices-soc new file mode 100644 index 000000000000..6d9cc253f2b2 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-devices-soc @@ -0,0 +1,58 @@ +What: /sys/devices/socX +Date: January 2012 +contact: Lee Jones <lee.jones@linaro.org> +Description: + The /sys/devices/ directory contains a sub-directory for each + System-on-Chip (SoC) device on a running platform. Information + regarding each SoC can be obtained by reading sysfs files. This + functionality is only available if implemented by the platform. + + The directory created for each SoC will also house information + about devices which are commonly contained in /sys/devices/platform. + It has been agreed that if an SoC device exists, its supported + devices would be better suited to appear as children of that SoC. + +What: /sys/devices/socX/machine +Date: January 2012 +contact: Lee Jones <lee.jones@linaro.org> +Description: + Read-only attribute common to all SoCs. Contains the SoC machine + name (e.g. Ux500). + +What: /sys/devices/socX/family +Date: January 2012 +contact: Lee Jones <lee.jones@linaro.org> +Description: + Read-only attribute common to all SoCs. Contains SoC family name + (e.g. DB8500). + +What: /sys/devices/socX/soc_id +Date: January 2012 +contact: Lee Jones <lee.jones@linaro.org> +Description: + Read-only attribute supported by most SoCs. In the case of + ST-Ericsson's chips this contains the SoC serial number. + +What: /sys/devices/socX/revision +Date: January 2012 +contact: Lee Jones <lee.jones@linaro.org> +Description: + Read-only attribute supported by most SoCs. Contains the SoC's + manufacturing revision number. + +What: /sys/devices/socX/process +Date: January 2012 +contact: Lee Jones <lee.jones@linaro.org> +Description: + Read-only attribute supported ST-Ericsson's silicon. Contains the + the process by which the silicon chip was manufactured. 
+ +What: /sys/bus/soc +Date: January 2012 +contact: Lee Jones <lee.jones@linaro.org> +Description: + The /sys/bus/soc/ directory contains the usual sub-folders + expected under most buses. /sys/bus/soc/devices is of particular + interest, as it contains a symlink for each SoC device found on + the system. Each symlink points back into the aforementioned + /sys/devices/socX devices. diff --git a/Documentation/ABI/testing/sysfs-driver-samsung-laptop b/Documentation/ABI/testing/sysfs-driver-samsung-laptop index 0a810231aad4..678819a3f8bf 100644 --- a/Documentation/ABI/testing/sysfs-driver-samsung-laptop +++ b/Documentation/ABI/testing/sysfs-driver-samsung-laptop @@ -1,7 +1,7 @@ What: /sys/devices/platform/samsung/performance_level Date: January 1, 2010 KernelVersion: 2.6.33 -Contact: Greg Kroah-Hartman <gregkh@suse.de> +Contact: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Description: Some Samsung laptops have different "performance levels" that are can be modified by a function key, and by this sysfs file. These values don't always make a whole lot @@ -17,3 +17,21 @@ Description: Some Samsung laptops have different "performance levels" Specifically, not all support the "overclock" option, and it's still unknown if this value even changes anything, other than making the user feel a bit better. + +What: /sys/devices/platform/samsung/battery_life_extender +Date: December 1, 2011 +KernelVersion: 3.3 +Contact: Corentin Chary <corentin.chary@gmail.com> +Description: Max battery charge level can be modified, battery cycle + life can be extended by reducing the max battery charge + level. + 0 means normal battery mode (100% charge) + 1 means battery life extender mode (80% charge) + +What: /sys/devices/platform/samsung/usb_charge +Date: December 1, 2011 +KernelVersion: 3.3 +Contact: Corentin Chary <corentin.chary@gmail.com> +Description: Use your USB ports to charge devices, even + when your laptop is powered off. + 1 means enabled, 0 means disabled. 
diff --git a/Documentation/ABI/testing/sysfs-firmware-acpi b/Documentation/ABI/testing/sysfs-firmware-acpi index 4f9ba3c2fca7..dd930c8db41f 100644 --- a/Documentation/ABI/testing/sysfs-firmware-acpi +++ b/Documentation/ABI/testing/sysfs-firmware-acpi @@ -1,3 +1,23 @@ +What: /sys/firmware/acpi/bgrt/ +Date: January 2012 +Contact: Matthew Garrett <mjg@redhat.com> +Description: + The BGRT is an ACPI 5.0 feature that allows the OS + to obtain a copy of the firmware boot splash and + some associated metadata. This is intended to be used + by boot splash applications in order to interact with + the firmware boot splash in order to avoid jarring + transitions. + + image: The image bitmap. Currently a 32-bit BMP. + status: 1 if the image is valid, 0 if firmware invalidated it. + type: 0 indicates image is in BMP format. + version: The version of the BGRT. Currently 1. + xoffset: The number of pixels between the left of the screen + and the left edge of the image. + yoffset: The number of pixels between the top of the screen + and the top edge of the image. 
+ What: /sys/firmware/acpi/interrupts/ Date: February 2008 Contact: Len Brown <lenb@kernel.org> diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-cleancache b/Documentation/ABI/testing/sysfs-kernel-mm-cleancache deleted file mode 100644 index 662ae646ea12..000000000000 --- a/Documentation/ABI/testing/sysfs-kernel-mm-cleancache +++ /dev/null @@ -1,11 +0,0 @@ -What: /sys/kernel/mm/cleancache/ -Date: April 2011 -Contact: Dan Magenheimer <dan.magenheimer@oracle.com> -Description: - /sys/kernel/mm/cleancache/ contains a number of files which - record a count of various cleancache operations - (sum across all filesystems): - succ_gets - failed_gets - puts - flushes diff --git a/Documentation/CodingStyle b/Documentation/CodingStyle index 2b90d328b3ba..c58b236bbe04 100644 --- a/Documentation/CodingStyle +++ b/Documentation/CodingStyle @@ -793,6 +793,35 @@ own custom mode, or may have some other magic method for making indentation work correctly. + Chapter 19: Inline assembly + +In architecture-specific code, you may need to use inline assembly to interface +with CPU or platform functionality. Don't hesitate to do so when necessary. +However, don't use inline assembly gratuitously when C can do the job. You can +and should poke hardware from C when possible. + +Consider writing simple helper functions that wrap common bits of inline +assembly, rather than repeatedly writing them with slight variations. Remember +that inline assembly can use C parameters. + +Large, non-trivial assembly functions should go in .S files, with corresponding +C prototypes defined in C header files. The C prototypes for assembly +functions should use "asmlinkage". + +You may need to mark your asm statement as volatile, to prevent GCC from +removing it if GCC doesn't notice any side effects. You don't always need to +do so, though, and doing so unnecessarily can limit optimization. 
+ +When writing a single inline assembly statement containing multiple +instructions, put each instruction on a separate line in a separate quoted +string, and end each string except the last with \n\t to properly indent the +next instruction in the assembly output: + + asm ("magic %reg1, #42\n\t" + "more_magic %reg2, %reg3" + : /* outputs */ : /* inputs */ : /* clobbers */); + + Appendix I: References diff --git a/Documentation/DMA-attributes.txt b/Documentation/DMA-attributes.txt index b768cc0e402b..5c72eed89563 100644 --- a/Documentation/DMA-attributes.txt +++ b/Documentation/DMA-attributes.txt @@ -31,3 +31,21 @@ may be weakly ordered, that is that reads and writes may pass each other. Since it is optional for platforms to implement DMA_ATTR_WEAK_ORDERING, those that do not will simply ignore the attribute and exhibit default behavior. + +DMA_ATTR_WRITE_COMBINE +---------------------- + +DMA_ATTR_WRITE_COMBINE specifies that writes to the mapping may be +buffered to improve performance. + +Since it is optional for platforms to implement DMA_ATTR_WRITE_COMBINE, +those that do not will simply ignore the attribute and exhibit default +behavior. + +DMA_ATTR_NON_CONSISTENT +----------------------- + +DMA_ATTR_NON_CONSISTENT lets the platform to choose to return either +consistent or non-consistent memory as it sees fit. By using this API, +you are guaranteeing to the platform that you have all the correct and +necessary sync points for this memory in the driver. 
diff --git a/Documentation/DocBook/80211.tmpl b/Documentation/DocBook/80211.tmpl index 2014155c899d..c5ac6929c41c 100644 --- a/Documentation/DocBook/80211.tmpl +++ b/Documentation/DocBook/80211.tmpl @@ -129,7 +129,6 @@ !Finclude/net/cfg80211.h cfg80211_pmksa !Finclude/net/cfg80211.h cfg80211_send_rx_auth !Finclude/net/cfg80211.h cfg80211_send_auth_timeout -!Finclude/net/cfg80211.h __cfg80211_auth_canceled !Finclude/net/cfg80211.h cfg80211_send_rx_assoc !Finclude/net/cfg80211.h cfg80211_send_assoc_timeout !Finclude/net/cfg80211.h cfg80211_send_deauth diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl index 9c27e5125dd2..7514dbf0a679 100644 --- a/Documentation/DocBook/device-drivers.tmpl +++ b/Documentation/DocBook/device-drivers.tmpl @@ -446,4 +446,21 @@ X!Idrivers/video/console/fonts.c !Edrivers/i2c/i2c-core.c </chapter> + <chapter id="hsi"> + <title>High Speed Synchronous Serial Interface (HSI)</title> + + <para> + High Speed Synchronous Serial Interface (HSI) is a + serial interface mainly used for connecting application + engines (APE) with cellular modem engines (CMT) in cellular + handsets. + + HSI provides multiplexing for up to 16 logical channels, + low-latency and full duplex communication. + </para> + +!Iinclude/linux/hsi/hsi.h +!Edrivers/hsi/hsi.c + </chapter> + </book> diff --git a/Documentation/DocBook/filesystems.tmpl b/Documentation/DocBook/filesystems.tmpl index f51f28531b8d..3fca32c41927 100644 --- a/Documentation/DocBook/filesystems.tmpl +++ b/Documentation/DocBook/filesystems.tmpl @@ -387,7 +387,7 @@ an example. 
<title>See also</title> <para> <citation> - <ulink url="ftp://ftp.uk.linux.org/pub/linux/sct/fs/jfs/journal-design.ps.gz"> + <ulink url="http://kernel.org/pub/linux/kernel/people/sct/ext3/journal-design.ps.gz"> Journaling the Linux ext2fs Filesystem, LinuxExpo 98, Stephen Tweedie </ulink> </citation> diff --git a/Documentation/DocBook/kgdb.tmpl b/Documentation/DocBook/kgdb.tmpl index d71b57fcf116..4ee4ba3509fc 100644 --- a/Documentation/DocBook/kgdb.tmpl +++ b/Documentation/DocBook/kgdb.tmpl @@ -362,6 +362,23 @@ </para> </para> </sect1> + <sect1 id="kgdbreboot"> + <title>Run time parameter: kgdbreboot</title> + <para> The kgdbreboot feature allows you to change how the debugger + deals with the reboot notification. You have 3 choices for the + behavior. The default behavior is always set to 0.</para> + <orderedlist> + <listitem><para>echo -1 > /sys/module/debug_core/parameters/kgdbreboot</para> + <para>Ignore the reboot notification entirely.</para> + </listitem> + <listitem><para>echo 0 > /sys/module/debug_core/parameters/kgdbreboot</para> + <para>Send the detach message to any attached debugger client.</para> + </listitem> + <listitem><para>echo 1 > /sys/module/debug_core/parameters/kgdbreboot</para> + <para>Enter the debugger on reboot notify.</para> + </listitem> + </orderedlist> + </sect1> </chapter> <chapter id="usingKDB"> <title>Using kdb</title> diff --git a/Documentation/DocBook/libata.tmpl b/Documentation/DocBook/libata.tmpl index cdd1bb9aac0d..31df1aa00710 100644 --- a/Documentation/DocBook/libata.tmpl +++ b/Documentation/DocBook/libata.tmpl @@ -22,8 +22,8 @@ <para> The contents of this file are subject to the Open Software License version 1.1 that can be found at - <ulink url="http://www.opensource.org/licenses/osl-1.1.txt">http://www.opensource.org/licenses/osl-1.1.txt</ulink> and is included herein - by reference. 
+ <ulink url="http://fedoraproject.org/wiki/Licensing:OSL1.1">http://fedoraproject.org/wiki/Licensing:OSL1.1</ulink> + and is included herein by reference. </para> <para> @@ -945,7 +945,7 @@ and other resources, etc. <listitem> <para> - !BSY && ERR after CDB tranfer starts but before the + !BSY && ERR after CDB transfer starts but before the last byte of CDB is transferred. ATA/ATAPI standard states that "The device shall not terminate the PACKET command with an error before the last byte of the command packet has @@ -1050,7 +1050,7 @@ and other resources, etc. to complete a command. Combined with the fact that MWDMA and PIO transfer errors aren't allowed to use ICRC bit up to ATA/ATAPI-7, it seems to imply that ABRT bit alone could - indicate tranfer errors. + indicate transfer errors. </para> <para> However, ATA/ATAPI-8 draft revision 1f removes the part diff --git a/Documentation/DocBook/media/v4l/compat.xml b/Documentation/DocBook/media/v4l/compat.xml index dd958b5a34e6..bce97c50391b 100644 --- a/Documentation/DocBook/media/v4l/compat.xml +++ b/Documentation/DocBook/media/v4l/compat.xml @@ -444,7 +444,7 @@ linkend="pixfmt-rgb"><constant>V4L2_PIX_FMT_BGR24</constant></link></para></entr <entry><para><link linkend="pixfmt-rgb"><constant>V4L2_PIX_FMT_BGR32</constant></link><footnote> <para>Presumably all V4L RGB formats are -little-endian, although some drivers might interpret them according to machine endianess. V4L2 defines little-endian, big-endian and red/blue +little-endian, although some drivers might interpret them according to machine endianness. V4L2 defines little-endian, big-endian and red/blue swapped variants. For details see <xref linkend="pixfmt-rgb" />.</para> </footnote></para></entry> </row> @@ -823,7 +823,7 @@ standard); 35468950 Hz PAL and SECAM (625-line standards)</entry> <row> <entry>sample_format</entry> <entry>V4L2_PIX_FMT_GREY. 
The last four bytes (a -machine endianess integer) contain a frame counter.</entry> +machine endianness integer) contain a frame counter.</entry> </row> <row> <entry>start[]</entry> diff --git a/Documentation/EDID/1024x768.S b/Documentation/EDID/1024x768.S new file mode 100644 index 000000000000..4b486fe31b32 --- /dev/null +++ b/Documentation/EDID/1024x768.S @@ -0,0 +1,44 @@ +/* + 1024x768.S: EDID data set for standard 1024x768 60 Hz monitor + + Copyright (C) 2011 Carsten Emde <C.Emde@osadl.org> + + This program is free software; you can redistribute it and/or + modify it under the terms of the GNU General Public License + as published by the Free Software Foundation; either version 2 + of the License, or (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. 
+*/ + +/* EDID */ +#define VERSION 1 +#define REVISION 3 + +/* Display */ +#define CLOCK 65000 /* kHz */ +#define XPIX 1024 +#define YPIX 768 +#define XY_RATIO XY_RATIO_4_3 +#define XBLANK 320 +#define YBLANK 38 +#define XOFFSET 8 +#define XPULSE 144 +#define YOFFSET (63+3) +#define YPULSE (63+6) +#define DPI 72 +#define VFREQ 60 /* Hz */ +#define TIMING_NAME "Linux XGA" +#define ESTABLISHED_TIMINGS_BITS 0x08 /* Bit 3 -> 1024x768 @60 Hz */ +#define HSYNC_POL 0 +#define VSYNC_POL 0 +#define CRC 0x55 + +#include "edid.S" diff --git a/Documentation/EDID/1280x1024.S b/Documentation/EDID/1280x1024.S new file mode 100644 index 000000000000..a2799fe33a4d --- /dev/null +++ b/Documentation/EDID/1280x1024.S @@ -0,0 +1,44 @@ +/* + 1280x1024.S: EDID data set for standard 1280x1024 60 Hz monitor + + Copyright (C) 2011 Carsten Emde <C.Emde@osadl.org> + + This program is free software; you can redistribute it and/or + modify it under the terms of the GNU General Public License + as published by the Free Software Foundation; either version 2 + of the License, or (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. 
+*/ + +/* EDID */ +#define VERSION 1 +#define REVISION 3 + +/* Display */ +#define CLOCK 108000 /* kHz */ +#define XPIX 1280 +#define YPIX 1024 +#define XY_RATIO XY_RATIO_5_4 +#define XBLANK 408 +#define YBLANK 42 +#define XOFFSET 48 +#define XPULSE 112 +#define YOFFSET (63+1) +#define YPULSE (63+3) +#define DPI 72 +#define VFREQ 60 /* Hz */ +#define TIMING_NAME "Linux SXGA" +#define ESTABLISHED_TIMINGS_BITS 0x00 /* none */ +#define HSYNC_POL 1 +#define VSYNC_POL 1 +#define CRC 0xa0 + +#include "edid.S" diff --git a/Documentation/EDID/1680x1050.S b/Documentation/EDID/1680x1050.S new file mode 100644 index 000000000000..96f67cafcf2e --- /dev/null +++ b/Documentation/EDID/1680x1050.S @@ -0,0 +1,44 @@ +/* + 1680x1050.S: EDID data set for standard 1680x1050 60 Hz monitor + + Copyright (C) 2012 Carsten Emde <C.Emde@osadl.org> + + This program is free software; you can redistribute it and/or + modify it under the terms of the GNU General Public License + as published by the Free Software Foundation; either version 2 + of the License, or (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. 
+*/ + +/* EDID */ +#define VERSION 1 +#define REVISION 3 + +/* Display */ +#define CLOCK 146250 /* kHz */ +#define XPIX 1680 +#define YPIX 1050 +#define XY_RATIO XY_RATIO_16_10 +#define XBLANK 560 +#define YBLANK 39 +#define XOFFSET 104 +#define XPULSE 176 +#define YOFFSET (63+3) +#define YPULSE (63+6) +#define DPI 96 +#define VFREQ 60 /* Hz */ +#define TIMING_NAME "Linux WSXGA" +#define ESTABLISHED_TIMINGS_BITS 0x00 /* none */ +#define HSYNC_POL 1 +#define VSYNC_POL 1 +#define CRC 0x26 + +#include "edid.S" diff --git a/Documentation/EDID/1920x1080.S b/Documentation/EDID/1920x1080.S new file mode 100644 index 000000000000..36ed5d571d0a --- /dev/null +++ b/Documentation/EDID/1920x1080.S @@ -0,0 +1,44 @@ +/* + 1920x1080.S: EDID data set for standard 1920x1080 60 Hz monitor + + Copyright (C) 2012 Carsten Emde <C.Emde@osadl.org> + + This program is free software; you can redistribute it and/or + modify it under the terms of the GNU General Public License + as published by the Free Software Foundation; either version 2 + of the License, or (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. 
+*/ + +/* EDID */ +#define VERSION 1 +#define REVISION 3 + +/* Display */ +#define CLOCK 148500 /* kHz */ +#define XPIX 1920 +#define YPIX 1080 +#define XY_RATIO XY_RATIO_16_9 +#define XBLANK 280 +#define YBLANK 45 +#define XOFFSET 88 +#define XPULSE 44 +#define YOFFSET (63+4) +#define YPULSE (63+5) +#define DPI 96 +#define VFREQ 60 /* Hz */ +#define TIMING_NAME "Linux FHD" +#define ESTABLISHED_TIMINGS_BITS 0x00 /* none */ +#define HSYNC_POL 1 +#define VSYNC_POL 1 +#define CRC 0x05 + +#include "edid.S" diff --git a/Documentation/EDID/HOWTO.txt b/Documentation/EDID/HOWTO.txt new file mode 100644 index 000000000000..75a9f2a0c43d --- /dev/null +++ b/Documentation/EDID/HOWTO.txt @@ -0,0 +1,39 @@ +In the good old days when graphics parameters were configured explicitly +in a file called xorg.conf, even broken hardware could be managed. + +Today, with the advent of Kernel Mode Setting, a graphics board is +either correctly working because all components follow the standards - +or the computer is unusable, because the screen remains dark after +booting or it displays the wrong area. Cases when this happens are: +- The graphics board does not recognize the monitor. +- The graphics board is unable to detect any EDID data. +- The graphics board incorrectly forwards EDID data to the driver. +- The monitor sends no or bogus EDID data. +- A KVM sends its own EDID data instead of querying the connected monitor. +Adding the kernel parameter "nomodeset" helps in most cases, but causes +restrictions later on. + +As a remedy for such situations, the kernel configuration item +CONFIG_DRM_LOAD_EDID_FIRMWARE was introduced. It allows to provide an +individually prepared or corrected EDID data set in the /lib/firmware +directory from where it is loaded via the firmware interface. 
The code +(see drivers/gpu/drm/drm_edid_load.c) contains built-in data sets for +commonly used screen resolutions (1024x768, 1280x1024, 1680x1050, +1920x1080) as binary blobs, but the kernel source tree does not contain +code to create these data. In order to elucidate the origin of the +built-in binary EDID blobs and to facilitate the creation of individual +data for a specific misbehaving monitor, commented sources and a +Makefile environment are given here. + +To create binary EDID and C source code files from the existing data +material, simply type "make". + +If you want to create your own EDID file, copy the file 1024x768.S and +replace the settings with your own data. The CRC value in the last line + #define CRC 0x55 +is a bit tricky. After a first version of the binary data set is +created, it must be checked with the "edid-decode" utility, which will +most probably complain about a wrong CRC. Fortunately, the utility also +displays the correct CRC, which must then be inserted into the source +file. After the make procedure is repeated, the EDID data set is ready +to be used.
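The checksum step described in the HOWTO can also be reproduced without edid-decode. The following is a minimal sketch in Python and is not part of the patch. It assumes the 128-byte EDID base block that edid.S emits and the rule stated there that the sum of all bytes must be 0, taken modulo 256. The function names are illustrative only.

```python
EDID_BLOCK_SIZE = 128  # size of the EDID base block in bytes


def edid_checksum(block: bytes) -> int:
    """Return the checksum byte that makes the block sum to 0 (mod 256).

    Only the first 127 bytes are summed, so whatever currently sits in
    the checksum position (the last byte) is ignored.
    """
    if len(block) != EDID_BLOCK_SIZE:
        raise ValueError("expected a 128-byte EDID base block")
    return (-sum(block[:EDID_BLOCK_SIZE - 1])) % 256


def edid_is_valid(block: bytes) -> bool:
    """Check that all 128 bytes, checksum included, sum to 0 (mod 256)."""
    return len(block) == EDID_BLOCK_SIZE and sum(block) % 256 == 0


if __name__ == "__main__":
    # Synthetic block for demonstration only: the fixed header pattern
    # followed by zeros. A real run would read a generated .bin file.
    block = bytearray(EDID_BLOCK_SIZE)
    block[0:8] = bytes([0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00])
    block[127] = edid_checksum(bytes(block))
    print("checksum byte: 0x%02x" % block[127])
    print("valid:", edid_is_valid(bytes(block)))
```

Applied to a freshly built .bin file, the value returned by edid_checksum() would be the number to put into the "#define CRC" line before running make again.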
diff --git a/Documentation/EDID/Makefile b/Documentation/EDID/Makefile new file mode 100644 index 000000000000..17763ca3f12b --- /dev/null +++ b/Documentation/EDID/Makefile @@ -0,0 +1,26 @@ + +SOURCES := $(wildcard [0-9]*x[0-9]*.S) + +BIN := $(patsubst %.S, %.bin, $(SOURCES)) + +IHEX := $(patsubst %.S, %.bin.ihex, $(SOURCES)) + +CODE := $(patsubst %.S, %.c, $(SOURCES)) + +all: $(BIN) $(IHEX) $(CODE) + +clean: + @rm -f *.o *.bin.ihex *.bin *.c + +%.o: %.S + @cc -c $^ + +%.bin: %.o + @objcopy -Obinary $^ $@ + +%.bin.ihex: %.o + @objcopy -Oihex $^ $@ + @dos2unix $@ 2>/dev/null + +%.c: %.bin + @echo "{" >$@; hexdump -f hex $^ >>$@; echo "};" >>$@ diff --git a/Documentation/EDID/edid.S b/Documentation/EDID/edid.S new file mode 100644 index 000000000000..ea97ae275fca --- /dev/null +++ b/Documentation/EDID/edid.S @@ -0,0 +1,261 @@ +/* + edid.S: EDID data template + + Copyright (C) 2012 Carsten Emde <C.Emde@osadl.org> + + This program is free software; you can redistribute it and/or + modify it under the terms of the GNU General Public License + as published by the Free Software Foundation; either version 2 + of the License, or (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. 
+*/ + + +/* Manufacturer */ +#define MFG_LNX1 'L' +#define MFG_LNX2 'N' +#define MFG_LNX3 'X' +#define SERIAL 0 +#define YEAR 2012 +#define WEEK 5 + +/* EDID 1.3 standard definitions */ +#define XY_RATIO_16_10 0b00 +#define XY_RATIO_4_3 0b01 +#define XY_RATIO_5_4 0b10 +#define XY_RATIO_16_9 0b11 + +#define mfgname2id(v1,v2,v3) \ + ((((v1-'@')&0x1f)<<10)+(((v2-'@')&0x1f)<<5)+((v3-'@')&0x1f)) +#define swap16(v1) ((v1>>8)+((v1&0xff)<<8)) +#define msbs2(v1,v2) ((((v1>>8)&0x0f)<<4)+((v2>>8)&0x0f)) +#define msbs4(v1,v2,v3,v4) \ + (((v1&0x03)>>2)+((v2&0x03)>>4)+((v3&0x03)>>6)+((v4&0x03)>>8)) +#define pixdpi2mm(pix,dpi) ((pix*25)/dpi) +#define xsize pixdpi2mm(XPIX,DPI) +#define ysize pixdpi2mm(YPIX,DPI) + + .data + +/* Fixed header pattern */ +header: .byte 0x00,0xff,0xff,0xff,0xff,0xff,0xff,0x00 + +mfg_id: .word swap16(mfgname2id(MFG_LNX1, MFG_LNX2, MFG_LNX3)) + +prod_code: .word 0 + +/* Serial number. 32 bits, little endian. */ +serial_number: .long SERIAL + +/* Week of manufacture */ +week: .byte WEEK + +/* Year of manufacture, less 1990. (1990-2245) + If week=255, it is the model year instead */ +year: .byte YEAR-1990 + +version: .byte VERSION /* EDID version, usually 1 (for 1.3) */ +revision: .byte REVISION /* EDID revision, usually 3 (for 1.3) */ + +/* If Bit 7=1 Digital input. If set, the following bit definitions apply: + Bits 6-1 Reserved, must be 0 + Bit 0 Signal is compatible with VESA DFP 1.x TMDS CRGB, + 1 pixel per clock, up to 8 bits per color, MSB aligned, + If Bit 7=0 Analog input. If clear, the following bit definitions apply: + Bits 6-5 Video white and sync levels, relative to blank + 00=+0.7/-0.3 V; 01=+0.714/-0.286 V; + 10=+1.0/-0.4 V; 11=+0.7/0 V + Bit 4 Blank-to-black setup (pedestal) expected + Bit 3 Separate sync supported + Bit 2 Composite sync (on HSync) supported + Bit 1 Sync on green supported + Bit 0 VSync pulse must be serrated when composite or + sync-on-green is used.
*/ +video_parms: .byte 0x6d + +/* Maximum horizontal image size, in centimetres + (max 292 cm/115 in at 16:9 aspect ratio) */ +max_hor_size: .byte xsize/10 + +/* Maximum vertical image size, in centimetres. + If either byte is 0, undefined (e.g. projector) */ +max_vert_size: .byte ysize/10 + +/* Display gamma, minus 1, times 100 (range 1.00-3.54) */ +gamma: .byte 120 + +/* Bit 7 DPMS standby supported + Bit 6 DPMS suspend supported + Bit 5 DPMS active-off supported + Bits 4-3 Display type: 00=monochrome; 01=RGB colour; + 10=non-RGB multicolour; 11=undefined + Bit 2 Standard sRGB colour space. Bytes 25-34 must contain + sRGB standard values. + Bit 1 Preferred timing mode specified in descriptor block 1. + Bit 0 GTF supported with default parameter values. */ +dsp_features: .byte 0xea + +/* Chromaticity coordinates. */ +/* Red and green least-significant bits + Bits 7-6 Red x value least-significant 2 bits + Bits 5-4 Red y value least-significant 2 bits + Bits 3-2 Green x value least-significant 2 bits + Bits 1-0 Green y value least-significant 2 bits */ +red_green_lsb: .byte 0x5e + +/* Blue and white least-significant 2 bits */ +blue_white_lsb: .byte 0xc0 + +/* Red x value most significant 8 bits.
+ 0-255 encodes 0-0.996 (255/256); 0-0.999 (1023/1024) with lsbits */ +red_x_msb: .byte 0xa4 + +/* Red y value most significant 8 bits */ +red_y_msb: .byte 0x59 + +/* Green x and y value most significant 8 bits */ +green_x_y_msb: .byte 0x4a,0x98 + +/* Blue x and y value most significant 8 bits */ +blue_x_y_msb: .byte 0x25,0x20 + +/* Default white point x and y value most significant 8 bits */ +white_x_y_msb: .byte 0x50,0x54 + +/* Established timings */ +/* Bit 7 720x400 @ 70 Hz + Bit 6 720x400 @ 88 Hz + Bit 5 640x480 @ 60 Hz + Bit 4 640x480 @ 67 Hz + Bit 3 640x480 @ 72 Hz + Bit 2 640x480 @ 75 Hz + Bit 1 800x600 @ 56 Hz + Bit 0 800x600 @ 60 Hz */ +estbl_timing1: .byte 0x00 + +/* Bit 7 800x600 @ 72 Hz + Bit 6 800x600 @ 75 Hz + Bit 5 832x624 @ 75 Hz + Bit 4 1024x768 @ 87 Hz, interlaced (1024x768) + Bit 3 1024x768 @ 60 Hz + Bit 2 1024x768 @ 72 Hz + Bit 1 1024x768 @ 75 Hz + Bit 0 1280x1024 @ 75 Hz */ +estbl_timing2: .byte ESTABLISHED_TIMINGS_BITS + +/* Bit 7 1152x870 @ 75 Hz (Apple Macintosh II) + Bits 6-0 Other manufacturer-specific display modes */ +estbl_timing3: .byte 0x00 + +/* Standard timing */ +/* X resolution, divided by 8, less 31 (256-2288 pixels) */ +std_xres: .byte (XPIX/8)-31 +/* Y resolution, X:Y pixel ratio + Bits 7-6 X:Y pixel ratio: 00=16:10; 01=4:3; 10=5:4; 11=16:9. + Bits 5-0 Vertical frequency, less 60 (60-123 Hz) */ +std_vres: .byte (XY_RATIO<<6)+VFREQ-60 + .fill 7,2,0x0101 /* Unused */ + +descriptor1: +/* Pixel clock in 10 kHz units. (0.01-655.35 MHz, little-endian) */ +clock: .word CLOCK/10 + +/* Horizontal active pixels 8 lsbits (0-4095) */ +x_act_lsb: .byte XPIX&0xff +/* Horizontal blanking pixels 8 lsbits (0-4095) + End of active to start of next active.
*/ +x_blk_lsb: .byte XBLANK&0xff +/* Bits 7-4 Horizontal active pixels 4 msbits + Bits 3-0 Horizontal blanking pixels 4 msbits */ +x_msbs: .byte msbs2(XPIX,XBLANK) + +/* Vertical active lines 8 lsbits (0-4095) */ +y_act_lsb: .byte YPIX&0xff +/* Vertical blanking lines 8 lsbits (0-4095) */ +y_blk_lsb: .byte YBLANK&0xff +/* Bits 7-4 Vertical active lines 4 msbits + Bits 3-0 Vertical blanking lines 4 msbits */ +y_msbs: .byte msbs2(YPIX,YBLANK) + +/* Horizontal sync offset pixels 8 lsbits (0-1023) From blanking start */ +x_snc_off_lsb: .byte XOFFSET&0xff +/* Horizontal sync pulse width pixels 8 lsbits (0-1023) */ +x_snc_pls_lsb: .byte XPULSE&0xff +/* Bits 7-4 Vertical sync offset lines 4 lsbits (0-63) + Bits 3-0 Vertical sync pulse width lines 4 lsbits (0-63) */ +y_snc_lsb: .byte ((YOFFSET-63)<<4)+(YPULSE-63) +/* Bits 7-6 Horizontal sync offset pixels 2 msbits + Bits 5-4 Horizontal sync pulse width pixels 2 msbits + Bits 3-2 Vertical sync offset lines 2 msbits + Bits 1-0 Vertical sync pulse width lines 2 msbits */ +xy_snc_msbs: .byte msbs4(XOFFSET,XPULSE,YOFFSET,YPULSE) + +/* Horizontal display size, mm, 8 lsbits (0-4095 mm, 161 in) */ +x_dsp_size: .byte xsize&0xff + +/* Vertical display size, mm, 8 lsbits (0-4095 mm, 161 in) */ +y_dsp_size: .byte ysize&0xff + +/* Bits 7-4 Horizontal display size, mm, 4 msbits + Bits 3-0 Vertical display size, mm, 4 msbits */ +dsp_size_mbsb: .byte msbs2(xsize,ysize) + +/* Horizontal border pixels (each side; total is twice this) */ +x_border: .byte 0 +/* Vertical border lines (each side; total is twice this) */ +y_border: .byte 0 + +/* Bit 7 Interlaced + Bits 6-5 Stereo mode: 00=No stereo; other values depend on bit 0: + Bit 0=0: 01=Field sequential, sync=1 during right; 10=similar, + sync=1 during left; 11=4-way interleaved stereo + Bit 0=1 2-way interleaved stereo: 01=Right image on even lines; + 10=Left image on even lines; 11=side-by-side + Bits 4-3 Sync type: 00=Analog composite; 01=Bipolar analog composite; + 10=Digital composite (on
HSync); 11=Digital separate + Bit 2 If digital separate: Vertical sync polarity (1=positive) + Other types: VSync serrated (HSync during VSync) + Bit 1 If analog sync: Sync on all 3 RGB lines (else green only) + Digital: HSync polarity (1=positive) + Bit 0 2-way line-interleaved stereo, if bits 4-3 are not 00. */ +features: .byte 0x18+(VSYNC_POL<<2)+(HSYNC_POL<<1) + +descriptor2: .byte 0,0 /* Not a detailed timing descriptor */ + .byte 0 /* Must be zero */ + .byte 0xff /* Descriptor is monitor serial number (text) */ + .byte 0 /* Must be zero */ +start1: .ascii "Linux #0" +end1: .byte 0x0a /* End marker */ + .fill 12-(end1-start1), 1, 0x20 /* Padded spaces */ +descriptor3: .byte 0,0 /* Not a detailed timing descriptor */ + .byte 0 /* Must be zero */ + .byte 0xfd /* Descriptor is monitor range limits */ + .byte 0 /* Must be zero */ +start2: .byte VFREQ-1 /* Minimum vertical field rate (1-255 Hz) */ + .byte VFREQ+1 /* Maximum vertical field rate (1-255 Hz) */ + .byte (CLOCK/(XPIX+XBLANK))-1 /* Minimum horizontal line rate + (1-255 kHz) */ + .byte (CLOCK/(XPIX+XBLANK))+1 /* Maximum horizontal line rate + (1-255 kHz) */ + .byte (CLOCK/10000)+1 /* Maximum pixel clock rate, rounded up + to 10 MHz multiple (10-2550 MHz) */ + .byte 0 /* No extended timing information type */ +end2: .byte 0x0a /* End marker */ + .fill 12-(end2-start2), 1, 0x20 /* Padded spaces */ +descriptor4: .byte 0,0 /* Not a detailed timing descriptor */ + .byte 0 /* Must be zero */ + .byte 0xfc /* Descriptor is text */ + .byte 0 /* Must be zero */ +start3: .ascii TIMING_NAME +end3: .byte 0x0a /* End marker */ + .fill 12-(end3-start3), 1, 0x20 /* Padded spaces */ +extensions: .byte 0 /* Number of extensions to follow */ +checksum: .byte CRC /* Sum of all bytes must be 0 */ diff --git a/Documentation/EDID/hex b/Documentation/EDID/hex new file mode 100644 index 000000000000..8873ebb618af --- /dev/null +++ b/Documentation/EDID/hex @@ -0,0 +1 @@ +"\t" 8/1 "0x%02x, " "\n" diff --git 
a/Documentation/IRQ-domain.txt b/Documentation/IRQ-domain.txt new file mode 100644 index 000000000000..27dcaabfb4db --- /dev/null +++ b/Documentation/IRQ-domain.txt @@ -0,0 +1,117 @@ +irq_domain interrupt number mapping library + +The current design of the Linux kernel uses a single large number +space where each separate IRQ source is assigned a different number. +This is simple when there is only one interrupt controller, but in +systems with multiple interrupt controllers the kernel must ensure +that each one gets assigned non-overlapping allocations of Linux +IRQ numbers. + +The irq_alloc_desc*() and irq_free_desc*() APIs provide allocation of +irq numbers, but they don't provide any support for reverse mapping of +the controller-local IRQ (hwirq) number into the Linux IRQ number +space. + +The irq_domain library adds mapping between hwirq and IRQ numbers on +top of the irq_alloc_desc*() API. An irq_domain to manage mapping is +preferred over interrupt controller drivers open coding their own +reverse mapping scheme. + +irq_domain also implements translation from Device Tree interrupt +specifiers to hwirq numbers, and can be easily extended to support +other IRQ topology data sources. + +=== irq_domain usage === +An interrupt controller driver creates and registers an irq_domain by +calling one of the irq_domain_add_*() functions (each mapping method +has a different allocator function, more on that later). The function +will return a pointer to the irq_domain on success. The caller must +provide the allocator function with an irq_domain_ops structure with +the .map callback populated as a minimum. + +In most cases, the irq_domain will begin empty without any mappings +between hwirq and IRQ numbers. Mappings are added to the irq_domain +by calling irq_create_mapping() which accepts the irq_domain and a +hwirq number as arguments. 
If a mapping for the hwirq doesn't already +exist then it will allocate a new Linux irq_desc, associate it with +the hwirq, and call the .map() callback so the driver can perform any +required hardware setup. + +When an interrupt is received, the irq_find_mapping() function should +be used to find the Linux IRQ number from the hwirq number. + +If the driver has the Linux IRQ number or the irq_data pointer, and +needs to know the associated hwirq number (such as in the irq_chip +callbacks) then it can be directly obtained from irq_data->hwirq. + +=== Types of irq_domain mappings === +There are several mechanisms available for reverse mapping from hwirq +to Linux irq, and each mechanism uses a different allocation function. +Which reverse map type should be used depends on the use case. Each +of the reverse map types is described below: + +==== Linear ==== +irq_domain_add_linear() + +The linear reverse map maintains a fixed size table indexed by the +hwirq number. When a hwirq is mapped, an irq_desc is allocated for +the hwirq, and the IRQ number is stored in the table. + +The Linear map is a good choice when the maximum number of hwirqs is +fixed and a relatively small number (~ < 256). The advantages of this +map are fixed time lookup for IRQ numbers, and irq_descs are only +allocated for in-use IRQs. The disadvantage is that the table must be +as large as the largest possible hwirq number. + +The majority of drivers should use the linear map. + +==== Tree ==== +irq_domain_add_tree() + +The irq_domain maintains a radix tree map from hwirq numbers to Linux +IRQs. When an hwirq is mapped, an irq_desc is allocated and the +hwirq is used as the lookup key for the radix tree. + +The tree map is a good choice if the hwirq number can be very large +since it doesn't need to allocate a table as large as the largest +hwirq number. The disadvantage is that hwirq to IRQ number lookup is +dependent on how many entries are in the table. + +Very few drivers should need this mapping.
At the moment, powerpc +iseries is the only user. + +==== No Map ==== +irq_domain_add_nomap() + +The No Map mapping is to be used when the hwirq number is +programmable in the hardware. In this case it is best to program the +Linux IRQ number into the hardware itself so that no mapping is +required. Calling irq_create_direct_mapping() will allocate a Linux +IRQ number and call the .map() callback so that the driver can program the +Linux IRQ number into the hardware. + +Most drivers cannot use this mapping. + +==== Legacy ==== +irq_domain_add_legacy() +irq_domain_add_legacy_isa() + +The Legacy mapping is a special case for drivers that already have a +range of irq_descs allocated for the hwirqs. It is used when the +driver cannot be immediately converted to use the linear mapping. For +example, many embedded system board support files use a set of #defines +for IRQ numbers that are passed to struct device registrations. In that +case the Linux IRQ numbers cannot be dynamically assigned and the legacy +mapping should be used. + +The legacy map assumes a contiguous range of IRQ numbers has already +been allocated for the controller and that the IRQ number can be +calculated by adding a fixed offset to the hwirq number, and +vice versa. The disadvantage is that it requires the interrupt +controller to manage IRQ allocations and it requires an irq_desc to be +allocated for every hwirq, even if it is unused. + +The legacy map should only be used if fixed IRQ mappings must be +supported. For example, ISA controllers would use the legacy map for +mapping Linux IRQs 0-15 so that existing ISA drivers get the correct IRQ +numbers.
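The difference between the linear and legacy maps described in IRQ-domain.txt can be modelled in a few lines. This is a toy Python model and not kernel code. The class and method names are made up for illustration and only loosely mirror irq_create_mapping() and irq_find_mapping(). Using 0 for "no mapping" and 16 as the first dynamically allocated IRQ number are assumptions of this sketch, not statements about the kernel.

```python
class LinearDomain:
    """Toy linear reverse map: a fixed-size table indexed by hwirq."""

    def __init__(self, size, first_free_irq=16):
        self.table = [0] * size          # 0 means "no mapping" in this model
        self.next_irq = first_free_irq   # next IRQ number to hand out

    def create_mapping(self, hwirq):
        # Allocate an IRQ number only on first use of this hwirq.
        if self.table[hwirq] == 0:
            self.table[hwirq] = self.next_irq
            self.next_irq += 1
        return self.table[hwirq]

    def find_mapping(self, hwirq):
        # Fixed-time lookup: a single table access.
        return self.table[hwirq]


class LegacyDomain:
    """Toy legacy map: IRQ number = hwirq + fixed offset, for every hwirq."""

    def __init__(self, size, first_irq, first_hwirq=0):
        self.size = size
        self.first_irq = first_irq
        self.first_hwirq = first_hwirq

    def find_mapping(self, hwirq):
        if not 0 <= hwirq - self.first_hwirq < self.size:
            raise IndexError("hwirq outside the legacy range")
        return hwirq - self.first_hwirq + self.first_irq

    def to_hwirq(self, irq):
        if not 0 <= irq - self.first_irq < self.size:
            raise IndexError("irq outside the legacy range")
        return irq - self.first_irq + self.first_hwirq
```

In the linear model an IRQ number exists only for hwirqs that were mapped, while the legacy model answers for the whole range up front, which corresponds to the trade-off described above: an irq_desc per hwirq even when it is unused.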
diff --git a/Documentation/Makefile b/Documentation/Makefile index 9b4bc5c76f33..30b656ece7aa 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile @@ -1,3 +1,3 @@ obj-m := DocBook/ accounting/ auxdisplay/ connector/ \ filesystems/ filesystems/configfs/ ia64/ laptops/ networking/ \ - pcmcia/ spi/ timers/ vm/ watchdog/src/ + pcmcia/ spi/ timers/ watchdog/src/ diff --git a/Documentation/RCU/RTFP.txt b/Documentation/RCU/RTFP.txt index c43460dade0f..7c1dfb19fc40 100644 --- a/Documentation/RCU/RTFP.txt +++ b/Documentation/RCU/RTFP.txt @@ -1,9 +1,10 @@ -Read the F-ing Papers! +Read the Fscking Papers! This document describes RCU-related publications, and is followed by the corresponding bibtex entries. A number of the publications may -be found at http://www.rdrop.com/users/paulmck/RCU/. +be found at http://www.rdrop.com/users/paulmck/RCU/. For others, browsers +and search engines will usually find what you are looking for. The first thing resembling RCU was published in 1980, when Kung and Lehman [Kung80] recommended use of a garbage collector to defer destruction @@ -160,7 +161,26 @@ which Mathieu Desnoyers is now maintaining [MathieuDesnoyers2009URCU] [MathieuDesnoyersPhD]. TINY_RCU [PaulEMcKenney2009BloatWatchRCU] made its appearance, as did expedited RCU [PaulEMcKenney2009expeditedRCU]. The problem of resizeable RCU-protected hash tables may now be on a path -to a solution [JoshTriplett2009RPHash]. +to a solution [JoshTriplett2009RPHash]. A few academic researchers are now +using RCU to solve their parallel problems [HariKannan2009DynamicAnalysisRCU]. 
+ +2010 produced a simpler preemptible-RCU implementation +based on TREE_RCU [PaulEMcKenney2010SimpleOptRCU], lockdep-RCU +[PaulEMcKenney2010LockdepRCU], another resizeable RCU-protected hash +table [HerbertXu2010RCUResizeHash] (this one consuming more memory, +but allowing arbitrary changes in hash function, as required for DoS +avoidance in the networking code), realization of the 2009 RCU-protected +hash table with atomic node move [JoshTriplett2010RPHash], an update on +the RCU API [PaulEMcKenney2010RCUAPI]. + +2011 marked the inclusion of Nick Piggin's fully lockless dentry search +[LinusTorvalds2011Linux2:6:38:rc1:NPigginVFS], an RCU-protected red-black +tree using software transactional memory to protect concurrent updates +(strange, but true!) [PhilHoward2011RCUTMRBTree], yet another variant of +RCU-protected resizeable hash tables [Triplett:2011:RPHash], the 3.0 RCU +trainwreck [PaulEMcKenney2011RCU3.0trainwreck], and Neil Brown's "Meet the +Lockers" LWN article [NeilBrown2011MeetTheLockers]. + Bibtex Entries @@ -173,6 +193,14 @@ Bibtex Entries ,volume="5" ,number="3" ,pages="354-382" +,note="Available: +\url{http://portal.acm.org/citation.cfm?id=320619&dl=GUIDE,} +[Viewed December 3, 2007]" +,annotation={ + Use garbage collector to clean up data after everyone is done with it. + . + Oldest use of something vaguely resembling RCU that I have found. +} } @techreport{Manber82 @@ -184,6 +212,31 @@ Bibtex Entries ,number="82-01-01" ,month="January" ,pages="28" +,annotation={ + . + Superseded by Manber84. + . + Describes concurrent AVL tree implementation. Uses a + garbage-collection mechanism to handle concurrent use and deletion + of nodes in the tree, but lacks the summary-of-execution-history + concept of read-copy locking. + . + Keeps full list of processes that were active when a given + node was to be deleted, and waits until all such processes have + -terminated- before allowing this node to be reused. 
This is + not described in great detail -- one could imagine using process + IDs for this if the ID space was large enough that overlapping + never occurred. + . + This restriction makes this algorithm unsuitable for use in + systems comprised of long-lived processes. It also produces + completely unacceptable overhead in systems with large numbers + of processes. Finally, it is specific to AVL trees. + . + Cites Kung80, so not an independent invention, but the first + RCU-like usage that does not rely on an automatic garbage + collector. +} } @article{Manber84 @@ -195,6 +248,74 @@ Bibtex Entries ,volume="9" ,number="3" ,pages="439-455" +,annotation={ + Describes concurrent AVL tree implementation. Uses a + garbage-collection mechanism to handle concurrent use and deletion + of nodes in the tree, but lacks the summary-of-execution-history + concept of read-copy locking. + . + Keeps full list of processes that were active when a given + node was to be deleted, and waits until all such processes have + -terminated- before allowing this node to be reused. This is + not described in great detail -- one could imagine using process + IDs for this if the ID space was large enough that overlapping + never occurred. + . + This restriction makes this algorithm unsuitable for use in + systems comprised of long-lived processes. It also produces + completely unacceptable overhead in systems with large numbers + of processes. Finally, it is specific to AVL trees. 
+} +} + +@Conference{RichardRashid87a +,Author="Richard Rashid and Avadis Tevanian and Michael Young and +David Golub and Robert Baron and David Black and William Bolosky and +Jonathan Chew" +,Title="Machine-Independent Virtual Memory Management for Paged +Uniprocessor and Multiprocessor Architectures" +,Booktitle="{2\textsuperscript{nd} Symposium on Architectural Support +for Programming Languages and Operating Systems}" +,Publisher="Association for Computing Machinery" +,Month="October" +,Year="1987" +,pages="31-39" +,Address="Palo Alto, CA" +,note="Available: +\url{http://www.cse.ucsc.edu/~randal/221/rashid-machvm.pdf} +[Viewed February 17, 2005]" +,annotation={ + Describes lazy TLB flush, where one waits for each CPU to pass + through a scheduling-clock interrupt before reusing a given range + of virtual address. Does not describe how one determines that + all CPUs have in fact taken such an interrupt, though there are + no shortage of straightforward methods for accomplishing this. + . + Note that it does not make sense to just wait a fixed amount of + time, since a given CPU might have interrupts disabled for an + extended amount of time. +} +} + +@article{BarbaraLiskov1988ArgusCACM +,author = {Barbara Liskov} +,title = {Distributed programming in {Argus}} +,journal = {Commun. ACM} +,volume = {31} +,number = {3} +,year = {1988} +,issn = {0001-0782} +,pages = {300--312} +,doi = {http://doi.acm.org/10.1145/42392.42399} +,publisher = {ACM} +,address = {New York, NY, USA} +,annotation= { + At the top of page 307: "Conflicts with deposits and withdrawals + are necessary if the reported total is to be up to date. They + could be avoided by having total return a sum that is slightly + out of date." Relies on semantics -- approximate numerical + values sometimes OK. +} } @techreport{Hennessy89 @@ -216,6 +337,13 @@ Bibtex Entries ,year="1990" ,number="CS-TR-2222.1" ,month="June" +,annotation={ + Concurrent access to skip lists. Has both weak and strong search. 
+ Uses concept of ``garbage queue'', but has no real way of cleaning + the garbage efficiently. + . + Appears to be an independent invention of an RCU-like mechanism. +} } @Book{Adams91 @@ -223,20 +351,15 @@ Bibtex Entries ,title="Concurrent Programming, Principles, and Practices" ,Publisher="Benjamin Cummins" ,Year="1991" +,annotation={ + Has a few paragraphs describing ``chaotic relaxation'', a + numerical analysis technique that allows multiprocessors to + avoid synchronization overhead by using possibly-stale data. + . + Seems like this is descended from yet another independent + invention of RCU-like function -- but this is restricted + in that reclamation is not necessary. } - -@phdthesis{HMassalinPhD -,author="H. Massalin" -,title="Synthesis: An Efficient Implementation of Fundamental Operating -System Services" -,school="Columbia University" -,address="New York, NY" -,year="1992" -,annotation=" - Mondo optimizing compiler. - Wait-free stuff. - Good advice: defer work to avoid synchronization. -" } @unpublished{Jacobson93 @@ -244,7 +367,13 @@ System Services" ,title="Avoid Read-Side Locking Via Delayed Free" ,year="1993" ,month="September" -,note="Verbal discussion" +,note="private communication" +,annotation={ + Use fixed time delay to approximate grace period. Very simple, + but subject to random memory corruption under heavy load. + . + Independent invention of RCU-like mechanism. +} } @Conference{AjuJohn95 @@ -256,6 +385,17 @@ System Services" ,Year="1995" ,pages="11-23" ,Address="New Orleans, LA" +,note="Available: +\url{https://www.usenix.org/publications/library/proceedings/neworl/full_papers/john.a} +[Viewed October 1, 2010]" +,annotation={ + Age vnodes out of the cache, and have a fixed time set by a kernel + parameter. Not clear that all races were in fact correctly handled. + Used a 20-minute time by default, which would most definitely not + be suitable during DoS attacks or virus scans. + . 
+ Apparently independent invention of RCU-like mechanism. +} } @conference{Pu95a, @@ -301,31 +441,47 @@ Utilizing Execution History and Thread Monitoring" ,institution="US Patent and Trademark Office" ,address="Washington, DC" ,year="1995" -,number="US Patent 5,442,758 (contributed under GPL)" +,number="US Patent 5,442,758" ,month="August" +,annotation={ + Describes the parallel RCU infrastructure. Includes NUMA aspect + (structure of bitmap can reflect bus structure of computer system). + . + Another independent invention of an RCU-like mechanism, but the + "real" RCU this time! +} } @techreport{Slingwine97 ,author="John D. Slingwine and Paul E. McKenney" -,title="Method for maintaining data coherency using thread -activity summaries in a multicomputer system" +,title="Method for Maintaining Data Coherency Using Thread Activity +Summaries in a Multicomputer System" ,institution="US Patent and Trademark Office" ,address="Washington, DC" ,year="1997" -,number="US Patent 5,608,893 (contributed under GPL)" +,number="US Patent 5,608,893" ,month="March" +,pages="19" +,annotation={ + Describes use of RCU to synchronize data between a pair of + SMP/NUMA computer systems. +} } @techreport{Slingwine98 ,author="John D. Slingwine and Paul E. McKenney" -,title="Apparatus and method for achieving reduced overhead -mutual exclusion and maintaining coherency in a multiprocessor -system utilizing execution history and thread monitoring" +,title="Apparatus and Method for Achieving Reduced Overhead Mutual +Exclusion and Maintaining Coherency in a Multiprocessor System +Utilizing Execution History and Thread Monitoring" ,institution="US Patent and Trademark Office" ,address="Washington, DC" ,year="1998" -,number="US Patent 5,727,209 (contributed under GPL)" +,number="US Patent 5,727,209" ,month="March" +,annotation={ + Describes doing an atomic update by copying the data item and + then substituting it into the data structure. 
+} } @Conference{McKenney98 @@ -337,6 +493,15 @@ Problems" ,Year="1998" ,pages="509-518" ,Address="Las Vegas, NV" +,note="Available: +\url{http://www.rdrop.com/users/paulmck/RCU/rclockpdcsproof.pdf} +[Viewed December 3, 2007]" +,annotation={ + Describes and analyzes RCU mechanism in DYNIX/ptx. Describes + application to linked list update and log-buffer flushing. + Defines 'quiescent state'. Includes both measured and analytic + evaluation. +} } @Conference{Gamsa99 @@ -349,18 +514,76 @@ Operating System Design and Implementation}" ,Year="1999" ,pages="87-100" ,Address="New Orleans, LA" +,note="Available: +\url{http://www.usenix.org/events/osdi99/full_papers/gamsa/gamsa.pdf} +[Viewed August 30, 2006]" +,annotation={ + Use of RCU-like facility in K42/Tornado. Another independent + invention of RCU. + See especially pages 7-9 (Section 5). +} +} + +@unpublished{RustyRussell2000a +,Author="Rusty Russell" +,Title="Re: modular net drivers" +,month="June" +,year="2000" +,day="23" +,note="Available: +\url{http://oss.sgi.com/projects/netdev/archive/2000-06/msg00250.html} +[Viewed April 10, 2006]" +,annotation={ + Proto-RCU proposal from Phil Rumpf and Rusty Russell. + Yet another independent invention of RCU. + Outline of algorithm to unload modules... + . + Appeared on net-dev mailing list. +} +} + +@unpublished{RustyRussell2000b +,Author="Rusty Russell" +,Title="Re: modular net drivers" +,month="June" +,year="2000" +,day="24" +,note="Available: +\url{http://oss.sgi.com/projects/netdev/archive/2000-06/msg00254.html} +[Viewed April 10, 2006]" +,annotation={ + Proto-RCU proposal from Phil Rumpf and Rusty Russell. + . + Appeared on net-dev mailing list. +} +} + +@unpublished{McKenney01b +,Author="Paul E. 
McKenney and Dipankar Sarma" +,Title="Read-Copy Update Mutual Exclusion in {Linux}" +,month="February" +,year="2001" +,note="Available: +\url{http://lse.sourceforge.net/locking/rcu/rcupdate_doc.html} +[Viewed October 18, 2004]" +,annotation={ + Prototypical Linux documentation for RCU. +} } @techreport{Slingwine01 ,author="John D. Slingwine and Paul E. McKenney" -,title="Apparatus and method for achieving reduced overhead -mutual exclusion and maintaining coherency in a multiprocessor -system utilizing execution history and thread monitoring" +,title="Apparatus and Method for Achieving Reduced Overhead Mutual +Exclusion and Maintaining Coherency in a Multiprocessor System +Utilizing Execution History and Thread Monitoring" ,institution="US Patent and Trademark Office" ,address="Washington, DC" ,year="2001" -,number="US Patent 5,219,690 (contributed under GPL)" +,number="US Patent 6,219,690" ,month="April" +,annotation={ + 'Change in mode' aspect of RCU. Can be thought of as a lazy barrier. +} } @Conference{McKenney01a @@ -372,14 +595,61 @@ Orran Krieger and Rusty Russell and Dipankar Sarma and Maneesh Soni" ,Year="2001" ,note="Available: \url{http://www.linuxsymposium.org/2001/abstracts/readcopy.php} -\url{http://www.rdrop.com/users/paulmck/rclock/rclock_OLS.2001.05.01c.pdf} +\url{http://www.rdrop.com/users/paulmck/RCU/rclock_OLS.2001.05.01c.pdf} [Viewed June 23, 2004]" -annotation=" -Described RCU, and presented some patches implementing and using it in -the Linux kernel. +,annotation={ + Described RCU, and presented some patches implementing and using + it in the Linux kernel. +} +} + +@unpublished{McKenney01f +,Author="Paul E. McKenney" +,Title="{RFC:} patch to allow lock-free traversal of lists with insertion" +,month="October" +,year="2001" +,note="Available: +\url{http://marc.theaimsgroup.com/?l=linux-kernel&m=100259266316456&w=2} +[Viewed June 23, 2004]" +,annotation=" + Memory-barrier and Alpha thread. 100 messages, not too bad... 
+" +} + +@unpublished{Spraul01 +,Author="Manfred Spraul" +,Title="Re: {RFC:} patch to allow lock-free traversal of lists with insertion" +,month="October" +,year="2001" +,note="Available: +\url{http://marc.theaimsgroup.com/?l=linux-kernel&m=100264675012867&w=2} +[Viewed June 23, 2004]" +,annotation=" + Suggested burying memory barriers in Linux's list-manipulation + primitives. " } +@unpublished{LinusTorvalds2001a +,Author="Linus Torvalds" +,Title="{Re:} {[Lse-tech]} {Re:} {RFC:} patch to allow lock-free traversal of lists with insertion" +,month="October" +,year="2001" +,note="Available: +\url{http://lkml.org/lkml/2001/10/13/105} +[Viewed August 21, 2004]" +} + +@unpublished{Blanchard02a +,Author="Anton Blanchard" +,Title="some RCU dcache and ratcache results" +,month="March" +,year="2002" +,note="Available: +\url{http://marc.theaimsgroup.com/?l=linux-kernel&m=101637107412972&w=2} +[Viewed October 18, 2004]" +} + @Conference{Linder02a ,Author="Hanna Linder and Dipankar Sarma and Maneesh Soni" ,Title="Scalability of the Directory Entry Cache" @@ -387,6 +657,10 @@ the Linux kernel. ,Month="June" ,Year="2002" ,pages="289-300" +,annotation=" + Measured scalability of Linux 2.4 kernel's directory-entry cache + (dcache), and measured some scalability enhancements. +" } @Conference{McKenney02a @@ -400,49 +674,76 @@ Andrea Arcangeli and Andi Kleen and Orran Krieger and Rusty Russell" ,note="Available: \url{http://www.linux.org.uk/~ajh/ols2002_proceedings.pdf.gz} [Viewed June 23, 2004]" +,annotation=" + Presented and compared a number of RCU implementations for the + Linux kernel. +" } -@conference{Michael02a -,author="Maged M. 
Michael" -,title="Safe Memory Reclamation for Dynamic Lock-Free Objects Using Atomic -Reads and Writes" -,Year="2002" -,Month="August" -,booktitle="{Proceedings of the 21\textsuperscript{st} Annual ACM -Symposium on Principles of Distributed Computing}" -,pages="21-30" +@unpublished{Sarma02a +,Author="Dipankar Sarma" +,Title="specweb99: dcache scalability results" +,month="July" +,year="2002" +,note="Available: +\url{http://marc.theaimsgroup.com/?l=linux-kernel&m=102645767914212&w=2} +[Viewed June 23, 2004]" ,annotation=" - Each thread keeps an array of pointers to items that it is - currently referencing. Sort of an inside-out garbage collection - mechanism, but one that requires the accessing code to explicitly - state its needs. Also requires read-side memory barriers on - most architectures. + Compare fastwalk and RCU for dcache. RCU won. " } -@conference{Michael02b -,author="Maged M. Michael" -,title="High Performance Dynamic Lock-Free Hash Tables and List-Based Sets" -,Year="2002" -,Month="August" -,booktitle="{Proceedings of the 14\textsuperscript{th} Annual ACM -Symposium on Parallel -Algorithms and Architecture}" -,pages="73-82" +@unpublished{Barbieri02 +,Author="Luca Barbieri" +,Title="Re: {[PATCH]} Initial support for struct {vfs\_cred}" +,month="August" +,year="2002" +,note="Available: +\url{http://marc.theaimsgroup.com/?l=linux-kernel&m=103082050621241&w=2} +[Viewed: June 23, 2004]" ,annotation=" - Like the title says... + Suggested RCU for vfs\_shared\_cred. 
" } -@InProceedings{HerlihyLM02 -,author={Maurice Herlihy and Victor Luchangco and Mark Moir} -,title="The Repeat Offender Problem: A Mechanism for Supporting Dynamic-Sized, -Lock-Free Data Structures" -,booktitle={Proceedings of 16\textsuperscript{th} International -Symposium on Distributed Computing} -,year=2002 +@unpublished{Dickins02a +,author="Hugh Dickins" +,title="Use RCU for System-V IPC" +,year="2002" +,month="October" +,note="private communication" +} + +@unpublished{Sarma02b +,Author="Dipankar Sarma" +,Title="Some dcache\_rcu benchmark numbers" ,month="October" -,pages="339-353" +,year="2002" +,note="Available: +\url{http://marc.theaimsgroup.com/?l=linux-kernel&m=103462075416638&w=2} +[Viewed June 23, 2004]" +,annotation=" + Performance of dcache RCU on kernbench for 16x NUMA-Q and 1x, + 2x, and 4x systems. RCU does no harm, and helps on 16x. +" +} + +@unpublished{LinusTorvalds2003a +,Author="Linus Torvalds" +,Title="Re: {[PATCH]} small fixes in brlock.h" +,month="March" +,year="2003" +,note="Available: +\url{http://lkml.org/lkml/2003/3/9/205} +[Viewed March 13, 2006]" +,annotation=" + Linus suggests replacing brlock with RCU and/or seqlocks: + . + 'It's entirely possible that the current user could be replaced + by RCU and/or seqlocks, and we could get rid of brlocks entirely.' + . + Steve Hemminger responds by replacing them with RCU. +" } @article{Appavoo03a @@ -457,6 +758,20 @@ B. Rosenburg and M. Stumm and J. Xenidis" ,volume="42" ,number="1" ,pages="60-76" +,annotation=" + Use of RCU to enable hot-swapping for autonomic behavior in K42. +" +} + +@unpublished{Seigh03 +,author="Joseph W. {Seigh II}" +,title="Read Copy Update" +,Year="2003" +,Month="March" +,note="email correspondence" +,annotation=" + Described the relationship of the VM/XA passive serialization to RCU. 
+" } @Conference{Arcangeli03 @@ -470,6 +785,27 @@ Dipankar Sarma" ,year="2003" ,month="June" ,pages="297-310" +,note="Available: +\url{http://www.rdrop.com/users/paulmck/RCU/rcu.FREENIX.2003.06.14.pdf} +[Viewed November 21, 2007]" +,annotation=" + Compared updated RCU implementations for the Linux kernel, and + described System V IPC use of RCU, including order-of-magnitude + performance improvements. +" +} + +@Conference{Soules03a +,Author="Craig A. N. Soules and Jonathan Appavoo and Kevin Hui and +Dilma {Da Silva} and Gregory R. Ganger and Orran Krieger and +Michael Stumm and Robert W. Wisniewski and Marc Auslander and +Michal Ostrowski and Bryan Rosenburg and Jimi Xenidis" +,Title="System Support for Online Reconfiguration" +,Booktitle="Proceedings of the 2003 USENIX Annual Technical Conference" +,Publisher="USENIX Association" +,year="2003" +,month="June" +,pages="141-154" } @article{McKenney03a @@ -481,6 +817,22 @@ Dipankar Sarma" ,volume="1" ,number="114" ,pages="18-26" +,note="Available: +\url{http://www.linuxjournal.com/article/6993} +[Viewed November 14, 2007]" +,annotation=" + Reader-friendly intro to RCU, with the infamous old-man-and-brat + cartoon. +" +} + +@unpublished{Sarma03a +,Author="Dipankar Sarma" +,Title="RCU low latency patches" +,month="December" +,year="2003" +,note="Message ID: 20031222180114.GA2248@in.ibm.com" +,annotation="dipankar/ct.2004.03.27/RCUll.2003.12.22.patch" } @techreport{Friedberg03a @@ -489,9 +841,14 @@ Dipankar Sarma" ,institution="US Patent and Trademark Office" ,address="Washington, DC" ,year="2003" -,number="US Patent 6,662,184 (contributed under GPL)" +,number="US Patent 6,662,184" ,month="December" ,pages="112" +,annotation=" + Applies RCU to a wildcard-search Patricia tree in order to permit + synchronization-free lookup. RCU is used to retain removed nodes + for a grace period before freeing them. 
+" } @article{McKenney04a @@ -503,6 +860,12 @@ Dipankar Sarma" ,volume="1" ,number="118" ,pages="38-46" +,note="Available: +\url{http://www.linuxjournal.com/node/7124} +[Viewed December 26, 2010]" +,annotation=" + Reader friendly intro to dcache and RCU. +" } @Conference{McKenney04b @@ -514,8 +877,83 @@ Dipankar Sarma" ,Address="Adelaide, Australia" ,note="Available: \url{http://www.linux.org.au/conf/2004/abstracts.html#90} -\url{http://www.rdrop.com/users/paulmck/rclock/lockperf.2004.01.17a.pdf} +\url{http://www.rdrop.com/users/paulmck/RCU/lockperf.2004.01.17a.pdf} [Viewed June 23, 2004]" +,annotation=" + Compares performance of RCU to that of other locking primitives + over a number of CPUs (x86, Opteron, Itanium, and PPC). +" +} + +@unpublished{Sarma04a +,Author="Dipankar Sarma" +,Title="{[PATCH]} {RCU} for low latency (experimental)" +,month="March" +,year="2004" +,note="\url{http://marc.theaimsgroup.com/?l=linux-kernel&m=108003746402892&w=2}" +,annotation="Head of thread: dipankar/2004.03.23/rcu-low-lat.1.patch" +} + +@unpublished{Sarma04b +,Author="Dipankar Sarma" +,Title="Re: {[PATCH]} {RCU} for low latency (experimental)" +,month="March" +,year="2004" +,note="\url{http://marc.theaimsgroup.com/?l=linux-kernel&m=108016474829546&w=2}" +,annotation="dipankar/rcuth.2004.03.24/rcu-throttle.patch" +} + +@unpublished{Spraul04a +,Author="Manfred Spraul" +,Title="[RFC] 0/5 rcu lock update" +,month="May" +,year="2004" +,note="Available: +\url{http://marc.theaimsgroup.com/?l=linux-kernel&m=108546407726602&w=2} +[Viewed June 23, 2004]" +,annotation=" + Hierarchical-bitmap patch for RCU infrastructure. 
+" +} + +@unpublished{Steiner04a +,Author="Jack Steiner" +,Title="Re: [Lse-tech] [RFC, PATCH] 1/5 rcu lock update: +Add per-cpu batch counter" +,month="May" +,year="2004" +,note="Available: +\url{http://marc.theaimsgroup.com/?l=linux-kernel&m=108551764515332&w=2} +[Viewed June 23, 2004]" +,annotation={ + RCU runs reasonably on a 512-CPU SGI using Manfred Spraul's patches, + which may be found at: + https://lkml.org/lkml/2004/5/20/49 (split vars into cachelines) + https://lkml.org/lkml/2004/5/22/114 (cpu_quiet() patch) + https://lkml.org/lkml/2004/5/25/24 (0/5) + https://lkml.org/lkml/2004/5/25/23 (1/5) + https://lkml.org/lkml/2004/5/25/265 (works for Jack) + https://lkml.org/lkml/2004/5/25/20 (2/5) + https://lkml.org/lkml/2004/5/25/22 (3/5) + https://lkml.org/lkml/2004/5/25/19 (4/5) + https://lkml.org/lkml/2004/5/25/21 (5/5) +} +} + +@Conference{Sarma04c +,Author="Dipankar Sarma and Paul E. McKenney" +,Title="Making {RCU} Safe for Deep Sub-Millisecond Response +Realtime Applications" +,Booktitle="Proceedings of the 2004 USENIX Annual Technical Conference +(FREENIX Track)" +,Publisher="USENIX Association" +,year="2004" +,month="June" +,pages="182-191" +,annotation=" + Describes and compares a number of modifications to the Linux RCU + implementation that make it friendly to realtime applications. +" } @phdthesis{PaulEdwardMcKenneyPhD @@ -529,17 +967,118 @@ Oregon Health and Sciences University" ,note="Available: \url{http://www.rdrop.com/users/paulmck/RCU/RCUdissertation.2004.07.14e1.pdf} [Viewed October 15, 2004]" +,annotation=" + Describes RCU implementations and presents design patterns + corresponding to common uses of RCU in several operating-system + kernels. +" } -@Conference{Sarma04c -,Author="Dipankar Sarma and Paul E. 
McKenney" -,Title="Making RCU Safe for Deep Sub-Millisecond Response Realtime Applications" -,Booktitle="Proceedings of the 2004 USENIX Annual Technical Conference -(FREENIX Track)" -,Publisher="USENIX Association" +@unpublished{PaulEMcKenney2004rcu:dereference +,Author="Dipankar Sarma" +,Title="{Re: RCU : Abstracted RCU dereferencing [5/5]}" +,month="August" ,year="2004" -,month="June" -,pages="182-191" +,note="Available: +\url{http://lkml.org/lkml/2004/8/6/237} +[Viewed June 8, 2010]" +,annotation=" + Introduce rcu_dereference(). +" +} + +@unpublished{JimHouston04a +,Author="Jim Houston" +,Title="{[RFC\&PATCH] Alternative {RCU} implementation}" +,month="August" +,year="2004" +,note="Available: +\url{http://lkml.org/lkml/2004/8/30/87} +[Viewed February 17, 2005]" +,annotation=" + Uses active code in rcu_read_lock() and rcu_read_unlock() to + make RCU happen, allowing RCU to function on CPUs that do not + receive a scheduling-clock interrupt. +" +} + +@unpublished{TomHart04a +,Author="Thomas E. Hart" +,Title="Master's Thesis: Applying Lock-free Techniques to the {Linux} Kernel" +,month="October" +,year="2004" +,note="Available: +\url{http://www.cs.toronto.edu/~tomhart/masters_thesis.html} +[Viewed October 15, 2004]" +,annotation=" + Proposes comparing RCU to lock-free methods for the Linux kernel. +" +} + +@unpublished{Vaddagiri04a +,Author="Srivatsa Vaddagiri" +,Title="Subject: [RFC] Use RCU for tcp\_ehash lookup" +,month="October" +,year="2004" +,note="Available: +\url{http://marc.theaimsgroup.com/?t=109395731700004&r=1&w=2} +[Viewed October 18, 2004]" +,annotation=" + Srivatsa's RCU patch for tcp_ehash lookup. +" +} + +@unpublished{Thirumalai04a +,Author="Ravikiran Thirumalai" +,Title="Subject: [patchset] Lockfree fd lookup 0 of 5" +,month="October" +,year="2004" +,note="Available: +\url{http://marc.theaimsgroup.com/?t=109144217400003&r=1&w=2} +[Viewed October 18, 2004]" +,annotation=" + Ravikiran's lockfree FD patch. 
+" +} + +@unpublished{Thirumalai04b +,Author="Ravikiran Thirumalai" +,Title="Subject: Re: [patchset] Lockfree fd lookup 0 of 5" +,month="October" +,year="2004" +,note="Available: +\url{http://marc.theaimsgroup.com/?l=linux-kernel&m=109152521410459&w=2} +[Viewed October 18, 2004]" +,annotation=" + Ravikiran's lockfree FD patch. +" +} + +@unpublished{PaulEMcKenney2004rcu:assign:pointer +,Author="Paul E. McKenney" +,Title="{[PATCH 1/3] RCU: \url{rcu_assign_pointer()} removal of memory barriers}" +,month="October" +,year="2004" +,note="Available: +\url{http://lkml.org/lkml/2004/10/23/241} +[Viewed June 8, 2010]" +,annotation=" + Introduce rcu_assign_pointer(). +" +} + +@unpublished{JamesMorris04a +,Author="James Morris" +,Title="{[PATCH 2/3] SELinux} scalability - convert {AVC} to {RCU}" +,day="15" +,month="November" +,year="2004" +,note="Available: +\url{http://marc.theaimsgroup.com/?l=linux-kernel&m=110054979416004&w=2} +[Viewed December 10, 2004]" +,annotation=" + James Morris posts Kaigai Kohei's patch to LKML. +" } @unpublished{JamesMorris04b @@ -550,6 +1089,85 @@ Oregon Health and Sciences University" ,note="Available: \url{http://www.livejournal.com/users/james_morris/2153.html} [Viewed December 10, 2004]" +,annotation=" + RCU helps SELinux performance. ;-) Made LWN. +" +} + +@unpublished{PaulMcKenney2005RCUSemantics +,Author="Paul E. McKenney and Jonathan Walpole" +,Title="{RCU} Semantics: A First Attempt" +,month="January" +,year="2005" +,day="30" +,note="Available: +\url{http://www.rdrop.com/users/paulmck/RCU/rcu-semantics.2005.01.30a.pdf} +[Viewed December 6, 2009]" +,annotation=" + Early derivation of RCU semantics. +" +} + +@unpublished{PaulMcKenney2005e +,Author="Paul E. 
McKenney" +,Title="Real-Time Preemption and {RCU}" +,month="March" +,year="2005" +,day="17" +,note="Available: +\url{http://lkml.org/lkml/2005/3/17/199} +[Viewed September 5, 2005]" +,annotation=" + First posting showing how RCU can be safely adapted for + preemptable RCU read side critical sections. +" +} + +@unpublished{EsbenNeilsen2005a +,Author="Esben Neilsen" +,Title="Re: Real-Time Preemption and {RCU}" +,month="March" +,year="2005" +,day="18" +,note="Available: +\url{http://lkml.org/lkml/2005/3/18/122} +[Viewed March 30, 2006]" +,annotation=" + Esben Neilsen suggests read-side suppression of grace-period + processing for crude-but-workable realtime RCU. The downside + is indefinite grace periods...But this is OK for experimentation + and testing. +" +} + +@unpublished{TomHart05a +,Author="Thomas E. Hart and Paul E. McKenney and Angela Demke Brown" +,Title="Efficient Memory Reclamation is Necessary for Fast Lock-Free +Data Structures" +,month="March" +,year="2005" +,note="Available: +\url{ftp://ftp.cs.toronto.edu/csrg-technical-reports/515/} +[Viewed March 4, 2005]" +,annotation=" + Comparison of RCU, QBSR, and EBSR. RCU wins for read-mostly + workloads. ;-) +" +} + +@unpublished{JonCorbet2005DeprecateSyncKernel +,Author="Jonathan Corbet" +,Title="API change: synchronize_kernel() deprecated" +,month="May" +,day="3" +,year="2005" +,note="Available: +\url{http://lwn.net/Articles/134484/} +[Viewed May 3, 2005]" +,annotation=" + Jon Corbet describes deprecation of synchronize_kernel() + in favor of synchronize_rcu() and synchronize_sched(). +" } @unpublished{PaulMcKenney05a @@ -568,7 +1186,7 @@ Oregon Health and Sciences University" @conference{PaulMcKenney05b ,Author="Paul E. 
McKenney and Dipankar Sarma" -,Title="Towards Hard Realtime Response from the Linux Kernel on SMP Hardware" +,Title="Towards Hard Realtime Response from the {Linux} Kernel on {SMP} Hardware" ,Booktitle="linux.conf.au 2005" ,month="April" ,year="2005" @@ -578,6 +1196,103 @@ Oregon Health and Sciences University" [Viewed May 13, 2005]" ,annotation=" Realtime turns into making RCU yet more realtime friendly. + http://lca2005.linux.org.au/Papers/Paul%20McKenney/Towards%20Hard%20Realtime%20Response%20from%20the%20Linux%20Kernel/LKS.2005.04.22a.pdf +" +} + +@unpublished{PaulEMcKenneyHomePage +,Author="Paul E. McKenney" +,Title="{Paul} {E.} {McKenney}" +,month="May" +,year="2005" +,note="Available: +\url{http://www.rdrop.com/users/paulmck/} +[Viewed May 25, 2005]" +,annotation=" + Paul McKenney's home page. +" +} + +@unpublished{PaulEMcKenneyRCUPage +,Author="Paul E. McKenney" +,Title="Read-Copy Update {(RCU)}" +,month="May" +,year="2005" +,note="Available: +\url{http://www.rdrop.com/users/paulmck/RCU} +[Viewed May 25, 2005]" +,annotation=" + Paul McKenney's RCU page. +" +} + +@unpublished{JosephSeigh2005a +,Author="Joseph Seigh" +,Title="{RCU}+{SMR} (hazard pointers)" +,month="July" +,year="2005" +,note="Personal communication" +,annotation=" + Joe Seigh announcing his atomic-ptr-plus project. + http://sourceforge.net/projects/atomic-ptr-plus/ +" +} + +@unpublished{JosephSeigh2005b +,Author="Joseph Seigh" +,Title="Lock-free synchronization primitives" +,month="July" +,day="6" +,year="2005" +,note="Available: +\url{http://sourceforge.net/projects/atomic-ptr-plus/} +[Viewed August 8, 2005]" +,annotation=" + Joe Seigh's atomic-ptr-plus project. 
+" +} + +@unpublished{PaulMcKenney2005c +,Author="Paul E. McKenney" +,Title="{[RFC,PATCH] RCU} and {CONFIG\_PREEMPT\_RT} sane patch" +,month="August" +,day="1" +,year="2005" +,note="Available: +\url{http://lkml.org/lkml/2005/8/1/155} +[Viewed March 14, 2006]" +,annotation=" + First operating counter-based realtime RCU patch posted to LKML. +" +} + +@unpublished{PaulMcKenney2005d +,Author="Paul E. McKenney" +,Title="Re: [Fwd: Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01]" +,month="August" +,day="8" +,year="2005" +,note="Available: +\url{http://lkml.org/lkml/2005/8/8/108} +[Viewed March 14, 2006]" +,annotation=" + First operating counter-based realtime RCU patch posted to LKML, + but fixed so that various unusual combinations of configuration + parameters all function properly. +" +} + +@unpublished{PaulMcKenney2005rcutorture +,Author="Paul E. McKenney" +,Title="{[PATCH]} {RCU} torture testing" +,month="October" +,day="1" +,year="2005" +,note="Available: +\url{http://lkml.org/lkml/2005/10/1/70} +[Viewed March 14, 2006]" +,annotation=" + First rcutorture patch. " } @@ -591,22 +1306,39 @@ Distributed Processing Symposium" ,year="2006" ,day="25-29" ,address="Rhodes, Greece" +,note="Available: +\url{http://www.rdrop.com/users/paulmck/RCU/hart_ipdps06.pdf} +[Viewed April 28, 2008]" +,annotation=" + Compares QSBR, HPBR, EBR, and lock-free reference counting. + http://www.cs.toronto.edu/~tomhart/perflab/ipdps06.tgz +" +} + +@unpublished{NickPiggin2006radixtree +,Author="Nick Piggin" +,Title="[patch 3/3] radix-tree: {RCU} lockless readside" +,month="June" +,day="20" +,year="2006" +,note="Available: +\url{http://lkml.org/lkml/2006/6/20/238} +[Viewed March 25, 2008]" ,annotation=" - Compares QSBR (AKA "classic RCU"), HPBR, EBR, and lock-free - reference counting. + RCU-protected radix tree. " } @Conference{PaulEMcKenney2006b ,Author="Paul E. 
McKenney and Dipankar Sarma and Ingo Molnar and Suparna Bhattacharya" -,Title="Extending RCU for Realtime and Embedded Workloads" +,Title="Extending {RCU} for Realtime and Embedded Workloads" ,Booktitle="{Ottawa Linux Symposium}" ,Month="July" ,Year="2006" ,pages="v2 123-138" ,note="Available: -\url{http://www.linuxsymposium.org/2006/index_2006.php} +\url{http://www.linuxsymposium.org/2006/view_abstract.php?content_key=184} \url{http://www.rdrop.com/users/paulmck/RCU/OLSrtRCU.2006.08.11a.pdf} [Viewed January 1, 2007]" ,annotation=" @@ -614,6 +1346,37 @@ Suparna Bhattacharya" " } +@unpublished{WikipediaRCU +,Author="Paul E. McKenney and Chris Purcell and Algae and Ben Schumin and +Gaius Cornelius and Qwertyus and Neil Conway and Sbw and Blainster and +Canis Rufus and Zoicon5 and Anome and Hal Eisen" +,Title="Read-Copy Update" +,month="July" +,day="8" +,year="2006" +,note="Available: +\url{http://en.wikipedia.org/wiki/Read-copy-update} +[Viewed August 21, 2006]" +,annotation=" + Wikipedia RCU page as of July 8 2006. +" +} + +@Conference{NickPiggin2006LocklessPageCache +,Author="Nick Piggin" +,Title="A Lockless Pagecache in Linux---Introduction, Progress, Performance" +,Booktitle="{Ottawa Linux Symposium}" +,Month="July" +,Year="2006" +,pages="v2 249-254" +,note="Available: +\url{http://www.linuxsymposium.org/2006/view_abstract.php?content_key=184} +[Viewed January 11, 2009]" +,annotation=" + Uses RCU-protected radix tree for a lockless page cache. +" +} + @unpublished{PaulEMcKenney2006c ,Author="Paul E. McKenney" ,Title="Sleepable {RCU}" @@ -637,29 +1400,301 @@ Revised: ,day="18" ,year="2006" ,note="Available: -\url{http://www.nada.kth.se/~snilsson/public/papers/trash/trash.pdf} -[Viewed February 24, 2007]" +\url{http://www.nada.kth.se/~snilsson/publications/TRASH/trash.pdf} +[Viewed March 4, 2011]" ,annotation=" RCU-protected dynamic trie-hash combination. " } -@unpublished{ThomasEHart2007a -,Author="Thomas E. Hart and Paul E. 
McKenney and Angela Demke Brown and Jonathan Walpole" -,Title="Performance of memory reclamation for lockless synchronization" -,journal="J. Parallel Distrib. Comput." +@unpublished{ChristophHellwig2006RCU2SRCU +,Author="Christoph Hellwig" +,Title="Re: {[-mm PATCH 1/4]} {RCU}: split classic rcu" +,month="September" +,day="28" +,year="2006" +,note="Available: +\url{http://lkml.org/lkml/2006/9/28/160} +[Viewed March 27, 2008]" +} + +@unpublished{PaulEMcKenneyRCUusagePage +,Author="Paul E. McKenney" +,Title="{RCU} {Linux} Usage" +,month="October" +,year="2006" +,note="Available: +\url{http://www.rdrop.com/users/paulmck/RCU/linuxusage.html} +[Viewed January 14, 2007]" +,annotation=" + Paul McKenney's RCU page showing graphs plotting Linux-kernel + usage of RCU. +" +} + +@unpublished{PaulEMcKenneyRCUusageRawDataPage +,Author="Paul E. McKenney" +,Title="Read-Copy Update {(RCU)} Usage in {Linux} Kernel" +,month="October" +,year="2006" +,note="Available: +\url{http://www.rdrop.com/users/paulmck/RCU/linuxusage/rculocktab.html} +[Viewed January 14, 2007]" +,annotation=" + Paul McKenney's RCU page showing Linux usage of RCU in tabular + form, with links to corresponding cscope databases. +" +} + +@unpublished{GauthamShenoy2006RCUrwlock +,Author="Gautham R. Shenoy" +,Title="[PATCH 4/5] lock\_cpu\_hotplug: Redesign - Lightweight implementation of lock\_cpu\_hotplug" +,month="October" +,year="2006" +,day=26 +,note="Available: +\url{http://lkml.org/lkml/2006/10/26/73} +[Viewed January 26, 2009]" +,annotation=" + RCU-based reader-writer lock that allows readers to proceed with + no memory barriers or atomic instructions in the absence of writers. + If writers do show up, readers must of course wait as required by + the semantics of reader-writer locking. This is a recursive + lock. 
+" +} + +@unpublished{JensAxboe2006SlowSRCU +,Author="Jens Axboe" +,Title="Re: [patch] cpufreq: mark \url{cpufreq_tsc()} as +\url{core_initcall_sync}" +,month="November" +,year="2006" +,day=17 +,note="Available: +\url{http://lkml.org/lkml/2006/11/17/56} +[Viewed May 28, 2007]" +,annotation=" + SRCU's grace periods are too slow for Jens, even after a + factor-of-three speedup. + Sped-up version of SRCU at http://lkml.org/lkml/2006/11/17/359. +" +} + +@unpublished{OlegNesterov2006QRCU +,Author="Oleg Nesterov" +,Title="Re: [patch] cpufreq: mark {\tt cpufreq\_tsc()} as +{\tt core\_initcall\_sync}" +,month="November" +,year="2006" +,day=19 +,note="Available: +\url{http://lkml.org/lkml/2006/11/19/69} +[Viewed May 28, 2007]" +,annotation=" + First cut of QRCU. Expanded/corrected versions followed. + Used to be OlegNesterov2007QRCU, now time-corrected. +" +} + +@unpublished{OlegNesterov2006aQRCU +,Author="Oleg Nesterov" +,Title="Re: [RFC, PATCH 1/2] qrcu: {"quick"} srcu implementation" +,month="November" +,year="2006" +,day=30 +,note="Available: +\url{http://lkml.org/lkml/2006/11/29/330} +[Viewed November 26, 2008]" +,annotation=" + Expanded/corrected version of QRCU. + Used to be OlegNesterov2007aQRCU, now time-corrected. +" +} + +@unpublished{EvgeniyPolyakov2006RCUslowdown +,Author="Evgeniy Polyakov" +,Title="Badness in postponing work" +,month="December" +,year="2006" +,day=05 +,note="Available: +\url{http://www.ioremap.net/node/41} +[Viewed October 28, 2008]" +,annotation=" + Using RCU as a pure delay leads to a 2.5x slowdown in skbs in + the Linux kernel. 
+" +} + +@inproceedings{ChrisMatthews2006ClusteredObjectsRCU +,author = {Matthews, Chris and Coady, Yvonne and Appavoo, Jonathan} +,title = {Portability events: a programming model for scalable system infrastructures} +,booktitle = {PLOS '06: Proceedings of the 3rd workshop on Programming languages and operating systems} +,year = {2006} +,isbn = {1-59593-577-0} +,pages = {11} +,location = {San Jose, California} +,doi = {http://doi.acm.org/10.1145/1215995.1216006} +,publisher = {ACM} +,address = {New York, NY, USA} +,annotation={ + Uses K42's RCU-like functionality to manage clustered-object + lifetimes. +}} + +@article{DilmaDaSilva2006K42 +,author = {Silva, Dilma Da and Krieger, Orran and Wisniewski, Robert W. and Waterland, Amos and Tam, David and Baumann, Andrew} +,title = {K42: an infrastructure for operating system research} +,journal = {SIGOPS Oper. Syst. Rev.} +,volume = {40} +,number = {2} +,year = {2006} +,issn = {0163-5980} +,pages = {34--42} +,doi = {http://doi.acm.org/10.1145/1131322.1131333} +,publisher = {ACM} +,address = {New York, NY, USA} +,annotation={ + Describes relationship of K42 generations to RCU. +}} + +# CoreyMinyard2007list_splice_rcu +@unpublished{CoreyMinyard2007list:splice:rcu +,Author="Corey Minyard and Paul E. McKenney" +,Title="{[PATCH]} add an {RCU} version of list splicing" +,month="January" +,year="2007" +,day=3 +,note="Available: +\url{http://lkml.org/lkml/2007/1/3/112} +[Viewed May 28, 2007]" +,annotation=" + Patch for list_splice_rcu(). +" +} + +@unpublished{PaulEMcKenney2007rcubarrier +,Author="Paul E. McKenney" +,Title="{RCU} and Unloadable Modules" +,month="January" +,day="14" +,year="2007" +,note="Available: +\url{http://lwn.net/Articles/217484/} +[Viewed November 22, 2007]" +,annotation=" + LWN article introducing the rcu_barrier() primitive. 
+" +} + +@unpublished{PeterZijlstra2007SyncBarrier +,Author="Peter Zijlstra and Ingo Molnar" +,Title="{[PATCH 3/7]} barrier: a scalable synchonisation barrier" +,month="January" +,year="2007" +,day=28 +,note="Available: +\url{http://lkml.org/lkml/2007/1/28/34} +[Viewed March 27, 2008]" +,annotation=" + RCU-like implementation for frequent updaters and rare readers(!). + Subsumed into QRCU. Maybe... +" +} + +@unpublished{PaulEMcKenney2007BoostRCU +,Author="Paul E. McKenney" +,Title="Priority-Boosting {RCU} Read-Side Critical Sections" +,month="February" +,day="5" +,year="2007" +,note="Available: +\url{http://lwn.net/Articles/220677/} +Revised: +\url{http://www.rdrop.com/users/paulmck/RCU/RCUbooststate.2007.04.16a.pdf} +[Viewed September 7, 2007]" +,annotation=" + LWN article introducing RCU priority boosting. +" +} + +@unpublished{PaulMcKenney2007QRCUpatch +,Author="Paul E. McKenney" +,Title="{[PATCH]} {QRCU} with lockless fastpath" +,month="February" +,year="2007" +,day=24 +,note="Available: +\url{http://lkml.org/lkml/2007/2/25/18} +[Viewed March 27, 2008]" +,annotation=" + Patch for QRCU supplying lock-free fast path. +" +} + +@article{JonathanAppavoo2007K42RCU +,author = {Appavoo, Jonathan and Silva, Dilma Da and Krieger, Orran and Auslander, Marc and Ostrowski, Michal and Rosenburg, Bryan and Waterland, Amos and Wisniewski, Robert W. and Xenidis, Jimi and Stumm, Michael and Soares, Livio} +,title = {Experience distributing objects in an SMMP OS} +,journal = {ACM Trans. Comput. Syst.} +,volume = {25} +,number = {3} +,year = {2007} +,issn = {0734-2071} +,pages = {6/1--6/52} +,doi = {http://doi.acm.org/10.1145/1275517.1275518} +,publisher = {ACM} +,address = {New York, NY, USA} +,annotation={ + Role of RCU in K42. 
+}} + +@conference{RobertOlsson2007Trash +,Author="Robert Olsson and Stefan Nilsson" +,Title="{TRASH}: A dynamic {LC}-trie and hash data structure" +,booktitle="Workshop on High Performance Switching and Routing (HPSR'07)" +,month="May" +,year="2007" +,note="Available: +\url{http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4281239} +[Viewed October 1, 2010]" +,annotation=" + RCU-protected dynamic trie-hash combination. +" +} + +@conference{PeterZijlstra2007ConcurrentPagecacheRCU +,Author="Peter Zijlstra" +,Title="Concurrent Pagecache" +,Booktitle="Linux Symposium" +,month="June" ,year="2007" -,note="To appear in J. Parallel Distrib. Comput. - \url{doi=10.1016/j.jpdc.2007.04.010}" +,address="Ottawa, Canada" +,note="Available: +\url{http://ols.108.redhat.com/2007/Reprints/zijlstra-Reprint.pdf} +[Viewed April 14, 2008]" +,annotation=" + Page-cache modifications permitting RCU readers and concurrent + updates. +" +} + +@unpublished{PaulEMcKenney2007whatisRCU +,Author="Paul E. McKenney" +,Title="What is {RCU}?" +,year="2007" +,month="07" +,note="Available: +\url{http://www.rdrop.com/users/paulmck/RCU/whatisRCU.html} +[Viewed July 6, 2007]" ,annotation={ - Compares QSBR (AKA "classic RCU"), HPBR, EBR, and lock-free - reference counting. Journal version of ThomasEHart2006a. + Describes RCU in Linux kernel. } } @unpublished{PaulEMcKenney2007QRCUspin ,Author="Paul E. McKenney" -,Title="Using Promela and Spin to verify parallel algorithms" +,Title="Using {Promela} and {Spin} to verify parallel algorithms" ,month="August" ,day="1" ,year="2007" @@ -669,6 +1704,50 @@ Revised: ,annotation=" LWN article describing Promela and spin, and also using Oleg Nesterov's QRCU as an example (with Paul McKenney's fastpath). + Merged patch at: http://lkml.org/lkml/2007/2/25/18 +" +} + +@unpublished{PaulEMcKenney2007WG21DDOatomics +,Author="Paul E. McKenney and Hans-J. 
Boehm and Lawrence Crowl" +,Title="C++ Data-Dependency Ordering: Atomics and Memory Model" +,month="August" +,day="3" +,year="2007" +,note="Preprint: +\url{http://open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2664.htm} +[Viewed December 7, 2009]" +,annotation=" + RCU for C++, parts 1 and 2. +" +} + +@unpublished{PaulEMcKenney2007WG21DDOannotation +,Author="Paul E. McKenney and Lawrence Crowl" +,Title="C++ Data-Dependency Ordering: Function Annotation" +,month="September" +,day="18" +,year="2008" +,note="Preprint: +\url{http://open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2782.htm} +[Viewed December 7, 2009]" +,annotation=" + RCU for C++, part 2, updated many times. +" +} + +@unpublished{PaulEMcKenney2007PreemptibleRCUPatch +,Author="Paul E. McKenney" +,Title="[PATCH RFC 0/9] {RCU}: Preemptible {RCU}" +,month="September" +,day="10" +,year="2007" +,note="Available: +\url{http://lkml.org/lkml/2007/9/10/213} +[Viewed October 25, 2007]" +,annotation=" + Final patch for preemptable RCU to -rt. (Later patches were + to mainline, eventually incorporated.) " } @@ -686,10 +1765,46 @@ Revised: " } +@article{ThomasEHart2007a +,Author="Thomas E. Hart and Paul E. McKenney and Angela Demke Brown and Jonathan Walpole" +,Title="Performance of memory reclamation for lockless synchronization" +,journal="J. Parallel Distrib. Comput." +,volume={67} +,number="12" +,year="2007" +,issn="0743-7315" +,pages="1270--1285" +,doi="http://dx.doi.org/10.1016/j.jpdc.2007.04.010" +,publisher="Academic Press, Inc." +,address="Orlando, FL, USA" +,annotation={ + Compares QSBR, HPBR, EBR, and lock-free reference counting. + Journal version of ThomasEHart2006a. 
+} +} + +@unpublished{MathieuDesnoyers2007call:rcu:schedNeeded +,Author="Mathieu Desnoyers" +,Title="Re: [patch 1/2] {Linux} Kernel Markers - Support Multiple Probes" +,month="December" +,day="20" +,year="2007" +,note="Available: +\url{http://lkml.org/lkml/2007/12/20/244} +[Viewed March 27, 2008]" +,annotation=" + Request for call_rcu_sched() and rcu_barrier_sched(). +" +} + + ######################################################################## # # "What is RCU?" LWN series. # +# http://lwn.net/Articles/262464/ (What is RCU, Fundamentally?) +# http://lwn.net/Articles/263130/ (What is RCU's Usage?) +# http://lwn.net/Articles/264090/ (What is RCU's API?) @unpublished{PaulEMcKenney2007WhatIsRCUFundamentally ,Author="Paul E. McKenney and Jonathan Walpole" @@ -723,7 +1838,7 @@ Revised: 3. RCU is a Bulk Reference-Counting Mechanism 4. RCU is a Poor Man's Garbage Collector 5. RCU is a Way of Providing Existence Guarantees - 6. RCU is a Way of Waiting for Things to Finish + 6. RCU is a Way of Waiting for Things to Finish " } @@ -747,20 +1862,96 @@ Revised: # ######################################################################## + +@unpublished{SteveRostedt2008dyntickRCUpatch +,Author="Steven Rostedt and Paul E. McKenney" +,Title="{[PATCH]} add support for dynamic ticks and preempt rcu" +,month="January" +,day="29" +,year="2008" +,note="Available: +\url{http://lkml.org/lkml/2008/1/29/208} +[Viewed March 27, 2008]" +,annotation=" + Patch that prevents preemptible RCU from unnecessarily waking + up dynticks-idle CPUs. +" +} + +@unpublished{PaulEMcKenney2008LKMLDependencyOrdering +,Author="Paul E. McKenney" +,Title="Re: [PATCH 02/22 -v7] Add basic support for gcc profiler instrumentation" +,month="February" +,day="1" +,year="2008" +,note="Available: +\url{http://lkml.org/lkml/2008/2/2/255} +[Viewed October 18, 2008]" +,annotation=" + Explanation of compilers violating dependency ordering. +" +} + +@Conference{PaulEMcKenney2008Beijing +,Author="Paul E. 
McKenney" +,Title="Introducing Technology Into {Linux} Or: +Introducing your technology Into {Linux} will require introducing a +lot of {Linux} into your technology!!!" +,Booktitle="2008 Linux Developer Symposium - China" +,Publisher="OSS China" +,Month="February" +,Year="2008" +,Address="Beijing, China" +,note="Available: +\url{http://www.rdrop.com/users/paulmck/RCU/TechIntroLinux.2008.02.19a.pdf} +[Viewed August 12, 2008]" +} + +@unpublished{PaulEMcKenney2008dynticksRCU +,Author="Paul E. McKenney and Steven Rostedt" +,Title="Integrating and Validating dynticks and Preemptable RCU" +,month="April" +,day="24" +,year="2008" +,note="Available: +\url{http://lwn.net/Articles/279077/} +[Viewed April 24, 2008]" +,annotation=" + Describes use of Promela and Spin to validate (and fix!) the + dynticks/RCU interface. +" +} + @article{DinakarGuniguntala2008IBMSysJ ,author="D. Guniguntala and P. E. McKenney and J. Triplett and J. Walpole" ,title="The read-copy-update mechanism for supporting real-time applications on shared-memory multiprocessor systems with {Linux}" ,Year="2008" -,Month="April" +,Month="April-June" ,journal="IBM Systems Journal" ,volume="47" ,number="2" -,pages="@@-@@" +,pages="221-236" ,annotation=" RCU, realtime RCU, sleepable RCU, performance. " } +@unpublished{LaiJiangshan2008NewClassicAlgorithm +,Author="Lai Jiangshan" +,Title="[{RFC}][{PATCH}] rcu classic: new algorithm for callbacks-processing" +,month="June" +,day="3" +,year="2008" +,note="Available: +\url{http://lkml.org/lkml/2008/6/2/539} +[Viewed December 10, 2008]" +,annotation=" + Updated RCU classic algorithm. Introduced multi-tailed list + for RCU callbacks and also pulling common code into + __call_rcu(). +" +} + @article{PaulEMcKenney2008RCUOSR ,author="Paul E. 
McKenney and Jonathan Walpole" ,title="Introducing technology into the {Linux} kernel: a case study" @@ -778,6 +1969,52 @@ Revised: } } +@unpublished{ManfredSpraul2008StateMachineRCU +,Author="Manfred Spraul" +,Title="[{RFC}, {PATCH}] state machine based rcu" +,month="August" +,day="21" +,year="2008" +,note="Available: +\url{http://lkml.org/lkml/2008/8/21/336} +[Viewed December 8, 2008]" +,annotation=" + State-based RCU. One key thing that this patch does is to + separate the dynticks handling of NMIs and IRQs. +" +} + +@unpublished{ManfredSpraul2008dyntickIRQNMI +,Author="Manfred Spraul" +,Title="Re: [{RFC}, {PATCH}] v4 scalable classic {RCU} implementation" +,month="September" +,day="6" +,year="2008" +,note="Available: +\url{http://lkml.org/lkml/2008/9/6/86} +[Viewed December 8, 2008]" +,annotation=" + Manfred notes a fix required to my attempt to separate irq + and NMI processing for hierarchical RCU's dynticks interface. +" +} + +@techreport{PaulEMcKenney2008cyclicRCU +,author="Paul E. McKenney" +,title="Efficient Support of Consistent Cyclic Search With Read-Copy Update" +,institution="US Patent and Trademark Office" +,address="Washington, DC" +,year="2008" +,number="US Patent 7,426,511" +,month="September" +,pages="23" +,annotation=" + Maintains an additional level of indirection to allow + readers to confine themselves to the desired snapshot of the + data structure. Only permits one update at a time. +" +} + @unpublished{PaulEMcKenney2008HierarchicalRCU ,Author="Paul E. McKenney" ,Title="Hierarchical {RCU}" @@ -793,6 +2030,21 @@ Revised: " } +@unpublished{PaulEMcKenney2009BloatwatchRCU +,Author="Paul E. McKenney" +,Title="Re: [PATCH fyi] RCU: the bloatwatch edition" +,month="January" +,day="14" +,year="2009" +,note="Available: +\url{http://lkml.org/lkml/2009/1/14/449} +[Viewed January 15, 2009]" +,annotation=" + Small-footprint implementation of RCU for uniprocessor + embedded applications -- and also for exposition purposes. 
+" +} + @conference{PaulEMcKenney2009MaliciousURCU ,Author="Paul E. McKenney" ,Title="Using a Malicious User-Level {RCU} to Torture {RCU}-Based Algorithms" @@ -816,15 +2068,17 @@ Revised: ,year="2009" ,note="Available: \url{http://lkml.org/lkml/2009/2/5/572} -\url{git://lttng.org/userspace-rcu.git} +\url{http://lttng.org/urcu} [Viewed February 20, 2009]" ,annotation=" Mathieu Desnoyers's user-space RCU implementation. git://lttng.org/userspace-rcu.git + http://lttng.org/cgi-bin/gitweb.cgi?p=userspace-rcu.git + http://lttng.org/urcu " } -@unpublished{PaulEMcKenney2009BloatWatchRCU +@unpublished{PaulEMcKenney2009LWNBloatWatchRCU ,Author="Paul E. McKenney" ,Title="{RCU}: The {Bloatwatch} Edition" ,month="March" @@ -852,14 +2106,29 @@ Revised: " } -@unpublished{JoshTriplett2009RPHash +@unpublished{PaulEMcKenney2009fastRTRCU +,Author="Paul E. McKenney" +,Title="[{PATCH} {RFC} -tip 0/4] {RCU} cleanups and simplified preemptable {RCU}" +,month="July" +,day="23" +,year="2009" +,note="Available: +\url{http://lkml.org/lkml/2009/7/23/294} +[Viewed August 15, 2009]" +,annotation=" + First posting of simple and fast preemptable RCU. +" +} + +@InProceedings{JoshTriplett2009RPHash ,Author="Josh Triplett" ,Title="Scalable concurrent hash tables via relativistic programming" ,month="September" ,year="2009" -,note="Linux Plumbers Conference presentation" +,booktitle="Linux Plumbers Conference 2009" ,annotation=" RP fun with hash tables. + See also JoshTriplett2010RPHash " } @@ -872,4 +2141,323 @@ Revised: ,note="Available: \url{http://www.lttng.org/pub/thesis/desnoyers-dissertation-2009-12.pdf} [Viewed December 9, 2009]" +,annotation={ + Chapter 6 (page 97) covers user-level RCU. +} +} + +@unpublished{RelativisticProgrammingWiki +,Author="Josh Triplett and Paul E. 
McKenney and Jonathan Walpole" +,Title="Relativistic Programming" +,month="September" +,year="2009" +,note="Available: +\url{http://wiki.cs.pdx.edu/rp/} +[Viewed December 9, 2009]" +,annotation=" + Main Relativistic Programming Wiki. +" +} + +@conference{PaulEMcKenney2009DeterministicRCU +,Author="Paul E. McKenney" +,Title="Deterministic Synchronization in Multicore Systems: the Role of {RCU}" +,Booktitle="Eleventh Real Time Linux Workshop" +,month="September" +,year="2009" +,address="Dresden, Germany" +,note="Available: +\url{http://www.rdrop.com/users/paulmck/realtime/paper/DetSyncRCU.2009.08.18a.pdf} +[Viewed January 14, 2009]" +} + +@unpublished{PaulEMcKenney2009HuntingHeisenbugs +,Author="Paul E. McKenney" +,Title="Hunting Heisenbugs" +,month="November" +,year="2009" +,day="1" +,note="Available: +\url{http://paulmck.livejournal.com/14639.html} +[Viewed June 4, 2010]" +,annotation=" + Day-one bug in Tree RCU that took forever to track down. +" +} + +@unpublished{MathieuDesnoyers2009defer:rcu +,Author="Mathieu Desnoyers" +,Title="Kernel RCU: shrink the size of the struct rcu\_head" +,month="December" +,year="2009" +,note="Available: +\url{http://lkml.org/lkml/2009/10/18/129} +[Viewed December 29, 2009]" +,annotation=" + Mathieu proposed defer_rcu() with fixed-size per-thread pool + of RCU callbacks. +" +} + +@unpublished{MathieuDesnoyers2009VerifPrePub +,Author="Mathieu Desnoyers and Paul E. McKenney and Michel R. Dagenais" +,Title="Multi-Core Systems Modeling for Formal Verification of Parallel Algorithms" +,month="December" +,year="2009" +,note="Submitted to IEEE TPDS" +,annotation=" + OOMem model for Mathieu's user-level RCU mechanical proof of + correctness. +" +} + +@unpublished{MathieuDesnoyers2009URCUPrePub +,Author="Mathieu Desnoyers and Paul E. McKenney and Alan Stern and Michel R. 
Dagenais and Jonathan Walpole" +,Title="User-Level Implementations of Read-Copy Update" +,month="December" +,year="2010" +,url=\url{http://www.computer.org/csdl/trans/td/2012/02/ttd2012020375-abs.html} +,annotation=" + RCU overview, desiderata, semi-formal semantics, user-level RCU + usage scenarios, three classes of RCU implementation, wait-free + RCU updates, RCU grace-period batching, update overhead, + http://www.rdrop.com/users/paulmck/RCU/urcu-main-accepted.2011.08.30a.pdf + http://www.rdrop.com/users/paulmck/RCU/urcu-supp-accepted.2011.08.30a.pdf + Superseded by MathieuDesnoyers2012URCU. +" +} + +@inproceedings{HariKannan2009DynamicAnalysisRCU +,author = {Kannan, Hari} +,title = {Ordering decoupled metadata accesses in multiprocessors} +,booktitle = {MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture} +,year = {2009} +,isbn = {978-1-60558-798-1} +,pages = {381--390} +,location = {New York, New York} +,doi = {http://doi.acm.org/10.1145/1669112.1669161} +,publisher = {ACM} +,address = {New York, NY, USA} +,annotation={ + Uses RCU to protect metadata used in dynamic analysis. +}} + +@conference{PaulEMcKenney2010SimpleOptRCU +,Author="Paul E. McKenney" +,Title="Simplicity Through Optimization" +,Booktitle="linux.conf.au 2010" +,month="January" +,year="2010" +,address="Wellington, New Zealand" +,note="Available: +\url{http://www.rdrop.com/users/paulmck/RCU/SimplicityThruOptimization.2010.01.21f.pdf} +[Viewed October 10, 2010]" +,annotation=" + TREE_PREEMPT_RCU optimizations greatly simplified the old + PREEMPT_RCU implementation. +" +} + +@unpublished{PaulEMcKenney2010LockdepRCU +,Author="Paul E. McKenney" +,Title="Lockdep-{RCU}" +,month="February" +,year="2010" +,day="1" +,note="Available: +\url{https://lwn.net/Articles/371986/} +[Viewed June 4, 2010]" +,annotation=" + CONFIG_PROVE_RCU, or at least an early version. 
+" +} + +@unpublished{AviKivity2010KVM2RCU +,Author="Avi Kivity" +,Title="[{PATCH} 37/40] {KVM}: Bump maximum vcpu count to 64" +,month="February" +,year="2010" +,note="Available: +\url{http://www.mail-archive.com/kvm@vger.kernel.org/msg28640.html} +[Viewed March 20, 2010]" +,annotation=" + Use of RCU permits KVM to increase the size of guest OSes from + 16 CPUs to 64 CPUs. +" +} + +@unpublished{HerbertXu2010RCUResizeHash +,Author="Herbert Xu" +,Title="bridge: Add core IGMP snooping support" +,month="February" +,year="2010" +,note="Available: +\url{http://kerneltrap.com/mailarchive/linux-netdev/2010/2/26/6270589} +[Viewed March 20, 2011]" +,annotation={ + Use a pair of list_head structures to support RCU-protected + resizable hash tables. +}} + +@article{JoshTriplett2010RPHash +,author="Josh Triplett and Paul E. McKenney and Jonathan Walpole" +,title="Scalable Concurrent Hash Tables via Relativistic Programming" +,journal="ACM Operating Systems Review" +,year=2010 +,volume=44 +,number=3 +,month="July" +,annotation={ + RP fun with hash tables. + http://portal.acm.org/citation.cfm?id=1842733.1842750 +}} + +@unpublished{PaulEMcKenney2010RCUAPI +,Author="Paul E. McKenney" +,Title="The {RCU} {API}, 2010 Edition" +,month="December" +,day="8" +,year="2010" +,note="Available: +\url{http://lwn.net/Articles/418853/} +[Viewed December 8, 2010]" +,annotation=" + Includes updated software-engineering features. +" +} + +@mastersthesis{AndrejPodzimek2010masters +,author="Andrej Podzimek" +,title="Read-Copy-Update for OpenSolaris" +,school="Charles University in Prague" +,year="2010" +,note="Available: +\url{https://andrej.podzimek.org/thesis.pdf} +[Viewed January 31, 2011]" +,annotation={ + Reviews RCU implementations and creates a few for OpenSolaris. + Drives quiescent-state detection from RCU read-side primitives, + in a manner roughly similar to that of Jim Houston. 
+}} + +@unpublished{LinusTorvalds2011Linux2:6:38:rc1:NPigginVFS +,Author="Linus Torvalds" +,Title="Linux 2.6.38-rc1" +,month="January" +,year="2011" +,note="Available: +\url{https://lkml.org/lkml/2011/1/18/322} +[Viewed March 4, 2011]" +,annotation={ + "The RCU-based name lookup is at the other end of the spectrum - the + absolute anti-gimmick. It's some seriously good stuff, and gets rid of + the last main global lock that really tends to hurt some kernel loads. + The dentry lock is no longer a big serializing issue. What's really + nice about it is that it actually improves performance a lot even for + single-threaded loads (on an SMP kernel), because it gets rid of some + of the most expensive parts of path component lookup, which was the + d_lock on every component lookup. So I'm seeing improvements of 30-50% + on some seriously pathname-lookup intensive loads." +}} + +@techreport{JoshTriplett2011RPScalableCorrectOrdering +,author = {Josh Triplett and Philip W. Howard and Paul E. McKenney and Jonathan Walpole} +,title = {Scalable Correct Memory Ordering via Relativistic Programming} +,year = {2011} +,number = {11-03} +,institution = {Portland State University} +,note = {\url{http://www.cs.pdx.edu/pdfs/tr1103.pdf}} +} + +@inproceedings{PhilHoward2011RCUTMRBTree +,author = {Philip W. Howard and Jonathan Walpole} +,title = {A Relativistic Enhancement to Software Transactional Memory} +,booktitle = {Proceedings of the 3rd USENIX conference on Hot topics in parallelism} +,series = {HotPar'11} +,year = {2011} +,location = {Berkeley, CA} +,pages = {1--6} +,numpages = {6} +,url = {http://www.usenix.org/event/hotpar11/tech/final_files/Howard.pdf} +,publisher = {USENIX Association} +,address = {Berkeley, CA, USA} +} + +@techreport{PaulEMcKenney2011cyclicparallelRCU +,author="Paul E. 
McKenney and Jonathan Walpole" +,title="Efficient Support of Consistent Cyclic Search With Read-Copy Update and Parallel Updates" +,institution="US Patent and Trademark Office" +,address="Washington, DC" +,year="2011" +,number="US Patent 7,953,778" +,month="May" +,pages="34" +,annotation=" + Maintains an array of generation numbers to track in-flight + updates and keeps an additional level of indirection to allow + readers to confine themselves to the desired snapshot of the + data structure. +" +} + +@inproceedings{Triplett:2011:RPHash +,author = {Triplett, Josh and McKenney, Paul E. and Walpole, Jonathan} +,title = {Resizable, Scalable, Concurrent Hash Tables via Relativistic Programming} +,booktitle = {Proceedings of the 2011 USENIX Annual Technical Conference} +,month = {June} +,year = {2011} +,pages = {145--158} +,numpages = {14} +,url={http://www.usenix.org/event/atc11/tech/final_files/atc11_proceedings.pdf} +,publisher = {The USENIX Association} +,address = {Portland, OR USA} +} + +@unpublished{PaulEMcKenney2011RCU3.0trainwreck +,Author="Paul E. McKenney" +,Title="3.0 and {RCU:} what went wrong" +,month="July" +,day="27" +,year="2011" +,note="Available: +\url{http://lwn.net/Articles/453002/} +[Viewed July 27, 2011]" +,annotation=" + Analysis of the RCU trainwreck in Linux kernel 3.0. +" +} + +@unpublished{NeilBrown2011MeetTheLockers +,Author="Neil Brown" +,Title="Meet the Lockers" +,month="August" +,day="3" +,year="2011" +,note="Available: +\url{http://lwn.net/Articles/453685/} +[Viewed September 2, 2011]" +,annotation=" + The Locker family as an analogy for locking, reference counting, + RCU, and seqlock. +" +} + +@article{MathieuDesnoyers2012URCU +,Author="Mathieu Desnoyers and Paul E. McKenney and Alan Stern and Michel R. 
Dagenais and Jonathan Walpole" +,Title="User-Level Implementations of Read-Copy Update" +,journal="IEEE Transactions on Parallel and Distributed Systems" +,volume={23} +,year="2012" +,issn="1045-9219" +,pages="375-382" +,doi="http://doi.ieeecomputersociety.org/10.1109/TPDS.2011.159" +,publisher="IEEE Computer Society" +,address="Los Alamitos, CA, USA" +,annotation={ + RCU overview, desiderata, semi-formal semantics, user-level RCU + usage scenarios, three classes of RCU implementation, wait-free + RCU updates, RCU grace-period batching, update overhead, + http://www.rdrop.com/users/paulmck/RCU/urcu-main-accepted.2011.08.30a.pdf + http://www.rdrop.com/users/paulmck/RCU/urcu-supp-accepted.2011.08.30a.pdf +} } diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt index bff2d8be1e18..5c8d74968090 100644 --- a/Documentation/RCU/checklist.txt +++ b/Documentation/RCU/checklist.txt @@ -180,6 +180,20 @@ over a rather long period of time, but improvements are always welcome! operations that would not normally be undertaken while a real-time workload is running. + In particular, if you find yourself invoking one of the expedited + primitives repeatedly in a loop, please do everyone a favor: + Restructure your code so that it batches the updates, allowing + a single non-expedited primitive to cover the entire batch. + This will very likely be faster than the loop containing the + expedited primitive, and will be much much easier on the rest + of the system, especially on real-time workloads running + there. + + In addition, it is illegal to call the expedited forms from + a CPU-hotplug notifier, or while holding a lock that is acquired + by a CPU-hotplug notifier. Failing to observe this restriction + will result in deadlock. + 7. If the updater uses call_rcu() or synchronize_rcu(), then the corresponding readers must use rcu_read_lock() and rcu_read_unlock().
If the updater uses call_rcu_bh() or diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt index 083d88cbc089..523364e4e1f1 100644 --- a/Documentation/RCU/stallwarn.txt +++ b/Documentation/RCU/stallwarn.txt @@ -12,14 +12,38 @@ CONFIG_RCU_CPU_STALL_TIMEOUT This kernel configuration parameter defines the period of time that RCU will wait from the beginning of a grace period until it issues an RCU CPU stall warning. This time period is normally - ten seconds. + sixty seconds. -RCU_SECONDS_TILL_STALL_RECHECK + This configuration parameter may be changed at runtime via + /sys/module/rcutree/parameters/rcu_cpu_stall_timeout; however, + this parameter is checked only at the beginning of a cycle. + So if you are 30 seconds into a 70-second stall, setting this + sysfs parameter to (say) five will shorten the timeout for the + -next- stall, or the following warning for the current stall + (assuming the stall lasts long enough). It will not affect the + timing of the next warning for the current stall. - This macro defines the period of time that RCU will wait after - issuing a stall warning until it issues another stall warning - for the same stall. This time period is normally set to three - times the check interval plus thirty seconds. + Stall-warning messages may be enabled and disabled completely via + /sys/module/rcutree/parameters/rcu_cpu_stall_suppress. + +CONFIG_RCU_CPU_STALL_VERBOSE + + This kernel configuration parameter causes the stall warning to + also dump the stacks of any tasks that are blocking the current + RCU-preempt grace period. + +CONFIG_RCU_CPU_STALL_INFO + + This kernel configuration parameter causes the stall warning to + print out additional per-CPU diagnostic information, including + information on scheduling-clock ticks and RCU's idle-CPU tracking. + +RCU_STALL_DELAY_DELTA + + Although the lockdep facility is extremely useful, it does add + some overhead.
Therefore, under CONFIG_PROVE_RCU, the + RCU_STALL_DELAY_DELTA macro allows five extra seconds before + giving an RCU CPU stall warning message. RCU_STALL_RAT_DELAY @@ -64,6 +88,54 @@ INFO: rcu_bh_state detected stalls on CPUs/tasks: { } (detected by 4, 2502 jiffi This is rare, but does happen from time to time in real life. +If the CONFIG_RCU_CPU_STALL_INFO kernel configuration parameter is set, +more information is printed with the stall-warning message, for example: + + INFO: rcu_preempt detected stall on CPU + 0: (63959 ticks this GP) idle=241/3fffffffffffffff/0 + (t=65000 jiffies) + +In kernels with CONFIG_RCU_FAST_NO_HZ, even more information is +printed: + + INFO: rcu_preempt detected stall on CPU + 0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 drain=0 . timer=-1 + (t=65000 jiffies) + +The "(64628 ticks this GP)" indicates that this CPU has taken more +than 64,000 scheduling-clock interrupts during the current stalled +grace period. If the CPU was not yet aware of the current grace +period (for example, if it was offline), then this part of the message +indicates how many grace periods behind the CPU is. + +The "idle=" portion of the message prints the dyntick-idle state. +The hex number before the first "/" is the low-order 12 bits of the +dynticks counter, which will have an even-numbered value if the CPU is +in dyntick-idle mode and an odd-numbered value otherwise. The hex +number between the two "/"s is the value of the nesting, which will +be a small positive number if in the idle loop and a very large positive +number (as shown above) otherwise. + +For CONFIG_RCU_FAST_NO_HZ kernels, the "drain=0" indicates that the +CPU is not in the process of trying to force itself into dyntick-idle +state, the "." 
indicates that the CPU has not given up forcing RCU +into dyntick-idle mode (it would be "H" otherwise), and the "timer=-1" +indicates that the CPU has not recently forced RCU into dyntick-idle +mode (it would otherwise indicate the number of microseconds remaining +in this forced state). + + +Multiple Warnings From One Stall + +If a stall lasts long enough, multiple stall-warning messages will be +printed for it. The second and subsequent messages are printed at +longer intervals, so that the time between (say) the first and second +message will be about three times the interval between the beginning +of the stall and the first message. + + +What Causes RCU CPU Stall Warnings? + So your kernel printed an RCU CPU stall warning. The next question is "What caused it?" The following problems can result in RCU CPU stall warnings: @@ -128,4 +200,5 @@ is occurring, which will usually be in the function nearest the top of that portion of the stack which remains the same from trace to trace. If you can reliably trigger the stall, ftrace can be quite helpful. -RCU bugs can often be debugged with the help of CONFIG_RCU_TRACE. +RCU bugs can often be debugged with the help of CONFIG_RCU_TRACE +and with RCU's event tracing. diff --git a/Documentation/RCU/torture.txt b/Documentation/RCU/torture.txt index d67068d0d2b9..375d3fb71437 100644 --- a/Documentation/RCU/torture.txt +++ b/Documentation/RCU/torture.txt @@ -69,6 +69,13 @@ onoff_interval CPU-hotplug operations regardless of what value is specified for onoff_interval. +onoff_holdoff The number of seconds to wait until starting CPU-hotplug + operations. This would normally only be used when + rcutorture was built into the kernel and started + automatically at boot time, in which case it is useful + in order to avoid confusing boot-time code with CPUs + coming and going. + shuffle_interval The number of seconds to keep the test threads affinitied to a particular subset of the CPUs, defaults to 3 seconds.
@@ -79,6 +86,24 @@ shutdown_secs The number of seconds to run the test before terminating zero, which disables test termination and system shutdown. This capability is useful for automated testing. +stall_cpu The number of seconds that a CPU should be stalled while + within both an rcu_read_lock() and a preempt_disable(). + This stall happens only once per rcutorture run. + If you need multiple stalls, use modprobe and rmmod to + repeatedly run rcutorture. The default for stall_cpu + is zero, which prevents rcutorture from stalling a CPU. + + Note that attempts to rmmod rcutorture while the stall + is ongoing will hang, so be careful what value you + choose for this module parameter! In addition, too-large + values for stall_cpu might well induce failures and + warnings in other parts of the kernel. You have been + warned! + +stall_cpu_holdoff + The number of seconds to wait after rcutorture starts + before stalling a CPU. Defaults to 10 seconds. + stat_interval The number of seconds between output of torture statistics (via printk()). Regardless of the interval, statistics are printed when the module is unloaded. @@ -271,11 +296,13 @@ The following script may be used to torture RCU: #!/bin/sh modprobe rcutorture - sleep 100 + sleep 3600 rmmod rcutorture dmesg | grep torture: The output can be manually inspected for the error flag of "!!!". One could of course create a more elaborate script that automatically -checked for such errors. The "rmmod" command forces a "SUCCESS" or -"FAILURE" indication to be printk()ed. +checked for such errors. The "rmmod" command forces a "SUCCESS", +"FAILURE", or "RCU_HOTPLUG" indication to be printk()ed. The first +two are self-explanatory, while the last indicates that while there +were no RCU failures, CPU-hotplug problems were detected. 
diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt index 49587abfc2f7..f6f15ce39903 100644 --- a/Documentation/RCU/trace.txt +++ b/Documentation/RCU/trace.txt @@ -33,23 +33,23 @@ rcu/rcuboost: The output of "cat rcu/rcudata" looks as follows: rcu_sched: - 0 c=20972 g=20973 pq=1 pgp=20973 qp=0 dt=545/1/0 df=50 of=0 ri=0 ql=163 qs=NRW. kt=0/W/0 ktl=ebc3 b=10 ci=153737 co=0 ca=0 - 1 c=20972 g=20973 pq=1 pgp=20973 qp=0 dt=967/1/0 df=58 of=0 ri=0 ql=634 qs=NRW. kt=0/W/1 ktl=58c b=10 ci=191037 co=0 ca=0 - 2 c=20972 g=20973 pq=1 pgp=20973 qp=0 dt=1081/1/0 df=175 of=0 ri=0 ql=74 qs=N.W. kt=0/W/2 ktl=da94 b=10 ci=75991 co=0 ca=0 - 3 c=20942 g=20943 pq=1 pgp=20942 qp=1 dt=1846/0/0 df=404 of=0 ri=0 ql=0 qs=.... kt=0/W/3 ktl=d1cd b=10 ci=72261 co=0 ca=0 - 4 c=20972 g=20973 pq=1 pgp=20973 qp=0 dt=369/1/0 df=83 of=0 ri=0 ql=48 qs=N.W. kt=0/W/4 ktl=e0e7 b=10 ci=128365 co=0 ca=0 - 5 c=20972 g=20973 pq=1 pgp=20973 qp=0 dt=381/1/0 df=64 of=0 ri=0 ql=169 qs=NRW. kt=0/W/5 ktl=fb2f b=10 ci=164360 co=0 ca=0 - 6 c=20972 g=20973 pq=1 pgp=20973 qp=0 dt=1037/1/0 df=183 of=0 ri=0 ql=62 qs=N.W. kt=0/W/6 ktl=d2ad b=10 ci=65663 co=0 ca=0 - 7 c=20897 g=20897 pq=1 pgp=20896 qp=0 dt=1572/0/0 df=382 of=0 ri=0 ql=0 qs=.... kt=0/W/7 ktl=cf15 b=10 ci=75006 co=0 ca=0 + 0 c=20972 g=20973 pq=1 pgp=20973 qp=0 dt=545/1/0 df=50 of=0 ql=163 qs=NRW. kt=0/W/0 ktl=ebc3 b=10 ci=153737 co=0 ca=0 + 1 c=20972 g=20973 pq=1 pgp=20973 qp=0 dt=967/1/0 df=58 of=0 ql=634 qs=NRW. kt=0/W/1 ktl=58c b=10 ci=191037 co=0 ca=0 + 2 c=20972 g=20973 pq=1 pgp=20973 qp=0 dt=1081/1/0 df=175 of=0 ql=74 qs=N.W. kt=0/W/2 ktl=da94 b=10 ci=75991 co=0 ca=0 + 3 c=20942 g=20943 pq=1 pgp=20942 qp=1 dt=1846/0/0 df=404 of=0 ql=0 qs=.... kt=0/W/3 ktl=d1cd b=10 ci=72261 co=0 ca=0 + 4 c=20972 g=20973 pq=1 pgp=20973 qp=0 dt=369/1/0 df=83 of=0 ql=48 qs=N.W. kt=0/W/4 ktl=e0e7 b=10 ci=128365 co=0 ca=0 + 5 c=20972 g=20973 pq=1 pgp=20973 qp=0 dt=381/1/0 df=64 of=0 ql=169 qs=NRW. 
kt=0/W/5 ktl=fb2f b=10 ci=164360 co=0 ca=0 + 6 c=20972 g=20973 pq=1 pgp=20973 qp=0 dt=1037/1/0 df=183 of=0 ql=62 qs=N.W. kt=0/W/6 ktl=d2ad b=10 ci=65663 co=0 ca=0 + 7 c=20897 g=20897 pq=1 pgp=20896 qp=0 dt=1572/0/0 df=382 of=0 ql=0 qs=.... kt=0/W/7 ktl=cf15 b=10 ci=75006 co=0 ca=0 rcu_bh: - 0 c=1480 g=1480 pq=1 pgp=1480 qp=0 dt=545/1/0 df=6 of=0 ri=1 ql=0 qs=.... kt=0/W/0 ktl=ebc3 b=10 ci=0 co=0 ca=0 - 1 c=1480 g=1480 pq=1 pgp=1480 qp=0 dt=967/1/0 df=3 of=0 ri=1 ql=0 qs=.... kt=0/W/1 ktl=58c b=10 ci=151 co=0 ca=0 - 2 c=1480 g=1480 pq=1 pgp=1480 qp=0 dt=1081/1/0 df=6 of=0 ri=1 ql=0 qs=.... kt=0/W/2 ktl=da94 b=10 ci=0 co=0 ca=0 - 3 c=1480 g=1480 pq=1 pgp=1480 qp=0 dt=1846/0/0 df=8 of=0 ri=1 ql=0 qs=.... kt=0/W/3 ktl=d1cd b=10 ci=0 co=0 ca=0 - 4 c=1480 g=1480 pq=1 pgp=1480 qp=0 dt=369/1/0 df=6 of=0 ri=1 ql=0 qs=.... kt=0/W/4 ktl=e0e7 b=10 ci=0 co=0 ca=0 - 5 c=1480 g=1480 pq=1 pgp=1480 qp=0 dt=381/1/0 df=4 of=0 ri=1 ql=0 qs=.... kt=0/W/5 ktl=fb2f b=10 ci=0 co=0 ca=0 - 6 c=1480 g=1480 pq=1 pgp=1480 qp=0 dt=1037/1/0 df=6 of=0 ri=1 ql=0 qs=.... kt=0/W/6 ktl=d2ad b=10 ci=0 co=0 ca=0 - 7 c=1474 g=1474 pq=1 pgp=1473 qp=0 dt=1572/0/0 df=8 of=0 ri=1 ql=0 qs=.... kt=0/W/7 ktl=cf15 b=10 ci=0 co=0 ca=0 + 0 c=1480 g=1480 pq=1 pgp=1480 qp=0 dt=545/1/0 df=6 of=0 ql=0 qs=.... kt=0/W/0 ktl=ebc3 b=10 ci=0 co=0 ca=0 + 1 c=1480 g=1480 pq=1 pgp=1480 qp=0 dt=967/1/0 df=3 of=0 ql=0 qs=.... kt=0/W/1 ktl=58c b=10 ci=151 co=0 ca=0 + 2 c=1480 g=1480 pq=1 pgp=1480 qp=0 dt=1081/1/0 df=6 of=0 ql=0 qs=.... kt=0/W/2 ktl=da94 b=10 ci=0 co=0 ca=0 + 3 c=1480 g=1480 pq=1 pgp=1480 qp=0 dt=1846/0/0 df=8 of=0 ql=0 qs=.... kt=0/W/3 ktl=d1cd b=10 ci=0 co=0 ca=0 + 4 c=1480 g=1480 pq=1 pgp=1480 qp=0 dt=369/1/0 df=6 of=0 ql=0 qs=.... kt=0/W/4 ktl=e0e7 b=10 ci=0 co=0 ca=0 + 5 c=1480 g=1480 pq=1 pgp=1480 qp=0 dt=381/1/0 df=4 of=0 ql=0 qs=.... kt=0/W/5 ktl=fb2f b=10 ci=0 co=0 ca=0 + 6 c=1480 g=1480 pq=1 pgp=1480 qp=0 dt=1037/1/0 df=6 of=0 ql=0 qs=.... 
kt=0/W/6 ktl=d2ad b=10 ci=0 co=0 ca=0 + 7 c=1474 g=1474 pq=1 pgp=1473 qp=0 dt=1572/0/0 df=8 of=0 ql=0 qs=.... kt=0/W/7 ktl=cf15 b=10 ci=0 co=0 ca=0 The first section lists the rcu_data structures for rcu_sched, the second for rcu_bh. Note that CONFIG_TREE_PREEMPT_RCU kernels will have an @@ -119,10 +119,6 @@ o "of" is the number of times that some other CPU has forced a CPU is offline when it is really alive and kicking) is a fatal error, so it makes sense to err conservatively. -o "ri" is the number of times that RCU has seen fit to send a - reschedule IPI to this CPU in order to get it to report a - quiescent state. - o "ql" is the number of RCU callbacks currently residing on this CPU. This is the total number of callbacks, regardless of what state they are in (new, waiting for grace period to diff --git a/Documentation/acpi/apei/einj.txt b/Documentation/acpi/apei/einj.txt index e7cc36397217..e20b6daaced4 100644 --- a/Documentation/acpi/apei/einj.txt +++ b/Documentation/acpi/apei/einj.txt @@ -53,6 +53,14 @@ directory apei/einj. The following files are provided. This file is used to set the second error parameter value. Effect of parameter depends on error_type specified. +- notrigger + The EINJ mechanism is a two step process. First inject the error, then + perform some actions to trigger it. Setting "notrigger" to 1 skips the + trigger phase, which *may* allow the user to cause the error in some other + context by a simple access to the cpu, memory location, or device that is + the target of the error injection. Whether this actually works depends + on what operations the BIOS actually includes in the trigger phase. + BIOS versions based in the ACPI 4.0 specification have limited options to control where the errors are injected. 
Your BIOS may support an extension (enabled with the param_extension=1 module parameter, or diff --git a/Documentation/aoe/aoe.txt b/Documentation/aoe/aoe.txt index b5aada9f20cc..5f5aa16047ff 100644 --- a/Documentation/aoe/aoe.txt +++ b/Documentation/aoe/aoe.txt @@ -35,7 +35,7 @@ CREATING DEVICE NODES sh Documentation/aoe/mkshelf.sh /dev/etherd 0 There is also an autoload script that shows how to edit - /etc/modprobe.conf to ensure that the aoe module is loaded when + /etc/modprobe.d/aoe.conf to ensure that the aoe module is loaded when necessary. USING DEVICE NODES diff --git a/Documentation/aoe/autoload.sh b/Documentation/aoe/autoload.sh index 78dad1334c6f..815dff4691c9 100644 --- a/Documentation/aoe/autoload.sh +++ b/Documentation/aoe/autoload.sh @@ -1,8 +1,8 @@ #!/bin/sh # set aoe to autoload by installing the -# aliases in /etc/modprobe.conf +# aliases in /etc/modprobe.d/ -f=/etc/modprobe.conf +f=/etc/modprobe.d/aoe.conf if test ! -r $f || test ! -w $f; then echo "cannot configure $f for module autoloading" 1>&2 diff --git a/Documentation/arm/kernel_user_helpers.txt b/Documentation/arm/kernel_user_helpers.txt index a17df9f91d16..5673594717cf 100644 --- a/Documentation/arm/kernel_user_helpers.txt +++ b/Documentation/arm/kernel_user_helpers.txt @@ -25,7 +25,7 @@ inline (either in the code emitted directly by the compiler, or part of the implementation of a library call) when optimizing for a recent enough processor that has the necessary native support, but only if resulting binaries are already to be incompatible with earlier ARM processors due to -useage of similar native instructions for other things. In other words +usage of similar native instructions for other things. In other words don't make binaries unable to run on earlier processors just for the sake of not using these kernel helpers if your compiled code is not going to use new instructions for other purpose. 
diff --git a/Documentation/backlight/lp855x-driver.txt b/Documentation/backlight/lp855x-driver.txt new file mode 100644 index 000000000000..f5e4caafab7d --- /dev/null +++ b/Documentation/backlight/lp855x-driver.txt @@ -0,0 +1,78 @@ +Kernel driver lp855x +==================== + +Backlight driver for LP855x ICs + +Supported chips: + Texas Instruments LP8550, LP8551, LP8552, LP8553 and LP8556 + +Author: Milo(Woogyom) Kim <milo.kim@ti.com> + +Description +----------- + +* Brightness control + +Brightness can be controlled by the pwm input or the i2c command. +The lp855x driver supports both cases. + +* Device attributes + +1) bl_ctl_mode +Backlight control mode. +Value : pwm based or register based + +2) chip_id +The lp855x chip id. +Value : lp8550/lp8551/lp8552/lp8553/lp8556 + +Platform data for lp855x +------------------------ + +For supporting platform specific data, the lp855x platform data can be used. + +* name : Backlight driver name. If it is not defined, default name is set. +* mode : Brightness control mode. PWM or register based. +* device_control : Value of DEVICE CONTROL register. +* initial_brightness : Initial value of backlight brightness. +* pwm_data : Platform specific pwm generation functions. + Only valid when brightness is pwm input mode. + Functions should be implemented by PWM driver. + - pwm_set_intensity() : set duty of PWM + - pwm_get_intensity() : get current duty of PWM +* load_new_rom_data : + 0 : use default configuration data + 1 : update values of eeprom or eprom registers on loading driver +* size_program : Total size of lp855x_rom_data. +* rom_data : List of new eeprom/eprom registers. 
+ +example 1) lp8552 platform data : i2c register mode with new eeprom data + +#define EEPROM_A5_ADDR 0xA5 +#define EEPROM_A5_VAL 0x4f /* EN_VSYNC=0 */ + +static struct lp855x_rom_data lp8552_eeprom_arr[] = { + {EEPROM_A5_ADDR, EEPROM_A5_VAL}, +}; + +static struct lp855x_platform_data lp8552_pdata = { + .name = "lcd-bl", + .mode = REGISTER_BASED, + .device_control = I2C_CONFIG(LP8552), + .initial_brightness = INITIAL_BRT, + .load_new_rom_data = 1, + .size_program = ARRAY_SIZE(lp8552_eeprom_arr), + .rom_data = lp8552_eeprom_arr, +}; + +example 2) lp8556 platform data : pwm input mode with default rom data + +static struct lp855x_platform_data lp8556_pdata = { + .mode = PWM_BASED, + .device_control = PWM_CONFIG(LP8556), + .initial_brightness = INITIAL_BRT, + .pwm_data = { + .pwm_set_intensity = platform_pwm_set_intensity, + .pwm_get_intensity = platform_pwm_get_intensity, + }, +}; diff --git a/Documentation/blockdev/floppy.txt b/Documentation/blockdev/floppy.txt index 6ccab88705cb..470fe4b5e379 100644 --- a/Documentation/blockdev/floppy.txt +++ b/Documentation/blockdev/floppy.txt @@ -49,7 +49,7 @@ you can put: options floppy omnibook messages -in /etc/modprobe.conf. +in a configuration file in /etc/modprobe.d/. The floppy driver related options are: diff --git a/Documentation/cgroups/blkio-controller.txt b/Documentation/cgroups/blkio-controller.txt index 84f0a15fc210..b4b1fb3a83f0 100644 --- a/Documentation/cgroups/blkio-controller.txt +++ b/Documentation/cgroups/blkio-controller.txt @@ -94,11 +94,11 @@ Throttling/Upper Limit policy Hierarchical Cgroups ==================== -- Currently none of the IO control policy supports hierarhical groups. But - cgroup interface does allow creation of hierarhical cgroups and internally +- Currently none of the IO control policy supports hierarchical groups. But + cgroup interface does allow creation of hierarchical cgroups and internally IO policies treat them as flat hierarchy. 
- So this patch will allow creation of cgroup hierarhcy but at the backend + So this patch will allow creation of cgroup hierarchcy but at the backend everything will be treated as flat. So if somebody created a hierarchy like as follows. @@ -266,7 +266,7 @@ Proportional weight policy files - blkio.idle_time - Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y. This is the amount of time spent by the IO scheduler idling for a - given cgroup in anticipation of a better request than the exising ones + given cgroup in anticipation of a better request than the existing ones from other queues/cgroups. This is in nanoseconds. If this is read when the cgroup is in an idling state, the stat will only report the idle_time accumulated till the last idle period and will not include @@ -283,34 +283,34 @@ Throttling/Upper limit policy files ----------------------------------- - blkio.throttle.read_bps_device - Specifies upper limit on READ rate from the device. IO rate is - specified in bytes per second. Rules are per deivce. Following is + specified in bytes per second. Rules are per device. Following is the format. echo "<major>:<minor> <rate_bytes_per_second>" > /cgrp/blkio.throttle.read_bps_device - blkio.throttle.write_bps_device - Specifies upper limit on WRITE rate to the device. IO rate is - specified in bytes per second. Rules are per deivce. Following is + specified in bytes per second. Rules are per device. Following is the format. echo "<major>:<minor> <rate_bytes_per_second>" > /cgrp/blkio.throttle.write_bps_device - blkio.throttle.read_iops_device - Specifies upper limit on READ rate from the device. IO rate is - specified in IO per second. Rules are per deivce. Following is + specified in IO per second. Rules are per device. Following is the format. echo "<major>:<minor> <rate_io_per_second>" > /cgrp/blkio.throttle.read_iops_device - blkio.throttle.write_iops_device - Specifies upper limit on WRITE rate to the device. IO rate is - specified in io per second. 
Rules are per deivce. Following is + specified in io per second. Rules are per device. Following is the format. echo "<major>:<minor> <rate_io_per_second>" > /cgrp/blkio.throttle.write_iops_device Note: If both BW and IOPS rules are specified for a device, then IO is - subjectd to both the constraints. + subjected to both the constraints. - blkio.throttle.io_serviced - Number of IOs (bio) completed to/from the disk by the group (as diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt index a7c96ae5557c..8e74980ab385 100644 --- a/Documentation/cgroups/cgroups.txt +++ b/Documentation/cgroups/cgroups.txt @@ -558,8 +558,7 @@ Each subsystem may export the following methods. The only mandatory methods are create/destroy. Any others that are null are presumed to be successful no-ops. -struct cgroup_subsys_state *create(struct cgroup_subsys *ss, - struct cgroup *cgrp) +struct cgroup_subsys_state *create(struct cgroup *cgrp) (cgroup_mutex held by caller) Called to create a subsystem state object for a cgroup. The @@ -574,7 +573,7 @@ identified by the passed cgroup object having a NULL parent (since it's the root of the hierarchy) and may be an appropriate place for initialization code. -void destroy(struct cgroup_subsys *ss, struct cgroup *cgrp) +void destroy(struct cgroup *cgrp) (cgroup_mutex held by caller) The cgroup system is about to destroy the passed cgroup; the subsystem @@ -585,7 +584,7 @@ cgroup->parent is still valid. (Note - can also be called for a newly-created cgroup if an error occurs after this subsystem's create() method has been called for the new cgroup). -int pre_destroy(struct cgroup_subsys *ss, struct cgroup *cgrp); +int pre_destroy(struct cgroup *cgrp); Called before checking the reference count on each subsystem. This may be useful for subsystems which have some extra references even if @@ -593,8 +592,7 @@ there are not tasks in the cgroup. If pre_destroy() returns error code, rmdir() will fail with it. 
From this behavior, pre_destroy() can be called multiple times against a cgroup. -int can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp, - struct cgroup_taskset *tset) +int can_attach(struct cgroup *cgrp, struct cgroup_taskset *tset) (cgroup_mutex held by caller) Called prior to moving one or more tasks into a cgroup; if the @@ -615,8 +613,7 @@ fork. If this method returns 0 (success) then this should remain valid while the caller holds cgroup_mutex and it is ensured that either attach() or cancel_attach() will be called in future. -void cancel_attach(struct cgroup_subsys *ss, struct cgroup *cgrp, - struct cgroup_taskset *tset) +void cancel_attach(struct cgroup *cgrp, struct cgroup_taskset *tset) (cgroup_mutex held by caller) Called when a task attach operation has failed after can_attach() has succeeded. @@ -625,23 +622,22 @@ function, so that the subsystem can implement a rollback. If not, not necessary. This will be called only about subsystems whose can_attach() operation have succeeded. The parameters are identical to can_attach(). -void attach(struct cgroup_subsys *ss, struct cgroup *cgrp, - struct cgroup_taskset *tset) +void attach(struct cgroup *cgrp, struct cgroup_taskset *tset) (cgroup_mutex held by caller) Called after the task has been attached to the cgroup, to allow any post-attachment activity that requires memory allocations or blocking. The parameters are identical to can_attach(). -void fork(struct cgroup_subsy *ss, struct task_struct *task) +void fork(struct task_struct *task) Called when a task is forked into a cgroup. -void exit(struct cgroup_subsys *ss, struct task_struct *task) +void exit(struct task_struct *task) Called during task exit. -int populate(struct cgroup_subsys *ss, struct cgroup *cgrp) +int populate(struct cgroup *cgrp) (cgroup_mutex held by caller) Called after creation of a cgroup to allow a subsystem to populate @@ -651,7 +647,7 @@ include/linux/cgroup.h for details). 
Note that although this method can return an error code, the error code is currently not always handled well. -void post_clone(struct cgroup_subsys *ss, struct cgroup *cgrp) +void post_clone(struct cgroup *cgrp) (cgroup_mutex held by caller) Called during cgroup_create() to do any parameter @@ -659,7 +655,7 @@ initialization which might be required before a task could attach. For example in cpusets, no task may attach before 'cpus' and 'mems' are set up. -void bind(struct cgroup_subsys *ss, struct cgroup *root) +void bind(struct cgroup *root) (cgroup_mutex and ss->hierarchy_mutex held by caller) Called when a cgroup subsystem is rebound to a different hierarchy diff --git a/Documentation/cgroups/cpusets.txt b/Documentation/cgroups/cpusets.txt index 5c51ed406d1d..cefd3d8bbd11 100644 --- a/Documentation/cgroups/cpusets.txt +++ b/Documentation/cgroups/cpusets.txt @@ -217,7 +217,7 @@ and name space for cpusets, with a minimum of additional kernel code. The cpus and mems files in the root (top_cpuset) cpuset are read-only. The cpus file automatically tracks the value of -cpu_online_map using a CPU hotplug notifier, and the mems file +cpu_online_mask using a CPU hotplug notifier, and the mems file automatically tracks the value of node_states[N_HIGH_MEMORY]--i.e., nodes with memory--using the cpuset_track_online_nodes() hook. diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt index 4c95c0034a4b..9b1067afb224 100644 --- a/Documentation/cgroups/memory.txt +++ b/Documentation/cgroups/memory.txt @@ -34,8 +34,7 @@ Current Status: linux-2.6.34-mmotm(development version of 2010/April) Features: - accounting anonymous pages, file caches, swap caches usage and limiting them. - - private LRU and reclaim routine. (system's global LRU and private LRU - work independently from each other) + - pages are linked to per-memcg LRU exclusively, and there is no global LRU. - optionally, memory+swap usage can be accounted and limited. 
- hierarchical accounting - soft limit @@ -154,7 +153,7 @@ updated. page_cgroup has its own LRU on cgroup. 2.2.1 Accounting details All mapped anon pages (RSS) and cache pages (Page Cache) are accounted. -Some pages which are never reclaimable and will not be on the global LRU +Some pages which are never reclaimable and will not be on the LRU are not accounted. We just account pages under usual VM management. RSS pages are accounted at page_fault unless they've already been accounted diff --git a/Documentation/clk.txt b/Documentation/clk.txt new file mode 100644 index 000000000000..1943fae014fd --- /dev/null +++ b/Documentation/clk.txt @@ -0,0 +1,233 @@ + The Common Clk Framework + Mike Turquette <mturquette@ti.com> + +This document endeavours to explain the common clk framework details, +and how to port a platform over to this framework. It is not yet a +detailed explanation of the clock api in include/linux/clk.h, but +perhaps someday it will include that information. + + Part 1 - introduction and interface split + +The common clk framework is an interface to control the clock nodes +available on various devices today. This may come in the form of clock +gating, rate adjustment, muxing or other operations. This framework is +enabled with the CONFIG_COMMON_CLK option. + +The interface itself is divided into two halves, each shielded from the +details of its counterpart. First is the common definition of struct +clk which unifies the framework-level accounting and infrastructure that +has traditionally been duplicated across a variety of platforms. Second +is a common implementation of the clk.h api, defined in +drivers/clk/clk.c. Finally there is struct clk_ops, whose operations +are invoked by the clk api implementation. + +The second half of the interface is comprised of the hardware-specific +callbacks registered with struct clk_ops and the corresponding +hardware-specific structures needed to model a particular clock. 
For +the remainder of this document any reference to a callback in struct +clk_ops, such as .enable or .set_rate, implies the hardware-specific +implementation of that code. Likewise, references to struct clk_foo +serve as a convenient shorthand for the implementation of the +hardware-specific bits for the hypothetical "foo" hardware. + +Tying the two halves of this interface together is struct clk_hw, which +is defined in struct clk_foo and pointed to within struct clk. This +allows for easy navigation between the two discrete halves of the common +clock interface. + + Part 2 - common data structures and api + +Below is the common struct clk definition from +include/linux/clk-private.h, modified for brevity: + + struct clk { + const char *name; + const struct clk_ops *ops; + struct clk_hw *hw; + char **parent_names; + struct clk **parents; + struct clk *parent; + struct hlist_head children; + struct hlist_node child_node; + ... + }; + +The members above make up the core of the clk tree topology. The clk +api itself defines several driver-facing functions which operate on +struct clk. That api is documented in include/linux/clk.h.
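Note: the way struct clk_hw ties struct clk to a hardware-specific struct clk_foo can be tried out in user space. The following is a minimal sketch under stated assumptions, not the kernel implementation: struct clk is cut down to three of the members shown above plus an enable_count, the contents of struct clk_hw are a placeholder, the reg and bit_idx fields and the model_clk_enable()/model_clk_disable() helpers are hypothetical names invented for this illustration, and container_of is spelled out with offsetof().

```c
#include <assert.h>
#include <stddef.h>

struct clk_hw;

/* Hardware-specific callbacks, reduced to two entries for this sketch. */
struct clk_ops {
	int (*enable)(struct clk_hw *hw);
	void (*disable)(struct clk_hw *hw);
};

/* Handle embedded in the hardware-specific structure (contents assumed). */
struct clk_hw {
	int unused;
};

/* Framework half: heavily trimmed model of struct clk. */
struct clk {
	const char *name;
	const struct clk_ops *ops;
	struct clk_hw *hw;
	unsigned int enable_count;
};

/* Hardware half: hypothetical "foo" clock gated by one bit of a register. */
struct clk_foo {
	struct clk_hw hw;
	unsigned int reg;	/* stand-in for a memory-mapped register */
	unsigned char bit_idx;
};

/* Open-coded container_of: step back from the member to its container. */
#define to_clk_foo(_hw) \
	((struct clk_foo *)((char *)(_hw) - offsetof(struct clk_foo, hw)))

int clk_foo_enable(struct clk_hw *hw)
{
	struct clk_foo *foo = to_clk_foo(hw);

	foo->reg |= 1u << foo->bit_idx;
	return 0;
}

void clk_foo_disable(struct clk_hw *hw)
{
	struct clk_foo *foo = to_clk_foo(hw);

	foo->reg &= ~(1u << foo->bit_idx);
}

const struct clk_ops clk_foo_ops = {
	.enable = clk_foo_enable,
	.disable = clk_foo_disable,
};

/* Hypothetical framework entry point: accounting here, hardware via ops. */
int model_clk_enable(struct clk *clk)
{
	int ret = 0;

	if (clk->enable_count == 0)
		ret = clk->ops->enable(clk->hw);
	if (ret == 0)
		clk->enable_count++;
	return ret;
}

void model_clk_disable(struct clk *clk)
{
	if (clk->enable_count > 0 && --clk->enable_count == 0)
		clk->ops->disable(clk->hw);
}
```

In this model the framework side only ever sees clk->ops and clk->hw, while the callbacks recover struct clk_foo from the embedded struct clk_hw, so neither half needs the other's layout.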
+ +Platforms and devices utilizing the common struct clk use the struct +clk_ops pointer in struct clk to perform the hardware-specific parts of +the operations defined in clk.h: + + struct clk_ops { + int (*prepare)(struct clk_hw *hw); + void (*unprepare)(struct clk_hw *hw); + int (*enable)(struct clk_hw *hw); + void (*disable)(struct clk_hw *hw); + int (*is_enabled)(struct clk_hw *hw); + unsigned long (*recalc_rate)(struct clk_hw *hw, + unsigned long parent_rate); + long (*round_rate)(struct clk_hw *hw, unsigned long, + unsigned long *); + int (*set_parent)(struct clk_hw *hw, u8 index); + u8 (*get_parent)(struct clk_hw *hw); + int (*set_rate)(struct clk_hw *hw, unsigned long); + void (*init)(struct clk_hw *hw); + }; + + Part 3 - hardware clk implementations + +The strength of the common struct clk comes from its .ops and .hw pointers +which abstract the details of struct clk from the hardware-specific bits, and +vice versa. To illustrate consider the simple gateable clk implementation in +drivers/clk/clk-gate.c: + +struct clk_gate { + struct clk_hw hw; + void __iomem *reg; + u8 bit_idx; + ... +}; + +struct clk_gate contains struct clk_hw hw as well as hardware-specific +knowledge about which register and bit controls this clk's gating. +Nothing about clock topology or accounting, such as enable_count or +notifier_count, is needed here. That is all handled by the common +framework code and struct clk. + +Let's walk through enabling this clk from driver code: + + struct clk *clk; + clk = clk_get(NULL, "my_gateable_clk"); + + clk_prepare(clk); + clk_enable(clk); + +The call graph for clk_enable is very simple: + +clk_enable(clk); + clk->ops->enable(clk->hw); + [resolves to...] 
+ clk_gate_enable(hw); + [resolves struct clk_gate with to_clk_gate(hw)] + clk_gate_set_bit(gate); + +And the definition of clk_gate_set_bit: + +static void clk_gate_set_bit(struct clk_gate *gate) +{ + u32 reg; + + reg = __raw_readl(gate->reg); + reg |= BIT(gate->bit_idx); + writel(reg, gate->reg); +} + +Note that to_clk_gate is defined as: + +#define to_clk_gate(_hw) container_of(_hw, struct clk_gate, hw) + +This pattern of abstraction is used for every clock hardware +representation. + + Part 4 - supporting your own clk hardware + +When implementing support for a new type of clock it is only necessary to +include the following header: + +#include <linux/clk-provider.h> + +include/linux/clk.h is included within that header and clk-private.h +must never be included from the code which implements the operations for +a clock. More on that below in Part 5. + +To construct a clk hardware structure for your platform you must define +the following: + +struct clk_foo { + struct clk_hw hw; + ... hardware specific data goes here ... +}; + +To take advantage of your data you'll need to support valid operations +for your clk: + +struct clk_ops clk_foo_ops = { + .enable = &clk_foo_enable, + .disable = &clk_foo_disable, +}; + +Implement the above functions using container_of: + +#define to_clk_foo(_hw) container_of(_hw, struct clk_foo, hw) + +int clk_foo_enable(struct clk_hw *hw) +{ + struct clk_foo *foo; + + foo = to_clk_foo(hw); + + ... perform magic on foo ... + + return 0; +} + +Below is a matrix detailing which clk_ops are mandatory based upon the +hardware capabilities of that clock. A cell marked as "y" means +mandatory, a cell marked as "n" implies that either including that +callback is invalid or otherwise unnecessary. Empty cells are either +optional or must be evaluated on a case-by-case basis.
+
+                             clock hardware characteristics
+             -----------------------------------------------------------
+             | gate | change rate | single parent | multiplexer | root |
+             |------|-------------|---------------|-------------|------|
+.prepare     |      |             |               |             |      |
+.unprepare   |      |             |               |             |      |
+             |      |             |               |             |      |
+.enable      | y    |             |               |             |      |
+.disable     | y    |             |               |             |      |
+.is_enabled  | y    |             |               |             |      |
+             |      |             |               |             |      |
+.recalc_rate |      | y           |               |             |      |
+.round_rate  |      | y           |               |             |      |
+.set_rate    |      | y           |               |             |      |
+             |      |             |               |             |      |
+.set_parent  |      |             | n             | y           | n    |
+.get_parent  |      |             | n             | y           | n    |
+             |      |             |               |             |      |
+.init        |      |             |               |             |      |
+             -----------------------------------------------------------
+
+Finally, register your clock at run-time with a hardware-specific +registration function. This function simply populates struct clk_foo's +data and then passes the common struct clk parameters to the framework +with a call to: + +clk_register(...) + +See the basic clock types in drivers/clk/clk-*.c for examples. + + Part 5 - static initialization of clock data + +For platforms with many clocks (often numbering into the hundreds) it +may be desirable to statically initialize some clock data. This +presents a problem since the definition of struct clk should be hidden +from everyone except for the clock core in drivers/clk/clk.c. + +To get around this problem struct clk's definition is exposed in +include/linux/clk-private.h along with some macros for more easily +initializing instances of the basic clock types. These clocks must +still be initialized with the common clock framework via a call to +__clk_init. + +clk-private.h must NEVER be included by code which implements struct +clk_ops callbacks, nor must it be included by any logic which pokes +around inside of struct clk at run-time. To do so is a layering +violation. + +To better enforce this policy, always follow this simple rule: any +statically initialized clock data MUST be defined in a separate file +from the logic that implements its ops.
Basically separate the logic +from the data and all is well. diff --git a/Documentation/cpu-hotplug.txt b/Documentation/cpu-hotplug.txt index a20bfd415e41..66ef8f35613d 100644 --- a/Documentation/cpu-hotplug.txt +++ b/Documentation/cpu-hotplug.txt @@ -47,7 +47,7 @@ maxcpus=n Restrict boot time cpus to n. Say if you have 4 cpus, using other cpus later online, read FAQ's for more info. additional_cpus=n (*) Use this to limit hotpluggable cpus. This option sets - cpu_possible_map = cpu_present_map + additional_cpus + cpu_possible_mask = cpu_present_mask + additional_cpus cede_offline={"off","on"} Use this option to disable/enable putting offlined processors to an extended H_CEDE state on @@ -64,11 +64,11 @@ should only rely on this to count the # of cpus, but *MUST* not rely on the apicid values in those tables for disabled apics. In the event BIOS doesn't mark such hot-pluggable cpus as disabled entries, one could use this parameter "additional_cpus=x" to represent those cpus in the -cpu_possible_map. +cpu_possible_mask. possible_cpus=n [s390,x86_64] use this to set hotpluggable cpus. This option sets possible_cpus bits in - cpu_possible_map. Thus keeping the numbers of bits set + cpu_possible_mask. Thus keeping the numbers of bits set constant even if the machine gets rebooted. CPU maps and such @@ -76,7 +76,7 @@ CPU maps and such [More on cpumaps and primitive to manipulate, please check include/linux/cpumask.h that has more descriptive text.] -cpu_possible_map: Bitmap of possible CPUs that can ever be available in the +cpu_possible_mask: Bitmap of possible CPUs that can ever be available in the system. This is used to allocate some boot time memory for per_cpu variables that aren't designed to grow/shrink as CPUs are made available or removed. Once set during boot time discovery phase, the map is static, i.e no bits @@ -84,13 +84,13 @@ are added or removed anytime. Trimming it accurately for your system needs upfront can save some boot time memory. 
See below for how we use heuristics in x86_64 case to keep this under check. -cpu_online_map: Bitmap of all CPUs currently online. Its set in __cpu_up() +cpu_online_mask: Bitmap of all CPUs currently online. Its set in __cpu_up() after a cpu is available for kernel scheduling and ready to receive interrupts from devices. Its cleared when a cpu is brought down using __cpu_disable(), before which all OS services including interrupts are migrated to another target CPU. -cpu_present_map: Bitmap of CPUs currently present in the system. Not all +cpu_present_mask: Bitmap of CPUs currently present in the system. Not all of them may be online. When physical hotplug is processed by the relevant subsystem (e.g ACPI) can change and new bit either be added or removed from the map depending on the event is hot-add/hot-remove. There are currently @@ -99,22 +99,22 @@ at which time hotplug is disabled. You really dont need to manipulate any of the system cpu maps. They should be read-only for most use. When setting up per-cpu resources almost always use -cpu_possible_map/for_each_possible_cpu() to iterate. +cpu_possible_mask/for_each_possible_cpu() to iterate. Never use anything other than cpumask_t to represent bitmap of CPUs. #include <linux/cpumask.h> - for_each_possible_cpu - Iterate over cpu_possible_map - for_each_online_cpu - Iterate over cpu_online_map - for_each_present_cpu - Iterate over cpu_present_map + for_each_possible_cpu - Iterate over cpu_possible_mask + for_each_online_cpu - Iterate over cpu_online_mask + for_each_present_cpu - Iterate over cpu_present_mask for_each_cpu_mask(x,mask) - Iterate over some random collection of cpu mask. #include <linux/cpu.h> get_online_cpus() and put_online_cpus(): The above calls are used to inhibit cpu hotplug operations. While the -cpu_hotplug.refcount is non zero, the cpu_online_map will not change. +cpu_hotplug.refcount is non zero, the cpu_online_mask will not change. 
If you merely need to avoid cpus going away, you could also use preempt_disable() and preempt_enable() for those sections. Just remember the critical section cannot call any diff --git a/Documentation/cpuidle/sysfs.txt b/Documentation/cpuidle/sysfs.txt index 50d7b1642759..9d28a3406e74 100644 --- a/Documentation/cpuidle/sysfs.txt +++ b/Documentation/cpuidle/sysfs.txt @@ -36,6 +36,7 @@ drwxr-xr-x 2 root root 0 Feb 8 10:42 state3 /sys/devices/system/cpu/cpu0/cpuidle/state0: total 0 -r--r--r-- 1 root root 4096 Feb 8 10:42 desc +-rw-r--r-- 1 root root 4096 Feb 8 10:42 disable -r--r--r-- 1 root root 4096 Feb 8 10:42 latency -r--r--r-- 1 root root 4096 Feb 8 10:42 name -r--r--r-- 1 root root 4096 Feb 8 10:42 power @@ -45,6 +46,7 @@ total 0 /sys/devices/system/cpu/cpu0/cpuidle/state1: total 0 -r--r--r-- 1 root root 4096 Feb 8 10:42 desc +-rw-r--r-- 1 root root 4096 Feb 8 10:42 disable -r--r--r-- 1 root root 4096 Feb 8 10:42 latency -r--r--r-- 1 root root 4096 Feb 8 10:42 name -r--r--r-- 1 root root 4096 Feb 8 10:42 power @@ -54,6 +56,7 @@ total 0 /sys/devices/system/cpu/cpu0/cpuidle/state2: total 0 -r--r--r-- 1 root root 4096 Feb 8 10:42 desc +-rw-r--r-- 1 root root 4096 Feb 8 10:42 disable -r--r--r-- 1 root root 4096 Feb 8 10:42 latency -r--r--r-- 1 root root 4096 Feb 8 10:42 name -r--r--r-- 1 root root 4096 Feb 8 10:42 power @@ -63,6 +66,7 @@ total 0 /sys/devices/system/cpu/cpu0/cpuidle/state3: total 0 -r--r--r-- 1 root root 4096 Feb 8 10:42 desc +-rw-r--r-- 1 root root 4096 Feb 8 10:42 disable -r--r--r-- 1 root root 4096 Feb 8 10:42 latency -r--r--r-- 1 root root 4096 Feb 8 10:42 name -r--r--r-- 1 root root 4096 Feb 8 10:42 power @@ -72,6 +76,7 @@ total 0 * desc : Small description about the idle state (string) +* disable : Option to disable this idle state (bool) * latency : Latency to exit out of this idle state (in microseconds) * name : Name of the idle state (string) * power : Power consumed while in this idle state (in milliwatts) diff --git 
a/Documentation/crc32.txt b/Documentation/crc32.txt new file mode 100644 index 000000000000..a08a7dd9d625 --- /dev/null +++ b/Documentation/crc32.txt @@ -0,0 +1,182 @@ +A brief CRC tutorial. + +A CRC is a long-division remainder. You add the CRC to the message, +and the whole thing (message+CRC) is a multiple of the given +CRC polynomial. To check the CRC, you can either check that the +CRC matches the recomputed value, *or* you can check that the +remainder computed on the message+CRC is 0. This latter approach +is used by a lot of hardware implementations, and is why so many +protocols put the end-of-frame flag after the CRC. + +It's actually the same long division you learned in school, except that +- We're working in binary, so the digits are only 0 and 1, and +- When dividing polynomials, there are no carries. Rather than add and + subtract, we just xor. Thus, we tend to get a bit sloppy about + the difference between adding and subtracting. + +Like all division, the remainder is always smaller than the divisor. +To produce a 32-bit CRC, the divisor is actually a 33-bit CRC polynomial. +Since it's 33 bits long, bit 32 is always going to be set, so usually the +CRC is written in hex with the most significant bit omitted. (If you're +familiar with the IEEE 754 floating-point format, it's the same idea.) + +Note that a CRC is computed over a string of *bits*, so you have +to decide on the endianness of the bits within each byte. To get +the best error-detecting properties, this should correspond to the +order they're actually sent. For example, standard RS-232 serial is +little-endian; the most significant bit (sometimes used for parity) +is sent last. And when appending a CRC word to a message, you should +do it in the right order, matching the endianness. + +Just like with ordinary division, you proceed one digit (bit) at a time. +Each step of the division you take one more digit (bit) of the dividend +and append it to the current remainder. 
Then you figure out the +appropriate multiple of the divisor to subtract to bring the remainder +back into range. In binary, this is easy - it has to be either 0 or 1, +and to make the XOR cancel, it's just a copy of bit 32 of the remainder. + +When computing a CRC, we don't care about the quotient, so we can +throw the quotient bit away, but subtract the appropriate multiple of +the polynomial from the remainder and we're back to where we started, +ready to process the next bit. + +A big-endian CRC written this way would be coded like: +for (i = 0; i < input_bits; i++) { + multiple = remainder & 0x80000000 ? CRCPOLY : 0; + remainder = (remainder << 1 | next_input_bit()) ^ multiple; +} + +Notice how, to get at bit 32 of the shifted remainder, we look +at bit 31 of the remainder *before* shifting it. + +But also notice how the next_input_bit() bits we're shifting into +the remainder don't actually affect any decision-making until +32 bits later. Thus, the first 32 cycles of this are pretty boring. +Also, to add the CRC to a message, we need a 32-bit-long hole for it at +the end, so we have to add 32 extra cycles shifting in zeros at the +end of every message. + +These details lead to a standard trick: delay merging in the +next_input_bit() until the moment it's needed. Then the first 32 cycles +can be precomputed, and merging in the final 32 zero bits to make room +for the CRC can be skipped entirely. This changes the code to: + +for (i = 0; i < input_bits; i++) { + remainder ^= next_input_bit() << 31; + multiple = (remainder & 0x80000000) ? CRCPOLY : 0; + remainder = (remainder << 1) ^ multiple; +} + +With this optimization, the little-endian code is particularly simple: +for (i = 0; i < input_bits; i++) { + remainder ^= next_input_bit(); + multiple = (remainder & 1) ?
CRCPOLY : 0; + remainder = (remainder >> 1) ^ multiple; +} + +The most significant coefficient of the remainder polynomial is stored +in the least significant bit of the binary "remainder" variable. +The other details of endianness have been hidden in CRCPOLY (which must +be bit-reversed) and next_input_bit(). + +As long as next_input_bit is returning the bits in a sensible order, we don't +*have* to wait until the last possible moment to merge in additional bits. +We can do it 8 bits at a time rather than 1 bit at a time: +for (i = 0; i < input_bytes; i++) { + remainder ^= next_input_byte() << 24; + for (j = 0; j < 8; j++) { + multiple = (remainder & 0x80000000) ? CRCPOLY : 0; + remainder = (remainder << 1) ^ multiple; + } +} + +Or in little-endian: +for (i = 0; i < input_bytes; i++) { + remainder ^= next_input_byte(); + for (j = 0; j < 8; j++) { + multiple = (remainder & 1) ? CRCPOLY : 0; + remainder = (remainder >> 1) ^ multiple; + } +} + +If the input is a multiple of 32 bits, you can even XOR in a 32-bit +word at a time and increase the inner loop count to 32. + +You can also mix and match the two loop styles, for example doing the +bulk of a message byte-at-a-time and adding bit-at-a-time processing +for any fractional bytes at the end. + +To reduce the number of conditional branches, software commonly uses +the byte-at-a-time table method, popularized by Dilip V. Sarwate, +"Computation of Cyclic Redundancy Checks via Table Look-Up", Comm. ACM +v.31 no.8 (August 1988) p. 1008-1013. + +Here, rather than just shifting one bit of the remainder to decide +on the correct multiple to subtract, we can shift a byte at a time. +This produces a 40-bit (rather than a 33-bit) intermediate remainder, +and the correct multiple of the polynomial to subtract is found using +a 256-entry lookup table indexed by the high 8 bits. + +(The table entries are simply the CRC-32 of the given one-byte messages.) + +When space is more constrained, smaller tables can be used, e.g.
two +4-bit shifts followed by a lookup in a 16-entry table. + +It is not practical to process much more than 8 bits at a time using this +technique, because tables larger than 256 entries use too much memory and, +more importantly, too much of the L1 cache. + +To get higher software performance, a "slicing" technique can be used. +See "High Octane CRC Generation with the Intel Slicing-by-8 Algorithm", +ftp://download.intel.com/technology/comms/perfnet/download/slicing-by-8.pdf + +This does not change the number of table lookups, but does increase +the parallelism. With the classic Sarwate algorithm, each table lookup +must be completed before the index of the next can be computed. + +A "slicing by 2" technique would shift the remainder 16 bits at a time, +producing a 48-bit intermediate remainder. Rather than doing a single +lookup in a 65536-entry table, the two high bytes are looked up in +two different 256-entry tables. Each contains the remainder required +to cancel out the corresponding byte. The tables are different because the +polynomials to cancel are different. One has non-zero coefficients from +x^32 to x^39, while the other goes from x^40 to x^47. + +Since modern processors can handle many parallel memory operations, this +takes barely longer than a single table look-up and thus performs almost +twice as fast as the basic Sarwate algorithm. + +This can be extended to "slicing by 4" using 4 256-entry tables. +Each step, 32 bits of data is fetched, XORed with the CRC, and the result +broken into bytes and looked up in the tables. Because the 32-bit shift +leaves the low-order bits of the intermediate remainder zero, the +final CRC is simply the XOR of the 4 table look-ups. + +But this still enforces sequential execution: a second group of table +look-ups cannot begin until the previous group's 4 table look-ups have all +been completed. Thus, the processor's load/store unit is sometimes idle.
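The table method and "slicing by 4" described above can be sketched in C
roughly as follows. This is an illustration only, not the kernel's
lib/crc32.c: the function and table names are made up for this example, the
little-endian (bit-reversed) CRC-32 polynomial 0xedb88320 is assumed, and the
tables are built lazily at run time to keep the sketch self-contained.

```c
#include <stddef.h>
#include <stdint.h>

#define CRCPOLY_LE 0xedb88320u	/* assumed: bit-reversed CRC-32 polynomial */

static uint32_t tab[4][256];	/* hypothetical names, for illustration only */
static int tab_ready;

static void crc32_build_tables(void)
{
	/* tab[0][i] is the CRC of the one-byte message i (Sarwate table). */
	for (uint32_t i = 0; i < 256; i++) {
		uint32_t r = i;
		for (int j = 0; j < 8; j++)
			r = (r >> 1) ^ ((r & 1) ? CRCPOLY_LE : 0);
		tab[0][i] = r;
	}
	/* tab[k][i] is tab[k-1][i] pushed through one more zero byte. */
	for (int k = 1; k < 4; k++)
		for (uint32_t i = 0; i < 256; i++)
			tab[k][i] = (tab[k - 1][i] >> 8) ^
				    tab[0][tab[k - 1][i] & 0xff];
	tab_ready = 1;
}

/* Reference: the bit-at-a-time little-endian loop from the text. */
uint32_t crc32_le_bitwise(uint32_t crc, const unsigned char *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int j = 0; j < 8; j++)
			crc = (crc >> 1) ^ ((crc & 1) ? CRCPOLY_LE : 0);
	}
	return crc;
}

/* Slicing by 4, with a byte-at-a-time (Sarwate) loop for the tail. */
uint32_t crc32_le_slice4(uint32_t crc, const unsigned char *p, size_t len)
{
	if (!tab_ready)
		crc32_build_tables();

	for (; len >= 4; p += 4, len -= 4) {
		/* XOR 32 bits of input into the CRC ... */
		crc ^= (uint32_t)p[0] | (uint32_t)p[1] << 8 |
		       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
		/* ... then the new CRC is the XOR of 4 independent look-ups. */
		crc = tab[3][crc & 0xff] ^ tab[2][(crc >> 8) & 0xff] ^
		      tab[1][(crc >> 16) & 0xff] ^ tab[0][crc >> 24];
	}
	while (len--)
		crc = (crc >> 8) ^ tab[0][(crc ^ *p++) & 0xff];
	return crc;
}
```

Both functions take and return the raw remainder. Starting from an initial
remainder of all ones and inverting the result gives the usual CRC-32 value
for a message.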
+ +To make maximum use of the processor, "slicing by 8" performs 8 look-ups +in parallel. Each step, the 32-bit CRC is shifted 64 bits and XORed +with 64 bits of input data. What is important to note is that 4 of +those 8 bytes are simply copies of the input data; they do not depend +on the previous CRC at all. Thus, those 4 table look-ups may commence +immediately, without waiting for the previous loop iteration. + +By always having 4 loads in flight, a modern superscalar processor can +be kept busy and make full use of its L1 cache. + +Two more details about CRC implementation in the real world: + +Normally, appending zero bits to a message which is already a multiple +of a polynomial produces a larger multiple of that polynomial. Thus, +a basic CRC will not detect appended zero bits (or bytes). To enable +a CRC to detect this condition, it's common to invert the CRC before +appending it. This makes the remainder of the message+crc come out not +as zero, but some fixed non-zero value. (The CRC of the inversion +pattern, 0xffffffff.) + +The same problem applies to zero bits prepended to the message, and a +similar solution is used. Instead of starting the CRC computation with +a remainder of 0, an initial remainder of all ones is used. As long as +you start the same way on decoding, it doesn't make a difference. diff --git a/Documentation/device-mapper/dm-raid.txt b/Documentation/device-mapper/dm-raid.txt index 2a8c11331d2d..946c73342cde 100644 --- a/Documentation/device-mapper/dm-raid.txt +++ b/Documentation/device-mapper/dm-raid.txt @@ -28,7 +28,7 @@ The target is named "raid" and it accepts the following parameters: raid6_nc RAID6 N continue - rotating parity N (right-to-left) with data continuation - Refererence: Chapter 4 of + Reference: Chapter 4 of http://www.snia.org/sites/default/files/SNIA_DDF_Technical_Position_v2.0.pdf <#raid_params>: The number of parameters that follow. 
diff --git a/Documentation/device-mapper/persistent-data.txt b/Documentation/device-mapper/persistent-data.txt index 0e5df9b04ad2..a333bcb3a6c2 100644 --- a/Documentation/device-mapper/persistent-data.txt +++ b/Documentation/device-mapper/persistent-data.txt @@ -3,7 +3,7 @@ Introduction The more-sophisticated device-mapper targets require complex metadata that is managed in kernel. In late 2010 we were seeing that various -different targets were rolling their own data strutures, for example: +different targets were rolling their own data structures, for example: - Mikulas Patocka's multisnap implementation - Heinz Mauelshagen's thin provisioning target diff --git a/Documentation/device-mapper/thin-provisioning.txt b/Documentation/device-mapper/thin-provisioning.txt index 801d9d1cf82b..3370bc4d7b98 100644 --- a/Documentation/device-mapper/thin-provisioning.txt +++ b/Documentation/device-mapper/thin-provisioning.txt @@ -1,7 +1,7 @@ Introduction ============ -This document descibes a collection of device-mapper targets that +This document describes a collection of device-mapper targets that between them implement thin-provisioning and snapshots. The main highlight of this implementation, compared to the previous @@ -75,10 +75,12 @@ less sharing than average you'll need a larger-than-average metadata device. As a guide, we suggest you calculate the number of bytes to use in the metadata device as 48 * $data_dev_size / $data_block_size but round it up -to 2MB if the answer is smaller. The largest size supported is 16GB. +to 2MB if the answer is smaller. If you're creating large numbers of +snapshots which are recording large amounts of change, you may find you +need to increase this. -If you're creating large numbers of snapshots which are recording large -amounts of change, you may need find you need to increase this. +The largest size supported is 16GB: If the device is larger, +a warning will be issued and the excess space will not be used. 
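The sizing guideline above can be worked through with a small helper. This is
only a sketch: the function name is hypothetical, 2MB and 16GB are taken as
binary units, and clamping to 16GB is an assumption made here because space
beyond that limit is not used.

```c
#include <stdint.h>

#define THIN_META_MIN_BYTES (2ULL << 20)	/* 2MB lower bound from the text */
#define THIN_META_MAX_BYTES (16ULL << 30)	/* 16GB largest supported size */

/*
 * Hypothetical helper, not part of any tool. Both arguments must use the
 * same unit (e.g. both in bytes or both in 512-byte sectors).
 */
uint64_t thin_metadata_bytes(uint64_t data_dev_size, uint64_t data_block_size)
{
	/* Divide first so that 48 * size cannot overflow on huge devices. */
	uint64_t bytes = 48 * (data_dev_size / data_block_size);

	if (bytes < THIN_META_MIN_BYTES)
		bytes = THIN_META_MIN_BYTES;
	if (bytes > THIN_META_MAX_BYTES)
		bytes = THIN_META_MAX_BYTES;	/* assumption: excess is unused */
	return bytes;
}
```

For example, a 1TB data device with 64KB data blocks has 2^24 blocks, so the
guideline gives 48 * 2^24 = 805306368 bytes (768MB) of metadata.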
Reloading a pool table ---------------------- @@ -167,6 +169,38 @@ ii) Using an internal snapshot. dmsetup create snap --table "0 2097152 thin /dev/mapper/pool 1" +External snapshots +------------------ + +You can use an external _read only_ device as an origin for a +thinly-provisioned volume. Any read to an unprovisioned area of the +thin device will be passed through to the origin. Writes trigger +the allocation of new blocks as usual. + +One use case for this is VM hosts that want to run guests on +thinly-provisioned volumes but have the base image on another device +(possibly shared between many VMs). + +You must not write to the origin device if you use this technique! +Of course, you may write to the thin device and take internal snapshots +of the thin volume. + +i) Creating a snapshot of an external device + + This is the same as creating a thin device. + You don't mention the origin at this stage. + + dmsetup message /dev/mapper/pool 0 "create_thin 0" + +ii) Using a snapshot of an external device. + + Append an extra parameter to the thin target specifying the origin: + + dmsetup create snap --table "0 2097152 thin /dev/mapper/pool 0 /dev/image" + + N.B. All descendants (internal snapshots) of this snapshot require the + same extra origin parameter. + Deactivation ------------ @@ -189,7 +223,13 @@ i) Constructor <low water mark (blocks)> [<number of feature args> [<arg>]*] Optional feature arguments: - - 'skip_block_zeroing': skips the zeroing of newly-provisioned blocks. + + skip_block_zeroing: Skip the zeroing of newly-provisioned blocks. + + ignore_discard: Disable discard support. + + no_discard_passdown: Don't pass discards down to the underlying + data device, but just remove the mapping. Data block size must be between 64KB (128 sectors) and 1GB (2097152 sectors) inclusive. @@ -237,16 +277,6 @@ iii) Messages Deletes a thin device. Irreversible. - trim <dev id> <new size in sectors> - - Delete mappings from the end of a thin device. Irreversible. 
- You might want to use this if you're reducing the size of - your thinly-provisioned device. In many cases, due to the - sharing of blocks between devices, it is not possible to - determine in advance how much space 'trim' will release. (In - future a userspace tool might be able to perform this - calculation.) - set_transaction_id <current id> <new id> Userland volume managers, such as LVM, need a way to @@ -262,7 +292,7 @@ iii) Messages i) Constructor - thin <pool dev> <dev id> + thin <pool dev> <dev id> [<external origin dev>] pool dev: the thin-pool device, e.g. /dev/mapper/my_pool or 253:0 @@ -271,6 +301,11 @@ i) Constructor the internal device identifier of the device to be activated. + external origin dev: + an optional block device outside the pool to be treated as a + read-only snapshot origin: reads to unprovisioned areas of the + thin target will be mapped to this device. + The pool doesn't store any size against the thin devices. If you load a thin target that is smaller than you've been using previously, then you'll have no access to blocks mapped beyond the end. If you diff --git a/Documentation/device-mapper/verity.txt b/Documentation/device-mapper/verity.txt new file mode 100644 index 000000000000..32e48797a14f --- /dev/null +++ b/Documentation/device-mapper/verity.txt @@ -0,0 +1,194 @@ +dm-verity +========== + +Device-Mapper's "verity" target provides transparent integrity checking of +block devices using a cryptographic digest provided by the kernel crypto API. +This target is read-only. + +Construction Parameters +======================= + <version> <dev> <hash_dev> <hash_start> + <data_block_size> <hash_block_size> + <num_data_blocks> <hash_start_block> + <algorithm> <digest> <salt> + +<version> + This is the version number of the on-disk format. + + 0 is the original format used in the Chromium OS. + The salt is appended when hashing, digests are stored continuously and + the rest of the block is padded with zeros. 
+ + 1 is the current format that should be used for new devices. + The salt is prepended when hashing and each digest is + padded with zeros to the power of two. + +<dev> + This is the device containing the data the integrity of which needs to be + checked. It may be specified as a path, like /dev/sdaX, or a device number, + <major>:<minor>. + +<hash_dev> + This is the device that supplies the hash tree data. It may be + specified similarly to the device path and may be the same device. If the + same device is used, the hash_start should be outside of the dm-verity + configured device size. + +<data_block_size> + The block size on a data device. Each block corresponds to one digest on + the hash device. + +<hash_block_size> + The size of a hash block. + +<num_data_blocks> + The number of data blocks on the data device. Additional blocks are + inaccessible. You can place hashes on the same partition as the data; in this + case hashes are placed after <num_data_blocks>. + +<hash_start_block> + This is the offset, in <hash_block_size>-blocks, from the start of hash_dev + to the root block of the hash tree. + +<algorithm> + The cryptographic hash algorithm used for this device. This should + be the name of the algorithm, like "sha1". + +<digest> + The hexadecimal encoding of the cryptographic hash of the root hash block + and the salt. This hash should be trusted as there is no other authenticity + beyond this point. + +<salt> + The hexadecimal encoding of the salt value. + +Theory of operation +=================== + +dm-verity is meant to be set up as part of a verified boot path. This +may be anything ranging from a boot using tboot or trustedgrub to just +booting from a known-good device (like a USB drive or CD). + +When a dm-verity device is configured, it is expected that the caller +has been authenticated in some way (cryptographic signatures, etc). +After instantiation, all hashes will be verified on-demand during +disk access.
If they cannot be verified up to the root node of the +tree, the root hash, then the I/O will fail. This should identify +tampering with any data on the device and the hash data. + +Cryptographic hashes are used to assert the integrity of the device on a +per-block basis. This allows for a lightweight hash computation on first read +into the page cache. Block hashes are stored linearly-aligned to the nearest +block the size of a page. + +Hash Tree +--------- + +Each node in the tree is a cryptographic hash. If it is a leaf node, the hash +is of some block data on disk. If it is an intermediary node, then the hash is +of a number of child nodes. + +Each entry in the tree is a collection of neighboring nodes that fit in one +block. The number is determined based on block_size and the size of the +selected cryptographic digest algorithm. The hashes are linearly-ordered in +this entry and any unaligned trailing space is ignored but included when +calculating the parent node. + +The tree looks something like: + +alg = sha256, num_blocks = 32768, block_size = 4096 + + [ root ] + / . . . \ + [entry_0] [entry_1] + / . . . \ . . . \ + [entry_0_0] . . . [entry_0_127] . . . . [entry_1_127] + / ... \ / . . . \ / \ + blk_0 ... blk_127 blk_16256 blk_16383 blk_32640 . . . blk_32767 + + +On-disk format +============== + +Below is the recommended on-disk format. The verity kernel code does not +read the on-disk header. It only reads the hash blocks which directly +follow the header. It is expected that a user-space tool will verify the +integrity of the verity_header and then call dmsetup with the correct +parameters. Alternatively, the header can be omitted and the dmsetup +parameters can be passed via the kernel command-line in a rooted chain +of trust where the command-line is verified. + +The on-disk format is especially useful in cases where the hash blocks +are on a separate partition. The magic number allows easy identification +of the partition contents. 
Alternatively, the hash blocks can be stored +in the same partition as the data to be verified. In such a configuration +the filesystem on the partition would be sized a little smaller than +the full-partition, leaving room for the hash blocks. + +struct superblock { + uint8_t signature[8] + "verity\0\0"; + + uint8_t version; + 1 - current format + + uint8_t data_block_bits; + log2(data block size) + + uint8_t hash_block_bits; + log2(hash block size) + + uint8_t pad1[1]; + zero padding + + uint16_t salt_size; + big-endian salt size + + uint8_t pad2[2]; + zero padding + + uint32_t data_blocks_hi; + big-endian high 32 bits of the 64-bit number of data blocks + + uint32_t data_blocks_lo; + big-endian low 32 bits of the 64-bit number of data blocks + + uint8_t algorithm[16]; + cryptographic algorithm + + uint8_t salt[384]; + salt (the salt size is specified above) + + uint8_t pad3[88]; + zero padding to 512-byte boundary +} + +Directly following the header (and with sector number padded to the next hash +block boundary) are the hash blocks which are stored a depth at a time +(starting from the root), sorted in order of increasing index. + +Status +====== +V (for Valid) is returned if every check performed so far was valid. +If any check failed, C (for Corruption) is returned. + +Example +======= + +Setup a device: + dmsetup create vroot --table \ + "0 2097152 "\ + "verity 1 /dev/sda1 /dev/sda2 4096 4096 2097152 1 "\ + "4392712ba01368efdf14b05c76f9e4df0d53664630b5d48632ed17a137f39076 "\ + "1234000000000000000000000000000000000000000000000000000000000000" + +A command line tool veritysetup is available to compute or verify +the hash tree or activate the kernel driver. 
This is available from +the LVM2 upstream repository and may be supplied as a package called +device-mapper-verity-tools: + git://sources.redhat.com/git/lvm2 + http://sourceware.org/git/?p=lvm2.git + http://sourceware.org/cgi-bin/cvsweb.cgi/LVM2/verity?cvsroot=lvm2 + +veritysetup -a vroot /dev/sda1 /dev/sda2 \ + 4392712ba01368efdf14b05c76f9e4df0d53664630b5d48632ed17a137f39076 diff --git a/Documentation/devicetree/bindings/arm/atmel-aic.txt b/Documentation/devicetree/bindings/arm/atmel-aic.txt new file mode 100644 index 000000000000..aabca4f83402 --- /dev/null +++ b/Documentation/devicetree/bindings/arm/atmel-aic.txt @@ -0,0 +1,38 @@ +* Advanced Interrupt Controller (AIC) + +Required properties: +- compatible: Should be "atmel,<chip>-aic" +- interrupt-controller: Identifies the node as an interrupt controller. +- interrupt-parent: For single AIC system, it is an empty property. +- #interrupt-cells: The number of cells to define the interrupts. It should be 2. + The first cell is the IRQ number (aka "Peripheral IDentifier" on datasheet). + The second cell is used to specify flags: + bits[3:0] trigger type and level flags: + 1 = low-to-high edge triggered. + 2 = high-to-low edge triggered. + 4 = active high level-sensitive. + 8 = active low level-sensitive. + Valid combinations are 1, 2, 3, 4, 8. + Default flag for internal sources should be set to 4 (active high). +- reg: Should contain AIC registers location and length + +Examples: + /* + * AIC + */ + aic: interrupt-controller@fffff000 { + compatible = "atmel,at91rm9200-aic"; + interrupt-controller; + interrupt-parent; + #interrupt-cells = <2>; + reg = <0xfffff000 0x200>; + }; + + /* + * An interrupt generating device that is wired to an AIC.
+ */ + dma: dma-controller@ffffec00 { + compatible = "atmel,at91sam9g45-dma"; + reg = <0xffffec00 0x200>; + interrupts = <21 4>; + }; diff --git a/Documentation/devicetree/bindings/arm/atmel-at91.txt b/Documentation/devicetree/bindings/arm/atmel-at91.txt new file mode 100644 index 000000000000..ecc81e368715 --- /dev/null +++ b/Documentation/devicetree/bindings/arm/atmel-at91.txt @@ -0,0 +1,92 @@ +Atmel AT91 device tree bindings. +================================ + +PIT Timer required properties: +- compatible: Should be "atmel,at91sam9260-pit" +- reg: Should contain registers location and length +- interrupts: Should contain interrupt for the PIT which is the IRQ line + shared across all System Controller members. + +TC/TCLIB Timer required properties: +- compatible: Should be "atmel,<chip>-tcb". + <chip> can be "at91rm9200" or "at91sam9x5" +- reg: Should contain registers location and length +- interrupts: Should contain all interrupts for the TC block + Note that you can specify several interrupt cells if the TC + block has one interrupt per channel. + +Examples: + +One interrupt per TC block: + tcb0: timer@fff7c000 { + compatible = "atmel,at91rm9200-tcb"; + reg = <0xfff7c000 0x100>; + interrupts = <18 4>; + }; + +One interrupt per TC channel in a TC block: + tcb1: timer@fffdc000 { + compatible = "atmel,at91rm9200-tcb"; + reg = <0xfffdc000 0x100>; + interrupts = <26 4 27 4 28 4>; + }; + +RSTC Reset Controller required properties: +- compatible: Should be "atmel,<chip>-rstc". + <chip> can be "at91sam9260" or "at91sam9g45" +- reg: Should contain registers location and length + +Example: + + rstc@fffffd00 { + compatible = "atmel,at91sam9260-rstc"; + reg = <0xfffffd00 0x10>; + }; + +RAMC SDRAM/DDR Controller required properties: +- compatible: Should be "atmel,at91sam9260-sdramc", + "atmel,at91sam9g45-ddramc", +- reg: Should contain registers location and length + For at91sam9263 and at91sam9g45 you must specify 2 entries.
+ +Examples: + + ramc0: ramc@ffffe800 { + compatible = "atmel,at91sam9g45-ddramc"; + reg = <0xffffe800 0x200>; + }; + + ramc0: ramc@ffffe400 { + compatible = "atmel,at91sam9g45-ddramc"; + reg = <0xffffe400 0x200 + 0xffffe600 0x200>; + }; + +SHDWC Shutdown Controller + +required properties: +- compatible: Should be "atmel,<chip>-shdwc". + <chip> can be "at91sam9260", "at91sam9rl" or "at91sam9x5". +- reg: Should contain registers location and length + +optional properties: +- atmel,wakeup-mode: String, operation mode of the wakeup mode. + Supported values are: "none", "high", "low", "any". +- atmel,wakeup-counter: Counter on Wake-up 0 (between 0x0 and 0xf). + +optional at91sam9260 properties: +- atmel,wakeup-rtt-timer: boolean to enable Real-time Timer Wake-up. + +optional at91sam9rl properties: +- atmel,wakeup-rtc-timer: boolean to enable Real-time Clock Wake-up. +- atmel,wakeup-rtt-timer: boolean to enable Real-time Timer Wake-up. + +optional at91sam9x5 properties: +- atmel,wakeup-rtc-timer: boolean to enable Real-time Clock Wake-up. 
+ +Example: + + rstc@fffffd00 { + compatible = "atmel,at91sam9260-rstc"; + reg = <0xfffffd00 0x10>; + }; diff --git a/Documentation/devicetree/bindings/arm/atmel-pmc.txt b/Documentation/devicetree/bindings/arm/atmel-pmc.txt new file mode 100644 index 000000000000..389bed5056e8 --- /dev/null +++ b/Documentation/devicetree/bindings/arm/atmel-pmc.txt @@ -0,0 +1,11 @@ +* Power Management Controller (PMC) + +Required properties: +- compatible: Should be "atmel,at91rm9200-pmc" +- reg: Should contain PMC registers location and length + +Examples: + pmc: pmc@fffffc00 { + compatible = "atmel,at91rm9200-pmc"; + reg = <0xfffffc00 0x100>; + }; diff --git a/Documentation/devicetree/bindings/arm/exynos/power_domain.txt b/Documentation/devicetree/bindings/arm/exynos/power_domain.txt new file mode 100644 index 000000000000..6528e215c5fe --- /dev/null +++ b/Documentation/devicetree/bindings/arm/exynos/power_domain.txt @@ -0,0 +1,21 @@ +* Samsung Exynos Power Domains + +Exynos processors include support for multiple power domains which are used +to gate power to one or more peripherals on the processor. + +Required Properties: +- compatible: should be one of the following. + * samsung,exynos4210-pd - for exynos4210 type power domain. +- reg: physical base address of the controller and length of memory mapped + region. + +Optional Properties: +- samsung,exynos4210-pd-off: Specifies that the power domain is in the turned-off + state during boot and remains turned off until explicitly turned on.
+ +Example: + + lcd0: power-domain-lcd0 { + compatible = "samsung,exynos4210-pd"; + reg = <0x10023C00 0x10>; + }; diff --git a/Documentation/devicetree/bindings/arm/fsl.txt b/Documentation/devicetree/bindings/arm/fsl.txt index 54bdddadf1cf..bfbc771a65f8 100644 --- a/Documentation/devicetree/bindings/arm/fsl.txt +++ b/Documentation/devicetree/bindings/arm/fsl.txt @@ -28,3 +28,25 @@ Required root node properties: i.MX6 Quad SABRE Lite Board Required root node properties: - compatible = "fsl,imx6q-sabrelite", "fsl,imx6q"; + +Generic i.MX boards +------------------- + +No iomux setup is done for these boards, so this must have been configured +by the bootloader for boards to work with the generic bindings. + +i.MX27 generic board +Required root node properties: + - compatible = "fsl,imx27"; + +i.MX51 generic board +Required root node properties: + - compatible = "fsl,imx51"; + +i.MX53 generic board +Required root node properties: + - compatible = "fsl,imx53"; + +i.MX6q generic board +Required root node properties: + - compatible = "fsl,imx6q"; diff --git a/Documentation/devicetree/bindings/arm/mrvl.txt b/Documentation/devicetree/bindings/arm/mrvl.txt new file mode 100644 index 000000000000..d8de933e9d81 --- /dev/null +++ b/Documentation/devicetree/bindings/arm/mrvl.txt @@ -0,0 +1,6 @@ +Marvell Platforms Device Tree Bindings +---------------------------------------------------- + +PXA168 Aspenite Board +Required root node properties: + - compatible = "mrvl,pxa168-aspenite", "mrvl,pxa168"; diff --git a/Documentation/devicetree/bindings/arm/omap/intc.txt b/Documentation/devicetree/bindings/arm/omap/intc.txt new file mode 100644 index 000000000000..f2583e6ec060 --- /dev/null +++ b/Documentation/devicetree/bindings/arm/omap/intc.txt @@ -0,0 +1,27 @@ +* OMAP Interrupt Controller + +OMAP2/3 are using a TI interrupt controller that can support a +configurable number of interrupts.
+ +Main node required properties: + +- compatible : should be: + "ti,omap2-intc" +- interrupt-controller : Identifies the node as an interrupt controller +- #interrupt-cells : Specifies the number of cells needed to encode an + interrupt source. The type shall be a <u32> and the value shall be 1. + + The cell contains the interrupt number in the range [0-128]. +- ti,intc-size: Number of interrupts handled by the interrupt controller. +- reg: physical base address and size of the intc registers map. + +Example: + + intc: interrupt-controller@1 { + compatible = "ti,omap2-intc"; + interrupt-controller; + #interrupt-cells = <1>; + ti,intc-size = <96>; + reg = <0x48200000 0x1000>; + }; + diff --git a/Documentation/devicetree/bindings/arm/omap/omap.txt b/Documentation/devicetree/bindings/arm/omap/omap.txt index dbdab40ed3a6..e78e8bccac30 100644 --- a/Documentation/devicetree/bindings/arm/omap/omap.txt +++ b/Documentation/devicetree/bindings/arm/omap/omap.txt @@ -5,7 +5,7 @@ IPs present in the SoC. On top of that an omap_device is created to extend the platform_device capabilities and to allow binding with one or several hwmods. The hwmods will contain all the information to build the device: -adresse range, irq lines, dma lines, interconnect, PRCM register, +address range, irq lines, dma lines, interconnect, PRCM register, clock domain, input clocks. For the moment just point to the existing hwmod, the next step will be to move data from hwmod to device-tree representation. 
@@ -41,3 +41,9 @@ Boards: - OMAP4 PandaBoard : Low cost community board compatible = "ti,omap4-panda", "ti,omap4430" + +- OMAP3 EVM : Software Development Board for OMAP35x, AM/DM37x + compatible = "ti,omap3-evm", "ti,omap3" + +- AM335X EVM : Software Development Board for AM335x + compatible = "ti,am335x-evm", "ti,am33xx", "ti,omap3" diff --git a/Documentation/devicetree/bindings/arm/sirf.txt b/Documentation/devicetree/bindings/arm/sirf.txt index 6b07f65b32de..1881e1c6dda5 100644 --- a/Documentation/devicetree/bindings/arm/sirf.txt +++ b/Documentation/devicetree/bindings/arm/sirf.txt @@ -1,3 +1,3 @@ -prima2 "cb" evalutation board +prima2 "cb" evaluation board Required root node properties: - compatible = "sirf,prima2-cb", "sirf,prima2"; diff --git a/Documentation/devicetree/bindings/arm/spear.txt b/Documentation/devicetree/bindings/arm/spear.txt new file mode 100644 index 000000000000..f8e54f092328 --- /dev/null +++ b/Documentation/devicetree/bindings/arm/spear.txt @@ -0,0 +1,8 @@ +ST SPEAr Platforms Device Tree Bindings +--------------------------------------- + +Boards with the ST SPEAr600 SoC shall have the following properties: + +Required root node property: + +compatible = "st,spear600"; diff --git a/Documentation/devicetree/bindings/arm/tegra/emc.txt b/Documentation/devicetree/bindings/arm/tegra/emc.txt new file mode 100644 index 000000000000..09335f8eee00 --- /dev/null +++ b/Documentation/devicetree/bindings/arm/tegra/emc.txt @@ -0,0 +1,100 @@ +Embedded Memory Controller + +Properties: +- name : Should be emc +- #address-cells : Should be 1 +- #size-cells : Should be 0 +- compatible : Should contain "nvidia,tegra20-emc". +- reg : Offset and length of the register set for the device +- nvidia,use-ram-code : If present, the sub-nodes will be addressed + and chosen using the ramcode board selector. If omitted, only one + set of tables can be present and said tables will be used + irrespective of ram-code configuration.
+ +Child device nodes describe the memory settings for different configurations and clock rates. + +Example: + + emc@7000f400 { + #address-cells = < 1 >; + #size-cells = < 0 >; + compatible = "nvidia,tegra20-emc"; + reg = <0x7000f400 0x200>; + }; + + +Embedded Memory Controller ram-code table + +If the emc node has the nvidia,use-ram-code property present, then the +next level of nodes below the emc table are used to specify which settings +apply for which ram-code settings. + +If the emc node lacks the nvidia,use-ram-code property, this level is omitted +and the tables are stored directly under the emc node (see below). + +Properties: + +- name : Should be emc-tables +- nvidia,ram-code : the binary representation of the ram-code board strappings + for which this node (and children) are valid. + + + +Embedded Memory Controller configuration table + +This is a table containing the EMC register settings for the various +operating speeds of the memory controller. They are always located as +subnodes of the emc controller node. + +There are two ways of specifying which tables to use: + +* The simplest is if there is just one set of tables in the device tree, + and they will always be used (based on which frequency is used). + This is the preferred method, especially when firmware can fill in + this information based on the specific system information and just + pass it on to the kernel. + +* The slightly more complex one is when more than one memory configuration + might exist on the system. The Tegra20 platform handles this during + early boot by selecting one out of possible 4 memory settings based + on a 2-pin "ram code" bootstrap setting on the board. The values of + these strappings can be read through a register in the SoC, and thus + used to select which tables to use. + +Properties: +- name : Should be emc-table +- compatible : Should contain "nvidia,tegra20-emc-table".
+- reg : either an opaque enumerator to tell different tables apart, or + the valid frequency for which the table should be used (in kHz). +- clock-frequency : the clock frequency for the EMC at which this + table should be used (in kHz). +- nvidia,emc-registers : a 46 word array of EMC registers to be programmed + for operation at the 'clock-frequency' setting. + The order and contents of the registers are: + RC, RFC, RAS, RP, R2W, W2R, R2P, W2P, RD_RCD, WR_RCD, RRD, REXT, + WDV, QUSE, QRST, QSAFE, RDV, REFRESH, BURST_REFRESH_NUM, PDEX2WR, + PDEX2RD, PCHG2PDEN, ACT2PDEN, AR2PDEN, RW2PDEN, TXSR, TCKE, TFAW, + TRPAB, TCLKSTABLE, TCLKSTOP, TREFBW, QUSE_EXTRA, FBIO_CFG6, ODT_WRITE, + ODT_READ, FBIO_CFG5, CFG_DIG_DLL, DLL_XFORM_DQS, DLL_XFORM_QUSE, + ZCAL_REF_CNT, ZCAL_WAIT_CNT, AUTO_CAL_INTERVAL, CFG_CLKTRIM_0, + CFG_CLKTRIM_1, CFG_CLKTRIM_2 + + emc-table@166000 { + reg = <166000>; + compatible = "nvidia,tegra20-emc-table"; + clock-frequency = < 166000 >; + nvidia,emc-registers = < 0 0 0 0 0 0 0 0 0 0 0 0 0 0 + 0 0 0 0 0 0 0 0 0 0 0 0 0 0 + 0 0 0 0 0 0 0 0 0 0 0 0 0 0 + 0 0 0 0 >; + }; + + emc-table@333000 { + reg = <333000>; + compatible = "nvidia,tegra20-emc-table"; + clock-frequency = < 333000 >; + nvidia,emc-registers = < 0 0 0 0 0 0 0 0 0 0 0 0 0 0 + 0 0 0 0 0 0 0 0 0 0 0 0 0 0 + 0 0 0 0 0 0 0 0 0 0 0 0 0 0 + 0 0 0 0 >; + }; diff --git a/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-pmc.txt b/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-pmc.txt new file mode 100644 index 000000000000..b5846e21cc2e --- /dev/null +++ b/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-pmc.txt @@ -0,0 +1,19 @@ +NVIDIA Tegra Power Management Controller (PMC) + +Properties: +- name : Should be pmc +- compatible : Should contain "nvidia,tegra<chip>-pmc". +- reg : Offset and length of the register set for the device +- nvidia,invert-interrupt : If present, inverts the PMU interrupt signal. 
+ The PMU is an external Power Management Unit, whose interrupt output + signal is fed into the PMC. This signal is optionally inverted, and then + fed into the ARM GIC. The PMC is not involved in the detection or + handling of this interrupt signal, merely its inversion. + +Example: + +pmc@7000e400 { + compatible = "nvidia,tegra20-pmc"; + reg = <0x7000e400 0x400>; + nvidia,invert-interrupt; +}; diff --git a/Documentation/devicetree/bindings/arm/twd.txt b/Documentation/devicetree/bindings/arm/twd.txt new file mode 100644 index 000000000000..75b8610939fa --- /dev/null +++ b/Documentation/devicetree/bindings/arm/twd.txt @@ -0,0 +1,48 @@ +* ARM Timer Watchdog + +ARM 11MP, Cortex-A5 and Cortex-A9 are often associated with a per-core +Timer-Watchdog (aka TWD), which provides both a per-cpu local timer +and watchdog. + +The TWD is usually attached to a GIC to deliver its two per-processor +interrupts. + +** Timer node required properties: + +- compatible : Should be one of: + "arm,cortex-a9-twd-timer" + "arm,cortex-a5-twd-timer" + "arm,arm11mp-twd-timer" + +- interrupts : One interrupt to each core + +- reg : Specify the base address and the size of the TWD timer + register window. + +Example: + + twd-timer@2c000600 { + compatible = "arm,arm11mp-twd-timer"; + reg = <0x2c000600 0x20>; + interrupts = <1 13 0xf01>; + }; + +** Watchdog node properties: + +- compatible : Should be one of: + "arm,cortex-a9-twd-wdt" + "arm,cortex-a5-twd-wdt" + "arm,arm11mp-twd-wdt" + +- interrupts : One interrupt to each core + +- reg : Specify the base address and the size of the TWD watchdog + register window.
+ +Example: + + twd-watchdog@2c000620 { + compatible = "arm,arm11mp-twd-wdt"; + reg = <0x2c000620 0x20>; + interrupts = <1 14 0xf01>; + }; diff --git a/Documentation/devicetree/bindings/arm/vexpress.txt b/Documentation/devicetree/bindings/arm/vexpress.txt new file mode 100644 index 000000000000..ec8b50cbb2e8 --- /dev/null +++ b/Documentation/devicetree/bindings/arm/vexpress.txt @@ -0,0 +1,146 @@ +ARM Versatile Express boards family +----------------------------------- + +ARM's Versatile Express platform consists of a motherboard and one +or more daughterboards (tiles). The motherboard provides a set of +peripherals. Processor and RAM "live" on the tiles. + +The motherboard and each core tile should be described by a separate +Device Tree source file, with the tile's description including +the motherboard file using a /include/ directive. As the motherboard +can be initialized in one of two different configurations ("memory +maps"), care must be taken to include the correct one. + +Required properties in the root node: +- compatible value: + compatible = "arm,vexpress,<model>", "arm,vexpress"; + where <model> is the full tile model name (as used in the tile's + Technical Reference Manual), eg.: + - for Coretile Express A5x2 (V2P-CA5s): + compatible = "arm,vexpress,v2p-ca5s", "arm,vexpress"; + - for Coretile Express A9x4 (V2P-CA9): + compatible = "arm,vexpress,v2p-ca9", "arm,vexpress"; + If a tile comes in several variants or can be used in more than one + configuration, the compatible value should be: + compatible = "arm,vexpress,<model>,<variant>", \ + "arm,vexpress,<model>", "arm,vexpress"; + eg: + - Coretile Express A15x2 (V2P-CA15) with Tech Chip 1: + compatible = "arm,vexpress,v2p-ca15,tc1", \ + "arm,vexpress,v2p-ca15", "arm,vexpress"; + - LogicTile Express 13MG (V2F-2XV6) running Cortex-A7 (3 cores) SMM: + compatible = "arm,vexpress,v2f-2xv6,ca7x3", \ + "arm,vexpress,v2f-2xv6", "arm,vexpress"; + +Optional properties in the root node: +- tile model name (use
name from the tile's Technical Reference + Manual, eg. "V2P-CA5s") + model = "<model>"; +- tile's HBI number (unique ARM's board model ID, visible on the + PCB's silkscreen) in hexadecimal transcription: + arm,hbi = <0xhbi> + eg: + - for Coretile Express A5x2 (V2P-CA5s) HBI-0191: + arm,hbi = <0x191>; + - Coretile Express A9x4 (V2P-CA9) HBI-0225: + arm,hbi = <0x225>; + +Top-level standard "cpus" node is required. It must contain a node +with device_type = "cpu" property for every available core, eg.: + + cpus { + #address-cells = <1>; + #size-cells = <0>; + + cpu@0 { + device_type = "cpu"; + compatible = "arm,cortex-a5"; + reg = <0>; + }; + }; + +The motherboard description file provides a single "motherboard" node +using 2 address cells corresponding to the Static Memory Bus used +between the motherboard and the tile. The first cell defines the Chip +Select (CS) line number, the second cell address offset within the CS. +All interrupt lines between the motherboard and the tile are active +high and are described using single cell. + +Optional properties of the "motherboard" node: +- motherboard's memory map variant: + arm,v2m-memory-map = "<name>"; + where name is one of: + - "rs1" - for RS1 map (i.a. peripherals on CS3); this map is also + referred to as "ARM Cortex-A Series memory map": + arm,v2m-memory-map = "rs1"; + When this property is missing, the motherboard is using the original + memory map (also known as the "Legacy memory map", primarily used + with the original CoreTile Express A9x4) with peripherals on CS7. 
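The "motherboard" node described above can be sketched as follows. This is a minimal illustration, not a copy of any shipped motherboard file: the #size-cells value and the "ranges" mapping (CS0 visible at 0x08000000) are assumptions chosen so that the fragment is self-contained.

```dts
/dts-v1/;

/ {
	#address-cells = <1>;
	#size-cells = <1>;

	motherboard {
		/* Optional: selects the RS1 map; omit for the legacy map. */
		arm,v2m-memory-map = "rs1";

		/* Cell 1: Chip Select (CS) number, cell 2: offset within the CS. */
		#address-cells = <2>;
		#size-cells = <1>;	/* assumption */

		/* Assumed tile mapping: CS0 visible at 0x08000000, 64 MiB. */
		ranges = <0 0 0x08000000 0x04000000>;
	};
};
```

In a real tree the "ranges" property would normally be supplied by the tile's file, since the mapping of each CS into the processor's address space depends on the tile.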
+ +Motherboard .dtsi files provide a set of labelled peripherals that +can be used to obtain the required phandles in the tile's "aliases" node: +- UARTs, note that the numbers correspond to the physical connectors + on the motherboard's back panel: + v2m_serial0, v2m_serial1, v2m_serial2 and v2m_serial3 +- I2C controllers: + v2m_i2c_dvi and v2m_i2c_pcie +- SP804 timers: + v2m_timer01 and v2m_timer23 + +The current Linux implementation requires an "arm,v2m_timer" alias +pointing at one of the motherboard's SP804 timers, if it is to be +used as the system timer. This alias should be defined in the +motherboard files. + +The tile description must define "ranges", "interrupt-map-mask" and +"interrupt-map" properties to translate the motherboard's address +and interrupt space into one used by the tile's processor. + +Abbreviated example: + +/dts-v1/; + +/ { + model = "V2P-CA5s"; + arm,hbi = <0x225>; + compatible = "arm,vexpress,v2p-ca5s", "arm,vexpress"; + interrupt-parent = <&gic>; + #address-cells = <1>; + #size-cells = <1>; + + chosen { }; + + aliases { + serial0 = &v2m_serial0; + }; + + cpus { + #address-cells = <1>; + #size-cells = <0>; + + cpu@0 { + device_type = "cpu"; + compatible = "arm,cortex-a5"; + reg = <0>; + }; + }; + + gic: interrupt-controller@2c001000 { + compatible = "arm,cortex-a9-gic"; + #interrupt-cells = <3>; + #address-cells = <0>; + interrupt-controller; + reg = <0x2c001000 0x1000>, + <0x2c000100 0x100>; + }; + + motherboard { + /* CS0 is visible at 0x08000000 */ + ranges = <0 0 0x08000000 0x04000000>; + interrupt-map-mask = <0 0 63>; + /* Active high IRQ 0 is connected to GIC's SPI0 */ + interrupt-map = <0 0 0 &gic 0 0 4>; + }; +}; + +/include/ "vexpress-v2m-rs1.dtsi" diff --git a/Documentation/devicetree/bindings/ata/calxeda-sata.txt b/Documentation/devicetree/bindings/ata/ahci-platform.txt index 79caa5651f53..8bb8a76d42e8 100644 --- a/Documentation/devicetree/bindings/ata/calxeda-sata.txt +++ b/Documentation/devicetree/bindings/ata/ahci-platform.txt @@
-1,10 +1,10 @@ -* Calxeda SATA Controller +* AHCI SATA Controller SATA nodes are defined to describe on-chip Serial ATA controllers. Each SATA controller should have its own node. Required properties: -- compatible : compatible list, contains "calxeda,hb-ahci" +- compatible : compatible list, contains "calxeda,hb-ahci" or "snps,spear-ahci" - interrupts : <interrupt mapping for SATA IRQ> - reg : <registers mapping> @@ -14,4 +14,3 @@ Example: reg = <0xffe08000 0x1000>; interrupts = <115>; }; - diff --git a/Documentation/devicetree/bindings/dma/tegra20-apbdma.txt b/Documentation/devicetree/bindings/dma/tegra20-apbdma.txt new file mode 100644 index 000000000000..90fa7da525b8 --- /dev/null +++ b/Documentation/devicetree/bindings/dma/tegra20-apbdma.txt @@ -0,0 +1,30 @@ +* NVIDIA Tegra APB DMA controller + +Required properties: +- compatible: Should be "nvidia,<chip>-apbdma" +- reg: Should contain DMA registers location and length. This should include + all of the per-channel registers. +- interrupts: Should contain all of the per-channel DMA interrupts. + +Examples: + +apbdma: dma@6000a000 { + compatible = "nvidia,tegra20-apbdma"; + reg = <0x6000a000 0x1200>; + interrupts = < 0 136 0x04 + 0 137 0x04 + 0 138 0x04 + 0 139 0x04 + 0 140 0x04 + 0 141 0x04 + 0 142 0x04 + 0 143 0x04 + 0 144 0x04 + 0 145 0x04 + 0 146 0x04 + 0 147 0x04 + 0 148 0x04 + 0 149 0x04 + 0 150 0x04 + 0 151 0x04 >; +}; diff --git a/Documentation/devicetree/bindings/gpio/gpio-omap.txt b/Documentation/devicetree/bindings/gpio/gpio-omap.txt new file mode 100644 index 000000000000..bff51a2fee1e --- /dev/null +++ b/Documentation/devicetree/bindings/gpio/gpio-omap.txt @@ -0,0 +1,36 @@ +OMAP GPIO controller bindings + +Required properties: +- compatible: + - "ti,omap2-gpio" for OMAP2 controllers + - "ti,omap3-gpio" for OMAP3 controllers + - "ti,omap4-gpio" for OMAP4 controllers +- #gpio-cells : Should be two.
+ - first cell is the pin number + - second cell is used to specify optional parameters (unused) +- gpio-controller : Marks the device node as a GPIO controller. +- #interrupt-cells : Should be 2. +- interrupt-controller: Mark the device node as an interrupt controller + The first cell is the GPIO number. + The second cell is used to specify flags: + bits[3:0] trigger type and level flags: + 1 = low-to-high edge triggered. + 2 = high-to-low edge triggered. + 4 = active high level-sensitive. + 8 = active low level-sensitive. + +OMAP specific properties: +- ti,hwmods: Name of the hwmod associated to the GPIO: + "gpio<X>", <X> being the 1-based instance number from the HW spec + + +Example: + +gpio4: gpio4 { + compatible = "ti,omap4-gpio"; + ti,hwmods = "gpio4"; + #gpio-cells = <2>; + gpio-controller; + #interrupt-cells = <2>; + interrupt-controller; +}; diff --git a/Documentation/devicetree/bindings/gpio/gpio-twl4030.txt b/Documentation/devicetree/bindings/gpio/gpio-twl4030.txt new file mode 100644 index 000000000000..16695d9cf1e8 --- /dev/null +++ b/Documentation/devicetree/bindings/gpio/gpio-twl4030.txt @@ -0,0 +1,23 @@ +twl4030 GPIO controller bindings + +Required properties: +- compatible: + - "ti,twl4030-gpio" for twl4030 GPIO controller +- #gpio-cells : Should be two. + - first cell is the pin number + - second cell is used to specify optional parameters (unused) +- gpio-controller : Marks the device node as a GPIO controller. +- #interrupt-cells : Should be 2. +- interrupt-controller: Mark the device node as an interrupt controller + The first cell is the GPIO number. + The second cell is not used. 
+ +Example: + +twl_gpio: gpio { + compatible = "ti,twl4030-gpio"; + #gpio-cells = <2>; + gpio-controller; + #interrupt-cells = <2>; + interrupt-controller; +}; diff --git a/Documentation/devicetree/bindings/gpio/gpio_atmel.txt b/Documentation/devicetree/bindings/gpio/gpio_atmel.txt new file mode 100644 index 000000000000..66efc804806a --- /dev/null +++ b/Documentation/devicetree/bindings/gpio/gpio_atmel.txt @@ -0,0 +1,20 @@ +* Atmel GPIO controller (PIO) + +Required properties: +- compatible: "atmel,<chip>-gpio", where <chip> is at91rm9200 or at91sam9x5. +- reg: Should contain GPIO controller registers location and length +- interrupts: Should be the port interrupt shared by all the pins. +- #gpio-cells: Should be two. The first cell is the pin number and + the second cell is used to specify optional parameters (currently + unused). +- gpio-controller: Marks the device node as a GPIO controller. + +Example: + pioA: gpio@fffff200 { + compatible = "atmel,at91rm9200-gpio"; + reg = <0xfffff200 0x100>; + interrupts = <2 4>; + #gpio-cells = <2>; + gpio-controller; + }; + diff --git a/Documentation/devicetree/bindings/gpio/gpio_i2c.txt b/Documentation/devicetree/bindings/gpio/gpio_i2c.txt new file mode 100644 index 000000000000..4f8ec947c6bd --- /dev/null +++ b/Documentation/devicetree/bindings/gpio/gpio_i2c.txt @@ -0,0 +1,32 @@ +Device-Tree bindings for i2c gpio driver + +Required properties: + - compatible = "i2c-gpio"; + - gpios: sda and scl gpio + + +Optional properties: + - i2c-gpio,sda-open-drain: sda as open drain + - i2c-gpio,scl-open-drain: scl as open drain + - i2c-gpio,scl-output-only: scl as output only + - i2c-gpio,delay-us: delay between GPIO operations (may depend on each platform) + - i2c-gpio,timeout-ms: timeout to get data + +Example nodes: + +i2c@0 { + compatible = "i2c-gpio"; + gpios = <&pioA 23 0 /* sda */ + &pioA 24 0 /* scl */ + >; + i2c-gpio,sda-open-drain; + i2c-gpio,scl-open-drain; + i2c-gpio,delay-us = <2>; /* ~100 kHz */ + #address-cells = <1>; 
+ #size-cells = <0>; + + rv3029c2@56 { + compatible = "rv3029c2"; + reg = <0x56>; + }; +}; diff --git a/Documentation/devicetree/bindings/gpio/gpio_nvidia.txt b/Documentation/devicetree/bindings/gpio/gpio_nvidia.txt index eb4b530d64e1..023c9526e5f8 100644 --- a/Documentation/devicetree/bindings/gpio/gpio_nvidia.txt +++ b/Documentation/devicetree/bindings/gpio/gpio_nvidia.txt @@ -1,8 +1,40 @@ -NVIDIA Tegra 2 GPIO controller +NVIDIA Tegra GPIO controller Required properties: -- compatible : "nvidia,tegra20-gpio" +- compatible : "nvidia,tegra<chip>-gpio" +- reg : Physical base address and length of the controller's registers. +- interrupts : The interrupt outputs from the controller. For Tegra20, + there should be 7 interrupts specified, and for Tegra30, there should + be 8 interrupts specified. - #gpio-cells : Should be two. The first cell is the pin number and the second cell is used to specify optional parameters: - bit 0 specifies polarity (0 for normal, 1 for inverted) - gpio-controller : Marks the device node as a GPIO controller. +- #interrupt-cells : Should be 2. + The first cell is the GPIO number. + The second cell is used to specify flags: + bits[3:0] trigger type and level flags: + 1 = low-to-high edge triggered. + 2 = high-to-low edge triggered. + 4 = active high level-sensitive. + 8 = active low level-sensitive. + Valid combinations are 1, 2, 3, 4, 8. +- interrupt-controller : Marks the device node as an interrupt controller. 
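As an illustration of the GPIO and interrupt cells described in the property list, a client of the controller could be wired up as sketched below. The "sensor" node, its "example,sensor" compatible string, the pin numbers and the "intc" parent interrupt controller (including its addresses) are assumptions made only so that the fragment is self-contained; they are not part of this binding.

```dts
/dts-v1/;

/ {
	#address-cells = <1>;
	#size-cells = <1>;

	/* Assumed parent interrupt controller (not part of this binding). */
	intc: interrupt-controller@50041000 {
		compatible = "arm,cortex-a9-gic";
		reg = <0x50041000 0x1000>,
		      <0x50040100 0x0100>;
		interrupt-controller;
		#interrupt-cells = <3>;
	};

	gpio: gpio@6000d000 {
		compatible = "nvidia,tegra20-gpio";
		reg = <0x6000d000 0x1000>;
		interrupt-parent = <&intc>;
		/* Tegra20: 7 interrupt outputs. */
		interrupts = <0 32 0x04
			      0 33 0x04
			      0 34 0x04
			      0 35 0x04
			      0 55 0x04
			      0 87 0x04
			      0 89 0x04>;
		#gpio-cells = <2>;
		gpio-controller;
		#interrupt-cells = <2>;
		interrupt-controller;
	};

	/* Hypothetical consumer, for illustration only. */
	sensor {
		compatible = "example,sensor";
		interrupt-parent = <&gpio>;
		/* GPIO 5, flags 3 = 1 | 2: trigger on both edges. */
		interrupts = <5 3>;
		/* GPIO 6, bit 0 set: inverted polarity. */
		gpios = <&gpio 6 1>;
	};
};
```

The flags value 3 is one of the valid combinations listed for the second interrupt cell; a level-sensitive client would use 4 or 8 instead.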
+ +Example: + +gpio: gpio@6000d000 { + compatible = "nvidia,tegra20-gpio"; + reg = < 0x6000d000 0x1000 >; + interrupts = < 0 32 0x04 + 0 33 0x04 + 0 34 0x04 + 0 35 0x04 + 0 55 0x04 + 0 87 0x04 + 0 89 0x04 >; + #gpio-cells = <2>; + gpio-controller; + #interrupt-cells = <2>; + interrupt-controller; +}; diff --git a/Documentation/devicetree/bindings/gpio/mrvl-gpio.txt b/Documentation/devicetree/bindings/gpio/mrvl-gpio.txt new file mode 100644 index 000000000000..1e34cfe5ebea --- /dev/null +++ b/Documentation/devicetree/bindings/gpio/mrvl-gpio.txt @@ -0,0 +1,23 @@ +* Marvell PXA GPIO controller + +Required properties: +- compatible : Should be "mrvl,pxa-gpio" or "mrvl,mmp-gpio" +- reg : Address and length of the register set for the device +- interrupts : Should be the port interrupt shared by all gpio pins, if + one number. +- interrupt-name : Should be the name of irq resource. +- gpio-controller : Marks the device node as a gpio controller. +- #gpio-cells : Should be one. It is the pin number. + +Example: + + gpio: gpio@d4019000 { + compatible = "mrvl,mmp-gpio", "mrvl,pxa-gpio"; + reg = <0xd4019000 0x1000>; + interrupts = <49>, <17>, <18>; + interrupt-name = "gpio_mux", "gpio0", "gpio1"; + gpio-controller; + #gpio-cells = <1>; + interrupt-controller; + #interrupt-cells = <1>; + }; diff --git a/Documentation/devicetree/bindings/gpio/sodaville.txt b/Documentation/devicetree/bindings/gpio/sodaville.txt new file mode 100644 index 000000000000..563eff22b975 --- /dev/null +++ b/Documentation/devicetree/bindings/gpio/sodaville.txt @@ -0,0 +1,48 @@ +GPIO controller on CE4100 / Sodaville SoCs +========================================== + +The bindings for CE4100's GPIO controller match the generic description +which is covered by the gpio.txt file in this folder. + +The only additional property is the intel,muxctl property which holds the +value which is written into the MUXCNTL register.
+ +There is no compatible property for now because the driver is probed via +PCI id (vendor 0x8086 device 0x2e67). + +The interrupt specifier consists of two cells encoded as follows: + - <1st cell>: The interrupt-number that identifies the interrupt source. + - <2nd cell>: The level-sense information, encoded as follows: + 4 - active high level-sensitive + 8 - active low level-sensitive + +Example of the GPIO device and one user: + + pcigpio: gpio@b,1 { + /* two cells for GPIO and interrupt */ + #gpio-cells = <2>; + #interrupt-cells = <2>; + compatible = "pci8086,2e67.2", + "pci8086,2e67", + "pciclassff0000", + "pciclassff00"; + + reg = <0x15900 0x0 0x0 0x0 0x0>; + /* Interrupt line of the gpio device */ + interrupts = <15 1>; + /* It is an interrupt and GPIO controller itself */ + interrupt-controller; + gpio-controller; + intel,muxctl = <0>; + }; + + testuser@20 { + compatible = "example,testuser"; + /* Use the 11th GPIO line as an active high triggered + * level interrupt + */ + interrupts = <11 8>; + interrupt-parent = <&pcigpio>; + /* Use this GPIO also with the gpio functions */ + gpios = <&pcigpio 11 0>; + }; diff --git a/Documentation/devicetree/bindings/i2c/mrvl-i2c.txt b/Documentation/devicetree/bindings/i2c/mrvl-i2c.txt new file mode 100644 index 000000000000..071eb3caae91 --- /dev/null +++ b/Documentation/devicetree/bindings/i2c/mrvl-i2c.txt @@ -0,0 +1,37 @@ +* I2C + +Required properties : + + - reg : Offset and length of the register set for the device + - compatible : should be "mrvl,mmp-twsi" where CHIP is the name of a + compatible processor, e.g. pxa168, pxa910, mmp2, mmp3. + For the pxa2xx/pxa3xx, an additional node "mrvl,pxa-i2c" is required + as shown in the example below. + +Recommended properties : + + - interrupts : <a b> where a is the interrupt number and b is a + field that represents an encoding of the sense and level + information for the interrupt.
This should be encoded based on + the information in section 2) depending on the type of interrupt + controller you have. + - interrupt-parent : the phandle for the interrupt controller that + services interrupts for this device. + - mrvl,i2c-polling : Disable interrupt of i2c controller. Polling + status register of i2c controller instead. + - mrvl,i2c-fast-mode : Enable fast mode of i2c controller. + +Examples: + twsi1: i2c@d4011000 { + compatible = "mrvl,mmp-twsi", "mrvl,pxa-i2c"; + reg = <0xd4011000 0x1000>; + interrupts = <7>; + mrvl,i2c-fast-mode; + }; + + twsi2: i2c@d4025000 { + compatible = "mrvl,mmp-twsi", "mrvl,pxa-i2c"; + reg = <0xd4025000 0x1000>; + interrupts = <58>; + }; + diff --git a/Documentation/devicetree/bindings/i2c/sirf-i2c.txt b/Documentation/devicetree/bindings/i2c/sirf-i2c.txt new file mode 100644 index 000000000000..7baf9e133fa8 --- /dev/null +++ b/Documentation/devicetree/bindings/i2c/sirf-i2c.txt @@ -0,0 +1,19 @@ +I2C for SiRFprimaII platforms + +Required properties : +- compatible : Must be "sirf,prima2-i2c" +- reg: physical base address of the controller and length of memory mapped + region. +- interrupts: interrupt number to the cpu. + +Optional properties: +- clock-frequency : Contains the desired I2C/HS-I2C bus clock frequency in Hz. + The absence of the property indicates the default frequency of 100 kHz. + +Examples : + +i2c0: i2c@b00e0000 { + compatible = "sirf,prima2-i2c"; + reg = <0xb00e0000 0x10000>; + interrupts = <24>; +}; diff --git a/Documentation/devicetree/bindings/input/matrix-keymap.txt b/Documentation/devicetree/bindings/input/matrix-keymap.txt new file mode 100644 index 000000000000..3cd8b98ccd2d --- /dev/null +++ b/Documentation/devicetree/bindings/input/matrix-keymap.txt @@ -0,0 +1,19 @@ +A simple common binding for matrix-connected keyboards. Currently targeted at +defining the keys in the scope of linux key codes since that is a stable and +standardized interface at this time.
+ +Required properties: +- linux,keymap: an array of packed 1-cell entries containing the equivalent + of row, column and linux key-code. The 32-bit big endian cell is packed + as: + row << 24 | column << 16 | key-code + +Optional properties: +Some users of this binding might choose to specify secondary keymaps for +cases where there is a modifier key such as a Fn key. Proposed names +for said properties are "linux,fn-keymap" or with another descriptive +word for the modifier other from "Fn". + +Example: + linux,keymap = < 0x00030012 + 0x0102003a >; diff --git a/Documentation/devicetree/bindings/input/tegra-kbc.txt b/Documentation/devicetree/bindings/input/tegra-kbc.txt index 5ecfa99089b4..72683be6de35 100644 --- a/Documentation/devicetree/bindings/input/tegra-kbc.txt +++ b/Documentation/devicetree/bindings/input/tegra-kbc.txt @@ -3,16 +3,21 @@ Required properties: - compatible: "nvidia,tegra20-kbc" -Optional properties: -- debounce-delay: delay in milliseconds per row scan for debouncing -- repeat-delay: delay in milliseconds before repeat starts -- ghost-filter: enable ghost filtering for this device -- wakeup-source: configure keyboard as a wakeup source for suspend/resume +Optional properties, in addition to those specified by the shared +matrix-keyboard bindings: + +- linux,fn-keymap: a second keymap, same specification as the + matrix-keyboard-controller spec but to be used when the KEY_FN modifier + key is pressed. 
+- nvidia,debounce-delay-ms: delay in milliseconds per row scan for debouncing +- nvidia,repeat-delay-ms: delay in milliseconds before repeat starts +- nvidia,ghost-filter: enable ghost filtering for this device +- nvidia,wakeup-source: configure keyboard as a wakeup source for suspend/resume Example: keyboard: keyboard { compatible = "nvidia,tegra20-kbc"; reg = <0x7000e200 0x100>; - ghost-filter; + nvidia,ghost-filter; }; diff --git a/Documentation/devicetree/bindings/mmc/ti-omap-hsmmc.txt b/Documentation/devicetree/bindings/mmc/ti-omap-hsmmc.txt new file mode 100644 index 000000000000..dbd4368ab8cc --- /dev/null +++ b/Documentation/devicetree/bindings/mmc/ti-omap-hsmmc.txt @@ -0,0 +1,33 @@ +* TI Highspeed MMC host controller for OMAP + +The Highspeed MMC Host Controller on TI OMAP family +provides an interface for MMC, SD, and SDIO types of memory cards. + +Required properties: +- compatible: + Should be "ti,omap2-hsmmc", for OMAP2 controllers + Should be "ti,omap3-hsmmc", for OMAP3 controllers + Should be "ti,omap4-hsmmc", for OMAP4 controllers +- ti,hwmods: Must be "mmc<n>", n is controller instance starting 1 +- reg : should contain hsmmc registers location and length + +Optional properties: +ti,dual-volt: boolean, supports dual voltage cards +<supply-name>-supply: phandle to the regulator device tree node +"supply-name" examples are "vmmc", "vmmc_aux" etc +ti,bus-width: Number of data lines, default assumed is 1 if the property is missing. 
+cd-gpios: GPIOs for card detection +wp-gpios: GPIOs for write protection +ti,non-removable: non-removable slot (like eMMC) +ti,needs-special-reset: Requires a special softreset sequence + +Example: + mmc1: mmc@0x4809c000 { + compatible = "ti,omap4-hsmmc"; + reg = <0x4809c000 0x400>; + ti,hwmods = "mmc1"; + ti,dual-volt; + ti,bus-width = <4>; + vmmc-supply = <&vmmc>; /* phandle to regulator node */ + ti,non-removable; + }; diff --git a/Documentation/devicetree/bindings/mtd/arm-versatile.txt b/Documentation/devicetree/bindings/mtd/arm-versatile.txt index 476845db94d0..beace4b89daa 100644 --- a/Documentation/devicetree/bindings/mtd/arm-versatile.txt +++ b/Documentation/devicetree/bindings/mtd/arm-versatile.txt @@ -4,5 +4,5 @@ Required properties: - compatible : must be "arm,versatile-flash"; - bank-width : width in bytes of flash interface. -Optional properties: -- Subnode partition map from mtd flash binding +The device tree may optionally contain sub-nodes describing partitions of the +address space. See partition.txt for more detail. diff --git a/Documentation/devicetree/bindings/mtd/atmel-dataflash.txt b/Documentation/devicetree/bindings/mtd/atmel-dataflash.txt index ef66ddd01da0..1889a4db5b7c 100644 --- a/Documentation/devicetree/bindings/mtd/atmel-dataflash.txt +++ b/Documentation/devicetree/bindings/mtd/atmel-dataflash.txt @@ -3,6 +3,9 @@ Required properties: - compatible : "atmel,<model>", "atmel,<series>", "atmel,dataflash". +The device tree may optionally contain sub-nodes describing partitions of the +address space. See partition.txt for more detail. + Example: flash@1 { diff --git a/Documentation/devicetree/bindings/mtd/atmel-nand.txt b/Documentation/devicetree/bindings/mtd/atmel-nand.txt new file mode 100644 index 000000000000..a20069502f5a --- /dev/null +++ b/Documentation/devicetree/bindings/mtd/atmel-nand.txt @@ -0,0 +1,41 @@ +Atmel NAND flash + +Required properties: +- compatible : "atmel,at91rm9200-nand". 
+- reg : should specify localbus address and size used for the chip, + and if available the ECC. +- atmel,nand-addr-offset : offset for the address latch. +- atmel,nand-cmd-offset : offset for the command latch. +- #address-cells, #size-cells : Must be present if the device has sub-nodes + representing partitions. + +- gpios : specifies the gpio pins to control the NAND device. detect is an + optional gpio and may be set to 0 if not present. + +Optional properties: +- nand-ecc-mode : String, operation mode of the NAND ecc mode, soft by default. + Supported values are: "none", "soft", "hw", "hw_syndrome", "hw_oob_first", + "soft_bch". +- nand-bus-width : 8 or 16 bus width if not present 8 +- nand-on-flash-bbt: boolean to enable on flash bbt option if not present false + +Examples: +nand0: nand@40000000,0 { + compatible = "atmel,at91rm9200-nand"; + #address-cells = <1>; + #size-cells = <1>; + reg = <0x40000000 0x10000000 + 0xffffe800 0x200 + >; + atmel,nand-addr-offset = <21>; /* ale */ + atmel,nand-cmd-offset = <22>; /* cle */ + nand-on-flash-bbt; + nand-ecc-mode = "soft"; + gpios = <&pioC 13 0 /* rdy */ + &pioC 14 0 /* nce */ + 0 /* cd */ + >; + partition@0 { + ... + }; +}; diff --git a/Documentation/devicetree/bindings/mtd/fsl-upm-nand.txt b/Documentation/devicetree/bindings/mtd/fsl-upm-nand.txt index 00f1f546b32e..fce4894f5a98 100644 --- a/Documentation/devicetree/bindings/mtd/fsl-upm-nand.txt +++ b/Documentation/devicetree/bindings/mtd/fsl-upm-nand.txt @@ -19,6 +19,10 @@ Optional properties: read registers (tR). Required if property "gpios" is not used (R/B# pins not connected). +Each flash chip described may optionally contain additional sub-nodes +describing partitions of the address space. See partition.txt for more +detail.
+ Examples: upm@1,0 { diff --git a/Documentation/devicetree/bindings/mtd/fsmc-nand.txt b/Documentation/devicetree/bindings/mtd/fsmc-nand.txt new file mode 100644 index 000000000000..e2c663b354d2 --- /dev/null +++ b/Documentation/devicetree/bindings/mtd/fsmc-nand.txt @@ -0,0 +1,33 @@ +* FSMC NAND + +Required properties: +- compatible : "st,spear600-fsmc-nand" +- reg : Address range of the mtd chip +- reg-names: Should contain the reg names "fsmc_regs" and "nand_data" +- st,ale-off : Chip specific offset to ALE +- st,cle-off : Chip specific offset to CLE + +Optional properties: +- bank-width : Width (in bytes) of the device. If not present, the width + defaults to 1 byte +- nand-skip-bbtscan: Indicates that the BBT scanning should be skipped + +Example: + + fsmc: flash@d1800000 { + compatible = "st,spear600-fsmc-nand"; + #address-cells = <1>; + #size-cells = <1>; + reg = <0xd1800000 0x1000 /* FSMC Register */ + 0xd2000000 0x4000>; /* NAND Base */ + reg-names = "fsmc_regs", "nand_data"; + st,ale-off = <0x20000>; + st,cle-off = <0x10000>; + + bank-width = <1>; + nand-skip-bbtscan; + + partition@0 { + ... + }; + }; diff --git a/Documentation/devicetree/bindings/mtd/gpio-control-nand.txt b/Documentation/devicetree/bindings/mtd/gpio-control-nand.txt index 719f4dc58df7..36ef07d3c90f 100644 --- a/Documentation/devicetree/bindings/mtd/gpio-control-nand.txt +++ b/Documentation/devicetree/bindings/mtd/gpio-control-nand.txt @@ -25,6 +25,9 @@ Optional properties: GPIO state and before and after command byte writes, this register will be read to ensure that the GPIO accesses have completed. +The device tree may optionally contain sub-nodes describing partitions of the +address space. See partition.txt for more detail.
+ Examples: gpio-nand@1,0 { diff --git a/Documentation/devicetree/bindings/mtd/mtd-physmap.txt b/Documentation/devicetree/bindings/mtd/mtd-physmap.txt index 80152cb567d9..a63c2bd7de2b 100644 --- a/Documentation/devicetree/bindings/mtd/mtd-physmap.txt +++ b/Documentation/devicetree/bindings/mtd/mtd-physmap.txt @@ -23,27 +23,8 @@ are defined: - vendor-id : Contains the flash chip's vendor id (1 byte). - device-id : Contains the flash chip's device id (1 byte). -In addition to the information on the mtd bank itself, the -device tree may optionally contain additional information -describing partitions of the address space. This can be -used on platforms which have strong conventions about which -portions of a flash are used for what purposes, but which don't -use an on-flash partition table such as RedBoot. - -Each partition is represented as a sub-node of the mtd device. -Each node's name represents the name of the corresponding -partition of the mtd device. - -Flash partitions - - reg : The partition's offset and size within the mtd bank. - - label : (optional) The label / name for this partition. - If omitted, the label is taken from the node name (excluding - the unit address). - - read-only : (optional) This parameter, if present, is a hint to - Linux that this partition should only be mounted - read-only. This is usually used for flash partitions - containing early-boot firmware images or data which should not - be clobbered. +The device tree may optionally contain sub-nodes describing partitions of the +address space. See partition.txt for more detail. Example: diff --git a/Documentation/devicetree/bindings/mtd/nand.txt b/Documentation/devicetree/bindings/mtd/nand.txt new file mode 100644 index 000000000000..03855c8c492a --- /dev/null +++ b/Documentation/devicetree/bindings/mtd/nand.txt @@ -0,0 +1,7 @@ +* MTD generic binding + +- nand-ecc-mode : String, operation mode of the NAND ecc mode. 
+ Supported values are: "none", "soft", "hw", "hw_syndrome", "hw_oob_first", + "soft_bch". +- nand-bus-width : 8 or 16 bus width if not present 8 +- nand-on-flash-bbt: boolean to enable on flash bbt option if not present false diff --git a/Documentation/devicetree/bindings/mtd/partition.txt b/Documentation/devicetree/bindings/mtd/partition.txt new file mode 100644 index 000000000000..f114ce1657c2 --- /dev/null +++ b/Documentation/devicetree/bindings/mtd/partition.txt @@ -0,0 +1,38 @@ +Representing flash partitions in devicetree + +Partitions can be represented by sub-nodes of an mtd device. This can be used +on platforms which have strong conventions about which portions of a flash are +used for what purposes, but which don't use an on-flash partition table such +as RedBoot. + +#address-cells & #size-cells must both be present in the mtd device and be +equal to 1. + +Required properties: +- reg : The partition's offset and size within the mtd bank. + +Optional properties: +- label : The label / name for this partition. If omitted, the label is taken + from the node name (excluding the unit address). +- read-only : This parameter, if present, is a hint to Linux that this + partition should only be mounted read-only. This is usually used for flash + partitions containing early-boot firmware images or data which should not be + clobbered. 
+ +Examples: + + +flash@0 { + #address-cells = <1>; + #size-cells = <1>; + + partition@0 { + label = "u-boot"; + reg = <0x0000000 0x100000>; + read-only; + }; + + uimage@100000 { + reg = <0x0100000 0x200000>; + }; +}; diff --git a/Documentation/devicetree/bindings/mtd/spear_smi.txt b/Documentation/devicetree/bindings/mtd/spear_smi.txt new file mode 100644 index 000000000000..7248aadd89e4 --- /dev/null +++ b/Documentation/devicetree/bindings/mtd/spear_smi.txt @@ -0,0 +1,31 @@ +* SPEAr SMI + +Required properties: +- compatible : "st,spear600-smi" +- reg : Address range of the mtd chip +- #address-cells, #size-cells : Must be present if the device has sub-nodes + representing partitions. +- interrupt-parent: Should be the phandle for the interrupt controller + that services interrupts for this device +- interrupts: Should contain the SMI interrupts +- clock-rate : Functional clock rate of SMI in Hz + +Optional properties: +- st,smi-fast-mode : Flash supports read in fast mode + +Example: + + smi: flash@fc000000 { + compatible = "st,spear600-smi"; + #address-cells = <1>; + #size-cells = <1>; + reg = <0xfc000000 0x1000>; + interrupt-parent = <&vic1>; + interrupts = <12>; + clock-rate = <50000000>; /* 50MHz */ + + flash@f8000000 { + st,smi-fast-mode; + ...
+ }; + }; diff --git a/Documentation/devicetree/bindings/net/stmmac.txt b/Documentation/devicetree/bindings/net/stmmac.txt new file mode 100644 index 000000000000..1f62623f8c3f --- /dev/null +++ b/Documentation/devicetree/bindings/net/stmmac.txt @@ -0,0 +1,28 @@ +* STMicroelectronics 10/100/1000 Ethernet driver (GMAC) + +Required properties: +- compatible: Should be "st,spear600-gmac" +- reg: Address and length of the register set for the device +- interrupt-parent: Should be the phandle for the interrupt controller + that services interrupts for this device +- interrupts: Should contain the STMMAC interrupts +- interrupt-names: Should contain the interrupt names "macirq" + "eth_wake_irq" if this interrupt is supported in the "interrupts" + property +- phy-mode: String, operation mode of the PHY interface. + Supported values are: "mii", "rmii", "gmii", "rgmii". + +Optional properties: +- mac-address: 6 bytes, mac address + +Examples: + + gmac0: ethernet@e0800000 { + compatible = "st,spear600-gmac"; + reg = <0xe0800000 0x8000>; + interrupt-parent = <&vic1>; + interrupts = <24 23>; + interrupt-names = "macirq", "eth_wake_irq"; + mac-address = [000000000000]; /* Filled in by U-Boot */ + phy-mode = "gmii"; + }; diff --git a/Documentation/devicetree/bindings/power_supply/max17042_battery.txt b/Documentation/devicetree/bindings/power_supply/max17042_battery.txt new file mode 100644 index 000000000000..5bc9b685cf8a --- /dev/null +++ b/Documentation/devicetree/bindings/power_supply/max17042_battery.txt @@ -0,0 +1,18 @@ +max17042_battery +~~~~~~~~~~~~~~~~ + +Required properties : + - compatible : "maxim,max17042" + +Optional properties : + - maxim,rsns-microohm : Resistance of rsns resistor in micro Ohms + (datasheet-recommended value is 10000). + Defining this property enables current-sense functionality. 
+ +Example: + + battery-charger@36 { + compatible = "maxim,max17042"; + reg = <0x36>; + maxim,rsns-microohm = <10000>; + }; diff --git a/Documentation/devicetree/bindings/powerpc/fsl/mpic-msgr.txt b/Documentation/devicetree/bindings/powerpc/fsl/mpic-msgr.txt new file mode 100644 index 000000000000..bc8ded641ab6 --- /dev/null +++ b/Documentation/devicetree/bindings/powerpc/fsl/mpic-msgr.txt @@ -0,0 +1,63 @@ +* FSL MPIC Message Registers + +This binding specifies what properties must be available in the device tree +representation of the message register blocks found in some FSL MPIC +implementations. + +Required properties: + + - compatible: Specifies the compatibility list for the message register + block. The type shall be <string-list> and the value shall be of the form + "fsl,mpic-v<version>-msgr", where <version> is the version number of + the MPIC containing the message registers. + + - reg: Specifies the base physical address(s) and size(s) of the + message register block's addressable register space. The type shall be + <prop-encoded-array>. + + - interrupts: Specifies a list of interrupt-specifiers which are available + for receiving interrupts. Interrupt-specifier consists of two cells: first + cell is interrupt-number and second cell is level-sense. The type shall be + <prop-encoded-array>. + +Optional properties: + + - mpic-msgr-receive-mask: Specifies what registers in the containing block + are allowed to receive interrupts. The value is a bit mask where a set + bit at bit 'n' indicates that message register 'n' can receive interrupts. + Note that "bit 'n'" is numbered from LSB for PPC hardware. The type shall + be <u32>. If not present, then all of the message registers in the block + are available. + +Aliases: + + An alias should be created for every message register block. They are not + required, though. However, a particular implementation of this binding + may require aliases to be present. 
Aliases are of the form + 'mpic-msgr-block<n>', where <n> is an integer specifying the block's number. + Numbers shall start at 0. + +Example: + + aliases { + mpic-msgr-block0 = &mpic_msgr_block0; + mpic-msgr-block1 = &mpic_msgr_block1; + }; + + mpic_msgr_block0: mpic-msgr-block@41400 { + compatible = "fsl,mpic-v3.1-msgr"; + reg = <0x41400 0x200>; + // Message registers 0 and 2 in this block can receive interrupts on + // sources 0xb0 and 0xb2, respectively. + interrupts = <0xb0 2 0xb2 2>; + mpic-msgr-receive-mask = <0x5>; + }; + + mpic_msgr_block1: mpic-msgr-block@42400 { + compatible = "fsl,mpic-v3.1-msgr"; + reg = <0x42400 0x200>; + // Message registers 0 and 2 in this block can receive interrupts on + // sources 0xb4 and 0xb6, respectively. + interrupts = <0xb4 2 0xb6 2>; + mpic-msgr-receive-mask = <0x5>; + }; diff --git a/Documentation/devicetree/bindings/powerpc/fsl/mpic.txt b/Documentation/devicetree/bindings/powerpc/fsl/mpic.txt index 2cf38bd841fd..dc5744636a57 100644 --- a/Documentation/devicetree/bindings/powerpc/fsl/mpic.txt +++ b/Documentation/devicetree/bindings/powerpc/fsl/mpic.txt @@ -56,7 +56,27 @@ PROPERTIES to the client. The presence of this property also mandates that any initialization related to interrupt sources shall be limited to sources explicitly referenced in the device tree. - + + - big-endian + Usage: optional + Value type: <empty> + If present the MPIC will be assumed to be big-endian. Some + device-trees omit this property on MPIC nodes even when the MPIC is + in fact big-endian, so certain boards override this property. + + - single-cpu-affinity + Usage: optional + Value type: <empty> + If present the MPIC will be assumed to only be able to route + non-IPI interrupts to a single CPU at a time (EG: Freescale MPIC). + + - last-interrupt-source + Usage: optional + Value type: <u32> + Some MPICs do not correctly report the number of hardware sources + in the global feature registers. 
If specified, this field will + override the value read from MPIC_GREG_FEATURE_LAST_SRC. + INTERRUPT SPECIFIER DEFINITION Interrupt specifiers consists of 4 cells encoded as diff --git a/Documentation/devicetree/bindings/powerpc/fsl/msi-pic.txt b/Documentation/devicetree/bindings/powerpc/fsl/msi-pic.txt index 5d586e1ccaf5..5693877ab377 100644 --- a/Documentation/devicetree/bindings/powerpc/fsl/msi-pic.txt +++ b/Documentation/devicetree/bindings/powerpc/fsl/msi-pic.txt @@ -6,8 +6,10 @@ Required properties: etc.) and the second is "fsl,mpic-msi" or "fsl,ipic-msi" depending on the parent type. -- reg : should contain the address and the length of the shared message - interrupt register set. +- reg : It may contain one or two regions. The first region should contain + the address and the length of the shared message interrupt register set. + The second region should contain the address of aliased MSIIR register for + platforms that have such an alias. - msi-available-ranges: use <start count> style section to define which msi interrupt can be used in the 256 msi interrupts. This property is diff --git a/Documentation/devicetree/bindings/regulator/anatop-regulator.txt b/Documentation/devicetree/bindings/regulator/anatop-regulator.txt new file mode 100644 index 000000000000..357758cb6e92 --- /dev/null +++ b/Documentation/devicetree/bindings/regulator/anatop-regulator.txt @@ -0,0 +1,29 @@ +Anatop Voltage regulators + +Required properties: +- compatible: Must be "fsl,anatop-regulator" +- anatop-reg-offset: Anatop MFD register offset +- anatop-vol-bit-shift: Bit shift for the register +- anatop-vol-bit-width: Number of bits used in the register +- anatop-min-bit-val: Minimum value of this register +- anatop-min-voltage: Minimum voltage of this regulator +- anatop-max-voltage: Maximum voltage of this regulator + +Any property defined as part of the core regulator +binding, defined in regulator.txt, can also be used. 
+ +Example: + + regulator-vddpu { + compatible = "fsl,anatop-regulator"; + regulator-name = "vddpu"; + regulator-min-microvolt = <725000>; + regulator-max-microvolt = <1300000>; + regulator-always-on; + anatop-reg-offset = <0x140>; + anatop-vol-bit-shift = <9>; + anatop-vol-bit-width = <5>; + anatop-min-bit-val = <1>; + anatop-min-voltage = <725000>; + anatop-max-voltage = <1300000>; + }; diff --git a/Documentation/devicetree/bindings/regulator/twl-regulator.txt b/Documentation/devicetree/bindings/regulator/twl-regulator.txt new file mode 100644 index 000000000000..0c3395d55ac1 --- /dev/null +++ b/Documentation/devicetree/bindings/regulator/twl-regulator.txt @@ -0,0 +1,68 @@ +TWL family of regulators + +Required properties: +For twl6030 regulators/LDOs +- compatible: + - "ti,twl6030-vaux1" for VAUX1 LDO + - "ti,twl6030-vaux2" for VAUX2 LDO + - "ti,twl6030-vaux3" for VAUX3 LDO + - "ti,twl6030-vmmc" for VMMC LDO + - "ti,twl6030-vpp" for VPP LDO + - "ti,twl6030-vusim" for VUSIM LDO + - "ti,twl6030-vana" for VANA LDO + - "ti,twl6030-vcxio" for VCXIO LDO + - "ti,twl6030-vdac" for VDAC LDO + - "ti,twl6030-vusb" for VUSB LDO + - "ti,twl6030-v1v8" for V1V8 LDO + - "ti,twl6030-v2v1" for V2V1 LDO + - "ti,twl6030-clk32kg" for CLK32KG RESOURCE + - "ti,twl6030-vdd1" for VDD1 SMPS + - "ti,twl6030-vdd2" for VDD2 SMPS + - "ti,twl6030-vdd3" for VDD3 SMPS +For twl6025 regulators/LDOs +- compatible: + - "ti,twl6025-ldo1" for LDO1 LDO + - "ti,twl6025-ldo2" for LDO2 LDO + - "ti,twl6025-ldo3" for LDO3 LDO + - "ti,twl6025-ldo4" for LDO4 LDO + - "ti,twl6025-ldo5" for LDO5 LDO + - "ti,twl6025-ldo6" for LDO6 LDO + - "ti,twl6025-ldo7" for LDO7 LDO + - "ti,twl6025-ldoln" for LDOLN LDO + - "ti,twl6025-ldousb" for LDOUSB LDO + - "ti,twl6025-smps3" for SMPS3 SMPS + - "ti,twl6025-smps4" for SMPS4 SMPS + - "ti,twl6025-vio" for VIO SMPS +For twl4030 regulators/LDOs +- compatible: + - "ti,twl4030-vaux1" for VAUX1 LDO + - "ti,twl4030-vaux2" for VAUX2 LDO + - "ti,twl5030-vaux2" for VAUX2 LDO + - 
"ti,twl4030-vaux3" for VAUX3 LDO + - "ti,twl4030-vaux4" for VAUX4 LDO + - "ti,twl4030-vmmc1" for VMMC1 LDO + - "ti,twl4030-vmmc2" for VMMC2 LDO + - "ti,twl4030-vpll1" for VPLL1 LDO + - "ti,twl4030-vpll2" for VPLL2 LDO + - "ti,twl4030-vsim" for VSIM LDO + - "ti,twl4030-vdac" for VDAC LDO + - "ti,twl4030-vintana2" for VINTANA2 LDO + - "ti,twl4030-vio" for VIO LDO + - "ti,twl4030-vdd1" for VDD1 SMPS + - "ti,twl4030-vdd2" for VDD2 SMPS + - "ti,twl4030-vintana1" for VINTANA1 LDO + - "ti,twl4030-vintdig" for VINTDIG LDO + - "ti,twl4030-vusb1v5" for VUSB1V5 LDO + - "ti,twl4030-vusb1v8" for VUSB1V8 LDO + - "ti,twl4030-vusb3v1" for VUSB3V1 LDO + +Optional properties: +- Any optional property defined in bindings/regulator/regulator.txt + +Example: + + xyz: regulator@0 { + compatible = "ti,twl6030-vaux1"; + regulator-min-microvolt = <1000000>; + regulator-max-microvolt = <3000000>; + }; diff --git a/Documentation/devicetree/bindings/rtc/sa1100-rtc.txt b/Documentation/devicetree/bindings/rtc/sa1100-rtc.txt new file mode 100644 index 000000000000..0cda19ad4859 --- /dev/null +++ b/Documentation/devicetree/bindings/rtc/sa1100-rtc.txt @@ -0,0 +1,17 @@ +* Marvell Real Time Clock controller + +Required properties: +- compatible: should be "mrvl,sa1100-rtc" +- reg: physical base address of the controller and length of memory mapped + region. +- interrupts: Should be two. The first interrupt number is the rtc alarm + interrupt and the second interrupt number is the rtc hz interrupt. +- interrupt-names: Assign name of irq resource. 
+ +Example: + rtc: rtc@d4010000 { + compatible = "mrvl,mmp-rtc"; + reg = <0xd4010000 0x1000>; + interrupts = <5>, <6>; + interrupt-name = "rtc 1Hz", "rtc alarm"; + }; diff --git a/Documentation/devicetree/bindings/serial/mrvl-serial.txt b/Documentation/devicetree/bindings/serial/mrvl-serial.txt new file mode 100644 index 000000000000..d744340de887 --- /dev/null +++ b/Documentation/devicetree/bindings/serial/mrvl-serial.txt @@ -0,0 +1,4 @@ +PXA UART controller + +Required properties: +- compatible : should be "mrvl,mmp-uart" or "mrvl,pxa-uart". diff --git a/Documentation/devicetree/bindings/sound/alc5632.txt b/Documentation/devicetree/bindings/sound/alc5632.txt new file mode 100644 index 000000000000..8608f747dcfe --- /dev/null +++ b/Documentation/devicetree/bindings/sound/alc5632.txt @@ -0,0 +1,24 @@ +ALC5632 audio CODEC + +This device supports I2C only. + +Required properties: + + - compatible : "realtek,alc5632" + + - reg : the I2C address of the device. + + - gpio-controller : Indicates this device is a GPIO controller. + + - #gpio-cells : Should be two. The first cell is the pin number and the + second cell is used to specify optional parameters (currently unused). + +Example: + +alc5632: alc5632@1e { + compatible = "realtek,alc5632"; + reg = <0x1a>; + + gpio-controller; + #gpio-cells = <2>; +}; diff --git a/Documentation/devicetree/bindings/sound/imx-audmux.txt b/Documentation/devicetree/bindings/sound/imx-audmux.txt new file mode 100644 index 000000000000..215aa9817213 --- /dev/null +++ b/Documentation/devicetree/bindings/sound/imx-audmux.txt @@ -0,0 +1,13 @@ +Freescale Digital Audio Mux (AUDMUX) device + +Required properties: +- compatible : "fsl,imx21-audmux" for AUDMUX version firstly used on i.MX21, + or "fsl,imx31-audmux" for the version firstly used on i.MX31. 
+- reg : Should contain AUDMUX registers location and length + +Example: + +audmux@021d8000 { + compatible = "fsl,imx6q-audmux", "fsl,imx31-audmux"; + reg = <0x021d8000 0x4000>; +}; diff --git a/Documentation/devicetree/bindings/sound/soc/codecs/fsl-sgtl5000.txt b/Documentation/devicetree/bindings/sound/sgtl5000.txt index 2c3cd413f042..9cc44449508d 100644 --- a/Documentation/devicetree/bindings/sound/soc/codecs/fsl-sgtl5000.txt +++ b/Documentation/devicetree/bindings/sound/sgtl5000.txt @@ -3,6 +3,8 @@ Required properties: - compatible : "fsl,sgtl5000". +- reg : the I2C address of the device + Example: codec: sgtl5000@0a { diff --git a/Documentation/devicetree/bindings/sound/tegra-audio-alc5632.txt b/Documentation/devicetree/bindings/sound/tegra-audio-alc5632.txt new file mode 100644 index 000000000000..b77a97c9101e --- /dev/null +++ b/Documentation/devicetree/bindings/sound/tegra-audio-alc5632.txt @@ -0,0 +1,59 @@ +NVIDIA Tegra audio complex + +Required properties: +- compatible : "nvidia,tegra-audio-alc5632" +- nvidia,model : The user-visible name of this sound complex. +- nvidia,audio-routing : A list of the connections between audio components. + Each entry is a pair of strings, the first being the connection's sink, + the second being the connection's source. 
Valid names for sources and + sinks are the ALC5632's pins: + + ALC5632 pins: + + * SPK_OUTP + * SPK_OUTN + * HP_OUT_L + * HP_OUT_R + * AUX_OUT_P + * AUX_OUT_N + * LINE_IN_L + * LINE_IN_R + * PHONE_P + * PHONE_N + * MIC1_P + * MIC1_N + * MIC2_P + * MIC2_N + * MICBIAS1 + * DMICDAT + + Board connectors: + + * Headset Stereophone + * Int Spk + * Headset Mic + * Digital Mic + +- nvidia,i2s-controller : The phandle of the Tegra I2S controller +- nvidia,audio-codec : The phandle of the ALC5632 audio codec + +Example: + +sound { + compatible = "nvidia,tegra-audio-alc5632-paz00", + "nvidia,tegra-audio-alc5632"; + + nvidia,model = "Compal PAZ00"; + + nvidia,audio-routing = + "Int Spk", "SPK_OUTP", + "Int Spk", "SPK_OUTN", + "Headset Mic","MICBIAS1", + "MIC1_N", "Headset Mic", + "MIC1_P", "Headset Mic", + "Headset Stereophone", "HP_OUT_R", + "Headset Stereophone", "HP_OUT_L"; + + nvidia,i2s-controller = <&tegra_i2s1>; + nvidia,audio-codec = <&alc5632>; +}; diff --git a/Documentation/devicetree/bindings/spi/omap-spi.txt b/Documentation/devicetree/bindings/spi/omap-spi.txt new file mode 100644 index 000000000000..81df374adbb9 --- /dev/null +++ b/Documentation/devicetree/bindings/spi/omap-spi.txt @@ -0,0 +1,20 @@ +OMAP2+ McSPI device + +Required properties: +- compatible : + - "ti,omap2-spi" for OMAP2 & OMAP3. + - "ti,omap4-spi" for OMAP4+. +- ti,spi-num-cs : Number of chipselect supported by the instance. 
+- ti,hwmods: Name of the hwmod associated to the McSPI + + +Example: + +mcspi1: mcspi@1 { + #address-cells = <1>; + #size-cells = <0>; + compatible = "ti,omap4-mcspi"; + ti,hwmods = "mcspi1"; + ti,spi-num-cs = <4>; +}; + diff --git a/Documentation/devicetree/bindings/tty/serial/efm32-uart.txt b/Documentation/devicetree/bindings/tty/serial/efm32-uart.txt new file mode 100644 index 000000000000..6588b6950a7f --- /dev/null +++ b/Documentation/devicetree/bindings/tty/serial/efm32-uart.txt @@ -0,0 +1,14 @@ +* Energymicro efm32 UART + +Required properties: +- compatible : Should be "efm32,uart" +- reg : Address and length of the register set +- interrupts : Should contain uart interrupt + +Example: + +uart@0x4000c400 { + compatible = "efm32,uart"; + reg = <0x4000c400 0x400>; + interrupts = <15>; +}; diff --git a/Documentation/devicetree/bindings/usb/atmel-usb.txt b/Documentation/devicetree/bindings/usb/atmel-usb.txt new file mode 100644 index 000000000000..60bd2150a3e6 --- /dev/null +++ b/Documentation/devicetree/bindings/usb/atmel-usb.txt @@ -0,0 +1,49 @@ +Atmel SOC USB controllers + +OHCI + +Required properties: + - compatible: Should be "atmel,at91rm9200-ohci" for USB controllers + used in host mode. + - num-ports: Number of ports. + - atmel,vbus-gpio: If present, specifies a gpio that needs to be + activated for the bus to be powered. + - atmel,oc-gpio: If present, specifies a gpio that needs to be + activated for the overcurrent detection. + +usb0: ohci@00500000 { + compatible = "atmel,at91rm9200-ohci", "usb-ohci"; + reg = <0x00500000 0x100000>; + interrupts = <20 4>; + num-ports = <2>; +}; + +EHCI + +Required properties: + - compatible: Should be "atmel,at91sam9g45-ehci" for USB controllers + used in host mode. 
+ +usb1: ehci@00800000 { + compatible = "atmel,at91sam9g45-ehci", "usb-ehci"; + reg = <0x00800000 0x100000>; + interrupts = <22 4>; +}; + +AT91 USB device controller + +Required properties: + - compatible: Should be "atmel,at91rm9200-udc" + - reg: Address and length of the register set for the device + - interrupts: Should contain macb interrupt + +Optional properties: + - atmel,vbus-gpio: If present, specifies a gpio that needs to be + activated for the bus to be powered. + +usb1: gadget@fffa4000 { + compatible = "atmel,at91rm9200-udc"; + reg = <0xfffa4000 0x4000>; + interrupts = <10 4>; + atmel,vbus-gpio = <&pioC 5 0>; +}; diff --git a/Documentation/devicetree/bindings/usb/tegra-usb.txt b/Documentation/devicetree/bindings/usb/tegra-usb.txt index 035d63d5646d..007005ddbe12 100644 --- a/Documentation/devicetree/bindings/usb/tegra-usb.txt +++ b/Documentation/devicetree/bindings/usb/tegra-usb.txt @@ -11,3 +11,16 @@ Required properties : - phy_type : Should be one of "ulpi" or "utmi". - nvidia,vbus-gpio : If present, specifies a gpio that needs to be activated for the bus to be powered. + +Optional properties: + - dr_mode : dual role mode. Indicates the working mode for + nvidia,tegra20-ehci compatible controllers. Can be "host", "peripheral", + or "otg". Default to "host" if not defined for backward compatibility. + host means this is a host controller + peripheral means it is device controller + otg means it can operate as either ("on the go") + - nvidia,has-legacy-mode : boolean indicates whether this controller can + operate in legacy mode (as APX 2500 / 2600). In legacy mode some + registers are accessed through the APB_MISC base address instead of + the USB controller. Since this is a legacy issue it probably does not + warrant a compatible string of its own. 
diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt index a20008ab319a..82ac057a24a9 100644 --- a/Documentation/devicetree/bindings/vendor-prefixes.txt +++ b/Documentation/devicetree/bindings/vendor-prefixes.txt @@ -34,6 +34,7 @@ picochip Picochip Ltd powervr Imagination Technologies qcom Qualcomm, Inc. ramtron Ramtron International +realtek Realtek Semiconductor Corp. samsung Samsung Semiconductor sbs Smart Battery System schindler Schindler diff --git a/Documentation/devicetree/booting-without-of.txt b/Documentation/devicetree/booting-without-of.txt index 7c1329de0596..da0bfeb4253d 100644 --- a/Documentation/devicetree/booting-without-of.txt +++ b/Documentation/devicetree/booting-without-of.txt @@ -169,7 +169,7 @@ it with special cases. b) Entry with a flattened device-tree block. Firmware loads the physical address of the flattened device tree block (dtb) into r2, - r1 is not used, but it is considered good practise to use a valid + r1 is not used, but it is considered good practice to use a valid machine number as described in Documentation/arm/Booting. r0 : 0 diff --git a/Documentation/devicetree/usage-model.txt b/Documentation/devicetree/usage-model.txt new file mode 100644 index 000000000000..c5a80099b71c --- /dev/null +++ b/Documentation/devicetree/usage-model.txt @@ -0,0 +1,412 @@ +Linux and the Device Tree +------------------------- +The Linux usage model for device tree data + +Author: Grant Likely <grant.likely@secretlab.ca> + +This article describes how Linux uses the device tree. An overview of +the device tree data format can be found on the device tree usage page +at devicetree.org[1]. + +[1] http://devicetree.org/Device_Tree_Usage + +The "Open Firmware Device Tree", or simply Device Tree (DT), is a data +structure and language for describing hardware. 
More specifically, it +is a description of hardware that is readable by an operating system +so that the operating system doesn't need to hard code details of the +machine. + +Structurally, the DT is a tree, or acyclic graph with named nodes, and +nodes may have an arbitrary number of named properties encapsulating +arbitrary data. A mechanism also exists to create arbitrary +links from one node to another outside of the natural tree structure. + +Conceptually, a common set of usage conventions, called 'bindings', +is defined for how data should appear in the tree to describe typical +hardware characteristics including data busses, interrupt lines, GPIO +connections, and peripheral devices. + +As much as possible, hardware is described using existing bindings to +maximize use of existing support code, but since property and node +names are simply text strings, it is easy to extend existing bindings +or create new ones by defining new nodes and properties. Be wary, +however, of creating a new binding without first doing some homework +about what already exists. There are currently two different, +incompatible, bindings for i2c busses that came about because the new +binding was created without first investigating how i2c devices were +already being enumerated in existing systems. + +1. History +---------- +The DT was originally created by Open Firmware as part of the +communication method for passing data from Open Firmware to a client +program (like to an operating system). An operating system used the +Device Tree to discover the topology of the hardware at runtime, and +thereby support a majority of available hardware without hard coded +information (assuming drivers were available for all devices). + +Since Open Firmware is commonly used on PowerPC and SPARC platforms, +the Linux support for those architectures has for a long time used the +Device Tree. 
+ +In 2005, when PowerPC Linux began a major cleanup and to merge 32-bit +and 64-bit support, the decision was made to require DT support on all +powerpc platforms, regardless of whether or not they used Open +Firmware. To do this, a DT representation called the Flattened Device +Tree (FDT) was created which could be passed to the kernel as a binary +blob without requiring a real Open Firmware implementation. U-Boot, +kexec, and other bootloaders were modified to support both passing a +Device Tree Binary (dtb) and to modify a dtb at boot time. DT was +also added to the PowerPC boot wrapper (arch/powerpc/boot/*) so that +a dtb could be wrapped up with the kernel image to support booting +existing non-DT aware firmware. + +Some time later, FDT infrastructure was generalized to be usable by +all architectures. At the time of this writing, 6 mainlined +architectures (arm, microblaze, mips, powerpc, sparc, and x86) and 1 +out of mainline (nios) have some level of DT support. + +2. Data Model +------------- +If you haven't already read the Device Tree Usage[1] page, +then go read it now. It's okay, I'll wait.... + +2.1 High Level View +------------------- +The most important thing to understand is that the DT is simply a data +structure that describes the hardware. There is nothing magical about +it, and it doesn't magically make all hardware configuration problems +go away. What it does do is provide a language for decoupling the +hardware configuration from the board and device driver support in the +Linux kernel (or any other operating system for that matter). Using +it allows board and device support to become data driven; to make +setup decisions based on data passed into the kernel instead of on +per-machine hard coded selections. + +Ideally, data driven platform setup should result in less code +duplication and make it easier to support a wide range of hardware +with a single kernel image. 
+ +Linux uses DT data for three major purposes: +1) platform identification, +2) runtime configuration, and +3) device population. + +2.2 Platform Identification +--------------------------- +First and foremost, the kernel will use data in the DT to identify the +specific machine. In a perfect world, the specific platform shouldn't +matter to the kernel because all platform details would be described +perfectly by the device tree in a consistent and reliable manner. +Hardware is not perfect though, and so the kernel must identify the +machine during early boot so that it has the opportunity to run +machine-specific fixups. + +In the majority of cases, the machine identity is irrelevant, and the +kernel will instead select setup code based on the machine's core +CPU or SoC. On ARM for example, setup_arch() in +arch/arm/kernel/setup.c will call setup_machine_fdt() in +arch/arm/kernel/devicetree.c which searches through the machine_desc +table and selects the machine_desc which best matches the device tree +data. It determines the best match by looking at the 'compatible' +property in the root device tree node, and comparing it with the +dt_compat list in struct machine_desc. + +The 'compatible' property contains a sorted list of strings starting +with the exact name of the machine, followed by an optional list of +boards it is compatible with sorted from most compatible to least. For +example, the root compatible properties for the TI BeagleBoard and its +successor, the BeagleBoard xM board might look like: + + compatible = "ti,omap3-beagleboard", "ti,omap3450", "ti,omap3"; + compatible = "ti,omap3-beagleboard-xm", "ti,omap3450", "ti,omap3"; + +Where "ti,omap3-beagleboard-xm" specifies the exact model, it also +claims that it is compatible with the OMAP 3450 SoC, and the omap3 family +of SoCs in general. You'll notice that the list is sorted from most +specific (exact board) to least specific (SoC family).
+ +Astute readers might point out that the Beagle xM could also claim +compatibility with the original Beagle board. However, one should be +cautioned about doing so at the board level since there is typically a +high level of change from one board to another, even within the same +product line, and it is hard to nail down exactly what is meant when one +board claims to be compatible with another. For the top level, it is +better to err on the side of caution and not claim one board is +compatible with another. The notable exception would be when one +board is a carrier for another, such as a CPU module attached to a +carrier board. + +One more note on compatible values. Any string used in a compatible +property must be documented as to what it indicates. Add +documentation for compatible strings in Documentation/devicetree/bindings. + +Again on ARM, for each machine_desc, the kernel looks to see if +any of the dt_compat list entries appear in the compatible property. +If one does, then that machine_desc is a candidate for driving the +machine. After searching the entire table of machine_descs, +setup_machine_fdt() returns the 'most compatible' machine_desc based +on which entry in the compatible property each machine_desc matches +against. If no matching machine_desc is found, then it returns NULL. + +The reasoning behind this scheme is the observation that in the majority +of cases, a single machine_desc can support a large number of boards +if they all use the same SoC, or same family of SoCs. However, +invariably there will be some exceptions where a specific board will +require special setup code that is not useful in the generic case. +Special cases could be handled by explicitly checking for the +troublesome board(s) in generic setup code, but doing so very quickly +becomes ugly and/or unmaintainable if it is more than just a couple of +cases. 
+ +Instead, the compatible list allows a generic machine_desc to provide +support for a wide common set of boards by specifying "less +compatible" values in the dt_compat list. In the example above, +generic board support can claim compatibility with "ti,omap3" or +"ti,omap3450". If a bug was discovered on the original beagleboard +that required special workaround code during early boot, then a new +machine_desc could be added which implements the workarounds and only +matches on "ti,omap3-beagleboard". + +PowerPC uses a slightly different scheme where it calls the .probe() +hook from each machine_desc, and the first one returning TRUE is used. +However, this approach does not take into account the priority of the +compatible list, and probably should be avoided for new architecture +support. + +2.3 Runtime configuration +------------------------- +In most cases, a DT will be the sole method of communicating data from +firmware to the kernel, so it also gets used to pass in runtime and +configuration data like the kernel parameters string and the location +of an initrd image. + +Most of this data is contained in the /chosen node, and when booting +Linux it will look something like this: + + chosen { + bootargs = "console=ttyS0,115200 loglevel=8"; + initrd-start = <0xc8000000>; + initrd-end = <0xc8200000>; + }; + +The bootargs property contains the kernel arguments, and the initrd-* +properties define the address and size of an initrd blob. The +chosen node may also optionally contain an arbitrary number of +additional properties for platform-specific configuration data. + +During early boot, the architecture setup code calls of_scan_flat_dt() +several times with different helper callbacks to parse device tree +data before paging is set up. The of_scan_flat_dt() code scans through +the device tree and uses the helpers to extract information required +during early boot.
Typically the early_init_dt_scan_chosen() helper +is used to parse the chosen node including kernel parameters, +early_init_dt_scan_root() to initialize the DT address space model, +and early_init_dt_scan_memory() to determine the size and +location of usable RAM. + +On ARM, the function setup_machine_fdt() is responsible for early +scanning of the device tree after selecting the correct machine_desc +that supports the board. + +2.4 Device population +--------------------- +After the board has been identified, and after the early configuration data +has been parsed, then kernel initialization can proceed in the normal +way. At some point in this process, unflatten_device_tree() is called +to convert the data into a more efficient runtime representation. +This is also when machine-specific setup hooks will get called, like +the machine_desc .init_early(), .init_irq() and .init_machine() hooks +on ARM. The remainder of this section uses examples from the ARM +implementation, but all architectures will do pretty much the same +thing when using a DT. + +As can be guessed by the names, .init_early() is used for any machine- +specific setup that needs to be executed early in the boot process, +and .init_irq() is used to set up interrupt handling. Using a DT +doesn't materially change the behaviour of either of these functions. +If a DT is provided, then both .init_early() and .init_irq() are able +to call any of the DT query functions (of_* in include/linux/of*.h) to +get additional data about the platform. + +The most interesting hook in the DT context is .init_machine() which +is primarily responsible for populating the Linux device model with +data about the platform. Historically this has been implemented on +embedded platforms by defining a set of static clock structures, +platform_devices, and other data in the board support .c file, and +registering it en-masse in .init_machine(). 
When DT is used, then +instead of hard coding static devices for each platform, the list of +devices can be obtained by parsing the DT, and allocating device +structures dynamically. + +The simplest case is when .init_machine() is only responsible for +registering a block of platform_devices. A platform_device is a concept +used by Linux for memory or I/O mapped devices which cannot be detected +by hardware, and for 'composite' or 'virtual' devices (more on those +later). While there is no 'platform device' terminology for the DT, +platform devices roughly correspond to device nodes at the root of the +tree and children of simple memory mapped bus nodes. + +About now is a good time to lay out an example. Here is part of the +device tree for the NVIDIA Tegra board. + +/{ + compatible = "nvidia,harmony", "nvidia,tegra20"; + #address-cells = <1>; + #size-cells = <1>; + interrupt-parent = <&intc>; + + chosen { }; + aliases { }; + + memory { + device_type = "memory"; + reg = <0x00000000 0x40000000>; + }; + + soc { + compatible = "nvidia,tegra20-soc", "simple-bus"; + #address-cells = <1>; + #size-cells = <1>; + ranges; + + intc: interrupt-controller@50041000 { + compatible = "nvidia,tegra20-gic"; + interrupt-controller; + #interrupt-cells = <1>; + reg = <0x50041000 0x1000>, < 0x50040100 0x0100 >; + }; + + serial@70006300 { + compatible = "nvidia,tegra20-uart"; + reg = <0x70006300 0x100>; + interrupts = <122>; + }; + + i2s1: i2s@70002800 { + compatible = "nvidia,tegra20-i2s"; + reg = <0x70002800 0x100>; + interrupts = <77>; + codec = <&wm8903>; + }; + + i2c@7000c000 { + compatible = "nvidia,tegra20-i2c"; + #address-cells = <1>; + #size-cells = <0>; + reg = <0x7000c000 0x100>; + interrupts = <70>; + + wm8903: codec@1a { + compatible = "wlf,wm8903"; + reg = <0x1a>; + interrupts = <347>; + }; + }; + }; + + sound { + compatible = "nvidia,harmony-sound"; + i2s-controller = <&i2s1>; + i2s-codec = <&wm8903>; + }; +}; + +At .init_machine() time, Tegra board support code will need
to look at +this DT and decide which nodes to create platform_devices for. +However, looking at the tree, it is not immediately obvious what kind +of device each node represents, or even if a node represents a device +at all. The /chosen, /aliases, and /memory nodes are informational +nodes that don't describe devices (although arguably memory could be +considered a device). The children of the /soc node are memory mapped +devices, but the codec@1a is an i2c device, and the sound node +represents not a device, but rather how other devices are connected +together to create the audio subsystem. I know what each device is +because I'm familiar with the board design, but how does the kernel +know what to do with each node? + +The trick is that the kernel starts at the root of the tree and looks +for nodes that have a 'compatible' property. First, it is generally +assumed that any node with a 'compatible' property represents a device +of some kind, and second, it can be assumed that any node at the root +of the tree is either directly attached to the processor bus, or is a +miscellaneous system device that cannot be described any other way. +For each of these nodes, Linux allocates and registers a +platform_device, which in turn may get bound to a platform_driver. + +Why is using a platform_device for these nodes a safe assumption? +Well, for the way that Linux models devices, just about all bus_types +assume that its devices are children of a bus controller. For +example, each i2c_client is a child of an i2c_master. Each spi_device +is a child of an SPI bus. Similarly for USB, PCI, MDIO, etc. The +same hierarchy is also found in the DT, where I2C device nodes only +ever appear as children of an I2C bus node. Ditto for SPI, MDIO, USB, +etc. The only devices which do not require a specific type of parent +device are platform_devices (and amba_devices, but more on that +later), which will happily live at the base of the Linux /sys/devices +tree. 
Therefore, if a DT node is at the root of the tree, then it +is probably best registered as a platform_device. + +Linux board support code calls of_platform_populate(NULL, NULL, NULL, NULL) +to kick off discovery of devices at the root of the tree. The +parameters are all NULL because when starting from the root of the +tree, there is no need to provide a starting node (the first NULL), a +parent struct device (the last NULL), and we're not using a match +table (yet). For a board that only needs to register devices, +.init_machine() can be completely empty except for the +of_platform_populate() call. + +In the Tegra example, this accounts for the /soc and /sound nodes, but +what about the children of the SoC node? Shouldn't they be registered +as platform devices too? For Linux DT support, the generic behaviour +is for child devices to be registered by the parent's device driver at +driver .probe() time. So, an i2c bus device driver will register an +i2c_client for each child node, an SPI bus driver will register +its spi_device children, and similarly for other bus_types. +According to that model, a driver could be written that binds to the +SoC node and simply registers platform_devices for each of its +children. The board support code would allocate and register an SoC +device, a (theoretical) SoC device driver could bind to the SoC device, +and register platform_devices for /soc/interrupt-controller, /soc/serial, +/soc/i2s, and /soc/i2c in its .probe() hook. Easy, right? + +Actually, it turns out that registering children of some +platform_devices as more platform_devices is a common pattern, and the +device tree support code reflects that and makes the above example +simpler. The second argument to of_platform_populate() is an +of_device_id table, and any node that matches an entry in that table +will also get its child nodes registered. In the Tegra case, the code +can look something like this: + +static void __init harmony_init_machine(void) +{ + /* ...
*/ + of_platform_populate(NULL, of_default_bus_match_table, NULL, NULL); +} + +"simple-bus" is defined in the ePAPR 1.0 specification as a property +meaning a simple memory mapped bus, so the of_platform_populate() code +could be written to just assume simple-bus compatible nodes will +always be traversed. However, we pass it in as an argument so that +board support code can always override the default behaviour. + +[Need to add discussion of adding i2c/spi/etc child devices] + +Appendix A: AMBA devices +------------------------ + +ARM Primecells are a certain kind of device attached to the ARM AMBA +bus which include some support for hardware detection and power +management. In Linux, struct amba_device and the amba_bus_type is +used to represent Primecell devices. However, the fiddly bit is that +not all devices on an AMBA bus are Primecells, and for Linux it is +typical for both amba_device and platform_device instances to be +siblings of the same bus segment. + +When using the DT, this creates problems for of_platform_populate() +because it must decide whether to register each node as either a +platform_device or an amba_device. This unfortunately complicates the +device creation model a little bit, but the solution turns out not to +be too invasive. If a node is compatible with "arm,amba-primecell", then +of_platform_populate() will register it as an amba_device instead of a +platform_device. diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt index 225f96d88f55..3bbd5c51605a 100644 --- a/Documentation/dma-buf-sharing.txt +++ b/Documentation/dma-buf-sharing.txt @@ -32,8 +32,12 @@ The buffer-user *IMPORTANT*: [see https://lkml.org/lkml/2011/12/20/211 for more details] For this first version, A buffer shared using the dma_buf sharing API: - *may* be exported to user space using "mmap" *ONLY* by exporter, outside of - this framework. -- may be used *ONLY* by importers that do not need CPU access to the buffer. + this framework. 
+- with this new iteration of the dma-buf api cpu access from the kernel has been + enabled, see below for the details. + +dma-buf operations for device dma only +-------------------------------------- The dma_buf buffer sharing API usage contains the following steps: @@ -219,10 +223,120 @@ NOTES: If the exporter chooses not to allow an attach() operation once a map_dma_buf() API has been called, it simply returns an error. -Miscellaneous notes: +Kernel cpu access to a dma-buf buffer object +-------------------------------------------- + +The motivations for allowing cpu access from the kernel to a dma-buf object from the +importer's side are: +- fallback operations, e.g. if the device is connected to a usb bus and the + kernel needs to shuffle the data around first before sending it away. +- full transparency for existing users on the importer side, i.e. userspace + should not notice the difference between a normal object from that subsystem + and an imported one backed by a dma-buf. This is really important for drm + opengl drivers that expect to still use all the existing upload/download + paths. + +Access to a dma_buf from the kernel context involves three steps: + +1. Prepare access, which invalidates any necessary caches and makes the object + available for cpu access. +2. Access the object page-by-page with the dma_buf map apis. +3. Finish access, which will flush any necessary cpu caches and free reserved + resources. + +1. Prepare access + + Before an importer can access a dma_buf object with the cpu from the kernel + context, it needs to notify the exporter of the access that is about to + happen. + + Interface: + int dma_buf_begin_cpu_access(struct dma_buf *dmabuf, + size_t start, size_t len, + enum dma_data_direction direction) + + This allows the exporter to ensure that the memory is actually available for + cpu access - the exporter might need to allocate or swap-in and pin the + backing storage.
The exporter also needs to ensure that cpu access is + coherent for the given range and access direction. The range and access + direction can be used by the exporter to optimize the cache flushing, i.e. + access outside of the range or with a different direction (read instead of + write) might return stale or even bogus data (e.g. when the exporter needs to + copy the data to temporary storage). + + This step might fail, e.g. in oom conditions. + +2. Accessing the buffer + + To support dma_buf objects residing in highmem cpu access is page-based using + an api similar to kmap. Accessing a dma_buf is done in aligned chunks of + PAGE_SIZE size. Before accessing a chunk it needs to be mapped, which returns + a pointer in kernel virtual address space. Afterwards the chunk needs to be + unmapped again. There is no limit on how often a given chunk can be mapped + and unmapped, i.e. the importer does not need to call begin_cpu_access again + before mapping the same chunk again. + + Interfaces: + void *dma_buf_kmap(struct dma_buf *, unsigned long); + void dma_buf_kunmap(struct dma_buf *, unsigned long, void *); + + There are also atomic variants of these interfaces. Like for kmap they + facilitate non-blocking fast-paths. Neither the importer nor the exporter (in + the callback) is allowed to block when using these. + + Interfaces: + void *dma_buf_kmap_atomic(struct dma_buf *, unsigned long); + void dma_buf_kunmap_atomic(struct dma_buf *, unsigned long, void *); + + For importers all the restrictions of using kmap apply, like the limited + supply of kmap_atomic slots. Hence an importer shall only hold onto at most 2 + atomic dma_buf kmaps at the same time (in any given process context). + + dma_buf kmap calls outside of the range specified in begin_cpu_access are + undefined. If the range is not PAGE_SIZE aligned, kmap needs to succeed on + the partial chunks at the beginning and end but may return stale or bogus + data outside of the range (in these partial chunks). 
+ + Note that these calls need to always succeed. The exporter needs to complete + any preparations that might fail in begin_cpu_access. + +3. Finish access + + When the importer is done accessing the range specified in begin_cpu_access, + it needs to announce this to the exporter (to facilitate cache flushing and + unpinning of any pinned resources). The result of any dma_buf kmap calls + after end_cpu_access is undefined. + + Interface: + void dma_buf_end_cpu_access(struct dma_buf *dma_buf, + size_t start, size_t len, + enum dma_data_direction dir); + + +Miscellaneous notes +------------------- + - Any exporters or users of the dma-buf buffer sharing framework must have a 'select DMA_SHARED_BUFFER' in their respective Kconfigs. +- In order to avoid fd leaks on exec, the FD_CLOEXEC flag must be set + on the file descriptor. This is not just a resource leak, but a + potential security hole. It could give the newly exec'd application + access to buffers, via the leaked fd, to which it should otherwise + not be permitted access. + + The problem with doing this via a separate fcntl() call, versus doing it + atomically when the fd is created, is that this is inherently racy in a + multi-threaded app[3]. The issue is made worse when it is library code + opening/creating the file descriptor, as the application may not even be + aware of the fds. + + To avoid this problem, userspace must have a way to request that the O_CLOEXEC + flag be set when the dma-buf fd is created. So any API provided by + the exporting driver to create a dmabuf fd must provide a way to let + userspace control setting of the O_CLOEXEC flag passed in to dma_buf_fd().
+ References: [1] struct dma_buf_ops in include/linux/dma-buf.h [2] All interfaces mentioned above defined in include/linux/dma-buf.h +[3] https://lwn.net/Articles/236486/ diff --git a/Documentation/dmaengine.txt b/Documentation/dmaengine.txt index bbe6cb3d1856..879b6e31e2da 100644 --- a/Documentation/dmaengine.txt +++ b/Documentation/dmaengine.txt @@ -63,7 +63,7 @@ The slave DMA usage consists of following steps: struct dma_slave_config *config) Please see the dma_slave_config structure definition in dmaengine.h - for a detailed explaination of the struct members. Please note + for a detailed explanation of the struct members. Please note that the 'direction' member will be going away as it duplicates the direction given in the prepare call. diff --git a/Documentation/dontdiff b/Documentation/dontdiff index 0c083c5c2faa..b4a898f43c37 100644 --- a/Documentation/dontdiff +++ b/Documentation/dontdiff @@ -158,7 +158,6 @@ logo_*.c logo_*_clut224.c logo_*_mono.c lxdialog -mach mach-types mach-types.h machtypes.h diff --git a/Documentation/driver-model/devres.txt b/Documentation/driver-model/devres.txt index 41c0c5d1ba14..2a596a4fc23e 100644 --- a/Documentation/driver-model/devres.txt +++ b/Documentation/driver-model/devres.txt @@ -271,3 +271,8 @@ IOMAP pcim_iounmap() pcim_iomap_table() : array of mapped addresses indexed by BAR pcim_iomap_regions() : do request_region() and iomap() on multiple BARs + +REGULATOR + devm_regulator_get() + devm_regulator_put() + devm_regulator_bulk_get() diff --git a/Documentation/dynamic-debug-howto.txt b/Documentation/dynamic-debug-howto.txt index f959909d7154..74e6c7782678 100644 --- a/Documentation/dynamic-debug-howto.txt +++ b/Documentation/dynamic-debug-howto.txt @@ -12,7 +12,7 @@ dynamically enabled per-callsite. 
Dynamic debug has even more useful features: * Simple query language allows turning on and off debugging statements by - matching any combination of: + matching any combination of 0 or 1 of: - source filename - function name @@ -79,31 +79,24 @@ Command Language Reference ========================== At the lexical level, a command comprises a sequence of words separated -by whitespace characters. Note that newlines are treated as word -separators and do *not* end a command or allow multiple commands to -be done together. So these are all equivalent: +by spaces or tabs. So these are all equivalent: nullarbor:~ # echo -c 'file svcsock.c line 1603 +p' > <debugfs>/dynamic_debug/control nullarbor:~ # echo -c ' file svcsock.c line 1603 +p ' > <debugfs>/dynamic_debug/control -nullarbor:~ # echo -c 'file svcsock.c\nline 1603 +p' > - <debugfs>/dynamic_debug/control nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' > <debugfs>/dynamic_debug/control -Commands are bounded by a write() system call. If you want to do -multiple commands you need to do a separate "echo" for each, like: +Command submissions are bounded by a write() system call. +Multiple commands can be written together, separated by ';' or '\n'. -nullarbor:~ # echo 'file svcsock.c line 1603 +p' > /proc/dprintk ;\ -> echo 'file svcsock.c line 1563 +p' > /proc/dprintk + ~# echo "func pnpacpi_get_resources +p; func pnp_assign_mem +p" \ + > <debugfs>/dynamic_debug/control -or even like: +If your query set is big, you can batch them too: -nullarbor:~ # ( -> echo 'file svcsock.c line 1603 +p' ;\ -> echo 'file svcsock.c line 1563 +p' ;\ -> ) > /proc/dprintk + ~# cat query-batch-file > <debugfs>/dynamic_debug/control At the syntactical level, a command comprises a sequence of match specifications, followed by a flags change specification. @@ -144,11 +137,12 @@ func func svc_tcp_accept file - The given string is compared against either the full - pathname or the basename of the source file of each - callsite. 
Examples: + The given string is compared against either the full pathname, the + src-root relative pathname, or the basename of the source file of + each callsite. Examples: file svcsock.c + file kernel/freezer.c file /usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svcsock.c module diff --git a/Documentation/edac.txt b/Documentation/edac.txt index 249822cde82b..fdcc49fad8e1 100644 --- a/Documentation/edac.txt +++ b/Documentation/edac.txt @@ -334,8 +334,8 @@ Sdram memory scrubbing rate: Reading the file will return the actual scrubbing rate employed. - If configuration fails or memory scrubbing is not implemented, the value - of the attribute file will be -1. + If configuration fails or memory scrubbing is not implemented, accessing + that attribute will fail. diff --git a/Documentation/fb/intel810.txt b/Documentation/fb/intel810.txt index be3e7836abef..a8e9f5bca6f3 100644 --- a/Documentation/fb/intel810.txt +++ b/Documentation/fb/intel810.txt @@ -211,7 +211,7 @@ Using the same setup as described above, load the module like this: modprobe i810fb vram=2 xres=1024 bpp=8 hsync1=30 hsync2=55 vsync1=50 \ vsync2=85 accel=1 mtrr=1 -Or just add the following to /etc/modprobe.conf +Or just add the following to a configuration file in /etc/modprobe.d/ options i810fb vram=2 xres=1024 bpp=16 hsync1=30 hsync2=55 vsync1=50 \ vsync2=85 accel=1 mtrr=1 diff --git a/Documentation/fb/intelfb.txt b/Documentation/fb/intelfb.txt index dd9e944ea628..feac4e4d6968 100644 --- a/Documentation/fb/intelfb.txt +++ b/Documentation/fb/intelfb.txt @@ -120,7 +120,7 @@ Using the same setup as described above, load the module like this: modprobe intelfb mode=800x600-32@75 vram=8 accel=1 hwcursor=1 -Or just add the following to /etc/modprobe.conf +Or just add the following to a configuration file in /etc/modprobe.d/ options intelfb mode=800x600-32@75 vram=8 accel=1 hwcursor=1 diff --git a/Documentation/fb/matroxfb.txt b/Documentation/fb/matroxfb.txt index e5ce8a1a978b..b95f5bb522f2 
100644 --- a/Documentation/fb/matroxfb.txt +++ b/Documentation/fb/matroxfb.txt @@ -177,8 +177,8 @@ sgram - tells to driver that you have Gxx0 with SGRAM memory. It has no effect without `init'. sdram - tells to driver that you have Gxx0 with SDRAM memory. It is a default. -inv24 - change timings parameters for 24bpp modes on Millenium and - Millenium II. Specify this if you see strange color shadows around +inv24 - change timings parameters for 24bpp modes on Millennium and + Millennium II. Specify this if you see strange color shadows around characters. noinv24 - use standard timings. It is the default. inverse - invert colors on screen (for LCD displays) @@ -204,9 +204,9 @@ grayscale - enable grayscale summing. It works in PSEUDOCOLOR modes (text, can paint colors. nograyscale - disable grayscale summing. It is default. cross4MB - enables that pixel line can cross 4MB boundary. It is default for - non-Millenium. + non-Millennium. nocross4MB - pixel line must not cross 4MB boundary. It is default for - Millenium I or II, because of these devices have hardware + Millennium I or II, because of these devices have hardware limitations which do not allow this. But this option is incompatible with some (if not all yet released) versions of XF86_FBDev. diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 576257f180eb..e4b57756b9f5 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -6,14 +6,6 @@ be removed from this file. --------------------------- -What: x86 floppy disable_hlt -When: 2012 -Why: ancient workaround of dubious utility clutters the - code used by everybody else. 
-Who: Len Brown <len.brown@intel.com> - ---------------------------- - What: CONFIG_APM_CPU_IDLE, and its ability to call APM BIOS in idle When: 2012 Why: This optional sub-feature of APM is of dubious reliability, @@ -513,17 +505,40 @@ Who: Bjorn Helgaas <bhelgaas@google.com> ---------------------------- -What: The CAP9 SoC family will be removed -When: 3.4 -Files: arch/arm/mach-at91/at91cap9.c - arch/arm/mach-at91/at91cap9_devices.c - arch/arm/mach-at91/include/mach/at91cap9.h - arch/arm/mach-at91/include/mach/at91cap9_matrix.h - arch/arm/mach-at91/include/mach/at91cap9_ddrsdr.h - arch/arm/mach-at91/board-cap9adk.c -Why: The code is not actively maintained and platforms are now hard to find. -Who: Nicolas Ferre <nicolas.ferre@atmel.com> - Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> +What: Low Performance USB Block driver ("CONFIG_BLK_DEV_UB") +When: 3.6 +Why: This driver provides support for USB storage devices like "USB + sticks". As of now, it is deactivated in Debian, Fedora and + Ubuntu. All current users can switch over to usb-storage + (CONFIG_USB_STORAGE), whose only drawback is the additional SCSI + stack. +Who: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> + +---------------------------- + +What: kmap_atomic(page, km_type) +When: 3.5 +Why: The old kmap_atomic() with two arguments is deprecated; we only + keep it for backward compatibility for a few cycles and then drop it. +Who: Cong Wang <amwang@redhat.com> + +---------------------------- + +What: get_robust_list syscall +When: 2013 +Why: There appear to be no production users of the get_robust_list syscall, + and it runs the risk of leaking address locations, allowing the bypass + of ASLR. It was only ever intended for debugging, so it should be + removed. +Who: Kees Cook <keescook@chromium.org> + +---------------------------- + +What: setitimer accepts user NULL pointer (value) +When: 3.6 +Why: setitimer is not returning -EFAULT if the user pointer is NULL. This + violates the spec.
+Who: Sasikantha Babu <sasikanth.v19@gmail.com> ---------------------------- diff --git a/Documentation/filesystems/debugfs.txt b/Documentation/filesystems/debugfs.txt index 6872c91bce35..7a34f827989c 100644 --- a/Documentation/filesystems/debugfs.txt +++ b/Documentation/filesystems/debugfs.txt @@ -14,7 +14,10 @@ Debugfs is typically mounted with a command like: mount -t debugfs none /sys/kernel/debug -(Or an equivalent /etc/fstab line). +(Or an equivalent /etc/fstab line). +The debugfs root directory is accessible by anyone by default. To +restrict access to the tree the "uid", "gid" and "mode" mount +options can be used. Note that the debugfs API is exported GPL-only to modules. @@ -133,7 +136,7 @@ file. void __iomem *base; }; - struct dentry *debugfs_create_regset32(const char *name, mode_t mode, + struct dentry *debugfs_create_regset32(const char *name, umode_t mode, struct dentry *parent, struct debugfs_regset32 *regset); diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt index 10ec4639f152..1b7f9acbcbbe 100644 --- a/Documentation/filesystems/ext4.txt +++ b/Documentation/filesystems/ext4.txt @@ -144,9 +144,6 @@ journal_async_commit Commit block can be written to disk without waiting mount the device. This will enable 'journal_checksum' internally. -journal=update Update the ext4 file system's journal to the current - format. - journal_dev=devnum When the external journal device's major/minor numbers have changed, this option allows the user to specify the new journal location. The journal device is @@ -308,7 +305,7 @@ min_batch_time=usec This parameter sets the commit time (as fast disks, at the cost of increasing latency. journal_ioprio=prio The I/O priority (from 0 to 7, where 0 is the - highest priorty) which should be used for I/O + highest priority) which should be used for I/O operations submitted by kjournald2 during a commit operation. 
This defaults to 3, which is a slightly higher priority than the default I/O @@ -343,7 +340,7 @@ noinit_itable Do not initialize any uninitialized inode table init_itable=n The lazy itable init code will wait n times the number of milliseconds it took to zero out the previous block group's inode table. This - minimizes the impact on the systme performance + minimizes the impact on the system performance while file system's inode table is being initialized. discard Controls whether ext4 should issue discard/TRIM @@ -356,11 +353,6 @@ nouid32 Disables 32-bit UIDs and GIDs. This is for interoperability with older kernels which only store and expect 16-bit values. -resize Allows to resize filesystem to the end of the last - existing block group, further resize has to be done - with resize2fs either online, or offline. It can be - used only with conjunction with remount. - block_validity This options allows to enables/disables the in-kernel noblock_validity facility for tracking filesystem metadata blocks within internal data structures. This allows multi- diff --git a/Documentation/filesystems/files.txt b/Documentation/filesystems/files.txt index ac2facc50d2a..46dfc6b038c3 100644 --- a/Documentation/filesystems/files.txt +++ b/Documentation/filesystems/files.txt @@ -113,8 +113,8 @@ the fdtable structure - if (fd >= 0) { /* locate_fd() may have expanded fdtable, load the ptr */ fdt = files_fdtable(files); - FD_SET(fd, fdt->open_fds); - FD_CLR(fd, fdt->close_on_exec); + __set_open_fd(fd, fdt); + __clear_close_on_exec(fd, fdt); spin_unlock(&files->file_lock); ..... diff --git a/Documentation/filesystems/gfs2-uevents.txt b/Documentation/filesystems/gfs2-uevents.txt index d81889669293..19a19ebebc34 100644 --- a/Documentation/filesystems/gfs2-uevents.txt +++ b/Documentation/filesystems/gfs2-uevents.txt @@ -62,7 +62,7 @@ be fixed. The REMOVE uevent is generated at the end of an unsuccessful mount or at the end of a umount of the filesystem. 
All REMOVE uevents will -have been preceded by at least an ADD uevent for the same fileystem, +have been preceded by at least an ADD uevent for the same filesystem, and unlike the other uevents is generated automatically by the kernel's kobject subsystem. diff --git a/Documentation/filesystems/nfs/idmapper.txt b/Documentation/filesystems/nfs/idmapper.txt index 120fd3cf7fd9..fe03d10bb79a 100644 --- a/Documentation/filesystems/nfs/idmapper.txt +++ b/Documentation/filesystems/nfs/idmapper.txt @@ -4,13 +4,21 @@ ID Mapper ========= Id mapper is used by NFS to translate user and group ids into names, and to translate user and group names into ids. Part of this translation involves -performing an upcall to userspace to request the information. Id mapper will -user request-key to perform this upcall and cache the result. The program -/usr/sbin/nfs.idmap should be called by request-key, and will perform the -translation and initialize a key with the resulting information. +performing an upcall to userspace to request the information. There are two +ways NFS could obtain this information: placing a call to /sbin/request-key +or by placing a call to the rpc.idmap daemon. + +NFS will attempt to call /sbin/request-key first. If this succeeds, the +result will be cached using the generic request-key cache. This call should +only fail if /etc/request-key.conf is not configured for the id_resolver key +type, see the "Configuring" section below if you wish to use the request-key +method. + +If the call to /sbin/request-key fails (if /etc/request-key.conf is not +configured with the id_resolver key type), then the idmapper will ask the +legacy rpc.idmap daemon for the id mapping. This result will be stored +in a custom NFS idmap cache. - NFS_USE_NEW_IDMAPPER must be selected when configuring the kernel to use this - feature. 
=========== Configuring diff --git a/Documentation/filesystems/nfs/pnfs.txt b/Documentation/filesystems/nfs/pnfs.txt index 983e14abe7e9..c7919c6e3bea 100644 --- a/Documentation/filesystems/nfs/pnfs.txt +++ b/Documentation/filesystems/nfs/pnfs.txt @@ -53,3 +53,57 @@ lseg maintains an extra reference corresponding to the NFS_LSEG_VALID bit which holds it in the pnfs_layout_hdr's list. When the final lseg is removed from the pnfs_layout_hdr's list, the NFS_LAYOUT_DESTROYED bit is set, preventing any new lsegs from being added. + +layout drivers +-------------- + +pNFS uses what are called layout drivers. The STD defines 3 basic +layout types: "files", "objects" and "blocks". For each of these types +there is a layout-driver with a common function-vectors table whose +functions are called by the nfs-client pnfs-core to implement the different layout +types. + +Files-layout-driver code is in: fs/nfs/nfs4filelayout.c && nfs4filelayoutdev.c +Objects-layout-driver code is in: fs/nfs/objlayout/.. directory +Blocks-layout-driver code is in: fs/nfs/blocklayout/.. directory + +objects-layout setup +-------------------- + +As part of the full STD implementation the objlayoutdriver.ko needs, at times, +to automatically log in to as yet undiscovered iscsi/osd devices. For this the +driver makes upcalls to a user-mode script called *osd_login*. + +The path_name of the script to use is by default: + /sbin/osd_login. +This name can be overridden by the Kernel module parameter: + objlayoutdriver.osd_login_prog + +If the Kernel does not find the osd_login_prog path it will zero it out +and will not attempt further logins. An admin can then write a new value +to the objlayoutdriver.osd_login_prog Kernel parameter to re-enable it. + +The /sbin/osd_login is part of the nfs-utils package, and should usually +be installed on distributions that support this Kernel version. + +The API to the login script is as follows: + Usage: $0 -u <URI> -o <OSDNAME> -s <SYSTEMID> + Options: + -u target uri e.g.
iscsi://<ip>:<port> + (always exists) + (More protocols can be defined in the future. + The client does not interpret this string; it is + passed unchanged as received from the Server) + -o osdname of the requested target OSD + (Might be empty) + (A string which denotes the OSD name; there is a + limit of 64 chars on this string) + -s systemid of the requested target OSD + (Might be empty) + (This string, if not empty, is always a hex + representation of the 20-byte osd_system_id) + +blocks-layout setup +------------------- + +TODO: Document the setup needs of the blocks layout driver diff --git a/Documentation/filesystems/pohmelfs/network_protocol.txt b/Documentation/filesystems/pohmelfs/network_protocol.txt index 65e03dd44823..c680b4b5353d 100644 --- a/Documentation/filesystems/pohmelfs/network_protocol.txt +++ b/Documentation/filesystems/pohmelfs/network_protocol.txt @@ -20,7 +20,7 @@ Commands can be embedded into transaction command (which in turn has own command so one can extend protocol as needed without breaking backward compatibility as long as old commands are supported. All string lengths include tail 0 byte. -All commands are transferred over the network in big-endian. CPU endianess is used at the end peers. +All commands are transferred over the network in big-endian. CPU endianness is used at the end peers. @cmd - command number, which specifies command to be processed. Following commands are used currently: diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting index b4a3d765ff9a..74acd9618819 100644 --- a/Documentation/filesystems/porting +++ b/Documentation/filesystems/porting @@ -429,3 +429,9 @@ filemap_write_and_wait_range() so that all dirty pages are synced out properly. You must also keep in mind that ->fsync() is not called with i_mutex held anymore, so if you require i_mutex locking you must make sure to take it and release it yourself.
+ +-- +[mandatory] + d_alloc_root() is gone, along with a lot of bugs caused by code +misusing it. Replacement: d_make_root(inode). The difference is, +d_make_root() drops the reference to inode if dentry allocation fails. diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index a76a26a1db8a..b7413cb46dcb 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -290,7 +290,7 @@ Table 1-4: Contents of the stat files (as of 2.6.30-rc7) rsslim current limit in bytes on the rss start_code address above which program text can run end_code address below which program text can run - start_stack address of the start of the stack + start_stack address of the start of the main process stack esp current value of ESP eip current value of EIP pending bitmap of pending signals @@ -325,7 +325,7 @@ address perms offset dev inode pathname a7cb1000-a7cb2000 ---p 00000000 00:00 0 a7cb2000-a7eb2000 rw-p 00000000 00:00 0 a7eb2000-a7eb3000 ---p 00000000 00:00 0 -a7eb3000-a7ed5000 rw-p 00000000 00:00 0 +a7eb3000-a7ed5000 rw-p 00000000 00:00 0 [stack:1001] a7ed5000-a8008000 r-xp 00000000 03:00 4222 /lib/libc.so.6 a8008000-a800a000 r--p 00133000 03:00 4222 /lib/libc.so.6 a800a000-a800b000 rw-p 00135000 03:00 4222 /lib/libc.so.6 @@ -357,11 +357,39 @@ is not associated with a file: [heap] = the heap of the program [stack] = the stack of the main process + [stack:1001] = the stack of the thread with tid 1001 [vdso] = the "virtual dynamic shared object", the kernel system call handler or if empty, the mapping is anonymous. +The /proc/PID/task/TID/maps is a view of the virtual memory from the viewpoint +of the individual tasks of a process. In this file you will see a mapping marked +as [stack] if that task sees it as a stack. This is a key difference from the +content of /proc/PID/maps, where you will see all mappings that are being used +as stack by all of those tasks. Hence, for the example above, the task-level +map, i.e. 
/proc/PID/task/TID/maps for thread 1001 will look like this: + +08048000-08049000 r-xp 00000000 03:00 8312 /opt/test +08049000-0804a000 rw-p 00001000 03:00 8312 /opt/test +0804a000-0806b000 rw-p 00000000 00:00 0 [heap] +a7cb1000-a7cb2000 ---p 00000000 00:00 0 +a7cb2000-a7eb2000 rw-p 00000000 00:00 0 +a7eb2000-a7eb3000 ---p 00000000 00:00 0 +a7eb3000-a7ed5000 rw-p 00000000 00:00 0 [stack] +a7ed5000-a8008000 r-xp 00000000 03:00 4222 /lib/libc.so.6 +a8008000-a800a000 r--p 00133000 03:00 4222 /lib/libc.so.6 +a800a000-a800b000 rw-p 00135000 03:00 4222 /lib/libc.so.6 +a800b000-a800e000 rw-p 00000000 00:00 0 +a800e000-a8022000 r-xp 00000000 03:00 14462 /lib/libpthread.so.0 +a8022000-a8023000 r--p 00013000 03:00 14462 /lib/libpthread.so.0 +a8023000-a8024000 rw-p 00014000 03:00 14462 /lib/libpthread.so.0 +a8024000-a8027000 rw-p 00000000 00:00 0 +a8027000-a8043000 r-xp 00000000 03:00 8317 /lib/ld-linux.so.2 +a8043000-a8044000 r--p 0001b000 03:00 8317 /lib/ld-linux.so.2 +a8044000-a8045000 rw-p 0001c000 03:00 8317 /lib/ld-linux.so.2 +aff35000-aff4a000 rw-p 00000000 00:00 0 +ffffe000-fffff000 r-xp 00000000 00:00 0 [vdso] The /proc/PID/smaps is an extension based on maps, showing the memory consumption for each of the process's mappings. For each of mappings there diff --git a/Documentation/filesystems/qnx6.txt b/Documentation/filesystems/qnx6.txt new file mode 100644 index 000000000000..050223ea03c7 --- /dev/null +++ b/Documentation/filesystems/qnx6.txt @@ -0,0 +1,174 @@ +The QNX6 Filesystem +=================== + +The qnx6fs is used by newer QNX operating system versions. (e.g. Neutrino) +It got introduced in QNX 6.4.0 and is used default since 6.4.1. + +Option +====== + +mmi_fs Mount filesystem as used for example by Audi MMI 3G system + +Specification +============= + +qnx6fs shares many properties with traditional Unix filesystems. It has the +concepts of blocks, inodes and directories. +On QNX it is possible to create little endian and big endian qnx6 filesystems. 
+This feature makes it possible to create and use a different endianness fs +for the target (QNX is used on quite a range of embedded systems) platform +running on a different endianness. +The Linux driver handles endianness transparently. (LE and BE) + +Blocks +------ + +The space in the device or file is split up into blocks. These are a fixed +size of 512, 1024, 2048 or 4096, which is decided when the filesystem is +created. +Blockpointers are 32bit, so the maximum space that can be addressed is +2^32 * 4096 bytes or 16TB + +The superblocks +--------------- + +The superblock contains all global information about the filesystem. +Each qnx6fs has two superblocks, each one having a 64bit serial number. +That serial number is used to identify the "active" superblock. +In write mode with each new snapshot (after each synchronous write), the +serial of the new master superblock is increased (old superblock serial + 1) + +So basically the snapshot functionality is realized by an atomic final +update of the serial number. Before updating that serial, all modifications +are done by copying all modified blocks during that specific write request +(or period) and building up a new (stable) filesystem structure under the +inactive superblock. + +Each superblock holds a set of root inodes for the different filesystem +parts. (Inode, Bitmap and Longfilenames) +Each of these root nodes holds information like total size of the stored +data and the addressing levels in that specific tree. +If the level value is 0, up to 16 direct blocks can be addressed by each +node. +Level 1 adds an additional indirect addressing level where each indirect +addressing block holds up to blocksize / 4 bytes pointers to data blocks. +Level 2 adds an additional indirect addressing block level (so, already up +to 16 * 256 * 256 = 1048576 blocks that can be addressed by such a tree). + +Unused block pointers are always set to ~0 - regardless of root node, +indirect addressing blocks or inodes.
+Data leaves are always on the lowest level. So no data is stored on upper +tree levels. + +The first Superblock is located at 0x2000. (0x2000 is the bootblock size) +The Audi MMI 3G first superblock directly starts at byte 0. +Second superblock position can either be calculated from the superblock +information (total number of filesystem blocks) or by taking the highest +device address, zeroing the last 3 bytes and then subtracting 0x1000 from +that address. + +0x1000 is the size reserved for each superblock - regardless of the +blocksize of the filesystem. + +Inodes +------ + +Each object in the filesystem is represented by an inode. (index node) +The inode structure contains pointers to the filesystem blocks which contain +the data held in the object and all of the metadata about an object except +its longname. (filenames longer than 27 characters) +The metadata about an object includes the permissions, owner, group, flags, +size, number of blocks used, access time, change time and modification time. + +Object mode field is POSIX format. (which makes things easier) + +There are also pointers to the first 16 blocks, if the object data can be +addressed with 16 direct blocks. +For more than 16 blocks an indirect addressing in form of another tree is +used. (scheme is the same as the one used for the superblock root nodes) + +The filesize is stored 64bit. Inode counting starts with 1. (whilst long +filename inodes start with 0) + +Directories +----------- + +A directory is a filesystem object and has an inode just like a file. +It is a specially formatted file containing records which associate each +name with an inode number. +'.' inode number points to the directory inode +'..' inode number points to the parent directory inode +Each filename record additionally has a filename length field. + +One special case are long filenames or subdirectory names.
+These have the filename length field set to 0xff in the corresponding directory +record plus the longfile inode number also stored in that record. +With that longfilename inode number, the longfilename tree can be walked +starting with the superblock longfilename root node pointers. + +Special files +------------- + +Symbolic links are also filesystem objects with inodes. They have a specific +bit in the inode mode field identifying them as symbolic link. +The directory entry file inode pointer points to the target file inode. + +Hard links have an inode, a directory entry, but a specific mode bit set, +no block pointers and the directory file record pointing to the target file +inode. + +Character and block special devices do not exist in QNX as those files +are handled by the QNX kernel/drivers and created in /dev independent of the +underlying filesystem. + +Long filenames +-------------- + +Long filenames are stored in a separate addressing tree. The starting point +is the longfilename root node in the active superblock. +Each data block (tree leaves) holds one long filename. That filename is +limited to 510 bytes. The first two starting bytes are used as length field +for the actual filename. +If that structure shall fit for all allowed blocksizes, it is clear why there +is a limit of 510 bytes for the actual filename stored. + +Bitmap +------ + +The qnx6fs filesystem allocation bitmap is stored in a tree under bitmap +root node in the superblock and each bit in the bitmap represents one +filesystem block. +The first block is block 0, which starts 0x1000 after superblock start. +So for a normal qnx6fs 0x3000 (bootblock + superblock) is the physical +address at which block 0 is located. + +Bits at the end of the last bitmap block are set to 1, if the device is +smaller than addressing space in the bitmap. + +Bitmap system area +------------------ + +The bitmap itself is divided into three parts. +First the system area, that is split into two halves. +Then userspace.
+ +The requirement for a static, fixed preallocated system area comes from how +qnx6fs deals with writes. +Each superblock has its own half of the system area. So superblock #1 +always uses blocks from the lower half whilst superblock #2 just writes to +blocks represented by the upper half bitmap system area bits. + +Bitmap blocks, Inode blocks and indirect addressing blocks for those two +tree structures are treated as system blocks. + +The rationale behind that is that a write request can work on a new snapshot +(system area of the inactive - resp. lower serial numbered superblock) while +at the same time there is still a complete stable filesystem structure in the +other half of the system area. + +When finished with writing (a sync write is completed, the maximum sync leap +time or a filesystem sync is requested), the serial of the previously inactive +superblock is atomically increased and the fs switches over to that - then +stable declared - superblock. + +For all data outside the system area, blocks are just copied while writing.
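Editorial illustration (not part of the patch): the addressing arithmetic described in qnx6.txt above can be checked with a short Python sketch. The function and constant names are hypothetical. The 32bit (4-byte) pointer size, the 16 direct pointers and the block sizes come from the text; the 256 in the "16 * 256 * 256" example is assumed to correspond to a 1024-byte block size (1024 / 4 pointers per block).

```python
POINTER_SIZE = 4        # block pointers are 32bit, as stated in the text
DIRECT_POINTERS = 16    # pointers held directly by a root node or inode
VALID_BLOCK_SIZES = (512, 1024, 2048, 4096)


def addressable_blocks(block_size, levels):
    """Data blocks reachable from one root node with `levels` indirect levels.

    Level 0 means only the 16 direct pointers are used; each extra level
    multiplies the count by the number of pointers that fit in one block.
    """
    if block_size not in VALID_BLOCK_SIZES:
        raise ValueError("unsupported block size: %r" % (block_size,))
    if levels < 0:
        raise ValueError("levels must be non-negative")
    pointers_per_block = block_size // POINTER_SIZE
    return DIRECT_POINTERS * pointers_per_block ** levels


def max_addressable_bytes(block_size):
    """Upper bound on addressable space with 32bit block pointers."""
    return 2 ** 32 * block_size


if __name__ == "__main__":
    for level in range(3):
        print("level", level, "->", addressable_blocks(1024, level), "blocks")
    print(max_addressable_bytes(4096) // 2 ** 40, "TiB at 4096-byte blocks")
```

With 1024-byte blocks this reproduces the 1048576 blocks quoted for level 2, and with 4096-byte blocks the 16TB limit quoted in the Blocks section.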
diff --git a/Documentation/filesystems/ramfs-rootfs-initramfs.txt b/Documentation/filesystems/ramfs-rootfs-initramfs.txt index a8273d5fad20..59b4a0962e0f 100644 --- a/Documentation/filesystems/ramfs-rootfs-initramfs.txt +++ b/Documentation/filesystems/ramfs-rootfs-initramfs.txt @@ -297,7 +297,7 @@ the above threads) is: either way about the archive format, and there are alternative tools, such as: - http://freshmeat.net/projects/afio/ + http://freecode.com/projects/afio 2) The cpio archive format chosen by the kernel is simpler and cleaner (and thus easier to create and parse) than any of the (literally dozens of) diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index 3d9393b845b8..0d0492028082 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -114,7 +114,7 @@ members are defined: struct file_system_type { const char *name; int fs_flags; - struct dentry (*mount) (struct file_system_type *, int, + struct dentry *(*mount) (struct file_system_type *, int, const char *, void *); void (*kill_sb) (struct super_block *); struct module *owner; @@ -993,7 +993,7 @@ struct dentry_operations { If the 'rcu_walk' parameter is true, then the caller is doing a pathwalk in RCU-walk mode. Sleeping is not permitted in this mode, - and the caller can be asked to leave it and call again by returing + and the caller can be asked to leave it and call again by returning -ECHILD. This function is only used if DCACHE_MANAGE_TRANSIT is set on the diff --git a/Documentation/gpio.txt b/Documentation/gpio.txt index 792faa3c06cf..620a07844e8c 100644 --- a/Documentation/gpio.txt +++ b/Documentation/gpio.txt @@ -271,9 +271,26 @@ Some platforms may also use knowledge about what GPIOs are active for power management, such as by powering down unused chip sectors and, more easily, gating off unused clocks. -Note that requesting a GPIO does NOT cause it to be configured in any -way; it just marks that GPIO as in use. 
Separate code must handle any -pin setup (e.g. controlling which pin the GPIO uses, pullup/pulldown). +For GPIOs that use pins known to the pinctrl subsystem, that subsystem should +be informed of their use; a gpiolib driver's .request() operation may call +pinctrl_request_gpio(), and a gpiolib driver's .free() operation may call +pinctrl_free_gpio(). The pinctrl subsystem allows a pinctrl_request_gpio() +to succeed concurrently with a pin or pingroup being "owned" by a device for +pin multiplexing. + +Any programming of pin multiplexing hardware that is needed to route the +GPIO signal to the appropriate pin should occur within a GPIO driver's +.direction_input() or .direction_output() operations, and occur after any +setup of an output GPIO's value. This allows a glitch-free migration from a +pin's special function to GPIO. This is sometimes required when using a GPIO +to implement a workaround on signals typically driven by a non-GPIO HW block. + +Some platforms allow some or all GPIO signals to be routed to different pins. +Similarly, other aspects of the GPIO or pin may need to be configured, such as +pullup/pulldown. Platform software should arrange that any such details are +configured prior to gpio_request() being called for those GPIOs, e.g. using +the pinctrl subsystem's mapping table, so that GPIO users need not be aware +of these details. Also note that it's your responsibility to have stopped using a GPIO before you free it. @@ -302,6 +319,8 @@ where 'flags' is currently defined to specify the following properties: * GPIOF_INIT_LOW - as output, set initial level to LOW * GPIOF_INIT_HIGH - as output, set initial level to HIGH + * GPIOF_OPEN_DRAIN - gpio pin is open drain type. + * GPIOF_OPEN_SOURCE - gpio pin is open source type. 
since GPIOF_INIT_* are only valid when configured as output, so group valid combinations as: @@ -310,8 +329,19 @@ combinations as: * GPIOF_OUT_INIT_LOW - configured as output, initial level LOW * GPIOF_OUT_INIT_HIGH - configured as output, initial level HIGH -In the future, these flags can be extended to support more properties such -as open-drain status. +When setting the flag as GPIOF_OPEN_DRAIN then it will assume that the pin is +open drain type. Such pins will not be driven to 1 in output mode. It is +required to connect a pull-up on such pins. By enabling this flag, gpio lib will +make the direction to input when it is asked to set value of 1 in output mode +to make the pin HIGH. The pin is made LOW by driving value 0 in output mode. + +When setting the flag as GPIOF_OPEN_SOURCE then it will assume that the pin is +open source type. Such pins will not be driven to 0 in output mode. It is +required to connect a pull-down on such pins. By enabling this flag, gpio lib will +make the direction to input when it is asked to set value of 0 in output mode +to make the pin LOW. The pin is made HIGH by driving value 1 in output mode. + +In the future, these flags can be extended to support more properties.
Further more, to ease the claim/release of multiple GPIOs, 'struct gpio' is introduced to encapsulate all three fields as: diff --git a/Documentation/hwmon/adm1275 b/Documentation/hwmon/adm1275 index ab70d96d2dfd..2cfa25667123 100644 --- a/Documentation/hwmon/adm1275 +++ b/Documentation/hwmon/adm1275 @@ -2,6 +2,10 @@ Kernel driver adm1275 ===================== Supported chips: + * Analog Devices ADM1075 + Prefix: 'adm1075' + Addresses scanned: - + Datasheet: www.analog.com/static/imported-files/data_sheets/ADM1075.pdf * Analog Devices ADM1275 Prefix: 'adm1275' Addresses scanned: - @@ -17,13 +21,13 @@ Author: Guenter Roeck <guenter.roeck@ericsson.com> Description ----------- -This driver supports hardware montoring for Analog Devices ADM1275 and ADM1276 -Hot-Swap Controller and Digital Power Monitor. +This driver supports hardware montoring for Analog Devices ADM1075, ADM1275, +and ADM1276 Hot-Swap Controller and Digital Power Monitor. -ADM1275 and ADM1276 are hot-swap controllers that allow a circuit board to be -removed from or inserted into a live backplane. They also feature current and -voltage readback via an integrated 12-bit analog-to-digital converter (ADC), -accessed using a PMBus interface. +ADM1075, ADM1275, and ADM1276 are hot-swap controllers that allow a circuit +board to be removed from or inserted into a live backplane. They also feature +current and voltage readback via an integrated 12-bit analog-to-digital +converter (ADC), accessed using a PMBus interface. The driver is a client driver to the core PMBus driver. Please see Documentation/hwmon/pmbus for details on PMBus client drivers. @@ -36,6 +40,10 @@ This driver does not auto-detect devices. You will have to instantiate the devices explicitly. Please see Documentation/i2c/instantiating-devices for details. +The ADM1075, unlike many other PMBus devices, does not support internal voltage +or current scaling. 
Reported voltages, currents, and power are raw measurements, +and will typically have to be scaled. + Platform data support --------------------- @@ -51,9 +59,10 @@ The following attributes are supported. Limits are read-write, history reset attributes are write-only, all other attributes are read-only. in1_label "vin1" or "vout1" depending on chip variant and - configuration. + configuration. On ADM1075, vout1 reports the voltage on + the VAUX pin. in1_input Measured voltage. -in1_min Minumum Voltage. +in1_min Minimum Voltage. in1_max Maximum voltage. in1_min_alarm Voltage low alarm. in1_max_alarm Voltage high alarm. @@ -74,3 +83,10 @@ curr1_crit Critical maximum current. Depending on the chip curr1_crit_alarm Critical current high alarm. curr1_highest Historical maximum current. curr1_reset_history Write any value to reset history. + +power1_label "pin1" +power1_input Input power. +power1_reset_history Write any value to reset history. + + Power attributes are supported on ADM1075 and ADM1276 + only. 
diff --git a/Documentation/hwmon/jc42 b/Documentation/hwmon/jc42 index 52729a756c1b..66ecb9fc8246 100644 --- a/Documentation/hwmon/jc42 +++ b/Documentation/hwmon/jc42 @@ -3,71 +3,50 @@ Kernel driver jc42 Supported chips: * Analog Devices ADT7408 - Prefix: 'adt7408' - Addresses scanned: I2C 0x18 - 0x1f Datasheets: http://www.analog.com/static/imported-files/data_sheets/ADT7408.pdf * Atmel AT30TS00 - Prefix: 'at30ts00' - Addresses scanned: I2C 0x18 - 0x1f Datasheets: http://www.atmel.com/Images/doc8585.pdf * IDT TSE2002B3, TSE2002GB2, TS3000B3, TS3000GB2 - Prefix: 'tse2002', 'ts3000' - Addresses scanned: I2C 0x18 - 0x1f Datasheets: http://www.idt.com/sites/default/files/documents/IDT_TSE2002B3C_DST_20100512_120303152056.pdf http://www.idt.com/sites/default/files/documents/IDT_TSE2002GB2A1_DST_20111107_120303145914.pdf http://www.idt.com/sites/default/files/documents/IDT_TS3000B3A_DST_20101129_120303152013.pdf http://www.idt.com/sites/default/files/documents/IDT_TS3000GB2A1_DST_20111104_120303151012.pdf * Maxim MAX6604 - Prefix: 'max6604' - Addresses scanned: I2C 0x18 - 0x1f Datasheets: http://datasheets.maxim-ic.com/en/ds/MAX6604.pdf * Microchip MCP9804, MCP9805, MCP98242, MCP98243, MCP9843 - Prefixes: 'mcp9804', 'mcp9805', 'mcp98242', 'mcp98243', 'mcp9843' - Addresses scanned: I2C 0x18 - 0x1f Datasheets: http://ww1.microchip.com/downloads/en/DeviceDoc/22203C.pdf http://ww1.microchip.com/downloads/en/DeviceDoc/21977b.pdf http://ww1.microchip.com/downloads/en/DeviceDoc/21996a.pdf http://ww1.microchip.com/downloads/en/DeviceDoc/22153c.pdf - * NXP Semiconductors SE97, SE97B - Prefix: 'se97' - Addresses scanned: I2C 0x18 - 0x1f + * NXP Semiconductors SE97, SE97B, SE98, SE98A Datasheets: http://www.nxp.com/documents/data_sheet/SE97.pdf http://www.nxp.com/documents/data_sheet/SE97B.pdf - * NXP Semiconductors SE98 - Prefix: 'se98' - Addresses scanned: I2C 0x18 - 0x1f - Datasheets: http://www.nxp.com/documents/data_sheet/SE98.pdf + 
http://www.nxp.com/documents/data_sheet/SE98A.pdf * ON Semiconductor CAT34TS02, CAT6095 - Prefix: 'cat34ts02', 'cat6095' - Addresses scanned: I2C 0x18 - 0x1f Datasheet: http://www.onsemi.com/pub_link/Collateral/CAT34TS02-D.PDF http://www.onsemi.com/pub/Collateral/CAT6095-D.PDF - * ST Microelectronics STTS424, STTS424E02 - Prefix: 'stts424' - Addresses scanned: I2C 0x18 - 0x1f - Datasheets: - http://www.st.com/stonline/products/literature/ds/13447/stts424.pdf - http://www.st.com/stonline/products/literature/ds/13448/stts424e02.pdf - * ST Microelectronics STTS2002, STTS3000 - Prefix: 'stts2002', 'stts3000' - Addresses scanned: I2C 0x18 - 0x1f + * ST Microelectronics STTS424, STTS424E02, STTS2002, STTS3000 Datasheets: + http://www.st.com/internet/com/TECHNICAL_RESOURCES/TECHNICAL_LITERATURE/DATASHEET/CD00157556.pdf + http://www.st.com/internet/com/TECHNICAL_RESOURCES/TECHNICAL_LITERATURE/DATASHEET/CD00157558.pdf http://www.st.com/internet/com/TECHNICAL_RESOURCES/TECHNICAL_LITERATURE/DATASHEET/CD00225278.pdf http://www.st.com/internet/com/TECHNICAL_RESOURCES/TECHNICAL_LITERATURE/DATA_BRIEF/CD00270920.pdf * JEDEC JC 42.4 compliant temperature sensor chips - Prefix: 'jc42' - Addresses scanned: I2C 0x18 - 0x1f Datasheet: http://www.jedec.org/sites/default/files/docs/4_01_04R19.pdf + Common for all chips: + Prefix: 'jc42' + Addresses scanned: I2C 0x18 - 0x1f + Author: Guenter Roeck <guenter.roeck@ericsson.com> diff --git a/Documentation/hwmon/k10temp b/Documentation/hwmon/k10temp index a10f73624ad3..90956b618025 100644 --- a/Documentation/hwmon/k10temp +++ b/Documentation/hwmon/k10temp @@ -11,7 +11,7 @@ Supported chips: Socket S1G2: Athlon (X2), Sempron (X2), Turion X2 (Ultra) * AMD Family 12h processors: "Llano" (E2/A4/A6/A8-Series) * AMD Family 14h processors: "Brazos" (C/E/G/Z-Series) -* AMD Family 15h processors: "Bulldozer" +* AMD Family 15h processors: "Bulldozer" (FX-Series), "Trinity" Prefix: 'k10temp' Addresses scanned: PCI space diff --git 
a/Documentation/hwmon/lm80 b/Documentation/hwmon/lm80 index cb5b407ba3e6..a60b43efc32b 100644 --- a/Documentation/hwmon/lm80 +++ b/Documentation/hwmon/lm80 @@ -7,6 +7,11 @@ Supported chips: Addresses scanned: I2C 0x28 - 0x2f Datasheet: Publicly available at the National Semiconductor website http://www.national.com/ + * National Semiconductor LM96080 + Prefix: 'lm96080' + Addresses scanned: I2C 0x28 - 0x2f + Datasheet: Publicly available at the National Semiconductor website + http://www.national.com/ Authors: Frodo Looijaard <frodol@dds.nl>, @@ -17,7 +22,9 @@ Description This driver implements support for the National Semiconductor LM80. It is described as a 'Serial Interface ACPI-Compatible Microprocessor -System Hardware Monitor'. +System Hardware Monitor'. The LM96080 is a more recent incarnation, +it is pin and register compatible, with a few additional features not +yet supported by the driver. The LM80 implements one temperature sensor, two fan rotation speed sensors, seven voltage sensors, alarms, and some miscellaneous stuff. diff --git a/Documentation/hwmon/lm90 b/Documentation/hwmon/lm90 index 9cd14cfe6515..b466974e142f 100644 --- a/Documentation/hwmon/lm90 +++ b/Documentation/hwmon/lm90 @@ -118,6 +118,10 @@ Supported chips: Addresses scanned: I2C 0x48 through 0x4F Datasheet: Publicly available at NXP website http://ics.nxp.com/products/interface/datasheet/sa56004x.pdf + * GMT G781 + Prefix: 'g781' + Addresses scanned: I2C 0x4c, 0x4d + Datasheet: Not publicly available from GMT Author: Jean Delvare <khali@linux-fr.org> diff --git a/Documentation/hwmon/max16064 b/Documentation/hwmon/max16064 index f6e8bcbfaccf..f8b478076f6d 100644 --- a/Documentation/hwmon/max16064 +++ b/Documentation/hwmon/max16064 @@ -42,9 +42,9 @@ attributes are read-only. in[1-4]_label "vout[1-4]" in[1-4]_input Measured voltage. From READ_VOUT register. -in[1-4]_min Minumum Voltage. From VOUT_UV_WARN_LIMIT register. +in[1-4]_min Minimum Voltage. From VOUT_UV_WARN_LIMIT register. 
in[1-4]_max Maximum voltage. From VOUT_OV_WARN_LIMIT register. -in[1-4]_lcrit Critical minumum Voltage. VOUT_UV_FAULT_LIMIT register. +in[1-4]_lcrit Critical minimum Voltage. VOUT_UV_FAULT_LIMIT register. in[1-4]_crit Critical maximum voltage. From VOUT_OV_FAULT_LIMIT register. in[1-4]_min_alarm Voltage low alarm. From VOLTAGE_UV_WARNING status. in[1-4]_max_alarm Voltage high alarm. From VOLTAGE_OV_WARNING status. diff --git a/Documentation/hwmon/max34440 b/Documentation/hwmon/max34440 index 8ab51536a1eb..04482226db20 100644 --- a/Documentation/hwmon/max34440 +++ b/Documentation/hwmon/max34440 @@ -11,6 +11,11 @@ Supported chips: Prefixes: 'max34441' Addresses scanned: - Datasheet: http://datasheets.maxim-ic.com/en/ds/MAX34441.pdf + * Maxim MAX34446 + PMBus Power-Supply Data Logger + Prefixes: 'max34446' + Addresses scanned: - + Datasheet: http://datasheets.maxim-ic.com/en/ds/MAX34446.pdf Author: Guenter Roeck <guenter.roeck@ericsson.com> @@ -19,8 +24,8 @@ Description ----------- This driver supports hardware montoring for Maxim MAX34440 PMBus 6-Channel -Power-Supply Manager and MAX34441 PMBus 5-Channel Power-Supply Manager -and Intelligent Fan Controller. +Power-Supply Manager, MAX34441 PMBus 5-Channel Power-Supply Manager +and Intelligent Fan Controller, and MAX34446 PMBus Power-Supply Data Logger. The driver is a client driver to the core PMBus driver. Please see Documentation/hwmon/pmbus for details on PMBus client drivers. @@ -33,6 +38,13 @@ This driver does not auto-detect devices. You will have to instantiate the devices explicitly. Please see Documentation/i2c/instantiating-devices for details. +For MAX34446, the value of the currX_crit attribute determines if current or +voltage measurement is enabled for a given channel. Voltage measurement is +enabled if currX_crit is set to 0; current measurement is enabled if the +attribute is set to a positive value. 
Power measurement is only enabled if +channel 1 (3) is configured for voltage measurement, and channel 2 (4) is +configured for current measurement. + Platform data support --------------------- @@ -48,27 +60,39 @@ attributes are read-only. in[1-6]_label "vout[1-6]". in[1-6]_input Measured voltage. From READ_VOUT register. -in[1-6]_min Minumum Voltage. From VOUT_UV_WARN_LIMIT register. +in[1-6]_min Minimum Voltage. From VOUT_UV_WARN_LIMIT register. in[1-6]_max Maximum voltage. From VOUT_OV_WARN_LIMIT register. -in[1-6]_lcrit Critical minumum Voltage. VOUT_UV_FAULT_LIMIT register. +in[1-6]_lcrit Critical minimum Voltage. VOUT_UV_FAULT_LIMIT register. in[1-6]_crit Critical maximum voltage. From VOUT_OV_FAULT_LIMIT register. in[1-6]_min_alarm Voltage low alarm. From VOLTAGE_UV_WARNING status. in[1-6]_max_alarm Voltage high alarm. From VOLTAGE_OV_WARNING status. in[1-6]_lcrit_alarm Voltage critical low alarm. From VOLTAGE_UV_FAULT status. in[1-6]_crit_alarm Voltage critical high alarm. From VOLTAGE_OV_FAULT status. +in[1-6]_lowest Historical minimum voltage. in[1-6]_highest Historical maximum voltage. in[1-6]_reset_history Write any value to reset history. + MAX34446 only supports in[1-4]. + curr[1-6]_label "iout[1-6]". curr[1-6]_input Measured current. From READ_IOUT register. curr[1-6]_max Maximum current. From IOUT_OC_WARN_LIMIT register. curr[1-6]_crit Critical maximum current. From IOUT_OC_FAULT_LIMIT register. curr[1-6]_max_alarm Current high alarm. From IOUT_OC_WARNING status. curr[1-6]_crit_alarm Current critical high alarm. From IOUT_OC_FAULT status. +curr[1-4]_average Historical average current (MAX34446 only). curr[1-6]_highest Historical maximum current. curr[1-6]_reset_history Write any value to reset history. in6 and curr6 attributes only exist for MAX34440. + MAX34446 only supports curr[1-4]. + +power[1,3]_label "pout[1,3]" +power[1,3]_input Measured power. +power[1,3]_average Historical average power. +power[1,3]_highest Historical maximum power. 
+ + Power attributes only exist for MAX34446. temp[1-8]_input Measured temperatures. From READ_TEMPERATURE_1 register. temp1 is the chip's internal temperature. temp2..temp5 @@ -79,7 +103,9 @@ temp[1-8]_max Maximum temperature. From OT_WARN_LIMIT register. temp[1-8]_crit Critical high temperature. From OT_FAULT_LIMIT register. temp[1-8]_max_alarm Temperature high alarm. temp[1-8]_crit_alarm Temperature critical high alarm. +temp[1-8]_average Historical average temperature (MAX34446 only). temp[1-8]_highest Historical maximum temperature. temp[1-8]_reset_history Write any value to reset history. temp7 and temp8 attributes only exist for MAX34440. + MAX34446 only supports temp[1-3]. diff --git a/Documentation/hwmon/max8688 b/Documentation/hwmon/max8688 index 71ed10a3c94e..fe849871df32 100644 --- a/Documentation/hwmon/max8688 +++ b/Documentation/hwmon/max8688 @@ -42,9 +42,9 @@ attributes are read-only. in1_label "vout1" in1_input Measured voltage. From READ_VOUT register. -in1_min Minumum Voltage. From VOUT_UV_WARN_LIMIT register. +in1_min Minimum Voltage. From VOUT_UV_WARN_LIMIT register. in1_max Maximum voltage. From VOUT_OV_WARN_LIMIT register. -in1_lcrit Critical minumum Voltage. VOUT_UV_FAULT_LIMIT register. +in1_lcrit Critical minimum Voltage. VOUT_UV_FAULT_LIMIT register. in1_crit Critical maximum voltage. From VOUT_OV_FAULT_LIMIT register. in1_min_alarm Voltage low alarm. From VOLTAGE_UV_WARNING status. in1_max_alarm Voltage high alarm. From VOLTAGE_OV_WARNING status. 
diff --git a/Documentation/hwmon/mc13783-adc b/Documentation/hwmon/mc13783-adc index 044531a86405..d0e7b3fa9e75 100644 --- a/Documentation/hwmon/mc13783-adc +++ b/Documentation/hwmon/mc13783-adc @@ -3,8 +3,11 @@ Kernel driver mc13783-adc Supported chips: * Freescale Atlas MC13783 - Prefix: 'mc13783_adc' + Prefix: 'mc13783' Datasheet: http://www.freescale.com/files/rf_if/doc/data_sheet/MC13783.pdf?fsrch=1 + * Freescale Atlas MC13892 + Prefix: 'mc13892' + Datasheet: http://cache.freescale.com/files/analog/doc/data_sheet/MC13892.pdf?fsrch=1&sr=1 Authors: Sascha Hauer <s.hauer@pengutronix.de> @@ -13,20 +16,21 @@ Authors: Description ----------- -The Freescale MC13783 is a Power Management and Audio Circuit. Among -other things it contains a 10-bit A/D converter. The converter has 16 -channels which can be used in different modes. -The A/D converter has a resolution of 2.25mV. Channels 0-4 have -a dedicated meaning with chip internal scaling applied. Channels 5-7 -can be used as general purpose inputs or alternatively in a dedicated -mode. Channels 12-15 are occupied by the touchscreen if it's active. +The Freescale MC13783 and MC13892 are Power Management and Audio Circuits. +Among other things they contain a 10-bit A/D converter. The converter has 16 +(MC13783) resp. 12 (MC13892) channels which can be used in different modes. The +A/D converter has a resolution of 2.25mV. -Currently the driver only supports channels 2 and 5-15 with no alternative -modes for channels 5-7. +Some channels can be used as General Purpose inputs or in a dedicated mode with +a chip internal scaling applied . -See this table for the meaning of the different channels and their chip -internal scaling: +Currently the driver only supports the Application Supply channel (BP / BPSNS), +the General Purpose inputs and touchscreen. 
+See the following tables for the meaning of the different channels and their +chip internal scaling: + +MC13783: Channel Signal Input Range Scaling ------------------------------------------------------------------------------- 0 Battery Voltage (BATT) 2.50 - 4.65V -2.40V @@ -34,7 +38,7 @@ Channel Signal Input Range Scaling 2 Application Supply (BP) 2.50 - 4.65V -2.40V 3 Charger Voltage (CHRGRAW) 0 - 10V / /5 0 - 20V /10 -4 Charger Current (CHRGISNSP-CHRGISNSN) -0.25V - 0.25V x4 +4 Charger Current (CHRGISNSP-CHRGISNSN) -0.25 - 0.25V x4 5 General Purpose ADIN5 / Battery Pack Thermistor 0 - 2.30V No 6 General Purpose ADIN6 / Backup Voltage (LICELL) 0 - 2.30V / No / 1.50 - 3.50V -1.20V @@ -48,3 +52,23 @@ Channel Signal Input Range Scaling 13 General Purpose TSX2 / Touchscreen X-plate 2 0 - 2.30V No 14 General Purpose TSY1 / Touchscreen Y-plate 1 0 - 2.30V No 15 General Purpose TSY2 / Touchscreen Y-plate 2 0 - 2.30V No + +MC13892: +Channel Signal Input Range Scaling +------------------------------------------------------------------------------- +0 Battery Voltage (BATT) 0 - 4.8V /2 +1 Battery Current (BATT - BATTISNSCC) -60 - 60 mV x20 +2 Application Supply (BPSNS) 0 - 4.8V /2 +3 Charger Voltage (CHRGRAW) 0 - 12V / /5 + 0 - 20V /10 +4 Charger Current (CHRGISNS-BPSNS) / -0.3 - 0.3V / x4 / + Touchscreen X-plate 1 0 - 2.4V No +5 General Purpose ADIN5 / Battery Pack Thermistor 0 - 2.4V No +6 General Purpose ADIN6 / Backup Voltage (LICELL) 0 - 2.4V / No + Backup Voltage (LICELL) 0 - 3.6V x2/3 +7 General Purpose ADIN7 / UID / Die Temperature 0 - 2.4V / No / + 0 - 4.8V /2 +12 General Purpose TSX1 / Touchscreen X-plate 1 0 - 2.4V No +13 General Purpose TSX2 / Touchscreen X-plate 2 0 - 2.4V No +14 General Purpose TSY1 / Touchscreen Y-plate 1 0 - 2.4V No +15 General Purpose TSY2 / Touchscreen Y-plate 2 0 - 2.4V No diff --git a/Documentation/hwmon/mcp3021 b/Documentation/hwmon/mcp3021 new file mode 100644 index 000000000000..325fd87e81b2 --- /dev/null +++ 
b/Documentation/hwmon/mcp3021 @@ -0,0 +1,22 @@ +Kernel driver MCP3021 +====================== + +Supported chips: + * Microchip Technology MCP3021 + Prefix: 'mcp3021' + Datasheet: http://ww1.microchip.com/downloads/en/DeviceDoc/21805a.pdf + +Author: Mingkai Hu + +Description +----------- + +This driver implements support for the Microchip Technology MCP3021 chip. + +The Microchip Technology Inc. MCP3021 is a successive approximation A/D +converter (ADC) with 10-bit resolution. +This device provides one single-ended input with very low power consumption. +Communication to the MCP3021 is performed using a 2-wire I2C compatible +interface. Standard (100 kHz) and Fast (400 kHz) I2C modes are available. +The default I2C device address is 0x4d (contact the Microchip factory for +additional address options). diff --git a/Documentation/hwmon/pmbus b/Documentation/hwmon/pmbus index d28b591753d1..f90f99920cc5 100644 --- a/Documentation/hwmon/pmbus +++ b/Documentation/hwmon/pmbus @@ -15,13 +15,20 @@ Supported chips: http://www.onsemi.com/pub_link/Collateral/NCP4200-D.PDF http://www.onsemi.com/pub_link/Collateral/JUNE%202009-%20REV.%200.PDF * Lineage Power - Prefixes: 'pdt003', 'pdt006', 'pdt012', 'udt020' + Prefixes: 'mdt040', 'pdt003', 'pdt006', 'pdt012', 'udt020' Addresses scanned: - Datasheets: http://www.lineagepower.com/oem/pdf/PDT003A0X.pdf http://www.lineagepower.com/oem/pdf/PDT006A0X.pdf http://www.lineagepower.com/oem/pdf/PDT012A0X.pdf http://www.lineagepower.com/oem/pdf/UDT020A0X.pdf + http://www.lineagepower.com/oem/pdf/MDT040A0X.pdf + * Texas Instruments TPS40400, TPS40422 + Prefixes: 'tps40400', 'tps40422' + Addresses scanned: - + Datasheets: + http://www.ti.com/lit/gpn/tps40400 + http://www.ti.com/lit/gpn/tps40422 * Generic PMBus devices Prefix: 'pmbus' Addresses scanned: - diff --git a/Documentation/hwmon/sch5627 b/Documentation/hwmon/sch5627 index 446a054e4912..0551d266c51c 100644 --- a/Documentation/hwmon/sch5627 +++ b/Documentation/hwmon/sch5627 @@ -16,6 
+16,11 @@ Description SMSC SCH5627 Super I/O chips include complete hardware monitoring capabilities. They can monitor up to 5 voltages, 4 fans and 8 temperatures. +The SMSC SCH5627 hardware monitoring part also contains an integrated +watchdog. In order for this watchdog to function some motherboard specific +initialization most be done by the BIOS, so if the watchdog is not enabled +by the BIOS the sch5627 driver will not register a watchdog device. + The hardware monitoring part of the SMSC SCH5627 is accessed by talking through an embedded microcontroller. An application note describing the protocol for communicating with the microcontroller is available upon diff --git a/Documentation/hwmon/sch5636 b/Documentation/hwmon/sch5636 index f83bd1c260f0..7b0a01da0717 100644 --- a/Documentation/hwmon/sch5636 +++ b/Documentation/hwmon/sch5636 @@ -26,6 +26,9 @@ temperatures. Note that the driver detects how many fan headers / temperature sensors are actually implemented on the motherboard, so you will likely see fewer temperature and fan inputs. +The Fujitsu Theseus hwmon solution also contains an integrated watchdog. +This watchdog is fully supported by the sch5636 driver. + An application note describing the Theseus' registers, as well as an application note describing the protocol for communicating with the microcontroller is available upon request. Please mail me if you want a copy. diff --git a/Documentation/hwmon/ucd9000 b/Documentation/hwmon/ucd9000 index 40ca6db50c48..0df5f276505b 100644 --- a/Documentation/hwmon/ucd9000 +++ b/Documentation/hwmon/ucd9000 @@ -70,9 +70,9 @@ attributes are read-only. in[1-12]_label "vout[1-12]". in[1-12]_input Measured voltage. From READ_VOUT register. -in[1-12]_min Minumum Voltage. From VOUT_UV_WARN_LIMIT register. +in[1-12]_min Minimum Voltage. From VOUT_UV_WARN_LIMIT register. in[1-12]_max Maximum voltage. From VOUT_OV_WARN_LIMIT register. -in[1-12]_lcrit Critical minumum Voltage. VOUT_UV_FAULT_LIMIT register. 
+in[1-12]_lcrit Critical minimum Voltage. VOUT_UV_FAULT_LIMIT register. in[1-12]_crit Critical maximum voltage. From VOUT_OV_FAULT_LIMIT register. in[1-12]_min_alarm Voltage low alarm. From VOLTAGE_UV_WARNING status. in[1-12]_max_alarm Voltage high alarm. From VOLTAGE_OV_WARNING status. @@ -82,7 +82,7 @@ in[1-12]_crit_alarm Voltage critical high alarm. From VOLTAGE_OV_FAULT status. curr[1-12]_label "iout[1-12]". curr[1-12]_input Measured current. From READ_IOUT register. curr[1-12]_max Maximum current. From IOUT_OC_WARN_LIMIT register. -curr[1-12]_lcrit Critical minumum output current. From IOUT_UC_FAULT_LIMIT +curr[1-12]_lcrit Critical minimum output current. From IOUT_UC_FAULT_LIMIT register. curr[1-12]_crit Critical maximum current. From IOUT_OC_FAULT_LIMIT register. curr[1-12]_max_alarm Current high alarm. From IOUT_OC_WARNING status. diff --git a/Documentation/hwmon/ucd9200 b/Documentation/hwmon/ucd9200 index 3c58607f72fe..fd7d07b1908a 100644 --- a/Documentation/hwmon/ucd9200 +++ b/Documentation/hwmon/ucd9200 @@ -54,9 +54,9 @@ attributes are read-only. in1_label "vin". in1_input Measured voltage. From READ_VIN register. -in1_min Minumum Voltage. From VIN_UV_WARN_LIMIT register. +in1_min Minimum Voltage. From VIN_UV_WARN_LIMIT register. in1_max Maximum voltage. From VIN_OV_WARN_LIMIT register. -in1_lcrit Critical minumum Voltage. VIN_UV_FAULT_LIMIT register. +in1_lcrit Critical minimum Voltage. VIN_UV_FAULT_LIMIT register. in1_crit Critical maximum voltage. From VIN_OV_FAULT_LIMIT register. in1_min_alarm Voltage low alarm. From VIN_UV_WARNING status. in1_max_alarm Voltage high alarm. From VIN_OV_WARNING status. @@ -65,9 +65,9 @@ in1_crit_alarm Voltage critical high alarm. From VIN_OV_FAULT status. in[2-5]_label "vout[1-4]". in[2-5]_input Measured voltage. From READ_VOUT register. -in[2-5]_min Minumum Voltage. From VOUT_UV_WARN_LIMIT register. +in[2-5]_min Minimum Voltage. From VOUT_UV_WARN_LIMIT register. in[2-5]_max Maximum voltage. 
From VOUT_OV_WARN_LIMIT register. -in[2-5]_lcrit Critical minumum Voltage. VOUT_UV_FAULT_LIMIT register. +in[2-5]_lcrit Critical minimum Voltage. VOUT_UV_FAULT_LIMIT register. in[2-5]_crit Critical maximum voltage. From VOUT_OV_FAULT_LIMIT register. in[2-5]_min_alarm Voltage low alarm. From VOLTAGE_UV_WARNING status. in[2-5]_max_alarm Voltage high alarm. From VOLTAGE_OV_WARNING status. @@ -80,7 +80,7 @@ curr1_input Measured current. From READ_IIN register. curr[2-5]_label "iout[1-4]". curr[2-5]_input Measured current. From READ_IOUT register. curr[2-5]_max Maximum current. From IOUT_OC_WARN_LIMIT register. -curr[2-5]_lcrit Critical minumum output current. From IOUT_UC_FAULT_LIMIT +curr[2-5]_lcrit Critical minimum output current. From IOUT_UC_FAULT_LIMIT register. curr[2-5]_crit Critical maximum current. From IOUT_OC_FAULT_LIMIT register. curr[2-5]_max_alarm Current high alarm. From IOUT_OC_WARNING status. diff --git a/Documentation/hwmon/zl6100 b/Documentation/hwmon/zl6100 index a4e8d90f59f6..a995b41724fd 100644 --- a/Documentation/hwmon/zl6100 +++ b/Documentation/hwmon/zl6100 @@ -34,6 +34,14 @@ Supported chips: Prefix: 'zl6105' Addresses scanned: - Datasheet: http://www.intersil.com/data/fn/fn6906.pdf + * Intersil / Zilker Labs ZL9101M + Prefix: 'zl9101' + Addresses scanned: - + Datasheet: http://www.intersil.com/data/fn/fn7669.pdf + * Intersil / Zilker Labs ZL9117M + Prefix: 'zl9117' + Addresses scanned: - + Datasheet: http://www.intersil.com/data/fn/fn7914.pdf * Ericsson BMR450, BMR451 Prefix: 'bmr450', 'bmr451' Addresses scanned: - @@ -106,7 +114,7 @@ in1_label "vin" in1_input Measured input voltage. in1_min Minimum input voltage. in1_max Maximum input voltage. -in1_lcrit Critical minumum input voltage. +in1_lcrit Critical minimum input voltage. in1_crit Critical maximum input voltage. in1_min_alarm Input voltage low alarm. in1_max_alarm Input voltage high alarm. @@ -115,7 +123,7 @@ in1_crit_alarm Input voltage critical high alarm. 
in2_label "vout1" in2_input Measured output voltage. -in2_lcrit Critical minumum output Voltage. +in2_lcrit Critical minimum output Voltage. in2_crit Critical maximum output voltage. in2_lcrit_alarm Critical output voltage critical low alarm. in2_crit_alarm Critical output voltage critical high alarm. diff --git a/Documentation/i2c/busses/i2c-i801 b/Documentation/i2c/busses/i2c-i801 index 2871fd500349..71f55bbcefc8 100644 --- a/Documentation/i2c/busses/i2c-i801 +++ b/Documentation/i2c/busses/i2c-i801 @@ -20,6 +20,7 @@ Supported adapters: * Intel Patsburg (PCH) * Intel DH89xxCC (PCH) * Intel Panther Point (PCH) + * Intel Lynx Point (PCH) Datasheets: Publicly available at the Intel website On Intel Patsburg and later chipsets, both the normal host SMBus controller diff --git a/Documentation/i2c/busses/scx200_acb b/Documentation/i2c/busses/scx200_acb index 7c07883d4dfc..ce83c871fe95 100644 --- a/Documentation/i2c/busses/scx200_acb +++ b/Documentation/i2c/busses/scx200_acb @@ -28,5 +28,5 @@ If the scx200_acb driver is built into the kernel, add the following parameter to your boot command line: scx200_acb.base=0x810,0x820 If the scx200_acb driver is built as a module, add the following line to -the file /etc/modprobe.conf instead: +a configuration file in /etc/modprobe.d/ instead: options scx200_acb base=0x810,0x820 diff --git a/Documentation/i2c/instantiating-devices b/Documentation/i2c/instantiating-devices index 9edb75d8c9b9..abf63615ee05 100644 --- a/Documentation/i2c/instantiating-devices +++ b/Documentation/i2c/instantiating-devices @@ -87,11 +87,11 @@ it may have different addresses from one board to the next (manufacturer changing its design without notice). In this case, you can call i2c_new_probed_device() instead of i2c_new_device(). 
-Example (from the pnx4008 OHCI driver): +Example (from the nxp OHCI driver): static const unsigned short normal_i2c[] = { 0x2c, 0x2d, I2C_CLIENT_END }; -static int __devinit usb_hcd_pnx4008_probe(struct platform_device *pdev) +static int __devinit usb_hcd_nxp_probe(struct platform_device *pdev) { (...) struct i2c_adapter *i2c_adap; @@ -100,7 +100,7 @@ static int __devinit usb_hcd_pnx4008_probe(struct platform_device *pdev) (...) i2c_adap = i2c_get_adapter(2); memset(&i2c_info, 0, sizeof(struct i2c_board_info)); - strlcpy(i2c_info.type, "isp1301_pnx", I2C_NAME_SIZE); + strlcpy(i2c_info.type, "isp1301_nxp", I2C_NAME_SIZE); isp1301_i2c_client = i2c_new_probed_device(i2c_adap, &i2c_info, normal_i2c, NULL); i2c_put_adapter(i2c_adap); diff --git a/Documentation/i2o/ioctl b/Documentation/i2o/ioctl index 22ca53a67e23..27c3c5493116 100644 --- a/Documentation/i2o/ioctl +++ b/Documentation/i2o/ioctl @@ -138,7 +138,7 @@ VI. Setting Parameters The return value is the size in bytes of the data written into ops->resbuf if no errors occur. If an error occurs, -1 is returned - and errno is set appropriatly: + and errno is set appropriately: EFAULT Invalid user space pointer was passed ENXIO Invalid IOP number @@ -222,7 +222,7 @@ VIII. Downloading Software RETURNS This function returns 0 no errors occur. If an error occurs, -1 - is returned and errno is set appropriatly: + is returned and errno is set appropriately: EFAULT Invalid user space pointer was passed ENXIO Invalid IOP number @@ -264,7 +264,7 @@ IX. Uploading Software RETURNS This function returns 0 if no errors occur. If an error occurs, -1 - is returned and errno is set appropriatly: + is returned and errno is set appropriately: EFAULT Invalid user space pointer was passed ENXIO Invalid IOP number @@ -301,7 +301,7 @@ X. Removing Software RETURNS This function returns 0 if no errors occur. 
If an error occurs, -1 - is returned and errno is set appropriatly: + is returned and errno is set appropriately: EFAULT Invalid user space pointer was passed ENXIO Invalid IOP number @@ -325,7 +325,7 @@ X. Validating Configuration RETURNS This function returns 0 if no erro occur. If an error occurs, -1 is - returned and errno is set appropriatly: + returned and errno is set appropriately: ETIMEDOUT Timeout waiting for reply message ENXIO Invalid IOP number @@ -360,7 +360,7 @@ XI. Configuration Dialog RETURNS This function returns 0 if no error occur. If an error occurs, -1 - is returned and errno is set appropriatly: + is returned and errno is set appropriately: EFAULT Invalid user space pointer was passed ENXIO Invalid IOP number diff --git a/Documentation/ide/ChangeLog.ide-cd.1994-2004 b/Documentation/ide/ChangeLog.ide-cd.1994-2004 index 190d17bfff62..4cc3ad99f39b 100644 --- a/Documentation/ide/ChangeLog.ide-cd.1994-2004 +++ b/Documentation/ide/ChangeLog.ide-cd.1994-2004 @@ -175,7 +175,7 @@ * since the .pdf version doesn't seem to work... * -- Updated the TODO list to something more current. * - * 4.15 Aug 25, 1998 -- Updated ide-cd.h to respect mechine endianess, + * 4.15 Aug 25, 1998 -- Updated ide-cd.h to respect machine endianness, * patch thanks to "Eddie C. Dost" <ecd@skynet.be> * * 4.50 Oct 19, 1998 -- New maintainers! diff --git a/Documentation/ide/ide.txt b/Documentation/ide/ide.txt index e77bebfa7b0d..7aca987c23d9 100644 --- a/Documentation/ide/ide.txt +++ b/Documentation/ide/ide.txt @@ -169,7 +169,7 @@ When using ide.c as a module in combination with kmod, add: alias block-major-3 ide-probe -to /etc/modprobe.conf. +to a configuration file in /etc/modprobe.d/. 
When ide.c is used as a module, you can pass command line parameters to the driver using the "options=" keyword to insmod, while replacing any ',' with diff --git a/Documentation/input/alps.txt b/Documentation/input/alps.txt index 2f95308251d4..ae8ba9a74ce1 100644 --- a/Documentation/input/alps.txt +++ b/Documentation/input/alps.txt @@ -132,8 +132,8 @@ number of contacts (f1 and f0 in the table below). byte 5: 0 1 ? ? ? ? f1 f0 This packet only appears after a position packet with the mt bit set, and -ususally only appears when there are two or more contacts (although -ocassionally it's seen with only a single contact). +usually only appears when there are two or more contacts (although +occassionally it's seen with only a single contact). The final v3 packet type is the trackstick packet. diff --git a/Documentation/input/input.txt b/Documentation/input/input.txt index b3d6787b4fb1..666c06c5ab0c 100644 --- a/Documentation/input/input.txt +++ b/Documentation/input/input.txt @@ -250,8 +250,8 @@ And so on up to event31. a USB keyboard works and is correctly connected to the kernel keyboard driver. - Doing a cat /dev/input/mouse0 (c, 13, 32) will verify that a mouse -is also emulated, characters should appear if you move it. + Doing a "cat /dev/input/mouse0" (c, 13, 32) will verify that a mouse +is also emulated; characters should appear if you move it. You can test the joystick emulation with the 'jstest' utility, available in the joystick package (see Documentation/input/joystick.txt). diff --git a/Documentation/input/joystick.txt b/Documentation/input/joystick.txt index 8007b7ca87bf..304262bb661a 100644 --- a/Documentation/input/joystick.txt +++ b/Documentation/input/joystick.txt @@ -330,7 +330,7 @@ the USB documentation for how to setup an USB mouse. The TM DirectConnect (BSP) protocol is supported by the tmdc.c module. 
This includes, but is not limited to: -* ThrustMaster Millenium 3D Inceptor +* ThrustMaster Millennium 3D Interceptor * ThrustMaster 3D Rage Pad * ThrustMaster Fusion Digital Game Pad diff --git a/Documentation/ioctl/hdio.txt b/Documentation/ioctl/hdio.txt index 91a6ecbae0bb..18eb98c44ffe 100644 --- a/Documentation/ioctl/hdio.txt +++ b/Documentation/ioctl/hdio.txt @@ -596,7 +596,7 @@ HDIO_DRIVE_TASKFILE execute raw taskfile if CHS/LBA28 The association between in_flags.all and each enable - bitfield flips depending on endianess; fortunately, TASKFILE + bitfield flips depending on endianness; fortunately, TASKFILE only uses inflags.b.data bit and ignores all other bits. The end result is that, on any endian machines, it has no effect other than modifying in_flags on completion. @@ -720,7 +720,7 @@ HDIO_DRIVE_TASKFILE execute raw taskfile [6] Do not access {in|out}_flags->all except for resetting all the bits. Always access individual bit fields. ->all - value will flip depending on endianess. For the same + value will flip depending on endianness. For the same reason, do not use IDE_{TASKFILE|HOB}_STD_{OUT|IN}_FLAGS constants defined in hdreg.h. diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt index 4840334ea97b..e34b531dc316 100644 --- a/Documentation/ioctl/ioctl-number.txt +++ b/Documentation/ioctl/ioctl-number.txt @@ -189,7 +189,7 @@ Code Seq#(hex) Include File Comments 'Y' all linux/cyclades.h 'Z' 14-15 drivers/message/fusion/mptctl.h '[' 00-07 linux/usb/tmc.h USB Test and Measurement Devices - <mailto:gregkh@suse.de> + <mailto:gregkh@linuxfoundation.org> 'a' all linux/atm*.h, linux/sonet.h ATM on linux <http://lrcwww.epfl.ch/> 'b' 00-FF conflict! bit3 vme host bridge @@ -218,12 +218,14 @@ Code Seq#(hex) Include File Comments 'h' 00-7F conflict! Charon filesystem <mailto:zapman@interlan.net> 'h' 00-1F linux/hpet.h conflict! +'h' 80-8F fs/hfsplus/ioctl.c 'i' 00-3F linux/i2o-dev.h conflict! 'i' 0B-1F linux/ipmi.h conflict! 
'i' 80-8F linux/i8k.h 'j' 00-3F linux/joystick.h 'k' 00-0F linux/spi/spidev.h conflict! 'k' 00-05 video/kyro.h conflict! +'k' 10-17 linux/hsi/hsi_char.h HSI character device 'l' 00-3F linux/tcfs_fs.h transparent cryptographic file system <http://web.archive.org/web/*/http://mikonos.dia.unisa.it/tcfs> 'l' 40-7F linux/udf_fs_i.h in development: @@ -255,7 +257,7 @@ Code Seq#(hex) Include File Comments linux/ixjuser.h <http://web.archive.org/web/*/http://www.quicknet.net> 'r' 00-1F linux/msdos_fs.h and fs/fat/dir.c 's' all linux/cdk.h -'t' 00-7F linux/if_ppp.h +'t' 00-7F linux/ppp-ioctl.h 't' 80-8F linux/isdn_ppp.h 't' 90 linux/toshiba.h 'u' 00-1F linux/smb_fs.h gone diff --git a/Documentation/isdn/README.gigaset b/Documentation/isdn/README.gigaset index ef3343eaa002..7534c6039adc 100644 --- a/Documentation/isdn/README.gigaset +++ b/Documentation/isdn/README.gigaset @@ -97,8 +97,7 @@ GigaSet 307x Device Driver 2.5.): 1=on (default), 0=off Depending on your distribution you may want to create a separate module - configuration file /etc/modprobe.d/gigaset for these, or add them to a - custom file like /etc/modprobe.conf.local. + configuration file like /etc/modprobe.d/gigaset.conf for these. 2.2. Device nodes for user space programs ------------------------------------ @@ -212,8 +211,8 @@ GigaSet 307x Device Driver options ppp_async flag_time=0 - to an appropriate module configuration file, like /etc/modprobe.d/gigaset - or /etc/modprobe.conf.local. + to an appropriate module configuration file, like + /etc/modprobe.d/gigaset.conf. Unimodem mode is needed for making some devices [e.g. SX100] work which do not support the regular Gigaset command set. If debug output (see @@ -237,8 +236,8 @@ GigaSet 307x Device Driver modprobe usb_gigaset startmode=0 or by adding a line like options usb_gigaset startmode=0 - to an appropriate module configuration file, like /etc/modprobe.d/gigaset - or /etc/modprobe.conf.local. 
+ to an appropriate module configuration file, like + /etc/modprobe.d/gigaset.conf 2.6. Call-ID (CID) mode ------------------ @@ -310,7 +309,7 @@ GigaSet 307x Device Driver options isdn dialtimeout=15 - to /etc/modprobe.d/gigaset, /etc/modprobe.conf.local or a similar file. + to /etc/modprobe.d/gigaset.conf or a similar file. Problem: The isdnlog program emits error messages or just doesn't work. @@ -350,8 +349,7 @@ GigaSet 307x Device Driver The initial value can be set using the debug parameter when loading the module "gigaset", e.g. by adding a line options gigaset debug=0 - to your module configuration file, eg. /etc/modprobe.d/gigaset or - /etc/modprobe.conf.local. + to your module configuration file, eg. /etc/modprobe.d/gigaset.conf Generated debugging information can be found - as output of the command diff --git a/Documentation/kbuild/kconfig-language.txt b/Documentation/kbuild/kconfig-language.txt index 44e2649fbb29..a686f9cd69c1 100644 --- a/Documentation/kbuild/kconfig-language.txt +++ b/Documentation/kbuild/kconfig-language.txt @@ -117,7 +117,7 @@ applicable everywhere (see syntax). This attribute is only applicable to menu blocks, if the condition is false, the menu block is not displayed to the user (the symbols contained there can still be selected by other symbols, though). It is - similar to a conditional "prompt" attribude for individual menu + similar to a conditional "prompt" attribute for individual menu entries. Default value of "visible" is true. 
- numerical ranges: "range" <symbol> <symbol> ["if" <expr>] diff --git a/Documentation/kbuild/kconfig.txt b/Documentation/kbuild/kconfig.txt index c313d71324b4..9d5f2a90dca9 100644 --- a/Documentation/kbuild/kconfig.txt +++ b/Documentation/kbuild/kconfig.txt @@ -28,12 +28,10 @@ new (default) values, so you can use: grep "(NEW)" conf.new -to see the new config symbols or you can 'diff' the previous and -new .config files to see the differences: +to see the new config symbols or you can use diffconfig to see the +differences between the previous and new .config files: - diff .config.old .config | less - -(Yes, we need something better here.) + scripts/diffconfig .config.old .config | less ______________________________________________________________________ Environment variables for '*config' diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index d99fd9c0ec0e..c1601e5a8b71 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -713,6 +713,21 @@ bytes respectively. Such letter suffixes can also be entirely omitted. The filter can be disabled or changed to another driver later using sysfs. + drm_kms_helper.edid_firmware=[<connector>:]<file> + Broken monitors, graphic adapters and KVMs may + send no or incorrect EDID data sets. This parameter + allows to specify an EDID data set in the + /lib/firmware directory that is used instead. + Generic built-in EDID data sets are used, if one of + edid/1024x768.bin, edid/1280x1024.bin, + edid/1680x1050.bin, or edid/1920x1080.bin is given + and no file with the same name exists. Details and + instructions how to build your own EDID data are + available in Documentation/EDID/HOWTO.txt. An EDID + data set will only be used for a particular connector, + if its name and a colon are prepended to the EDID + name. + dscc4.setup= [NET] earlycon= [KNL] Output early console device and options. @@ -950,7 +965,7 @@ bytes respectively. 
Such letter suffixes can also be entirely omitted. controller i8042.nopnp [HW] Don't use ACPIPnP / PnPBIOS to discover KBD/AUX controllers - i8042.notimeout [HW] Ignore timeout condition signalled by conroller + i8042.notimeout [HW] Ignore timeout condition signalled by controller i8042.reset [HW] Reset the controller during init and cleanup i8042.unlock [HW] Unlock (ignore) the keylock @@ -1071,8 +1086,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted. no_x2apic_optout BIOS x2APIC opt-out request will be ignored - inttest= [IA-64] - iomem= Disable strict checking of access to MMIO memory strict regions from userspace. relaxed @@ -1657,6 +1670,14 @@ bytes respectively. Such letter suffixes can also be entirely omitted. of returning the full 64-bit number. The default is to return 64-bit inode numbers. + nfs.max_session_slots= + [NFSv4.1] Sets the maximum number of session slots + the client will attempt to negotiate with the server. + This limits the number of simultaneous RPC requests + that the client can send to the NFSv4.1 server. + Note that there is little point in setting this + value higher than the max_tcp_slot_table_limit. + nfs.nfs4_disable_idmapping= [NFSv4] When set to the default of '1', this option ensures that both the RPC level authentication @@ -1670,6 +1691,27 @@ bytes respectively. Such letter suffixes can also be entirely omitted. back to using the idmapper. To turn off this behaviour, set the value to '0'. + nfs.send_implementation_id = + [NFSv4.1] Send client implementation identification + information in exchange_id requests. + If zero, no implementation identification information + will be sent. + The default is to send the implementation identification + information. + + nfsd.nfs4_disable_idmapping= + [NFSv4] When set to the default of '1', the NFSv4 + server will return only numeric uids and gids to + clients using auth_sys, and will accept numeric uids + and gids from such clients. 
This is intended to ease + migration from NFSv2/v3. + + objlayoutdriver.osd_login_prog= + [NFS] [OBJLAYOUT] sets the pathname to the program which + is used to automatically discover and login into new + osd-targets. Please see: + Documentation/filesystems/pnfs.txt for more explanations + nmi_debug= [KNL,AVR32,SH] Specify one or more actions to take when a NMI is triggered. Format: [state][,regs][,debounce][,die] @@ -1833,6 +1875,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted. shutdown the other cpus. Instead use the REBOOT_VECTOR irq. + nomodule Disable module load + nopat [X86] Disable PAT (page attribute table extension of pagetables) support. @@ -2109,8 +2153,14 @@ bytes respectively. Such letter suffixes can also be entirely omitted. the default. off: Turn ECRC off on: Turn ECRC on. - realloc reallocate PCI resources if allocations done by BIOS - are erroneous. + realloc= Enable/disable reallocating PCI bridge resources + if allocations done by BIOS are too small to + accommodate resources required by all child + devices. + off: Turn realloc off + on: Turn realloc on + realloc same as realloc=on + noari do not use PCIe ARI. pcie_aspm= [PCIE] Forcibly enable or disable PCIe Active State Power Management. @@ -2118,6 +2168,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted. force Enable ASPM even on devices that claim not to support it. WARNING: Forcing ASPM on may cause system lockups. + pcie_hp= [PCIE] PCI Express Hotplug driver options: + nomsi Do not use MSI for PCI Express Native Hotplug (this + makes all PCIe ports use INTx for hotplug services). + pcie_ports= [PCIE] PCIe ports handling: auto Ask the BIOS whether or not to use native PCIe services associated with PCIe ports (PME, hot-plug, AER). Use @@ -2440,7 +2494,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted. For more information see Documentation/vm/slub.txt. 
slub_min_order= [MM, SLUB] - Determines the mininum page order for slabs. Must be + Determines the minimum page order for slabs. Must be lower than slub_max_order. For more information see Documentation/vm/slub.txt. @@ -2606,7 +2660,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted. threadirqs [KNL] Force threading of all interrupt handlers except those - marked explicitely IRQF_NO_THREAD. + marked explicitly IRQF_NO_THREAD. topology= [S390] Format: {off | on} @@ -2635,6 +2689,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted. to facilitate early boot debugging. See also Documentation/trace/events.txt + transparent_hugepage= + [KNL] + Format: [always|madvise|never] + Can be used to control the default behavior of the system + with respect to transparent hugepages. + See Documentation/vm/transhuge.txt for more details. + tsc= Disable clocksource stability checks for TSC. Format: <string> [x86] reliable: mark tsc clocksource as reliable, this diff --git a/Documentation/ko_KR/HOWTO b/Documentation/ko_KR/HOWTO index ab5189ae3428..2f48f205fedc 100644 --- a/Documentation/ko_KR/HOWTO +++ b/Documentation/ko_KR/HOWTO @@ -354,7 +354,7 @@ Andrew Morton에 의해 배포된 실험적인 커널 패치들이다. 
Andrew는 git.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git quilt trees: - - USB, PCI, Driver Core, and I2C, Greg Kroah-Hartman < gregkh@suse.de> + - USB, PCI, Driver Core, and I2C, Greg Kroah-Hartman < gregkh@linuxfoundation.org> kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/ - x86-64, partly i386, Andi Kleen < ak@suse.de> ftp.firstfloor.org:/pub/ak/x86_64/quilt/ diff --git a/Documentation/kobject.txt b/Documentation/kobject.txt index 3ab2472509cb..49578cf1aea5 100644 --- a/Documentation/kobject.txt +++ b/Documentation/kobject.txt @@ -1,6 +1,6 @@ Everything you never wanted to know about kobjects, ksets, and ktypes -Greg Kroah-Hartman <gregkh@suse.de> +Greg Kroah-Hartman <gregkh@linuxfoundation.org> Based on an original article by Jon Corbet for lwn.net written October 1, 2003 and located at http://lwn.net/Articles/51437/ diff --git a/Documentation/laptops/asus-laptop.txt b/Documentation/laptops/asus-laptop.txt index 803e51f6768b..a1e04d679289 100644 --- a/Documentation/laptops/asus-laptop.txt +++ b/Documentation/laptops/asus-laptop.txt @@ -45,7 +45,7 @@ Status Usage ----- - Try "modprobe asus_acpi". Check your dmesg (simply type dmesg). You should + Try "modprobe asus-laptop". Check your dmesg (simply type dmesg). You should see some lines like this : Asus Laptop Extras version 0.42 diff --git a/Documentation/laptops/sony-laptop.txt b/Documentation/laptops/sony-laptop.txt index 2bd4e82e5d9f..0d5ac7f5287e 100644 --- a/Documentation/laptops/sony-laptop.txt +++ b/Documentation/laptops/sony-laptop.txt @@ -17,6 +17,11 @@ subsystem. See the logs of acpid or /proc/acpi/event and devices are created by the driver. Additionally, loading the driver with the debug option will report all events in the kernel log. +The "scancodes" passed to the input system (that can be remapped with udev) +are indexes to the table "sony_laptop_input_keycode_map" in the sony-laptop.c +module. 
For example the "FN/E" key combination (EJECTCD on some models) +generates the scancode 20 (0x14). + Backlight control: ------------------ If your laptop model supports it, you will find sysfs files in the diff --git a/Documentation/laptops/sonypi.txt b/Documentation/laptops/sonypi.txt index 4857acfc50f1..606bdb9ce036 100644 --- a/Documentation/laptops/sonypi.txt +++ b/Documentation/laptops/sonypi.txt @@ -110,7 +110,7 @@ Module use: ----------- In order to automatically load the sonypi module on use, you can put those -lines in your /etc/modprobe.conf file: +lines a configuration file in /etc/modprobe.d/: alias char-major-10-250 sonypi options sonypi minor=250 diff --git a/Documentation/leds/leds-lp5521.txt b/Documentation/leds/leds-lp5521.txt index c4d8d151e0fe..0e542ab3d4a0 100644 --- a/Documentation/leds/leds-lp5521.txt +++ b/Documentation/leds/leds-lp5521.txt @@ -43,17 +43,23 @@ Format: 10x mA i.e 10 means 1.0 mA example platform data: Note: chan_nr can have values between 0 and 2. +The name of each channel can be configurable. +If the name field is not defined, the default name will be set to 'xxxx:channelN' +(XXXX : pdata->label or i2c client name, N : channel number) static struct lp5521_led_config lp5521_led_config[] = { { + .name = "red", .chan_nr = 0, .led_current = 50, .max_current = 130, }, { + .name = "green", .chan_nr = 1, .led_current = 0, .max_current = 130, }, { + .name = "blue", .chan_nr = 2, .led_current = 0, .max_current = 130, @@ -86,3 +92,60 @@ static struct lp5521_platform_data lp5521_platform_data = { If the current is set to 0 in the platform data, that channel is disabled and it is not visible in the sysfs. + +The 'update_config' : CONFIG register (ADDR 08h) +This value is platform-specific data. +If update_config is not defined, the CONFIG register is set with +'LP5521_PWRSAVE_EN | LP5521_CP_MODE_AUTO | LP5521_R_TO_BATT'. 
+(Enable auto-powersave, set charge pump to auto, red to battery)
+
+example of update_config :
+
+#define LP5521_CONFIGS	(LP5521_PWM_HF | LP5521_PWRSAVE_EN | \
+			LP5521_CP_MODE_AUTO | LP5521_R_TO_BATT | \
+			LP5521_CLK_INT)
+
+static struct lp5521_platform_data lp5521_pdata = {
+	.led_config = lp5521_led_config,
+	.num_channels = ARRAY_SIZE(lp5521_led_config),
+	.clock_mode = LP5521_CLOCK_INT,
+	.update_config = LP5521_CONFIGS,
+};
+
+LED patterns : LP5521 has autonomous operation without external control.
+Pattern data can be defined in the platform data.
+
+example of led pattern data :
+
+/* RGB(50,5,0) 500ms on, 500ms off, infinite loop */
+static u8 pattern_red[] = {
+	0x40, 0x32, 0x60, 0x00, 0x40, 0x00, 0x60, 0x00,
+	};
+
+static u8 pattern_green[] = {
+	0x40, 0x05, 0x60, 0x00, 0x40, 0x00, 0x60, 0x00,
+	};
+
+static struct lp5521_led_pattern board_led_patterns[] = {
+	{
+		.r = pattern_red,
+		.g = pattern_green,
+		.size_r = ARRAY_SIZE(pattern_red),
+		.size_g = ARRAY_SIZE(pattern_green),
+	},
+};
+
+static struct lp5521_platform_data lp5521_platform_data = {
+	.led_config = lp5521_led_config,
+	.num_channels = ARRAY_SIZE(lp5521_led_config),
+	.clock_mode = LP5521_CLOCK_EXT,
+	.patterns = board_led_patterns,
+	.num_patterns = ARRAY_SIZE(board_led_patterns),
+};
+
+Then predefined led pattern(s) can be executed via the sysfs.
+To start the pattern #1, +# echo 1 > /sys/bus/i2c/devices/xxxx/led_pattern +(xxxx : i2c bus & slave address) +To end the pattern, +# echo 0 > /sys/bus/i2c/devices/xxxx/led_pattern diff --git a/Documentation/lockup-watchdogs.txt b/Documentation/lockup-watchdogs.txt new file mode 100644 index 000000000000..d2a36602ca8d --- /dev/null +++ b/Documentation/lockup-watchdogs.txt @@ -0,0 +1,63 @@ +=============================================================== +Softlockup detector and hardlockup detector (aka nmi_watchdog) +=============================================================== + +The Linux kernel can act as a watchdog to detect both soft and hard +lockups. + +A 'softlockup' is defined as a bug that causes the kernel to loop in +kernel mode for more than 20 seconds (see "Implementation" below for +details), without giving other tasks a chance to run. The current +stack trace is displayed upon detection and, by default, the system +will stay locked up. Alternatively, the kernel can be configured to +panic; a sysctl, "kernel.softlockup_panic", a kernel parameter, +"softlockup_panic" (see "Documentation/kernel-parameters.txt" for +details), and a compile option, "BOOTPARAM_SOFTLOCKUP_PANIC", are +provided for this. + +A 'hardlockup' is defined as a bug that causes the CPU to loop in +kernel mode for more than 10 seconds (see "Implementation" below for +details), without letting other interrupts have a chance to run. +Similarly to the softlockup case, the current stack trace is displayed +upon detection and the system will stay locked up unless the default +behavior is changed, which can be done through a compile time knob, +"BOOTPARAM_HARDLOCKUP_PANIC", and a kernel parameter, "nmi_watchdog" +(see "Documentation/kernel-parameters.txt" for details). + +The panic option can be used in combination with panic_timeout (this +timeout is set through the confusingly named "kernel.panic" sysctl), +to cause the system to reboot automatically after a specified amount +of time. 
+ +=== Implementation === + +The soft and hard lockup detectors are built on top of the hrtimer and +perf subsystems, respectively. A direct consequence of this is that, +in principle, they should work in any architecture where these +subsystems are present. + +A periodic hrtimer runs to generate interrupts and kick the watchdog +task. An NMI perf event is generated every "watchdog_thresh" +(compile-time initialized to 10 and configurable through sysctl of the +same name) seconds to check for hardlockups. If any CPU in the system +does not receive any hrtimer interrupt during that time the +'hardlockup detector' (the handler for the NMI perf event) will +generate a kernel warning or call panic, depending on the +configuration. + +The watchdog task is a high priority kernel thread that updates a +timestamp every time it is scheduled. If that timestamp is not updated +for 2*watchdog_thresh seconds (the softlockup threshold) the +'softlockup detector' (coded inside the hrtimer callback function) +will dump useful debug information to the system log, after which it +will call panic if it was instructed to do so or resume execution of +other kernel code. + +The period of the hrtimer is 2*watchdog_thresh/5, which means it has +two or three chances to generate an interrupt before the hardlockup +detector kicks in. + +As explained above, a kernel knob is provided that allows +administrators to configure the period of the hrtimer and the perf +event. The right value for a particular environment is a trade-off +between fast response to lockups and detection overhead. 
diff --git a/Documentation/magic-number.txt b/Documentation/magic-number.txt index abf481f780ec..82761a31d64d 100644 --- a/Documentation/magic-number.txt +++ b/Documentation/magic-number.txt @@ -89,7 +89,7 @@ TTY_DRIVER_MAGIC 0x5402 tty_driver include/linux/tty_driver.h MGSLPC_MAGIC 0x5402 mgslpc_info drivers/char/pcmcia/synclink_cs.c TTY_LDISC_MAGIC 0x5403 tty_ldisc include/linux/tty_ldisc.h USB_SERIAL_MAGIC 0x6702 usb_serial drivers/usb/serial/usb-serial.h -FULL_DUPLEX_MAGIC 0x6969 drivers/net/tulip/de2104x.c +FULL_DUPLEX_MAGIC 0x6969 drivers/net/ethernet/dec/tulip/de2104x.c USB_BLUETOOTH_MAGIC 0x6d02 usb_bluetooth drivers/usb/class/bluetty.c RFCOMM_TTY_MAGIC 0x6d02 net/bluetooth/rfcomm/tty.c USB_SERIAL_PORT_MAGIC 0x7301 usb_serial_port drivers/usb/serial/usb-serial.h diff --git a/Documentation/mono.txt b/Documentation/mono.txt index e8e1758e87da..d01ac6052194 100644 --- a/Documentation/mono.txt +++ b/Documentation/mono.txt @@ -38,11 +38,11 @@ if [ ! -e /proc/sys/fs/binfmt_misc/register ]; then /sbin/modprobe binfmt_misc # Some distributions, like Fedora Core, perform # the following command automatically when the - # binfmt_misc module is loaded into the kernel. + # binfmt_misc module is loaded into the kernel + # or during normal boot up (systemd-based systems). # Thus, it is possible that the following line - # is not needed at all. Look at /etc/modprobe.conf - # to check whether this is applicable or not. - mount -t binfmt_misc none /proc/sys/fs/binfmt_misc + # is not needed at all. 
+ mount -t binfmt_misc none /proc/sys/fs/binfmt_misc fi # Register support for .NET CLR binaries diff --git a/Documentation/networking/LICENSE.qlge b/Documentation/networking/LICENSE.qlge index 123b6edd7f18..ce64e4d15b21 100644 --- a/Documentation/networking/LICENSE.qlge +++ b/Documentation/networking/LICENSE.qlge @@ -1,46 +1,288 @@ -Copyright (c) 2003-2008 QLogic Corporation -QLogic Linux Networking HBA Driver +Copyright (c) 2003-2011 QLogic Corporation +QLogic Linux qlge NIC Driver -This program includes a device driver for Linux 2.6 that may be -distributed with QLogic hardware specific firmware binary file. You may modify and redistribute the device driver code under the -GNU General Public License as published by the Free Software -Foundation (version 2 or a later version). - -You may redistribute the hardware specific firmware binary file -under the following terms: - - 1. Redistribution of source code (only if applicable), - must retain the above copyright notice, this list of - conditions and the following disclaimer. - - 2. Redistribution in binary form must reproduce the above - copyright notice, this list of conditions and the - following disclaimer in the documentation and/or other - materials provided with the distribution. - - 3. The name of QLogic Corporation may not be used to - endorse or promote products derived from this software - without specific prior written permission - -REGARDLESS OF WHAT LICENSING MECHANISM IS USED OR APPLICABLE, -THIS PROGRAM IS PROVIDED BY QLOGIC CORPORATION "AS IS'' AND ANY -EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A -PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR -BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, -EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED -TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON -ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, -OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY -OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. - -USER ACKNOWLEDGES AND AGREES THAT USE OF THIS PROGRAM WILL NOT -CREATE OR GIVE GROUNDS FOR A LICENSE BY IMPLICATION, ESTOPPEL, OR -OTHERWISE IN ANY INTELLECTUAL PROPERTY RIGHTS (PATENT, COPYRIGHT, -TRADE SECRET, MASK WORK, OR OTHER PROPRIETARY RIGHT) EMBODIED IN -ANY OTHER QLOGIC HARDWARE OR SOFTWARE EITHER SOLELY OR IN -COMBINATION WITH THIS PROGRAM. +GNU General Public License (a copy of which is attached hereto as +Exhibit A) published by the Free Software Foundation (version 2). + +EXHIBIT A + + GNU GENERAL PUBLIC LICENSE + Version 2, June 1991 + + Copyright (C) 1989, 1991 Free Software Foundation, Inc. + 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + + Preamble + + The licenses for most software are designed to take away your +freedom to share and change it. By contrast, the GNU General Public +License is intended to guarantee your freedom to share and change free +software--to make sure the software is free for all its users. This +General Public License applies to most of the Free Software +Foundation's software and to any other program whose authors commit to +using it. (Some other Free Software Foundation software is covered by +the GNU Lesser General Public License instead.) You can apply it to +your programs, too. + + When we speak of free software, we are referring to freedom, not +price. 
Our General Public Licenses are designed to make sure that you +have the freedom to distribute copies of free software (and charge for +this service if you wish), that you receive source code or can get it +if you want it, that you can change the software or use pieces of it +in new free programs; and that you know you can do these things. + + To protect your rights, we need to make restrictions that forbid +anyone to deny you these rights or to ask you to surrender the rights. +These restrictions translate to certain responsibilities for you if you +distribute copies of the software, or if you modify it. + + For example, if you distribute copies of such a program, whether +gratis or for a fee, you must give the recipients all the rights that +you have. You must make sure that they, too, receive or can get the +source code. And you must show them these terms so they know their +rights. + + We protect your rights with two steps: (1) copyright the software, and +(2) offer you this license which gives you legal permission to copy, +distribute and/or modify the software. + + Also, for each author's protection and ours, we want to make certain +that everyone understands that there is no warranty for this free +software. If the software is modified by someone else and passed on, we +want its recipients to know that what they have is not the original, so +that any problems introduced by others will not reflect on the original +authors' reputations. + + Finally, any free program is threatened constantly by software +patents. We wish to avoid the danger that redistributors of a free +program will individually obtain patent licenses, in effect making the +program proprietary. To prevent this, we have made it clear that any +patent must be licensed for everyone's free use or not licensed at all. + + The precise terms and conditions for copying, distribution and +modification follow. 
+ + GNU GENERAL PUBLIC LICENSE + TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION + + 0. This License applies to any program or other work which contains +a notice placed by the copyright holder saying it may be distributed +under the terms of this General Public License. The "Program", below, +refers to any such program or work, and a "work based on the Program" +means either the Program or any derivative work under copyright law: +that is to say, a work containing the Program or a portion of it, +either verbatim or with modifications and/or translated into another +language. (Hereinafter, translation is included without limitation in +the term "modification".) Each licensee is addressed as "you". + +Activities other than copying, distribution and modification are not +covered by this License; they are outside its scope. The act of +running the Program is not restricted, and the output from the Program +is covered only if its contents constitute a work based on the +Program (independent of having been made by running the Program). +Whether that is true depends on what the Program does. + + 1. You may copy and distribute verbatim copies of the Program's +source code as you receive it, in any medium, provided that you +conspicuously and appropriately publish on each copy an appropriate +copyright notice and disclaimer of warranty; keep intact all the +notices that refer to this License and to the absence of any warranty; +and give any other recipients of the Program a copy of this License +along with the Program. + +You may charge a fee for the physical act of transferring a copy, and +you may at your option offer warranty protection in exchange for a fee. + + 2. 
You may modify your copy or copies of the Program or any portion +of it, thus forming a work based on the Program, and copy and +distribute such modifications or work under the terms of Section 1 +above, provided that you also meet all of these conditions: + + a) You must cause the modified files to carry prominent notices + stating that you changed the files and the date of any change. + + b) You must cause any work that you distribute or publish, that in + whole or in part contains or is derived from the Program or any + part thereof, to be licensed as a whole at no charge to all third + parties under the terms of this License. + + c) If the modified program normally reads commands interactively + when run, you must cause it, when started running for such + interactive use in the most ordinary way, to print or display an + announcement including an appropriate copyright notice and a + notice that there is no warranty (or else, saying that you provide + a warranty) and that users may redistribute the program under + these conditions, and telling the user how to view a copy of this + License. (Exception: if the Program itself is interactive but + does not normally print such an announcement, your work based on + the Program is not required to print an announcement.) + +These requirements apply to the modified work as a whole. If +identifiable sections of that work are not derived from the Program, +and can be reasonably considered independent and separate works in +themselves, then this License, and its terms, do not apply to those +sections when you distribute them as separate works. But when you +distribute the same sections as part of a whole which is a work based +on the Program, the distribution of the whole must be on the terms of +this License, whose permissions for other licensees extend to the +entire whole, and thus to each and every part regardless of who wrote it. 
+ +Thus, it is not the intent of this section to claim rights or contest +your rights to work written entirely by you; rather, the intent is to +exercise the right to control the distribution of derivative or +collective works based on the Program. + +In addition, mere aggregation of another work not based on the Program +with the Program (or with a work based on the Program) on a volume of +a storage or distribution medium does not bring the other work under +the scope of this License. + + 3. You may copy and distribute the Program (or a work based on it, +under Section 2) in object code or executable form under the terms of +Sections 1 and 2 above provided that you also do one of the following: + + a) Accompany it with the complete corresponding machine-readable + source code, which must be distributed under the terms of Sections + 1 and 2 above on a medium customarily used for software interchange; or, + + b) Accompany it with a written offer, valid for at least three + years, to give any third party, for a charge no more than your + cost of physically performing source distribution, a complete + machine-readable copy of the corresponding source code, to be + distributed under the terms of Sections 1 and 2 above on a medium + customarily used for software interchange; or, + + c) Accompany it with the information you received as to the offer + to distribute corresponding source code. (This alternative is + allowed only for noncommercial distribution and only if you + received the program in object code or executable form with such + an offer, in accord with Subsection b above.) + +The source code for a work means the preferred form of the work for +making modifications to it. For an executable work, complete source +code means all the source code for all modules it contains, plus any +associated interface definition files, plus the scripts used to +control compilation and installation of the executable. 
However, as a +special exception, the source code distributed need not include +anything that is normally distributed (in either source or binary +form) with the major components (compiler, kernel, and so on) of the +operating system on which the executable runs, unless that component +itself accompanies the executable. + +If distribution of executable or object code is made by offering +access to copy from a designated place, then offering equivalent +access to copy the source code from the same place counts as +distribution of the source code, even though third parties are not +compelled to copy the source along with the object code. + + 4. You may not copy, modify, sublicense, or distribute the Program +except as expressly provided under this License. Any attempt +otherwise to copy, modify, sublicense or distribute the Program is +void, and will automatically terminate your rights under this License. +However, parties who have received copies, or rights, from you under +this License will not have their licenses terminated so long as such +parties remain in full compliance. + + 5. You are not required to accept this License, since you have not +signed it. However, nothing else grants you permission to modify or +distribute the Program or its derivative works. These actions are +prohibited by law if you do not accept this License. Therefore, by +modifying or distributing the Program (or any work based on the +Program), you indicate your acceptance of this License to do so, and +all its terms and conditions for copying, distributing or modifying +the Program or works based on it. + + 6. Each time you redistribute the Program (or any work based on the +Program), the recipient automatically receives a license from the +original licensor to copy, distribute or modify the Program subject to +these terms and conditions. You may not impose any further +restrictions on the recipients' exercise of the rights granted herein. 
+You are not responsible for enforcing compliance by third parties to +this License. + + 7. If, as a consequence of a court judgment or allegation of patent +infringement or for any other reason (not limited to patent issues), +conditions are imposed on you (whether by court order, agreement or +otherwise) that contradict the conditions of this License, they do not +excuse you from the conditions of this License. If you cannot +distribute so as to satisfy simultaneously your obligations under this +License and any other pertinent obligations, then as a consequence you +may not distribute the Program at all. For example, if a patent +license would not permit royalty-free redistribution of the Program by +all those who receive copies directly or indirectly through you, then +the only way you could satisfy both it and this License would be to +refrain entirely from distribution of the Program. + +If any portion of this section is held invalid or unenforceable under +any particular circumstance, the balance of the section is intended to +apply and the section as a whole is intended to apply in other +circumstances. + +It is not the purpose of this section to induce you to infringe any +patents or other property right claims or to contest validity of any +such claims; this section has the sole purpose of protecting the +integrity of the free software distribution system, which is +implemented by public license practices. Many people have made +generous contributions to the wide range of software distributed +through that system in reliance on consistent application of that +system; it is up to the author/donor to decide if he or she is willing +to distribute software through any other system and a licensee cannot +impose that choice. + +This section is intended to make thoroughly clear what is believed to +be a consequence of the rest of this License. + + 8. 
If the distribution and/or use of the Program is restricted in +certain countries either by patents or by copyrighted interfaces, the +original copyright holder who places the Program under this License +may add an explicit geographical distribution limitation excluding +those countries, so that distribution is permitted only in or among +countries not thus excluded. In such case, this License incorporates +the limitation as if written in the body of this License. + + 9. The Free Software Foundation may publish revised and/or new versions +of the General Public License from time to time. Such new versions will +be similar in spirit to the present version, but may differ in detail to +address new problems or concerns. + +Each version is given a distinguishing version number. If the Program +specifies a version number of this License which applies to it and "any +later version", you have the option of following the terms and conditions +either of that version or of any later version published by the Free +Software Foundation. If the Program does not specify a version number of +this License, you may choose any version ever published by the Free Software +Foundation. + + 10. If you wish to incorporate parts of the Program into other free +programs whose distribution conditions are different, write to the author +to ask for permission. For software which is copyrighted by the Free +Software Foundation, write to the Free Software Foundation; we sometimes +make exceptions for this. Our decision will be guided by the two goals +of preserving the free status of all derivatives of our free software and +of promoting the sharing and reuse of software generally. + + NO WARRANTY + + 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY +FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. 
EXCEPT WHEN +OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES +PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED +OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF +MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS +TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE +PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, +REPAIR OR CORRECTION. + + 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING +WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR +REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, +INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING +OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED +TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY +YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER +PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE +POSSIBILITY OF SUCH DAMAGES. diff --git a/Documentation/networking/baycom.txt b/Documentation/networking/baycom.txt index 4e68849d5639..688f18fd4467 100644 --- a/Documentation/networking/baycom.txt +++ b/Documentation/networking/baycom.txt @@ -93,7 +93,7 @@ Every time a driver is inserted into the kernel, it has to know which modems it should access at which ports. This can be done with the setbaycom utility. If you are only using one modem, you can also configure the driver from the insmod command line (or by means of an option line in -/etc/modprobe.conf). +/etc/modprobe.d/*.conf). 
Examples: modprobe baycom_ser_fdx mode="ser12*" iobase=0x3f8 irq=4 diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index 080ad26690ae..bfea8a338901 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt @@ -173,9 +173,8 @@ bonding module at load time, or are specified via sysfs. Module options may be given as command line arguments to the insmod or modprobe command, but are usually specified in either the -/etc/modules.conf or /etc/modprobe.conf configuration file, or in a -distro-specific configuration file (some of which are detailed in the next -section). +/etc/modprobe.d/*.conf configuration files, or in a distro-specific +configuration file (some of which are detailed in the next section). Details on bonding support for sysfs is provided in the "Configuring Bonding Manually via Sysfs" section, below. @@ -1021,7 +1020,7 @@ ifcfg-bondX files. Because the sysconfig scripts supply the bonding module options in the ifcfg-bondX file, it is not necessary to add them to -the system /etc/modules.conf or /etc/modprobe.conf configuration file. +the system /etc/modprobe.d/*.conf configuration files. 3.2 Configuration with Initscripts Support ------------------------------------------ @@ -1098,15 +1097,13 @@ queried targets, e.g., arp_ip_target=+192.168.1.1 arp_ip_target=+192.168.1.2 is the proper syntax to specify multiple targets. When specifying -options via BONDING_OPTS, it is not necessary to edit /etc/modules.conf or -/etc/modprobe.conf. +options via BONDING_OPTS, it is not necessary to edit /etc/modprobe.d/*.conf. For even older versions of initscripts that do not support -BONDING_OPTS, it is necessary to edit /etc/modules.conf (or -/etc/modprobe.conf, depending upon your distro) to load the bonding module -with your desired options when the bond0 interface is brought up. 
The -following lines in /etc/modules.conf (or modprobe.conf) will load the -bonding module, and select its options: +BONDING_OPTS, it is necessary to edit /etc/modprobe.d/*.conf (depending upon +your distro) to load the bonding module with your desired options when the +bond0 interface is brought up. The following lines in /etc/modprobe.d/*.conf +will load the bonding module, and select its options: alias bond0 bonding options bond0 mode=balance-alb miimon=100 @@ -1152,7 +1149,7 @@ knowledge of bonding. One such distro is SuSE Linux Enterprise Server version 8. The general method for these systems is to place the bonding -module parameters into /etc/modules.conf or /etc/modprobe.conf (as +module parameters into a config file in /etc/modprobe.d/ (as appropriate for the installed distro), then add modprobe and/or ifenslave commands to the system's global init script. The name of the global init script differs; for sysconfig, it is @@ -1228,7 +1225,7 @@ network initialization scripts. specify a different name for each instance (the module loading system requires that every loaded module, even multiple instances of the same module, have a unique name). This is accomplished by supplying multiple -sets of bonding options in /etc/modprobe.conf, for example: +sets of bonding options in /etc/modprobe.d/*.conf, for example: alias bond0 bonding options bond0 -o bond0 mode=balance-rr miimon=100 @@ -1793,8 +1790,8 @@ route additions may cause trouble. On systems with network configuration scripts that do not associate physical devices directly with network interface names (so that the same physical device always has the same "ethX" name), it may -be necessary to add some special logic to either /etc/modules.conf or -/etc/modprobe.conf (depending upon which is installed on the system). +be necessary to add some special logic to config files in +/etc/modprobe.d/. 
For example, given a modules.conf containing the following: @@ -1821,20 +1818,15 @@ add above bonding e1000 tg3 bonding is loaded. This command is fully documented in the modules.conf manual page. - On systems utilizing modprobe.conf (or modprobe.conf.local), -an equivalent problem can occur. In this case, the following can be -added to modprobe.conf (or modprobe.conf.local, as appropriate), as -follows (all on one line; it has been split here for clarity): + On systems utilizing modprobe an equivalent problem can occur. +In this case, the following can be added to config files in +/etc/modprobe.d/ as: -install bonding /sbin/modprobe tg3; /sbin/modprobe e1000; - /sbin/modprobe --ignore-install bonding +softdep bonding pre: tg3 e1000 - This will, when loading the bonding module, rather than -performing the normal action, instead execute the provided command. -This command loads the device drivers in the order needed, then calls -modprobe with --ignore-install to cause the normal action to then take -place. Full documentation on this can be found in the modprobe.conf -and modprobe manual pages. + This will load tg3 and e1000 modules before loading the bonding one. +Full documentation on this can be found in the modprobe.d and modprobe +manual pages. 8.3. Painfully Slow Or No Failed Link Detection By Miimon --------------------------------------------------------- diff --git a/Documentation/networking/dl2k.txt b/Documentation/networking/dl2k.txt index 10e8490fa406..cba74f7a3abc 100644 --- a/Documentation/networking/dl2k.txt +++ b/Documentation/networking/dl2k.txt @@ -45,12 +45,13 @@ Now eth0 should active, you can test it by "ping" or get more information by "ifconfig". If tested ok, continue the next step. 4. cp dl2k.ko /lib/modules/`uname -r`/kernel/drivers/net -5. Add the following line to /etc/modprobe.conf: +5. Add the following line to /etc/modprobe.d/dl2k.conf: alias eth0 dl2k -6. Run "netconfig" or "netconf" to create configuration script ifcfg-eth0 +6. 
Run depmod to update the module indexes. +7. Run "netconfig" or "netconf" to create configuration script ifcfg-eth0 located at /etc/sysconfig/network-scripts or create it manually. [see - Configuration Script Sample] -7. Driver will automatically load and configure at next boot time. +8. Driver will automatically load and configure at next boot time. Compiling the Driver ==================== @@ -154,8 +155,8 @@ Installing the Driver ----------------- 1. Copy dl2k.o to the network modules directory, typically /lib/modules/2.x.x-xx/net or /lib/modules/2.x.x/kernel/drivers/net. - 2. Locate the boot module configuration file, most commonly modprobe.conf - or modules.conf (for 2.4) in the /etc directory. Add the following lines: + 2. Locate the boot module configuration file, most commonly in the + /etc/modprobe.d/ directory. Add the following lines: alias ethx dl2k options dl2k <optional parameters> diff --git a/Documentation/networking/dns_resolver.txt b/Documentation/networking/dns_resolver.txt index 7f531ad83285..d86adcdae420 100644 --- a/Documentation/networking/dns_resolver.txt +++ b/Documentation/networking/dns_resolver.txt @@ -102,6 +102,10 @@ implemented in the module can be called after doing: If _expiry is non-NULL, the expiry time (TTL) of the result will be returned also. +The kernel maintains an internal keyring in which it caches looked up keys. +This can be cleared by any process that has the CAP_SYS_ADMIN capability by +the use of KEYCTL_KEYRING_CLEAR on the keyring ID. + =============================== READING DNS KEYS FROM USERSPACE diff --git a/Documentation/networking/driver.txt b/Documentation/networking/driver.txt index 03283daa64fe..da59e2884130 100644 --- a/Documentation/networking/driver.txt +++ b/Documentation/networking/driver.txt @@ -2,16 +2,16 @@ Document about softnet driver issues Transmit path guidelines: -1) The hard_start_xmit method must never return '1' under any - normal circumstances. 
It is considered a hard error unless +1) The ndo_start_xmit method must not return NETDEV_TX_BUSY under + any normal circumstances. It is considered a hard error unless there is no way your device can tell ahead of time when it's transmit function will become busy. Instead it must maintain the queue properly. For example, for a driver implementing scatter-gather this means: - static int drv_hard_start_xmit(struct sk_buff *skb, - struct net_device *dev) + static netdev_tx_t drv_hard_start_xmit(struct sk_buff *skb, + struct net_device *dev) { struct drv *dp = netdev_priv(dev); @@ -23,7 +23,7 @@ Transmit path guidelines: unlock_tx(dp); printk(KERN_ERR PFX "%s: BUG! Tx Ring full when queue awake!\n", dev->name); - return 1; + return NETDEV_TX_BUSY; } ... queue packet to card ... @@ -35,6 +35,7 @@ Transmit path guidelines: ... unlock_tx(dp); ... + return NETDEV_TX_OK; } And then at the end of your TX reclamation event handling: @@ -58,15 +59,12 @@ Transmit path guidelines: TX_BUFFS_AVAIL(dp) > 0) netif_wake_queue(dp->dev); -2) Do not forget to update netdev->trans_start to jiffies after - each new tx packet is given to the hardware. - -3) A hard_start_xmit method must not modify the shared parts of a +2) An ndo_start_xmit method must not modify the shared parts of a cloned SKB. -4) Do not forget that once you return 0 from your hard_start_xmit - method, it is your driver's responsibility to free up the SKB - and in some finite amount of time. +3) Do not forget that once you return NETDEV_TX_OK from your + ndo_start_xmit method, it is your driver's responsibility to free + up the SKB and in some finite amount of time. For example, this means that it is not allowed for your TX mitigation scheme to let TX packets "hang out" in the TX @@ -74,8 +72,9 @@ Transmit path guidelines: This error can deadlock sockets waiting for send buffer room to be freed up. 
- If you return 1 from the hard_start_xmit method, you must not keep - any reference to that SKB and you must not attempt to free it up. + If you return NETDEV_TX_BUSY from the ndo_start_xmit method, you + must not keep any reference to that SKB and you must not attempt + to free it up. Probing guidelines: @@ -85,10 +84,10 @@ Probing guidelines: Close/stop guidelines: -1) After the dev->stop routine has been called, the hardware must +1) After the ndo_stop routine has been called, the hardware must not receive or transmit any data. All in flight packets must be aborted. If necessary, poll or wait for completion of any reset commands. -2) The dev->stop routine will be called by unregister_netdevice +2) The ndo_stop routine will be called by unregister_netdevice if device is still UP. diff --git a/Documentation/networking/e100.txt b/Documentation/networking/e100.txt index 162f323a7a1f..fcb6c71cdb69 100644 --- a/Documentation/networking/e100.txt +++ b/Documentation/networking/e100.txt @@ -94,8 +94,8 @@ Additional Configurations Configuring a network driver to load properly when the system is started is distribution dependent. Typically, the configuration process involves adding - an alias line to /etc/modules.conf or /etc/modprobe.conf as well as editing - other system startup scripts and/or configuration files. Many popular Linux + an alias line to /etc/modprobe.d/*.conf as well as editing other system + startup scripts and/or configuration files. Many popular Linux distributions ship with tools to make these changes for you. To learn the proper way to configure a network device for your system, refer to your distribution documentation. If during this process you are asked for the @@ -103,7 +103,7 @@ Additional Configurations PRO/100 Family of Adapters is e100. 
As an example, if you install the e100 driver for two PRO/100 adapters - (eth0 and eth1), add the following to modules.conf or modprobe.conf: + (eth0 and eth1), add the following to a configuraton file in /etc/modprobe.d/ alias eth0 e100 alias eth1 e100 diff --git a/Documentation/networking/fore200e.txt b/Documentation/networking/fore200e.txt index 6e0d2a9613ec..f648eb265188 100644 --- a/Documentation/networking/fore200e.txt +++ b/Documentation/networking/fore200e.txt @@ -44,7 +44,7 @@ the 'software updates' pages. The firmware binaries are part of the various ForeThought software distributions. Notice that different versions of the PCA-200E firmware exist, depending -on the endianess of the host architecture. The driver is shipped with +on the endianness of the host architecture. The driver is shipped with both little and big endian PCA firmware images. Name and location of the new firmware images can be set at kernel diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index ad3e80e17b4f..1619a8c80873 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -147,7 +147,7 @@ tcp_adv_win_scale - INTEGER (if tcp_adv_win_scale > 0) or bytes-bytes/2^(-tcp_adv_win_scale), if it is <= 0. Possible values are [-31, 31], inclusive. - Default: 2 + Default: 1 tcp_allowed_congestion_control - STRING Show/set the congestion control choices available to non-privileged @@ -410,7 +410,7 @@ tcp_rmem - vector of 3 INTEGERs: min, default, max net.core.rmem_max. Calling setsockopt() with SO_RCVBUF disables automatic tuning of that socket's receive buffer size, in which case this value is ignored. - Default: between 87380B and 4MB, depending on RAM size. + Default: between 87380B and 6MB, depending on RAM size. tcp_sack - BOOLEAN Enable select acknowledgments (SACKS). 
@@ -604,15 +604,8 @@ IP Variables: ip_local_port_range - 2 INTEGERS Defines the local port range that is used by TCP and UDP to choose the local port. The first number is the first, the - second the last local port number. Default value depends on - amount of memory available on the system: - > 128Mb 32768-61000 - < 128Mb 1024-4999 or even less. - This number defines number of active connections, which this - system can issue simultaneously to systems not supporting - TCP extensions (timestamps). With tcp_tw_recycle enabled - (i.e. by default) range 1024-4999 is enough to issue up to - 2000 connections per second to systems supporting timestamps. + second the last local port number. The default values are + 32768 and 61000 respectively. ip_local_reserved_ports - list of comma separated ranges Specify the ports which are reserved for known third-party diff --git a/Documentation/networking/ipv6.txt b/Documentation/networking/ipv6.txt index 9fd7e21296c8..6cd74fa55358 100644 --- a/Documentation/networking/ipv6.txt +++ b/Documentation/networking/ipv6.txt @@ -2,9 +2,9 @@ Options for the ipv6 module are supplied as parameters at load time. Module options may be given as command line arguments to the insmod -or modprobe command, but are usually specified in either the -/etc/modules.conf or /etc/modprobe.conf configuration file, or in a -distro-specific configuration file. +or modprobe command, but are usually specified in either +/etc/modules.d/*.conf configuration files, or in a distro-specific +configuration file. The available ipv6 module parameters are listed below. If a parameter is not specified the default value is used. 
diff --git a/Documentation/networking/ixgb.txt b/Documentation/networking/ixgb.txt index e196f16df313..d75a1f9565bb 100644 --- a/Documentation/networking/ixgb.txt +++ b/Documentation/networking/ixgb.txt @@ -274,9 +274,9 @@ Additional Configurations ------------------------------------------------- Configuring a network driver to load properly when the system is started is distribution dependent. Typically, the configuration process involves adding - an alias line to /etc/modprobe.conf as well as editing other system startup - scripts and/or configuration files. Many popular Linux distributions ship - with tools to make these changes for you. To learn the proper way to + an alias line to files in /etc/modprobe.d/ as well as editing other system + startup scripts and/or configuration files. Many popular Linux distributions + ship with tools to make these changes for you. To learn the proper way to configure a network device for your system, refer to your distribution documentation. If during this process you are asked for the driver or module name, the name for the Linux Base Driver for the Intel 10GbE Family of diff --git a/Documentation/networking/l2tp.txt b/Documentation/networking/l2tp.txt index e7bf3979facb..e63fc1f7bf87 100644 --- a/Documentation/networking/l2tp.txt +++ b/Documentation/networking/l2tp.txt @@ -111,7 +111,7 @@ When creating PPPoL2TP sockets, the application provides information to the driver about the socket in a socket connect() call. Source and destination tunnel and session ids are provided, as well as the file descriptor of a UDP socket. See struct pppol2tp_addr in -include/linux/if_ppp.h. Note that zero tunnel / session ids are +include/linux/if_pppol2tp.h. Note that zero tunnel / session ids are treated specially. 
When creating the per-tunnel PPPoL2TP management socket in Step 2 above, zero source and destination session ids are specified, which tells the driver to prepare the supplied UDP file diff --git a/Documentation/networking/ltpc.txt b/Documentation/networking/ltpc.txt index fe2a9129d959..0bf3220c715b 100644 --- a/Documentation/networking/ltpc.txt +++ b/Documentation/networking/ltpc.txt @@ -25,7 +25,7 @@ the driver will try to determine them itself. If you load the driver as a module, you can pass the parameters "io=", "irq=", and "dma=" on the command line with insmod or modprobe, or add -them as options in /etc/modprobe.conf: +them as options in a configuration file in /etc/modprobe.d/ directory: alias lt0 ltpc # autoload the module when the interface is configured options ltpc io=0x240 irq=9 dma=1 diff --git a/Documentation/networking/mac80211-auth-assoc-deauth.txt b/Documentation/networking/mac80211-auth-assoc-deauth.txt new file mode 100644 index 000000000000..e0a2aa585ca3 --- /dev/null +++ b/Documentation/networking/mac80211-auth-assoc-deauth.txt @@ -0,0 +1,99 @@ +# +# This outlines the Linux authentication/association and +# deauthentication/disassociation flows. 
+# +# This can be converted into a diagram using the service +# at http://www.websequencediagrams.com/ +# + +participant userspace +participant mac80211 +participant driver + +alt authentication needed (not FT) +userspace->mac80211: authenticate + +alt authenticated/authenticating already +mac80211->driver: sta_state(AP, not-exists) +mac80211->driver: bss_info_changed(clear BSSID) +else associated +note over mac80211,driver +like deauth/disassoc, without sending the +BA session stop & deauth/disassoc frames +end note +end + +mac80211->driver: config(channel, non-HT) +mac80211->driver: bss_info_changed(set BSSID, basic rate bitmap) +mac80211->driver: sta_state(AP, exists) + +alt no probe request data known +mac80211->driver: TX directed probe request +driver->mac80211: RX probe response +end + +mac80211->driver: TX auth frame +driver->mac80211: RX auth frame + +alt WEP shared key auth +mac80211->driver: TX auth frame +driver->mac80211: RX auth frame +end + +mac80211->driver: sta_state(AP, authenticated) +mac80211->userspace: RX auth frame + +end + +userspace->mac80211: associate +alt authenticated or associated +note over mac80211,driver: cleanup like for authenticate +end + +alt not previously authenticated (FT) +mac80211->driver: config(channel, non-HT) +mac80211->driver: bss_info_changed(set BSSID, basic rate bitmap) +mac80211->driver: sta_state(AP, exists) +mac80211->driver: sta_state(AP, authenticated) +end +mac80211->driver: TX assoc +driver->mac80211: RX assoc response +note over mac80211: init rate control +mac80211->driver: sta_state(AP, associated) + +alt not using WPA +mac80211->driver: sta_state(AP, authorized) +end + +mac80211->driver: set up QoS parameters + +alt is HT channel +mac80211->driver: config(channel, HT params) +end + +mac80211->driver: bss_info_changed(QoS, HT, associated with AID) +mac80211->userspace: associated + +note left of userspace: associated now + +alt using WPA +note over userspace +do 4-way-handshake +(data frames) +end note 
+userspace->mac80211: authorized +mac80211->driver: sta_state(AP, authorized) +end + +userspace->mac80211: deauthenticate/disassociate +mac80211->driver: stop BA sessions +mac80211->driver: TX deauth/disassoc +mac80211->driver: flush frames +mac80211->driver: sta_state(AP,associated) +mac80211->driver: sta_state(AP,authenticated) +mac80211->driver: sta_state(AP,exists) +mac80211->driver: sta_state(AP,not-exists) +mac80211->driver: turn off powersave +mac80211->driver: bss_info_changed(clear BSSID, not associated, no QoS, ...) +mac80211->driver: config(non-HT channel type) +mac80211->userspace: disconnected diff --git a/Documentation/networking/netdev-features.txt b/Documentation/networking/netdev-features.txt index 4b1c0dcef84c..4164f5c02e4b 100644 --- a/Documentation/networking/netdev-features.txt +++ b/Documentation/networking/netdev-features.txt @@ -152,3 +152,16 @@ NETIF_F_VLAN_CHALLENGED should be set for devices which can't cope with VLAN headers. Some drivers set this because the cards can't handle the bigger MTU. [FIXME: Those cases could be fixed in VLAN code by allowing only reduced-MTU VLANs. This may be not useful, though.] + +* rx-fcs + +This requests that the NIC append the Ethernet Frame Checksum (FCS) +to the end of the skb data. This allows sniffers and other tools to +read the CRC recorded by the NIC on receipt of the packet. + +* rx-all + +This requests that the NIC receive all possible frames, including errored +frames (such as bad FCS, etc). This can be helpful when sniffing a link with +bad packets on it. Some NICs may receive more packets if also put into normal +PROMISC mdoe. diff --git a/Documentation/networking/netdevices.txt b/Documentation/networking/netdevices.txt index 89358341682a..c7ecc7080494 100644 --- a/Documentation/networking/netdevices.txt +++ b/Documentation/networking/netdevices.txt @@ -47,26 +47,25 @@ packets is preferred. 
struct net_device synchronization rules ======================================= -dev->open: +ndo_open: Synchronization: rtnl_lock() semaphore. Context: process -dev->stop: +ndo_stop: Synchronization: rtnl_lock() semaphore. Context: process - Note1: netif_running() is guaranteed false - Note2: dev->poll() is guaranteed to be stopped + Note: netif_running() is guaranteed false -dev->do_ioctl: +ndo_do_ioctl: Synchronization: rtnl_lock() semaphore. Context: process -dev->get_stats: +ndo_get_stats: Synchronization: dev_base_lock rwlock. Context: nominally process, but don't sleep inside an rwlock -dev->hard_start_xmit: - Synchronization: netif_tx_lock spinlock. +ndo_start_xmit: + Synchronization: __netif_tx_lock spinlock. When the driver sets NETIF_F_LLTX in dev->features this will be called without holding netif_tx_lock. In this case the driver @@ -87,20 +86,20 @@ dev->hard_start_xmit: o NETDEV_TX_LOCKED Locking failed, please retry quickly. Only valid when NETIF_F_LLTX is set. -dev->tx_timeout: - Synchronization: netif_tx_lock spinlock. +ndo_tx_timeout: + Synchronization: netif_tx_lock spinlock; all TX queues frozen. Context: BHs disabled Notes: netif_queue_stopped() is guaranteed true -dev->set_rx_mode: - Synchronization: netif_tx_lock spinlock. +ndo_set_rx_mode: + Synchronization: netif_addr_lock spinlock. Context: BHs disabled struct napi_struct synchronization rules ======================================== napi->poll: Synchronization: NAPI_STATE_SCHED bit in napi->state. Device - driver's dev->close method will invoke napi_disable() on + driver's ndo_stop method will invoke napi_disable() on all NAPI instances which will do a sleeping poll on the NAPI_STATE_SCHED napi->state bit, waiting for all pending NAPI activity to cease. 
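Reviewer sketch (not part of the patch): the driver.txt and netdevices.txt hunks above say that ndo_start_xmit should normally return NETDEV_TX_OK, and that a driver avoids NETDEV_TX_BUSY by stopping its queue before the ring fills and waking it from TX reclamation. The userspace model below illustrates only that bookkeeping. Everything in it is an assumption made for illustration: the names sketch_dev, sketch_start_xmit, sketch_tx_reclaim, RING_SIZE and MAX_FRAGS are hypothetical, the enum values merely stand in for the kernel's netdev_tx_t codes, the thresholds are simplified, and no locking is modelled.

```c
#include <stdbool.h>

/* Stand-ins for NETDEV_TX_OK / NETDEV_TX_BUSY; values are illustrative. */
enum sketch_tx { SKETCH_TX_OK = 0, SKETCH_TX_BUSY = 1 };

#define RING_SIZE 4u   /* hypothetical number of TX descriptors */
#define MAX_FRAGS 2u   /* hypothetical worst-case descriptors per packet */

struct sketch_dev {
    unsigned int head;   /* total descriptors handed to the "hardware" */
    unsigned int tail;   /* total descriptors reclaimed so far */
    bool queue_stopped;  /* models netif_stop_queue()/netif_wake_queue() */
};

/* Free descriptors left in the ring. */
static unsigned int tx_buffs_avail(const struct sketch_dev *d)
{
    return RING_SIZE - (d->head - d->tail);
}

/* Models the transmit entry point for a packet needing nfrags descriptors. */
static enum sketch_tx sketch_start_xmit(struct sketch_dev *d, unsigned int nfrags)
{
    if (tx_buffs_avail(d) < nfrags) {
        /* Should not happen if the queue was stopped in time: hard error. */
        d->queue_stopped = true;
        return SKETCH_TX_BUSY;
    }

    d->head += nfrags;   /* "queue packet to card" */

    /* Stop early: the next worst-case packet might no longer fit. */
    if (tx_buffs_avail(d) < MAX_FRAGS)
        d->queue_stopped = true;

    return SKETCH_TX_OK;
}

/* Models TX reclamation: free completed descriptors, then wake the queue. */
static void sketch_tx_reclaim(struct sketch_dev *d, unsigned int done)
{
    unsigned int in_flight = d->head - d->tail;

    if (done > in_flight)
        done = in_flight;   /* never reclaim more than was queued */
    d->tail += done;

    if (d->queue_stopped && tx_buffs_avail(d) >= MAX_FRAGS)
        d->queue_stopped = false;
}
```

In this model the busy return is reachable only when the caller ignores queue_stopped, which mirrors why the documentation treats it as a hard error rather than as normal flow control.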
diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt index 9eb1ba52013d..95e5f5985a2a 100644 --- a/Documentation/networking/phy.txt +++ b/Documentation/networking/phy.txt @@ -62,7 +62,8 @@ The MDIO bus 5) The bus must also be declared somewhere as a device, and registered. As an example for how one driver implemented an mdio bus driver, see - drivers/net/gianfar_mii.c and arch/ppc/syslib/mpc85xx_devices.c + drivers/net/ethernet/freescale/fsl_pq_mdio.c and an associated DTS file + for one of the users. (e.g. "git grep fsl,.*-mdio arch/powerpc/boot/dts/") Connecting to a PHY diff --git a/Documentation/networking/ppp_generic.txt b/Documentation/networking/ppp_generic.txt index 15b5172fbb98..091d20273dcb 100644 --- a/Documentation/networking/ppp_generic.txt +++ b/Documentation/networking/ppp_generic.txt @@ -342,7 +342,7 @@ an interface unit are: numbers on received multilink fragments SC_MP_XSHORTSEQ transmit short multilink sequence nos. - The values of these flags are defined in <linux/if_ppp.h>. Note + The values of these flags are defined in <linux/ppp-ioctl.h>. Note that the values of the SC_MULTILINK, SC_MP_SHORTSEQ and SC_MP_XSHORTSEQ bits are ignored if the CONFIG_PPP_MULTILINK option is not selected. @@ -358,7 +358,7 @@ an interface unit are: * PPPIOCSCOMPRESS sets the parameters for packet compression or decompression. The argument should point to a ppp_option_data - structure (defined in <linux/if_ppp.h>), which contains a + structure (defined in <linux/ppp-ioctl.h>), which contains a pointer/length pair which should describe a block of memory containing a CCP option specifying a compression method and its parameters. The ppp_option_data struct also contains a `transmit' @@ -395,7 +395,7 @@ an interface unit are: * PPPIOCSNPMODE sets the network-protocol mode for a given network protocol. The argument should point to an npioctl struct (defined - in <linux/if_ppp.h>). 
The `protocol' field gives the PPP protocol + in <linux/ppp-ioctl.h>). The `protocol' field gives the PPP protocol number for the protocol to be affected, and the `mode' field specifies what to do with packets for that protocol: diff --git a/Documentation/networking/vortex.txt b/Documentation/networking/vortex.txt index bd70976b8160..b4038ffb3bc5 100644 --- a/Documentation/networking/vortex.txt +++ b/Documentation/networking/vortex.txt @@ -67,8 +67,8 @@ Module parameters ================= There are several parameters which may be provided to the driver when -its module is loaded. These are usually placed in /etc/modprobe.conf -(/etc/modules.conf in 2.4). Example: +its module is loaded. These are usually placed in /etc/modprobe.d/*.conf +configuretion files. Example: options 3c59x debug=3 rx_copybreak=300 @@ -425,7 +425,7 @@ steps you should take: 1) Increase the debug level. Usually this is done via: a) modprobe driver debug=7 - b) In /etc/modprobe.conf (or /etc/modules.conf for 2.4): + b) In /etc/modprobe.d/driver.conf: options driver debug=7 2) Recreate the problem with the higher debug level, diff --git a/Documentation/nmi_watchdog.txt b/Documentation/nmi_watchdog.txt deleted file mode 100644 index bf9f80a98282..000000000000 --- a/Documentation/nmi_watchdog.txt +++ /dev/null @@ -1,83 +0,0 @@ - -[NMI watchdog is available for x86 and x86-64 architectures] - -Is your system locking up unpredictably? No keyboard activity, just -a frustrating complete hard lockup? Do you want to help us debugging -such lockups? If all yes then this document is definitely for you. - -On many x86/x86-64 type hardware there is a feature that enables -us to generate 'watchdog NMI interrupts'. (NMI: Non Maskable Interrupt -which get executed even if the system is otherwise locked up hard). -This can be used to debug hard kernel lockups. By executing periodic -NMI interrupts, the kernel can monitor whether any CPU has locked up, -and print out debugging messages if so. 
- -In order to use the NMI watchdog, you need to have APIC support in your -kernel. For SMP kernels, APIC support gets compiled in automatically. For -UP, enable either CONFIG_X86_UP_APIC (Processor type and features -> Local -APIC support on uniprocessors) or CONFIG_X86_UP_IOAPIC (Processor type and -features -> IO-APIC support on uniprocessors) in your kernel config. -CONFIG_X86_UP_APIC is for uniprocessor machines without an IO-APIC. -CONFIG_X86_UP_IOAPIC is for uniprocessor with an IO-APIC. [Note: certain -kernel debugging options, such as Kernel Stack Meter or Kernel Tracer, -may implicitly disable the NMI watchdog.] - -For x86-64, the needed APIC is always compiled in. - -Using local APIC (nmi_watchdog=2) needs the first performance register, so -you can't use it for other purposes (such as high precision performance -profiling.) However, at least oprofile and the perfctr driver disable the -local APIC NMI watchdog automatically. - -To actually enable the NMI watchdog, use the 'nmi_watchdog=N' boot -parameter. Eg. the relevant lilo.conf entry: - - append="nmi_watchdog=1" - -For SMP machines and UP machines with an IO-APIC use nmi_watchdog=1. -For UP machines without an IO-APIC use nmi_watchdog=2, this only works -for some processor types. If in doubt, boot with nmi_watchdog=1 and -check the NMI count in /proc/interrupts; if the count is zero then -reboot with nmi_watchdog=2 and check the NMI count. If it is still -zero then log a problem, you probably have a processor that needs to be -added to the nmi code. - -A 'lockup' is the following scenario: if any CPU in the system does not -execute the period local timer interrupt for more than 5 seconds, then -the NMI handler generates an oops and kills the process. This -'controlled crash' (and the resulting kernel messages) can be used to -debug the lockup. Thus whenever the lockup happens, wait 5 seconds and -the oops will show up automatically. 
If the kernel produces no messages -then the system has crashed so hard (eg. hardware-wise) that either it -cannot even accept NMI interrupts, or the crash has made the kernel -unable to print messages. - -Be aware that when using local APIC, the frequency of NMI interrupts -it generates, depends on the system load. The local APIC NMI watchdog, -lacking a better source, uses the "cycles unhalted" event. As you may -guess it doesn't tick when the CPU is in the halted state (which happens -when the system is idle), but if your system locks up on anything but the -"hlt" processor instruction, the watchdog will trigger very soon as the -"cycles unhalted" event will happen every clock tick. If it locks up on -"hlt", then you are out of luck -- the event will not happen at all and the -watchdog won't trigger. This is a shortcoming of the local APIC watchdog --- unfortunately there is no "clock ticks" event that would work all the -time. The I/O APIC watchdog is driven externally and has no such shortcoming. -But its NMI frequency is much higher, resulting in a more significant hit -to the overall system performance. - -On x86 nmi_watchdog is disabled by default so you have to enable it with -a boot time parameter. - -It's possible to disable the NMI watchdog in run-time by writing "0" to -/proc/sys/kernel/nmi_watchdog. Writing "1" to the same file will re-enable -the NMI watchdog. Notice that you still need to use "nmi_watchdog=" parameter -at boot time. - -NOTE: In kernels prior to 2.4.2-ac18 the NMI-oopser is enabled unconditionally -on x86 SMP boxes. - -[ feel free to send bug reports, suggestions and patches to - Ingo Molnar <mingo@redhat.com> or the Linux SMP mailing - list at <linux-smp@vger.kernel.org> ] - diff --git a/Documentation/numastat.txt b/Documentation/numastat.txt index 9fcc9a608dc0..520327790d54 100644 --- a/Documentation/numastat.txt +++ b/Documentation/numastat.txt @@ -5,18 +5,23 @@ Numa policy hit/miss statistics All units are pages. 
Hugepages have separate counters. -numa_hit A process wanted to allocate memory from this node, - and succeeded. -numa_miss A process wanted to allocate memory from another node, - but ended up with memory from this node. -numa_foreign A process wanted to allocate on this node, - but ended up with memory from another one. -local_node A process ran on this node and got memory from it. -other_node A process ran on this node and got memory from another node. -interleave_hit Interleaving wanted to allocate from this node - and succeeded. +numa_hit A process wanted to allocate memory from this node, + and succeeded. + +numa_miss A process wanted to allocate memory from another node, + but ended up with memory from this node. + +numa_foreign A process wanted to allocate on this node, + but ended up with memory from another one. + +local_node A process ran on this node and got memory from it. + +other_node A process ran on this node and got memory from another node. + +interleave_hit Interleaving wanted to allocate from this node + and succeeded. For easier reading you can use the numastat utility from the numactl package -(ftp://ftp.suse.com/pub/people/ak/numa/numactl*). Note that it only works +(http://oss.sgi.com/projects/libnuma/). Note that it only works well right now on machines with a small number of CPUs. diff --git a/Documentation/parport.txt b/Documentation/parport.txt index 93a7ceef398d..c208e4366c03 100644 --- a/Documentation/parport.txt +++ b/Documentation/parport.txt @@ -36,18 +36,17 @@ addresses should not be specified for supported PCI cards since they are automatically detected. -KMod ----- +modprobe +-------- -If you use kmod, you will find it useful to edit /etc/modprobe.conf. -Here is an example of the lines that need to be added: +If you use modprobe , you will find it useful to add lines as below to a +configuration file in /etc/modprobe.d/ directory:. 
alias parport_lowlevel parport_pc options parport_pc io=0x378,0x278 irq=7,auto -KMod will then automatically load parport_pc (with the options -"io=0x378,0x278 irq=7,auto") whenever a parallel port device driver -(such as lp) is loaded. +modprobe will load parport_pc (with the options "io=0x378,0x278 irq=7,auto") +whenever a parallel port device driver (such as lp) is loaded. Note that these are example lines only! You shouldn't in general need to specify any options to parport_pc in order to be able to use a diff --git a/Documentation/pinctrl.txt b/Documentation/pinctrl.txt index 150fd3833d0b..d97bccf46147 100644 --- a/Documentation/pinctrl.txt +++ b/Documentation/pinctrl.txt @@ -206,12 +206,21 @@ using a certain resistor value - pull up and pull down - so that the pin has a stable value when nothing is driving the rail it is connected to, or when it's unconnected. -For example, a platform may do this: +Pin configuration can be programmed either using the explicit APIs described +immediately below, or by adding configuration entries into the mapping table; +see section "Board/machine configuration" below. + +For example, a platform may do the following to pull up a pin to VDD: + +#include <linux/pinctrl/consumer.h> ret = pin_config_set("foo-dev", "FOO_GPIO_PIN", PLATFORM_X_PULL_UP); -To pull up a pin to VDD. The pin configuration driver implements callbacks for -changing pin configuration in the pin controller ops like this: +The format and meaning of the configuration parameter, PLATFORM_X_PULL_UP +above, is entirely defined by the pin controller driver. + +The pin configuration driver implements callbacks for changing pin +configuration in the pin controller ops like this: #include <linux/pinctrl/pinctrl.h> #include <linux/pinctrl/pinconf.h> @@ -492,14 +501,10 @@ Definitions: {"map-i2c0", i2c0, pinctrl0, fi2c0, gi2c0} } - Every map must be assigned a symbolic name, pin controller and function. 
- The group is not compulsory - if it is omitted the first group presented by - the driver as applicable for the function will be selected, which is - useful for simple cases. - - The device name is present in map entries tied to specific devices. Maps - without device names are referred to as SYSTEM pinmuxes, such as can be taken - by the machine implementation on boot and not tied to any specific device. + Every map must be assigned a state name, pin controller, device and + function. The group is not compulsory - if it is omitted the first group + presented by the driver as applicable for the function will be selected, + which is useful for simple cases. It is possible to map several groups to the same combination of device, pin controller and function. This is for cases where a certain function on @@ -726,19 +731,19 @@ same time. All the above functions are mandatory to implement for a pinmux driver. -Pinmux interaction with the GPIO subsystem -========================================== +Pin control interaction with the GPIO subsystem +=============================================== -The public pinmux API contains two functions named pinmux_request_gpio() -and pinmux_free_gpio(). These two functions shall *ONLY* be called from +The public pinmux API contains two functions named pinctrl_request_gpio() +and pinctrl_free_gpio(). These two functions shall *ONLY* be called from gpiolib-based drivers as part of their gpio_request() and -gpio_free() semantics. Likewise the pinmux_gpio_direction_[input|output] +gpio_free() semantics. Likewise the pinctrl_gpio_direction_[input|output] shall only be called from within respective gpio_direction_[input|output] gpiolib implementation. NOTE that platforms and individual drivers shall *NOT* request GPIO pins to be -muxed in. Instead, implement a proper gpiolib driver and have that driver -request proper muxing for its pins. +controlled e.g. muxed in. 
Instead, implement a proper gpiolib driver and have +that driver request proper muxing and other control for its pins. The function list could become long, especially if you can convert every individual pin into a GPIO pin independent of any other pins, and then try @@ -747,7 +752,7 @@ the approach to define every pin as a function. In this case, the function array would become 64 entries for each GPIO setting and then the device functions. -For this reason there are two functions a pinmux driver can implement +For this reason there are two functions a pin control driver can implement to enable only GPIO on an individual pin: .gpio_request_enable() and .gpio_disable_free(). @@ -762,12 +767,12 @@ gpiolib driver and the affected GPIO range, pin offset and desired direction will be passed along to this function. Alternatively to using these special functions, it is fully allowed to use -named functions for each GPIO pin, the pinmux_request_gpio() will attempt to +named functions for each GPIO pin, the pinctrl_request_gpio() will attempt to obtain the function "gpioN" where "N" is the global GPIO pin number if no special GPIO-handler is registered. -Pinmux board/machine configuration +Board/machine configuration ================================== Boards and machines define how a certain complete running system is put @@ -775,27 +780,33 @@ together, including how GPIOs and devices are muxed, how regulators are constrained and how the clock tree looks. Of course pinmux settings are also part of this. 
-A pinmux config for a machine looks pretty much like a simple regulator -configuration, so for the example array above we want to enable i2c and -spi on the second function mapping: +A pin controller configuration for a machine looks pretty much like a simple +regulator configuration, so for the example array above we want to enable i2c +and spi on the second function mapping: #include <linux/pinctrl/machine.h> -static const struct pinmux_map __initdata pmx_mapping[] = { +static const struct pinctrl_map __initdata mapping[] = { { - .ctrl_dev_name = "pinctrl-foo", - .function = "spi0", .dev_name = "foo-spi.0", + .name = PINCTRL_STATE_DEFAULT, + .type = PIN_MAP_TYPE_MUX_GROUP, + .ctrl_dev_name = "pinctrl-foo", + .data.mux.function = "spi0", }, { - .ctrl_dev_name = "pinctrl-foo", - .function = "i2c0", .dev_name = "foo-i2c.0", + .name = PINCTRL_STATE_DEFAULT, + .type = PIN_MAP_TYPE_MUX_GROUP, + .ctrl_dev_name = "pinctrl-foo", + .data.mux.function = "i2c0", }, { - .ctrl_dev_name = "pinctrl-foo", - .function = "mmc0", .dev_name = "foo-mmc.0", + .name = PINCTRL_STATE_DEFAULT, + .type = PIN_MAP_TYPE_MUX_GROUP, + .ctrl_dev_name = "pinctrl-foo", + .data.mux.function = "mmc0", }, }; @@ -805,21 +816,51 @@ must match a function provided by the pinmux driver handling this pin range. As you can see we may have several pin controllers on the system and thus we need to specify which one of them that contain the functions we wish -to map. The map can also use struct device * directly, so there is no -inherent need to use strings to specify .dev_name or .ctrl_dev_name, these -are for the situation where you do not have a handle to the struct device *, -for example if they are not yet instantiated or cumbersome to obtain. +to map. 
You register this pinmux mapping to the pinmux subsystem by simply: - ret = pinmux_register_mappings(pmx_mapping, ARRAY_SIZE(pmx_mapping)); + ret = pinctrl_register_mappings(mapping, ARRAY_SIZE(mapping)); Since the above construct is pretty common there is a helper macro to make it even more compact which assumes you want to use pinctrl-foo and position 0 for mapping, for example: -static struct pinmux_map __initdata pmx_mapping[] = { - PINMUX_MAP("I2CMAP", "pinctrl-foo", "i2c0", "foo-i2c.0"), +static struct pinctrl_map __initdata mapping[] = { + PIN_MAP_MUX_GROUP("foo-i2c.o", PINCTRL_STATE_DEFAULT, "pinctrl-foo", NULL, "i2c0"), +}; + +The mapping table may also contain pin configuration entries. It's common for +each pin/group to have a number of configuration entries that affect it, so +the table entries for configuration reference an array of config parameters +and values. An example using the convenience macros is shown below: + +static unsigned long i2c_grp_configs[] = { + FOO_PIN_DRIVEN, + FOO_PIN_PULLUP, +}; + +static unsigned long i2c_pin_configs[] = { + FOO_OPEN_COLLECTOR, + FOO_SLEW_RATE_SLOW, +}; + +static struct pinctrl_map __initdata mapping[] = { + PIN_MAP_MUX_GROUP("foo-i2c.0", PINCTRL_STATE_DEFAULT, "pinctrl-foo", "i2c0", "i2c0"), + PIN_MAP_MUX_CONFIGS_GROUP("foo-i2c.0", PINCTRL_STATE_DEFAULT, "pinctrl-foo", "i2c0", i2c_grp_configs), + PIN_MAP_MUX_CONFIGS_PIN("foo-i2c.0", PINCTRL_STATE_DEFAULT, "pinctrl-foo", "i2c0scl", i2c_pin_configs), + PIN_MAP_MUX_CONFIGS_PIN("foo-i2c.0", PINCTRL_STATE_DEFAULT, "pinctrl-foo", "i2c0sda", i2c_pin_configs), +}; + +Finally, some devices expect the mapping table to contain certain specific +named states. When running on hardware that doesn't need any pin controller +configuration, the mapping table must still contain those named states, in +order to explicitly indicate that the states were provided and intended to +be empty. 
Table entry macro PIN_MAP_DUMMY_STATE serves the purpose of defining +a named state without causing any pin controller to be programmed: + +static struct pinctrl_map __initdata mapping[] = { + PIN_MAP_DUMMY_STATE("foo-i2c.0", PINCTRL_STATE_DEFAULT), }; @@ -831,81 +872,96 @@ As it is possible to map a function to different groups of pins an optional ... { + .dev_name = "foo-spi.0", .name = "spi0-pos-A", + .type = PIN_MAP_TYPE_MUX_GROUP, .ctrl_dev_name = "pinctrl-foo", .function = "spi0", .group = "spi0_0_grp", - .dev_name = "foo-spi.0", }, { + .dev_name = "foo-spi.0", .name = "spi0-pos-B", + .type = PIN_MAP_TYPE_MUX_GROUP, .ctrl_dev_name = "pinctrl-foo", .function = "spi0", .group = "spi0_1_grp", - .dev_name = "foo-spi.0", }, ... This example mapping is used to switch between two positions for spi0 at runtime, as described further below under the heading "Runtime pinmuxing". -Further it is possible to match several groups of pins to the same function -for a single device, say for example in the mmc0 example above, where you can +Further it is possible for one named state to affect the muxing of several +groups of pins, say for example in the mmc0 example above, where you can additively expand the mmc0 bus from 2 to 4 to 8 pins. If we want to use all three groups for a total of 2+2+4 = 8 pins (for an 8-bit MMC bus as is the case), we define a mapping like this: ... 
{ + .dev_name = "foo-mmc.0", .name = "2bit" + .type = PIN_MAP_TYPE_MUX_GROUP, .ctrl_dev_name = "pinctrl-foo", .function = "mmc0", .group = "mmc0_1_grp", - .dev_name = "foo-mmc.0", }, { + .dev_name = "foo-mmc.0", .name = "4bit" + .type = PIN_MAP_TYPE_MUX_GROUP, .ctrl_dev_name = "pinctrl-foo", .function = "mmc0", .group = "mmc0_1_grp", - .dev_name = "foo-mmc.0", }, { + .dev_name = "foo-mmc.0", .name = "4bit" + .type = PIN_MAP_TYPE_MUX_GROUP, .ctrl_dev_name = "pinctrl-foo", .function = "mmc0", .group = "mmc0_2_grp", - .dev_name = "foo-mmc.0", }, { + .dev_name = "foo-mmc.0", .name = "8bit" + .type = PIN_MAP_TYPE_MUX_GROUP, .ctrl_dev_name = "pinctrl-foo", + .function = "mmc0", .group = "mmc0_1_grp", - .dev_name = "foo-mmc.0", }, { + .dev_name = "foo-mmc.0", .name = "8bit" + .type = PIN_MAP_TYPE_MUX_GROUP, .ctrl_dev_name = "pinctrl-foo", .function = "mmc0", .group = "mmc0_2_grp", - .dev_name = "foo-mmc.0", }, { + .dev_name = "foo-mmc.0", .name = "8bit" + .type = PIN_MAP_TYPE_MUX_GROUP, .ctrl_dev_name = "pinctrl-foo", .function = "mmc0", .group = "mmc0_3_grp", - .dev_name = "foo-mmc.0", }, ... The result of grabbing this mapping from the device with something like this (see next paragraph): - pmx = pinmux_get(&device, "8bit"); + p = pinctrl_get(dev); + s = pinctrl_lookup_state(p, "8bit"); + ret = pinctrl_select_state(p, s); + +or more simply: + + p = pinctrl_get_select(dev, "8bit"); Will be that you activate all the three bottom records in the mapping at -once. Since they share the same name, pin controller device, funcion and +once. Since they share the same name, pin controller device, function and device, and since we allow multiple groups to match to a single device, they all get selected, and they all get enabled and disable simultaneously by the pinmux core. @@ -914,97 +970,111 @@ pinmux core. Pinmux requests from drivers ============================ -Generally it is discouraged to let individual drivers get and enable pinmuxes. 
-So if possible, handle the pinmuxes in platform code or some other place where -you have access to all the affected struct device * pointers. In some cases -where a driver needs to switch between different mux mappings at runtime -this is not possible. +Generally it is discouraged to let individual drivers get and enable pin +control. So if possible, handle the pin control in platform code or some other +place where you have access to all the affected struct device * pointers. In +some cases where a driver needs to e.g. switch between different mux mappings +at runtime this is not possible. -A driver may request a certain mux to be activated, usually just the default -mux like this: +A driver may request a certain control state to be activated, usually just the +default state like this: -#include <linux/pinctrl/pinmux.h> +#include <linux/pinctrl/consumer.h> struct foo_state { - struct pinmux *pmx; + struct pinctrl *p; + struct pinctrl_state *s; ... }; foo_probe() { - /* Allocate a state holder named "state" etc */ - struct pinmux pmx; + /* Allocate a state holder named "foo" etc */ + struct foo_state *foo = ...; + + foo->p = pinctrl_get(&device); + if (IS_ERR(foo->p)) { + /* FIXME: clean up "foo" here */ + return PTR_ERR(foo->p); + } - pmx = pinmux_get(&device, NULL); - if IS_ERR(pmx) - return PTR_ERR(pmx); - pinmux_enable(pmx); + foo->s = pinctrl_lookup_state(foo->p, PINCTRL_STATE_DEFAULT); + if (IS_ERR(foo->s)) { + pinctrl_put(foo->p); + /* FIXME: clean up "foo" here */ + return PTR_ERR(foo->s); + } - state->pmx = pmx; + ret = pinctrl_select_state(foo->p, foo->s); + if (ret < 0) { + pinctrl_put(foo->p); + /* FIXME: clean up "foo" here */ + return ret; + } } foo_remove() { - pinmux_disable(state->pmx); - pinmux_put(state->pmx); + pinctrl_put(state->p); } -If you want to grab a specific mux mapping and not just the first one found for -this device you can specify a specific mapping name, for example in the above -example the second i2c0 setting: pinmux_get(&device,
"spi0-pos-B"); - -This get/enable/disable/put sequence can just as well be handled by bus drivers +This get/lookup/select/put sequence can just as well be handled by bus drivers if you don't want each and every driver to handle it and you know the arrangement on your bus. -The semantics of the get/enable respective disable/put is as follows: +The semantics of the pinctrl APIs are: + +- pinctrl_get() is called in process context to obtain a handle to all pinctrl + information for a given client device. It will allocate a struct from the + kernel memory to hold the pinmux state. All mapping table parsing or similar + slow operations take place within this API. -- pinmux_get() is called in process context to reserve the pins affected with - a certain mapping and set up the pinmux core and the driver. It will allocate - a struct from the kernel memory to hold the pinmux state. +- pinctrl_lookup_state() is called in process context to obtain a handle to a + specific state for a the client device. This operation may be slow too. -- pinmux_enable()/pinmux_disable() is quick and can be called from fastpath - (irq context) when you quickly want to set up/tear down the hardware muxing - when running a device driver. Usually it will just poke some values into a - register. +- pinctrl_select_state() programs pin controller hardware according to the + definition of the state as given by the mapping table. In theory this is a + fast-path operation, since it only involved blasting some register settings + into hardware. However, note that some pin controllers may have their + registers on a slow/IRQ-based bus, so client devices should not assume they + can call pinctrl_select_state() from non-blocking contexts. -- pinmux_disable() is called in process context to tear down the pin requests - and release the state holder struct for the mux setting. +- pinctrl_put() frees all information associated with a pinctrl handle. 
-Usually the pinmux core handled the get/put pair and call out to the device -drivers bookkeeping operations, like checking available functions and the -associated pins, whereas the enable/disable pass on to the pin controller +Usually the pin control core handled the get/put pair and call out to the +device drivers bookkeeping operations, like checking available functions and +the associated pins, whereas the enable/disable pass on to the pin controller driver which takes care of activating and/or deactivating the mux setting by quickly poking some registers. -The pins are allocated for your device when you issue the pinmux_get() call, +The pins are allocated for your device when you issue the pinctrl_get() call, after this you should be able to see this in the debugfs listing of all pins. -System pinmux hogging -===================== +System pin control hogging +========================== -A system pinmux map entry, i.e. a pinmux setting that does not have a device -associated with it, can be hogged by the core when the pin controller is -registered. This means that the core will attempt to call pinmux_get() and -pinmux_enable() on it immediately after the pin control device has been -registered. +Pin control map entries can be hogged by the core when the pin controller +is registered. This means that the core will attempt to call pinctrl_get(), +lookup_state() and select_state() on it immediately after the pin control +device has been registered. -This is enabled by simply setting the .hog_on_boot field in the map to true, -like this: +This occurs for mapping table entries where the client device name is equal +to the pin controller device name, and the state name is PINCTRL_STATE_DEFAULT. 
{ - .name = "POWERMAP" + .dev_name = "pinctrl-foo", + .name = PINCTRL_STATE_DEFAULT, + .type = PIN_MAP_TYPE_MUX_GROUP, .ctrl_dev_name = "pinctrl-foo", .function = "power_func", - .hog_on_boot = true, }, Since it may be common to request the core to hog a few always-applicable mux settings on the primary pin controller, there is a convenience macro for this: -PINMUX_MAP_PRIMARY_SYS_HOG("POWERMAP", "power_func") +PIN_MAP_MUX_GROUP_HOG_DEFAULT("pinctrl-foo", NULL /* group */, "power_func") This gives the exact same result as the above construction. @@ -1016,32 +1086,47 @@ It is possible to mux a certain function in and out at runtime, say to move an SPI port from one set of pins to another set of pins. Say for example for spi0 in the example above, we expose two different groups of pins for the same function, but with different named in the mapping as described under -"Advanced mapping" above. So we have two mappings named "spi0-pos-A" and -"spi0-pos-B". +"Advanced mapping" above. So that for an SPI device, we have two states named +"pos-A" and "pos-B". This snippet first muxes the function in the pins defined by group A, enables it, disables and releases it, and muxes it in on the pins defined by group B: +#include <linux/pinctrl/consumer.h> + foo_switch() { - struct pinmux *pmx; + struct pinctrl *p; + struct pinctrl_state *s1, *s2; + + /* Setup */ + p = pinctrl_get(&device); + if (IS_ERR(p)) + ... + + s1 = pinctrl_lookup_state(p, "pos-A"); + if (IS_ERR(s1)) + ... + + s2 = pinctrl_lookup_state(p, "pos-B"); + if (IS_ERR(s2)) + ... /* Enable on position A */ - pmx = pinmux_get(&device, "spi0-pos-A"); - if IS_ERR(pmx) - return PTR_ERR(pmx); - pinmux_enable(pmx); + ret = pinctrl_select_state(p, s1); + if (ret < 0) + ... - /* This releases the pins again */ - pinmux_disable(pmx); - pinmux_put(pmx); + ...
/* Enable on position B */ - pmx = pinmux_get(&device, "spi0-pos-B"); - if IS_ERR(pmx) - return PTR_ERR(pmx); - pinmux_enable(pmx); + ret = pinctrl_select_state(p, s2); + if (ret < 0) + ... + ... + + pinctrl_put(p); } The above has to be done from process context. diff --git a/Documentation/power/devices.txt b/Documentation/power/devices.txt index 20af7def23c8..872815cd41d3 100644 --- a/Documentation/power/devices.txt +++ b/Documentation/power/devices.txt @@ -96,6 +96,12 @@ struct dev_pm_ops { int (*thaw)(struct device *dev); int (*poweroff)(struct device *dev); int (*restore)(struct device *dev); + int (*suspend_late)(struct device *dev); + int (*resume_early)(struct device *dev); + int (*freeze_late)(struct device *dev); + int (*thaw_early)(struct device *dev); + int (*poweroff_late)(struct device *dev); + int (*restore_early)(struct device *dev); int (*suspend_noirq)(struct device *dev); int (*resume_noirq)(struct device *dev); int (*freeze_noirq)(struct device *dev); @@ -305,7 +311,7 @@ Entering System Suspend ----------------------- When the system goes into the standby or memory sleep state, the phases are: - prepare, suspend, suspend_noirq. + prepare, suspend, suspend_late, suspend_noirq. 1. The prepare phase is meant to prevent races by preventing new devices from being registered; the PM core would never know that all the @@ -324,7 +330,12 @@ When the system goes into the standby or memory sleep state, the phases are: appropriate low-power state, depending on the bus type the device is on, and they may enable wakeup events. - 3. The suspend_noirq phase occurs after IRQ handlers have been disabled, + 3. For a number of devices it is convenient to split suspend into the + "quiesce device" and "save device state" phases, in which cases + suspend_late is meant to do the latter. It is always executed after + runtime power management has been disabled for all devices. + + 4.
The suspend_noirq phase occurs after IRQ handlers have been disabled, which means that the driver's interrupt handler will not be called while the callback method is running. The methods should save the values of the device's registers that weren't saved previously and finally put the @@ -359,7 +370,7 @@ Leaving System Suspend ---------------------- When resuming from standby or memory sleep, the phases are: - resume_noirq, resume, complete. + resume_noirq, resume_early, resume, complete. 1. The resume_noirq callback methods should perform any actions needed before the driver's interrupt handlers are invoked. This generally @@ -375,14 +386,18 @@ When resuming from standby or memory sleep, the phases are: device driver's ->pm.resume_noirq() method to perform device-specific actions. - 2. The resume methods should bring the the device back to its operating + 2. The resume_early methods should prepare devices for the execution of + the resume methods. This generally involves undoing the actions of the + preceding suspend_late phase. + + 3. The resume methods should bring the device back to its operating state, so that it can perform normal I/O. This generally involves undoing the actions of the suspend phase. - 3. The complete phase uses only a bus callback. The method should undo the - actions of the prepare phase. Note, however, that new children may be - registered below the device as soon as the resume callbacks occur; it's - not necessary to wait until the complete phase. + 4. The complete phase should undo the actions of the prepare phase. Note, + however, that new children may be registered below the device as soon as + the resume callbacks occur; it's not necessary to wait until the + complete phase.
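The phase ordering described above for system suspend and resume can be sketched in user space as follows. This is only an illustrative model, not kernel code: struct toy_log, struct toy_phase_pair and the toy_* functions are hypothetical names invented for this sketch, and the pairing of suspend_noirq with resume_noirq is assumed from the naming. It shows that the resume phases run as the mirror image of the suspend phases, so each resume-side phase follows the one that was entered after its counterpart.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical user-space model; this is NOT the kernel's struct dev_pm_ops. */
#define TOY_MAX_LOG 16

struct toy_log {
	const char *phase[TOY_MAX_LOG];	/* phase names in the order they ran */
	int count;
};

/* Each suspend-side phase and the resume-side phase that undoes it. */
struct toy_phase_pair {
	const char *down;
	const char *up;
};

static const struct toy_phase_pair toy_pairs[] = {
	{ "prepare",       "complete"     },
	{ "suspend",       "resume"       },
	{ "suspend_late",  "resume_early" },
	{ "suspend_noirq", "resume_noirq" },
};

#define TOY_NPAIRS ((int)(sizeof(toy_pairs) / sizeof(toy_pairs[0])))

static void toy_record(struct toy_log *log, const char *phase)
{
	assert(log->count < TOY_MAX_LOG);
	log->phase[log->count++] = phase;
}

/*
 * Run one suspend/resume cycle: the suspend-side phases in table order,
 * then the resume-side phases in reverse table order.
 * Returns the number of phases recorded.
 */
static int toy_suspend_resume_cycle(struct toy_log *log)
{
	int i;

	log->count = 0;
	for (i = 0; i < TOY_NPAIRS; i++)
		toy_record(log, toy_pairs[i].down);
	for (i = TOY_NPAIRS - 1; i >= 0; i--)
		toy_record(log, toy_pairs[i].up);
	return log->count;
}

/* Print the recorded phases, one per line. */
static void toy_print(const struct toy_log *log)
{
	int i;

	for (i = 0; i < log->count; i++)
		printf("%s\n", log->phase[i]);
}
```

Calling toy_suspend_resume_cycle() on a struct toy_log and then toy_print() lists the suspend phases followed by the resume phases in the order given in the text. Walking the table backwards, rather than keeping a second hand-written list, keeps the two orderings from drifting apart in this toy model.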
At the end of these phases, drivers should be as functional as they were before suspending: I/O can be performed using DMA and IRQs, and the relevant clocks are @@ -429,8 +444,8 @@ an image of the system memory while everything is stable, reactivate all devices (thaw), write the image to permanent storage, and finally shut down the system (poweroff). The phases used to accomplish this are: - prepare, freeze, freeze_noirq, thaw_noirq, thaw, complete, - prepare, poweroff, poweroff_noirq + prepare, freeze, freeze_late, freeze_noirq, thaw_noirq, thaw_early, + thaw, complete, prepare, poweroff, poweroff_late, poweroff_noirq 1. The prepare phase is discussed in the "Entering System Suspend" section above. @@ -441,7 +456,11 @@ system (poweroff). The phases used to accomplish this are: save time it's best not to do so. Also, the device should not be prepared to generate wakeup events. - 3. The freeze_noirq phase is analogous to the suspend_noirq phase discussed + 3. The freeze_late phase is analogous to the suspend_late phase described + above, except that the device should not be put in a low-power state and + should not be allowed to generate wakeup events by it. + + 4. The freeze_noirq phase is analogous to the suspend_noirq phase discussed above, except again that the device should not be put in a low-power state and should not be allowed to generate wakeup events. @@ -449,15 +468,19 @@ At this point the system image is created. All devices should be inactive and the contents of memory should remain undisturbed while this happens, so that the image forms an atomic snapshot of the system state. - 4. The thaw_noirq phase is analogous to the resume_noirq phase discussed + 5. The thaw_noirq phase is analogous to the resume_noirq phase discussed above. The main difference is that its methods can assume the device is in the same state as at the end of the freeze_noirq phase. - 5. The thaw phase is analogous to the resume phase discussed above. Its + 6. 
The thaw_early phase is analogous to the resume_early phase described + above. Its methods should undo the actions of the preceding + freeze_late, if necessary. + + 7. The thaw phase is analogous to the resume phase discussed above. Its methods should bring the device back to an operating state, so that it can be used for saving the image if necessary. - 6. The complete phase is discussed in the "Leaving System Suspend" section + 8. The complete phase is discussed in the "Leaving System Suspend" section above. At this point the system image is saved, and the devices then need to be @@ -465,16 +488,19 @@ prepared for the upcoming system shutdown. This is much like suspending them before putting the system into the standby or memory sleep state, and the phases are similar. - 7. The prepare phase is discussed above. + 9. The prepare phase is discussed above. + + 10. The poweroff phase is analogous to the suspend phase. - 8. The poweroff phase is analogous to the suspend phase. + 11. The poweroff_late phase is analogous to the suspend_late phase. - 9. The poweroff_noirq phase is analogous to the suspend_noirq phase. + 12. The poweroff_noirq phase is analogous to the suspend_noirq phase. -The poweroff and poweroff_noirq callbacks should do essentially the same things -as the suspend and suspend_noirq callbacks. The only notable difference is that -they need not store the device register values, because the registers should -already have been stored during the freeze or freeze_noirq phases. +The poweroff, poweroff_late and poweroff_noirq callbacks should do essentially +the same things as the suspend, suspend_late and suspend_noirq callbacks, +respectively. The only notable difference is that they need not store the +device register values, because the registers should already have been stored +during the freeze, freeze_late or freeze_noirq phases. 
Leaving Hibernation @@ -518,22 +544,25 @@ To achieve this, the image kernel must restore the devices' pre-hibernation functionality. The operation is much like waking up from the memory sleep state, although it involves different phases: - restore_noirq, restore, complete + restore_noirq, restore_early, restore, complete 1. The restore_noirq phase is analogous to the resume_noirq phase. - 2. The restore phase is analogous to the resume phase. + 2. The restore_early phase is analogous to the resume_early phase. + + 3. The restore phase is analogous to the resume phase. - 3. The complete phase is discussed above. + 4. The complete phase is discussed above. -The main difference from resume[_noirq] is that restore[_noirq] must assume the -device has been accessed and reconfigured by the boot loader or the boot kernel. -Consequently the state of the device may be different from the state remembered -from the freeze and freeze_noirq phases. The device may even need to be reset -and completely re-initialized. In many cases this difference doesn't matter, so -the resume[_noirq] and restore[_norq] method pointers can be set to the same -routines. Nevertheless, different callback pointers are used in case there is a -situation where it actually matters. +The main difference from resume[_early|_noirq] is that restore[_early|_noirq] +must assume the device has been accessed and reconfigured by the boot loader or +the boot kernel. Consequently the state of the device may be different from the +state remembered from the freeze, freeze_late and freeze_noirq phases. The +device may even need to be reset and completely re-initialized. In many cases +this difference doesn't matter, so the resume[_early|_noirq] and +restore[_early|_noirq] method pointers can be set to the same routines. +Nevertheless, different callback pointers are used in case there is a situation +where it actually does matter.
Device Power Management Domains diff --git a/Documentation/power/freezing-of-tasks.txt b/Documentation/power/freezing-of-tasks.txt index ebd7490ef1df..6ec291ea1c78 100644 --- a/Documentation/power/freezing-of-tasks.txt +++ b/Documentation/power/freezing-of-tasks.txt @@ -9,7 +9,7 @@ architectures). II. How does it work? -There are four per-task flags used for that, PF_NOFREEZE, PF_FROZEN, TIF_FREEZE +There are three per-task flags used for that, PF_NOFREEZE, PF_FROZEN and PF_FREEZER_SKIP (the last one is auxiliary). The tasks that have PF_NOFREEZE unset (all user space processes and some kernel threads) are regarded as 'freezable' and treated in a special way before the system enters a @@ -17,30 +17,31 @@ suspend state as well as before a hibernation image is created (in what follows we only consider hibernation, but the description also applies to suspend). Namely, as the first step of the hibernation procedure the function -freeze_processes() (defined in kernel/power/process.c) is called. It executes -try_to_freeze_tasks() that sets TIF_FREEZE for all of the freezable tasks and -either wakes them up, if they are kernel threads, or sends fake signals to them, -if they are user space processes. A task that has TIF_FREEZE set, should react -to it by calling the function called __refrigerator() (defined in -kernel/freezer.c), which sets the task's PF_FROZEN flag, changes its state -to TASK_UNINTERRUPTIBLE and makes it loop until PF_FROZEN is cleared for it. -Then, we say that the task is 'frozen' and therefore the set of functions -handling this mechanism is referred to as 'the freezer' (these functions are -defined in kernel/power/process.c, kernel/freezer.c & include/linux/freezer.h). -User space processes are generally frozen before kernel threads. +freeze_processes() (defined in kernel/power/process.c) is called. A system-wide +variable system_freezing_cnt (as opposed to a per-task flag) is used to indicate +whether the system is to undergo a freezing operation. 
And freeze_processes() +sets this variable. After this, it executes try_to_freeze_tasks() that sends a +fake signal to all user space processes, and wakes up all the kernel threads. +All freezable tasks must react to that by calling try_to_freeze(), which +results in a call to __refrigerator() (defined in kernel/freezer.c), which sets +the task's PF_FROZEN flag, changes its state to TASK_UNINTERRUPTIBLE and makes +it loop until PF_FROZEN is cleared for it. Then, we say that the task is +'frozen' and therefore the set of functions handling this mechanism is referred +to as 'the freezer' (these functions are defined in kernel/power/process.c, +kernel/freezer.c & include/linux/freezer.h). User space processes are generally +frozen before kernel threads. __refrigerator() must not be called directly. Instead, use the try_to_freeze() function (defined in include/linux/freezer.h), that checks -the task's TIF_FREEZE flag and makes the task enter __refrigerator() if the -flag is set. +if the task is to be frozen and makes the task enter __refrigerator(). For user space processes try_to_freeze() is called automatically from the signal-handling code, but the freezable kernel threads need to call it explicitly in suitable places or use the wait_event_freezable() or wait_event_freezable_timeout() macros (defined in include/linux/freezer.h) -that combine interruptible sleep with checking if TIF_FREEZE is set and calling -try_to_freeze(). The main loop of a freezable kernel thread may look like the -following one: +that combine interruptible sleep with checking if the task is to be frozen and +calling try_to_freeze(). The main loop of a freezable kernel thread may look +like the following one: set_freezable(); do { @@ -53,7 +54,7 @@ following one: (from drivers/usb/core/hub.c::hub_thread()). 
If a freezable kernel thread fails to call try_to_freeze() after the freezer has -set TIF_FREEZE for it, the freezing of tasks will fail and the entire +initiated a freezing operation, the freezing of tasks will fail and the entire hibernation operation will be cancelled. For this reason, freezable kernel threads must call try_to_freeze() somewhere or use one of the wait_event_freezable() and wait_event_freezable_timeout() macros. @@ -63,6 +64,27 @@ devices have been reinitialized, the function thaw_processes() is called in order to clear the PF_FROZEN flag for each frozen task. Then, the tasks that have been frozen leave __refrigerator() and continue running. + +Rationale behind the functions dealing with freezing and thawing of tasks: +------------------------------------------------------------------------- + +freeze_processes(): + - freezes only userspace tasks + +freeze_kernel_threads(): + - freezes all tasks (including kernel threads) because we can't freeze + kernel threads without freezing userspace tasks + +thaw_kernel_threads(): + - thaws only kernel threads; this is particularly useful if we need to do + anything special in between thawing of kernel threads and thawing of + userspace tasks, or if we want to postpone the thawing of userspace tasks + +thaw_processes(): + - thaws all tasks (including kernel threads) because we can't thaw userspace + tasks without thawing kernel threads + + III. Which kernel threads are freezable? Kernel threads are not freezable by default. 
However, a kernel thread may clear diff --git a/Documentation/powerpc/firmware-assisted-dump.txt b/Documentation/powerpc/firmware-assisted-dump.txt new file mode 100644 index 000000000000..3007bc98af28 --- /dev/null +++ b/Documentation/powerpc/firmware-assisted-dump.txt @@ -0,0 +1,270 @@ + + Firmware-Assisted Dump + ------------------------ + July 2011 + +The goal of firmware-assisted dump is to enable the dump of +a crashed system, and to do so from a fully-reset system, and +to minimize the total elapsed time until the system is back +in production use. + +- Firmware assisted dump (fadump) infrastructure is intended to replace + the existing phyp assisted dump. +- Fadump uses the same firmware interfaces and memory reservation model + as phyp assisted dump. +- Unlike phyp dump, fadump exports the memory dump through /proc/vmcore + in the ELF format in the same way as kdump. This helps us reuse the + kdump infrastructure for dump capture and filtering. +- Unlike phyp dump, userspace tool does not need to refer any sysfs + interface while reading /proc/vmcore. +- Unlike phyp dump, fadump allows user to release all the memory reserved + for dump, with a single operation of echo 1 > /sys/kernel/fadump_release_mem. +- Once enabled through kernel boot parameter, fadump can be + started/stopped through /sys/kernel/fadump_registered interface (see + sysfs files section below) and can be easily integrated with kdump + service start/stop init scripts. + +Comparing with kdump or other strategies, firmware-assisted +dump offers several strong, practical advantages: + +-- Unlike kdump, the system has been reset, and loaded + with a fresh copy of the kernel. In particular, + PCI and I/O devices have been reinitialized and are + in a clean, consistent state. +-- Once the dump is copied out, the memory that held the dump + is immediately available to the running kernel. 
And therefore, + unlike kdump, fadump doesn't need a 2nd reboot to get back + the system to the production configuration. + +The above can only be accomplished by coordination with, +and assistance from the Power firmware. The procedure is +as follows: + +-- The first kernel registers the sections of memory with the + Power firmware for dump preservation during OS initialization. + These registered sections of memory are reserved by the first + kernel during early boot. + +-- When a system crashes, the Power firmware will save + the low memory (boot memory of size larger of 5% of system RAM + or 256MB) of RAM to the previous registered region. It will + also save system registers, and hardware PTE's. + + NOTE: The term 'boot memory' means size of the low memory chunk + that is required for a kernel to boot successfully when + booted with restricted memory. By default, the boot memory + size will be the larger of 5% of system RAM or 256MB. + Alternatively, user can also specify boot memory size + through boot parameter 'fadump_reserve_mem=' which will + override the default calculated size. Use this option + if default boot memory size is not sufficient for second + kernel to boot successfully. + +-- After the low memory (boot memory) area has been saved, the + firmware will reset PCI and other hardware state. It will + *not* clear the RAM. It will then launch the bootloader, as + normal. + +-- The freshly booted kernel will notice that there is a new + node (ibm,dump-kernel) in the device tree, indicating that + there is crash data available from a previous boot. During + the early boot OS will reserve rest of the memory above + boot memory size effectively booting with restricted memory + size. This will make sure that the second kernel will not + touch any of the dump memory area. + +-- User-space tools will read /proc/vmcore to obtain the contents + of memory, which holds the previous crashed kernel dump in ELF + format. 
The userspace tools may copy this info to disk, or + network, nas, san, iscsi, etc. as desired. + +-- Once the userspace tool is done saving dump, it will echo + '1' to /sys/kernel/fadump_release_mem to release the reserved + memory back to general use, except the memory required for + next firmware-assisted dump registration. + + e.g. + # echo 1 > /sys/kernel/fadump_release_mem + +Please note that the firmware-assisted dump feature +is only available on Power6 and above systems with recent +firmware versions. + +Implementation details: +---------------------- + +During boot, a check is made to see if firmware supports +this feature on that particular machine. If it does, then +we check to see if an active dump is waiting for us. If yes +then everything but boot memory size of RAM is reserved during +early boot (See Fig. 2). This area is released once we finish +collecting the dump from user land scripts (e.g. kdump scripts) +that are run. If there is dump data, then the +/sys/kernel/fadump_release_mem file is created, and the reserved +memory is held. + +If there is no waiting dump data, then only the memory required +to hold CPU state, HPTE region, boot memory dump and elfcore +header, is reserved at the top of memory (see Fig. 1). This area +is *not* released: this region will be kept permanently reserved, +so that it can act as a receptacle for a copy of the boot memory +content in addition to CPU state and HPTE region, in the case a +crash does occur. + + o Memory Reservation during first kernel + + Low memory Top of memory + 0 boot memory size | + | | |<--Reserved dump area -->| + V V | Permanent Reservation V + +-----------+----------/ /----------+---+----+-----------+----+ + | | |CPU|HPTE| DUMP |ELF | + +-----------+----------/ /----------+---+----+-----------+----+ + | ^ + | | + \ / + ------------------------------------------- + Boot memory content gets transferred to + reserved area by firmware at the time of + crash + Fig. 
1 + + o Memory Reservation during second kernel after crash + + Low memory Top of memory + 0 boot memory size | + | |<------------- Reserved dump area ----------- -->| + V V V + +-----------+----------/ /----------+---+----+-----------+----+ + | | |CPU|HPTE| DUMP |ELF | + +-----------+----------/ /----------+---+----+-----------+----+ + | | + V V + Used by second /proc/vmcore + kernel to boot + Fig. 2 + +Currently the dump will be copied from /proc/vmcore to a +a new file upon user intervention. The dump data available through +/proc/vmcore will be in ELF format. Hence the existing kdump +infrastructure (kdump scripts) to save the dump works fine with +minor modifications. + +The tools to examine the dump will be same as the ones +used for kdump. + +How to enable firmware-assisted dump (fadump): +------------------------------------- + +1. Set config option CONFIG_FA_DUMP=y and build kernel. +2. Boot into linux kernel with 'fadump=on' kernel cmdline option. +3. Optionally, user can also set 'fadump_reserve_mem=' kernel cmdline + to specify size of the memory to reserve for boot memory dump + preservation. + +NOTE: If firmware-assisted dump fails to reserve memory then it will + fallback to existing kdump mechanism if 'crashkernel=' option + is set at kernel cmdline. + +Sysfs/debugfs files: +------------ + +Firmware-assisted dump feature uses sysfs file system to hold +the control files and debugfs file to display memory reserved region. + +Here is the list of files under kernel sysfs: + + /sys/kernel/fadump_enabled + + This is used to display the fadump status. + 0 = fadump is disabled + 1 = fadump is enabled + + This interface can be used by kdump init scripts to identify if + fadump is enabled in the kernel and act accordingly. + + /sys/kernel/fadump_registered + + This is used to display the fadump registration status as well + as to control (start/stop) the fadump registration. + 0 = fadump is not registered. 
+ 1 = fadump is registered and ready to handle system crash. + + To register fadump, echo 1 > /sys/kernel/fadump_registered; to + un-register and stop fadump, echo 0 > /sys/kernel/fadump_registered. + Once the fadump is un-registered, the system crash will not + be handled and vmcore will not be captured. This interface can be + easily integrated with kdump service start/stop. + + /sys/kernel/fadump_release_mem + + This file is available only when fadump is active during + second kernel. This is used to release the reserved memory + regions that are held for saving crash dump. To release the + reserved memory echo 1 to it: + + echo 1 > /sys/kernel/fadump_release_mem + + After echo 1, the content of the /sys/kernel/debug/powerpc/fadump_region + file will change to reflect the new memory reservations. + + The existing userspace tools (kdump infrastructure) can be easily + enhanced to use this interface to release the memory reserved for + dump and continue without 2nd reboot. + +Here is the list of files under powerpc debugfs: +(Assuming debugfs is mounted on /sys/kernel/debug directory.) + + /sys/kernel/debug/powerpc/fadump_region + + This file shows the reserved memory regions if fadump is + enabled; otherwise this file is empty. The output format + is: + <region>: [<start>-<end>] <reserved-size> bytes, Dumped: <dump-size> + + e.g. 
+ Contents when fadump is registered during first kernel + + # cat /sys/kernel/debug/powerpc/fadump_region + CPU : [0x0000006ffb0000-0x0000006fff001f] 0x40020 bytes, Dumped: 0x0 + HPTE: [0x0000006fff0020-0x0000006fff101f] 0x1000 bytes, Dumped: 0x0 + DUMP: [0x0000006fff1020-0x0000007fff101f] 0x10000000 bytes, Dumped: 0x0 + + Contents when fadump is active during second kernel + + # cat /sys/kernel/debug/powerpc/fadump_region + CPU : [0x0000006ffb0000-0x0000006fff001f] 0x40020 bytes, Dumped: 0x40020 + HPTE: [0x0000006fff0020-0x0000006fff101f] 0x1000 bytes, Dumped: 0x1000 + DUMP: [0x0000006fff1020-0x0000007fff101f] 0x10000000 bytes, Dumped: 0x10000000 + : [0x00000010000000-0x0000006ffaffff] 0x5ffb0000 bytes, Dumped: 0x5ffb0000 + +NOTE: Please refer to Documentation/filesystems/debugfs.txt on + how to mount the debugfs filesystem. + + +TODO: +----- + o Need to come up with the better approach to find out more + accurate boot memory size that is required for a kernel to + boot successfully when booted with restricted memory. + o The fadump implementation introduces a fadump crash info structure + in the scratch area before the ELF core header. The idea of introducing + this structure is to pass some important crash info data to the second + kernel which will help second kernel to populate ELF core header with + correct data before it gets exported through /proc/vmcore. The current + design implementation does not address a possibility of introducing + additional fields (in future) to this structure without affecting + compatibility. Need to come up with the better approach to address this. + The possible approaches are: + 1. Introduce version field for version tracking, bump up the version + whenever a new field is added to the structure in future. The version + field can be used to find out what fields are valid for the current + version of the structure. + 2. 
Reserve the area of predefined size (say PAGE_SIZE) for this + structure and have unused area as reserved (initialized to zero) + for future field additions. + The advantage of approach 1 over 2 is we don't need to reserve extra space. +--- +Author: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> +This document is based on the original documentation written for phyp +assisted dump by Linas Vepstas and Manish Ahuja. diff --git a/Documentation/powerpc/mpc52xx.txt b/Documentation/powerpc/mpc52xx.txt index 10dd4ab93b85..0d540a31ea1a 100644 --- a/Documentation/powerpc/mpc52xx.txt +++ b/Documentation/powerpc/mpc52xx.txt @@ -2,7 +2,7 @@ Linux 2.6.x on MPC52xx family ----------------------------- For the latest info, go to http://www.246tNt.com/mpc52xx/ - + To compile/use : - U-Boot: @@ -10,23 +10,23 @@ To compile/use : if you wish to ). # make lite5200_defconfig # make uImage - + then, on U-boot: => tftpboot 200000 uImage => tftpboot 400000 pRamdisk => bootm 200000 400000 - + - DBug: # <edit Makefile to set ARCH=ppc & CROSS_COMPILE=... ( also EXTRAVERSION if you wish to ). # make lite5200_defconfig # cp your_initrd.gz arch/ppc/boot/images/ramdisk.image.gz - # make zImage.initrd - # make + # make zImage.initrd + # make then in DBug: DBug> dn -i zImage.initrd.lite5200 - + Some remarks : - The port is named mpc52xxx, and config options are PPC_MPC52xx. The MGT5100 diff --git a/Documentation/powerpc/phyp-assisted-dump.txt b/Documentation/powerpc/phyp-assisted-dump.txt deleted file mode 100644 index ad340205d96a..000000000000 --- a/Documentation/powerpc/phyp-assisted-dump.txt +++ /dev/null @@ -1,127 +0,0 @@ - - Hypervisor-Assisted Dump - ------------------------ - November 2007 - -The goal of hypervisor-assisted dump is to enable the dump of -a crashed system, and to do so from a fully-reset system, and -to minimize the total elapsed time until the system is back -in production use. 
- -As compared to kdump or other strategies, hypervisor-assisted -dump offers several strong, practical advantages: - --- Unlike kdump, the system has been reset, and loaded - with a fresh copy of the kernel. In particular, - PCI and I/O devices have been reinitialized and are - in a clean, consistent state. --- As the dump is performed, the dumped memory becomes - immediately available to the system for normal use. --- After the dump is completed, no further reboots are - required; the system will be fully usable, and running - in its normal, production mode on its normal kernel. - -The above can only be accomplished by coordination with, -and assistance from the hypervisor. The procedure is -as follows: - --- When a system crashes, the hypervisor will save - the low 256MB of RAM to a previously registered - save region. It will also save system state, system - registers, and hardware PTE's. - --- After the low 256MB area has been saved, the - hypervisor will reset PCI and other hardware state. - It will *not* clear RAM. It will then launch the - bootloader, as normal. - --- The freshly booted kernel will notice that there - is a new node (ibm,dump-kernel) in the device tree, - indicating that there is crash data available from - a previous boot. It will boot into only 256MB of RAM, - reserving the rest of system memory. - --- Userspace tools will parse /sys/kernel/release_region - and read /proc/vmcore to obtain the contents of memory, - which holds the previous crashed kernel. The userspace - tools may copy this info to disk, or network, nas, san, - iscsi, etc. as desired. - - For Example: the values in /sys/kernel/release-region - would look something like this (address-range pairs). 
- CPU:0x177fee000-0x10000: HPTE:0x177ffe020-0x1000: / - DUMP:0x177fff020-0x10000000, 0x10000000-0x16F1D370A - --- As the userspace tools complete saving a portion of - dump, they echo an offset and size to - /sys/kernel/release_region to release the reserved - memory back to general use. - - An example of this is: - "echo 0x40000000 0x10000000 > /sys/kernel/release_region" - which will release 256MB at the 1GB boundary. - -Please note that the hypervisor-assisted dump feature -is only available on Power6-based systems with recent -firmware versions. - -Implementation details: ----------------------- - -During boot, a check is made to see if firmware supports -this feature on this particular machine. If it does, then -we check to see if a active dump is waiting for us. If yes -then everything but 256 MB of RAM is reserved during early -boot. This area is released once we collect a dump from user -land scripts that are run. If there is dump data, then -the /sys/kernel/release_region file is created, and -the reserved memory is held. - -If there is no waiting dump data, then only the highest -256MB of the ram is reserved as a scratch area. This area -is *not* released: this region will be kept permanently -reserved, so that it can act as a receptacle for a copy -of the low 256MB in the case a crash does occur. See, -however, "open issues" below, as to whether -such a reserved region is really needed. - -Currently the dump will be copied from /proc/vmcore to a -a new file upon user intervention. The starting address -to be read and the range for each data point in provided -in /sys/kernel/release_region. - -The tools to examine the dump will be same as the ones -used for kdump. - -General notes: --------------- -Security: please note that there are potential security issues -with any sort of dump mechanism. In particular, plaintext -(unencrypted) data, and possibly passwords, may be present in -the dump data. 
Userspace tools must take adequate precautions to -preserve security. - -Open issues/ToDo: ------------- - o The various code paths that tell the hypervisor that a crash - occurred, vs. it simply being a normal reboot, should be - reviewed, and possibly clarified/fixed. - - o Instead of using /sys/kernel, should there be a /sys/dump - instead? There is a dump_subsys being created by the s390 code, - perhaps the pseries code should use a similar layout as well. - - o Is reserving a 256MB region really required? The goal of - reserving a 256MB scratch area is to make sure that no - important crash data is clobbered when the hypervisor - save low mem to the scratch area. But, if one could assure - that nothing important is located in some 256MB area, then - it would not need to be reserved. Something that can be - improved in subsequent versions. - - o Still working the kdump team to integrate this with kdump, - some work remains but this would not affect the current - patches. - - o Still need to write a shell script, to copy the dump away. - Currently I am parsing it manually. diff --git a/Documentation/remoteproc.txt b/Documentation/remoteproc.txt new file mode 100644 index 000000000000..70a048cd3fa3 --- /dev/null +++ b/Documentation/remoteproc.txt @@ -0,0 +1,322 @@ +Remote Processor Framework + +1. Introduction + +Modern SoCs typically have heterogeneous remote processor devices in asymmetric +multiprocessing (AMP) configurations, which may be running different instances +of operating system, whether it's Linux or any other flavor of real-time OS. + +OMAP4, for example, has dual Cortex-A9, dual Cortex-M3 and a C64x+ DSP. +In a typical configuration, the dual cortex-A9 is running Linux in a SMP +configuration, and each of the other three cores (two M3 cores and a DSP) +is running its own instance of RTOS in an AMP configuration. 
+ +The remoteproc framework allows different platforms/architectures to +control (power on, load firmware, power off) those remote processors while +abstracting the hardware differences, so the entire driver doesn't need to be +duplicated. In addition, this framework also adds rpmsg virtio devices +for remote processors that support this kind of communication. This way, +platform-specific remoteproc drivers only need to provide a few low-level +handlers, and then all rpmsg drivers will just work +(for more information about the virtio-based rpmsg bus and its drivers, +please read Documentation/rpmsg.txt). +Registration of other types of virtio devices is now also possible. Firmwares +just need to publish which kinds of virtio devices they support, and then +remoteproc will add those devices. This makes it possible to reuse the +existing virtio drivers with remote processor backends at a minimal development +cost. + +2. User API + + int rproc_boot(struct rproc *rproc) + - Boot a remote processor (i.e. load its firmware, power it on, ...). + If the remote processor is already powered on, this function immediately + returns (successfully). + Returns 0 on success, and an appropriate error value otherwise. + Note: to use this function you should already have a valid rproc + handle. There are several ways to achieve that cleanly (devres, pdata, + the way remoteproc_rpmsg.c does this, or, if this becomes prevalent, we + might also consider using dev_archdata for this). See also + rproc_get_by_name() below. + + void rproc_shutdown(struct rproc *rproc) + - Power off a remote processor (previously booted with rproc_boot()). + In case @rproc is still being used by additional user(s), then + this function will just decrement the power refcount and exit, + without really powering off the device. + Every call to rproc_boot() must (eventually) be accompanied by a call + to rproc_shutdown(). Calling rproc_shutdown() redundantly is a bug. 
+ Notes: + - we're not decrementing the rproc's refcount, only the power refcount. + which means that the @rproc handle stays valid even after + rproc_shutdown() returns, and users can still use it with a subsequent + rproc_boot(), if needed. + - don't call rproc_shutdown() to unroll rproc_get_by_name(), exactly + because rproc_shutdown() _does not_ decrement the refcount of @rproc. + To decrement the refcount of @rproc, use rproc_put() (but _only_ if + you acquired @rproc using rproc_get_by_name()). + + struct rproc *rproc_get_by_name(const char *name) + - Find an rproc handle using the remote processor's name, and then + boot it. If it's already powered on, then just immediately return + (successfully). Returns the rproc handle on success, and NULL on failure. + This function increments the remote processor's refcount, so always + use rproc_put() to decrement it back once rproc isn't needed anymore. + Note: currently rproc_get_by_name() and rproc_put() are not used anymore + by the rpmsg bus and its drivers. We need to scrutinize the use cases + that still need them, and see if we can migrate them to use the non + name-based boot/shutdown interface. + + void rproc_put(struct rproc *rproc) + - Decrement @rproc's power refcount and shut it down if it reaches zero + (essentially by just calling rproc_shutdown), and then decrement @rproc's + validity refcount too. + After this function returns, @rproc may _not_ be used anymore, and its + handle should be considered invalid. + This function should be called _iff_ the @rproc handle was grabbed by + calling rproc_get_by_name(). + +3. Typical usage + +#include <linux/remoteproc.h> + +/* in case we were given a valid 'rproc' handle */ +int dummy_rproc_example(struct rproc *my_rproc) +{ + int ret; + + /* let's power on and boot our remote processor */ + ret = rproc_boot(my_rproc); + if (ret) { + /* + * something went wrong. handle it and leave. + */ + } + + /* + * our remote processor is now powered on... 
give it some work + */ + + /* let's shut it down now */ + rproc_shutdown(my_rproc); + + return 0; +} + +4. API for implementors + + struct rproc *rproc_alloc(struct device *dev, const char *name, + const struct rproc_ops *ops, + const char *firmware, int len) + - Allocate a new remote processor handle, but don't register + it yet. Required parameters are the underlying device, the + name of this remote processor, platform-specific ops handlers, + the name of the firmware to boot this rproc with, and the + length of private data needed by the allocating rproc driver (in bytes). + + This function should be used by rproc implementations during + initialization of the remote processor. + After creating an rproc handle using this function, and when ready, + implementations should then call rproc_register() to complete + the registration of the remote processor. + On success, the new rproc is returned, and on failure, NULL. + + Note: _never_ directly deallocate @rproc, even if it was not registered + yet. Instead, if you just need to unroll rproc_alloc(), use rproc_free(). + + void rproc_free(struct rproc *rproc) + - Free an rproc handle that was allocated by rproc_alloc(). + This function should _only_ be used if @rproc was only allocated, + but not registered yet. + If @rproc was already successfully registered (by calling + rproc_register()), then use rproc_unregister() instead. + + int rproc_register(struct rproc *rproc) + - Register @rproc with the remoteproc framework, after it has been + allocated with rproc_alloc(). + This is called by the platform-specific rproc implementation, whenever + a new remote processor device is probed. + Returns 0 on success and an appropriate error code otherwise. + Note: this function initiates an asynchronous firmware loading + context, which will look for virtio devices supported by the rproc's + firmware. 
+ If found, those virtio devices will be created and added, so as a result + of registering this remote processor, additional virtio drivers might get + probed. + + int rproc_unregister(struct rproc *rproc) + - Unregister a remote processor, and decrement its refcount. + If its refcount drops to zero, then @rproc will be freed. If not, + it will be freed later once the last reference is dropped. + + This function should be called when the platform specific rproc + implementation decides to remove the rproc device. it should + _only_ be called if a previous invocation of rproc_register() + has completed successfully. + + After rproc_unregister() returns, @rproc is _not_ valid anymore and + it shouldn't be used. More specifically, don't call rproc_free() + or try to directly free @rproc after rproc_unregister() returns; + none of these are needed, and calling them is a bug. + + Returns 0 on success and -EINVAL if @rproc isn't valid. + +5. Implementation callbacks + +These callbacks should be provided by platform-specific remoteproc +drivers: + +/** + * struct rproc_ops - platform-specific device handlers + * @start: power on the device and boot it + * @stop: power off the device + * @kick: kick a virtqueue (virtqueue id given as a parameter) + */ +struct rproc_ops { + int (*start)(struct rproc *rproc); + int (*stop)(struct rproc *rproc); + void (*kick)(struct rproc *rproc, int vqid); +}; + +Every remoteproc implementation should at least provide the ->start and ->stop +handlers. If rpmsg/virtio functionality is also desired, then the ->kick handler +should be provided as well. + +The ->start() handler takes an rproc handle and should then power on the +device and boot it (use rproc->priv to access platform-specific private data). +The boot address, in case needed, can be found in rproc->bootaddr (remoteproc +core puts there the ELF entry point). +On success, 0 should be returned, and on failure, an appropriate error code. 
+ +The ->stop() handler takes an rproc handle and powers the device down. +On success, 0 is returned, and on failure, an appropriate error code. + +The ->kick() handler takes an rproc handle, and the index of the virtqueue +in which a new message was placed. Implementations should interrupt the remote +processor and let it know it has pending messages. Notifying remote processors +of the exact virtqueue index to look in is optional: it is easy (and not +too expensive) to go through the existing virtqueues and look for new buffers +in the used rings. + +6. Binary Firmware Structure + +At this point remoteproc only supports ELF32 firmware binaries. However, +it is quite expected that other platforms/devices which we'd want to +support with this framework will be based on different binary formats. + +When those use cases show up, we will have to decouple the binary format +from the framework core, so we can support several binary formats without +duplicating common code. + +When the firmware is parsed, its various segments are loaded to memory +according to the specified device address (might be a physical address +if the remote processor is accessing memory directly). + +In addition to the standard ELF segments, most remote processors would +also include a special section which we call "the resource table". + +The resource table contains system resources that the remote processor +requires before it should be powered on, such as allocation of physically +contiguous memory, or iommu mapping of certain on-chip peripherals. +The remoteproc core will only power up the device after all the resource +table's requirements are met. + +In addition to system resources, the resource table may also contain +resource entries that publish the existence of supported features +or configurations by the remote processor, such as trace buffers and +supported virtio devices (and their configurations). 
+ +The resource table begins with this header: + +/** + * struct resource_table - firmware resource table header + * @ver: version number + * @num: number of resource entries + * @reserved: reserved (must be zero) + * @offset: array of offsets pointing at the various resource entries + * + * The header of the resource table, as expressed by this structure, + * contains a version number (should we need to change this format in the + * future), the number of available resource entries, and their offsets + * in the table. + */ +struct resource_table { + u32 ver; + u32 num; + u32 reserved[2]; + u32 offset[0]; +} __packed; + +Immediately following this header are the resource entries themselves, +each of which begins with the following resource entry header: + +/** + * struct fw_rsc_hdr - firmware resource entry header + * @type: resource type + * @data: resource data + * + * Every resource entry begins with a 'struct fw_rsc_hdr' header providing + * its @type. The content of the entry itself will immediately follow + * this header, and it should be parsed according to the resource type. + */ +struct fw_rsc_hdr { + u32 type; + u8 data[0]; +} __packed; + +Some resources entries are mere announcements, where the host is informed +of specific remoteproc configuration. Other entries require the host to +do something (e.g. allocate a system resource). Sometimes a negotiation +is expected, where the firmware requests a resource, and once allocated, +the host should provide back its details (e.g. address of an allocated +memory region). + +Here are the various resource types that are currently supported: + +/** + * enum fw_resource_type - types of resource entries + * + * @RSC_CARVEOUT: request for allocation of a physically contiguous + * memory region. + * @RSC_DEVMEM: request to iommu_map a memory-based peripheral. + * @RSC_TRACE: announces the availability of a trace buffer into which + * the remote processor will be writing logs. 
+ * @RSC_VDEV: declare support for a virtio device, and serve as its + * virtio header. + * @RSC_LAST: just keep this one at the end + * + * Please note that these values are used as indices to the rproc_handle_rsc + * lookup table, so please keep them sane. Moreover, @RSC_LAST is used to + * check the validity of an index before the lookup table is accessed, so + * please update it as needed. + */ +enum fw_resource_type { + RSC_CARVEOUT = 0, + RSC_DEVMEM = 1, + RSC_TRACE = 2, + RSC_VDEV = 3, + RSC_LAST = 4, +}; + +For more details regarding a specific resource type, please see its +dedicated structure in include/linux/remoteproc.h. + +We also expect that platform-specific resource entries will show up +at some point. When that happens, we could easily add a new RSC_PLATFORM +type, and hand those resources to the platform-specific rproc driver to handle. + +7. Virtio and remoteproc + +The firmware should provide remoteproc information about virtio devices +that it supports, and their configurations: a RSC_VDEV resource entry +should specify the virtio device id (as in virtio_ids.h), virtio features, +virtio config space, vrings information, etc. + +When a new remote processor is registered, the remoteproc framework +will look for its resource table and will register the virtio devices +it supports. A firmware may support any number of virtio devices, and +of any type (a single remote processor can also easily support several +rpmsg virtio devices this way, if desired). + +Of course, RSC_VDEV resource entries are only good enough for static +allocation of virtio devices. Dynamic allocations will also be made possible +using the rpmsg bus (similar to how we already do dynamic allocations of +rpmsg channels; read more about it in rpmsg.txt). 
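The resource table layout from section 6 (header, offset array, then entries that each begin with a type word) can be walked roughly as in the userspace sketch below. It assumes that offsets are counted from the start of the table and that the table is in host byte order; the text above states neither explicitly. count_resources() is a hypothetical helper written for this illustration, not a remoteproc function, and u32/u8 are replaced by their stdint equivalents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace mirrors of the structures quoted in section 6. */
struct resource_table {
	uint32_t ver;
	uint32_t num;
	uint32_t reserved[2];
	uint32_t offset[];	/* 'num' offsets follow the header */
};

struct fw_rsc_hdr {
	uint32_t type;
	uint8_t data[];		/* type-specific payload */
};

enum fw_resource_type {
	RSC_CARVEOUT	= 0,
	RSC_DEVMEM	= 1,
	RSC_TRACE	= 2,
	RSC_VDEV	= 3,
	RSC_LAST	= 4,
};

/*
 * Hypothetical helper: count the entries of each known type.
 * Returns 0 on success, -1 if the table does not fit in the buffer.
 */
static int count_resources(const uint8_t *buf, size_t len,
			   unsigned int counts[RSC_LAST])
{
	struct resource_table table;
	uint32_t i;

	assert(buf != NULL && counts != NULL);
	memset(counts, 0, RSC_LAST * sizeof(counts[0]));

	if (len < sizeof(table))
		return -1;
	/* memcpy avoids unaligned reads from the raw firmware image */
	memcpy(&table, buf, sizeof(table));

	/* the whole offset array must lie inside the buffer */
	if (table.num > (len - sizeof(table)) / sizeof(uint32_t))
		return -1;

	for (i = 0; i < table.num; i++) {
		struct fw_rsc_hdr hdr;
		uint32_t offset;

		memcpy(&offset, buf + sizeof(table) + i * sizeof(offset),
		       sizeof(offset));

		/* every entry must at least hold its fw_rsc_hdr */
		if (offset > len || len - offset < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + offset, sizeof(hdr));

		/* RSC_LAST bounds the index, as the enum comment asks */
		if (hdr.type < RSC_LAST)
			counts[hdr.type]++;
	}

	return 0;
}
```

Unknown types are skipped here rather than treated as errors; that is a choice made for this sketch only, and a real loader may well prefer to reject them.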
diff --git a/Documentation/rpmsg.txt b/Documentation/rpmsg.txt new file mode 100644 index 000000000000..409d9f964c5b --- /dev/null +++ b/Documentation/rpmsg.txt @@ -0,0 +1,293 @@ +Remote Processor Messaging (rpmsg) Framework + +Note: this document describes the rpmsg bus and how to write rpmsg drivers. +To learn how to add rpmsg support for new platforms, check out remoteproc.txt +(also a resident of Documentation/). + +1. Introduction + +Modern SoCs typically employ heterogeneous remote processor devices in +asymmetric multiprocessing (AMP) configurations, which may be running +different instances of operating system, whether it's Linux or any other +flavor of real-time OS. + +OMAP4, for example, has dual Cortex-A9, dual Cortex-M3 and a C64x+ DSP. +Typically, the dual cortex-A9 is running Linux in a SMP configuration, +and each of the other three cores (two M3 cores and a DSP) is running +its own instance of RTOS in an AMP configuration. + +Typically AMP remote processors employ dedicated DSP codecs and multimedia +hardware accelerators, and therefore are often used to offload CPU-intensive +multimedia tasks from the main application processor. + +These remote processors could also be used to control latency-sensitive +sensors, drive random hardware blocks, or just perform background tasks +while the main CPU is idling. + +Users of those remote processors can either be userland apps (e.g. multimedia +frameworks talking with remote OMX components) or kernel drivers (controlling +hardware accessible only by the remote processor, reserving kernel-controlled +resources on behalf of the remote processor, etc..). + +Rpmsg is a virtio-based messaging bus that allows kernel drivers to communicate +with remote processors available on the system. In turn, drivers could then +expose appropriate user space interfaces, if needed. 
+ +When writing a driver that exposes rpmsg communication to userland, please +keep in mind that remote processors might have direct access to the +system's physical memory and other sensitive hardware resources (e.g. on +OMAP4, remote cores and hardware accelerators may have direct access to the +physical memory, gpio banks, dma controllers, i2c bus, gptimers, mailbox +devices, hwspinlocks, etc..). Moreover, those remote processors might be +running RTOS where every task can access the entire memory/devices exposed +to the processor. To minimize the risks of rogue (or buggy) userland code +exploiting remote bugs, and by that taking over the system, it is often +desired to limit userland to specific rpmsg channels (see definition below) +it can send messages on, and if possible, minimize how much control +it has over the content of the messages. + +Every rpmsg device is a communication channel with a remote processor (thus +rpmsg devices are called channels). Channels are identified by a textual name +and have a local ("source") rpmsg address, and remote ("destination") rpmsg +address. + +When a driver starts listening on a channel, its rx callback is bound with +a unique rpmsg local address (a 32-bit integer). This way when inbound messages +arrive, the rpmsg core dispatches them to the appropriate driver according +to their destination address (this is done by invoking the driver's rx handler +with the payload of the inbound message). + + +2. User API + + int rpmsg_send(struct rpmsg_channel *rpdev, void *data, int len); + - sends a message across to the remote processor on a given channel. + The caller should specify the channel, the data it wants to send, + and its length (in bytes). The message will be sent on the specified + channel, i.e. its source and destination address fields will be + set to the channel's src and dst addresses. + + In case there are no TX buffers available, the function will block until + one becomes available (i.e. 
until the remote processor consumes + a tx buffer and puts it back on virtio's used descriptor ring), + or a timeout of 15 seconds elapses. When the latter happens, + -ERESTARTSYS is returned. + The function can only be called from a process context (for now). + Returns 0 on success and an appropriate error value on failure. + + int rpmsg_sendto(struct rpmsg_channel *rpdev, void *data, int len, u32 dst); + - sends a message across to the remote processor on a given channel, + to a destination address provided by the caller. + The caller should specify the channel, the data it wants to send, + its length (in bytes), and an explicit destination address. + The message will then be sent to the remote processor to which the + channel belongs, using the channel's src address, and the user-provided + dst address (thus the channel's dst address will be ignored). + + In case there are no TX buffers available, the function will block until + one becomes available (i.e. until the remote processor consumes + a tx buffer and puts it back on virtio's used descriptor ring), + or a timeout of 15 seconds elapses. When the latter happens, + -ERESTARTSYS is returned. + The function can only be called from a process context (for now). + Returns 0 on success and an appropriate error value on failure. + + int rpmsg_send_offchannel(struct rpmsg_channel *rpdev, u32 src, u32 dst, + void *data, int len); + - sends a message across to the remote processor, using the src and dst + addresses provided by the user. + The caller should specify the channel, the data it wants to send, + its length (in bytes), and explicit source and destination addresses. + The message will then be sent to the remote processor to which the + channel belongs, but the channel's src and dst addresses will be + ignored (and the user-provided addresses will be used instead). + + In case there are no TX buffers available, the function will block until + one becomes available (i.e. 
until the remote processor consumes + a tx buffer and puts it back on virtio's used descriptor ring), + or a timeout of 15 seconds elapses. When the latter happens, + -ERESTARTSYS is returned. + The function can only be called from a process context (for now). + Returns 0 on success and an appropriate error value on failure. + + int rpmsg_trysend(struct rpmsg_channel *rpdev, void *data, int len); + - sends a message across to the remote processor on a given channel. + The caller should specify the channel, the data it wants to send, + and its length (in bytes). The message will be sent on the specified + channel, i.e. its source and destination address fields will be + set to the channel's src and dst addresses. + + In case there are no TX buffers available, the function will immediately + return -ENOMEM without waiting until one becomes available. + The function can only be called from a process context (for now). + Returns 0 on success and an appropriate error value on failure. + + int rpmsg_trysendto(struct rpmsg_channel *rpdev, void *data, int len, u32 dst) + - sends a message across to the remote processor on a given channel, + to a destination address provided by the user. + The user should specify the channel, the data it wants to send, + its length (in bytes), and an explicit destination address. + The message will then be sent to the remote processor to which the + channel belongs, using the channel's src address, and the user-provided + dst address (thus the channel's dst address will be ignored). + + In case there are no TX buffers available, the function will immediately + return -ENOMEM without waiting until one becomes available. + The function can only be called from a process context (for now). + Returns 0 on success and an appropriate error value on failure. 
+ + int rpmsg_trysend_offchannel(struct rpmsg_channel *rpdev, u32 src, u32 dst, + void *data, int len); + - sends a message across to the remote processor, using source and + destination addresses provided by the user. + The user should specify the channel, the data it wants to send, + its length (in bytes), and explicit source and destination addresses. + The message will then be sent to the remote processor to which the + channel belongs, but the channel's src and dst addresses will be + ignored (and the user-provided addresses will be used instead). + + In case there are no TX buffers available, the function will immediately + return -ENOMEM without waiting until one becomes available. + The function can only be called from a process context (for now). + Returns 0 on success and an appropriate error value on failure. + + struct rpmsg_endpoint *rpmsg_create_ept(struct rpmsg_channel *rpdev, + void (*cb)(struct rpmsg_channel *, void *, int, void *, u32), + void *priv, u32 addr); + - every rpmsg address in the system is bound to an rx callback (so when + inbound messages arrive, they are dispatched by the rpmsg bus using the + appropriate callback handler) by means of an rpmsg_endpoint struct. + + This function allows drivers to create such an endpoint, and by that, + bind a callback, and possibly some private data too, to an rpmsg address + (either one that is known in advance, or one that will be dynamically + assigned for them). + + Simple rpmsg drivers need not call rpmsg_create_ept, because an endpoint + is already created for them when they are probed by the rpmsg bus + (using the rx callback they provide when they registered to the rpmsg bus). + + So things should just work for simple drivers: they already have an + endpoint, their rx callback is bound to their rpmsg address, and when + relevant inbound messages arrive (i.e. messages which their dst address + equals to the src address of their rpmsg channel), the driver's handler + is invoked to process it. 
+ + That said, more complicated drivers might need to allocate + additional rpmsg addresses, and bind them to different rx callbacks. + To accomplish that, those drivers need to call this function. + Drivers should provide their channel (so the new endpoint would bind + to the same remote processor their channel belongs to), an rx callback + function, optional private data (which is provided back when the + rx callback is invoked), and an address they want to bind with the + callback. If addr is RPMSG_ADDR_ANY, then rpmsg_create_ept will + dynamically assign them an available rpmsg address (drivers should have + a very good reason why not to always use RPMSG_ADDR_ANY here). + + Returns a pointer to the endpoint on success, or NULL on error. + + void rpmsg_destroy_ept(struct rpmsg_endpoint *ept); + - destroys an existing rpmsg endpoint. user should provide a pointer + to an rpmsg endpoint that was previously created with rpmsg_create_ept(). + + int register_rpmsg_driver(struct rpmsg_driver *rpdrv); + - registers an rpmsg driver with the rpmsg bus. user should provide + a pointer to an rpmsg_driver struct, which contains the driver's + ->probe() and ->remove() functions, an rx callback, and an id_table + specifying the names of the channels this driver is interested in + being probed with. + Returns 0 on success, and an appropriate error value on failure. + + void unregister_rpmsg_driver(struct rpmsg_driver *rpdrv); + - unregisters an rpmsg driver from the rpmsg bus. user should provide + a pointer to a previously-registered rpmsg_driver struct. + + +3. Typical usage + +The following is a simple rpmsg driver that sends a "hello!" message +on probe(), and whenever it receives an incoming message, it dumps its +content to the console.
+ +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/rpmsg.h> + +static void rpmsg_sample_cb(struct rpmsg_channel *rpdev, void *data, int len, + void *priv, u32 src) +{ + print_hex_dump(KERN_INFO, "incoming message:", DUMP_PREFIX_NONE, + 16, 1, data, len, true); +} + +static int rpmsg_sample_probe(struct rpmsg_channel *rpdev) +{ + int err; + + dev_info(&rpdev->dev, "chnl: 0x%x -> 0x%x\n", rpdev->src, rpdev->dst); + + /* send a message on our channel */ + err = rpmsg_send(rpdev, "hello!", 6); + if (err) { + pr_err("rpmsg_send failed: %d\n", err); + return err; + } + + return 0; +} + +static void __devexit rpmsg_sample_remove(struct rpmsg_channel *rpdev) +{ + dev_info(&rpdev->dev, "rpmsg sample client driver is removed\n"); +} + +static struct rpmsg_device_id rpmsg_driver_sample_id_table[] = { + { .name = "rpmsg-client-sample" }, + { }, +}; +MODULE_DEVICE_TABLE(rpmsg, rpmsg_driver_sample_id_table); + +static struct rpmsg_driver rpmsg_sample_client = { + .drv.name = KBUILD_MODNAME, + .drv.owner = THIS_MODULE, + .id_table = rpmsg_driver_sample_id_table, + .probe = rpmsg_sample_probe, + .callback = rpmsg_sample_cb, + .remove = __devexit_p(rpmsg_sample_remove), +}; + +static int __init init(void) +{ + return register_rpmsg_driver(&rpmsg_sample_client); +} +module_init(init); + +static void __exit fini(void) +{ + unregister_rpmsg_driver(&rpmsg_sample_client); +} +module_exit(fini); + +Note: a similar sample which can be built and loaded can be found +in samples/rpmsg/. + +4. Allocations of rpmsg channels: + +At this point we only support dynamic allocations of rpmsg channels. + +This is possible only with remote processors that have the VIRTIO_RPMSG_F_NS +virtio device feature set. This feature bit means that the remote +processor supports dynamic name service announcement messages. + +When this feature is enabled, creation of rpmsg devices (i.e. 
channels) +is completely dynamic: the remote processor announces the existence of a +remote rpmsg service by sending a name service message (which contains +the name and rpmsg addr of the remote service, see struct rpmsg_ns_msg). + +This message is then handled by the rpmsg bus, which in turn dynamically +creates and registers an rpmsg channel (which represents the remote service). +If/when a relevant rpmsg driver is registered, it will be immediately probed +by the bus, and can then start sending messages to the remote service. + +The plan is also to add static creation of rpmsg channels via the virtio +config space, but it's not implemented yet. diff --git a/Documentation/s390/3270.txt b/Documentation/s390/3270.txt index 7a5c73a7ed7f..7c715de99774 100644 --- a/Documentation/s390/3270.txt +++ b/Documentation/s390/3270.txt @@ -47,9 +47,9 @@ including the console 3270, changes subchannel identifier relative to one another. ReIPL as soon as possible after running the configuration script and the resulting /tmp/mkdev3270. -If you have chosen to make tub3270 a module, you add a line to -/etc/modprobe.conf. If you are working on a VM virtual machine, you -can use DEF GRAF to define virtual 3270 devices. +If you have chosen to make tub3270 a module, you add a line to a +configuration file under /etc/modprobe.d/. If you are working on a VM +virtual machine, you can use DEF GRAF to define virtual 3270 devices. You may generate both 3270 and 3215 console support, or one or the other, or neither. If you generate both, the console type under VM is @@ -60,7 +60,7 @@ at boot time to a 3270 if it is a 3215. In brief, these are the steps: 1. Install the tub3270 patch - 2. (If a module) add a line to /etc/modprobe.conf + 2. (If a module) add a line to a file in /etc/modprobe.d/*.conf 3. (If VM) define devices with DEF GRAF 4. Reboot 5. Configure @@ -84,13 +84,12 @@ Here are the installation steps in detail: make modules_install 2. 
(Perform this step only if you have configured tub3270 as a - module.) Add a line to /etc/modprobe.conf to automatically - load the driver when it's needed. With this line added, - you will see login prompts appear on your 3270s as soon as - boot is complete (or with emulated 3270s, as soon as you dial - into your vm guest using the command "DIAL <vmguestname>"). - Since the line-mode major number is 227, the line to add to - /etc/modprobe.conf should be: + module.) Add a line to a file /etc/modprobe.d/*.conf to automatically + load the driver when it's needed. With this line added, you will see + login prompts appear on your 3270s as soon as boot is complete (or + with emulated 3270s, as soon as you dial into your vm guest using the + command "DIAL <vmguestname>"). Since the line-mode major number is + 227, the line to add should be: alias char-major-227 tub3270 3. Define graphic devices to your vm guest machine, if you diff --git a/Documentation/scheduler/sched-stats.txt b/Documentation/scheduler/sched-stats.txt index 1cd5d51bc761..8259b34a66ae 100644 --- a/Documentation/scheduler/sched-stats.txt +++ b/Documentation/scheduler/sched-stats.txt @@ -38,7 +38,8 @@ First field is a sched_yield() statistic: 1) # of times sched_yield() was called Next three are schedule() statistics: - 2) # of times we switched to the expired queue and reused it + 2) This field is a legacy array expiration count field used in the O(1) + scheduler. We kept it for ABI compatibility, but it is always set to zero. 
3) # of times schedule() was called 4) # of times schedule() left the processor idle diff --git a/Documentation/scsi/00-INDEX b/Documentation/scsi/00-INDEX index b48ded55b555..b7dd6502bec5 100644 --- a/Documentation/scsi/00-INDEX +++ b/Documentation/scsi/00-INDEX @@ -94,3 +94,5 @@ sym53c8xx_2.txt - info on second generation driver for sym53c8xx based adapters tmscsim.txt - info on driver for AM53c974 based adapters +ufs.txt + - info on Universal Flash Storage(UFS) and UFS host controller driver. diff --git a/Documentation/scsi/ChangeLog.lpfc b/Documentation/scsi/ChangeLog.lpfc index c56ec99d7b2f..2f6d595f95e1 100644 --- a/Documentation/scsi/ChangeLog.lpfc +++ b/Documentation/scsi/ChangeLog.lpfc @@ -1718,7 +1718,7 @@ Changes from 20040319 to 20040326 * lpfc_els_timeout_handler() now uses system timer. * Further cleanup of #ifdef powerpc * lpfc_scsi_timeout_handler() now uses system timer. - * Replace common driver's own defines for endianess w/ Linux's + * Replace common driver's own defines for endianness w/ Linux's __BIG_ENDIAN etc. * Added #ifdef IPFC for all IPFC specific code. * lpfc_disc_retry_rptlun() now uses system timer. diff --git a/Documentation/scsi/ChangeLog.megaraid_sas b/Documentation/scsi/ChangeLog.megaraid_sas index 57566bacb4c5..83f8ea8b79eb 100644 --- a/Documentation/scsi/ChangeLog.megaraid_sas +++ b/Documentation/scsi/ChangeLog.megaraid_sas @@ -510,7 +510,7 @@ i. Support for 1078 type (ppc IOP) controller, device id : 0x60 added. 3 Older Version : 00.00.02.02 i. Register 16 byte CDB capability with scsi midlayer - "Ths patch properly registers the 16 byte command length capability of the + "This patch properly registers the 16 byte command length capability of the megaraid_sas controlled hardware with the scsi midlayer. All megaraid_sas hardware supports 16 byte CDB's." 
diff --git a/Documentation/scsi/LICENSE.qla2xxx b/Documentation/scsi/LICENSE.qla2xxx index 19e7cd4bba66..ce0fdf349a81 100644 --- a/Documentation/scsi/LICENSE.qla2xxx +++ b/Documentation/scsi/LICENSE.qla2xxx @@ -1,48 +1,11 @@ Copyright (c) 2003-2011 QLogic Corporation -QLogic Linux/ESX Fibre Channel HBA Driver +QLogic Linux FC-FCoE Driver -This program includes a device driver for Linux 2.6/ESX that may be -distributed with QLogic hardware specific firmware binary file. +This program includes a device driver for Linux 3.x. You may modify and redistribute the device driver code under the GNU General Public License (a copy of which is attached hereto as Exhibit A) published by the Free Software Foundation (version 2). -You may redistribute the hardware specific firmware binary file -under the following terms: - - 1. Redistribution of source code (only if applicable), - must retain the above copyright notice, this list of - conditions and the following disclaimer. - - 2. Redistribution in binary form must reproduce the above - copyright notice, this list of conditions and the - following disclaimer in the documentation and/or other - materials provided with the distribution. - - 3. The name of QLogic Corporation may not be used to - endorse or promote products derived from this software - without specific prior written permission - -REGARDLESS OF WHAT LICENSING MECHANISM IS USED OR APPLICABLE, -THIS PROGRAM IS PROVIDED BY QLOGIC CORPORATION "AS IS'' AND ANY -EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A -PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR -BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, -EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED -TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON -ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, -OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY -OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. - -USER ACKNOWLEDGES AND AGREES THAT USE OF THIS PROGRAM WILL NOT -CREATE OR GIVE GROUNDS FOR A LICENSE BY IMPLICATION, ESTOPPEL, OR -OTHERWISE IN ANY INTELLECTUAL PROPERTY RIGHTS (PATENT, COPYRIGHT, -TRADE SECRET, MASK WORK, OR OTHER PROPRIETARY RIGHT) EMBODIED IN -ANY OTHER QLOGIC HARDWARE OR SOFTWARE EITHER SOLELY OR IN -COMBINATION WITH THIS PROGRAM. EXHIBIT A diff --git a/Documentation/scsi/aic79xx.txt b/Documentation/scsi/aic79xx.txt index 64ac7093c872..e2d3273000d4 100644 --- a/Documentation/scsi/aic79xx.txt +++ b/Documentation/scsi/aic79xx.txt @@ -215,7 +215,7 @@ The following information is available in this file: INCORRECTLY CAN RENDER YOUR SYSTEM INOPERABLE. USE THEM WITH CAUTION. - Edit the file "modprobe.conf" in the directory /etc and add/edit a + Put a .conf file in the /etc/modprobe.d/ directory and add/edit a line containing 'options aic79xx aic79xx=[command[,command...]]' where 'command' is one or more of the following: ----------------------------------------------------------------- diff --git a/Documentation/scsi/aic7xxx.txt b/Documentation/scsi/aic7xxx.txt index 18f8d1905e6a..7c5d0223d444 100644 --- a/Documentation/scsi/aic7xxx.txt +++ b/Documentation/scsi/aic7xxx.txt @@ -190,7 +190,7 @@ The following information is available in this file: INCORRECTLY CAN RENDER YOUR SYSTEM INOPERABLE. USE THEM WITH CAUTION. 
- Edit the file "modprobe.conf" in the directory /etc and add/edit a + Put a .conf file in the /etc/modprobe.d directory and add/edit a line containing 'options aic7xxx aic7xxx=[command[,command...]]' where 'command' is one or more of the following: ----------------------------------------------------------------- diff --git a/Documentation/scsi/bfa.txt b/Documentation/scsi/bfa.txt new file mode 100644 index 000000000000..f2d6e9d1791e --- /dev/null +++ b/Documentation/scsi/bfa.txt @@ -0,0 +1,82 @@ +Linux driver for Brocade FC/FCOE adapters + + +Supported Hardware +------------------ + +bfa 3.0.2.2 driver supports all Brocade FC/FCOE adapters. Below is a list of +adapter models with corresponding PCIIDs. + + PCIID Model + + 1657:0013:1657:0014 425 4Gbps dual port FC HBA + 1657:0013:1657:0014 825 8Gbps PCIe dual port FC HBA + 1657:0013:103c:1742 HP 82B 8Gbps PCIe dual port FC HBA + 1657:0013:103c:1744 HP 42B 4Gbps dual port FC HBA + 1657:0017:1657:0014 415 4Gbps single port FC HBA + 1657:0017:1657:0014 815 8Gbps single port FC HBA + 1657:0017:103c:1741 HP 41B 4Gbps single port FC HBA + 1657:0017:103c:1743 HP 81B 8Gbps single port FC HBA + 1657:0021:103c:1779 804 8Gbps FC HBA for HP Bladesystem c-class + + 1657:0014:1657:0014 1010 10Gbps single port CNA - FCOE + 1657:0014:1657:0014 1020 10Gbps dual port CNA - FCOE + 1657:0014:1657:0014 1007 10Gbps dual port CNA - FCOE + 1657:0014:1657:0014 1741 10Gbps dual port CNA - FCOE + + 1657:0022:1657:0024 1860 16Gbps FC HBA + 1657:0022:1657:0022 1860 10Gbps CNA - FCOE + + +Firmware download +----------------- + +The latest Firmware package for 3.0.2.2 bfa driver can be found at: + +http://www.brocade.com/services-support/drivers-downloads/adapters/Linux.page + +and then click the following respective util package link: + + Version Link + + v3.0.0.0 Linux Adapter Firmware package for RHEL 6.2, SLES 11SP2 + + +Configuration & Management utility download +------------------------------------------- + +The latest driver configuration &
management utility for 3.0.2.2 bfa driver can +be found at: + +http://www.brocade.com/services-support/drivers-downloads/adapters/Linux.page + +and then click the following respective util package link: + + Version Link + + v3.0.2.0 Linux Adapter Firmware package for RHEL 6.2, SLES 11SP2 + + +Documentation +------------- + +The latest Administration's Guide, Installation and Reference Manual, +Troubleshooting Guide, and Release Notes for the corresponding out-of-box +driver can be found at: + +http://www.brocade.com/services-support/drivers-downloads/adapters/Linux.page + +and use the following inbox and out-of-box driver version mapping to find +the corresponding documentation: + + Inbox Version Out-of-box Version + + v3.0.2.2 v3.0.0.0 + + +Support +------- + +For general product and support info, go to the Brocade website at: + +http://www.brocade.com/services-support/index.page diff --git a/Documentation/scsi/libsas.txt b/Documentation/scsi/libsas.txt index aa54f54c4a50..3cc9c7843e15 100644 --- a/Documentation/scsi/libsas.txt +++ b/Documentation/scsi/libsas.txt @@ -398,21 +398,6 @@ struct sas_task { task_done -- callback when the task has finished execution }; -When an external entity, entity other than the LLDD or the -SAS Layer, wants to work with a struct domain_device, it -_must_ call kobject_get() when getting a handle on the -device and kobject_put() when it is done with the device. - -This does two things: - A) implements proper kfree() for the device; - B) increments/decrements the kref for all players: - domain_device - all domain_device's ... (if past an expander) - port - host adapter - pci device - and up the ladder, etc. - DISCOVERY --------- diff --git a/Documentation/scsi/osst.txt b/Documentation/scsi/osst.txt index ad86c6d1e898..00c8ebb2fd18 100644 --- a/Documentation/scsi/osst.txt +++ b/Documentation/scsi/osst.txt @@ -66,7 +66,7 @@ recognized.
If you want to have the module autoloaded on access to /dev/osst, you may add something like alias char-major-206 osst -to your /etc/modprobe.conf (before 2.6: modules.conf). +to a file under /etc/modprobe.d/ directory. You may find it convenient to create a symbolic link ln -s nosst0 /dev/tape diff --git a/Documentation/scsi/scsi-generic.txt b/Documentation/scsi/scsi-generic.txt index 0a22ab8ea0c1..51be20a6a14d 100644 --- a/Documentation/scsi/scsi-generic.txt +++ b/Documentation/scsi/scsi-generic.txt @@ -62,7 +62,7 @@ There are two packages of sg utilities: and earlier Both packages will work in the lk 2.4 series however sg3_utils offers more capabilities. They can be found at: http://sg.danny.cz/sg/sg3_utils.html and -freshmeat.net +freecode.com Another approach is to look at the applications that use the sg driver. These include cdrecord, cdparanoia, SANE and cdrdao. diff --git a/Documentation/scsi/st.txt b/Documentation/scsi/st.txt index 691ca292c24d..685bf3582abe 100644 --- a/Documentation/scsi/st.txt +++ b/Documentation/scsi/st.txt @@ -390,6 +390,10 @@ MTSETDRVBUFFER MT_ST_SYSV sets the SYSV semantics (mode) MT_ST_NOWAIT enables immediate mode (i.e., don't wait for the command to finish) for some commands (e.g., rewind) + MT_ST_NOWAIT_EOF enables immediate filemark mode (i.e. when + writing a filemark, don't wait for it to complete). Please + see the BASICS note about MTWEOFI with respect to the + possible dangers of writing immediate filemarks. 
MT_ST_SILI enables setting the SILI bit in SCSI commands when reading in variable block mode to enhance performance when reading blocks shorter than the byte count; set this only diff --git a/Documentation/scsi/tmscsim.txt b/Documentation/scsi/tmscsim.txt index 61c0531e044a..3303d218b32e 100644 --- a/Documentation/scsi/tmscsim.txt +++ b/Documentation/scsi/tmscsim.txt @@ -102,7 +102,7 @@ So take at least the following measures: ftp://student.physik.uni-dortmund.de/pub/linux/kernel/bootdisk.gz One more warning: I used to overclock my PCI bus to 41.67 MHz. My Tekram -DC390F (Sym53c875) accepted this as well as my Millenium. But the Am53C974 +DC390F (Sym53c875) accepted this as well as my Millennium. But the Am53C974 produced errors and started to corrupt my disks. So don't do that! A 37.50 MHz PCI bus works for me, though, but I don't recommend using higher clocks than the 33.33 MHz being in the PCI spec. diff --git a/Documentation/scsi/ufs.txt b/Documentation/scsi/ufs.txt new file mode 100644 index 000000000000..41a6164592aa --- /dev/null +++ b/Documentation/scsi/ufs.txt @@ -0,0 +1,133 @@ + Universal Flash Storage + ======================= + + +Contents +-------- + +1. Overview +2. UFS Architecture Overview + 2.1 Application Layer + 2.2 UFS Transport Protocol(UTP) layer + 2.3 UFS Interconnect(UIC) Layer +3. UFSHCD Overview + 3.1 UFS controller initialization + 3.2 UTP Transfer requests + 3.3 UFS error handling + 3.4 SCSI Error handling + + +1. Overview +----------- + +Universal Flash Storage(UFS) is a storage specification for flash devices. +It is aimed to provide a universal storage interface for both +embedded and removable flash memory based storage in mobile +devices such as smart phones and tablet computers. The specification +is defined by JEDEC Solid State Technology Association. UFS is based +on MIPI M-PHY physical layer standard. UFS uses MIPI M-PHY as the +physical layer and MIPI Unipro as the link layer. 
+ +The main goals of UFS are to provide: + * Optimized performance: + For UFS version 1.0 and 1.1 the target performance is as follows, + Support for Gear1 is mandatory (rate A: 1248Mbps, rate B: 1457.6Mbps) + Support for Gear2 is optional (rate A: 2496Mbps, rate B: 2915.2Mbps) + Future versions of the standard, + Gear3 (rate A: 4992Mbps, rate B: 5830.4Mbps) + * Low power consumption + * High random IOPS and low latency + + +2. UFS Architecture Overview +---------------------------- + +UFS has a layered communication architecture which is based on the SCSI +SAM-5 architectural model. + +UFS communication architecture consists of the following layers: + +2.1 Application Layer + + The Application layer is composed of UFS command set layer(UCS), + Task Manager and Device manager. The UFS interface is designed to be + protocol agnostic, however SCSI has been selected as a baseline + protocol for versions 1.0 and 1.1 of UFS protocol layer. + UFS supports a subset of SCSI commands defined by SPC-4 and SBC-3. + * UCS: It handles SCSI commands supported by UFS specification. + * Task manager: It handles task management functions defined by the + UFS which are meant for command queue control. + * Device manager: It handles device level operations and device + configuration operations. Device level operations mainly involve + device power management operations and commands to Interconnect + layers. Device level configurations involve handling of query + requests which are used to modify and retrieve configuration + information of the device. + +2.2 UFS Transport Protocol(UTP) layer + + UTP layer provides services for + the higher layers through Service Access Points. UTP defines 3 + service access points for higher layers. + * UDM_SAP: Device manager service access point is exposed to device + manager for device level operations. These device level operations + are done through query requests.
+ * UTP_CMD_SAP: Command service access point is exposed to UFS command + set layer(UCS) to transport commands. + * UTP_TM_SAP: Task management service access point is exposed to task + manager to transport task management functions. + UTP transports messages through UFS protocol information unit(UPIU). + +2.3 UFS Interconnect(UIC) Layer + + UIC is the lowest layer of UFS layered architecture. It handles + connection between UFS host and UFS device. UIC consists of + MIPI UniPro and MIPI M-PHY. UIC provides 2 service access points + to upper layer, + * UIC_SAP: To transport UPIU between UFS host and UFS device. + * UIO_SAP: To issue commands to UniPro layers. + + +3. UFSHCD Overview +------------------ + +The UFS host controller driver is based on Linux SCSI Framework. +UFSHCD is a low level device driver which acts as an interface between +SCSI Midlayer and PCIe based UFS host controllers. + +The current UFSHCD implementation supports the following functionality: + +3.1 UFS controller initialization + + The initialization module brings UFS host controller to active state + and prepares the controller to transfer commands/response between + UFSHCD and UFS device. + +3.2 UTP Transfer requests + + Transfer request handling module of UFSHCD receives SCSI commands + from SCSI Midlayer, forms UPIUs and issues the UPIUs to UFS Host + controller. Also, the module decodes responses received from UFS + host controller in the form of UPIUs and informs the SCSI Midlayer + of the status of the command. + +3.3 UFS error handling + + Error handling module handles Host controller fatal errors, + Device fatal errors and UIC interconnect layer related errors. + +3.4 SCSI Error handling + + This is done through UFSHCD SCSI error handling routines registered + with SCSI Midlayer. Examples of some of the error handling commands + issued by SCSI Midlayer are Abort task, Lun reset and host reset.
+ UFSHCD Routines to perform these tasks are registered with + SCSI Midlayer through .eh_abort_handler, .eh_device_reset_handler and + .eh_host_reset_handler. + +In this version of UFSHCD Query requests and power management +functionality are not implemented. + +UFS Specifications can be found at, +UFS - http://www.jedec.org/sites/default/files/docs/JESD220.pdf +UFSHCI - http://www.jedec.org/sites/default/files/docs/JESD223.pdf diff --git a/Documentation/security/00-INDEX b/Documentation/security/00-INDEX index 99b85d39751c..eeed1de546d4 100644 --- a/Documentation/security/00-INDEX +++ b/Documentation/security/00-INDEX @@ -6,6 +6,8 @@ SELinux.txt - how to get started with the SELinux security enhancement. Smack.txt - documentation on the Smack Linux Security Module. +Yama.txt + - documentation on the Yama Linux Security Module. apparmor.txt - documentation on the AppArmor security extension. credentials.txt diff --git a/Documentation/security/Smack.txt b/Documentation/security/Smack.txt index e9dab41c0fe0..d2f72ae66432 100644 --- a/Documentation/security/Smack.txt +++ b/Documentation/security/Smack.txt @@ -536,6 +536,6 @@ writing a single character to the /smack/logging file : 3 : log denied & accepted Events are logged as 'key=value' pairs, for each event you at least will get -the subjet, the object, the rights requested, the action, the kernel function +the subject, the object, the rights requested, the action, the kernel function that triggered the event, plus other pairs depending on the type of event audited. diff --git a/Documentation/security/Yama.txt b/Documentation/security/Yama.txt new file mode 100644 index 000000000000..a9511f179069 --- /dev/null +++ b/Documentation/security/Yama.txt @@ -0,0 +1,65 @@ +Yama is a Linux Security Module that collects a number of system-wide DAC +security protections that are not handled by the core kernel itself. To +select it at boot time, specify "security=yama" (though this will disable +any other LSM). 
+ +Yama is controlled through sysctl in /proc/sys/kernel/yama: + +- ptrace_scope + +============================================================== + +ptrace_scope: + +As Linux grows in popularity, it will become a larger target for +malware. One particularly troubling weakness of the Linux process +interfaces is that a single user is able to examine the memory and +running state of any of their processes. For example, if one application +(e.g. Pidgin) was compromised, it would be possible for an attacker to +attach to other running processes (e.g. Firefox, SSH sessions, GPG agent, +etc) to extract additional credentials and continue to expand the scope +of their attack without resorting to user-assisted phishing. + +This is not a theoretical problem. SSH session hijacking +(http://www.storm.net.nz/projects/7) and arbitrary code injection +(http://c-skills.blogspot.com/2007/05/injectso.html) attacks already +exist and remain possible if ptrace is allowed to operate as before. +Since ptrace is not commonly used by non-developers and non-admins, system +builders should be allowed the option to disable this debugging system. + +For a solution, some applications use prctl(PR_SET_DUMPABLE, ...) to +specifically disallow such ptrace attachment (e.g. ssh-agent), but many +do not. A more general solution is to only allow ptrace directly from a +parent to a child process (i.e. direct "gdb EXE" and "strace EXE" still +work), or with CAP_SYS_PTRACE (i.e. "gdb --pid=PID", and "strace -p PID" +still work as root). + +For software that has defined application-specific relationships +between a debugging process and its inferior (crash handlers, etc), +prctl(PR_SET_PTRACER, pid, ...) can be used. An inferior can declare which +other process (and its descendants) are allowed to call PTRACE_ATTACH +against it. Only one such declared debugging process can exist for +each inferior at a time.
For example, this is used by KDE, Chromium, and +Firefox's crash handlers, and by Wine for allowing only Wine processes +to ptrace each other. If a process wishes to entirely disable these ptrace +restrictions, it can call prctl(PR_SET_PTRACER, PR_SET_PTRACER_ANY, ...) +so that any otherwise allowed process (even those in external pid namespaces) +may attach. + +The sysctl settings are: + +0 - classic ptrace permissions: a process can PTRACE_ATTACH to any other + process running under the same uid, as long as it is dumpable (i.e. + did not transition uids, start privileged, or have called + prctl(PR_SET_DUMPABLE...) already). + +1 - restricted ptrace: a process must have a predefined relationship + with the inferior it wants to call PTRACE_ATTACH on. By default, + this relationship is that of only its descendants when the above + classic criteria are also met. To change the relationship, an + inferior can call prctl(PR_SET_PTRACER, debugger, ...) to declare + an allowed debugger PID to call PTRACE_ATTACH on the inferior. + +The original children-only logic was based on the restrictions in grsecurity. + +============================================================== diff --git a/Documentation/security/keys-trusted-encrypted.txt b/Documentation/security/keys-trusted-encrypted.txt index c9e4855ed3d7..e105ae97a4f5 100644 --- a/Documentation/security/keys-trusted-encrypted.txt +++ b/Documentation/security/keys-trusted-encrypted.txt @@ -1,7 +1,7 @@ Trusted and Encrypted Keys Trusted and Encrypted Keys are two new key types added to the existing kernel -key ring service. Both of these new types are variable length symmetic keys, +key ring service. Both of these new types are variable length symmetric keys, and in both cases all keys are created in the kernel, and user space sees, stores, and loads only encrypted blobs.
Trusted Keys require the availability of a Trusted Platform Module (TPM) chip for greater security, while Encrypted diff --git a/Documentation/security/keys.txt b/Documentation/security/keys.txt index 4d75931d2d79..d389acd31e19 100644 --- a/Documentation/security/keys.txt +++ b/Documentation/security/keys.txt @@ -123,7 +123,7 @@ KEY SERVICE OVERVIEW The key service provides a number of features besides keys: - (*) The key service defines two special key types: + (*) The key service defines three special key types: (+) "keyring" @@ -137,6 +137,18 @@ The key service provides a number of features besides keys: blobs of data. These can be created, updated and read by userspace, and aren't intended for use by kernel services. + (+) "logon" + + Like a "user" key, a "logon" key has a payload that is an arbitrary + blob of data. It is intended as a place to store secrets which are + accessible to the kernel but not to userspace programs. + + The description can be arbitrary, but must be prefixed with a non-zero + length string that describes the key "subclass". The subclass is + separated from the rest of the description by a ':'. "logon" keys can + be created and updated from userspace, but the payload is only + readable from kernel space. + (*) Each process subscribes to three keyrings: a thread-specific keyring, a process-specific keyring, and a session-specific keyring. @@ -554,6 +566,10 @@ The keyctl syscall functions are: process must have write permission on the keyring, and it must be a keyring (or else error ENOTDIR will result). + This function can also be used to clear special kernel keyrings if they + are appropriately marked if the user has CAP_SYS_ADMIN capability. The + DNS resolver cache keyring is an example of this. 
+ (*) Link a key into a keyring: @@ -668,7 +684,7 @@ The keyctl syscall functions are: If the kernel calls back to userspace to complete the instantiation of a key, userspace should use this call mark the key as negative before the - invoked process returns if it is unable to fulfil the request. + invoked process returns if it is unable to fulfill the request. The process must have write access on the key to be able to instantiate it, and the key must be uninstantiated. diff --git a/Documentation/serial/computone.txt b/Documentation/serial/computone.txt index 39ddcdbeeb85..a6a1158ea2ba 100644 --- a/Documentation/serial/computone.txt +++ b/Documentation/serial/computone.txt @@ -49,7 +49,7 @@ Hardware - If you have an ISA card, find a free interrupt and io port. Note the hardware address from the Computone ISA cards installed into the system. These are required for editing ip2.c or editing - /etc/modprobe.conf, or for specification on the modprobe + /etc/modprobe.d/*.conf, or for specification on the modprobe command line. Note that the /etc/modules.conf should be used for older (pre-2.6) @@ -66,7 +66,7 @@ b) Run "make config" or "make menuconfig" or "make xconfig" c) Set address on ISA cards then: edit /usr/src/linux/drivers/char/ip2.c if needed or - edit /etc/modprobe.conf if needed (module). + edit config file in /etc/modprobe.d/ if needed (module). or both to match this setting. d) Run "make modules" e) Run "make modules_install" @@ -153,11 +153,11 @@ the irqs are not specified the driver uses the default in ip2.c (which selects polled mode). If no base addresses are specified the defaults in ip2.c are used. If you are autoloading the driver module with kerneld or kmod the base addresses and interrupt number must also be set in ip2.c -and recompile or just insert and options line in /etc/modprobe.conf or both. +and recompile or just insert and options line in /etc/modprobe.d/*.conf or both. 
The options line is equivalent to the command line and takes precedence over what is in ip2.c. -/etc/modprobe.conf sample: +config sample to put /etc/modprobe.d/*.conf: options ip2 io=1,0x328 irq=1,10 alias char-major-71 ip2 alias char-major-72 ip2 diff --git a/Documentation/serial/rocket.txt b/Documentation/serial/rocket.txt index 1d8582990435..60b039891057 100644 --- a/Documentation/serial/rocket.txt +++ b/Documentation/serial/rocket.txt @@ -62,7 +62,7 @@ in the system log at /var/log/messages. If installed as a module, the module must be loaded. This can be done manually by entering "modprobe rocket". To have the module loaded automatically -upon system boot, edit the /etc/modprobe.conf file and add the line +upon system boot, edit a /etc/modprobe.d/*.conf file and add the line "alias char-major-46 rocket". In order to use the ports, their device names (nodes) must be created with mknod. diff --git a/Documentation/serial/stallion.txt b/Documentation/serial/stallion.txt index 5c4902d9a5be..55090914a9c5 100644 --- a/Documentation/serial/stallion.txt +++ b/Documentation/serial/stallion.txt @@ -139,8 +139,8 @@ secondary address 0x280 and IRQ 10. You will probably want to enter this module load and configuration information into your system startup scripts so that the drivers are loaded and configured -on each system boot. Typically the start up script would be something like -/etc/modprobe.conf. +on each system boot. Typically configuration files are put in the +/etc/modprobe.d/ directory. 2.2 STATIC DRIVER CONFIGURATION: diff --git a/Documentation/sound/alsa/ALSA-Configuration.txt b/Documentation/sound/alsa/ALSA-Configuration.txt index 936699e4f04b..8c16d50f6cb6 100644 --- a/Documentation/sound/alsa/ALSA-Configuration.txt +++ b/Documentation/sound/alsa/ALSA-Configuration.txt @@ -860,7 +860,8 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. 
[Multiple options for each card instance] model - force the model name - position_fix - Fix DMA pointer (0 = auto, 1 = use LPIB, 2 = POSBUF) + position_fix - Fix DMA pointer (0 = auto, 1 = use LPIB, 2 = POSBUF, + 3 = VIACOMBO, 4 = COMBO) probe_mask - Bitmask to probe codecs (default = -1, meaning all slots) When the bit 8 (0x100) is set, the lower 8 bits are used as the "fixed" codec slots; i.e. the driver probes the @@ -925,6 +926,11 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. (Usually SD_LPIB register is more accurate than the position buffer.) + position_fix=3 is specific to VIA devices. The position + of the capture stream is checked from both LPIB and POSBUF + values. position_fix=4 is a combination mode, using LPIB + for playback and POSBUF for capture. + NB: If you get many "azx_get_response timeout" messages at loading, it's likely a problem of interrupts (e.g. ACPI irq routing). Try to boot with options like "pci=noacpi". Also, you @@ -1588,7 +1594,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. Module supports autoprobe a chip. - Note: the driver may have problems regarding endianess. + Note: the driver may have problems regarding endianness. The power-management is supported. @@ -2038,7 +2044,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. Install the necessary firmware files in alsa-firmware package. When no hotplug fw loader is available, you need to load the firmware via vxloader utility in alsa-tools package. To invoke - vxloader automatically, add the following to /etc/modprobe.conf + vxloader automatically, add the following to /etc/modprobe.d/alsa.conf install snd-vx222 /sbin/modprobe --first-time -i snd-vx222 && /usr/bin/vxloader @@ -2162,10 +2168,10 @@ corresponds to the card index of ALSA. Usually, define this as the same card module. 
An example configuration for a single emu10k1 card is like below: ------ /etc/modprobe.conf +----- /etc/modprobe.d/alsa.conf alias snd-card-0 snd-emu10k1 alias sound-slot-0 snd-emu10k1 ------ /etc/modprobe.conf +----- /etc/modprobe.d/alsa.conf The available number of auto-loaded sound cards depends on the module option "cards_limit" of snd module. As default it's set to 1. @@ -2178,7 +2184,7 @@ cards is kept consistent. An example configuration for two sound cards is like below: ------ /etc/modprobe.conf +----- /etc/modprobe.d/alsa.conf # ALSA portion options snd cards_limit=2 alias snd-card-0 snd-interwave @@ -2188,7 +2194,7 @@ options snd-ens1371 index=1 # OSS/Free portion alias sound-slot-0 snd-interwave alias sound-slot-1 snd-ens1371 ------ /etc/modprobe.conf +----- /etc/modprobe.d/alsa.conf In this example, the interwave card is always loaded as the first card (index 0) and ens1371 as the second (index 1). diff --git a/Documentation/sound/alsa/Audiophile-Usb.txt b/Documentation/sound/alsa/Audiophile-Usb.txt index a4c53d8961e1..654dd3b694a8 100644 --- a/Documentation/sound/alsa/Audiophile-Usb.txt +++ b/Documentation/sound/alsa/Audiophile-Usb.txt @@ -232,7 +232,7 @@ The parameter can be given: # modprobe snd-usb-audio index=1 device_setup=0x09 * Or while configuring the modules options in your modules configuration file - - For Fedora distributions, edit the /etc/modprobe.conf file: + (tipically a .conf file in /etc/modprobe.d/ directory: alias snd-card-1 snd-usb-audio options snd-usb-audio index=1 device_setup=0x09 @@ -253,7 +253,7 @@ CAUTION when initializing the device - first turn off the device - de-register the snd-usb-audio module (modprobe -r) - change the device_setup parameter by changing the device_setup - option in /etc/modprobe.conf + option in /etc/modprobe.d/*.conf - turn on the device * A workaround for this last issue has been applied to kernel 2.6.23, but it may not be enough to ensure the 'stability' of the device initialization. 
diff --git a/Documentation/sound/alsa/HD-Audio-Models.txt b/Documentation/sound/alsa/HD-Audio-Models.txt index c8c54544abc5..03f7897c6414 100644 --- a/Documentation/sound/alsa/HD-Audio-Models.txt +++ b/Documentation/sound/alsa/HD-Audio-Models.txt @@ -8,37 +8,10 @@ ALC880 5stack-digout 5-jack in back, 2-jack in front, a SPDIF out 6stack 6-jack in back, 2-jack in front 6stack-digout 6-jack with a SPDIF out - w810 3-jack - z71v 3-jack (HP shared SPDIF) - asus 3-jack (ASUS Mobo) - asus-w1v ASUS W1V - asus-dig ASUS with SPDIF out - asus-dig2 ASUS with SPDIF out (using GPIO2) - uniwill 3-jack - fujitsu Fujitsu Laptops (Pi1536) - F1734 2-jack - lg LG laptop (m1 express dual) - lg-lw LG LW20/LW25 laptop - tcl TCL S700 - clevo Clevo laptops (m520G, m665n) - medion Medion Rim 2150 - test for testing/debugging purpose, almost all controls can be - adjusted. Appearing only when compiled with - $CONFIG_SND_DEBUG=y - auto auto-config reading BIOS (default) ALC260 ====== - fujitsu Fujitsu S7020 - acer Acer TravelMate - will Will laptops (PB V7900) - replacer Replacer 672V - favorit100 Maxdata Favorit 100XS - basic fixed pin assignment (old default model) - test for testing/debugging purpose, almost all controls can - adjusted. 
Appearing only when compiled with - $CONFIG_SND_DEBUG=y - auto auto-config reading BIOS (default) + N/A ALC262 ====== @@ -70,55 +43,9 @@ ALC680 ALC882/883/885/888/889 ====================== - 3stack-dig 3-jack with SPDIF I/O - 6stack-dig 6-jack digital with SPDIF I/O - arima Arima W820Di1 - targa Targa T8, MSI-1049 T8 - asus-a7j ASUS A7J - asus-a7m ASUS A7M - macpro MacPro support - mb5 Macbook 5,1 - macmini3 Macmini 3,1 - mba21 Macbook Air 2,1 - mbp3 Macbook Pro rev3 - imac24 iMac 24'' with jack detection - imac91 iMac 9,1 - w2jc ASUS W2JC - 3stack-2ch-dig 3-jack with SPDIF I/O (ALC883) - alc883-6stack-dig 6-jack digital with SPDIF I/O (ALC883) - 3stack-6ch 3-jack 6-channel - 3stack-6ch-dig 3-jack 6-channel with SPDIF I/O - 6stack-dig-demo 6-jack digital for Intel demo board - acer Acer laptops (Travelmate 3012WTMi, Aspire 5600, etc) - acer-aspire Acer Aspire 9810 - acer-aspire-4930g Acer Aspire 4930G - acer-aspire-6530g Acer Aspire 6530G - acer-aspire-7730g Acer Aspire 7730G - acer-aspire-8930g Acer Aspire 8930G - medion Medion Laptops - targa-dig Targa/MSI - targa-2ch-dig Targa/MSI with 2-channel - targa-8ch-dig Targa/MSI with 8-channel (MSI GX620) - laptop-eapd 3-jack with SPDIF I/O and EAPD (Clevo M540JE, M550JE) - lenovo-101e Lenovo 101E - lenovo-nb0763 Lenovo NB0763 - lenovo-ms7195-dig Lenovo MS7195 - lenovo-sky Lenovo Sky - haier-w66 Haier W66 - 3stack-hp HP machines with 3stack (Lucknow, Samba boards) - 6stack-dell Dell machines with 6stack (Inspiron 530) - mitac Mitac 8252D - clevo-m540r Clevo M540R (6ch + digital) - clevo-m720 Clevo M720 laptop series - fujitsu-pi2515 Fujitsu AMILO Pi2515 - fujitsu-xa3530 Fujitsu AMILO XA3530 - 3stack-6ch-intel Intel DG33* boards - intel-alc889a Intel IbexPeak with ALC889A - intel-x58 Intel DX58 with ALC889 - asus-p5q ASUS P5Q-EM boards - mb31 MacBook 3,1 - sony-vaio-tt Sony VAIO TT - auto auto-config reading BIOS (default) + acer-aspire-4930g Acer Aspire 4930G/5930G/6530G/6930G/7730G + acer-aspire-8930g Acer Aspire 
8330G/6935G + acer-aspire Acer Aspire others ALC861/660 ========== diff --git a/Documentation/sound/alsa/HD-Audio.txt b/Documentation/sound/alsa/HD-Audio.txt index 91fee3b45fb8..7813c06a5c71 100644 --- a/Documentation/sound/alsa/HD-Audio.txt +++ b/Documentation/sound/alsa/HD-Audio.txt @@ -59,7 +59,12 @@ a case, you can change the default method via `position_fix` option. `position_fix=1` means to use LPIB method explicitly. `position_fix=2` means to use the position-buffer. `position_fix=3` means to use a combination of both methods, needed -for some VIA and ATI controllers. 0 is the default value for all other +for some VIA controllers. The capture stream position is corrected +by comparing both LPIB and position-buffer values. +`position_fix=4` is another combination available for all controllers, +and uses LPIB for the playback and the position-buffer for the capture +streams. +0 is the default value for all other controllers, the automatic check and fallback to LPIB as described in the above. If you get a problem of repeated sounds, this option might help. diff --git a/Documentation/sound/alsa/MIXART.txt b/Documentation/sound/alsa/MIXART.txt index ef42c44fa1f2..4ee35b4fbe4a 100644 --- a/Documentation/sound/alsa/MIXART.txt +++ b/Documentation/sound/alsa/MIXART.txt @@ -76,9 +76,9 @@ FIRMWARE when CONFIG_FW_LOADER is set. The mixartloader is necessary only for older versions or when you build the driver into kernel.] -For loading the firmware automatically after the module is loaded, use -the post-install command. For example, add the following entry to -/etc/modprobe.conf for miXart driver: +For loading the firmware automatically after the module is loaded, use a +install command. 
For example, add the following entry to +/etc/modprobe.d/mixart.conf for miXart driver: install snd-mixart /sbin/modprobe --first-time -i snd-mixart && \ /usr/bin/mixartloader diff --git a/Documentation/sound/alsa/OSS-Emulation.txt b/Documentation/sound/alsa/OSS-Emulation.txt index 022aaeb0e9dd..152ca2a3f1bd 100644 --- a/Documentation/sound/alsa/OSS-Emulation.txt +++ b/Documentation/sound/alsa/OSS-Emulation.txt @@ -19,7 +19,7 @@ the card number and the minor unit number. Usually you don't have to define these aliases by yourself. Only necessary step for auto-loading of OSS modules is to define the -card alias in /etc/modprobe.conf, such as +card alias in /etc/modprobe.d/alsa.conf, such as alias sound-slot-0 snd-emu10k1 diff --git a/Documentation/sound/oss/AudioExcelDSP16 b/Documentation/sound/oss/AudioExcelDSP16 index e0dc0641b480..ea8549faede9 100644 --- a/Documentation/sound/oss/AudioExcelDSP16 +++ b/Documentation/sound/oss/AudioExcelDSP16 @@ -41,7 +41,7 @@ mpu_base I/O base address for activate MPU-401 mode (0x300, 0x310, 0x320 or 0x330) mpu_irq MPU-401 irq line (5, 7, 9, 10 or 0) -The /etc/modprobe.conf will have lines like this: +A configuration file in /etc/modprobe.d/ directory will have lines like this: options opl3 io=0x388 options ad1848 io=0x530 irq=11 dma=3 @@ -51,11 +51,11 @@ Where the aedsp16 options are the options for this driver while opl3 and ad1848 are the corresponding options for the MSS and OPL3 modules. Loading MSS and OPL3 needs to pre load the aedsp16 module to set up correctly -the sound card. Installation dependencies must be written in the modprobe.conf -file: +the sound card. 
Installation dependencies must be written in configuration +files under /etc/modprobe.d/ directory: -install ad1848 /sbin/modprobe aedsp16 && /sbin/modprobe -i ad1848 -install opl3 /sbin/modprobe aedsp16 && /sbin/modprobe -i opl3 +softdep ad1848 pre: aedsp16 +softdep opl3 pre: aedsp16 Then you must load the sound modules stack in this order: sound -> aedsp16 -> [ ad1848, opl3 ] diff --git a/Documentation/sound/oss/CMI8330 b/Documentation/sound/oss/CMI8330 index 9c439f1a6dba..8a5fd1611c6f 100644 --- a/Documentation/sound/oss/CMI8330 +++ b/Documentation/sound/oss/CMI8330 @@ -143,11 +143,10 @@ CONFIG_SOUND_MSS=m -Alma Chao <elysian@ethereal.torsion.org> suggests the following /etc/modprobe.conf: +Alma Chao <elysian@ethereal.torsion.org> suggests the following in +a /etc/modprobe.d/*conf file: alias sound ad1848 alias synth0 opl3 options ad1848 io=0x530 irq=7 dma=0 soundpro=1 options opl3 io=0x388 - - diff --git a/Documentation/sound/oss/Introduction b/Documentation/sound/oss/Introduction index 75d967ff9266..42da2d8fa372 100644 --- a/Documentation/sound/oss/Introduction +++ b/Documentation/sound/oss/Introduction @@ -167,8 +167,8 @@ in a file such as /root/soundon.sh. MODPROBE: ========= -If loading via modprobe, these common files are automatically loaded -when requested by modprobe. For example, my /etc/modprobe.conf contains: +If loading via modprobe, these common files are automatically loaded when +requested by modprobe. For example, my /etc/modprobe.d/oss.conf contains: alias sound sb options sb io=0x240 irq=9 dma=3 dma16=5 mpu_io=0x300 @@ -228,7 +228,7 @@ http://www.opensound.com. Before loading the commercial sound driver, you should do the following: 1. remove sound modules (detailed above) -2. remove the sound modules from /etc/modprobe.conf +2. remove the sound modules from /etc/modprobe.d/*.conf 3. 
move the sound modules from /lib/modules/<kernel>/misc (for example, I make a /lib/modules/<kernel>/misc/tmp directory and copy the sound module files to that @@ -265,7 +265,7 @@ twice, you need to do the following: sb.o could be copied (or symlinked) to sb1.o for the second SoundBlaster. -2. Make a second entry in /etc/modprobe.conf, for example, +2. Make a second entry in /etc/modprobe.d/*conf, for example, sound1 or sb1. This second entry should refer to the new module names for example sb1, and should include the I/O, etc. for the second sound card. @@ -369,7 +369,7 @@ There are several ways of configuring your sound: 2) On the command line when using insmod or in a bash script using command line calls to load sound. -3) In /etc/modprobe.conf when using modprobe. +3) In /etc/modprobe.d/*conf when using modprobe. 4) Via Red Hat's GPL'd /usr/sbin/sndconfig program (text based). diff --git a/Documentation/sound/oss/Opti b/Documentation/sound/oss/Opti index c15af3c07d46..4cd5d9ab3580 100644 --- a/Documentation/sound/oss/Opti +++ b/Documentation/sound/oss/Opti @@ -18,7 +18,7 @@ force the card into a mode in which it can be programmed. If you have another OS installed on your computer it is recommended that Linux and the other OS use the same resources. -Also, it is recommended that resources specified in /etc/modprobe.conf +Also, it is recommended that resources specified in /etc/modprobe.d/*.conf and resources specified in /etc/isapnp.conf agree. Compiling the sound driver @@ -67,11 +67,7 @@ address is hard-coded into the driver. Using kmod and autoloading the sound driver ------------------------------------------- -Comment: as of linux-2.1.90 kmod is replacing kerneld. -The config file '/etc/modprobe.conf' is used as before. - -This is the sound part of my /etc/modprobe.conf file. -Following that I will explain each line. 
+Config files in '/etc/modprobe.d/' are used as below: alias mixer0 mad16 alias audio0 mad16 diff --git a/Documentation/sound/oss/PAS16 b/Documentation/sound/oss/PAS16 index 3dca4b75988e..5c27229eec8c 100644 --- a/Documentation/sound/oss/PAS16 +++ b/Documentation/sound/oss/PAS16 @@ -128,7 +128,7 @@ CONFIG_SOUND_YM3812 You can then get OPL3 functionality by issuing the command: insmod opl3 In addition, you must either add the following line to - /etc/modprobe.conf: + /etc/modprobe.d/*.conf: options opl3 io=0x388 or else add the following line to /etc/lilo.conf: opl3=0x388 @@ -158,5 +158,5 @@ following line would be appropriate: append="pas2=0x388,10,3,-1,0,-1,-1,-1 opl3=0x388" If sound is built totally modular, the above options may be -specified in /etc/modprobe.conf for pas2, sb and opl3 +specified in /etc/modprobe.d/*.conf for pas2, sb and opl3 respectively. diff --git a/Documentation/sound/oss/README.modules b/Documentation/sound/oss/README.modules index e691d74e1e5e..cdc039421a46 100644 --- a/Documentation/sound/oss/README.modules +++ b/Documentation/sound/oss/README.modules @@ -26,7 +26,7 @@ Note that it is no longer necessary or possible to configure sound in the drivers/sound dir. Now one simply configures and makes one's kernel and modules in the usual way. - Then, add to your /etc/modprobe.conf something like: + Then, add to your /etc/modprobe.d/oss.conf something like: alias char-major-14-* sb install sb /sbin/modprobe -i sb && /sbin/modprobe adlib_card @@ -36,7 +36,7 @@ options adlib_card io=0x388 # FM synthesizer Alternatively, if you have compiled in kernel level ISAPnP support: alias char-major-14 sb -post-install sb /sbin/modprobe "-k" "adlib_card" +softdep sb post: adlib_card options adlib_card io=0x388 The effect of this is that the sound driver and all necessary bits and @@ -66,12 +66,12 @@ args are expected. 
Note that at present there is no way to configure the io, irq and other parameters for the modular drivers as one does for the wired drivers.. One needs to pass the modules the necessary parameters as arguments, either -with /etc/modprobe.conf or with command-line args to modprobe, e.g. +with /etc/modprobe.d/*.conf or with command-line args to modprobe, e.g. modprobe sb io=0x220 irq=7 dma=1 dma16=5 mpu_io=0x330 modprobe adlib_card io=0x388 - recommend using /etc/modprobe.conf. + recommend using /etc/modprobe.d/*.conf. Persistent DMA Buffers: @@ -89,7 +89,7 @@ wasteful of RAM, but it guarantees that sound always works. To make the sound driver use persistent DMA buffers we need to pass the sound.o module a "dmabuf=1" command-line argument. This is normally done -in /etc/modprobe.conf like so: +in /etc/modprobe.d/*.conf files like so: options sound dmabuf=1 diff --git a/Documentation/spi/spi-summary b/Documentation/spi/spi-summary index 4884cb33845d..7312ec14dd89 100644 --- a/Documentation/spi/spi-summary +++ b/Documentation/spi/spi-summary @@ -1,7 +1,7 @@ Overview of Linux kernel SPI support ==================================== -21-May-2007 +02-Feb-2012 What is SPI? ------------ @@ -483,9 +483,9 @@ also initialize its own internal state. (See below about bus numbering and those methods.) After you initialize the spi_master, then use spi_register_master() to -publish it to the rest of the system. At that time, device nodes for -the controller and any predeclared spi devices will be made available, -and the driver model core will take care of binding them to drivers. +publish it to the rest of the system. At that time, device nodes for the +controller and any predeclared spi devices will be made available, and +the driver model core will take care of binding them to drivers. If you need to remove your SPI controller driver, spi_unregister_master() will reverse the effect of spi_register_master(). 
@@ -521,21 +521,53 @@ SPI MASTER METHODS ** When you code setup(), ASSUME that the controller ** is actively processing transfers for another device. - master->transfer(struct spi_device *spi, struct spi_message *message) - This must not sleep. Its responsibility is arrange that the - transfer happens and its complete() callback is issued. The two - will normally happen later, after other transfers complete, and - if the controller is idle it will need to be kickstarted. - master->cleanup(struct spi_device *spi) Your controller driver may use spi_device.controller_state to hold state it dynamically associates with that device. If you do that, be sure to provide the cleanup() method to free that state. + master->prepare_transfer_hardware(struct spi_master *master) + This will be called by the queue mechanism to signal to the driver + that a message is coming in soon, so the subsystem requests the + driver to prepare the transfer hardware by issuing this call. + This may sleep. + + master->unprepare_transfer_hardware(struct spi_master *master) + This will be called by the queue mechanism to signal to the driver + that there are no more messages pending in the queue and it may + relax the hardware (e.g. by power management calls). This may sleep. + + master->transfer_one_message(struct spi_master *master, + struct spi_message *mesg) + The subsystem calls the driver to transfer a single message while + queuing transfers that arrive in the meantime. When the driver is + finished with this message, it must call + spi_finalize_current_message() so the subsystem can issue the next + transfer. This may sleep. + + DEPRECATED METHODS + + master->transfer(struct spi_device *spi, struct spi_message *message) + This must not sleep. Its responsibility is arrange that the + transfer happens and its complete() callback is issued. The two + will normally happen later, after other transfers complete, and + if the controller is idle it will need to be kickstarted. 
This + method is not used on queued controllers and must be NULL if + transfer_one_message() and (un)prepare_transfer_hardware() are + implemented. + SPI MESSAGE QUEUE -The bulk of the driver will be managing the I/O queue fed by transfer(). +If you are happy with the standard queueing mechanism provided by the +SPI subsystem, just implement the queued methods specified above. Using +the message queue has the upside of centralizing a lot of code and +providing pure process-context execution of methods. The message queue +can also be elevated to realtime priority on high-priority SPI traffic. + +Unless the queueing mechanism in the SPI subsystem is selected, the bulk +of the driver will be managing the I/O queue fed by the now deprecated +function transfer(). That queue could be purely conceptual. For example, a driver used only for low-frequency sensor access might be fine using synchronous PIO. @@ -561,4 +593,6 @@ Stephen Street Mark Underwood Andrew Victor Vitaly Wool - +Grant Likely +Mark Brown +Linus Walleij diff --git a/Documentation/static-keys.txt b/Documentation/static-keys.txt new file mode 100644 index 000000000000..d93f3c00f245 --- /dev/null +++ b/Documentation/static-keys.txt @@ -0,0 +1,286 @@ + Static Keys + ----------- + +By: Jason Baron <jbaron@redhat.com> + +0) Abstract + +Static keys allow the inclusion of seldom used features in +performance-sensitive fast-path kernel code, via a GCC feature and a code +patching technique. A quick example: + + struct static_key key = STATIC_KEY_INIT_FALSE; + + ... + + if (static_key_false(&key)) + do unlikely code + else + do likely code + + ... + static_key_slow_inc(&key); + ... + static_key_slow_dec(&key); + ... + +The static_key_false() branch will be generated into the code with as little +impact to the likely code path as possible. + + +1) Motivation + + +Currently, tracepoints are implemented using a conditional branch. The +conditional check requires checking a global variable for each tracepoint.
+Although the overhead of this check is small, it increases when the memory +cache comes under pressure (memory cache lines for these global variables may +be shared with other memory accesses). As we increase the number of tracepoints +in the kernel this overhead may become more of an issue. In addition, +tracepoints are often dormant (disabled) and provide no direct kernel +functionality. Thus, it is highly desirable to reduce their impact as much as +possible. Although tracepoints are the original motivation for this work, other +kernel code paths should be able to make use of the static keys facility. + + +2) Solution + + +gcc (v4.5) adds a new 'asm goto' statement that allows branching to a label: + +http://gcc.gnu.org/ml/gcc-patches/2009-07/msg01556.html + +Using the 'asm goto', we can create branches that are either taken or not taken +by default, without the need to check memory. Then, at run-time, we can patch +the branch site to change the branch direction. + +For example, if we have a simple branch that is disabled by default: + + if (static_key_false(&key)) + printk("I am the true branch\n"); + +Thus, by default the 'printk' will not be emitted. And the code generated will +consist of a single atomic 'no-op' instruction (5 bytes on x86), in the +straight-line code path. When the branch is 'flipped', we will patch the +'no-op' in the straight-line codepath with a 'jump' instruction to the +out-of-line true branch. Thus, changing branch direction is expensive but +branch selection is basically 'free'. That is the basic tradeoff of this +optimization. + +This lowlevel patching mechanism is called 'jump label patching', and it gives +the basis for the static keys facility. 
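The patching step described in section 2 can be sketched in userspace as a plain byte-buffer rewrite. This is only an illustration under stated assumptions: toy_jump_label_transform(), toy_nop5 and TOY_JUMP_SIZE are hypothetical names, the NOP bytes are one common 5-byte x86 encoding, and the code only edits an ordinary array. The real arch_jump_label_transform() also has to update live kernel text safely, which this sketch does not attempt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_JUMP_SIZE 5	/* hypothetical name; 5 bytes as on x86 */

/* One common 5-byte x86 NOP encoding: nopl 0x0(%rax,%rax,1). */
static const uint8_t toy_nop5[TOY_JUMP_SIZE] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

/*
 * Hypothetical model of flipping one branch site.
 *
 * site        - buffer holding the 5 bytes at the branch site
 * site_addr   - address the site would have in the running image
 * target_addr - address of the out-of-line "true" branch
 * enable      - nonzero: write 'jmp rel32'; zero: write the 5-byte NOP
 */
static void toy_jump_label_transform(uint8_t *site, uint64_t site_addr,
				     uint64_t target_addr, int enable)
{
	assert(site != NULL);

	if (!enable) {
		memcpy(site, toy_nop5, TOY_JUMP_SIZE);
		return;
	}

	/* rel32 is relative to the end of the 5-byte jmp instruction. */
	uint32_t rel = (uint32_t)(target_addr - (site_addr + TOY_JUMP_SIZE));

	site[0] = 0xe9;				/* jmp rel32 opcode */
	site[1] = (uint8_t)(rel & 0xff);	/* little-endian displacement */
	site[2] = (uint8_t)((rel >> 8) & 0xff);
	site[3] = (uint8_t)((rel >> 16) & 0xff);
	site[4] = (uint8_t)((rel >> 24) & 0xff);
}
```

With the sys_getppid addresses listed in section 5 (branch site at ffffffff81044294, true branch at ffffffff810442c0), the displacement would be 0x2c0 - 0x299 = 0x27, so enabling the branch in this model writes the bytes e9 27 00 00 00 over the site.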
+ +3) Static key label API, usage and examples: + + +In order to make use of this optimization you must first define a key: + + struct static_key key; + +Which is initialized as: + + struct static_key key = STATIC_KEY_INIT_TRUE; + +or: + + struct static_key key = STATIC_KEY_INIT_FALSE; + +If the key is not initialized, it is default false. The 'struct static_key', +must be a 'global'. That is, it can't be allocated on the stack or dynamically +allocated at run-time. + +The key is then used in code as: + + if (static_key_false(&key)) + do unlikely code + else + do likely code + +Or: + + if (static_key_true(&key)) + do likely code + else + do unlikely code + +A key that is initialized via 'STATIC_KEY_INIT_FALSE', must be used in a +'static_key_false()' construct. Likewise, a key initialized via +'STATIC_KEY_INIT_TRUE' must be used in a 'static_key_true()' construct. A +single key can be used in many branches, but all the branches must match the +way that the key has been initialized. + +The branch(es) can then be switched via: + + static_key_slow_inc(&key); + ... + static_key_slow_dec(&key); + +Thus, 'static_key_slow_inc()' means 'make the branch true', and +'static_key_slow_dec()' means 'make the branch false' with appropriate +reference counting. For example, if the key is initialized true, a +static_key_slow_dec(), will switch the branch to false. And a subsequent +static_key_slow_inc(), will change the branch back to true. Likewise, if the +key is initialized false, a 'static_key_slow_inc()', will change the branch to +true. And then a 'static_key_slow_dec()', will again make the branch false.
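The reference-counting behaviour of the inc/dec pair can be modelled with a small userspace sketch. Everything here is a hypothetical stand-in (struct toy_key, TOY_KEY_INIT_*, toy_key_slow_inc(), toy_key_slow_dec()), not the kernel implementation. It assumes the branch is true while the count is above zero and that the branch sites are only re-patched when the count moves between 0 and 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for 'struct static_key'. */
struct toy_key {
	int enabled;	/* reference count; branch is true while > 0 */
	int patches;	/* how often the branch sites would be re-patched */
};

#define TOY_KEY_INIT_TRUE	{ .enabled = 1, .patches = 0 }
#define TOY_KEY_INIT_FALSE	{ .enabled = 0, .patches = 0 }

/* Current branch direction in this model. */
static bool toy_key_enabled(const struct toy_key *key)
{
	return key->enabled > 0;
}

/* Take a reference; only the 0 -> 1 transition flips the branch. */
static void toy_key_slow_inc(struct toy_key *key)
{
	if (key->enabled++ == 0)
		key->patches++;
}

/* Drop a reference; only the 1 -> 0 transition flips the branch. */
static void toy_key_slow_dec(struct toy_key *key)
{
	assert(key->enabled > 0);	/* unbalanced dec is a caller bug */
	if (--key->enabled == 0)
		key->patches++;
}
```

In this model, two users that each call toy_key_slow_inc() on a false key cause a single patch, and the branch stays true until both have called toy_key_slow_dec(). That is the sense in which the expensive patching is paid only on the first enable and the last disable.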
+ +An example usage in the kernel is the implementation of tracepoints: + + static inline void trace_##name(proto) \ + { \ + if (static_key_false(&__tracepoint_##name.key)) \ + __DO_TRACE(&__tracepoint_##name, \ + TP_PROTO(data_proto), \ + TP_ARGS(data_args), \ + TP_CONDITION(cond)); \ + } + +Tracepoints are disabled by default, and can be placed in performance critical +pieces of the kernel. Thus, by using a static key, the tracepoints can have +absolutely minimal impact when not in use. + + +4) Architecture level code patching interface, 'jump labels' + + +There are a few functions and macros that architectures must implement in order +to take advantage of this optimization. If there is no architecture support, we +simply fall back to a traditional, load, test, and jump sequence. + +* select HAVE_ARCH_JUMP_LABEL, see: arch/x86/Kconfig + +* #define JUMP_LABEL_NOP_SIZE, see: arch/x86/include/asm/jump_label.h + +* __always_inline bool arch_static_branch(struct static_key *key), see: + arch/x86/include/asm/jump_label.h + +* void arch_jump_label_transform(struct jump_entry *entry, enum jump_label_type type), + see: arch/x86/kernel/jump_label.c + +* __init_or_module void arch_jump_label_transform_static(struct jump_entry *entry, enum jump_label_type type), + see: arch/x86/kernel/jump_label.c + + +* struct jump_entry, see: arch/x86/include/asm/jump_label.h + + +5) Static keys / jump label analysis, results (x86_64): + + +As an example, let's add the following branch to 'getppid()', such that the +system call now looks like: + +SYSCALL_DEFINE0(getppid) +{ + int pid; + ++ if (static_key_false(&key)) ++ printk("I am the true branch\n"); + + rcu_read_lock(); + pid = task_tgid_vnr(rcu_dereference(current->real_parent)); + rcu_read_unlock(); + + return pid; +} + +The resulting instructions with jump labels generated by GCC is: + +ffffffff81044290 <sys_getppid>: +ffffffff81044290: 55 push %rbp +ffffffff81044291: 48 89 e5 mov %rsp,%rbp +ffffffff81044294: e9 00 00 00 00 jmpq 
ffffffff81044299 <sys_getppid+0x9> +ffffffff81044299: 65 48 8b 04 25 c0 b6 mov %gs:0xb6c0,%rax +ffffffff810442a0: 00 00 +ffffffff810442a2: 48 8b 80 80 02 00 00 mov 0x280(%rax),%rax +ffffffff810442a9: 48 8b 80 b0 02 00 00 mov 0x2b0(%rax),%rax +ffffffff810442b0: 48 8b b8 e8 02 00 00 mov 0x2e8(%rax),%rdi +ffffffff810442b7: e8 f4 d9 00 00 callq ffffffff81051cb0 <pid_vnr> +ffffffff810442bc: 5d pop %rbp +ffffffff810442bd: 48 98 cltq +ffffffff810442bf: c3 retq +ffffffff810442c0: 48 c7 c7 e3 54 98 81 mov $0xffffffff819854e3,%rdi +ffffffff810442c7: 31 c0 xor %eax,%eax +ffffffff810442c9: e8 71 13 6d 00 callq ffffffff8171563f <printk> +ffffffff810442ce: eb c9 jmp ffffffff81044299 <sys_getppid+0x9> + +Without the jump label optimization it looks like: + +ffffffff810441f0 <sys_getppid>: +ffffffff810441f0: 8b 05 8a 52 d8 00 mov 0xd8528a(%rip),%eax # ffffffff81dc9480 <key> +ffffffff810441f6: 55 push %rbp +ffffffff810441f7: 48 89 e5 mov %rsp,%rbp +ffffffff810441fa: 85 c0 test %eax,%eax +ffffffff810441fc: 75 27 jne ffffffff81044225 <sys_getppid+0x35> +ffffffff810441fe: 65 48 8b 04 25 c0 b6 mov %gs:0xb6c0,%rax +ffffffff81044205: 00 00 +ffffffff81044207: 48 8b 80 80 02 00 00 mov 0x280(%rax),%rax +ffffffff8104420e: 48 8b 80 b0 02 00 00 mov 0x2b0(%rax),%rax +ffffffff81044215: 48 8b b8 e8 02 00 00 mov 0x2e8(%rax),%rdi +ffffffff8104421c: e8 2f da 00 00 callq ffffffff81051c50 <pid_vnr> +ffffffff81044221: 5d pop %rbp +ffffffff81044222: 48 98 cltq +ffffffff81044224: c3 retq +ffffffff81044225: 48 c7 c7 13 53 98 81 mov $0xffffffff81985313,%rdi +ffffffff8104422c: 31 c0 xor %eax,%eax +ffffffff8104422e: e8 60 0f 6d 00 callq ffffffff81715193 <printk> +ffffffff81044233: eb c9 jmp ffffffff810441fe <sys_getppid+0xe> +ffffffff81044235: 66 66 2e 0f 1f 84 00 data32 nopw %cs:0x0(%rax,%rax,1) +ffffffff8104423c: 00 00 00 00 + +Thus, the disable jump label case adds a 'mov', 'test' and 'jne' instruction +vs. the jump label case just has a 'no-op' or 'jmp 0'. 
(The jmp 0, is patched +to a 5 byte atomic no-op instruction at boot-time.) Thus, the disabled jump +label case adds: + +6 (mov) + 2 (test) + 2 (jne) = 10 - 5 (5 byte jump 0) = 5 additional bytes. + +If we then include the padding bytes, the jump label code saves 16 total bytes +of instruction memory for this small function. In this case the non-jump label +function is 80 bytes long. Thus, we have saved 20% of the instruction +footprint. We can in fact improve this even further, since the 5-byte no-op +really can be a 2-byte no-op since we can reach the branch with a 2-byte jmp. +However, we have not yet implemented optimal no-op sizes (they are currently +hard-coded). + +Since there are a number of static key API uses in the scheduler paths, +'pipe-test' (also known as 'perf bench sched pipe') can be used to show the +performance improvement. Testing done on 3.3.0-rc2: + +jump label disabled: + + Performance counter stats for 'bash -c /tmp/pipe-test' (50 runs): + + 855.700314 task-clock # 0.534 CPUs utilized ( +- 0.11% ) + 200,003 context-switches # 0.234 M/sec ( +- 0.00% ) + 0 CPU-migrations # 0.000 M/sec ( +- 39.58% ) + 487 page-faults # 0.001 M/sec ( +- 0.02% ) + 1,474,374,262 cycles # 1.723 GHz ( +- 0.17% ) + <not supported> stalled-cycles-frontend + <not supported> stalled-cycles-backend + 1,178,049,567 instructions # 0.80 insns per cycle ( +- 0.06% ) + 208,368,926 branches # 243.507 M/sec ( +- 0.06% ) + 5,569,188 branch-misses # 2.67% of all branches ( +- 0.54% ) + + 1.601607384 seconds time elapsed ( +- 0.07% ) + +jump label enabled: + + Performance counter stats for 'bash -c /tmp/pipe-test' (50 runs): + + 841.043185 task-clock # 0.533 CPUs utilized ( +- 0.12% ) + 200,004 context-switches # 0.238 M/sec ( +- 0.00% ) + 0 CPU-migrations # 0.000 M/sec ( +- 40.87% ) + 487 page-faults # 0.001 M/sec ( +- 0.05% ) + 1,432,559,428 cycles # 1.703 GHz ( +- 0.18% ) + <not supported> stalled-cycles-frontend + <not supported> stalled-cycles-backend + 1,175,363,994
instructions # 0.82 insns per cycle ( +- 0.04% ) + 206,859,359 branches # 245.956 M/sec ( +- 0.04% ) + 4,884,119 branch-misses # 2.36% of all branches ( +- 0.85% ) + + 1.579384366 seconds time elapsed + +The percentage of saved branches is .7%, and we've saved 12% on +'branch-misses'. This is where we would expect to get the most savings, since +this optimization is about reducing the number of branches. In addition, we've +saved .2% on instructions, and 2.8% on cycles and 1.4% on elapsed time. diff --git a/Documentation/sysrq.txt b/Documentation/sysrq.txt index 312e3754e8c5..642f84495b29 100644 --- a/Documentation/sysrq.txt +++ b/Documentation/sysrq.txt @@ -241,9 +241,8 @@ command you are interested in. * I have more questions, who can I ask? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -And I'll answer any questions about the registration system you got, also -responding as soon as possible. - -Crutcher +Just ask them on the linux-kernel mailing list: + linux-kernel@vger.kernel.org * Credits ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/Documentation/target/tcm_mod_builder.py b/Documentation/target/tcm_mod_builder.py index 6e21b8b52638..a78879b01f09 100755 --- a/Documentation/target/tcm_mod_builder.py +++ b/Documentation/target/tcm_mod_builder.py @@ -775,7 +775,7 @@ def tcm_mod_dump_fabric_ops(proto_ident, fabric_mod_dir_var, fabric_mod_name): buf += " struct " + fabric_mod_name + "_nacl *nacl;\n\n" buf += " nacl = kzalloc(sizeof(struct " + fabric_mod_name + "_nacl), GFP_KERNEL);\n" buf += " if (!nacl) {\n" - buf += " printk(KERN_ERR \"Unable to alocate struct " + fabric_mod_name + "_nacl\\n\");\n" + buf += " printk(KERN_ERR \"Unable to allocate struct " + fabric_mod_name + "_nacl\\n\");\n" buf += " return NULL;\n" buf += " }\n\n" buf += " return &nacl->se_node_acl;\n" diff --git a/Documentation/trace/events-power.txt b/Documentation/trace/events-power.txt index 96d87b67fe37..cf794af22855 100644 --- a/Documentation/trace/events-power.txt +++ 
b/Documentation/trace/events-power.txt @@ -57,7 +57,7 @@ power_end "cpu_id=%lu" The 'type' parameter takes one of those macros: . POWER_NONE = 0, . POWER_CSTATE = 1, /* C-State */ - . POWER_PSTATE = 2, /* Fequency change or DVFS */ + . POWER_PSTATE = 2, /* Frequency change or DVFS */ The 'state' parameter is set depending on the type: . Target C-state for type=POWER_CSTATE, diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt index 1ebc24cf9a55..6f51fed45f2d 100644 --- a/Documentation/trace/ftrace.txt +++ b/Documentation/trace/ftrace.txt @@ -226,6 +226,13 @@ Here is the list of current tracers that may be configured. Traces and records the max latency that it takes for the highest priority task to get scheduled after it has been woken up. + Traces all tasks as an average developer would expect. + + "wakeup_rt" + + Traces and records the max latency that it takes for just + RT tasks (as the current "wakeup" does). This is useful + for those interested in wake up timings of RT tasks. "hw-branch-tracer" diff --git a/Documentation/usb/URB.txt b/Documentation/usb/URB.txt index 8ffce746d496..00d2c644068e 100644 --- a/Documentation/usb/URB.txt +++ b/Documentation/usb/URB.txt @@ -168,6 +168,28 @@ that if the completion handler or anyone else tries to resubmit it they will get a -EPERM error. Thus you can be sure that when usb_kill_urb() returns, the URB is totally idle. +There is a lifetime issue to consider. An URB may complete at any +time, and the completion handler may free the URB. If this happens +while usb_unlink_urb or usb_kill_urb is running, it will cause a +memory-access violation. The driver is responsible for avoiding this, +which often means some sort of lock will be needed to prevent the URB +from being deallocated while it is still in use. + +On the other hand, since usb_unlink_urb may end up calling the +completion handler, the handler must not take any lock that is held +when usb_unlink_urb is invoked. 
The general solution to this problem +is to increment the URB's reference count while holding the lock, then +drop the lock and call usb_unlink_urb or usb_kill_urb, and then +decrement the URB's reference count. You increment the reference +count by calling + + struct urb *usb_get_urb(struct urb *urb) + +(ignore the return value; it is the same as the argument) and +decrement the reference count by calling usb_free_urb. Of course, +none of this is necessary if there's no danger of the URB being freed +by the completion handler. + 1.7. What about the completion handler? diff --git a/Documentation/usb/mtouchusb.txt b/Documentation/usb/mtouchusb.txt index 86302cd53ed3..a91adb26ea7b 100644 --- a/Documentation/usb/mtouchusb.txt +++ b/Documentation/usb/mtouchusb.txt @@ -1,7 +1,7 @@ CHANGES - 0.3 - Created based off of scanner & INSTALL from the original touchscreen - driver on freshmeat (http://freshmeat.net/projects/3mtouchscreendriver) + driver on freecode (http://freecode.com/projects/3mtouchscreendriver) - Amended for linux-2.4.18, then 2.4.19 - 0.5 - Complete rewrite using Linux Input in 2.6.3 diff --git a/Documentation/usb/power-management.txt b/Documentation/usb/power-management.txt index 12511c98cc4f..4204eb01fd38 100644 --- a/Documentation/usb/power-management.txt +++ b/Documentation/usb/power-management.txt @@ -179,7 +179,8 @@ do: modprobe usbcore autosuspend=5 -Equivalently, you could add to /etc/modprobe.conf a line saying: +Equivalently, you could add to a configuration file in /etc/modprobe.d +a line saying: options usbcore autosuspend=5 @@ -345,7 +346,7 @@ autosuspend the device. Drivers need not be concerned about balancing changes to the usage counter; the USB core will undo any remaining "get"s when a driver is unbound from its interface. As a corollary, drivers must not call -any of the usb_autopm_* functions after their diconnect() routine has +any of the usb_autopm_* functions after their disconnect() routine has returned. 
Drivers using the async routines are responsible for their own diff --git a/Documentation/usb/proc_usb_info.txt b/Documentation/usb/proc_usb_info.txt index afe596d5f201..c9c3f0f5ad7b 100644 --- a/Documentation/usb/proc_usb_info.txt +++ b/Documentation/usb/proc_usb_info.txt @@ -7,7 +7,7 @@ The usbfs filesystem for USB devices is traditionally mounted at /proc/bus/usb. It provides the /proc/bus/usb/devices file, as well as the /proc/bus/usb/BBB/DDD files. -In many modern systems the usbfs filsystem isn't used at all. Instead +In many modern systems the usbfs filesystem isn't used at all. Instead USB device nodes are created under /dev/usb/ or someplace similar. The "devices" file is available in debugfs, typically as /sys/kernel/debug/usb/devices. diff --git a/Documentation/usb/usbmon.txt b/Documentation/usb/usbmon.txt index 5335fa8b06eb..c42bb9cd3b43 100644 --- a/Documentation/usb/usbmon.txt +++ b/Documentation/usb/usbmon.txt @@ -183,10 +183,10 @@ An input control transfer to get a port status. d5ea89a0 3575914555 S Ci:1:001:0 s a3 00 0000 0003 0004 4 < d5ea89a0 3575914560 C Ci:1:001:0 0 4 = 01050000 -An output bulk transfer to send a SCSI command 0x5E in a 31-byte Bulk wrapper -to a storage device at address 5: +An output bulk transfer to send a SCSI command 0x28 (READ_10) in a 31-byte +Bulk wrapper to a storage device at address 5: -dd65f0e8 4128379752 S Bo:1:005:2 -115 31 = 55534243 5e000000 00000000 00000600 00000000 00000000 00000000 000000 +dd65f0e8 4128379752 S Bo:1:005:2 -115 31 = 55534243 ad000000 00800000 80010a28 20000000 20000040 00000000 000000 dd65f0e8 4128379808 C Bo:1:005:2 0 31 > * Raw binary format and API diff --git a/Documentation/video4linux/CQcam.txt b/Documentation/video4linux/CQcam.txt index 8977e7ce4dab..6e680fec1e9c 100644 --- a/Documentation/video4linux/CQcam.txt +++ b/Documentation/video4linux/CQcam.txt @@ -61,29 +61,19 @@ But that is my personal preference. 
2.2 Configuration The configuration requires module configuration and device -configuration. I like kmod or kerneld process with the -/etc/modprobe.conf file so the modules can automatically load/unload as -they are used. The video devices could already exist, be generated -using MAKEDEV, or need to be created. The following sections detail -these procedures. +configuration. The following sections detail these procedures. 2.1 Module Configuration Using modules requires a bit of work to install and pass the -parameters. Understand that entries in /etc/modprobe.conf of: +parameters. Understand that entries in /etc/modprobe.d/*.conf of: alias parport_lowlevel parport_pc options parport_pc io=0x378 irq=none alias char-major-81 videodev alias char-major-81-0 c-qcam -will cause the kmod/modprobe to do certain things. If you are -using kmod, then a request for a 'char-major-81-0' will cause -the 'c-qcam' module to load. If you have other video sources with -modules, you might want to assign the different minor numbers to -different modules. - 2.2 Device Configuration At this point, we need to ensure that the device files exist. diff --git a/Documentation/video4linux/Zoran b/Documentation/video4linux/Zoran index 9ed629d4874b..b5a911fd0602 100644 --- a/Documentation/video4linux/Zoran +++ b/Documentation/video4linux/Zoran @@ -255,7 +255,7 @@ Load zr36067.o. If it can't autodetect your card, use the card=X insmod option with X being the card number as given in the previous section. 
To have more than one card, use card=X1[,X2[,X3,[X4[..]]]] -To automate this, add the following to your /etc/modprobe.conf: +To automate this, add the following to your /etc/modprobe.d/zoran.conf: options zr36067 card=X1[,X2[,X3[,X4[..]]]] alias char-major-81-0 zr36067 diff --git a/Documentation/video4linux/bttv/Modules.conf b/Documentation/video4linux/bttv/Modules.conf index 753f15956eb8..8f258faf18f1 100644 --- a/Documentation/video4linux/bttv/Modules.conf +++ b/Documentation/video4linux/bttv/Modules.conf @@ -1,4 +1,4 @@ -# For modern kernels (2.6 or above), this belongs in /etc/modprobe.conf +# For modern kernels (2.6 or above), this belongs in /etc/modprobe.d/*.conf # For for 2.4 kernels or earlier, this belongs in /etc/modules.conf. # i2c diff --git a/Documentation/video4linux/meye.txt b/Documentation/video4linux/meye.txt index 34e2842c70ae..a051152ea99c 100644 --- a/Documentation/video4linux/meye.txt +++ b/Documentation/video4linux/meye.txt @@ -55,7 +55,7 @@ Module use: ----------- In order to automatically load the meye module on use, you can put those lines -in your /etc/modprobe.conf file: +in your /etc/modprobe.d/meye.conf file: alias char-major-81 videodev alias char-major-81-0 meye diff --git a/Documentation/video4linux/uvcvideo.txt b/Documentation/video4linux/uvcvideo.txt index 848d620dcc5c..35ce19cddcf8 100644 --- a/Documentation/video4linux/uvcvideo.txt +++ b/Documentation/video4linux/uvcvideo.txt @@ -116,7 +116,7 @@ Description: A UVC control can be mapped to several V4L2 controls. For instance, a UVC pan/tilt control could be mapped to separate pan and tilt V4L2 controls. The UVC control is divided into non overlapping fields using - the 'size' and 'offset' fields and are then independantly mapped to + the 'size' and 'offset' fields and are then independently mapped to V4L2 control. 
For signed integer V4L2 controls the data_type field should be set to diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index e1d94bf4056e..6386f8c0482e 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -95,7 +95,7 @@ described as 'basic' will be available. Capability: basic Architectures: all Type: system ioctl -Parameters: none +Parameters: machine type identifier (KVM_VM_*) Returns: a VM fd that can be used to control the new virtual machine. The new VM has no virtual cpus and no memory. An mmap() of a VM fd @@ -103,6 +103,11 @@ will access the virtual machine's physical address space; offset zero corresponds to guest physical address zero. Use of mmap() on a VM fd is discouraged if userspace memory allocation (KVM_CAP_USER_MEMORY) is available. +You most certainly want to use 0 as machine type. + +In order to create user controlled virtual machines on S390, check +KVM_CAP_S390_UCONTROL and use the flag KVM_VM_S390_UCONTROL as +privileged user (CAP_SYS_ADMIN). 4.3 KVM_GET_MSR_INDEX_LIST @@ -213,6 +218,11 @@ allocation of vcpu ids. For example, if userspace wants single-threaded guest vcpus, it should make all vcpu ids be a multiple of the number of vcpus per vcore. +For virtual cpus that have been created with S390 user controlled virtual +machines, the resulting vcpu fd can be memory mapped at page offset +KVM_S390_SIE_PAGE_OFFSET in order to obtain a memory map of the virtual +cpu's hardware control block. 
+ 4.8 KVM_GET_DIRTY_LOG (vm ioctl) Capability: basic @@ -1159,6 +1169,14 @@ following flags are specified: /* Depends on KVM_CAP_IOMMU */ #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0) +/* The following two depend on KVM_CAP_PCI_2_3 */ +#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1) +#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2) + +If KVM_DEV_ASSIGN_PCI_2_3 is set, the kernel will manage legacy INTx interrupts +via the PCI-2.3-compliant device-level mask, thus enable IRQ sharing with other +assigned devices or host devices. KVM_DEV_ASSIGN_MASK_INTX specifies the +guest's view on the INTx mask, see KVM_ASSIGN_SET_INTX_MASK for details. The KVM_DEV_ASSIGN_ENABLE_IOMMU flag is a mandatory option to ensure isolation of the device. Usages not specifying this flag are deprecated. @@ -1399,6 +1417,71 @@ The following flags are defined: If datamatch flag is set, the event will be signaled only if the written value to the registered address is equal to datamatch in struct kvm_ioeventfd. +4.59 KVM_DIRTY_TLB + +Capability: KVM_CAP_SW_TLB +Architectures: ppc +Type: vcpu ioctl +Parameters: struct kvm_dirty_tlb (in) +Returns: 0 on success, -1 on error + +struct kvm_dirty_tlb { + __u64 bitmap; + __u32 num_dirty; +}; + +This must be called whenever userspace has changed an entry in the shared +TLB, prior to calling KVM_RUN on the associated vcpu. + +The "bitmap" field is the userspace address of an array. This array +consists of a number of bits, equal to the total number of TLB entries as +determined by the last successful call to KVM_CONFIG_TLB, rounded up to the +nearest multiple of 64. + +Each bit corresponds to one TLB entry, ordered the same as in the shared TLB +array. + +The array is little-endian: the bit 0 is the least significant bit of the +first byte, bit 8 is the least significant bit of the second byte, etc. +This avoids any complications with differing word sizes. 
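The bit numbering described above for the KVM_DIRTY_TLB bitmap can be sketched in plain C. This is only an illustration, not code from the kernel or from any KVM userspace: the helper names (dirty_bitmap_bytes, dirty_bitmap_set, dirty_bitmap_count) are hypothetical, and the sketch assumes the caller already knows the total number of TLB entries. The count helper returns the number of set bits, which is the kind of value the num_dirty field of struct kvm_dirty_tlb is meant to carry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: size in bytes of a bitmap covering num_entries
 * TLB entries, with the bit count rounded up to a multiple of 64.
 */
static size_t dirty_bitmap_bytes(size_t num_entries)
{
	size_t bits = (num_entries + 63) / 64 * 64;

	return bits / 8;
}

/*
 * Hypothetical helper: mark TLB entry n as dirty.  Addressing the bitmap
 * byte by byte gives the layout in the text: bit 0 is the least
 * significant bit of byte 0, bit 8 the least significant bit of byte 1.
 */
static void dirty_bitmap_set(uint8_t *bitmap, size_t num_entries, size_t n)
{
	assert(n < num_entries);
	bitmap[n / 8] |= (uint8_t)(1u << (n % 8));
}

/* Hypothetical helper: count the set bits in the whole bitmap. */
static uint32_t dirty_bitmap_count(const uint8_t *bitmap, size_t num_entries)
{
	size_t bytes = dirty_bitmap_bytes(num_entries);
	uint32_t count = 0;
	size_t i;

	for (i = 0; i < bytes; i++) {
		uint8_t b = bitmap[i];

		while (b) {
			b &= (uint8_t)(b - 1);	/* clear the lowest set bit */
			count++;
		}
	}
	return count;
}
```

Because the bitmap is indexed per byte rather than per machine word, the same code produces the same layout on 32-bit and 64-bit hosts of either endianness.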
+ +The "num_dirty" field is a performance hint for KVM to determine whether it +should skip processing the bitmap and just invalidate everything. It must +be set to the number of set bits in the bitmap. + +4.60 KVM_ASSIGN_SET_INTX_MASK + +Capability: KVM_CAP_PCI_2_3 +Architectures: x86 +Type: vm ioctl +Parameters: struct kvm_assigned_pci_dev (in) +Returns: 0 on success, -1 on error + +Allows userspace to mask PCI INTx interrupts from the assigned device. The +kernel will not deliver INTx interrupts to the guest between setting and +clearing of KVM_ASSIGN_SET_INTX_MASK via this interface. This enables use of +and emulation of PCI 2.3 INTx disable command register behavior. + +This may be used for both PCI 2.3 devices supporting INTx disable natively and +older devices lacking this support. Userspace is responsible for emulating the +read value of the INTx disable bit in the guest visible PCI command register. +When modifying the INTx disable state, userspace should precede updating the +physical device command register by calling this ioctl to inform the kernel of +the new intended INTx mask state. + +Note that the kernel uses the device INTx disable bit to internally manage the +device interrupt state for PCI 2.3 devices. Reads of this register may +therefore not match the expected value. Writes should always use the guest +intended INTx disable value rather than attempting to read-copy-update the +current physical device state. Races between user and kernel updates to the +INTx disable bit are handled lazily in the kernel. It's possible the device +may generate unintended interrupts, but they will not be injected into the +guest. + +See KVM_ASSIGN_DEV_IRQ for the data structure. The target device is specified +by assigned_dev_id. In the flags field, only KVM_DEV_ASSIGN_MASK_INTX is +evaluated. 
+ 4.62 KVM_CREATE_SPAPR_TCE Capability: KVM_CAP_SPAPR_TCE @@ -1491,6 +1574,101 @@ following algorithm: Some guests configure the LINT1 NMI input to cause a panic, aiding in debugging. +4.65 KVM_S390_UCAS_MAP + +Capability: KVM_CAP_S390_UCONTROL +Architectures: s390 +Type: vcpu ioctl +Parameters: struct kvm_s390_ucas_mapping (in) +Returns: 0 in case of success + +The parameter is defined like this: + struct kvm_s390_ucas_mapping { + __u64 user_addr; + __u64 vcpu_addr; + __u64 length; + }; + +This ioctl maps the memory at "user_addr" with the length "length" to +the vcpu's address space starting at "vcpu_addr". All parameters need to +be aligned to 1 megabyte. + +4.66 KVM_S390_UCAS_UNMAP + +Capability: KVM_CAP_S390_UCONTROL +Architectures: s390 +Type: vcpu ioctl +Parameters: struct kvm_s390_ucas_mapping (in) +Returns: 0 in case of success + +The parameter is defined like this: + struct kvm_s390_ucas_mapping { + __u64 user_addr; + __u64 vcpu_addr; + __u64 length; + }; + +This ioctl unmaps the memory in the vcpu's address space starting at +"vcpu_addr" with the length "length". The field "user_addr" is ignored. +All parameters need to be aligned to 1 megabyte. + +4.67 KVM_S390_VCPU_FAULT + +Capability: KVM_CAP_S390_UCONTROL +Architectures: s390 +Type: vcpu ioctl +Parameters: vcpu absolute address (in) +Returns: 0 in case of success + +This call creates a page table entry on the virtual cpu's address space +(for user controlled virtual machines) or the virtual machine's address +space (for regular virtual machines). This only works for minor faults, +thus it's recommended to access subject memory page via the user page +table upfront. This is useful to handle validity intercepts for user +controlled virtual machines to fault in the virtual cpu's lowcore pages +prior to calling the KVM_RUN ioctl.
+ +4.68 KVM_SET_ONE_REG + +Capability: KVM_CAP_ONE_REG +Architectures: all +Type: vcpu ioctl +Parameters: struct kvm_one_reg (in) +Returns: 0 on success, negative value on failure + +struct kvm_one_reg { + __u64 id; + __u64 addr; +}; + +Using this ioctl, a single vcpu register can be set to a specific value +defined by user space with the passed in struct kvm_one_reg, where id +refers to the register identifier as described below and addr is a pointer +to a variable with the respective size. There can be architecture agnostic +and architecture specific registers. Each have their own range of operation +and their own constants and width. To keep track of the implemented +registers, find a list below: + + Arch | Register | Width (bits) + | | + PPC | KVM_REG_PPC_HIOR | 64 + +4.69 KVM_GET_ONE_REG + +Capability: KVM_CAP_ONE_REG +Architectures: all +Type: vcpu ioctl +Parameters: struct kvm_one_reg (in and out) +Returns: 0 on success, negative value on failure + +This ioctl allows userspace to read the value of a single register implemented +in a vcpu. The register to read is indicated by the "id" field of the +kvm_one_reg struct passed in. On success, the register value can be found +at the memory location pointed to by "addr". + +The list of registers accessible using this interface is identical to the +list in 4.68. + 5. The kvm_run structure Application code obtains a pointer to the kvm_run structure by @@ -1651,6 +1829,20 @@ s390 specific. s390 specific. + /* KVM_EXIT_S390_UCONTROL */ + struct { + __u64 trans_exc_code; + __u32 pgm_code; + } s390_ucontrol; + +s390 specific. A page fault has occurred for a user controlled virtual +machine (KVM_VM_S390_UCONTROL) on its host page table that cannot be +resolved by the kernel.
+The program code and the translation exception code that were placed +in the cpu's lowcore are presented here as defined by the z Architecture +Principles of Operation Book in the Chapter for Dynamic Address Translation +(DAT) + /* KVM_EXIT_DCR */ struct { __u32 dcrn; @@ -1693,6 +1885,29 @@ developer registration required to access it). /* Fix the size of the union. */ char padding[256]; }; + + /* + * shared registers between kvm and userspace. + * kvm_valid_regs specifies the register classes set by the host + * kvm_dirty_regs specifies the register classes dirtied by userspace + * struct kvm_sync_regs is architecture specific, as well as the + * bits for kvm_valid_regs and kvm_dirty_regs + */ + __u64 kvm_valid_regs; + __u64 kvm_dirty_regs; + union { + struct kvm_sync_regs regs; + char padding[1024]; + } s; + +If KVM_CAP_SYNC_REGS is defined, these fields allow userspace to access +certain guest registers without having to call SET/GET_*REGS. Thus we can +avoid some system call overhead if userspace has to handle the exit. +Userspace can query the validity of the structure by checking +kvm_valid_regs for specific bits. These bits are architecture specific +and usually define the validity of a group of registers. (e.g. one bit + for general purpose registers) + }; 6. Capabilities that can be enabled @@ -1741,3 +1956,45 @@ HTAB address part of SDR1 contains an HVA instead of a GPA, as PAPR keeps the HTAB invisible to the guest. When this capability is enabled, KVM_EXIT_PAPR_HCALL can occur. + +6.3 KVM_CAP_SW_TLB + +Architectures: ppc +Parameters: args[0] is the address of a struct kvm_config_tlb +Returns: 0 on success; -1 on error + +struct kvm_config_tlb { + __u64 params; + __u64 array; + __u32 mmu_type; + __u32 array_len; +}; + +Configures the virtual CPU's TLB array, establishing a shared memory area +between userspace and KVM. The "params" and "array" fields are userspace +addresses of mmu-type-specific data structures.
The "array_len" field is a +safety mechanism, and should be set to the size in bytes of the memory that +userspace has reserved for the array. It must be at least the size dictated +by "mmu_type" and "params". + +While KVM_RUN is active, the shared region is under control of KVM. Its +contents are undefined, and any modification by userspace results in +boundedly undefined behavior. + +On return from KVM_RUN, the shared region will reflect the current state of +the guest's TLB. If userspace makes any changes, it must call KVM_DIRTY_TLB +to tell KVM which entries have been changed, prior to calling KVM_RUN again +on this vcpu. + +For mmu types KVM_MMU_FSL_BOOKE_NOHV and KVM_MMU_FSL_BOOKE_HV: + - The "params" field is of type "struct kvm_book3e_206_tlb_params". + - The "array" field points to an array of type "struct + kvm_book3e_206_tlb_entry". + - The array consists of all entries in the first TLB, followed by all + entries in the second TLB. + - Within a TLB, entries are ordered first by increasing set number. Within a + set, entries are ordered by way (increasing ESEL). + - The hash for determining set number in TLB0 is: (MAS2 >> 12) & (num_sets - 1) + where "num_sets" is the tlb_sizes[] value divided by the tlb_ways[] value. + - The tsize field of mas1 shall be set to 4K on TLB0, even though the + hardware ignores this value for TLB0.
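The ordering rules in the list above imply a position in the shared array for each TLB entry. The following is a minimal sketch of that arithmetic in plain C, not code taken from KVM. The function names are hypothetical, the sizes and way counts are passed as plain integers rather than read from struct kvm_book3e_206_tlb_params, and the sketch assumes that the number of sets is a power of two, since the mask in the hash only makes sense in that case.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: set number in TLB0 for an entry with the given
 * MAS2 value, using the hash (MAS2 >> 12) & (num_sets - 1), where
 * num_sets is the TLB0 size divided by the TLB0 way count.
 */
static uint32_t tlb0_set(uint64_t mas2, uint32_t tlb0_size, uint32_t tlb0_ways)
{
	uint32_t num_sets;

	assert(tlb0_ways != 0 && tlb0_size % tlb0_ways == 0);
	num_sets = tlb0_size / tlb0_ways;
	/* Assumption: num_sets is a power of two, so the mask is valid. */
	assert(num_sets != 0 && (num_sets & (num_sets - 1)) == 0);

	return (uint32_t)((mas2 >> 12) & (num_sets - 1));
}

/*
 * Hypothetical helper: index of a TLB0 entry in the shared array.
 * Entries are ordered by set first, then by way within the set.
 */
static uint32_t tlb0_index(uint64_t mas2, uint32_t way,
			   uint32_t tlb0_size, uint32_t tlb0_ways)
{
	assert(way < tlb0_ways);
	return tlb0_set(mas2, tlb0_size, tlb0_ways) * tlb0_ways + way;
}

/*
 * Hypothetical helper: index of a TLB1 entry in the shared array.
 * All TLB0 entries come first; "entry" is the position within TLB1,
 * which the caller computes with the same set-then-way ordering.
 */
static uint32_t tlb1_index(uint32_t tlb0_size, uint32_t entry)
{
	return tlb0_size + entry;
}
```

For example, with a 512-entry, 4-way TLB0 there are 128 sets, so a MAS2 value of 0x12345000 hashes to set 0x45 and way 2 of that set sits at array index 278.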
diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt index 5dc972c09b55..fa5f1dbc6b23 100644 --- a/Documentation/virtual/kvm/mmu.txt +++ b/Documentation/virtual/kvm/mmu.txt @@ -347,7 +347,7 @@ To instantiate a large spte, four constraints must be satisfied: - the spte must point to a large host page - the guest pte must be a large pte of at least equivalent size (if tdp is - enabled, there is no guest pte and this condition is satisified) + enabled, there is no guest pte and this condition is satisfied) - if the spte will be writeable, the large page frame may not overlap any write-protected pages - the guest page must be wholly contained by a single memory slot @@ -356,7 +356,7 @@ To check the last two conditions, the mmu maintains a ->write_count set of arrays for each memory slot and large page size. Every write protected page causes its write_count to be incremented, thus preventing instantiation of a large spte. The frames at the end of an unaligned memory slot have -artificically inflated ->write_counts so they can never be instantiated. +artificially inflated ->write_counts so they can never be instantiated. Further reading =============== diff --git a/Documentation/virtual/kvm/ppc-pv.txt b/Documentation/virtual/kvm/ppc-pv.txt index 2b7ce190cde4..6e7c37050930 100644 --- a/Documentation/virtual/kvm/ppc-pv.txt +++ b/Documentation/virtual/kvm/ppc-pv.txt @@ -81,28 +81,8 @@ additional registers to the magic page. If you add fields to the magic page, also define a new hypercall feature to indicate that the host can give you more registers. Only if the host supports the additional features, make use of them. 
-The magic page has the following layout as described in -arch/powerpc/include/asm/kvm_para.h: - -struct kvm_vcpu_arch_shared { - __u64 scratch1; - __u64 scratch2; - __u64 scratch3; - __u64 critical; /* Guest may not get interrupts if == r1 */ - __u64 sprg0; - __u64 sprg1; - __u64 sprg2; - __u64 sprg3; - __u64 srr0; - __u64 srr1; - __u64 dar; - __u64 msr; - __u32 dsisr; - __u32 int_pending; /* Tells the guest if we have an interrupt */ -}; - -Additions to the page must only occur at the end. Struct fields are always 32 -or 64 bit aligned, depending on them being 32 or 64 bit wide respectively. +The magic page layout is described by struct kvm_vcpu_arch_shared +in arch/powerpc/include/asm/kvm_para.h. Magic page features =================== diff --git a/Documentation/virtual/virtio-spec.txt b/Documentation/virtual/virtio-spec.txt index a350ae135b8c..da094737e2f8 100644 --- a/Documentation/virtual/virtio-spec.txt +++ b/Documentation/virtual/virtio-spec.txt @@ -1403,7 +1403,7 @@ segmentation, if both guests are amenable. Packets are transmitted by placing them in the transmitq, and buffers for incoming packets are placed in the receiveq. In each -case, the packet itself is preceeded by a header: +case, the packet itself is preceded by a header: struct virtio_net_hdr { @@ -1642,7 +1642,7 @@ struct virtio_net_ctrl_mac { The device can filter incoming packets by any number of destination MAC addresses.[footnote: -Since there are no guarentees, it can use a hash filter +Since there are no guarantees, it can use a hash filter orsilently switch to allmulti or promiscuous mode if it is given too many addresses. ] This table is set using the class VIRTIO_NET_CTRL_MAC and the @@ -1805,7 +1805,7 @@ the FLUSH and FLUSH_OUT types are equivalent, the device does not distinguish between them ]). 
If the device has VIRTIO_BLK_F_BARRIER feature the high bit (VIRTIO_BLK_T_BARRIER) indicates that this request acts as a -barrier and that all preceeding requests must be complete before +barrier and that all preceding requests must be complete before this one, and all following requests must not be started until this is complete. Note that a barrier does not flush caches in the underlying backend device in host, and thus does not serve as @@ -2118,7 +2118,7 @@ This is historical, and independent of the guest page size Otherwise, the guest may begin to re-use pages previously given to the balloon before the device has acknowledged their - withdrawl. [footnote: + withdrawal. [footnote: In this case, deflation advice is merely a courtesy ] diff --git a/Documentation/vm/Makefile b/Documentation/vm/Makefile deleted file mode 100644 index 3fa4d0668864..000000000000 --- a/Documentation/vm/Makefile +++ /dev/null @@ -1,8 +0,0 @@ -# kbuild trick to avoid linker error. Can be omitted if a module is built. -obj- := dummy.o - -# List of programs to build -hostprogs-y := page-types hugepage-mmap hugepage-shm map_hugetlb - -# Tell kbuild to always build the programs -always := $(hostprogs-y) diff --git a/Documentation/vm/cleancache.txt b/Documentation/vm/cleancache.txt index 36c367c73084..142fbb0f325a 100644 --- a/Documentation/vm/cleancache.txt +++ b/Documentation/vm/cleancache.txt @@ -46,10 +46,11 @@ a negative return value indicates failure. A "put_page" will copy a the pool id, a file key, and a page index into the file. (The combination of a pool id, a file key, and an index is sometimes called a "handle".) A "get_page" will copy the page, if found, from cleancache into kernel memory. 
-A "flush_page" will ensure the page no longer is present in cleancache; -a "flush_inode" will flush all pages associated with the specified file; -and, when a filesystem is unmounted, a "flush_fs" will flush all pages in -all files specified by the given pool id and also surrender the pool id. +An "invalidate_page" will ensure the page no longer is present in cleancache; +an "invalidate_inode" will invalidate all pages associated with the specified +file; and, when a filesystem is unmounted, an "invalidate_fs" will invalidate +all pages in all files specified by the given pool id and also surrender +the pool id. An "init_shared_fs", like init_fs, obtains a pool id but tells cleancache to treat the pool as shared using a 128-bit UUID as a key. On systems @@ -62,12 +63,12 @@ of the kernel (e.g. by "tools" that control cleancache). Or a cleancache implementation can simply disable shared_init by always returning a negative value. -If a get_page is successful on a non-shared pool, the page is flushed (thus -making cleancache an "exclusive" cache). On a shared pool, the page -is NOT flushed on a successful get_page so that it remains accessible to +If a get_page is successful on a non-shared pool, the page is invalidated +(thus making cleancache an "exclusive" cache). On a shared pool, the page +is NOT invalidated on a successful get_page so that it remains accessible to other sharers. The kernel is responsible for ensuring coherency between cleancache (shared or not), the page cache, and the filesystem, using -cleancache flush operations as required. +cleancache invalidate operations as required. Note that cleancache must enforce put-put-get coherency and get-get coherency. For the former, if two puts are made to the same handle but @@ -77,22 +78,22 @@ if a get for a given handle fails, subsequent gets for that handle will never succeed unless preceded by a successful put with that handle. 
Last, cleancache provides no SMP serialization guarantees; if two -different Linux threads are simultaneously putting and flushing a page +different Linux threads are simultaneously putting and invalidating a page with the same handle, the results are indeterminate. Callers must lock the page to ensure serial behavior. CLEANCACHE PERFORMANCE METRICS -Cleancache monitoring is done by sysfs files in the -/sys/kernel/mm/cleancache directory. The effectiveness of cleancache +If properly configured, monitoring of cleancache is done via debugfs in +the /sys/kernel/debug/mm/cleancache directory. The effectiveness of cleancache can be measured (across all filesystems) with: succ_gets - number of gets that were successful failed_gets - number of gets that failed puts - number of puts attempted (all "succeed") -flushes - number of flushes attempted +invalidates - number of invalidates attempted -A backend implementatation may provide additional metrics. +A backend implementation may provide additional metrics. FAQ @@ -143,7 +144,7 @@ systems. The core hooks for cleancache in VFS are in most cases a single line and the minimum set are placed precisely where needed to maintain -coherency (via cleancache_flush operations) between cleancache, +coherency (via cleancache_invalidate operations) between cleancache, the page cache, and disk. All hooks compile into nothingness if cleancache is config'ed off and turn into a function-pointer- compare-to-NULL if config'ed on but no backend claims the ops @@ -184,15 +185,15 @@ or for real kernel-addressable RAM, it makes perfect sense for transcendent memory. 4) Why is non-shared cleancache "exclusive"? And where is the - page "flushed" after a "get"? (Minchan Kim) + page "invalidated" after a "get"? (Minchan Kim) The main reason is to free up space in transcendent memory and -to avoid unnecessary cleancache_flush calls. If you want inclusive, +to avoid unnecessary cleancache_invalidate calls. 
If you want inclusive, the page can be "put" immediately following the "get". If put-after-get for inclusive becomes common, the interface could -be easily extended to add a "get_no_flush" call. +be easily extended to add a "get_no_invalidate" call. -The flush is done by the cleancache backend implementation. +The invalidate is done by the cleancache backend implementation. 5) What's the performance impact? @@ -222,7 +223,7 @@ Some points for a filesystem to consider: as tmpfs should not enable cleancache) - To ensure coherency/correctness, the FS must ensure that all file removal or truncation operations either go through VFS or - add hooks to do the equivalent cleancache "flush" operations + add hooks to do the equivalent cleancache "invalidate" operations - To ensure coherency/correctness, either inode numbers must be unique across the lifetime of the on-disk file OR the FS must provide an "encode_fh" function. @@ -243,11 +244,11 @@ If cleancache would use the inode virtual address instead of inode/filehandle, the pool id could be eliminated. But, this won't work because cleancache retains pagecache data pages persistently even when the inode has been pruned from the -inode unused list, and only flushes the data page if the file +inode unused list, and only invalidates the data page if the file gets removed/truncated. So if cleancache used the inode kva, there would be potential coherency issues if/when the inode kva is reused for a different file. Alternately, if cleancache -flushed the pages when the inode kva was freed, much of the value +invalidated the pages when the inode kva was freed, much of the value of cleancache would be lost because the cache of pages in cleanache is potentially much larger than the kernel pagecache and is most useful if the pages survive inode cache removal. 
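The cleancache counters described in the hunk above (succ_gets, failed_gets, puts, invalidates) could be read from user space roughly as follows. This is only a sketch: it assumes debugfs is mounted at /sys/kernel/debug, that each counter file holds a single decimal number, and that the caller has permission to read it. The helper names read_cleancache_counter() and print_cleancache_stats() are hypothetical and not part of the kernel tree.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Counter names as listed in the cleancache documentation. */
static const char *const cleancache_counters[] = {
	"succ_gets", "failed_gets", "puts", "invalidates",
};

/*
 * Hypothetical helper: read one counter from <dir>/<name>.
 * Assumes the file contains a single decimal number.
 * Returns 0 on success, -1 on failure; *value is only written on success.
 */
int read_cleancache_counter(const char *dir, const char *name,
			    unsigned long long *value)
{
	char path[512];
	char buf[64];
	char *end;
	unsigned long long v;
	FILE *fp;
	int n;

	n = snprintf(path, sizeof(path), "%s/%s", dir, name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	fp = fopen(path, "r");
	if (!fp)
		return -1;
	if (!fgets(buf, sizeof(buf), fp)) {
		fclose(fp);
		return -1;
	}
	fclose(fp);

	errno = 0;
	v = strtoull(buf, &end, 10);
	if (errno || end == buf)
		return -1;

	*value = v;
	return 0;
}

/*
 * Hypothetical helper: print every documented counter found in dir.
 * Returns the number of counters that could be read.
 */
int print_cleancache_stats(const char *dir, FILE *out)
{
	size_t i;
	size_t count = sizeof(cleancache_counters) /
		       sizeof(cleancache_counters[0]);
	int ok = 0;

	for (i = 0; i < count; i++) {
		unsigned long long v;

		if (read_cleancache_counter(dir, cleancache_counters[i], &v)) {
			fprintf(out, "%-12s unavailable\n",
				cleancache_counters[i]);
			continue;
		}
		fprintf(out, "%-12s %llu\n", cleancache_counters[i], v);
		ok++;
	}

	return ok;
}
```

Calling print_cleancache_stats("/sys/kernel/debug/mm/cleancache", stdout) from main() would list the four counters. A return value of 0 suggests that cleancache is not configured, debugfs is not mounted at that path, or the caller lacks permission.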
diff --git a/Documentation/vm/hugepage-mmap.c b/Documentation/vm/hugepage-mmap.c deleted file mode 100644 index db0dd9a33d54..000000000000 --- a/Documentation/vm/hugepage-mmap.c +++ /dev/null @@ -1,91 +0,0 @@ -/* - * hugepage-mmap: - * - * Example of using huge page memory in a user application using the mmap - * system call. Before running this application, make sure that the - * administrator has mounted the hugetlbfs filesystem (on some directory - * like /mnt) using the command mount -t hugetlbfs nodev /mnt. In this - * example, the app is requesting memory of size 256MB that is backed by - * huge pages. - * - * For the ia64 architecture, the Linux kernel reserves Region number 4 for - * huge pages. That means that if one requires a fixed address, a huge page - * aligned address starting with 0x800000... will be required. If a fixed - * address is not required, the kernel will select an address in the proper - * range. - * Other architectures, such as ppc64, i386 or x86_64 are not so constrained. 
- */ - -#include <stdlib.h> -#include <stdio.h> -#include <unistd.h> -#include <sys/mman.h> -#include <fcntl.h> - -#define FILE_NAME "/mnt/hugepagefile" -#define LENGTH (256UL*1024*1024) -#define PROTECTION (PROT_READ | PROT_WRITE) - -/* Only ia64 requires this */ -#ifdef __ia64__ -#define ADDR (void *)(0x8000000000000000UL) -#define FLAGS (MAP_SHARED | MAP_FIXED) -#else -#define ADDR (void *)(0x0UL) -#define FLAGS (MAP_SHARED) -#endif - -static void check_bytes(char *addr) -{ - printf("First hex is %x\n", *((unsigned int *)addr)); -} - -static void write_bytes(char *addr) -{ - unsigned long i; - - for (i = 0; i < LENGTH; i++) - *(addr + i) = (char)i; -} - -static void read_bytes(char *addr) -{ - unsigned long i; - - check_bytes(addr); - for (i = 0; i < LENGTH; i++) - if (*(addr + i) != (char)i) { - printf("Mismatch at %lu\n", i); - break; - } -} - -int main(void) -{ - void *addr; - int fd; - - fd = open(FILE_NAME, O_CREAT | O_RDWR, 0755); - if (fd < 0) { - perror("Open failed"); - exit(1); - } - - addr = mmap(ADDR, LENGTH, PROTECTION, FLAGS, fd, 0); - if (addr == MAP_FAILED) { - perror("mmap"); - unlink(FILE_NAME); - exit(1); - } - - printf("Returned address is %p\n", addr); - check_bytes(addr); - write_bytes(addr); - read_bytes(addr); - - munmap(addr, LENGTH); - close(fd); - unlink(FILE_NAME); - - return 0; -} diff --git a/Documentation/vm/hugepage-shm.c b/Documentation/vm/hugepage-shm.c deleted file mode 100644 index 07956d8592c9..000000000000 --- a/Documentation/vm/hugepage-shm.c +++ /dev/null @@ -1,98 +0,0 @@ -/* - * hugepage-shm: - * - * Example of using huge page memory in a user application using Sys V shared - * memory system calls. In this example the app is requesting 256MB of - * memory that is backed by huge pages. The application uses the flag - * SHM_HUGETLB in the shmget system call to inform the kernel that it is - * requesting huge pages. - * - * For the ia64 architecture, the Linux kernel reserves Region number 4 for - * huge pages. 
That means that if one requires a fixed address, a huge page - * aligned address starting with 0x800000... will be required. If a fixed - * address is not required, the kernel will select an address in the proper - * range. - * Other architectures, such as ppc64, i386 or x86_64 are not so constrained. - * - * Note: The default shared memory limit is quite low on many kernels, - * you may need to increase it via: - * - * echo 268435456 > /proc/sys/kernel/shmmax - * - * This will increase the maximum size per shared memory segment to 256MB. - * The other limit that you will hit eventually is shmall which is the - * total amount of shared memory in pages. To set it to 16GB on a system - * with a 4kB pagesize do: - * - * echo 4194304 > /proc/sys/kernel/shmall - */ - -#include <stdlib.h> -#include <stdio.h> -#include <sys/types.h> -#include <sys/ipc.h> -#include <sys/shm.h> -#include <sys/mman.h> - -#ifndef SHM_HUGETLB -#define SHM_HUGETLB 04000 -#endif - -#define LENGTH (256UL*1024*1024) - -#define dprintf(x) printf(x) - -/* Only ia64 requires this */ -#ifdef __ia64__ -#define ADDR (void *)(0x8000000000000000UL) -#define SHMAT_FLAGS (SHM_RND) -#else -#define ADDR (void *)(0x0UL) -#define SHMAT_FLAGS (0) -#endif - -int main(void) -{ - int shmid; - unsigned long i; - char *shmaddr; - - if ((shmid = shmget(2, LENGTH, - SHM_HUGETLB | IPC_CREAT | SHM_R | SHM_W)) < 0) { - perror("shmget"); - exit(1); - } - printf("shmid: 0x%x\n", shmid); - - shmaddr = shmat(shmid, ADDR, SHMAT_FLAGS); - if (shmaddr == (char *)-1) { - perror("Shared memory attach failure"); - shmctl(shmid, IPC_RMID, NULL); - exit(2); - } - printf("shmaddr: %p\n", shmaddr); - - dprintf("Starting the writes:\n"); - for (i = 0; i < LENGTH; i++) { - shmaddr[i] = (char)(i); - if (!(i % (1024 * 1024))) - dprintf("."); - } - dprintf("\n"); - - dprintf("Starting the Check..."); - for (i = 0; i < LENGTH; i++) - if (shmaddr[i] != (char)i) - printf("\nIndex %lu mismatched\n", i); - dprintf("Done.\n"); - - if 
(shmdt((const void *)shmaddr) != 0) { - perror("Detach failure"); - shmctl(shmid, IPC_RMID, NULL); - exit(3); - } - - shmctl(shmid, IPC_RMID, NULL); - - return 0; -} diff --git a/Documentation/vm/map_hugetlb.c b/Documentation/vm/map_hugetlb.c deleted file mode 100644 index eda1a6d3578a..000000000000 --- a/Documentation/vm/map_hugetlb.c +++ /dev/null @@ -1,77 +0,0 @@ -/* - * Example of using hugepage memory in a user application using the mmap - * system call with MAP_HUGETLB flag. Before running this program make - * sure the administrator has allocated enough default sized huge pages - * to cover the 256 MB allocation. - * - * For ia64 architecture, Linux kernel reserves Region number 4 for hugepages. - * That means the addresses starting with 0x800000... will need to be - * specified. Specifying a fixed address is not required on ppc64, i386 - * or x86_64. - */ -#include <stdlib.h> -#include <stdio.h> -#include <unistd.h> -#include <sys/mman.h> -#include <fcntl.h> - -#define LENGTH (256UL*1024*1024) -#define PROTECTION (PROT_READ | PROT_WRITE) - -#ifndef MAP_HUGETLB -#define MAP_HUGETLB 0x40000 /* arch specific */ -#endif - -/* Only ia64 requires this */ -#ifdef __ia64__ -#define ADDR (void *)(0x8000000000000000UL) -#define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_FIXED) -#else -#define ADDR (void *)(0x0UL) -#define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB) -#endif - -static void check_bytes(char *addr) -{ - printf("First hex is %x\n", *((unsigned int *)addr)); -} - -static void write_bytes(char *addr) -{ - unsigned long i; - - for (i = 0; i < LENGTH; i++) - *(addr + i) = (char)i; -} - -static void read_bytes(char *addr) -{ - unsigned long i; - - check_bytes(addr); - for (i = 0; i < LENGTH; i++) - if (*(addr + i) != (char)i) { - printf("Mismatch at %lu\n", i); - break; - } -} - -int main(void) -{ - void *addr; - - addr = mmap(ADDR, LENGTH, PROTECTION, FLAGS, 0, 0); - if (addr == MAP_FAILED) { - perror("mmap"); - exit(1); - } - - 
printf("Returned address is %p\n", addr); - check_bytes(addr); - write_bytes(addr); - read_bytes(addr); - - munmap(addr, LENGTH); - - return 0; -} diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c deleted file mode 100644 index 7445caa26d05..000000000000 --- a/Documentation/vm/page-types.c +++ /dev/null @@ -1,1100 +0,0 @@ -/* - * page-types: Tool for querying page flags - * - * This program is free software; you can redistribute it and/or modify it - * under the terms of the GNU General Public License as published by the Free - * Software Foundation; version 2. - * - * This program is distributed in the hope that it will be useful, but WITHOUT - * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or - * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for - * more details. - * - * You should find a copy of v2 of the GNU General Public License somewhere on - * your Linux system; if not, write to the Free Software Foundation, Inc., 59 - * Temple Place, Suite 330, Boston, MA 02111-1307 USA. 
- * - * Copyright (C) 2009 Intel corporation - * - * Authors: Wu Fengguang <fengguang.wu@intel.com> - */ - -#define _LARGEFILE64_SOURCE -#include <stdio.h> -#include <stdlib.h> -#include <unistd.h> -#include <stdint.h> -#include <stdarg.h> -#include <string.h> -#include <getopt.h> -#include <limits.h> -#include <assert.h> -#include <sys/types.h> -#include <sys/errno.h> -#include <sys/fcntl.h> -#include <sys/mount.h> -#include <sys/statfs.h> -#include "../../include/linux/magic.h" - - -#ifndef MAX_PATH -# define MAX_PATH 256 -#endif - -#ifndef STR -# define _STR(x) #x -# define STR(x) _STR(x) -#endif - -/* - * pagemap kernel ABI bits - */ - -#define PM_ENTRY_BYTES sizeof(uint64_t) -#define PM_STATUS_BITS 3 -#define PM_STATUS_OFFSET (64 - PM_STATUS_BITS) -#define PM_STATUS_MASK (((1LL << PM_STATUS_BITS) - 1) << PM_STATUS_OFFSET) -#define PM_STATUS(nr) (((nr) << PM_STATUS_OFFSET) & PM_STATUS_MASK) -#define PM_PSHIFT_BITS 6 -#define PM_PSHIFT_OFFSET (PM_STATUS_OFFSET - PM_PSHIFT_BITS) -#define PM_PSHIFT_MASK (((1LL << PM_PSHIFT_BITS) - 1) << PM_PSHIFT_OFFSET) -#define PM_PSHIFT(x) (((u64) (x) << PM_PSHIFT_OFFSET) & PM_PSHIFT_MASK) -#define PM_PFRAME_MASK ((1LL << PM_PSHIFT_OFFSET) - 1) -#define PM_PFRAME(x) ((x) & PM_PFRAME_MASK) - -#define PM_PRESENT PM_STATUS(4LL) -#define PM_SWAP PM_STATUS(2LL) - - -/* - * kernel page flags - */ - -#define KPF_BYTES 8 -#define PROC_KPAGEFLAGS "/proc/kpageflags" - -/* copied from kpageflags_read() */ -#define KPF_LOCKED 0 -#define KPF_ERROR 1 -#define KPF_REFERENCED 2 -#define KPF_UPTODATE 3 -#define KPF_DIRTY 4 -#define KPF_LRU 5 -#define KPF_ACTIVE 6 -#define KPF_SLAB 7 -#define KPF_WRITEBACK 8 -#define KPF_RECLAIM 9 -#define KPF_BUDDY 10 - -/* [11-20] new additions in 2.6.31 */ -#define KPF_MMAP 11 -#define KPF_ANON 12 -#define KPF_SWAPCACHE 13 -#define KPF_SWAPBACKED 14 -#define KPF_COMPOUND_HEAD 15 -#define KPF_COMPOUND_TAIL 16 -#define KPF_HUGE 17 -#define KPF_UNEVICTABLE 18 -#define KPF_HWPOISON 19 -#define KPF_NOPAGE 20 
-#define KPF_KSM 21 - -/* [32-] kernel hacking assistances */ -#define KPF_RESERVED 32 -#define KPF_MLOCKED 33 -#define KPF_MAPPEDTODISK 34 -#define KPF_PRIVATE 35 -#define KPF_PRIVATE_2 36 -#define KPF_OWNER_PRIVATE 37 -#define KPF_ARCH 38 -#define KPF_UNCACHED 39 - -/* [48-] take some arbitrary free slots for expanding overloaded flags - * not part of kernel API - */ -#define KPF_READAHEAD 48 -#define KPF_SLOB_FREE 49 -#define KPF_SLUB_FROZEN 50 -#define KPF_SLUB_DEBUG 51 - -#define KPF_ALL_BITS ((uint64_t)~0ULL) -#define KPF_HACKERS_BITS (0xffffULL << 32) -#define KPF_OVERLOADED_BITS (0xffffULL << 48) -#define BIT(name) (1ULL << KPF_##name) -#define BITS_COMPOUND (BIT(COMPOUND_HEAD) | BIT(COMPOUND_TAIL)) - -static const char *page_flag_names[] = { - [KPF_LOCKED] = "L:locked", - [KPF_ERROR] = "E:error", - [KPF_REFERENCED] = "R:referenced", - [KPF_UPTODATE] = "U:uptodate", - [KPF_DIRTY] = "D:dirty", - [KPF_LRU] = "l:lru", - [KPF_ACTIVE] = "A:active", - [KPF_SLAB] = "S:slab", - [KPF_WRITEBACK] = "W:writeback", - [KPF_RECLAIM] = "I:reclaim", - [KPF_BUDDY] = "B:buddy", - - [KPF_MMAP] = "M:mmap", - [KPF_ANON] = "a:anonymous", - [KPF_SWAPCACHE] = "s:swapcache", - [KPF_SWAPBACKED] = "b:swapbacked", - [KPF_COMPOUND_HEAD] = "H:compound_head", - [KPF_COMPOUND_TAIL] = "T:compound_tail", - [KPF_HUGE] = "G:huge", - [KPF_UNEVICTABLE] = "u:unevictable", - [KPF_HWPOISON] = "X:hwpoison", - [KPF_NOPAGE] = "n:nopage", - [KPF_KSM] = "x:ksm", - - [KPF_RESERVED] = "r:reserved", - [KPF_MLOCKED] = "m:mlocked", - [KPF_MAPPEDTODISK] = "d:mappedtodisk", - [KPF_PRIVATE] = "P:private", - [KPF_PRIVATE_2] = "p:private_2", - [KPF_OWNER_PRIVATE] = "O:owner_private", - [KPF_ARCH] = "h:arch", - [KPF_UNCACHED] = "c:uncached", - - [KPF_READAHEAD] = "I:readahead", - [KPF_SLOB_FREE] = "P:slob_free", - [KPF_SLUB_FROZEN] = "A:slub_frozen", - [KPF_SLUB_DEBUG] = "E:slub_debug", -}; - - -static const char *debugfs_known_mountpoints[] = { - "/sys/kernel/debug", - "/debug", - 0, -}; - -/* - * data structures 
- */ - -static int opt_raw; /* for kernel developers */ -static int opt_list; /* list pages (in ranges) */ -static int opt_no_summary; /* don't show summary */ -static pid_t opt_pid; /* process to walk */ - -#define MAX_ADDR_RANGES 1024 -static int nr_addr_ranges; -static unsigned long opt_offset[MAX_ADDR_RANGES]; -static unsigned long opt_size[MAX_ADDR_RANGES]; - -#define MAX_VMAS 10240 -static int nr_vmas; -static unsigned long pg_start[MAX_VMAS]; -static unsigned long pg_end[MAX_VMAS]; - -#define MAX_BIT_FILTERS 64 -static int nr_bit_filters; -static uint64_t opt_mask[MAX_BIT_FILTERS]; -static uint64_t opt_bits[MAX_BIT_FILTERS]; - -static int page_size; - -static int pagemap_fd; -static int kpageflags_fd; - -static int opt_hwpoison; -static int opt_unpoison; - -static char hwpoison_debug_fs[MAX_PATH+1]; -static int hwpoison_inject_fd; -static int hwpoison_forget_fd; - -#define HASH_SHIFT 13 -#define HASH_SIZE (1 << HASH_SHIFT) -#define HASH_MASK (HASH_SIZE - 1) -#define HASH_KEY(flags) (flags & HASH_MASK) - -static unsigned long total_pages; -static unsigned long nr_pages[HASH_SIZE]; -static uint64_t page_flags[HASH_SIZE]; - - -/* - * helper functions - */ - -#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) - -#define min_t(type, x, y) ({ \ - type __min1 = (x); \ - type __min2 = (y); \ - __min1 < __min2 ? __min1 : __min2; }) - -#define max_t(type, x, y) ({ \ - type __max1 = (x); \ - type __max2 = (y); \ - __max1 > __max2 ? __max1 : __max2; }) - -static unsigned long pages2mb(unsigned long pages) -{ - return (pages * page_size) >> 20; -} - -static void fatal(const char *x, ...) 
-{ - va_list ap; - - va_start(ap, x); - vfprintf(stderr, x, ap); - va_end(ap); - exit(EXIT_FAILURE); -} - -static int checked_open(const char *pathname, int flags) -{ - int fd = open(pathname, flags); - - if (fd < 0) { - perror(pathname); - exit(EXIT_FAILURE); - } - - return fd; -} - -/* - * pagemap/kpageflags routines - */ - -static unsigned long do_u64_read(int fd, char *name, - uint64_t *buf, - unsigned long index, - unsigned long count) -{ - long bytes; - - if (index > ULONG_MAX / 8) - fatal("index overflow: %lu\n", index); - - if (lseek(fd, index * 8, SEEK_SET) < 0) { - perror(name); - exit(EXIT_FAILURE); - } - - bytes = read(fd, buf, count * 8); - if (bytes < 0) { - perror(name); - exit(EXIT_FAILURE); - } - if (bytes % 8) - fatal("partial read: %lu bytes\n", bytes); - - return bytes / 8; -} - -static unsigned long kpageflags_read(uint64_t *buf, - unsigned long index, - unsigned long pages) -{ - return do_u64_read(kpageflags_fd, PROC_KPAGEFLAGS, buf, index, pages); -} - -static unsigned long pagemap_read(uint64_t *buf, - unsigned long index, - unsigned long pages) -{ - return do_u64_read(pagemap_fd, "/proc/pid/pagemap", buf, index, pages); -} - -static unsigned long pagemap_pfn(uint64_t val) -{ - unsigned long pfn; - - if (val & PM_PRESENT) - pfn = PM_PFRAME(val); - else - pfn = 0; - - return pfn; -} - - -/* - * page flag names - */ - -static char *page_flag_name(uint64_t flags) -{ - static char buf[65]; - int present; - int i, j; - - for (i = 0, j = 0; i < ARRAY_SIZE(page_flag_names); i++) { - present = (flags >> i) & 1; - if (!page_flag_names[i]) { - if (present) - fatal("unknown flag bit %d\n", i); - continue; - } - buf[j++] = present ? 
page_flag_names[i][0] : '_'; - } - - return buf; -} - -static char *page_flag_longname(uint64_t flags) -{ - static char buf[1024]; - int i, n; - - for (i = 0, n = 0; i < ARRAY_SIZE(page_flag_names); i++) { - if (!page_flag_names[i]) - continue; - if ((flags >> i) & 1) - n += snprintf(buf + n, sizeof(buf) - n, "%s,", - page_flag_names[i] + 2); - } - if (n) - n--; - buf[n] = '\0'; - - return buf; -} - - -/* - * page list and summary - */ - -static void show_page_range(unsigned long voffset, - unsigned long offset, uint64_t flags) -{ - static uint64_t flags0; - static unsigned long voff; - static unsigned long index; - static unsigned long count; - - if (flags == flags0 && offset == index + count && - (!opt_pid || voffset == voff + count)) { - count++; - return; - } - - if (count) { - if (opt_pid) - printf("%lx\t", voff); - printf("%lx\t%lx\t%s\n", - index, count, page_flag_name(flags0)); - } - - flags0 = flags; - index = offset; - voff = voffset; - count = 1; -} - -static void show_page(unsigned long voffset, - unsigned long offset, uint64_t flags) -{ - if (opt_pid) - printf("%lx\t", voffset); - printf("%lx\t%s\n", offset, page_flag_name(flags)); -} - -static void show_summary(void) -{ - int i; - - printf(" flags\tpage-count MB" - " symbolic-flags\t\t\tlong-symbolic-flags\n"); - - for (i = 0; i < ARRAY_SIZE(nr_pages); i++) { - if (nr_pages[i]) - printf("0x%016llx\t%10lu %8lu %s\t%s\n", - (unsigned long long)page_flags[i], - nr_pages[i], - pages2mb(nr_pages[i]), - page_flag_name(page_flags[i]), - page_flag_longname(page_flags[i])); - } - - printf(" total\t%10lu %8lu\n", - total_pages, pages2mb(total_pages)); -} - - -/* - * page flag filters - */ - -static int bit_mask_ok(uint64_t flags) -{ - int i; - - for (i = 0; i < nr_bit_filters; i++) { - if (opt_bits[i] == KPF_ALL_BITS) { - if ((flags & opt_mask[i]) == 0) - return 0; - } else { - if ((flags & opt_mask[i]) != opt_bits[i]) - return 0; - } - } - - return 1; -} - -static uint64_t expand_overloaded_flags(uint64_t 
flags) -{ - /* SLOB/SLUB overload several page flags */ - if (flags & BIT(SLAB)) { - if (flags & BIT(PRIVATE)) - flags ^= BIT(PRIVATE) | BIT(SLOB_FREE); - if (flags & BIT(ACTIVE)) - flags ^= BIT(ACTIVE) | BIT(SLUB_FROZEN); - if (flags & BIT(ERROR)) - flags ^= BIT(ERROR) | BIT(SLUB_DEBUG); - } - - /* PG_reclaim is overloaded as PG_readahead in the read path */ - if ((flags & (BIT(RECLAIM) | BIT(WRITEBACK))) == BIT(RECLAIM)) - flags ^= BIT(RECLAIM) | BIT(READAHEAD); - - return flags; -} - -static uint64_t well_known_flags(uint64_t flags) -{ - /* hide flags intended only for kernel hacker */ - flags &= ~KPF_HACKERS_BITS; - - /* hide non-hugeTLB compound pages */ - if ((flags & BITS_COMPOUND) && !(flags & BIT(HUGE))) - flags &= ~BITS_COMPOUND; - - return flags; -} - -static uint64_t kpageflags_flags(uint64_t flags) -{ - flags = expand_overloaded_flags(flags); - - if (!opt_raw) - flags = well_known_flags(flags); - - return flags; -} - -/* verify that a mountpoint is actually a debugfs instance */ -static int debugfs_valid_mountpoint(const char *debugfs) -{ - struct statfs st_fs; - - if (statfs(debugfs, &st_fs) < 0) - return -ENOENT; - else if (st_fs.f_type != (long) DEBUGFS_MAGIC) - return -ENOENT; - - return 0; -} - -/* find the path to the mounted debugfs */ -static const char *debugfs_find_mountpoint(void) -{ - const char **ptr; - char type[100]; - FILE *fp; - - ptr = debugfs_known_mountpoints; - while (*ptr) { - if (debugfs_valid_mountpoint(*ptr) == 0) { - strcpy(hwpoison_debug_fs, *ptr); - return hwpoison_debug_fs; - } - ptr++; - } - - /* give up and parse /proc/mounts */ - fp = fopen("/proc/mounts", "r"); - if (fp == NULL) - perror("Can't open /proc/mounts for read"); - - while (fscanf(fp, "%*s %" - STR(MAX_PATH) - "s %99s %*s %*d %*d\n", - hwpoison_debug_fs, type) == 2) { - if (strcmp(type, "debugfs") == 0) - break; - } - fclose(fp); - - if (strcmp(type, "debugfs") != 0) - return NULL; - - return hwpoison_debug_fs; -} - -/* mount the debugfs somewhere if it's not 
mounted */ - -static void debugfs_mount(void) -{ - const char **ptr; - - /* see if it's already mounted */ - if (debugfs_find_mountpoint()) - return; - - ptr = debugfs_known_mountpoints; - while (*ptr) { - if (mount(NULL, *ptr, "debugfs", 0, NULL) == 0) { - /* save the mountpoint */ - strcpy(hwpoison_debug_fs, *ptr); - break; - } - ptr++; - } - - if (*ptr == NULL) { - perror("mount debugfs"); - exit(EXIT_FAILURE); - } -} - -/* - * page actions - */ - -static void prepare_hwpoison_fd(void) -{ - char buf[MAX_PATH + 1]; - - debugfs_mount(); - - if (opt_hwpoison && !hwpoison_inject_fd) { - snprintf(buf, MAX_PATH, "%s/hwpoison/corrupt-pfn", - hwpoison_debug_fs); - hwpoison_inject_fd = checked_open(buf, O_WRONLY); - } - - if (opt_unpoison && !hwpoison_forget_fd) { - snprintf(buf, MAX_PATH, "%s/hwpoison/unpoison-pfn", - hwpoison_debug_fs); - hwpoison_forget_fd = checked_open(buf, O_WRONLY); - } -} - -static int hwpoison_page(unsigned long offset) -{ - char buf[100]; - int len; - - len = sprintf(buf, "0x%lx\n", offset); - len = write(hwpoison_inject_fd, buf, len); - if (len < 0) { - perror("hwpoison inject"); - return len; - } - return 0; -} - -static int unpoison_page(unsigned long offset) -{ - char buf[100]; - int len; - - len = sprintf(buf, "0x%lx\n", offset); - len = write(hwpoison_forget_fd, buf, len); - if (len < 0) { - perror("hwpoison forget"); - return len; - } - return 0; -} - -/* - * page frame walker - */ - -static int hash_slot(uint64_t flags) -{ - int k = HASH_KEY(flags); - int i; - - /* Explicitly reserve slot 0 for flags 0: the following logic - * cannot distinguish an unoccupied slot from slot (flags==0). 
- */ - if (flags == 0) - return 0; - - /* search through the remaining (HASH_SIZE-1) slots */ - for (i = 1; i < ARRAY_SIZE(page_flags); i++, k++) { - if (!k || k >= ARRAY_SIZE(page_flags)) - k = 1; - if (page_flags[k] == 0) { - page_flags[k] = flags; - return k; - } - if (page_flags[k] == flags) - return k; - } - - fatal("hash table full: bump up HASH_SHIFT?\n"); - exit(EXIT_FAILURE); -} - -static void add_page(unsigned long voffset, - unsigned long offset, uint64_t flags) -{ - flags = kpageflags_flags(flags); - - if (!bit_mask_ok(flags)) - return; - - if (opt_hwpoison) - hwpoison_page(offset); - if (opt_unpoison) - unpoison_page(offset); - - if (opt_list == 1) - show_page_range(voffset, offset, flags); - else if (opt_list == 2) - show_page(voffset, offset, flags); - - nr_pages[hash_slot(flags)]++; - total_pages++; -} - -#define KPAGEFLAGS_BATCH (64 << 10) /* 64k pages */ -static void walk_pfn(unsigned long voffset, - unsigned long index, - unsigned long count) -{ - uint64_t buf[KPAGEFLAGS_BATCH]; - unsigned long batch; - long pages; - unsigned long i; - - while (count) { - batch = min_t(unsigned long, count, KPAGEFLAGS_BATCH); - pages = kpageflags_read(buf, index, batch); - if (pages == 0) - break; - - for (i = 0; i < pages; i++) - add_page(voffset + i, index + i, buf[i]); - - index += pages; - count -= pages; - } -} - -#define PAGEMAP_BATCH (64 << 10) -static void walk_vma(unsigned long index, unsigned long count) -{ - uint64_t buf[PAGEMAP_BATCH]; - unsigned long batch; - unsigned long pages; - unsigned long pfn; - unsigned long i; - - while (count) { - batch = min_t(unsigned long, count, PAGEMAP_BATCH); - pages = pagemap_read(buf, index, batch); - if (pages == 0) - break; - - for (i = 0; i < pages; i++) { - pfn = pagemap_pfn(buf[i]); - if (pfn) - walk_pfn(index + i, pfn, 1); - } - - index += pages; - count -= pages; - } -} - -static void walk_task(unsigned long index, unsigned long count) -{ - const unsigned long end = index + count; - unsigned long start; - int 
i = 0; - - while (index < end) { - - while (pg_end[i] <= index) - if (++i >= nr_vmas) - return; - if (pg_start[i] >= end) - return; - - start = max_t(unsigned long, pg_start[i], index); - index = min_t(unsigned long, pg_end[i], end); - - assert(start < index); - walk_vma(start, index - start); - } -} - -static void add_addr_range(unsigned long offset, unsigned long size) -{ - if (nr_addr_ranges >= MAX_ADDR_RANGES) - fatal("too many addr ranges\n"); - - opt_offset[nr_addr_ranges] = offset; - opt_size[nr_addr_ranges] = min_t(unsigned long, size, ULONG_MAX-offset); - nr_addr_ranges++; -} - -static void walk_addr_ranges(void) -{ - int i; - - kpageflags_fd = checked_open(PROC_KPAGEFLAGS, O_RDONLY); - - if (!nr_addr_ranges) - add_addr_range(0, ULONG_MAX); - - for (i = 0; i < nr_addr_ranges; i++) - if (!opt_pid) - walk_pfn(0, opt_offset[i], opt_size[i]); - else - walk_task(opt_offset[i], opt_size[i]); - - close(kpageflags_fd); -} - - -/* - * user interface - */ - -static const char *page_flag_type(uint64_t flag) -{ - if (flag & KPF_HACKERS_BITS) - return "(r)"; - if (flag & KPF_OVERLOADED_BITS) - return "(o)"; - return " "; -} - -static void usage(void) -{ - int i, j; - - printf( -"page-types [options]\n" -" -r|--raw Raw mode, for kernel developers\n" -" -d|--describe flags Describe flags\n" -" -a|--addr addr-spec Walk a range of pages\n" -" -b|--bits bits-spec Walk pages with specified bits\n" -" -p|--pid pid Walk process address space\n" -#if 0 /* planned features */ -" -f|--file filename Walk file address space\n" -#endif -" -l|--list Show page details in ranges\n" -" -L|--list-each Show page details one by one\n" -" -N|--no-summary Don't show summary info\n" -" -X|--hwpoison hwpoison pages\n" -" -x|--unpoison unpoison pages\n" -" -h|--help Show this usage message\n" -"flags:\n" -" 0x10 bitfield format, e.g.\n" -" anon bit-name, e.g.\n" -" 0x10,anon comma-separated list, e.g.\n" -"addr-spec:\n" -" N one page at offset N (unit: pages)\n" -" N+M pages range from N to 
N+M-1\n" -" N,M pages range from N to M-1\n" -" N, pages range from N to end\n" -" ,M pages range from 0 to M-1\n" -"bits-spec:\n" -" bit1,bit2 (flags & (bit1|bit2)) != 0\n" -" bit1,bit2=bit1 (flags & (bit1|bit2)) == bit1\n" -" bit1,~bit2 (flags & (bit1|bit2)) == bit1\n" -" =bit1,bit2 flags == (bit1|bit2)\n" -"bit-names:\n" - ); - - for (i = 0, j = 0; i < ARRAY_SIZE(page_flag_names); i++) { - if (!page_flag_names[i]) - continue; - printf("%16s%s", page_flag_names[i] + 2, - page_flag_type(1ULL << i)); - if (++j > 3) { - j = 0; - putchar('\n'); - } - } - printf("\n " - "(r) raw mode bits (o) overloaded bits\n"); -} - -static unsigned long long parse_number(const char *str) -{ - unsigned long long n; - - n = strtoll(str, NULL, 0); - - if (n == 0 && str[0] != '0') - fatal("invalid name or number: %s\n", str); - - return n; -} - -static void parse_pid(const char *str) -{ - FILE *file; - char buf[5000]; - - opt_pid = parse_number(str); - - sprintf(buf, "/proc/%d/pagemap", opt_pid); - pagemap_fd = checked_open(buf, O_RDONLY); - - sprintf(buf, "/proc/%d/maps", opt_pid); - file = fopen(buf, "r"); - if (!file) { - perror(buf); - exit(EXIT_FAILURE); - } - - while (fgets(buf, sizeof(buf), file) != NULL) { - unsigned long vm_start; - unsigned long vm_end; - unsigned long long pgoff; - int major, minor; - char r, w, x, s; - unsigned long ino; - int n; - - n = sscanf(buf, "%lx-%lx %c%c%c%c %llx %x:%x %lu", - &vm_start, - &vm_end, - &r, &w, &x, &s, - &pgoff, - &major, &minor, - &ino); - if (n < 10) { - fprintf(stderr, "unexpected line: %s\n", buf); - continue; - } - pg_start[nr_vmas] = vm_start / page_size; - pg_end[nr_vmas] = vm_end / page_size; - if (++nr_vmas >= MAX_VMAS) { - fprintf(stderr, "too many VMAs\n"); - break; - } - } - fclose(file); -} - -static void parse_file(const char *name) -{ -} - -static void parse_addr_range(const char *optarg) -{ - unsigned long offset; - unsigned long size; - char *p; - - p = strchr(optarg, ','); - if (!p) - p = strchr(optarg, '+'); - - if 
(p == optarg) { - offset = 0; - size = parse_number(p + 1); - } else if (p) { - offset = parse_number(optarg); - if (p[1] == '\0') - size = ULONG_MAX; - else { - size = parse_number(p + 1); - if (*p == ',') { - if (size < offset) - fatal("invalid range: %lu,%lu\n", - offset, size); - size -= offset; - } - } - } else { - offset = parse_number(optarg); - size = 1; - } - - add_addr_range(offset, size); -} - -static void add_bits_filter(uint64_t mask, uint64_t bits) -{ - if (nr_bit_filters >= MAX_BIT_FILTERS) - fatal("too much bit filters\n"); - - opt_mask[nr_bit_filters] = mask; - opt_bits[nr_bit_filters] = bits; - nr_bit_filters++; -} - -static uint64_t parse_flag_name(const char *str, int len) -{ - int i; - - if (!*str || !len) - return 0; - - if (len <= 8 && !strncmp(str, "compound", len)) - return BITS_COMPOUND; - - for (i = 0; i < ARRAY_SIZE(page_flag_names); i++) { - if (!page_flag_names[i]) - continue; - if (!strncmp(str, page_flag_names[i] + 2, len)) - return 1ULL << i; - } - - return parse_number(str); -} - -static uint64_t parse_flag_names(const char *str, int all) -{ - const char *p = str; - uint64_t flags = 0; - - while (1) { - if (*p == ',' || *p == '=' || *p == '\0') { - if ((*str != '~') || (*str == '~' && all && *++str)) - flags |= parse_flag_name(str, p - str); - if (*p != ',') - break; - str = p + 1; - } - p++; - } - - return flags; -} - -static void parse_bits_mask(const char *optarg) -{ - uint64_t mask; - uint64_t bits; - const char *p; - - p = strchr(optarg, '='); - if (p == optarg) { - mask = KPF_ALL_BITS; - bits = parse_flag_names(p + 1, 0); - } else if (p) { - mask = parse_flag_names(optarg, 0); - bits = parse_flag_names(p + 1, 0); - } else if (strchr(optarg, '~')) { - mask = parse_flag_names(optarg, 1); - bits = parse_flag_names(optarg, 0); - } else { - mask = parse_flag_names(optarg, 0); - bits = KPF_ALL_BITS; - } - - add_bits_filter(mask, bits); -} - -static void describe_flags(const char *optarg) -{ - uint64_t flags = 
parse_flag_names(optarg, 0); - - printf("0x%016llx\t%s\t%s\n", - (unsigned long long)flags, - page_flag_name(flags), - page_flag_longname(flags)); -} - -static const struct option opts[] = { - { "raw" , 0, NULL, 'r' }, - { "pid" , 1, NULL, 'p' }, - { "file" , 1, NULL, 'f' }, - { "addr" , 1, NULL, 'a' }, - { "bits" , 1, NULL, 'b' }, - { "describe" , 1, NULL, 'd' }, - { "list" , 0, NULL, 'l' }, - { "list-each" , 0, NULL, 'L' }, - { "no-summary", 0, NULL, 'N' }, - { "hwpoison" , 0, NULL, 'X' }, - { "unpoison" , 0, NULL, 'x' }, - { "help" , 0, NULL, 'h' }, - { NULL , 0, NULL, 0 } -}; - -int main(int argc, char *argv[]) -{ - int c; - - page_size = getpagesize(); - - while ((c = getopt_long(argc, argv, - "rp:f:a:b:d:lLNXxh", opts, NULL)) != -1) { - switch (c) { - case 'r': - opt_raw = 1; - break; - case 'p': - parse_pid(optarg); - break; - case 'f': - parse_file(optarg); - break; - case 'a': - parse_addr_range(optarg); - break; - case 'b': - parse_bits_mask(optarg); - break; - case 'd': - describe_flags(optarg); - exit(0); - case 'l': - opt_list = 1; - break; - case 'L': - opt_list = 2; - break; - case 'N': - opt_no_summary = 1; - break; - case 'X': - opt_hwpoison = 1; - prepare_hwpoison_fd(); - break; - case 'x': - opt_unpoison = 1; - prepare_hwpoison_fd(); - break; - case 'h': - usage(); - exit(0); - default: - usage(); - exit(1); - } - } - - if (opt_list && opt_pid) - printf("voffset\t"); - if (opt_list == 1) - printf("offset\tlen\tflags\n"); - if (opt_list == 2) - printf("offset\tflags\n"); - - walk_addr_ranges(); - - if (opt_list == 1) - show_page_range(0, 0, 0); /* drain the buffer */ - - if (opt_no_summary) - return 0; - - if (opt_list) - printf("\n\n"); - - show_summary(); - - return 0; -} diff --git a/Documentation/vm/pagemap.txt b/Documentation/vm/pagemap.txt index df09b9650a81..4600cbe3d6be 100644 --- a/Documentation/vm/pagemap.txt +++ b/Documentation/vm/pagemap.txt @@ -60,6 +60,7 @@ There are three components to pagemap: 19. HWPOISON 20. NOPAGE 21. KSM + 22. 
THP Short descriptions to the page flags: @@ -97,6 +98,9 @@ Short descriptions to the page flags: 21. KSM identical memory pages dynamically shared between one or more processes +22. THP + contiguous pages which construct transparent hugepages + [IO related page flags] 1. ERROR IO error occurred 3. UPTODATE page has up-to-date data diff --git a/Documentation/vm/unevictable-lru.txt b/Documentation/vm/unevictable-lru.txt index 97bae3c576c2..fa206cccf89f 100644 --- a/Documentation/vm/unevictable-lru.txt +++ b/Documentation/vm/unevictable-lru.txt @@ -538,7 +538,7 @@ different reverse map mechanisms. process because mlocked pages are migratable. However, for reclaim, if the page is mapped into a VM_LOCKED VMA, the scan stops. - try_to_unmap_anon() attempts to acquire in read mode the mmap semphore of + try_to_unmap_anon() attempts to acquire in read mode the mmap semaphore of the mm_struct to which the VMA belongs. If this is successful, it will mlock the page via mlock_vma_page() - we wouldn't have gotten to try_to_unmap_anon() if the page were already mlocked - and will return @@ -619,11 +619,11 @@ all PTEs from the page. For this purpose, the unevictable/mlock infrastructure introduced a variant of try_to_unmap() called try_to_munlock(). try_to_munlock() calls the same functions as try_to_unmap() for anonymous and -mapped file pages with an additional argument specifing unlock versus unmap +mapped file pages with an additional argument specifying unlock versus unmap processing. Again, these functions walk the respective reverse maps looking for VM_LOCKED VMAs. When such a VMA is found for anonymous pages and file pages mapped in linear VMAs, as in the try_to_unmap() case, the functions -attempt to acquire the associated mmap semphore, mlock the page via +attempt to acquire the associated mmap semaphore, mlock the page via mlock_vma_page() and return SWAP_MLOCK. This effectively undoes the pre-clearing of the page's PG_mlocked done by munlock_vma_page. 
@@ -641,7 +641,7 @@ with it - the usual fallback position. Note that try_to_munlock()'s reverse map walk must visit every VMA in a page's reverse map to determine that a page is NOT mapped into any VM_LOCKED VMA. However, the scan can terminate when it encounters a VM_LOCKED VMA and can -successfully acquire the VMA's mmap semphore for read and mlock the page. +successfully acquire the VMA's mmap semaphore for read and mlock the page. Although try_to_munlock() might be called a great many times when munlocking a large region or tearing down a large address space that has been mlocked via mlockall(), overall this is a fairly rare event. diff --git a/Documentation/watchdog/00-INDEX b/Documentation/watchdog/00-INDEX deleted file mode 100644 index fc9082a1477a..000000000000 --- a/Documentation/watchdog/00-INDEX +++ /dev/null @@ -1,19 +0,0 @@ -00-INDEX - - this file. -convert_drivers_to_kernel_api.txt - - how-to for converting old watchdog drivers to the new kernel API. -hpwdt.txt - - information on the HP iLO2 NMI watchdog -pcwd-watchdog.txt - - documentation for Berkshire Products PC Watchdog ISA cards. -src/ - - directory holding watchdog related example programs. -watchdog-api.txt - - description of the Linux Watchdog driver API. -watchdog-kernel-api.txt - - description of the Linux WatchDog Timer Driver Core kernel API. -watchdog-parameters.txt - - information on driver parameters (for drivers other than - the ones that have driver-specific files here) -wdt.txt - - description of the Watchdog Timer Interfaces for Linux. 
diff --git a/Documentation/watchdog/convert_drivers_to_kernel_api.txt b/Documentation/watchdog/convert_drivers_to_kernel_api.txt index be8119bb15d2..271b8850dde7 100644 --- a/Documentation/watchdog/convert_drivers_to_kernel_api.txt +++ b/Documentation/watchdog/convert_drivers_to_kernel_api.txt @@ -59,6 +59,10 @@ Here is a overview of the functions and probably needed actions: WDIOC_GETTIMEOUT: No preparations needed + WDIOC_GETTIMELEFT: + It needs get_timeleft() callback to be defined. Otherwise it + will return EOPNOTSUPP + Other IOCTLs can be served using the ioctl-callback. Note that this is mainly intended for porting old drivers; new drivers should not invent private IOCTLs. Private IOCTLs are processed first. When the callback returns with diff --git a/Documentation/watchdog/watchdog-kernel-api.txt b/Documentation/watchdog/watchdog-kernel-api.txt index 4b93c28e35c6..227f6cd0e5fa 100644 --- a/Documentation/watchdog/watchdog-kernel-api.txt +++ b/Documentation/watchdog/watchdog-kernel-api.txt @@ -1,6 +1,6 @@ The Linux WatchDog Timer Driver Core kernel API. =============================================== -Last reviewed: 29-Nov-2011 +Last reviewed: 16-Mar-2012 Wim Van Sebroeck <wim@iguana.be> @@ -77,6 +77,7 @@ struct watchdog_ops { int (*ping)(struct watchdog_device *); unsigned int (*status)(struct watchdog_device *); int (*set_timeout)(struct watchdog_device *, unsigned int); + unsigned int (*get_timeleft)(struct watchdog_device *); long (*ioctl)(struct watchdog_device *, unsigned int, unsigned long); }; @@ -117,11 +118,13 @@ they are supported. These optional routines/operations are: status of the device is reported with watchdog WDIOF_* status flags/bits. * set_timeout: this routine checks and changes the timeout of the watchdog timer device. It returns 0 on success, -EINVAL for "parameter out of range" - and -EIO for "could not write value to the watchdog". 
On success the timeout - value of the watchdog_device will be changed to the value that was just used - to re-program the watchdog timer device. + and -EIO for "could not write value to the watchdog". On success this + routine should set the timeout value of the watchdog_device to the + achieved timeout value (which may be different from the requested one + because the watchdog does not necessarily has a 1 second resolution). (Note: the WDIOF_SETTIMEOUT needs to be set in the options field of the watchdog's info structure). +* get_timeleft: this routines returns the time that's left before a reset. * ioctl: if this routine is present then it will be called first before we do our own internal ioctl call handling. This routine should return -ENOIOCTLCMD if a command is not supported. The parameters that are passed to the ioctl @@ -167,4 +170,4 @@ driver specific data to and a pointer to the data itself. The watchdog_get_drvdata function allows you to retrieve driver specific data. The argument of this function is the watchdog device where you want to retrieve -data from. The function retruns the pointer to the driver specific data. +data from. The function returns the pointer to the driver specific data. 
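The set_timeout and get_timeleft semantics described in the hunks above can be sketched in plain userspace C. This is only an illustration under stated assumptions: the two structs are cut-down stand-ins for the kernel's struct watchdog_device and struct watchdog_ops, and the names demo_set_timeout, demo_get_timeleft, demo_query_timeleft, DEMO_WDT_STEP and the elapsed field are hypothetical and do not come from the patch. It shows a driver storing the achieved (rounded) timeout rather than the requested one, and a dispatcher returning -EOPNOTSUPP when no get_timeleft callback is defined.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified userspace stand-ins, NOT the kernel's real definitions. */
struct watchdog_device;

struct watchdog_ops {
	int (*set_timeout)(struct watchdog_device *, unsigned int);
	unsigned int (*get_timeleft)(struct watchdog_device *);
};

struct watchdog_device {
	const struct watchdog_ops *ops;
	unsigned int timeout;		/* currently programmed timeout, seconds */
	unsigned int min_timeout;
	unsigned int max_timeout;	/* assumed to be a multiple of DEMO_WDT_STEP */
	unsigned int elapsed;		/* hypothetical: seconds since the last ping */
};

/* Hypothetical hardware resolution: the timer only counts in 4 s steps. */
#define DEMO_WDT_STEP 4

static int demo_set_timeout(struct watchdog_device *wdd, unsigned int t)
{
	if (t < wdd->min_timeout || t > wdd->max_timeout)
		return -EINVAL;	/* "parameter out of range" */

	/* Round up to what the hardware can do, then record the
	 * achieved value, which may differ from the requested one. */
	t = (t + DEMO_WDT_STEP - 1) / DEMO_WDT_STEP * DEMO_WDT_STEP;
	wdd->timeout = t;
	return 0;
}

static unsigned int demo_get_timeleft(struct watchdog_device *wdd)
{
	/* Time left before a reset, clamped at zero. */
	if (wdd->elapsed >= wdd->timeout)
		return 0;
	return wdd->timeout - wdd->elapsed;
}

/* Mimics how a WDIOC_GETTIMELEFT request might be dispatched:
 * without a get_timeleft callback the request is not supported. */
static int demo_query_timeleft(struct watchdog_device *wdd, unsigned int *left)
{
	if (!wdd->ops->get_timeleft)
		return -EOPNOTSUPP;
	*left = wdd->ops->get_timeleft(wdd);
	return 0;
}

static const struct watchdog_ops demo_ops = {
	.set_timeout	= demo_set_timeout,
	.get_timeleft	= demo_get_timeleft,
};
```

With the assumed 4 second step, a request for 30 seconds is stored as 32, which is why callers should read the timeout back from the watchdog_device after set_timeout succeeds instead of assuming the requested value took effect.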
diff --git a/Documentation/zh_CN/HOWTO b/Documentation/zh_CN/HOWTO index faf976c0c731..7fba5aab9ef9 100644 --- a/Documentation/zh_CN/HOWTO +++ b/Documentation/zh_CN/HOWTO @@ -316,7 +316,7 @@ linux-kernel邮件列表中提供反馈,告诉大家你遇到了问题还是 git.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git 使用quilt管理的补丁集: - - USB, PCI, 驱动程序核心和I2C, Greg Kroah-Hartman <gregkh@suse.de> + - USB, PCI, 驱动程序核心和I2C, Greg Kroah-Hartman <gregkh@linuxfoundation.org> kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/ - x86-64, 部分i386, Andi Kleen <ak@suse.de> ftp.firstfloor.org:/pub/ak/x86_64/quilt/ diff --git a/Documentation/zh_CN/magic-number.txt b/Documentation/zh_CN/magic-number.txt index c278f412dc65..f606ba8598cf 100644 --- a/Documentation/zh_CN/magic-number.txt +++ b/Documentation/zh_CN/magic-number.txt @@ -89,7 +89,7 @@ TTY_DRIVER_MAGIC 0x5402 tty_driver include/linux/tty_driver.h MGSLPC_MAGIC 0x5402 mgslpc_info drivers/char/pcmcia/synclink_cs.c TTY_LDISC_MAGIC 0x5403 tty_ldisc include/linux/tty_ldisc.h USB_SERIAL_MAGIC 0x6702 usb_serial drivers/usb/serial/usb-serial.h -FULL_DUPLEX_MAGIC 0x6969 drivers/net/tulip/de2104x.c +FULL_DUPLEX_MAGIC 0x6969 drivers/net/ethernet/dec/tulip/de2104x.c USB_BLUETOOTH_MAGIC 0x6d02 usb_bluetooth drivers/usb/class/bluetty.c RFCOMM_TTY_MAGIC 0x6d02 net/bluetooth/rfcomm/tty.c USB_SERIAL_PORT_MAGIC 0x7301 usb_serial_port drivers/usb/serial/usb-serial.h