diff options
Diffstat (limited to 'Documentation')
22 files changed, 793 insertions, 223 deletions
diff --git a/Documentation/DMA-mapping.txt b/Documentation/DMA-mapping.txt index 028614cdd062..e07f2530326b 100644 --- a/Documentation/DMA-mapping.txt +++ b/Documentation/DMA-mapping.txt @@ -664,109 +664,6 @@ It is that simple. Well, not for some odd devices. See the next section for information about that. - DAC Addressing for Address Space Hungry Devices - -There exists a class of devices which do not mesh well with the PCI -DMA mapping API. By definition these "mappings" are a finite -resource. The number of total available mappings per bus is platform -specific, but there will always be a reasonable amount. - -What is "reasonable"? Reasonable means that networking and block I/O -devices need not worry about using too many mappings. - -As an example of a problematic device, consider compute cluster cards. -They can potentially need to access gigabytes of memory at once via -DMA. Dynamic mappings are unsuitable for this kind of access pattern. - -To this end we've provided a small API by which a device driver -may use DAC cycles to directly address all of physical memory. -Not all platforms support this, but most do. It is easy to determine -whether the platform will work properly at probe time. - -First, understand that there may be a SEVERE performance penalty for -using these interfaces on some platforms. Therefore, you MUST only -use these interfaces if it is absolutely required. %99 of devices can -use the normal APIs without any problems. - -Note that for streaming type mappings you must either use these -interfaces, or the dynamic mapping interfaces above. You may not mix -usage of both for the same device. Such an act is illegal and is -guaranteed to put a banana in your tailpipe. - -However, consistent mappings may in fact be used in conjunction with -these interfaces. Remember that, as defined, consistent mappings are -always going to be SAC addressable. - -The first thing your driver needs to do is query the PCI platform -layer if it is capable of handling your devices DAC addressing -capabilities: - - int pci_dac_dma_supported(struct pci_dev *hwdev, u64 mask); - -You may not use the following interfaces if this routine fails. - -Next, DMA addresses using this API are kept track of using the -dma64_addr_t type. It is guaranteed to be big enough to hold any -DAC address the platform layer will give to you from the following -routines. If you have consistent mappings as well, you still -use plain dma_addr_t to keep track of those. - -All mappings obtained here will be direct. The mappings are not -translated, and this is the purpose of this dialect of the DMA API. - -All routines work with page/offset pairs. This is the _ONLY_ way to -portably refer to any piece of memory. If you have a cpu pointer -(which may be validly DMA'd too) you may easily obtain the page -and offset using something like this: - - struct page *page = virt_to_page(ptr); - unsigned long offset = offset_in_page(ptr); - -Here are the interfaces: - - dma64_addr_t pci_dac_page_to_dma(struct pci_dev *pdev, - struct page *page, - unsigned long offset, - int direction); - -The DAC address for the tuple PAGE/OFFSET are returned. The direction -argument is the same as for pci_{map,unmap}_single(). The same rules -for cpu/device access apply here as for the streaming mapping -interfaces. To reiterate: - - The cpu may touch the buffer before pci_dac_page_to_dma. - The device may touch the buffer after pci_dac_page_to_dma - is made, but the cpu may NOT. - -When the DMA transfer is complete, invoke: - - void pci_dac_dma_sync_single_for_cpu(struct pci_dev *pdev, - dma64_addr_t dma_addr, - size_t len, int direction); - -This must be done before the CPU looks at the buffer again. -This interface behaves identically to pci_dma_sync_{single,sg}_for_cpu(). - -And likewise, if you wish to let the device get back at the buffer after -the cpu has read/written it, invoke: - - void pci_dac_dma_sync_single_for_device(struct pci_dev *pdev, - dma64_addr_t dma_addr, - size_t len, int direction); - -before letting the device access the DMA area again. - -If you need to get back to the PAGE/OFFSET tuple from a dma64_addr_t -the following interfaces are provided: - - struct page *pci_dac_dma_to_page(struct pci_dev *pdev, - dma64_addr_t dma_addr); - unsigned long pci_dac_dma_to_offset(struct pci_dev *pdev, - dma64_addr_t dma_addr); - -This is possible with the DAC interfaces purely because they are -not translated in any way. - Optimizing Unmap State Space Consumption On many platforms, pci_unmap_{single,page}() is simply a nop. diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl index 8c5698a8c2e1..46bcff2849bd 100644 --- a/Documentation/DocBook/kernel-api.tmpl +++ b/Documentation/DocBook/kernel-api.tmpl @@ -643,6 +643,60 @@ X!Idrivers/video/console/fonts.c !Edrivers/spi/spi.c </chapter> + <chapter id="i2c"> + <title>I<superscript>2</superscript>C and SMBus Subsystem</title> + + <para> + I<superscript>2</superscript>C (or without fancy typography, "I2C") + is an acronym for the "Inter-IC" bus, a simple bus protocol which is + widely used where low data rate communications suffice. + Since it's also a licensed trademark, some vendors use another + name (such as "Two-Wire Interface", TWI) for the same bus. + I2C only needs two signals (SCL for clock, SDA for data), conserving + board real estate and minimizing signal quality issues. + Most I2C devices use seven bit addresses, and bus speeds of up + to 400 kHz; there's a high speed extension (3.4 MHz) that's not yet + found wide use. + I2C is a multi-master bus; open drain signaling is used to + arbitrate between masters, as well as to handshake and to + synchronize clocks from slower clients. + </para> + + <para> + The Linux I2C programming interfaces support only the master + side of bus interactions, not the slave side. + The programming interface is structured around two kinds of driver, + and two kinds of device. + An I2C "Adapter Driver" abstracts the controller hardware; it binds + to a physical device (perhaps a PCI device or platform_device) and + exposes a <structname>struct i2c_adapter</structname> representing + each I2C bus segment it manages. + On each I2C bus segment will be I2C devices represented by a + <structname>struct i2c_client</structname>. Those devices will + be bound to a <structname>struct i2c_driver</structname>, + which should follow the standard Linux driver model. + (At this writing, a legacy model is more widely used.) + There are functions to perform various I2C protocol operations; at + this writing all such functions are usable only from task context. + </para> + + <para> + The System Management Bus (SMBus) is a sibling protocol. Most SMBus + systems are also I2C conformant. The electrical constraints are + tighter for SMBus, and it standardizes particular protocol messages + and idioms. Controllers that support I2C can also support most + SMBus operations, but SMBus controllers don't support all the protocol + options that an I2C controller will. + There are functions to perform various SMBus protocol operations, + either using I2C primitives or by issuing SMBus commands to + i2c_adapter devices which don't support those I2C operations. + </para> + +!Iinclude/linux/i2c.h +!Fdrivers/i2c/i2c-boardinfo.c i2c_register_board_info +!Edrivers/i2c/i2c-core.c + </chapter> + <chapter id="splice"> <title>splice API</title> <para>) @@ -654,4 +708,5 @@ X!Idrivers/video/console/fonts.c !Ffs/splice.c </chapter> + </book> diff --git a/Documentation/blackfin/kgdb.txt b/Documentation/blackfin/kgdb.txt new file mode 100644 index 000000000000..84f6a484ae9a --- /dev/null +++ b/Documentation/blackfin/kgdb.txt @@ -0,0 +1,155 @@ + A Simple Guide to Configure KGDB + + Sonic Zhang <sonic.zhang@analog.com> + Aug. 24th 2006 + + +This KGDB patch enables the kernel developer to do source level debugging on +the kernel for the Blackfin architecture. The debugging works over either the +ethernet interface or one of the uarts. Both software breakpoints and +hardware breakpoints are supported in this version. +http://docs.blackfin.uclinux.org/doku.php?id=kgdb + + +2 known issues: +1. This bug: + http://blackfin.uclinux.org/tracker/index.php?func=detail&aid=544&group_id=18&atid=145 + The GDB client for Blackfin uClinux causes incorrect values of local + variables to be displayed when the user breaks the running of kernel in GDB. +2. Because of a hardware bug in Blackfin 533 v1.0.3: + 05000067 - Watchpoints (Hardware Breakpoints) are not supported + Hardware breakpoints cannot be set properly. + + +Debug over Ethernet: + +1. Compile and install the cross platform version of gdb for blackfin, which + can be found at $(BINROOT)/bfin-elf-gdb. + +2. Apply this patch to the 2.6.x kernel. Select the menuconfig option under + "Kernel hacking" -> "Kernel debugging" -> "KGDB: kernel debug with remote gdb". + With this selected, option "Full Symbolic/Source Debugging support" and + "Compile the kernel with frame pointers" are also selected. + +3. Select option "KGDB: connect over (Ethernet)". Add "kgdboe=@target-IP/,@host-IP/" to + the option "Compiled-in Kernel Boot Parameter" under "Kernel hacking". + +4. Connect minicom to the serial port and boot the kernel image. + +5. Configure the IP "/> ifconfig eth0 target-IP" + +6. Start GDB client "bfin-elf-gdb vmlinux". + +7. Connect to the target "(gdb) target remote udp:target-IP:6443". + +8. Set software breakpoint "(gdb) break sys_open". + +9. Continue "(gdb) c". + +10. Run ls in the target console "/> ls". + +11. Breakpoint hits. "Breakpoint 1: sys_open(..." + +12. Display local variables and function paramters. + (*) This operation gives wrong results, see known issue 1. + +13. Single stepping "(gdb) si". + +14. Remove breakpoint 1. "(gdb) del 1" + +15. Set hardware breakpoint "(gdb) hbreak sys_open". + +16. Continue "(gdb) c". + +17. Run ls in the target console "/> ls". + +18. Hardware breakpoint hits. "Breakpoint 1: sys_open(...". + (*) This hardware breakpoint will not be hit, see known issue 2. + +19. Continue "(gdb) c". + +20. Interrupt the target in GDB "Ctrl+C". + +21. Detach from the target "(gdb) detach". + +22. Exit GDB "(gdb) quit". + + +Debug over the UART: + +1. Compile and install the cross platform version of gdb for blackfin, which + can be found at $(BINROOT)/bfin-elf-gdb. + +2. Apply this patch to the 2.6.x kernel. Select the menuconfig option under + "Kernel hacking" -> "Kernel debugging" -> "KGDB: kernel debug with remote gdb". + With this selected, option "Full Symbolic/Source Debugging support" and + "Compile the kernel with frame pointers" are also selected. + +3. Select option "KGDB: connect over (UART)". Set "KGDB: UART port number" to be + a different one from the console. Don't forget to change the mode of + blackfin serial driver to PIO. Otherwise kgdb works incorrectly on UART. + +4. If you want connect to kgdb when the kernel boots, enable + "KGDB: Wait for gdb connection early" + +5. Compile kernel. + +6. Connect minicom to the serial port of the console and boot the kernel image. + +7. Start GDB client "bfin-elf-gdb vmlinux". + +8. Set the baud rate in GDB "(gdb) set remotebaud 57600". + +9. Connect to the target on the second serial port "(gdb) target remote /dev/ttyS1". + +10. Set software breakpoint "(gdb) break sys_open". + +11. Continue "(gdb) c". + +12. Run ls in the target console "/> ls". + +13. A breakpoint is hit. "Breakpoint 1: sys_open(..." + +14. All other operations are the same as that in KGDB over Ethernet. + + +Debug over the same UART as console: + +1. Compile and install the cross platform version of gdb for blackfin, which + can be found at $(BINROOT)/bfin-elf-gdb. + +2. Apply this patch to the 2.6.x kernel. Select the menuconfig option under + "Kernel hacking" -> "Kernel debugging" -> "KGDB: kernel debug with remote gdb". + With this selected, option "Full Symbolic/Source Debugging support" and + "Compile the kernel with frame pointers" are also selected. + +3. Select option "KGDB: connect over UART". Set "KGDB: UART port number" to console. + Don't forget to change the mode of blackfin serial driver to PIO. + Otherwise kgdb works incorrectly on UART. + +4. If you want connect to kgdb when the kernel boots, enable + "KGDB: Wait for gdb connection early" + +5. Connect minicom to the serial port and boot the kernel image. + +6. (Optional) Ask target to wait for gdb connection by entering Ctrl+A. In minicom, you should enter Ctrl+A+A. + +7. Start GDB client "bfin-elf-gdb vmlinux". + +8. Set the baud rate in GDB "(gdb) set remotebaud 57600". + +9. Connect to the target "(gdb) target remote /dev/ttyS0". + +10. Set software breakpoint "(gdb) break sys_open". + +11. Continue "(gdb) c". Then enter Ctrl+C twice to stop GDB connection. + +12. Run ls in the target console "/> ls". Dummy string can be seen on the console. + +13. Then connect the gdb to target again. "(gdb) target remote /dev/ttyS0". + Now you will find a breakpoint is hit. "Breakpoint 1: sys_open(..." + +14. All other operations are the same as that in KGDB over Ethernet. The only + difference is that after continue command in GDB, please stop GDB + connection by 2 "Ctrl+C"s and connect again after breakpoints are hit or + Ctrl+A is entered. diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 3a159dac04f5..0599a0c7c026 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -262,25 +262,6 @@ Who: Richard Purdie <rpurdie@rpsys.net> --------------------------- -What: Multipath cached routing support in ipv4 -When: in 2.6.23 -Why: Code was merged, then submitter immediately disappeared leaving - us with no maintainer and lots of bugs. The code should not have - been merged in the first place, and many aspects of it's - implementation are blocking more critical core networking - development. It's marked EXPERIMENTAL and no distribution - enables it because it cause obscure crashes due to unfixable bugs - (interfaces don't return errors so memory allocation can't be - handled, calling contexts of these interfaces make handling - errors impossible too because they get called after we've - totally commited to creating a route object, for example). - This problem has existed for years and no forward progress - has ever been made, and nobody steps up to try and salvage - this code, so we're going to finally just get rid of it. -Who: David S. Miller <davem@davemloft.net> - ---------------------------- - What: read_dev_chars(), read_conf_data{,_lpm}() (s390 common I/O layer) When: December 2007 Why: These functions are a leftover from 2.4 times. They have several @@ -330,3 +311,18 @@ Who: Tejun Heo <htejun@gmail.com> --------------------------- +What: Legacy RTC drivers (under drivers/i2c/chips) +When: November 2007 +Why: Obsolete. We have a RTC subsystem with better drivers. +Who: Jean Delvare <khali@linux-fr.org> + +--------------------------- + +What: iptables SAME target +When: 1.1. 2008 +Files: net/ipv4/netfilter/ipt_SAME.c, include/linux/netfilter_ipv4/ipt_SAME.h +Why: Obsolete for multiple years now, NAT core provides the same behaviour. + Unfixable broken wrt. 32/64 bit cleanness. +Who: Patrick McHardy <kaber@trash.net> + +--------------------------- diff --git a/Documentation/firmware_class/firmware_sample_firmware_class.c b/Documentation/firmware_class/firmware_sample_firmware_class.c index 4994f1f28f8c..fba943aacf93 100644 --- a/Documentation/firmware_class/firmware_sample_firmware_class.c +++ b/Documentation/firmware_class/firmware_sample_firmware_class.c @@ -78,6 +78,7 @@ static CLASS_DEVICE_ATTR(loading, 0644, firmware_loading_show, firmware_loading_store); static ssize_t firmware_data_read(struct kobject *kobj, + struct bin_attribute *bin_attr, char *buffer, loff_t offset, size_t count) { struct class_device *class_dev = to_class_dev(kobj); @@ -88,6 +89,7 @@ static ssize_t firmware_data_read(struct kobject *kobj, return count; } static ssize_t firmware_data_write(struct kobject *kobj, + struct bin_attribute *bin_attr, char *buffer, loff_t offset, size_t count) { struct class_device *class_dev = to_class_dev(kobj); diff --git a/Documentation/i2c/busses/i2c-i801 b/Documentation/i2c/busses/i2c-i801 index c34f0db78a30..fe6406f2f9a6 100644 --- a/Documentation/i2c/busses/i2c-i801 +++ b/Documentation/i2c/busses/i2c-i801 @@ -5,8 +5,8 @@ Supported adapters: '810' and '810E' chipsets) * Intel 82801BA (ICH2 - part of the '815E' chipset) * Intel 82801CA/CAM (ICH3) - * Intel 82801DB (ICH4) (HW PEC supported, 32 byte buffer not supported) - * Intel 82801EB/ER (ICH5) (HW PEC supported, 32 byte buffer not supported) + * Intel 82801DB (ICH4) (HW PEC supported) + * Intel 82801EB/ER (ICH5) (HW PEC supported) * Intel 6300ESB * Intel 82801FB/FR/FW/FRW (ICH6) * Intel 82801G (ICH7) diff --git a/Documentation/i2c/busses/i2c-piix4 b/Documentation/i2c/busses/i2c-piix4 index 7cbe43fa2701..fa0c786a8bf5 100644 --- a/Documentation/i2c/busses/i2c-piix4 +++ b/Documentation/i2c/busses/i2c-piix4 @@ -6,7 +6,7 @@ Supported adapters: Datasheet: Publicly available at the Intel website * ServerWorks OSB4, CSB5, CSB6 and HT-1000 southbridges Datasheet: Only available via NDA from ServerWorks - * ATI IXP200, IXP300, IXP400 and SB600 southbridges + * ATI IXP200, IXP300, IXP400, SB600 and SB700 southbridges Datasheet: Not publicly available * Standard Microsystems (SMSC) SLC90E66 (Victory66) southbridge Datasheet: Publicly available at the SMSC website http://www.smsc.com diff --git a/Documentation/i2c/busses/i2c-taos-evm b/Documentation/i2c/busses/i2c-taos-evm new file mode 100644 index 000000000000..9146e33be6dd --- /dev/null +++ b/Documentation/i2c/busses/i2c-taos-evm @@ -0,0 +1,46 @@ +Kernel driver i2c-taos-evm + +Author: Jean Delvare <khali@linux-fr.org> + +This is a driver for the evaluation modules for TAOS I2C/SMBus chips. +The modules include an SMBus master with limited capabilities, which can +be controlled over the serial port. Virtually all evaluation modules +are supported, but a few lines of code need to be added for each new +module to instantiate the right I2C chip on the bus. Obviously, a driver +for the chip in question is also needed. + +Currently supported devices are: + +* TAOS TSL2550 EVM + +For addtional information on TAOS products, please see + http://www.taosinc.com/ + + +Using this driver +----------------- + +In order to use this driver, you'll need the serport driver, and the +inputattach tool, which is part of the input-utils package. The following +commands will tell the kernel that you have a TAOS EVM on the first +serial port: + +# modprobe serport +# inputattach --taos-evm /dev/ttyS0 + + +Technical details +----------------- + +Only 4 SMBus transaction types are supported by the TAOS evaluation +modules: +* Receive Byte +* Send Byte +* Read Byte +* Write Byte + +The communication protocol is text-based and pretty simple. It is +described in a PDF document on the CD which comes with the evaluation +module. The communication is rather slow, because the serial port has +to operate at 1200 bps. However, I don't think this is a big concern in +practice, as these modules are meant for evaluation and testing only. diff --git a/Documentation/i2c/chips/max6875 b/Documentation/i2c/chips/max6875 index 96fec562a8e9..a0cd8af2f408 100644 --- a/Documentation/i2c/chips/max6875 +++ b/Documentation/i2c/chips/max6875 @@ -99,7 +99,7 @@ And then read the data or - count = i2c_smbus_read_i2c_block_data(fd, 0x84, buffer); + count = i2c_smbus_read_i2c_block_data(fd, 0x84, 16, buffer); The block read should read 16 bytes. 0x84 is the block read command. diff --git a/Documentation/i2c/chips/x1205 b/Documentation/i2c/chips/x1205 deleted file mode 100644 index 09407c991fe5..000000000000 --- a/Documentation/i2c/chips/x1205 +++ /dev/null @@ -1,38 +0,0 @@ -Kernel driver x1205 -=================== - -Supported chips: - * Xicor X1205 RTC - Prefix: 'x1205' - Addresses scanned: none - Datasheet: http://www.intersil.com/cda/deviceinfo/0,1477,X1205,00.html - -Authors: - Karen Spearel <kas11@tampabay.rr.com>, - Alessandro Zummo <a.zummo@towertech.it> - -Description ------------ - -This module aims to provide complete access to the Xicor X1205 RTC. -Recently Xicor has merged with Intersil, but the chip is -still sold under the Xicor brand. - -This chip is located at address 0x6f and uses a 2-byte register addressing. -Two bytes need to be written to read a single register, while most -other chips just require one and take the second one as the data -to be written. To prevent corrupting unknown chips, the user must -explicitely set the probe parameter. - -example: - -modprobe x1205 probe=0,0x6f - -The module supports one more option, hctosys, which is used to set the -software clock from the x1205. On systems where the x1205 is the -only hardware rtc, this parameter could be used to achieve a correct -date/time earlier in the system boot sequence. - -example: - -modprobe x1205 probe=0,0x6f hctosys=1 diff --git a/Documentation/i2c/summary b/Documentation/i2c/summary index aea60bf7e8f0..003c7319b8c7 100644 --- a/Documentation/i2c/summary +++ b/Documentation/i2c/summary @@ -67,7 +67,6 @@ i2c-proc: The /proc/sys/dev/sensors interface for device (client) drivers Algorithm drivers ----------------- -i2c-algo-8xx: An algorithm for CPM's I2C device in Motorola 8xx processors (NOT BUILT BY DEFAULT) i2c-algo-bit: A bit-banging algorithm i2c-algo-pcf: A PCF 8584 style algorithm i2c-algo-ibm_ocp: An algorithm for the I2C device in IBM 4xx processors (NOT BUILT BY DEFAULT) @@ -81,6 +80,5 @@ i2c-pcf-epp: PCF8584 on a EPP parallel port (uses i2c-algo-pcf) (NOT mkpatch i2c-philips-par: Philips style parallel port adapter (uses i2c-algo-bit) i2c-adap-ibm_ocp: IBM 4xx processor I2C device (uses i2c-algo-ibm_ocp) (NOT BUILT BY DEFAULT) i2c-pport: Primitive parallel port adapter (uses i2c-algo-bit) -i2c-rpx: RPX board Motorola 8xx I2C device (uses i2c-algo-8xx) (NOT BUILT BY DEFAULT) i2c-velleman: Velleman K8000 parallel port adapter (uses i2c-algo-bit) diff --git a/Documentation/i2c/writing-clients b/Documentation/i2c/writing-clients index 3d8d36b0ad12..2c170032bf37 100644 --- a/Documentation/i2c/writing-clients +++ b/Documentation/i2c/writing-clients @@ -571,7 +571,7 @@ SMBus communication u8 command, u8 length, u8 *values); extern s32 i2c_smbus_read_i2c_block_data(struct i2c_client * client, - u8 command, u8 *values); + u8 command, u8 length, u8 *values); These ones were removed in Linux 2.6.10 because they had no users, but could be added back later if needed: diff --git a/Documentation/i386/zero-page.txt b/Documentation/i386/zero-page.txt index c04a421f4a7c..75b3680c41eb 100644 --- a/Documentation/i386/zero-page.txt +++ b/Documentation/i386/zero-page.txt @@ -37,6 +37,7 @@ Offset Type Description 0x1d0 unsigned long EFI memory descriptor map pointer 0x1d4 unsigned long EFI memory descriptor map size 0x1e0 unsigned long ALT_MEM_K, alternative mem check, in Kb +0x1e4 unsigned long Scratch field for the kernel setup code 0x1e8 char number of entries in E820MAP (below) 0x1e9 unsigned char number of entries in EDDBUF (below) 0x1ea unsigned char number of entries in EDD_MBR_SIG_BUFFER (below) diff --git a/Documentation/ia64/aliasing-test.c b/Documentation/ia64/aliasing-test.c index d485256ee1ce..773a814d4093 100644 --- a/Documentation/ia64/aliasing-test.c +++ b/Documentation/ia64/aliasing-test.c @@ -19,6 +19,7 @@ #include <sys/mman.h> #include <sys/stat.h> #include <unistd.h> +#include <linux/pci.h> int sum; @@ -34,13 +35,19 @@ int map_mem(char *path, off_t offset, size_t length, int touch) return -1; } + if (fnmatch("/proc/bus/pci/*", path, 0) == 0) { + rc = ioctl(fd, PCIIOC_MMAP_IS_MEM); + if (rc == -1) + perror("PCIIOC_MMAP_IS_MEM ioctl"); + } + addr = mmap(NULL, length, PROT_READ|PROT_WRITE, MAP_SHARED, fd, offset); if (addr == MAP_FAILED) return 1; if (touch) { c = (int *) addr; - while (c < (int *) (offset + length)) + while (c < (int *) (addr + length)) sum += *c++; } @@ -54,7 +61,7 @@ int map_mem(char *path, off_t offset, size_t length, int touch) return 0; } -int scan_sysfs(char *path, char *file, off_t offset, size_t length, int touch) +int scan_tree(char *path, char *file, off_t offset, size_t length, int touch) { struct dirent **namelist; char *name, *path2; @@ -93,7 +100,7 @@ int scan_sysfs(char *path, char *file, off_t offset, size_t length, int touch) } else { r = lstat(path2, &buf); if (r == 0 && S_ISDIR(buf.st_mode)) { - rc = scan_sysfs(path2, file, offset, length, touch); + rc = scan_tree(path2, file, offset, length, touch); if (rc < 0) return rc; } @@ -238,10 +245,15 @@ int main() else fprintf(stderr, "FAIL: /dev/mem 0x0-0x100000 not accessible\n"); - scan_sysfs("/sys/class/pci_bus", "legacy_mem", 0, 0xA0000, 1); - scan_sysfs("/sys/class/pci_bus", "legacy_mem", 0xA0000, 0x20000, 0); - scan_sysfs("/sys/class/pci_bus", "legacy_mem", 0xC0000, 0x40000, 1); - scan_sysfs("/sys/class/pci_bus", "legacy_mem", 0, 1024*1024, 0); + scan_tree("/sys/class/pci_bus", "legacy_mem", 0, 0xA0000, 1); + scan_tree("/sys/class/pci_bus", "legacy_mem", 0xA0000, 0x20000, 0); + scan_tree("/sys/class/pci_bus", "legacy_mem", 0xC0000, 0x40000, 1); + scan_tree("/sys/class/pci_bus", "legacy_mem", 0, 1024*1024, 0); scan_rom("/sys/devices", "rom"); + + scan_tree("/proc/bus/pci", "??.?", 0, 0xA0000, 1); + scan_tree("/proc/bus/pci", "??.?", 0xA0000, 0x20000, 0); + scan_tree("/proc/bus/pci", "??.?", 0xC0000, 0x40000, 1); + scan_tree("/proc/bus/pci", "??.?", 0, 1024*1024, 0); } diff --git a/Documentation/ia64/aliasing.txt b/Documentation/ia64/aliasing.txt index 9a431a7d0f5d..aa3e953f0f7b 100644 --- a/Documentation/ia64/aliasing.txt +++ b/Documentation/ia64/aliasing.txt @@ -112,6 +112,18 @@ POTENTIAL ATTRIBUTE ALIASING CASES The /dev/mem mmap constraints apply. + mmap of /proc/bus/pci/.../??.? + + This is an MMIO mmap of PCI functions, which additionally may or + may not be requested as using the WC attribute. + + If WC is requested, and the region in kern_memmap is either WC + or UC, and the EFI memory map designates the region as WC, then + the WC mapping is allowed. + + Otherwise, the user mapping must use the same attribute as the + kernel mapping. + read/write of /dev/mem This uses copy_from_user(), which implicitly uses a kernel diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index af6a63ab9026..09c184e41cf8 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -874,8 +874,7 @@ accept_redirects - BOOLEAN accept_source_route - INTEGER Accept source routing (routing extension header). - > 0: Accept routing header. - = 0: Accept only routing header type 2. + >= 0: Accept only routing header type 2. < 0: Do not accept routing header. Default: 0 diff --git a/Documentation/networking/l2tp.txt b/Documentation/networking/l2tp.txt new file mode 100644 index 000000000000..2451f551c505 --- /dev/null +++ b/Documentation/networking/l2tp.txt @@ -0,0 +1,169 @@ +This brief document describes how to use the kernel's PPPoL2TP driver +to provide L2TP functionality. L2TP is a protocol that tunnels one or +more PPP sessions over a UDP tunnel. It is commonly used for VPNs +(L2TP/IPSec) and by ISPs to tunnel subscriber PPP sessions over an IP +network infrastructure. + +Design +====== + +The PPPoL2TP driver, drivers/net/pppol2tp.c, provides a mechanism by +which PPP frames carried through an L2TP session are passed through +the kernel's PPP subsystem. The standard PPP daemon, pppd, handles all +PPP interaction with the peer. PPP network interfaces are created for +each local PPP endpoint. + +The L2TP protocol http://www.faqs.org/rfcs/rfc2661.html defines L2TP +control and data frames. L2TP control frames carry messages between +L2TP clients/servers and are used to setup / teardown tunnels and +sessions. An L2TP client or server is implemented in userspace and +will use a regular UDP socket per tunnel. L2TP data frames carry PPP +frames, which may be PPP control or PPP data. The kernel's PPP +subsystem arranges for PPP control frames to be delivered to pppd, +while data frames are forwarded as usual. + +Each tunnel and session within a tunnel is assigned a unique tunnel_id +and session_id. These ids are carried in the L2TP header of every +control and data packet. The pppol2tp driver uses them to lookup +internal tunnel and/or session contexts. Zero tunnel / session ids are +treated specially - zero ids are never assigned to tunnels or sessions +in the network. In the driver, the tunnel context keeps a pointer to +the tunnel UDP socket. The session context keeps a pointer to the +PPPoL2TP socket, as well as other data that lets the driver interface +to the kernel PPP subsystem. + +Note that the pppol2tp kernel driver handles only L2TP data frames; +L2TP control frames are simply passed up to userspace in the UDP +tunnel socket. The kernel handles all datapath aspects of the +protocol, including data packet resequencing (if enabled). + +There are a number of requirements on the userspace L2TP daemon in +order to use the pppol2tp driver. + +1. Use a UDP socket per tunnel. + +2. Create a single PPPoL2TP socket per tunnel bound to a special null + session id. This is used only for communicating with the driver but + must remain open while the tunnel is active. Opening this tunnel + management socket causes the driver to mark the tunnel socket as an + L2TP UDP encapsulation socket and flags it for use by the + referenced tunnel id. This hooks up the UDP receive path via + udp_encap_rcv() in net/ipv4/udp.c. PPP data frames are never passed + in this special PPPoX socket. + +3. Create a PPPoL2TP socket per L2TP session. This is typically done + by starting pppd with the pppol2tp plugin and appropriate + arguments. A PPPoL2TP tunnel management socket (Step 2) must be + created before the first PPPoL2TP session socket is created. + +When creating PPPoL2TP sockets, the application provides information +to the driver about the socket in a socket connect() call. Source and +destination tunnel and session ids are provided, as well as the file +descriptor of a UDP socket. See struct pppol2tp_addr in +include/linux/if_ppp.h. Note that zero tunnel / session ids are +treated specially. When creating the per-tunnel PPPoL2TP management +socket in Step 2 above, zero source and destination session ids are +specified, which tells the driver to prepare the supplied UDP file +descriptor for use as an L2TP tunnel socket. + +Userspace may control behavior of the tunnel or session using +setsockopt and ioctl on the PPPoX socket. The following socket +options are supported:- + +DEBUG - bitmask of debug message categories. See below. +SENDSEQ - 0 => don't send packets with sequence numbers + 1 => send packets with sequence numbers +RECVSEQ - 0 => receive packet sequence numbers are optional + 1 => drop receive packets without sequence numbers +LNSMODE - 0 => act as LAC. + 1 => act as LNS. +REORDERTO - reorder timeout (in millisecs). If 0, don't try to reorder. + +Only the DEBUG option is supported by the special tunnel management +PPPoX socket. + +In addition to the standard PPP ioctls, a PPPIOCGL2TPSTATS is provided +to retrieve tunnel and session statistics from the kernel using the +PPPoX socket of the appropriate tunnel or session. + +Debugging +========= + +The driver supports a flexible debug scheme where kernel trace +messages may be optionally enabled per tunnel and per session. Care is +needed when debugging a live system since the messages are not +rate-limited and a busy system could be swamped. Userspace uses +setsockopt on the PPPoX socket to set a debug mask. + +The following debug mask bits are available: + +PPPOL2TP_MSG_DEBUG verbose debug (if compiled in) +PPPOL2TP_MSG_CONTROL userspace - kernel interface +PPPOL2TP_MSG_SEQ sequence numbers handling +PPPOL2TP_MSG_DATA data packets + +Sample Userspace Code +===================== + +1. Create tunnel management PPPoX socket + + kernel_fd = socket(AF_PPPOX, SOCK_DGRAM, PX_PROTO_OL2TP); + if (kernel_fd >= 0) { + struct sockaddr_pppol2tp sax; + struct sockaddr_in const *peer_addr; + + peer_addr = l2tp_tunnel_get_peer_addr(tunnel); + memset(&sax, 0, sizeof(sax)); + sax.sa_family = AF_PPPOX; + sax.sa_protocol = PX_PROTO_OL2TP; + sax.pppol2tp.fd = udp_fd; /* fd of tunnel UDP socket */ + sax.pppol2tp.addr.sin_addr.s_addr = peer_addr->sin_addr.s_addr; + sax.pppol2tp.addr.sin_port = peer_addr->sin_port; + sax.pppol2tp.addr.sin_family = AF_INET; + sax.pppol2tp.s_tunnel = tunnel_id; + sax.pppol2tp.s_session = 0; /* special case: mgmt socket */ + sax.pppol2tp.d_tunnel = 0; + sax.pppol2tp.d_session = 0; /* special case: mgmt socket */ + + if(connect(kernel_fd, (struct sockaddr *)&sax, sizeof(sax) ) < 0 ) { + perror("connect failed"); + result = -errno; + goto err; + } + } + +2. Create session PPPoX data socket + + struct sockaddr_pppol2tp sax; + int fd; + + /* Note, the target socket must be bound already, else it will not be ready */ + sax.sa_family = AF_PPPOX; + sax.sa_protocol = PX_PROTO_OL2TP; + sax.pppol2tp.fd = tunnel_fd; + sax.pppol2tp.addr.sin_addr.s_addr = addr->sin_addr.s_addr; + sax.pppol2tp.addr.sin_port = addr->sin_port; + sax.pppol2tp.addr.sin_family = AF_INET; + sax.pppol2tp.s_tunnel = tunnel_id; + sax.pppol2tp.s_session = session_id; + sax.pppol2tp.d_tunnel = peer_tunnel_id; + sax.pppol2tp.d_session = peer_session_id; + + /* session_fd is the fd of the session's PPPoL2TP socket. + * tunnel_fd is the fd of the tunnel UDP socket. + */ + fd = connect(session_fd, (struct sockaddr *)&sax, sizeof(sax)); + if (fd < 0 ) { + return -errno; + } + return 0; + +Miscellanous +============ + +The PPPoL2TP driver was developed as part of the OpenL2TP project by +Katalix Systems Ltd. OpenL2TP is a full-featured L2TP client / server, +designed from the ground up to have the L2TP datapath in the +kernel. The project also implemented the pppol2tp plugin for pppd +which allows pppd to use the kernel driver. Details can be found at +http://openl2tp.sourceforge.net. diff --git a/Documentation/networking/multiqueue.txt b/Documentation/networking/multiqueue.txt new file mode 100644 index 000000000000..00b60cce2224 --- /dev/null +++ b/Documentation/networking/multiqueue.txt @@ -0,0 +1,111 @@ + + HOWTO for multiqueue network device support + =========================================== + +Section 1: Base driver requirements for implementing multiqueue support +Section 2: Qdisc support for multiqueue devices +Section 3: Brief howto using PRIO or RR for multiqueue devices + + +Intro: Kernel support for multiqueue devices +--------------------------------------------------------- + +Kernel support for multiqueue devices is only an API that is presented to the +netdevice layer for base drivers to implement. This feature is part of the +core networking stack, and all network devices will be running on the +multiqueue-aware stack. If a base driver only has one queue, then these +changes are transparent to that driver. + + +Section 1: Base driver requirements for implementing multiqueue support +----------------------------------------------------------------------- + +Base drivers are required to use the new alloc_etherdev_mq() or +alloc_netdev_mq() functions to allocate the subqueues for the device. The +underlying kernel API will take care of the allocation and deallocation of +the subqueue memory, as well as netdev configuration of where the queues +exist in memory. + +The base driver will also need to manage the queues as it does the global +netdev->queue_lock today. Therefore base drivers should use the +netif_{start|stop|wake}_subqueue() functions to manage each queue while the +device is still operational. netdev->queue_lock is still used when the device +comes online or when it's completely shut down (unregister_netdev(), etc.). + +Finally, the base driver should indicate that it is a multiqueue device. The +feature flag NETIF_F_MULTI_QUEUE should be added to the netdev->features +bitmap on device initialization. Below is an example from e1000: + +#ifdef CONFIG_E1000_MQ + if ( (adapter->hw.mac.type == e1000_82571) || + (adapter->hw.mac.type == e1000_82572) || + (adapter->hw.mac.type == e1000_80003es2lan)) + netdev->features |= NETIF_F_MULTI_QUEUE; +#endif + + +Section 2: Qdisc support for multiqueue devices +----------------------------------------------- + +Currently two qdiscs support multiqueue devices. A new round-robin qdisc, +sch_rr, and sch_prio. The qdisc is responsible for classifying the skb's to +bands and queues, and will store the queue mapping into skb->queue_mapping. +Use this field in the base driver to determine which queue to send the skb +to. + +sch_rr has been added for hardware that doesn't want scheduling policies from +software, so it's a straight round-robin qdisc. It uses the same syntax and +classification priomap that sch_prio uses, so it should be intuitive to +configure for people who've used sch_prio. + +The PRIO qdisc naturally plugs into a multiqueue device. If PRIO has been +built with NET_SCH_PRIO_MQ, then upon load, it will make sure the number of +bands requested is equal to the number of queues on the hardware. If they +are equal, it sets a one-to-one mapping up between the queues and bands. If +they're not equal, it will not load the qdisc. This is the same behavior +for RR. Once the association is made, any skb that is classified will have +skb->queue_mapping set, which will allow the driver to properly queue skb's +to multiple queues. + + +Section 3: Brief howto using PRIO and RR for multiqueue devices +--------------------------------------------------------------- + +The userspace command 'tc,' part of the iproute2 package, is used to configure +qdiscs. To add the PRIO qdisc to your network device, assuming the device is +called eth0, run the following command: + +# tc qdisc add dev eth0 root handle 1: prio bands 4 multiqueue + +This will create 4 bands, 0 being highest priority, and associate those bands +to the queues on your NIC. Assuming eth0 has 4 Tx queues, the band mapping +would look like: + +band 0 => queue 0 +band 1 => queue 1 +band 2 => queue 2 +band 3 => queue 3 + +Traffic will begin flowing through each queue if your TOS values are assigning +traffic across the various bands. For example, ssh traffic will always try to +go out band 0 based on TOS -> Linux priority conversion (realtime traffic), +so it will be sent out queue 0. ICMP traffic (pings) fall into the "normal" +traffic classification, which is band 1. Therefore pings will be send out +queue 1 on the NIC. + +Note the use of the multiqueue keyword. This is only in versions of iproute2 +that support multiqueue networking devices; if this is omitted when loading +a qdisc onto a multiqueue device, the qdisc will load and operate the same +if it were loaded onto a single-queue device (i.e. - sends all traffic to +queue 0). + +Another alternative to multiqueue band allocation can be done by using the +multiqueue option and specify 0 bands. If this is the case, the qdisc will +allocate the number of bands to equal the number of queues that the device +reports, and bring the qdisc online. + +The behavior of tc filters remains the same, where it will override TOS priority +classification. + + +Author: Peter P. Waskiewicz Jr. <peter.p.waskiewicz.jr@intel.com> diff --git a/Documentation/networking/netdevices.txt b/Documentation/networking/netdevices.txt index ce1361f95243..37869295fc70 100644 --- a/Documentation/networking/netdevices.txt +++ b/Documentation/networking/netdevices.txt @@ -20,6 +20,30 @@ private data which gets freed when the network device is freed. If separately allocated data is attached to the network device (dev->priv) then it is up to the module exit handler to free that. +MTU +=== +Each network device has a Maximum Transfer Unit. The MTU does not +include any link layer protocol overhead. Upper layer protocols must +not pass a socket buffer (skb) to a device to transmit with more data +than the mtu. The MTU does not include link layer header overhead, so +for example on Ethernet if the standard MTU is 1500 bytes used, the +actual skb will contain up to 1514 bytes because of the Ethernet +header. Devices should allow for the 4 byte VLAN header as well. + +Segmentation Offload (GSO, TSO) is an exception to this rule. The +upper layer protocol may pass a large socket buffer to the device +transmit routine, and the device will break that up into separate +packets based on the current MTU. + +MTU is symmetrical and applies both to receive and transmit. A device +must be able to receive at least the maximum size packet allowed by +the MTU. A network device may use the MTU as mechanism to size receive +buffers, but the device should allow packets with VLAN header. With +standard Ethernet mtu of 1500 bytes, the device should allow up to +1518 byte packets (1500 + 14 header + 4 tag). The device may either: +drop, truncate, or pass up oversize packets, but dropping oversize +packets is preferred. + struct net_device synchronization rules ======================================= @@ -43,16 +67,17 @@ dev->get_stats: dev->hard_start_xmit: Synchronization: netif_tx_lock spinlock. + When the driver sets NETIF_F_LLTX in dev->features this will be called without holding netif_tx_lock. In this case the driver has to lock by itself when needed. It is recommended to use a try lock - for this and return -1 when the spin lock fails. + for this and return NETDEV_TX_LOCKED when the spin lock fails. The locking there should also properly protect against - set_multicast_list - Context: Process with BHs disabled or BH (timer). - Notes: netif_queue_stopped() is guaranteed false - Interrupts must be enabled when calling hard_start_xmit. - (Interrupts must also be enabled when enabling the BH handler.) + set_multicast_list. + + Context: Process with BHs disabled or BH (timer), + will be called with interrupts disabled by netconsole. + Return codes: o NETDEV_TX_OK everything ok. o NETDEV_TX_BUSY Cannot transmit packet, try later @@ -74,4 +99,5 @@ dev->poll: Synchronization: __LINK_STATE_RX_SCHED bit in dev->state. See dev_close code and comments in net/core/dev.c for more info. Context: softirq + will be called with interrupts disabled by netconsole. diff --git a/Documentation/pci.txt b/Documentation/pci.txt index d38261b67905..7754f5aea4e9 100644 --- a/Documentation/pci.txt +++ b/Documentation/pci.txt @@ -113,9 +113,6 @@ initialization with a pointer to a structure describing the driver (Please see Documentation/power/pci.txt for descriptions of PCI Power Management and the related functions.) - enable_wake Enable device to generate wake events from a low power - state. - shutdown Hook into reboot_notifier_list (kernel/sys.c). Intended to stop any idling DMA operations. Useful for enabling wake-on-lan (NIC) or changing @@ -299,7 +296,10 @@ If the PCI device can use the PCI Memory-Write-Invalidate transaction, call pci_set_mwi(). This enables the PCI_COMMAND bit for Mem-Wr-Inval and also ensures that the cache line size register is set correctly. Check the return value of pci_set_mwi() as not all architectures -or chip-sets may support Memory-Write-Invalidate. +or chip-sets may support Memory-Write-Invalidate. Alternatively, +if Mem-Wr-Inval would be nice to have but is not required, call +pci_try_set_mwi() to have the system do its best effort at enabling +Mem-Wr-Inval. 3.2 Request MMIO/IOP resources diff --git a/Documentation/power/pci.txt b/Documentation/power/pci.txt index e00b099a4b86..dd8fe43888d3 100644 --- a/Documentation/power/pci.txt +++ b/Documentation/power/pci.txt @@ -164,7 +164,6 @@ struct pci_driver: int (*suspend) (struct pci_dev *dev, pm_message_t state); int (*resume) (struct pci_dev *dev); - int (*enable_wake) (struct pci_dev *dev, pci_power_t state, int enable); suspend @@ -251,42 +250,6 @@ The driver should update the current_state field in its pci_dev structure in this function, except for PM-capable devices when pci_set_power_state is used. -enable_wake ------------ - -Usage: - -if (dev->driver && dev->driver->enable_wake) - dev->driver->enable_wake(dev,state,enable); - -This callback is generally only relevant for devices that support the PCI PM -spec and have the ability to generate a PME# (Power Management Event Signal) -to wake the system up. (However, it is possible that a device may support -some non-standard way of generating a wake event on sleep.) - -Bits 15:11 of the PMC (Power Mgmt Capabilities) Register in a device's -PM Capabilities describe what power states the device supports generating a -wake event from: - -+------------------+ -| Bit | State | -+------------------+ -| 11 | D0 | -| 12 | D1 | -| 13 | D2 | -| 14 | D3hot | -| 15 | D3cold | -+------------------+ - -A device can use this to enable wake events: - - pci_enable_wake(dev,state,enable); - -Note that to enable PME# from D3cold, a value of 4 should be passed to -pci_enable_wake (since it uses an index into a bitmask). If a driver gets -a request to enable wake events from D3, two calls should be made to -pci_enable_wake (one for both D3hot and D3cold). - A reference implementation ------------------------- diff --git a/Documentation/sysfs-rules.txt b/Documentation/sysfs-rules.txt new file mode 100644 index 000000000000..42861bb0bc9b --- /dev/null +++ b/Documentation/sysfs-rules.txt @@ -0,0 +1,166 @@ +Rules on how to access information in the Linux kernel sysfs + +The kernel exported sysfs exports internal kernel implementation-details +and depends on internal kernel structures and layout. It is agreed upon +by the kernel developers that the Linux kernel does not provide a stable +internal API. As sysfs is a direct export of kernel internal +structures, the sysfs interface can not provide a stable interface eighter, +it may always change along with internal kernel changes. + +To minimize the risk of breaking users of sysfs, which are in most cases +low-level userspace applications, with a new kernel release, the users +of sysfs must follow some rules to use an as abstract-as-possible way to +access this filesystem. The current udev and HAL programs already +implement this and users are encouraged to plug, if possible, into the +abstractions these programs provide instead of accessing sysfs +directly. + +But if you really do want or need to access sysfs directly, please follow +the following rules and then your programs should work with future +versions of the sysfs interface. + +- Do not use libsysfs + It makes assumptions about sysfs which are not true. Its API does not + offer any abstraction, it exposes all the kernel driver-core + implementation details in its own API. Therefore it is not better than + reading directories and opening the files yourself. + Also, it is not actively maintained, in the sense of reflecting the + current kernel-development. The goal of providing a stable interface + to sysfs has failed, it causes more problems, than it solves. It + violates many of the rules in this document. + +- sysfs is always at /sys + Parsing /proc/mounts is a waste of time. Other mount points are a + system configuration bug you should not try to solve. For test cases, + possibly support a SYSFS_PATH environment variable to overwrite the + applications behavior, but never try to search for sysfs. Never try + to mount it, if you are not an early boot script. + +- devices are only "devices" + There is no such thing like class-, bus-, physical devices, + interfaces, and such that you can rely on in userspace. Everything is + just simply a "device". Class-, bus-, physical, ... types are just + kernel implementation details, which should not be expected by + applications that look for devices in sysfs. + + The properties of a device are: + o devpath (/devices/pci0000:00/0000:00:1d.1/usb2/2-2/2-2:1.0) + - identical to the DEVPATH value in the event sent from the kernel + at device creation and removal + - the unique key to the device at that point in time + - the kernels path to the device-directory without the leading + /sys, and always starting with with a slash + - all elements of a devpath must be real directories. Symlinks + pointing to /sys/devices must always be resolved to their real + target, and the target path must be used to access the device. + That way the devpath to the device matches the devpath of the + kernel used at event time. + - using or exposing symlink values as elements in a devpath string + is a bug in the application + + o kernel name (sda, tty, 0000:00:1f.2, ...) + - a directory name, identical to the last element of the devpath + - applications need to handle spaces and characters like '!' in + the name + + o subsystem (block, tty, pci, ...) + - simple string, never a path or a link + - retrieved by reading the "subsystem"-link and using only the + last element of the target path + + o driver (tg3, ata_piix, uhci_hcd) + - a simple string, which may contain spaces, never a path or a + link + - it is retrieved by reading the "driver"-link and using only the + last element of the target path + - devices which do not have "driver"-link, just do not have a + driver; copying the driver value in a child device context, is a + bug in the application + + o attributes + - the files in the device directory or files below a subdirectories + of the same device directory + - accessing attributes reached by a symlink pointing to another device, + like the "device"-link, is a bug in the application + + Everything else is just a kernel driver-core implementation detail, + that should not be assumed to be stable across kernel releases. + +- Properties of parent devices never belong into a child device. + Always look at the parent devices themselves for determining device + context properties. If the device 'eth0' or 'sda' does not have a + "driver"-link, then this device does not have a driver. Its value is empty. + Never copy any property of the parent-device into a child-device. Parent + device-properties may change dynamically without any notice to the + child device. + +- Hierarchy in a single device-tree + There is only one valid place in sysfs where hierarchy can be examined + and this is below: /sys/devices. + It is planned, that all device directories will end up in the tree + below this directory. + +- Classification by subsystem + There are currently three places for classification of devices: + /sys/block, /sys/class and /sys/bus. It is planned that these will + not contain any device-directories themselves, but only flat lists of + symlinks pointing to the unified /sys/devices tree. + All three places have completely different rules on how to access + device information. It is planned to merge all three + classification-directories into one place at /sys/subsystem, + following the layout of the bus-directories. All buses and + classes, including the converted block-subsystem, will show up + there. + The devices belonging to a subsystem will create a symlink in the + "devices" directory at /sys/subsystem/<name>/devices. + + If /sys/subsystem exists, /sys/bus, /sys/class and /sys/block can be + ignored. If it does not exist, you have always to scan all three + places, as the kernel is free to move a subsystem from one place to + the other, as long as the devices are still reachable by the same + subsystem name. + + Assuming /sys/class/<subsystem> and /sys/bus/<subsystem>, or + /sys/block and /sys/class/block are not interchangeable, is a bug in + the application. + +- Block + The converted block-subsystem at /sys/class/block, or + /sys/subsystem/block will contain the links for disks and partitions + at the same level, never in a hierarchy. Assuming the block-subsytem to + contain only disks and not partition-devices in the same flat list is + a bug in the application. + +- "device"-link and <subsystem>:<kernel name>-links + Never depend on the "device"-link. The "device"-link is a workaround + for the old layout, where class-devices are not created in + /sys/devices/ like the bus-devices. If the link-resolving of a + device-directory does not end in /sys/devices/, you can use the + "device"-link to find the parent devices in /sys/devices/. That is the + single valid use of the "device"-link, it must never appear in any + path as an element. Assuming the existence of the "device"-link for + a device in /sys/devices/ is a bug in the application. + Accessing /sys/class/net/eth0/device is a bug in the application. + + Never depend on the class-specific links back to the /sys/class + directory. These links are also a workaround for the design mistake + that class-devices are not created in /sys/devices. If a device + directory does not contain directories for child devices, these links + may be used to find the child devices in /sys/class. That is the single + valid use of these links, they must never appear in any path as an + element. Assuming the existence of these links for devices which are + real child device directories in the /sys/devices tree, is a bug in + the application. + + It is planned to remove all these links when when all class-device + directories live in /sys/devices. + +- Position of devices along device chain can change. + Never depend on a specific parent device position in the devpath, + or the chain of parent devices. The kernel is free to insert devices into + the chain. You must always request the parent device you are looking for + by its subsystem value. You need to walk up the chain until you find + the device that matches the expected subsystem. Depending on a specific + position of a parent device, or exposing relative paths, using "../" to + access the chain of parents, is a bug in the application. + |