From 099f53cb50e45ef617a9f1d63ceec799e489418b Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Wed, 8 Apr 2009 14:28:37 -0700 Subject: async_tx: rename zero_sum to val 'zero_sum' does not properly describe the operation of generating parity and checking that it validates against an existing buffer. Change the name of the operation to 'val' (for 'validate'). This is in anticipation of the p+q case where it is a requirement to identify the target parity buffers separately from the source buffers, because the target parity buffers will not have corresponding pq coefficients. Reviewed-by: Andre Noll Acked-by: Maciej Sosnowski Signed-off-by: Dan Williams --- Documentation/crypto/async-tx-api.txt | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) (limited to 'Documentation') diff --git a/Documentation/crypto/async-tx-api.txt b/Documentation/crypto/async-tx-api.txt index 9f59fcbf5d82..4af12180d191 100644 --- a/Documentation/crypto/async-tx-api.txt +++ b/Documentation/crypto/async-tx-api.txt @@ -61,13 +61,13 @@ async_(, void *callback_parameter); 3.2 Supported operations: -memcpy - memory copy between a source and a destination buffer -memset - fill a destination buffer with a byte value -xor - xor a series of source buffers and write the result to a - destination buffer -xor_zero_sum - xor a series of source buffers and set a flag if the - result is zero. The implementation attempts to prevent - writes to memory +memcpy - memory copy between a source and a destination buffer +memset - fill a destination buffer with a byte value +xor - xor a series of source buffers and write the result to a + destination buffer +xor_val - xor a series of source buffers and set a flag if the + result is zero. The implementation attempts to prevent + writes to memory 3.3 Descriptor management: The return value is non-NULL and points to a 'descriptor' when the operation -- cgit v1.2.1 From 88ba2aa586c874681c072101287e15d40de7e6e2 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Thu, 9 Apr 2009 16:16:18 -0700 Subject: async_tx: kill ASYNC_TX_DEP_ACK flag In support of inter-channel chaining async_tx utilizes an ack flag to gate whether a dependent operation can be chained to another. While the flag is not set the chain can be considered open for appending. Setting the ack flag closes the chain and flags the descriptor for garbage collection. The ASYNC_TX_DEP_ACK flag essentially means "close the chain after adding this dependency". Since each operation can only have one child the api now implicitly sets the ack flag at dependency submission time. This removes an unnecessary management burden from clients of the api. [ Impact: clean up and enforce one dependency per operation ] Reviewed-by: Andre Noll Acked-by: Maciej Sosnowski Signed-off-by: Dan Williams --- Documentation/crypto/async-tx-api.txt | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) (limited to 'Documentation') diff --git a/Documentation/crypto/async-tx-api.txt b/Documentation/crypto/async-tx-api.txt index 4af12180d191..76feda8541dc 100644 --- a/Documentation/crypto/async-tx-api.txt +++ b/Documentation/crypto/async-tx-api.txt @@ -80,8 +80,8 @@ acknowledged by the application before the offload engine driver is allowed to recycle (or free) the descriptor. A descriptor can be acked by one of the following methods: 1/ setting the ASYNC_TX_ACK flag if no child operations are to be submitted -2/ setting the ASYNC_TX_DEP_ACK flag to acknowledge the parent - descriptor of a new operation. +2/ submitting an unacknowledged descriptor as a dependency to another + async_tx call will implicitly set the acknowledged state. 3/ calling async_tx_ack() on the descriptor. 3.4 When does the operation execute? @@ -136,10 +136,9 @@ int run_xor_copy_xor(struct page **xor_srcs, tx = async_xor(xor_dest, xor_srcs, 0, xor_src_cnt, xor_len, ASYNC_TX_XOR_DROP_DST, NULL, NULL, NULL); - tx = async_memcpy(copy_dest, copy_src, 0, 0, copy_len, - ASYNC_TX_DEP_ACK, tx, NULL, NULL); + tx = async_memcpy(copy_dest, copy_src, 0, 0, copy_len, tx, NULL, NULL); tx = async_xor(xor_dest, xor_srcs, 0, xor_src_cnt, xor_len, - ASYNC_TX_XOR_DROP_DST | ASYNC_TX_DEP_ACK | ASYNC_TX_ACK, + ASYNC_TX_XOR_DROP_DST | ASYNC_TX_ACK, tx, complete_xor_copy_xor, NULL); async_tx_issue_pending_all(); -- cgit v1.2.1 From a08abd8ca890a377521d65d493d174bebcaf694b Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Wed, 3 Jun 2009 11:43:59 -0700 Subject: async_tx: structify submission arguments, add scribble Prepare the api for the arrival of a new parameter, 'scribble'. This will allow callers to identify scratchpad memory for dma address or page address conversions. As this adds yet another parameter, take this opportunity to convert the common submission parameters (flags, dependency, callback, and callback argument) into an object that is passed by reference. Also, take this opportunity to fix up the kerneldoc and add notes about the relevant ASYNC_TX_* flags for each routine. [ Impact: moves api pass-by-value parameters to a pass-by-reference struct ] Signed-off-by: Andre Noll Acked-by: Maciej Sosnowski Signed-off-by: Dan Williams --- Documentation/crypto/async-tx-api.txt | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) (limited to 'Documentation') diff --git a/Documentation/crypto/async-tx-api.txt b/Documentation/crypto/async-tx-api.txt index 76feda8541dc..dfe0475f7919 100644 --- a/Documentation/crypto/async-tx-api.txt +++ b/Documentation/crypto/async-tx-api.txt @@ -54,11 +54,7 @@ features surfaced as a result: 3.1 General format of the API: struct dma_async_tx_descriptor * -async_(, - enum async_tx_flags flags, - struct dma_async_tx_descriptor *dependency, - dma_async_tx_callback callback_routine, - void *callback_parameter); +async_(, struct async_submit ctl *submit) 3.2 Supported operations: memcpy - memory copy between a source and a destination buffer -- cgit v1.2.1 From 04ce9ab385dc97eb55299d533cd3af79b8fc7529 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Wed, 3 Jun 2009 14:22:28 -0700 Subject: async_xor: permit callers to pass in a 'dma/page scribble' region async_xor() needs space to perform dma and page address conversions. In most cases the code can simply reuse the struct page * array because the size of the native pointer matches the size of a dma/page address. In order to support archs where sizeof(dma_addr_t) is larger than sizeof(struct page *), or to preserve the input parameters, we utilize a memory region passed in by the caller. Since the code is now prepared to handle the case where it cannot perform address conversions on the stack, we no longer need the !HIGHMEM64G dependency in drivers/dma/Kconfig. [ Impact: don't clobber input buffers for address conversions ] Reviewed-by: Andre Noll Acked-by: Maciej Sosnowski Signed-off-by: Dan Williams --- Documentation/crypto/async-tx-api.txt | 43 +++++++++++++++++++++++------------ 1 file changed, 28 insertions(+), 15 deletions(-) (limited to 'Documentation') diff --git a/Documentation/crypto/async-tx-api.txt b/Documentation/crypto/async-tx-api.txt index dfe0475f7919..6b15e488c0e7 100644 --- a/Documentation/crypto/async-tx-api.txt +++ b/Documentation/crypto/async-tx-api.txt @@ -115,29 +115,42 @@ of an operation. Perform a xor->copy->xor operation where each operation depends on the result from the previous operation: -void complete_xor_copy_xor(void *param) +void callback(void *param) { - printk("complete\n"); + struct completion *cmp = param; + + complete(cmp); } -int run_xor_copy_xor(struct page **xor_srcs, - int xor_src_cnt, - struct page *xor_dest, - size_t xor_len, - struct page *copy_src, - struct page *copy_dest, - size_t copy_len) +void run_xor_copy_xor(struct page **xor_srcs, + int xor_src_cnt, + struct page *xor_dest, + size_t xor_len, + struct page *copy_src, + struct page *copy_dest, + size_t copy_len) { struct dma_async_tx_descriptor *tx; + addr_conv_t addr_conv[xor_src_cnt]; + struct async_submit_ctl submit; + addr_conv_t addr_conv[NDISKS]; + struct completion cmp; + + init_async_submit(&submit, ASYNC_TX_XOR_DROP_DST, NULL, NULL, NULL, + addr_conv); + tx = async_xor(xor_dest, xor_srcs, 0, xor_src_cnt, xor_len, &submit) - tx = async_xor(xor_dest, xor_srcs, 0, xor_src_cnt, xor_len, - ASYNC_TX_XOR_DROP_DST, NULL, NULL, NULL); - tx = async_memcpy(copy_dest, copy_src, 0, 0, copy_len, tx, NULL, NULL); - tx = async_xor(xor_dest, xor_srcs, 0, xor_src_cnt, xor_len, - ASYNC_TX_XOR_DROP_DST | ASYNC_TX_ACK, - tx, complete_xor_copy_xor, NULL); + submit->depend_tx = tx; + tx = async_memcpy(copy_dest, copy_src, 0, 0, copy_len, &submit); + + init_completion(&cmp); + init_async_submit(&submit, ASYNC_TX_XOR_DROP_DST | ASYNC_TX_ACK, tx, + callback, &cmp, addr_conv); + tx = async_xor(xor_dest, xor_srcs, 0, xor_src_cnt, xor_len, &submit); async_tx_issue_pending_all(); + + wait_for_completion(&cmp); } See include/linux/async_tx.h for more information on the flags. See the -- cgit v1.2.1 From 955234755ce4a2c33cfc558912aa8f2148cc1fc6 Mon Sep 17 00:00:00 2001 From: Paul Wise Date: Sat, 1 Aug 2009 21:30:31 +0900 Subject: vfat: change the default from shortname=lower to shortname=mixed Because, with "shortname=lower", copying one FAT filesystem tree to another FAT filesystem tree using Linux results in semantically different filesystems. (E.g.: Filenames which were once "all uppercase" are now "all lowercase"). So, this changes the default of "shortname=lower" to "shortname=mixed". Signed-off-by: Paul Wise [change fat_show_options()] Signed-off-by: OGAWA Hirofumi --- Documentation/filesystems/vfat.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/vfat.txt b/Documentation/filesystems/vfat.txt index b58b84b50fa2..eed520fd0c8e 100644 --- a/Documentation/filesystems/vfat.txt +++ b/Documentation/filesystems/vfat.txt @@ -102,7 +102,7 @@ shortname=lower|win95|winnt|mixed winnt: emulate the Windows NT rule for display/create. mixed: emulate the Windows NT rule for display, emulate the Windows 95 rule for create. - Default setting is `lower'. + Default setting is `mixed'. tz=UTC -- Interpret timestamps as UTC rather than local time. This option disables the conversion of timestamps -- cgit v1.2.1 From b2f46fd8ef3dff2ab30f31126833f78b7480283a Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Tue, 14 Jul 2009 12:20:36 -0700 Subject: async_tx: add support for asynchronous GF multiplication [ Based on an original patch by Yuri Tikhonov ] This adds support for doing asynchronous GF multiplication by adding two additional functions to the async_tx API: async_gen_syndrome() does simultaneous XOR and Galois field multiplication of sources. async_syndrome_val() validates the given source buffers against known P and Q values. When a request is made to run async_pq against more than the hardware maximum number of supported sources we need to reuse the previous generated P and Q values as sources into the next operation. Care must be taken to remove Q from P' and P from Q'. For example to perform a 5 source pq op with hardware that only supports 4 sources at a time the following approach is taken: p, q = PQ(src0, src1, src2, src3, COEF({01}, {02}, {04}, {08})) p', q' = PQ(p, q, q, src4, COEF({00}, {01}, {00}, {10})) p' = p + q + q + src4 = p + src4 q' = {00}*p + {01}*q + {00}*q + {10}*src4 = q + {10}*src4 Note: 4 is the minimum acceptable maxpq otherwise we punt to synchronous-software path. The DMA_PREP_CONTINUE flag indicates to the driver to reuse p and q as sources (in the above manner) and fill the remaining slots up to maxpq with the new sources/coefficients. Note1: Some devices have native support for P+Q continuation and can skip this extra work. Devices with this capability can advertise it with dma_set_maxpq. It is up to each driver how to handle the DMA_PREP_CONTINUE flag. Note2: The api supports disabling the generation of P when generating Q, this is ignored by the synchronous path but is implemented by some dma devices to save unnecessary writes. In this case the continuation algorithm is simplified to only reuse Q as a source. Cc: H. Peter Anvin Cc: David Woodhouse Signed-off-by: Yuri Tikhonov Signed-off-by: Ilya Yanok Reviewed-by: Andre Noll Acked-by: Maciej Sosnowski Signed-off-by: Dan Williams --- Documentation/crypto/async-tx-api.txt | 3 +++ 1 file changed, 3 insertions(+) (limited to 'Documentation') diff --git a/Documentation/crypto/async-tx-api.txt b/Documentation/crypto/async-tx-api.txt index 6b15e488c0e7..0e48e054d69a 100644 --- a/Documentation/crypto/async-tx-api.txt +++ b/Documentation/crypto/async-tx-api.txt @@ -64,6 +64,9 @@ xor - xor a series of source buffers and write the result to a xor_val - xor a series of source buffers and set a flag if the result is zero. The implementation attempts to prevent writes to memory +pq - generate the p+q (raid6 syndrome) from a series of source buffers +pq_val - validate that a p and or q buffer are in sync with a given series of + sources 3.3 Descriptor management: The return value is non-NULL and points to a 'descriptor' when the operation -- cgit v1.2.1 From 0a82a6239beecc95db6e05fe43ee62d16b381d38 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Tue, 14 Jul 2009 12:20:37 -0700 Subject: async_tx: add support for asynchronous RAID6 recovery operations async_raid6_2data_recov() recovers two data disk failures async_raid6_datap_recov() recovers a data disk and the P disk These routines are a port of the synchronous versions found in drivers/md/raid6recov.c. The primary difference is breaking out the xor operations into separate calls to async_xor. Two helper routines are introduced to perform scalar multiplication where needed. async_sum_product() multiplies two sources by scalar coefficients and then sums (xor) the result. async_mult() simply multiplies a single source by a scalar. This implemention also includes, in contrast to the original synchronous-only code, special case handling for the 4-disk and 5-disk array cases. In these situations the default N-disk algorithm will present 0-source or 1-source operations to dma devices. To cover for dma devices where the minimum source count is 2 we implement 4-disk and 5-disk handling in the recovery code. [ Impact: asynchronous raid6 recovery routines for 2data and datap cases ] Cc: Yuri Tikhonov Cc: Ilya Yanok Cc: H. Peter Anvin Cc: David Woodhouse Reviewed-by: Andre Noll Acked-by: Maciej Sosnowski Signed-off-by: Dan Williams --- Documentation/crypto/async-tx-api.txt | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'Documentation') diff --git a/Documentation/crypto/async-tx-api.txt b/Documentation/crypto/async-tx-api.txt index 0e48e054d69a..ba046b8fa92f 100644 --- a/Documentation/crypto/async-tx-api.txt +++ b/Documentation/crypto/async-tx-api.txt @@ -67,6 +67,10 @@ xor_val - xor a series of source buffers and set a flag if the pq - generate the p+q (raid6 syndrome) from a series of source buffers pq_val - validate that a p and or q buffer are in sync with a given series of sources +datap - (raid6_datap_recov) recover a raid6 data block and the p block + from the given sources +2data - (raid6_2data_recov) recover 2 raid6 data blocks from the given + sources 3.3 Descriptor management: The return value is non-NULL and points to a 'descriptor' when the operation -- cgit v1.2.1 From bc581770cfdd8c17ea17d324dc05e2f9c599e7ca Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Tue, 15 Sep 2009 17:30:37 +0100 Subject: ARM: 5580/2: ARM TCM (Tightly-Coupled Memory) support v3 This adds the TCM interface to Linux, when active, it will detect and report TCM memories and sizes early in boot if present, introduce generic TCM memory handling, provide a generic TCM memory pool and select TCM memory for the U300 platform. See the Documentation/arm/tcm.txt for documentation. Signed-off-by: Linus Walleij Signed-off-by: Russell King --- Documentation/arm/tcm.txt | 145 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 145 insertions(+) create mode 100644 Documentation/arm/tcm.txt (limited to 'Documentation') diff --git a/Documentation/arm/tcm.txt b/Documentation/arm/tcm.txt new file mode 100644 index 000000000000..074f4be6667f --- /dev/null +++ b/Documentation/arm/tcm.txt @@ -0,0 +1,145 @@ +ARM TCM (Tightly-Coupled Memory) handling in Linux +---- +Written by Linus Walleij + +Some ARM SoC:s have a so-called TCM (Tightly-Coupled Memory). +This is usually just a few (4-64) KiB of RAM inside the ARM +processor. + +Due to being embedded inside the CPU The TCM has a +Harvard-architecture, so there is an ITCM (instruction TCM) +and a DTCM (data TCM). The DTCM can not contain any +instructions, but the ITCM can actually contain data. +The size of DTCM or ITCM is minimum 4KiB so the typical +minimum configuration is 4KiB ITCM and 4KiB DTCM. + +ARM CPU:s have special registers to read out status, physical +location and size of TCM memories. arch/arm/include/asm/cputype.h +defines a CPUID_TCM register that you can read out from the +system control coprocessor. Documentation from ARM can be found +at http://infocenter.arm.com, search for "TCM Status Register" +to see documents for all CPUs. Reading this register you can +determine if ITCM (bit 0) and/or DTCM (bit 16) is present in the +machine. + +There is further a TCM region register (search for "TCM Region +Registers" at the ARM site) that can report and modify the location +size of TCM memories at runtime. This is used to read out and modify +TCM location and size. Notice that this is not a MMU table: you +actually move the physical location of the TCM around. At the +place you put it, it will mask any underlying RAM from the +CPU so it is usually wise not to overlap any physical RAM with +the TCM. The TCM memory exists totally outside the MMU and will +override any MMU mappings. + +Code executing inside the ITCM does not "see" any MMU mappings +and e.g. register accesses must be made to physical addresses. + +TCM is used for a few things: + +- FIQ and other interrupt handlers that need deterministic + timing and cannot wait for cache misses. + +- Idle loops where all external RAM is set to self-refresh + retention mode, so only on-chip RAM is accessible by + the CPU and then we hang inside ITCM waiting for an + interrupt. + +- Other operations which implies shutting off or reconfiguring + the external RAM controller. + +There is an interface for using TCM on the ARM architecture +in . Using this interface it is possible to: + +- Define the physical address and size of ITCM and DTCM. + +- Tag functions to be compiled into ITCM. + +- Tag data and constants to be allocated to DTCM and ITCM. + +- Have the remaining TCM RAM added to a special + allocation pool with gen_pool_create() and gen_pool_add() + and provice tcm_alloc() and tcm_free() for this + memory. Such a heap is great for things like saving + device state when shutting off device power domains. + +A machine that has TCM memory shall select HAVE_TCM in +arch/arm/Kconfig for itself, and then the +rest of the functionality will depend on the physical +location and size of ITCM and DTCM to be defined in +mach/memory.h for the machine. Code that needs to use +TCM shall #include If the TCM is not located +at the place given in memory.h it will be moved using +the TCM Region registers. + +Functions to go into itcm can be tagged like this: +int __tcmfunc foo(int bar); + +Variables to go into dtcm can be tagged like this: +int __tcmdata foo; + +Constants can be tagged like this: +int __tcmconst foo; + +To put assembler into TCM just use +.section ".tcm.text" or .section ".tcm.data" +respectively. + +Example code: + +#include + +/* Uninitialized data */ +static u32 __tcmdata tcmvar; +/* Initialized data */ +static u32 __tcmdata tcmassigned = 0x2BADBABEU; +/* Constant */ +static const u32 __tcmconst tcmconst = 0xCAFEBABEU; + +static void __tcmlocalfunc tcm_to_tcm(void) +{ + int i; + for (i = 0; i < 100; i++) + tcmvar ++; +} + +static void __tcmfunc hello_tcm(void) +{ + /* Some abstract code that runs in ITCM */ + int i; + for (i = 0; i < 100; i++) { + tcmvar ++; + } + tcm_to_tcm(); +} + +static void __init test_tcm(void) +{ + u32 *tcmem; + int i; + + hello_tcm(); + printk("Hello TCM executed from ITCM RAM\n"); + + printk("TCM variable from testrun: %u @ %p\n", tcmvar, &tcmvar); + tcmvar = 0xDEADBEEFU; + printk("TCM variable: 0x%x @ %p\n", tcmvar, &tcmvar); + + printk("TCM assigned variable: 0x%x @ %p\n", tcmassigned, &tcmassigned); + + printk("TCM constant: 0x%x @ %p\n", tcmconst, &tcmconst); + + /* Allocate some TCM memory from the pool */ + tcmem = tcm_alloc(20); + if (tcmem) { + printk("TCM Allocated 20 bytes of TCM @ %p\n", tcmem); + tcmem[0] = 0xDEADBEEFU; + tcmem[1] = 0x2BADBABEU; + tcmem[2] = 0xCAFEBABEU; + tcmem[3] = 0xDEADBEEFU; + tcmem[4] = 0x2BADBABEU; + for (i = 0; i < 5; i++) + printk("TCM tcmem[%d] = %08x\n", i, tcmem[i]); + tcm_free(tcmem, 20); + } +} -- cgit v1.2.1 From 257187362123f15d9d1e09918cf87cebbea4e786 Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Wed, 16 Sep 2009 11:50:13 +0200 Subject: HWPOISON: Define a new error_remove_page address space op for async truncation Truncating metadata pages is not safe right now before we haven't audited all file systems. To enable truncation only for data address space define a new address_space callback error_remove_page. This is used for memory_failure.c memory error handling. This can be then set to truncate_inode_page() This patch just defines the new operation and adds documentation. Callers and users come in followon patches. Signed-off-by: Andi Kleen --- Documentation/filesystems/vfs.txt | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'Documentation') diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index f49eecf2e573..623f094c9d8d 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -536,6 +536,7 @@ struct address_space_operations { /* migrate the contents of a page to the specified target */ int (*migratepage) (struct page *, struct page *); int (*launder_page) (struct page *); + int (*error_remove_page) (struct mapping *mapping, struct page *page); }; writepage: called by the VM to write a dirty page to backing store. @@ -694,6 +695,12 @@ struct address_space_operations { prevent redirtying the page, it is kept locked during the whole operation. + error_remove_page: normally set to generic_error_remove_page if truncation + is ok for this address space. Used for memory failure handling. + Setting this implies you deal with pages going away under you, + unless you have them locked or reference counts increased. + + The File Object =============== -- cgit v1.2.1 From 6a46079cf57a7f7758e8b926980a4f852f89b34d Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Wed, 16 Sep 2009 11:50:15 +0200 Subject: HWPOISON: The high level memory error handler in the VM v7 Add the high level memory handler that poisons pages that got corrupted by hardware (typically by a two bit flip in a DIMM or a cache) on the Linux level. The goal is to prevent everyone from accessing these pages in the future. This done at the VM level by marking a page hwpoisoned and doing the appropriate action based on the type of page it is. The code that does this is portable and lives in mm/memory-failure.c To quote the overview comment: High level machine check handler. Handles pages reported by the hardware as being corrupted usually due to a 2bit ECC memory or cache failure. This focuses on pages detected as corrupted in the background. When the current CPU tries to consume corruption the currently running process can just be killed directly instead. This implies that if the error cannot be handled for some reason it's safe to just ignore it because no corruption has been consumed yet. Instead when that happens another machine check will happen. Handles page cache pages in various states. The tricky part here is that we can access any page asynchronous to other VM users, because memory failures could happen anytime and anywhere, possibly violating some of their assumptions. This is why this code has to be extremely careful. Generally it tries to use normal locking rules, as in get the standard locks, even if that means the error handling takes potentially a long time. Some of the operations here are somewhat inefficient and have non linear algorithmic complexity, because the data structures have not been optimized for this case. This is in particular the case for the mapping from a vma to a process. Since this case is expected to be rare we hope we can get away with this. There are in principle two strategies to kill processes on poison: - just unmap the data and wait for an actual reference before killing - kill as soon as corruption is detected. Both have advantages and disadvantages and should be used in different situations. Right now both are implemented and can be switched with a new sysctl vm.memory_failure_early_kill The default is early kill. The patch does some rmap data structure walking on its own to collect processes to kill. This is unusual because normally all rmap data structure knowledge is in rmap.c only. I put it here for now to keep everything together and rmap knowledge has been seeping out anyways Includes contributions from Johannes Weiner, Chris Mason, Fengguang Wu, Nick Piggin (who did a lot of great work) and others. Cc: npiggin@suse.de Cc: riel@redhat.com Signed-off-by: Andi Kleen Acked-by: Rik van Riel Reviewed-by: Hidehiro Kawai --- Documentation/sysctl/vm.txt | 41 ++++++++++++++++++++++++++++++++++++++++- 1 file changed, 40 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt index c4de6359d440..faf62740aa2c 100644 --- a/Documentation/sysctl/vm.txt +++ b/Documentation/sysctl/vm.txt @@ -32,6 +32,8 @@ Currently, these files are in /proc/sys/vm: - legacy_va_layout - lowmem_reserve_ratio - max_map_count +- memory_failure_early_kill +- memory_failure_recovery - min_free_kbytes - min_slab_ratio - min_unmapped_ratio @@ -53,7 +55,6 @@ Currently, these files are in /proc/sys/vm: - vfs_cache_pressure - zone_reclaim_mode - ============================================================== block_dump @@ -275,6 +276,44 @@ e.g., up to one or two maps per allocation. The default value is 65536. +============================================================= + +memory_failure_early_kill: + +Control how to kill processes when uncorrected memory error (typically +a 2bit error in a memory module) is detected in the background by hardware +that cannot be handled by the kernel. In some cases (like the page +still having a valid copy on disk) the kernel will handle the failure +transparently without affecting any applications. But if there is +no other uptodate copy of the data it will kill to prevent any data +corruptions from propagating. + +1: Kill all processes that have the corrupted and not reloadable page mapped +as soon as the corruption is detected. Note this is not supported +for a few types of pages, like kernel internally allocated data or +the swap cache, but works for the majority of user pages. + +0: Only unmap the corrupted page from all processes and only kill a process +who tries to access it. + +The kill is done using a catchable SIGBUS with BUS_MCEERR_AO, so processes can +handle this if they want to. + +This is only active on architectures/platforms with advanced machine +check handling and depends on the hardware capabilities. + +Applications can override this setting individually with the PR_MCE_KILL prctl + +============================================================== + +memory_failure_recovery + +Enable memory failure recovery (when supported by the platform) + +1: Attempt recovery. + +0: Always panic on a memory failure. + ============================================================== min_free_kbytes: -- cgit v1.2.1 From caa27b66bd7188fd063769eaf4b33533ef0709e6 Mon Sep 17 00:00:00 2001 From: Sam Ravnborg Date: Mon, 20 Jul 2009 21:37:11 +0200 Subject: kbuild: use INSTALLKERNEL to select customized installkernel script Replace the use of CROSS_COMPILE to select a customized installkernel script with the possibility to set INSTALLKERNEL to select a custom installkernel script when running make: make INSTALLKERNEL=arm-installkernel install With this patch we are now more consistent across different architectures - they did not all support use of CROSS_COMPILE. The use of CROSS_COMPILE was a hack as this really belongs to gcc/binutils and the installkernel script does not change just because we change toolchain. The use of CROSS_COMPILE caused troubles with an upcoming patch that saves CROSS_COMPILE when a kernel is built - it would no longer be installable. [Thanks to Peter Z. for this hint] This patch undos what Ian did in commit: 0f8e2d62fa04441cd12c08ce521e84e5bd3f8a46 ("use ${CROSS_COMPILE}installkernel in arch/*/boot/install.sh") The patch has been lightly tested on x86 only - but all changes looks obvious. Acked-by: Peter Zijlstra Acked-by: Mike Frysinger [blackfin] Acked-by: Russell King [arm] Acked-by: Paul Mundt [sh] Acked-by: "H. Peter Anvin" [x86] Cc: Ian Campbell Cc: Tony Luck [ia64] Cc: Fenghua Yu [ia64] Cc: Hirokazu Takata [m32r] Cc: Geert Uytterhoeven [m68k] Cc: Kyle McMartin [parisc] Cc: Benjamin Herrenschmidt [powerpc] Cc: Martin Schwidefsky [s390] Cc: Thomas Gleixner [x86] Cc: Ingo Molnar [x86] Signed-off-by: Sam Ravnborg --- Documentation/kbuild/kbuild.txt | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) (limited to 'Documentation') diff --git a/Documentation/kbuild/kbuild.txt b/Documentation/kbuild/kbuild.txt index f3355b6812df..bb3bf38f03da 100644 --- a/Documentation/kbuild/kbuild.txt +++ b/Documentation/kbuild/kbuild.txt @@ -65,6 +65,22 @@ INSTALL_PATH INSTALL_PATH specifies where to place the updated kernel and system map images. Default is /boot, but you can set it to other values. +INSTALLKERNEL +-------------------------------------------------- +Install script called when using "make install". +The default name is "installkernel". + +The script will be called with the following arguments: + $1 - kernel version + $2 - kernel image file + $3 - kernel map file + $4 - default install path (use root directory if blank) + +The implmentation of "make install" is architecture specific +and it may differ from the above. + +INSTALLKERNEL is provided to enable the possibility to +specify a custom installer when cross compiling a kernel. MODLIB -------------------------------------------------- -- cgit v1.2.1 From f86fd306605287d7c7f4f0f8e8e2a9d49d28b396 Mon Sep 17 00:00:00 2001 From: Sam Ravnborg Date: Sat, 19 Sep 2009 10:14:33 +0200 Subject: kbuild: rename ld-option to cc-ldoption ld-option is misnamed as it test options to gcc, not to ld. Renamed it to reflect this. Cc: Andi Kleen Cc: Roland McGrath Signed-off-by: Sam Ravnborg --- Documentation/kbuild/makefiles.txt | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/kbuild/makefiles.txt b/Documentation/kbuild/makefiles.txt index d76cfd8712e1..7847fce13bd6 100644 --- a/Documentation/kbuild/makefiles.txt +++ b/Documentation/kbuild/makefiles.txt @@ -435,14 +435,14 @@ more details, with real examples. The second argument is optional, and if supplied will be used if first argument is not supported. - ld-option - ld-option is used to check if $(CC) when used to link object files + cc-ldoption + cc-ldoption is used to check if $(CC) when used to link object files supports the given option. An optional second option may be specified if first option are not supported. Example: #arch/i386/kernel/Makefile - vsyscall-flags += $(call ld-option, -Wl$(comma)--hash-style=sysv) + vsyscall-flags += $(call cc-ldoption, -Wl$(comma)--hash-style=sysv) In the above example, vsyscall-flags will be assigned the option -Wl$(comma)--hash-style=sysv if it is supported by $(CC). -- cgit v1.2.1 From 691ef3e7fdc1fe4dded169d9404f740987f67d66 Mon Sep 17 00:00:00 2001 From: Sam Ravnborg Date: Sat, 19 Sep 2009 10:31:45 +0200 Subject: kbuild: introduce ld-option ld-option is used to check if $(LD) supports a specific option. Based on patch from Andi Kleen. Cc: Andi Kleen Signed-off-by: Sam Ravnborg First use is to check if option -X is supported (upcoming patch). Theis is ne --- Documentation/kbuild/makefiles.txt | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'Documentation') diff --git a/Documentation/kbuild/makefiles.txt b/Documentation/kbuild/makefiles.txt index 7847fce13bd6..71c602d61680 100644 --- a/Documentation/kbuild/makefiles.txt +++ b/Documentation/kbuild/makefiles.txt @@ -18,6 +18,7 @@ This document describes the Linux kernel Makefiles. --- 3.9 Dependency tracking --- 3.10 Special Rules --- 3.11 $(CC) support functions + --- 3.12 $(LD) support functions === 4 Host Program support --- 4.1 Simple Host Program @@ -570,6 +571,19 @@ more details, with real examples. endif endif +--- 3.12 $(LD) support functions + + ld-option + ld-option is used to check if $(LD) supports the supplied option. + ld-option takes two options as arguments. + The second argument is an optional option that can be used if the + first option is not supported by $(LD). + + Example: + #Makefile + LDFLAGS_vmlinux += $(call really-ld-option, -X) + + === 4 Host Program support Kbuild supports building executables on the host for use during the -- cgit v1.2.1 From 176dd98523fee4836210bc0834c8e3e6a93247bf Mon Sep 17 00:00:00 2001 From: Henrique de Moraes Holschuh Date: Sun, 20 Sep 2009 14:09:24 -0300 Subject: thinkpad-acpi: drop HKEY event 0x5010 HKEY event 0x5010 is useless to us: old ThinkPads don't issue it. Newer ThinkPads won't issue it anymore. And all ThinkPads issue 0x1010 and 0x1011 events. Just silently drop it instead of sending it to userspace. Signed-off-by: Henrique de Moraes Holschuh Signed-off-by: Len Brown --- Documentation/laptops/thinkpad-acpi.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/laptops/thinkpad-acpi.txt b/Documentation/laptops/thinkpad-acpi.txt index 6d03487ef1c7..f635fb09d620 100644 --- a/Documentation/laptops/thinkpad-acpi.txt +++ b/Documentation/laptops/thinkpad-acpi.txt @@ -525,6 +525,7 @@ compatibility purposes when hotkey_report_mode is set to 1. 0x2305 System is waking up from suspend to eject bay 0x2404 System is waking up from hibernation to undock 0x2405 System is waking up from hibernation to eject bay +0x5010 Brightness level changed/control event The above events are never propagated by the driver. @@ -532,7 +533,6 @@ The above events are never propagated by the driver. 0x4003 Undocked (see 0x2x04), can sleep again 0x500B Tablet pen inserted into its storage bay 0x500C Tablet pen removed from its storage bay -0x5010 Brightness level changed (newer Lenovo BIOSes) The above events are propagated by the driver. -- cgit v1.2.1 From 0d922e3b84dc4923fc67901580a3c166006fba7a Mon Sep 17 00:00:00 2001 From: Henrique de Moraes Holschuh Date: Sun, 20 Sep 2009 14:09:25 -0300 Subject: thinkpad-acpi: hotkey event driver update Update the HKEY event driver to: 1. Handle better the second-gen firmware, which has no HKEY mask support but does report FN+F3, FN+F4 and FN+F12 without the need for NVRAM polling. a) always make the mask-related attributes available in sysfs; b) use DMI quirks to detect the second-gen firmware; c) properly report that FN+F3, FN+F4 and FN+F12 are enabled, and available even on mask-less second-gen firmware; 2. Decouple the issuing of hotkey events towards userspace from their reception from the firmware. ALSA mixer and brightness event reporting support will need this feature. 3. Clean up the mess in the hotkey driver a great deal. It is still very convoluted, and wants a full refactoring into a proper event API interface, but that is not going to happen today. 4. Fully reset firmware interface on resume (restore hotkey mask and status). 5. Stop losing polled events for no good reason when changing the mask and poll frequencies. We will still lose them when the hotkey_source_mask is changed, as well as any that happened between driver suspend and driver resume. The hotkey subdriver now has the notion of user-space-visible hotkey event mask, as well as of the set of "hotkey" events the driver needs (because brightness/volume change reports are not just keypress reports in most ThinkPad models). With this rewrite, the ABI level is bumped to 0x020500 should userspace need to know it is dealing with the updated hotkey subdriver. Signed-off-by: Henrique de Moraes Holschuh Signed-off-by: Len Brown --- Documentation/laptops/thinkpad-acpi.txt | 46 +++++++++++++++++++-------------- 1 file changed, 26 insertions(+), 20 deletions(-) (limited to 'Documentation') diff --git a/Documentation/laptops/thinkpad-acpi.txt b/Documentation/laptops/thinkpad-acpi.txt index f635fb09d620..aafcaa634191 100644 --- a/Documentation/laptops/thinkpad-acpi.txt +++ b/Documentation/laptops/thinkpad-acpi.txt @@ -199,18 +199,22 @@ kind to allow it (and it often doesn't!). Not all bits in the mask can be modified. Not all bits that can be modified do anything. Not all hot keys can be individually controlled -by the mask. Some models do not support the mask at all, and in those -models, hot keys cannot be controlled individually. The behaviour of -the mask is, therefore, highly dependent on the ThinkPad model. +by the mask. Some models do not support the mask at all. The behaviour +of the mask is, therefore, highly dependent on the ThinkPad model. + +The driver will filter out any unmasked hotkeys, so even if the firmware +doesn't allow disabling an specific hotkey, the driver will not report +events for unmasked hotkeys. Note that unmasking some keys prevents their default behavior. For example, if Fn+F5 is unmasked, that key will no longer enable/disable -Bluetooth by itself. +Bluetooth by itself in firmware. -Note also that not all Fn key combinations are supported through ACPI. -For example, on the X40, the brightness, volume and "Access IBM" buttons -do not generate ACPI events even with this driver. They *can* be used -through the "ThinkPad Buttons" utility, see http://www.nongnu.org/tpb/ +Note also that not all Fn key combinations are supported through ACPI +depending on the ThinkPad model and firmware version. On those +ThinkPads, it is still possible to support some extra hotkeys by +polling the "CMOS NVRAM" at least 10 times per second. The driver +attempts to enables this functionality automatically when required. procfs notes: @@ -255,18 +259,11 @@ sysfs notes: 1: does nothing hotkey_mask: - bit mask to enable driver-handling (and depending on + bit mask to enable reporting (and depending on the firmware, ACPI event generation) for each hot key (see above). Returns the current status of the hot keys mask, and allows one to modify it. - Note: when NVRAM polling is active, the firmware mask - will be different from the value returned by - hotkey_mask. The driver will retain enabled bits for - hotkeys that are under NVRAM polling even if the - firmware refuses them, and will not set these bits on - the firmware hot key mask. - hotkey_all_mask: bit mask that should enable event reporting for all supported hot keys, when echoed to hotkey_mask above. @@ -279,7 +276,8 @@ sysfs notes: bit mask that should enable event reporting for all supported hot keys, except those which are always handled by the firmware anyway. Echo it to - hotkey_mask above, to use. + hotkey_mask above, to use. This is the default mask + used by the driver. hotkey_source_mask: bit mask that selects which hot keys will the driver @@ -287,9 +285,10 @@ sysfs notes: based on the capabilities reported by the ACPI firmware, but it can be overridden at runtime. - Hot keys whose bits are set in both hotkey_source_mask - and also on hotkey_mask are polled for in NVRAM. Only a - few hot keys are available through CMOS NVRAM polling. + Hot keys whose bits are set in hotkey_source_mask are + polled for in NVRAM, and reported as hotkey events if + enabled in hotkey_mask. Only a few hot keys are + available through CMOS NVRAM polling. Warning: when in NVRAM mode, the volume up/down/mute keys are synthesized according to changes in the mixer, @@ -621,6 +620,8 @@ For Lenovo models *with* ACPI backlight control: 2. Do *NOT* load up ACPI video, enable the hotkeys in thinkpad-acpi, and map them to KEY_BRIGHTNESS_UP and KEY_BRIGHTNESS_DOWN. Process these keys on userspace somehow (e.g. by calling xbacklight). + The driver will do this automatically if it detects that ACPI video + has been disabled. Bluetooth @@ -1459,3 +1460,8 @@ Sysfs interface changelog: 0x020400: Marker for 16 LEDs support. Also, LEDs that are known to not exist in a given model are not registered with the LED sysfs class anymore. + +0x020500: Updated hotkey driver, hotkey_mask is always available + and it is always able to disable hot keys. Very old + thinkpads are properly supported. hotkey_bios_mask + is deprecated and marked for removal. -- cgit v1.2.1 From f4edeeb3937d5f9953b5722f1cca9573d5ffe8a0 Mon Sep 17 00:00:00 2001 From: Abhishek Kulkarni Date: Tue, 22 Sep 2009 11:34:04 -0500 Subject: 9p: Update documentation to add fscache related bits Update the documentation to describe FS-Cache related caching parameters. This patch also updates the pointers to 9p-related papers and adds pointer to the Wiki. Signed-off-by: Abhishek Kulkarni Signed-off-by: Eric Van Hensbergen --- Documentation/filesystems/9p.txt | 40 ++++++++++++++++++++++++++-------------- 1 file changed, 26 insertions(+), 14 deletions(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/9p.txt b/Documentation/filesystems/9p.txt index 6208f55c44c3..57e0b80a5274 100644 --- a/Documentation/filesystems/9p.txt +++ b/Documentation/filesystems/9p.txt @@ -18,11 +18,11 @@ the 9p client is available in the form of a USENIX paper: Other applications are described in the following papers: * XCPU & Clustering - http://www.xcpu.org/xcpu-talk.pdf + http://xcpu.org/papers/xcpu-talk.pdf * KVMFS: control file system for KVM - http://www.xcpu.org/kvmfs.pdf - * CellFS: A New ProgrammingModel for the Cell BE - http://www.xcpu.org/cellfs-talk.pdf + http://xcpu.org/papers/kvmfs.pdf + * CellFS: A New Programming Model for the Cell BE + http://xcpu.org/papers/cellfs-talk.pdf * PROSE I/O: Using 9p to enable Application Partitions http://plan9.escet.urjc.es/iwp9/cready/PROSE_iwp9_2006.pdf @@ -48,6 +48,7 @@ OPTIONS (see rfdno and wfdno) virtio - connect to the next virtio channel available (from lguest or KVM with trans_virtio module) + rdma - connect to a specified RDMA channel uname=name user name to attempt mount as on the remote server. The server may override or ignore this value. Certain user @@ -59,16 +60,22 @@ OPTIONS cache=mode specifies a caching policy. By default, no caches are used. loose = no attempts are made at consistency, intended for exclusive, read-only mounts + fscache = use FS-Cache for a persistent, read-only + cache backend. debug=n specifies debug level. The debug level is a bitmask. - 0x01 = display verbose error messages - 0x02 = developer debug (DEBUG_CURRENT) - 0x04 = display 9p trace - 0x08 = display VFS trace - 0x10 = display Marshalling debug - 0x20 = display RPC debug - 0x40 = display transport debug - 0x80 = display allocation debug + 0x01 = display verbose error messages + 0x02 = developer debug (DEBUG_CURRENT) + 0x04 = display 9p trace + 0x08 = display VFS trace + 0x10 = display Marshalling debug + 0x20 = display RPC debug + 0x40 = display transport debug + 0x80 = display allocation debug + 0x100 = display protocol message debug + 0x200 = display Fid debug + 0x400 = display packet debug + 0x800 = display fscache tracing debug rfdno=n the file descriptor for reading with trans=fd @@ -100,6 +107,10 @@ OPTIONS any = v9fs does single attach and performs all operations as one user + cachetag cache tag to use the specified persistent cache. + cache tags for existing cache sessions can be listed at + /sys/fs/9p/caches. (applies only to cache=fscache) + RESOURCES ========= @@ -118,7 +129,7 @@ and export. A Linux version of the 9p server is now maintained under the npfs project on sourceforge (http://sourceforge.net/projects/npfs). The currently maintained version is the single-threaded version of the server (named spfs) -available from the same CVS repository. +available from the same SVN repository. There are user and developer mailing lists available through the v9fs project on sourceforge (http://sourceforge.net/projects/v9fs). @@ -126,7 +137,8 @@ on sourceforge (http://sourceforge.net/projects/v9fs). A stand-alone version of the module (which should build for any 2.6 kernel) is available via (http://github.com/ericvh/9p-sac/tree/master) -News and other information is maintained on SWiK (http://swik.net/v9fs). +News and other information is maintained on SWiK (http://swik.net/v9fs) +and the Wiki (http://sf.net/apps/mediawiki/v9fs/index.php). Bug reports may be issued through the kernel.org bugzilla (http://bugzilla.kernel.org) -- cgit v1.2.1 From 91f17e02a224dc649eaffc8e0bca6db85efb9cd7 Mon Sep 17 00:00:00 2001 From: Jean Delvare Date: Wed, 23 Sep 2009 22:59:42 +0200 Subject: hwmon: Delete deprecated FSC drivers The legacy fscpos and fscher drivers have been replaced by the unified fschmd driver. The transition period is over now, we can delete them. Signed-off-by: Jean Delvare Acked-by: Hans de Goede --- Documentation/feature-removal-schedule.txt | 8 -- Documentation/hwmon/fscher | 169 ----------------------------- 2 files changed, 177 deletions(-) delete mode 100644 Documentation/hwmon/fscher (limited to 'Documentation') diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index fa75220f8d34..89a47b5aff07 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -354,14 +354,6 @@ Who: Krzysztof Piotr Oledzki --------------------------- -What: fscher and fscpos drivers -When: June 2009 -Why: Deprecated by the new fschmd driver. -Who: Hans de Goede - Jean Delvare - ---------------------------- - What: sysfs ui for changing p4-clockmod parameters When: September 2009 Why: See commits 129f8ae9b1b5be94517da76009ea956e89104ce8 and diff --git a/Documentation/hwmon/fscher b/Documentation/hwmon/fscher deleted file mode 100644 index 64031659aff3..000000000000 --- a/Documentation/hwmon/fscher +++ /dev/null @@ -1,169 +0,0 @@ -Kernel driver fscher -==================== - -Supported chips: - * Fujitsu-Siemens Hermes chip - Prefix: 'fscher' - Addresses scanned: I2C 0x73 - -Authors: - Reinhard Nissl based on work - from Hermann Jung , - Frodo Looijaard , - Philip Edelbrock - -Description ------------ - -This driver implements support for the Fujitsu-Siemens Hermes chip. It is -described in the 'Register Set Specification BMC Hermes based Systemboard' -from Fujitsu-Siemens. - -The Hermes chip implements a hardware-based system management, e.g. for -controlling fan speed and core voltage. There is also a watchdog counter on -the chip which can trigger an alarm and even shut the system down. - -The chip provides three temperature values (CPU, motherboard and -auxiliary), three voltage values (+12V, +5V and battery) and three fans -(power supply, CPU and auxiliary). - -Temperatures are measured in degrees Celsius. The resolution is 1 degree. - -Fan rotation speeds are reported in RPM (rotations per minute). The value -can be divided by a programmable divider (1, 2 or 4) which is stored on -the chip. - -Voltage sensors (also known as "in" sensors) report their values in volts. - -All values are reported as final values from the driver. There is no need -for further calculations. - - -Detailed description --------------------- - -Below you'll find a single line description of all the bit values. With -this information, you're able to decode e. g. alarms, wdog, etc. To make -use of the watchdog, you'll need to set the watchdog time and enable the -watchdog. After that it is necessary to restart the watchdog time within -the specified period of time, or a system reset will occur. - -* revision - READING & 0xff = 0x??: HERMES revision identification - -* alarms - READING & 0x80 = 0x80: CPU throttling active - READING & 0x80 = 0x00: CPU running at full speed - - READING & 0x10 = 0x10: software event (see control:1) - READING & 0x10 = 0x00: no software event - - READING & 0x08 = 0x08: watchdog event (see wdog:2) - READING & 0x08 = 0x00: no watchdog event - - READING & 0x02 = 0x02: thermal event (see temp*:1) - READING & 0x02 = 0x00: no thermal event - - READING & 0x01 = 0x01: fan event (see fan*:1) - READING & 0x01 = 0x00: no fan event - - READING & 0x13 ! 0x00: ALERT LED is flashing - -* control - READING & 0x01 = 0x01: software event - READING & 0x01 = 0x00: no software event - - WRITING & 0x01 = 0x01: set software event - WRITING & 0x01 = 0x00: clear software event - -* watchdog_control - READING & 0x80 = 0x80: power off on watchdog event while thermal event - READING & 0x80 = 0x00: watchdog power off disabled (just system reset enabled) - - READING & 0x40 = 0x40: watchdog timebase 60 seconds (see also wdog:1) - READING & 0x40 = 0x00: watchdog timebase 2 seconds - - READING & 0x10 = 0x10: watchdog enabled - READING & 0x10 = 0x00: watchdog disabled - - WRITING & 0x80 = 0x80: enable "power off on watchdog event while thermal event" - WRITING & 0x80 = 0x00: disable "power off on watchdog event while thermal event" - - WRITING & 0x40 = 0x40: set watchdog timebase to 60 seconds - WRITING & 0x40 = 0x00: set watchdog timebase to 2 seconds - - WRITING & 0x20 = 0x20: disable watchdog - - WRITING & 0x10 = 0x10: enable watchdog / restart watchdog time - -* watchdog_state - READING & 0x02 = 0x02: watchdog system reset occurred - READING & 0x02 = 0x00: no watchdog system reset occurred - - WRITING & 0x02 = 0x02: clear watchdog event - -* watchdog_preset - READING & 0xff = 0x??: configured watch dog time in units (see wdog:3 0x40) - - WRITING & 0xff = 0x??: configure watch dog time in units - -* in* (0: +5V, 1: +12V, 2: onboard 3V battery) - READING: actual voltage value - -* temp*_status (1: CPU sensor, 2: onboard sensor, 3: auxiliary sensor) - READING & 0x02 = 0x02: thermal event (overtemperature) - READING & 0x02 = 0x00: no thermal event - - READING & 0x01 = 0x01: sensor is working - READING & 0x01 = 0x00: sensor is faulty - - WRITING & 0x02 = 0x02: clear thermal event - -* temp*_input (1: CPU sensor, 2: onboard sensor, 3: auxiliary sensor) - READING: actual temperature value - -* fan*_status (1: power supply fan, 2: CPU fan, 3: auxiliary fan) - READING & 0x04 = 0x04: fan event (fan fault) - READING & 0x04 = 0x00: no fan event - - WRITING & 0x04 = 0x04: clear fan event - -* fan*_div (1: power supply fan, 2: CPU fan, 3: auxiliary fan) - Divisors 2,4 and 8 are supported, both for reading and writing - -* fan*_pwm (1: power supply fan, 2: CPU fan, 3: auxiliary fan) - READING & 0xff = 0x00: fan may be switched off - READING & 0xff = 0x01: fan must run at least at minimum speed (supply: 6V) - READING & 0xff = 0xff: fan must run at maximum speed (supply: 12V) - READING & 0xff = 0x??: fan must run at least at given speed (supply: 6V..12V) - - WRITING & 0xff = 0x00: fan may be switched off - WRITING & 0xff = 0x01: fan must run at least at minimum speed (supply: 6V) - WRITING & 0xff = 0xff: fan must run at maximum speed (supply: 12V) - WRITING & 0xff = 0x??: fan must run at least at given speed (supply: 6V..12V) - -* fan*_input (1: power supply fan, 2: CPU fan, 3: auxiliary fan) - READING: actual RPM value - - -Limitations ------------ - -* Measuring fan speed -It seems that the chip counts "ripples" (typical fans produce 2 ripples per -rotation while VERAX fans produce 18) in a 9-bit register. This register is -read out every second, then the ripple prescaler (2, 4 or 8) is applied and -the result is stored in the 8 bit output register. Due to the limitation of -the counting register to 9 bits, it is impossible to measure a VERAX fan -properly (even with a prescaler of 8). At its maximum speed of 3500 RPM the -fan produces 1080 ripples per second which causes the counting register to -overflow twice, leading to only 186 RPM. - -* Measuring input voltages -in2 ("battery") reports the voltage of the onboard lithium battery and not -+3.3V from the power supply. - -* Undocumented features -Fujitsu-Siemens Computers has not documented all features of the chip so -far. Their software, System Guard, shows that there are a still some -features which cannot be controlled by this implementation. -- cgit v1.2.1 From 708a62bcd5f699756bae81491e64648fbf19e2a4 Mon Sep 17 00:00:00 2001 From: Rudolf Marek Date: Wed, 23 Sep 2009 22:59:42 +0200 Subject: hwmon: (coretemp) Fix Atom CPUs support Fix Atom CPUs support. Intel documents TjMax at 90 degrees C but some Atoms may have 125 degrees C (this is undocumented speculation). Signed-off-by: Rudolf Marek Cc: Huaxu Wan Cc: Kent Liu Signed-off-by: Jean Delvare --- Documentation/hwmon/coretemp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/hwmon/coretemp b/Documentation/hwmon/coretemp index dbbe6c7025b0..d3d79e658718 100644 --- a/Documentation/hwmon/coretemp +++ b/Documentation/hwmon/coretemp @@ -4,7 +4,7 @@ Kernel driver coretemp Supported chips: * All Intel Core family Prefix: 'coretemp' - CPUID: family 0x6, models 0xe, 0xf, 0x16, 0x17 + CPUID: family 0x6, models 0xe, 0xf, 0x16, 0x17, 0x1c (Atom) Datasheet: Intel 64 and IA-32 Architectures Software Developer's Manual Volume 3A: System Programming Guide http://softwarecommunity.intel.com/Wiki/Mobility/720.htm -- cgit v1.2.1 From eccfed42215bebda0acc3158c1a4ff8325dea275 Mon Sep 17 00:00:00 2001 From: Rudolf Marek Date: Wed, 23 Sep 2009 22:59:42 +0200 Subject: hwmon: (coretemp) Add support for Penryn mobile CPUs Following patch adds support for mobile Penryn CPUs. Intel documents this poorly. I asked the Coretemp author for some help. This is totally untested and may not work. Please test! Signed-off-by: Rudolf Marek Cc: Huaxu Wan Cc: Kent Liu Signed-off-by: Jean Delvare --- Documentation/hwmon/coretemp | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/hwmon/coretemp b/Documentation/hwmon/coretemp index d3d79e658718..65d1e667c36e 100644 --- a/Documentation/hwmon/coretemp +++ b/Documentation/hwmon/coretemp @@ -4,7 +4,9 @@ Kernel driver coretemp Supported chips: * All Intel Core family Prefix: 'coretemp' - CPUID: family 0x6, models 0xe, 0xf, 0x16, 0x17, 0x1c (Atom) + CPUID: family 0x6, models 0xe (Pentium M DC), 0xf (Core 2 DC 65nm), + 0x16 (Core 2 SC 65nm), 0x17 (Penryn 45nm), + 0x1a (Nehalem), 0x1c (Atom). Datasheet: Intel 64 and IA-32 Architectures Software Developer's Manual Volume 3A: System Programming Guide http://softwarecommunity.intel.com/Wiki/Mobility/720.htm -- cgit v1.2.1 From fa08acd7d16cd7ea8114f3844b0ef2505a4276a8 Mon Sep 17 00:00:00 2001 From: Huaxu Wan Date: Wed, 23 Sep 2009 22:59:43 +0200 Subject: hwmon: (coretemp) Add Lynnfield CPU Add Lynnfield processor support. Lynnfield is a quad-core Nehalem based microprocessor for Desktop market, which is introduced in September 2009. Signed-off-by: Huaxu Wan Signed-off-by: Kent Liu Acked-by: Rudolf Marek Signed-off-by: Jean Delvare --- Documentation/hwmon/coretemp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/hwmon/coretemp b/Documentation/hwmon/coretemp index 65d1e667c36e..92267b62db59 100644 --- a/Documentation/hwmon/coretemp +++ b/Documentation/hwmon/coretemp @@ -6,7 +6,7 @@ Supported chips: Prefix: 'coretemp' CPUID: family 0x6, models 0xe (Pentium M DC), 0xf (Core 2 DC 65nm), 0x16 (Core 2 SC 65nm), 0x17 (Penryn 45nm), - 0x1a (Nehalem), 0x1c (Atom). + 0x1a (Nehalem), 0x1c (Atom), 0x1e (Lynnfield) Datasheet: Intel 64 and IA-32 Architectures Software Developer's Manual Volume 3A: System Programming Guide http://softwarecommunity.intel.com/Wiki/Mobility/720.htm -- cgit v1.2.1 From 25d9e2d15286281ec834b829a4aaf8969011f1cd Mon Sep 17 00:00:00 2001 From: "npiggin@suse.de" Date: Fri, 21 Aug 2009 02:35:05 +1000 Subject: truncate: new helpers Introduce new truncate helpers truncate_pagecache and inode_newsize_ok. vmtruncate is also consolidated from mm/memory.c and mm/nommu.c and into mm/truncate.c. Reviewed-by: Christoph Hellwig Signed-off-by: Nick Piggin Signed-off-by: Al Viro --- Documentation/vm/locking | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/vm/locking b/Documentation/vm/locking index f366fa956179..25fadb448760 100644 --- a/Documentation/vm/locking +++ b/Documentation/vm/locking @@ -80,7 +80,7 @@ Note: PTL can also be used to guarantee that no new clones using the mm start up ... this is a loose form of stability on mm_users. For example, it is used in copy_mm to protect against a racing tlb_gather_mmu single address space optimization, so that the zap_page_range (from -vmtruncate) does not lose sending ipi's to cloned threads that might +truncate) does not lose sending ipi's to cloned threads that might be spawned underneath it and go to user mode to drag in pte's into tlbs. swap_lock -- cgit v1.2.1 From 0288b95b432b88f9daf895b526f64beeaca9ac73 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Wed, 23 Sep 2009 15:56:11 -0700 Subject: doc/filesystems: remove smount program mount(8) handles shared subtrees just fine, so remove the smount program from Documentation/filesystems/sharedsubtree.txt. Fix annoying "Lets" -> "Let's". Insert space between '#' prompt and "mount" command. Signed-off-by: Randy Dunlap Acked-by: Miklos Szeredi Cc: Christoph Hellwig Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/filesystems/sharedsubtree.txt | 209 +++++----------------------- 1 file changed, 34 insertions(+), 175 deletions(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/sharedsubtree.txt b/Documentation/filesystems/sharedsubtree.txt index 736540045dc7..b2c1ee5d98fc 100644 --- a/Documentation/filesystems/sharedsubtree.txt +++ b/Documentation/filesystems/sharedsubtree.txt @@ -41,14 +41,14 @@ replicas continue to be exactly same. Here is an example: - Lets say /mnt has a mount that is shared. + Let's say /mnt has a mount that is shared. mount --make-shared /mnt - note: mount command does not yet support the --make-shared flag. - I have included a small C program which does the same by executing - 'smount /mnt shared' + Note: mount(8) command now supports the --make-shared flag, + so the sample 'smount' program is no longer needed and has been + removed. - #mount --bind /mnt /tmp + # mount --bind /mnt /tmp The above command replicates the mount at /mnt to the mountpoint /tmp and the contents of both the mounts remain identical. @@ -58,8 +58,8 @@ replicas continue to be exactly same. #ls /tmp a b c - Now lets say we mount a device at /tmp/a - #mount /dev/sd0 /tmp/a + Now let's say we mount a device at /tmp/a + # mount /dev/sd0 /tmp/a #ls /tmp/a t1 t2 t2 @@ -80,21 +80,20 @@ replicas continue to be exactly same. Here is an example: - Lets say /mnt has a mount which is shared. - #mount --make-shared /mnt + Let's say /mnt has a mount which is shared. + # mount --make-shared /mnt - Lets bind mount /mnt to /tmp - #mount --bind /mnt /tmp + Let's bind mount /mnt to /tmp + # mount --bind /mnt /tmp the new mount at /tmp becomes a shared mount and it is a replica of the mount at /mnt. - Now lets make the mount at /tmp; a slave of /mnt - #mount --make-slave /tmp - [or smount /tmp slave] + Now let's make the mount at /tmp; a slave of /mnt + # mount --make-slave /tmp - lets mount /dev/sd0 on /mnt/a - #mount /dev/sd0 /mnt/a + let's mount /dev/sd0 on /mnt/a + # mount /dev/sd0 /mnt/a #ls /mnt/a t1 t2 t3 @@ -104,9 +103,9 @@ replicas continue to be exactly same. Note the mount event has propagated to the mount at /tmp - However lets see what happens if we mount something on the mount at /tmp + However let's see what happens if we mount something on the mount at /tmp - #mount /dev/sd1 /tmp/b + # mount /dev/sd1 /tmp/b #ls /tmp/b s1 s2 s3 @@ -124,12 +123,11 @@ replicas continue to be exactly same. 2d) A unbindable mount is a unbindable private mount - lets say we have a mount at /mnt and we make is unbindable + let's say we have a mount at /mnt and we make is unbindable - #mount --make-unbindable /mnt - [ smount /mnt unbindable ] + # mount --make-unbindable /mnt - Lets try to bind mount this mount somewhere else. + Let's try to bind mount this mount somewhere else. # mount --bind /mnt /tmp mount: wrong fs type, bad option, bad superblock on /mnt, or too many mounted file systems @@ -139,147 +137,8 @@ replicas continue to be exactly same. 3) smount command - Currently the mount command is not aware of shared subtree features. - Work is in progress to add the support in mount ( util-linux package ). - Till then use the following program. - - ------------------------------------------------------------------------ - // - //this code was developed my Miklos Szeredi - //and modified by Ram Pai - // sample usage: - // smount /tmp shared - // - #include - #include - #include - #include - #include - #include - - #ifndef MS_REC - #define MS_REC 0x4000 /* 16384: Recursive loopback */ - #endif - - #ifndef MS_SHARED - #define MS_SHARED 1<<20 /* Shared */ - #endif - - #ifndef MS_PRIVATE - #define MS_PRIVATE 1<<18 /* Private */ - #endif - - #ifndef MS_SLAVE - #define MS_SLAVE 1<<19 /* Slave */ - #endif - - #ifndef MS_UNBINDABLE - #define MS_UNBINDABLE 1<<17 /* Unbindable */ - #endif - - int main(int argc, char *argv[]) - { - int type; - if(argc != 3) { - fprintf(stderr, "usage: %s dir " - "\n" , argv[0]); - return 1; - } - - fprintf(stdout, "%s %s %s\n", argv[0], argv[1], argv[2]); - - if (strcmp(argv[2],"rshared")==0) - type=(MS_SHARED|MS_REC); - else if (strcmp(argv[2],"rslave")==0) - type=(MS_SLAVE|MS_REC); - else if (strcmp(argv[2],"rprivate")==0) - type=(MS_PRIVATE|MS_REC); - else if (strcmp(argv[2],"runbindable")==0) - type=(MS_UNBINDABLE|MS_REC); - else if (strcmp(argv[2],"shared")==0) - type=MS_SHARED; - else if (strcmp(argv[2],"slave")==0) - type=MS_SLAVE; - else if (strcmp(argv[2],"private")==0) - type=MS_PRIVATE; - else if (strcmp(argv[2],"unbindable")==0) - type=MS_UNBINDABLE; - else { - fprintf(stderr, "invalid operation: %s\n", argv[2]); - return 1; - } - setfsuid(getuid()); - - if(mount("", argv[1], "dontcare", type, "") == -1) { - perror("mount"); - return 1; - } - return 0; - } - ----------------------------------------------------------------------- - - Copy the above code snippet into smount.c - gcc -o smount smount.c - - - (i) To mark all the mounts under /mnt as shared execute the following - command: - - smount /mnt rshared - the corresponding syntax planned for mount command is - mount --make-rshared /mnt - - just to mark a mount /mnt as shared, execute the following - command: - smount /mnt shared - the corresponding syntax planned for mount command is - mount --make-shared /mnt - - (ii) To mark all the shared mounts under /mnt as slave execute the - following - - command: - smount /mnt rslave - the corresponding syntax planned for mount command is - mount --make-rslave /mnt - - just to mark a mount /mnt as slave, execute the following - command: - smount /mnt slave - the corresponding syntax planned for mount command is - mount --make-slave /mnt - - (iii) To mark all the mounts under /mnt as private execute the - following command: - - smount /mnt rprivate - the corresponding syntax planned for mount command is - mount --make-rprivate /mnt - - just to mark a mount /mnt as private, execute the following - command: - smount /mnt private - the corresponding syntax planned for mount command is - mount --make-private /mnt - - NOTE: by default all the mounts are created as private. But if - you want to change some shared/slave/unbindable mount as - private at a later point in time, this command can help. - - (iv) To mark all the mounts under /mnt as unbindable execute the - following - - command: - smount /mnt runbindable - the corresponding syntax planned for mount command is - mount --make-runbindable /mnt - - just to mark a mount /mnt as unbindable, execute the following - command: - smount /mnt unbindable - the corresponding syntax planned for mount command is - mount --make-unbindable /mnt + Modern mount(8) command is aware of shared subtree features, + so use it instead of the 'smount' command. [source code removed] 4) Use cases @@ -558,7 +417,7 @@ replicas continue to be exactly same. then the subtree under the unbindable mount is pruned in the new location. - eg: lets say we have the following mount tree. + eg: let's say we have the following mount tree. A / \ @@ -566,7 +425,7 @@ replicas continue to be exactly same. / \ / \ D E F G - Lets say all the mount except the mount C in the tree are + Let's say all the mount except the mount C in the tree are of a type other than unbindable. If this tree is rbound to say Z @@ -683,13 +542,13 @@ replicas continue to be exactly same. 'b' on mounts that receive propagation from mount 'B' and does not have sub-mounts within them are unmounted. - Example: Lets say 'B1', 'B2', 'B3' are shared mounts that propagate to + Example: Let's say 'B1', 'B2', 'B3' are shared mounts that propagate to each other. - lets say 'A1', 'A2', 'A3' are first mounted at dentry 'b' on mount + let's say 'A1', 'A2', 'A3' are first mounted at dentry 'b' on mount 'B1', 'B2' and 'B3' respectively. - lets say 'C1', 'C2', 'C3' are next mounted at the same dentry 'b' on + let's say 'C1', 'C2', 'C3' are next mounted at the same dentry 'b' on mount 'B1', 'B2' and 'B3' respectively. if 'C1' is unmounted, all the mounts that are most-recently-mounted on @@ -710,7 +569,7 @@ replicas continue to be exactly same. A cloned namespace contains all the mounts as that of the parent namespace. - Lets say 'A' and 'B' are the corresponding mounts in the parent and the + Let's say 'A' and 'B' are the corresponding mounts in the parent and the child namespace. If 'A' is shared, then 'B' is also shared and 'A' and 'B' propagate to @@ -759,11 +618,11 @@ replicas continue to be exactly same. mount --make-slave /mnt At this point we have the first mount at /tmp and - its root dentry is 1. Lets call this mount 'A' + its root dentry is 1. Let's call this mount 'A' And then we have a second mount at /tmp1 with root - dentry 2. Lets call this mount 'B' + dentry 2. Let's call this mount 'B' Next we have a third mount at /mnt with root dentry - mnt. Lets call this mount 'C' + mnt. Let's call this mount 'C' 'B' is the slave of 'A' and 'C' is a slave of 'B' A -> B -> C @@ -794,7 +653,7 @@ replicas continue to be exactly same. Q3 Why is unbindable mount needed? - Lets say we want to replicate the mount tree at multiple + Let's say we want to replicate the mount tree at multiple locations within the same subtree. if one rbind mounts a tree within the same subtree 'n' times @@ -803,7 +662,7 @@ replicas continue to be exactly same. mounts. Here is a example. step 1: - lets say the root tree has just two directories with + let's say the root tree has just two directories with one vfsmount. root / \ @@ -875,7 +734,7 @@ replicas continue to be exactly same. Unclonable mounts come in handy here. step 1: - lets say the root tree has just two directories with + let's say the root tree has just two directories with one vfsmount. root / \ -- cgit v1.2.1 From 16c01b20ae0572d5a1fe8059f1b4c09f79b73cbf Mon Sep 17 00:00:00 2001 From: Peng Tao Date: Wed, 23 Sep 2009 15:56:13 -0700 Subject: doc/filesystems: more mount cleanups Documentation/filesystems/sharedsubtree.txt needs updating because the mount command in util-linux package is well aware of shared subtree features now. The patch also fixes two typos in sharedsubtree.txt. Signed-off-by: Peng Tao Signed-off-by: Randy Dunlap Cc: Miklos Szeredi Cc: Christoph Hellwig Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/filesystems/sharedsubtree.txt | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/sharedsubtree.txt b/Documentation/filesystems/sharedsubtree.txt index b2c1ee5d98fc..23a181074f94 100644 --- a/Documentation/filesystems/sharedsubtree.txt +++ b/Documentation/filesystems/sharedsubtree.txt @@ -4,7 +4,7 @@ Shared Subtrees Contents: 1) Overview 2) Features - 3) smount command + 3) Setting mount states 4) Use-case 5) Detailed semantics 6) Quiz @@ -135,10 +135,15 @@ replicas continue to be exactly same. Binding a unbindable mount is a invalid operation. -3) smount command +3) Setting mount states - Modern mount(8) command is aware of shared subtree features, - so use it instead of the 'smount' command. [source code removed] + The mount command (util-linux package) can be used to set mount + states: + + mount --make-shared mountpoint + mount --make-slave mountpoint + mount --make-private mountpoint + mount --make-unbindable mountpoint 4) Use cases @@ -209,7 +214,7 @@ replicas continue to be exactly same. mount --rbind / /view/v3 mount --rbind / /view/v4 - and if /usr has a versioning filesystem mounted, than that + and if /usr has a versioning filesystem mounted, then that mount appears at /view/v1/usr, /view/v2/usr, /view/v3/usr and /view/v4/usr too @@ -249,7 +254,7 @@ replicas continue to be exactly same. For example: mount --make-shared /mnt - mount --bin /mnt /tmp + mount --bind /mnt /tmp The mount at /mnt and that at /tmp are both shared and belong to the same peer group. Anything mounted or unmounted under -- cgit v1.2.1 From bcadbbd4c896c80c263c35ce94b763e5ff58cecd Mon Sep 17 00:00:00 2001 From: Xiaotian Feng Date: Wed, 23 Sep 2009 15:56:13 -0700 Subject: Documentation: update stale definition of file-nr in fs.txt In "documentation: update Documentation/filesystem/proc.txt and Documentation/sysctls" (commit 760df93ec) we merged /proc/sys/fs documentation in Documentation/sysctl/fs.txt and Documentation/filesystem/proc.txt, but stale file-nr definition remained. This patch adds back the right fs-nr definition for 2.6 kernel. Signed-off-by: Xiaotian Feng Cc: Randy Dunlap Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/sysctl/fs.txt | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) (limited to 'Documentation') diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt index 1458448436cc..62682500878a 100644 --- a/Documentation/sysctl/fs.txt +++ b/Documentation/sysctl/fs.txt @@ -96,13 +96,16 @@ handles that the Linux kernel will allocate. When you get lots of error messages about running out of file handles, you might want to increase this limit. -The three values in file-nr denote the number of allocated -file handles, the number of unused file handles and the maximum -number of file handles. When the allocated file handles come -close to the maximum, but the number of unused file handles is -significantly greater than 0, you've encountered a peak in your -usage of file handles and you don't need to increase the maximum. - +Historically, the three values in file-nr denoted the number of +allocated file handles, the number of allocated but unused file +handles, and the maximum number of file handles. Linux 2.6 always +reports 0 as the number of free file handles -- this is not an +error, it just means that the number of allocated file handles +exactly matches the number of used file handles. + +Attempts to allocate more file descriptors than file-max are +reported with printk, look for "VFS: file-max limit +reached". ============================================================== nr_open: -- cgit v1.2.1 From 2552a99b6e3c3f3c9ee1038e6c1f4669a856c59b Mon Sep 17 00:00:00 2001 From: Jaswinder Singh Rajput Date: Wed, 23 Sep 2009 15:56:14 -0700 Subject: includecheck fix: Documentation, cfag12864b-example.c fix the following 'make includecheck' warning: Documentation/auxdisplay/cfag12864b-example.c: string.h is included more than once. Signed-off-by: Jaswinder Singh Rajput Acked-by: Randy Dunlap Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/auxdisplay/cfag12864b-example.c | 1 - 1 file changed, 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/auxdisplay/cfag12864b-example.c b/Documentation/auxdisplay/cfag12864b-example.c index 1d2c010bae12..e7823ffb1ca0 100644 --- a/Documentation/auxdisplay/cfag12864b-example.c +++ b/Documentation/auxdisplay/cfag12864b-example.c @@ -194,7 +194,6 @@ static void cfag12864b_blit(void) */ #include -#include #define EXAMPLES 6 -- cgit v1.2.1 From ba36c440ba9486b155c9254ce5e50f5f20eb1fcb Mon Sep 17 00:00:00 2001 From: Josh Triplett Date: Wed, 23 Sep 2009 15:56:15 -0700 Subject: Documentation/vm/.gitignore: add page-types Signed-off-by: Josh Triplett Cc: Wu Fengguang Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/.gitignore | 1 + 1 file changed, 1 insertion(+) (limited to 'Documentation') diff --git a/Documentation/vm/.gitignore b/Documentation/vm/.gitignore index 33e8a023df02..09b164a5700f 100644 --- a/Documentation/vm/.gitignore +++ b/Documentation/vm/.gitignore @@ -1 +1,2 @@ +page-types slabinfo -- cgit v1.2.1 From 0b4b2ad5307c76c7105d6e7c724b1c14b8daf482 Mon Sep 17 00:00:00 2001 From: Wu Fengguang Date: Wed, 23 Sep 2009 15:56:16 -0700 Subject: page-types: add feature for walking process address space Introduce "-p|--pid " for walking the process address space. The default action is to walk raw memory PFNs. Both the virtual address and physical address of each present pages will be listed: # ./tools/vm/page-types -lp $$ | head -3 voffset offset len flags 400 11bebe 1 __RU_lA____M______________________ 402 11bebc 1 __RU_lA____M______________________ Note that voffset/offset/len are now showed as hex numbers. [akpm@linux-foundation.org: coding-style fixes] Cc: Andi Kleen Signed-off-by: Wu Fengguang Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/page-types.c | 200 +++++++++++++++++++++++++++++++++++++----- 1 file changed, 180 insertions(+), 20 deletions(-) (limited to 'Documentation') diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c index 3eda8ea00852..fa1a30d9e9d5 100644 --- a/Documentation/vm/page-types.c +++ b/Documentation/vm/page-types.c @@ -5,6 +5,7 @@ * Copyright (C) 2009 Wu Fengguang */ +#define _LARGEFILE64_SOURCE #include #include #include @@ -13,11 +14,32 @@ #include #include #include +#include #include #include #include +/* + * pagemap kernel ABI bits + */ + +#define PM_ENTRY_BYTES sizeof(uint64_t) +#define PM_STATUS_BITS 3 +#define PM_STATUS_OFFSET (64 - PM_STATUS_BITS) +#define PM_STATUS_MASK (((1LL << PM_STATUS_BITS) - 1) << PM_STATUS_OFFSET) +#define PM_STATUS(nr) (((nr) << PM_STATUS_OFFSET) & PM_STATUS_MASK) +#define PM_PSHIFT_BITS 6 +#define PM_PSHIFT_OFFSET (PM_STATUS_OFFSET - PM_PSHIFT_BITS) +#define PM_PSHIFT_MASK (((1LL << PM_PSHIFT_BITS) - 1) << PM_PSHIFT_OFFSET) +#define PM_PSHIFT(x) (((u64) (x) << PM_PSHIFT_OFFSET) & PM_PSHIFT_MASK) +#define PM_PFRAME_MASK ((1LL << PM_PSHIFT_OFFSET) - 1) +#define PM_PFRAME(x) ((x) & PM_PFRAME_MASK) + +#define PM_PRESENT PM_STATUS(4LL) +#define PM_SWAP PM_STATUS(2LL) + + /* * kernel page flags */ @@ -126,6 +148,14 @@ static int nr_addr_ranges; static unsigned long opt_offset[MAX_ADDR_RANGES]; static unsigned long opt_size[MAX_ADDR_RANGES]; +#define MAX_VMAS 10240 +static int nr_vmas; +static unsigned long pg_start[MAX_VMAS]; +static unsigned long pg_end[MAX_VMAS]; +static unsigned long voffset; + +static int pagemap_fd; + #define MAX_BIT_FILTERS 64 static int nr_bit_filters; static uint64_t opt_mask[MAX_BIT_FILTERS]; @@ -135,7 +165,6 @@ static int page_size; #define PAGES_BATCH (64 << 10) /* 64k pages */ static int kpageflags_fd; -static uint64_t kpageflags_buf[KPF_BYTES * PAGES_BATCH]; #define HASH_SHIFT 13 #define HASH_SIZE (1 << HASH_SHIFT) @@ -158,6 +187,11 @@ static uint64_t page_flags[HASH_SIZE]; type __min2 = (y); \ __min1 < __min2 ? __min1 : __min2; }) +#define max_t(type, x, y) ({ \ + type __max1 = (x); \ + type __max2 = (y); \ + __max1 > __max2 ? __max1 : __max2; }) + static unsigned long pages2mb(unsigned long pages) { return (pages * page_size) >> 20; @@ -224,26 +258,34 @@ static char *page_flag_longname(uint64_t flags) static void show_page_range(unsigned long offset, uint64_t flags) { static uint64_t flags0; + static unsigned long voff; static unsigned long index; static unsigned long count; - if (flags == flags0 && offset == index + count) { + if (flags == flags0 && offset == index + count && + (!opt_pid || voffset == voff + count)) { count++; return; } - if (count) - printf("%lu\t%lu\t%s\n", + if (count) { + if (opt_pid) + printf("%lx\t", voff); + printf("%lx\t%lx\t%s\n", index, count, page_flag_name(flags0)); + } flags0 = flags; index = offset; + voff = voffset; count = 1; } static void show_page(unsigned long offset, uint64_t flags) { - printf("%lu\t%s\n", offset, page_flag_name(flags)); + if (opt_pid) + printf("%lx\t", voffset); + printf("%lx\t%s\n", offset, page_flag_name(flags)); } static void show_summary(void) @@ -383,6 +425,8 @@ static void walk_pfn(unsigned long index, unsigned long count) lseek(kpageflags_fd, index * KPF_BYTES, SEEK_SET); while (count) { + uint64_t kpageflags_buf[KPF_BYTES * PAGES_BATCH]; + batch = min_t(unsigned long, count, PAGES_BATCH); n = read(kpageflags_fd, kpageflags_buf, batch * KPF_BYTES); if (n == 0) @@ -404,6 +448,81 @@ static void walk_pfn(unsigned long index, unsigned long count) } } + +#define PAGEMAP_BATCH 4096 +static unsigned long task_pfn(unsigned long pgoff) +{ + static uint64_t buf[PAGEMAP_BATCH]; + static unsigned long start; + static long count; + uint64_t pfn; + + if (pgoff < start || pgoff >= start + count) { + if (lseek64(pagemap_fd, + (uint64_t)pgoff * PM_ENTRY_BYTES, + SEEK_SET) < 0) { + perror("pagemap seek"); + exit(EXIT_FAILURE); + } + count = read(pagemap_fd, buf, sizeof(buf)); + if (count == 0) + return 0; + if (count < 0) { + perror("pagemap read"); + exit(EXIT_FAILURE); + } + if (count % PM_ENTRY_BYTES) { + fatal("pagemap read not aligned.\n"); + exit(EXIT_FAILURE); + } + count /= PM_ENTRY_BYTES; + start = pgoff; + } + + pfn = buf[pgoff - start]; + if (pfn & PM_PRESENT) + pfn = PM_PFRAME(pfn); + else + pfn = 0; + + return pfn; +} + +static void walk_task(unsigned long index, unsigned long count) +{ + int i = 0; + const unsigned long end = index + count; + + while (index < end) { + + while (pg_end[i] <= index) + if (++i >= nr_vmas) + return; + if (pg_start[i] >= end) + return; + + voffset = max_t(unsigned long, pg_start[i], index); + index = min_t(unsigned long, pg_end[i], end); + + assert(voffset < index); + for (; voffset < index; voffset++) { + unsigned long pfn = task_pfn(voffset); + if (pfn) + walk_pfn(pfn, 1); + } + } +} + +static void add_addr_range(unsigned long offset, unsigned long size) +{ + if (nr_addr_ranges >= MAX_ADDR_RANGES) + fatal("too many addr ranges\n"); + + opt_offset[nr_addr_ranges] = offset; + opt_size[nr_addr_ranges] = min_t(unsigned long, size, ULONG_MAX-offset); + nr_addr_ranges++; +} + static void walk_addr_ranges(void) { int i; @@ -415,10 +534,13 @@ static void walk_addr_ranges(void) } if (!nr_addr_ranges) - walk_pfn(0, ULONG_MAX); + add_addr_range(0, ULONG_MAX); for (i = 0; i < nr_addr_ranges; i++) - walk_pfn(opt_offset[i], opt_size[i]); + if (!opt_pid) + walk_pfn(opt_offset[i], opt_size[i]); + else + walk_task(opt_offset[i], opt_size[i]); close(kpageflags_fd); } @@ -446,8 +568,8 @@ static void usage(void) " -r|--raw Raw mode, for kernel developers\n" " -a|--addr addr-spec Walk a range of pages\n" " -b|--bits bits-spec Walk pages with specified bits\n" -#if 0 /* planned features */ " -p|--pid pid Walk process address space\n" +#if 0 /* planned features */ " -f|--file filename Walk file address space\n" #endif " -l|--list Show page details in ranges\n" @@ -459,7 +581,7 @@ static void usage(void) " N+M pages range from N to N+M-1\n" " N,M pages range from N to M-1\n" " N, pages range from N to end\n" -" ,M pages range from 0 to M\n" +" ,M pages range from 0 to M-1\n" "bits-spec:\n" " bit1,bit2 (flags & (bit1|bit2)) != 0\n" " bit1,bit2=bit1 (flags & (bit1|bit2)) == bit1\n" @@ -496,21 +618,57 @@ static unsigned long long parse_number(const char *str) static void parse_pid(const char *str) { + FILE *file; + char buf[5000]; + opt_pid = parse_number(str); -} -static void parse_file(const char *name) -{ + sprintf(buf, "/proc/%d/pagemap", opt_pid); + pagemap_fd = open(buf, O_RDONLY); + if (pagemap_fd < 0) { + perror(buf); + exit(EXIT_FAILURE); + } + + sprintf(buf, "/proc/%d/maps", opt_pid); + file = fopen(buf, "r"); + if (!file) { + perror(buf); + exit(EXIT_FAILURE); + } + + while (fgets(buf, sizeof(buf), file) != NULL) { + unsigned long vm_start; + unsigned long vm_end; + unsigned long long pgoff; + int major, minor; + char r, w, x, s; + unsigned long ino; + int n; + + n = sscanf(buf, "%lx-%lx %c%c%c%c %llx %x:%x %lu", + &vm_start, + &vm_end, + &r, &w, &x, &s, + &pgoff, + &major, &minor, + &ino); + if (n < 10) { + fprintf(stderr, "unexpected line: %s\n", buf); + continue; + } + pg_start[nr_vmas] = vm_start / page_size; + pg_end[nr_vmas] = vm_end / page_size; + if (++nr_vmas >= MAX_VMAS) { + fprintf(stderr, "too many VMAs\n"); + break; + } + } + fclose(file); } -static void add_addr_range(unsigned long offset, unsigned long size) +static void parse_file(const char *name) { - if (nr_addr_ranges >= MAX_ADDR_RANGES) - fatal("too much addr ranges\n"); - - opt_offset[nr_addr_ranges] = offset; - opt_size[nr_addr_ranges] = size; - nr_addr_ranges++; } static void parse_addr_range(const char *optarg) @@ -676,8 +834,10 @@ int main(int argc, char *argv[]) } } + if (opt_list && opt_pid) + printf("voffset\t"); if (opt_list == 1) - printf("offset\tcount\tflags\n"); + printf("offset\tlen\tflags\n"); if (opt_list == 2) printf("offset\tflags\n"); -- cgit v1.2.1 From c6d57f3312a6619d47c5557b5f6154a74d04ff80 Mon Sep 17 00:00:00 2001 From: Paul Menage Date: Wed, 23 Sep 2009 15:56:19 -0700 Subject: cgroups: support named cgroups hierarchies To simplify referring to cgroup hierarchies in mount statements, and to allow disambiguation in the presence of empty hierarchies and multiply-bindable subsystems this patch adds support for naming a new cgroup hierarchy via the "name=" mount option A pre-existing hierarchy may be specified by either name or by subsystems; a hierarchy's name cannot be changed by a remount operation. Example usage: # To create a hierarchy called "foo" containing the "cpu" subsystem mount -t cgroup -oname=foo,cpu cgroup /mnt/cgroup1 # To mount the "foo" hierarchy on a second location mount -t cgroup -oname=foo cgroup /mnt/cgroup2 Signed-off-by: Paul Menage Reviewed-by: Li Zefan Cc: KAMEZAWA Hiroyuki Cc: Balbir Singh Cc: Dhaval Giani Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/cgroups/cgroups.txt | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) (limited to 'Documentation') diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt index 6eb1a97e88ce..4bccfc19196b 100644 --- a/Documentation/cgroups/cgroups.txt +++ b/Documentation/cgroups/cgroups.txt @@ -408,6 +408,26 @@ You can attach the current shell task by echoing 0: # echo 0 > tasks +2.3 Mounting hierarchies by name +-------------------------------- + +Passing the name= option when mounting a cgroups hierarchy +associates the given name with the hierarchy. This can be used when +mounting a pre-existing hierarchy, in order to refer to it by name +rather than by its set of active subsystems. Each hierarchy is either +nameless, or has a unique name. + +The name should match [\w.-]+ + +When passing a name= option for a new hierarchy, you need to +specify subsystems manually; the legacy behaviour of mounting all +subsystems when none are explicitly specified is not supported when +you give a subsystem a name. + +The name of the subsystem appears as part of the hierarchy description +in /proc/mounts and /proc//cgroups. + + 3. Kernel API ============= -- cgit v1.2.1 From be367d09927023d081f9199665c8500f69f14d22 Mon Sep 17 00:00:00 2001 From: Ben Blum Date: Wed, 23 Sep 2009 15:56:31 -0700 Subject: cgroups: let ss->can_attach and ss->attach do whole threadgroups at a time Alter the ss->can_attach and ss->attach functions to be able to deal with a whole threadgroup at a time, for use in cgroup_attach_proc. (This is a pre-patch to cgroup-procs-writable.patch.) Currently, new mode of the attach function can only tell the subsystem about the old cgroup of the threadgroup leader. No subsystem currently needs that information for each thread that's being moved, but if one were to be added (for example, one that counts tasks within a group) this bit would need to be reworked a bit to tell the subsystem the right information. [hidave.darkstar@gmail.com: fix build] Signed-off-by: Ben Blum Signed-off-by: Paul Menage Acked-by: Li Zefan Reviewed-by: Matt Helsley Cc: "Eric W. Biederman" Cc: Oleg Nesterov Cc: Peter Zijlstra Cc: Ingo Molnar Cc: Dave Young Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/cgroups/cgroups.txt | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt index 4bccfc19196b..455d4e6d346d 100644 --- a/Documentation/cgroups/cgroups.txt +++ b/Documentation/cgroups/cgroups.txt @@ -521,7 +521,7 @@ rmdir() will fail with it. From this behavior, pre_destroy() can be called multiple times against a cgroup. int can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp, - struct task_struct *task) + struct task_struct *task, bool threadgroup) (cgroup_mutex held by caller) Called prior to moving a task into a cgroup; if the subsystem @@ -529,14 +529,20 @@ returns an error, this will abort the attach operation. If a NULL task is passed, then a successful result indicates that *any* unspecified task can be moved into the cgroup. Note that this isn't called on a fork. If this method returns 0 (success) then this should -remain valid while the caller holds cgroup_mutex. +remain valid while the caller holds cgroup_mutex. If threadgroup is +true, then a successful result indicates that all threads in the given +thread's threadgroup can be moved together. void attach(struct cgroup_subsys *ss, struct cgroup *cgrp, - struct cgroup *old_cgrp, struct task_struct *task) + struct cgroup *old_cgrp, struct task_struct *task, + bool threadgroup) (cgroup_mutex held by caller) Called after the task has been attached to the cgroup, to allow any post-attachment activity that requires memory allocations or blocking. +If threadgroup is true, the subsystem should take care of all threads +in the specified thread's threadgroup. Currently does not support any +subsystem that might need the old_cgrp for every thread in the group. void fork(struct cgroup_subsy *ss, struct task_struct *task) -- cgit v1.2.1 From 4b3bde4c983de36c59e6c1a24701f6fe816f9f55 Mon Sep 17 00:00:00 2001 From: Balbir Singh Date: Wed, 23 Sep 2009 15:56:32 -0700 Subject: memcg: remove the overhead associated with the root cgroup Change the memory cgroup to remove the overhead associated with accounting all pages in the root cgroup. As a side-effect, we can no longer set a memory hard limit in the root cgroup. A new flag to track whether the page has been accounted or not has been added as well. Flags are now set atomically for page_cgroup, pcg_default_flags is now obsolete and removed. [akpm@linux-foundation.org: fix a few documentation glitches] Signed-off-by: Balbir Singh Signed-off-by: Daisuke Nishimura Reviewed-by: KAMEZAWA Hiroyuki Cc: Daisuke Nishimura Cc: Li Zefan Cc: Paul Menage Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/cgroups/memory.txt | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'Documentation') diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt index 23d1262c0775..ab0a02172cf4 100644 --- a/Documentation/cgroups/memory.txt +++ b/Documentation/cgroups/memory.txt @@ -179,6 +179,9 @@ The reclaim algorithm has not been modified for cgroups, except that pages that are selected for reclaiming come from the per cgroup LRU list. +NOTE: Reclaim does not work for the root cgroup, since we cannot set any +limits on the root cgroup. + 2. Locking The memory controller uses the following hierarchy @@ -210,6 +213,7 @@ We can alter the memory limit: NOTE: We can use a suffix (k, K, m, M, g or G) to indicate values in kilo, mega or gigabytes. NOTE: We can write "-1" to reset the *.limit_in_bytes(unlimited). +NOTE: We cannot set limits on the root cgroup any more. # cat /cgroups/0/memory.limit_in_bytes 4194304 -- cgit v1.2.1 From a6df63615b943dbef22df04c19f4506330fe835e Mon Sep 17 00:00:00 2001 From: Balbir Singh Date: Wed, 23 Sep 2009 15:56:34 -0700 Subject: memory controller: soft limit documentation Soft limits is a new feature for the memory resource controller, something similar has existed in the group scheduler in the form of shares. The CPU controllers interpretation of shares is very different though. Soft limits are the most useful feature to have for environments where the administrator wants to overcommit the system, such that only on memory contention do the limits become active. The current soft limits implementation provides a soft_limit_in_bytes interface for the memory controller and not for memory+swap controller. The implementation maintains an RB-Tree of groups that exceed their soft limit and starts reclaiming from the group that exceeds this limit by the maximum amount. This patch: Add documentation for soft limits Signed-off-by: Balbir Singh Cc: KAMEZAWA Hiroyuki Cc: Li Zefan Cc: KOSAKI Motohiro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/cgroups/memory.txt | 37 ++++++++++++++++++++++++++++++++++++- 1 file changed, 36 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt index ab0a02172cf4..b871f2552b45 100644 --- a/Documentation/cgroups/memory.txt +++ b/Documentation/cgroups/memory.txt @@ -379,7 +379,42 @@ cgroups created below it. NOTE2: This feature can be enabled/disabled per subtree. -7. TODO +7. Soft limits + +Soft limits allow for greater sharing of memory. The idea behind soft limits +is to allow control groups to use as much of the memory as needed, provided + +a. There is no memory contention +b. They do not exceed their hard limit + +When the system detects memory contention or low memory control groups +are pushed back to their soft limits. If the soft limit of each control +group is very high, they are pushed back as much as possible to make +sure that one control group does not starve the others of memory. + +Please note that soft limits is a best effort feature, it comes with +no guarantees, but it does its best to make sure that when memory is +heavily contended for, memory is allocated based on the soft limit +hints/setup. Currently soft limit based reclaim is setup such that +it gets invoked from balance_pgdat (kswapd). + +7.1 Interface + +Soft limits can be setup by using the following commands (in this example we +assume a soft limit of 256 megabytes) + +# echo 256M > memory.soft_limit_in_bytes + +If we want to change this to 1G, we can at any time use + +# echo 1G > memory.soft_limit_in_bytes + +NOTE1: Soft limits take effect over a long period of time, since they involve + reclaiming memory for balancing between memory cgroups +NOTE2: It is recommended to set the soft limit always below the hard limit, + otherwise the hard limit will take precedence. + +8. TODO 1. Add support for accounting huge pages (as a separate controller) 2. Make per-cgroup scanner reclaim not-shared pages first -- cgit v1.2.1 From a293980c2e261bd5b0d2a77340dd04f684caff58 Mon Sep 17 00:00:00 2001 From: Neil Horman Date: Wed, 23 Sep 2009 15:56:56 -0700 Subject: exec: let do_coredump() limit the number of concurrent dumps to pipes Introduce core pipe limiting sysctl. Since we can dump cores to pipe, rather than directly to the filesystem, we create a condition in which a user can create a very high load on the system simply by running bad applications. If the pipe reader specified in core_pattern is poorly written, we can have lots of ourstandig resources and processes in the system. This sysctl introduces an ability to limit that resource consumption. core_pipe_limit defines how many in-flight dumps may be run in parallel, dumps beyond this value are skipped and a note is made in the kernel log. A special value of 0 in core_pipe_limit denotes unlimited core dumps may be handled (this is the default value). [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Neil Horman Reported-by: Earl Chew Cc: Oleg Nesterov Cc: Andi Kleen Cc: Alan Cox Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/sysctl/kernel.txt | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) (limited to 'Documentation') diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index b3d8b4922740..a028b92001ed 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -22,6 +22,7 @@ show up in /proc/sys/kernel: - callhome [ S390 only ] - auto_msgmni - core_pattern +- core_pipe_limit - core_uses_pid - ctrl-alt-del - dentry-state @@ -135,6 +136,27 @@ core_pattern is used to specify a core dumpfile pattern name. ============================================================== +core_pipe_limit: + +This sysctl is only applicable when core_pattern is configured to pipe core +files to user space helper a (when the first character of core_pattern is a '|', +see above). When collecting cores via a pipe to an application, it is +occasionally usefull for the collecting application to gather data about the +crashing process from its /proc/pid directory. In order to do this safely, the +kernel must wait for the collecting process to exit, so as not to remove the +crashing processes proc files prematurely. This in turn creates the possibility +that a misbehaving userspace collecting process can block the reaping of a +crashed process simply by never exiting. This sysctl defends against that. It +defines how many concurrent crashing processes may be piped to user space +applications in parallel. If this value is exceeded, then those crashing +processes above that value are noted via the kernel log and their cores are +skipped. 0 is a special value, indicating that unlimited processes may be +captured in parallel, but that no waiting will take place (i.e. the collecting +process is not guaranteed access to /proc//). This value defaults +to 0. + +============================================================== + core_uses_pid: The default coredump filename is "core". By setting -- cgit v1.2.1 From fbd8ae106850b6a0215c2776e70a75a1b93cafc2 Mon Sep 17 00:00:00 2001 From: Dimitri Sivanich Date: Wed, 23 Sep 2009 15:57:15 -0700 Subject: drivers/char/uv_mmtimer.c: add memory mapped RTC driver for UV This driver memory maps the UV Hub RTC. Signed-off-by: Dimitri Sivanich Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/ioctl/ioctl-number.txt | 1 + 1 file changed, 1 insertion(+) (limited to 'Documentation') diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt index aafca0a8f66a..947374977ca5 100644 --- a/Documentation/ioctl/ioctl-number.txt +++ b/Documentation/ioctl/ioctl-number.txt @@ -135,6 +135,7 @@ Code Seq# Include File Comments 'l' 40-7F linux/udf_fs_i.h in development: +'m' 00-09 linux/mmtimer.h 'm' all linux/mtio.h conflict! 'm' all linux/soundcard.h conflict! 'm' all linux/synclink.h conflict! -- cgit v1.2.1 From 8365388827663bd6fb773e3623ed9023c0f82b1d Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Tue, 29 Sep 2009 15:59:34 -0400 Subject: ext4: Update documentation about quota mount options Signed-off-by: Jan Kara Signed-off-by: "Theodore Ts'o" --- Documentation/filesystems/ext4.txt | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt index 18b5ec8cea45..bf4f4b7e11b3 100644 --- a/Documentation/filesystems/ext4.txt +++ b/Documentation/filesystems/ext4.txt @@ -282,9 +282,16 @@ stripe=n Number of filesystem blocks that mballoc will try to use for allocation size and alignment. For RAID5/6 systems this should be the number of data disks * RAID chunk size in file system blocks. -delalloc (*) Deferring block allocation until write-out time. -nodelalloc Disable delayed allocation. Blocks are allocation - when data is copied from user to page cache. + +delalloc (*) Defer block allocation until just before ext4 + writes out the block(s) in question. This + allows ext4 to better allocation decisions + more efficiently. +nodelalloc Disable delayed allocation. Blocks are allocated + when the data is copied from userspace to the + page cache, either via the write(2) system call + or when an mmap'ed page which was previously + unallocated is written for the first time. max_batch_time=usec Maximum amount of time ext4 should wait for additional filesystem operations to be batch -- cgit v1.2.1 From a72cb4bc8590d222ac27205444d7f0dcf47ab1d5 Mon Sep 17 00:00:00 2001 From: Miguel de Barros Date: Sun, 27 Sep 2009 22:11:21 +0200 Subject: ALSA: hda - Analog Devices AD1984A add HP Touchsmart model Reference: ALSA bug #0004614 https://bugtrack.alsa-project.org/alsa-bug/view.php?id=4614 port-A (0x11) - front hp-out port-D (0x12) - rear line out port-E (0x1c) - front mic-in port-F (0x16) - Internal speakers digital-mic (0x17) - Internal mic init verbs, mixers, jack sensing and PCI_QUIRK to support this hardware Signed-off-by: Miguel de Barros Signed-off-by: Takashi Iwai --- Documentation/sound/alsa/HD-Audio-Models.txt | 1 + 1 file changed, 1 insertion(+) (limited to 'Documentation') diff --git a/Documentation/sound/alsa/HD-Audio-Models.txt b/Documentation/sound/alsa/HD-Audio-Models.txt index 97eebd63bedc..a2643cfe7938 100644 --- a/Documentation/sound/alsa/HD-Audio-Models.txt +++ b/Documentation/sound/alsa/HD-Audio-Models.txt @@ -209,6 +209,7 @@ AD1884A / AD1883 / AD1984A / AD1984B laptop laptop with HP jack sensing mobile mobile devices with HP jack sensing thinkpad Lenovo Thinkpad X300 + touchsmart HP Touchsmart AD1884 ====== -- cgit v1.2.1 From 296c355cd6443d89fa251885a8d78778fe111dc4 Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Wed, 30 Sep 2009 00:32:42 -0400 Subject: ext4: Use tracepoints for mb_history trace file The /proc/fs/ext4//mb_history was maintained manually, and had a number of problems: it required a largish amount of memory to be allocated for each ext4 filesystem, and the s_mb_history_lock introduced a CPU contention problem. By ripping out the mb_history code and replacing it with ftrace tracepoints, and we get more functionality: timestamps, event filtering, the ability to correlate mballoc history with other ext4 tracepoints, etc. Signed-off-by: "Theodore Ts'o" --- Documentation/filesystems/proc.txt | 1 - 1 file changed, 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index b5aee7838a00..2c48f945546b 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -1113,7 +1113,6 @@ Table 1-12: Files in /proc/fs/ext4/ .............................................................................. File Content mb_groups details of multiblock allocator buddy cache of free blocks - mb_history multiblock allocation history .............................................................................. -- cgit v1.2.1 From ea3acb199a5d7e4da1de0a4288eba993b29f33b9 Mon Sep 17 00:00:00 2001 From: Jason Wessel Date: Thu, 24 Sep 2009 09:08:30 -0500 Subject: x86: earlyprintk: Fix regression to handle serial,ttySn as 1 arg Commit c953094 ("early_printk: Allow more than one early console") introduced a regression in the parsing of the earlyprintk= kernel arguments. If you specify "earlyprintk=serial,ttyS0,115200" as a kernel argument, the "serial,ttyS" should be parsed as a single argument and not as "serial" and then "ttyS". Also update the documentation to reflect you can specify the ttyS directly without the "serial" argument. Signed-off-by: Jason Wessel Cc: Len Brown Cc: Greg KH Cc: Linus Torvalds Cc: Andrew Morton Cc: Johannes Weiner LKML-Reference: <4ABB7D5E.6000301@windriver.com> Signed-off-by: Ingo Molnar --- Documentation/kernel-parameters.txt | 1 + 1 file changed, 1 insertion(+) (limited to 'Documentation') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 6fa7292947e5..9107b387e91f 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -671,6 +671,7 @@ and is between 256 and 4096 characters. It is defined in the file earlyprintk= [X86,SH,BLACKFIN] earlyprintk=vga earlyprintk=serial[,ttySn[,baudrate]] + earlyprintk=ttySn[,baudrate] earlyprintk=dbgp[debugController#] Append ",keep" to not disable it when the real console -- cgit v1.2.1 From 610ea6c671685a09afff7ba521bdccda21c84c76 Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Thu, 1 Oct 2009 14:31:22 +0100 Subject: ARM: 5738/1: Correct TCM documentation It turns out that the TCM memory can be remap:ed by the MMU just like any other memory. Signed-off-by: Linus Walleij Signed-off-by: Russell King --- Documentation/arm/tcm.txt | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) (limited to 'Documentation') diff --git a/Documentation/arm/tcm.txt b/Documentation/arm/tcm.txt index 074f4be6667f..77fd9376e6d7 100644 --- a/Documentation/arm/tcm.txt +++ b/Documentation/arm/tcm.txt @@ -29,11 +29,13 @@ TCM location and size. Notice that this is not a MMU table: you actually move the physical location of the TCM around. At the place you put it, it will mask any underlying RAM from the CPU so it is usually wise not to overlap any physical RAM with -the TCM. The TCM memory exists totally outside the MMU and will -override any MMU mappings. +the TCM. -Code executing inside the ITCM does not "see" any MMU mappings -and e.g. register accesses must be made to physical addresses. +The TCM memory can then be remapped to another address again using +the MMU, but notice that the TCM if often used in situations where +the MMU is turned off. To avoid confusion the current Linux +implementation will map the TCM 1 to 1 from physical to virtual +memory in the location specified by the machine. TCM is used for a few things: -- cgit v1.2.1 From d6f4965d7d2e718eb9b223cb06db5f6a53b73507 Mon Sep 17 00:00:00 2001 From: Andrew Patterson Date: Thu, 17 Sep 2009 13:47:03 -0500 Subject: cciss: Allow triggering of rescan of logical drive topology via sysfs entry Added /sys/bus/pci/devices//ccissX/rescan sysfs entry used to kick off a rescan that discovers logical drive topology changes. Signed-off-by: Andrew Patterson Signed-off-by: Stephen M. Cameron Acked-by: Mike Miller Signed-off-by: Jens Axboe --- Documentation/ABI/testing/sysfs-bus-pci-devices-cciss | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss b/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss index 0a92a7c93a62..ac3429def235 100644 --- a/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss +++ b/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss @@ -31,3 +31,10 @@ Date: March 2009 Kernel Version: 2.6.30 Contact: iss_storagedev@hp.com Description: A symbolic link to /sys/block/cciss!cXdY + +Where: /sys/bus/pci/devices//ccissX/rescan +Date: August 2009 +Kernel Version: 2.6.31 +Contact: iss_storagedev@hp.com +Description: Kicks of a rescan of the controller to discover logical + drive topology changes. -- cgit v1.2.1 From ce84a8aeac4a4a2cc421b3145dd2fb7cae860e4d Mon Sep 17 00:00:00 2001 From: "Stephen M. Cameron" Date: Thu, 17 Sep 2009 13:48:10 -0500 Subject: cciss: Add lunid attribute to each logical drive in /sys Add lunid attribute to each logical drive at /sys/devices//ccissX/cXdY/lunid for controller X, logical drive Y Signed-off-by: Stephen M. Cameron Signed-off-by: Jens Axboe --- Documentation/ABI/testing/sysfs-bus-pci-devices-cciss | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss b/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss index ac3429def235..5a6c8d36afc4 100644 --- a/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss +++ b/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss @@ -38,3 +38,10 @@ Kernel Version: 2.6.31 Contact: iss_storagedev@hp.com Description: Kicks of a rescan of the controller to discover logical drive topology changes. + +Where: /sys/bus/pci/devices//ccissX/cXdY/lunid +Date: August 2009 +Kernel Version: 2.6.31 +Contact: iss_storagedev@hp.com +Description: Displays the 8-byte LUN ID used to address logical + drive Y of controller X. -- cgit v1.2.1 From 3ff1111dc6e27524eeef267ab0ca9b5690594748 Mon Sep 17 00:00:00 2001 From: "Stephen M. Cameron" Date: Thu, 17 Sep 2009 13:48:21 -0500 Subject: cciss: Add a "raid_level" attribute to each logical drive in /sys and change get rid of some magic numbers in raid lavel decoding. Add raid_level attribute to each logical drive at /sys/devices//ccissX/cXdY/raid_level for controller X, logical drive Y Signed-off-by: Stephen M. Cameron Signed-off-by: Jens Axboe --- Documentation/ABI/testing/sysfs-bus-pci-devices-cciss | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss b/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss index 5a6c8d36afc4..8d0260256168 100644 --- a/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss +++ b/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss @@ -45,3 +45,10 @@ Kernel Version: 2.6.31 Contact: iss_storagedev@hp.com Description: Displays the 8-byte LUN ID used to address logical drive Y of controller X. + +Where: /sys/bus/pci/devices//ccissX/cXdY/raid_level +Date: August 2009 +Kernel Version: 2.6.31 +Contact: iss_storagedev@hp.com +Description: Displays the RAID level of logical drive Y of + controller X. -- cgit v1.2.1 From e272afecaf18912e971374df4605496975942e5c Mon Sep 17 00:00:00 2001 From: "Stephen M. Cameron" Date: Thu, 17 Sep 2009 13:48:26 -0500 Subject: cciss: Add usage_count attribute to each logical drive in /sys Add usage_count attribute to each logical drive at /sys/devices//ccissX/cXdY/usage_count for controller X, logical drive Y. The usage count is the number of times the device has currently been opened. Signed-off-by: Stephen M. Cameron Signed-off-by: Jens Axboe --- Documentation/ABI/testing/sysfs-bus-pci-devices-cciss | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss b/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss index 8d0260256168..4f29e5f1ebfa 100644 --- a/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss +++ b/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss @@ -52,3 +52,10 @@ Kernel Version: 2.6.31 Contact: iss_storagedev@hp.com Description: Displays the RAID level of logical drive Y of controller X. + +Where: /sys/bus/pci/devices//ccissX/cXdY/usage_count +Date: August 2009 +Kernel Version: 2.6.31 +Contact: iss_storagedev@hp.com +Description: Displays the usage count (number of opens) of logical drive Y + of controller X. -- cgit v1.2.1 From 4932be778952d4f3c278cbdef0d717358849aa8c Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Thu, 1 Oct 2009 15:44:06 -0700 Subject: docs: update patch size in SubmittingPatches This patch size comment is like so last millenium. Update it to modern times. Signed-off-by: Randy Dunlap Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/SubmittingPatches | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches index b7f9d3b4bbf6..72651f788f4e 100644 --- a/Documentation/SubmittingPatches +++ b/Documentation/SubmittingPatches @@ -232,7 +232,7 @@ your e-mail client so that it sends your patches untouched. When sending patches to Linus, always follow step #7. Large changes are not appropriate for mailing lists, and some -maintainers. If your patch, uncompressed, exceeds 40 kB in size, +maintainers. If your patch, uncompressed, exceeds 300 kB in size, it is preferred that you store your patch on an Internet-accessible server, and provide instead a URL (link) pointing to your patch. -- cgit v1.2.1 From 3bfc13c239fd56ebc1ac98a914c6c6b8b0045478 Mon Sep 17 00:00:00 2001 From: HighPoint Linux Team Date: Fri, 11 Sep 2009 17:21:27 +0800 Subject: [SCSI] hptiop: Add RR44xx adapter support Most code changes were made to support RR44xx adapters. - add more PCI device ID. - using PCI BAR[2] to access RR44xx IOP. - using PCI BAR[0] to check and clear RR44xx IRQ. Signed-off-by: HighPoint Linux Team Signed-off-by: James Bottomley --- Documentation/scsi/hptiop.txt | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/scsi/hptiop.txt b/Documentation/scsi/hptiop.txt index a6eb4add1be6..9605179711f4 100644 --- a/Documentation/scsi/hptiop.txt +++ b/Documentation/scsi/hptiop.txt @@ -3,6 +3,25 @@ HIGHPOINT ROCKETRAID 3xxx/4xxx ADAPTER DRIVER (hptiop) Controller Register Map ------------------------- +For RR44xx Intel IOP based adapters, the controller IOP is accessed via PCI BAR0 and BAR2: + + BAR0 offset Register + 0x11C5C Link Interface IRQ Set + 0x11C60 Link Interface IRQ Clear + + BAR2 offset Register + 0x10 Inbound Message Register 0 + 0x14 Inbound Message Register 1 + 0x18 Outbound Message Register 0 + 0x1C Outbound Message Register 1 + 0x20 Inbound Doorbell Register + 0x24 Inbound Interrupt Status Register + 0x28 Inbound Interrupt Mask Register + 0x30 Outbound Interrupt Status Register + 0x34 Outbound Interrupt Mask Register + 0x40 Inbound Queue Port + 0x44 Outbound Queue Port + For Intel IOP based adapters, the controller IOP is accessed via PCI BAR0: BAR0 offset Register @@ -93,7 +112,7 @@ The driver exposes following sysfs attributes: ----------------------------------------------------------------------------- -Copyright (C) 2006-2007 HighPoint Technologies, Inc. All Rights Reserved. +Copyright (C) 2006-2009 HighPoint Technologies, Inc. All Rights Reserved. This file is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of -- cgit v1.2.1 From b607bd900051efc3308c4edc65dd98b34b230021 Mon Sep 17 00:00:00 2001 From: Jean Delvare Date: Fri, 2 Oct 2009 09:55:19 -0700 Subject: net: Fix wrong sizeof Which is why I have always preferred sizeof(struct foo) over sizeof(var). Signed-off-by: Jean Delvare Acked-by: Randy Dunlap Signed-off-by: David S. Miller --- Documentation/networking/timestamping/timestamping.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/networking/timestamping/timestamping.c b/Documentation/networking/timestamping/timestamping.c index 43d143104210..a7936fe8444a 100644 --- a/Documentation/networking/timestamping/timestamping.c +++ b/Documentation/networking/timestamping/timestamping.c @@ -381,7 +381,7 @@ int main(int argc, char **argv) memset(&hwtstamp, 0, sizeof(hwtstamp)); strncpy(hwtstamp.ifr_name, interface, sizeof(hwtstamp.ifr_name)); hwtstamp.ifr_data = (void *)&hwconfig; - memset(&hwconfig, 0, sizeof(&hwconfig)); + memset(&hwconfig, 0, sizeof(hwconfig)); hwconfig.tx_type = (so_timestamping_flags & SOF_TIMESTAMPING_TX_HARDWARE) ? HWTSTAMP_TX_ON : HWTSTAMP_TX_OFF; -- cgit v1.2.1 From 7069331dbe7155f23966f5944109f909fea0c7e4 Mon Sep 17 00:00:00 2001 From: Philipp Reisner Date: Fri, 2 Oct 2009 02:40:05 +0000 Subject: connector: Provide the sender's credentials to the callback Signed-off-by: Philipp Reisner Acked-by: Lars Ellenberg Acked-by: Evgeniy Polyakov Signed-off-by: David S. Miller --- Documentation/connector/cn_test.c | 2 +- Documentation/connector/connector.txt | 8 ++++---- 2 files changed, 5 insertions(+), 5 deletions(-) (limited to 'Documentation') diff --git a/Documentation/connector/cn_test.c b/Documentation/connector/cn_test.c index 1711adc33373..b07add3467f1 100644 --- a/Documentation/connector/cn_test.c +++ b/Documentation/connector/cn_test.c @@ -34,7 +34,7 @@ static char cn_test_name[] = "cn_test"; static struct sock *nls; static struct timer_list cn_test_timer; -static void cn_test_callback(struct cn_msg *msg) +static void cn_test_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp) { pr_info("%s: %lu: idx=%x, val=%x, seq=%u, ack=%u, len=%d: %s.\n", __func__, jiffies, msg->id.idx, msg->id.val, diff --git a/Documentation/connector/connector.txt b/Documentation/connector/connector.txt index 81e6bf6ead57..78c9466a9aa8 100644 --- a/Documentation/connector/connector.txt +++ b/Documentation/connector/connector.txt @@ -23,7 +23,7 @@ handling, etc... The Connector driver allows any kernelspace agents to use netlink based networking for inter-process communication in a significantly easier way: -int cn_add_callback(struct cb_id *id, char *name, void (*callback) (void *)); +int cn_add_callback(struct cb_id *id, char *name, void (*callback) (struct cn_msg *, struct netlink_skb_parms *)); void cn_netlink_send(struct cn_msg *msg, u32 __group, int gfp_mask); struct cb_id @@ -53,15 +53,15 @@ struct cn_msg Connector interfaces. /*****************************************/ -int cn_add_callback(struct cb_id *id, char *name, void (*callback) (void *)); +int cn_add_callback(struct cb_id *id, char *name, void (*callback) (struct cn_msg *, struct netlink_skb_parms *)); Registers new callback with connector core. struct cb_id *id - unique connector's user identifier. It must be registered in connector.h for legal in-kernel users. char *name - connector's callback symbolic name. - void (*callback) (void *) - connector's callback. - Argument must be dereferenced to struct cn_msg *. + void (*callback) (struct cn..) - connector's callback. + cn_msg and the sender's credentials void cn_del_callback(struct cb_id *id); -- cgit v1.2.1 From 211a641770c2ae191c9589fee69a8fb67f67fffd Mon Sep 17 00:00:00 2001 From: "Justin P. Mattock" Date: Tue, 29 Sep 2009 15:13:58 -0700 Subject: ieee1394: update URLs in debugging-via-ohci1394.txt Update URLs of the userspace tools to use ohci1394_dma=early for debugging. Seems the address ftp://ftp.suse.de/private/bk/firewire/tools/* is not very helpful. After a quick search, seems this was talked about: http://www.mail-archive.com/kgdb-bugreport@lists.sourceforge.net/msg02761.html (can't find the original thread). Signed-off-by: Justin P. Mattock Signed-off-by: Stefan Richter --- Documentation/debugging-via-ohci1394.txt | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'Documentation') diff --git a/Documentation/debugging-via-ohci1394.txt b/Documentation/debugging-via-ohci1394.txt index 59a91e5c6909..611f5a5499b1 100644 --- a/Documentation/debugging-via-ohci1394.txt +++ b/Documentation/debugging-via-ohci1394.txt @@ -64,14 +64,14 @@ be used to view the printk buffer of a remote machine, even with live update. Bernhard Kaindl enhanced firescope to support accessing 64-bit machines from 32-bit firescope and vice versa: -- ftp://ftp.suse.de/private/bk/firewire/tools/firescope-0.2.2.tar.bz2 +- http://halobates.de/firewire/firescope-0.2.2.tar.bz2 and he implemented fast system dump (alpha version - read README.txt): -- ftp://ftp.suse.de/private/bk/firewire/tools/firedump-0.1.tar.bz2 +- http://halobates.de/firewire/firedump-0.1.tar.bz2 There is also a gdb proxy for firewire which allows to use gdb to access data which can be referenced from symbols found by gdb in vmlinux: -- ftp://ftp.suse.de/private/bk/firewire/tools/fireproxy-0.33.tar.bz2 +- http://halobates.de/firewire/fireproxy-0.33.tar.bz2 The latest version of this gdb proxy (fireproxy-0.34) can communicate (not yet stable) with kgdb over an memory-based communication module (kgdbom). @@ -178,7 +178,7 @@ Step-by-step instructions for using firescope with early OHCI initialization: Notes ----- -Documentation and specifications: ftp://ftp.suse.de/private/bk/firewire/docs +Documentation and specifications: http://halobates.de/firewire/ FireWire is a trademark of Apple Inc. - for more information please refer to: http://en.wikipedia.org/wiki/FireWire -- cgit v1.2.1 From f546c65cd59275c7b95eba4f9b3ab83b38a5e9cb Mon Sep 17 00:00:00 2001 From: Jean Delvare Date: Sun, 4 Oct 2009 22:53:40 +0200 Subject: i2c: Move misc devices documentation Some times ago the eeprom and max6875 drivers moved to drivers/misc/eeprom, but their documentation did not follow. It's finally time to get rid of Documentation/i2c/chips. Signed-off-by: Jean Delvare Cc: Ben Gardner Acked-by: Wolfram Sang --- Documentation/i2c/chips/eeprom | 96 --------------------------------- Documentation/i2c/chips/max6875 | 108 ------------------------------------- Documentation/misc-devices/eeprom | 96 +++++++++++++++++++++++++++++++++ Documentation/misc-devices/max6875 | 108 +++++++++++++++++++++++++++++++++++++ 4 files changed, 204 insertions(+), 204 deletions(-) delete mode 100644 Documentation/i2c/chips/eeprom delete mode 100644 Documentation/i2c/chips/max6875 create mode 100644 Documentation/misc-devices/eeprom create mode 100644 Documentation/misc-devices/max6875 (limited to 'Documentation') diff --git a/Documentation/i2c/chips/eeprom b/Documentation/i2c/chips/eeprom deleted file mode 100644 index f7e8104b5764..000000000000 --- a/Documentation/i2c/chips/eeprom +++ /dev/null @@ -1,96 +0,0 @@ -Kernel driver eeprom -==================== - -Supported chips: - * Any EEPROM chip in the designated address range - Prefix: 'eeprom' - Addresses scanned: I2C 0x50 - 0x57 - Datasheets: Publicly available from: - Atmel (www.atmel.com), - Catalyst (www.catsemi.com), - Fairchild (www.fairchildsemi.com), - Microchip (www.microchip.com), - Philips (www.semiconductor.philips.com), - Rohm (www.rohm.com), - ST (www.st.com), - Xicor (www.xicor.com), - and others. - - Chip Size (bits) Address - 24C01 1K 0x50 (shadows at 0x51 - 0x57) - 24C01A 1K 0x50 - 0x57 (Typical device on DIMMs) - 24C02 2K 0x50 - 0x57 - 24C04 4K 0x50, 0x52, 0x54, 0x56 - (additional data at 0x51, 0x53, 0x55, 0x57) - 24C08 8K 0x50, 0x54 (additional data at 0x51, 0x52, - 0x53, 0x55, 0x56, 0x57) - 24C16 16K 0x50 (additional data at 0x51 - 0x57) - Sony 2K 0x57 - - Atmel 34C02B 2K 0x50 - 0x57, SW write protect at 0x30-37 - Catalyst 34FC02 2K 0x50 - 0x57, SW write protect at 0x30-37 - Catalyst 34RC02 2K 0x50 - 0x57, SW write protect at 0x30-37 - Fairchild 34W02 2K 0x50 - 0x57, SW write protect at 0x30-37 - Microchip 24AA52 2K 0x50 - 0x57, SW write protect at 0x30-37 - ST M34C02 2K 0x50 - 0x57, SW write protect at 0x30-37 - - -Authors: - Frodo Looijaard , - Philip Edelbrock , - Jean Delvare , - Greg Kroah-Hartman , - IBM Corp. - -Description ------------ - -This is a simple EEPROM module meant to enable reading the first 256 bytes -of an EEPROM (on a SDRAM DIMM for example). However, it will access serial -EEPROMs on any I2C adapter. The supported devices are generically called -24Cxx, and are listed above; however the numbering for these -industry-standard devices may vary by manufacturer. - -This module was a programming exercise to get used to the new project -organization laid out by Frodo, but it should be at least completely -effective for decoding the contents of EEPROMs on DIMMs. - -DIMMS will typically contain a 24C01A or 24C02, or the 34C02 variants. -The other devices will not be found on a DIMM because they respond to more -than one address. - -DDC Monitors may contain any device. Often a 24C01, which responds to all 8 -addresses, is found. - -Recent Sony Vaio laptops have an EEPROM at 0x57. We couldn't get the -specification, so it is guess work and far from being complete. - -The Microchip 24AA52/24LCS52, ST M34C02, and others support an additional -software write protect register at 0x30 - 0x37 (0x20 less than the memory -location). The chip responds to "write quick" detection at this address but -does not respond to byte reads. If this register is present, the lower 128 -bytes of the memory array are not write protected. Any byte data write to -this address will write protect the memory array permanently, and the -device will no longer respond at the 0x30-37 address. The eeprom driver -does not support this register. - -Lacking functionality: - -* Full support for larger devices (24C04, 24C08, 24C16). These are not -typically found on a PC. These devices will appear as separate devices at -multiple addresses. - -* Support for really large devices (24C32, 24C64, 24C128, 24C256, 24C512). -These devices require two-byte address fields and are not supported. - -* Enable Writing. Again, no technical reason why not, but making it easy -to change the contents of the EEPROMs (on DIMMs anyway) also makes it easy -to disable the DIMMs (potentially preventing the computer from booting) -until the values are restored somehow. - -Use: - -After inserting the module (and any other required SMBus/i2c modules), you -should have some EEPROM directories in /sys/bus/i2c/devices/* of names such -as "0-0050". Inside each of these is a series of files, the eeprom file -contains the binary data from EEPROM. diff --git a/Documentation/i2c/chips/max6875 b/Documentation/i2c/chips/max6875 deleted file mode 100644 index 10ca43cd1a72..000000000000 --- a/Documentation/i2c/chips/max6875 +++ /dev/null @@ -1,108 +0,0 @@ -Kernel driver max6875 -===================== - -Supported chips: - * Maxim MAX6874, MAX6875 - Prefix: 'max6875' - Addresses scanned: None (see below) - Datasheet: - http://pdfserv.maxim-ic.com/en/ds/MAX6874-MAX6875.pdf - -Author: Ben Gardner - - -Description ------------ - -The Maxim MAX6875 is an EEPROM-programmable power-supply sequencer/supervisor. -It provides timed outputs that can be used as a watchdog, if properly wired. -It also provides 512 bytes of user EEPROM. - -At reset, the MAX6875 reads the configuration EEPROM into its configuration -registers. The chip then begins to operate according to the values in the -registers. - -The Maxim MAX6874 is a similar, mostly compatible device, with more intputs -and outputs: - vin gpi vout -MAX6874 6 4 8 -MAX6875 4 3 5 - -See the datasheet for more information. - - -Sysfs entries -------------- - -eeprom - 512 bytes of user-defined EEPROM space. - - -General Remarks ---------------- - -Valid addresses for the MAX6875 are 0x50 and 0x52. -Valid addresses for the MAX6874 are 0x50, 0x52, 0x54 and 0x56. -The driver does not probe any address, so you must force the address. - -Example: -$ modprobe max6875 force=0,0x50 - -The MAX6874/MAX6875 ignores address bit 0, so this driver attaches to multiple -addresses. For example, for address 0x50, it also reserves 0x51. -The even-address instance is called 'max6875', the odd one is 'dummy'. - - -Programming the chip using i2c-dev ----------------------------------- - -Use the i2c-dev interface to access and program the chips. -Reads and writes are performed differently depending on the address range. - -The configuration registers are at addresses 0x00 - 0x45. -Use i2c_smbus_write_byte_data() to write a register and -i2c_smbus_read_byte_data() to read a register. -The command is the register number. - -Examples: -To write a 1 to register 0x45: - i2c_smbus_write_byte_data(fd, 0x45, 1); - -To read register 0x45: - value = i2c_smbus_read_byte_data(fd, 0x45); - - -The configuration EEPROM is at addresses 0x8000 - 0x8045. -The user EEPROM is at addresses 0x8100 - 0x82ff. - -Use i2c_smbus_write_word_data() to write a byte to EEPROM. - -The command is the upper byte of the address: 0x80, 0x81, or 0x82. -The data word is the lower part of the address or'd with data << 8. - cmd = address >> 8; - val = (address & 0xff) | (data << 8); - -Example: -To write 0x5a to address 0x8003: - i2c_smbus_write_word_data(fd, 0x80, 0x5a03); - - -Reading data from the EEPROM is a little more complicated. -Use i2c_smbus_write_byte_data() to set the read address and then -i2c_smbus_read_byte() or i2c_smbus_read_i2c_block_data() to read the data. - -Example: -To read data starting at offset 0x8100, first set the address: - i2c_smbus_write_byte_data(fd, 0x81, 0x00); - -And then read the data - value = i2c_smbus_read_byte(fd); - - or - - count = i2c_smbus_read_i2c_block_data(fd, 0x84, 16, buffer); - -The block read should read 16 bytes. -0x84 is the block read command. - -See the datasheet for more details. - diff --git a/Documentation/misc-devices/eeprom b/Documentation/misc-devices/eeprom new file mode 100644 index 000000000000..f7e8104b5764 --- /dev/null +++ b/Documentation/misc-devices/eeprom @@ -0,0 +1,96 @@ +Kernel driver eeprom +==================== + +Supported chips: + * Any EEPROM chip in the designated address range + Prefix: 'eeprom' + Addresses scanned: I2C 0x50 - 0x57 + Datasheets: Publicly available from: + Atmel (www.atmel.com), + Catalyst (www.catsemi.com), + Fairchild (www.fairchildsemi.com), + Microchip (www.microchip.com), + Philips (www.semiconductor.philips.com), + Rohm (www.rohm.com), + ST (www.st.com), + Xicor (www.xicor.com), + and others. + + Chip Size (bits) Address + 24C01 1K 0x50 (shadows at 0x51 - 0x57) + 24C01A 1K 0x50 - 0x57 (Typical device on DIMMs) + 24C02 2K 0x50 - 0x57 + 24C04 4K 0x50, 0x52, 0x54, 0x56 + (additional data at 0x51, 0x53, 0x55, 0x57) + 24C08 8K 0x50, 0x54 (additional data at 0x51, 0x52, + 0x53, 0x55, 0x56, 0x57) + 24C16 16K 0x50 (additional data at 0x51 - 0x57) + Sony 2K 0x57 + + Atmel 34C02B 2K 0x50 - 0x57, SW write protect at 0x30-37 + Catalyst 34FC02 2K 0x50 - 0x57, SW write protect at 0x30-37 + Catalyst 34RC02 2K 0x50 - 0x57, SW write protect at 0x30-37 + Fairchild 34W02 2K 0x50 - 0x57, SW write protect at 0x30-37 + Microchip 24AA52 2K 0x50 - 0x57, SW write protect at 0x30-37 + ST M34C02 2K 0x50 - 0x57, SW write protect at 0x30-37 + + +Authors: + Frodo Looijaard , + Philip Edelbrock , + Jean Delvare , + Greg Kroah-Hartman , + IBM Corp. + +Description +----------- + +This is a simple EEPROM module meant to enable reading the first 256 bytes +of an EEPROM (on a SDRAM DIMM for example). However, it will access serial +EEPROMs on any I2C adapter. The supported devices are generically called +24Cxx, and are listed above; however the numbering for these +industry-standard devices may vary by manufacturer. + +This module was a programming exercise to get used to the new project +organization laid out by Frodo, but it should be at least completely +effective for decoding the contents of EEPROMs on DIMMs. + +DIMMS will typically contain a 24C01A or 24C02, or the 34C02 variants. +The other devices will not be found on a DIMM because they respond to more +than one address. + +DDC Monitors may contain any device. Often a 24C01, which responds to all 8 +addresses, is found. + +Recent Sony Vaio laptops have an EEPROM at 0x57. We couldn't get the +specification, so it is guess work and far from being complete. + +The Microchip 24AA52/24LCS52, ST M34C02, and others support an additional +software write protect register at 0x30 - 0x37 (0x20 less than the memory +location). The chip responds to "write quick" detection at this address but +does not respond to byte reads. If this register is present, the lower 128 +bytes of the memory array are not write protected. Any byte data write to +this address will write protect the memory array permanently, and the +device will no longer respond at the 0x30-37 address. The eeprom driver +does not support this register. + +Lacking functionality: + +* Full support for larger devices (24C04, 24C08, 24C16). These are not +typically found on a PC. These devices will appear as separate devices at +multiple addresses. + +* Support for really large devices (24C32, 24C64, 24C128, 24C256, 24C512). +These devices require two-byte address fields and are not supported. + +* Enable Writing. Again, no technical reason why not, but making it easy +to change the contents of the EEPROMs (on DIMMs anyway) also makes it easy +to disable the DIMMs (potentially preventing the computer from booting) +until the values are restored somehow. + +Use: + +After inserting the module (and any other required SMBus/i2c modules), you +should have some EEPROM directories in /sys/bus/i2c/devices/* of names such +as "0-0050". Inside each of these is a series of files, the eeprom file +contains the binary data from EEPROM. diff --git a/Documentation/misc-devices/max6875 b/Documentation/misc-devices/max6875 new file mode 100644 index 000000000000..10ca43cd1a72 --- /dev/null +++ b/Documentation/misc-devices/max6875 @@ -0,0 +1,108 @@ +Kernel driver max6875 +===================== + +Supported chips: + * Maxim MAX6874, MAX6875 + Prefix: 'max6875' + Addresses scanned: None (see below) + Datasheet: + http://pdfserv.maxim-ic.com/en/ds/MAX6874-MAX6875.pdf + +Author: Ben Gardner + + +Description +----------- + +The Maxim MAX6875 is an EEPROM-programmable power-supply sequencer/supervisor. +It provides timed outputs that can be used as a watchdog, if properly wired. +It also provides 512 bytes of user EEPROM. + +At reset, the MAX6875 reads the configuration EEPROM into its configuration +registers. The chip then begins to operate according to the values in the +registers. + +The Maxim MAX6874 is a similar, mostly compatible device, with more intputs +and outputs: + vin gpi vout +MAX6874 6 4 8 +MAX6875 4 3 5 + +See the datasheet for more information. + + +Sysfs entries +------------- + +eeprom - 512 bytes of user-defined EEPROM space. + + +General Remarks +--------------- + +Valid addresses for the MAX6875 are 0x50 and 0x52. +Valid addresses for the MAX6874 are 0x50, 0x52, 0x54 and 0x56. +The driver does not probe any address, so you must force the address. + +Example: +$ modprobe max6875 force=0,0x50 + +The MAX6874/MAX6875 ignores address bit 0, so this driver attaches to multiple +addresses. For example, for address 0x50, it also reserves 0x51. +The even-address instance is called 'max6875', the odd one is 'dummy'. + + +Programming the chip using i2c-dev +---------------------------------- + +Use the i2c-dev interface to access and program the chips. +Reads and writes are performed differently depending on the address range. + +The configuration registers are at addresses 0x00 - 0x45. +Use i2c_smbus_write_byte_data() to write a register and +i2c_smbus_read_byte_data() to read a register. +The command is the register number. + +Examples: +To write a 1 to register 0x45: + i2c_smbus_write_byte_data(fd, 0x45, 1); + +To read register 0x45: + value = i2c_smbus_read_byte_data(fd, 0x45); + + +The configuration EEPROM is at addresses 0x8000 - 0x8045. +The user EEPROM is at addresses 0x8100 - 0x82ff. + +Use i2c_smbus_write_word_data() to write a byte to EEPROM. + +The command is the upper byte of the address: 0x80, 0x81, or 0x82. +The data word is the lower part of the address or'd with data << 8. + cmd = address >> 8; + val = (address & 0xff) | (data << 8); + +Example: +To write 0x5a to address 0x8003: + i2c_smbus_write_word_data(fd, 0x80, 0x5a03); + + +Reading data from the EEPROM is a little more complicated. +Use i2c_smbus_write_byte_data() to set the read address and then +i2c_smbus_read_byte() or i2c_smbus_read_i2c_block_data() to read the data. + +Example: +To read data starting at offset 0x8100, first set the address: + i2c_smbus_write_byte_data(fd, 0x81, 0x00); + +And then read the data + value = i2c_smbus_read_byte(fd); + + or + + count = i2c_smbus_read_i2c_block_data(fd, 0x84, 16, buffer); + +The block read should read 16 bytes. +0x84 is the block read command. + +See the datasheet for more details. + -- cgit v1.2.1 From b835d7fbd54c42d7b9abb5e8a64f32690ebfad43 Mon Sep 17 00:00:00 2001 From: Jean Delvare Date: Sun, 4 Oct 2009 22:53:41 +0200 Subject: max6875: Discard obsolete detect method There is no point in implementing a detect callback for the MAX6875, as this device can't be detected. It was there solely to handle "force" module parameters to instantiate devices, but now we have a better sysfs interface that can do the same. So we can get rid of the ugly module parameters and the detect callback. This basically divides the binary module size by 2. Signed-off-by: Jean Delvare Acked-by: Wolfram Sang Acked-by: Ben Gardner --- Documentation/misc-devices/max6875 | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/misc-devices/max6875 b/Documentation/misc-devices/max6875 index 10ca43cd1a72..1e89ee3ccc1b 100644 --- a/Documentation/misc-devices/max6875 +++ b/Documentation/misc-devices/max6875 @@ -42,10 +42,12 @@ General Remarks Valid addresses for the MAX6875 are 0x50 and 0x52. Valid addresses for the MAX6874 are 0x50, 0x52, 0x54 and 0x56. -The driver does not probe any address, so you must force the address. +The driver does not probe any address, so you explicitly instantiate the +devices. Example: -$ modprobe max6875 force=0,0x50 +$ modprobe max6875 +$ echo max6875 0x50 > /sys/bus/i2c/devices/i2c-0/new_device The MAX6874/MAX6875 ignores address bit 0, so this driver attaches to multiple addresses. For example, for address 0x50, it also reserves 0x51. -- cgit v1.2.1 From 0314b020c49c1d6cd182d2b89775bfa6686660db Mon Sep 17 00:00:00 2001 From: Jean Delvare Date: Sun, 4 Oct 2009 22:53:41 +0200 Subject: ds2482: Discard obsolete detect method There is no point in implementing a detect callback for the DS2482, as this device can't be detected. It was there solely to handle "force" module parameters to instantiate devices, but now we have a better sysfs interface that can do the same. So we can get rid of the ugly module parameters and the detect callback. This shrinks the binary module size by 21%. Signed-off-by: Jean Delvare Acked-by: Ben Gardner --- Documentation/w1/masters/ds2482 | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/w1/masters/ds2482 b/Documentation/w1/masters/ds2482 index 9210d6fa5024..299b91c7609f 100644 --- a/Documentation/w1/masters/ds2482 +++ b/Documentation/w1/masters/ds2482 @@ -24,8 +24,8 @@ General Remarks Valid addresses are 0x18, 0x19, 0x1a, and 0x1b. However, the device cannot be detected without writing to the i2c bus, so no -detection is done. -You should force the device address. +detection is done. You should instantiate the device explicitly. -$ modprobe ds2482 force=0,0x18 +$ modprobe ds2482 +$ echo ds2482 0x18 > /sys/bus/i2c/devices/i2c-0/new_device -- cgit v1.2.1 From 2d2a7cff1b63cde1e2d981eea8ae9e69ae9ce96d Mon Sep 17 00:00:00 2001 From: Jean Delvare Date: Sun, 4 Oct 2009 22:53:42 +0200 Subject: ltc4215/ltc4245: Discard obsolete detect methods There is no point in implementing a detect callback for the LTC4215 and LTC4245, as these devices can't be detected. It was there solely to handle "force" module parameters to instantiate devices, but now we have a better sysfs interface that can do the same. So we can get rid of the ugly module parameters and the detect callbacks. This shrinks the binary module sizes by 36% and 46%, respectively. Signed-off-by: Jean Delvare Cc: Ira W. Snyder --- Documentation/hwmon/ltc4215 | 7 ++++--- Documentation/hwmon/ltc4245 | 7 ++++--- 2 files changed, 8 insertions(+), 6 deletions(-) (limited to 'Documentation') diff --git a/Documentation/hwmon/ltc4215 b/Documentation/hwmon/ltc4215 index 2e6a21eb656c..c196a1846259 100644 --- a/Documentation/hwmon/ltc4215 +++ b/Documentation/hwmon/ltc4215 @@ -22,12 +22,13 @@ Usage Notes ----------- This driver does not probe for LTC4215 devices, due to the fact that some -of the possible addresses are unfriendly to probing. You will need to use -the "force" parameter to tell the driver where to find the device. +of the possible addresses are unfriendly to probing. You will have to +instantiate the devices explicitly. Example: the following will load the driver for an LTC4215 at address 0x44 on I2C bus #0: -$ modprobe ltc4215 force=0,0x44 +$ modprobe ltc4215 +$ echo ltc4215 0x44 > /sys/bus/i2c/devices/i2c-0/new_device Sysfs entries diff --git a/Documentation/hwmon/ltc4245 b/Documentation/hwmon/ltc4245 index bae7a3adc5d8..02838a47d862 100644 --- a/Documentation/hwmon/ltc4245 +++ b/Documentation/hwmon/ltc4245 @@ -23,12 +23,13 @@ Usage Notes ----------- This driver does not probe for LTC4245 devices, due to the fact that some -of the possible addresses are unfriendly to probing. You will need to use -the "force" parameter to tell the driver where to find the device. +of the possible addresses are unfriendly to probing. You will have to +instantiate the devices explicitly. Example: the following will load the driver for an LTC4245 at address 0x23 on I2C bus #1: -$ modprobe ltc4245 force=1,0x23 +$ modprobe ltc4245 +$ echo ltc4245 0x23 > /sys/bus/i2c/devices/i2c-1/new_device Sysfs entries -- cgit v1.2.1 From 03f1805ad0ce5aae02bfe40c29b230abb63179ac Mon Sep 17 00:00:00 2001 From: Jean Delvare Date: Sun, 4 Oct 2009 22:53:45 +0200 Subject: i2c: Minor documentation update The sysfs path to i2c adapters has changed recently, update the documentation to reflect that change. Signed-off-by: Jean Delvare --- Documentation/i2c/instantiating-devices | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/i2c/instantiating-devices b/Documentation/i2c/instantiating-devices index c740b7b41088..e89490270aba 100644 --- a/Documentation/i2c/instantiating-devices +++ b/Documentation/i2c/instantiating-devices @@ -188,7 +188,7 @@ segment, the address is sufficient to uniquely identify the device to be deleted. Example: -# echo eeprom 0x50 > /sys/class/i2c-adapter/i2c-3/new_device +# echo eeprom 0x50 > /sys/bus/i2c/devices/i2c-3/new_device While this interface should only be used when in-kernel device declaration can't be done, there is a variety of cases where it can be helpful: -- cgit v1.2.1 From 896a7cf8d846a9e86fb823be16f4f14ffeb7f074 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 2 Oct 2009 20:24:59 +0000 Subject: pktgen: Fix multiqueue handling It is not currently possible to instruct pktgen to use one selected tx queue. When Robert added multiqueue support in commit 45b270f8, he added an interval (queue_map_min, queue_map_max), and his code doesnt take into account the case of min = max, to select one tx queue exactly. I suspect a high performance setup on a eight txqueue device wants to use exactly eight cpus, and assign one tx queue to each sender. This patchs makes pktgen select the right tx queue, not the first one. Also updates Documentation to reflect Robert changes. Signed-off-by: Eric Dumazet Signed-off-by: Robert Olsson Signed-off-by: David S. Miller --- Documentation/networking/pktgen.txt | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'Documentation') diff --git a/Documentation/networking/pktgen.txt b/Documentation/networking/pktgen.txt index c6cf4a3c16e0..61bb645d50e0 100644 --- a/Documentation/networking/pktgen.txt +++ b/Documentation/networking/pktgen.txt @@ -90,6 +90,11 @@ Examples: pgset "dstmac 00:00:00:00:00:00" sets MAC destination address pgset "srcmac 00:00:00:00:00:00" sets MAC source address + pgset "queue_map_min 0" Sets the min value of tx queue interval + pgset "queue_map_max 7" Sets the max value of tx queue interval, for multiqueue devices + To select queue 1 of a given device, + use queue_map_min=1 and queue_map_max=1 + pgset "src_mac_count 1" Sets the number of MACs we'll range through. The 'minimum' MAC is what you set with srcmac. @@ -101,6 +106,9 @@ Examples: IPDST_RND, UDPSRC_RND, UDPDST_RND, MACSRC_RND, MACDST_RND MPLS_RND, VID_RND, SVID_RND + QUEUE_MAP_RND # queue map random + QUEUE_MAP_CPU # queue map mirrors smp_processor_id() + pgset "udp_src_min 9" set UDP source port min, If < udp_src_max, then cycle through the port range. -- cgit v1.2.1 From f1af9f58546e2d98ef078fa30b2ef80a9042131e Mon Sep 17 00:00:00 2001 From: Tilman Schmidt Date: Tue, 6 Oct 2009 12:18:00 +0000 Subject: Documentation: expand isdn/INTERFACE.CAPI document - Note that send_message() may be called in interrupt context. - Describe the storage of CAPI messages and payload data in SKBs. - Add more details to the description of the _cmsg structure. - Describe kernelcapi debugging output. Impact: documentation Signed-off-by: Tilman Schmidt Signed-off-by: David S. Miller --- Documentation/isdn/INTERFACE.CAPI | 83 +++++++++++++++++++++++++++++++-------- 1 file changed, 67 insertions(+), 16 deletions(-) (limited to 'Documentation') diff --git a/Documentation/isdn/INTERFACE.CAPI b/Documentation/isdn/INTERFACE.CAPI index 686e107923ec..5fe8de5cc727 100644 --- a/Documentation/isdn/INTERFACE.CAPI +++ b/Documentation/isdn/INTERFACE.CAPI @@ -60,10 +60,9 @@ open() operation on regular files or character devices. After a successful return from register_appl(), CAPI messages from the application may be passed to the driver for the device via calls to the -send_message() callback function. The CAPI message to send is stored in the -data portion of an skb. Conversely, the driver may call Kernel CAPI's -capi_ctr_handle_message() function to pass a received CAPI message to Kernel -CAPI for forwarding to an application, specifying its ApplID. +send_message() callback function. Conversely, the driver may call Kernel +CAPI's capi_ctr_handle_message() function to pass a received CAPI message to +Kernel CAPI for forwarding to an application, specifying its ApplID. Deregistration requests (CAPI operation CAPI_RELEASE) from applications are forwarded as calls to the release_appl() callback function, passing the same @@ -142,6 +141,7 @@ u16 (*send_message)(struct capi_ctr *ctrlr, struct sk_buff *skb) to accepting or queueing the message. Errors occurring during the actual processing of the message should be signaled with an appropriate reply message. + May be called in process or interrupt context. Calls to this function are not serialized by Kernel CAPI, ie. it must be prepared to be re-entered. @@ -154,7 +154,8 @@ read_proc_t *ctr_read_proc system entry, /proc/capi/controllers/; will be called with a pointer to the device's capi_ctr structure as the last (data) argument -Note: Callback functions are never called in interrupt context. +Note: Callback functions except send_message() are never called in interrupt +context. - to be filled in before calling capi_ctr_ready(): @@ -171,14 +172,40 @@ u8 serial[CAPI_SERIAL_LEN] value to return for CAPI_GET_SERIAL -4.3 The _cmsg Structure +4.3 SKBs + +CAPI messages are passed between Kernel CAPI and the driver via send_message() +and capi_ctr_handle_message(), stored in the data portion of a socket buffer +(skb). Each skb contains a single CAPI message coded according to the CAPI 2.0 +standard. + +For the data transfer messages, DATA_B3_REQ and DATA_B3_IND, the actual +payload data immediately follows the CAPI message itself within the same skb. +The Data and Data64 parameters are not used for processing. The Data64 +parameter may be omitted by setting the length field of the CAPI message to 22 +instead of 30. + + +4.4 The _cmsg Structure (declared in ) The _cmsg structure stores the contents of a CAPI 2.0 message in an easily -accessible form. It contains members for all possible CAPI 2.0 parameters, of -which only those appearing in the message type currently being processed are -actually used. Unused members should be set to zero. +accessible form. It contains members for all possible CAPI 2.0 parameters, +including subparameters of the Additional Info and B Protocol structured +parameters, with the following exceptions: + +* second Calling party number (CONNECT_IND) + +* Data64 (DATA_B3_REQ and DATA_B3_IND) + +* Sending complete (subparameter of Additional Info, CONNECT_REQ and INFO_REQ) + +* Global Configuration (subparameter of B Protocol, CONNECT_REQ, CONNECT_RESP + and SELECT_B_PROTOCOL_REQ) + +Only those parameters appearing in the message type currently being processed +are actually used. Unused members should be set to zero. Members are named after the CAPI 2.0 standard names of the parameters they represent. See for the exact spelling. Member data @@ -190,18 +217,19 @@ u16 for CAPI parameters of type 'word' u32 for CAPI parameters of type 'dword' -_cstruct for CAPI parameters of type 'struct' not containing any - variably-sized (struct) subparameters (eg. 'Called Party Number') +_cstruct for CAPI parameters of type 'struct' The member is a pointer to a buffer containing the parameter in CAPI encoding (length + content). It may also be NULL, which will be taken to represent an empty (zero length) parameter. + Subparameters are stored in encoded form within the content part. -_cmstruct for CAPI parameters of type 'struct' containing 'struct' - subparameters ('Additional Info' and 'B Protocol') +_cmstruct alternative representation for CAPI parameters of type 'struct' + (used only for the 'Additional Info' and 'B Protocol' parameters) The representation is a single byte containing one of the values: - CAPI_DEFAULT: the parameter is empty - CAPI_COMPOSE: the values of the subparameters are stored - individually in the corresponding _cmsg structure members + CAPI_DEFAULT: The parameter is empty/absent. + CAPI_COMPOSE: The parameter is present. + Subparameter values are stored individually in the corresponding + _cmsg structure members. Functions capi_cmsg2message() and capi_message2cmsg() are provided to convert messages between their transport encoding described in the CAPI 2.0 standard @@ -297,3 +325,26 @@ char *capi_cmd2str(u8 Command, u8 Subcommand) be NULL if the command/subcommand is not one of those defined in the CAPI 2.0 standard. + +7. Debugging + +The module kernelcapi has a module parameter showcapimsgs controlling some +debugging output produced by the module. It can only be set when the module is +loaded, via a parameter "showcapimsgs=" to the modprobe command, either on +the command line or in the configuration file. + +If the lowest bit of showcapimsgs is set, kernelcapi logs controller and +application up and down events. + +In addition, every registered CAPI controller has an associated traceflag +parameter controlling how CAPI messages sent from and to tha controller are +logged. The traceflag parameter is initialized with the value of the +showcapimsgs parameter when the controller is registered, but can later be +changed via the MANUFACTURER_REQ command KCAPI_CMD_TRACE. + +If the value of traceflag is non-zero, CAPI messages are logged. +DATA_B3 messages are only logged if the value of traceflag is > 2. + +If the lowest bit of traceflag is set, only the command/subcommand and message +length are logged. Otherwise, kernelcapi logs a readable representation of +the entire message. -- cgit v1.2.1 From aa07a99412f56ad56faecbaa683f3bc0ae99abc2 Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Wed, 7 Oct 2009 15:35:55 -0700 Subject: IB: Fix typo in udev rule documentation The proper syntax for udev rules is KERNEL==... instead of KERNEL=... Signed-off-by: Bart Van Assche Reported-by: Lukasz Jurewicz Signed-off-by: Roland Dreier --- Documentation/infiniband/user_mad.txt | 4 ++-- Documentation/infiniband/user_verbs.txt | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/infiniband/user_mad.txt b/Documentation/infiniband/user_mad.txt index 744687dd195b..8a366959f5cc 100644 --- a/Documentation/infiniband/user_mad.txt +++ b/Documentation/infiniband/user_mad.txt @@ -128,8 +128,8 @@ Setting IsSM Capability Bit To create the appropriate character device files automatically with udev, a rule like - KERNEL="umad*", NAME="infiniband/%k" - KERNEL="issm*", NAME="infiniband/%k" + KERNEL=="umad*", NAME="infiniband/%k" + KERNEL=="issm*", NAME="infiniband/%k" can be used. This will create device nodes named diff --git a/Documentation/infiniband/user_verbs.txt b/Documentation/infiniband/user_verbs.txt index f847501e50b5..afe3f8da9018 100644 --- a/Documentation/infiniband/user_verbs.txt +++ b/Documentation/infiniband/user_verbs.txt @@ -58,7 +58,7 @@ Memory pinning To create the appropriate character device files automatically with udev, a rule like - KERNEL="uverbs*", NAME="infiniband/%k" + KERNEL=="uverbs*", NAME="infiniband/%k" can be used. This will create device nodes named -- cgit v1.2.1 From c73602ad31cdcf7e6651f43d12f65b5b9b825b6f Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Wed, 7 Oct 2009 16:32:22 -0700 Subject: ksm: more on default values Adjust the max_kernel_pages default to a quarter of totalram_pages, instead of nr_free_buffer_pages() / 4: the KSM pages themselves come from highmem, and even on a 16GB PAE machine, 4GB of KSM pages would only be pinning 32MB of lowmem with their rmap_items, so no need for the more obscure calculation (nor for its own special init function). There is no way for the user to switch KSM on if CONFIG_SYSFS is not enabled, so in that case default run to KSM_RUN_MERGE. Update KSM Documentation and Kconfig to reflect the new defaults. Signed-off-by: Hugh Dickins Cc: Izik Eidus Cc: Andrea Arcangeli Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/ksm.txt | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) (limited to 'Documentation') diff --git a/Documentation/vm/ksm.txt b/Documentation/vm/ksm.txt index 72a22f65960e..262d8e6793a3 100644 --- a/Documentation/vm/ksm.txt +++ b/Documentation/vm/ksm.txt @@ -52,15 +52,15 @@ The KSM daemon is controlled by sysfs files in /sys/kernel/mm/ksm/, readable by all but writable only by root: max_kernel_pages - set to maximum number of kernel pages that KSM may use - e.g. "echo 2000 > /sys/kernel/mm/ksm/max_kernel_pages" + e.g. "echo 100000 > /sys/kernel/mm/ksm/max_kernel_pages" Value 0 imposes no limit on the kernel pages KSM may use; but note that any process using MADV_MERGEABLE can cause KSM to allocate these pages, unswappable until it exits. - Default: 2000 (chosen for demonstration purposes) + Default: quarter of memory (chosen to not pin too much) pages_to_scan - how many present pages to scan before ksmd goes to sleep - e.g. "echo 200 > /sys/kernel/mm/ksm/pages_to_scan" - Default: 200 (chosen for demonstration purposes) + e.g. "echo 100 > /sys/kernel/mm/ksm/pages_to_scan" + Default: 100 (chosen for demonstration purposes) sleep_millisecs - how many milliseconds ksmd should sleep before next scan e.g. "echo 20 > /sys/kernel/mm/ksm/sleep_millisecs" @@ -70,7 +70,8 @@ run - set 0 to stop ksmd from running but keep merged pages, set 1 to run ksmd e.g. "echo 1 > /sys/kernel/mm/ksm/run", set 2 to stop ksmd and unmerge all pages currently merged, but leave mergeable areas registered for next run - Default: 1 (for immediate use by apps which register) + Default: 0 (must be changed to 1 to activate KSM, + except if CONFIG_SYSFS is disabled) The effectiveness of KSM and MADV_MERGEABLE is shown in /sys/kernel/mm/ksm/: @@ -86,4 +87,4 @@ pages_volatile embraces several different kinds of activity, but a high proportion there would also indicate poor use of madvise MADV_MERGEABLE. Izik Eidus, -Hugh Dickins, 30 July 2009 +Hugh Dickins, 24 Sept 2009 -- cgit v1.2.1 From 7823da36ce8e42d66941887eb922768d259763f2 Mon Sep 17 00:00:00 2001 From: Paul Menage Date: Wed, 7 Oct 2009 16:32:26 -0700 Subject: cgroups: update documentation of cgroups tasks and procs files Update documentation of cgroups tasks and procs files Document the cgroup.procs file. Clarify the semantics of the cgroup.procs and tasks files. Although the current cgroup.procs interface returns a sorted and uniqified list of pids, potential future performance enhancements could result in those properties being removed - explicitly document this aspect of the API. There are no existing users of cgroup.procs, so compatibility isn't an issue. There are users of the "tasks" file, but none that would appear to break in the event of the sorted property being broken. The standard "libcpuset" explicitly sorts the results of reading from the tasks file, and "libcg" and other users don't appear to care about ordering. Signed-off-by: Paul Menage Reviewed-by: Li Zefan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/cgroups/cgroups.txt | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt index 455d4e6d346d..0b33bfe7dde9 100644 --- a/Documentation/cgroups/cgroups.txt +++ b/Documentation/cgroups/cgroups.txt @@ -227,7 +227,14 @@ as the path relative to the root of the cgroup file system. Each cgroup is represented by a directory in the cgroup file system containing the following files describing that cgroup: - - tasks: list of tasks (by pid) attached to that cgroup + - tasks: list of tasks (by pid) attached to that cgroup. This list + is not guaranteed to be sorted. Writing a thread id into this file + moves the thread into this cgroup. + - cgroup.procs: list of tgids in the cgroup. This list is not + guaranteed to be sorted or free of duplicate tgids, and userspace + should sort/uniquify the list if this property is required. + Writing a tgid into this file moves all threads with that tgid into + this cgroup. - notify_on_release flag: run the release agent on exit? - release_agent: the path to use for release notifications (this file exists in the top cgroup only) @@ -374,7 +381,7 @@ Now you want to do something with this cgroup. In this directory you can find several files: # ls -notify_on_release tasks +cgroup.procs notify_on_release tasks (plus whatever files added by the attached subsystems) Now attach your shell to this cgroup: -- cgit v1.2.1 From 253fb02d62571e5455eedc9e39b9d660e86a40f0 Mon Sep 17 00:00:00 2001 From: Wu Fengguang Date: Wed, 7 Oct 2009 16:32:27 -0700 Subject: pagemap: export KPF_HWPOISON This flag indicates a hardware detected memory corruption on the page. Any future access of the page data may bring down the machine. Signed-off-by: Wu Fengguang Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/page-types.c | 2 ++ Documentation/vm/pagemap.txt | 4 ++++ 2 files changed, 6 insertions(+) (limited to 'Documentation') diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c index fa1a30d9e9d5..87f57228f56e 100644 --- a/Documentation/vm/page-types.c +++ b/Documentation/vm/page-types.c @@ -69,6 +69,7 @@ #define KPF_COMPOUND_TAIL 16 #define KPF_HUGE 17 #define KPF_UNEVICTABLE 18 +#define KPF_HWPOISON 19 #define KPF_NOPAGE 20 /* [32-] kernel hacking assistances */ @@ -116,6 +117,7 @@ static char *page_flag_names[] = { [KPF_COMPOUND_TAIL] = "T:compound_tail", [KPF_HUGE] = "G:huge", [KPF_UNEVICTABLE] = "u:unevictable", + [KPF_HWPOISON] = "X:hwpoison", [KPF_NOPAGE] = "n:nopage", [KPF_RESERVED] = "r:reserved", diff --git a/Documentation/vm/pagemap.txt b/Documentation/vm/pagemap.txt index 600a304a828c..2fdd84a19109 100644 --- a/Documentation/vm/pagemap.txt +++ b/Documentation/vm/pagemap.txt @@ -57,6 +57,7 @@ There are three components to pagemap: 16. COMPOUND_TAIL 16. HUGE 18. UNEVICTABLE + 19. HWPOISON 20. NOPAGE Short descriptions to the page flags: @@ -86,6 +87,9 @@ Short descriptions to the page flags: 17. HUGE this is an integral part of a HugeTLB page +19. HWPOISON + hardware detected memory corruption on this page: don't touch the data! + 20. NOPAGE no page frame exists at the requested address -- cgit v1.2.1 From a1bbb5ec39042fb762e7f4bcc634da0d87834193 Mon Sep 17 00:00:00 2001 From: Wu Fengguang Date: Wed, 7 Oct 2009 16:32:28 -0700 Subject: pagemap: document KPF_KSM and show it in page-types It indicates to the system admin that processes mapping such pages may be eating less physical memory than the reported numbers by legacy tools. Signed-off-by: Wu Fengguang Cc: Hugh Dickins Cc: Izik Eidus Acked-by: Chris Wright Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/page-types.c | 2 ++ Documentation/vm/pagemap.txt | 4 ++++ 2 files changed, 6 insertions(+) (limited to 'Documentation') diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c index 87f57228f56e..9899fa239eed 100644 --- a/Documentation/vm/page-types.c +++ b/Documentation/vm/page-types.c @@ -71,6 +71,7 @@ #define KPF_UNEVICTABLE 18 #define KPF_HWPOISON 19 #define KPF_NOPAGE 20 +#define KPF_KSM 21 /* [32-] kernel hacking assistances */ #define KPF_RESERVED 32 @@ -119,6 +120,7 @@ static char *page_flag_names[] = { [KPF_UNEVICTABLE] = "u:unevictable", [KPF_HWPOISON] = "X:hwpoison", [KPF_NOPAGE] = "n:nopage", + [KPF_KSM] = "x:ksm", [KPF_RESERVED] = "r:reserved", [KPF_MLOCKED] = "m:mlocked", diff --git a/Documentation/vm/pagemap.txt b/Documentation/vm/pagemap.txt index 2fdd84a19109..df09b9650a81 100644 --- a/Documentation/vm/pagemap.txt +++ b/Documentation/vm/pagemap.txt @@ -59,6 +59,7 @@ There are three components to pagemap: 18. UNEVICTABLE 19. HWPOISON 20. NOPAGE + 21. KSM Short descriptions to the page flags: @@ -93,6 +94,9 @@ Short descriptions to the page flags: 20. NOPAGE no page frame exists at the requested address +21. KSM + identical memory pages dynamically shared between one or more processes + [IO related page flags] 1. ERROR IO error occurred 3. UPTODATE page has up-to-date data -- cgit v1.2.1 From 0c57effe27eb6544eb44d5fac563b7334e3bc771 Mon Sep 17 00:00:00 2001 From: Wu Fengguang Date: Wed, 7 Oct 2009 16:32:28 -0700 Subject: page-types: add GPL note Signed-off-by: Wu Fengguang Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/page-types.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c index 9899fa239eed..e46fb5b2107d 100644 --- a/Documentation/vm/page-types.c +++ b/Documentation/vm/page-types.c @@ -2,7 +2,10 @@ * page-types: Tool for querying page flags * * Copyright (C) 2009 Intel corporation - * Copyright (C) 2009 Wu Fengguang + * + * Authors: Wu Fengguang + * + * Released under the General Public License (GPL). */ #define _LARGEFILE64_SOURCE -- cgit v1.2.1 From 31bbf66eaaaf50ba79e50ab7d3c89531b31c0614 Mon Sep 17 00:00:00 2001 From: Wu Fengguang Date: Wed, 7 Oct 2009 16:32:29 -0700 Subject: page-types: introduce checked_open() This helps merge duplicate code (now and future) and outstand the main logic. Signed-off-by: Wu Fengguang Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/page-types.c | 24 ++++++++++++++---------- 1 file changed, 14 insertions(+), 10 deletions(-) (limited to 'Documentation') diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c index e46fb5b2107d..6bdcc0632c24 100644 --- a/Documentation/vm/page-types.c +++ b/Documentation/vm/page-types.c @@ -214,6 +214,18 @@ static void fatal(const char *x, ...) exit(EXIT_FAILURE); } +int checked_open(const char *pathname, int flags) +{ + int fd = open(pathname, flags); + + if (fd < 0) { + perror(pathname); + exit(EXIT_FAILURE); + } + + return fd; +} + /* * page flag names @@ -534,11 +546,7 @@ static void walk_addr_ranges(void) { int i; - kpageflags_fd = open(PROC_KPAGEFLAGS, O_RDONLY); - if (kpageflags_fd < 0) { - perror(PROC_KPAGEFLAGS); - exit(EXIT_FAILURE); - } + kpageflags_fd = checked_open(PROC_KPAGEFLAGS, O_RDONLY); if (!nr_addr_ranges) add_addr_range(0, ULONG_MAX); @@ -631,11 +639,7 @@ static void parse_pid(const char *str) opt_pid = parse_number(str); sprintf(buf, "/proc/%d/pagemap", opt_pid); - pagemap_fd = open(buf, O_RDONLY); - if (pagemap_fd < 0) { - perror(buf); - exit(EXIT_FAILURE); - } + pagemap_fd = checked_open(buf, O_RDONLY); sprintf(buf, "/proc/%d/maps", opt_pid); file = fopen(buf, "r"); -- cgit v1.2.1 From 4a1b6726feee3132905996ae4cd1c7dc2104e721 Mon Sep 17 00:00:00 2001 From: Wu Fengguang Date: Wed, 7 Oct 2009 16:32:30 -0700 Subject: page-types: make standalone pagemap/kpageflags read routines Refactor the code to be more modular and easier to reuse. Signed-off-by: Wu Fengguang Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/page-types.c | 154 ++++++++++++++++++++++++------------------ 1 file changed, 89 insertions(+), 65 deletions(-) (limited to 'Documentation') diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c index 6bdcc0632c24..18e891a329e0 100644 --- a/Documentation/vm/page-types.c +++ b/Documentation/vm/page-types.c @@ -161,8 +161,6 @@ static unsigned long pg_start[MAX_VMAS]; static unsigned long pg_end[MAX_VMAS]; static unsigned long voffset; -static int pagemap_fd; - #define MAX_BIT_FILTERS 64 static int nr_bit_filters; static uint64_t opt_mask[MAX_BIT_FILTERS]; @@ -170,7 +168,7 @@ static uint64_t opt_bits[MAX_BIT_FILTERS]; static int page_size; -#define PAGES_BATCH (64 << 10) /* 64k pages */ +static int pagemap_fd; static int kpageflags_fd; #define HASH_SHIFT 13 @@ -226,6 +224,62 @@ int checked_open(const char *pathname, int flags) return fd; } +/* + * pagemap/kpageflags routines + */ + +static unsigned long do_u64_read(int fd, char *name, + uint64_t *buf, + unsigned long index, + unsigned long count) +{ + long bytes; + + if (index > ULONG_MAX / 8) + fatal("index overflow: %lu\n", index); + + if (lseek(fd, index * 8, SEEK_SET) < 0) { + perror(name); + exit(EXIT_FAILURE); + } + + bytes = read(fd, buf, count * 8); + if (bytes < 0) { + perror(name); + exit(EXIT_FAILURE); + } + if (bytes % 8) + fatal("partial read: %lu bytes\n", bytes); + + return bytes / 8; +} + +static unsigned long kpageflags_read(uint64_t *buf, + unsigned long index, + unsigned long pages) +{ + return do_u64_read(kpageflags_fd, PROC_KPAGEFLAGS, buf, index, pages); +} + +static unsigned long pagemap_read(uint64_t *buf, + unsigned long index, + unsigned long pages) +{ + return do_u64_read(pagemap_fd, "/proc/pid/pagemap", buf, index, pages); +} + +static unsigned long pagemap_pfn(uint64_t val) +{ + unsigned long pfn; + + if (val & PM_PRESENT) + pfn = PM_PFRAME(val); + else + pfn = 0; + + return pfn; +} + /* * page flag names @@ -432,79 +486,53 @@ static void add_page(unsigned long offset, uint64_t flags) total_pages++; } +#define KPAGEFLAGS_BATCH (64 << 10) /* 64k pages */ static void walk_pfn(unsigned long index, unsigned long count) { + uint64_t buf[KPAGEFLAGS_BATCH]; unsigned long batch; - unsigned long n; + unsigned long pages; unsigned long i; - if (index > ULONG_MAX / KPF_BYTES) - fatal("index overflow: %lu\n", index); - - lseek(kpageflags_fd, index * KPF_BYTES, SEEK_SET); - while (count) { - uint64_t kpageflags_buf[KPF_BYTES * PAGES_BATCH]; - - batch = min_t(unsigned long, count, PAGES_BATCH); - n = read(kpageflags_fd, kpageflags_buf, batch * KPF_BYTES); - if (n == 0) + batch = min_t(unsigned long, count, KPAGEFLAGS_BATCH); + pages = kpageflags_read(buf, index, batch); + if (pages == 0) break; - if (n < 0) { - perror(PROC_KPAGEFLAGS); - exit(EXIT_FAILURE); - } - - if (n % KPF_BYTES != 0) - fatal("partial read: %lu bytes\n", n); - n = n / KPF_BYTES; - for (i = 0; i < n; i++) - add_page(index + i, kpageflags_buf[i]); + for (i = 0; i < pages; i++) + add_page(index + i, buf[i]); - index += batch; - count -= batch; + index += pages; + count -= pages; } } - -#define PAGEMAP_BATCH 4096 -static unsigned long task_pfn(unsigned long pgoff) +#define PAGEMAP_BATCH (64 << 10) +static void walk_vma(unsigned long index, unsigned long count) { - static uint64_t buf[PAGEMAP_BATCH]; - static unsigned long start; - static long count; - uint64_t pfn; + uint64_t buf[PAGEMAP_BATCH]; + unsigned long batch; + unsigned long pages; + unsigned long pfn; + unsigned long i; - if (pgoff < start || pgoff >= start + count) { - if (lseek64(pagemap_fd, - (uint64_t)pgoff * PM_ENTRY_BYTES, - SEEK_SET) < 0) { - perror("pagemap seek"); - exit(EXIT_FAILURE); - } - count = read(pagemap_fd, buf, sizeof(buf)); - if (count == 0) - return 0; - if (count < 0) { - perror("pagemap read"); - exit(EXIT_FAILURE); - } - if (count % PM_ENTRY_BYTES) { - fatal("pagemap read not aligned.\n"); - exit(EXIT_FAILURE); - } - count /= PM_ENTRY_BYTES; - start = pgoff; - } + while (count) { + batch = min_t(unsigned long, count, PAGEMAP_BATCH); + pages = pagemap_read(buf, index, batch); + if (pages == 0) + break; - pfn = buf[pgoff - start]; - if (pfn & PM_PRESENT) - pfn = PM_PFRAME(pfn); - else - pfn = 0; + for (i = 0; i < pages; i++) { + pfn = pagemap_pfn(buf[i]); + voffset = index + i; + if (pfn) + walk_pfn(pfn, 1); + } - return pfn; + index += pages; + count -= pages; + } } static void walk_task(unsigned long index, unsigned long count) @@ -524,11 +552,7 @@ static void walk_task(unsigned long index, unsigned long count) index = min_t(unsigned long, pg_end[i], end); assert(voffset < index); - for (; voffset < index; voffset++) { - unsigned long pfn = task_pfn(voffset); - if (pfn) - walk_pfn(pfn, 1); - } + walk_vma(voffset, index - voffset); } } -- cgit v1.2.1 From e577ebde9fb161bdc87511763c75dfad291059e2 Mon Sep 17 00:00:00 2001 From: Wu Fengguang Date: Wed, 7 Oct 2009 16:32:30 -0700 Subject: page-types: make voffset local variables Signed-off-by: Wu Fengguang Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/page-types.c | 39 +++++++++++++++++++++------------------ 1 file changed, 21 insertions(+), 18 deletions(-) (limited to 'Documentation') diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c index 18e891a329e0..56f33b63d160 100644 --- a/Documentation/vm/page-types.c +++ b/Documentation/vm/page-types.c @@ -159,7 +159,6 @@ static unsigned long opt_size[MAX_ADDR_RANGES]; static int nr_vmas; static unsigned long pg_start[MAX_VMAS]; static unsigned long pg_end[MAX_VMAS]; -static unsigned long voffset; #define MAX_BIT_FILTERS 64 static int nr_bit_filters; @@ -328,7 +327,8 @@ static char *page_flag_longname(uint64_t flags) * page list and summary */ -static void show_page_range(unsigned long offset, uint64_t flags) +static void show_page_range(unsigned long voffset, + unsigned long offset, uint64_t flags) { static uint64_t flags0; static unsigned long voff; @@ -354,7 +354,8 @@ static void show_page_range(unsigned long offset, uint64_t flags) count = 1; } -static void show_page(unsigned long offset, uint64_t flags) +static void show_page(unsigned long voffset, + unsigned long offset, uint64_t flags) { if (opt_pid) printf("%lx\t", voffset); @@ -435,7 +436,6 @@ static uint64_t well_known_flags(uint64_t flags) return flags; } - /* * page frame walker */ @@ -467,7 +467,8 @@ static int hash_slot(uint64_t flags) exit(EXIT_FAILURE); } -static void add_page(unsigned long offset, uint64_t flags) +static void add_page(unsigned long voffset, + unsigned long offset, uint64_t flags) { flags = expand_overloaded_flags(flags); @@ -478,16 +479,18 @@ static void add_page(unsigned long offset, uint64_t flags) return; if (opt_list == 1) - show_page_range(offset, flags); + show_page_range(voffset, offset, flags); else if (opt_list == 2) - show_page(offset, flags); + show_page(voffset, offset, flags); nr_pages[hash_slot(flags)]++; total_pages++; } #define KPAGEFLAGS_BATCH (64 << 10) /* 64k pages */ -static void walk_pfn(unsigned long index, unsigned long count) +static void walk_pfn(unsigned long voffset, + unsigned long index, + unsigned long count) { uint64_t buf[KPAGEFLAGS_BATCH]; unsigned long batch; @@ -501,7 +504,7 @@ static void walk_pfn(unsigned long index, unsigned long count) break; for (i = 0; i < pages; i++) - add_page(index + i, buf[i]); + add_page(voffset + i, index + i, buf[i]); index += pages; count -= pages; @@ -525,9 +528,8 @@ static void walk_vma(unsigned long index, unsigned long count) for (i = 0; i < pages; i++) { pfn = pagemap_pfn(buf[i]); - voffset = index + i; if (pfn) - walk_pfn(pfn, 1); + walk_pfn(index + i, pfn, 1); } index += pages; @@ -537,8 +539,9 @@ static void walk_vma(unsigned long index, unsigned long count) static void walk_task(unsigned long index, unsigned long count) { - int i = 0; const unsigned long end = index + count; + unsigned long start; + int i = 0; while (index < end) { @@ -548,11 +551,11 @@ static void walk_task(unsigned long index, unsigned long count) if (pg_start[i] >= end) return; - voffset = max_t(unsigned long, pg_start[i], index); - index = min_t(unsigned long, pg_end[i], end); + start = max_t(unsigned long, pg_start[i], index); + index = min_t(unsigned long, pg_end[i], end); - assert(voffset < index); - walk_vma(voffset, index - voffset); + assert(start < index); + walk_vma(start, index - start); } } @@ -577,7 +580,7 @@ static void walk_addr_ranges(void) for (i = 0; i < nr_addr_ranges; i++) if (!opt_pid) - walk_pfn(opt_offset[i], opt_size[i]); + walk_pfn(0, opt_offset[i], opt_size[i]); else walk_task(opt_offset[i], opt_size[i]); @@ -879,7 +882,7 @@ int main(int argc, char *argv[]) walk_addr_ranges(); if (opt_list == 1) - show_page_range(0, 0); /* drain the buffer */ + show_page_range(0, 0, 0); /* drain the buffer */ if (opt_no_summary) return 0; -- cgit v1.2.1 From 48640d69f5f06fe951a4de065a7f374cb9ee6040 Mon Sep 17 00:00:00 2001 From: Wu Fengguang Date: Wed, 7 Oct 2009 16:32:31 -0700 Subject: page-types: introduce kpageflags_flags() Signed-off-by: Wu Fengguang Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/page-types.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) (limited to 'Documentation') diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c index 56f33b63d160..11b9a03f7b10 100644 --- a/Documentation/vm/page-types.c +++ b/Documentation/vm/page-types.c @@ -436,6 +436,16 @@ static uint64_t well_known_flags(uint64_t flags) return flags; } +static uint64_t kpageflags_flags(uint64_t flags) +{ + flags = expand_overloaded_flags(flags); + + if (!opt_raw) + flags = well_known_flags(flags); + + return flags; +} + /* * page frame walker */ @@ -470,10 +480,7 @@ static int hash_slot(uint64_t flags) static void add_page(unsigned long voffset, unsigned long offset, uint64_t flags) { - flags = expand_overloaded_flags(flags); - - if (!opt_raw) - flags = well_known_flags(flags); + flags = kpageflags_flags(flags); if (!bit_mask_ok(flags)) return; -- cgit v1.2.1 From a54fed9f70a2765f4476e1ce3d691a2f31df258f Mon Sep 17 00:00:00 2001 From: Wu Fengguang Date: Wed, 7 Oct 2009 16:32:32 -0700 Subject: page-types: add hwpoison/unpoison feature For hwpoison stress testing. The debugfs mount point is assumed to be /debug/. Signed-off-by: Wu Fengguang Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/page-types.c | 73 ++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 72 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c index 11b9a03f7b10..3ec4f2a22585 100644 --- a/Documentation/vm/page-types.c +++ b/Documentation/vm/page-types.c @@ -170,6 +170,13 @@ static int page_size; static int pagemap_fd; static int kpageflags_fd; +static int opt_hwpoison; +static int opt_unpoison; + +static char *hwpoison_debug_fs = "/debug/hwpoison"; +static int hwpoison_inject_fd; +static int hwpoison_forget_fd; + #define HASH_SHIFT 13 #define HASH_SIZE (1 << HASH_SHIFT) #define HASH_MASK (HASH_SIZE - 1) @@ -446,6 +453,53 @@ static uint64_t kpageflags_flags(uint64_t flags) return flags; } +/* + * page actions + */ + +static void prepare_hwpoison_fd(void) +{ + char buf[100]; + + if (opt_hwpoison && !hwpoison_inject_fd) { + sprintf(buf, "%s/corrupt-pfn", hwpoison_debug_fs); + hwpoison_inject_fd = checked_open(buf, O_WRONLY); + } + + if (opt_unpoison && !hwpoison_forget_fd) { + sprintf(buf, "%s/renew-pfn", hwpoison_debug_fs); + hwpoison_forget_fd = checked_open(buf, O_WRONLY); + } +} + +static int hwpoison_page(unsigned long offset) +{ + char buf[100]; + int len; + + len = sprintf(buf, "0x%lx\n", offset); + len = write(hwpoison_inject_fd, buf, len); + if (len < 0) { + perror("hwpoison inject"); + return len; + } + return 0; +} + +static int unpoison_page(unsigned long offset) +{ + char buf[100]; + int len; + + len = sprintf(buf, "0x%lx\n", offset); + len = write(hwpoison_forget_fd, buf, len); + if (len < 0) { + perror("hwpoison forget"); + return len; + } + return 0; +} + /* * page frame walker */ @@ -485,6 +539,11 @@ static void add_page(unsigned long voffset, if (!bit_mask_ok(flags)) return; + if (opt_hwpoison) + hwpoison_page(offset); + if (opt_unpoison) + unpoison_page(offset); + if (opt_list == 1) show_page_range(voffset, offset, flags); else if (opt_list == 2) @@ -624,6 +683,8 @@ static void usage(void) " -l|--list Show page details in ranges\n" " -L|--list-each Show page details one by one\n" " -N|--no-summary Don't show summay info\n" +" -X|--hwpoison hwpoison pages\n" +" -x|--unpoison unpoison pages\n" " -h|--help Show this usage message\n" "addr-spec:\n" " N one page at offset N (unit: pages)\n" @@ -833,6 +894,8 @@ static struct option opts[] = { { "list" , 0, NULL, 'l' }, { "list-each" , 0, NULL, 'L' }, { "no-summary", 0, NULL, 'N' }, + { "hwpoison" , 0, NULL, 'X' }, + { "unpoison" , 0, NULL, 'x' }, { "help" , 0, NULL, 'h' }, { NULL , 0, NULL, 0 } }; @@ -844,7 +907,7 @@ int main(int argc, char *argv[]) page_size = getpagesize(); while ((c = getopt_long(argc, argv, - "rp:f:a:b:lLNh", opts, NULL)) != -1) { + "rp:f:a:b:lLNXxh", opts, NULL)) != -1) { switch (c) { case 'r': opt_raw = 1; @@ -870,6 +933,14 @@ int main(int argc, char *argv[]) case 'N': opt_no_summary = 1; break; + case 'X': + opt_hwpoison = 1; + prepare_hwpoison_fd(); + break; + case 'x': + opt_unpoison = 1; + prepare_hwpoison_fd(); + break; case 'h': usage(); exit(0); -- cgit v1.2.1 From d0153ca35d344d9b640dc305031b0703ba3f30f0 Mon Sep 17 00:00:00 2001 From: Alok Kataria Date: Tue, 29 Sep 2009 10:25:24 -0700 Subject: x86, vmi: Mark VMI deprecated and schedule it for removal Add text in feature-removal.txt indicating that VMI will be removed in the 2.6.37 timeframe. Signed-off-by: Alok N Kataria Acked-by: Chris Wright LKML-Reference: <1254193238.13456.48.camel@ank32.eng.vmware.com> [ removed a bogus Kconfig change, marked (DEPRECATED) in Kconfig ] Signed-off-by: H. Peter Anvin Signed-off-by: Ingo Molnar --- Documentation/feature-removal-schedule.txt | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) (limited to 'Documentation') diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 89a47b5aff07..04e6c819b28a 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -451,3 +451,33 @@ Why: OSS sound_core grabs all legacy minors (0-255) of SOUND_MAJOR will also allow making ALSA OSS emulation independent of sound_core. The dependency will be broken then too. Who: Tejun Heo + +---------------------------- + +What: Support for VMware's guest paravirtuliazation technique [VMI] will be + dropped. +When: 2.6.37 or earlier. +Why: With the recent innovations in CPU hardware acceleration technologies + from Intel and AMD, VMware ran a few experiments to compare these + techniques to guest paravirtualization technique on VMware's platform. + These hardware assisted virtualization techniques have outperformed the + performance benefits provided by VMI in most of the workloads. VMware + expects that these hardware features will be ubiquitous in a couple of + years, as a result, VMware has started a phased retirement of this + feature from the hypervisor. We will be removing this feature from the + Kernel too. Right now we are targeting 2.6.37 but can retire earlier if + technical reasons (read opportunity to remove major chunk of pvops) + arise. + + Please note that VMI has always been an optimization and non-VMI kernels + still work fine on VMware's platform. + Latest versions of VMware's product which support VMI are, + Workstation 7.0 and VSphere 4.0 on ESX side, future maintainence + releases for these products will continue supporting VMI. + + For more details about VMI retirement take a look at this, + http://blogs.vmware.com/guestosguide/2009/09/vmi-retirement.html + +Who: Alok N Kataria + +---------------------------- -- cgit v1.2.1 From 6dbce5218298c0c77a740b3ffe97cb8c304e1710 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Thu, 17 Sep 2009 17:37:12 +0200 Subject: ext3: Update documentation about ext3 quota mount options Signed-off-by: Jan Kara --- Documentation/filesystems/ext3.txt | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/ext3.txt b/Documentation/filesystems/ext3.txt index 570f9bd9be2b..05d5cf1d743f 100644 --- a/Documentation/filesystems/ext3.txt +++ b/Documentation/filesystems/ext3.txt @@ -123,10 +123,18 @@ resuid=n The user ID which may use the reserved blocks. sb=n Use alternate superblock at this location. -quota -noquota -grpquota -usrquota +quota These options are ignored by the filesystem. They +noquota are used only by quota tools to recognize volumes +grpquota where quota should be turned on. See documentation +usrquota in the quota-tools package for more details + (http://sourceforge.net/projects/linuxquota). + +jqfmt= These options tell filesystem details about quota +usrjquota= so that quota information can be properly updated +grpjquota= during journal replay. They replace the above + quota options. See documentation in the quota-tools + package for more details + (http://sourceforge.net/projects/linuxquota). bh (*) ext3 associates buffer heads to data pages to nobh (a) cache disk block mapping information -- cgit v1.2.1 From 54930531a00af5a1c33361a02e67dd1802110465 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Sun, 11 Oct 2009 17:38:29 +0200 Subject: ALSA: hda - Fix mute sound with STAC9227/9228 codecs On FSC laptops, the sound gets muted gradually when the volume is chnaged. This is due to the wrong volume-knob widget setup. The delta bit (bit 7) shouldn't be set for these devices. This patch adds a new quirk to set the value 0x7f to the widget 0x24 instead of 0xff. Reference: Novell bnc#546006 http://bugzilla.novell.com/show_bug.cgi?id=546006 Signed-off-by: Takashi Iwai --- Documentation/sound/alsa/HD-Audio-Models.txt | 1 + 1 file changed, 1 insertion(+) (limited to 'Documentation') diff --git a/Documentation/sound/alsa/HD-Audio-Models.txt b/Documentation/sound/alsa/HD-Audio-Models.txt index a2643cfe7938..4bf953b55b8e 100644 --- a/Documentation/sound/alsa/HD-Audio-Models.txt +++ b/Documentation/sound/alsa/HD-Audio-Models.txt @@ -359,6 +359,7 @@ STAC9227/9228/9229/927x 5stack-no-fp D965 5stack without front panel dell-3stack Dell Dimension E520 dell-bios Fixes with Dell BIOS setup + volknob Fixes with volume-knob widget 0x24 auto BIOS setup (default) STAC92HD71B* -- cgit v1.2.1 From 99b830aa553668a2c433e4cbff130224a75c74bb Mon Sep 17 00:00:00 2001 From: David Vrabel Date: Mon, 12 Oct 2009 15:45:13 +0000 Subject: USB: rename Documentation/ABI/.../sysfs-class-usb_host The usb_host class is no more. Rename its documentation file (which only contained WUSB specific files) to .../sysfs-class-uwb_rc-wusbhc. Signed-off-by: David Vrabel Signed-off-by: Greg Kroah-Hartman --- Documentation/ABI/testing/sysfs-class-usb_host | 25 ---------------------- .../ABI/testing/sysfs-class-uwb_rc-wusbhc | 25 ++++++++++++++++++++++ 2 files changed, 25 insertions(+), 25 deletions(-) delete mode 100644 Documentation/ABI/testing/sysfs-class-usb_host create mode 100644 Documentation/ABI/testing/sysfs-class-uwb_rc-wusbhc (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-class-usb_host b/Documentation/ABI/testing/sysfs-class-usb_host deleted file mode 100644 index 46b66ad1f1b4..000000000000 --- a/Documentation/ABI/testing/sysfs-class-usb_host +++ /dev/null @@ -1,25 +0,0 @@ -What: /sys/class/usb_host/usb_hostN/wusb_chid -Date: July 2008 -KernelVersion: 2.6.27 -Contact: David Vrabel -Description: - Write the CHID (16 space-separated hex octets) for this host controller. - This starts the host controller, allowing it to accept connection from - WUSB devices. - - Set an all zero CHID to stop the host controller. - -What: /sys/class/usb_host/usb_hostN/wusb_trust_timeout -Date: July 2008 -KernelVersion: 2.6.27 -Contact: David Vrabel -Description: - Devices that haven't sent a WUSB packet to the host - within 'wusb_trust_timeout' ms are considered to have - disconnected and are removed. The default value of - 4000 ms is the value required by the WUSB - specification. - - Since this relates to security (specifically, the - lifetime of PTKs and GTKs) it should not be changed - from the default. diff --git a/Documentation/ABI/testing/sysfs-class-uwb_rc-wusbhc b/Documentation/ABI/testing/sysfs-class-uwb_rc-wusbhc new file mode 100644 index 000000000000..4e8106f7cfd9 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-class-uwb_rc-wusbhc @@ -0,0 +1,25 @@ +What: /sys/class/uwb_rc/uwbN/wusbhc/wusb_chid +Date: July 2008 +KernelVersion: 2.6.27 +Contact: David Vrabel +Description: + Write the CHID (16 space-separated hex octets) for this host controller. + This starts the host controller, allowing it to accept connection from + WUSB devices. + + Set an all zero CHID to stop the host controller. + +What: /sys/class/uwb_rc/uwbN/wusbhc/wusb_trust_timeout +Date: July 2008 +KernelVersion: 2.6.27 +Contact: David Vrabel +Description: + Devices that haven't sent a WUSB packet to the host + within 'wusb_trust_timeout' ms are considered to have + disconnected and are removed. The default value of + 4000 ms is the value required by the WUSB + specification. + + Since this relates to security (specifically, the + lifetime of PTKs and GTKs) it should not be changed + from the default. -- cgit v1.2.1 From 1243ba98e3abecd12e9e90dd390801b95ef070f2 Mon Sep 17 00:00:00 2001 From: Jonathan Corbet Date: Wed, 14 Oct 2009 12:43:22 -0600 Subject: Update flex_arrays.txt The 2.6.32 merge window brought a number of changes to the flexible array API; this patch updates the documentation to match the new state of affairs. Acked-by: David Rientjes Signed-off-by: Jonathan Corbet --- Documentation/flexible-arrays.txt | 43 +++++++++++++++++++++++++++++---------- 1 file changed, 32 insertions(+), 11 deletions(-) (limited to 'Documentation') diff --git a/Documentation/flexible-arrays.txt b/Documentation/flexible-arrays.txt index 84eb26808dee..cb8a3a00cc92 100644 --- a/Documentation/flexible-arrays.txt +++ b/Documentation/flexible-arrays.txt @@ -1,5 +1,5 @@ Using flexible arrays in the kernel -Last updated for 2.6.31 +Last updated for 2.6.32 Jonathan Corbet Large contiguous memory allocations can be unreliable in the Linux kernel. @@ -40,6 +40,13 @@ argument is passed directly to the internal memory allocation calls. With the current code, using flags to ask for high memory is likely to lead to notably unpleasant side effects. +It is also possible to define flexible arrays at compile time with: + + DEFINE_FLEX_ARRAY(name, element_size, total); + +This macro will result in a definition of an array with the given name; the +element size and total will be checked for validity at compile time. + Storing data into a flexible array is accomplished with a call to: int flex_array_put(struct flex_array *array, unsigned int element_nr, @@ -76,16 +83,30 @@ particular element has never been allocated. Note that it is possible to get back a valid pointer for an element which has never been stored in the array. Memory for array elements is allocated one page at a time; a single allocation could provide memory for several -adjacent elements. The flexible array code does not know if a specific -element has been written; it only knows if the associated memory is -present. So a flex_array_get() call on an element which was never stored -in the array has the potential to return a pointer to random data. If the -caller does not have a separate way to know which elements were actually -stored, it might be wise, at least, to add GFP_ZERO to the flags argument -to ensure that all elements are zeroed. - -There is no way to remove a single element from the array. It is possible, -though, to remove all elements with a call to: +adjacent elements. Flexible array elements are normally initialized to the +value FLEX_ARRAY_FREE (defined as 0x6c in ), so errors +involving that number probably result from use of unstored array entries. +Note that, if array elements are allocated with __GFP_ZERO, they will be +initialized to zero and this poisoning will not happen. + +Individual elements in the array can be cleared with: + + int flex_array_clear(struct flex_array *array, unsigned int element_nr); + +This function will set the given element to FLEX_ARRAY_FREE and return +zero. If storage for the indicated element is not allocated for the array, +flex_array_clear() will return -EINVAL instead. Note that clearing an +element does not release the storage associated with it; to reduce the +allocated size of an array, call: + + int flex_array_shrink(struct flex_array *array); + +The return value will be the number of pages of memory actually freed. +This function works by scanning the array for pages containing nothing but +FLEX_ARRAY_FREE bytes, so (1) it can be expensive, and (2) it will not work +if the array's pages are allocated with __GFP_ZERO. + +It is possible to remove all elements of an array with a call to: void flex_array_free_parts(struct flex_array *array); -- cgit v1.2.1