From 21c13a4f7bc185552c4b402b792c3bbb9aa69df0 Mon Sep 17 00:00:00 2001 From: Alan Stern Date: Tue, 7 Jun 2011 11:35:52 -0400 Subject: usb-storage: redo incorrect reads Some USB mass-storage devices have bugs that cause them not to handle the first READ(10) command they receive correctly. The Corsair Padlock v2 returns completely bogus data for its first read (possibly it returns the data in encrypted form even though the device is supposed to be unlocked). The Feiya SD/SDHC card reader fails to complete the first READ(10) command after it is plugged in or after a new card is inserted, returning a status code that indicates it thinks the command was invalid, which prevents the kernel from retrying the read. Since the first read of a new device or a new medium is for the partition sector, the kernel is unable to retrieve the device's partition table. Users have to manually issue an "hdparm -z" or "blockdev --rereadpt" command before they can access the device. This patch (as1470) works around the problem. It adds a new quirk flag, US_FL_INVALID_READ10, indicating that the first READ(10) should always be retried immediately, as should any failing READ(10) commands (provided the preceding READ(10) command succeeded, to avoid getting stuck in a loop). The patch also adds appropriate unusual_devs entries containing the new flag. Signed-off-by: Alan Stern Tested-by: Sven Geggus Tested-by: Paul Hartman CC: Matthew Dharm CC: Signed-off-by: Greg Kroah-Hartman --- Documentation/kernel-parameters.txt | 2 ++ 1 file changed, 2 insertions(+) (limited to 'Documentation') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index d9a203b058f1..fd248a318211 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2598,6 +2598,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted. unlock ejectable media); m = MAX_SECTORS_64 (don't transfer more than 64 sectors = 32 KB at a time); + n = INITIAL_READ10 (force a retry of the + initial READ(10) command); o = CAPACITY_OK (accept the capacity reported by the device); r = IGNORE_RESIDUE (the device reports -- cgit v1.2.1 From f699bf2328521cc3e20c412fcdb9ffe1255c360f Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Thu, 9 Jun 2011 11:43:04 +1000 Subject: md:Documentation/md.txt - fix typo Reported-by: CoolCold Signed-off-by: NeilBrown --- Documentation/md.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/md.txt b/Documentation/md.txt index 2366b1c8cf19..f0eee83ff78a 100644 --- a/Documentation/md.txt +++ b/Documentation/md.txt @@ -555,7 +555,7 @@ also have sync_min sync_max The two values, given as numbers of sectors, indicate a range - withing the array where 'check'/'repair' will operate. Must be + within the array where 'check'/'repair' will operate. Must be a multiple of chunk_size. When it reaches "sync_max" it will pause, rather than complete. You can use 'select' or 'poll' on "sync_completed" to wait for -- cgit v1.2.1 From 09223371deac67d08ca0b70bd18787920284c967 Mon Sep 17 00:00:00 2001 From: Shaohua Li Date: Tue, 14 Jun 2011 13:26:25 +0800 Subject: rcu: Use softirq to address performance regression Commit a26ac2455ffcf3(rcu: move TREE_RCU from softirq to kthread) introduced performance regression. In an AIM7 test, this commit degraded performance by about 40%. The commit runs rcu callbacks in a kthread instead of softirq. We observed high rate of context switch which is caused by this. Out test system has 64 CPUs and HZ is 1000, so we saw more than 64k context switch per second which is caused by RCU's per-CPU kthread. A trace showed that most of the time the RCU per-CPU kthread doesn't actually handle any callbacks, but instead just does a very small amount of work handling grace periods. This means that RCU's per-CPU kthreads are making the scheduler do quite a bit of work in order to allow a very small amount of RCU-related processing to be done. Alex Shi's analysis determined that this slowdown is due to lock contention within the scheduler. Unfortunately, as Peter Zijlstra points out, the scheduler's real-time semantics require global action, which means that this contention is inherent in real-time scheduling. (Yes, perhaps someone will come up with a workaround -- otherwise, -rt is not going to do well on large SMP systems -- but this patch will work around this issue in the meantime. And "the meantime" might well be forever.) This patch therefore re-introduces softirq processing to RCU, but only for core RCU work. RCU callbacks are still executed in kthread context, so that only a small amount of RCU work runs in softirq context in the common case. This should minimize ksoftirqd execution, allowing us to skip boosting of ksoftirqd for CONFIG_RCU_BOOST=y kernels. Signed-off-by: Shaohua Li Tested-by: "Alex,Shi" Signed-off-by: Paul E. McKenney --- Documentation/filesystems/proc.txt | 1 + 1 file changed, 1 insertion(+) (limited to 'Documentation') diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index f48178024067..db3b1aba32a3 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -843,6 +843,7 @@ Provides counts of softirq handlers serviced since boot time, for each cpu. TASKLET: 0 0 0 290 SCHED: 27035 26983 26971 26746 HRTIMER: 0 0 0 0 + RCU: 1678 1769 2178 2250 1.3 IDE devices in /proc/ide -- cgit v1.2.1 From a59ec1e7ff98cc4365d5b1bff4e7102e86b5716b Mon Sep 17 00:00:00 2001 From: Michael Hennerich Date: Wed, 15 Jun 2011 15:08:11 -0700 Subject: backlight: new driver for the ADP8870 backlight devices Signed-off-by: Michael Hennerich Signed-off-by: Mike Frysinger Cc: Richard Purdie Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- .../testing/sysfs-class-backlight-driver-adp8870 | 56 ++++++++++++++++++++++ 1 file changed, 56 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-class-backlight-driver-adp8870 (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-class-backlight-driver-adp8870 b/Documentation/ABI/testing/sysfs-class-backlight-driver-adp8870 new file mode 100644 index 000000000000..aa11dbdd794b --- /dev/null +++ b/Documentation/ABI/testing/sysfs-class-backlight-driver-adp8870 @@ -0,0 +1,56 @@ +What: /sys/class/backlight//_max +What: /sys/class/backlight//l1_daylight_max +What: /sys/class/backlight//l2_bright_max +What: /sys/class/backlight//l3_office_max +What: /sys/class/backlight//l4_indoor_max +What: /sys/class/backlight//l5_dark_max +Date: Mai 2011 +KernelVersion: 2.6.40 +Contact: device-drivers-devel@blackfin.uclinux.org +Description: + Control the maximum brightness for + on this . Values are between 0 and 127. This file + will also show the brightness level stored for this + . + +What: /sys/class/backlight//_dim +What: /sys/class/backlight//l2_bright_dim +What: /sys/class/backlight//l3_office_dim +What: /sys/class/backlight//l4_indoor_dim +What: /sys/class/backlight//l5_dark_dim +Date: Mai 2011 +KernelVersion: 2.6.40 +Contact: device-drivers-devel@blackfin.uclinux.org +Description: + Control the dim brightness for + on this . Values are between 0 and 127, typically + set to 0. Full off when the backlight is disabled. + This file will also show the dim brightness level stored for + this . + +What: /sys/class/backlight//ambient_light_level +Date: Mai 2011 +KernelVersion: 2.6.40 +Contact: device-drivers-devel@blackfin.uclinux.org +Description: + Get conversion value of the light sensor. + This value is updated every 80 ms (when the light sensor + is enabled). Returns integer between 0 (dark) and + 8000 (max ambient brightness) + +What: /sys/class/backlight//ambient_light_zone +Date: Mai 2011 +KernelVersion: 2.6.40 +Contact: device-drivers-devel@blackfin.uclinux.org +Description: + Get/Set current ambient light zone. Reading returns + integer between 1..5 (1 = daylight, 2 = bright, ..., 5 = dark). + Writing a value between 1..5 forces the backlight controller + to enter the corresponding ambient light zone. + Writing 0 returns to normal/automatic ambient light level + operation. The ambient light sensing feature on these devices + is an extension to the API documented in + Documentation/ABI/stable/sysfs-class-backlight. + It can be enabled by writing the value stored in + /sys/class/backlight//max_brightness to + /sys/class/backlight//brightness. \ No newline at end of file -- cgit v1.2.1 From 50c35e5ba255fd8428cef8ff076da8d23bfd4909 Mon Sep 17 00:00:00 2001 From: Ying Han Date: Wed, 15 Jun 2011 15:08:16 -0700 Subject: memcg: add documentation for the memory.numastat API [akpm@linux-foundation.org: rework text, fit it into 80-cols] Signed-off-by: Ying Han Reviewed-by: KOSAKI Motohiro Acked-by: Balbir Singh Acked-by: KAMEZAWA Hiroyuki Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/cgroups/memory.txt | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) (limited to 'Documentation') diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt index 7c163477fcd8..510d64570d60 100644 --- a/Documentation/cgroups/memory.txt +++ b/Documentation/cgroups/memory.txt @@ -70,6 +70,7 @@ Brief summary of control files. (See sysctl's vm.swappiness) memory.move_charge_at_immigrate # set/show controls of moving charges memory.oom_control # set/show oom controls. + memory.numa_stat # show the number of memory usage per numa node 1. History @@ -464,6 +465,24 @@ value for efficient access. (Of course, when necessary, it's synchronized.) If you want to know more exact memory usage, you should use RSS+CACHE(+SWAP) value in memory.stat(see 5.2). +5.6 numa_stat + +This is similar to numa_maps but operates on a per-memcg basis. This is +useful for providing visibility into the numa locality information within +an memcg since the pages are allowed to be allocated from any physical +node. One of the usecases is evaluating application performance by +combining this information with the application's cpu allocation. + +We export "total", "file", "anon" and "unevictable" pages per-node for +each memcg. The ouput format of memory.numa_stat is: + +total= N0= N1= ... +file= N0= N1= ... +anon= N0= N1= ... +unevictable= N0= N1= ... + +And we have total = file + anon + unevictable. + 6. Hierarchy support The memory controller supports a deep hierarchy and hierarchical accounting. -- cgit v1.2.1 From 31b5f8eeece4c0d70b649bfac7759cf7e3f915dd Mon Sep 17 00:00:00 2001 From: "akpm@linux-foundation.org" Date: Wed, 15 Jun 2011 15:08:52 -0700 Subject: Documentation/feature-removal-schedule.txt: remove ns_cgroup from feature-removal-schedule.txt Commit a77aea92010acf ("cgroup: remove the ns_cgroup") removed the ns_cgroup but it forgot to remove the related doc in feature-removal-schedule.txt. Signed-off-by: WANG Cong Cc: Daniel Lezcano Cc: Serge E. Hallyn Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/feature-removal-schedule.txt | 17 ----------------- 1 file changed, 17 deletions(-) (limited to 'Documentation') diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 1a9446b59153..72e238465b0b 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -481,23 +481,6 @@ Who: FUJITA Tomonori ---------------------------- -What: namespace cgroup (ns_cgroup) -When: 2.6.38 -Why: The ns_cgroup leads to some problems: - * cgroup creation is out-of-control - * cgroup name can conflict when pids are looping - * it is not possible to have a single process handling - a lot of namespaces without falling in a exponential creation time - * we may want to create a namespace without creating a cgroup - - The ns_cgroup is replaced by a compatibility flag 'clone_children', - where a newly created cgroup will copy the parent cgroup values. - The userspace has to manually create a cgroup and add a task to - the 'tasks' file. -Who: Daniel Lezcano - ----------------------------- - What: iwlwifi disable_hw_scan module parameters When: 2.6.40 Why: Hareware scan is the prefer method for iwlwifi devices for -- cgit v1.2.1 From 04c55715cbd5de526046745bca3d3b6ffa6641c6 Mon Sep 17 00:00:00 2001 From: Andrew Murray Date: Wed, 15 Jun 2011 12:57:09 -0700 Subject: Documentation: update printk-formats.txt This patch updates the incomplete documentation concerning the printk extended format specifiers. Signed-off-by: Andrew Murray Signed-off-by: Randy Dunlap Signed-off-by: Linus Torvalds --- Documentation/printk-formats.txt | 119 ++++++++++++++++++++++++++++++++++++++- 1 file changed, 117 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/printk-formats.txt b/Documentation/printk-formats.txt index 1b5a5ddbc3ef..5df176ed59b8 100644 --- a/Documentation/printk-formats.txt +++ b/Documentation/printk-formats.txt @@ -9,7 +9,121 @@ If variable is of Type, use printk format specifier: size_t %zu or %zx ssize_t %zd or %zx -Raw pointer value SHOULD be printed with %p. +Raw pointer value SHOULD be printed with %p. The kernel supports +the following extended format specifiers for pointer types: + +Symbols/Function Pointers: + + %pF versatile_init+0x0/0x110 + %pf versatile_init + %pS versatile_init+0x0/0x110 + %ps versatile_init + %pB prev_fn_of_versatile_init+0x88/0x88 + + For printing symbols and function pointers. The 'S' and 's' specifiers + result in the symbol name with ('S') or without ('s') offsets. Where + this is used on a kernel without KALLSYMS - the symbol address is + printed instead. + + The 'B' specifier results in the symbol name with offsets and should be + used when printing stack backtraces. The specifier takes into + consideration the effect of compiler optimisations which may occur + when tail-call's are used and marked with the noreturn GCC attribute. + + On ia64, ppc64 and parisc64 architectures function pointers are + actually function descriptors which must first be resolved. The 'F' and + 'f' specifiers perform this resolution and then provide the same + functionality as the 'S' and 's' specifiers. + +Kernel Pointers: + + %pK 0x01234567 or 0x0123456789abcdef + + For printing kernel pointers which should be hidden from unprivileged + users. The behaviour of %pK depends on the kptr_restrict sysctl - see + Documentation/sysctl/kernel.txt for more details. + +Struct Resources: + + %pr [mem 0x60000000-0x6fffffff flags 0x2200] or + [mem 0x0000000060000000-0x000000006fffffff flags 0x2200] + %pR [mem 0x60000000-0x6fffffff pref] or + [mem 0x0000000060000000-0x000000006fffffff pref] + + For printing struct resources. The 'R' and 'r' specifiers result in a + printed resource with ('R') or without ('r') a decoded flags member. + +MAC/FDDI addresses: + + %pM 00:01:02:03:04:05 + %pMF 00-01-02-03-04-05 + %pm 000102030405 + + For printing 6-byte MAC/FDDI addresses in hex notation. The 'M' and 'm' + specifiers result in a printed address with ('M') or without ('m') byte + separators. The default byte separator is the colon (':'). + + Where FDDI addresses are concerned the 'F' specifier can be used after + the 'M' specifier to use dash ('-') separators instead of the default + separator. + +IPv4 addresses: + + %pI4 1.2.3.4 + %pi4 001.002.003.004 + %p[Ii][hnbl] + + For printing IPv4 dot-separated decimal addresses. The 'I4' and 'i4' + specifiers result in a printed address with ('i4') or without ('I4') + leading zeros. + + The additional 'h', 'n', 'b', and 'l' specifiers are used to specify + host, network, big or little endian order addresses respectively. Where + no specifier is provided the default network/big endian order is used. + +IPv6 addresses: + + %pI6 0001:0002:0003:0004:0005:0006:0007:0008 + %pi6 00010002000300040005000600070008 + %pI6c 1:2:3:4:5:6:7:8 + + For printing IPv6 network-order 16-bit hex addresses. The 'I6' and 'i6' + specifiers result in a printed address with ('I6') or without ('i6') + colon-separators. Leading zeros are always used. + + The additional 'c' specifier can be used with the 'I' specifier to + print a compressed IPv6 address as described by + http://tools.ietf.org/html/rfc5952 + +UUID/GUID addresses: + + %pUb 00010203-0405-0607-0809-0a0b0c0d0e0f + %pUB 00010203-0405-0607-0809-0A0B0C0D0E0F + %pUl 03020100-0504-0706-0809-0a0b0c0e0e0f + %pUL 03020100-0504-0706-0809-0A0B0C0E0E0F + + For printing 16-byte UUID/GUIDs addresses. The additional 'l', 'L', + 'b' and 'B' specifiers are used to specify a little endian order in + lower ('l') or upper case ('L') hex characters - and big endian order + in lower ('b') or upper case ('B') hex characters. + + Where no additional specifiers are used the default little endian + order with lower case hex characters will be printed. + +struct va_format: + + %pV + + For printing struct va_format structures. These contain a format string + and va_list as follows: + + struct va_format { + const char *fmt; + va_list *va; + }; + + Do not use this feature without some mechanism to verify the + correctness of the format string and va_list arguments. u64 SHOULD be printed with %llu/%llx, (unsigned long long): @@ -32,4 +146,5 @@ Reminder: sizeof() result is of type size_t. Thank you for your cooperation and attention. -By Randy Dunlap +By Randy Dunlap and +Andrew Murray -- cgit v1.2.1 From 06a2c45d6b4a7586eba7cd20dd656b08d8b63c2f Mon Sep 17 00:00:00 2001 From: "Maxin B. John" Date: Wed, 15 Jun 2011 12:58:29 -0700 Subject: Documentation: update kmemleak supported archs Instead of listing the architectures that are supported by kmemleak in Documentation/kmemleak.txt, just refer people to the list of supported architecutures in lib/Kconfig.debug so that Documentation/kmemleak.txt does not need more updates for this. Signed-off-by: Maxin B. John Signed-off-by: Randy Dunlap Signed-off-by: Linus Torvalds --- Documentation/kmemleak.txt | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/kmemleak.txt b/Documentation/kmemleak.txt index 090e6ee04536..51063e681ca4 100644 --- a/Documentation/kmemleak.txt +++ b/Documentation/kmemleak.txt @@ -11,7 +11,9 @@ with the difference that the orphan objects are not freed but only reported via /sys/kernel/debug/kmemleak. A similar method is used by the Valgrind tool (memcheck --leak-check) to detect the memory leaks in user-space applications. -Kmemleak is supported on x86, arm, powerpc, sparc, sh, microblaze and tile. + +Please check DEBUG_KMEMLEAK dependencies in lib/Kconfig.debug for supported +architectures. Usage ----- -- cgit v1.2.1 From f6e07d38078e82a6aeaae00bb134591ef5ac1167 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?J=C3=B6rg=20Sommer?= Date: Wed, 15 Jun 2011 12:59:45 -0700 Subject: Documentation: update cgroupfs mount point MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit According to commit 676db4af0430 ("cgroupfs: create /sys/fs/cgroup to mount cgroupfs on") the canonical mountpoint for the cgroup filesystem is /sys/fs/cgroup. Hence, this should be used in the documentation. Signed-off-by: Jörg Sommer Acked-by: Paul Menage Signed-off-by: Randy Dunlap Signed-off-by: Linus Torvalds --- Documentation/accounting/cgroupstats.txt | 4 +- Documentation/cgroups/blkio-controller.txt | 29 +++++++------- Documentation/cgroups/cgroups.txt | 58 +++++++++++++++++----------- Documentation/cgroups/cpuacct.txt | 21 +++++----- Documentation/cgroups/cpusets.txt | 28 +++++++------- Documentation/cgroups/devices.txt | 6 +-- Documentation/cgroups/freezer-subsystem.txt | 20 +++++----- Documentation/cgroups/memory.txt | 17 ++++---- Documentation/scheduler/sched-design-CFS.txt | 7 ++-- Documentation/scheduler/sched-rt-group.txt | 7 ++-- Documentation/vm/hwpoison.txt | 6 +-- 11 files changed, 109 insertions(+), 94 deletions(-) (limited to 'Documentation') diff --git a/Documentation/accounting/cgroupstats.txt b/Documentation/accounting/cgroupstats.txt index eda40fd39cad..d16a9849e60e 100644 --- a/Documentation/accounting/cgroupstats.txt +++ b/Documentation/accounting/cgroupstats.txt @@ -21,7 +21,7 @@ information will not be available. To extract cgroup statistics a utility very similar to getdelays.c has been developed, the sample output of the utility is shown below -~/balbir/cgroupstats # ./getdelays -C "/cgroup/a" +~/balbir/cgroupstats # ./getdelays -C "/sys/fs/cgroup/a" sleeping 1, blocked 0, running 1, stopped 0, uninterruptible 0 -~/balbir/cgroupstats # ./getdelays -C "/cgroup" +~/balbir/cgroupstats # ./getdelays -C "/sys/fs/cgroup" sleeping 155, blocked 0, running 1, stopped 0, uninterruptible 2 diff --git a/Documentation/cgroups/blkio-controller.txt b/Documentation/cgroups/blkio-controller.txt index 465351d4cf85..b1b1bfadc9c0 100644 --- a/Documentation/cgroups/blkio-controller.txt +++ b/Documentation/cgroups/blkio-controller.txt @@ -28,16 +28,19 @@ cgroups. Here is what you can do. - Enable group scheduling in CFQ CONFIG_CFQ_GROUP_IOSCHED=y -- Compile and boot into kernel and mount IO controller (blkio). +- Compile and boot into kernel and mount IO controller (blkio); see + cgroups.txt, Why are cgroups needed?. - mount -t cgroup -o blkio none /cgroup + mount -t tmpfs cgroup_root /sys/fs/cgroup + mkdir /sys/fs/cgroup/blkio + mount -t cgroup -o blkio none /sys/fs/cgroup/blkio - Create two cgroups - mkdir -p /cgroup/test1/ /cgroup/test2 + mkdir -p /sys/fs/cgroup/blkio/test1/ /sys/fs/cgroup/blkio/test2 - Set weights of group test1 and test2 - echo 1000 > /cgroup/test1/blkio.weight - echo 500 > /cgroup/test2/blkio.weight + echo 1000 > /sys/fs/cgroup/blkio/test1/blkio.weight + echo 500 > /sys/fs/cgroup/blkio/test2/blkio.weight - Create two same size files (say 512MB each) on same disk (file1, file2) and launch two dd threads in different cgroup to read those files. @@ -46,12 +49,12 @@ cgroups. Here is what you can do. echo 3 > /proc/sys/vm/drop_caches dd if=/mnt/sdb/zerofile1 of=/dev/null & - echo $! > /cgroup/test1/tasks - cat /cgroup/test1/tasks + echo $! > /sys/fs/cgroup/blkio/test1/tasks + cat /sys/fs/cgroup/blkio/test1/tasks dd if=/mnt/sdb/zerofile2 of=/dev/null & - echo $! > /cgroup/test2/tasks - cat /cgroup/test2/tasks + echo $! > /sys/fs/cgroup/blkio/test2/tasks + cat /sys/fs/cgroup/blkio/test2/tasks - At macro level, first dd should finish first. To get more precise data, keep on looking at (with the help of script), at blkio.disk_time and @@ -68,13 +71,13 @@ Throttling/Upper Limit policy - Enable throttling in block layer CONFIG_BLK_DEV_THROTTLING=y -- Mount blkio controller - mount -t cgroup -o blkio none /cgroup/blkio +- Mount blkio controller (see cgroups.txt, Why are cgroups needed?) + mount -t cgroup -o blkio none /sys/fs/cgroup/blkio - Specify a bandwidth rate on particular device for root group. The format for policy is ": ". - echo "8:16 1048576" > /cgroup/blkio/blkio.read_bps_device + echo "8:16 1048576" > /sys/fs/cgroup/blkio/blkio.read_bps_device Above will put a limit of 1MB/second on reads happening for root group on device having major/minor number 8:16. @@ -149,7 +152,7 @@ Proportional weight policy files Following is the format. - #echo dev_maj:dev_minor weight > /path/to/cgroup/blkio.weight_device + # echo dev_maj:dev_minor weight > blkio.weight_device Configure weight=300 on /dev/sdb (8:16) in this cgroup # echo 8:16 300 > blkio.weight_device # cat blkio.weight_device diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt index 0ed99f08f1f3..15bca101ff62 100644 --- a/Documentation/cgroups/cgroups.txt +++ b/Documentation/cgroups/cgroups.txt @@ -138,7 +138,7 @@ With the ability to classify tasks differently for different resources the admin can easily set up a script which receives exec notifications and depending on who is launching the browser he can - # echo browser_pid > /mnt///tasks + # echo browser_pid > /sys/fs/cgroup///tasks With only a single hierarchy, he now would potentially have to create a separate cgroup for every browser launched and associate it with @@ -153,9 +153,9 @@ apps enhanced CPU power, With ability to write pids directly to resource classes, it's just a matter of : - # echo pid > /mnt/network//tasks + # echo pid > /sys/fs/cgroup/network//tasks (after some time) - # echo pid > /mnt/network//tasks + # echo pid > /sys/fs/cgroup/network//tasks Without this ability, he would have to split the cgroup into multiple separate ones and then associate the new cgroups with the @@ -310,21 +310,24 @@ subsystem, this is the case for the cpuset. To start a new job that is to be contained within a cgroup, using the "cpuset" cgroup subsystem, the steps are something like: - 1) mkdir /dev/cgroup - 2) mount -t cgroup -ocpuset cpuset /dev/cgroup - 3) Create the new cgroup by doing mkdir's and write's (or echo's) in - the /dev/cgroup virtual file system. - 4) Start a task that will be the "founding father" of the new job. - 5) Attach that task to the new cgroup by writing its pid to the - /dev/cgroup tasks file for that cgroup. - 6) fork, exec or clone the job tasks from this founding father task. + 1) mount -t tmpfs cgroup_root /sys/fs/cgroup + 2) mkdir /sys/fs/cgroup/cpuset + 3) mount -t cgroup -ocpuset cpuset /sys/fs/cgroup/cpuset + 4) Create the new cgroup by doing mkdir's and write's (or echo's) in + the /sys/fs/cgroup virtual file system. + 5) Start a task that will be the "founding father" of the new job. + 6) Attach that task to the new cgroup by writing its pid to the + /sys/fs/cgroup/cpuset/tasks file for that cgroup. + 7) fork, exec or clone the job tasks from this founding father task. For example, the following sequence of commands will setup a cgroup named "Charlie", containing just CPUs 2 and 3, and Memory Node 1, and then start a subshell 'sh' in that cgroup: - mount -t cgroup cpuset -ocpuset /dev/cgroup - cd /dev/cgroup + mount -t tmpfs cgroup_root /sys/fs/cgroup + mkdir /sys/fs/cgroup/cpuset + mount -t cgroup cpuset -ocpuset /sys/fs/cgroup/cpuset + cd /sys/fs/cgroup/cpuset mkdir Charlie cd Charlie /bin/echo 2-3 > cpuset.cpus @@ -345,7 +348,7 @@ Creating, modifying, using the cgroups can be done through the cgroup virtual filesystem. To mount a cgroup hierarchy with all available subsystems, type: -# mount -t cgroup xxx /dev/cgroup +# mount -t cgroup xxx /sys/fs/cgroup The "xxx" is not interpreted by the cgroup code, but will appear in /proc/mounts so may be any useful identifying string that you like. @@ -354,23 +357,32 @@ Note: Some subsystems do not work without some user input first. For instance, if cpusets are enabled the user will have to populate the cpus and mems files for each new cgroup created before that group can be used. +As explained in section `1.2 Why are cgroups needed?' you should create +different hierarchies of cgroups for each single resource or group of +resources you want to control. Therefore, you should mount a tmpfs on +/sys/fs/cgroup and create directories for each cgroup resource or resource +group. + +# mount -t tmpfs cgroup_root /sys/fs/cgroup +# mkdir /sys/fs/cgroup/rg1 + To mount a cgroup hierarchy with just the cpuset and memory subsystems, type: -# mount -t cgroup -o cpuset,memory hier1 /dev/cgroup +# mount -t cgroup -o cpuset,memory hier1 /sys/fs/cgroup/rg1 To change the set of subsystems bound to a mounted hierarchy, just remount with different options: -# mount -o remount,cpuset,blkio hier1 /dev/cgroup +# mount -o remount,cpuset,blkio hier1 /sys/fs/cgroup/rg1 Now memory is removed from the hierarchy and blkio is added. Note this will add blkio to the hierarchy but won't remove memory or cpuset, because the new options are appended to the old ones: -# mount -o remount,blkio /dev/cgroup +# mount -o remount,blkio /sys/fs/cgroup/rg1 To Specify a hierarchy's release_agent: # mount -t cgroup -o cpuset,release_agent="/sbin/cpuset_release_agent" \ - xxx /dev/cgroup + xxx /sys/fs/cgroup/rg1 Note that specifying 'release_agent' more than once will return failure. @@ -379,17 +391,17 @@ when the hierarchy consists of a single (root) cgroup. Supporting the ability to arbitrarily bind/unbind subsystems from an existing cgroup hierarchy is intended to be implemented in the future. -Then under /dev/cgroup you can find a tree that corresponds to the -tree of the cgroups in the system. For instance, /dev/cgroup +Then under /sys/fs/cgroup/rg1 you can find a tree that corresponds to the +tree of the cgroups in the system. For instance, /sys/fs/cgroup/rg1 is the cgroup that holds the whole system. If you want to change the value of release_agent: -# echo "/sbin/new_release_agent" > /dev/cgroup/release_agent +# echo "/sbin/new_release_agent" > /sys/fs/cgroup/rg1/release_agent It can also be changed via remount. -If you want to create a new cgroup under /dev/cgroup: -# cd /dev/cgroup +If you want to create a new cgroup under /sys/fs/cgroup/rg1: +# cd /sys/fs/cgroup/rg1 # mkdir my_cgroup Now you want to do something with this cgroup. diff --git a/Documentation/cgroups/cpuacct.txt b/Documentation/cgroups/cpuacct.txt index 8b930946c52a..9ad85df4b983 100644 --- a/Documentation/cgroups/cpuacct.txt +++ b/Documentation/cgroups/cpuacct.txt @@ -10,26 +10,25 @@ directly present in its group. Accounting groups can be created by first mounting the cgroup filesystem. -# mkdir /cgroups -# mount -t cgroup -ocpuacct none /cgroups - -With the above step, the initial or the parent accounting group -becomes visible at /cgroups. At bootup, this group includes all the -tasks in the system. /cgroups/tasks lists the tasks in this cgroup. -/cgroups/cpuacct.usage gives the CPU time (in nanoseconds) obtained by -this group which is essentially the CPU time obtained by all the tasks +# mount -t cgroup -ocpuacct none /sys/fs/cgroup + +With the above step, the initial or the parent accounting group becomes +visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in +the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup. +/sys/fs/cgroup/cpuacct.usage gives the CPU time (in nanoseconds) obtained +by this group which is essentially the CPU time obtained by all the tasks in the system. -New accounting groups can be created under the parent group /cgroups. +New accounting groups can be created under the parent group /sys/fs/cgroup. -# cd /cgroups +# cd /sys/fs/cgroup # mkdir g1 # echo $$ > g1 The above steps create a new group g1 and move the current shell process (bash) into it. CPU time consumed by this bash and its children can be obtained from g1/cpuacct.usage and the same is accumulated in -/cgroups/cpuacct.usage also. +/sys/fs/cgroup/cpuacct.usage also. cpuacct.stat file lists a few statistics which further divide the CPU time obtained by the cgroup into user and system times. Currently diff --git a/Documentation/cgroups/cpusets.txt b/Documentation/cgroups/cpusets.txt index 98a30829af7a..5b0d78e55ccc 100644 --- a/Documentation/cgroups/cpusets.txt +++ b/Documentation/cgroups/cpusets.txt @@ -661,21 +661,21 @@ than stress the kernel. To start a new job that is to be contained within a cpuset, the steps are: - 1) mkdir /dev/cpuset - 2) mount -t cgroup -ocpuset cpuset /dev/cpuset + 1) mkdir /sys/fs/cgroup/cpuset + 2) mount -t cgroup -ocpuset cpuset /sys/fs/cgroup/cpuset 3) Create the new cpuset by doing mkdir's and write's (or echo's) in - the /dev/cpuset virtual file system. + the /sys/fs/cgroup/cpuset virtual file system. 4) Start a task that will be the "founding father" of the new job. 5) Attach that task to the new cpuset by writing its pid to the - /dev/cpuset tasks file for that cpuset. + /sys/fs/cgroup/cpuset tasks file for that cpuset. 6) fork, exec or clone the job tasks from this founding father task. For example, the following sequence of commands will setup a cpuset named "Charlie", containing just CPUs 2 and 3, and Memory Node 1, and then start a subshell 'sh' in that cpuset: - mount -t cgroup -ocpuset cpuset /dev/cpuset - cd /dev/cpuset + mount -t cgroup -ocpuset cpuset /sys/fs/cgroup/cpuset + cd /sys/fs/cgroup/cpuset mkdir Charlie cd Charlie /bin/echo 2-3 > cpuset.cpus @@ -710,14 +710,14 @@ Creating, modifying, using the cpusets can be done through the cpuset virtual filesystem. To mount it, type: -# mount -t cgroup -o cpuset cpuset /dev/cpuset +# mount -t cgroup -o cpuset cpuset /sys/fs/cgroup/cpuset -Then under /dev/cpuset you can find a tree that corresponds to the -tree of the cpusets in the system. For instance, /dev/cpuset +Then under /sys/fs/cgroup/cpuset you can find a tree that corresponds to the +tree of the cpusets in the system. For instance, /sys/fs/cgroup/cpuset is the cpuset that holds the whole system. -If you want to create a new cpuset under /dev/cpuset: -# cd /dev/cpuset +If you want to create a new cpuset under /sys/fs/cgroup/cpuset: +# cd /sys/fs/cgroup/cpuset # mkdir my_cpuset Now you want to do something with this cpuset. @@ -765,12 +765,12 @@ wrapper around the cgroup filesystem. The command -mount -t cpuset X /dev/cpuset +mount -t cpuset X /sys/fs/cgroup/cpuset is equivalent to -mount -t cgroup -ocpuset,noprefix X /dev/cpuset -echo "/sbin/cpuset_release_agent" > /dev/cpuset/release_agent +mount -t cgroup -ocpuset,noprefix X /sys/fs/cgroup/cpuset +echo "/sbin/cpuset_release_agent" > /sys/fs/cgroup/cpuset/release_agent 2.2 Adding/removing cpus ------------------------ diff --git a/Documentation/cgroups/devices.txt b/Documentation/cgroups/devices.txt index 57ca4c89fe5c..16624a7f8222 100644 --- a/Documentation/cgroups/devices.txt +++ b/Documentation/cgroups/devices.txt @@ -22,16 +22,16 @@ removed from the child(ren). An entry is added using devices.allow, and removed using devices.deny. For instance - echo 'c 1:3 mr' > /cgroups/1/devices.allow + echo 'c 1:3 mr' > /sys/fs/cgroup/1/devices.allow allows cgroup 1 to read and mknod the device usually known as /dev/null. Doing - echo a > /cgroups/1/devices.deny + echo a > /sys/fs/cgroup/1/devices.deny will remove the default 'a *:* rwm' entry. Doing - echo a > /cgroups/1/devices.allow + echo a > /sys/fs/cgroup/1/devices.allow will add the 'a *:* rwm' entry to the whitelist. diff --git a/Documentation/cgroups/freezer-subsystem.txt b/Documentation/cgroups/freezer-subsystem.txt index 41f37fea1276..c21d77742a07 100644 --- a/Documentation/cgroups/freezer-subsystem.txt +++ b/Documentation/cgroups/freezer-subsystem.txt @@ -59,28 +59,28 @@ is non-freezable. * Examples of usage : - # mkdir /containers - # mount -t cgroup -ofreezer freezer /containers - # mkdir /containers/0 - # echo $some_pid > /containers/0/tasks + # mkdir /sys/fs/cgroup/freezer + # mount -t cgroup -ofreezer freezer /sys/fs/cgroup/freezer + # mkdir /sys/fs/cgroup/freezer/0 + # echo $some_pid > /sys/fs/cgroup/freezer/0/tasks to get status of the freezer subsystem : - # cat /containers/0/freezer.state + # cat /sys/fs/cgroup/freezer/0/freezer.state THAWED to freeze all tasks in the container : - # echo FROZEN > /containers/0/freezer.state - # cat /containers/0/freezer.state + # echo FROZEN > /sys/fs/cgroup/freezer/0/freezer.state + # cat /sys/fs/cgroup/freezer/0/freezer.state FREEZING - # cat /containers/0/freezer.state + # cat /sys/fs/cgroup/freezer/0/freezer.state FROZEN to unfreeze all tasks in the container : - # echo THAWED > /containers/0/freezer.state - # cat /containers/0/freezer.state + # echo THAWED > /sys/fs/cgroup/freezer/0/freezer.state + # cat /sys/fs/cgroup/freezer/0/freezer.state THAWED This is the basic mechanism which should do the right thing for user space task diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt index 510d64570d60..ffec2416aa74 100644 --- a/Documentation/cgroups/memory.txt +++ b/Documentation/cgroups/memory.txt @@ -264,16 +264,17 @@ b. Enable CONFIG_RESOURCE_COUNTERS c. Enable CONFIG_CGROUP_MEM_RES_CTLR d. Enable CONFIG_CGROUP_MEM_RES_CTLR_SWAP (to use swap extension) -1. Prepare the cgroups -# mkdir -p /cgroups -# mount -t cgroup none /cgroups -o memory +1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?) +# mount -t tmpfs none /sys/fs/cgroup +# mkdir /sys/fs/cgroup/memory +# mount -t cgroup none /sys/fs/cgroup/memory -o memory 2. Make the new group and move bash into it -# mkdir /cgroups/0 -# echo $$ > /cgroups/0/tasks +# mkdir /sys/fs/cgroup/memory/0 +# echo $$ > /sys/fs/cgroup/memory/0/tasks Since now we're in the 0 cgroup, we can alter the memory limit: -# echo 4M > /cgroups/0/memory.limit_in_bytes +# echo 4M > /sys/fs/cgroup/memory/0/memory.limit_in_bytes NOTE: We can use a suffix (k, K, m, M, g or G) to indicate values in kilo, mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes, Gibibytes.) @@ -281,11 +282,11 @@ mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes, Gibibytes.) NOTE: We can write "-1" to reset the *.limit_in_bytes(unlimited). NOTE: We cannot set limits on the root cgroup any more. -# cat /cgroups/0/memory.limit_in_bytes +# cat /sys/fs/cgroup/memory/0/memory.limit_in_bytes 4194304 We can check the usage: -# cat /cgroups/0/memory.usage_in_bytes +# cat /sys/fs/cgroup/memory/0/memory.usage_in_bytes 1216512 A successful write to this file does not guarantee a successful set of diff --git a/Documentation/scheduler/sched-design-CFS.txt b/Documentation/scheduler/sched-design-CFS.txt index 99961993257a..91ecff07cede 100644 --- a/Documentation/scheduler/sched-design-CFS.txt +++ b/Documentation/scheduler/sched-design-CFS.txt @@ -223,9 +223,10 @@ When CONFIG_FAIR_GROUP_SCHED is defined, a "cpu.shares" file is created for each group created using the pseudo filesystem. See example steps below to create task groups and modify their CPU share using the "cgroups" pseudo filesystem. - # mkdir /dev/cpuctl - # mount -t cgroup -ocpu none /dev/cpuctl - # cd /dev/cpuctl + # mount -t tmpfs cgroup_root /sys/fs/cgroup + # mkdir /sys/fs/cgroup/cpu + # mount -t cgroup -ocpu none /sys/fs/cgroup/cpu + # cd /sys/fs/cgroup/cpu # mkdir multimedia # create "multimedia" group of tasks # mkdir browser # create "browser" group of tasks diff --git a/Documentation/scheduler/sched-rt-group.txt b/Documentation/scheduler/sched-rt-group.txt index 605b0d40329d..71b54d549987 100644 --- a/Documentation/scheduler/sched-rt-group.txt +++ b/Documentation/scheduler/sched-rt-group.txt @@ -129,9 +129,8 @@ priority! Enabling CONFIG_RT_GROUP_SCHED lets you explicitly allocate real CPU bandwidth to task groups. -This uses the /cgroup virtual file system and -"/cgroup//cpu.rt_runtime_us" to control the CPU time reserved for each -control group. +This uses the cgroup virtual file system and "/cpu.rt_runtime_us" +to control the CPU time reserved for each control group. For more information on working with control groups, you should read Documentation/cgroups/cgroups.txt as well. @@ -150,7 +149,7 @@ For now, this can be simplified to just the following (but see Future plans): =============== There is work in progress to make the scheduling period for each group -("/cgroup//cpu.rt_period_us") configurable as well. +("/cpu.rt_period_us") configurable as well. The constraint on the period is that a subgroup must have a smaller or equal period to its parent. But realistically its not very useful _yet_ diff --git a/Documentation/vm/hwpoison.txt b/Documentation/vm/hwpoison.txt index 12f9ba20ccb7..550068466605 100644 --- a/Documentation/vm/hwpoison.txt +++ b/Documentation/vm/hwpoison.txt @@ -129,12 +129,12 @@ Limit injection to pages owned by memgroup. Specified by inode number of the memcg. Example: - mkdir /cgroup/hwpoison + mkdir /sys/fs/cgroup/mem/hwpoison usemem -m 100 -s 1000 & - echo `jobs -p` > /cgroup/hwpoison/tasks + echo `jobs -p` > /sys/fs/cgroup/mem/hwpoison/tasks - memcg_ino=$(ls -id /cgroup/hwpoison | cut -f1 -d' ') + memcg_ino=$(ls -id /sys/fs/cgroup/mem/hwpoison | cut -f1 -d' ') echo $memcg_ino > /debug/hwpoison/corrupt-filter-memcg page-types -p `pidof init` --hwpoison # shall do nothing -- cgit v1.2.1 From 67de0162fbb78713fcb23cb2502b380faa8bde73 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?J=C3=B6rg=20Sommer?= Date: Wed, 15 Jun 2011 13:00:47 -0700 Subject: Documentation: fix cgroup typos and formatting MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Fix format and spelling. Signed-off-by: Jörg Sommer Acked-by: Paul Menage Signed-off-by: Randy Dunlap Signed-off-by: Linus Torvalds --- Documentation/cgroups/blkio-controller.txt | 2 +- Documentation/cgroups/cgroups.txt | 2 +- Documentation/cgroups/memory.txt | 22 +++++++++++----------- 3 files changed, 13 insertions(+), 13 deletions(-) (limited to 'Documentation') diff --git a/Documentation/cgroups/blkio-controller.txt b/Documentation/cgroups/blkio-controller.txt index b1b1bfadc9c0..cd45c8ea7463 100644 --- a/Documentation/cgroups/blkio-controller.txt +++ b/Documentation/cgroups/blkio-controller.txt @@ -111,7 +111,7 @@ Hierarchical Cgroups CFQ and throttling will practically treat all groups at same level. pivot - / | \ \ + / / \ \ root test1 test2 test3 Down the line we can implement hierarchical accounting/control support diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt index 15bca101ff62..cd67e90003c0 100644 --- a/Documentation/cgroups/cgroups.txt +++ b/Documentation/cgroups/cgroups.txt @@ -142,7 +142,7 @@ and depending on who is launching the browser he can With only a single hierarchy, he now would potentially have to create a separate cgroup for every browser launched and associate it with -approp network and other resource class. This may lead to +appropriate network and other resource class. This may lead to proliferation of such cgroups. Also lets say that the administrator would like to give enhanced network diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt index ffec2416aa74..06eb6d957c83 100644 --- a/Documentation/cgroups/memory.txt +++ b/Documentation/cgroups/memory.txt @@ -1,8 +1,8 @@ Memory Resource Controller -NOTE: The Memory Resource Controller has been generically been referred - to as the memory controller in this document. Do not confuse memory - controller used here with the memory controller that is used in hardware. +NOTE: The Memory Resource Controller has generically been referred to as the + memory controller in this document. Do not confuse memory controller + used here with the memory controller that is used in hardware. (For editors) In this document: @@ -182,7 +182,7 @@ behind this approach is that a cgroup that aggressively uses a shared page will eventually get charged for it (once it is uncharged from the cgroup that brought it in -- this will happen on memory pressure). -Exception: If CONFIG_CGROUP_CGROUP_MEM_RES_CTLR_SWAP is not used.. +Exception: If CONFIG_CGROUP_CGROUP_MEM_RES_CTLR_SWAP is not used. When you do swapoff and make swapped-out pages of shmem(tmpfs) to be backed into memory in force, charges for pages are accounted against the caller of swapoff rather than the users of shmem. @@ -214,7 +214,7 @@ affecting global LRU, memory+swap limit is better than just limiting swap from OS point of view. * What happens when a cgroup hits memory.memsw.limit_in_bytes -When a cgroup his memory.memsw.limit_in_bytes, it's useless to do swap-out +When a cgroup hits memory.memsw.limit_in_bytes, it's useless to do swap-out in this cgroup. Then, swap-out will not be done by cgroup routine and file caches are dropped. But as mentioned above, global LRU can do swapout memory from it for sanity of the system's memory management state. You can't forbid @@ -491,13 +491,13 @@ The hierarchy is created by creating the appropriate cgroups in the cgroup filesystem. Consider for example, the following cgroup filesystem hierarchy - root + root / | \ - / | \ - a b c - | \ - | \ - d e + / | \ + a b c + | \ + | \ + d e In the diagram above, with hierarchical accounting enabled, all memory usage of e, is accounted to its ancestors up until the root (i.e, c and root), -- cgit v1.2.1