From 79e2e7597de5e36a2f7f0dca13011f46ee37c809 Mon Sep 17 00:00:00 2001 From: Michael Witten Date: Sun, 16 Jan 2011 21:43:10 +0000 Subject: Kconfig: Typo: seti -> set Also, I introduced some punctuation to facilitate reading. Signed-off-by: Michael Witten Signed-off-by: Jiri Kosina --- init/Kconfig | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'init') diff --git a/init/Kconfig b/init/Kconfig index 4f6cdbf523eb..9715b228a998 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -743,8 +743,8 @@ config BLK_CGROUP This option only enables generic Block IO controller infrastructure. One needs to also enable actual IO controlling logic/policy. For - enabling proportional weight division of disk bandwidth in CFQ seti - CONFIG_CFQ_GROUP_IOSCHED=y and for enabling throttling policy set + enabling proportional weight division of disk bandwidth in CFQ, set + CONFIG_CFQ_GROUP_IOSCHED=y; for enabling throttling policy, set CONFIG_BLK_THROTTLE=y. See Documentation/cgroups/blkio-controller.txt for more information. -- cgit v1.2.1 From c5e0591ae3a13adfe2d6d685ba9cc6d84d0e58df Mon Sep 17 00:00:00 2001 From: Michael Witten Date: Mon, 17 Jan 2011 00:08:41 +0000 Subject: Kconfig: BLK_THROTTLE -> BLK_DEV_THROTTLING It would seem that `CONFIG_BLK_THROTTLE' doesn't exist, as it is only referenced in the documentation for `CONFIG_BLK_CGROUP'. The only other choice is `CONFIG_BLK_DEV_THROTTLING': $ git grep --cached THROTTL -- \*Kconfig block/Kconfig:config BLK_DEV_THROTTLING init/Kconfig: CONFIG_BLK_THROTTLE=y. Signed-off-by: Michael Witten Signed-off-by: Jiri Kosina --- init/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'init') diff --git a/init/Kconfig b/init/Kconfig index 9715b228a998..cb0b2051bbb4 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -745,7 +745,7 @@ config BLK_CGROUP One needs to also enable actual IO controlling logic/policy. For enabling proportional weight division of disk bandwidth in CFQ, set CONFIG_CFQ_GROUP_IOSCHED=y; for enabling throttling policy, set - CONFIG_BLK_THROTTLE=y. + CONFIG_BLK_DEV_THROTTLING=y. See Documentation/cgroups/blkio-controller.txt for more information. -- cgit v1.2.1 From 5d6a4ea576bb8d9583d1d12538403661f7d26e31 Mon Sep 17 00:00:00 2001 From: Ferenc Wagner Date: Mon, 10 Jan 2011 19:04:22 +0100 Subject: sysfs: Capitalize description of SYSFS_DEPRECATED{_V2} options Signed-off-by: Ferenc Wagner Signed-off-by: Greg Kroah-Hartman --- init/Kconfig | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'init') diff --git a/init/Kconfig b/init/Kconfig index be788c0957d4..e72fd101039e 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -814,7 +814,7 @@ config MM_OWNER bool config SYSFS_DEPRECATED - bool "enable deprecated sysfs features to support old userspace tools" + bool "Enable deprecated sysfs features to support old userspace tools" depends on SYSFS default n help @@ -837,7 +837,7 @@ config SYSFS_DEPRECATED need to say Y here. config SYSFS_DEPRECATED_V2 - bool "enabled deprecated sysfs features by default" + bool "Enable deprecated sysfs features by default" default n depends on SYSFS depends on SYSFS_DEPRECATED -- cgit v1.2.1 From e5d1367f17ba6a6fed5fd8b74e4d5720923e0c25 Mon Sep 17 00:00:00 2001 From: Stephane Eranian Date: Mon, 14 Feb 2011 11:20:01 +0200 Subject: perf: Add cgroup support This kernel patch adds the ability to filter monitoring based on container groups (cgroups). This is for use in per-cpu mode only. The cgroup to monitor is passed as a file descriptor in the pid argument to the syscall. 
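For context, a fuller user-space sketch of the usage this commit message goes on to describe (and of the short snippet quoted further below). It is an illustration only, not part of the patch: it assumes glibc offers no perf_event_open() wrapper (hence the raw syscall), and the counted event, the /cgroup mount point and the cgroup name "foo" are simply the example values used in this message.

    /* Hedged sketch: count CPU cycles for cgroup "foo" on CPU 1. */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sys/syscall.h>
    #include <linux/perf_event.h>

    static int perf_event_open(struct perf_event_attr *attr, pid_t pid,
                               int cpu, int group_fd, unsigned long flags)
    {
            return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
    }

    int main(void)
    {
            struct perf_event_attr attr;
            uint64_t count;
            int cgroup_fd, fd;

            memset(&attr, 0, sizeof(attr));
            attr.size = sizeof(attr);
            attr.type = PERF_TYPE_HARDWARE;
            attr.config = PERF_COUNT_HW_CPU_CYCLES;

            /* In cgroup mode the "pid" argument carries the cgroup dir fd. */
            cgroup_fd = open("/cgroup/foo", O_RDONLY);
            fd = perf_event_open(&attr, cgroup_fd, 1 /* CPU 1 */, -1,
                                 PERF_FLAG_PID_CGROUP);
            close(cgroup_fd);
            if (fd < 0) {
                    perror("perf_event_open");
                    return 1;
            }

            sleep(1);               /* let the counter accumulate */
            if (read(fd, &count, sizeof(count)) == sizeof(count))
                    printf("cycles in cgroup foo on CPU1: %llu\n",
                           (unsigned long long)count);
            close(fd);
            return 0;
    }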
The file descriptor must be opened to the cgroup name in the cgroup filesystem. For instance, if the cgroup name is foo and cgroupfs is mounted in /cgroup, then the file descriptor is opened to /cgroup/foo. Cgroup mode is activated by passing PERF_FLAG_PID_CGROUP in the flags argument to the syscall. For instance to measure in cgroup foo on CPU1 assuming cgroupfs is mounted under /cgroup: struct perf_event_attr attr; int cgroup_fd, fd; cgroup_fd = open("/cgroup/foo", O_RDONLY); fd = perf_event_open(&attr, cgroup_fd, 1, -1, PERF_FLAG_PID_CGROUP); close(cgroup_fd); Signed-off-by: Stephane Eranian [ added perf_cgroup_{exit,attach} ] Signed-off-by: Peter Zijlstra LKML-Reference: <4d590250.114ddf0a.689e.4482@mx.google.com> Signed-off-by: Ingo Molnar --- init/Kconfig | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'init') diff --git a/init/Kconfig b/init/Kconfig index be788c0957d4..20d6bd919b8d 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -683,6 +683,16 @@ config CGROUP_MEM_RES_CTLR_SWAP_ENABLED select this option (if, for some reason, they need to disable it then noswapaccount does the trick). +config CGROUP_PERF + bool "Enable perf_event per-cpu per-container group (cgroup) monitoring" + depends on PERF_EVENTS && CGROUPS + help + This option extends the per-cpu mode to restrict monitoring to + threads which belong to the cgroup specificied and run on the + designated cpu. + + Say N if unsure. + menuconfig CGROUP_SCHED bool "Group CPU scheduler" depends on EXPERIMENTAL -- cgit v1.2.1 From 2d0f25201ee210a0666ec9c41538ba05a07f8bc6 Mon Sep 17 00:00:00 2001 From: Li Zefan Date: Thu, 3 Mar 2011 14:26:20 +0800 Subject: perf cgroup: Fix a typo in kernel config s/specificied/specified Signed-off-by: Li Zefan Acked-by: Stephane Eranian Signed-off-by: Peter Zijlstra LKML-Reference: <4D6F348C.2050804@cn.fujitsu.com> Signed-off-by: Ingo Molnar --- init/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'init') diff --git a/init/Kconfig b/init/Kconfig index 20d6bd919b8d..4c4edf2ec4a9 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -688,7 +688,7 @@ config CGROUP_PERF depends on PERF_EVENTS && CGROUPS help This option extends the per-cpu mode to restrict monitoring to - threads which belong to the cgroup specificied and run on the + threads which belong to the cgroup specified and run on the designated cpu. Say N if unsure. -- cgit v1.2.1 From 4ba8216cd90560bc402f52076f64d8546e8aefcb Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Tue, 25 Jan 2011 22:52:22 +0100 Subject: BKL: That's all, folks This removes the implementation of the big kernel lock, at last. A lot of people have worked on this in the past, I so the credit for this patch should be with everyone who participated in the hunt. The names on the Cc list are the people that were the most active in this, according to the recorded git history, in alphabetical order. Signed-off-by: Arnd Bergmann Acked-by: Alan Cox Cc: Alessio Igor Bogani Cc: Al Viro Cc: Andrew Hendry Cc: Andrew Morton Cc: Christoph Hellwig Cc: Eric W. 
Biederman Cc: Frederic Weisbecker Cc: Hans Verkuil Acked-by: Ingo Molnar Cc: Jan Blunck Cc: John Kacur Cc: Jonathan Corbet Cc: Linus Torvalds Cc: Matthew Wilcox Cc: Oliver Neukum Cc: Paul Menage Acked-by: Thomas Gleixner Cc: Trond Myklebust --- init/Kconfig | 5 ----- 1 file changed, 5 deletions(-) (limited to 'init') diff --git a/init/Kconfig b/init/Kconfig index be788c0957d4..a88d1c919a4d 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -69,11 +69,6 @@ config BROKEN_ON_SMP depends on BROKEN || !SMP default y -config LOCK_KERNEL - bool - depends on (SMP || PREEMPT) && BKL - default y - config INIT_ENV_ARG_LIMIT int default 32 if !UML -- cgit v1.2.1 From 990d6c2d7aee921e3bce22b2d6a750fd552262be Mon Sep 17 00:00:00 2001 From: "Aneesh Kumar K.V" Date: Sat, 29 Jan 2011 18:43:26 +0530 Subject: vfs: Add name to file handle conversion support The syscall also returns a mount ID, which can be used to look up file system specific information such as uuid in /proc//mountinfo Signed-off-by: Aneesh Kumar K.V Signed-off-by: Al Viro --- init/Kconfig | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'init') diff --git a/init/Kconfig b/init/Kconfig index be788c0957d4..e72fa17fe559 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -287,6 +287,18 @@ config BSD_PROCESS_ACCT_V3 for processing it. A preliminary version of these tools is available at . +config FHANDLE + bool "open by fhandle syscalls" + select EXPORTFS + help + If you say Y here, a user level program will be able to map + file names to handle and then later use the handle for + different file system operations. This is useful in implementing + userspace file servers, which now track files using handles instead + of names. The handle would remain the same even if file names + get renamed. Enables open_by_handle_at(2) and name_to_handle_at(2) + syscalls. + config TASKSTATS bool "Export task/process statistics through netlink (EXPERIMENTAL)" depends on NET -- cgit v1.2.1 From 80cdc6dae76ea67d2b21bdca8df17ef47251eb8b Mon Sep 17 00:00:00 2001 From: Mandeep Singh Baines Date: Tue, 22 Mar 2011 16:33:54 -0700 Subject: fs: use appropriate printk priority levels printk()s without a priority level default to KERN_WARNING. To reduce noise at KERN_WARNING, this patch sets the priority level appropriately for unleveled printk()s. This should be useful to folks that look at dmesg warnings closely. Signed-off-by: Mandeep Singh Baines Cc: Jens Axboe Cc: Al Viro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- init/do_mounts.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'init') diff --git a/init/do_mounts.c b/init/do_mounts.c index 2b54bef33b55..3e0112157795 100644 --- a/init/do_mounts.c +++ b/init/do_mounts.c @@ -293,7 +293,8 @@ static int __init do_mount_root(char *name, char *fs, int flags, void *data) sys_chdir((const char __user __force *)"/root"); ROOT_DEV = current->fs->pwd.mnt->mnt_sb->s_dev; - printk("VFS: Mounted root (%s filesystem)%s on device %u:%u.\n", + printk(KERN_INFO + "VFS: Mounted root (%s filesystem)%s on device %u:%u.\n", current->fs->pwd.mnt->mnt_sb->s_type->name, current->fs->pwd.mnt->mnt_sb->s_flags & MS_RDONLY ?
" readonly" : "", MAJOR(ROOT_DEV), MINOR(ROOT_DEV)); -- cgit v1.2.1 From 34db18a054c600b6f81787165669dc572fe4de25 Mon Sep 17 00:00:00 2001 From: Amerigo Wang Date: Tue, 22 Mar 2011 16:34:06 -0700 Subject: smp: move smp setup functions to kernel/smp.c Move setup_nr_cpu_ids(), smp_init() and some other SMP boot parameter setup functions from init/main.c to kenrel/smp.c, saves some #ifdef CONFIG_SMP. Signed-off-by: WANG Cong Cc: Rakib Mullick Cc: David Howells Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Tejun Heo Cc: Arnd Bergmann Cc: Akinobu Mita Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- init/main.c | 90 +------------------------------------------------------------ 1 file changed, 1 insertion(+), 89 deletions(-) (limited to 'init') diff --git a/init/main.c b/init/main.c index 33c37c379e96..3627bb37225c 100644 --- a/init/main.c +++ b/init/main.c @@ -129,63 +129,6 @@ static char *static_command_line; static char *execute_command; static char *ramdisk_execute_command; -#ifdef CONFIG_SMP -/* Setup configured maximum number of CPUs to activate */ -unsigned int setup_max_cpus = NR_CPUS; -EXPORT_SYMBOL(setup_max_cpus); - - -/* - * Setup routine for controlling SMP activation - * - * Command-line option of "nosmp" or "maxcpus=0" will disable SMP - * activation entirely (the MPS table probe still happens, though). - * - * Command-line option of "maxcpus=", where is an integer - * greater than 0, limits the maximum number of CPUs activated in - * SMP mode to . - */ - -void __weak arch_disable_smp_support(void) { } - -static int __init nosmp(char *str) -{ - setup_max_cpus = 0; - arch_disable_smp_support(); - - return 0; -} - -early_param("nosmp", nosmp); - -/* this is hard limit */ -static int __init nrcpus(char *str) -{ - int nr_cpus; - - get_option(&str, &nr_cpus); - if (nr_cpus > 0 && nr_cpus < nr_cpu_ids) - nr_cpu_ids = nr_cpus; - - return 0; -} - -early_param("nr_cpus", nrcpus); - -static int __init maxcpus(char *str) -{ - get_option(&str, &setup_max_cpus); - if (setup_max_cpus == 0) - arch_disable_smp_support(); - - return 0; -} - -early_param("maxcpus", maxcpus); -#else -static const unsigned int setup_max_cpus = NR_CPUS; -#endif - /* * If set, this is an indication to the drivers that reset the underlying * device before going ahead with the initialization otherwise driver might @@ -362,7 +305,7 @@ static int __init rdinit_setup(char *str) __setup("rdinit=", rdinit_setup); #ifndef CONFIG_SMP - +static const unsigned int setup_max_cpus = NR_CPUS; #ifdef CONFIG_X86_LOCAL_APIC static void __init smp_init(void) { @@ -374,37 +317,6 @@ static void __init smp_init(void) static inline void setup_nr_cpu_ids(void) { } static inline void smp_prepare_cpus(unsigned int maxcpus) { } - -#else - -/* Setup number of possible processor ids */ -int nr_cpu_ids __read_mostly = NR_CPUS; -EXPORT_SYMBOL(nr_cpu_ids); - -/* An arch may set nr_cpu_ids earlier if needed, so this would be redundant */ -static void __init setup_nr_cpu_ids(void) -{ - nr_cpu_ids = find_last_bit(cpumask_bits(cpu_possible_mask),NR_CPUS) + 1; -} - -/* Called by boot processor to activate the rest. 
*/ -static void __init smp_init(void) -{ - unsigned int cpu; - - /* FIXME: This should be done in userspace --RR */ - for_each_present_cpu(cpu) { - if (num_online_cpus() >= setup_max_cpus) - break; - if (!cpu_online(cpu)) - cpu_up(cpu); - } - - /* Any cleanup work */ - printk(KERN_INFO "Brought up %ld CPUs\n", (long)num_online_cpus()); - smp_cpus_done(setup_max_cpus); -} - #endif /* -- cgit v1.2.1 From 71c696b1d0310da3ab8033d743282959bd49d28b Mon Sep 17 00:00:00 2001 From: Phil Carmody Date: Tue, 22 Mar 2011 16:34:12 -0700 Subject: calibrate: extract fall-back calculation into own helper The motivation for this patch series is that currently our OMAP calibrates itself using the trial-and-error binary chop fallback that some other architectures no longer need to perform. This is a lengthy process, taking 0.2s in an environment where boot time is of great interest. Patch 2/4 has two optimisations. Firstly, it replaces the initial repeated- doubling to find the relevant power of 2 with a tight loop that just does as much as it can in a jiffy. Secondly, it doesn't binary chop over an entire power of 2 range, it choses a much smaller range based on how much it squeezed in, and failed to squeeze in, during the first stage. Both are significant optimisations, and bring our calibration down from 23 jiffies to 5, and, in the process, often arrive at a more accurate lpj value. The 'bands' and 'sub-logarithmic' growth may look over-engineered, but they only cost a small level of inaccuracy in the initial guess (for all architectures) in order to avoid the very large inaccuracies that appeared during testing (on x86_64 architectures, and presumably others with less metronomic operation). Note that due to the existence of the TSC and other timers, the x86_64 will not typically use this fallback routine, but I wanted to code defensively, able to cope with all kinds of processor behaviours and kernel command line options. Patch 3/4 is an additional trap for the nightmare scenario where the initial estimate is very inaccurate, possibly due to things like SMIs. It simply retries with a larger bound. Stephen said: I tried this patch set out on an MSM7630. : : Before: : : Calibrating delay loop... 681.57 BogoMIPS (lpj=3407872) : : After: : : Calibrating delay loop... 680.75 BogoMIPS (lpj=3403776) : : But the really good news is calibration time dropped from ~247ms to ~56ms. : Sadly we won't be able to benefit from this should my udelay patches make : it into ARM because we would be using calibrate_delay_direct() instead (at : least on machines who choose to). Can we somehow reapply the logic behind : this to calibrate_delay_direct()? That would be even better, but this is : definitely a boot time improvement. : : Or maybe we could just replace calibrate_delay_direct() with this fallback : calculation? If __delay() is a thin wrapper around read_current_timer() : it should work just as well (plus patch 3 makes it handle SMIs). I'll try : that out. This patch: ... so that it can be modified more clinically. This is almost entirely cosmetic. The only change to the operation is that the global variable is only set once after the estimation is completed, rather than taking on all the intermediate values. However, there are no readers of that variable, so this change is unimportant. Signed-off-by: Phil Carmody Cc: Ingo Molnar Cc: Thomas Gleixner Cc: "H. 
Peter Anvin" Tested-by: Stephen Boyd Cc: Greg KH Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- init/calibrate.c | 73 +++++++++++++++++++++++++++++++------------------------- 1 file changed, 40 insertions(+), 33 deletions(-) (limited to 'init') diff --git a/init/calibrate.c b/init/calibrate.c index 24fe022c55f9..b71643a7acae 100644 --- a/init/calibrate.c +++ b/init/calibrate.c @@ -119,10 +119,47 @@ static unsigned long __cpuinit calibrate_delay_direct(void) {return 0;} */ #define LPS_PREC 8 -void __cpuinit calibrate_delay(void) +static unsigned long __cpuinit calibrate_delay_converge(void) { - unsigned long ticks, loopbit; + unsigned long lpj, ticks, loopbit; int lps_precision = LPS_PREC; + + lpj = (1<<12); + while ((lpj <<= 1) != 0) { + /* wait for "start of" clock tick */ + ticks = jiffies; + while (ticks == jiffies) + /* nothing */; + /* Go .. */ + ticks = jiffies; + __delay(lpj); + ticks = jiffies - ticks; + if (ticks) + break; + } + + /* + * Do a binary approximation to get lpj set to + * equal one clock (up to lps_precision bits) + */ + lpj >>= 1; + loopbit = lpj; + while (lps_precision-- && (loopbit >>= 1)) { + lpj |= loopbit; + ticks = jiffies; + while (ticks == jiffies) + /* nothing */; + ticks = jiffies; + __delay(lpj); + if (jiffies != ticks) /* longer than 1 tick */ + lpj &= ~loopbit; + } + + return lpj; +} + +void __cpuinit calibrate_delay(void) +{ static bool printed; if (preset_lpj) { @@ -139,39 +176,9 @@ void __cpuinit calibrate_delay(void) pr_info("Calibrating delay using timer " "specific routine.. "); } else { - loops_per_jiffy = (1<<12); - if (!printed) pr_info("Calibrating delay loop... "); - while ((loops_per_jiffy <<= 1) != 0) { - /* wait for "start of" clock tick */ - ticks = jiffies; - while (ticks == jiffies) - /* nothing */; - /* Go .. */ - ticks = jiffies; - __delay(loops_per_jiffy); - ticks = jiffies - ticks; - if (ticks) - break; - } - - /* - * Do a binary approximation to get loops_per_jiffy set to - * equal one clock (up to lps_precision bits) - */ - loops_per_jiffy >>= 1; - loopbit = loops_per_jiffy; - while (lps_precision-- && (loopbit >>= 1)) { - loops_per_jiffy |= loopbit; - ticks = jiffies; - while (ticks == jiffies) - /* nothing */; - ticks = jiffies; - __delay(loops_per_jiffy); - if (jiffies != ticks) /* longer than 1 tick */ - loops_per_jiffy &= ~loopbit; - } + loops_per_jiffy = calibrate_delay_converge(); } if (!printed) pr_cont("%lu.%02lu BogoMIPS (lpj=%lu)\n", -- cgit v1.2.1 From 191e56880a6a638ce931859317f37deb084b6433 Mon Sep 17 00:00:00 2001 From: Phil Carmody Date: Tue, 22 Mar 2011 16:34:13 -0700 Subject: calibrate: home in on correct lpj value more quickly Binary chop with a jiffy-resync on each step to find an upper bound is slow, so just race in a tight-ish loop to find an underestimate. If done with lots of individual steps, sometimes several hundreds of iterations would be required, which would impose a significant overhead, and make the initial estimate very low. By taking slowly increasing steps there will be less overhead. E.g. an x86_64 2.67GHz could have fitted in 613 individual small delays, but in reality should have been able to fit in a single delay 644 times longer, so underestimated by 31 steps. To reach the equivalent of 644 small delays with the accelerating scheme now requires about 130 iterations, so has <1/4th of the overhead, and can therefore be expected to underestimate by only 7 steps. As now we have a better initial estimate we can binary chop over a smaller range. 
With the loop overhead in the initial estimate kept low, and the step sizes moderate, we won't have under-estimated by much, so chose as tight a range as we can. Signed-off-by: Phil Carmody Cc: Ingo Molnar Cc: Thomas Gleixner Cc: "H. Peter Anvin" Tested-by: Stephen Boyd Cc: Greg KH Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- init/calibrate.c | 57 +++++++++++++++++++++++++++++++++----------------------- 1 file changed, 34 insertions(+), 23 deletions(-) (limited to 'init') diff --git a/init/calibrate.c b/init/calibrate.c index b71643a7acae..f9000dfbe227 100644 --- a/init/calibrate.c +++ b/init/calibrate.c @@ -110,8 +110,8 @@ static unsigned long __cpuinit calibrate_delay_direct(void) {return 0;} /* * This is the number of bits of precision for the loops_per_jiffy. Each - * bit takes on average 1.5/HZ seconds. This (like the original) is a little - * better than 1% + * time we refine our estimate after the first takes 1.5/HZ seconds, so try + * to start with a good estimate. * For the boot cpu we can skip the delay calibration and assign it a value * calculated based on the timer frequency. * For the rest of the CPUs we cannot assume that the timer frequency is same as @@ -121,38 +121,49 @@ static unsigned long __cpuinit calibrate_delay_direct(void) {return 0;} static unsigned long __cpuinit calibrate_delay_converge(void) { - unsigned long lpj, ticks, loopbit; - int lps_precision = LPS_PREC; + /* First stage - slowly accelerate to find initial bounds */ + unsigned long lpj, ticks, loopadd, chop_limit; + int trials = 0, band = 0, trial_in_band = 0; lpj = (1<<12); - while ((lpj <<= 1) != 0) { - /* wait for "start of" clock tick */ - ticks = jiffies; - while (ticks == jiffies) - /* nothing */; - /* Go .. */ - ticks = jiffies; - __delay(lpj); - ticks = jiffies - ticks; - if (ticks) - break; - } + + /* wait for "start of" clock tick */ + ticks = jiffies; + while (ticks == jiffies) + ; /* nothing */ + /* Go .. */ + ticks = jiffies; + do { + if (++trial_in_band == (1<> (LPS_PREC + 1); /* * Do a binary approximation to get lpj set to - * equal one clock (up to lps_precision bits) + * equal one clock (up to LPS_PREC bits) */ - lpj >>= 1; - loopbit = lpj; - while (lps_precision-- && (loopbit >>= 1)) { - lpj |= loopbit; + while (loopadd > chop_limit) { + lpj += loopadd; ticks = jiffies; while (ticks == jiffies) - /* nothing */; + ; /* nothing */ ticks = jiffies; __delay(lpj); if (jiffies != ticks) /* longer than 1 tick */ - lpj &= ~loopbit; + lpj -= loopadd; + loopadd >>= 1; } return lpj; -- cgit v1.2.1 From b1b5f65e53af770ede22c113e249de2f6fa53706 Mon Sep 17 00:00:00 2001 From: Phil Carmody Date: Tue, 22 Mar 2011 16:34:15 -0700 Subject: calibrate: retry with wider bounds when converge seems to fail Systems with unmaskable interrupts such as SMIs may massively underestimate loops_per_jiffy, and fail to converge anywhere near the real value. A case seen on x86_64 was an initial estimate of 256<<12, which converged to 511<<12 where the real value should have been over 630<<12. This admitedly requires bypassing the TSC calibration (lpj_fine), and a failure to settle in the direct calibration too, but is physically possible. This failure does not depend on my previous calibration optimisation, but by luck is easy to fix with the optimisation in place with a trivial retry loop. In the context of the optimised converging method, as we can no longer trust the starting estimate, enlarge the search bounds exponentially so that the number of retries is logarithmically bounded. 
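Taken together, the three calibrate patches can be illustrated with a small user-space simulation of the convergence scheme: an accelerating first stage to bracket the value, a binary chop down to LPS_PREC bits, and the retry from the hunk below when every chop step was kept. This is a hedged sketch, not the kernel code: jiffies and __delay() are faked with a software counter, TRUE_LPJ and LPS_PREC are arbitrary demo values, and the first-stage loop follows the description in these commit messages. Run, it should report an lpj estimate just below TRUE_LPJ.

    /* Hedged user-space simulation of the calibrate convergence scheme. */
    #include <stdio.h>

    #define LPS_PREC 8
    #define TRUE_LPJ 3403776UL              /* value the search should find */

    static unsigned long now;               /* fake time, in delay-loop units */

    static unsigned long jiffies(void) { return now / TRUE_LPJ; }
    static void __delay(unsigned long loops) { now += loops; }
    static void wait_for_tick(void) { now = (jiffies() + 1) * TRUE_LPJ; }

    static unsigned long calibrate_delay_converge(void)
    {
            /* First stage - slowly accelerate to find initial bounds */
            unsigned long lpj, lpj_base, ticks, loopadd, loopadd_base, chop_limit;
            int trials = 0, band = 0, trial_in_band = 0;

            lpj = (1 << 12);
            wait_for_tick();
            ticks = jiffies();
            do {                            /* band n is tried 2^n times */
                    if (++trial_in_band == (1 << band)) {
                            ++band;
                            trial_in_band = 0;
                    }
                    __delay(lpj * band);
                    trials += band;
            } while (ticks == jiffies());
            /* retreat to an underestimate and derive the chop bounds */
            trials -= band;
            loopadd_base = lpj * band;
            lpj_base = lpj * trials;

    recalibrate:
            lpj = lpj_base;
            loopadd = loopadd_base;
            chop_limit = lpj >> LPS_PREC;

            /* Second stage - binary approximation, up to LPS_PREC bits */
            while (loopadd > chop_limit) {
                    lpj += loopadd;
                    wait_for_tick();
                    ticks = jiffies();
                    __delay(lpj);
                    if (jiffies() != ticks) /* longer than 1 tick: back off */
                            lpj -= loopadd;
                    loopadd >>= 1;
            }
            /* every step was kept: start was far too low, retry wider */
            if (lpj + loopadd * 2 == lpj_base + loopadd_base * 2) {
                    lpj_base = lpj;
                    loopadd_base <<= 2;
                    goto recalibrate;
            }
            return lpj;
    }

    int main(void)
    {
            printf("estimated lpj=%lu (true lpj=%lu)\n",
                   calibrate_delay_converge(), TRUE_LPJ);
            return 0;
    }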
[akpm@linux-foundation.org: mention x86_64 SMIs in comment] Signed-off-by: Phil Carmody Cc: Ingo Molnar Cc: Thomas Gleixner Cc: "H. Peter Anvin" Tested-by: Stephen Boyd Cc: Greg KH Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- init/calibrate.c | 22 ++++++++++++++++++---- 1 file changed, 18 insertions(+), 4 deletions(-) (limited to 'init') diff --git a/init/calibrate.c b/init/calibrate.c index f9000dfbe227..76ac9194cbc4 100644 --- a/init/calibrate.c +++ b/init/calibrate.c @@ -122,7 +122,7 @@ static unsigned long __cpuinit calibrate_delay_direct(void) {return 0;} static unsigned long __cpuinit calibrate_delay_converge(void) { /* First stage - slowly accelerate to find initial bounds */ - unsigned long lpj, ticks, loopadd, chop_limit; + unsigned long lpj, lpj_base, ticks, loopadd, loopadd_base, chop_limit; int trials = 0, band = 0, trial_in_band = 0; lpj = (1<<12); @@ -146,14 +146,18 @@ static unsigned long __cpuinit calibrate_delay_converge(void) * the largest likely undershoot. This defines our chop bounds. */ trials -= band; - loopadd = lpj * band; - lpj *= trials; - chop_limit = lpj >> (LPS_PREC + 1); + loopadd_base = lpj * band; + lpj_base = lpj * trials; + +recalibrate: + lpj = lpj_base; + loopadd = loopadd_base; /* * Do a binary approximation to get lpj set to * equal one clock (up to LPS_PREC bits) */ + chop_limit = lpj >> LPS_PREC; while (loopadd > chop_limit) { lpj += loopadd; ticks = jiffies; @@ -165,6 +169,16 @@ static unsigned long __cpuinit calibrate_delay_converge(void) lpj -= loopadd; loopadd >>= 1; } + /* + * If we incremented every single time possible, presume we've + * massively underestimated initially, and retry with a higher + * start, and larger range. (Only seen on x86_64, due to SMIs) + */ + if (lpj + loopadd * 2 == lpj_base + loopadd_base * 2) { + lpj_base = lpj; + loopadd_base <<= 2; + goto recalibrate; + } return lpj; } -- cgit v1.2.1 From ea611b2699b51a762ef03f805f9616e65d98f68e Mon Sep 17 00:00:00 2001 From: Davidlohr Bueso Date: Tue, 22 Mar 2011 16:34:49 -0700 Subject: init: return proper error code in do_mounts_rd() In do_mounts_rd() if memory cannot be allocated, return -ENOMEM. Signed-off-by: Davidlohr Bueso Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- init/do_mounts_rd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'init') diff --git a/init/do_mounts_rd.c b/init/do_mounts_rd.c index 6e1ee6987c78..fe9acb0ae480 100644 --- a/init/do_mounts_rd.c +++ b/init/do_mounts_rd.c @@ -64,7 +64,7 @@ identify_ramdisk_image(int fd, int start_block, decompress_fn *decompressor) buf = kmalloc(size, GFP_KERNEL); if (!buf) - return -1; + return -ENOMEM; minixsb = (struct minix_super_block *) buf; ext2sb = (struct ext2_super_block *) buf; -- cgit v1.2.1 From 45a68628d37222e655219febce9e91b6484789b2 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Wed, 23 Mar 2011 16:43:12 -0700 Subject: pid: remove the child_reaper special case in init/main.c This patchset is a cleanup and a preparation to unshare the pid namespace. These prerequisites prepare for Eric's patchset to give a file descriptor to a namespace and join an existing namespace. This patch: It turns out that the existing assignment in copy_process of the child_reaper can handle the initial assignment of child_reaper we just need to generalize the test in kernel/fork.c Signed-off-by: Eric W. Biederman Signed-off-by: Daniel Lezcano Cc: Oleg Nesterov Cc: Alexey Dobriyan Acked-by: Serge E. 
Hallyn Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- init/main.c | 9 --------- 1 file changed, 9 deletions(-) (limited to 'init') diff --git a/init/main.c b/init/main.c index 3627bb37225c..4a9479ef4540 100644 --- a/init/main.c +++ b/init/main.c @@ -787,15 +787,6 @@ static int __init kernel_init(void * unused) * init can run on any cpu. */ set_cpus_allowed_ptr(current, cpu_all_mask); - /* - * Tell the world that we're going to be the grim - * reaper of innocent orphaned children. - * - * We don't want people to have to make incorrect - * assumptions about where in the task array this - * can be found. - */ - init_pid_ns.child_reaper = current; cad_pid = task_pid(current); -- cgit v1.2.1 From 59607db367c57f515183cb203642291bb14d9c40 Mon Sep 17 00:00:00 2001 From: "Serge E. Hallyn" Date: Wed, 23 Mar 2011 16:43:16 -0700 Subject: userns: add a user_namespace as creator/owner of uts_namespace The expected course of development for user namespaces targeted capabilities is laid out at https://wiki.ubuntu.com/UserNamespace. Goals: - Make it safe for an unprivileged user to unshare namespaces. They will be privileged with respect to the new namespace, but this should only include resources which the unprivileged user already owns. - Provide separate limits and accounting for userids in different namespaces. Status: Currently (as of 2.6.38) you can clone with the CLONE_NEWUSER flag to get a new user namespace if you have the CAP_SYS_ADMIN, CAP_SETUID, and CAP_SETGID capabilities. What this gets you is a whole new set of userids, meaning that user 500 will have a different 'struct user' in your namespace than in other namespaces. So any accounting information stored in struct user will be unique to your namespace. However, throughout the kernel there are checks which - simply check for a capability. Since root in a child namespace has all capabilities, this means that a child namespace is not constrained. - simply compare uid1 == uid2. Since these are the integer uids, uid 500 in namespace 1 will be said to be equal to uid 500 in namespace 2. As a result, the lxc implementation at lxc.sf.net does not use user namespaces. This is actually helpful because it leaves us free to develop user namespaces in such a way that, for some time, user namespaces may be unuseful. Bugs aside, this patchset is supposed to not at all affect systems which are not actively using user namespaces, and only restrict what tasks in child user namespace can do. They begin to limit privilege to a user namespace, so that root in a container cannot kill or ptrace tasks in the parent user namespace, and can only get world access rights to files. Since all files currently belong to the initial user namespace, that means that child user namespaces can only get world access rights to *all* files. While this temporarily makes user namespaces bad for system containers, it starts to get useful for some sandboxing. I've run the 'runltplite.sh' with and without this patchset and found no difference. This patch: copy_process() handles CLONE_NEWUSER before the rest of the namespaces. So in the case of clone(CLONE_NEWUSER|CLONE_NEWUTS) the new uts namespace will have the new user namespace as its owner. That is what we want, since we want root in that new userns to be able to have privilege over it. Changelog: Feb 15: don't set uts_ns->user_ns if we didn't create a new uts_ns. Feb 23: Move extern init_user_ns declaration from init/version.c to utsname.h. Signed-off-by: Serge E. Hallyn Acked-by: "Eric W.
Biederman" Acked-by: Daniel Lezcano Acked-by: David Howells Cc: James Morris Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- init/version.c | 1 + 1 file changed, 1 insertion(+) (limited to 'init') diff --git a/init/version.c b/init/version.c index adff586401a5..86fe0ccb997a 100644 --- a/init/version.c +++ b/init/version.c @@ -33,6 +33,7 @@ struct uts_namespace init_uts_ns = { .machine = UTS_MACHINE, .domainname = UTS_DOMAINNAME, }, + .user_ns = &init_user_ns, }; EXPORT_SYMBOL_GPL(init_uts_ns); -- cgit v1.2.1
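To make the clone(CLONE_NEWUSER|CLONE_NEWUTS) case described in the last commit message concrete, a hedged user-space sketch follows: the child becomes root in the new user namespace, which now owns the new uts namespace, so it can change the hostname there without touching the parent's. As that message notes, creating the user namespace still requires CAP_SYS_ADMIN, CAP_SETUID and CAP_SETGID at this point, so run it as root; the hostname string is only an example value. Run that way, it should print the changed hostname from the child and the unchanged one from the parent.

    /* Hedged sketch: uts namespace owned by a new user namespace. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/wait.h>
    #include <sys/utsname.h>

    static char child_stack[1024 * 1024];

    static int child(void *arg)
    {
            struct utsname u;

            (void)arg;
            /* Root in the new user namespace owns the new uts namespace... */
            if (sethostname("inside-newns", strlen("inside-newns")) < 0)
                    perror("sethostname");
            uname(&u);
            printf("child  sees hostname: %s\n", u.nodename);
            return 0;
    }

    int main(void)
    {
            struct utsname u;
            pid_t pid;

            pid = clone(child, child_stack + sizeof(child_stack),
                        CLONE_NEWUSER | CLONE_NEWUTS | SIGCHLD, NULL);
            if (pid < 0) {
                    perror("clone");
                    return 1;
            }
            waitpid(pid, NULL, 0);
            /* ...so the change is not visible in the parent's uts namespace. */
            uname(&u);
            printf("parent sees hostname: %s\n", u.nodename);
            return 0;
    }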