summaryrefslogtreecommitdiffstats
path: root/kernel/sched.c
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'sched-core-for-linus' of ↵Linus Torvalds2009-09-111-397/+702
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (64 commits) sched: Fix sched::sched_stat_wait tracepoint field sched: Disable NEW_FAIR_SLEEPERS for now sched: Keep kthreads at default priority sched: Re-tune the scheduler latency defaults to decrease worst-case latencies sched: Turn off child_runs_first sched: Ensure that a child can't gain time over it's parent after fork() sched: enable SD_WAKE_IDLE sched: Deal with low-load in wake_affine() sched: Remove short cut from select_task_rq_fair() sched: Turn on SD_BALANCE_NEWIDLE sched: Clean up topology.h sched: Fix dynamic power-balancing crash sched: Remove reciprocal for cpu_power sched: Try to deal with low capacity, fix update_sd_power_savings_stats() sched: Try to deal with low capacity sched: Scale down cpu_power due to RT tasks sched: Implement dynamic cpu_power sched: Add smt_gain sched: Update the cpu_power sum during load-balance sched: Add SD_PREFER_SIBLING ...
| * sched: Fix dynamic power-balancing crashIngo Molnar2009-09-041-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This crash: [ 1774.088275] divide error: 0000 [#1] SMP [ 1774.100355] CPU 13 [ 1774.102498] Modules linked in: [ 1774.105631] Pid: 30881, comm: hackbench Not tainted 2.6.31-rc8-tip-01308-g484d664-dirty #1629 X8DTN [ 1774.114807] RIP: 0010:[<ffffffff81041c38>] [<ffffffff81041c38>] sched_balance_self+0x19b/0x2d4 Triggers because update_group_power() modifies the sd tree and does temporary calculations there - not considering that other CPUs could observe intermediate values, such as the zero initial value. Calculate it in a temporary variable instead. (we need no memory barrier as these are all statistical values anyway) Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20090904092742.GA11014@elte.hu> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * sched: Remove reciprocal for cpu_powerPeter Zijlstra2009-09-041-67/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Its a source of fail, also, now that cpu_power is dynamical, its a waste of time. before: <idle>-0 [000] 132.877936: find_busiest_group: avg_load: 0 group_load: 8241 power: 1 after: bash-1689 [001] 137.862151: find_busiest_group: avg_load: 10636288 group_load: 10387 power: 1 [ v2: build fix from From: Andreas Herrmann ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Gautham R Shenoy <ego@in.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> LKML-Reference: <20090901083826.425896304@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * sched: Try to deal with low capacity, fix update_sd_power_savings_stats()Gautham R Shenoy2009-09-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sgs.group_capacity can now be 0, if for some reason group->__cpu_power happens to be less than SCHED_LOAD_SCALE/2. In that case, we need the following fix to make it work for update_sd_power_savings_stats(). That's because both sum_nr_running and group_capacity are unsigned longs. Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * sched: Try to deal with low capacityPeter Zijlstra2009-09-041-5/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When the capacity drops low, we want to migrate load away. Allow the load-balancer to remove all tasks when we hit rock bottom. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Gautham R Shenoy <ego@in.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> LKML-Reference: <20090901083826.342231003@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * sched: Scale down cpu_power due to RT tasksPeter Zijlstra2009-09-041-3/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | Keep an average on the amount of time spend on RT tasks and use that fraction to scale down the cpu_power for regular tasks. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Gautham R Shenoy <ego@in.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> LKML-Reference: <20090901083826.287778431@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * sched: Implement dynamic cpu_powerPeter Zijlstra2009-09-041-3/+35
| | | | | | | | | | | | | | | | | | | | | | | | Recompute the cpu_power for each cpu during load-balance. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Gautham R Shenoy <ego@in.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> LKML-Reference: <20090901083826.162033479@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * sched: Add smt_gainPeter Zijlstra2009-09-041-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The idea is that multi-threading a core yields more work capacity than a single thread, provide a way to express a static gain for threads. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Gautham R Shenoy <ego@in.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> LKML-Reference: <20090901083826.073345955@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * sched: Update the cpu_power sum during load-balancePeter Zijlstra2009-09-041-4/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | In order to prepare for a more dynamic cpu_power, update the group sum while walking the sched domains during load-balance. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Gautham R Shenoy <ego@in.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> LKML-Reference: <20090901083825.985050292@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * sched: Add SD_PREFER_SIBLINGPeter Zijlstra2009-09-041-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | Do the placement thing using SD flags. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Gautham R Shenoy <ego@in.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> LKML-Reference: <20090901083825.897028974@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * sched: Restore __cpu_power to a straight sum of powerPeter Zijlstra2009-09-041-16/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cpu_power is supposed to be a representation of the process capacity of the cpu, not a value to randomly tweak in order to affect placement. Remove the placement hacks. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Gautham R Shenoy <ego@in.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> LKML-Reference: <20090901083825.810860576@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| *-. Merge branches 'sched/domains' and 'sched/clock' into sched/coreIngo Molnar2009-09-041-245/+320
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | Merge reason: both topics are ready now, and we want to merge dependent changes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | sched: Consolidate definition of variable sd in __build_sched_domainsAndreas Herrmann2009-08-181-9/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <20090818110229.GM29515@alberich.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | sched: Separate out build of NUMA sched groups from __build_sched_domainsAndreas Herrmann2009-08-181-63/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... to further strip down __build_sched_domains(). Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <20090818110111.GL29515@alberich.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | sched: Separate out build of ALLNODES sched groups from __build_sched_domainsAndreas Herrmann2009-08-181-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For the sake of completeness. Now all calls to init_sched_build_groups() are contained in build_sched_groups(). Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <20090818110013.GK29515@alberich.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | sched: Separate out build of CPU sched groups from __build_sched_domainsAndreas Herrmann2009-08-181-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... to further strip down __build_sched_domains(). Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <20090818105928.GJ29515@alberich.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | sched: Separate out build of MC sched groups from __build_sched_domainsAndreas Herrmann2009-08-181-13/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... to further strip down __build_sched_domains(). Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <20090818105838.GI29515@alberich.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | sched: Separate out build of SMT sched groups from __build_sched_domainsAndreas Herrmann2009-08-181-11/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... to further strip down __build_sched_domains(). Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <20090818105751.GH29515@alberich.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | sched: Separate out build of SMT sched domain from __build_sched_domainsAndreas Herrmann2009-08-181-13/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... to further strip down __build_sched_domains(). Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <20090818105703.GG29515@alberich.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | sched: Separate out build of MC sched domain from __build_sched_domainsAndreas Herrmann2009-08-181-12/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... to further strip down __build_sched_domains(). Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <20090818105614.GF29515@alberich.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | sched: Separate out build of CPU sched domain from __build_sched_domainsAndreas Herrmann2009-08-181-9/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... to further strip down __build_sched_domains(). Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <20090818105455.GE29515@alberich.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | sched: Separate out build of NUMA sched domain from __build_sched_domainsAndreas Herrmann2009-08-181-25/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... to further strip down __build_sched_domains(). Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <20090818105406.GD29515@alberich.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | sched: Separate out allocation/free/goto-hell from __build_sched_domainsAndreas Herrmann2009-08-181-72/+99
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <20090818105300.GC29515@alberich.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | sched: Use structure to store local data in __build_sched_domainsAndreas Herrmann2009-08-181-76/+89
| | |/ | | | | | | | | | | | | | | | | | | Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <20090818105152.GB29515@alberich.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: Provide iowait countersArjan van de Ven2009-09-021-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For counting how long an application has been waiting for (disk) IO, there currently is only the HZ sample driven information available, while for all other counters in this class, a high resolution version is available via CONFIG_SCHEDSTATS. In order to make an improved bootchart tool possible, we also need a higher resolution version of the iowait time. This patch below adds this scheduler statistic to the kernel. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4A64B813.1080506@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: Rename init_cfs_rq => init_tg_cfs_rqAnirban Sinha2009-08-291-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | ... so that it does not share a common name with a function within the same scope. Signed-off-by: Anirban Sinha <asinha@zeugmasystems.com> LKML-Reference: <DDFD17CC94A9BD49A82147DDF7D545C501EA98A6@exchange.ZeugmaSystems.local> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: Fix division by zero - reallyPeter Zijlstra2009-08-281-21/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When re-computing the shares for each task group's cpu representation we need the ratio of weight on each cpu vs the total weight of the sched domain. Since load-balancing is loosely (read not) synchronized, the weight of individual cpus can change between doing the sum and calculating the ratio. The previous patch dealt with only one of the race scenarios, this patch side steps them all by saving a snapshot of all the individual cpu weights, thereby always working on a consistent set. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: torvalds@linux-foundation.org Cc: jes@sgi.com Cc: jens.axboe@oracle.com Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1251371336.18584.77.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: Avoid division by zeroPeter Zijlstra2009-08-211-13/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch a5004278f0525dcb9aa43703ef77bf371ea837cd (sched: Fix cgroup smp fairness) introduced the possibility of a divide-by-zero because load-balancing is not synchronized between sched_domains. This can cause the state of cpus to change between the first and second loop over the sched domain in tg_shares_up(). Reported-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jes Sorensen <jes@sgi.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <1250855934.7538.30.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: Use for_each_class macro in move_one_task()Hiroshi Shimamoto2009-08-201-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | Replace for loop with the macro for_each_class to cleanup. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> LKML-Reference: <4A8A277D.4090304@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: Ensure the migration task doesn't go away during usePeter Zijlstra2009-08-021-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | Like sched_migrate_task(), set_cpus_allowed_ptr() should hold onto the migration thread too. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: Fully integrate cpus_active_map and root-domain codeGregory Haskins2009-08-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reflect "active" cpus in the rq->rd->online field, instead of the online_map. The motivation is that things that use the root-domain code (such as cpupri) only care about cpus classified as "active" anyway. By synchronizing the root-domain state with the active map, we allow several optimizations. For instance, we can remove an extra cpumask_and from the scheduler hotpath by utilizing rq->rd->online (since it is now a cached version of cpu_active_map & rq->rd->span). Signed-off-by: Gregory Haskins <ghaskins@novell.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Max Krasnyansky <maxk@qualcomm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090730145723.25226.24493.stgit@dev.haskins.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: Enhance the pre/post scheduling logicGregory Haskins2009-08-021-32/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently have an explicit "needs_post" vtable method which returns a stack variable for whether we should later run post-schedule. This leads to an awkward exchange of the variable as it bubbles back up out of the context switch. Peter Zijlstra observed that this information could be stored in the run-queue itself instead of handled on the stack. Therefore, we revert to the method of having context_switch return void, and update an internal rq->post_schedule variable when we require further processing. In addition, we fix a race condition where we try to access current->sched_class without holding the rq->lock. This is technically racy, as the sched-class could change out from under us. Instead, we reference the per-rq post_schedule variable with the runqueue unlocked, but with preemption disabled to see if we need to reacquire the rq->lock. Finally, we clean the code up slightly by removing the #ifdef CONFIG_SMP conditionals from the schedule() call, and implement some inline helper functions instead. This patch passes checkpatch, and rt-migrate. Signed-off-by: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090729150422.17691.55590.stgit@dev.haskins.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: Check for pushing rt tasks after all schedulingSteven Rostedt2009-08-021-11/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current method for pushing RT tasks after scheduling only happens after a context switch. But we found cases where a task is set up on a run queue to be pushed but the push never happens because the schedule chooses the same task. This bug was found with the help of Gregory Haskins and the use of ftrace (trace_printk). It tooks several days for both of us analyzing the code and the trace output to find this. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090729042526.205923666@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: Optimize unused cgroup configurationPeter Zijlstra2009-08-021-2/+14
| | | | | | | | | | | | | | | | | | | | | | | | When cgroup group scheduling is built in, skip some code paths if we don't have any (but the root) cgroups configured. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: Fix cgroup smp fairnessPeter Zijlstra2009-08-021-8/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit ec4e0e2fe018992d980910db901637c814575914 ("fix inconsistency when redistribute per-cpu tg->cfs_rq shares") broke cgroup smp fairness. In order to avoid starvation of newly placed tasks, we never quite set the share of an empty cpu group-task to 0, but instead we set it as if there's a single NICE-0 task present. If however we actually set this in cfs_rq[cpu]->shares, that means the total shares for that group will be slightly inflated every time we balance, causing the observed unfairness. Fix this by setting cfs_rq[cpu]->shares to 0 but actually setting the effective weight of the related se to the inflated number. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1248696557.6987.1615.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | Merge branch 'sched/urgent' into sched/coreIngo Molnar2009-08-021-2/+2
| |\ \ | | |/ | | | | | | | | | | | | Merge reason: avoid upcoming patch conflict. Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: Fix return value of migration_init()Thomas Gleixner2009-07-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | migration_init() returns the return value of the hotplug notifier. In the success case this is NOTIFY_OK which is 1. initcall_debug evaluates that as an error code because init calls are expected to return 0 on success. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | sched: Pull up the might_sleep() check into cond_resched()Frederic Weisbecker2009-07-181-7/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | might_sleep() is called late-ish in cond_resched(), after the need_resched()/preempt enabled/system running tests are checked. It's better to check the sleeps while atomic earlier and not depend on some environment datas that reduce the chances to detect a problem. Also define cond_resched_*() helpers as macros, so that the FILE/LINE reported in the sleeping while atomic warning displays the real origin and not sched.h Changes in v2: - Call __might_sleep() directly instead of might_sleep() which may call cond_resched() - Turn cond_resched() into a macro so that the file:line couple reported refers to the caller of cond_resched() and not __cond_resched() itself. Changes in v3: - Also propagate this __might_sleep() pull up to cond_resched_lock() and cond_resched_softirq() Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1247725694-6082-6-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: Add a preempt count base offset to __might_sleep()Frederic Weisbecker2009-07-181-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a preempt count base offset to compare against the current preempt level count. It prepares to pull up the might_sleep check from cond_resched() to cond_resched_lock() and cond_resched_bh(). For these two helpers, we need to respectively ensure that once we'll unlock the given spinlock / reenable local softirqs, we will reach a sleepable state. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> [ Move and rename preempt_count_equals() ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1247725694-6082-4-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: Cover the CONFIG_DEBUG_SPINLOCK_SLEEP off-case for __might_sleep()Frederic Weisbecker2009-07-181-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cover the off case for __might_sleep(), so that we avoid #ifdefs in files that make use of it. Especially, this prepares for the __might_sleep() pull up on cond_resched(). Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1247725694-6082-3-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: Remove obsolete comment in __cond_resched()Frederic Weisbecker2009-07-181-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove the outdated comment from __cond_resched() related to the now removed Big Kernel Semaphore. Reported-by: Arnd Bergmann <arnd@arndb.de> Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1247725694-6082-2-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: Drop the need_resched() loop from cond_resched()Frederic Weisbecker2009-07-181-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The schedule() function is a loop that reschedules the current task while the TIF_NEED_RESCHED flag is set: void schedule(void) { need_resched: /* schedule code */ if (need_resched()) goto need_resched; } And cond_resched() repeat this loop: do { add_preempt_count(PREEMPT_ACTIVE); schedule(); sub_preempt_count(PREEMPT_ACTIVE); } while(need_resched()); This loop is needless because schedule() already did the check and nothing can set TIF_NEED_RESCHED between schedule() exit and the loop check in need_resched(). Then remove this needless loop. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1247725694-6082-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | Merge branch 'linus' into sched/coreIngo Molnar2009-07-181-15/+42
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | Merge reason: branch had an old upstream base (-rc1-ish), but also merge to avoid a conflict. Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | sched: Hide runqueues from direct reference at source code level for ↵Hitoshi Mitake2009-06-291-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __raw_get_cpu_var() Hide __raw_get_cpu_var() as well - thus all the direct references to runqueues will abstracted out. Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> LKML-Reference: <20090629.144457.886429910353660979.mitake@dcl.info.waseda.ac.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | Merge branch 'linus' into sched/coreIngo Molnar2009-06-291-14/+16
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | Merge reason: we will merge a dependent patch. Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | sched: Add SCHED_RESET_ON_FORK functionality for nice < 0 tasksMike Galbraith2009-06-171-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Mike Galbraith <efault@gmx.de> Acked-by: Lennart Poettering <mzxreary@0pointer.de> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <1245228482.27326.1.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | sched: Clean up SCHED_RESET_ON_FORKMike Galbraith2009-06-171-16/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make SCHED_RESET_ON_FORK sched_fork() bits a self-contained unlikely code path. Signed-off-by: Mike Galbraith <efault@gmx.de> Acked-by: Lennart Poettering <mzxreary@0pointer.de> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <1245228361.18329.6.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | sched: Introduce SCHED_RESET_ON_FORK scheduling policy flagLennart Poettering2009-06-151-9/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces a new flag SCHED_RESET_ON_FORK which can be passed to the kernel via sched_setscheduler(), ORed in the policy parameter. If set this will make sure that when the process forks a) the scheduling priority is reset to DEFAULT_PRIO if it was higher and b) the scheduling policy is reset to SCHED_NORMAL if it was either SCHED_FIFO or SCHED_RR. Why have this? Currently, if a process is real-time scheduled this will 'leak' to all its child processes. For security reasons it is often (always?) a good idea to make sure that if a process acquires RT scheduling this is confined to this process and only this process. More specifically this makes the per-process resource limit RLIMIT_RTTIME useful for security purposes, because it makes it impossible to use a fork bomb to circumvent the per-process RLIMIT_RTTIME accounting. This feature is also useful for tools like 'renice' which can then change the nice level of a process without having this spill to all its child processes. Why expose this via sched_setscheduler() and not other syscalls such as prctl() or sched_setparam()? prctl() does not take a pid parameter. Due to that it would be impossible to modify this flag for other processes than the current one. The struct passed to sched_setparam() can unfortunately not be extended without breaking compatibility, since sched_setparam() lacks a size parameter. How to use this from userspace? In your RT program simply replace this: sched_setscheduler(pid, SCHED_FIFO, &param); by this: sched_setscheduler(pid, SCHED_FIFO|SCHED_RESET_ON_FORK, &param); Signed-off-by: Lennart Poettering <lennart@poettering.net> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090615152714.GA29092@tango.0pointer.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | | Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds2009-09-111-3/+128
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (28 commits) rcu: Move end of special early-boot RCU operation earlier rcu: Changes from reviews: avoid casts, fix/add warnings, improve comments rcu: Create rcutree plugins to handle hotplug CPU for multi-level trees rcu: Remove lockdep annotations from RCU's _notrace() API members rcu: Add #ifdef to suppress __rcu_offline_cpu() warning in !HOTPLUG_CPU builds rcu: Add CPU-offline processing for single-node configurations rcu: Add "notrace" to RCU function headers used by ftrace rcu: Remove CONFIG_PREEMPT_RCU rcu: Merge preemptable-RCU functionality into hierarchical RCU rcu: Simplify rcu_pending()/rcu_check_callbacks() API rcu: Use debugfs_remove_recursive() simplify code. rcu: Merge per-RCU-flavor initialization into pre-existing macro rcu: Fix online/offline indication for rcudata.csv trace file rcu: Consolidate sparse and lockdep declarations in include/linux/rcupdate.h rcu: Renamings to increase RCU clarity rcu: Move private definitions from include/linux/rcutree.h to kernel/rcutree.h rcu: Expunge lingering references to CONFIG_CLASSIC_RCU, optimize on !SMP rcu: Delay rcu_barrier() wait until beginning of next CPU-hotunplug operation. rcu: Fix typo in rcu_irq_exit() comment header rcu: Make rcupreempt_trace.c look at offline CPUs ...
| * | | | | rcu: Renamings to increase RCU clarityPaul E. McKenney2009-08-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make RCU-sched, RCU-bh, and RCU-preempt be underlying implementations, with "RCU" defined in terms of one of the three. Update the outdated rcu_qsctr_inc() names, as these functions no longer increment anything. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: akpm@linux-foundation.org Cc: mathieu.desnoyers@polymtl.ca Cc: josht@linux.vnet.ibm.com Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org LKML-Reference: <12509746132696-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
OpenPOWER on IntegriCloud