diff options
author | Tejun Heo <tj@kernel.org> | 2014-02-25 10:04:01 -0500 |
---|---|---|
committer | Tejun Heo <tj@kernel.org> | 2014-02-25 10:04:01 -0500 |
commit | b3dc094e93905ae9c1bc0815402ad8e5b203d068 (patch) | |
tree | 6d99ba4737ccbf7ce94f06937db571a7fcb902a4 /include/linux/cgroup.h | |
parent | c75611282cf1bf717c1866e7a7eb4d0743815187 (diff) | |
download | talos-obmc-linux-b3dc094e93905ae9c1bc0815402ad8e5b203d068.tar.gz talos-obmc-linux-b3dc094e93905ae9c1bc0815402ad8e5b203d068.zip |
cgroup: use css_set->mg_tasks to track target tasks during migration
Currently, while migrating tasks from one cgroup to another,
cgroup_attach_task() builds a flex array of all target tasks;
unfortunately, this has a couple issues.
* Flex array has size limit. On 64bit, struct task_and_cgroup is
24bytes making the flex element limit around 87k. It is a high
number but not impossible to hit. This means that the current
cgroup implementation can't migrate a process with more than 87k
threads.
* Process migration involves memory allocation whose size is dependent
on the number of threads the process has. This means that cgroup
core can't guarantee success or failure of multi-process migrations
as memory allocation failure can happen in the middle. This is in
part because cgroup can't grab threadgroup locks of multiple
processes at the same time, so when there are multiple processes to
migrate, it is imposible to tell how many tasks are to be migrated
beforehand.
Note that this already affects cgroup_transfer_tasks(). cgroup
currently cannot guarantee atomic success or failure of the
operation. It may fail in the middle and after such failure cgroup
doesn't have enough information to roll back properly. It just
aborts with some tasks migrated and others not.
To resolve the situation, this patch updates the migration path to use
task->cg_list to track target tasks. The previous patch already added
css_set->mg_tasks and updated iterations in non-migration paths to
include them during task migration. This patch updates migration path
to actually make use of it.
Instead of putting onto a flex_array, each target task is moved from
its css_set->tasks list to css_set->mg_tasks and the migration path
keeps trace of all the source css_sets and the associated cgroups.
Once all source css_sets are determined, the destination css_set for
each is determined, linked to the matching source css_set and put on a
separate list.
To iterate the target tasks, migration path just needs to iterat
through either the source or target css_sets, depending on whether
migration has been committed or not, and the tasks on their ->mg_tasks
lists. cgroup_taskset is updated to contain the list_heads for source
and target css_sets and the iteration cursor. cgroup_taskset_*() are
accordingly updated to walk through css_sets and their ->mg_tasks.
This resolves the above listed issues with moderate additional
complexity.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Diffstat (limited to 'include/linux/cgroup.h')
-rw-r--r-- | include/linux/cgroup.h | 16 |
1 files changed, 16 insertions, 0 deletions
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 528e2aed36c3..3a1cb265afd6 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -346,6 +346,22 @@ struct css_set { */ struct cgroup_subsys_state *subsys[CGROUP_SUBSYS_COUNT]; + /* + * List of csets participating in the on-going migration either as + * source or destination. Protected by cgroup_mutex. + */ + struct list_head mg_node; + + /* + * If this cset is acting as the source of migration the following + * two fields are set. mg_src_cgrp is the source cgroup of the + * on-going migration and mg_dst_cset is the destination cset the + * target tasks on this cset should be migrated to. Protected by + * cgroup_mutex. + */ + struct cgroup *mg_src_cgrp; + struct css_set *mg_dst_cset; + /* For RCU-protected deletion */ struct rcu_head rcu_head; }; |