summaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
authorMatt Helsley <matthltc@us.ibm.com>2008-10-18 20:27:24 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2008-10-20 08:52:34 -0700
commitbde5ab65581a63e9f4f4bacfae8f201d04d25bed (patch)
treeb159612b2acc416878a1a80a0aea47f7469163d2 /Documentation
parent1aece34833721d64eb33fc15cd923c727296d3d3 (diff)
downloadblackbird-op-linux-bde5ab65581a63e9f4f4bacfae8f201d04d25bed.tar.gz
blackbird-op-linux-bde5ab65581a63e9f4f4bacfae8f201d04d25bed.zip
container freezer: document the cgroup freezer subsystem.
Describe why we need the freezer subsystem and how to use it in a documentation file. Since the cgroups.txt file is focused on the subsystem-agnostic portions of cgroups make a directory and move the old cgroups.txt file at the same time. Signed-off-by: Matt Helsley <matthltc@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: containers@lists.linux-foundation.org Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/cgroups/cgroups.txt (renamed from Documentation/cgroups.txt)0
-rw-r--r--Documentation/cgroups/freezer-subsystem.txt99
-rw-r--r--Documentation/cpusets.txt2
3 files changed, 100 insertions, 1 deletions
diff --git a/Documentation/cgroups.txt b/Documentation/cgroups/cgroups.txt
index d9014aa0eb68..d9014aa0eb68 100644
--- a/Documentation/cgroups.txt
+++ b/Documentation/cgroups/cgroups.txt
diff --git a/Documentation/cgroups/freezer-subsystem.txt b/Documentation/cgroups/freezer-subsystem.txt
new file mode 100644
index 000000000000..c50ab58b72eb
--- /dev/null
+++ b/Documentation/cgroups/freezer-subsystem.txt
@@ -0,0 +1,99 @@
+ The cgroup freezer is useful to batch job management system which start
+and stop sets of tasks in order to schedule the resources of a machine
+according to the desires of a system administrator. This sort of program
+is often used on HPC clusters to schedule access to the cluster as a
+whole. The cgroup freezer uses cgroups to describe the set of tasks to
+be started/stopped by the batch job management system. It also provides
+a means to start and stop the tasks composing the job.
+
+ The cgroup freezer will also be useful for checkpointing running groups
+of tasks. The freezer allows the checkpoint code to obtain a consistent
+image of the tasks by attempting to force the tasks in a cgroup into a
+quiescent state. Once the tasks are quiescent another task can
+walk /proc or invoke a kernel interface to gather information about the
+quiesced tasks. Checkpointed tasks can be restarted later should a
+recoverable error occur. This also allows the checkpointed tasks to be
+migrated between nodes in a cluster by copying the gathered information
+to another node and restarting the tasks there.
+
+ Sequences of SIGSTOP and SIGCONT are not always sufficient for stopping
+and resuming tasks in userspace. Both of these signals are observable
+from within the tasks we wish to freeze. While SIGSTOP cannot be caught,
+blocked, or ignored it can be seen by waiting or ptracing parent tasks.
+SIGCONT is especially unsuitable since it can be caught by the task. Any
+programs designed to watch for SIGSTOP and SIGCONT could be broken by
+attempting to use SIGSTOP and SIGCONT to stop and resume tasks. We can
+demonstrate this problem using nested bash shells:
+
+ $ echo $$
+ 16644
+ $ bash
+ $ echo $$
+ 16690
+
+ From a second, unrelated bash shell:
+ $ kill -SIGSTOP 16690
+ $ kill -SIGCONT 16990
+
+ <at this point 16990 exits and causes 16644 to exit too>
+
+ This happens because bash can observe both signals and choose how it
+responds to them.
+
+ Another example of a program which catches and responds to these
+signals is gdb. In fact any program designed to use ptrace is likely to
+have a problem with this method of stopping and resuming tasks.
+
+ In contrast, the cgroup freezer uses the kernel freezer code to
+prevent the freeze/unfreeze cycle from becoming visible to the tasks
+being frozen. This allows the bash example above and gdb to run as
+expected.
+
+ The freezer subsystem in the container filesystem defines a file named
+freezer.state. Writing "FROZEN" to the state file will freeze all tasks in the
+cgroup. Subsequently writing "THAWED" will unfreeze the tasks in the cgroup.
+Reading will return the current state.
+
+* Examples of usage :
+
+ # mkdir /containers/freezer
+ # mount -t cgroup -ofreezer freezer /containers
+ # mkdir /containers/0
+ # echo $some_pid > /containers/0/tasks
+
+to get status of the freezer subsystem :
+
+ # cat /containers/0/freezer.state
+ THAWED
+
+to freeze all tasks in the container :
+
+ # echo FROZEN > /containers/0/freezer.state
+ # cat /containers/0/freezer.state
+ FREEZING
+ # cat /containers/0/freezer.state
+ FROZEN
+
+to unfreeze all tasks in the container :
+
+ # echo THAWED > /containers/0/freezer.state
+ # cat /containers/0/freezer.state
+ THAWED
+
+This is the basic mechanism which should do the right thing for user space task
+in a simple scenario.
+
+It's important to note that freezing can be incomplete. In that case we return
+EBUSY. This means that some tasks in the cgroup are busy doing something that
+prevents us from completely freezing the cgroup at this time. After EBUSY,
+the cgroup will remain partially frozen -- reflected by freezer.state reporting
+"FREEZING" when read. The state will remain "FREEZING" until one of these
+things happens:
+
+ 1) Userspace cancels the freezing operation by writing "THAWED" to
+ the freezer.state file
+ 2) Userspace retries the freezing operation by writing "FROZEN" to
+ the freezer.state file (writing "FREEZING" is not legal
+ and returns EIO)
+ 3) The tasks that blocked the cgroup from entering the "FROZEN"
+ state disappear from the cgroup's set of tasks.
diff --git a/Documentation/cpusets.txt b/Documentation/cpusets.txt
index 47e568a9370a..5c86c258c791 100644
--- a/Documentation/cpusets.txt
+++ b/Documentation/cpusets.txt
@@ -48,7 +48,7 @@ hooks, beyond what is already present, required to manage dynamic
job placement on large systems.
Cpusets use the generic cgroup subsystem described in
-Documentation/cgroup.txt.
+Documentation/cgroups/cgroups.txt.
Requests by a task, using the sched_setaffinity(2) system call to
include CPUs in its CPU affinity mask, and using the mbind(2) and
OpenPOWER on IntegriCloud