diff options
author | Mauro Carvalho Chehab <mchehab+samsung@kernel.org> | 2019-06-27 13:08:35 -0300 |
---|---|---|
committer | Mauro Carvalho Chehab <mchehab+samsung@kernel.org> | 2019-07-15 11:03:02 -0300 |
commit | da82c92f1150f66afabf78d2c85ef9ac18dc6d38 (patch) | |
tree | c8f1c200188e0edc8391fa14d02eff60c6347adb /Documentation/cgroup-v1/memcg_test.rst | |
parent | 83bbf6e103544d65f17f4b2ccea1c6a51c0b0769 (diff) | |
download | talos-op-linux-da82c92f1150f66afabf78d2c85ef9ac18dc6d38.tar.gz talos-op-linux-da82c92f1150f66afabf78d2c85ef9ac18dc6d38.zip |
docs: cgroup-v1: add it to the admin-guide book
Those files belong to the admin guide, so add them.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Diffstat (limited to 'Documentation/cgroup-v1/memcg_test.rst')
-rw-r--r-- | Documentation/cgroup-v1/memcg_test.rst | 355 |
1 files changed, 0 insertions, 355 deletions
diff --git a/Documentation/cgroup-v1/memcg_test.rst b/Documentation/cgroup-v1/memcg_test.rst deleted file mode 100644 index 91bd18c6a514..000000000000 --- a/Documentation/cgroup-v1/memcg_test.rst +++ /dev/null @@ -1,355 +0,0 @@ -===================================================== -Memory Resource Controller(Memcg) Implementation Memo -===================================================== - -Last Updated: 2010/2 - -Base Kernel Version: based on 2.6.33-rc7-mm(candidate for 34). - -Because VM is getting complex (one of reasons is memcg...), memcg's behavior -is complex. This is a document for memcg's internal behavior. -Please note that implementation details can be changed. - -(*) Topics on API should be in Documentation/cgroup-v1/memory.rst) - -0. How to record usage ? -======================== - - 2 objects are used. - - page_cgroup ....an object per page. - - Allocated at boot or memory hotplug. Freed at memory hot removal. - - swap_cgroup ... an entry per swp_entry. - - Allocated at swapon(). Freed at swapoff(). - - The page_cgroup has USED bit and double count against a page_cgroup never - occurs. swap_cgroup is used only when a charged page is swapped-out. - -1. Charge -========= - - a page/swp_entry may be charged (usage += PAGE_SIZE) at - - mem_cgroup_try_charge() - -2. Uncharge -=========== - - a page/swp_entry may be uncharged (usage -= PAGE_SIZE) by - - mem_cgroup_uncharge() - Called when a page's refcount goes down to 0. - - mem_cgroup_uncharge_swap() - Called when swp_entry's refcnt goes down to 0. A charge against swap - disappears. - -3. charge-commit-cancel -======================= - - Memcg pages are charged in two steps: - - - mem_cgroup_try_charge() - - mem_cgroup_commit_charge() or mem_cgroup_cancel_charge() - - At try_charge(), there are no flags to say "this page is charged". - at this point, usage += PAGE_SIZE. - - At commit(), the page is associated with the memcg. - - At cancel(), simply usage -= PAGE_SIZE. - -Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y. - -4. Anonymous -============ - - Anonymous page is newly allocated at - - page fault into MAP_ANONYMOUS mapping. - - Copy-On-Write. - - 4.1 Swap-in. - At swap-in, the page is taken from swap-cache. There are 2 cases. - - (a) If the SwapCache is newly allocated and read, it has no charges. - (b) If the SwapCache has been mapped by processes, it has been - charged already. - - 4.2 Swap-out. - At swap-out, typical state transition is below. - - (a) add to swap cache. (marked as SwapCache) - swp_entry's refcnt += 1. - (b) fully unmapped. - swp_entry's refcnt += # of ptes. - (c) write back to swap. - (d) delete from swap cache. (remove from SwapCache) - swp_entry's refcnt -= 1. - - - Finally, at task exit, - (e) zap_pte() is called and swp_entry's refcnt -=1 -> 0. - -5. Page Cache -============= - - Page Cache is charged at - - add_to_page_cache_locked(). - - The logic is very clear. (About migration, see below) - - Note: - __remove_from_page_cache() is called by remove_from_page_cache() - and __remove_mapping(). - -6. Shmem(tmpfs) Page Cache -=========================== - - The best way to understand shmem's page state transition is to read - mm/shmem.c. - - But brief explanation of the behavior of memcg around shmem will be - helpful to understand the logic. - - Shmem's page (just leaf page, not direct/indirect block) can be on - - - radix-tree of shmem's inode. - - SwapCache. - - Both on radix-tree and SwapCache. This happens at swap-in - and swap-out, - - It's charged when... - - - A new page is added to shmem's radix-tree. - - A swp page is read. (move a charge from swap_cgroup to page_cgroup) - -7. Page Migration -================= - - mem_cgroup_migrate() - -8. LRU -====== - Each memcg has its own private LRU. Now, its handling is under global - VM's control (means that it's handled under global pgdat->lru_lock). - Almost all routines around memcg's LRU is called by global LRU's - list management functions under pgdat->lru_lock. - - A special function is mem_cgroup_isolate_pages(). This scans - memcg's private LRU and call __isolate_lru_page() to extract a page - from LRU. - - (By __isolate_lru_page(), the page is removed from both of global and - private LRU.) - - -9. Typical Tests. -================= - - Tests for racy cases. - -9.1 Small limit to memcg. -------------------------- - - When you do test to do racy case, it's good test to set memcg's limit - to be very small rather than GB. Many races found in the test under - xKB or xxMB limits. - - (Memory behavior under GB and Memory behavior under MB shows very - different situation.) - -9.2 Shmem ---------- - - Historically, memcg's shmem handling was poor and we saw some amount - of troubles here. This is because shmem is page-cache but can be - SwapCache. Test with shmem/tmpfs is always good test. - -9.3 Migration -------------- - - For NUMA, migration is an another special case. To do easy test, cpuset - is useful. Following is a sample script to do migration:: - - mount -t cgroup -o cpuset none /opt/cpuset - - mkdir /opt/cpuset/01 - echo 1 > /opt/cpuset/01/cpuset.cpus - echo 0 > /opt/cpuset/01/cpuset.mems - echo 1 > /opt/cpuset/01/cpuset.memory_migrate - mkdir /opt/cpuset/02 - echo 1 > /opt/cpuset/02/cpuset.cpus - echo 1 > /opt/cpuset/02/cpuset.mems - echo 1 > /opt/cpuset/02/cpuset.memory_migrate - - In above set, when you moves a task from 01 to 02, page migration to - node 0 to node 1 will occur. Following is a script to migrate all - under cpuset.:: - - -- - move_task() - { - for pid in $1 - do - /bin/echo $pid >$2/tasks 2>/dev/null - echo -n $pid - echo -n " " - done - echo END - } - - G1_TASK=`cat ${G1}/tasks` - G2_TASK=`cat ${G2}/tasks` - move_task "${G1_TASK}" ${G2} & - -- - -9.4 Memory hotplug ------------------- - - memory hotplug test is one of good test. - - to offline memory, do following:: - - # echo offline > /sys/devices/system/memory/memoryXXX/state - - (XXX is the place of memory) - - This is an easy way to test page migration, too. - -9.5 mkdir/rmdir ---------------- - - When using hierarchy, mkdir/rmdir test should be done. - Use tests like the following:: - - echo 1 >/opt/cgroup/01/memory/use_hierarchy - mkdir /opt/cgroup/01/child_a - mkdir /opt/cgroup/01/child_b - - set limit to 01. - add limit to 01/child_b - run jobs under child_a and child_b - - create/delete following groups at random while jobs are running:: - - /opt/cgroup/01/child_a/child_aa - /opt/cgroup/01/child_b/child_bb - /opt/cgroup/01/child_c - - running new jobs in new group is also good. - -9.6 Mount with other subsystems -------------------------------- - - Mounting with other subsystems is a good test because there is a - race and lock dependency with other cgroup subsystems. - - example:: - - # mount -t cgroup none /cgroup -o cpuset,memory,cpu,devices - - and do task move, mkdir, rmdir etc...under this. - -9.7 swapoff ------------ - - Besides management of swap is one of complicated parts of memcg, - call path of swap-in at swapoff is not same as usual swap-in path.. - It's worth to be tested explicitly. - - For example, test like following is good: - - (Shell-A):: - - # mount -t cgroup none /cgroup -o memory - # mkdir /cgroup/test - # echo 40M > /cgroup/test/memory.limit_in_bytes - # echo 0 > /cgroup/test/tasks - - Run malloc(100M) program under this. You'll see 60M of swaps. - - (Shell-B):: - - # move all tasks in /cgroup/test to /cgroup - # /sbin/swapoff -a - # rmdir /cgroup/test - # kill malloc task. - - Of course, tmpfs v.s. swapoff test should be tested, too. - -9.8 OOM-Killer --------------- - - Out-of-memory caused by memcg's limit will kill tasks under - the memcg. When hierarchy is used, a task under hierarchy - will be killed by the kernel. - - In this case, panic_on_oom shouldn't be invoked and tasks - in other groups shouldn't be killed. - - It's not difficult to cause OOM under memcg as following. - - Case A) when you can swapoff:: - - #swapoff -a - #echo 50M > /memory.limit_in_bytes - - run 51M of malloc - - Case B) when you use mem+swap limitation:: - - #echo 50M > memory.limit_in_bytes - #echo 50M > memory.memsw.limit_in_bytes - - run 51M of malloc - -9.9 Move charges at task migration ----------------------------------- - - Charges associated with a task can be moved along with task migration. - - (Shell-A):: - - #mkdir /cgroup/A - #echo $$ >/cgroup/A/tasks - - run some programs which uses some amount of memory in /cgroup/A. - - (Shell-B):: - - #mkdir /cgroup/B - #echo 1 >/cgroup/B/memory.move_charge_at_immigrate - #echo "pid of the program running in group A" >/cgroup/B/tasks - - You can see charges have been moved by reading ``*.usage_in_bytes`` or - memory.stat of both A and B. - - See 8.2 of Documentation/cgroup-v1/memory.rst to see what value should - be written to move_charge_at_immigrate. - -9.10 Memory thresholds ----------------------- - - Memory controller implements memory thresholds using cgroups notification - API. You can use tools/cgroup/cgroup_event_listener.c to test it. - - (Shell-A) Create cgroup and run event listener:: - - # mkdir /cgroup/A - # ./cgroup_event_listener /cgroup/A/memory.usage_in_bytes 5M - - (Shell-B) Add task to cgroup and try to allocate and free memory:: - - # echo $$ >/cgroup/A/tasks - # a="$(dd if=/dev/zero bs=1M count=10)" - # a= - - You will see message from cgroup_event_listener every time you cross - the thresholds. - - Use /cgroup/A/memory.memsw.usage_in_bytes to test memsw thresholds. - - It's good idea to test root cgroup as well. |