summaryrefslogtreecommitdiffstats
path: root/Documentation/sysctl
diff options
context:
space:
mode:
authorRik van Riel <riel@redhat.com>2013-10-07 11:29:39 +0100
committerIngo Molnar <mingo@kernel.org>2013-10-09 14:48:21 +0200
commitde1c9ce6f07fec0381a39a9d0b379ea35aa1167f (patch)
treed96bf1a2b25dfa84d3fe5f6fe00fb780800e3ef3 /Documentation/sysctl
parent1e3646ffc64b232cb14a5ef01d7b98997c1b73f9 (diff)
downloadtalos-op-linux-de1c9ce6f07fec0381a39a9d0b379ea35aa1167f.tar.gz
talos-op-linux-de1c9ce6f07fec0381a39a9d0b379ea35aa1167f.zip
sched/numa: Skip some page migrations after a shared fault
Shared faults can lead to lots of unnecessary page migrations, slowing down the system, and causing private faults to hit the per-pgdat migration ratelimit. This patch adds sysctl numa_balancing_migrate_deferred, which specifies how many shared page migrations to skip unconditionally, after each page migration that is skipped because it is a shared fault. This reduces the number of page migrations back and forth in shared fault situations. It also gives a strong preference to the tasks that are already running where most of the memory is, and to moving the other tasks to near the memory. Testing this with a much higher scan rate than the default still seems to result in fewer page migrations than before. Memory seems to be somewhat better consolidated than previously, with multi-instance specjbb runs on a 4 node system. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-62-git-send-email-mgorman@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
Diffstat (limited to 'Documentation/sysctl')
-rw-r--r--Documentation/sysctl/kernel.txt10
1 files changed, 9 insertions, 1 deletions
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 84f17800f8b5..4273b2d71a27 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -375,7 +375,8 @@ feature should be disabled. Otherwise, if the system overhead from the
feature is too high then the rate the kernel samples for NUMA hinting
faults may be controlled by the numa_balancing_scan_period_min_ms,
numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms,
-numa_balancing_scan_size_mb and numa_balancing_settle_count sysctls.
+numa_balancing_scan_size_mb, numa_balancing_settle_count sysctls and
+numa_balancing_migrate_deferred.
==============================================================
@@ -421,6 +422,13 @@ the schedule balancer stops pushing the task towards a preferred node. This
gives the scheduler a chance to place the task on an alternative node if the
preferred node is overloaded.
+numa_balancing_migrate_deferred is how many page migrations get skipped
+unconditionally, after a page migration is skipped because a page is shared
+with other tasks. This reduces page migration overhead, and determines
+how much stronger the "move task near its memory" policy scheduler becomes,
+versus the "move memory near its task" memory management policy, for workloads
+with shared memory.
+
==============================================================
osrelease, ostype & version:
OpenPOWER on IntegriCloud