author: Rafael J. Wysocki <rjw@rjwysocki.net> 2017-03-13 23:59:57 +0100
committer: Jonathan Corbet <corbet@lwn.net> 2017-03-13 17:08:42 -0600
commit: 2a0e49279850d28c450f27e51b419ce90bacdcdc
tree: 96e995e194a1bb9926a4f1c4fa01571bf218e148 /Documentation/cpu-freq
parent: 8fa1bb506fc9b5b0f7b5e42cee4f8213325a98ee
cpufreq: User/admin documentation update and consolidation

The user/admin documentation of cpufreq is badly outdated. It contains
stale and/or inaccurate information along with things that are not
particularly useful. Also, some of the important pieces are missing
from it.

For this reason, add a new user/admin document for cpufreq containing
current information to admin-guide and drop the old outdated .txt
documents it is replacing.

Since there will be more PM documents in admin-guide going forward,
create a separate directory for them and put the cpufreq document in
there right away.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>

Diffstat (limited to 'Documentation/cpu-freq')
-rw-r--r--  Documentation/cpu-freq/boost.txt        93
-rw-r--r--  Documentation/cpu-freq/governors.txt   301
-rw-r--r--  Documentation/cpu-freq/index.txt         7
-rw-r--r--  Documentation/cpu-freq/user-guide.txt  228
4 files changed, 0 insertions, 629 deletions
diff --git a/Documentation/cpu-freq/boost.txt b/Documentation/cpu-freq/boost.txt
deleted file mode 100644
index dd62e1334f0a..000000000000
--- a/Documentation/cpu-freq/boost.txt
+++ /dev/null
@@ -1,93 +0,0 @@
-Processor boosting control
-
- - information for users -
-
-Quick guide for the impatient:
---------------------
-/sys/devices/system/cpu/cpufreq/boost
-controls the boost setting for the whole system. You can read and write
-that file with either "0" (boosting disabled) or "1" (boosting allowed).
-Reading or writing 1 does not mean that the system is boosting at this
-very moment, but only that the CPU _may_ raise the frequency at its
-discretion.
---------------------
-
-Introduction
--------------
-Some CPUs support a functionality to raise the operating frequency of
-some cores in a multi-core package if certain conditions apply, mostly
-if the whole chip is not fully utilized and below its intended thermal
-budget. The decision to enable or disable boost is made either in hardware
-(e.g. x86) or in software (e.g. ARM).
-On Intel CPUs this is called "Turbo Boost", AMD calls it "Turbo-Core",
-in technical documentation "Core performance boost". In Linux we use
-the term "boost" for convenience.
-
-Rationale for disable switch
-----------------------------
-
-Though the idea is to just give better performance without any user
-intervention, sometimes the need arises to disable this functionality.
-Most systems offer a switch in the (BIOS) firmware to disable the
-functionality entirely, but more fine-grained and dynamic control would
-be desirable:
-1. While running benchmarks, reproducible results are important. Since
- the boosting functionality depends on the load of the whole package,
- single thread performance can vary. By explicitly disabling the boost
- functionality at least for the benchmark's run-time the system will run
- at a fixed frequency and results are reproducible again.
-2. To examine the impact of the boosting functionality it is helpful
- to do tests with and without boosting.
-3. Boosting means overclocking the processor, though under controlled
- conditions. By raising the frequency and the voltage the processor
- will consume more power than without the boosting, which may be
- undesirable for instance for mobile users. Disabling boosting may
- save power here, though this depends on the workload.
-
-
-User controlled switch
-----------------------
-
-To allow the user to toggle the boosting functionality, the cpufreq core
-driver exports a sysfs knob to enable or disable it. There is a file:
-/sys/devices/system/cpu/cpufreq/boost
-which can either read "0" (boosting disabled) or "1" (boosting enabled).
-The file is exported only when the cpufreq driver supports boosting.
-Explicitly changing the permissions and writing to that file anyway will
-return EINVAL.
-
-On supported CPUs one can write either a "0" or a "1" into this file.
-This will either disable the boost functionality on all cores in the
-whole system (0) or will allow the software or hardware to boost at will
-(1).
-
-Writing a "1" does not explicitly boost the system, but just allows the
-CPUs to boost at their discretion. Some implementations take external
-factors like the chip's temperature into account, so boosting once does
-not necessarily mean that it will occur every time even using the exact
-same software setup.
-
-
-AMD legacy cpb switch
----------------------
-The AMD powernow-k8 driver used to support a very similar switch to
-disable or enable the "Core Performance Boost" feature of some AMD CPUs.
-This switch was instantiated in each CPU's cpufreq directory
-(/sys/devices/system/cpu[0-9]*/cpufreq) and was called "cpb".
-Though the per CPU existence hints at a more fine grained control, the
-actual implementation only supported system-global switch semantics,
-which was simply reflected into each CPU's file. Writing a 0 or 1 into it
-would pull the other CPUs to the same state.
-For compatibility reasons this file and its behavior are still supported
-on AMD CPUs, though it is now protected by a config switch
-(X86_ACPI_CPUFREQ_CPB). On Intel CPUs this file will never be created,
-even with the config option set.
-This functionality is considered legacy and will be removed in some future
-kernel version.
-
-More fine grained boosting control
-----------------------------------
-
-Technically it is possible to switch the boosting functionality at least
-on a per package basis, for some CPUs even per core. Currently the driver
-does not support it, but this may be implemented in the future.
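
As an illustration of the boost knob described in the removed boost.txt, here is a minimal Python sketch that reads and writes the file. The path is the one given in the text; the helper names `read_boost` and `write_boost` are hypothetical, and writing normally requires root. The sketch treats a missing file as "driver does not support boosting", which is an assumption based on the statement that the file is exported only when the driver supports it.

```python
from pathlib import Path

# Path as given in boost.txt; the file exists only if the driver supports boosting.
BOOST_KNOB = Path("/sys/devices/system/cpu/cpufreq/boost")


def read_boost(knob: Path = BOOST_KNOB):
    """Return True for "1", False for "0", or None if the knob is not exported."""
    try:
        text = knob.read_text().strip()
    except FileNotFoundError:
        return None
    if text not in ("0", "1"):
        raise ValueError(f"unexpected boost value: {text!r}")
    return text == "1"


def write_boost(allowed: bool, knob: Path = BOOST_KNOB) -> None:
    """Write "1" (boosting allowed) or "0" (boosting disabled)."""
    knob.write_text("1" if allowed else "0")


if __name__ == "__main__":
    labels = {
        None: "boost knob not available",
        True: "boosting allowed",
        False: "boosting disabled",
    }
    print(labels[read_boost()])
```

Note that a result of "boosting allowed" only means the CPU may boost, not that it is boosting at that moment.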
diff --git a/Documentation/cpu-freq/governors.txt b/Documentation/cpu-freq/governors.txt
deleted file mode 100644
index 61b3184b6c24..000000000000
--- a/Documentation/cpu-freq/governors.txt
+++ /dev/null
@@ -1,301 +0,0 @@
- CPU frequency and voltage scaling code in the Linux(TM) kernel
-
-
- L i n u x C P U F r e q
-
- C P U F r e q G o v e r n o r s
-
- - information for users and developers -
-
-
- Dominik Brodowski <linux@brodo.de>
- some additions and corrections by Nico Golde <nico@ngolde.de>
- Rafael J. Wysocki <rafael.j.wysocki@intel.com>
- Viresh Kumar <viresh.kumar@linaro.org>
-
-
-
- Clock scaling allows you to change the clock speed of the CPUs on the
- fly. This is a nice method to save battery power, because the lower
- the clock speed, the less power the CPU consumes.
-
-
-Contents:
----------
-1. What is a CPUFreq Governor?
-
-2. Governors In the Linux Kernel
-2.1 Performance
-2.2 Powersave
-2.3 Userspace
-2.4 Ondemand
-2.5 Conservative
-2.6 Schedutil
-
-3. The Governor Interface in the CPUfreq Core
-
-4. References
-
-
-1. What Is A CPUFreq Governor?
-==============================
-
-Most cpufreq drivers (except the intel_pstate and longrun) or even most
-cpu frequency scaling algorithms only allow the CPU frequency to be set
-to predefined fixed values. In order to offer dynamic frequency
-scaling, the cpufreq core must be able to tell these drivers of a
-"target frequency". So these specific drivers will be transformed to
-offer a "->target/target_index/fast_switch()" call instead of the
-"->setpolicy()" call. For set_policy drivers, all stays the same,
-though.
-
-How to decide what frequency within the CPUfreq policy should be used?
-That's done using "cpufreq governors".
-
-Basically, it's the following flow graph:
-
-CPU can be set to switch independently   |   CPU can only be set
-      within specific "limits"           |   to specific frequencies
-
-                         "CPUfreq policy"
-        consists of frequency limits (policy->{min,max})
-             and CPUfreq governor to be used
-                     /                \
-                    /                  \
-                   /         the cpufreq governor decides
-                  /          (dynamically or statically)
-                 /           what target_freq to set within
-                /            the limits of policy->{min,max}
-               /                          \
-              /                            \
-Using the ->setpolicy call,      Using the ->target/target_index/fast_switch call,
-    the limits and the           the frequency closest
-    "policy" is set.             to target_freq is set.
-                                 It is assured that it
-                                 is within policy->{min,max}
-
-
-2. Governors In the Linux Kernel
-================================
-
-2.1 Performance
----------------
-
-The CPUfreq governor "performance" sets the CPU statically to the
-highest frequency within the borders of scaling_min_freq and
-scaling_max_freq.
-
-
-2.2 Powersave
--------------
-
-The CPUfreq governor "powersave" sets the CPU statically to the
-lowest frequency within the borders of scaling_min_freq and
-scaling_max_freq.
-
-
-2.3 Userspace
--------------
-
-The CPUfreq governor "userspace" allows the user, or any userspace
-program running with UID "root", to set the CPU to a specific frequency
-by making a sysfs file "scaling_setspeed" available in the CPU-device
-directory.
-
-
-2.4 Ondemand
-------------
-
-The CPUfreq governor "ondemand" sets the CPU frequency depending on the
-current system load. Load estimation is triggered by the scheduler
-through the update_util_data->func hook; when triggered, cpufreq checks
-the CPU-usage statistics over the last period and the governor sets the
-CPU accordingly. The CPU must have the capability to switch the
-frequency very quickly.
-
-Sysfs files:
-
-* sampling_rate:
-
- Measured in uS (10^-6 seconds), this is how often you want the kernel
- to look at the CPU usage and to make decisions on what to do about the
- frequency. Typically this is set to values of around '10000' or more.
- Its default value is (cf. user-guide.txt): transition_latency
- * 1000. Be aware that transition latency is in ns and sampling_rate
- is in us, so you get the same sysfs value by default. The sampling rate
- should always be adjusted with the transition latency in mind. To set
- the sampling rate to 750 times the transition latency in
- bash (as said, 1000 is the default), do:
-
- $ echo $(($(cat cpuinfo_transition_latency) * 750 / 1000)) > ondemand/sampling_rate
-
-* sampling_rate_min:
-
- The sampling rate is limited by the HW transition latency:
- transition_latency * 100
-
- Or by kernel restrictions:
- - If CONFIG_NO_HZ_COMMON is set, the limit is 10ms fixed.
- - If CONFIG_NO_HZ_COMMON is not set or nohz=off boot parameter is
- used, the limits depend on the CONFIG_HZ option:
- HZ=1000: min=20000us (20ms)
- HZ=250: min=80000us (80ms)
- HZ=100: min=200000us (200ms)
-
- The highest value of kernel and HW latency restrictions is shown and
- used as the minimum sampling rate.
-
-* up_threshold:
-
- This defines what the average CPU usage between the samplings of
- 'sampling_rate' needs to be for the kernel to make a decision on
- whether it should increase the frequency. For example when it is set
- to its default value of '95' it means that between the checking
- intervals the CPU needs to be on average more than 95% in use to then
- decide that the CPU frequency needs to be increased.
-
-* ignore_nice_load:
-
- This parameter takes a value of '0' or '1'. When set to '0' (its
- default), all processes are counted towards the 'cpu utilisation'
- value. When set to '1', the processes that are run with a 'nice'
- value will not count (and thus be ignored) in the overall usage
- calculation. This is useful if you are running a CPU intensive
- calculation on your laptop that you do not care how long it takes to
- complete as you can 'nice' it and prevent it from taking part in the
- deciding process of whether to increase your CPU frequency.
-
-* sampling_down_factor:
-
- This parameter controls the rate at which the kernel makes a decision
- on when to decrease the frequency while running at top speed. When set
- to 1 (the default) decisions to reevaluate load are made at the same
- interval regardless of current clock speed. But when set to greater
- than 1 (e.g. 100) it acts as a multiplier for the scheduling interval
- for reevaluating load when the CPU is at its top speed due to high
- load. This improves performance by reducing the overhead of load
- evaluation and helping the CPU stay at its top speed when truly busy,
- rather than shifting back and forth in speed. This tunable has no
- effect on behavior at lower speeds/lower CPU loads.
-
-* powersave_bias:
-
- This parameter takes a value between 0 and 1000. It defines the
- percentage (times 10) value of the target frequency that will be
- shaved off of the target. For example, when set to 100 -- 10%, when
- ondemand governor would have targeted 1000 MHz, it will target
- 1000 MHz - (10% of 1000 MHz) = 900 MHz instead. This is set to 0
- (disabled) by default.
-
- When AMD frequency sensitivity powersave bias driver --
- drivers/cpufreq/amd_freq_sensitivity.c is loaded, this parameter
- defines the workload frequency sensitivity threshold in which a lower
- frequency is chosen instead of ondemand governor's original target.
- The frequency sensitivity is a hardware-reported (on AMD Family 16h
- Processors and above) value between 0 and 100% that tells software how
- the performance of the workload running on a CPU will change when
- the frequency changes. A workload with sensitivity of 0% (memory/IO-bound)
- will not perform any better at a higher core frequency, whereas a
- workload with sensitivity of 100% (CPU-bound) will perform better
- the higher the frequency. When the driver is loaded, this is set to 400 by
- default -- for CPUs running workloads with sensitivity value below
- 40%, a lower frequency is chosen. Unloading the driver or writing 0
- will disable this feature.
-
-
-2.5 Conservative
-----------------
-
-The CPUfreq governor "conservative", much like the "ondemand"
-governor, sets the CPU frequency depending on the current usage. It
-differs in behaviour in that it gracefully increases and decreases the
-CPU speed rather than jumping to max speed the moment there is any load
-on the CPU. This behaviour is more suitable in a battery powered
-environment. The governor is tweaked in the same manner as the
-"ondemand" governor through sysfs with the addition of:
-
-* freq_step:
-
- This describes what percentage steps the cpu freq should be increased
- and decreased smoothly by. By default the cpu frequency will increase
- in 5% chunks of your maximum cpu frequency. You can change this value
- to anywhere between 0 and 100 where '0' will effectively lock your CPU
- at a speed regardless of its load whilst '100' will, in theory, make
- it behave identically to the "ondemand" governor.
-
-* down_threshold:
-
- Same as the 'up_threshold' found for the "ondemand" governor but for
- the opposite direction. For example when set to its default value of
- '20' it means that the CPU usage needs to be below 20% between
- samples to have the frequency decreased.
-
-* sampling_down_factor:
-
- Similar functionality as in "ondemand" governor. But in
- "conservative", it controls the rate at which the kernel makes a
- decision on when to decrease the frequency while running at any speed.
- Load for frequency increase is still evaluated every sampling rate.
-
-
-2.6 Schedutil
--------------
-
-The "schedutil" governor aims at better integration with the Linux
-kernel scheduler. Load estimation is achieved through the scheduler's
-Per-Entity Load Tracking (PELT) mechanism, which also provides
-information about the recent load [1]. This governor currently does
-load based DVFS only for tasks managed by CFS. RT and DL scheduler tasks
-are always run at the highest frequency. Unlike all the other
-governors, the code is located under the kernel/sched/ directory.
-
-Sysfs files:
-
-* rate_limit_us:
-
- This contains a value in microseconds. The governor waits for
- rate_limit_us time before reevaluating the load again, after it has
- evaluated the load once.
-
-For an in-depth comparison with the other governors refer to [2].
-
-
-3. The Governor Interface in the CPUfreq Core
-=============================================
-
-A new governor must register itself with the CPUfreq core using
-"cpufreq_register_governor". The struct cpufreq_governor, which has to
-be passed to that function, must contain the following values:
-
-governor->name - A unique name for this governor.
-governor->owner - THIS_MODULE for the governor module (if appropriate).
-
-plus a set of hooks to the functions implementing the governor's logic.
-
-The CPUfreq governor may call the CPU processor driver using one of
-these two functions:
-
-int cpufreq_driver_target(struct cpufreq_policy *policy,
- unsigned int target_freq,
- unsigned int relation);
-
-int __cpufreq_driver_target(struct cpufreq_policy *policy,
- unsigned int target_freq,
- unsigned int relation);
-
-target_freq must be within policy->min and policy->max, of course.
-What's the difference between these two functions? When your governor is
-in a direct code path of a call to governor callbacks, like
-governor->start(), the policy->rwsem is still held in the cpufreq core,
-and there's no need to lock it again (in fact, this would cause a
-deadlock). So use __cpufreq_driver_target only in these cases. In all
-other cases (for example, when there's a "daemonized" function that
-wakes up every second), use cpufreq_driver_target to take policy->rwsem
-before the command is passed to the cpufreq driver.
-
-4. References
-=============
-
-[1] Per-entity load tracking: https://lwn.net/Articles/531853/
-[2] Improvements in CPU frequency management: https://lwn.net/Articles/682391/
-
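
The arithmetic behind two of the ondemand tunables described in the removed governors.txt can be sketched in Python as follows. This is only the simplified arithmetic stated in the text, not the kernel's implementation; the function names are hypothetical, and integer division is used to mirror the shell example in the sampling_rate section.

```python
def sampling_rate_us(transition_latency_ns: int, multiplier: int = 1000) -> int:
    """sampling_rate (us) = transition latency (ns -> us) * multiplier.

    With the default multiplier of 1000 the result is numerically equal
    to the latency in ns, as the text notes.
    """
    return transition_latency_ns * multiplier // 1000


def biased_target_khz(target_khz: int, powersave_bias: int) -> int:
    """Shave powersave_bias (percentage times 10, 0..1000) off the target."""
    if not 0 <= powersave_bias <= 1000:
        raise ValueError("powersave_bias must be between 0 and 1000")
    return target_khz - target_khz * powersave_bias // 1000


if __name__ == "__main__":
    # The example from the text: a bias of 100 (10%) on a 1000 MHz target.
    print(biased_target_khz(1_000_000, 100))
    # A latency of 10000 ns with the 750 multiplier from the shell example.
    print(sampling_rate_us(10_000, 750))
```

The first call reproduces the document's own example (1000 MHz reduced by 10% gives 900 MHz, expressed here in kHz).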
diff --git a/Documentation/cpu-freq/index.txt b/Documentation/cpu-freq/index.txt
index ef1d39247b05..03a7cee6ac73 100644
--- a/Documentation/cpu-freq/index.txt
+++ b/Documentation/cpu-freq/index.txt
@@ -21,8 +21,6 @@ Documents in this directory:
amd-powernow.txt - AMD powernow driver specific file.
-boost.txt - Frequency boosting support.
-
core.txt - General description of the CPUFreq core and
of CPUFreq notifiers.
@@ -32,17 +30,12 @@ cpufreq-nforce2.txt - nVidia nForce2 platform specific file.
cpufreq-stats.txt - General description of sysfs cpufreq stats.
-governors.txt - What are cpufreq governors and how to
- implement them?
-
index.txt - File index, Mailing list and Links (this document)
intel-pstate.txt - Intel pstate cpufreq driver specific file.
pcc-cpufreq.txt - PCC cpufreq driver specific file.
-user-guide.txt - User Guide to CPUFreq
-
Mailing List
------------
diff --git a/Documentation/cpu-freq/user-guide.txt b/Documentation/cpu-freq/user-guide.txt
deleted file mode 100644
index 391da64e9492..000000000000
--- a/Documentation/cpu-freq/user-guide.txt
+++ /dev/null
@@ -1,228 +0,0 @@
- CPU frequency and voltage scaling code in the Linux(TM) kernel
-
-
- L i n u x C P U F r e q
-
- U S E R G U I D E
-
-
- Dominik Brodowski <linux@brodo.de>
-
-
-
- Clock scaling allows you to change the clock speed of the CPUs on the
- fly. This is a nice method to save battery power, because the lower
- the clock speed, the less power the CPU consumes.
-
-
-Contents:
----------
-1. Supported Architectures and Processors
-1.1 ARM and ARM64
-1.2 x86
-1.3 sparc64
-1.4 ppc
-1.5 SuperH
-1.6 Blackfin
-
-2. "Policy" / "Governor"?
-2.1 Policy
-2.2 Governor
-
-3. How to change the CPU cpufreq policy and/or speed
-3.1 Preferred interface: sysfs
-
-
-
-1. Supported Architectures and Processors
-=========================================
-
-1.1 ARM and ARM64
------------------
-
-Almost all ARM and ARM64 platforms support CPU frequency scaling.
-
-1.2 x86
--------
-
-The following processors for the x86 architecture are supported by cpufreq:
-
-AMD Elan - SC400, SC410
-AMD mobile K6-2+
-AMD mobile K6-3+
-AMD mobile Duron
-AMD mobile Athlon
-AMD Opteron
-AMD Athlon 64
-Cyrix Media GXm
-Intel mobile PIII and Intel mobile PIII-M on certain chipsets
-Intel Pentium 4, Intel Xeon
-Intel Pentium M (Centrino)
-National Semiconductors Geode GX
-Transmeta Crusoe
-Transmeta Efficeon
-VIA Cyrix 3 / C3
-various processors on some ACPI 2.0-compatible systems [*]
-And many more
-
-[*] Only if "ACPI Processor Performance States" are available
-to the ACPI<->BIOS interface.
-
-
-1.3 sparc64
------------
-
-The following processors for the sparc64 architecture are supported by
-cpufreq:
-
-UltraSPARC-III
-
-
-1.4 ppc
--------
-
-Several "PowerBook" and "iBook2" notebooks are supported.
-The following POWER processors are supported in powernv mode:
-POWER8
-POWER9
-
-1.5 SuperH
-----------
-
-All SuperH processors supporting rate rounding through the clock
-framework are supported by cpufreq.
-
-1.6 Blackfin
-------------
-
-The following Blackfin processors are supported by cpufreq:
-
-BF522, BF523, BF524, BF525, BF526, BF527, Rev 0.1 or higher
-BF531, BF532, BF533, Rev 0.3 or higher
-BF534, BF536, BF537, Rev 0.2 or higher
-BF561, Rev 0.3 or higher
-BF542, BF544, BF547, BF548, BF549, Rev 0.1 or higher
-
-
-2. "Policy" / "Governor" ?
-==========================
-
-Some CPU frequency scaling-capable processors switch between various
-frequencies and operating voltages "on the fly" without any kernel or
-user involvement. This guarantees very fast switching to a frequency
-which is high enough to serve the user's needs, but low enough to save
-power.
-
-
-2.1 Policy
-----------
-
-On these systems, all you can do is select the lower and upper
-frequency limit as well as whether you want more aggressive
-power-saving or more instantly available processing power.
-
-
-2.2 Governor
-------------
-
-On all other cpufreq implementations, these boundaries still need to
-be set. Then, a "governor" must be selected. Such a "governor" decides
-what speed the processor shall run within the boundaries. One such
-"governor" is the "userspace" governor. This one allows the user - or
-a yet-to-implement userspace program - to decide what specific speed
-the processor shall run at.
-
-
-3. How to change the CPU cpufreq policy and/or speed
-====================================================
-
-3.1 Preferred Interface: sysfs
-------------------------------
-
-The preferred interface is located in the sysfs filesystem. If you
-mounted it at /sys, the cpufreq interface is located in a subdirectory
-"cpufreq" within the cpu-device directory
-(e.g. /sys/devices/system/cpu/cpu0/cpufreq/ for the first CPU).
-
-affected_cpus : List of Online CPUs that require software
- coordination of frequency.
-
-cpuinfo_cur_freq : Current frequency of the CPU as obtained from
- the hardware, in kHz. This is the frequency
- the CPU actually runs at.
-
-cpuinfo_min_freq : this file shows the minimum operating
- frequency the processor can run at (in kHz)
-
-cpuinfo_max_freq : this file shows the maximum operating
- frequency the processor can run at (in kHz)
-
-cpuinfo_transition_latency : The time it takes on this CPU to
- switch between two frequencies, in
- nanoseconds. If unknown or known to be
- so high that the driver does not
- work with the ondemand governor, -1
- (CPUFREQ_ETERNAL) will be returned.
- This information can be useful
- for choosing an appropriate polling
- frequency for a kernel governor or
- userspace daemon. Make sure not to
- switch the frequency too often, as
- that results in performance loss.
-
-related_cpus : List of Online + Offline CPUs that need software
- coordination of frequency.
-
-scaling_available_frequencies : List of available frequencies, in kHz.
-
-scaling_available_governors : this file shows the CPUfreq governors
- available in this kernel. The currently
- activated governor is shown in scaling_governor.
-
-scaling_cur_freq : Current frequency of the CPU as determined by
- the governor and cpufreq core, in kHz. This is
- the frequency the kernel thinks the CPU runs
- at.
-
-scaling_driver : this file shows what cpufreq driver is
- used to set the frequency on this CPU
-
-scaling_governor : this file shows the currently activated governor.
- By "echoing" the name of another governor
- into it, you can change it. Please note
- that some governors won't load - they only
- work on some specific architectures or processors.
-
-scaling_min_freq and
-scaling_max_freq show the current "policy limits" (in
- kHz). By echoing new values into these
- files, you can change these limits.
- NOTE: when setting a policy you need to
- first set scaling_max_freq, then
- scaling_min_freq.
-
-scaling_setspeed This can be read to get the currently programmed
- value by the governor. This can be written to
- change the current frequency for a group of
- CPUs, represented by a policy. This is supported
- currently only by the userspace governor.
-
-bios_limit : If the BIOS tells the OS to limit a CPU to
- lower frequencies, the user can read out the
- maximum available frequency from this file.
- This typically can happen through (often not
- intended) BIOS settings, restrictions
- triggered through a service processor or other
- BIOS/HW based implementations.
- This does not cover thermal ACPI limitations
- which can be detected through the generic
- thermal driver.
-
-If you have selected the "userspace" governor which allows you to
-set the CPU operating frequency to a specific value, you can read out
-the current frequency in
-
-scaling_setspeed. By "echoing" a new frequency into this
- you can change the speed of the CPU,
- but only within the limits of
- scaling_min_freq and scaling_max_freq.
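
The sysfs interface described in the removed user-guide.txt can be driven from a script. The Python sketch below follows the NOTE in the text by writing scaling_max_freq before scaling_min_freq. The helper names `read_khz` and `set_policy_limits` are hypothetical, the directory in the usage example is the one the text gives for the first CPU, and writing the real files normally requires root.

```python
from pathlib import Path


def read_khz(policy_dir: Path, name: str) -> int:
    """Read one of the frequency files (values are in kHz)."""
    return int((policy_dir / name).read_text().strip())


def set_policy_limits(policy_dir: Path, min_khz: int, max_khz: int) -> None:
    """Set the policy limits, writing the maximum first as the text advises."""
    if min_khz > max_khz:
        raise ValueError("min_khz must not exceed max_khz")
    (policy_dir / "scaling_max_freq").write_text(str(max_khz))
    (policy_dir / "scaling_min_freq").write_text(str(min_khz))


if __name__ == "__main__":
    cpu0 = Path("/sys/devices/system/cpu/cpu0/cpufreq")
    if cpu0.is_dir():
        print("hardware limits (kHz):",
              read_khz(cpu0, "cpuinfo_min_freq"),
              read_khz(cpu0, "cpuinfo_max_freq"))
    else:
        print("no cpufreq directory for cpu0")
```

The sketch does not check the requested values against cpuinfo_min_freq and cpuinfo_max_freq; a caller that needs that would read them first with `read_khz`.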