summaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/ABI/removed/raw1394_legacy_isochronous16
-rw-r--r--Documentation/DocBook/kernel-api.tmpl11
-rw-r--r--Documentation/block/barrier.txt16
-rw-r--r--Documentation/feature-removal-schedule.txt18
-rw-r--r--Documentation/kernel-parameters.txt43
-rw-r--r--Documentation/networking/00-INDEX3
-rw-r--r--Documentation/networking/sk98lin.txt568
-rw-r--r--Documentation/networking/spider_net.txt204
-rw-r--r--Documentation/power_supply_class.txt167
-rw-r--r--Documentation/sched-design-CFS.txt119
10 files changed, 520 insertions, 645 deletions
diff --git a/Documentation/ABI/removed/raw1394_legacy_isochronous b/Documentation/ABI/removed/raw1394_legacy_isochronous
new file mode 100644
index 000000000000..1b629622d883
--- /dev/null
+++ b/Documentation/ABI/removed/raw1394_legacy_isochronous
@@ -0,0 +1,16 @@
+What: legacy isochronous ABI of raw1394 (1st generation iso ABI)
+Date: June 2007 (scheduled), removed in kernel v2.6.23
+Contact: linux1394-devel@lists.sourceforge.net
+Description:
+ The two request types RAW1394_REQ_ISO_SEND, RAW1394_REQ_ISO_LISTEN have
+ been deprecated for quite some time. They are very inefficient as they
+ come with high interrupt load and several layers of callbacks for each
+ packet. Because of these deficiencies, the video1394 and dv1394 drivers
+ and the 3rd-generation isochronous ABI in raw1394 (rawiso) were created.
+
+Users:
+ libraw1394 users via the long deprecated API raw1394_iso_write,
+ raw1394_start_iso_write, raw1394_start_iso_rcv, raw1394_stop_iso_rcv
+
+ libdc1394, which optionally uses these old libraw1394 calls
+ alternatively to the more efficient video1394 ABI
diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl
index 38f88b6ae405..8c5698a8c2e1 100644
--- a/Documentation/DocBook/kernel-api.tmpl
+++ b/Documentation/DocBook/kernel-api.tmpl
@@ -643,4 +643,15 @@ X!Idrivers/video/console/fonts.c
!Edrivers/spi/spi.c
</chapter>
+ <chapter id="splice">
+ <title>splice API</title>
+ <para>)
+ splice is a method for moving blocks of data around inside the
+ kernel, without continually transferring it between the kernel
+ and user space.
+ </para>
+!Iinclude/linux/splice.h
+!Ffs/splice.c
+ </chapter>
+
</book>
diff --git a/Documentation/block/barrier.txt b/Documentation/block/barrier.txt
index a272c3db8094..7d279f2f5bb2 100644
--- a/Documentation/block/barrier.txt
+++ b/Documentation/block/barrier.txt
@@ -82,23 +82,12 @@ including draining and flushing.
typedef void (prepare_flush_fn)(request_queue_t *q, struct request *rq);
int blk_queue_ordered(request_queue_t *q, unsigned ordered,
- prepare_flush_fn *prepare_flush_fn,
- unsigned gfp_mask);
-
-int blk_queue_ordered_locked(request_queue_t *q, unsigned ordered,
- prepare_flush_fn *prepare_flush_fn,
- unsigned gfp_mask);
-
-The only difference between the two functions is whether or not the
-caller is holding q->queue_lock on entry. The latter expects the
-caller is holding the lock.
+ prepare_flush_fn *prepare_flush_fn);
@q : the queue in question
@ordered : the ordered mode the driver/device supports
@prepare_flush_fn : this function should prepare @rq such that it
flushes cache to physical medium when executed
-@gfp_mask : gfp_mask used when allocating data structures
- for ordered processing
For example, SCSI disk driver's prepare_flush_fn looks like the
following.
@@ -106,9 +95,10 @@ following.
static void sd_prepare_flush(request_queue_t *q, struct request *rq)
{
memset(rq->cmd, 0, sizeof(rq->cmd));
- rq->flags |= REQ_BLOCK_PC;
+ rq->cmd_type = REQ_TYPE_BLOCK_PC;
rq->timeout = SD_TIMEOUT;
rq->cmd[0] = SYNCHRONIZE_CACHE;
+ rq->cmd_len = 10;
}
The following seven ordered modes are supported. The following table
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 7d3f205b0ba5..3a159dac04f5 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -49,16 +49,6 @@ Who: Adrian Bunk <bunk@stusta.de>
---------------------------
-What: raw1394: requests of type RAW1394_REQ_ISO_SEND, RAW1394_REQ_ISO_LISTEN
-When: June 2007
-Why: Deprecated in favour of the more efficient and robust rawiso interface.
- Affected are applications which use the deprecated part of libraw1394
- (raw1394_iso_write, raw1394_start_iso_write, raw1394_start_iso_rcv,
- raw1394_stop_iso_rcv) or bypass libraw1394.
-Who: Dan Dennedy <dan@dennedy.org>, Stefan Richter <stefanr@s5r6.in-berlin.de>
-
----------------------------
-
What: old NCR53C9x driver
When: October 2007
Why: Replaced by the much better esp_scsi driver. Actual low-level
@@ -258,14 +248,6 @@ Who: Len Brown <len.brown@intel.com>
---------------------------
-What: sk98lin network driver
-When: July 2007
-Why: In kernel tree version of driver is unmaintained. Sk98lin driver
- replaced by the skge driver.
-Who: Stephen Hemminger <shemminger@osdl.org>
-
----------------------------
-
What: Compaq touchscreen device emulation
When: Oct 2007
Files: drivers/input/tsdev.c
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index af50f9bbe68e..4d880b3d1f35 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1014,49 +1014,6 @@ and is between 256 and 4096 characters. It is defined in the file
mga= [HW,DRM]
- migration_cost=
- [KNL,SMP] debug: override scheduler migration costs
- Format: <level-1-usecs>,<level-2-usecs>,...
- This debugging option can be used to override the
- default scheduler migration cost matrix. The numbers
- are indexed by 'CPU domain distance'.
- E.g. migration_cost=1000,2000,3000 on an SMT NUMA
- box will set up an intra-core migration cost of
- 1 msec, an inter-core migration cost of 2 msecs,
- and an inter-node migration cost of 3 msecs.
-
- WARNING: using the wrong values here can break
- scheduler performance, so it's only for scheduler
- development purposes, not production environments.
-
- migration_debug=
- [KNL,SMP] migration cost auto-detect verbosity
- Format=<0|1|2>
- If a system's migration matrix reported at bootup
- seems erroneous then this option can be used to
- increase verbosity of the detection process.
- We default to 0 (no extra messages), 1 will print
- some more information, and 2 will be really
- verbose (probably only useful if you also have a
- serial console attached to the system).
-
- migration_factor=
- [KNL,SMP] multiply/divide migration costs by a factor
- Format=<percent>
- This debug option can be used to proportionally
- increase or decrease the auto-detected migration
- costs for all entries of the migration matrix.
- E.g. migration_factor=150 will increase migration
- costs by 50%. (and thus the scheduler will be less
- eager migrating cache-hot tasks)
- migration_factor=80 will decrease migration costs
- by 20%. (thus the scheduler will be more eager to
- migrate tasks)
-
- WARNING: using the wrong values here can break
- scheduler performance, so it's only for scheduler
- development purposes, not production environments.
-
mousedev.tap_time=
[MOUSE] Maximum time between finger touching and
leaving touchpad surface for touch to be considered
diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX
index 153d84d281e6..d63f480afb74 100644
--- a/Documentation/networking/00-INDEX
+++ b/Documentation/networking/00-INDEX
@@ -96,9 +96,6 @@ routing.txt
- the new routing mechanism
shaper.txt
- info on the module that can shape/limit transmitted traffic.
-sk98lin.txt
- - Marvell Yukon Chipset / SysKonnect SK-98xx compliant Gigabit
- Ethernet Adapter family driver info
skfp.txt
- SysKonnect FDDI (SK-5xxx, Compaq Netelligent) driver info.
smc9.txt
diff --git a/Documentation/networking/sk98lin.txt b/Documentation/networking/sk98lin.txt
deleted file mode 100644
index 8590a954df1d..000000000000
--- a/Documentation/networking/sk98lin.txt
+++ /dev/null
@@ -1,568 +0,0 @@
-(C)Copyright 1999-2004 Marvell(R).
-All rights reserved
-===========================================================================
-
-sk98lin.txt created 13-Feb-2004
-
-Readme File for sk98lin v6.23
-Marvell Yukon/SysKonnect SK-98xx Gigabit Ethernet Adapter family driver for LINUX
-
-This file contains
- 1 Overview
- 2 Required Files
- 3 Installation
- 3.1 Driver Installation
- 3.2 Inclusion of adapter at system start
- 4 Driver Parameters
- 4.1 Per-Port Parameters
- 4.2 Adapter Parameters
- 5 Large Frame Support
- 6 VLAN and Link Aggregation Support (IEEE 802.1, 802.1q, 802.3ad)
- 7 Troubleshooting
-
-===========================================================================
-
-
-1 Overview
-===========
-
-The sk98lin driver supports the Marvell Yukon and SysKonnect
-SK-98xx/SK-95xx compliant Gigabit Ethernet Adapter on Linux. It has
-been tested with Linux on Intel/x86 machines.
-***
-
-
-2 Required Files
-=================
-
-The linux kernel source.
-No additional files required.
-***
-
-
-3 Installation
-===============
-
-It is recommended to download the latest version of the driver from the
-SysKonnect web site www.syskonnect.com. If you have downloaded the latest
-driver, the Linux kernel has to be patched before the driver can be
-installed. For details on how to patch a Linux kernel, refer to the
-patch.txt file.
-
-3.1 Driver Installation
-------------------------
-
-The following steps describe the actions that are required to install
-the driver and to start it manually. These steps should be carried
-out for the initial driver setup. Once confirmed to be ok, they can
-be included in the system start.
-
-NOTE 1: To perform the following tasks you need 'root' access.
-
-NOTE 2: In case of problems, please read the section "Troubleshooting"
- below.
-
-The driver can either be integrated into the kernel or it can be compiled
-as a module. Select the appropriate option during the kernel
-configuration.
-
-Compile/use the driver as a module
-----------------------------------
-To compile the driver, go to the directory /usr/src/linux and
-execute the command "make menuconfig" or "make xconfig" and proceed as
-follows:
-
-To integrate the driver permanently into the kernel, proceed as follows:
-
-1. Select the menu "Network device support" and then "Ethernet(1000Mbit)"
-2. Mark "Marvell Yukon Chipset / SysKonnect SK-98xx family support"
- with (*)
-3. Build a new kernel when the configuration of the above options is
- finished.
-4. Install the new kernel.
-5. Reboot your system.
-
-To use the driver as a module, proceed as follows:
-
-1. Enable 'loadable module support' in the kernel.
-2. For automatic driver start, enable the 'Kernel module loader'.
-3. Select the menu "Network device support" and then "Ethernet(1000Mbit)"
-4. Mark "Marvell Yukon Chipset / SysKonnect SK-98xx family support"
- with (M)
-5. Execute the command "make modules".
-6. Execute the command "make modules_install".
- The appropriate modules will be installed.
-7. Reboot your system.
-
-
-Load the module manually
-------------------------
-To load the module manually, proceed as follows:
-
-1. Enter "modprobe sk98lin".
-2. If a Marvell Yukon or SysKonnect SK-98xx adapter is installed in
- your computer and you have a /proc file system, execute the command:
- "ls /proc/net/sk98lin/"
- This should produce an output containing a line with the following
- format:
- eth0 eth1 ...
- which indicates that your adapter has been found and initialized.
-
- NOTE 1: If you have more than one Marvell Yukon or SysKonnect SK-98xx
- adapter installed, the adapters will be listed as 'eth0',
- 'eth1', 'eth2', etc.
- For each adapter, repeat steps 3 and 4 below.
-
- NOTE 2: If you have other Ethernet adapters installed, your Marvell
- Yukon or SysKonnect SK-98xx adapter will be mapped to the
- next available number, e.g. 'eth1'. The mapping is executed
- automatically.
- The module installation message (displayed either in a system
- log file or on the console) prints a line for each adapter
- found containing the corresponding 'ethX'.
-
-3. Select an IP address and assign it to the respective adapter by
- entering:
- ifconfig eth0 <ip-address>
- With this command, the adapter is connected to the Ethernet.
-
- SK-98xx Gigabit Ethernet Server Adapters: The yellow LED on the adapter
- is now active, the link status LED of the primary port is active and
- the link status LED of the secondary port (on dual port adapters) is
- blinking (if the ports are connected to a switch or hub).
- SK-98xx V2.0 Gigabit Ethernet Adapters: The link status LED is active.
- In addition, you will receive a status message on the console stating
- "ethX: network connection up using port Y" and showing the selected
- connection parameters (x stands for the ethernet device number
- (0,1,2, etc), y stands for the port name (A or B)).
-
- NOTE: If you are in doubt about IP addresses, ask your network
- administrator for assistance.
-
-4. Your adapter should now be fully operational.
- Use 'ping <otherstation>' to verify the connection to other computers
- on your network.
-5. To check the adapter configuration view /proc/net/sk98lin/[devicename].
- For example by executing:
- "cat /proc/net/sk98lin/eth0"
-
-Unload the module
------------------
-To stop and unload the driver modules, proceed as follows:
-
-1. Execute the command "ifconfig eth0 down".
-2. Execute the command "rmmod sk98lin".
-
-3.2 Inclusion of adapter at system start
------------------------------------------
-
-Since a large number of different Linux distributions are
-available, we are unable to describe a general installation procedure
-for the driver module.
-Because the driver is now integrated in the kernel, installation should
-be easy, using the standard mechanism of your distribution.
-Refer to the distribution's manual for installation of ethernet adapters.
-
-***
-
-4 Driver Parameters
-====================
-
-Parameters can be set at the command line after the module has been
-loaded with the command 'modprobe'.
-In some distributions, the configuration tools are able to pass parameters
-to the driver module.
-
-If you use the kernel module loader, you can set driver parameters
-in the file /etc/modprobe.conf (or /etc/modules.conf in 2.4 or earlier).
-To set the driver parameters in this file, proceed as follows:
-
-1. Insert a line of the form :
- options sk98lin ...
- For "...", the same syntax is required as described for the command
- line parameters of modprobe below.
-2. To activate the new parameters, either reboot your computer
- or
- unload and reload the driver.
- The syntax of the driver parameters is:
-
- modprobe sk98lin parameter=value1[,value2[,value3...]]
-
- where value1 refers to the first adapter, value2 to the second etc.
-
-NOTE: All parameters are case sensitive. Write them exactly as shown
- below.
-
-Example:
-Suppose you have two adapters. You want to set auto-negotiation
-on the first adapter to ON and on the second adapter to OFF.
-You also want to set DuplexCapabilities on the first adapter
-to FULL, and on the second adapter to HALF.
-Then, you must enter:
-
- modprobe sk98lin AutoNeg_A=On,Off DupCap_A=Full,Half
-
-NOTE: The number of adapters that can be configured this way is
- limited in the driver (file skge.c, constant SK_MAX_CARD_PARAM).
- The current limit is 16. If you happen to install
- more adapters, adjust this and recompile.
-
-
-4.1 Per-Port Parameters
-------------------------
-
-These settings are available for each port on the adapter.
-In the following description, '?' stands for the port for
-which you set the parameter (A or B).
-
-Speed
------
-Parameter: Speed_?
-Values: 10, 100, 1000, Auto
-Default: Auto
-
-This parameter is used to set the speed capabilities. It is only valid
-for the SK-98xx V2.0 copper adapters.
-Usually, the speed is negotiated between the two ports during link
-establishment. If this fails, a port can be forced to a specific setting
-with this parameter.
-
-Auto-Negotiation
-----------------
-Parameter: AutoNeg_?
-Values: On, Off, Sense
-Default: On
-
-The "Sense"-mode automatically detects whether the link partner supports
-auto-negotiation or not.
-
-Duplex Capabilities
--------------------
-Parameter: DupCap_?
-Values: Half, Full, Both
-Default: Both
-
-This parameters is only relevant if auto-negotiation for this port is
-not set to "Sense". If auto-negotiation is set to "On", all three values
-are possible. If it is set to "Off", only "Full" and "Half" are allowed.
-This parameter is useful if your link partner does not support all
-possible combinations.
-
-Flow Control
-------------
-Parameter: FlowCtrl_?
-Values: Sym, SymOrRem, LocSend, None
-Default: SymOrRem
-
-This parameter can be used to set the flow control capabilities the
-port reports during auto-negotiation. It can be set for each port
-individually.
-Possible modes:
- -- Sym = Symmetric: both link partners are allowed to send
- PAUSE frames
- -- SymOrRem = SymmetricOrRemote: both or only remote partner
- are allowed to send PAUSE frames
- -- LocSend = LocalSend: only local link partner is allowed
- to send PAUSE frames
- -- None = no link partner is allowed to send PAUSE frames
-
-NOTE: This parameter is ignored if auto-negotiation is set to "Off".
-
-Role in Master-Slave-Negotiation (1000Base-T only)
---------------------------------------------------
-Parameter: Role_?
-Values: Auto, Master, Slave
-Default: Auto
-
-This parameter is only valid for the SK-9821 and SK-9822 adapters.
-For two 1000Base-T ports to communicate, one must take the role of the
-master (providing timing information), while the other must be the
-slave. Usually, this is negotiated between the two ports during link
-establishment. If this fails, a port can be forced to a specific setting
-with this parameter.
-
-
-4.2 Adapter Parameters
------------------------
-
-Connection Type (SK-98xx V2.0 copper adapters only)
----------------
-Parameter: ConType
-Values: Auto, 100FD, 100HD, 10FD, 10HD
-Default: Auto
-
-The parameter 'ConType' is a combination of all five per-port parameters
-within one single parameter. This simplifies the configuration of both ports
-of an adapter card! The different values of this variable reflect the most
-meaningful combinations of port parameters.
-
-The following table shows the values of 'ConType' and the corresponding
-combinations of the per-port parameters:
-
- ConType | DupCap AutoNeg FlowCtrl Role Speed
- ----------+------------------------------------------------------
- Auto | Both On SymOrRem Auto Auto
- 100FD | Full Off None Auto (ignored) 100
- 100HD | Half Off None Auto (ignored) 100
- 10FD | Full Off None Auto (ignored) 10
- 10HD | Half Off None Auto (ignored) 10
-
-Stating any other port parameter together with this 'ConType' variable
-will result in a merged configuration of those settings. This due to
-the fact, that the per-port parameters (e.g. Speed_? ) have a higher
-priority than the combined variable 'ConType'.
-
-NOTE: This parameter is always used on both ports of the adapter card.
-
-Interrupt Moderation
---------------------
-Parameter: Moderation
-Values: None, Static, Dynamic
-Default: None
-
-Interrupt moderation is employed to limit the maximum number of interrupts
-the driver has to serve. That is, one or more interrupts (which indicate any
-transmit or receive packet to be processed) are queued until the driver
-processes them. When queued interrupts are to be served, is determined by the
-'IntsPerSec' parameter, which is explained later below.
-
-Possible modes:
-
- -- None - No interrupt moderation is applied on the adapter card.
- Therefore, each transmit or receive interrupt is served immediately
- as soon as it appears on the interrupt line of the adapter card.
-
- -- Static - Interrupt moderation is applied on the adapter card.
- All transmit and receive interrupts are queued until a complete
- moderation interval ends. If such a moderation interval ends, all
- queued interrupts are processed in one big bunch without any delay.
- The term 'static' reflects the fact, that interrupt moderation is
- always enabled, regardless how much network load is currently
- passing via a particular interface. In addition, the duration of
- the moderation interval has a fixed length that never changes while
- the driver is operational.
-
- -- Dynamic - Interrupt moderation might be applied on the adapter card,
- depending on the load of the system. If the driver detects that the
- system load is too high, the driver tries to shield the system against
- too much network load by enabling interrupt moderation. If - at a later
- time - the CPU utilization decreases again (or if the network load is
- negligible) the interrupt moderation will automatically be disabled.
-
-Interrupt moderation should be used when the driver has to handle one or more
-interfaces with a high network load, which - as a consequence - leads also to a
-high CPU utilization. When moderation is applied in such high network load
-situations, CPU load might be reduced by 20-30%.
-
-NOTE: The drawback of using interrupt moderation is an increase of the round-
-trip-time (RTT), due to the queueing and serving of interrupts at dedicated
-moderation times.
-
-Interrupts per second
----------------------
-Parameter: IntsPerSec
-Values: 30...40000 (interrupts per second)
-Default: 2000
-
-This parameter is only used if either static or dynamic interrupt moderation
-is used on a network adapter card. Using this parameter if no moderation is
-applied will lead to no action performed.
-
-This parameter determines the length of any interrupt moderation interval.
-Assuming that static interrupt moderation is to be used, an 'IntsPerSec'
-parameter value of 2000 will lead to an interrupt moderation interval of
-500 microseconds.
-
-NOTE: The duration of the moderation interval is to be chosen with care.
-At first glance, selecting a very long duration (e.g. only 100 interrupts per
-second) seems to be meaningful, but the increase of packet-processing delay
-is tremendous. On the other hand, selecting a very short moderation time might
-compensate the use of any moderation being applied.
-
-
-Preferred Port
---------------
-Parameter: PrefPort
-Values: A, B
-Default: A
-
-This is used to force the preferred port to A or B (on dual-port network
-adapters). The preferred port is the one that is used if both are detected
-as fully functional.
-
-RLMT Mode (Redundant Link Management Technology)
-------------------------------------------------
-Parameter: RlmtMode
-Values: CheckLinkState,CheckLocalPort, CheckSeg, DualNet
-Default: CheckLinkState
-
-RLMT monitors the status of the port. If the link of the active port
-fails, RLMT switches immediately to the standby link. The virtual link is
-maintained as long as at least one 'physical' link is up.
-
-Possible modes:
-
- -- CheckLinkState - Check link state only: RLMT uses the link state
- reported by the adapter hardware for each individual port to
- determine whether a port can be used for all network traffic or
- not.
-
- -- CheckLocalPort - In this mode, RLMT monitors the network path
- between the two ports of an adapter by regularly exchanging packets
- between them. This mode requires a network configuration in which
- the two ports are able to "see" each other (i.e. there must not be
- any router between the ports).
-
- -- CheckSeg - Check local port and segmentation: This mode supports the
- same functions as the CheckLocalPort mode and additionally checks
- network segmentation between the ports. Therefore, this mode is only
- to be used if Gigabit Ethernet switches are installed on the network
- that have been configured to use the Spanning Tree protocol.
-
- -- DualNet - In this mode, ports A and B are used as separate devices.
- If you have a dual port adapter, port A will be configured as eth0
- and port B as eth1. Both ports can be used independently with
- distinct IP addresses. The preferred port setting is not used.
- RLMT is turned off.
-
-NOTE: RLMT modes CLP and CLPSS are designed to operate in configurations
- where a network path between the ports on one adapter exists.
- Moreover, they are not designed to work where adapters are connected
- back-to-back.
-***
-
-
-5 Large Frame Support
-======================
-
-The driver supports large frames (also called jumbo frames). Using large
-frames can result in an improved throughput if transferring large amounts
-of data.
-To enable large frames, set the MTU (maximum transfer unit) of the
-interface to the desired value (up to 9000), execute the following
-command:
- ifconfig eth0 mtu 9000
-This will only work if you have two adapters connected back-to-back
-or if you use a switch that supports large frames. When using a switch,
-it should be configured to allow large frames and auto-negotiation should
-be set to OFF. The setting must be configured on all adapters that can be
-reached by the large frames. If one adapter is not set to receive large
-frames, it will simply drop them.
-
-You can switch back to the standard ethernet frame size by executing the
-following command:
- ifconfig eth0 mtu 1500
-
-To permanently configure this setting, add a script with the 'ifconfig'
-line to the system startup sequence (named something like "S99sk98lin"
-in /etc/rc.d/rc2.d).
-***
-
-
-6 VLAN and Link Aggregation Support (IEEE 802.1, 802.1q, 802.3ad)
-==================================================================
-
-The Marvell Yukon/SysKonnect Linux drivers are able to support VLAN and
-Link Aggregation according to IEEE standards 802.1, 802.1q, and 802.3ad.
-These features are only available after installation of open source
-modules available on the Internet:
-For VLAN go to: http://www.candelatech.com/~greear/vlan.html
-For Link Aggregation go to: http://www.st.rim.or.jp/~yumo
-
-NOTE: SysKonnect GmbH does not offer any support for these open source
- modules and does not take the responsibility for any kind of
- failures or problems arising in connection with these modules.
-
-NOTE: Configuring Link Aggregation on a SysKonnect dual link adapter may
- cause problems when unloading the driver.
-
-
-7 Troubleshooting
-==================
-
-If any problems occur during the installation process, check the
-following list:
-
-
-Problem: The SK-98xx adapter cannot be found by the driver.
-Solution: In /proc/pci search for the following entry:
- 'Ethernet controller: SysKonnect SK-98xx ...'
- If this entry exists, the SK-98xx or SK-98xx V2.0 adapter has
- been found by the system and should be operational.
- If this entry does not exist or if the file '/proc/pci' is not
- found, there may be a hardware problem or the PCI support may
- not be enabled in your kernel.
- The adapter can be checked using the diagnostics program which
- is available on the SysKonnect web site:
- www.syskonnect.com
-
- Some COMPAQ machines have problems dealing with PCI under Linux.
- This problem is described in the 'PCI howto' document
- (included in some distributions or available from the
- web, e.g. at 'www.linux.org').
-
-
-Problem: Programs such as 'ifconfig' or 'route' cannot be found or the
- error message 'Operation not permitted' is displayed.
-Reason: You are not logged in as user 'root'.
-Solution: Logout and login as 'root' or change to 'root' via 'su'.
-
-
-Problem: Upon use of the command 'ping <address>' the message
- "ping: sendto: Network is unreachable" is displayed.
-Reason: Your route is not set correctly.
-Solution: If you are using RedHat, you probably forgot to set up the
- route in the 'network configuration'.
- Check the existing routes with the 'route' command and check
- if an entry for 'eth0' exists, and if so, if it is set correctly.
-
-
-Problem: The driver can be started, the adapter is connected to the
- network, but you cannot receive or transmit any packets;
- e.g. 'ping' does not work.
-Reason: There is an incorrect route in your routing table.
-Solution: Check the routing table with the command 'route' and read the
- manual help pages dealing with routes (enter 'man route').
-
-NOTE: Although the 2.2.x kernel versions generate the routing entry
- automatically, problems of this kind may occur here as well. We've
- come across a situation in which the driver started correctly at
- system start, but after the driver has been removed and reloaded,
- the route of the adapter's network pointed to the 'dummy0'device
- and had to be corrected manually.
-
-
-Problem: Your computer should act as a router between multiple
- IP subnetworks (using multiple adapters), but computers in
- other subnetworks cannot be reached.
-Reason: Either the router's kernel is not configured for IP forwarding
- or the routing table and gateway configuration of at least one
- computer is not working.
-
-Problem: Upon driver start, the following error message is displayed:
- "eth0: -- ERROR --
- Class: internal Software error
- Nr: 0xcc
- Msg: SkGeInitPort() cannot init running ports"
-Reason: You are using a driver compiled for single processor machines
- on a multiprocessor machine with SMP (Symmetric MultiProcessor)
- kernel.
-Solution: Configure your kernel appropriately and recompile the kernel or
- the modules.
-
-
-
-If your problem is not listed here, please contact SysKonnect's technical
-support for help (linux@syskonnect.de).
-When contacting our technical support, please ensure that the following
-information is available:
-- System Manufacturer and HW Informations (CPU, Memory... )
-- PCI-Boards in your system
-- Distribution
-- Kernel version
-- Driver version
-***
-
-
-
-***End of Readme File***
diff --git a/Documentation/networking/spider_net.txt b/Documentation/networking/spider_net.txt
new file mode 100644
index 000000000000..4b4adb8eb14f
--- /dev/null
+++ b/Documentation/networking/spider_net.txt
@@ -0,0 +1,204 @@
+
+ The Spidernet Device Driver
+ ===========================
+
+Written by Linas Vepstas <linas@austin.ibm.com>
+
+Version of 7 June 2007
+
+Abstract
+========
+This document sketches the structure of portions of the spidernet
+device driver in the Linux kernel tree. The spidernet is a gigabit
+ethernet device built into the Toshiba southbridge commonly used
+in the SONY Playstation 3 and the IBM QS20 Cell blade.
+
+The Structure of the RX Ring.
+=============================
+The receive (RX) ring is a circular linked list of RX descriptors,
+together with three pointers into the ring that are used to manage its
+contents.
+
+The elements of the ring are called "descriptors" or "descrs"; they
+describe the received data. This includes a pointer to a buffer
+containing the received data, the buffer size, and various status bits.
+
+There are three primary states that a descriptor can be in: "empty",
+"full" and "not-in-use". An "empty" or "ready" descriptor is ready
+to receive data from the hardware. A "full" descriptor has data in it,
+and is waiting to be emptied and processed by the OS. A "not-in-use"
+descriptor is neither empty or full; it is simply not ready. It may
+not even have a data buffer in it, or is otherwise unusable.
+
+During normal operation, on device startup, the OS (specifically, the
+spidernet device driver) allocates a set of RX descriptors and RX
+buffers. These are all marked "empty", ready to receive data. This
+ring is handed off to the hardware, which sequentially fills in the
+buffers, and marks them "full". The OS follows up, taking the full
+buffers, processing them, and re-marking them empty.
+
+This filling and emptying is managed by three pointers, the "head"
+and "tail" pointers, managed by the OS, and a hardware current
+descriptor pointer (GDACTDPA). The GDACTDPA points at the descr
+currently being filled. When this descr is filled, the hardware
+marks it full, and advances the GDACTDPA by one. Thus, when there is
+flowing RX traffic, every descr behind it should be marked "full",
+and everything in front of it should be "empty". If the hardware
+discovers that the current descr is not empty, it will signal an
+interrupt, and halt processing.
+
+The tail pointer tails or trails the hardware pointer. When the
+hardware is ahead, the tail pointer will be pointing at a "full"
+descr. The OS will process this descr, and then mark it "not-in-use",
+and advance the tail pointer. Thus, when there is flowing RX traffic,
+all of the descrs in front of the tail pointer should be "full", and
+all of those behind it should be "not-in-use". When RX traffic is not
+flowing, then the tail pointer can catch up to the hardware pointer.
+The OS will then note that the current tail is "empty", and halt
+processing.
+
+The head pointer (somewhat mis-named) follows after the tail pointer.
+When traffic is flowing, then the head pointer will be pointing at
+a "not-in-use" descr. The OS will perform various housekeeping duties
+on this descr. This includes allocating a new data buffer and
+dma-mapping it so as to make it visible to the hardware. The OS will
+then mark the descr as "empty", ready to receive data. Thus, when there
+is flowing RX traffic, everything in front of the head pointer should
+be "not-in-use", and everything behind it should be "empty". If no
+RX traffic is flowing, then the head pointer can catch up to the tail
+pointer, at which point the OS will notice that the head descr is
+"empty", and it will halt processing.
+
+Thus, in an idle system, the GDACTDPA, tail and head pointers will
+all be pointing at the same descr, which should be "empty". All of the
+other descrs in the ring should be "empty" as well.
+
+The show_rx_chain() routine will print out the the locations of the
+GDACTDPA, tail and head pointers. It will also summarize the contents
+of the ring, starting at the tail pointer, and listing the status
+of the descrs that follow.
+
+A typical example of the output, for a nearly idle system, might be
+
+net eth1: Total number of descrs=256
+net eth1: Chain tail located at descr=20
+net eth1: Chain head is at 20
+net eth1: HW curr desc (GDACTDPA) is at 21
+net eth1: Have 1 descrs with stat=x40800101
+net eth1: HW next desc (GDACNEXTDA) is at 22
+net eth1: Last 255 descrs with stat=xa0800000
+
+In the above, the hardware has filled in one descr, number 20. Both
+head and tail are pointing at 20, because it has not yet been emptied.
+Meanwhile, hw is pointing at 21, which is free.
+
+The "Have nnn decrs" refers to the descr starting at the tail: in this
+case, nnn=1 descr, starting at descr 20. The "Last nnn descrs" refers
+to all of the rest of the descrs, from the last status change. The "nnn"
+is a count of how many descrs have exactly the same status.
+
+The status x4... corresponds to "full" and status xa... corresponds
+to "empty". The actual value printed is RXCOMST_A.
+
+In the device driver source code, a different set of names are
+used for these same concepts, so that
+
+"empty" == SPIDER_NET_DESCR_CARDOWNED == 0xa
+"full" == SPIDER_NET_DESCR_FRAME_END == 0x4
+"not in use" == SPIDER_NET_DESCR_NOT_IN_USE == 0xf
+
+
+The RX RAM full bug/feature
+===========================
+
+As long as the OS can empty out the RX buffers at a rate faster than
+the hardware can fill them, there is no problem. If, for some reason,
+the OS fails to empty the RX ring fast enough, the hardware GDACTDPA
+pointer will catch up to the head, notice the not-empty condition,
+ad stop. However, RX packets may still continue arriving on the wire.
+The spidernet chip can save some limited number of these in local RAM.
+When this local ram fills up, the spider chip will issue an interrupt
+indicating this (GHIINT0STS will show ERRINT, and the GRMFLLINT bit
+will be set in GHIINT1STS). When the RX ram full condition occurs,
+a certain bug/feature is triggered that has to be specially handled.
+This section describes the special handling for this condition.
+
+When the OS finally has a chance to run, it will empty out the RX ring.
+In particular, it will clear the descriptor on which the hardware had
+stopped. However, once the hardware has decided that a certain
+descriptor is invalid, it will not restart at that descriptor; instead
+it will restart at the next descr. This potentially will lead to a
+deadlock condition, as the tail pointer will be pointing at this descr,
+which, from the OS point of view, is empty; the OS will be waiting for
+this descr to be filled. However, the hardware has skipped this descr,
+and is filling the next descrs. Since the OS doesn't see this, there
+is a potential deadlock, with the OS waiting for one descr to fill,
+while the hardware is waiting for a different set of descrs to become
+empty.
+
+A call to show_rx_chain() at this point indicates the nature of the
+problem. A typical print when the network is hung shows the following:
+
+net eth1: Spider RX RAM full, incoming packets might be discarded!
+net eth1: Total number of descrs=256
+net eth1: Chain tail located at descr=255
+net eth1: Chain head is at 255
+net eth1: HW curr desc (GDACTDPA) is at 0
+net eth1: Have 1 descrs with stat=xa0800000
+net eth1: HW next desc (GDACNEXTDA) is at 1
+net eth1: Have 127 descrs with stat=x40800101
+net eth1: Have 1 descrs with stat=x40800001
+net eth1: Have 126 descrs with stat=x40800101
+net eth1: Last 1 descrs with stat=xa0800000
+
+Both the tail and head pointers are pointing at descr 255, which is
+marked xa... which is "empty". Thus, from the OS point of view, there
+is nothing to be done. In particular, there is the implicit assumption
+that everything in front of the "empty" descr must surely also be empty,
+as explained in the last section. The OS is waiting for descr 255 to
+become non-empty, which, in this case, will never happen.
+
+The HW pointer is at descr 0. This descr is marked 0x4.. or "full".
+Since its already full, the hardware can do nothing more, and thus has
+halted processing. Notice that descrs 0 through 254 are all marked
+"full", while descr 254 and 255 are empty. (The "Last 1 descrs" is
+descr 254, since tail was at 255.) Thus, the system is deadlocked,
+and there can be no forward progress; the OS thinks there's nothing
+to do, and the hardware has nowhere to put incoming data.
+
+This bug/feature is worked around with the spider_net_resync_head_ptr()
+routine. When the driver receives RX interrupts, but an examination
+of the RX chain seems to show it is empty, then it is probable that
+the hardware has skipped a descr or two (sometimes dozens under heavy
+network conditions). The spider_net_resync_head_ptr() subroutine will
+search the ring for the next full descr, and the driver will resume
+operations there. Since this will leave "holes" in the ring, there
+is also a spider_net_resync_tail_ptr() that will skip over such holes.
+
+As of this writing, the spider_net_resync() strategy seems to work very
+well, even under heavy network loads.
+
+
+The TX ring
+===========
+The TX ring uses a low-watermark interrupt scheme to make sure that
+the TX queue is appropriately serviced for large packet sizes.
+
+For packet sizes greater than about 1KBytes, the kernel can fill
+the TX ring quicker than the device can drain it. Once the ring
+is full, the netdev is stopped. When there is room in the ring,
+the netdev needs to be reawakened, so that more TX packets are placed
+in the ring. The hardware can empty the ring about four times per jiffy,
+so its not appropriate to wait for the poll routine to refill, since
+the poll routine runs only once per jiffy. The low-watermark mechanism
+marks a descr about 1/4th of the way from the bottom of the queue, so
+that an interrupt is generated when the descr is processed. This
+interrupt wakes up the netdev, which can then refill the queue.
+For large packets, this mechanism generates a relatively small number
+of interrupts, about 1K/sec. For smaller packets, this will drop to zero
+interrupts, as the hardware can empty the queue faster than the kernel
+can fill it.
+
+
+ ======= END OF DOCUMENT ========
+
diff --git a/Documentation/power_supply_class.txt b/Documentation/power_supply_class.txt
new file mode 100644
index 000000000000..9758cf433c06
--- /dev/null
+++ b/Documentation/power_supply_class.txt
@@ -0,0 +1,167 @@
+Linux power supply class
+========================
+
+Synopsis
+~~~~~~~~
+Power supply class used to represent battery, UPS, AC or DC power supply
+properties to user-space.
+
+It defines core set of attributes, which should be applicable to (almost)
+every power supply out there. Attributes are available via sysfs and uevent
+interfaces.
+
+Each attribute has well defined meaning, up to unit of measure used. While
+the attributes provided are believed to be universally applicable to any
+power supply, specific monitoring hardware may not be able to provide them
+all, so any of them may be skipped.
+
+Power supply class is extensible, and allows to define drivers own attributes.
+The core attribute set is subject to the standard Linux evolution (i.e.
+if it will be found that some attribute is applicable to many power supply
+types or their drivers, it can be added to the core set).
+
+It also integrates with LED framework, for the purpose of providing
+typically expected feedback of battery charging/fully charged status and
+AC/USB power supply online status. (Note that specific details of the
+indication (including whether to use it at all) are fully controllable by
+user and/or specific machine defaults, per design principles of LED
+framework).
+
+
+Attributes/properties
+~~~~~~~~~~~~~~~~~~~~~
+Power supply class has predefined set of attributes, this eliminates code
+duplication across drivers. Power supply class insist on reusing its
+predefined attributes *and* their units.
+
+So, userspace gets predictable set of attributes and their units for any
+kind of power supply, and can process/present them to a user in consistent
+manner. Results for different power supplies and machines are also directly
+comparable.
+
+See drivers/power/ds2760_battery.c and drivers/power/pda_power.c for the
+example how to declare and handle attributes.
+
+
+Units
+~~~~~
+Quoting include/linux/power_supply.h:
+
+ All voltages, currents, charges, energies, time and temperatures in µV,
+ µA, µAh, µWh, seconds and tenths of degree Celsius unless otherwise
+ stated. It's driver's job to convert its raw values to units in which
+ this class operates.
+
+
+Attributes/properties detailed
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+~ ~ ~ ~ ~ ~ ~ Charge/Energy/Capacity - how to not confuse ~ ~ ~ ~ ~ ~ ~
+~ ~
+~ Because both "charge" (µAh) and "energy" (µWh) represents "capacity" ~
+~ of battery, this class distinguish these terms. Don't mix them! ~
+~ ~
+~ CHARGE_* attributes represents capacity in µAh only. ~
+~ ENERGY_* attributes represents capacity in µWh only. ~
+~ CAPACITY attribute represents capacity in *percents*, from 0 to 100. ~
+~ ~
+~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
+
+Postfixes:
+_AVG - *hardware* averaged value, use it if your hardware is really able to
+report averaged values.
+_NOW - momentary/instantaneous values.
+
+STATUS - this attribute represents operating status (charging, full,
+discharging (i.e. powering a load), etc.). This corresponds to
+BATTERY_STATUS_* values, as defined in battery.h.
+
+HEALTH - represents health of the battery, values corresponds to
+POWER_SUPPLY_HEALTH_*, defined in battery.h.
+
+VOLTAGE_MAX_DESIGN, VOLTAGE_MIN_DESIGN - design values for maximal and
+minimal power supply voltages. Maximal/minimal means values of voltages
+when battery considered "full"/"empty" at normal conditions. Yes, there is
+no direct relation between voltage and battery capacity, but some dumb
+batteries use voltage for very approximated calculation of capacity.
+Battery driver also can use this attribute just to inform userspace
+about maximal and minimal voltage thresholds of a given battery.
+
+CHARGE_FULL_DESIGN, CHARGE_EMPTY_DESIGN - design charge values, when
+battery considered full/empty.
+
+ENERGY_FULL_DESIGN, ENERGY_EMPTY_DESIGN - same as above but for energy.
+
+CHARGE_FULL, CHARGE_EMPTY - These attributes means "last remembered value
+of charge when battery became full/empty". It also could mean "value of
+charge when battery considered full/empty at given conditions (temperature,
+age)". I.e. these attributes represents real thresholds, not design values.
+
+ENERGY_FULL, ENERGY_EMPTY - same as above but for energy.
+
+CAPACITY - capacity in percents.
+CAPACITY_LEVEL - capacity level. This corresponds to
+POWER_SUPPLY_CAPACITY_LEVEL_*.
+
+TEMP - temperature of the power supply.
+TEMP_AMBIENT - ambient temperature.
+
+TIME_TO_EMPTY - seconds left for battery to be considered empty (i.e.
+while battery powers a load)
+TIME_TO_FULL - seconds left for battery to be considered full (i.e.
+while battery is charging)
+
+
+Battery <-> external power supply interaction
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Often power supplies are acting as supplies and supplicants at the same
+time. Batteries are good example. So, batteries usually care if they're
+externally powered or not.
+
+For that case, power supply class implements notification mechanism for
+batteries.
+
+External power supply (AC) lists supplicants (batteries) names in
+"supplied_to" struct member, and each power_supply_changed() call
+issued by external power supply will notify supplicants via
+external_power_changed callback.
+
+
+QA
+~~
+Q: Where is POWER_SUPPLY_PROP_XYZ attribute?
+A: If you cannot find attribute suitable for your driver needs, feel free
+ to add it and send patch along with your driver.
+
+ The attributes available currently are the ones currently provided by the
+ drivers written.
+
+ Good candidates to add in future: model/part#, cycle_time, manufacturer,
+ etc.
+
+
+Q: I have some very specific attribute (e.g. battery color), should I add
+ this attribute to standard ones?
+A: Most likely, no. Such attribute can be placed in the driver itself, if
+ it is useful. Of course, if the attribute in question applicable to
+ large set of batteries, provided by many drivers, and/or comes from
+ some general battery specification/standard, it may be a candidate to
+ be added to the core attribute set.
+
+
+Q: Suppose, my battery monitoring chip/firmware does not provides capacity
+ in percents, but provides charge_{now,full,empty}. Should I calculate
+ percentage capacity manually, inside the driver, and register CAPACITY
+ attribute? The same question about time_to_empty/time_to_full.
+A: Most likely, no. This class is designed to export properties which are
+ directly measurable by the specific hardware available.
+
+ Inferring not available properties using some heuristics or mathematical
+ model is not subject of work for a battery driver. Such functionality
+ should be factored out, and in fact, apm_power, the driver to serve
+ legacy APM API on top of power supply class, uses a simple heuristic of
+ approximating remaining battery capacity based on its charge, current,
+ voltage and so on. But full-fledged battery model is likely not subject
+ for kernel at all, as it would require floating point calculation to deal
+ with things like differential equations and Kalman filters. This is
+ better be handled by batteryd/libbattery, yet to be written.
diff --git a/Documentation/sched-design-CFS.txt b/Documentation/sched-design-CFS.txt
new file mode 100644
index 000000000000..16feebb7bdc0
--- /dev/null
+++ b/Documentation/sched-design-CFS.txt
@@ -0,0 +1,119 @@
+
+This is the CFS scheduler.
+
+80% of CFS's design can be summed up in a single sentence: CFS basically
+models an "ideal, precise multi-tasking CPU" on real hardware.
+
+"Ideal multi-tasking CPU" is a (non-existent :-)) CPU that has 100%
+physical power and which can run each task at precise equal speed, in
+parallel, each at 1/nr_running speed. For example: if there are 2 tasks
+running then it runs each at 50% physical power - totally in parallel.
+
+On real hardware, we can run only a single task at once, so while that
+one task runs, the other tasks that are waiting for the CPU are at a
+disadvantage - the current task gets an unfair amount of CPU time. In
+CFS this fairness imbalance is expressed and tracked via the per-task
+p->wait_runtime (nanosec-unit) value. "wait_runtime" is the amount of
+time the task should now run on the CPU for it to become completely fair
+and balanced.
+
+( small detail: on 'ideal' hardware, the p->wait_runtime value would
+ always be zero - no task would ever get 'out of balance' from the
+ 'ideal' share of CPU time. )
+
+CFS's task picking logic is based on this p->wait_runtime value and it
+is thus very simple: it always tries to run the task with the largest
+p->wait_runtime value. In other words, CFS tries to run the task with
+the 'gravest need' for more CPU time. So CFS always tries to split up
+CPU time between runnable tasks as close to 'ideal multitasking
+hardware' as possible.
+
+Most of the rest of CFS's design just falls out of this really simple
+concept, with a few add-on embellishments like nice levels,
+multiprocessing and various algorithm variants to recognize sleepers.
+
+In practice it works like this: the system runs a task a bit, and when
+the task schedules (or a scheduler tick happens) the task's CPU usage is
+'accounted for': the (small) time it just spent using the physical CPU
+is deducted from p->wait_runtime. [minus the 'fair share' it would have
+gotten anyway]. Once p->wait_runtime gets low enough so that another
+task becomes the 'leftmost task' of the time-ordered rbtree it maintains
+(plus a small amount of 'granularity' distance relative to the leftmost
+task so that we do not over-schedule tasks and trash the cache) then the
+new leftmost task is picked and the current task is preempted.
+
+The rq->fair_clock value tracks the 'CPU time a runnable task would have
+fairly gotten, had it been runnable during that time'. So by using
+rq->fair_clock values we can accurately timestamp and measure the
+'expected CPU time' a task should have gotten. All runnable tasks are
+sorted in the rbtree by the "rq->fair_clock - p->wait_runtime" key, and
+CFS picks the 'leftmost' task and sticks to it. As the system progresses
+forwards, newly woken tasks are put into the tree more and more to the
+right - slowly but surely giving a chance for every task to become the
+'leftmost task' and thus get on the CPU within a deterministic amount of
+time.
+
+Some implementation details:
+
+ - the introduction of Scheduling Classes: an extensible hierarchy of
+ scheduler modules. These modules encapsulate scheduling policy
+ details and are handled by the scheduler core without the core
+ code assuming about them too much.
+
+ - sched_fair.c implements the 'CFS desktop scheduler': it is a
+ replacement for the vanilla scheduler's SCHED_OTHER interactivity
+ code.
+
+ I'd like to give credit to Con Kolivas for the general approach here:
+ he has proven via RSDL/SD that 'fair scheduling' is possible and that
+ it results in better desktop scheduling. Kudos Con!
+
+ The CFS patch uses a completely different approach and implementation
+ from RSDL/SD. My goal was to make CFS's interactivity quality exceed
+ that of RSDL/SD, which is a high standard to meet :-) Testing
+ feedback is welcome to decide this one way or another. [ and, in any
+ case, all of SD's logic could be added via a kernel/sched_sd.c module
+ as well, if Con is interested in such an approach. ]
+
+ CFS's design is quite radical: it does not use runqueues, it uses a
+ time-ordered rbtree to build a 'timeline' of future task execution,
+ and thus has no 'array switch' artifacts (by which both the vanilla
+ scheduler and RSDL/SD are affected).
+
+ CFS uses nanosecond granularity accounting and does not rely on any
+ jiffies or other HZ detail. Thus the CFS scheduler has no notion of
+ 'timeslices' and has no heuristics whatsoever. There is only one
+ central tunable:
+
+ /proc/sys/kernel/sched_granularity_ns
+
+ which can be used to tune the scheduler from 'desktop' (low
+ latencies) to 'server' (good batching) workloads. It defaults to a
+ setting suitable for desktop workloads. SCHED_BATCH is handled by the
+ CFS scheduler module too.
+
+ Due to its design, the CFS scheduler is not prone to any of the
+ 'attacks' that exist today against the heuristics of the stock
+ scheduler: fiftyp.c, thud.c, chew.c, ring-test.c, massive_intr.c all
+ work fine and do not impact interactivity and produce the expected
+ behavior.
+
+ the CFS scheduler has a much stronger handling of nice levels and
+ SCHED_BATCH: both types of workloads should be isolated much more
+ agressively than under the vanilla scheduler.
+
+ ( another detail: due to nanosec accounting and timeline sorting,
+ sched_yield() support is very simple under CFS, and in fact under
+ CFS sched_yield() behaves much better than under any other
+ scheduler i have tested so far. )
+
+ - sched_rt.c implements SCHED_FIFO and SCHED_RR semantics, in a simpler
+ way than the vanilla scheduler does. It uses 100 runqueues (for all
+ 100 RT priority levels, instead of 140 in the vanilla scheduler)
+ and it needs no expired array.
+
+ - reworked/sanitized SMP load-balancing: the runqueue-walking
+ assumptions are gone from the load-balancing code now, and
+ iterators of the scheduling modules are used. The balancing code got
+ quite a bit simpler as a result.
+
OpenPOWER on IntegriCloud