summaryrefslogtreecommitdiffstats
path: root/Documentation/networking
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/networking')
-rw-r--r--Documentation/networking/00-INDEX3
-rw-r--r--Documentation/networking/ip-sysctl.txt3
-rw-r--r--Documentation/networking/l2tp.txt169
-rw-r--r--Documentation/networking/multiqueue.txt111
-rw-r--r--Documentation/networking/netdevices.txt38
-rw-r--r--Documentation/networking/sk98lin.txt568
-rw-r--r--Documentation/networking/spider_net.txt204
7 files changed, 517 insertions, 579 deletions
diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX
index 153d84d281e6..d63f480afb74 100644
--- a/Documentation/networking/00-INDEX
+++ b/Documentation/networking/00-INDEX
@@ -96,9 +96,6 @@ routing.txt
- the new routing mechanism
shaper.txt
- info on the module that can shape/limit transmitted traffic.
-sk98lin.txt
- - Marvell Yukon Chipset / SysKonnect SK-98xx compliant Gigabit
- Ethernet Adapter family driver info
skfp.txt
- SysKonnect FDDI (SK-5xxx, Compaq Netelligent) driver info.
smc9.txt
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 8f6067ea5e3e..32c2e9da5f3a 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -880,8 +880,7 @@ accept_redirects - BOOLEAN
accept_source_route - INTEGER
Accept source routing (routing extension header).
- > 0: Accept routing header.
- = 0: Accept only routing header type 2.
+ >= 0: Accept only routing header type 2.
< 0: Do not accept routing header.
Default: 0
diff --git a/Documentation/networking/l2tp.txt b/Documentation/networking/l2tp.txt
new file mode 100644
index 000000000000..2451f551c505
--- /dev/null
+++ b/Documentation/networking/l2tp.txt
@@ -0,0 +1,169 @@
+This brief document describes how to use the kernel's PPPoL2TP driver
+to provide L2TP functionality. L2TP is a protocol that tunnels one or
+more PPP sessions over a UDP tunnel. It is commonly used for VPNs
+(L2TP/IPSec) and by ISPs to tunnel subscriber PPP sessions over an IP
+network infrastructure.
+
+Design
+======
+
+The PPPoL2TP driver, drivers/net/pppol2tp.c, provides a mechanism by
+which PPP frames carried through an L2TP session are passed through
+the kernel's PPP subsystem. The standard PPP daemon, pppd, handles all
+PPP interaction with the peer. PPP network interfaces are created for
+each local PPP endpoint.
+
+The L2TP protocol http://www.faqs.org/rfcs/rfc2661.html defines L2TP
+control and data frames. L2TP control frames carry messages between
+L2TP clients/servers and are used to setup / teardown tunnels and
+sessions. An L2TP client or server is implemented in userspace and
+will use a regular UDP socket per tunnel. L2TP data frames carry PPP
+frames, which may be PPP control or PPP data. The kernel's PPP
+subsystem arranges for PPP control frames to be delivered to pppd,
+while data frames are forwarded as usual.
+
+Each tunnel and session within a tunnel is assigned a unique tunnel_id
+and session_id. These ids are carried in the L2TP header of every
+control and data packet. The pppol2tp driver uses them to lookup
+internal tunnel and/or session contexts. Zero tunnel / session ids are
+treated specially - zero ids are never assigned to tunnels or sessions
+in the network. In the driver, the tunnel context keeps a pointer to
+the tunnel UDP socket. The session context keeps a pointer to the
+PPPoL2TP socket, as well as other data that lets the driver interface
+to the kernel PPP subsystem.
+
+Note that the pppol2tp kernel driver handles only L2TP data frames;
+L2TP control frames are simply passed up to userspace in the UDP
+tunnel socket. The kernel handles all datapath aspects of the
+protocol, including data packet resequencing (if enabled).
+
+There are a number of requirements on the userspace L2TP daemon in
+order to use the pppol2tp driver.
+
+1. Use a UDP socket per tunnel.
+
+2. Create a single PPPoL2TP socket per tunnel bound to a special null
+ session id. This is used only for communicating with the driver but
+ must remain open while the tunnel is active. Opening this tunnel
+ management socket causes the driver to mark the tunnel socket as an
+ L2TP UDP encapsulation socket and flags it for use by the
+ referenced tunnel id. This hooks up the UDP receive path via
+ udp_encap_rcv() in net/ipv4/udp.c. PPP data frames are never passed
+ in this special PPPoX socket.
+
+3. Create a PPPoL2TP socket per L2TP session. This is typically done
+ by starting pppd with the pppol2tp plugin and appropriate
+ arguments. A PPPoL2TP tunnel management socket (Step 2) must be
+ created before the first PPPoL2TP session socket is created.
+
+When creating PPPoL2TP sockets, the application provides information
+to the driver about the socket in a socket connect() call. Source and
+destination tunnel and session ids are provided, as well as the file
+descriptor of a UDP socket. See struct pppol2tp_addr in
+include/linux/if_ppp.h. Note that zero tunnel / session ids are
+treated specially. When creating the per-tunnel PPPoL2TP management
+socket in Step 2 above, zero source and destination session ids are
+specified, which tells the driver to prepare the supplied UDP file
+descriptor for use as an L2TP tunnel socket.
+
+Userspace may control behavior of the tunnel or session using
+setsockopt and ioctl on the PPPoX socket. The following socket
+options are supported:-
+
+DEBUG - bitmask of debug message categories. See below.
+SENDSEQ - 0 => don't send packets with sequence numbers
+ 1 => send packets with sequence numbers
+RECVSEQ - 0 => receive packet sequence numbers are optional
+ 1 => drop receive packets without sequence numbers
+LNSMODE - 0 => act as LAC.
+ 1 => act as LNS.
+REORDERTO - reorder timeout (in millisecs). If 0, don't try to reorder.
+
+Only the DEBUG option is supported by the special tunnel management
+PPPoX socket.
+
+In addition to the standard PPP ioctls, a PPPIOCGL2TPSTATS is provided
+to retrieve tunnel and session statistics from the kernel using the
+PPPoX socket of the appropriate tunnel or session.
+
+Debugging
+=========
+
+The driver supports a flexible debug scheme where kernel trace
+messages may be optionally enabled per tunnel and per session. Care is
+needed when debugging a live system since the messages are not
+rate-limited and a busy system could be swamped. Userspace uses
+setsockopt on the PPPoX socket to set a debug mask.
+
+The following debug mask bits are available:
+
+PPPOL2TP_MSG_DEBUG verbose debug (if compiled in)
+PPPOL2TP_MSG_CONTROL userspace - kernel interface
+PPPOL2TP_MSG_SEQ sequence numbers handling
+PPPOL2TP_MSG_DATA data packets
+
+Sample Userspace Code
+=====================
+
+1. Create tunnel management PPPoX socket
+
+ kernel_fd = socket(AF_PPPOX, SOCK_DGRAM, PX_PROTO_OL2TP);
+ if (kernel_fd >= 0) {
+ struct sockaddr_pppol2tp sax;
+ struct sockaddr_in const *peer_addr;
+
+ peer_addr = l2tp_tunnel_get_peer_addr(tunnel);
+ memset(&sax, 0, sizeof(sax));
+ sax.sa_family = AF_PPPOX;
+ sax.sa_protocol = PX_PROTO_OL2TP;
+ sax.pppol2tp.fd = udp_fd; /* fd of tunnel UDP socket */
+ sax.pppol2tp.addr.sin_addr.s_addr = peer_addr->sin_addr.s_addr;
+ sax.pppol2tp.addr.sin_port = peer_addr->sin_port;
+ sax.pppol2tp.addr.sin_family = AF_INET;
+ sax.pppol2tp.s_tunnel = tunnel_id;
+ sax.pppol2tp.s_session = 0; /* special case: mgmt socket */
+ sax.pppol2tp.d_tunnel = 0;
+ sax.pppol2tp.d_session = 0; /* special case: mgmt socket */
+
+ if(connect(kernel_fd, (struct sockaddr *)&sax, sizeof(sax) ) < 0 ) {
+ perror("connect failed");
+ result = -errno;
+ goto err;
+ }
+ }
+
+2. Create session PPPoX data socket
+
+ struct sockaddr_pppol2tp sax;
+ int fd;
+
+ /* Note, the target socket must be bound already, else it will not be ready */
+ sax.sa_family = AF_PPPOX;
+ sax.sa_protocol = PX_PROTO_OL2TP;
+ sax.pppol2tp.fd = tunnel_fd;
+ sax.pppol2tp.addr.sin_addr.s_addr = addr->sin_addr.s_addr;
+ sax.pppol2tp.addr.sin_port = addr->sin_port;
+ sax.pppol2tp.addr.sin_family = AF_INET;
+ sax.pppol2tp.s_tunnel = tunnel_id;
+ sax.pppol2tp.s_session = session_id;
+ sax.pppol2tp.d_tunnel = peer_tunnel_id;
+ sax.pppol2tp.d_session = peer_session_id;
+
+ /* session_fd is the fd of the session's PPPoL2TP socket.
+ * tunnel_fd is the fd of the tunnel UDP socket.
+ */
+ fd = connect(session_fd, (struct sockaddr *)&sax, sizeof(sax));
+ if (fd < 0 ) {
+ return -errno;
+ }
+ return 0;
+
+Miscellanous
+============
+
+The PPPoL2TP driver was developed as part of the OpenL2TP project by
+Katalix Systems Ltd. OpenL2TP is a full-featured L2TP client / server,
+designed from the ground up to have the L2TP datapath in the
+kernel. The project also implemented the pppol2tp plugin for pppd
+which allows pppd to use the kernel driver. Details can be found at
+http://openl2tp.sourceforge.net.
diff --git a/Documentation/networking/multiqueue.txt b/Documentation/networking/multiqueue.txt
new file mode 100644
index 000000000000..00b60cce2224
--- /dev/null
+++ b/Documentation/networking/multiqueue.txt
@@ -0,0 +1,111 @@
+
+ HOWTO for multiqueue network device support
+ ===========================================
+
+Section 1: Base driver requirements for implementing multiqueue support
+Section 2: Qdisc support for multiqueue devices
+Section 3: Brief howto using PRIO or RR for multiqueue devices
+
+
+Intro: Kernel support for multiqueue devices
+---------------------------------------------------------
+
+Kernel support for multiqueue devices is only an API that is presented to the
+netdevice layer for base drivers to implement. This feature is part of the
+core networking stack, and all network devices will be running on the
+multiqueue-aware stack. If a base driver only has one queue, then these
+changes are transparent to that driver.
+
+
+Section 1: Base driver requirements for implementing multiqueue support
+-----------------------------------------------------------------------
+
+Base drivers are required to use the new alloc_etherdev_mq() or
+alloc_netdev_mq() functions to allocate the subqueues for the device. The
+underlying kernel API will take care of the allocation and deallocation of
+the subqueue memory, as well as netdev configuration of where the queues
+exist in memory.
+
+The base driver will also need to manage the queues as it does the global
+netdev->queue_lock today. Therefore base drivers should use the
+netif_{start|stop|wake}_subqueue() functions to manage each queue while the
+device is still operational. netdev->queue_lock is still used when the device
+comes online or when it's completely shut down (unregister_netdev(), etc.).
+
+Finally, the base driver should indicate that it is a multiqueue device. The
+feature flag NETIF_F_MULTI_QUEUE should be added to the netdev->features
+bitmap on device initialization. Below is an example from e1000:
+
+#ifdef CONFIG_E1000_MQ
+ if ( (adapter->hw.mac.type == e1000_82571) ||
+ (adapter->hw.mac.type == e1000_82572) ||
+ (adapter->hw.mac.type == e1000_80003es2lan))
+ netdev->features |= NETIF_F_MULTI_QUEUE;
+#endif
+
+
+Section 2: Qdisc support for multiqueue devices
+-----------------------------------------------
+
+Currently two qdiscs support multiqueue devices. A new round-robin qdisc,
+sch_rr, and sch_prio. The qdisc is responsible for classifying the skb's to
+bands and queues, and will store the queue mapping into skb->queue_mapping.
+Use this field in the base driver to determine which queue to send the skb
+to.
+
+sch_rr has been added for hardware that doesn't want scheduling policies from
+software, so it's a straight round-robin qdisc. It uses the same syntax and
+classification priomap that sch_prio uses, so it should be intuitive to
+configure for people who've used sch_prio.
+
+The PRIO qdisc naturally plugs into a multiqueue device. If PRIO has been
+built with NET_SCH_PRIO_MQ, then upon load, it will make sure the number of
+bands requested is equal to the number of queues on the hardware. If they
+are equal, it sets a one-to-one mapping up between the queues and bands. If
+they're not equal, it will not load the qdisc. This is the same behavior
+for RR. Once the association is made, any skb that is classified will have
+skb->queue_mapping set, which will allow the driver to properly queue skb's
+to multiple queues.
+
+
+Section 3: Brief howto using PRIO and RR for multiqueue devices
+---------------------------------------------------------------
+
+The userspace command 'tc,' part of the iproute2 package, is used to configure
+qdiscs. To add the PRIO qdisc to your network device, assuming the device is
+called eth0, run the following command:
+
+# tc qdisc add dev eth0 root handle 1: prio bands 4 multiqueue
+
+This will create 4 bands, 0 being highest priority, and associate those bands
+to the queues on your NIC. Assuming eth0 has 4 Tx queues, the band mapping
+would look like:
+
+band 0 => queue 0
+band 1 => queue 1
+band 2 => queue 2
+band 3 => queue 3
+
+Traffic will begin flowing through each queue if your TOS values are assigning
+traffic across the various bands. For example, ssh traffic will always try to
+go out band 0 based on TOS -> Linux priority conversion (realtime traffic),
+so it will be sent out queue 0. ICMP traffic (pings) fall into the "normal"
+traffic classification, which is band 1. Therefore pings will be send out
+queue 1 on the NIC.
+
+Note the use of the multiqueue keyword. This is only in versions of iproute2
+that support multiqueue networking devices; if this is omitted when loading
+a qdisc onto a multiqueue device, the qdisc will load and operate the same
+if it were loaded onto a single-queue device (i.e. - sends all traffic to
+queue 0).
+
+Another alternative to multiqueue band allocation can be done by using the
+multiqueue option and specify 0 bands. If this is the case, the qdisc will
+allocate the number of bands to equal the number of queues that the device
+reports, and bring the qdisc online.
+
+The behavior of tc filters remains the same, where it will override TOS priority
+classification.
+
+
+Author: Peter P. Waskiewicz Jr. <peter.p.waskiewicz.jr@intel.com>
diff --git a/Documentation/networking/netdevices.txt b/Documentation/networking/netdevices.txt
index ce1361f95243..37869295fc70 100644
--- a/Documentation/networking/netdevices.txt
+++ b/Documentation/networking/netdevices.txt
@@ -20,6 +20,30 @@ private data which gets freed when the network device is freed. If
separately allocated data is attached to the network device
(dev->priv) then it is up to the module exit handler to free that.
+MTU
+===
+Each network device has a Maximum Transfer Unit. The MTU does not
+include any link layer protocol overhead. Upper layer protocols must
+not pass a socket buffer (skb) to a device to transmit with more data
+than the mtu. The MTU does not include link layer header overhead, so
+for example on Ethernet if the standard MTU is 1500 bytes used, the
+actual skb will contain up to 1514 bytes because of the Ethernet
+header. Devices should allow for the 4 byte VLAN header as well.
+
+Segmentation Offload (GSO, TSO) is an exception to this rule. The
+upper layer protocol may pass a large socket buffer to the device
+transmit routine, and the device will break that up into separate
+packets based on the current MTU.
+
+MTU is symmetrical and applies both to receive and transmit. A device
+must be able to receive at least the maximum size packet allowed by
+the MTU. A network device may use the MTU as mechanism to size receive
+buffers, but the device should allow packets with VLAN header. With
+standard Ethernet mtu of 1500 bytes, the device should allow up to
+1518 byte packets (1500 + 14 header + 4 tag). The device may either:
+drop, truncate, or pass up oversize packets, but dropping oversize
+packets is preferred.
+
struct net_device synchronization rules
=======================================
@@ -43,16 +67,17 @@ dev->get_stats:
dev->hard_start_xmit:
Synchronization: netif_tx_lock spinlock.
+
When the driver sets NETIF_F_LLTX in dev->features this will be
called without holding netif_tx_lock. In this case the driver
has to lock by itself when needed. It is recommended to use a try lock
- for this and return -1 when the spin lock fails.
+ for this and return NETDEV_TX_LOCKED when the spin lock fails.
The locking there should also properly protect against
- set_multicast_list
- Context: Process with BHs disabled or BH (timer).
- Notes: netif_queue_stopped() is guaranteed false
- Interrupts must be enabled when calling hard_start_xmit.
- (Interrupts must also be enabled when enabling the BH handler.)
+ set_multicast_list.
+
+ Context: Process with BHs disabled or BH (timer),
+ will be called with interrupts disabled by netconsole.
+
Return codes:
o NETDEV_TX_OK everything ok.
o NETDEV_TX_BUSY Cannot transmit packet, try later
@@ -74,4 +99,5 @@ dev->poll:
Synchronization: __LINK_STATE_RX_SCHED bit in dev->state. See
dev_close code and comments in net/core/dev.c for more info.
Context: softirq
+ will be called with interrupts disabled by netconsole.
diff --git a/Documentation/networking/sk98lin.txt b/Documentation/networking/sk98lin.txt
deleted file mode 100644
index 8590a954df1d..000000000000
--- a/Documentation/networking/sk98lin.txt
+++ /dev/null
@@ -1,568 +0,0 @@
-(C)Copyright 1999-2004 Marvell(R).
-All rights reserved
-===========================================================================
-
-sk98lin.txt created 13-Feb-2004
-
-Readme File for sk98lin v6.23
-Marvell Yukon/SysKonnect SK-98xx Gigabit Ethernet Adapter family driver for LINUX
-
-This file contains
- 1 Overview
- 2 Required Files
- 3 Installation
- 3.1 Driver Installation
- 3.2 Inclusion of adapter at system start
- 4 Driver Parameters
- 4.1 Per-Port Parameters
- 4.2 Adapter Parameters
- 5 Large Frame Support
- 6 VLAN and Link Aggregation Support (IEEE 802.1, 802.1q, 802.3ad)
- 7 Troubleshooting
-
-===========================================================================
-
-
-1 Overview
-===========
-
-The sk98lin driver supports the Marvell Yukon and SysKonnect
-SK-98xx/SK-95xx compliant Gigabit Ethernet Adapter on Linux. It has
-been tested with Linux on Intel/x86 machines.
-***
-
-
-2 Required Files
-=================
-
-The linux kernel source.
-No additional files required.
-***
-
-
-3 Installation
-===============
-
-It is recommended to download the latest version of the driver from the
-SysKonnect web site www.syskonnect.com. If you have downloaded the latest
-driver, the Linux kernel has to be patched before the driver can be
-installed. For details on how to patch a Linux kernel, refer to the
-patch.txt file.
-
-3.1 Driver Installation
-------------------------
-
-The following steps describe the actions that are required to install
-the driver and to start it manually. These steps should be carried
-out for the initial driver setup. Once confirmed to be ok, they can
-be included in the system start.
-
-NOTE 1: To perform the following tasks you need 'root' access.
-
-NOTE 2: In case of problems, please read the section "Troubleshooting"
- below.
-
-The driver can either be integrated into the kernel or it can be compiled
-as a module. Select the appropriate option during the kernel
-configuration.
-
-Compile/use the driver as a module
-----------------------------------
-To compile the driver, go to the directory /usr/src/linux and
-execute the command "make menuconfig" or "make xconfig" and proceed as
-follows:
-
-To integrate the driver permanently into the kernel, proceed as follows:
-
-1. Select the menu "Network device support" and then "Ethernet(1000Mbit)"
-2. Mark "Marvell Yukon Chipset / SysKonnect SK-98xx family support"
- with (*)
-3. Build a new kernel when the configuration of the above options is
- finished.
-4. Install the new kernel.
-5. Reboot your system.
-
-To use the driver as a module, proceed as follows:
-
-1. Enable 'loadable module support' in the kernel.
-2. For automatic driver start, enable the 'Kernel module loader'.
-3. Select the menu "Network device support" and then "Ethernet(1000Mbit)"
-4. Mark "Marvell Yukon Chipset / SysKonnect SK-98xx family support"
- with (M)
-5. Execute the command "make modules".
-6. Execute the command "make modules_install".
- The appropriate modules will be installed.
-7. Reboot your system.
-
-
-Load the module manually
-------------------------
-To load the module manually, proceed as follows:
-
-1. Enter "modprobe sk98lin".
-2. If a Marvell Yukon or SysKonnect SK-98xx adapter is installed in
- your computer and you have a /proc file system, execute the command:
- "ls /proc/net/sk98lin/"
- This should produce an output containing a line with the following
- format:
- eth0 eth1 ...
- which indicates that your adapter has been found and initialized.
-
- NOTE 1: If you have more than one Marvell Yukon or SysKonnect SK-98xx
- adapter installed, the adapters will be listed as 'eth0',
- 'eth1', 'eth2', etc.
- For each adapter, repeat steps 3 and 4 below.
-
- NOTE 2: If you have other Ethernet adapters installed, your Marvell
- Yukon or SysKonnect SK-98xx adapter will be mapped to the
- next available number, e.g. 'eth1'. The mapping is executed
- automatically.
- The module installation message (displayed either in a system
- log file or on the console) prints a line for each adapter
- found containing the corresponding 'ethX'.
-
-3. Select an IP address and assign it to the respective adapter by
- entering:
- ifconfig eth0 <ip-address>
- With this command, the adapter is connected to the Ethernet.
-
- SK-98xx Gigabit Ethernet Server Adapters: The yellow LED on the adapter
- is now active, the link status LED of the primary port is active and
- the link status LED of the secondary port (on dual port adapters) is
- blinking (if the ports are connected to a switch or hub).
- SK-98xx V2.0 Gigabit Ethernet Adapters: The link status LED is active.
- In addition, you will receive a status message on the console stating
- "ethX: network connection up using port Y" and showing the selected
- connection parameters (x stands for the ethernet device number
- (0,1,2, etc), y stands for the port name (A or B)).
-
- NOTE: If you are in doubt about IP addresses, ask your network
- administrator for assistance.
-
-4. Your adapter should now be fully operational.
- Use 'ping <otherstation>' to verify the connection to other computers
- on your network.
-5. To check the adapter configuration view /proc/net/sk98lin/[devicename].
- For example by executing:
- "cat /proc/net/sk98lin/eth0"
-
-Unload the module
------------------
-To stop and unload the driver modules, proceed as follows:
-
-1. Execute the command "ifconfig eth0 down".
-2. Execute the command "rmmod sk98lin".
-
-3.2 Inclusion of adapter at system start
------------------------------------------
-
-Since a large number of different Linux distributions are
-available, we are unable to describe a general installation procedure
-for the driver module.
-Because the driver is now integrated in the kernel, installation should
-be easy, using the standard mechanism of your distribution.
-Refer to the distribution's manual for installation of ethernet adapters.
-
-***
-
-4 Driver Parameters
-====================
-
-Parameters can be set at the command line after the module has been
-loaded with the command 'modprobe'.
-In some distributions, the configuration tools are able to pass parameters
-to the driver module.
-
-If you use the kernel module loader, you can set driver parameters
-in the file /etc/modprobe.conf (or /etc/modules.conf in 2.4 or earlier).
-To set the driver parameters in this file, proceed as follows:
-
-1. Insert a line of the form :
- options sk98lin ...
- For "...", the same syntax is required as described for the command
- line parameters of modprobe below.
-2. To activate the new parameters, either reboot your computer
- or
- unload and reload the driver.
- The syntax of the driver parameters is:
-
- modprobe sk98lin parameter=value1[,value2[,value3...]]
-
- where value1 refers to the first adapter, value2 to the second etc.
-
-NOTE: All parameters are case sensitive. Write them exactly as shown
- below.
-
-Example:
-Suppose you have two adapters. You want to set auto-negotiation
-on the first adapter to ON and on the second adapter to OFF.
-You also want to set DuplexCapabilities on the first adapter
-to FULL, and on the second adapter to HALF.
-Then, you must enter:
-
- modprobe sk98lin AutoNeg_A=On,Off DupCap_A=Full,Half
-
-NOTE: The number of adapters that can be configured this way is
- limited in the driver (file skge.c, constant SK_MAX_CARD_PARAM).
- The current limit is 16. If you happen to install
- more adapters, adjust this and recompile.
-
-
-4.1 Per-Port Parameters
-------------------------
-
-These settings are available for each port on the adapter.
-In the following description, '?' stands for the port for
-which you set the parameter (A or B).
-
-Speed
------
-Parameter: Speed_?
-Values: 10, 100, 1000, Auto
-Default: Auto
-
-This parameter is used to set the speed capabilities. It is only valid
-for the SK-98xx V2.0 copper adapters.
-Usually, the speed is negotiated between the two ports during link
-establishment. If this fails, a port can be forced to a specific setting
-with this parameter.
-
-Auto-Negotiation
-----------------
-Parameter: AutoNeg_?
-Values: On, Off, Sense
-Default: On
-
-The "Sense"-mode automatically detects whether the link partner supports
-auto-negotiation or not.
-
-Duplex Capabilities
--------------------
-Parameter: DupCap_?
-Values: Half, Full, Both
-Default: Both
-
-This parameters is only relevant if auto-negotiation for this port is
-not set to "Sense". If auto-negotiation is set to "On", all three values
-are possible. If it is set to "Off", only "Full" and "Half" are allowed.
-This parameter is useful if your link partner does not support all
-possible combinations.
-
-Flow Control
-------------
-Parameter: FlowCtrl_?
-Values: Sym, SymOrRem, LocSend, None
-Default: SymOrRem
-
-This parameter can be used to set the flow control capabilities the
-port reports during auto-negotiation. It can be set for each port
-individually.
-Possible modes:
- -- Sym = Symmetric: both link partners are allowed to send
- PAUSE frames
- -- SymOrRem = SymmetricOrRemote: both or only remote partner
- are allowed to send PAUSE frames
- -- LocSend = LocalSend: only local link partner is allowed
- to send PAUSE frames
- -- None = no link partner is allowed to send PAUSE frames
-
-NOTE: This parameter is ignored if auto-negotiation is set to "Off".
-
-Role in Master-Slave-Negotiation (1000Base-T only)
---------------------------------------------------
-Parameter: Role_?
-Values: Auto, Master, Slave
-Default: Auto
-
-This parameter is only valid for the SK-9821 and SK-9822 adapters.
-For two 1000Base-T ports to communicate, one must take the role of the
-master (providing timing information), while the other must be the
-slave. Usually, this is negotiated between the two ports during link
-establishment. If this fails, a port can be forced to a specific setting
-with this parameter.
-
-
-4.2 Adapter Parameters
------------------------
-
-Connection Type (SK-98xx V2.0 copper adapters only)
----------------
-Parameter: ConType
-Values: Auto, 100FD, 100HD, 10FD, 10HD
-Default: Auto
-
-The parameter 'ConType' is a combination of all five per-port parameters
-within one single parameter. This simplifies the configuration of both ports
-of an adapter card! The different values of this variable reflect the most
-meaningful combinations of port parameters.
-
-The following table shows the values of 'ConType' and the corresponding
-combinations of the per-port parameters:
-
- ConType | DupCap AutoNeg FlowCtrl Role Speed
- ----------+------------------------------------------------------
- Auto | Both On SymOrRem Auto Auto
- 100FD | Full Off None Auto (ignored) 100
- 100HD | Half Off None Auto (ignored) 100
- 10FD | Full Off None Auto (ignored) 10
- 10HD | Half Off None Auto (ignored) 10
-
-Stating any other port parameter together with this 'ConType' variable
-will result in a merged configuration of those settings. This due to
-the fact, that the per-port parameters (e.g. Speed_? ) have a higher
-priority than the combined variable 'ConType'.
-
-NOTE: This parameter is always used on both ports of the adapter card.
-
-Interrupt Moderation
---------------------
-Parameter: Moderation
-Values: None, Static, Dynamic
-Default: None
-
-Interrupt moderation is employed to limit the maximum number of interrupts
-the driver has to serve. That is, one or more interrupts (which indicate any
-transmit or receive packet to be processed) are queued until the driver
-processes them. When queued interrupts are to be served, is determined by the
-'IntsPerSec' parameter, which is explained later below.
-
-Possible modes:
-
- -- None - No interrupt moderation is applied on the adapter card.
- Therefore, each transmit or receive interrupt is served immediately
- as soon as it appears on the interrupt line of the adapter card.
-
- -- Static - Interrupt moderation is applied on the adapter card.
- All transmit and receive interrupts are queued until a complete
- moderation interval ends. If such a moderation interval ends, all
- queued interrupts are processed in one big bunch without any delay.
- The term 'static' reflects the fact, that interrupt moderation is
- always enabled, regardless how much network load is currently
- passing via a particular interface. In addition, the duration of
- the moderation interval has a fixed length that never changes while
- the driver is operational.
-
- -- Dynamic - Interrupt moderation might be applied on the adapter card,
- depending on the load of the system. If the driver detects that the
- system load is too high, the driver tries to shield the system against
- too much network load by enabling interrupt moderation. If - at a later
- time - the CPU utilization decreases again (or if the network load is
- negligible) the interrupt moderation will automatically be disabled.
-
-Interrupt moderation should be used when the driver has to handle one or more
-interfaces with a high network load, which - as a consequence - leads also to a
-high CPU utilization. When moderation is applied in such high network load
-situations, CPU load might be reduced by 20-30%.
-
-NOTE: The drawback of using interrupt moderation is an increase of the round-
-trip-time (RTT), due to the queueing and serving of interrupts at dedicated
-moderation times.
-
-Interrupts per second
----------------------
-Parameter: IntsPerSec
-Values: 30...40000 (interrupts per second)
-Default: 2000
-
-This parameter is only used if either static or dynamic interrupt moderation
-is used on a network adapter card. Using this parameter if no moderation is
-applied will lead to no action performed.
-
-This parameter determines the length of any interrupt moderation interval.
-Assuming that static interrupt moderation is to be used, an 'IntsPerSec'
-parameter value of 2000 will lead to an interrupt moderation interval of
-500 microseconds.
-
-NOTE: The duration of the moderation interval is to be chosen with care.
-At first glance, selecting a very long duration (e.g. only 100 interrupts per
-second) seems to be meaningful, but the increase of packet-processing delay
-is tremendous. On the other hand, selecting a very short moderation time might
-compensate the use of any moderation being applied.
-
-
-Preferred Port
---------------
-Parameter: PrefPort
-Values: A, B
-Default: A
-
-This is used to force the preferred port to A or B (on dual-port network
-adapters). The preferred port is the one that is used if both are detected
-as fully functional.
-
-RLMT Mode (Redundant Link Management Technology)
-------------------------------------------------
-Parameter: RlmtMode
-Values: CheckLinkState,CheckLocalPort, CheckSeg, DualNet
-Default: CheckLinkState
-
-RLMT monitors the status of the port. If the link of the active port
-fails, RLMT switches immediately to the standby link. The virtual link is
-maintained as long as at least one 'physical' link is up.
-
-Possible modes:
-
- -- CheckLinkState - Check link state only: RLMT uses the link state
- reported by the adapter hardware for each individual port to
- determine whether a port can be used for all network traffic or
- not.
-
- -- CheckLocalPort - In this mode, RLMT monitors the network path
- between the two ports of an adapter by regularly exchanging packets
- between them. This mode requires a network configuration in which
- the two ports are able to "see" each other (i.e. there must not be
- any router between the ports).
-
- -- CheckSeg - Check local port and segmentation: This mode supports the
- same functions as the CheckLocalPort mode and additionally checks
- network segmentation between the ports. Therefore, this mode is only
- to be used if Gigabit Ethernet switches are installed on the network
- that have been configured to use the Spanning Tree protocol.
-
- -- DualNet - In this mode, ports A and B are used as separate devices.
- If you have a dual port adapter, port A will be configured as eth0
- and port B as eth1. Both ports can be used independently with
- distinct IP addresses. The preferred port setting is not used.
- RLMT is turned off.
-
-NOTE: RLMT modes CLP and CLPSS are designed to operate in configurations
- where a network path between the ports on one adapter exists.
- Moreover, they are not designed to work where adapters are connected
- back-to-back.
-***
-
-
-5 Large Frame Support
-======================
-
-The driver supports large frames (also called jumbo frames). Using large
-frames can result in an improved throughput if transferring large amounts
-of data.
-To enable large frames, set the MTU (maximum transfer unit) of the
-interface to the desired value (up to 9000), execute the following
-command:
- ifconfig eth0 mtu 9000
-This will only work if you have two adapters connected back-to-back
-or if you use a switch that supports large frames. When using a switch,
-it should be configured to allow large frames and auto-negotiation should
-be set to OFF. The setting must be configured on all adapters that can be
-reached by the large frames. If one adapter is not set to receive large
-frames, it will simply drop them.
-
-You can switch back to the standard ethernet frame size by executing the
-following command:
- ifconfig eth0 mtu 1500
-
-To permanently configure this setting, add a script with the 'ifconfig'
-line to the system startup sequence (named something like "S99sk98lin"
-in /etc/rc.d/rc2.d).
-***
-
-
-6 VLAN and Link Aggregation Support (IEEE 802.1, 802.1q, 802.3ad)
-==================================================================
-
-The Marvell Yukon/SysKonnect Linux drivers are able to support VLAN and
-Link Aggregation according to IEEE standards 802.1, 802.1q, and 802.3ad.
-These features are only available after installation of open source
-modules available on the Internet:
-For VLAN go to: http://www.candelatech.com/~greear/vlan.html
-For Link Aggregation go to: http://www.st.rim.or.jp/~yumo
-
-NOTE: SysKonnect GmbH does not offer any support for these open source
- modules and does not take the responsibility for any kind of
- failures or problems arising in connection with these modules.
-
-NOTE: Configuring Link Aggregation on a SysKonnect dual link adapter may
- cause problems when unloading the driver.
-
-
-7 Troubleshooting
-==================
-
-If any problems occur during the installation process, check the
-following list:
-
-
-Problem: The SK-98xx adapter cannot be found by the driver.
-Solution: In /proc/pci search for the following entry:
- 'Ethernet controller: SysKonnect SK-98xx ...'
- If this entry exists, the SK-98xx or SK-98xx V2.0 adapter has
- been found by the system and should be operational.
- If this entry does not exist or if the file '/proc/pci' is not
- found, there may be a hardware problem or the PCI support may
- not be enabled in your kernel.
- The adapter can be checked using the diagnostics program which
- is available on the SysKonnect web site:
- www.syskonnect.com
-
- Some COMPAQ machines have problems dealing with PCI under Linux.
- This problem is described in the 'PCI howto' document
- (included in some distributions or available from the
- web, e.g. at 'www.linux.org').
-
-
-Problem: Programs such as 'ifconfig' or 'route' cannot be found or the
- error message 'Operation not permitted' is displayed.
-Reason: You are not logged in as user 'root'.
-Solution: Logout and login as 'root' or change to 'root' via 'su'.
-
-
-Problem: Upon use of the command 'ping <address>' the message
- "ping: sendto: Network is unreachable" is displayed.
-Reason: Your route is not set correctly.
-Solution: If you are using RedHat, you probably forgot to set up the
- route in the 'network configuration'.
- Check the existing routes with the 'route' command and check
- if an entry for 'eth0' exists, and if so, if it is set correctly.
-
-
-Problem: The driver can be started, the adapter is connected to the
- network, but you cannot receive or transmit any packets;
- e.g. 'ping' does not work.
-Reason: There is an incorrect route in your routing table.
-Solution: Check the routing table with the command 'route' and read the
- manual help pages dealing with routes (enter 'man route').
-
-NOTE: Although the 2.2.x kernel versions generate the routing entry
- automatically, problems of this kind may occur here as well. We've
- come across a situation in which the driver started correctly at
- system start, but after the driver has been removed and reloaded,
- the route of the adapter's network pointed to the 'dummy0'device
- and had to be corrected manually.
-
-
-Problem: Your computer should act as a router between multiple
- IP subnetworks (using multiple adapters), but computers in
- other subnetworks cannot be reached.
-Reason: Either the router's kernel is not configured for IP forwarding
- or the routing table and gateway configuration of at least one
- computer is not working.
-
-Problem: Upon driver start, the following error message is displayed:
- "eth0: -- ERROR --
- Class: internal Software error
- Nr: 0xcc
- Msg: SkGeInitPort() cannot init running ports"
-Reason: You are using a driver compiled for single processor machines
- on a multiprocessor machine with SMP (Symmetric MultiProcessor)
- kernel.
-Solution: Configure your kernel appropriately and recompile the kernel or
- the modules.
-
-
-
-If your problem is not listed here, please contact SysKonnect's technical
-support for help (linux@syskonnect.de).
-When contacting our technical support, please ensure that the following
-information is available:
-- System Manufacturer and HW Informations (CPU, Memory... )
-- PCI-Boards in your system
-- Distribution
-- Kernel version
-- Driver version
-***
-
-
-
-***End of Readme File***
diff --git a/Documentation/networking/spider_net.txt b/Documentation/networking/spider_net.txt
new file mode 100644
index 000000000000..4b4adb8eb14f
--- /dev/null
+++ b/Documentation/networking/spider_net.txt
@@ -0,0 +1,204 @@
+
+ The Spidernet Device Driver
+ ===========================
+
+Written by Linas Vepstas <linas@austin.ibm.com>
+
+Version of 7 June 2007
+
+Abstract
+========
+This document sketches the structure of portions of the spidernet
+device driver in the Linux kernel tree. The spidernet is a gigabit
+ethernet device built into the Toshiba southbridge commonly used
+in the SONY Playstation 3 and the IBM QS20 Cell blade.
+
+The Structure of the RX Ring.
+=============================
+The receive (RX) ring is a circular linked list of RX descriptors,
+together with three pointers into the ring that are used to manage its
+contents.
+
+The elements of the ring are called "descriptors" or "descrs"; they
+describe the received data. This includes a pointer to a buffer
+containing the received data, the buffer size, and various status bits.
+
+There are three primary states that a descriptor can be in: "empty",
+"full" and "not-in-use". An "empty" or "ready" descriptor is ready
+to receive data from the hardware. A "full" descriptor has data in it,
+and is waiting to be emptied and processed by the OS. A "not-in-use"
+descriptor is neither empty or full; it is simply not ready. It may
+not even have a data buffer in it, or is otherwise unusable.
+
+During normal operation, on device startup, the OS (specifically, the
+spidernet device driver) allocates a set of RX descriptors and RX
+buffers. These are all marked "empty", ready to receive data. This
+ring is handed off to the hardware, which sequentially fills in the
+buffers, and marks them "full". The OS follows up, taking the full
+buffers, processing them, and re-marking them empty.
+
+This filling and emptying is managed by three pointers, the "head"
+and "tail" pointers, managed by the OS, and a hardware current
+descriptor pointer (GDACTDPA). The GDACTDPA points at the descr
+currently being filled. When this descr is filled, the hardware
+marks it full, and advances the GDACTDPA by one. Thus, when there is
+flowing RX traffic, every descr behind it should be marked "full",
+and everything in front of it should be "empty". If the hardware
+discovers that the current descr is not empty, it will signal an
+interrupt, and halt processing.
+
+The tail pointer tails or trails the hardware pointer. When the
+hardware is ahead, the tail pointer will be pointing at a "full"
+descr. The OS will process this descr, and then mark it "not-in-use",
+and advance the tail pointer. Thus, when there is flowing RX traffic,
+all of the descrs in front of the tail pointer should be "full", and
+all of those behind it should be "not-in-use". When RX traffic is not
+flowing, then the tail pointer can catch up to the hardware pointer.
+The OS will then note that the current tail is "empty", and halt
+processing.
+
+The head pointer (somewhat mis-named) follows after the tail pointer.
+When traffic is flowing, then the head pointer will be pointing at
+a "not-in-use" descr. The OS will perform various housekeeping duties
+on this descr. This includes allocating a new data buffer and
+dma-mapping it so as to make it visible to the hardware. The OS will
+then mark the descr as "empty", ready to receive data. Thus, when there
+is flowing RX traffic, everything in front of the head pointer should
+be "not-in-use", and everything behind it should be "empty". If no
+RX traffic is flowing, then the head pointer can catch up to the tail
+pointer, at which point the OS will notice that the head descr is
+"empty", and it will halt processing.
+
+Thus, in an idle system, the GDACTDPA, tail and head pointers will
+all be pointing at the same descr, which should be "empty". All of the
+other descrs in the ring should be "empty" as well.
+
+The show_rx_chain() routine will print out the the locations of the
+GDACTDPA, tail and head pointers. It will also summarize the contents
+of the ring, starting at the tail pointer, and listing the status
+of the descrs that follow.
+
+A typical example of the output, for a nearly idle system, might be
+
+net eth1: Total number of descrs=256
+net eth1: Chain tail located at descr=20
+net eth1: Chain head is at 20
+net eth1: HW curr desc (GDACTDPA) is at 21
+net eth1: Have 1 descrs with stat=x40800101
+net eth1: HW next desc (GDACNEXTDA) is at 22
+net eth1: Last 255 descrs with stat=xa0800000
+
+In the above, the hardware has filled in one descr, number 20. Both
+head and tail are pointing at 20, because it has not yet been emptied.
+Meanwhile, hw is pointing at 21, which is free.
+
+The "Have nnn decrs" refers to the descr starting at the tail: in this
+case, nnn=1 descr, starting at descr 20. The "Last nnn descrs" refers
+to all of the rest of the descrs, from the last status change. The "nnn"
+is a count of how many descrs have exactly the same status.
+
+The status x4... corresponds to "full" and status xa... corresponds
+to "empty". The actual value printed is RXCOMST_A.
+
+In the device driver source code, a different set of names are
+used for these same concepts, so that
+
+"empty" == SPIDER_NET_DESCR_CARDOWNED == 0xa
+"full" == SPIDER_NET_DESCR_FRAME_END == 0x4
+"not in use" == SPIDER_NET_DESCR_NOT_IN_USE == 0xf
+
+
+The RX RAM full bug/feature
+===========================
+
+As long as the OS can empty out the RX buffers at a rate faster than
+the hardware can fill them, there is no problem. If, for some reason,
+the OS fails to empty the RX ring fast enough, the hardware GDACTDPA
+pointer will catch up to the head, notice the not-empty condition,
+ad stop. However, RX packets may still continue arriving on the wire.
+The spidernet chip can save some limited number of these in local RAM.
+When this local ram fills up, the spider chip will issue an interrupt
+indicating this (GHIINT0STS will show ERRINT, and the GRMFLLINT bit
+will be set in GHIINT1STS). When the RX ram full condition occurs,
+a certain bug/feature is triggered that has to be specially handled.
+This section describes the special handling for this condition.
+
+When the OS finally has a chance to run, it will empty out the RX ring.
+In particular, it will clear the descriptor on which the hardware had
+stopped. However, once the hardware has decided that a certain
+descriptor is invalid, it will not restart at that descriptor; instead
+it will restart at the next descr. This potentially will lead to a
+deadlock condition, as the tail pointer will be pointing at this descr,
+which, from the OS point of view, is empty; the OS will be waiting for
+this descr to be filled. However, the hardware has skipped this descr,
+and is filling the next descrs. Since the OS doesn't see this, there
+is a potential deadlock, with the OS waiting for one descr to fill,
+while the hardware is waiting for a different set of descrs to become
+empty.
+
+A call to show_rx_chain() at this point indicates the nature of the
+problem. A typical print when the network is hung shows the following:
+
+net eth1: Spider RX RAM full, incoming packets might be discarded!
+net eth1: Total number of descrs=256
+net eth1: Chain tail located at descr=255
+net eth1: Chain head is at 255
+net eth1: HW curr desc (GDACTDPA) is at 0
+net eth1: Have 1 descrs with stat=xa0800000
+net eth1: HW next desc (GDACNEXTDA) is at 1
+net eth1: Have 127 descrs with stat=x40800101
+net eth1: Have 1 descrs with stat=x40800001
+net eth1: Have 126 descrs with stat=x40800101
+net eth1: Last 1 descrs with stat=xa0800000
+
+Both the tail and head pointers are pointing at descr 255, which is
+marked xa... which is "empty". Thus, from the OS point of view, there
+is nothing to be done. In particular, there is the implicit assumption
+that everything in front of the "empty" descr must surely also be empty,
+as explained in the last section. The OS is waiting for descr 255 to
+become non-empty, which, in this case, will never happen.
+
+The HW pointer is at descr 0. This descr is marked 0x4.. or "full".
+Since its already full, the hardware can do nothing more, and thus has
+halted processing. Notice that descrs 0 through 254 are all marked
+"full", while descr 254 and 255 are empty. (The "Last 1 descrs" is
+descr 254, since tail was at 255.) Thus, the system is deadlocked,
+and there can be no forward progress; the OS thinks there's nothing
+to do, and the hardware has nowhere to put incoming data.
+
+This bug/feature is worked around with the spider_net_resync_head_ptr()
+routine. When the driver receives RX interrupts, but an examination
+of the RX chain seems to show it is empty, then it is probable that
+the hardware has skipped a descr or two (sometimes dozens under heavy
+network conditions). The spider_net_resync_head_ptr() subroutine will
+search the ring for the next full descr, and the driver will resume
+operations there. Since this will leave "holes" in the ring, there
+is also a spider_net_resync_tail_ptr() that will skip over such holes.
+
+As of this writing, the spider_net_resync() strategy seems to work very
+well, even under heavy network loads.
+
+
+The TX ring
+===========
+The TX ring uses a low-watermark interrupt scheme to make sure that
+the TX queue is appropriately serviced for large packet sizes.
+
+For packet sizes greater than about 1KBytes, the kernel can fill
+the TX ring quicker than the device can drain it. Once the ring
+is full, the netdev is stopped. When there is room in the ring,
+the netdev needs to be reawakened, so that more TX packets are placed
+in the ring. The hardware can empty the ring about four times per jiffy,
+so its not appropriate to wait for the poll routine to refill, since
+the poll routine runs only once per jiffy. The low-watermark mechanism
+marks a descr about 1/4th of the way from the bottom of the queue, so
+that an interrupt is generated when the descr is processed. This
+interrupt wakes up the netdev, which can then refill the queue.
+For large packets, this mechanism generates a relatively small number
+of interrupts, about 1K/sec. For smaller packets, this will drop to zero
+interrupts, as the hardware can empty the queue faster than the kernel
+can fill it.
+
+
+ ======= END OF DOCUMENT ========
+
OpenPOWER on IntegriCloud