From 53e6892c0411006848882eacfcfea9e93681b55d Mon Sep 17 00:00:00 2001 From: Michal Marek Date: Tue, 5 Apr 2011 14:32:30 +0200 Subject: kbuild: Allow to override LINUX_COMPILE_BY and LINUX_COMPILE_HOST macros Make it possible to override the user@host string displayed during boot and in /proc/version by the environment variables KBUILD_BUILD_USER and KBUILD_BUILD_HOST. Several distributions patch scripts/mkcompile_h to achieve this, so let's provide an official way. Also, document the KBUILD_BUILD_TIMESTAMP variable while at it. Signed-off-by: Michal Marek --- Documentation/kbuild/kbuild.txt | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'Documentation') diff --git a/Documentation/kbuild/kbuild.txt b/Documentation/kbuild/kbuild.txt index f1431d099fce..f11ebb33e4a6 100644 --- a/Documentation/kbuild/kbuild.txt +++ b/Documentation/kbuild/kbuild.txt @@ -201,3 +201,15 @@ KBUILD_ENABLE_EXTRA_GCC_CHECKS -------------------------------------------------- If enabled over the make command line with "W=1", it turns on additional gcc -W... options for more extensive build-time checking. + +KBUILD_BUILD_TIMESTAMP +-------------------------------------------------- +Setting this to a date string overrides the timestamp used in the +UTS_VERSION definition (uname -v in the running kernel). The default value +is the output of the date command at one point during build. + +KBUILD_BUILD_USER, KBUILD_BUILD_HOST +-------------------------------------------------- +These two variables allow to override the user@host string displayed during +boot and in /proc/version. The default value is the output of the commands +whoami and host, respectively. -- cgit v1.2.1 From a8b8017c34fefcb763d8b06c294b58d1c480b2e4 Mon Sep 17 00:00:00 2001 From: Michal Marek Date: Thu, 31 Mar 2011 23:16:42 +0200 Subject: initramfs: Use KBUILD_BUILD_TIMESTAMP for generated entries gen_init_cpio gets the current time and uses it for each symlink, special file, and directory. Grab the current time once and make it possible to override it with the KBUILD_BUILD_TIMESTAMP variable for reproducible builds. Signed-off-by: Michal Marek --- Documentation/kbuild/kbuild.txt | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/kbuild/kbuild.txt b/Documentation/kbuild/kbuild.txt index f11ebb33e4a6..646e2c114fff 100644 --- a/Documentation/kbuild/kbuild.txt +++ b/Documentation/kbuild/kbuild.txt @@ -205,7 +205,8 @@ gcc -W... options for more extensive build-time checking. KBUILD_BUILD_TIMESTAMP -------------------------------------------------- Setting this to a date string overrides the timestamp used in the -UTS_VERSION definition (uname -v in the running kernel). The default value +UTS_VERSION definition (uname -v in the running kernel). The value has to +be a string that can be passed to date -d. The default value is the output of the date command at one point during build. KBUILD_BUILD_USER, KBUILD_BUILD_HOST -- cgit v1.2.1 From 40df759e2b9ec945f1a5ddc61b3fdfbb6583257e Mon Sep 17 00:00:00 2001 From: Michal Marek Date: Wed, 20 Apr 2011 13:45:30 +0200 Subject: kbuild: Fix build with binutils <= 2.19 The D option of ar is only available in newer versions. Signed-off-by: Michal Marek --- Documentation/kbuild/makefiles.txt | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'Documentation') diff --git a/Documentation/kbuild/makefiles.txt b/Documentation/kbuild/makefiles.txt index 5d145bb443c0..40e082bb8c52 100644 --- a/Documentation/kbuild/makefiles.txt +++ b/Documentation/kbuild/makefiles.txt @@ -955,6 +955,11 @@ When kbuild executes, the following steps are followed (roughly): used when linking modules. This is often a linker script. From commandline LDFLAGS_MODULE shall be used (see kbuild.txt). + KBUILD_ARFLAGS Options for $(AR) when creating archives + + $(KBUILD_ARFLAGS) set by the top level Makefile to "D" (deterministic + mode) if this option is supported by $(AR). + --- 6.2 Add prerequisites to archprepare: The archprepare: rule is used to list prerequisites that need to be -- cgit v1.2.1 From d8ecc5cd8e227bc318513b5306ae88a474b8886d Mon Sep 17 00:00:00 2001 From: Sam Ravnborg Date: Wed, 27 Apr 2011 22:29:49 +0200 Subject: kbuild: asm-generic support There is an increasing amount of header files shared between individual architectures in asm-generic. To avoid a lot of dummy wrapper files that just include the corresponding file in asm-generic provide some basic support in kbuild for this. With the following patch an architecture can maintain a list of files in the file arch/$(ARCH)/include/asm/Kbuild To use a generic file just add: generic-y += For each file listed kbuild will generate the necessary wrapper in arch/$(ARCH)/include/generated/asm. When installing userspace headers a wrapper is likewise created. The original inspiration for this came from the unicore32 patchset - although a different method is used. The patch includes several improvements from Arnd Bergmann. Michael Marek contributed Makefile.asm-generic. Remis Baima did an intial implementation along to achive the same - see https://patchwork.kernel.org/patch/13352/ Signed-off-by: Sam Ravnborg Acked-by: Guan Xuetao Tested-by: Guan Xuetao Acked-by: Arnd Bergmann Cc: Remis Lima Baima Signed-off-by: Michal Marek --- Documentation/kbuild/makefiles.txt | 36 ++++++++++++++++++++++++++++++++++++ 1 file changed, 36 insertions(+) (limited to 'Documentation') diff --git a/Documentation/kbuild/makefiles.txt b/Documentation/kbuild/makefiles.txt index 40e082bb8c52..835b64acf0b4 100644 --- a/Documentation/kbuild/makefiles.txt +++ b/Documentation/kbuild/makefiles.txt @@ -40,11 +40,13 @@ This document describes the Linux kernel Makefiles. --- 6.6 Commands useful for building a boot image --- 6.7 Custom kbuild commands --- 6.8 Preprocessing linker scripts + --- 6.9 Generic header files === 7 Kbuild syntax for exported headers --- 7.1 header-y --- 7.2 objhdr-y --- 7.3 destination-y + --- 7.4 generic-y === 8 Kbuild Variables === 9 Makefile language @@ -1214,6 +1216,14 @@ When kbuild executes, the following steps are followed (roughly): The kbuild infrastructure for *lds file are used in several architecture-specific files. +--- 6.9 Generic header files + + The directory include/asm-generic contains the header files + that may be shared between individual architectures. + The recommended approach how to use a generic header file is + to list the file in the Kbuild file. + See "7.4 generic-y" for further info on syntax etc. + === 7 Kbuild syntax for exported headers The kernel include a set of headers that is exported to userspace. @@ -1270,6 +1280,32 @@ See subsequent chapter for the syntax of the Kbuild file. In the example above all exported headers in the Kbuild file will be located in the directory "include/linux" when exported. + --- 7.4 generic-y + + If an architecture uses a verbatim copy of a header from + include/asm-generic then this is listed in the file + arch/$(ARCH)/include/asm/Kbuild like this: + + Example: + #arch/x86/include/asm/Kbuild + generic-y += termios.h + generic-y += rtc.h + + During the prepare phase of the build a wrapper include + file is generated in the directory: + + arch/$(ARCH)/include/generated/asm + + When a header is exported where the architecture uses + the generic header a similar wrapper is generated as part + of the set of exported headers in the directory: + + usr/include/asm + + The generated wrapper will in both cases look like the following: + + Example: termios.h + #include === 8 Kbuild Variables -- cgit v1.2.1 From 59802db0745ddfe5cfd0d965e9d489f1b4713868 Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Sun, 1 May 2011 18:14:26 -0400 Subject: ext4: remove obsolete mount options from ext4's documentation The block reservation code from ext3 was removed long ago... Signed-off-by: "Theodore Ts'o" --- Documentation/filesystems/ext4.txt | 4 ---- 1 file changed, 4 deletions(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt index c79ec58fd7f6..3ae9bc94352a 100644 --- a/Documentation/filesystems/ext4.txt +++ b/Documentation/filesystems/ext4.txt @@ -226,10 +226,6 @@ acl Enables POSIX Access Control Lists support. noacl This option disables POSIX Access Control List support. -reservation - -noreservation - bsddf (*) Make 'df' act like BSD. minixdf Make 'df' act like Minix. -- cgit v1.2.1 From df835c2e7fa875618cc14ced0110ea96f79536fb Mon Sep 17 00:00:00 2001 From: Michal Marek Date: Fri, 26 Nov 2010 17:15:11 +0100 Subject: kconfig: Document the new "visible if" syntax Signed-off-by: Michal Marek --- Documentation/kbuild/kconfig-language.txt | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/kbuild/kconfig-language.txt b/Documentation/kbuild/kconfig-language.txt index b507d61fd41c..48a3981c717b 100644 --- a/Documentation/kbuild/kconfig-language.txt +++ b/Documentation/kbuild/kconfig-language.txt @@ -113,6 +113,13 @@ applicable everywhere (see syntax). That will limit the usefulness but on the other hand avoid the illegal configurations all over. +- limiting menu display: "visible if" + This attribute is only applicable to menu blocks, if the condition is + false, the menu block is not displayed to the user (the symbols + contained there can still be selected by other symbols, though). It is + similar to a conditional "prompt" attribude for individual menu + entries. Default value of "visible" is true. + - numerical ranges: "range" ["if" ] This allows to limit the range of possible input values for int and hex symbols. The user can only input a value which is larger than @@ -303,7 +310,8 @@ menu: "endmenu" This defines a menu block, see "Menu structure" above for more -information. The only possible options are dependencies. +information. The only possible options are dependencies and "visible" +attributes. if: -- cgit v1.2.1 From 64b81ed7fb1e45c914317b20316f32827bc6444b Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Thu, 28 Apr 2011 10:58:52 -0700 Subject: kconfig-language: add to hints Explain a little about kconfig symbol dependencies and symbol existence given optional kconfig language scenarios. Yes, I was bitten by this. Signed-off-by: Randy Dunlap Signed-off-by: Michal Marek --- Documentation/kbuild/kconfig-language.txt | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) (limited to 'Documentation') diff --git a/Documentation/kbuild/kconfig-language.txt b/Documentation/kbuild/kconfig-language.txt index 48a3981c717b..44e2649fbb29 100644 --- a/Documentation/kbuild/kconfig-language.txt +++ b/Documentation/kbuild/kconfig-language.txt @@ -389,3 +389,25 @@ config FOO limits FOO to module (=m) or disabled (=n). +Kconfig symbol existence +~~~~~~~~~~~~~~~~~~~~~~~~ +The following two methods produce the same kconfig symbol dependencies +but differ greatly in kconfig symbol existence (production) in the +generated config file. + +case 1: + +config FOO + tristate "about foo" + depends on BAR + +vs. case 2: + +if BAR +config FOO + tristate "about foo" +endif + +In case 1, the symbol FOO will always exist in the config file (given +no other dependencies). In case 2, the symbol FOO will only exist in +the config file if BAR is enabled. -- cgit v1.2.1 From bffd2020a972a188750e5cf4b9566950dfdf25a2 Mon Sep 17 00:00:00 2001 From: Peter Foley Date: Mon, 2 May 2011 22:48:03 +0200 Subject: kbuild: move scripts/basic/docproc.c to scripts/docproc.c Move docproc from scripts/basic to scripts so it is only built for *doc targets instead of every time the kernel is built. --- Documentation/DocBook/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile index 8436b018c289..3cebfa0d1611 100644 --- a/Documentation/DocBook/Makefile +++ b/Documentation/DocBook/Makefile @@ -73,7 +73,7 @@ installmandocs: mandocs ### #External programs used KERNELDOC = $(srctree)/scripts/kernel-doc -DOCPROC = $(objtree)/scripts/basic/docproc +DOCPROC = $(objtree)/scripts/docproc XMLTOFLAGS = -m $(srctree)/Documentation/DocBook/stylesheet.xsl XMLTOFLAGS += --skip-validation -- cgit v1.2.1 From 8417da6f2128008c431c7d130af6cd3d9079922e Mon Sep 17 00:00:00 2001 From: Michal Marek Date: Mon, 2 May 2011 12:51:15 +0200 Subject: kbuild: Fix passing -Wno-* options to gcc 4.4+ Starting with 4.4, gcc will happily accept -Wno- in the cc-option test and complain later when compiling a file that has some other warning. This rather unexpected behavior is intentional as per http://gcc.gnu.org/PR28322, so work around it by testing for support of the opposite option (without the no-). Introduce a new Makefile function cc-disable-warning that does this and update two uses of cc-option in the toplevel Makefile. Reported-and-tested-by: Stephen Rothwell Signed-off-by: Michal Marek --- Documentation/kbuild/makefiles.txt | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'Documentation') diff --git a/Documentation/kbuild/makefiles.txt b/Documentation/kbuild/makefiles.txt index 835b64acf0b4..47435e56c5da 100644 --- a/Documentation/kbuild/makefiles.txt +++ b/Documentation/kbuild/makefiles.txt @@ -501,6 +501,18 @@ more details, with real examples. gcc >= 3.00. For gcc < 3.00, -malign-functions=4 is used. Note: cc-option-align uses KBUILD_CFLAGS for $(CC) options + cc-disable-warning + cc-disable-warning checks if gcc supports a given warning and returns + the commandline switch to disable it. This special function is needed, + because gcc 4.4 and later accept any unknown -Wno-* option and only + warn about it if there is another warning in the source file. + + Example: + KBUILD_CFLAGS += $(call cc-disable-warning, unused-but-set-variable) + + In the above example, -Wno-unused-but-set-variable will be added to + KBUILD_CFLAGS only if gcc really accepts it. + cc-version cc-version returns a numerical version of the $(CC) compiler version. The format is where both are two digits. So for example -- cgit v1.2.1 From 8a4ec67bd5648beb09d7db988a75835b740e950d Mon Sep 17 00:00:00 2001 From: "Stephen M. Cameron" Date: Tue, 3 May 2011 14:54:12 -0500 Subject: cciss: add cciss_tape_cmds module paramter This is to allow number of commands reserved for use by SCSI tape drives and medium changers to be adjusted at driver load time via the kernel parameter cciss_tape_cmds, with a default value of 6, and a range of 2 - 16 inclusive. Previously, the driver limited the number of commands which could be queued to the SCSI half of the the driver to only 2. This is to fix the problem that if you had more than two tape drives, you couldn't, for example, erase or rewind them all at the same time. Signed-off-by: Stephen M. Cameron Signed-off-by: Jens Axboe --- Documentation/blockdev/cciss.txt | 15 +++++++++++++++ 1 file changed, 15 insertions(+) (limited to 'Documentation') diff --git a/Documentation/blockdev/cciss.txt b/Documentation/blockdev/cciss.txt index 89698e8df7d4..c00c6a5ab21f 100644 --- a/Documentation/blockdev/cciss.txt +++ b/Documentation/blockdev/cciss.txt @@ -169,3 +169,18 @@ is issued which positions the tape to a known position. Typically you must rewind the tape (by issuing "mt -f /dev/st0 rewind" for example) before i/o can proceed again to a tape drive which was reset. +There is a cciss_tape_cmds module parameter which can be used to make cciss +allocate more commands for use by tape drives. Ordinarily only a few commands +(6) are allocated for tape drives because tape drives are slow and +infrequently used and the primary purpose of Smart Array controllers is to +act as a RAID controller for disk drives, so the vast majority of commands +are allocated for disk devices. However, if you have more than a few tape +drives attached to a smart array, the default number of commands may not be +enought (for example, if you have 8 tape drives, you could only rewind 6 +at one time with the default number of commands.) The cciss_tape_cmds module +parameter allows more commands (up to 16 more) to be allocated for use by +tape drives. For example: + + insmod cciss.ko cciss_tape_cmds=16 + +Or, as a kernel boot parameter passed in via grub: cciss.cciss_tape_cmds=8 -- cgit v1.2.1 From a73ddc61e8f14ddc7ab6e6a11dc882ef6a697114 Mon Sep 17 00:00:00 2001 From: Kukjin Kim Date: Wed, 11 May 2011 16:27:51 +0900 Subject: ARM: S5P6442: Removing ARCH_S5P6442 Signed-off-by: Kukjin Kim --- Documentation/arm/Samsung/Overview.txt | 2 -- 1 file changed, 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/arm/Samsung/Overview.txt b/Documentation/arm/Samsung/Overview.txt index c3094ea51aa7..658abb258cef 100644 --- a/Documentation/arm/Samsung/Overview.txt +++ b/Documentation/arm/Samsung/Overview.txt @@ -14,7 +14,6 @@ Introduction - S3C24XX: See Documentation/arm/Samsung-S3C24XX/Overview.txt for full list - S3C64XX: S3C6400 and S3C6410 - S5P6440 - - S5P6442 - S5PC100 - S5PC110 / S5PV210 @@ -36,7 +35,6 @@ Configuration unifying all the SoCs into one kernel. s5p6440_defconfig - S5P6440 specific default configuration - s5p6442_defconfig - S5P6442 specific default configuration s5pc100_defconfig - S5PC100 specific default configuration s5pc110_defconfig - S5PC110 specific default configuration s5pv210_defconfig - S5PV210 specific default configuration -- cgit v1.2.1 From e70bdd41bd0ead91b4a43e9d656ac1569d7c8779 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Wed, 11 May 2011 16:35:30 -0700 Subject: Input: rotary-encoder - add support for half-period encoders Add support for encoders that have two detents per input signal period. Signed-off-by: Johan Hovold Acked-by: Daniel Mack Signed-off-by: Dmitry Torokhov --- Documentation/input/rotary-encoder.txt | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 'Documentation') diff --git a/Documentation/input/rotary-encoder.txt b/Documentation/input/rotary-encoder.txt index 943e8f6f2b15..92e68bce13a4 100644 --- a/Documentation/input/rotary-encoder.txt +++ b/Documentation/input/rotary-encoder.txt @@ -9,6 +9,9 @@ peripherals with two wires. The outputs are phase-shifted by 90 degrees and by triggering on falling and rising edges, the turn direction can be determined. +Some encoders have both outputs low in stable states, whereas others also have +a stable state with both outputs high (half-period mode). + The phase diagram of these two outputs look like this: _____ _____ _____ @@ -26,6 +29,8 @@ The phase diagram of these two outputs look like this: |<-------->| one step + |<-->| + one step (half-period mode) For more information, please see http://en.wikipedia.org/wiki/Rotary_encoder @@ -34,6 +39,13 @@ For more information, please see 1. Events / state machine ------------------------- +In half-period mode, state a) and c) above are used to determine the +rotational direction based on the last stable state. Events are reported in +states b) and d) given that the new stable state is different from the last +(i.e. the rotation was not reversed half-way). + +Otherwise, the following apply: + a) Rising edge on channel A, channel B in low state This state is used to recognize a clockwise turn @@ -96,6 +108,7 @@ static struct rotary_encoder_platform_data my_rotary_encoder_info = { .gpio_b = GPIO_ROTARY_B, .inverted_a = 0, .inverted_b = 0, + .half_period = false, }; static struct platform_device rotary_encoder_device = { -- cgit v1.2.1 From bc3f07f0906e867270fdc2006b0bbcb130a722c1 Mon Sep 17 00:00:00 2001 From: Artem Bityutskiy Date: Tue, 5 Apr 2011 13:52:20 +0300 Subject: UBIFS: make force in-the-gaps to be a general self-check UBIFS can force itself to use the 'in-the-gaps' commit method - the last resort method which is normally invoced very very rarely. Currently this "force int-the-gaps" debugging feature is a separate test mode. But it is a bit saner to make it to be the "general" self-test check instead. This patch is just a clean-up which should make the debugging code look a bit nicer and easier to use - we have way too many debugging options. Signed-off-by: Artem Bityutskiy --- Documentation/filesystems/ubifs.txt | 1 - 1 file changed, 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/ubifs.txt b/Documentation/filesystems/ubifs.txt index d7b13b01e980..7d17e5b91ff4 100644 --- a/Documentation/filesystems/ubifs.txt +++ b/Documentation/filesystems/ubifs.txt @@ -154,7 +154,6 @@ debug_tsts Selects a mode of testing, as follows: Test mode Flag value - Force in-the-gaps method 2 Failure mode for recovery testing 4 For example, set debug_msgs to 5 to display General messages and Mount -- cgit v1.2.1 From 71c6d18859ccb137343017ec995b76d9f62bd9b0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=89ric=20Piel?= Date: Mon, 16 May 2011 22:45:54 -0700 Subject: Input: elantech - describe further the protocol MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit For some Dell laptops, Ubuntu had a special version of the elantech driver with more knowledge on the devices. It can be found there: http://zinc.ubuntu.com/git?p=mid-team/hardy-netbook.git;a=blob;f=drivers/input/mouse/elantech.c;h=d0e2cafed162428f72e3654f4dda85e08ea486b3;hb=refs/heads/abi-22 By inspecting the source code, and doing some test on a real hardware, I have completed the protocol specification (especially for the 6 bytes protocol). It also adds information about the mapping between the version reported by the device and the protocol to use. Signed-off-by: Éric Piel Reviewed-by: Henrik Rydberg Signed-off-by: Dmitry Torokhov --- Documentation/input/elantech.txt | 123 +++++++++++++++++++++++++++++++-------- 1 file changed, 100 insertions(+), 23 deletions(-) (limited to 'Documentation') diff --git a/Documentation/input/elantech.txt b/Documentation/input/elantech.txt index 56941ae1f5db..db798af5ef98 100644 --- a/Documentation/input/elantech.txt +++ b/Documentation/input/elantech.txt @@ -34,7 +34,8 @@ Contents Currently the Linux Elantech touchpad driver is aware of two different hardware versions unimaginatively called version 1 and version 2. Version 1 is found in "older" laptops and uses 4 bytes per packet. Version 2 seems to -be introduced with the EeePC and uses 6 bytes per packet. +be introduced with the EeePC and uses 6 bytes per packet, and provides +additional features such as position of two fingers, and width of the touch. The driver tries to support both hardware versions and should be compatible with the Xorg Synaptics touchpad driver and its graphical configuration @@ -94,18 +95,44 @@ Currently the Linux Elantech touchpad driver provides two extra knobs under can check these bits and reject any packet that appears corrupted. Using this knob you can bypass that check. - It is not known yet whether hardware version 2 provides the same parity - bits. Hence checking is disabled by default. Currently even turning it on - will do nothing. - + Hardware version 2 does not provide the same parity bits. Only some basic + data consistency checking can be done. For now checking is disabled by + default. Currently even turning it on will do nothing. ///////////////////////////////////////////////////////////////////////////// +3. Differentiating hardware versions + ================================= + +To detect the hardware version, read the version number as param[0].param[1].param[2] + + 4 bytes version: (after the arrow is the name given in the Dell-provided driver) + 02.00.22 => EF013 + 02.06.00 => EF019 +In the wild, there appear to be more versions, such as 00.01.64, 01.00.21, +02.00.00, 02.00.04, 02.00.06. + + 6 bytes: + 02.00.30 => EF113 + 02.08.00 => EF023 + 02.08.XX => EF123 + 02.0B.00 => EF215 + 04.01.XX => Scroll_EF051 + 04.02.XX => EF051 +In the wild, there appear to be more versions, such as 04.03.01, 04.04.11. There +appears to be almost no difference, except for EF113, which does not report +pressure/width and has different data consistency checks. + +Probably all the versions with param[0] <= 01 can be considered as +4 bytes/firmware 1. The versions < 02.08.00, with the exception of 02.00.30, as +4 bytes/firmware 2. Everything >= 02.08.00 can be considered as 6 bytes. + +///////////////////////////////////////////////////////////////////////////// -3. Hardware version 1 +4. Hardware version 1 ================== -3.1 Registers +4.1 Registers ~~~~~~~~~ By echoing a hexadecimal value to a register it contents can be altered. @@ -168,7 +195,7 @@ For example: smart edge activation area width? -3.2 Native relative mode 4 byte packet format +4.2 Native relative mode 4 byte packet format ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ byte 0: @@ -226,9 +253,13 @@ byte 3: positive = down -3.3 Native absolute mode 4 byte packet format +4.3 Native absolute mode 4 byte packet format ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +EF013 and EF019 have a special behaviour (due to a bug in the firmware?), and +when 1 finger is touching, the first 2 position reports must be discarded. +This counting is reset whenever a different number of fingers is reported. + byte 0: firmware version 1.x: @@ -279,11 +310,11 @@ byte 3: ///////////////////////////////////////////////////////////////////////////// -4. Hardware version 2 +5. Hardware version 2 ================== -4.1 Registers +5.1 Registers ~~~~~~~~~ By echoing a hexadecimal value to a register it contents can be altered. @@ -316,16 +347,41 @@ For example: 0x7f = never i.e. tap again to release) -4.2 Native absolute mode 6 byte packet format +5.2 Native absolute mode 6 byte packet format ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -4.2.1 One finger touch +5.2.1 Parity checking and packet re-synchronization +There is no parity checking, however some consistency checks can be performed. + +For instance for EF113: + SA1= packet[0]; + A1 = packet[1]; + B1 = packet[2]; + SB1= packet[3]; + C1 = packet[4]; + D1 = packet[5]; + if( (((SA1 & 0x3C) != 0x3C) && ((SA1 & 0xC0) != 0x80)) || // check Byte 1 + (((SA1 & 0x0C) != 0x0C) && ((SA1 & 0xC0) == 0x80)) || // check Byte 1 (one finger pressed) + (((SA1 & 0xC0) != 0x80) && (( A1 & 0xF0) != 0x00)) || // check Byte 2 + (((SB1 & 0x3E) != 0x38) && ((SA1 & 0xC0) != 0x80)) || // check Byte 4 + (((SB1 & 0x0E) != 0x08) && ((SA1 & 0xC0) == 0x80)) || // check Byte 4 (one finger pressed) + (((SA1 & 0xC0) != 0x80) && (( C1 & 0xF0) != 0x00)) ) // check Byte 5 + // error detected + +For all the other ones, there are just a few constant bits: + if( ((packet[0] & 0x0C) != 0x04) || + ((packet[3] & 0x0f) != 0x02) ) + // error detected + + +In case an error is detected, all the packets are shifted by one (and packet[0] is discarded). + +5.2.1 One/Three finger touch ~~~~~~~~~~~~~~~~ byte 0: bit 7 6 5 4 3 2 1 0 - n1 n0 . . . . R L + n1 n0 w3 w2 . . R L L, R = 1 when Left, Right mouse button pressed n1..n0 = numbers of fingers on touchpad @@ -333,24 +389,40 @@ byte 0: byte 1: bit 7 6 5 4 3 2 1 0 - . . . . . x10 x9 x8 + p7 p6 p5 p4 . x10 x9 x8 byte 2: bit 7 6 5 4 3 2 1 0 - x7 x6 x5 x4 x4 x2 x1 x0 + x7 x6 x5 x4 x3 x2 x1 x0 x10..x0 = absolute x value (horizontal) byte 3: bit 7 6 5 4 3 2 1 0 - . . . . . . . . + n4 vf w1 w0 . . . b2 + + n4 = set if more than 3 fingers (only in 3 fingers mode) + vf = a kind of flag ? (only on EF123, 0 when finger is over one + of the buttons, 1 otherwise) + w3..w0 = width of the finger touch (not EF113) + b2 (on EF113 only, 0 otherwise), b2.R.L indicates one button pressed: + 0 = none + 1 = Left + 2 = Right + 3 = Middle (Left and Right) + 4 = Forward + 5 = Back + 6 = Another one + 7 = Another one byte 4: bit 7 6 5 4 3 2 1 0 - . . . . . . y9 y8 + p3 p1 p2 p0 . . y9 y8 + + p7..p0 = pressure (not EF113) byte 5: @@ -363,6 +435,11 @@ byte 5: 4.2.2 Two finger touch ~~~~~~~~~~~~~~~~ +Note that the two pairs of coordinates are not exactly the coordinates of the +two fingers, but only the pair of the lower-left and upper-right coordinates. +So the actual fingers might be situated on the other diagonal of the square +defined by these two points. + byte 0: bit 7 6 5 4 3 2 1 0 @@ -376,14 +453,14 @@ byte 1: bit 7 6 5 4 3 2 1 0 ax7 ax6 ax5 ax4 ax3 ax2 ax1 ax0 - ax8..ax0 = first finger absolute x value + ax8..ax0 = lower-left finger absolute x value byte 2: bit 7 6 5 4 3 2 1 0 ay7 ay6 ay5 ay4 ay3 ay2 ay1 ay0 - ay8..ay0 = first finger absolute y value + ay8..ay0 = lower-left finger absolute y value byte 3: @@ -395,11 +472,11 @@ byte 4: bit 7 6 5 4 3 2 1 0 bx7 bx6 bx5 bx4 bx3 bx2 bx1 bx0 - bx8..bx0 = second finger absolute x value + bx8..bx0 = upper-right finger absolute x value byte 5: bit 7 6 5 4 3 2 1 0 by7 by8 by5 by4 by3 by2 by1 by0 - by8..by0 = second finger absolute y value + by8..by0 = upper-right finger absolute y value -- cgit v1.2.1 From d70d0711edd8076ec2ce0ed109106e2df950681b Mon Sep 17 00:00:00 2001 From: "Martin K. Petersen" Date: Wed, 18 May 2011 10:37:39 +0200 Subject: block: Add sysfs documentation for the discard topology parameters Add documentation for the discard I/O topology parameters exported in sysfs. Signed-off-by: Martin K. Petersen Signed-off-by: Jens Axboe --- Documentation/ABI/testing/sysfs-block | 64 +++++++++++++++++++++++++++++++++++ 1 file changed, 64 insertions(+) (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block index 4873c759d535..c1eb41cb9876 100644 --- a/Documentation/ABI/testing/sysfs-block +++ b/Documentation/ABI/testing/sysfs-block @@ -142,3 +142,67 @@ Description: with the previous I/O request are enabled. When set to 2, all merge tries are disabled. The default value is 0 - which enables all types of merge tries. + +What: /sys/block//discard_alignment +Date: May 2011 +Contact: Martin K. Petersen +Description: + Devices that support discard functionality may + internally allocate space in units that are bigger than + the exported logical block size. The discard_alignment + parameter indicates how many bytes the beginning of the + device is offset from the internal allocation unit's + natural alignment. + +What: /sys/block///discard_alignment +Date: May 2011 +Contact: Martin K. Petersen +Description: + Devices that support discard functionality may + internally allocate space in units that are bigger than + the exported logical block size. The discard_alignment + parameter indicates how many bytes the beginning of the + partition is offset from the internal allocation unit's + natural alignment. + +What: /sys/block//queue/discard_granularity +Date: May 2011 +Contact: Martin K. Petersen +Description: + Devices that support discard functionality may + internally allocate space using units that are bigger + than the logical block size. The discard_granularity + parameter indicates the size of the internal allocation + unit in bytes if reported by the device. Otherwise the + discard_granularity will be set to match the device's + physical block size. A discard_granularity of 0 means + that the device does not support discard functionality. + +What: /sys/block//queue/discard_max_bytes +Date: May 2011 +Contact: Martin K. Petersen +Description: + Devices that support discard functionality may have + internal limits on the number of bytes that can be + trimmed or unmapped in a single operation. Some storage + protocols also have inherent limits on the number of + blocks that can be described in a single command. The + discard_max_bytes parameter is set by the device driver + to the maximum number of bytes that can be discarded in + a single operation. Discard requests issued to the + device must not exceed this limit. A discard_max_bytes + value of 0 means that the device does not support + discard functionality. + +What: /sys/block//queue/discard_zeroes_data +Date: May 2011 +Contact: Martin K. Petersen +Description: + Devices that support discard functionality may return + stale or random data when a previously discarded block + is read back. This can cause problems if the filesystem + expects discarded blocks to be explicitly cleared. If a + device reports that it deterministically returns zeroes + when a discarded area is read the discard_zeroes_data + parameter will be set to one. Otherwise it will be 0 and + the result of reading a discarded area is undefined. -- cgit v1.2.1 From d410fa4ef99112386de5f218dd7df7b4fca910b4 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Thu, 19 May 2011 15:59:38 -0700 Subject: Create Documentation/security/, move LSM-, credentials-, and keys-related files from Documentation/ to Documentation/security/, add Documentation/security/00-INDEX, and update all occurrences of Documentation/ to Documentation/security/. --- Documentation/00-INDEX | 6 +- Documentation/SELinux.txt | 27 - Documentation/Smack.txt | 541 --------- Documentation/apparmor.txt | 39 - Documentation/credentials.txt | 581 ---------- Documentation/filesystems/nfs/idmapper.txt | 4 +- Documentation/keys-request-key.txt | 202 ---- Documentation/keys-trusted-encrypted.txt | 145 --- Documentation/keys.txt | 1290 --------------------- Documentation/networking/dns_resolver.txt | 4 +- Documentation/security/00-INDEX | 18 + Documentation/security/SELinux.txt | 27 + Documentation/security/Smack.txt | 541 +++++++++ Documentation/security/apparmor.txt | 39 + Documentation/security/credentials.txt | 581 ++++++++++ Documentation/security/keys-request-key.txt | 202 ++++ Documentation/security/keys-trusted-encrypted.txt | 145 +++ Documentation/security/keys.txt | 1290 +++++++++++++++++++++ Documentation/security/tomoyo.txt | 55 + Documentation/tomoyo.txt | 55 - 20 files changed, 2904 insertions(+), 2888 deletions(-) delete mode 100644 Documentation/SELinux.txt delete mode 100644 Documentation/Smack.txt delete mode 100644 Documentation/apparmor.txt delete mode 100644 Documentation/credentials.txt delete mode 100644 Documentation/keys-request-key.txt delete mode 100644 Documentation/keys-trusted-encrypted.txt delete mode 100644 Documentation/keys.txt create mode 100644 Documentation/security/00-INDEX create mode 100644 Documentation/security/SELinux.txt create mode 100644 Documentation/security/Smack.txt create mode 100644 Documentation/security/apparmor.txt create mode 100644 Documentation/security/credentials.txt create mode 100644 Documentation/security/keys-request-key.txt create mode 100644 Documentation/security/keys-trusted-encrypted.txt create mode 100644 Documentation/security/keys.txt create mode 100644 Documentation/security/tomoyo.txt delete mode 100644 Documentation/tomoyo.txt (limited to 'Documentation') diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX index c17cd4bb2290..c8c1cf631b37 100644 --- a/Documentation/00-INDEX +++ b/Documentation/00-INDEX @@ -192,10 +192,6 @@ kernel-docs.txt - listing of various WWW + books that document kernel internals. kernel-parameters.txt - summary listing of command line / boot prompt args for the kernel. -keys-request-key.txt - - description of the kernel key request service. -keys.txt - - description of the kernel key retention service. kobject.txt - info of the kobject infrastructure of the Linux kernel. kprobes.txt @@ -294,6 +290,8 @@ scheduler/ - directory with info on the scheduler. scsi/ - directory with info on Linux scsi support. +security/ + - directory that contains security-related info serial/ - directory with info on the low level serial API. serial-console.txt diff --git a/Documentation/SELinux.txt b/Documentation/SELinux.txt deleted file mode 100644 index 07eae00f3314..000000000000 --- a/Documentation/SELinux.txt +++ /dev/null @@ -1,27 +0,0 @@ -If you want to use SELinux, chances are you will want -to use the distro-provided policies, or install the -latest reference policy release from - http://oss.tresys.com/projects/refpolicy - -However, if you want to install a dummy policy for -testing, you can do using 'mdp' provided under -scripts/selinux. Note that this requires the selinux -userspace to be installed - in particular you will -need checkpolicy to compile a kernel, and setfiles and -fixfiles to label the filesystem. - - 1. Compile the kernel with selinux enabled. - 2. Type 'make' to compile mdp. - 3. Make sure that you are not running with - SELinux enabled and a real policy. If - you are, reboot with selinux disabled - before continuing. - 4. Run install_policy.sh: - cd scripts/selinux - sh install_policy.sh - -Step 4 will create a new dummy policy valid for your -kernel, with a single selinux user, role, and type. -It will compile the policy, will set your SELINUXTYPE to -dummy in /etc/selinux/config, install the compiled policy -as 'dummy', and relabel your filesystem. diff --git a/Documentation/Smack.txt b/Documentation/Smack.txt deleted file mode 100644 index e9dab41c0fe0..000000000000 --- a/Documentation/Smack.txt +++ /dev/null @@ -1,541 +0,0 @@ - - - "Good for you, you've decided to clean the elevator!" - - The Elevator, from Dark Star - -Smack is the the Simplified Mandatory Access Control Kernel. -Smack is a kernel based implementation of mandatory access -control that includes simplicity in its primary design goals. - -Smack is not the only Mandatory Access Control scheme -available for Linux. Those new to Mandatory Access Control -are encouraged to compare Smack with the other mechanisms -available to determine which is best suited to the problem -at hand. - -Smack consists of three major components: - - The kernel - - A start-up script and a few modified applications - - Configuration data - -The kernel component of Smack is implemented as a Linux -Security Modules (LSM) module. It requires netlabel and -works best with file systems that support extended attributes, -although xattr support is not strictly required. -It is safe to run a Smack kernel under a "vanilla" distribution. -Smack kernels use the CIPSO IP option. Some network -configurations are intolerant of IP options and can impede -access to systems that use them as Smack does. - -The startup script etc-init.d-smack should be installed -in /etc/init.d/smack and should be invoked early in the -start-up process. On Fedora rc5.d/S02smack is recommended. -This script ensures that certain devices have the correct -Smack attributes and loads the Smack configuration if -any is defined. This script invokes two programs that -ensure configuration data is properly formatted. These -programs are /usr/sbin/smackload and /usr/sin/smackcipso. -The system will run just fine without these programs, -but it will be difficult to set access rules properly. - -A version of "ls" that provides a "-M" option to display -Smack labels on long listing is available. - -A hacked version of sshd that allows network logins by users -with specific Smack labels is available. This version does -not work for scp. You must set the /etc/ssh/sshd_config -line: - UsePrivilegeSeparation no - -The format of /etc/smack/usr is: - - username smack - -In keeping with the intent of Smack, configuration data is -minimal and not strictly required. The most important -configuration step is mounting the smackfs pseudo filesystem. - -Add this line to /etc/fstab: - - smackfs /smack smackfs smackfsdef=* 0 0 - -and create the /smack directory for mounting. - -Smack uses extended attributes (xattrs) to store file labels. -The command to set a Smack label on a file is: - - # attr -S -s SMACK64 -V "value" path - -NOTE: Smack labels are limited to 23 characters. The attr command - does not enforce this restriction and can be used to set - invalid Smack labels on files. - -If you don't do anything special all users will get the floor ("_") -label when they log in. If you do want to log in via the hacked ssh -at other labels use the attr command to set the smack value on the -home directory and its contents. - -You can add access rules in /etc/smack/accesses. They take the form: - - subjectlabel objectlabel access - -access is a combination of the letters rwxa which specify the -kind of access permitted a subject with subjectlabel on an -object with objectlabel. If there is no rule no access is allowed. - -A process can see the smack label it is running with by -reading /proc/self/attr/current. A privileged process can -set the process smack by writing there. - -Look for additional programs on http://schaufler-ca.com - -From the Smack Whitepaper: - -The Simplified Mandatory Access Control Kernel - -Casey Schaufler -casey@schaufler-ca.com - -Mandatory Access Control - -Computer systems employ a variety of schemes to constrain how information is -shared among the people and services using the machine. Some of these schemes -allow the program or user to decide what other programs or users are allowed -access to pieces of data. These schemes are called discretionary access -control mechanisms because the access control is specified at the discretion -of the user. Other schemes do not leave the decision regarding what a user or -program can access up to users or programs. These schemes are called mandatory -access control mechanisms because you don't have a choice regarding the users -or programs that have access to pieces of data. - -Bell & LaPadula - -From the middle of the 1980's until the turn of the century Mandatory Access -Control (MAC) was very closely associated with the Bell & LaPadula security -model, a mathematical description of the United States Department of Defense -policy for marking paper documents. MAC in this form enjoyed a following -within the Capital Beltway and Scandinavian supercomputer centers but was -often sited as failing to address general needs. - -Domain Type Enforcement - -Around the turn of the century Domain Type Enforcement (DTE) became popular. -This scheme organizes users, programs, and data into domains that are -protected from each other. This scheme has been widely deployed as a component -of popular Linux distributions. The administrative overhead required to -maintain this scheme and the detailed understanding of the whole system -necessary to provide a secure domain mapping leads to the scheme being -disabled or used in limited ways in the majority of cases. - -Smack - -Smack is a Mandatory Access Control mechanism designed to provide useful MAC -while avoiding the pitfalls of its predecessors. The limitations of Bell & -LaPadula are addressed by providing a scheme whereby access can be controlled -according to the requirements of the system and its purpose rather than those -imposed by an arcane government policy. The complexity of Domain Type -Enforcement and avoided by defining access controls in terms of the access -modes already in use. - -Smack Terminology - -The jargon used to talk about Smack will be familiar to those who have dealt -with other MAC systems and shouldn't be too difficult for the uninitiated to -pick up. There are four terms that are used in a specific way and that are -especially important: - - Subject: A subject is an active entity on the computer system. - On Smack a subject is a task, which is in turn the basic unit - of execution. - - Object: An object is a passive entity on the computer system. - On Smack files of all types, IPC, and tasks can be objects. - - Access: Any attempt by a subject to put information into or get - information from an object is an access. - - Label: Data that identifies the Mandatory Access Control - characteristics of a subject or an object. - -These definitions are consistent with the traditional use in the security -community. There are also some terms from Linux that are likely to crop up: - - Capability: A task that possesses a capability has permission to - violate an aspect of the system security policy, as identified by - the specific capability. A task that possesses one or more - capabilities is a privileged task, whereas a task with no - capabilities is an unprivileged task. - - Privilege: A task that is allowed to violate the system security - policy is said to have privilege. As of this writing a task can - have privilege either by possessing capabilities or by having an - effective user of root. - -Smack Basics - -Smack is an extension to a Linux system. It enforces additional restrictions -on what subjects can access which objects, based on the labels attached to -each of the subject and the object. - -Labels - -Smack labels are ASCII character strings, one to twenty-three characters in -length. Single character labels using special characters, that being anything -other than a letter or digit, are reserved for use by the Smack development -team. Smack labels are unstructured, case sensitive, and the only operation -ever performed on them is comparison for equality. Smack labels cannot -contain unprintable characters, the "/" (slash), the "\" (backslash), the "'" -(quote) and '"' (double-quote) characters. -Smack labels cannot begin with a '-', which is reserved for special options. - -There are some predefined labels: - - _ Pronounced "floor", a single underscore character. - ^ Pronounced "hat", a single circumflex character. - * Pronounced "star", a single asterisk character. - ? Pronounced "huh", a single question mark character. - @ Pronounced "Internet", a single at sign character. - -Every task on a Smack system is assigned a label. System tasks, such as -init(8) and systems daemons, are run with the floor ("_") label. User tasks -are assigned labels according to the specification found in the -/etc/smack/user configuration file. - -Access Rules - -Smack uses the traditional access modes of Linux. These modes are read, -execute, write, and occasionally append. There are a few cases where the -access mode may not be obvious. These include: - - Signals: A signal is a write operation from the subject task to - the object task. - Internet Domain IPC: Transmission of a packet is considered a - write operation from the source task to the destination task. - -Smack restricts access based on the label attached to a subject and the label -attached to the object it is trying to access. The rules enforced are, in -order: - - 1. Any access requested by a task labeled "*" is denied. - 2. A read or execute access requested by a task labeled "^" - is permitted. - 3. A read or execute access requested on an object labeled "_" - is permitted. - 4. Any access requested on an object labeled "*" is permitted. - 5. Any access requested by a task on an object with the same - label is permitted. - 6. Any access requested that is explicitly defined in the loaded - rule set is permitted. - 7. Any other access is denied. - -Smack Access Rules - -With the isolation provided by Smack access separation is simple. There are -many interesting cases where limited access by subjects to objects with -different labels is desired. One example is the familiar spy model of -sensitivity, where a scientist working on a highly classified project would be -able to read documents of lower classifications and anything she writes will -be "born" highly classified. To accommodate such schemes Smack includes a -mechanism for specifying rules allowing access between labels. - -Access Rule Format - -The format of an access rule is: - - subject-label object-label access - -Where subject-label is the Smack label of the task, object-label is the Smack -label of the thing being accessed, and access is a string specifying the sort -of access allowed. The Smack labels are limited to 23 characters. The access -specification is searched for letters that describe access modes: - - a: indicates that append access should be granted. - r: indicates that read access should be granted. - w: indicates that write access should be granted. - x: indicates that execute access should be granted. - -Uppercase values for the specification letters are allowed as well. -Access mode specifications can be in any order. Examples of acceptable rules -are: - - TopSecret Secret rx - Secret Unclass R - Manager Game x - User HR w - New Old rRrRr - Closed Off - - -Examples of unacceptable rules are: - - Top Secret Secret rx - Ace Ace r - Odd spells waxbeans - -Spaces are not allowed in labels. Since a subject always has access to files -with the same label specifying a rule for that case is pointless. Only -valid letters (rwxaRWXA) and the dash ('-') character are allowed in -access specifications. The dash is a placeholder, so "a-r" is the same -as "ar". A lone dash is used to specify that no access should be allowed. - -Applying Access Rules - -The developers of Linux rarely define new sorts of things, usually importing -schemes and concepts from other systems. Most often, the other systems are -variants of Unix. Unix has many endearing properties, but consistency of -access control models is not one of them. Smack strives to treat accesses as -uniformly as is sensible while keeping with the spirit of the underlying -mechanism. - -File system objects including files, directories, named pipes, symbolic links, -and devices require access permissions that closely match those used by mode -bit access. To open a file for reading read access is required on the file. To -search a directory requires execute access. Creating a file with write access -requires both read and write access on the containing directory. Deleting a -file requires read and write access to the file and to the containing -directory. It is possible that a user may be able to see that a file exists -but not any of its attributes by the circumstance of having read access to the -containing directory but not to the differently labeled file. This is an -artifact of the file name being data in the directory, not a part of the file. - -IPC objects, message queues, semaphore sets, and memory segments exist in flat -namespaces and access requests are only required to match the object in -question. - -Process objects reflect tasks on the system and the Smack label used to access -them is the same Smack label that the task would use for its own access -attempts. Sending a signal via the kill() system call is a write operation -from the signaler to the recipient. Debugging a process requires both reading -and writing. Creating a new task is an internal operation that results in two -tasks with identical Smack labels and requires no access checks. - -Sockets are data structures attached to processes and sending a packet from -one process to another requires that the sender have write access to the -receiver. The receiver is not required to have read access to the sender. - -Setting Access Rules - -The configuration file /etc/smack/accesses contains the rules to be set at -system startup. The contents are written to the special file /smack/load. -Rules can be written to /smack/load at any time and take effect immediately. -For any pair of subject and object labels there can be only one rule, with the -most recently specified overriding any earlier specification. - -The program smackload is provided to ensure data is formatted -properly when written to /smack/load. This program reads lines -of the form - - subjectlabel objectlabel mode. - -Task Attribute - -The Smack label of a process can be read from /proc//attr/current. A -process can read its own Smack label from /proc/self/attr/current. A -privileged process can change its own Smack label by writing to -/proc/self/attr/current but not the label of another process. - -File Attribute - -The Smack label of a filesystem object is stored as an extended attribute -named SMACK64 on the file. This attribute is in the security namespace. It can -only be changed by a process with privilege. - -Privilege - -A process with CAP_MAC_OVERRIDE is privileged. - -Smack Networking - -As mentioned before, Smack enforces access control on network protocol -transmissions. Every packet sent by a Smack process is tagged with its Smack -label. This is done by adding a CIPSO tag to the header of the IP packet. Each -packet received is expected to have a CIPSO tag that identifies the label and -if it lacks such a tag the network ambient label is assumed. Before the packet -is delivered a check is made to determine that a subject with the label on the -packet has write access to the receiving process and if that is not the case -the packet is dropped. - -CIPSO Configuration - -It is normally unnecessary to specify the CIPSO configuration. The default -values used by the system handle all internal cases. Smack will compose CIPSO -label values to match the Smack labels being used without administrative -intervention. Unlabeled packets that come into the system will be given the -ambient label. - -Smack requires configuration in the case where packets from a system that is -not smack that speaks CIPSO may be encountered. Usually this will be a Trusted -Solaris system, but there are other, less widely deployed systems out there. -CIPSO provides 3 important values, a Domain Of Interpretation (DOI), a level, -and a category set with each packet. The DOI is intended to identify a group -of systems that use compatible labeling schemes, and the DOI specified on the -smack system must match that of the remote system or packets will be -discarded. The DOI is 3 by default. The value can be read from /smack/doi and -can be changed by writing to /smack/doi. - -The label and category set are mapped to a Smack label as defined in -/etc/smack/cipso. - -A Smack/CIPSO mapping has the form: - - smack level [category [category]*] - -Smack does not expect the level or category sets to be related in any -particular way and does not assume or assign accesses based on them. Some -examples of mappings: - - TopSecret 7 - TS:A,B 7 1 2 - SecBDE 5 2 4 6 - RAFTERS 7 12 26 - -The ":" and "," characters are permitted in a Smack label but have no special -meaning. - -The mapping of Smack labels to CIPSO values is defined by writing to -/smack/cipso. Again, the format of data written to this special file -is highly restrictive, so the program smackcipso is provided to -ensure the writes are done properly. This program takes mappings -on the standard input and sends them to /smack/cipso properly. - -In addition to explicit mappings Smack supports direct CIPSO mappings. One -CIPSO level is used to indicate that the category set passed in the packet is -in fact an encoding of the Smack label. The level used is 250 by default. The -value can be read from /smack/direct and changed by writing to /smack/direct. - -Socket Attributes - -There are two attributes that are associated with sockets. These attributes -can only be set by privileged tasks, but any task can read them for their own -sockets. - - SMACK64IPIN: The Smack label of the task object. A privileged - program that will enforce policy may set this to the star label. - - SMACK64IPOUT: The Smack label transmitted with outgoing packets. - A privileged program may set this to match the label of another - task with which it hopes to communicate. - -Smack Netlabel Exceptions - -You will often find that your labeled application has to talk to the outside, -unlabeled world. To do this there's a special file /smack/netlabel where you can -add some exceptions in the form of : -@IP1 LABEL1 or -@IP2/MASK LABEL2 - -It means that your application will have unlabeled access to @IP1 if it has -write access on LABEL1, and access to the subnet @IP2/MASK if it has write -access on LABEL2. - -Entries in the /smack/netlabel file are matched by longest mask first, like in -classless IPv4 routing. - -A special label '@' and an option '-CIPSO' can be used there : -@ means Internet, any application with any label has access to it --CIPSO means standard CIPSO networking - -If you don't know what CIPSO is and don't plan to use it, you can just do : -echo 127.0.0.1 -CIPSO > /smack/netlabel -echo 0.0.0.0/0 @ > /smack/netlabel - -If you use CIPSO on your 192.168.0.0/16 local network and need also unlabeled -Internet access, you can have : -echo 127.0.0.1 -CIPSO > /smack/netlabel -echo 192.168.0.0/16 -CIPSO > /smack/netlabel -echo 0.0.0.0/0 @ > /smack/netlabel - - -Writing Applications for Smack - -There are three sorts of applications that will run on a Smack system. How an -application interacts with Smack will determine what it will have to do to -work properly under Smack. - -Smack Ignorant Applications - -By far the majority of applications have no reason whatever to care about the -unique properties of Smack. Since invoking a program has no impact on the -Smack label associated with the process the only concern likely to arise is -whether the process has execute access to the program. - -Smack Relevant Applications - -Some programs can be improved by teaching them about Smack, but do not make -any security decisions themselves. The utility ls(1) is one example of such a -program. - -Smack Enforcing Applications - -These are special programs that not only know about Smack, but participate in -the enforcement of system policy. In most cases these are the programs that -set up user sessions. There are also network services that provide information -to processes running with various labels. - -File System Interfaces - -Smack maintains labels on file system objects using extended attributes. The -Smack label of a file, directory, or other file system object can be obtained -using getxattr(2). - - len = getxattr("/", "security.SMACK64", value, sizeof (value)); - -will put the Smack label of the root directory into value. A privileged -process can set the Smack label of a file system object with setxattr(2). - - len = strlen("Rubble"); - rc = setxattr("/foo", "security.SMACK64", "Rubble", len, 0); - -will set the Smack label of /foo to "Rubble" if the program has appropriate -privilege. - -Socket Interfaces - -The socket attributes can be read using fgetxattr(2). - -A privileged process can set the Smack label of outgoing packets with -fsetxattr(2). - - len = strlen("Rubble"); - rc = fsetxattr(fd, "security.SMACK64IPOUT", "Rubble", len, 0); - -will set the Smack label "Rubble" on packets going out from the socket if the -program has appropriate privilege. - - rc = fsetxattr(fd, "security.SMACK64IPIN, "*", strlen("*"), 0); - -will set the Smack label "*" as the object label against which incoming -packets will be checked if the program has appropriate privilege. - -Administration - -Smack supports some mount options: - - smackfsdef=label: specifies the label to give files that lack - the Smack label extended attribute. - - smackfsroot=label: specifies the label to assign the root of the - file system if it lacks the Smack extended attribute. - - smackfshat=label: specifies a label that must have read access to - all labels set on the filesystem. Not yet enforced. - - smackfsfloor=label: specifies a label to which all labels set on the - filesystem must have read access. Not yet enforced. - -These mount options apply to all file system types. - -Smack auditing - -If you want Smack auditing of security events, you need to set CONFIG_AUDIT -in your kernel configuration. -By default, all denied events will be audited. You can change this behavior by -writing a single character to the /smack/logging file : -0 : no logging -1 : log denied (default) -2 : log accepted -3 : log denied & accepted - -Events are logged as 'key=value' pairs, for each event you at least will get -the subjet, the object, the rights requested, the action, the kernel function -that triggered the event, plus other pairs depending on the type of event -audited. diff --git a/Documentation/apparmor.txt b/Documentation/apparmor.txt deleted file mode 100644 index 93c1fd7d0635..000000000000 --- a/Documentation/apparmor.txt +++ /dev/null @@ -1,39 +0,0 @@ ---- What is AppArmor? --- - -AppArmor is MAC style security extension for the Linux kernel. It implements -a task centered policy, with task "profiles" being created and loaded -from user space. Tasks on the system that do not have a profile defined for -them run in an unconfined state which is equivalent to standard Linux DAC -permissions. - ---- How to enable/disable --- - -set CONFIG_SECURITY_APPARMOR=y - -If AppArmor should be selected as the default security module then - set CONFIG_DEFAULT_SECURITY="apparmor" - and CONFIG_SECURITY_APPARMOR_BOOTPARAM_VALUE=1 - -Build the kernel - -If AppArmor is not the default security module it can be enabled by passing -security=apparmor on the kernel's command line. - -If AppArmor is the default security module it can be disabled by passing -apparmor=0, security=XXXX (where XXX is valid security module), on the -kernel's command line - -For AppArmor to enforce any restrictions beyond standard Linux DAC permissions -policy must be loaded into the kernel from user space (see the Documentation -and tools links). - ---- Documentation --- - -Documentation can be found on the wiki. - ---- Links --- - -Mailing List - apparmor@lists.ubuntu.com -Wiki - http://apparmor.wiki.kernel.org/ -User space tools - https://launchpad.net/apparmor -Kernel module - git://git.kernel.org/pub/scm/linux/kernel/git/jj/apparmor-dev.git diff --git a/Documentation/credentials.txt b/Documentation/credentials.txt deleted file mode 100644 index 995baf379c07..000000000000 --- a/Documentation/credentials.txt +++ /dev/null @@ -1,581 +0,0 @@ - ==================== - CREDENTIALS IN LINUX - ==================== - -By: David Howells - -Contents: - - (*) Overview. - - (*) Types of credentials. - - (*) File markings. - - (*) Task credentials. - - - Immutable credentials. - - Accessing task credentials. - - Accessing another task's credentials. - - Altering credentials. - - Managing credentials. - - (*) Open file credentials. - - (*) Overriding the VFS's use of credentials. - - -======== -OVERVIEW -======== - -There are several parts to the security check performed by Linux when one -object acts upon another: - - (1) Objects. - - Objects are things in the system that may be acted upon directly by - userspace programs. Linux has a variety of actionable objects, including: - - - Tasks - - Files/inodes - - Sockets - - Message queues - - Shared memory segments - - Semaphores - - Keys - - As a part of the description of all these objects there is a set of - credentials. What's in the set depends on the type of object. - - (2) Object ownership. - - Amongst the credentials of most objects, there will be a subset that - indicates the ownership of that object. This is used for resource - accounting and limitation (disk quotas and task rlimits for example). - - In a standard UNIX filesystem, for instance, this will be defined by the - UID marked on the inode. - - (3) The objective context. - - Also amongst the credentials of those objects, there will be a subset that - indicates the 'objective context' of that object. This may or may not be - the same set as in (2) - in standard UNIX files, for instance, this is the - defined by the UID and the GID marked on the inode. - - The objective context is used as part of the security calculation that is - carried out when an object is acted upon. - - (4) Subjects. - - A subject is an object that is acting upon another object. - - Most of the objects in the system are inactive: they don't act on other - objects within the system. Processes/tasks are the obvious exception: - they do stuff; they access and manipulate things. - - Objects other than tasks may under some circumstances also be subjects. - For instance an open file may send SIGIO to a task using the UID and EUID - given to it by a task that called fcntl(F_SETOWN) upon it. In this case, - the file struct will have a subjective context too. - - (5) The subjective context. - - A subject has an additional interpretation of its credentials. A subset - of its credentials forms the 'subjective context'. The subjective context - is used as part of the security calculation that is carried out when a - subject acts. - - A Linux task, for example, has the FSUID, FSGID and the supplementary - group list for when it is acting upon a file - which are quite separate - from the real UID and GID that normally form the objective context of the - task. - - (6) Actions. - - Linux has a number of actions available that a subject may perform upon an - object. The set of actions available depends on the nature of the subject - and the object. - - Actions include reading, writing, creating and deleting files; forking or - signalling and tracing tasks. - - (7) Rules, access control lists and security calculations. - - When a subject acts upon an object, a security calculation is made. This - involves taking the subjective context, the objective context and the - action, and searching one or more sets of rules to see whether the subject - is granted or denied permission to act in the desired manner on the - object, given those contexts. - - There are two main sources of rules: - - (a) Discretionary access control (DAC): - - Sometimes the object will include sets of rules as part of its - description. This is an 'Access Control List' or 'ACL'. A Linux - file may supply more than one ACL. - - A traditional UNIX file, for example, includes a permissions mask that - is an abbreviated ACL with three fixed classes of subject ('user', - 'group' and 'other'), each of which may be granted certain privileges - ('read', 'write' and 'execute' - whatever those map to for the object - in question). UNIX file permissions do not allow the arbitrary - specification of subjects, however, and so are of limited use. - - A Linux file might also sport a POSIX ACL. This is a list of rules - that grants various permissions to arbitrary subjects. - - (b) Mandatory access control (MAC): - - The system as a whole may have one or more sets of rules that get - applied to all subjects and objects, regardless of their source. - SELinux and Smack are examples of this. - - In the case of SELinux and Smack, each object is given a label as part - of its credentials. When an action is requested, they take the - subject label, the object label and the action and look for a rule - that says that this action is either granted or denied. - - -==================== -TYPES OF CREDENTIALS -==================== - -The Linux kernel supports the following types of credentials: - - (1) Traditional UNIX credentials. - - Real User ID - Real Group ID - - The UID and GID are carried by most, if not all, Linux objects, even if in - some cases it has to be invented (FAT or CIFS files for example, which are - derived from Windows). These (mostly) define the objective context of - that object, with tasks being slightly different in some cases. - - Effective, Saved and FS User ID - Effective, Saved and FS Group ID - Supplementary groups - - These are additional credentials used by tasks only. Usually, an - EUID/EGID/GROUPS will be used as the subjective context, and real UID/GID - will be used as the objective. For tasks, it should be noted that this is - not always true. - - (2) Capabilities. - - Set of permitted capabilities - Set of inheritable capabilities - Set of effective capabilities - Capability bounding set - - These are only carried by tasks. They indicate superior capabilities - granted piecemeal to a task that an ordinary task wouldn't otherwise have. - These are manipulated implicitly by changes to the traditional UNIX - credentials, but can also be manipulated directly by the capset() system - call. - - The permitted capabilities are those caps that the process might grant - itself to its effective or permitted sets through capset(). This - inheritable set might also be so constrained. - - The effective capabilities are the ones that a task is actually allowed to - make use of itself. - - The inheritable capabilities are the ones that may get passed across - execve(). - - The bounding set limits the capabilities that may be inherited across - execve(), especially when a binary is executed that will execute as UID 0. - - (3) Secure management flags (securebits). - - These are only carried by tasks. These govern the way the above - credentials are manipulated and inherited over certain operations such as - execve(). They aren't used directly as objective or subjective - credentials. - - (4) Keys and keyrings. - - These are only carried by tasks. They carry and cache security tokens - that don't fit into the other standard UNIX credentials. They are for - making such things as network filesystem keys available to the file - accesses performed by processes, without the necessity of ordinary - programs having to know about security details involved. - - Keyrings are a special type of key. They carry sets of other keys and can - be searched for the desired key. Each process may subscribe to a number - of keyrings: - - Per-thread keying - Per-process keyring - Per-session keyring - - When a process accesses a key, if not already present, it will normally be - cached on one of these keyrings for future accesses to find. - - For more information on using keys, see Documentation/keys.txt. - - (5) LSM - - The Linux Security Module allows extra controls to be placed over the - operations that a task may do. Currently Linux supports two main - alternate LSM options: SELinux and Smack. - - Both work by labelling the objects in a system and then applying sets of - rules (policies) that say what operations a task with one label may do to - an object with another label. - - (6) AF_KEY - - This is a socket-based approach to credential management for networking - stacks [RFC 2367]. It isn't discussed by this document as it doesn't - interact directly with task and file credentials; rather it keeps system - level credentials. - - -When a file is opened, part of the opening task's subjective context is -recorded in the file struct created. This allows operations using that file -struct to use those credentials instead of the subjective context of the task -that issued the operation. An example of this would be a file opened on a -network filesystem where the credentials of the opened file should be presented -to the server, regardless of who is actually doing a read or a write upon it. - - -============= -FILE MARKINGS -============= - -Files on disk or obtained over the network may have annotations that form the -objective security context of that file. Depending on the type of filesystem, -this may include one or more of the following: - - (*) UNIX UID, GID, mode; - - (*) Windows user ID; - - (*) Access control list; - - (*) LSM security label; - - (*) UNIX exec privilege escalation bits (SUID/SGID); - - (*) File capabilities exec privilege escalation bits. - -These are compared to the task's subjective security context, and certain -operations allowed or disallowed as a result. In the case of execve(), the -privilege escalation bits come into play, and may allow the resulting process -extra privileges, based on the annotations on the executable file. - - -================ -TASK CREDENTIALS -================ - -In Linux, all of a task's credentials are held in (uid, gid) or through -(groups, keys, LSM security) a refcounted structure of type 'struct cred'. -Each task points to its credentials by a pointer called 'cred' in its -task_struct. - -Once a set of credentials has been prepared and committed, it may not be -changed, barring the following exceptions: - - (1) its reference count may be changed; - - (2) the reference count on the group_info struct it points to may be changed; - - (3) the reference count on the security data it points to may be changed; - - (4) the reference count on any keyrings it points to may be changed; - - (5) any keyrings it points to may be revoked, expired or have their security - attributes changed; and - - (6) the contents of any keyrings to which it points may be changed (the whole - point of keyrings being a shared set of credentials, modifiable by anyone - with appropriate access). - -To alter anything in the cred struct, the copy-and-replace principle must be -adhered to. First take a copy, then alter the copy and then use RCU to change -the task pointer to make it point to the new copy. There are wrappers to aid -with this (see below). - -A task may only alter its _own_ credentials; it is no longer permitted for a -task to alter another's credentials. This means the capset() system call is no -longer permitted to take any PID other than the one of the current process. -Also keyctl_instantiate() and keyctl_negate() functions no longer permit -attachment to process-specific keyrings in the requesting process as the -instantiating process may need to create them. - - -IMMUTABLE CREDENTIALS ---------------------- - -Once a set of credentials has been made public (by calling commit_creds() for -example), it must be considered immutable, barring two exceptions: - - (1) The reference count may be altered. - - (2) Whilst the keyring subscriptions of a set of credentials may not be - changed, the keyrings subscribed to may have their contents altered. - -To catch accidental credential alteration at compile time, struct task_struct -has _const_ pointers to its credential sets, as does struct file. Furthermore, -certain functions such as get_cred() and put_cred() operate on const pointers, -thus rendering casts unnecessary, but require to temporarily ditch the const -qualification to be able to alter the reference count. - - -ACCESSING TASK CREDENTIALS --------------------------- - -A task being able to alter only its own credentials permits the current process -to read or replace its own credentials without the need for any form of locking -- which simplifies things greatly. It can just call: - - const struct cred *current_cred() - -to get a pointer to its credentials structure, and it doesn't have to release -it afterwards. - -There are convenience wrappers for retrieving specific aspects of a task's -credentials (the value is simply returned in each case): - - uid_t current_uid(void) Current's real UID - gid_t current_gid(void) Current's real GID - uid_t current_euid(void) Current's effective UID - gid_t current_egid(void) Current's effective GID - uid_t current_fsuid(void) Current's file access UID - gid_t current_fsgid(void) Current's file access GID - kernel_cap_t current_cap(void) Current's effective capabilities - void *current_security(void) Current's LSM security pointer - struct user_struct *current_user(void) Current's user account - -There are also convenience wrappers for retrieving specific associated pairs of -a task's credentials: - - void current_uid_gid(uid_t *, gid_t *); - void current_euid_egid(uid_t *, gid_t *); - void current_fsuid_fsgid(uid_t *, gid_t *); - -which return these pairs of values through their arguments after retrieving -them from the current task's credentials. - - -In addition, there is a function for obtaining a reference on the current -process's current set of credentials: - - const struct cred *get_current_cred(void); - -and functions for getting references to one of the credentials that don't -actually live in struct cred: - - struct user_struct *get_current_user(void); - struct group_info *get_current_groups(void); - -which get references to the current process's user accounting structure and -supplementary groups list respectively. - -Once a reference has been obtained, it must be released with put_cred(), -free_uid() or put_group_info() as appropriate. - - -ACCESSING ANOTHER TASK'S CREDENTIALS ------------------------------------- - -Whilst a task may access its own credentials without the need for locking, the -same is not true of a task wanting to access another task's credentials. It -must use the RCU read lock and rcu_dereference(). - -The rcu_dereference() is wrapped by: - - const struct cred *__task_cred(struct task_struct *task); - -This should be used inside the RCU read lock, as in the following example: - - void foo(struct task_struct *t, struct foo_data *f) - { - const struct cred *tcred; - ... - rcu_read_lock(); - tcred = __task_cred(t); - f->uid = tcred->uid; - f->gid = tcred->gid; - f->groups = get_group_info(tcred->groups); - rcu_read_unlock(); - ... - } - -Should it be necessary to hold another task's credentials for a long period of -time, and possibly to sleep whilst doing so, then the caller should get a -reference on them using: - - const struct cred *get_task_cred(struct task_struct *task); - -This does all the RCU magic inside of it. The caller must call put_cred() on -the credentials so obtained when they're finished with. - - [*] Note: The result of __task_cred() should not be passed directly to - get_cred() as this may race with commit_cred(). - -There are a couple of convenience functions to access bits of another task's -credentials, hiding the RCU magic from the caller: - - uid_t task_uid(task) Task's real UID - uid_t task_euid(task) Task's effective UID - -If the caller is holding the RCU read lock at the time anyway, then: - - __task_cred(task)->uid - __task_cred(task)->euid - -should be used instead. Similarly, if multiple aspects of a task's credentials -need to be accessed, RCU read lock should be used, __task_cred() called, the -result stored in a temporary pointer and then the credential aspects called -from that before dropping the lock. This prevents the potentially expensive -RCU magic from being invoked multiple times. - -Should some other single aspect of another task's credentials need to be -accessed, then this can be used: - - task_cred_xxx(task, member) - -where 'member' is a non-pointer member of the cred struct. For instance: - - uid_t task_cred_xxx(task, suid); - -will retrieve 'struct cred::suid' from the task, doing the appropriate RCU -magic. This may not be used for pointer members as what they point to may -disappear the moment the RCU read lock is dropped. - - -ALTERING CREDENTIALS --------------------- - -As previously mentioned, a task may only alter its own credentials, and may not -alter those of another task. This means that it doesn't need to use any -locking to alter its own credentials. - -To alter the current process's credentials, a function should first prepare a -new set of credentials by calling: - - struct cred *prepare_creds(void); - -this locks current->cred_replace_mutex and then allocates and constructs a -duplicate of the current process's credentials, returning with the mutex still -held if successful. It returns NULL if not successful (out of memory). - -The mutex prevents ptrace() from altering the ptrace state of a process whilst -security checks on credentials construction and changing is taking place as -the ptrace state may alter the outcome, particularly in the case of execve(). - -The new credentials set should be altered appropriately, and any security -checks and hooks done. Both the current and the proposed sets of credentials -are available for this purpose as current_cred() will return the current set -still at this point. - - -When the credential set is ready, it should be committed to the current process -by calling: - - int commit_creds(struct cred *new); - -This will alter various aspects of the credentials and the process, giving the -LSM a chance to do likewise, then it will use rcu_assign_pointer() to actually -commit the new credentials to current->cred, it will release -current->cred_replace_mutex to allow ptrace() to take place, and it will notify -the scheduler and others of the changes. - -This function is guaranteed to return 0, so that it can be tail-called at the -end of such functions as sys_setresuid(). - -Note that this function consumes the caller's reference to the new credentials. -The caller should _not_ call put_cred() on the new credentials afterwards. - -Furthermore, once this function has been called on a new set of credentials, -those credentials may _not_ be changed further. - - -Should the security checks fail or some other error occur after prepare_creds() -has been called, then the following function should be invoked: - - void abort_creds(struct cred *new); - -This releases the lock on current->cred_replace_mutex that prepare_creds() got -and then releases the new credentials. - - -A typical credentials alteration function would look something like this: - - int alter_suid(uid_t suid) - { - struct cred *new; - int ret; - - new = prepare_creds(); - if (!new) - return -ENOMEM; - - new->suid = suid; - ret = security_alter_suid(new); - if (ret < 0) { - abort_creds(new); - return ret; - } - - return commit_creds(new); - } - - -MANAGING CREDENTIALS --------------------- - -There are some functions to help manage credentials: - - (*) void put_cred(const struct cred *cred); - - This releases a reference to the given set of credentials. If the - reference count reaches zero, the credentials will be scheduled for - destruction by the RCU system. - - (*) const struct cred *get_cred(const struct cred *cred); - - This gets a reference on a live set of credentials, returning a pointer to - that set of credentials. - - (*) struct cred *get_new_cred(struct cred *cred); - - This gets a reference on a set of credentials that is under construction - and is thus still mutable, returning a pointer to that set of credentials. - - -===================== -OPEN FILE CREDENTIALS -===================== - -When a new file is opened, a reference is obtained on the opening task's -credentials and this is attached to the file struct as 'f_cred' in place of -'f_uid' and 'f_gid'. Code that used to access file->f_uid and file->f_gid -should now access file->f_cred->fsuid and file->f_cred->fsgid. - -It is safe to access f_cred without the use of RCU or locking because the -pointer will not change over the lifetime of the file struct, and nor will the -contents of the cred struct pointed to, barring the exceptions listed above -(see the Task Credentials section). - - -======================================= -OVERRIDING THE VFS'S USE OF CREDENTIALS -======================================= - -Under some circumstances it is desirable to override the credentials used by -the VFS, and that can be done by calling into such as vfs_mkdir() with a -different set of credentials. This is done in the following places: - - (*) sys_faccessat(). - - (*) do_coredump(). - - (*) nfs4recover.c. diff --git a/Documentation/filesystems/nfs/idmapper.txt b/Documentation/filesystems/nfs/idmapper.txt index b9b4192ea8b5..9c8fd6148656 100644 --- a/Documentation/filesystems/nfs/idmapper.txt +++ b/Documentation/filesystems/nfs/idmapper.txt @@ -47,8 +47,8 @@ request-key will find the first matching line and corresponding program. In this case, /some/other/program will handle all uid lookups and /usr/sbin/nfs.idmap will handle gid, user, and group lookups. -See for more information about the -request-key function. +See for more information +about the request-key function. ========= diff --git a/Documentation/keys-request-key.txt b/Documentation/keys-request-key.txt deleted file mode 100644 index 69686ad12c66..000000000000 --- a/Documentation/keys-request-key.txt +++ /dev/null @@ -1,202 +0,0 @@ - =================== - KEY REQUEST SERVICE - =================== - -The key request service is part of the key retention service (refer to -Documentation/keys.txt). This document explains more fully how the requesting -algorithm works. - -The process starts by either the kernel requesting a service by calling -request_key*(): - - struct key *request_key(const struct key_type *type, - const char *description, - const char *callout_info); - -or: - - struct key *request_key_with_auxdata(const struct key_type *type, - const char *description, - const char *callout_info, - size_t callout_len, - void *aux); - -or: - - struct key *request_key_async(const struct key_type *type, - const char *description, - const char *callout_info, - size_t callout_len); - -or: - - struct key *request_key_async_with_auxdata(const struct key_type *type, - const char *description, - const char *callout_info, - size_t callout_len, - void *aux); - -Or by userspace invoking the request_key system call: - - key_serial_t request_key(const char *type, - const char *description, - const char *callout_info, - key_serial_t dest_keyring); - -The main difference between the access points is that the in-kernel interface -does not need to link the key to a keyring to prevent it from being immediately -destroyed. The kernel interface returns a pointer directly to the key, and -it's up to the caller to destroy the key. - -The request_key*_with_auxdata() calls are like the in-kernel request_key*() -calls, except that they permit auxiliary data to be passed to the upcaller (the -default is NULL). This is only useful for those key types that define their -own upcall mechanism rather than using /sbin/request-key. - -The two async in-kernel calls may return keys that are still in the process of -being constructed. The two non-async ones will wait for construction to -complete first. - -The userspace interface links the key to a keyring associated with the process -to prevent the key from going away, and returns the serial number of the key to -the caller. - - -The following example assumes that the key types involved don't define their -own upcall mechanisms. If they do, then those should be substituted for the -forking and execution of /sbin/request-key. - - -=========== -THE PROCESS -=========== - -A request proceeds in the following manner: - - (1) Process A calls request_key() [the userspace syscall calls the kernel - interface]. - - (2) request_key() searches the process's subscribed keyrings to see if there's - a suitable key there. If there is, it returns the key. If there isn't, - and callout_info is not set, an error is returned. Otherwise the process - proceeds to the next step. - - (3) request_key() sees that A doesn't have the desired key yet, so it creates - two things: - - (a) An uninstantiated key U of requested type and description. - - (b) An authorisation key V that refers to key U and notes that process A - is the context in which key U should be instantiated and secured, and - from which associated key requests may be satisfied. - - (4) request_key() then forks and executes /sbin/request-key with a new session - keyring that contains a link to auth key V. - - (5) /sbin/request-key assumes the authority associated with key U. - - (6) /sbin/request-key execs an appropriate program to perform the actual - instantiation. - - (7) The program may want to access another key from A's context (say a - Kerberos TGT key). It just requests the appropriate key, and the keyring - search notes that the session keyring has auth key V in its bottom level. - - This will permit it to then search the keyrings of process A with the - UID, GID, groups and security info of process A as if it was process A, - and come up with key W. - - (8) The program then does what it must to get the data with which to - instantiate key U, using key W as a reference (perhaps it contacts a - Kerberos server using the TGT) and then instantiates key U. - - (9) Upon instantiating key U, auth key V is automatically revoked so that it - may not be used again. - -(10) The program then exits 0 and request_key() deletes key V and returns key - U to the caller. - -This also extends further. If key W (step 7 above) didn't exist, key W would -be created uninstantiated, another auth key (X) would be created (as per step -3) and another copy of /sbin/request-key spawned (as per step 4); but the -context specified by auth key X will still be process A, as it was in auth key -V. - -This is because process A's keyrings can't simply be attached to -/sbin/request-key at the appropriate places because (a) execve will discard two -of them, and (b) it requires the same UID/GID/Groups all the way through. - - -==================================== -NEGATIVE INSTANTIATION AND REJECTION -==================================== - -Rather than instantiating a key, it is possible for the possessor of an -authorisation key to negatively instantiate a key that's under construction. -This is a short duration placeholder that causes any attempt at re-requesting -the key whilst it exists to fail with error ENOKEY if negated or the specified -error if rejected. - -This is provided to prevent excessive repeated spawning of /sbin/request-key -processes for a key that will never be obtainable. - -Should the /sbin/request-key process exit anything other than 0 or die on a -signal, the key under construction will be automatically negatively -instantiated for a short amount of time. - - -==================== -THE SEARCH ALGORITHM -==================== - -A search of any particular keyring proceeds in the following fashion: - - (1) When the key management code searches for a key (keyring_search_aux) it - firstly calls key_permission(SEARCH) on the keyring it's starting with, - if this denies permission, it doesn't search further. - - (2) It considers all the non-keyring keys within that keyring and, if any key - matches the criteria specified, calls key_permission(SEARCH) on it to see - if the key is allowed to be found. If it is, that key is returned; if - not, the search continues, and the error code is retained if of higher - priority than the one currently set. - - (3) It then considers all the keyring-type keys in the keyring it's currently - searching. It calls key_permission(SEARCH) on each keyring, and if this - grants permission, it recurses, executing steps (2) and (3) on that - keyring. - -The process stops immediately a valid key is found with permission granted to -use it. Any error from a previous match attempt is discarded and the key is -returned. - -When search_process_keyrings() is invoked, it performs the following searches -until one succeeds: - - (1) If extant, the process's thread keyring is searched. - - (2) If extant, the process's process keyring is searched. - - (3) The process's session keyring is searched. - - (4) If the process has assumed the authority associated with a request_key() - authorisation key then: - - (a) If extant, the calling process's thread keyring is searched. - - (b) If extant, the calling process's process keyring is searched. - - (c) The calling process's session keyring is searched. - -The moment one succeeds, all pending errors are discarded and the found key is -returned. - -Only if all these fail does the whole thing fail with the highest priority -error. Note that several errors may have come from LSM. - -The error priority is: - - EKEYREVOKED > EKEYEXPIRED > ENOKEY - -EACCES/EPERM are only returned on a direct search of a specific keyring where -the basal keyring does not grant Search permission. diff --git a/Documentation/keys-trusted-encrypted.txt b/Documentation/keys-trusted-encrypted.txt deleted file mode 100644 index 8fb79bc1ac4b..000000000000 --- a/Documentation/keys-trusted-encrypted.txt +++ /dev/null @@ -1,145 +0,0 @@ - Trusted and Encrypted Keys - -Trusted and Encrypted Keys are two new key types added to the existing kernel -key ring service. Both of these new types are variable length symmetic keys, -and in both cases all keys are created in the kernel, and user space sees, -stores, and loads only encrypted blobs. Trusted Keys require the availability -of a Trusted Platform Module (TPM) chip for greater security, while Encrypted -Keys can be used on any system. All user level blobs, are displayed and loaded -in hex ascii for convenience, and are integrity verified. - -Trusted Keys use a TPM both to generate and to seal the keys. Keys are sealed -under a 2048 bit RSA key in the TPM, and optionally sealed to specified PCR -(integrity measurement) values, and only unsealed by the TPM, if PCRs and blob -integrity verifications match. A loaded Trusted Key can be updated with new -(future) PCR values, so keys are easily migrated to new pcr values, such as -when the kernel and initramfs are updated. The same key can have many saved -blobs under different PCR values, so multiple boots are easily supported. - -By default, trusted keys are sealed under the SRK, which has the default -authorization value (20 zeros). This can be set at takeownership time with the -trouser's utility: "tpm_takeownership -u -z". - -Usage: - keyctl add trusted name "new keylen [options]" ring - keyctl add trusted name "load hex_blob [pcrlock=pcrnum]" ring - keyctl update key "update [options]" - keyctl print keyid - - options: - keyhandle= ascii hex value of sealing key default 0x40000000 (SRK) - keyauth= ascii hex auth for sealing key default 0x00...i - (40 ascii zeros) - blobauth= ascii hex auth for sealed data default 0x00... - (40 ascii zeros) - blobauth= ascii hex auth for sealed data default 0x00... - (40 ascii zeros) - pcrinfo= ascii hex of PCR_INFO or PCR_INFO_LONG (no default) - pcrlock= pcr number to be extended to "lock" blob - migratable= 0|1 indicating permission to reseal to new PCR values, - default 1 (resealing allowed) - -"keyctl print" returns an ascii hex copy of the sealed key, which is in standard -TPM_STORED_DATA format. The key length for new keys are always in bytes. -Trusted Keys can be 32 - 128 bytes (256 - 1024 bits), the upper limit is to fit -within the 2048 bit SRK (RSA) keylength, with all necessary structure/padding. - -Encrypted keys do not depend on a TPM, and are faster, as they use AES for -encryption/decryption. New keys are created from kernel generated random -numbers, and are encrypted/decrypted using a specified 'master' key. The -'master' key can either be a trusted-key or user-key type. The main -disadvantage of encrypted keys is that if they are not rooted in a trusted key, -they are only as secure as the user key encrypting them. The master user key -should therefore be loaded in as secure a way as possible, preferably early in -boot. - -Usage: - keyctl add encrypted name "new key-type:master-key-name keylen" ring - keyctl add encrypted name "load hex_blob" ring - keyctl update keyid "update key-type:master-key-name" - -where 'key-type' is either 'trusted' or 'user'. - -Examples of trusted and encrypted key usage: - -Create and save a trusted key named "kmk" of length 32 bytes: - - $ keyctl add trusted kmk "new 32" @u - 440502848 - - $ keyctl show - Session Keyring - -3 --alswrv 500 500 keyring: _ses - 97833714 --alswrv 500 -1 \_ keyring: _uid.500 - 440502848 --alswrv 500 500 \_ trusted: kmk - - $ keyctl print 440502848 - 0101000000000000000001005d01b7e3f4a6be5709930f3b70a743cbb42e0cc95e18e915 - 3f60da455bbf1144ad12e4f92b452f966929f6105fd29ca28e4d4d5a031d068478bacb0b - 27351119f822911b0a11ba3d3498ba6a32e50dac7f32894dd890eb9ad578e4e292c83722 - a52e56a097e6a68b3f56f7a52ece0cdccba1eb62cad7d817f6dc58898b3ac15f36026fec - d568bd4a706cb60bb37be6d8f1240661199d640b66fb0fe3b079f97f450b9ef9c22c6d5d - dd379f0facd1cd020281dfa3c70ba21a3fa6fc2471dc6d13ecf8298b946f65345faa5ef0 - f1f8fff03ad0acb083725535636addb08d73dedb9832da198081e5deae84bfaf0409c22b - e4a8aea2b607ec96931e6f4d4fe563ba - - $ keyctl pipe 440502848 > kmk.blob - -Load a trusted key from the saved blob: - - $ keyctl add trusted kmk "load `cat kmk.blob`" @u - 268728824 - - $ keyctl print 268728824 - 0101000000000000000001005d01b7e3f4a6be5709930f3b70a743cbb42e0cc95e18e915 - 3f60da455bbf1144ad12e4f92b452f966929f6105fd29ca28e4d4d5a031d068478bacb0b - 27351119f822911b0a11ba3d3498ba6a32e50dac7f32894dd890eb9ad578e4e292c83722 - a52e56a097e6a68b3f56f7a52ece0cdccba1eb62cad7d817f6dc58898b3ac15f36026fec - d568bd4a706cb60bb37be6d8f1240661199d640b66fb0fe3b079f97f450b9ef9c22c6d5d - dd379f0facd1cd020281dfa3c70ba21a3fa6fc2471dc6d13ecf8298b946f65345faa5ef0 - f1f8fff03ad0acb083725535636addb08d73dedb9832da198081e5deae84bfaf0409c22b - e4a8aea2b607ec96931e6f4d4fe563ba - -Reseal a trusted key under new pcr values: - - $ keyctl update 268728824 "update pcrinfo=`cat pcr.blob`" - $ keyctl print 268728824 - 010100000000002c0002800093c35a09b70fff26e7a98ae786c641e678ec6ffb6b46d805 - 77c8a6377aed9d3219c6dfec4b23ffe3000001005d37d472ac8a44023fbb3d18583a4f73 - d3a076c0858f6f1dcaa39ea0f119911ff03f5406df4f7f27f41da8d7194f45c9f4e00f2e - df449f266253aa3f52e55c53de147773e00f0f9aca86c64d94c95382265968c354c5eab4 - 9638c5ae99c89de1e0997242edfb0b501744e11ff9762dfd951cffd93227cc513384e7e6 - e782c29435c7ec2edafaa2f4c1fe6e7a781b59549ff5296371b42133777dcc5b8b971610 - 94bc67ede19e43ddb9dc2baacad374a36feaf0314d700af0a65c164b7082401740e489c9 - 7ef6a24defe4846104209bf0c3eced7fa1a672ed5b125fc9d8cd88b476a658a4434644ef - df8ae9a178e9f83ba9f08d10fa47e4226b98b0702f06b3b8 - -Create and save an encrypted key "evm" using the above trusted key "kmk": - - $ keyctl add encrypted evm "new trusted:kmk 32" @u - 159771175 - - $ keyctl print 159771175 - trusted:kmk 32 2375725ad57798846a9bbd240de8906f006e66c03af53b1b382dbbc55 - be2a44616e4959430436dc4f2a7a9659aa60bb4652aeb2120f149ed197c564e024717c64 - 5972dcb82ab2dde83376d82b2e3c09ffc - - $ keyctl pipe 159771175 > evm.blob - -Load an encrypted key "evm" from saved blob: - - $ keyctl add encrypted evm "load `cat evm.blob`" @u - 831684262 - - $ keyctl print 831684262 - trusted:kmk 32 2375725ad57798846a9bbd240de8906f006e66c03af53b1b382dbbc55 - be2a44616e4959430436dc4f2a7a9659aa60bb4652aeb2120f149ed197c564e024717c64 - 5972dcb82ab2dde83376d82b2e3c09ffc - - -The initial consumer of trusted keys is EVM, which at boot time needs a high -quality symmetric key for HMAC protection of file metadata. The use of a -trusted key provides strong guarantees that the EVM key has not been -compromised by a user level problem, and when sealed to specific boot PCR -values, protects against boot and offline attacks. Other uses for trusted and -encrypted keys, such as for disk and file encryption are anticipated. diff --git a/Documentation/keys.txt b/Documentation/keys.txt deleted file mode 100644 index 6523a9e6f293..000000000000 --- a/Documentation/keys.txt +++ /dev/null @@ -1,1290 +0,0 @@ - ============================ - KERNEL KEY RETENTION SERVICE - ============================ - -This service allows cryptographic keys, authentication tokens, cross-domain -user mappings, and similar to be cached in the kernel for the use of -filesystems and other kernel services. - -Keyrings are permitted; these are a special type of key that can hold links to -other keys. Processes each have three standard keyring subscriptions that a -kernel service can search for relevant keys. - -The key service can be configured on by enabling: - - "Security options"/"Enable access key retention support" (CONFIG_KEYS) - -This document has the following sections: - - - Key overview - - Key service overview - - Key access permissions - - SELinux support - - New procfs files - - Userspace system call interface - - Kernel services - - Notes on accessing payload contents - - Defining a key type - - Request-key callback service - - Garbage collection - - -============ -KEY OVERVIEW -============ - -In this context, keys represent units of cryptographic data, authentication -tokens, keyrings, etc.. These are represented in the kernel by struct key. - -Each key has a number of attributes: - - - A serial number. - - A type. - - A description (for matching a key in a search). - - Access control information. - - An expiry time. - - A payload. - - State. - - - (*) Each key is issued a serial number of type key_serial_t that is unique for - the lifetime of that key. All serial numbers are positive non-zero 32-bit - integers. - - Userspace programs can use a key's serial numbers as a way to gain access - to it, subject to permission checking. - - (*) Each key is of a defined "type". Types must be registered inside the - kernel by a kernel service (such as a filesystem) before keys of that type - can be added or used. Userspace programs cannot define new types directly. - - Key types are represented in the kernel by struct key_type. This defines a - number of operations that can be performed on a key of that type. - - Should a type be removed from the system, all the keys of that type will - be invalidated. - - (*) Each key has a description. This should be a printable string. The key - type provides an operation to perform a match between the description on a - key and a criterion string. - - (*) Each key has an owner user ID, a group ID and a permissions mask. These - are used to control what a process may do to a key from userspace, and - whether a kernel service will be able to find the key. - - (*) Each key can be set to expire at a specific time by the key type's - instantiation function. Keys can also be immortal. - - (*) Each key can have a payload. This is a quantity of data that represent the - actual "key". In the case of a keyring, this is a list of keys to which - the keyring links; in the case of a user-defined key, it's an arbitrary - blob of data. - - Having a payload is not required; and the payload can, in fact, just be a - value stored in the struct key itself. - - When a key is instantiated, the key type's instantiation function is - called with a blob of data, and that then creates the key's payload in - some way. - - Similarly, when userspace wants to read back the contents of the key, if - permitted, another key type operation will be called to convert the key's - attached payload back into a blob of data. - - (*) Each key can be in one of a number of basic states: - - (*) Uninstantiated. The key exists, but does not have any data attached. - Keys being requested from userspace will be in this state. - - (*) Instantiated. This is the normal state. The key is fully formed, and - has data attached. - - (*) Negative. This is a relatively short-lived state. The key acts as a - note saying that a previous call out to userspace failed, and acts as - a throttle on key lookups. A negative key can be updated to a normal - state. - - (*) Expired. Keys can have lifetimes set. If their lifetime is exceeded, - they traverse to this state. An expired key can be updated back to a - normal state. - - (*) Revoked. A key is put in this state by userspace action. It can't be - found or operated upon (apart from by unlinking it). - - (*) Dead. The key's type was unregistered, and so the key is now useless. - -Keys in the last three states are subject to garbage collection. See the -section on "Garbage collection". - - -==================== -KEY SERVICE OVERVIEW -==================== - -The key service provides a number of features besides keys: - - (*) The key service defines two special key types: - - (+) "keyring" - - Keyrings are special keys that contain a list of other keys. Keyring - lists can be modified using various system calls. Keyrings should not - be given a payload when created. - - (+) "user" - - A key of this type has a description and a payload that are arbitrary - blobs of data. These can be created, updated and read by userspace, - and aren't intended for use by kernel services. - - (*) Each process subscribes to three keyrings: a thread-specific keyring, a - process-specific keyring, and a session-specific keyring. - - The thread-specific keyring is discarded from the child when any sort of - clone, fork, vfork or execve occurs. A new keyring is created only when - required. - - The process-specific keyring is replaced with an empty one in the child on - clone, fork, vfork unless CLONE_THREAD is supplied, in which case it is - shared. execve also discards the process's process keyring and creates a - new one. - - The session-specific keyring is persistent across clone, fork, vfork and - execve, even when the latter executes a set-UID or set-GID binary. A - process can, however, replace its current session keyring with a new one - by using PR_JOIN_SESSION_KEYRING. It is permitted to request an anonymous - new one, or to attempt to create or join one of a specific name. - - The ownership of the thread keyring changes when the real UID and GID of - the thread changes. - - (*) Each user ID resident in the system holds two special keyrings: a user - specific keyring and a default user session keyring. The default session - keyring is initialised with a link to the user-specific keyring. - - When a process changes its real UID, if it used to have no session key, it - will be subscribed to the default session key for the new UID. - - If a process attempts to access its session key when it doesn't have one, - it will be subscribed to the default for its current UID. - - (*) Each user has two quotas against which the keys they own are tracked. One - limits the total number of keys and keyrings, the other limits the total - amount of description and payload space that can be consumed. - - The user can view information on this and other statistics through procfs - files. The root user may also alter the quota limits through sysctl files - (see the section "New procfs files"). - - Process-specific and thread-specific keyrings are not counted towards a - user's quota. - - If a system call that modifies a key or keyring in some way would put the - user over quota, the operation is refused and error EDQUOT is returned. - - (*) There's a system call interface by which userspace programs can create and - manipulate keys and keyrings. - - (*) There's a kernel interface by which services can register types and search - for keys. - - (*) There's a way for the a search done from the kernel to call back to - userspace to request a key that can't be found in a process's keyrings. - - (*) An optional filesystem is available through which the key database can be - viewed and manipulated. - - -====================== -KEY ACCESS PERMISSIONS -====================== - -Keys have an owner user ID, a group access ID, and a permissions mask. The mask -has up to eight bits each for possessor, user, group and other access. Only -six of each set of eight bits are defined. These permissions granted are: - - (*) View - - This permits a key or keyring's attributes to be viewed - including key - type and description. - - (*) Read - - This permits a key's payload to be viewed or a keyring's list of linked - keys. - - (*) Write - - This permits a key's payload to be instantiated or updated, or it allows a - link to be added to or removed from a keyring. - - (*) Search - - This permits keyrings to be searched and keys to be found. Searches can - only recurse into nested keyrings that have search permission set. - - (*) Link - - This permits a key or keyring to be linked to. To create a link from a - keyring to a key, a process must have Write permission on the keyring and - Link permission on the key. - - (*) Set Attribute - - This permits a key's UID, GID and permissions mask to be changed. - -For changing the ownership, group ID or permissions mask, being the owner of -the key or having the sysadmin capability is sufficient. - - -=============== -SELINUX SUPPORT -=============== - -The security class "key" has been added to SELinux so that mandatory access -controls can be applied to keys created within various contexts. This support -is preliminary, and is likely to change quite significantly in the near future. -Currently, all of the basic permissions explained above are provided in SELinux -as well; SELinux is simply invoked after all basic permission checks have been -performed. - -The value of the file /proc/self/attr/keycreate influences the labeling of -newly-created keys. If the contents of that file correspond to an SELinux -security context, then the key will be assigned that context. Otherwise, the -key will be assigned the current context of the task that invoked the key -creation request. Tasks must be granted explicit permission to assign a -particular context to newly-created keys, using the "create" permission in the -key security class. - -The default keyrings associated with users will be labeled with the default -context of the user if and only if the login programs have been instrumented to -properly initialize keycreate during the login process. Otherwise, they will -be labeled with the context of the login program itself. - -Note, however, that the default keyrings associated with the root user are -labeled with the default kernel context, since they are created early in the -boot process, before root has a chance to log in. - -The keyrings associated with new threads are each labeled with the context of -their associated thread, and both session and process keyrings are handled -similarly. - - -================ -NEW PROCFS FILES -================ - -Two files have been added to procfs by which an administrator can find out -about the status of the key service: - - (*) /proc/keys - - This lists the keys that are currently viewable by the task reading the - file, giving information about their type, description and permissions. - It is not possible to view the payload of the key this way, though some - information about it may be given. - - The only keys included in the list are those that grant View permission to - the reading process whether or not it possesses them. Note that LSM - security checks are still performed, and may further filter out keys that - the current process is not authorised to view. - - The contents of the file look like this: - - SERIAL FLAGS USAGE EXPY PERM UID GID TYPE DESCRIPTION: SUMMARY - 00000001 I----- 39 perm 1f3f0000 0 0 keyring _uid_ses.0: 1/4 - 00000002 I----- 2 perm 1f3f0000 0 0 keyring _uid.0: empty - 00000007 I----- 1 perm 1f3f0000 0 0 keyring _pid.1: empty - 0000018d I----- 1 perm 1f3f0000 0 0 keyring _pid.412: empty - 000004d2 I--Q-- 1 perm 1f3f0000 32 -1 keyring _uid.32: 1/4 - 000004d3 I--Q-- 3 perm 1f3f0000 32 -1 keyring _uid_ses.32: empty - 00000892 I--QU- 1 perm 1f000000 0 0 user metal:copper: 0 - 00000893 I--Q-N 1 35s 1f3f0000 0 0 user metal:silver: 0 - 00000894 I--Q-- 1 10h 003f0000 0 0 user metal:gold: 0 - - The flags are: - - I Instantiated - R Revoked - D Dead - Q Contributes to user's quota - U Under construction by callback to userspace - N Negative key - - This file must be enabled at kernel configuration time as it allows anyone - to list the keys database. - - (*) /proc/key-users - - This file lists the tracking data for each user that has at least one key - on the system. Such data includes quota information and statistics: - - [root@andromeda root]# cat /proc/key-users - 0: 46 45/45 1/100 13/10000 - 29: 2 2/2 2/100 40/10000 - 32: 2 2/2 2/100 40/10000 - 38: 2 2/2 2/100 40/10000 - - The format of each line is - : User ID to which this applies - Structure refcount - / Total number of keys and number instantiated - / Key count quota - / Key size quota - - -Four new sysctl files have been added also for the purpose of controlling the -quota limits on keys: - - (*) /proc/sys/kernel/keys/root_maxkeys - /proc/sys/kernel/keys/root_maxbytes - - These files hold the maximum number of keys that root may have and the - maximum total number of bytes of data that root may have stored in those - keys. - - (*) /proc/sys/kernel/keys/maxkeys - /proc/sys/kernel/keys/maxbytes - - These files hold the maximum number of keys that each non-root user may - have and the maximum total number of bytes of data that each of those - users may have stored in their keys. - -Root may alter these by writing each new limit as a decimal number string to -the appropriate file. - - -=============================== -USERSPACE SYSTEM CALL INTERFACE -=============================== - -Userspace can manipulate keys directly through three new syscalls: add_key, -request_key and keyctl. The latter provides a number of functions for -manipulating keys. - -When referring to a key directly, userspace programs should use the key's -serial number (a positive 32-bit integer). However, there are some special -values available for referring to special keys and keyrings that relate to the -process making the call: - - CONSTANT VALUE KEY REFERENCED - ============================== ====== =========================== - KEY_SPEC_THREAD_KEYRING -1 thread-specific keyring - KEY_SPEC_PROCESS_KEYRING -2 process-specific keyring - KEY_SPEC_SESSION_KEYRING -3 session-specific keyring - KEY_SPEC_USER_KEYRING -4 UID-specific keyring - KEY_SPEC_USER_SESSION_KEYRING -5 UID-session keyring - KEY_SPEC_GROUP_KEYRING -6 GID-specific keyring - KEY_SPEC_REQKEY_AUTH_KEY -7 assumed request_key() - authorisation key - - -The main syscalls are: - - (*) Create a new key of given type, description and payload and add it to the - nominated keyring: - - key_serial_t add_key(const char *type, const char *desc, - const void *payload, size_t plen, - key_serial_t keyring); - - If a key of the same type and description as that proposed already exists - in the keyring, this will try to update it with the given payload, or it - will return error EEXIST if that function is not supported by the key - type. The process must also have permission to write to the key to be able - to update it. The new key will have all user permissions granted and no - group or third party permissions. - - Otherwise, this will attempt to create a new key of the specified type and - description, and to instantiate it with the supplied payload and attach it - to the keyring. In this case, an error will be generated if the process - does not have permission to write to the keyring. - - The payload is optional, and the pointer can be NULL if not required by - the type. The payload is plen in size, and plen can be zero for an empty - payload. - - A new keyring can be generated by setting type "keyring", the keyring name - as the description (or NULL) and setting the payload to NULL. - - User defined keys can be created by specifying type "user". It is - recommended that a user defined key's description by prefixed with a type - ID and a colon, such as "krb5tgt:" for a Kerberos 5 ticket granting - ticket. - - Any other type must have been registered with the kernel in advance by a - kernel service such as a filesystem. - - The ID of the new or updated key is returned if successful. - - - (*) Search the process's keyrings for a key, potentially calling out to - userspace to create it. - - key_serial_t request_key(const char *type, const char *description, - const char *callout_info, - key_serial_t dest_keyring); - - This function searches all the process's keyrings in the order thread, - process, session for a matching key. This works very much like - KEYCTL_SEARCH, including the optional attachment of the discovered key to - a keyring. - - If a key cannot be found, and if callout_info is not NULL, then - /sbin/request-key will be invoked in an attempt to obtain a key. The - callout_info string will be passed as an argument to the program. - - See also Documentation/keys-request-key.txt. - - -The keyctl syscall functions are: - - (*) Map a special key ID to a real key ID for this process: - - key_serial_t keyctl(KEYCTL_GET_KEYRING_ID, key_serial_t id, - int create); - - The special key specified by "id" is looked up (with the key being created - if necessary) and the ID of the key or keyring thus found is returned if - it exists. - - If the key does not yet exist, the key will be created if "create" is - non-zero; and the error ENOKEY will be returned if "create" is zero. - - - (*) Replace the session keyring this process subscribes to with a new one: - - key_serial_t keyctl(KEYCTL_JOIN_SESSION_KEYRING, const char *name); - - If name is NULL, an anonymous keyring is created attached to the process - as its session keyring, displacing the old session keyring. - - If name is not NULL, if a keyring of that name exists, the process - attempts to attach it as the session keyring, returning an error if that - is not permitted; otherwise a new keyring of that name is created and - attached as the session keyring. - - To attach to a named keyring, the keyring must have search permission for - the process's ownership. - - The ID of the new session keyring is returned if successful. - - - (*) Update the specified key: - - long keyctl(KEYCTL_UPDATE, key_serial_t key, const void *payload, - size_t plen); - - This will try to update the specified key with the given payload, or it - will return error EOPNOTSUPP if that function is not supported by the key - type. The process must also have permission to write to the key to be able - to update it. - - The payload is of length plen, and may be absent or empty as for - add_key(). - - - (*) Revoke a key: - - long keyctl(KEYCTL_REVOKE, key_serial_t key); - - This makes a key unavailable for further operations. Further attempts to - use the key will be met with error EKEYREVOKED, and the key will no longer - be findable. - - - (*) Change the ownership of a key: - - long keyctl(KEYCTL_CHOWN, key_serial_t key, uid_t uid, gid_t gid); - - This function permits a key's owner and group ID to be changed. Either one - of uid or gid can be set to -1 to suppress that change. - - Only the superuser can change a key's owner to something other than the - key's current owner. Similarly, only the superuser can change a key's - group ID to something other than the calling process's group ID or one of - its group list members. - - - (*) Change the permissions mask on a key: - - long keyctl(KEYCTL_SETPERM, key_serial_t key, key_perm_t perm); - - This function permits the owner of a key or the superuser to change the - permissions mask on a key. - - Only bits the available bits are permitted; if any other bits are set, - error EINVAL will be returned. - - - (*) Describe a key: - - long keyctl(KEYCTL_DESCRIBE, key_serial_t key, char *buffer, - size_t buflen); - - This function returns a summary of the key's attributes (but not its - payload data) as a string in the buffer provided. - - Unless there's an error, it always returns the amount of data it could - produce, even if that's too big for the buffer, but it won't copy more - than requested to userspace. If the buffer pointer is NULL then no copy - will take place. - - A process must have view permission on the key for this function to be - successful. - - If successful, a string is placed in the buffer in the following format: - - ;;;; - - Where type and description are strings, uid and gid are decimal, and perm - is hexadecimal. A NUL character is included at the end of the string if - the buffer is sufficiently big. - - This can be parsed with - - sscanf(buffer, "%[^;];%d;%d;%o;%s", type, &uid, &gid, &mode, desc); - - - (*) Clear out a keyring: - - long keyctl(KEYCTL_CLEAR, key_serial_t keyring); - - This function clears the list of keys attached to a keyring. The calling - process must have write permission on the keyring, and it must be a - keyring (or else error ENOTDIR will result). - - - (*) Link a key into a keyring: - - long keyctl(KEYCTL_LINK, key_serial_t keyring, key_serial_t key); - - This function creates a link from the keyring to the key. The process must - have write permission on the keyring and must have link permission on the - key. - - Should the keyring not be a keyring, error ENOTDIR will result; and if the - keyring is full, error ENFILE will result. - - The link procedure checks the nesting of the keyrings, returning ELOOP if - it appears too deep or EDEADLK if the link would introduce a cycle. - - Any links within the keyring to keys that match the new key in terms of - type and description will be discarded from the keyring as the new one is - added. - - - (*) Unlink a key or keyring from another keyring: - - long keyctl(KEYCTL_UNLINK, key_serial_t keyring, key_serial_t key); - - This function looks through the keyring for the first link to the - specified key, and removes it if found. Subsequent links to that key are - ignored. The process must have write permission on the keyring. - - If the keyring is not a keyring, error ENOTDIR will result; and if the key - is not present, error ENOENT will be the result. - - - (*) Search a keyring tree for a key: - - key_serial_t keyctl(KEYCTL_SEARCH, key_serial_t keyring, - const char *type, const char *description, - key_serial_t dest_keyring); - - This searches the keyring tree headed by the specified keyring until a key - is found that matches the type and description criteria. Each keyring is - checked for keys before recursion into its children occurs. - - The process must have search permission on the top level keyring, or else - error EACCES will result. Only keyrings that the process has search - permission on will be recursed into, and only keys and keyrings for which - a process has search permission can be matched. If the specified keyring - is not a keyring, ENOTDIR will result. - - If the search succeeds, the function will attempt to link the found key - into the destination keyring if one is supplied (non-zero ID). All the - constraints applicable to KEYCTL_LINK apply in this case too. - - Error ENOKEY, EKEYREVOKED or EKEYEXPIRED will be returned if the search - fails. On success, the resulting key ID will be returned. - - - (*) Read the payload data from a key: - - long keyctl(KEYCTL_READ, key_serial_t keyring, char *buffer, - size_t buflen); - - This function attempts to read the payload data from the specified key - into the buffer. The process must have read permission on the key to - succeed. - - The returned data will be processed for presentation by the key type. For - instance, a keyring will return an array of key_serial_t entries - representing the IDs of all the keys to which it is subscribed. The user - defined key type will return its data as is. If a key type does not - implement this function, error EOPNOTSUPP will result. - - As much of the data as can be fitted into the buffer will be copied to - userspace if the buffer pointer is not NULL. - - On a successful return, the function will always return the amount of data - available rather than the amount copied. - - - (*) Instantiate a partially constructed key. - - long keyctl(KEYCTL_INSTANTIATE, key_serial_t key, - const void *payload, size_t plen, - key_serial_t keyring); - long keyctl(KEYCTL_INSTANTIATE_IOV, key_serial_t key, - const struct iovec *payload_iov, unsigned ioc, - key_serial_t keyring); - - If the kernel calls back to userspace to complete the instantiation of a - key, userspace should use this call to supply data for the key before the - invoked process returns, or else the key will be marked negative - automatically. - - The process must have write access on the key to be able to instantiate - it, and the key must be uninstantiated. - - If a keyring is specified (non-zero), the key will also be linked into - that keyring, however all the constraints applying in KEYCTL_LINK apply in - this case too. - - The payload and plen arguments describe the payload data as for add_key(). - - The payload_iov and ioc arguments describe the payload data in an iovec - array instead of a single buffer. - - - (*) Negatively instantiate a partially constructed key. - - long keyctl(KEYCTL_NEGATE, key_serial_t key, - unsigned timeout, key_serial_t keyring); - long keyctl(KEYCTL_REJECT, key_serial_t key, - unsigned timeout, unsigned error, key_serial_t keyring); - - If the kernel calls back to userspace to complete the instantiation of a - key, userspace should use this call mark the key as negative before the - invoked process returns if it is unable to fulfil the request. - - The process must have write access on the key to be able to instantiate - it, and the key must be uninstantiated. - - If a keyring is specified (non-zero), the key will also be linked into - that keyring, however all the constraints applying in KEYCTL_LINK apply in - this case too. - - If the key is rejected, future searches for it will return the specified - error code until the rejected key expires. Negating the key is the same - as rejecting the key with ENOKEY as the error code. - - - (*) Set the default request-key destination keyring. - - long keyctl(KEYCTL_SET_REQKEY_KEYRING, int reqkey_defl); - - This sets the default keyring to which implicitly requested keys will be - attached for this thread. reqkey_defl should be one of these constants: - - CONSTANT VALUE NEW DEFAULT KEYRING - ====================================== ====== ======================= - KEY_REQKEY_DEFL_NO_CHANGE -1 No change - KEY_REQKEY_DEFL_DEFAULT 0 Default[1] - KEY_REQKEY_DEFL_THREAD_KEYRING 1 Thread keyring - KEY_REQKEY_DEFL_PROCESS_KEYRING 2 Process keyring - KEY_REQKEY_DEFL_SESSION_KEYRING 3 Session keyring - KEY_REQKEY_DEFL_USER_KEYRING 4 User keyring - KEY_REQKEY_DEFL_USER_SESSION_KEYRING 5 User session keyring - KEY_REQKEY_DEFL_GROUP_KEYRING 6 Group keyring - - The old default will be returned if successful and error EINVAL will be - returned if reqkey_defl is not one of the above values. - - The default keyring can be overridden by the keyring indicated to the - request_key() system call. - - Note that this setting is inherited across fork/exec. - - [1] The default is: the thread keyring if there is one, otherwise - the process keyring if there is one, otherwise the session keyring if - there is one, otherwise the user default session keyring. - - - (*) Set the timeout on a key. - - long keyctl(KEYCTL_SET_TIMEOUT, key_serial_t key, unsigned timeout); - - This sets or clears the timeout on a key. The timeout can be 0 to clear - the timeout or a number of seconds to set the expiry time that far into - the future. - - The process must have attribute modification access on a key to set its - timeout. Timeouts may not be set with this function on negative, revoked - or expired keys. - - - (*) Assume the authority granted to instantiate a key - - long keyctl(KEYCTL_ASSUME_AUTHORITY, key_serial_t key); - - This assumes or divests the authority required to instantiate the - specified key. Authority can only be assumed if the thread has the - authorisation key associated with the specified key in its keyrings - somewhere. - - Once authority is assumed, searches for keys will also search the - requester's keyrings using the requester's security label, UID, GID and - groups. - - If the requested authority is unavailable, error EPERM will be returned, - likewise if the authority has been revoked because the target key is - already instantiated. - - If the specified key is 0, then any assumed authority will be divested. - - The assumed authoritative key is inherited across fork and exec. - - - (*) Get the LSM security context attached to a key. - - long keyctl(KEYCTL_GET_SECURITY, key_serial_t key, char *buffer, - size_t buflen) - - This function returns a string that represents the LSM security context - attached to a key in the buffer provided. - - Unless there's an error, it always returns the amount of data it could - produce, even if that's too big for the buffer, but it won't copy more - than requested to userspace. If the buffer pointer is NULL then no copy - will take place. - - A NUL character is included at the end of the string if the buffer is - sufficiently big. This is included in the returned count. If no LSM is - in force then an empty string will be returned. - - A process must have view permission on the key for this function to be - successful. - - - (*) Install the calling process's session keyring on its parent. - - long keyctl(KEYCTL_SESSION_TO_PARENT); - - This functions attempts to install the calling process's session keyring - on to the calling process's parent, replacing the parent's current session - keyring. - - The calling process must have the same ownership as its parent, the - keyring must have the same ownership as the calling process, the calling - process must have LINK permission on the keyring and the active LSM module - mustn't deny permission, otherwise error EPERM will be returned. - - Error ENOMEM will be returned if there was insufficient memory to complete - the operation, otherwise 0 will be returned to indicate success. - - The keyring will be replaced next time the parent process leaves the - kernel and resumes executing userspace. - - -=============== -KERNEL SERVICES -=============== - -The kernel services for key management are fairly simple to deal with. They can -be broken down into two areas: keys and key types. - -Dealing with keys is fairly straightforward. Firstly, the kernel service -registers its type, then it searches for a key of that type. It should retain -the key as long as it has need of it, and then it should release it. For a -filesystem or device file, a search would probably be performed during the open -call, and the key released upon close. How to deal with conflicting keys due to -two different users opening the same file is left to the filesystem author to -solve. - -To access the key manager, the following header must be #included: - - - -Specific key types should have a header file under include/keys/ that should be -used to access that type. For keys of type "user", for example, that would be: - - - -Note that there are two different types of pointers to keys that may be -encountered: - - (*) struct key * - - This simply points to the key structure itself. Key structures will be at - least four-byte aligned. - - (*) key_ref_t - - This is equivalent to a struct key *, but the least significant bit is set - if the caller "possesses" the key. By "possession" it is meant that the - calling processes has a searchable link to the key from one of its - keyrings. There are three functions for dealing with these: - - key_ref_t make_key_ref(const struct key *key, - unsigned long possession); - - struct key *key_ref_to_ptr(const key_ref_t key_ref); - - unsigned long is_key_possessed(const key_ref_t key_ref); - - The first function constructs a key reference from a key pointer and - possession information (which must be 0 or 1 and not any other value). - - The second function retrieves the key pointer from a reference and the - third retrieves the possession flag. - -When accessing a key's payload contents, certain precautions must be taken to -prevent access vs modification races. See the section "Notes on accessing -payload contents" for more information. - -(*) To search for a key, call: - - struct key *request_key(const struct key_type *type, - const char *description, - const char *callout_info); - - This is used to request a key or keyring with a description that matches - the description specified according to the key type's match function. This - permits approximate matching to occur. If callout_string is not NULL, then - /sbin/request-key will be invoked in an attempt to obtain the key from - userspace. In that case, callout_string will be passed as an argument to - the program. - - Should the function fail error ENOKEY, EKEYEXPIRED or EKEYREVOKED will be - returned. - - If successful, the key will have been attached to the default keyring for - implicitly obtained request-key keys, as set by KEYCTL_SET_REQKEY_KEYRING. - - See also Documentation/keys-request-key.txt. - - -(*) To search for a key, passing auxiliary data to the upcaller, call: - - struct key *request_key_with_auxdata(const struct key_type *type, - const char *description, - const void *callout_info, - size_t callout_len, - void *aux); - - This is identical to request_key(), except that the auxiliary data is - passed to the key_type->request_key() op if it exists, and the callout_info - is a blob of length callout_len, if given (the length may be 0). - - -(*) A key can be requested asynchronously by calling one of: - - struct key *request_key_async(const struct key_type *type, - const char *description, - const void *callout_info, - size_t callout_len); - - or: - - struct key *request_key_async_with_auxdata(const struct key_type *type, - const char *description, - const char *callout_info, - size_t callout_len, - void *aux); - - which are asynchronous equivalents of request_key() and - request_key_with_auxdata() respectively. - - These two functions return with the key potentially still under - construction. To wait for construction completion, the following should be - called: - - int wait_for_key_construction(struct key *key, bool intr); - - The function will wait for the key to finish being constructed and then - invokes key_validate() to return an appropriate value to indicate the state - of the key (0 indicates the key is usable). - - If intr is true, then the wait can be interrupted by a signal, in which - case error ERESTARTSYS will be returned. - - -(*) When it is no longer required, the key should be released using: - - void key_put(struct key *key); - - Or: - - void key_ref_put(key_ref_t key_ref); - - These can be called from interrupt context. If CONFIG_KEYS is not set then - the argument will not be parsed. - - -(*) Extra references can be made to a key by calling the following function: - - struct key *key_get(struct key *key); - - These need to be disposed of by calling key_put() when they've been - finished with. The key pointer passed in will be returned. If the pointer - is NULL or CONFIG_KEYS is not set then the key will not be dereferenced and - no increment will take place. - - -(*) A key's serial number can be obtained by calling: - - key_serial_t key_serial(struct key *key); - - If key is NULL or if CONFIG_KEYS is not set then 0 will be returned (in the - latter case without parsing the argument). - - -(*) If a keyring was found in the search, this can be further searched by: - - key_ref_t keyring_search(key_ref_t keyring_ref, - const struct key_type *type, - const char *description) - - This searches the keyring tree specified for a matching key. Error ENOKEY - is returned upon failure (use IS_ERR/PTR_ERR to determine). If successful, - the returned key will need to be released. - - The possession attribute from the keyring reference is used to control - access through the permissions mask and is propagated to the returned key - reference pointer if successful. - - -(*) To check the validity of a key, this function can be called: - - int validate_key(struct key *key); - - This checks that the key in question hasn't expired or and hasn't been - revoked. Should the key be invalid, error EKEYEXPIRED or EKEYREVOKED will - be returned. If the key is NULL or if CONFIG_KEYS is not set then 0 will be - returned (in the latter case without parsing the argument). - - -(*) To register a key type, the following function should be called: - - int register_key_type(struct key_type *type); - - This will return error EEXIST if a type of the same name is already - present. - - -(*) To unregister a key type, call: - - void unregister_key_type(struct key_type *type); - - -Under some circumstances, it may be desirable to deal with a bundle of keys. -The facility provides access to the keyring type for managing such a bundle: - - struct key_type key_type_keyring; - -This can be used with a function such as request_key() to find a specific -keyring in a process's keyrings. A keyring thus found can then be searched -with keyring_search(). Note that it is not possible to use request_key() to -search a specific keyring, so using keyrings in this way is of limited utility. - - -=================================== -NOTES ON ACCESSING PAYLOAD CONTENTS -=================================== - -The simplest payload is just a number in key->payload.value. In this case, -there's no need to indulge in RCU or locking when accessing the payload. - -More complex payload contents must be allocated and a pointer to them set in -key->payload.data. One of the following ways must be selected to access the -data: - - (1) Unmodifiable key type. - - If the key type does not have a modify method, then the key's payload can - be accessed without any form of locking, provided that it's known to be - instantiated (uninstantiated keys cannot be "found"). - - (2) The key's semaphore. - - The semaphore could be used to govern access to the payload and to control - the payload pointer. It must be write-locked for modifications and would - have to be read-locked for general access. The disadvantage of doing this - is that the accessor may be required to sleep. - - (3) RCU. - - RCU must be used when the semaphore isn't already held; if the semaphore - is held then the contents can't change under you unexpectedly as the - semaphore must still be used to serialise modifications to the key. The - key management code takes care of this for the key type. - - However, this means using: - - rcu_read_lock() ... rcu_dereference() ... rcu_read_unlock() - - to read the pointer, and: - - rcu_dereference() ... rcu_assign_pointer() ... call_rcu() - - to set the pointer and dispose of the old contents after a grace period. - Note that only the key type should ever modify a key's payload. - - Furthermore, an RCU controlled payload must hold a struct rcu_head for the - use of call_rcu() and, if the payload is of variable size, the length of - the payload. key->datalen cannot be relied upon to be consistent with the - payload just dereferenced if the key's semaphore is not held. - - -=================== -DEFINING A KEY TYPE -=================== - -A kernel service may want to define its own key type. For instance, an AFS -filesystem might want to define a Kerberos 5 ticket key type. To do this, it -author fills in a key_type struct and registers it with the system. - -Source files that implement key types should include the following header file: - - - -The structure has a number of fields, some of which are mandatory: - - (*) const char *name - - The name of the key type. This is used to translate a key type name - supplied by userspace into a pointer to the structure. - - - (*) size_t def_datalen - - This is optional - it supplies the default payload data length as - contributed to the quota. If the key type's payload is always or almost - always the same size, then this is a more efficient way to do things. - - The data length (and quota) on a particular key can always be changed - during instantiation or update by calling: - - int key_payload_reserve(struct key *key, size_t datalen); - - With the revised data length. Error EDQUOT will be returned if this is not - viable. - - - (*) int (*vet_description)(const char *description); - - This optional method is called to vet a key description. If the key type - doesn't approve of the key description, it may return an error, otherwise - it should return 0. - - - (*) int (*instantiate)(struct key *key, const void *data, size_t datalen); - - This method is called to attach a payload to a key during construction. - The payload attached need not bear any relation to the data passed to this - function. - - If the amount of data attached to the key differs from the size in - keytype->def_datalen, then key_payload_reserve() should be called. - - This method does not have to lock the key in order to attach a payload. - The fact that KEY_FLAG_INSTANTIATED is not set in key->flags prevents - anything else from gaining access to the key. - - It is safe to sleep in this method. - - - (*) int (*update)(struct key *key, const void *data, size_t datalen); - - If this type of key can be updated, then this method should be provided. - It is called to update a key's payload from the blob of data provided. - - key_payload_reserve() should be called if the data length might change - before any changes are actually made. Note that if this succeeds, the type - is committed to changing the key because it's already been altered, so all - memory allocation must be done first. - - The key will have its semaphore write-locked before this method is called, - but this only deters other writers; any changes to the key's payload must - be made under RCU conditions, and call_rcu() must be used to dispose of - the old payload. - - key_payload_reserve() should be called before the changes are made, but - after all allocations and other potentially failing function calls are - made. - - It is safe to sleep in this method. - - - (*) int (*match)(const struct key *key, const void *desc); - - This method is called to match a key against a description. It should - return non-zero if the two match, zero if they don't. - - This method should not need to lock the key in any way. The type and - description can be considered invariant, and the payload should not be - accessed (the key may not yet be instantiated). - - It is not safe to sleep in this method; the caller may hold spinlocks. - - - (*) void (*revoke)(struct key *key); - - This method is optional. It is called to discard part of the payload - data upon a key being revoked. The caller will have the key semaphore - write-locked. - - It is safe to sleep in this method, though care should be taken to avoid - a deadlock against the key semaphore. - - - (*) void (*destroy)(struct key *key); - - This method is optional. It is called to discard the payload data on a key - when it is being destroyed. - - This method does not need to lock the key to access the payload; it can - consider the key as being inaccessible at this time. Note that the key's - type may have been changed before this function is called. - - It is not safe to sleep in this method; the caller may hold spinlocks. - - - (*) void (*describe)(const struct key *key, struct seq_file *p); - - This method is optional. It is called during /proc/keys reading to - summarise a key's description and payload in text form. - - This method will be called with the RCU read lock held. rcu_dereference() - should be used to read the payload pointer if the payload is to be - accessed. key->datalen cannot be trusted to stay consistent with the - contents of the payload. - - The description will not change, though the key's state may. - - It is not safe to sleep in this method; the RCU read lock is held by the - caller. - - - (*) long (*read)(const struct key *key, char __user *buffer, size_t buflen); - - This method is optional. It is called by KEYCTL_READ to translate the - key's payload into something a blob of data for userspace to deal with. - Ideally, the blob should be in the same format as that passed in to the - instantiate and update methods. - - If successful, the blob size that could be produced should be returned - rather than the size copied. - - This method will be called with the key's semaphore read-locked. This will - prevent the key's payload changing. It is not necessary to use RCU locking - when accessing the key's payload. It is safe to sleep in this method, such - as might happen when the userspace buffer is accessed. - - - (*) int (*request_key)(struct key_construction *cons, const char *op, - void *aux); - - This method is optional. If provided, request_key() and friends will - invoke this function rather than upcalling to /sbin/request-key to operate - upon a key of this type. - - The aux parameter is as passed to request_key_async_with_auxdata() and - similar or is NULL otherwise. Also passed are the construction record for - the key to be operated upon and the operation type (currently only - "create"). - - This method is permitted to return before the upcall is complete, but the - following function must be called under all circumstances to complete the - instantiation process, whether or not it succeeds, whether or not there's - an error: - - void complete_request_key(struct key_construction *cons, int error); - - The error parameter should be 0 on success, -ve on error. The - construction record is destroyed by this action and the authorisation key - will be revoked. If an error is indicated, the key under construction - will be negatively instantiated if it wasn't already instantiated. - - If this method returns an error, that error will be returned to the - caller of request_key*(). complete_request_key() must be called prior to - returning. - - The key under construction and the authorisation key can be found in the - key_construction struct pointed to by cons: - - (*) struct key *key; - - The key under construction. - - (*) struct key *authkey; - - The authorisation key. - - -============================ -REQUEST-KEY CALLBACK SERVICE -============================ - -To create a new key, the kernel will attempt to execute the following command -line: - - /sbin/request-key create \ - - - is the key being constructed, and the three keyrings are the process -keyrings from the process that caused the search to be issued. These are -included for two reasons: - - (1) There may be an authentication token in one of the keyrings that is - required to obtain the key, eg: a Kerberos Ticket-Granting Ticket. - - (2) The new key should probably be cached in one of these rings. - -This program should set it UID and GID to those specified before attempting to -access any more keys. It may then look around for a user specific process to -hand the request off to (perhaps a path held in placed in another key by, for -example, the KDE desktop manager). - -The program (or whatever it calls) should finish construction of the key by -calling KEYCTL_INSTANTIATE or KEYCTL_INSTANTIATE_IOV, which also permits it to -cache the key in one of the keyrings (probably the session ring) before -returning. Alternatively, the key can be marked as negative with KEYCTL_NEGATE -or KEYCTL_REJECT; this also permits the key to be cached in one of the -keyrings. - -If it returns with the key remaining in the unconstructed state, the key will -be marked as being negative, it will be added to the session keyring, and an -error will be returned to the key requestor. - -Supplementary information may be provided from whoever or whatever invoked this -service. This will be passed as the parameter. If no such -information was made available, then "-" will be passed as this parameter -instead. - - -Similarly, the kernel may attempt to update an expired or a soon to expire key -by executing: - - /sbin/request-key update \ - - -In this case, the program isn't required to actually attach the key to a ring; -the rings are provided for reference. - - -================== -GARBAGE COLLECTION -================== - -Dead keys (for which the type has been removed) will be automatically unlinked -from those keyrings that point to them and deleted as soon as possible by a -background garbage collector. - -Similarly, revoked and expired keys will be garbage collected, but only after a -certain amount of time has passed. This time is set as a number of seconds in: - - /proc/sys/kernel/keys/gc_delay diff --git a/Documentation/networking/dns_resolver.txt b/Documentation/networking/dns_resolver.txt index 04ca06325b08..7f531ad83285 100644 --- a/Documentation/networking/dns_resolver.txt +++ b/Documentation/networking/dns_resolver.txt @@ -139,8 +139,8 @@ the key will be discarded and recreated when the data it holds has expired. dns_query() returns a copy of the value attached to the key, or an error if that is indicated instead. -See for further information about -request-key function. +See for further +information about request-key function. ========= diff --git a/Documentation/security/00-INDEX b/Documentation/security/00-INDEX new file mode 100644 index 000000000000..19bc49439cac --- /dev/null +++ b/Documentation/security/00-INDEX @@ -0,0 +1,18 @@ +00-INDEX + - this file. +SELinux.txt + - how to get started with the SELinux security enhancement. +Smack.txt + - documentation on the Smack Linux Security Module. +apparmor.txt + - documentation on the AppArmor security extension. +credentials.txt + - documentation about credentials in Linux. +keys-request-key.txt + - description of the kernel key request service. +keys-trusted-encrypted.txt + - info on the Trusted and Encrypted keys in the kernel key ring service. +keys.txt + - description of the kernel key retention service. +tomoyo.txt + - documentation on the TOMOYO Linux Security Module. diff --git a/Documentation/security/SELinux.txt b/Documentation/security/SELinux.txt new file mode 100644 index 000000000000..07eae00f3314 --- /dev/null +++ b/Documentation/security/SELinux.txt @@ -0,0 +1,27 @@ +If you want to use SELinux, chances are you will want +to use the distro-provided policies, or install the +latest reference policy release from + http://oss.tresys.com/projects/refpolicy + +However, if you want to install a dummy policy for +testing, you can do using 'mdp' provided under +scripts/selinux. Note that this requires the selinux +userspace to be installed - in particular you will +need checkpolicy to compile a kernel, and setfiles and +fixfiles to label the filesystem. + + 1. Compile the kernel with selinux enabled. + 2. Type 'make' to compile mdp. + 3. Make sure that you are not running with + SELinux enabled and a real policy. If + you are, reboot with selinux disabled + before continuing. + 4. Run install_policy.sh: + cd scripts/selinux + sh install_policy.sh + +Step 4 will create a new dummy policy valid for your +kernel, with a single selinux user, role, and type. +It will compile the policy, will set your SELINUXTYPE to +dummy in /etc/selinux/config, install the compiled policy +as 'dummy', and relabel your filesystem. diff --git a/Documentation/security/Smack.txt b/Documentation/security/Smack.txt new file mode 100644 index 000000000000..e9dab41c0fe0 --- /dev/null +++ b/Documentation/security/Smack.txt @@ -0,0 +1,541 @@ + + + "Good for you, you've decided to clean the elevator!" + - The Elevator, from Dark Star + +Smack is the the Simplified Mandatory Access Control Kernel. +Smack is a kernel based implementation of mandatory access +control that includes simplicity in its primary design goals. + +Smack is not the only Mandatory Access Control scheme +available for Linux. Those new to Mandatory Access Control +are encouraged to compare Smack with the other mechanisms +available to determine which is best suited to the problem +at hand. + +Smack consists of three major components: + - The kernel + - A start-up script and a few modified applications + - Configuration data + +The kernel component of Smack is implemented as a Linux +Security Modules (LSM) module. It requires netlabel and +works best with file systems that support extended attributes, +although xattr support is not strictly required. +It is safe to run a Smack kernel under a "vanilla" distribution. +Smack kernels use the CIPSO IP option. Some network +configurations are intolerant of IP options and can impede +access to systems that use them as Smack does. + +The startup script etc-init.d-smack should be installed +in /etc/init.d/smack and should be invoked early in the +start-up process. On Fedora rc5.d/S02smack is recommended. +This script ensures that certain devices have the correct +Smack attributes and loads the Smack configuration if +any is defined. This script invokes two programs that +ensure configuration data is properly formatted. These +programs are /usr/sbin/smackload and /usr/sin/smackcipso. +The system will run just fine without these programs, +but it will be difficult to set access rules properly. + +A version of "ls" that provides a "-M" option to display +Smack labels on long listing is available. + +A hacked version of sshd that allows network logins by users +with specific Smack labels is available. This version does +not work for scp. You must set the /etc/ssh/sshd_config +line: + UsePrivilegeSeparation no + +The format of /etc/smack/usr is: + + username smack + +In keeping with the intent of Smack, configuration data is +minimal and not strictly required. The most important +configuration step is mounting the smackfs pseudo filesystem. + +Add this line to /etc/fstab: + + smackfs /smack smackfs smackfsdef=* 0 0 + +and create the /smack directory for mounting. + +Smack uses extended attributes (xattrs) to store file labels. +The command to set a Smack label on a file is: + + # attr -S -s SMACK64 -V "value" path + +NOTE: Smack labels are limited to 23 characters. The attr command + does not enforce this restriction and can be used to set + invalid Smack labels on files. + +If you don't do anything special all users will get the floor ("_") +label when they log in. If you do want to log in via the hacked ssh +at other labels use the attr command to set the smack value on the +home directory and its contents. + +You can add access rules in /etc/smack/accesses. They take the form: + + subjectlabel objectlabel access + +access is a combination of the letters rwxa which specify the +kind of access permitted a subject with subjectlabel on an +object with objectlabel. If there is no rule no access is allowed. + +A process can see the smack label it is running with by +reading /proc/self/attr/current. A privileged process can +set the process smack by writing there. + +Look for additional programs on http://schaufler-ca.com + +From the Smack Whitepaper: + +The Simplified Mandatory Access Control Kernel + +Casey Schaufler +casey@schaufler-ca.com + +Mandatory Access Control + +Computer systems employ a variety of schemes to constrain how information is +shared among the people and services using the machine. Some of these schemes +allow the program or user to decide what other programs or users are allowed +access to pieces of data. These schemes are called discretionary access +control mechanisms because the access control is specified at the discretion +of the user. Other schemes do not leave the decision regarding what a user or +program can access up to users or programs. These schemes are called mandatory +access control mechanisms because you don't have a choice regarding the users +or programs that have access to pieces of data. + +Bell & LaPadula + +From the middle of the 1980's until the turn of the century Mandatory Access +Control (MAC) was very closely associated with the Bell & LaPadula security +model, a mathematical description of the United States Department of Defense +policy for marking paper documents. MAC in this form enjoyed a following +within the Capital Beltway and Scandinavian supercomputer centers but was +often sited as failing to address general needs. + +Domain Type Enforcement + +Around the turn of the century Domain Type Enforcement (DTE) became popular. +This scheme organizes users, programs, and data into domains that are +protected from each other. This scheme has been widely deployed as a component +of popular Linux distributions. The administrative overhead required to +maintain this scheme and the detailed understanding of the whole system +necessary to provide a secure domain mapping leads to the scheme being +disabled or used in limited ways in the majority of cases. + +Smack + +Smack is a Mandatory Access Control mechanism designed to provide useful MAC +while avoiding the pitfalls of its predecessors. The limitations of Bell & +LaPadula are addressed by providing a scheme whereby access can be controlled +according to the requirements of the system and its purpose rather than those +imposed by an arcane government policy. The complexity of Domain Type +Enforcement and avoided by defining access controls in terms of the access +modes already in use. + +Smack Terminology + +The jargon used to talk about Smack will be familiar to those who have dealt +with other MAC systems and shouldn't be too difficult for the uninitiated to +pick up. There are four terms that are used in a specific way and that are +especially important: + + Subject: A subject is an active entity on the computer system. + On Smack a subject is a task, which is in turn the basic unit + of execution. + + Object: An object is a passive entity on the computer system. + On Smack files of all types, IPC, and tasks can be objects. + + Access: Any attempt by a subject to put information into or get + information from an object is an access. + + Label: Data that identifies the Mandatory Access Control + characteristics of a subject or an object. + +These definitions are consistent with the traditional use in the security +community. There are also some terms from Linux that are likely to crop up: + + Capability: A task that possesses a capability has permission to + violate an aspect of the system security policy, as identified by + the specific capability. A task that possesses one or more + capabilities is a privileged task, whereas a task with no + capabilities is an unprivileged task. + + Privilege: A task that is allowed to violate the system security + policy is said to have privilege. As of this writing a task can + have privilege either by possessing capabilities or by having an + effective user of root. + +Smack Basics + +Smack is an extension to a Linux system. It enforces additional restrictions +on what subjects can access which objects, based on the labels attached to +each of the subject and the object. + +Labels + +Smack labels are ASCII character strings, one to twenty-three characters in +length. Single character labels using special characters, that being anything +other than a letter or digit, are reserved for use by the Smack development +team. Smack labels are unstructured, case sensitive, and the only operation +ever performed on them is comparison for equality. Smack labels cannot +contain unprintable characters, the "/" (slash), the "\" (backslash), the "'" +(quote) and '"' (double-quote) characters. +Smack labels cannot begin with a '-', which is reserved for special options. + +There are some predefined labels: + + _ Pronounced "floor", a single underscore character. + ^ Pronounced "hat", a single circumflex character. + * Pronounced "star", a single asterisk character. + ? Pronounced "huh", a single question mark character. + @ Pronounced "Internet", a single at sign character. + +Every task on a Smack system is assigned a label. System tasks, such as +init(8) and systems daemons, are run with the floor ("_") label. User tasks +are assigned labels according to the specification found in the +/etc/smack/user configuration file. + +Access Rules + +Smack uses the traditional access modes of Linux. These modes are read, +execute, write, and occasionally append. There are a few cases where the +access mode may not be obvious. These include: + + Signals: A signal is a write operation from the subject task to + the object task. + Internet Domain IPC: Transmission of a packet is considered a + write operation from the source task to the destination task. + +Smack restricts access based on the label attached to a subject and the label +attached to the object it is trying to access. The rules enforced are, in +order: + + 1. Any access requested by a task labeled "*" is denied. + 2. A read or execute access requested by a task labeled "^" + is permitted. + 3. A read or execute access requested on an object labeled "_" + is permitted. + 4. Any access requested on an object labeled "*" is permitted. + 5. Any access requested by a task on an object with the same + label is permitted. + 6. Any access requested that is explicitly defined in the loaded + rule set is permitted. + 7. Any other access is denied. + +Smack Access Rules + +With the isolation provided by Smack access separation is simple. There are +many interesting cases where limited access by subjects to objects with +different labels is desired. One example is the familiar spy model of +sensitivity, where a scientist working on a highly classified project would be +able to read documents of lower classifications and anything she writes will +be "born" highly classified. To accommodate such schemes Smack includes a +mechanism for specifying rules allowing access between labels. + +Access Rule Format + +The format of an access rule is: + + subject-label object-label access + +Where subject-label is the Smack label of the task, object-label is the Smack +label of the thing being accessed, and access is a string specifying the sort +of access allowed. The Smack labels are limited to 23 characters. The access +specification is searched for letters that describe access modes: + + a: indicates that append access should be granted. + r: indicates that read access should be granted. + w: indicates that write access should be granted. + x: indicates that execute access should be granted. + +Uppercase values for the specification letters are allowed as well. +Access mode specifications can be in any order. Examples of acceptable rules +are: + + TopSecret Secret rx + Secret Unclass R + Manager Game x + User HR w + New Old rRrRr + Closed Off - + +Examples of unacceptable rules are: + + Top Secret Secret rx + Ace Ace r + Odd spells waxbeans + +Spaces are not allowed in labels. Since a subject always has access to files +with the same label specifying a rule for that case is pointless. Only +valid letters (rwxaRWXA) and the dash ('-') character are allowed in +access specifications. The dash is a placeholder, so "a-r" is the same +as "ar". A lone dash is used to specify that no access should be allowed. + +Applying Access Rules + +The developers of Linux rarely define new sorts of things, usually importing +schemes and concepts from other systems. Most often, the other systems are +variants of Unix. Unix has many endearing properties, but consistency of +access control models is not one of them. Smack strives to treat accesses as +uniformly as is sensible while keeping with the spirit of the underlying +mechanism. + +File system objects including files, directories, named pipes, symbolic links, +and devices require access permissions that closely match those used by mode +bit access. To open a file for reading read access is required on the file. To +search a directory requires execute access. Creating a file with write access +requires both read and write access on the containing directory. Deleting a +file requires read and write access to the file and to the containing +directory. It is possible that a user may be able to see that a file exists +but not any of its attributes by the circumstance of having read access to the +containing directory but not to the differently labeled file. This is an +artifact of the file name being data in the directory, not a part of the file. + +IPC objects, message queues, semaphore sets, and memory segments exist in flat +namespaces and access requests are only required to match the object in +question. + +Process objects reflect tasks on the system and the Smack label used to access +them is the same Smack label that the task would use for its own access +attempts. Sending a signal via the kill() system call is a write operation +from the signaler to the recipient. Debugging a process requires both reading +and writing. Creating a new task is an internal operation that results in two +tasks with identical Smack labels and requires no access checks. + +Sockets are data structures attached to processes and sending a packet from +one process to another requires that the sender have write access to the +receiver. The receiver is not required to have read access to the sender. + +Setting Access Rules + +The configuration file /etc/smack/accesses contains the rules to be set at +system startup. The contents are written to the special file /smack/load. +Rules can be written to /smack/load at any time and take effect immediately. +For any pair of subject and object labels there can be only one rule, with the +most recently specified overriding any earlier specification. + +The program smackload is provided to ensure data is formatted +properly when written to /smack/load. This program reads lines +of the form + + subjectlabel objectlabel mode. + +Task Attribute + +The Smack label of a process can be read from /proc//attr/current. A +process can read its own Smack label from /proc/self/attr/current. A +privileged process can change its own Smack label by writing to +/proc/self/attr/current but not the label of another process. + +File Attribute + +The Smack label of a filesystem object is stored as an extended attribute +named SMACK64 on the file. This attribute is in the security namespace. It can +only be changed by a process with privilege. + +Privilege + +A process with CAP_MAC_OVERRIDE is privileged. + +Smack Networking + +As mentioned before, Smack enforces access control on network protocol +transmissions. Every packet sent by a Smack process is tagged with its Smack +label. This is done by adding a CIPSO tag to the header of the IP packet. Each +packet received is expected to have a CIPSO tag that identifies the label and +if it lacks such a tag the network ambient label is assumed. Before the packet +is delivered a check is made to determine that a subject with the label on the +packet has write access to the receiving process and if that is not the case +the packet is dropped. + +CIPSO Configuration + +It is normally unnecessary to specify the CIPSO configuration. The default +values used by the system handle all internal cases. Smack will compose CIPSO +label values to match the Smack labels being used without administrative +intervention. Unlabeled packets that come into the system will be given the +ambient label. + +Smack requires configuration in the case where packets from a system that is +not smack that speaks CIPSO may be encountered. Usually this will be a Trusted +Solaris system, but there are other, less widely deployed systems out there. +CIPSO provides 3 important values, a Domain Of Interpretation (DOI), a level, +and a category set with each packet. The DOI is intended to identify a group +of systems that use compatible labeling schemes, and the DOI specified on the +smack system must match that of the remote system or packets will be +discarded. The DOI is 3 by default. The value can be read from /smack/doi and +can be changed by writing to /smack/doi. + +The label and category set are mapped to a Smack label as defined in +/etc/smack/cipso. + +A Smack/CIPSO mapping has the form: + + smack level [category [category]*] + +Smack does not expect the level or category sets to be related in any +particular way and does not assume or assign accesses based on them. Some +examples of mappings: + + TopSecret 7 + TS:A,B 7 1 2 + SecBDE 5 2 4 6 + RAFTERS 7 12 26 + +The ":" and "," characters are permitted in a Smack label but have no special +meaning. + +The mapping of Smack labels to CIPSO values is defined by writing to +/smack/cipso. Again, the format of data written to this special file +is highly restrictive, so the program smackcipso is provided to +ensure the writes are done properly. This program takes mappings +on the standard input and sends them to /smack/cipso properly. + +In addition to explicit mappings Smack supports direct CIPSO mappings. One +CIPSO level is used to indicate that the category set passed in the packet is +in fact an encoding of the Smack label. The level used is 250 by default. The +value can be read from /smack/direct and changed by writing to /smack/direct. + +Socket Attributes + +There are two attributes that are associated with sockets. These attributes +can only be set by privileged tasks, but any task can read them for their own +sockets. + + SMACK64IPIN: The Smack label of the task object. A privileged + program that will enforce policy may set this to the star label. + + SMACK64IPOUT: The Smack label transmitted with outgoing packets. + A privileged program may set this to match the label of another + task with which it hopes to communicate. + +Smack Netlabel Exceptions + +You will often find that your labeled application has to talk to the outside, +unlabeled world. To do this there's a special file /smack/netlabel where you can +add some exceptions in the form of : +@IP1 LABEL1 or +@IP2/MASK LABEL2 + +It means that your application will have unlabeled access to @IP1 if it has +write access on LABEL1, and access to the subnet @IP2/MASK if it has write +access on LABEL2. + +Entries in the /smack/netlabel file are matched by longest mask first, like in +classless IPv4 routing. + +A special label '@' and an option '-CIPSO' can be used there : +@ means Internet, any application with any label has access to it +-CIPSO means standard CIPSO networking + +If you don't know what CIPSO is and don't plan to use it, you can just do : +echo 127.0.0.1 -CIPSO > /smack/netlabel +echo 0.0.0.0/0 @ > /smack/netlabel + +If you use CIPSO on your 192.168.0.0/16 local network and need also unlabeled +Internet access, you can have : +echo 127.0.0.1 -CIPSO > /smack/netlabel +echo 192.168.0.0/16 -CIPSO > /smack/netlabel +echo 0.0.0.0/0 @ > /smack/netlabel + + +Writing Applications for Smack + +There are three sorts of applications that will run on a Smack system. How an +application interacts with Smack will determine what it will have to do to +work properly under Smack. + +Smack Ignorant Applications + +By far the majority of applications have no reason whatever to care about the +unique properties of Smack. Since invoking a program has no impact on the +Smack label associated with the process the only concern likely to arise is +whether the process has execute access to the program. + +Smack Relevant Applications + +Some programs can be improved by teaching them about Smack, but do not make +any security decisions themselves. The utility ls(1) is one example of such a +program. + +Smack Enforcing Applications + +These are special programs that not only know about Smack, but participate in +the enforcement of system policy. In most cases these are the programs that +set up user sessions. There are also network services that provide information +to processes running with various labels. + +File System Interfaces + +Smack maintains labels on file system objects using extended attributes. The +Smack label of a file, directory, or other file system object can be obtained +using getxattr(2). + + len = getxattr("/", "security.SMACK64", value, sizeof (value)); + +will put the Smack label of the root directory into value. A privileged +process can set the Smack label of a file system object with setxattr(2). + + len = strlen("Rubble"); + rc = setxattr("/foo", "security.SMACK64", "Rubble", len, 0); + +will set the Smack label of /foo to "Rubble" if the program has appropriate +privilege. + +Socket Interfaces + +The socket attributes can be read using fgetxattr(2). + +A privileged process can set the Smack label of outgoing packets with +fsetxattr(2). + + len = strlen("Rubble"); + rc = fsetxattr(fd, "security.SMACK64IPOUT", "Rubble", len, 0); + +will set the Smack label "Rubble" on packets going out from the socket if the +program has appropriate privilege. + + rc = fsetxattr(fd, "security.SMACK64IPIN, "*", strlen("*"), 0); + +will set the Smack label "*" as the object label against which incoming +packets will be checked if the program has appropriate privilege. + +Administration + +Smack supports some mount options: + + smackfsdef=label: specifies the label to give files that lack + the Smack label extended attribute. + + smackfsroot=label: specifies the label to assign the root of the + file system if it lacks the Smack extended attribute. + + smackfshat=label: specifies a label that must have read access to + all labels set on the filesystem. Not yet enforced. + + smackfsfloor=label: specifies a label to which all labels set on the + filesystem must have read access. Not yet enforced. + +These mount options apply to all file system types. + +Smack auditing + +If you want Smack auditing of security events, you need to set CONFIG_AUDIT +in your kernel configuration. +By default, all denied events will be audited. You can change this behavior by +writing a single character to the /smack/logging file : +0 : no logging +1 : log denied (default) +2 : log accepted +3 : log denied & accepted + +Events are logged as 'key=value' pairs, for each event you at least will get +the subjet, the object, the rights requested, the action, the kernel function +that triggered the event, plus other pairs depending on the type of event +audited. diff --git a/Documentation/security/apparmor.txt b/Documentation/security/apparmor.txt new file mode 100644 index 000000000000..93c1fd7d0635 --- /dev/null +++ b/Documentation/security/apparmor.txt @@ -0,0 +1,39 @@ +--- What is AppArmor? --- + +AppArmor is MAC style security extension for the Linux kernel. It implements +a task centered policy, with task "profiles" being created and loaded +from user space. Tasks on the system that do not have a profile defined for +them run in an unconfined state which is equivalent to standard Linux DAC +permissions. + +--- How to enable/disable --- + +set CONFIG_SECURITY_APPARMOR=y + +If AppArmor should be selected as the default security module then + set CONFIG_DEFAULT_SECURITY="apparmor" + and CONFIG_SECURITY_APPARMOR_BOOTPARAM_VALUE=1 + +Build the kernel + +If AppArmor is not the default security module it can be enabled by passing +security=apparmor on the kernel's command line. + +If AppArmor is the default security module it can be disabled by passing +apparmor=0, security=XXXX (where XXX is valid security module), on the +kernel's command line + +For AppArmor to enforce any restrictions beyond standard Linux DAC permissions +policy must be loaded into the kernel from user space (see the Documentation +and tools links). + +--- Documentation --- + +Documentation can be found on the wiki. + +--- Links --- + +Mailing List - apparmor@lists.ubuntu.com +Wiki - http://apparmor.wiki.kernel.org/ +User space tools - https://launchpad.net/apparmor +Kernel module - git://git.kernel.org/pub/scm/linux/kernel/git/jj/apparmor-dev.git diff --git a/Documentation/security/credentials.txt b/Documentation/security/credentials.txt new file mode 100644 index 000000000000..fc0366cbd7ce --- /dev/null +++ b/Documentation/security/credentials.txt @@ -0,0 +1,581 @@ + ==================== + CREDENTIALS IN LINUX + ==================== + +By: David Howells + +Contents: + + (*) Overview. + + (*) Types of credentials. + + (*) File markings. + + (*) Task credentials. + + - Immutable credentials. + - Accessing task credentials. + - Accessing another task's credentials. + - Altering credentials. + - Managing credentials. + + (*) Open file credentials. + + (*) Overriding the VFS's use of credentials. + + +======== +OVERVIEW +======== + +There are several parts to the security check performed by Linux when one +object acts upon another: + + (1) Objects. + + Objects are things in the system that may be acted upon directly by + userspace programs. Linux has a variety of actionable objects, including: + + - Tasks + - Files/inodes + - Sockets + - Message queues + - Shared memory segments + - Semaphores + - Keys + + As a part of the description of all these objects there is a set of + credentials. What's in the set depends on the type of object. + + (2) Object ownership. + + Amongst the credentials of most objects, there will be a subset that + indicates the ownership of that object. This is used for resource + accounting and limitation (disk quotas and task rlimits for example). + + In a standard UNIX filesystem, for instance, this will be defined by the + UID marked on the inode. + + (3) The objective context. + + Also amongst the credentials of those objects, there will be a subset that + indicates the 'objective context' of that object. This may or may not be + the same set as in (2) - in standard UNIX files, for instance, this is the + defined by the UID and the GID marked on the inode. + + The objective context is used as part of the security calculation that is + carried out when an object is acted upon. + + (4) Subjects. + + A subject is an object that is acting upon another object. + + Most of the objects in the system are inactive: they don't act on other + objects within the system. Processes/tasks are the obvious exception: + they do stuff; they access and manipulate things. + + Objects other than tasks may under some circumstances also be subjects. + For instance an open file may send SIGIO to a task using the UID and EUID + given to it by a task that called fcntl(F_SETOWN) upon it. In this case, + the file struct will have a subjective context too. + + (5) The subjective context. + + A subject has an additional interpretation of its credentials. A subset + of its credentials forms the 'subjective context'. The subjective context + is used as part of the security calculation that is carried out when a + subject acts. + + A Linux task, for example, has the FSUID, FSGID and the supplementary + group list for when it is acting upon a file - which are quite separate + from the real UID and GID that normally form the objective context of the + task. + + (6) Actions. + + Linux has a number of actions available that a subject may perform upon an + object. The set of actions available depends on the nature of the subject + and the object. + + Actions include reading, writing, creating and deleting files; forking or + signalling and tracing tasks. + + (7) Rules, access control lists and security calculations. + + When a subject acts upon an object, a security calculation is made. This + involves taking the subjective context, the objective context and the + action, and searching one or more sets of rules to see whether the subject + is granted or denied permission to act in the desired manner on the + object, given those contexts. + + There are two main sources of rules: + + (a) Discretionary access control (DAC): + + Sometimes the object will include sets of rules as part of its + description. This is an 'Access Control List' or 'ACL'. A Linux + file may supply more than one ACL. + + A traditional UNIX file, for example, includes a permissions mask that + is an abbreviated ACL with three fixed classes of subject ('user', + 'group' and 'other'), each of which may be granted certain privileges + ('read', 'write' and 'execute' - whatever those map to for the object + in question). UNIX file permissions do not allow the arbitrary + specification of subjects, however, and so are of limited use. + + A Linux file might also sport a POSIX ACL. This is a list of rules + that grants various permissions to arbitrary subjects. + + (b) Mandatory access control (MAC): + + The system as a whole may have one or more sets of rules that get + applied to all subjects and objects, regardless of their source. + SELinux and Smack are examples of this. + + In the case of SELinux and Smack, each object is given a label as part + of its credentials. When an action is requested, they take the + subject label, the object label and the action and look for a rule + that says that this action is either granted or denied. + + +==================== +TYPES OF CREDENTIALS +==================== + +The Linux kernel supports the following types of credentials: + + (1) Traditional UNIX credentials. + + Real User ID + Real Group ID + + The UID and GID are carried by most, if not all, Linux objects, even if in + some cases it has to be invented (FAT or CIFS files for example, which are + derived from Windows). These (mostly) define the objective context of + that object, with tasks being slightly different in some cases. + + Effective, Saved and FS User ID + Effective, Saved and FS Group ID + Supplementary groups + + These are additional credentials used by tasks only. Usually, an + EUID/EGID/GROUPS will be used as the subjective context, and real UID/GID + will be used as the objective. For tasks, it should be noted that this is + not always true. + + (2) Capabilities. + + Set of permitted capabilities + Set of inheritable capabilities + Set of effective capabilities + Capability bounding set + + These are only carried by tasks. They indicate superior capabilities + granted piecemeal to a task that an ordinary task wouldn't otherwise have. + These are manipulated implicitly by changes to the traditional UNIX + credentials, but can also be manipulated directly by the capset() system + call. + + The permitted capabilities are those caps that the process might grant + itself to its effective or permitted sets through capset(). This + inheritable set might also be so constrained. + + The effective capabilities are the ones that a task is actually allowed to + make use of itself. + + The inheritable capabilities are the ones that may get passed across + execve(). + + The bounding set limits the capabilities that may be inherited across + execve(), especially when a binary is executed that will execute as UID 0. + + (3) Secure management flags (securebits). + + These are only carried by tasks. These govern the way the above + credentials are manipulated and inherited over certain operations such as + execve(). They aren't used directly as objective or subjective + credentials. + + (4) Keys and keyrings. + + These are only carried by tasks. They carry and cache security tokens + that don't fit into the other standard UNIX credentials. They are for + making such things as network filesystem keys available to the file + accesses performed by processes, without the necessity of ordinary + programs having to know about security details involved. + + Keyrings are a special type of key. They carry sets of other keys and can + be searched for the desired key. Each process may subscribe to a number + of keyrings: + + Per-thread keying + Per-process keyring + Per-session keyring + + When a process accesses a key, if not already present, it will normally be + cached on one of these keyrings for future accesses to find. + + For more information on using keys, see Documentation/security/keys.txt. + + (5) LSM + + The Linux Security Module allows extra controls to be placed over the + operations that a task may do. Currently Linux supports two main + alternate LSM options: SELinux and Smack. + + Both work by labelling the objects in a system and then applying sets of + rules (policies) that say what operations a task with one label may do to + an object with another label. + + (6) AF_KEY + + This is a socket-based approach to credential management for networking + stacks [RFC 2367]. It isn't discussed by this document as it doesn't + interact directly with task and file credentials; rather it keeps system + level credentials. + + +When a file is opened, part of the opening task's subjective context is +recorded in the file struct created. This allows operations using that file +struct to use those credentials instead of the subjective context of the task +that issued the operation. An example of this would be a file opened on a +network filesystem where the credentials of the opened file should be presented +to the server, regardless of who is actually doing a read or a write upon it. + + +============= +FILE MARKINGS +============= + +Files on disk or obtained over the network may have annotations that form the +objective security context of that file. Depending on the type of filesystem, +this may include one or more of the following: + + (*) UNIX UID, GID, mode; + + (*) Windows user ID; + + (*) Access control list; + + (*) LSM security label; + + (*) UNIX exec privilege escalation bits (SUID/SGID); + + (*) File capabilities exec privilege escalation bits. + +These are compared to the task's subjective security context, and certain +operations allowed or disallowed as a result. In the case of execve(), the +privilege escalation bits come into play, and may allow the resulting process +extra privileges, based on the annotations on the executable file. + + +================ +TASK CREDENTIALS +================ + +In Linux, all of a task's credentials are held in (uid, gid) or through +(groups, keys, LSM security) a refcounted structure of type 'struct cred'. +Each task points to its credentials by a pointer called 'cred' in its +task_struct. + +Once a set of credentials has been prepared and committed, it may not be +changed, barring the following exceptions: + + (1) its reference count may be changed; + + (2) the reference count on the group_info struct it points to may be changed; + + (3) the reference count on the security data it points to may be changed; + + (4) the reference count on any keyrings it points to may be changed; + + (5) any keyrings it points to may be revoked, expired or have their security + attributes changed; and + + (6) the contents of any keyrings to which it points may be changed (the whole + point of keyrings being a shared set of credentials, modifiable by anyone + with appropriate access). + +To alter anything in the cred struct, the copy-and-replace principle must be +adhered to. First take a copy, then alter the copy and then use RCU to change +the task pointer to make it point to the new copy. There are wrappers to aid +with this (see below). + +A task may only alter its _own_ credentials; it is no longer permitted for a +task to alter another's credentials. This means the capset() system call is no +longer permitted to take any PID other than the one of the current process. +Also keyctl_instantiate() and keyctl_negate() functions no longer permit +attachment to process-specific keyrings in the requesting process as the +instantiating process may need to create them. + + +IMMUTABLE CREDENTIALS +--------------------- + +Once a set of credentials has been made public (by calling commit_creds() for +example), it must be considered immutable, barring two exceptions: + + (1) The reference count may be altered. + + (2) Whilst the keyring subscriptions of a set of credentials may not be + changed, the keyrings subscribed to may have their contents altered. + +To catch accidental credential alteration at compile time, struct task_struct +has _const_ pointers to its credential sets, as does struct file. Furthermore, +certain functions such as get_cred() and put_cred() operate on const pointers, +thus rendering casts unnecessary, but require to temporarily ditch the const +qualification to be able to alter the reference count. + + +ACCESSING TASK CREDENTIALS +-------------------------- + +A task being able to alter only its own credentials permits the current process +to read or replace its own credentials without the need for any form of locking +- which simplifies things greatly. It can just call: + + const struct cred *current_cred() + +to get a pointer to its credentials structure, and it doesn't have to release +it afterwards. + +There are convenience wrappers for retrieving specific aspects of a task's +credentials (the value is simply returned in each case): + + uid_t current_uid(void) Current's real UID + gid_t current_gid(void) Current's real GID + uid_t current_euid(void) Current's effective UID + gid_t current_egid(void) Current's effective GID + uid_t current_fsuid(void) Current's file access UID + gid_t current_fsgid(void) Current's file access GID + kernel_cap_t current_cap(void) Current's effective capabilities + void *current_security(void) Current's LSM security pointer + struct user_struct *current_user(void) Current's user account + +There are also convenience wrappers for retrieving specific associated pairs of +a task's credentials: + + void current_uid_gid(uid_t *, gid_t *); + void current_euid_egid(uid_t *, gid_t *); + void current_fsuid_fsgid(uid_t *, gid_t *); + +which return these pairs of values through their arguments after retrieving +them from the current task's credentials. + + +In addition, there is a function for obtaining a reference on the current +process's current set of credentials: + + const struct cred *get_current_cred(void); + +and functions for getting references to one of the credentials that don't +actually live in struct cred: + + struct user_struct *get_current_user(void); + struct group_info *get_current_groups(void); + +which get references to the current process's user accounting structure and +supplementary groups list respectively. + +Once a reference has been obtained, it must be released with put_cred(), +free_uid() or put_group_info() as appropriate. + + +ACCESSING ANOTHER TASK'S CREDENTIALS +------------------------------------ + +Whilst a task may access its own credentials without the need for locking, the +same is not true of a task wanting to access another task's credentials. It +must use the RCU read lock and rcu_dereference(). + +The rcu_dereference() is wrapped by: + + const struct cred *__task_cred(struct task_struct *task); + +This should be used inside the RCU read lock, as in the following example: + + void foo(struct task_struct *t, struct foo_data *f) + { + const struct cred *tcred; + ... + rcu_read_lock(); + tcred = __task_cred(t); + f->uid = tcred->uid; + f->gid = tcred->gid; + f->groups = get_group_info(tcred->groups); + rcu_read_unlock(); + ... + } + +Should it be necessary to hold another task's credentials for a long period of +time, and possibly to sleep whilst doing so, then the caller should get a +reference on them using: + + const struct cred *get_task_cred(struct task_struct *task); + +This does all the RCU magic inside of it. The caller must call put_cred() on +the credentials so obtained when they're finished with. + + [*] Note: The result of __task_cred() should not be passed directly to + get_cred() as this may race with commit_cred(). + +There are a couple of convenience functions to access bits of another task's +credentials, hiding the RCU magic from the caller: + + uid_t task_uid(task) Task's real UID + uid_t task_euid(task) Task's effective UID + +If the caller is holding the RCU read lock at the time anyway, then: + + __task_cred(task)->uid + __task_cred(task)->euid + +should be used instead. Similarly, if multiple aspects of a task's credentials +need to be accessed, RCU read lock should be used, __task_cred() called, the +result stored in a temporary pointer and then the credential aspects called +from that before dropping the lock. This prevents the potentially expensive +RCU magic from being invoked multiple times. + +Should some other single aspect of another task's credentials need to be +accessed, then this can be used: + + task_cred_xxx(task, member) + +where 'member' is a non-pointer member of the cred struct. For instance: + + uid_t task_cred_xxx(task, suid); + +will retrieve 'struct cred::suid' from the task, doing the appropriate RCU +magic. This may not be used for pointer members as what they point to may +disappear the moment the RCU read lock is dropped. + + +ALTERING CREDENTIALS +-------------------- + +As previously mentioned, a task may only alter its own credentials, and may not +alter those of another task. This means that it doesn't need to use any +locking to alter its own credentials. + +To alter the current process's credentials, a function should first prepare a +new set of credentials by calling: + + struct cred *prepare_creds(void); + +this locks current->cred_replace_mutex and then allocates and constructs a +duplicate of the current process's credentials, returning with the mutex still +held if successful. It returns NULL if not successful (out of memory). + +The mutex prevents ptrace() from altering the ptrace state of a process whilst +security checks on credentials construction and changing is taking place as +the ptrace state may alter the outcome, particularly in the case of execve(). + +The new credentials set should be altered appropriately, and any security +checks and hooks done. Both the current and the proposed sets of credentials +are available for this purpose as current_cred() will return the current set +still at this point. + + +When the credential set is ready, it should be committed to the current process +by calling: + + int commit_creds(struct cred *new); + +This will alter various aspects of the credentials and the process, giving the +LSM a chance to do likewise, then it will use rcu_assign_pointer() to actually +commit the new credentials to current->cred, it will release +current->cred_replace_mutex to allow ptrace() to take place, and it will notify +the scheduler and others of the changes. + +This function is guaranteed to return 0, so that it can be tail-called at the +end of such functions as sys_setresuid(). + +Note that this function consumes the caller's reference to the new credentials. +The caller should _not_ call put_cred() on the new credentials afterwards. + +Furthermore, once this function has been called on a new set of credentials, +those credentials may _not_ be changed further. + + +Should the security checks fail or some other error occur after prepare_creds() +has been called, then the following function should be invoked: + + void abort_creds(struct cred *new); + +This releases the lock on current->cred_replace_mutex that prepare_creds() got +and then releases the new credentials. + + +A typical credentials alteration function would look something like this: + + int alter_suid(uid_t suid) + { + struct cred *new; + int ret; + + new = prepare_creds(); + if (!new) + return -ENOMEM; + + new->suid = suid; + ret = security_alter_suid(new); + if (ret < 0) { + abort_creds(new); + return ret; + } + + return commit_creds(new); + } + + +MANAGING CREDENTIALS +-------------------- + +There are some functions to help manage credentials: + + (*) void put_cred(const struct cred *cred); + + This releases a reference to the given set of credentials. If the + reference count reaches zero, the credentials will be scheduled for + destruction by the RCU system. + + (*) const struct cred *get_cred(const struct cred *cred); + + This gets a reference on a live set of credentials, returning a pointer to + that set of credentials. + + (*) struct cred *get_new_cred(struct cred *cred); + + This gets a reference on a set of credentials that is under construction + and is thus still mutable, returning a pointer to that set of credentials. + + +===================== +OPEN FILE CREDENTIALS +===================== + +When a new file is opened, a reference is obtained on the opening task's +credentials and this is attached to the file struct as 'f_cred' in place of +'f_uid' and 'f_gid'. Code that used to access file->f_uid and file->f_gid +should now access file->f_cred->fsuid and file->f_cred->fsgid. + +It is safe to access f_cred without the use of RCU or locking because the +pointer will not change over the lifetime of the file struct, and nor will the +contents of the cred struct pointed to, barring the exceptions listed above +(see the Task Credentials section). + + +======================================= +OVERRIDING THE VFS'S USE OF CREDENTIALS +======================================= + +Under some circumstances it is desirable to override the credentials used by +the VFS, and that can be done by calling into such as vfs_mkdir() with a +different set of credentials. This is done in the following places: + + (*) sys_faccessat(). + + (*) do_coredump(). + + (*) nfs4recover.c. diff --git a/Documentation/security/keys-request-key.txt b/Documentation/security/keys-request-key.txt new file mode 100644 index 000000000000..51987bfecfed --- /dev/null +++ b/Documentation/security/keys-request-key.txt @@ -0,0 +1,202 @@ + =================== + KEY REQUEST SERVICE + =================== + +The key request service is part of the key retention service (refer to +Documentation/security/keys.txt). This document explains more fully how +the requesting algorithm works. + +The process starts by either the kernel requesting a service by calling +request_key*(): + + struct key *request_key(const struct key_type *type, + const char *description, + const char *callout_info); + +or: + + struct key *request_key_with_auxdata(const struct key_type *type, + const char *description, + const char *callout_info, + size_t callout_len, + void *aux); + +or: + + struct key *request_key_async(const struct key_type *type, + const char *description, + const char *callout_info, + size_t callout_len); + +or: + + struct key *request_key_async_with_auxdata(const struct key_type *type, + const char *description, + const char *callout_info, + size_t callout_len, + void *aux); + +Or by userspace invoking the request_key system call: + + key_serial_t request_key(const char *type, + const char *description, + const char *callout_info, + key_serial_t dest_keyring); + +The main difference between the access points is that the in-kernel interface +does not need to link the key to a keyring to prevent it from being immediately +destroyed. The kernel interface returns a pointer directly to the key, and +it's up to the caller to destroy the key. + +The request_key*_with_auxdata() calls are like the in-kernel request_key*() +calls, except that they permit auxiliary data to be passed to the upcaller (the +default is NULL). This is only useful for those key types that define their +own upcall mechanism rather than using /sbin/request-key. + +The two async in-kernel calls may return keys that are still in the process of +being constructed. The two non-async ones will wait for construction to +complete first. + +The userspace interface links the key to a keyring associated with the process +to prevent the key from going away, and returns the serial number of the key to +the caller. + + +The following example assumes that the key types involved don't define their +own upcall mechanisms. If they do, then those should be substituted for the +forking and execution of /sbin/request-key. + + +=========== +THE PROCESS +=========== + +A request proceeds in the following manner: + + (1) Process A calls request_key() [the userspace syscall calls the kernel + interface]. + + (2) request_key() searches the process's subscribed keyrings to see if there's + a suitable key there. If there is, it returns the key. If there isn't, + and callout_info is not set, an error is returned. Otherwise the process + proceeds to the next step. + + (3) request_key() sees that A doesn't have the desired key yet, so it creates + two things: + + (a) An uninstantiated key U of requested type and description. + + (b) An authorisation key V that refers to key U and notes that process A + is the context in which key U should be instantiated and secured, and + from which associated key requests may be satisfied. + + (4) request_key() then forks and executes /sbin/request-key with a new session + keyring that contains a link to auth key V. + + (5) /sbin/request-key assumes the authority associated with key U. + + (6) /sbin/request-key execs an appropriate program to perform the actual + instantiation. + + (7) The program may want to access another key from A's context (say a + Kerberos TGT key). It just requests the appropriate key, and the keyring + search notes that the session keyring has auth key V in its bottom level. + + This will permit it to then search the keyrings of process A with the + UID, GID, groups and security info of process A as if it was process A, + and come up with key W. + + (8) The program then does what it must to get the data with which to + instantiate key U, using key W as a reference (perhaps it contacts a + Kerberos server using the TGT) and then instantiates key U. + + (9) Upon instantiating key U, auth key V is automatically revoked so that it + may not be used again. + +(10) The program then exits 0 and request_key() deletes key V and returns key + U to the caller. + +This also extends further. If key W (step 7 above) didn't exist, key W would +be created uninstantiated, another auth key (X) would be created (as per step +3) and another copy of /sbin/request-key spawned (as per step 4); but the +context specified by auth key X will still be process A, as it was in auth key +V. + +This is because process A's keyrings can't simply be attached to +/sbin/request-key at the appropriate places because (a) execve will discard two +of them, and (b) it requires the same UID/GID/Groups all the way through. + + +==================================== +NEGATIVE INSTANTIATION AND REJECTION +==================================== + +Rather than instantiating a key, it is possible for the possessor of an +authorisation key to negatively instantiate a key that's under construction. +This is a short duration placeholder that causes any attempt at re-requesting +the key whilst it exists to fail with error ENOKEY if negated or the specified +error if rejected. + +This is provided to prevent excessive repeated spawning of /sbin/request-key +processes for a key that will never be obtainable. + +Should the /sbin/request-key process exit anything other than 0 or die on a +signal, the key under construction will be automatically negatively +instantiated for a short amount of time. + + +==================== +THE SEARCH ALGORITHM +==================== + +A search of any particular keyring proceeds in the following fashion: + + (1) When the key management code searches for a key (keyring_search_aux) it + firstly calls key_permission(SEARCH) on the keyring it's starting with, + if this denies permission, it doesn't search further. + + (2) It considers all the non-keyring keys within that keyring and, if any key + matches the criteria specified, calls key_permission(SEARCH) on it to see + if the key is allowed to be found. If it is, that key is returned; if + not, the search continues, and the error code is retained if of higher + priority than the one currently set. + + (3) It then considers all the keyring-type keys in the keyring it's currently + searching. It calls key_permission(SEARCH) on each keyring, and if this + grants permission, it recurses, executing steps (2) and (3) on that + keyring. + +The process stops immediately a valid key is found with permission granted to +use it. Any error from a previous match attempt is discarded and the key is +returned. + +When search_process_keyrings() is invoked, it performs the following searches +until one succeeds: + + (1) If extant, the process's thread keyring is searched. + + (2) If extant, the process's process keyring is searched. + + (3) The process's session keyring is searched. + + (4) If the process has assumed the authority associated with a request_key() + authorisation key then: + + (a) If extant, the calling process's thread keyring is searched. + + (b) If extant, the calling process's process keyring is searched. + + (c) The calling process's session keyring is searched. + +The moment one succeeds, all pending errors are discarded and the found key is +returned. + +Only if all these fail does the whole thing fail with the highest priority +error. Note that several errors may have come from LSM. + +The error priority is: + + EKEYREVOKED > EKEYEXPIRED > ENOKEY + +EACCES/EPERM are only returned on a direct search of a specific keyring where +the basal keyring does not grant Search permission. diff --git a/Documentation/security/keys-trusted-encrypted.txt b/Documentation/security/keys-trusted-encrypted.txt new file mode 100644 index 000000000000..8fb79bc1ac4b --- /dev/null +++ b/Documentation/security/keys-trusted-encrypted.txt @@ -0,0 +1,145 @@ + Trusted and Encrypted Keys + +Trusted and Encrypted Keys are two new key types added to the existing kernel +key ring service. Both of these new types are variable length symmetic keys, +and in both cases all keys are created in the kernel, and user space sees, +stores, and loads only encrypted blobs. Trusted Keys require the availability +of a Trusted Platform Module (TPM) chip for greater security, while Encrypted +Keys can be used on any system. All user level blobs, are displayed and loaded +in hex ascii for convenience, and are integrity verified. + +Trusted Keys use a TPM both to generate and to seal the keys. Keys are sealed +under a 2048 bit RSA key in the TPM, and optionally sealed to specified PCR +(integrity measurement) values, and only unsealed by the TPM, if PCRs and blob +integrity verifications match. A loaded Trusted Key can be updated with new +(future) PCR values, so keys are easily migrated to new pcr values, such as +when the kernel and initramfs are updated. The same key can have many saved +blobs under different PCR values, so multiple boots are easily supported. + +By default, trusted keys are sealed under the SRK, which has the default +authorization value (20 zeros). This can be set at takeownership time with the +trouser's utility: "tpm_takeownership -u -z". + +Usage: + keyctl add trusted name "new keylen [options]" ring + keyctl add trusted name "load hex_blob [pcrlock=pcrnum]" ring + keyctl update key "update [options]" + keyctl print keyid + + options: + keyhandle= ascii hex value of sealing key default 0x40000000 (SRK) + keyauth= ascii hex auth for sealing key default 0x00...i + (40 ascii zeros) + blobauth= ascii hex auth for sealed data default 0x00... + (40 ascii zeros) + blobauth= ascii hex auth for sealed data default 0x00... + (40 ascii zeros) + pcrinfo= ascii hex of PCR_INFO or PCR_INFO_LONG (no default) + pcrlock= pcr number to be extended to "lock" blob + migratable= 0|1 indicating permission to reseal to new PCR values, + default 1 (resealing allowed) + +"keyctl print" returns an ascii hex copy of the sealed key, which is in standard +TPM_STORED_DATA format. The key length for new keys are always in bytes. +Trusted Keys can be 32 - 128 bytes (256 - 1024 bits), the upper limit is to fit +within the 2048 bit SRK (RSA) keylength, with all necessary structure/padding. + +Encrypted keys do not depend on a TPM, and are faster, as they use AES for +encryption/decryption. New keys are created from kernel generated random +numbers, and are encrypted/decrypted using a specified 'master' key. The +'master' key can either be a trusted-key or user-key type. The main +disadvantage of encrypted keys is that if they are not rooted in a trusted key, +they are only as secure as the user key encrypting them. The master user key +should therefore be loaded in as secure a way as possible, preferably early in +boot. + +Usage: + keyctl add encrypted name "new key-type:master-key-name keylen" ring + keyctl add encrypted name "load hex_blob" ring + keyctl update keyid "update key-type:master-key-name" + +where 'key-type' is either 'trusted' or 'user'. + +Examples of trusted and encrypted key usage: + +Create and save a trusted key named "kmk" of length 32 bytes: + + $ keyctl add trusted kmk "new 32" @u + 440502848 + + $ keyctl show + Session Keyring + -3 --alswrv 500 500 keyring: _ses + 97833714 --alswrv 500 -1 \_ keyring: _uid.500 + 440502848 --alswrv 500 500 \_ trusted: kmk + + $ keyctl print 440502848 + 0101000000000000000001005d01b7e3f4a6be5709930f3b70a743cbb42e0cc95e18e915 + 3f60da455bbf1144ad12e4f92b452f966929f6105fd29ca28e4d4d5a031d068478bacb0b + 27351119f822911b0a11ba3d3498ba6a32e50dac7f32894dd890eb9ad578e4e292c83722 + a52e56a097e6a68b3f56f7a52ece0cdccba1eb62cad7d817f6dc58898b3ac15f36026fec + d568bd4a706cb60bb37be6d8f1240661199d640b66fb0fe3b079f97f450b9ef9c22c6d5d + dd379f0facd1cd020281dfa3c70ba21a3fa6fc2471dc6d13ecf8298b946f65345faa5ef0 + f1f8fff03ad0acb083725535636addb08d73dedb9832da198081e5deae84bfaf0409c22b + e4a8aea2b607ec96931e6f4d4fe563ba + + $ keyctl pipe 440502848 > kmk.blob + +Load a trusted key from the saved blob: + + $ keyctl add trusted kmk "load `cat kmk.blob`" @u + 268728824 + + $ keyctl print 268728824 + 0101000000000000000001005d01b7e3f4a6be5709930f3b70a743cbb42e0cc95e18e915 + 3f60da455bbf1144ad12e4f92b452f966929f6105fd29ca28e4d4d5a031d068478bacb0b + 27351119f822911b0a11ba3d3498ba6a32e50dac7f32894dd890eb9ad578e4e292c83722 + a52e56a097e6a68b3f56f7a52ece0cdccba1eb62cad7d817f6dc58898b3ac15f36026fec + d568bd4a706cb60bb37be6d8f1240661199d640b66fb0fe3b079f97f450b9ef9c22c6d5d + dd379f0facd1cd020281dfa3c70ba21a3fa6fc2471dc6d13ecf8298b946f65345faa5ef0 + f1f8fff03ad0acb083725535636addb08d73dedb9832da198081e5deae84bfaf0409c22b + e4a8aea2b607ec96931e6f4d4fe563ba + +Reseal a trusted key under new pcr values: + + $ keyctl update 268728824 "update pcrinfo=`cat pcr.blob`" + $ keyctl print 268728824 + 010100000000002c0002800093c35a09b70fff26e7a98ae786c641e678ec6ffb6b46d805 + 77c8a6377aed9d3219c6dfec4b23ffe3000001005d37d472ac8a44023fbb3d18583a4f73 + d3a076c0858f6f1dcaa39ea0f119911ff03f5406df4f7f27f41da8d7194f45c9f4e00f2e + df449f266253aa3f52e55c53de147773e00f0f9aca86c64d94c95382265968c354c5eab4 + 9638c5ae99c89de1e0997242edfb0b501744e11ff9762dfd951cffd93227cc513384e7e6 + e782c29435c7ec2edafaa2f4c1fe6e7a781b59549ff5296371b42133777dcc5b8b971610 + 94bc67ede19e43ddb9dc2baacad374a36feaf0314d700af0a65c164b7082401740e489c9 + 7ef6a24defe4846104209bf0c3eced7fa1a672ed5b125fc9d8cd88b476a658a4434644ef + df8ae9a178e9f83ba9f08d10fa47e4226b98b0702f06b3b8 + +Create and save an encrypted key "evm" using the above trusted key "kmk": + + $ keyctl add encrypted evm "new trusted:kmk 32" @u + 159771175 + + $ keyctl print 159771175 + trusted:kmk 32 2375725ad57798846a9bbd240de8906f006e66c03af53b1b382dbbc55 + be2a44616e4959430436dc4f2a7a9659aa60bb4652aeb2120f149ed197c564e024717c64 + 5972dcb82ab2dde83376d82b2e3c09ffc + + $ keyctl pipe 159771175 > evm.blob + +Load an encrypted key "evm" from saved blob: + + $ keyctl add encrypted evm "load `cat evm.blob`" @u + 831684262 + + $ keyctl print 831684262 + trusted:kmk 32 2375725ad57798846a9bbd240de8906f006e66c03af53b1b382dbbc55 + be2a44616e4959430436dc4f2a7a9659aa60bb4652aeb2120f149ed197c564e024717c64 + 5972dcb82ab2dde83376d82b2e3c09ffc + + +The initial consumer of trusted keys is EVM, which at boot time needs a high +quality symmetric key for HMAC protection of file metadata. The use of a +trusted key provides strong guarantees that the EVM key has not been +compromised by a user level problem, and when sealed to specific boot PCR +values, protects against boot and offline attacks. Other uses for trusted and +encrypted keys, such as for disk and file encryption are anticipated. diff --git a/Documentation/security/keys.txt b/Documentation/security/keys.txt new file mode 100644 index 000000000000..4d75931d2d79 --- /dev/null +++ b/Documentation/security/keys.txt @@ -0,0 +1,1290 @@ + ============================ + KERNEL KEY RETENTION SERVICE + ============================ + +This service allows cryptographic keys, authentication tokens, cross-domain +user mappings, and similar to be cached in the kernel for the use of +filesystems and other kernel services. + +Keyrings are permitted; these are a special type of key that can hold links to +other keys. Processes each have three standard keyring subscriptions that a +kernel service can search for relevant keys. + +The key service can be configured on by enabling: + + "Security options"/"Enable access key retention support" (CONFIG_KEYS) + +This document has the following sections: + + - Key overview + - Key service overview + - Key access permissions + - SELinux support + - New procfs files + - Userspace system call interface + - Kernel services + - Notes on accessing payload contents + - Defining a key type + - Request-key callback service + - Garbage collection + + +============ +KEY OVERVIEW +============ + +In this context, keys represent units of cryptographic data, authentication +tokens, keyrings, etc.. These are represented in the kernel by struct key. + +Each key has a number of attributes: + + - A serial number. + - A type. + - A description (for matching a key in a search). + - Access control information. + - An expiry time. + - A payload. + - State. + + + (*) Each key is issued a serial number of type key_serial_t that is unique for + the lifetime of that key. All serial numbers are positive non-zero 32-bit + integers. + + Userspace programs can use a key's serial numbers as a way to gain access + to it, subject to permission checking. + + (*) Each key is of a defined "type". Types must be registered inside the + kernel by a kernel service (such as a filesystem) before keys of that type + can be added or used. Userspace programs cannot define new types directly. + + Key types are represented in the kernel by struct key_type. This defines a + number of operations that can be performed on a key of that type. + + Should a type be removed from the system, all the keys of that type will + be invalidated. + + (*) Each key has a description. This should be a printable string. The key + type provides an operation to perform a match between the description on a + key and a criterion string. + + (*) Each key has an owner user ID, a group ID and a permissions mask. These + are used to control what a process may do to a key from userspace, and + whether a kernel service will be able to find the key. + + (*) Each key can be set to expire at a specific time by the key type's + instantiation function. Keys can also be immortal. + + (*) Each key can have a payload. This is a quantity of data that represent the + actual "key". In the case of a keyring, this is a list of keys to which + the keyring links; in the case of a user-defined key, it's an arbitrary + blob of data. + + Having a payload is not required; and the payload can, in fact, just be a + value stored in the struct key itself. + + When a key is instantiated, the key type's instantiation function is + called with a blob of data, and that then creates the key's payload in + some way. + + Similarly, when userspace wants to read back the contents of the key, if + permitted, another key type operation will be called to convert the key's + attached payload back into a blob of data. + + (*) Each key can be in one of a number of basic states: + + (*) Uninstantiated. The key exists, but does not have any data attached. + Keys being requested from userspace will be in this state. + + (*) Instantiated. This is the normal state. The key is fully formed, and + has data attached. + + (*) Negative. This is a relatively short-lived state. The key acts as a + note saying that a previous call out to userspace failed, and acts as + a throttle on key lookups. A negative key can be updated to a normal + state. + + (*) Expired. Keys can have lifetimes set. If their lifetime is exceeded, + they traverse to this state. An expired key can be updated back to a + normal state. + + (*) Revoked. A key is put in this state by userspace action. It can't be + found or operated upon (apart from by unlinking it). + + (*) Dead. The key's type was unregistered, and so the key is now useless. + +Keys in the last three states are subject to garbage collection. See the +section on "Garbage collection". + + +==================== +KEY SERVICE OVERVIEW +==================== + +The key service provides a number of features besides keys: + + (*) The key service defines two special key types: + + (+) "keyring" + + Keyrings are special keys that contain a list of other keys. Keyring + lists can be modified using various system calls. Keyrings should not + be given a payload when created. + + (+) "user" + + A key of this type has a description and a payload that are arbitrary + blobs of data. These can be created, updated and read by userspace, + and aren't intended for use by kernel services. + + (*) Each process subscribes to three keyrings: a thread-specific keyring, a + process-specific keyring, and a session-specific keyring. + + The thread-specific keyring is discarded from the child when any sort of + clone, fork, vfork or execve occurs. A new keyring is created only when + required. + + The process-specific keyring is replaced with an empty one in the child on + clone, fork, vfork unless CLONE_THREAD is supplied, in which case it is + shared. execve also discards the process's process keyring and creates a + new one. + + The session-specific keyring is persistent across clone, fork, vfork and + execve, even when the latter executes a set-UID or set-GID binary. A + process can, however, replace its current session keyring with a new one + by using PR_JOIN_SESSION_KEYRING. It is permitted to request an anonymous + new one, or to attempt to create or join one of a specific name. + + The ownership of the thread keyring changes when the real UID and GID of + the thread changes. + + (*) Each user ID resident in the system holds two special keyrings: a user + specific keyring and a default user session keyring. The default session + keyring is initialised with a link to the user-specific keyring. + + When a process changes its real UID, if it used to have no session key, it + will be subscribed to the default session key for the new UID. + + If a process attempts to access its session key when it doesn't have one, + it will be subscribed to the default for its current UID. + + (*) Each user has two quotas against which the keys they own are tracked. One + limits the total number of keys and keyrings, the other limits the total + amount of description and payload space that can be consumed. + + The user can view information on this and other statistics through procfs + files. The root user may also alter the quota limits through sysctl files + (see the section "New procfs files"). + + Process-specific and thread-specific keyrings are not counted towards a + user's quota. + + If a system call that modifies a key or keyring in some way would put the + user over quota, the operation is refused and error EDQUOT is returned. + + (*) There's a system call interface by which userspace programs can create and + manipulate keys and keyrings. + + (*) There's a kernel interface by which services can register types and search + for keys. + + (*) There's a way for the a search done from the kernel to call back to + userspace to request a key that can't be found in a process's keyrings. + + (*) An optional filesystem is available through which the key database can be + viewed and manipulated. + + +====================== +KEY ACCESS PERMISSIONS +====================== + +Keys have an owner user ID, a group access ID, and a permissions mask. The mask +has up to eight bits each for possessor, user, group and other access. Only +six of each set of eight bits are defined. These permissions granted are: + + (*) View + + This permits a key or keyring's attributes to be viewed - including key + type and description. + + (*) Read + + This permits a key's payload to be viewed or a keyring's list of linked + keys. + + (*) Write + + This permits a key's payload to be instantiated or updated, or it allows a + link to be added to or removed from a keyring. + + (*) Search + + This permits keyrings to be searched and keys to be found. Searches can + only recurse into nested keyrings that have search permission set. + + (*) Link + + This permits a key or keyring to be linked to. To create a link from a + keyring to a key, a process must have Write permission on the keyring and + Link permission on the key. + + (*) Set Attribute + + This permits a key's UID, GID and permissions mask to be changed. + +For changing the ownership, group ID or permissions mask, being the owner of +the key or having the sysadmin capability is sufficient. + + +=============== +SELINUX SUPPORT +=============== + +The security class "key" has been added to SELinux so that mandatory access +controls can be applied to keys created within various contexts. This support +is preliminary, and is likely to change quite significantly in the near future. +Currently, all of the basic permissions explained above are provided in SELinux +as well; SELinux is simply invoked after all basic permission checks have been +performed. + +The value of the file /proc/self/attr/keycreate influences the labeling of +newly-created keys. If the contents of that file correspond to an SELinux +security context, then the key will be assigned that context. Otherwise, the +key will be assigned the current context of the task that invoked the key +creation request. Tasks must be granted explicit permission to assign a +particular context to newly-created keys, using the "create" permission in the +key security class. + +The default keyrings associated with users will be labeled with the default +context of the user if and only if the login programs have been instrumented to +properly initialize keycreate during the login process. Otherwise, they will +be labeled with the context of the login program itself. + +Note, however, that the default keyrings associated with the root user are +labeled with the default kernel context, since they are created early in the +boot process, before root has a chance to log in. + +The keyrings associated with new threads are each labeled with the context of +their associated thread, and both session and process keyrings are handled +similarly. + + +================ +NEW PROCFS FILES +================ + +Two files have been added to procfs by which an administrator can find out +about the status of the key service: + + (*) /proc/keys + + This lists the keys that are currently viewable by the task reading the + file, giving information about their type, description and permissions. + It is not possible to view the payload of the key this way, though some + information about it may be given. + + The only keys included in the list are those that grant View permission to + the reading process whether or not it possesses them. Note that LSM + security checks are still performed, and may further filter out keys that + the current process is not authorised to view. + + The contents of the file look like this: + + SERIAL FLAGS USAGE EXPY PERM UID GID TYPE DESCRIPTION: SUMMARY + 00000001 I----- 39 perm 1f3f0000 0 0 keyring _uid_ses.0: 1/4 + 00000002 I----- 2 perm 1f3f0000 0 0 keyring _uid.0: empty + 00000007 I----- 1 perm 1f3f0000 0 0 keyring _pid.1: empty + 0000018d I----- 1 perm 1f3f0000 0 0 keyring _pid.412: empty + 000004d2 I--Q-- 1 perm 1f3f0000 32 -1 keyring _uid.32: 1/4 + 000004d3 I--Q-- 3 perm 1f3f0000 32 -1 keyring _uid_ses.32: empty + 00000892 I--QU- 1 perm 1f000000 0 0 user metal:copper: 0 + 00000893 I--Q-N 1 35s 1f3f0000 0 0 user metal:silver: 0 + 00000894 I--Q-- 1 10h 003f0000 0 0 user metal:gold: 0 + + The flags are: + + I Instantiated + R Revoked + D Dead + Q Contributes to user's quota + U Under construction by callback to userspace + N Negative key + + This file must be enabled at kernel configuration time as it allows anyone + to list the keys database. + + (*) /proc/key-users + + This file lists the tracking data for each user that has at least one key + on the system. Such data includes quota information and statistics: + + [root@andromeda root]# cat /proc/key-users + 0: 46 45/45 1/100 13/10000 + 29: 2 2/2 2/100 40/10000 + 32: 2 2/2 2/100 40/10000 + 38: 2 2/2 2/100 40/10000 + + The format of each line is + : User ID to which this applies + Structure refcount + / Total number of keys and number instantiated + / Key count quota + / Key size quota + + +Four new sysctl files have been added also for the purpose of controlling the +quota limits on keys: + + (*) /proc/sys/kernel/keys/root_maxkeys + /proc/sys/kernel/keys/root_maxbytes + + These files hold the maximum number of keys that root may have and the + maximum total number of bytes of data that root may have stored in those + keys. + + (*) /proc/sys/kernel/keys/maxkeys + /proc/sys/kernel/keys/maxbytes + + These files hold the maximum number of keys that each non-root user may + have and the maximum total number of bytes of data that each of those + users may have stored in their keys. + +Root may alter these by writing each new limit as a decimal number string to +the appropriate file. + + +=============================== +USERSPACE SYSTEM CALL INTERFACE +=============================== + +Userspace can manipulate keys directly through three new syscalls: add_key, +request_key and keyctl. The latter provides a number of functions for +manipulating keys. + +When referring to a key directly, userspace programs should use the key's +serial number (a positive 32-bit integer). However, there are some special +values available for referring to special keys and keyrings that relate to the +process making the call: + + CONSTANT VALUE KEY REFERENCED + ============================== ====== =========================== + KEY_SPEC_THREAD_KEYRING -1 thread-specific keyring + KEY_SPEC_PROCESS_KEYRING -2 process-specific keyring + KEY_SPEC_SESSION_KEYRING -3 session-specific keyring + KEY_SPEC_USER_KEYRING -4 UID-specific keyring + KEY_SPEC_USER_SESSION_KEYRING -5 UID-session keyring + KEY_SPEC_GROUP_KEYRING -6 GID-specific keyring + KEY_SPEC_REQKEY_AUTH_KEY -7 assumed request_key() + authorisation key + + +The main syscalls are: + + (*) Create a new key of given type, description and payload and add it to the + nominated keyring: + + key_serial_t add_key(const char *type, const char *desc, + const void *payload, size_t plen, + key_serial_t keyring); + + If a key of the same type and description as that proposed already exists + in the keyring, this will try to update it with the given payload, or it + will return error EEXIST if that function is not supported by the key + type. The process must also have permission to write to the key to be able + to update it. The new key will have all user permissions granted and no + group or third party permissions. + + Otherwise, this will attempt to create a new key of the specified type and + description, and to instantiate it with the supplied payload and attach it + to the keyring. In this case, an error will be generated if the process + does not have permission to write to the keyring. + + The payload is optional, and the pointer can be NULL if not required by + the type. The payload is plen in size, and plen can be zero for an empty + payload. + + A new keyring can be generated by setting type "keyring", the keyring name + as the description (or NULL) and setting the payload to NULL. + + User defined keys can be created by specifying type "user". It is + recommended that a user defined key's description by prefixed with a type + ID and a colon, such as "krb5tgt:" for a Kerberos 5 ticket granting + ticket. + + Any other type must have been registered with the kernel in advance by a + kernel service such as a filesystem. + + The ID of the new or updated key is returned if successful. + + + (*) Search the process's keyrings for a key, potentially calling out to + userspace to create it. + + key_serial_t request_key(const char *type, const char *description, + const char *callout_info, + key_serial_t dest_keyring); + + This function searches all the process's keyrings in the order thread, + process, session for a matching key. This works very much like + KEYCTL_SEARCH, including the optional attachment of the discovered key to + a keyring. + + If a key cannot be found, and if callout_info is not NULL, then + /sbin/request-key will be invoked in an attempt to obtain a key. The + callout_info string will be passed as an argument to the program. + + See also Documentation/security/keys-request-key.txt. + + +The keyctl syscall functions are: + + (*) Map a special key ID to a real key ID for this process: + + key_serial_t keyctl(KEYCTL_GET_KEYRING_ID, key_serial_t id, + int create); + + The special key specified by "id" is looked up (with the key being created + if necessary) and the ID of the key or keyring thus found is returned if + it exists. + + If the key does not yet exist, the key will be created if "create" is + non-zero; and the error ENOKEY will be returned if "create" is zero. + + + (*) Replace the session keyring this process subscribes to with a new one: + + key_serial_t keyctl(KEYCTL_JOIN_SESSION_KEYRING, const char *name); + + If name is NULL, an anonymous keyring is created attached to the process + as its session keyring, displacing the old session keyring. + + If name is not NULL, if a keyring of that name exists, the process + attempts to attach it as the session keyring, returning an error if that + is not permitted; otherwise a new keyring of that name is created and + attached as the session keyring. + + To attach to a named keyring, the keyring must have search permission for + the process's ownership. + + The ID of the new session keyring is returned if successful. + + + (*) Update the specified key: + + long keyctl(KEYCTL_UPDATE, key_serial_t key, const void *payload, + size_t plen); + + This will try to update the specified key with the given payload, or it + will return error EOPNOTSUPP if that function is not supported by the key + type. The process must also have permission to write to the key to be able + to update it. + + The payload is of length plen, and may be absent or empty as for + add_key(). + + + (*) Revoke a key: + + long keyctl(KEYCTL_REVOKE, key_serial_t key); + + This makes a key unavailable for further operations. Further attempts to + use the key will be met with error EKEYREVOKED, and the key will no longer + be findable. + + + (*) Change the ownership of a key: + + long keyctl(KEYCTL_CHOWN, key_serial_t key, uid_t uid, gid_t gid); + + This function permits a key's owner and group ID to be changed. Either one + of uid or gid can be set to -1 to suppress that change. + + Only the superuser can change a key's owner to something other than the + key's current owner. Similarly, only the superuser can change a key's + group ID to something other than the calling process's group ID or one of + its group list members. + + + (*) Change the permissions mask on a key: + + long keyctl(KEYCTL_SETPERM, key_serial_t key, key_perm_t perm); + + This function permits the owner of a key or the superuser to change the + permissions mask on a key. + + Only bits the available bits are permitted; if any other bits are set, + error EINVAL will be returned. + + + (*) Describe a key: + + long keyctl(KEYCTL_DESCRIBE, key_serial_t key, char *buffer, + size_t buflen); + + This function returns a summary of the key's attributes (but not its + payload data) as a string in the buffer provided. + + Unless there's an error, it always returns the amount of data it could + produce, even if that's too big for the buffer, but it won't copy more + than requested to userspace. If the buffer pointer is NULL then no copy + will take place. + + A process must have view permission on the key for this function to be + successful. + + If successful, a string is placed in the buffer in the following format: + + ;;;; + + Where type and description are strings, uid and gid are decimal, and perm + is hexadecimal. A NUL character is included at the end of the string if + the buffer is sufficiently big. + + This can be parsed with + + sscanf(buffer, "%[^;];%d;%d;%o;%s", type, &uid, &gid, &mode, desc); + + + (*) Clear out a keyring: + + long keyctl(KEYCTL_CLEAR, key_serial_t keyring); + + This function clears the list of keys attached to a keyring. The calling + process must have write permission on the keyring, and it must be a + keyring (or else error ENOTDIR will result). + + + (*) Link a key into a keyring: + + long keyctl(KEYCTL_LINK, key_serial_t keyring, key_serial_t key); + + This function creates a link from the keyring to the key. The process must + have write permission on the keyring and must have link permission on the + key. + + Should the keyring not be a keyring, error ENOTDIR will result; and if the + keyring is full, error ENFILE will result. + + The link procedure checks the nesting of the keyrings, returning ELOOP if + it appears too deep or EDEADLK if the link would introduce a cycle. + + Any links within the keyring to keys that match the new key in terms of + type and description will be discarded from the keyring as the new one is + added. + + + (*) Unlink a key or keyring from another keyring: + + long keyctl(KEYCTL_UNLINK, key_serial_t keyring, key_serial_t key); + + This function looks through the keyring for the first link to the + specified key, and removes it if found. Subsequent links to that key are + ignored. The process must have write permission on the keyring. + + If the keyring is not a keyring, error ENOTDIR will result; and if the key + is not present, error ENOENT will be the result. + + + (*) Search a keyring tree for a key: + + key_serial_t keyctl(KEYCTL_SEARCH, key_serial_t keyring, + const char *type, const char *description, + key_serial_t dest_keyring); + + This searches the keyring tree headed by the specified keyring until a key + is found that matches the type and description criteria. Each keyring is + checked for keys before recursion into its children occurs. + + The process must have search permission on the top level keyring, or else + error EACCES will result. Only keyrings that the process has search + permission on will be recursed into, and only keys and keyrings for which + a process has search permission can be matched. If the specified keyring + is not a keyring, ENOTDIR will result. + + If the search succeeds, the function will attempt to link the found key + into the destination keyring if one is supplied (non-zero ID). All the + constraints applicable to KEYCTL_LINK apply in this case too. + + Error ENOKEY, EKEYREVOKED or EKEYEXPIRED will be returned if the search + fails. On success, the resulting key ID will be returned. + + + (*) Read the payload data from a key: + + long keyctl(KEYCTL_READ, key_serial_t keyring, char *buffer, + size_t buflen); + + This function attempts to read the payload data from the specified key + into the buffer. The process must have read permission on the key to + succeed. + + The returned data will be processed for presentation by the key type. For + instance, a keyring will return an array of key_serial_t entries + representing the IDs of all the keys to which it is subscribed. The user + defined key type will return its data as is. If a key type does not + implement this function, error EOPNOTSUPP will result. + + As much of the data as can be fitted into the buffer will be copied to + userspace if the buffer pointer is not NULL. + + On a successful return, the function will always return the amount of data + available rather than the amount copied. + + + (*) Instantiate a partially constructed key. + + long keyctl(KEYCTL_INSTANTIATE, key_serial_t key, + const void *payload, size_t plen, + key_serial_t keyring); + long keyctl(KEYCTL_INSTANTIATE_IOV, key_serial_t key, + const struct iovec *payload_iov, unsigned ioc, + key_serial_t keyring); + + If the kernel calls back to userspace to complete the instantiation of a + key, userspace should use this call to supply data for the key before the + invoked process returns, or else the key will be marked negative + automatically. + + The process must have write access on the key to be able to instantiate + it, and the key must be uninstantiated. + + If a keyring is specified (non-zero), the key will also be linked into + that keyring, however all the constraints applying in KEYCTL_LINK apply in + this case too. + + The payload and plen arguments describe the payload data as for add_key(). + + The payload_iov and ioc arguments describe the payload data in an iovec + array instead of a single buffer. + + + (*) Negatively instantiate a partially constructed key. + + long keyctl(KEYCTL_NEGATE, key_serial_t key, + unsigned timeout, key_serial_t keyring); + long keyctl(KEYCTL_REJECT, key_serial_t key, + unsigned timeout, unsigned error, key_serial_t keyring); + + If the kernel calls back to userspace to complete the instantiation of a + key, userspace should use this call mark the key as negative before the + invoked process returns if it is unable to fulfil the request. + + The process must have write access on the key to be able to instantiate + it, and the key must be uninstantiated. + + If a keyring is specified (non-zero), the key will also be linked into + that keyring, however all the constraints applying in KEYCTL_LINK apply in + this case too. + + If the key is rejected, future searches for it will return the specified + error code until the rejected key expires. Negating the key is the same + as rejecting the key with ENOKEY as the error code. + + + (*) Set the default request-key destination keyring. + + long keyctl(KEYCTL_SET_REQKEY_KEYRING, int reqkey_defl); + + This sets the default keyring to which implicitly requested keys will be + attached for this thread. reqkey_defl should be one of these constants: + + CONSTANT VALUE NEW DEFAULT KEYRING + ====================================== ====== ======================= + KEY_REQKEY_DEFL_NO_CHANGE -1 No change + KEY_REQKEY_DEFL_DEFAULT 0 Default[1] + KEY_REQKEY_DEFL_THREAD_KEYRING 1 Thread keyring + KEY_REQKEY_DEFL_PROCESS_KEYRING 2 Process keyring + KEY_REQKEY_DEFL_SESSION_KEYRING 3 Session keyring + KEY_REQKEY_DEFL_USER_KEYRING 4 User keyring + KEY_REQKEY_DEFL_USER_SESSION_KEYRING 5 User session keyring + KEY_REQKEY_DEFL_GROUP_KEYRING 6 Group keyring + + The old default will be returned if successful and error EINVAL will be + returned if reqkey_defl is not one of the above values. + + The default keyring can be overridden by the keyring indicated to the + request_key() system call. + + Note that this setting is inherited across fork/exec. + + [1] The default is: the thread keyring if there is one, otherwise + the process keyring if there is one, otherwise the session keyring if + there is one, otherwise the user default session keyring. + + + (*) Set the timeout on a key. + + long keyctl(KEYCTL_SET_TIMEOUT, key_serial_t key, unsigned timeout); + + This sets or clears the timeout on a key. The timeout can be 0 to clear + the timeout or a number of seconds to set the expiry time that far into + the future. + + The process must have attribute modification access on a key to set its + timeout. Timeouts may not be set with this function on negative, revoked + or expired keys. + + + (*) Assume the authority granted to instantiate a key + + long keyctl(KEYCTL_ASSUME_AUTHORITY, key_serial_t key); + + This assumes or divests the authority required to instantiate the + specified key. Authority can only be assumed if the thread has the + authorisation key associated with the specified key in its keyrings + somewhere. + + Once authority is assumed, searches for keys will also search the + requester's keyrings using the requester's security label, UID, GID and + groups. + + If the requested authority is unavailable, error EPERM will be returned, + likewise if the authority has been revoked because the target key is + already instantiated. + + If the specified key is 0, then any assumed authority will be divested. + + The assumed authoritative key is inherited across fork and exec. + + + (*) Get the LSM security context attached to a key. + + long keyctl(KEYCTL_GET_SECURITY, key_serial_t key, char *buffer, + size_t buflen) + + This function returns a string that represents the LSM security context + attached to a key in the buffer provided. + + Unless there's an error, it always returns the amount of data it could + produce, even if that's too big for the buffer, but it won't copy more + than requested to userspace. If the buffer pointer is NULL then no copy + will take place. + + A NUL character is included at the end of the string if the buffer is + sufficiently big. This is included in the returned count. If no LSM is + in force then an empty string will be returned. + + A process must have view permission on the key for this function to be + successful. + + + (*) Install the calling process's session keyring on its parent. + + long keyctl(KEYCTL_SESSION_TO_PARENT); + + This functions attempts to install the calling process's session keyring + on to the calling process's parent, replacing the parent's current session + keyring. + + The calling process must have the same ownership as its parent, the + keyring must have the same ownership as the calling process, the calling + process must have LINK permission on the keyring and the active LSM module + mustn't deny permission, otherwise error EPERM will be returned. + + Error ENOMEM will be returned if there was insufficient memory to complete + the operation, otherwise 0 will be returned to indicate success. + + The keyring will be replaced next time the parent process leaves the + kernel and resumes executing userspace. + + +=============== +KERNEL SERVICES +=============== + +The kernel services for key management are fairly simple to deal with. They can +be broken down into two areas: keys and key types. + +Dealing with keys is fairly straightforward. Firstly, the kernel service +registers its type, then it searches for a key of that type. It should retain +the key as long as it has need of it, and then it should release it. For a +filesystem or device file, a search would probably be performed during the open +call, and the key released upon close. How to deal with conflicting keys due to +two different users opening the same file is left to the filesystem author to +solve. + +To access the key manager, the following header must be #included: + + + +Specific key types should have a header file under include/keys/ that should be +used to access that type. For keys of type "user", for example, that would be: + + + +Note that there are two different types of pointers to keys that may be +encountered: + + (*) struct key * + + This simply points to the key structure itself. Key structures will be at + least four-byte aligned. + + (*) key_ref_t + + This is equivalent to a struct key *, but the least significant bit is set + if the caller "possesses" the key. By "possession" it is meant that the + calling processes has a searchable link to the key from one of its + keyrings. There are three functions for dealing with these: + + key_ref_t make_key_ref(const struct key *key, + unsigned long possession); + + struct key *key_ref_to_ptr(const key_ref_t key_ref); + + unsigned long is_key_possessed(const key_ref_t key_ref); + + The first function constructs a key reference from a key pointer and + possession information (which must be 0 or 1 and not any other value). + + The second function retrieves the key pointer from a reference and the + third retrieves the possession flag. + +When accessing a key's payload contents, certain precautions must be taken to +prevent access vs modification races. See the section "Notes on accessing +payload contents" for more information. + +(*) To search for a key, call: + + struct key *request_key(const struct key_type *type, + const char *description, + const char *callout_info); + + This is used to request a key or keyring with a description that matches + the description specified according to the key type's match function. This + permits approximate matching to occur. If callout_string is not NULL, then + /sbin/request-key will be invoked in an attempt to obtain the key from + userspace. In that case, callout_string will be passed as an argument to + the program. + + Should the function fail error ENOKEY, EKEYEXPIRED or EKEYREVOKED will be + returned. + + If successful, the key will have been attached to the default keyring for + implicitly obtained request-key keys, as set by KEYCTL_SET_REQKEY_KEYRING. + + See also Documentation/security/keys-request-key.txt. + + +(*) To search for a key, passing auxiliary data to the upcaller, call: + + struct key *request_key_with_auxdata(const struct key_type *type, + const char *description, + const void *callout_info, + size_t callout_len, + void *aux); + + This is identical to request_key(), except that the auxiliary data is + passed to the key_type->request_key() op if it exists, and the callout_info + is a blob of length callout_len, if given (the length may be 0). + + +(*) A key can be requested asynchronously by calling one of: + + struct key *request_key_async(const struct key_type *type, + const char *description, + const void *callout_info, + size_t callout_len); + + or: + + struct key *request_key_async_with_auxdata(const struct key_type *type, + const char *description, + const char *callout_info, + size_t callout_len, + void *aux); + + which are asynchronous equivalents of request_key() and + request_key_with_auxdata() respectively. + + These two functions return with the key potentially still under + construction. To wait for construction completion, the following should be + called: + + int wait_for_key_construction(struct key *key, bool intr); + + The function will wait for the key to finish being constructed and then + invokes key_validate() to return an appropriate value to indicate the state + of the key (0 indicates the key is usable). + + If intr is true, then the wait can be interrupted by a signal, in which + case error ERESTARTSYS will be returned. + + +(*) When it is no longer required, the key should be released using: + + void key_put(struct key *key); + + Or: + + void key_ref_put(key_ref_t key_ref); + + These can be called from interrupt context. If CONFIG_KEYS is not set then + the argument will not be parsed. + + +(*) Extra references can be made to a key by calling the following function: + + struct key *key_get(struct key *key); + + These need to be disposed of by calling key_put() when they've been + finished with. The key pointer passed in will be returned. If the pointer + is NULL or CONFIG_KEYS is not set then the key will not be dereferenced and + no increment will take place. + + +(*) A key's serial number can be obtained by calling: + + key_serial_t key_serial(struct key *key); + + If key is NULL or if CONFIG_KEYS is not set then 0 will be returned (in the + latter case without parsing the argument). + + +(*) If a keyring was found in the search, this can be further searched by: + + key_ref_t keyring_search(key_ref_t keyring_ref, + const struct key_type *type, + const char *description) + + This searches the keyring tree specified for a matching key. Error ENOKEY + is returned upon failure (use IS_ERR/PTR_ERR to determine). If successful, + the returned key will need to be released. + + The possession attribute from the keyring reference is used to control + access through the permissions mask and is propagated to the returned key + reference pointer if successful. + + +(*) To check the validity of a key, this function can be called: + + int validate_key(struct key *key); + + This checks that the key in question hasn't expired or and hasn't been + revoked. Should the key be invalid, error EKEYEXPIRED or EKEYREVOKED will + be returned. If the key is NULL or if CONFIG_KEYS is not set then 0 will be + returned (in the latter case without parsing the argument). + + +(*) To register a key type, the following function should be called: + + int register_key_type(struct key_type *type); + + This will return error EEXIST if a type of the same name is already + present. + + +(*) To unregister a key type, call: + + void unregister_key_type(struct key_type *type); + + +Under some circumstances, it may be desirable to deal with a bundle of keys. +The facility provides access to the keyring type for managing such a bundle: + + struct key_type key_type_keyring; + +This can be used with a function such as request_key() to find a specific +keyring in a process's keyrings. A keyring thus found can then be searched +with keyring_search(). Note that it is not possible to use request_key() to +search a specific keyring, so using keyrings in this way is of limited utility. + + +=================================== +NOTES ON ACCESSING PAYLOAD CONTENTS +=================================== + +The simplest payload is just a number in key->payload.value. In this case, +there's no need to indulge in RCU or locking when accessing the payload. + +More complex payload contents must be allocated and a pointer to them set in +key->payload.data. One of the following ways must be selected to access the +data: + + (1) Unmodifiable key type. + + If the key type does not have a modify method, then the key's payload can + be accessed without any form of locking, provided that it's known to be + instantiated (uninstantiated keys cannot be "found"). + + (2) The key's semaphore. + + The semaphore could be used to govern access to the payload and to control + the payload pointer. It must be write-locked for modifications and would + have to be read-locked for general access. The disadvantage of doing this + is that the accessor may be required to sleep. + + (3) RCU. + + RCU must be used when the semaphore isn't already held; if the semaphore + is held then the contents can't change under you unexpectedly as the + semaphore must still be used to serialise modifications to the key. The + key management code takes care of this for the key type. + + However, this means using: + + rcu_read_lock() ... rcu_dereference() ... rcu_read_unlock() + + to read the pointer, and: + + rcu_dereference() ... rcu_assign_pointer() ... call_rcu() + + to set the pointer and dispose of the old contents after a grace period. + Note that only the key type should ever modify a key's payload. + + Furthermore, an RCU controlled payload must hold a struct rcu_head for the + use of call_rcu() and, if the payload is of variable size, the length of + the payload. key->datalen cannot be relied upon to be consistent with the + payload just dereferenced if the key's semaphore is not held. + + +=================== +DEFINING A KEY TYPE +=================== + +A kernel service may want to define its own key type. For instance, an AFS +filesystem might want to define a Kerberos 5 ticket key type. To do this, it +author fills in a key_type struct and registers it with the system. + +Source files that implement key types should include the following header file: + + + +The structure has a number of fields, some of which are mandatory: + + (*) const char *name + + The name of the key type. This is used to translate a key type name + supplied by userspace into a pointer to the structure. + + + (*) size_t def_datalen + + This is optional - it supplies the default payload data length as + contributed to the quota. If the key type's payload is always or almost + always the same size, then this is a more efficient way to do things. + + The data length (and quota) on a particular key can always be changed + during instantiation or update by calling: + + int key_payload_reserve(struct key *key, size_t datalen); + + With the revised data length. Error EDQUOT will be returned if this is not + viable. + + + (*) int (*vet_description)(const char *description); + + This optional method is called to vet a key description. If the key type + doesn't approve of the key description, it may return an error, otherwise + it should return 0. + + + (*) int (*instantiate)(struct key *key, const void *data, size_t datalen); + + This method is called to attach a payload to a key during construction. + The payload attached need not bear any relation to the data passed to this + function. + + If the amount of data attached to the key differs from the size in + keytype->def_datalen, then key_payload_reserve() should be called. + + This method does not have to lock the key in order to attach a payload. + The fact that KEY_FLAG_INSTANTIATED is not set in key->flags prevents + anything else from gaining access to the key. + + It is safe to sleep in this method. + + + (*) int (*update)(struct key *key, const void *data, size_t datalen); + + If this type of key can be updated, then this method should be provided. + It is called to update a key's payload from the blob of data provided. + + key_payload_reserve() should be called if the data length might change + before any changes are actually made. Note that if this succeeds, the type + is committed to changing the key because it's already been altered, so all + memory allocation must be done first. + + The key will have its semaphore write-locked before this method is called, + but this only deters other writers; any changes to the key's payload must + be made under RCU conditions, and call_rcu() must be used to dispose of + the old payload. + + key_payload_reserve() should be called before the changes are made, but + after all allocations and other potentially failing function calls are + made. + + It is safe to sleep in this method. + + + (*) int (*match)(const struct key *key, const void *desc); + + This method is called to match a key against a description. It should + return non-zero if the two match, zero if they don't. + + This method should not need to lock the key in any way. The type and + description can be considered invariant, and the payload should not be + accessed (the key may not yet be instantiated). + + It is not safe to sleep in this method; the caller may hold spinlocks. + + + (*) void (*revoke)(struct key *key); + + This method is optional. It is called to discard part of the payload + data upon a key being revoked. The caller will have the key semaphore + write-locked. + + It is safe to sleep in this method, though care should be taken to avoid + a deadlock against the key semaphore. + + + (*) void (*destroy)(struct key *key); + + This method is optional. It is called to discard the payload data on a key + when it is being destroyed. + + This method does not need to lock the key to access the payload; it can + consider the key as being inaccessible at this time. Note that the key's + type may have been changed before this function is called. + + It is not safe to sleep in this method; the caller may hold spinlocks. + + + (*) void (*describe)(const struct key *key, struct seq_file *p); + + This method is optional. It is called during /proc/keys reading to + summarise a key's description and payload in text form. + + This method will be called with the RCU read lock held. rcu_dereference() + should be used to read the payload pointer if the payload is to be + accessed. key->datalen cannot be trusted to stay consistent with the + contents of the payload. + + The description will not change, though the key's state may. + + It is not safe to sleep in this method; the RCU read lock is held by the + caller. + + + (*) long (*read)(const struct key *key, char __user *buffer, size_t buflen); + + This method is optional. It is called by KEYCTL_READ to translate the + key's payload into something a blob of data for userspace to deal with. + Ideally, the blob should be in the same format as that passed in to the + instantiate and update methods. + + If successful, the blob size that could be produced should be returned + rather than the size copied. + + This method will be called with the key's semaphore read-locked. This will + prevent the key's payload changing. It is not necessary to use RCU locking + when accessing the key's payload. It is safe to sleep in this method, such + as might happen when the userspace buffer is accessed. + + + (*) int (*request_key)(struct key_construction *cons, const char *op, + void *aux); + + This method is optional. If provided, request_key() and friends will + invoke this function rather than upcalling to /sbin/request-key to operate + upon a key of this type. + + The aux parameter is as passed to request_key_async_with_auxdata() and + similar or is NULL otherwise. Also passed are the construction record for + the key to be operated upon and the operation type (currently only + "create"). + + This method is permitted to return before the upcall is complete, but the + following function must be called under all circumstances to complete the + instantiation process, whether or not it succeeds, whether or not there's + an error: + + void complete_request_key(struct key_construction *cons, int error); + + The error parameter should be 0 on success, -ve on error. The + construction record is destroyed by this action and the authorisation key + will be revoked. If an error is indicated, the key under construction + will be negatively instantiated if it wasn't already instantiated. + + If this method returns an error, that error will be returned to the + caller of request_key*(). complete_request_key() must be called prior to + returning. + + The key under construction and the authorisation key can be found in the + key_construction struct pointed to by cons: + + (*) struct key *key; + + The key under construction. + + (*) struct key *authkey; + + The authorisation key. + + +============================ +REQUEST-KEY CALLBACK SERVICE +============================ + +To create a new key, the kernel will attempt to execute the following command +line: + + /sbin/request-key create \ + + + is the key being constructed, and the three keyrings are the process +keyrings from the process that caused the search to be issued. These are +included for two reasons: + + (1) There may be an authentication token in one of the keyrings that is + required to obtain the key, eg: a Kerberos Ticket-Granting Ticket. + + (2) The new key should probably be cached in one of these rings. + +This program should set it UID and GID to those specified before attempting to +access any more keys. It may then look around for a user specific process to +hand the request off to (perhaps a path held in placed in another key by, for +example, the KDE desktop manager). + +The program (or whatever it calls) should finish construction of the key by +calling KEYCTL_INSTANTIATE or KEYCTL_INSTANTIATE_IOV, which also permits it to +cache the key in one of the keyrings (probably the session ring) before +returning. Alternatively, the key can be marked as negative with KEYCTL_NEGATE +or KEYCTL_REJECT; this also permits the key to be cached in one of the +keyrings. + +If it returns with the key remaining in the unconstructed state, the key will +be marked as being negative, it will be added to the session keyring, and an +error will be returned to the key requestor. + +Supplementary information may be provided from whoever or whatever invoked this +service. This will be passed as the parameter. If no such +information was made available, then "-" will be passed as this parameter +instead. + + +Similarly, the kernel may attempt to update an expired or a soon to expire key +by executing: + + /sbin/request-key update \ + + +In this case, the program isn't required to actually attach the key to a ring; +the rings are provided for reference. + + +================== +GARBAGE COLLECTION +================== + +Dead keys (for which the type has been removed) will be automatically unlinked +from those keyrings that point to them and deleted as soon as possible by a +background garbage collector. + +Similarly, revoked and expired keys will be garbage collected, but only after a +certain amount of time has passed. This time is set as a number of seconds in: + + /proc/sys/kernel/keys/gc_delay diff --git a/Documentation/security/tomoyo.txt b/Documentation/security/tomoyo.txt new file mode 100644 index 000000000000..200a2d37cbc8 --- /dev/null +++ b/Documentation/security/tomoyo.txt @@ -0,0 +1,55 @@ +--- What is TOMOYO? --- + +TOMOYO is a name-based MAC extension (LSM module) for the Linux kernel. + +LiveCD-based tutorials are available at +http://tomoyo.sourceforge.jp/1.7/1st-step/ubuntu10.04-live/ +http://tomoyo.sourceforge.jp/1.7/1st-step/centos5-live/ . +Though these tutorials use non-LSM version of TOMOYO, they are useful for you +to know what TOMOYO is. + +--- How to enable TOMOYO? --- + +Build the kernel with CONFIG_SECURITY_TOMOYO=y and pass "security=tomoyo" on +kernel's command line. + +Please see http://tomoyo.sourceforge.jp/2.3/ for details. + +--- Where is documentation? --- + +User <-> Kernel interface documentation is available at +http://tomoyo.sourceforge.jp/2.3/policy-reference.html . + +Materials we prepared for seminars and symposiums are available at +http://sourceforge.jp/projects/tomoyo/docs/?category_id=532&language_id=1 . +Below lists are chosen from three aspects. + +What is TOMOYO? + TOMOYO Linux Overview + http://sourceforge.jp/projects/tomoyo/docs/lca2009-takeda.pdf + TOMOYO Linux: pragmatic and manageable security for Linux + http://sourceforge.jp/projects/tomoyo/docs/freedomhectaipei-tomoyo.pdf + TOMOYO Linux: A Practical Method to Understand and Protect Your Own Linux Box + http://sourceforge.jp/projects/tomoyo/docs/PacSec2007-en-no-demo.pdf + +What can TOMOYO do? + Deep inside TOMOYO Linux + http://sourceforge.jp/projects/tomoyo/docs/lca2009-kumaneko.pdf + The role of "pathname based access control" in security. + http://sourceforge.jp/projects/tomoyo/docs/lfj2008-bof.pdf + +History of TOMOYO? + Realities of Mainlining + http://sourceforge.jp/projects/tomoyo/docs/lfj2008.pdf + +--- What is future plan? --- + +We believe that inode based security and name based security are complementary +and both should be used together. But unfortunately, so far, we cannot enable +multiple LSM modules at the same time. We feel sorry that you have to give up +SELinux/SMACK/AppArmor etc. when you want to use TOMOYO. + +We hope that LSM becomes stackable in future. Meanwhile, you can use non-LSM +version of TOMOYO, available at http://tomoyo.sourceforge.jp/1.7/ . +LSM version of TOMOYO is a subset of non-LSM version of TOMOYO. We are planning +to port non-LSM version's functionalities to LSM versions. diff --git a/Documentation/tomoyo.txt b/Documentation/tomoyo.txt deleted file mode 100644 index 200a2d37cbc8..000000000000 --- a/Documentation/tomoyo.txt +++ /dev/null @@ -1,55 +0,0 @@ ---- What is TOMOYO? --- - -TOMOYO is a name-based MAC extension (LSM module) for the Linux kernel. - -LiveCD-based tutorials are available at -http://tomoyo.sourceforge.jp/1.7/1st-step/ubuntu10.04-live/ -http://tomoyo.sourceforge.jp/1.7/1st-step/centos5-live/ . -Though these tutorials use non-LSM version of TOMOYO, they are useful for you -to know what TOMOYO is. - ---- How to enable TOMOYO? --- - -Build the kernel with CONFIG_SECURITY_TOMOYO=y and pass "security=tomoyo" on -kernel's command line. - -Please see http://tomoyo.sourceforge.jp/2.3/ for details. - ---- Where is documentation? --- - -User <-> Kernel interface documentation is available at -http://tomoyo.sourceforge.jp/2.3/policy-reference.html . - -Materials we prepared for seminars and symposiums are available at -http://sourceforge.jp/projects/tomoyo/docs/?category_id=532&language_id=1 . -Below lists are chosen from three aspects. - -What is TOMOYO? - TOMOYO Linux Overview - http://sourceforge.jp/projects/tomoyo/docs/lca2009-takeda.pdf - TOMOYO Linux: pragmatic and manageable security for Linux - http://sourceforge.jp/projects/tomoyo/docs/freedomhectaipei-tomoyo.pdf - TOMOYO Linux: A Practical Method to Understand and Protect Your Own Linux Box - http://sourceforge.jp/projects/tomoyo/docs/PacSec2007-en-no-demo.pdf - -What can TOMOYO do? - Deep inside TOMOYO Linux - http://sourceforge.jp/projects/tomoyo/docs/lca2009-kumaneko.pdf - The role of "pathname based access control" in security. - http://sourceforge.jp/projects/tomoyo/docs/lfj2008-bof.pdf - -History of TOMOYO? - Realities of Mainlining - http://sourceforge.jp/projects/tomoyo/docs/lfj2008.pdf - ---- What is future plan? --- - -We believe that inode based security and name based security are complementary -and both should be used together. But unfortunately, so far, we cannot enable -multiple LSM modules at the same time. We feel sorry that you have to give up -SELinux/SMACK/AppArmor etc. when you want to use TOMOYO. - -We hope that LSM becomes stackable in future. Meanwhile, you can use non-LSM -version of TOMOYO, available at http://tomoyo.sourceforge.jp/1.7/ . -LSM version of TOMOYO is a subset of non-LSM version of TOMOYO. We are planning -to port non-LSM version's functionalities to LSM versions. -- cgit v1.2.1 From 56e46742e846e4de167dde0e1e1071ace1c882a5 Mon Sep 17 00:00:00 2001 From: Artem Bityutskiy Date: Tue, 17 May 2011 15:15:30 +0300 Subject: UBIFS: switch to dynamic printks Switch to debugging using dynamic printk (pr_debug()). There is no good reason to carry custom debugging prints if there is so cool and powerful generic dynamic printk infrastructure, see Documentation/dynamic-debug-howto.txt. With dynamic printks we can switch on/of individual prints, per-file, per-function and per format messages. This means that instead of doing old-fashioned echo 1 > /sys/module/ubifs/parameters/debug_msgs to enable general messages, we can do: echo 'format "UBIFS DBG gen" +ptlf' > control to enable general messages and additionally ask the dynamic printk infrastructure to print process ID, line number and function name. So there is no reason to keep UBIFS-specific crud if there is more powerful generic thing. Signed-off-by: Artem Bityutskiy --- Documentation/filesystems/ubifs.txt | 25 ++----------------------- 1 file changed, 2 insertions(+), 23 deletions(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/ubifs.txt b/Documentation/filesystems/ubifs.txt index 7d17e5b91ff4..8e4fab639d9c 100644 --- a/Documentation/filesystems/ubifs.txt +++ b/Documentation/filesystems/ubifs.txt @@ -115,28 +115,8 @@ ubi.mtd=0 root=ubi0:rootfs rootfstype=ubifs Module Parameters for Debugging =============================== -When UBIFS has been compiled with debugging enabled, there are 3 module +When UBIFS has been compiled with debugging enabled, there are 2 module parameters that are available to control aspects of testing and debugging. -The parameters are unsigned integers where each bit controls an option. -The parameters are: - -debug_msgs Selects which debug messages to display, as follows: - - Message Type Flag value - - General messages 1 - Journal messages 2 - Mount messages 4 - Commit messages 8 - LEB search messages 16 - Budgeting messages 32 - Garbage collection messages 64 - Tree Node Cache (TNC) messages 128 - LEB properties (lprops) messages 256 - Input/output messages 512 - Log messages 1024 - Scan messages 2048 - Recovery messages 4096 debug_chks Selects extra checks that UBIFS can do while running: @@ -156,8 +136,7 @@ debug_tsts Selects a mode of testing, as follows: Failure mode for recovery testing 4 -For example, set debug_msgs to 5 to display General messages and Mount -messages. +For example, set debug_chks to 3 to enable general and TNC checks. References -- cgit v1.2.1 From ede338f4ce2fb5ee99d18751df32fbd3b10df268 Mon Sep 17 00:00:00 2001 From: Grant Likely Date: Thu, 28 Apr 2011 14:27:23 -0600 Subject: dt: add documentation of ARM dt boot interface v6: typo fixes v5: clarified that dtb should be aligned on a 64 bit boundary in RAM. v3: added details to Documentation/arm/Booting Acked-by: Tony Lindgren Acked-by: Nicolas Pitre Acked-by: Russell King Signed-off-by: Grant Likely --- Documentation/arm/Booting | 33 ++++++++++++++--- Documentation/devicetree/booting-without-of.txt | 48 ++++++++++++++++++++++--- 2 files changed, 73 insertions(+), 8 deletions(-) (limited to 'Documentation') diff --git a/Documentation/arm/Booting b/Documentation/arm/Booting index 76850295af8f..4e686a2ed91e 100644 --- a/Documentation/arm/Booting +++ b/Documentation/arm/Booting @@ -65,13 +65,19 @@ looks at the connected hardware is beyond the scope of this document. The boot loader must ultimately be able to provide a MACH_TYPE_xxx value to the kernel. (see linux/arch/arm/tools/mach-types). - -4. Setup the kernel tagged list -------------------------------- +4. Setup boot data +------------------ Existing boot loaders: OPTIONAL, HIGHLY RECOMMENDED New boot loaders: MANDATORY +The boot loader must provide either a tagged list or a dtb image for +passing configuration data to the kernel. The physical address of the +boot data is passed to the kernel in register r2. + +4a. Setup the kernel tagged list +-------------------------------- + The boot loader must create and initialise the kernel tagged list. A valid tagged list starts with ATAG_CORE and ends with ATAG_NONE. The ATAG_CORE tag may or may not be empty. An empty ATAG_CORE tag @@ -101,6 +107,24 @@ The tagged list must be placed in a region of memory where neither the kernel decompressor nor initrd 'bootp' program will overwrite it. The recommended placement is in the first 16KiB of RAM. +4b. Setup the device tree +------------------------- + +The boot loader must load a device tree image (dtb) into system ram +at a 64bit aligned address and initialize it with the boot data. The +dtb format is documented in Documentation/devicetree/booting-without-of.txt. +The kernel will look for the dtb magic value of 0xd00dfeed at the dtb +physical address to determine if a dtb has been passed instead of a +tagged list. + +The boot loader must pass at a minimum the size and location of the +system memory, and the root filesystem location. The dtb must be +placed in a region of memory where the kernel decompressor will not +overwrite it. The recommended placement is in the first 16KiB of RAM +with the caveat that it may not be located at physical address 0 since +the kernel interprets a value of 0 in r2 to mean neither a tagged list +nor a dtb were passed. + 5. Calling the kernel image --------------------------- @@ -125,7 +149,8 @@ In either case, the following conditions must be met: - CPU register settings r0 = 0, r1 = machine type number discovered in (3) above. - r2 = physical address of tagged list in system RAM. + r2 = physical address of tagged list in system RAM, or + physical address of device tree block (dtb) in system RAM - CPU mode All forms of interrupts must be disabled (IRQs and FIQs) diff --git a/Documentation/devicetree/booting-without-of.txt b/Documentation/devicetree/booting-without-of.txt index 50619a0720a8..7c1329de0596 100644 --- a/Documentation/devicetree/booting-without-of.txt +++ b/Documentation/devicetree/booting-without-of.txt @@ -12,8 +12,9 @@ Table of Contents ================= I - Introduction - 1) Entry point for arch/powerpc - 2) Entry point for arch/x86 + 1) Entry point for arch/arm + 2) Entry point for arch/powerpc + 3) Entry point for arch/x86 II - The DT block format 1) Header @@ -148,7 +149,46 @@ upgrades without significantly impacting the kernel code or cluttering it with special cases. -1) Entry point for arch/powerpc +1) Entry point for arch/arm +--------------------------- + + There is one single entry point to the kernel, at the start + of the kernel image. That entry point supports two calling + conventions. A summary of the interface is described here. A full + description of the boot requirements is documented in + Documentation/arm/Booting + + a) ATAGS interface. Minimal information is passed from firmware + to the kernel with a tagged list of predefined parameters. + + r0 : 0 + + r1 : Machine type number + + r2 : Physical address of tagged list in system RAM + + b) Entry with a flattened device-tree block. Firmware loads the + physical address of the flattened device tree block (dtb) into r2, + r1 is not used, but it is considered good practise to use a valid + machine number as described in Documentation/arm/Booting. + + r0 : 0 + + r1 : Valid machine type number. When using a device tree, + a single machine type number will often be assigned to + represent a class or family of SoCs. + + r2 : physical pointer to the device-tree block + (defined in chapter II) in RAM. Device tree can be located + anywhere in system RAM, but it should be aligned on a 64 bit + boundary. + + The kernel will differentiate between ATAGS and device tree booting by + reading the memory pointed to by r2 and looking for either the flattened + device tree block magic value (0xd00dfeed) or the ATAG_CORE value at + offset 0x4 from r2 (0x54410001). + +2) Entry point for arch/powerpc ------------------------------- There is one single entry point to the kernel, at the start @@ -226,7 +266,7 @@ it with special cases. cannot support both configurations with Book E and configurations with classic Powerpc architectures. -2) Entry point for arch/x86 +3) Entry point for arch/x86 ------------------------------- There is one single 32bit entry point to the kernel at code32_start, -- cgit v1.2.1 From d94ba80ebbea17f036cecb104398fbcd788aa742 Mon Sep 17 00:00:00 2001 From: Richard Cochran Date: Fri, 22 Apr 2011 12:03:08 +0200 Subject: ptp: Added a brand new class driver for ptp clocks. This patch adds an infrastructure for hardware clocks that implement IEEE 1588, the Precision Time Protocol (PTP). A class driver offers a registration method to particular hardware clock drivers. Each clock is presented as a standard POSIX clock. The ancillary clock features are exposed in two different ways, via the sysfs and by a character device. Signed-off-by: Richard Cochran Acked-by: Arnd Bergmann Acked-by: David S. Miller Signed-off-by: John Stultz --- Documentation/ABI/testing/sysfs-ptp | 98 ++++++++++ Documentation/ptp/ptp.txt | 89 +++++++++ Documentation/ptp/testptp.c | 381 ++++++++++++++++++++++++++++++++++++ Documentation/ptp/testptp.mk | 33 ++++ 4 files changed, 601 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-ptp create mode 100644 Documentation/ptp/ptp.txt create mode 100644 Documentation/ptp/testptp.c create mode 100644 Documentation/ptp/testptp.mk (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-ptp b/Documentation/ABI/testing/sysfs-ptp new file mode 100644 index 000000000000..d40d2b550502 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-ptp @@ -0,0 +1,98 @@ +What: /sys/class/ptp/ +Date: September 2010 +Contact: Richard Cochran +Description: + This directory contains files and directories + providing a standardized interface to the ancillary + features of PTP hardware clocks. + +What: /sys/class/ptp/ptpN/ +Date: September 2010 +Contact: Richard Cochran +Description: + This directory contains the attributes of the Nth PTP + hardware clock registered into the PTP class driver + subsystem. + +What: /sys/class/ptp/ptpN/clock_name +Date: September 2010 +Contact: Richard Cochran +Description: + This file contains the name of the PTP hardware clock + as a human readable string. + +What: /sys/class/ptp/ptpN/max_adjustment +Date: September 2010 +Contact: Richard Cochran +Description: + This file contains the PTP hardware clock's maximum + frequency adjustment value (a positive integer) in + parts per billion. + +What: /sys/class/ptp/ptpN/n_alarms +Date: September 2010 +Contact: Richard Cochran +Description: + This file contains the number of periodic or one shot + alarms offer by the PTP hardware clock. + +What: /sys/class/ptp/ptpN/n_external_timestamps +Date: September 2010 +Contact: Richard Cochran +Description: + This file contains the number of external timestamp + channels offered by the PTP hardware clock. + +What: /sys/class/ptp/ptpN/n_periodic_outputs +Date: September 2010 +Contact: Richard Cochran +Description: + This file contains the number of programmable periodic + output channels offered by the PTP hardware clock. + +What: /sys/class/ptp/ptpN/pps_avaiable +Date: September 2010 +Contact: Richard Cochran +Description: + This file indicates whether the PTP hardware clock + supports a Pulse Per Second to the host CPU. Reading + "1" means that the PPS is supported, while "0" means + not supported. + +What: /sys/class/ptp/ptpN/extts_enable +Date: September 2010 +Contact: Richard Cochran +Description: + This write-only file enables or disables external + timestamps. To enable external timestamps, write the + channel index followed by a "1" into the file. + To disable external timestamps, write the channel + index followed by a "0" into the file. + +What: /sys/class/ptp/ptpN/fifo +Date: September 2010 +Contact: Richard Cochran +Description: + This file provides timestamps on external events, in + the form of three integers: channel index, seconds, + and nanoseconds. + +What: /sys/class/ptp/ptpN/period +Date: September 2010 +Contact: Richard Cochran +Description: + This write-only file enables or disables periodic + outputs. To enable a periodic output, write five + integers into the file: channel index, start time + seconds, start time nanoseconds, period seconds, and + period nanoseconds. To disable a periodic output, set + all the seconds and nanoseconds values to zero. + +What: /sys/class/ptp/ptpN/pps_enable +Date: September 2010 +Contact: Richard Cochran +Description: + This write-only file enables or disables delivery of + PPS events to the Linux PPS subsystem. To enable PPS + events, write a "1" into the file. To disable events, + write a "0" into the file. diff --git a/Documentation/ptp/ptp.txt b/Documentation/ptp/ptp.txt new file mode 100644 index 000000000000..ae8fef86b832 --- /dev/null +++ b/Documentation/ptp/ptp.txt @@ -0,0 +1,89 @@ + +* PTP hardware clock infrastructure for Linux + + This patch set introduces support for IEEE 1588 PTP clocks in + Linux. Together with the SO_TIMESTAMPING socket options, this + presents a standardized method for developing PTP user space + programs, synchronizing Linux with external clocks, and using the + ancillary features of PTP hardware clocks. + + A new class driver exports a kernel interface for specific clock + drivers and a user space interface. The infrastructure supports a + complete set of PTP hardware clock functionality. + + + Basic clock operations + - Set time + - Get time + - Shift the clock by a given offset atomically + - Adjust clock frequency + + + Ancillary clock features + - One short or periodic alarms, with signal delivery to user program + - Time stamp external events + - Period output signals configurable from user space + - Synchronization of the Linux system time via the PPS subsystem + +** PTP hardware clock kernel API + + A PTP clock driver registers itself with the class driver. The + class driver handles all of the dealings with user space. The + author of a clock driver need only implement the details of + programming the clock hardware. The clock driver notifies the class + driver of asynchronous events (alarms and external time stamps) via + a simple message passing interface. + + The class driver supports multiple PTP clock drivers. In normal use + cases, only one PTP clock is needed. However, for testing and + development, it can be useful to have more than one clock in a + single system, in order to allow performance comparisons. + +** PTP hardware clock user space API + + The class driver also creates a character device for each + registered clock. User space can use an open file descriptor from + the character device as a POSIX clock id and may call + clock_gettime, clock_settime, and clock_adjtime. These calls + implement the basic clock operations. + + User space programs may control the clock using standardized + ioctls. A program may query, enable, configure, and disable the + ancillary clock features. User space can receive time stamped + events via blocking read() and poll(). One shot and periodic + signals may be configured via the POSIX timer_settime() system + call. + +** Writing clock drivers + + Clock drivers include include/linux/ptp_clock_kernel.h and register + themselves by presenting a 'struct ptp_clock_info' to the + registration method. Clock drivers must implement all of the + functions in the interface. If a clock does not offer a particular + ancillary feature, then the driver should just return -EOPNOTSUPP + from those functions. + + Drivers must ensure that all of the methods in interface are + reentrant. Since most hardware implementations treat the time value + as a 64 bit integer accessed as two 32 bit registers, drivers + should use spin_lock_irqsave/spin_unlock_irqrestore to protect + against concurrent access. This locking cannot be accomplished in + class driver, since the lock may also be needed by the clock + driver's interrupt service routine. + +** Supported hardware + + + Freescale eTSEC gianfar + - 2 Time stamp external triggers, programmable polarity (opt. interrupt) + - 2 Alarm registers (optional interrupt) + - 3 Periodic signals (optional interrupt) + + + National DP83640 + - 6 GPIOs programmable as inputs or outputs + - 6 GPIOs with dedicated functions (LED/JTAG/clock) can also be + used as general inputs or outputs + - GPIO inputs can time stamp external triggers + - GPIO outputs can produce periodic signals + - 1 interrupt pin + + + Intel IXP465 + - Auxiliary Slave/Master Mode Snapshot (optional interrupt) + - Target Time (optional interrupt) diff --git a/Documentation/ptp/testptp.c b/Documentation/ptp/testptp.c new file mode 100644 index 000000000000..f59ded066108 --- /dev/null +++ b/Documentation/ptp/testptp.c @@ -0,0 +1,381 @@ +/* + * PTP 1588 clock support - User space test program + * + * Copyright (C) 2010 OMICRON electronics GmbH + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#define DEVICE "/dev/ptp0" + +#ifndef ADJ_SETOFFSET +#define ADJ_SETOFFSET 0x0100 +#endif + +#ifndef CLOCK_INVALID +#define CLOCK_INVALID -1 +#endif + +/* When glibc offers the syscall, this will go away. */ +#include +static int clock_adjtime(clockid_t id, struct timex *tx) +{ + return syscall(__NR_clock_adjtime, id, tx); +} + +static clockid_t get_clockid(int fd) +{ +#define CLOCKFD 3 +#define FD_TO_CLOCKID(fd) ((~(clockid_t) (fd) << 3) | CLOCKFD) + + return FD_TO_CLOCKID(fd); +} + +static void handle_alarm(int s) +{ + printf("received signal %d\n", s); +} + +static int install_handler(int signum, void (*handler)(int)) +{ + struct sigaction action; + sigset_t mask; + + /* Unblock the signal. */ + sigemptyset(&mask); + sigaddset(&mask, signum); + sigprocmask(SIG_UNBLOCK, &mask, NULL); + + /* Install the signal handler. */ + action.sa_handler = handler; + action.sa_flags = 0; + sigemptyset(&action.sa_mask); + sigaction(signum, &action, NULL); + + return 0; +} + +static long ppb_to_scaled_ppm(int ppb) +{ + /* + * The 'freq' field in the 'struct timex' is in parts per + * million, but with a 16 bit binary fractional field. + * Instead of calculating either one of + * + * scaled_ppm = (ppb / 1000) << 16 [1] + * scaled_ppm = (ppb << 16) / 1000 [2] + * + * we simply use double precision math, in order to avoid the + * truncation in [1] and the possible overflow in [2]. + */ + return (long) (ppb * 65.536); +} + +static void usage(char *progname) +{ + fprintf(stderr, + "usage: %s [options]\n" + " -a val request a one-shot alarm after 'val' seconds\n" + " -A val request a periodic alarm every 'val' seconds\n" + " -c query the ptp clock's capabilities\n" + " -d name device to open\n" + " -e val read 'val' external time stamp events\n" + " -f val adjust the ptp clock frequency by 'val' ppb\n" + " -g get the ptp clock time\n" + " -h prints this message\n" + " -p val enable output with a period of 'val' nanoseconds\n" + " -P val enable or disable (val=1|0) the system clock PPS\n" + " -s set the ptp clock time from the system time\n" + " -S set the system time from the ptp clock time\n" + " -t val shift the ptp clock time by 'val' seconds\n", + progname); +} + +int main(int argc, char *argv[]) +{ + struct ptp_clock_caps caps; + struct ptp_extts_event event; + struct ptp_extts_request extts_request; + struct ptp_perout_request perout_request; + struct timespec ts; + struct timex tx; + + static timer_t timerid; + struct itimerspec timeout; + struct sigevent sigevent; + + char *progname; + int c, cnt, fd; + + char *device = DEVICE; + clockid_t clkid; + int adjfreq = 0x7fffffff; + int adjtime = 0; + int capabilities = 0; + int extts = 0; + int gettime = 0; + int oneshot = 0; + int periodic = 0; + int perout = -1; + int pps = -1; + int settime = 0; + + progname = strrchr(argv[0], '/'); + progname = progname ? 1+progname : argv[0]; + while (EOF != (c = getopt(argc, argv, "a:A:cd:e:f:ghp:P:sSt:v"))) { + switch (c) { + case 'a': + oneshot = atoi(optarg); + break; + case 'A': + periodic = atoi(optarg); + break; + case 'c': + capabilities = 1; + break; + case 'd': + device = optarg; + break; + case 'e': + extts = atoi(optarg); + break; + case 'f': + adjfreq = atoi(optarg); + break; + case 'g': + gettime = 1; + break; + case 'p': + perout = atoi(optarg); + break; + case 'P': + pps = atoi(optarg); + break; + case 's': + settime = 1; + break; + case 'S': + settime = 2; + break; + case 't': + adjtime = atoi(optarg); + break; + case 'h': + usage(progname); + return 0; + case '?': + default: + usage(progname); + return -1; + } + } + + fd = open(device, O_RDWR); + if (fd < 0) { + fprintf(stderr, "opening %s: %s\n", device, strerror(errno)); + return -1; + } + + clkid = get_clockid(fd); + if (CLOCK_INVALID == clkid) { + fprintf(stderr, "failed to read clock id\n"); + return -1; + } + + if (capabilities) { + if (ioctl(fd, PTP_CLOCK_GETCAPS, &caps)) { + perror("PTP_CLOCK_GETCAPS"); + } else { + printf("capabilities:\n" + " %d maximum frequency adjustment (ppb)\n" + " %d programmable alarms\n" + " %d external time stamp channels\n" + " %d programmable periodic signals\n" + " %d pulse per second\n", + caps.max_adj, + caps.n_alarm, + caps.n_ext_ts, + caps.n_per_out, + caps.pps); + } + } + + if (0x7fffffff != adjfreq) { + memset(&tx, 0, sizeof(tx)); + tx.modes = ADJ_FREQUENCY; + tx.freq = ppb_to_scaled_ppm(adjfreq); + if (clock_adjtime(clkid, &tx)) { + perror("clock_adjtime"); + } else { + puts("frequency adjustment okay"); + } + } + + if (adjtime) { + memset(&tx, 0, sizeof(tx)); + tx.modes = ADJ_SETOFFSET; + tx.time.tv_sec = adjtime; + tx.time.tv_usec = 0; + if (clock_adjtime(clkid, &tx) < 0) { + perror("clock_adjtime"); + } else { + puts("time shift okay"); + } + } + + if (gettime) { + if (clock_gettime(clkid, &ts)) { + perror("clock_gettime"); + } else { + printf("clock time: %ld.%09ld or %s", + ts.tv_sec, ts.tv_nsec, ctime(&ts.tv_sec)); + } + } + + if (settime == 1) { + clock_gettime(CLOCK_REALTIME, &ts); + if (clock_settime(clkid, &ts)) { + perror("clock_settime"); + } else { + puts("set time okay"); + } + } + + if (settime == 2) { + clock_gettime(clkid, &ts); + if (clock_settime(CLOCK_REALTIME, &ts)) { + perror("clock_settime"); + } else { + puts("set time okay"); + } + } + + if (extts) { + memset(&extts_request, 0, sizeof(extts_request)); + extts_request.index = 0; + extts_request.flags = PTP_ENABLE_FEATURE; + if (ioctl(fd, PTP_EXTTS_REQUEST, &extts_request)) { + perror("PTP_EXTTS_REQUEST"); + extts = 0; + } else { + puts("external time stamp request okay"); + } + for (; extts; extts--) { + cnt = read(fd, &event, sizeof(event)); + if (cnt != sizeof(event)) { + perror("read"); + break; + } + printf("event index %u at %lld.%09u\n", event.index, + event.t.sec, event.t.nsec); + fflush(stdout); + } + /* Disable the feature again. */ + extts_request.flags = 0; + if (ioctl(fd, PTP_EXTTS_REQUEST, &extts_request)) { + perror("PTP_EXTTS_REQUEST"); + } + } + + if (oneshot) { + install_handler(SIGALRM, handle_alarm); + /* Create a timer. */ + sigevent.sigev_notify = SIGEV_SIGNAL; + sigevent.sigev_signo = SIGALRM; + if (timer_create(clkid, &sigevent, &timerid)) { + perror("timer_create"); + return -1; + } + /* Start the timer. */ + memset(&timeout, 0, sizeof(timeout)); + timeout.it_value.tv_sec = oneshot; + if (timer_settime(timerid, 0, &timeout, NULL)) { + perror("timer_settime"); + return -1; + } + pause(); + timer_delete(timerid); + } + + if (periodic) { + install_handler(SIGALRM, handle_alarm); + /* Create a timer. */ + sigevent.sigev_notify = SIGEV_SIGNAL; + sigevent.sigev_signo = SIGALRM; + if (timer_create(clkid, &sigevent, &timerid)) { + perror("timer_create"); + return -1; + } + /* Start the timer. */ + memset(&timeout, 0, sizeof(timeout)); + timeout.it_interval.tv_sec = periodic; + timeout.it_value.tv_sec = periodic; + if (timer_settime(timerid, 0, &timeout, NULL)) { + perror("timer_settime"); + return -1; + } + while (1) { + pause(); + } + timer_delete(timerid); + } + + if (perout >= 0) { + if (clock_gettime(clkid, &ts)) { + perror("clock_gettime"); + return -1; + } + memset(&perout_request, 0, sizeof(perout_request)); + perout_request.index = 0; + perout_request.start.sec = ts.tv_sec + 2; + perout_request.start.nsec = 0; + perout_request.period.sec = 0; + perout_request.period.nsec = perout; + if (ioctl(fd, PTP_PEROUT_REQUEST, &perout_request)) { + perror("PTP_PEROUT_REQUEST"); + } else { + puts("periodic output request okay"); + } + } + + if (pps != -1) { + int enable = pps ? 1 : 0; + if (ioctl(fd, PTP_ENABLE_PPS, enable)) { + perror("PTP_ENABLE_PPS"); + } else { + puts("pps for system time request okay"); + } + } + + close(fd); + return 0; +} diff --git a/Documentation/ptp/testptp.mk b/Documentation/ptp/testptp.mk new file mode 100644 index 000000000000..4ef2d9755421 --- /dev/null +++ b/Documentation/ptp/testptp.mk @@ -0,0 +1,33 @@ +# PTP 1588 clock support - User space test program +# +# Copyright (C) 2010 OMICRON electronics GmbH +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write to the Free Software +# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + +CC = $(CROSS_COMPILE)gcc +INC = -I$(KBUILD_OUTPUT)/usr/include +CFLAGS = -Wall $(INC) +LDLIBS = -lrt +PROGS = testptp + +all: $(PROGS) + +testptp: testptp.o + +clean: + rm -f testptp.o + +distclean: clean + rm -f $(PROGS) -- cgit v1.2.1 From c78275f366c687b5b3ead3d99fc96d1f02d38a8e Mon Sep 17 00:00:00 2001 From: Richard Cochran Date: Fri, 22 Apr 2011 12:03:54 +0200 Subject: ptp: Added a clock that uses the eTSEC found on the MPC85xx. The eTSEC includes a PTP clock with quite a few features. This patch adds support for the basic clock adjustment functions, plus two external time stamps, one alarm, and the PPS callback. Signed-off-by: Richard Cochran Acked-by: David S. Miller Acked-by: John Stultz Signed-off-by: John Stultz --- .../devicetree/bindings/net/fsl-tsec-phy.txt | 54 ++++++++++++++++++++++ 1 file changed, 54 insertions(+) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt b/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt index edb7ae19e868..2c6be0377f55 100644 --- a/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt +++ b/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt @@ -74,3 +74,57 @@ Example: interrupt-parent = <&mpic>; phy-handle = <&phy0> }; + +* Gianfar PTP clock nodes + +General Properties: + + - compatible Should be "fsl,etsec-ptp" + - reg Offset and length of the register set for the device + - interrupts There should be at least two interrupts. Some devices + have as many as four PTP related interrupts. + +Clock Properties: + + - fsl,tclk-period Timer reference clock period in nanoseconds. + - fsl,tmr-prsc Prescaler, divides the output clock. + - fsl,tmr-add Frequency compensation value. + - fsl,tmr-fiper1 Fixed interval period pulse generator. + - fsl,tmr-fiper2 Fixed interval period pulse generator. + - fsl,max-adj Maximum frequency adjustment in parts per billion. + + These properties set the operational parameters for the PTP + clock. You must choose these carefully for the clock to work right. + Here is how to figure good values: + + TimerOsc = system clock MHz + tclk_period = desired clock period nanoseconds + NominalFreq = 1000 / tclk_period MHz + FreqDivRatio = TimerOsc / NominalFreq (must be greater that 1.0) + tmr_add = ceil(2^32 / FreqDivRatio) + OutputClock = NominalFreq / tmr_prsc MHz + PulseWidth = 1 / OutputClock microseconds + FiperFreq1 = desired frequency in Hz + FiperDiv1 = 1000000 * OutputClock / FiperFreq1 + tmr_fiper1 = tmr_prsc * tclk_period * FiperDiv1 - tclk_period + max_adj = 1000000000 * (FreqDivRatio - 1.0) - 1 + + The calculation for tmr_fiper2 is the same as for tmr_fiper1. The + driver expects that tmr_fiper1 will be correctly set to produce a 1 + Pulse Per Second (PPS) signal, since this will be offered to the PPS + subsystem to synchronize the Linux clock. + +Example: + + ptp_clock@24E00 { + compatible = "fsl,etsec-ptp"; + reg = <0x24E00 0xB0>; + interrupts = <12 0x8 13 0x8>; + interrupt-parent = < &ipic >; + fsl,tclk-period = <10>; + fsl,tmr-prsc = <100>; + fsl,tmr-add = <0x999999A4>; + fsl,tmr-fiper1 = <0x3B9AC9F6>; + fsl,tmr-fiper2 = <0x00018696>; + fsl,max-adj = <659999998>; + }; -- cgit v1.2.1 From e2b0c215c2bd57693af69f7a430585109c02b07f Mon Sep 17 00:00:00 2001 From: Tiger Yang Date: Wed, 2 Mar 2011 19:32:09 +0800 Subject: ocfs2: clean up mount option about atime in ocfs2.txt As ocfs2 supports relatime and strictatime, we need update the relative document. Atime_quantum need work with strictatime, so only show it in procfs when mount with strictatime. Signed-off-by: Tiger Yang Signed-off-by: Joel Becker --- Documentation/filesystems/ocfs2.txt | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/ocfs2.txt b/Documentation/filesystems/ocfs2.txt index 9ed920a8cd79..7618a287aa41 100644 --- a/Documentation/filesystems/ocfs2.txt +++ b/Documentation/filesystems/ocfs2.txt @@ -46,9 +46,15 @@ errors=panic Panic and halt the machine if an error occurs. intr (*) Allow signals to interrupt cluster operations. nointr Do not allow signals to interrupt cluster operations. +noatime Do not update access time. +relatime(*) Update atime if the previous atime is older than + mtime or ctime +strictatime Always update atime, but the minimum update interval + is specified by atime_quantum. atime_quantum=60(*) OCFS2 will not update atime unless this number of seconds has passed since the last update. - Set to zero to always update atime. + Set to zero to always update atime. This option need + work with strictatime. data=ordered (*) All data are forced directly out to the main file system prior to its metadata being committed to the journal. -- cgit v1.2.1 From 69a60c4d177632bd56ae567dc0a082f7119b71c2 Mon Sep 17 00:00:00 2001 From: Amerigo Wang Date: Sun, 1 May 2011 21:34:16 +0800 Subject: ocfs2: remove the /sys/o2cb symlink It is obsoleted since Dec 2005. Signed-off-by: WANG Cong Signed-off-by: Joel Becker --- Documentation/ABI/obsolete/o2cb | 11 ----------- Documentation/ABI/removed/o2cb | 10 ++++++++++ Documentation/feature-removal-schedule.txt | 10 ---------- 3 files changed, 10 insertions(+), 21 deletions(-) delete mode 100644 Documentation/ABI/obsolete/o2cb create mode 100644 Documentation/ABI/removed/o2cb (limited to 'Documentation') diff --git a/Documentation/ABI/obsolete/o2cb b/Documentation/ABI/obsolete/o2cb deleted file mode 100644 index 9c49d8e6c0cc..000000000000 --- a/Documentation/ABI/obsolete/o2cb +++ /dev/null @@ -1,11 +0,0 @@ -What: /sys/o2cb symlink -Date: Dec 2005 -KernelVersion: 2.6.16 -Contact: ocfs2-devel@oss.oracle.com -Description: This is a symlink: /sys/o2cb to /sys/fs/o2cb. The symlink will - be removed when new versions of ocfs2-tools which know to look - in /sys/fs/o2cb are sufficiently prevalent. Don't code new - software to look here, it should try /sys/fs/o2cb instead. - See Documentation/ABI/stable/o2cb for more information on usage. -Users: ocfs2-tools. It's sufficient to mail proposed changes to - ocfs2-devel@oss.oracle.com. diff --git a/Documentation/ABI/removed/o2cb b/Documentation/ABI/removed/o2cb new file mode 100644 index 000000000000..7f5daa465093 --- /dev/null +++ b/Documentation/ABI/removed/o2cb @@ -0,0 +1,10 @@ +What: /sys/o2cb symlink +Date: May 2011 +KernelVersion: 2.6.40 +Contact: ocfs2-devel@oss.oracle.com +Description: This is a symlink: /sys/o2cb to /sys/fs/o2cb. The symlink is + removed when new versions of ocfs2-tools which know to look + in /sys/fs/o2cb are sufficiently prevalent. Don't code new + software to look here, it should try /sys/fs/o2cb instead. +Users: ocfs2-tools. It's sufficient to mail proposed changes to + ocfs2-devel@oss.oracle.com. diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 95788ad2506c..ff31b1cc50aa 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -262,16 +262,6 @@ Who: Michael Buesch --------------------------- -What: /sys/o2cb symlink -When: January 2010 -Why: /sys/fs/o2cb is the proper location for this information - /sys/o2cb - exists as a symlink for backwards compatibility for old versions of - ocfs2-tools. 2 years should be sufficient time to phase in new versions - which know to look in /sys/fs/o2cb. -Who: ocfs2-devel@oss.oracle.com - ---------------------------- - What: Ability for non root users to shm_get hugetlb pages based on mlock resource limits When: 2.6.31 -- cgit v1.2.1 From bdebd4892e05cc9068659f25af33c6b322034eb2 Mon Sep 17 00:00:00 2001 From: Arnaud Lacombe Date: Sun, 15 May 2011 23:22:56 -0400 Subject: kconfig: do not record timestamp in .config Signed-off-by: Arnaud Lacombe Signed-off-by: Michal Marek --- Documentation/kbuild/kconfig.txt | 5 ----- 1 file changed, 5 deletions(-) (limited to 'Documentation') diff --git a/Documentation/kbuild/kconfig.txt b/Documentation/kbuild/kconfig.txt index cca46b1a0f6c..c313d71324b4 100644 --- a/Documentation/kbuild/kconfig.txt +++ b/Documentation/kbuild/kconfig.txt @@ -48,11 +48,6 @@ KCONFIG_OVERWRITECONFIG If you set KCONFIG_OVERWRITECONFIG in the environment, Kconfig will not break symlinks when .config is a symlink to somewhere else. -KCONFIG_NOTIMESTAMP --------------------------------------------------- -If this environment variable exists and is non-null, the timestamp line -in generated .config files is omitted. - ______________________________________________________________________ Environment variables for '{allyes/allmod/allno/rand}config' -- cgit v1.2.1 From e84661aa84e2e003738563f65155d4f12dc474e7 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 20 May 2011 13:45:32 +0000 Subject: xfs: add online discard support Now that we have reliably tracking of deleted extents in a transaction we can easily implement "online" discard support which calls blkdev_issue_discard once a transaction commits. The actual discard is a two stage operation as we first have to mark the busy extent as not available for reuse before we can start the actual discard. Note that we don't bother supporting discard for the non-delaylog mode. Signed-off-by: Christoph Hellwig Signed-off-by: Alex Elder --- Documentation/filesystems/xfs.txt | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'Documentation') diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt index 7bff3e4f35df..3fc0c31a6f5d 100644 --- a/Documentation/filesystems/xfs.txt +++ b/Documentation/filesystems/xfs.txt @@ -39,6 +39,12 @@ When mounting an XFS filesystem, the following options are accepted. drive level write caching to be enabled, for devices that support write barriers. + discard + Issue command to let the block device reclaim space freed by the + filesystem. This is useful for SSD devices, thinly provisioned + LUNs and virtual machine images, but may have a performance + impact. This option is incompatible with the nodelaylog option. + dmapi Enable the DMAPI (Data Management API) event callouts. Use with the "mtpt" option. -- cgit v1.2.1 From 4f788dce0baf44295a8d9708d3f124587158c061 Mon Sep 17 00:00:00 2001 From: adam radford Date: Wed, 11 May 2011 18:35:05 -0700 Subject: [SCSI] megaraid_sas: Version and Changelog update Signed-off-by: Adam Radford Signed-off-by: James Bottomley --- Documentation/scsi/ChangeLog.megaraid_sas | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'Documentation') diff --git a/Documentation/scsi/ChangeLog.megaraid_sas b/Documentation/scsi/ChangeLog.megaraid_sas index 4d9ce73ff730..9ed1d9d96783 100644 --- a/Documentation/scsi/ChangeLog.megaraid_sas +++ b/Documentation/scsi/ChangeLog.megaraid_sas @@ -1,3 +1,17 @@ +Release Date : Wed. May 11, 2011 17:00:00 PST 2010 - + (emaild-id:megaraidlinux@lsi.com) + Adam Radford +Current Version : 00.00.05.38-rc1 +Old Version : 00.00.05.34-rc1 + 1. Remove MSI-X black list, use MFI_REG_STATE.ready.msiEnable. + 2. Remove un-used function megasas_return_cmd_for_smid(). + 3. Check MFI_REG_STATE.fault.resetAdapter in megasas_reset_fusion(). + 4. Disable interrupts/free_irq() in megasas_shutdown(). + 5. Fix bug where AENs could be lost in probe() and resume(). + 6. Convert 6,10,12 byte CDB's to 16 byte CDB for large LBA's for FastPath + IO. + 7. Add 1078 OCR support. +------------------------------------------------------------------------------- Release Date : Thu. Feb 24, 2011 17:00:00 PST 2010 - (emaild-id:megaraidlinux@lsi.com) Adam Radford -- cgit v1.2.1 From 3116c86033079a1d4d4e84c40028f96b614843b8 Mon Sep 17 00:00:00 2001 From: Vikram Narayanan Date: Tue, 24 May 2011 20:58:48 +0200 Subject: i2c/writing-clients: Fix foo_driver.id_table The i2c_device_id structure variable's name is not used in the i2c_driver structure. Signed-off-by: Vikram Narayanan Cc: stable@kernel.org Signed-off-by: Jean Delvare --- Documentation/i2c/writing-clients | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/i2c/writing-clients b/Documentation/i2c/writing-clients index 5ebf5af1d716..5aa53374ea2a 100644 --- a/Documentation/i2c/writing-clients +++ b/Documentation/i2c/writing-clients @@ -38,7 +38,7 @@ static struct i2c_driver foo_driver = { .name = "foo", }, - .id_table = foo_ids, + .id_table = foo_idtable, .probe = foo_probe, .remove = foo_remove, /* if device autodetection is needed: */ -- cgit v1.2.1 From 6e2a851e71e65d4ec5bbc51802c36a61322d792b Mon Sep 17 00:00:00 2001 From: Seth Heasley Date: Tue, 24 May 2011 20:58:49 +0200 Subject: i2c-i801: SMBus patch for Intel Panther Point DeviceIDs This patch adds the SMBus controller DeviceID for the Intel Panther Point PCH. Signed-off-by: Seth Heasley Signed-off-by: Jean Delvare --- Documentation/i2c/busses/i2c-i801 | 1 + 1 file changed, 1 insertion(+) (limited to 'Documentation') diff --git a/Documentation/i2c/busses/i2c-i801 b/Documentation/i2c/busses/i2c-i801 index 6df69765ccb7..2871fd500349 100644 --- a/Documentation/i2c/busses/i2c-i801 +++ b/Documentation/i2c/busses/i2c-i801 @@ -19,6 +19,7 @@ Supported adapters: * Intel 6 Series (PCH) * Intel Patsburg (PCH) * Intel DH89xxCC (PCH) + * Intel Panther Point (PCH) Datasheets: Publicly available at the Intel website On Intel Patsburg and later chipsets, both the normal host SMBus controller -- cgit v1.2.1 From 371a689f64b0da140c3bcd3f55305ffa1c3a58ef Mon Sep 17 00:00:00 2001 From: Andrei Warkentin Date: Mon, 11 Apr 2011 18:10:25 -0500 Subject: mmc: MMC boot partitions support. Allows device MMC boot partitions to be accessed. MMC partitions are treated effectively as separate block devices on the same MMC card. Signed-off-by: Andrei Warkentin Acked-by: Arnd Bergmann Signed-off-by: Chris Ball --- Documentation/mmc/00-INDEX | 2 ++ Documentation/mmc/mmc-dev-attrs.txt | 10 ++++++++++ Documentation/mmc/mmc-dev-parts.txt | 27 +++++++++++++++++++++++++++ 3 files changed, 39 insertions(+) create mode 100644 Documentation/mmc/mmc-dev-parts.txt (limited to 'Documentation') diff --git a/Documentation/mmc/00-INDEX b/Documentation/mmc/00-INDEX index fca586f5b853..93dd7a714075 100644 --- a/Documentation/mmc/00-INDEX +++ b/Documentation/mmc/00-INDEX @@ -2,3 +2,5 @@ - this file mmc-dev-attrs.txt - info on SD and MMC device attributes +mmc-dev-parts.txt + - info on SD and MMC device partitions diff --git a/Documentation/mmc/mmc-dev-attrs.txt b/Documentation/mmc/mmc-dev-attrs.txt index ff2bd685bced..8898a95b41e5 100644 --- a/Documentation/mmc/mmc-dev-attrs.txt +++ b/Documentation/mmc/mmc-dev-attrs.txt @@ -1,3 +1,13 @@ +SD and MMC Block Device Attributes +================================== + +These attributes are defined for the block devices associated with the +SD or MMC device. + +The following attributes are read/write. + + force_ro Enforce read-only access even if write protect switch is off. + SD and MMC Device Attributes ============================ diff --git a/Documentation/mmc/mmc-dev-parts.txt b/Documentation/mmc/mmc-dev-parts.txt new file mode 100644 index 000000000000..2db28b8e662f --- /dev/null +++ b/Documentation/mmc/mmc-dev-parts.txt @@ -0,0 +1,27 @@ +SD and MMC Device Partitions +============================ + +Device partitions are additional logical block devices present on the +SD/MMC device. + +As of this writing, MMC boot partitions as supported and exposed as +/dev/mmcblkXboot0 and /dev/mmcblkXboot1, where X is the index of the +parent /dev/mmcblkX. + +MMC Boot Partitions +=================== + +Read and write access is provided to the two MMC boot partitions. Due to +the sensitive nature of the boot partition contents, which often store +a bootloader or bootloader configuration tables crucial to booting the +platform, write access is disabled by default to reduce the chance of +accidental bricking. + +To enable write access to /dev/mmcblkXbootY, disable the forced read-only +access with: + +echo 0 > /sys/block/mmcblkXbootY/force_ro + +To re-enable read-only access: + +echo 1 > /sys/block/mmcblkXbootY/force_ro -- cgit v1.2.1 From cb87ea28ed9e75a41eb456bfcb547b4e6f10e750 Mon Sep 17 00:00:00 2001 From: John Calixto Date: Tue, 26 Apr 2011 18:56:29 -0400 Subject: mmc: core: Add mmc CMD+ACMD passthrough ioctl Allows appropriately-privileged applications to send CMD (normal) and ACMD (application-specific; preceded with CMD55) commands to cards/devices on the mmc bus. This is primarily useful for enabling the security functionality built in to every SD card. It can also be used as a generic passthrough (e.g. to enable virtual machines to control mmc bus devices directly). However, this use case has not been tested rigorously. Generic passthrough testing was only conducted for a few non-security opcodes to prove the feasibility of the passthrough. Since any opcode can be sent using this passthrough, it is very possible to render the card/device unusable. Applications that use this ioctl must have CAP_SYS_RAWIO. Security commands tested on TI PCIxx12 (SDHCI), Sigma Designs SMP8652 SoC, TI OMAP3621/OMAP3630 SoC, Samsung S5PC110 SoC, Qualcomm MSM7200A SoC. Signed-off-by: John Calixto Reviewed-by: Andrei Warkentin Reviewed-by: Arnd Bergmann Signed-off-by: Chris Ball --- Documentation/ioctl/ioctl-number.txt | 1 + 1 file changed, 1 insertion(+) (limited to 'Documentation') diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt index a0a5d82b6b0b..2a34d822e6d6 100644 --- a/Documentation/ioctl/ioctl-number.txt +++ b/Documentation/ioctl/ioctl-number.txt @@ -304,6 +304,7 @@ Code Seq#(hex) Include File Comments 0xB0 all RATIO devices in development: 0xB1 00-1F PPPoX +0xB3 00 linux/mmc/ioctl.h 0xC0 00-0F linux/usb/iowarrior.h 0xCB 00-1F CBM serial IEC bus in development: -- cgit v1.2.1 From 6a8a98b22b10f1560d5f90aded4a54234b9b2724 Mon Sep 17 00:00:00 2001 From: Jamie Iles Date: Mon, 23 May 2011 10:23:43 +0100 Subject: mtd: kill CONFIG_MTD_PARTITIONS Now that none of the drivers use CONFIG_MTD_PARTITIONS we can remove it from Kconfig and the last remaining uses. Signed-off-by: Jamie Iles Signed-off-by: Artem Bityutskiy Signed-off-by: David Woodhouse --- Documentation/DocBook/mtdnand.tmpl | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/DocBook/mtdnand.tmpl b/Documentation/DocBook/mtdnand.tmpl index 6f242d5dee9a..17910e2052ad 100644 --- a/Documentation/DocBook/mtdnand.tmpl +++ b/Documentation/DocBook/mtdnand.tmpl @@ -189,8 +189,7 @@ static void __iomem *baseaddr; Partition defines If you want to divide your device into partitions, then - enable the configuration switch CONFIG_MTD_PARTITIONS and define - a partitioning scheme suitable to your board. + define a partitioning scheme suitable to your board. #define NUM_PARTITIONS 2 -- cgit v1.2.1 From 5f1e15868871a9da820357e0e3ff6fe89d2a16a6 Mon Sep 17 00:00:00 2001 From: Hans Verkuil Date: Mon, 23 May 2011 07:12:27 -0300 Subject: [media] Media DocBook: fix validation errors Signed-off-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab --- Documentation/DocBook/dvb/dvbproperty.xml | 5 ++++- Documentation/DocBook/v4l/subdev-formats.xml | 10 +++++----- 2 files changed, 9 insertions(+), 6 deletions(-) (limited to 'Documentation') diff --git a/Documentation/DocBook/dvb/dvbproperty.xml b/Documentation/DocBook/dvb/dvbproperty.xml index 52d5e3c7cf6c..b5365f61d69b 100644 --- a/Documentation/DocBook/dvb/dvbproperty.xml +++ b/Documentation/DocBook/dvb/dvbproperty.xml @@ -141,13 +141,15 @@ struct dtv_properties { +
+ Property types On FE_GET_PROPERTY/FE_SET_PROPERTY, the actual action is determined by the dtv_property cmd/data pairs. With one single ioctl, is possible to get/set up to 64 properties. The actual meaning of each property is described on the next sections. -The Available frontend property types are: +The available frontend property types are: #define DTV_UNDEFINED 0 #define DTV_TUNE 1 @@ -193,6 +195,7 @@ get/set up to 64 properties. The actual meaning of each property is described on #define DTV_ISDBT_LAYER_ENABLED 41 #define DTV_ISDBS_TS_ID 42 +
Parameters that are common to all Digital TV standards diff --git a/Documentation/DocBook/v4l/subdev-formats.xml b/Documentation/DocBook/v4l/subdev-formats.xml index a26b10c07857..8d3409d2c632 100644 --- a/Documentation/DocBook/v4l/subdev-formats.xml +++ b/Documentation/DocBook/v4l/subdev-formats.xml @@ -2531,13 +2531,13 @@ _JPEG prefix the format code is made of the following information. - The number of bus samples per entropy encoded byte. - The bus width. + The number of bus samples per entropy encoded byte. + The bus width. + - For instance, for a JPEG baseline process and an 8-bit bus width - the format will be named V4L2_MBUS_FMT_JPEG_1X8. - + For instance, for a JPEG baseline process and an 8-bit bus width + the format will be named V4L2_MBUS_FMT_JPEG_1X8. The following table lists existing JPEG compressed formats. -- cgit v1.2.1 From ee294bedb6d0658c9934ef5d96951a22003c8886 Mon Sep 17 00:00:00 2001 From: Eric Van Hensbergen Date: Wed, 25 May 2011 09:19:27 -0500 Subject: 9p: update Documentation pointers Update documentation pointers to include virtfs publication, 9p RFC as well as updated list of servers and alternative clients. Signed-off-by: Eric Van Hensbergen --- Documentation/filesystems/9p.txt | 29 ++++++++++------------------- 1 file changed, 10 insertions(+), 19 deletions(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/9p.txt b/Documentation/filesystems/9p.txt index b22abba78fed..13de64c7f0ab 100644 --- a/Documentation/filesystems/9p.txt +++ b/Documentation/filesystems/9p.txt @@ -25,6 +25,8 @@ Other applications are described in the following papers: http://xcpu.org/papers/cellfs-talk.pdf * PROSE I/O: Using 9p to enable Application Partitions http://plan9.escet.urjc.es/iwp9/cready/PROSE_iwp9_2006.pdf + * VirtFS: A Virtualization Aware File System pass-through + http://goo.gl/3WPDg USAGE ===== @@ -130,31 +132,20 @@ OPTIONS RESOURCES ========= -Our current recommendation is to use Inferno (http://www.vitanuova.com/nferno/index.html) -as the 9p server. You can start a 9p server under Inferno by issuing the -following command: - ; styxlisten -A tcp!*!564 export '#U*' +Protocol specifications are maintained on github: +http://ericvh.github.com/9p-rfc/ -The -A specifies an unauthenticated export. The 564 is the port # (you may -have to choose a higher port number if running as a normal user). The '#U*' -specifies exporting the root of the Linux name space. You may specify a -subset of the namespace by extending the path: '#U*'/tmp would just export -/tmp. For more information, see the Inferno manual pages covering styxlisten -and export. +9p client and server implementations are listed on +http://9p.cat-v.org/implementations -A Linux version of the 9p server is now maintained under the npfs project -on sourceforge (http://sourceforge.net/projects/npfs). The currently -maintained version is the single-threaded version of the server (named spfs) -available from the same SVN repository. +A 9p2000.L server is being developed by LLNL and can be found +at http://code.google.com/p/diod/ There are user and developer mailing lists available through the v9fs project on sourceforge (http://sourceforge.net/projects/v9fs). -A stand-alone version of the module (which should build for any 2.6 kernel) -is available via (http://github.com/ericvh/9p-sac/tree/master) - -News and other information is maintained on SWiK (http://swik.net/v9fs) -and the Wiki (http://sf.net/apps/mediawiki/v9fs/index.php). +News and other information is maintained on a Wiki. +(http://sf.net/apps/mediawiki/v9fs/index.php). Bug reports may be issued through the kernel.org bugzilla (http://bugzilla.kernel.org) -- cgit v1.2.1 From 3d48ae45e72390ddf8cc5256ac32ed6f7a19cbea Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Tue, 24 May 2011 17:12:06 -0700 Subject: mm: Convert i_mmap_lock to a mutex Straightforward conversion of i_mmap_lock to a mutex. Signed-off-by: Peter Zijlstra Acked-by: Hugh Dickins Cc: Benjamin Herrenschmidt Cc: David Miller Cc: Martin Schwidefsky Cc: Russell King Cc: Paul Mundt Cc: Jeff Dike Cc: Richard Weinberger Cc: Tony Luck Cc: KAMEZAWA Hiroyuki Cc: Mel Gorman Cc: KOSAKI Motohiro Cc: Nick Piggin Cc: Namhyung Kim Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/lockstat.txt | 2 +- Documentation/vm/locking | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/lockstat.txt b/Documentation/lockstat.txt index 65f4c795015d..9c0a80d17a23 100644 --- a/Documentation/lockstat.txt +++ b/Documentation/lockstat.txt @@ -136,7 +136,7 @@ View the top contending locks: dcache_lock: 1037 1161 0.38 45.32 774.51 6611 243371 0.15 306.48 77387.24 &inode->i_mutex: 161 286 18446744073709 62882.54 1244614.55 3653 20598 18446744073709 62318.60 1693822.74 &zone->lru_lock: 94 94 0.53 7.33 92.10 4366 32690 0.29 59.81 16350.06 - &inode->i_data.i_mmap_lock: 79 79 0.40 3.77 53.03 11779 87755 0.28 116.93 29898.44 + &inode->i_data.i_mmap_mutex: 79 79 0.40 3.77 53.03 11779 87755 0.28 116.93 29898.44 &q->__queue_lock: 48 50 0.52 31.62 86.31 774 13131 0.17 113.08 12277.52 &rq->rq_lock_key: 43 47 0.74 68.50 170.63 3706 33929 0.22 107.99 17460.62 &rq->rq_lock_key#2: 39 46 0.75 6.68 49.03 2979 32292 0.17 125.17 17137.63 diff --git a/Documentation/vm/locking b/Documentation/vm/locking index 25fadb448760..f61228bd6395 100644 --- a/Documentation/vm/locking +++ b/Documentation/vm/locking @@ -66,7 +66,7 @@ in some cases it is not really needed. Eg, vm_start is modified by expand_stack(), it is hard to come up with a destructive scenario without having the vmlist protection in this case. -The page_table_lock nests with the inode i_mmap_lock and the kmem cache +The page_table_lock nests with the inode i_mmap_mutex and the kmem cache c_spinlock spinlocks. This is okay, since the kmem code asks for pages after dropping c_spinlock. The page_table_lock also nests with pagecache_lock and pagemap_lru_lock spinlocks, and no code asks for memory with these locks -- cgit v1.2.1 From de03c72cfce5b263a674d04348b58475ec50163c Mon Sep 17 00:00:00 2001 From: KOSAKI Motohiro Date: Tue, 24 May 2011 17:12:15 -0700 Subject: mm: convert mm->cpu_vm_cpumask into cpumask_var_t cpumask_t is very big struct and cpu_vm_mask is placed wrong position. It might lead to reduce cache hit ratio. This patch has two change. 1) Move the place of cpumask into last of mm_struct. Because usually cpumask is accessed only front bits when the system has cpu-hotplug capability 2) Convert cpu_vm_mask into cpumask_var_t. It may help to reduce memory footprint if cpumask_size() will use nr_cpumask_bits properly in future. In addition, this patch change the name of cpu_vm_mask with cpu_vm_mask_var. It may help to detect out of tree cpu_vm_mask users. This patch has no functional change. [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: KOSAKI Motohiro Cc: David Howells Cc: Koichi Yasutake Cc: Hugh Dickins Cc: Chris Metcalf Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/cachetlb.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/cachetlb.txt b/Documentation/cachetlb.txt index 9164ae3b83bc..9b728dc17535 100644 --- a/Documentation/cachetlb.txt +++ b/Documentation/cachetlb.txt @@ -16,7 +16,7 @@ on all processors in the system. Don't let this scare you into thinking SMP cache/tlb flushing must be so inefficient, this is in fact an area where many optimizations are possible. For example, if it can be proven that a user address space has never executed -on a cpu (see vma->cpu_vm_mask), one need not perform a flush +on a cpu (see mm_cpumask()), one need not perform a flush for this address space on that cpu. First, the TLB flushing interfaces, since they are the simplest. The -- cgit v1.2.1 From a2c8990aed5ab000491732b07c8c4465d1b389b8 Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Tue, 24 May 2011 17:12:50 -0700 Subject: memsw: remove noswapaccount kernel parameter The noswapaccount parameter has been deprecated since 2.6.38 without any complaints from users so we can remove it. swapaccount=0|1 can be used instead. As we are removing the parameter we can also clean up swapaccount because it doesn't have to accept an empty string anymore (to match noswapaccount) and so we can push = into __setup macro rather than checking "=1" resp. "=0" strings Signed-off-by: Michal Hocko Cc: Hiroyuki Kamezawa Cc: Daisuke Nishimura Cc: Balbir Singh Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/kernel-parameters.txt | 3 --- 1 file changed, 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 7c6624e7a5cb..5438a2d7907f 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1777,9 +1777,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted. nosoftlockup [KNL] Disable the soft-lockup detector. - noswapaccount [KNL] Disable accounting of swap in memory resource - controller. (See Documentation/cgroups/memory.txt) - nosync [HW,M68K] Disables sync negotiation for all devices. notsc [BUGS=X86-32] Disable Time Stamp Counter -- cgit v1.2.1 From 4ff4d8d342fe25c4d1106fb0ffdd310a43d0ace0 Mon Sep 17 00:00:00 2001 From: Nolan Leake Date: Tue, 24 May 2011 17:13:02 -0700 Subject: um: add ucast ethernet transport The ucast transport is similar to the mcast transport (and, in fact, shares most of its code), only it uses UDP unicast to move packets. Obviously this is only useful for point-to-point connections between virtual ethernet devices. Signed-off-by: Nolan Leake Signed-off-by: Richard Weinberger Cc: David Miller Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/virtual/uml/UserModeLinux-HOWTO.txt | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'Documentation') diff --git a/Documentation/virtual/uml/UserModeLinux-HOWTO.txt b/Documentation/virtual/uml/UserModeLinux-HOWTO.txt index 9b7e1904db1c..5d0fc8bfcdb9 100644 --- a/Documentation/virtual/uml/UserModeLinux-HOWTO.txt +++ b/Documentation/virtual/uml/UserModeLinux-HOWTO.txt @@ -1182,6 +1182,16 @@ forge.net/> and explains these in detail, as well as some other issues. + There is also a related point-to-point only "ucast" transport. + This is useful when your network does not support multicast, and + all network connections are simple point to point links. + + The full set of command line options for this transport are + + + ethn=ucast,ethernet address,remote address,listen port,remote port + + 66..66.. TTUUNN//TTAAPP wwiitthh tthhee uummll__nneett hheellppeerr -- cgit v1.2.1 From 4b060420a596095869a6d7849caa798d23839cd1 Mon Sep 17 00:00:00 2001 From: Mike Travis Date: Tue, 24 May 2011 17:13:12 -0700 Subject: bitmap, irq: add smp_affinity_list interface to /proc/irq Manually adjusting the smp_affinity for IRQ's becomes unwieldy when the cpu count is large. Setting smp affinity to cpus 256 to 263 would be: echo 000000ff,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > smp_affinity instead of: echo 256-263 > smp_affinity_list Think about what it looks like for cpus around say, 4088 to 4095. We already have many alternate "list" interfaces: /sys/devices/system/cpu/cpuX/indexY/shared_cpu_list /sys/devices/system/cpu/cpuX/topology/thread_siblings_list /sys/devices/system/cpu/cpuX/topology/core_siblings_list /sys/devices/system/node/nodeX/cpulist /sys/devices/pci***/***/local_cpulist Add a companion interface, smp_affinity_list to use cpu lists instead of cpu maps. This conforms to other companion interfaces where both a map and a list interface exists. This required adding a bitmap_parselist_user() function in a manner similar to the bitmap_parse_user() function. [akpm@linux-foundation.org: make __bitmap_parselist() static] Signed-off-by: Mike Travis Cc: Thomas Gleixner Cc: Jack Steiner Cc: Lee Schermerhorn Cc: Andy Shevchenko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/IRQ-affinity.txt | 17 +++++++++++++---- Documentation/filesystems/proc.txt | 11 +++++++++-- 2 files changed, 22 insertions(+), 6 deletions(-) (limited to 'Documentation') diff --git a/Documentation/IRQ-affinity.txt b/Documentation/IRQ-affinity.txt index b4a615b78403..7890fae18529 100644 --- a/Documentation/IRQ-affinity.txt +++ b/Documentation/IRQ-affinity.txt @@ -4,10 +4,11 @@ ChangeLog: SMP IRQ affinity -/proc/irq/IRQ#/smp_affinity specifies which target CPUs are permitted -for a given IRQ source. It's a bitmask of allowed CPUs. It's not allowed -to turn off all CPUs, and if an IRQ controller does not support IRQ -affinity then the value will not change from the default 0xffffffff. +/proc/irq/IRQ#/smp_affinity and /proc/irq/IRQ#/smp_affinity_list specify +which target CPUs are permitted for a given IRQ source. It's a bitmask +(smp_affinity) or cpu list (smp_affinity_list) of allowed CPUs. It's not +allowed to turn off all CPUs, and if an IRQ controller does not support +IRQ affinity then the value will not change from the default of all cpus. /proc/irq/default_smp_affinity specifies default affinity mask that applies to all non-active IRQs. Once IRQ is allocated/activated its affinity bitmask @@ -54,3 +55,11 @@ round-trip min/avg/max = 0.1/0.5/585.4 ms This time around IRQ44 was delivered only to the last four processors. i.e counters for the CPU0-3 did not change. +Here is an example of limiting that same irq (44) to cpus 1024 to 1031: + +[root@moon 44]# echo 1024-1031 > smp_affinity +[root@moon 44]# cat smp_affinity +1024-1031 + +Note that to do this with a bitmask would require 32 bitmasks of zero +to follow the pertinent one. diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 60740e8ecb37..f48178024067 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -574,6 +574,12 @@ The contents of each smp_affinity file is the same by default: > cat /proc/irq/0/smp_affinity ffffffff +There is an alternate interface, smp_affinity_list which allows specifying +a cpu range instead of a bitmask: + + > cat /proc/irq/0/smp_affinity_list + 1024-1031 + The default_smp_affinity mask applies to all non-active IRQs, which are the IRQs which have not yet been allocated/activated, and hence which lack a /proc/irq/[0-9]* directory. @@ -583,12 +589,13 @@ reports itself as being attached. This hardware locality information does not include information about any possible driver locality preference. prof_cpu_mask specifies which CPUs are to be profiled by the system wide -profiler. Default value is ffffffff (all cpus). +profiler. Default value is ffffffff (all cpus if there are only 32 of them). The way IRQs are routed is handled by the IO-APIC, and it's Round Robin between all the CPUs which are allowed to handle it. As usual the kernel has more info than you and does a better job than you, so the defaults are the -best choice for almost everyone. +best choice for almost everyone. [Note this applies only to those IO-APIC's +that support "Round Robin" interrupt distribution.] There are three more important subdirectories in /proc: net, scsi, and sys. The general rule is that the contents, or even the existence of these -- cgit v1.2.1 From 9e5813111265ad3c37a4370f0ee7e634dc07a7d6 Mon Sep 17 00:00:00 2001 From: Andre Przywara Date: Wed, 25 May 2011 20:43:31 +0200 Subject: hwmon: (k10temp) Add support for Fam15h (Bulldozer) AMDs upcoming CPUs use the same mechanism for the internal temperature reporting as the Fam10h CPUs, so we just needed to add the appropriate PCI-ID to the list. This allows to use the k10temp driver on those CPUs. While at it change the Kconfig entry to be more generic. Signed-off-by: Andre Przywara Acked-by: Clemens Ladisch Signed-off-by: Jean Delvare --- Documentation/hwmon/k10temp | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/hwmon/k10temp b/Documentation/hwmon/k10temp index d2b56a4fd1f5..0393c89277c0 100644 --- a/Documentation/hwmon/k10temp +++ b/Documentation/hwmon/k10temp @@ -11,6 +11,7 @@ Supported chips: Socket S1G2: Athlon (X2), Sempron (X2), Turion X2 (Ultra) * AMD Family 12h processors: "Llano" * AMD Family 14h processors: "Brazos" (C/E/G-Series) +* AMD Family 15h processors: "Bulldozer" Prefix: 'k10temp' Addresses scanned: PCI space @@ -40,7 +41,7 @@ Description ----------- This driver permits reading of the internal temperature sensor of AMD -Family 10h/11h/12h/14h processors. +Family 10h/11h/12h/14h/15h processors. All these processors have a sensor, but on those for Socket F or AM2+, the sensor may return inconsistent values (erratum 319). The driver -- cgit v1.2.1 From 512d1027a6c58def3c2a410e8be65b7e730aad3b Mon Sep 17 00:00:00 2001 From: Andreas Herrmann Date: Wed, 25 May 2011 20:43:31 +0200 Subject: hwmon: Add driver for AMD family 15h processor power information This CPU family provides NB register values to gather following TDP information * ProcessorPwrWatts: Specifies in Watts the maximum amount of power the processor can support. * CurrPwrWatts: Specifies in Watts the current amount of power being consumed by the processor. This driver provides * power1_crit (ProcessorPwrWatts) * power1_input (CurrPwrWatts) Signed-off-by: Andreas Herrmann Signed-off-by: Jean Delvare --- Documentation/hwmon/fam15h_power | 37 +++++++++++++++++++++++++++++++++++++ 1 file changed, 37 insertions(+) create mode 100644 Documentation/hwmon/fam15h_power (limited to 'Documentation') diff --git a/Documentation/hwmon/fam15h_power b/Documentation/hwmon/fam15h_power new file mode 100644 index 000000000000..a92918e0bd69 --- /dev/null +++ b/Documentation/hwmon/fam15h_power @@ -0,0 +1,37 @@ +Kernel driver fam15h_power +========================== + +Supported chips: +* AMD Family 15h Processors + + Prefix: 'fam15h_power' + Addresses scanned: PCI space + Datasheets: + BIOS and Kernel Developer's Guide (BKDG) For AMD Family 15h Processors + (not yet published) + +Author: Andreas Herrmann + +Description +----------- + +This driver permits reading of registers providing power information +of AMD Family 15h processors. + +For AMD Family 15h processors the following power values can be +calculated using different processor northbridge function registers: + +* BasePwrWatts: Specifies in watts the maximum amount of power + consumed by the processor for NB and logic external to the core. +* ProcessorPwrWatts: Specifies in watts the maximum amount of power + the processor can support. +* CurrPwrWatts: Specifies in watts the current amount of power being + consumed by the processor. + +This driver provides ProcessorPwrWatts and CurrPwrWatts: +* power1_crit (ProcessorPwrWatts) +* power1_input (CurrPwrWatts) + +On multi-node processors the calculated value is for the entire +package and not for a single node. Thus the driver creates sysfs +attributes only for internal node0 of a multi-node processor. -- cgit v1.2.1 From 629c58bac082ae091e518187d63249fa7e9f796f Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Wed, 25 May 2011 20:43:32 +0200 Subject: hwmon: (f71882fg) Add support for F71808A Signed-off-by: Hans de Goede Signed-off-by: Jean Delvare --- Documentation/hwmon/f71882fg | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'Documentation') diff --git a/Documentation/hwmon/f71882fg b/Documentation/hwmon/f71882fg index df02245d1419..84d2623810f3 100644 --- a/Documentation/hwmon/f71882fg +++ b/Documentation/hwmon/f71882fg @@ -6,6 +6,10 @@ Supported chips: Prefix: 'f71808e' Addresses scanned: none, address read from Super I/O config space Datasheet: Not public + * Fintek F71808A + Prefix: 'f71808a' + Addresses scanned: none, address read from Super I/O config space + Datasheet: Not public * Fintek F71858FG Prefix: 'f71858fg' Addresses scanned: none, address read from Super I/O config space -- cgit v1.2.1 From 67b670ff04cdff1c9584ecdb22e297956664c9b5 Mon Sep 17 00:00:00 2001 From: Jean Delvare Date: Wed, 25 May 2011 20:43:32 +0200 Subject: hwmon: (max6650) Drop device detection MAX6650 device detection is unreliable, we got reports of false positives. We now have many ways to let users instantiate the devices explicitly, so unreliable detection should be dropped. Signed-off-by: Jean Delvare Acked-by: "Hans J. Koch" Acked-by: Guenter Roeck --- Documentation/hwmon/max6650 | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/hwmon/max6650 b/Documentation/hwmon/max6650 index c565650fcfc6..b26fef5f950e 100644 --- a/Documentation/hwmon/max6650 +++ b/Documentation/hwmon/max6650 @@ -4,7 +4,7 @@ Kernel driver max6650 Supported chips: * Maxim 6650 / 6651 Prefix: 'max6650' - Addresses scanned: I2C 0x1b, 0x1f, 0x48, 0x4b + Addresses scanned: none Datasheet: http://pdfserv.maxim-ic.com/en/ds/MAX6650-MAX6651.pdf Authors: @@ -36,6 +36,13 @@ fan1_div rw sets the speed range the inputs can handle. Legal values are 1, 2, 4, and 8. Use lower values for faster fans. +Usage notes +----------- + +This driver does not auto-detect devices. You will have to instantiate the +devices explicitly. Please see Documentation/i2c/instantiating-devices for +details. + Module parameters ----------------- -- cgit v1.2.1 From 9c084dae5dc7ae0039e330230e70f2a5956e566a Mon Sep 17 00:00:00 2001 From: Jean Delvare Date: Wed, 25 May 2011 20:43:32 +0200 Subject: hwmon: (max6650) Properly support the MAX6650 The MAX6650 has only one fan input. Signed-off-by: Jean Delvare Acked-by: "Hans J. Koch" Acked-by: Guenter Roeck --- Documentation/hwmon/max6650 | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) (limited to 'Documentation') diff --git a/Documentation/hwmon/max6650 b/Documentation/hwmon/max6650 index b26fef5f950e..58d9644a2bde 100644 --- a/Documentation/hwmon/max6650 +++ b/Documentation/hwmon/max6650 @@ -2,10 +2,14 @@ Kernel driver max6650 ===================== Supported chips: - * Maxim 6650 / 6651 + * Maxim MAX6650 Prefix: 'max6650' Addresses scanned: none Datasheet: http://pdfserv.maxim-ic.com/en/ds/MAX6650-MAX6651.pdf + * Maxim MAX6651 + Prefix: 'max6651' + Addresses scanned: none + Datasheet: http://pdfserv.maxim-ic.com/en/ds/MAX6650-MAX6651.pdf Authors: Hans J. Koch @@ -15,10 +19,10 @@ Authors: Description ----------- -This driver implements support for the Maxim 6650/6651 +This driver implements support for the Maxim MAX6650 and MAX6651. -The 2 devices are very similar, but the Maxim 6550 has a reduced feature -set, e.g. only one fan-input, instead of 4 for the 6651. +The 2 devices are very similar, but the MAX6550 has a reduced feature +set, e.g. only one fan-input, instead of 4 for the MAX6651. The driver is not able to distinguish between the 2 devices. -- cgit v1.2.1 From b0b349a85d3df00a40a8bd398e4a151fd8e91bbe Mon Sep 17 00:00:00 2001 From: Jean Delvare Date: Wed, 25 May 2011 20:43:33 +0200 Subject: hwmon: New driver for the SMSC EMC6W201 This is a new driver for the SMSC EMC6W201 hardware monitoring device. The device is functionally close to the EMC6D100 series, but is register-incompatible. Signed-off-by: Jean Delvare Tested-by: Harry G McGavran Jr Tested-by: Jeff Rickman Acked-by: Guenter Roeck --- Documentation/hwmon/emc6w201 | 42 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 42 insertions(+) create mode 100644 Documentation/hwmon/emc6w201 (limited to 'Documentation') diff --git a/Documentation/hwmon/emc6w201 b/Documentation/hwmon/emc6w201 new file mode 100644 index 000000000000..32f355aaf56b --- /dev/null +++ b/Documentation/hwmon/emc6w201 @@ -0,0 +1,42 @@ +Kernel driver emc6w201 +====================== + +Supported chips: + * SMSC EMC6W201 + Prefix: 'emc6w201' + Addresses scanned: I2C 0x2c, 0x2d, 0x2e + Datasheet: Not public + +Author: Jean Delvare + + +Description +----------- + +From the datasheet: + +"The EMC6W201 is an environmental monitoring device with automatic fan +control capability and enhanced system acoustics for noise suppression. +This ACPI compliant device provides hardware monitoring for up to six +voltages (including its own VCC) and five external thermal sensors, +measures the speed of up to five fans, and controls the speed of +multiple DC fans using three Pulse Width Modulator (PWM) outputs. Note +that it is possible to control more than three fans by connecting two +fans to one PWM output. The EMC6W201 will be available in a 36-pin +QFN package." + +The device is functionally close to the EMC6D100 series, but is +register-incompatible. + +The driver currently only supports the monitoring of the voltages, +temperatures and fan speeds. Limits can be changed. Alarms are not +supported, and neither is fan speed control. + + +Known Systems With EMC6W201 +--------------------------- + +The EMC6W201 is a rare device, only found on a few systems, made in +2005 and 2006. Known systems with this device: +* Dell Precision 670 workstation +* Gigabyte 2CEWH mainboard -- cgit v1.2.1 From 46b2903c05b248ed78304113ecfba368b4c55def Mon Sep 17 00:00:00 2001 From: Vinod Koul Date: Wed, 25 May 2011 14:49:20 -0700 Subject: dmaengine: Add API documentation for slave dma usage Signed-off-by: Vinod Koul Signed-off-by: Dan Williams --- Documentation/dmaengine.txt | 97 ++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 96 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/dmaengine.txt b/Documentation/dmaengine.txt index 0c1c2f63c0a9..5a0cb1ef6164 100644 --- a/Documentation/dmaengine.txt +++ b/Documentation/dmaengine.txt @@ -1 +1,96 @@ -See Documentation/crypto/async-tx-api.txt + DMA Engine API Guide + ==================== + + Vinod Koul + +NOTE: For DMA Engine usage in async_tx please see: + Documentation/crypto/async-tx-api.txt + + +Below is a guide to device driver writers on how to use the Slave-DMA API of the +DMA Engine. This is applicable only for slave DMA usage only. + +The slave DMA usage consists of following steps +1. Allocate a DMA slave channel +2. Set slave and controller specific parameters +3. Get a descriptor for transaction +4. Submit the transaction and wait for callback notification + +1. Allocate a DMA slave channel +Channel allocation is slightly different in the slave DMA context, client +drivers typically need a channel from a particular DMA controller only and even +in some cases a specific channel is desired. To request a channel +dma_request_channel() API is used. + +Interface: +struct dma_chan *dma_request_channel(dma_cap_mask_t mask, + dma_filter_fn filter_fn, + void *filter_param); +where dma_filter_fn is defined as: +typedef bool (*dma_filter_fn)(struct dma_chan *chan, void *filter_param); + +When the optional 'filter_fn' parameter is set to NULL dma_request_channel +simply returns the first channel that satisfies the capability mask. Otherwise, +when the mask parameter is insufficient for specifying the necessary channel, +the filter_fn routine can be used to disposition the available channels in the +system. The filter_fn routine is called once for each free channel in the +system. Upon seeing a suitable channel filter_fn returns DMA_ACK which flags +that channel to be the return value from dma_request_channel. A channel +allocated via this interface is exclusive to the caller, until +dma_release_channel() is called. + +2. Set slave and controller specific parameters +Next step is always to pass some specific information to the DMA driver. Most of +the generic information which a slave DMA can use is in struct dma_slave_config. +It allows the clients to specify DMA direction, DMA addresses, bus widths, DMA +burst lengths etc. If some DMA controllers have more parameters to be sent then +they should try to embed struct dma_slave_config in their controller specific +structure. That gives flexibility to client to pass more parameters, if +required. + +Interface: +int dmaengine_slave_config(struct dma_chan *chan, + struct dma_slave_config *config) + +3. Get a descriptor for transaction +For slave usage the various modes of slave transfers supported by the +DMA-engine are: +slave_sg - DMA a list of scatter gather buffers from/to a peripheral +dma_cyclic - Perform a cyclic DMA operation from/to a peripheral till the + operation is explicitly stopped. +The non NULL return of this transfer API represents a "descriptor" for the given +transaction. + +Interface: +struct dma_async_tx_descriptor *(*chan->device->device_prep_dma_sg)( + struct dma_chan *chan, + struct scatterlist *dst_sg, unsigned int dst_nents, + struct scatterlist *src_sg, unsigned int src_nents, + unsigned long flags); +struct dma_async_tx_descriptor *(*chan->device->device_prep_dma_cyclic)( + struct dma_chan *chan, dma_addr_t buf_addr, size_t buf_len, + size_t period_len, enum dma_data_direction direction); + +4. Submit the transaction and wait for callback notification +To schedule the transaction to be scheduled by dma device, the "descriptor" +returned in above (3) needs to be submitted. +To tell the dma driver that a transaction is ready to be serviced, the +descriptor->submit() callback needs to be invoked. This chains the descriptor to +the pending queue. +The transactions in the pending queue can be activated by calling the +issue_pending API. If channel is idle then the first transaction in queue is +started and subsequent ones queued up. +On completion of the DMA operation the next in queue is submitted and a tasklet +triggered. The tasklet would then call the client driver completion callback +routine for notification, if set. +Interface: +void dma_async_issue_pending(struct dma_chan *chan); + +============================================================================== + +Additional usage notes for dma driver writers +1/ Although DMA engine specifies that completion callback routines cannot submit +any new operations, but typically for slave DMA subsequent transaction may not +be available for submit prior to callback routine being called. This requirement +is not a requirement for DMA-slave devices. But they should take care to drop +the spin-lock they might be holding before calling the callback routine -- cgit v1.2.1 From 94265cf5f731c7df29fdfde262ca3e6d51e6828c Mon Sep 17 00:00:00 2001 From: Flavio Leitner Date: Wed, 25 May 2011 08:38:58 +0000 Subject: bonding: documentation and code cleanup for resend_igmp Improves the documentation about how IGMP resend parameter works, fix two missing checks and coding style issues. Signed-off-by: Flavio Leitner Acked-by: Rick Jones Signed-off-by: David S. Miller --- Documentation/networking/bonding.txt | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index 1f45bd887d65..675612ff41ae 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt @@ -770,8 +770,17 @@ resend_igmp a failover event. One membership report is issued immediately after the failover, subsequent packets are sent in each 200ms interval. - The valid range is 0 - 255; the default value is 1. This option - was added for bonding version 3.7.0. + The valid range is 0 - 255; the default value is 1. A value of 0 + prevents the IGMP membership report from being issued in response + to the failover event. + + This option is useful for bonding modes balance-rr (0), active-backup + (1), balance-tlb (5) and balance-alb (6), in which a failover can + switch the IGMP traffic from one slave to another. Therefore a fresh + IGMP report must be issued to cause the switch to forward the incoming + IGMP traffic over the newly selected slave. + + This option was added for bonding version 3.7.0. 3. Configuring Bonding Devices ============================== -- cgit v1.2.1 From cca4acc491ade9280e88a756d5d7d9424d160814 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Wed, 25 May 2011 10:56:58 -0300 Subject: [media] Documentation/DocBook: Rename media fops xml files By convention, the name of the XML file should match the references declared at media-entities.tmpl, as, otherwise, some validation scripts break. Signed-off-by: Mauro Carvalho Chehab --- Documentation/DocBook/media-entities.tmpl | 6 +++--- Documentation/DocBook/v4l/media-controller.xml | 6 +++--- 2 files changed, 6 insertions(+), 6 deletions(-) (limited to 'Documentation') diff --git a/Documentation/DocBook/media-entities.tmpl b/Documentation/DocBook/media-entities.tmpl index c8abb23ef1e7..5c44aa73d57a 100644 --- a/Documentation/DocBook/media-entities.tmpl +++ b/Documentation/DocBook/media-entities.tmpl @@ -373,9 +373,9 @@ - - - + + + diff --git a/Documentation/DocBook/v4l/media-controller.xml b/Documentation/DocBook/v4l/media-controller.xml index 2dc25e1d4089..873ac3a621f0 100644 --- a/Documentation/DocBook/v4l/media-controller.xml +++ b/Documentation/DocBook/v4l/media-controller.xml @@ -78,9 +78,9 @@ Function Reference - &sub-media-open; - &sub-media-close; - &sub-media-ioctl; + &sub-media-func-open; + &sub-media-func-close; + &sub-media-func-ioctl; &sub-media-ioc-device-info; &sub-media-ioc-enum-entities; -- cgit v1.2.1 From 115d2535f8ced13503b62a1275338e09a51681dc Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Wed, 25 May 2011 11:55:57 -0300 Subject: [media] add V4L2-PIX-FMT-SRGGB12 & friends to docbook The xml with those guys are there, but they weren't included on the docbook. Signed-off-by: Mauro Carvalho Chehab --- Documentation/DocBook/media-entities.tmpl | 1 + Documentation/DocBook/v4l/pixfmt.xml | 1 + 2 files changed, 2 insertions(+) (limited to 'Documentation') diff --git a/Documentation/DocBook/media-entities.tmpl b/Documentation/DocBook/media-entities.tmpl index 5c44aa73d57a..e5fe09430fd9 100644 --- a/Documentation/DocBook/media-entities.tmpl +++ b/Documentation/DocBook/media-entities.tmpl @@ -293,6 +293,7 @@ + diff --git a/Documentation/DocBook/v4l/pixfmt.xml b/Documentation/DocBook/v4l/pixfmt.xml index dbfe3b08435f..deb660207f94 100644 --- a/Documentation/DocBook/v4l/pixfmt.xml +++ b/Documentation/DocBook/v4l/pixfmt.xml @@ -673,6 +673,7 @@ access the palette, this must be done with ioctls of the Linux framebuffer API.< &sub-srggb8; &sub-sbggr16; &sub-srggb10; + &sub-srggb12;
-- cgit v1.2.1 From 4fe4746ab694690af9f2ccb80184f5c575917c7f Mon Sep 17 00:00:00 2001 From: Dan Magenheimer Date: Thu, 26 May 2011 10:00:56 -0600 Subject: mm/fs: cleancache documentation This patchset introduces cleancache, an optional new feature exposed by the VFS layer that potentially dramatically increases page cache effectiveness for many workloads in many environments at a negligible cost. It does this by providing an interface to transcendent memory, which is memory/storage that is not otherwise visible to and/or directly addressable by the kernel. Instead of being discarded, hooks in the reclaim code "put" clean pages to cleancache. Filesystems that "opt-in" may "get" pages from cleancache that were previously put, but pages in cleancache are "ephemeral", meaning they may disappear at any time. And the size of cleancache is entirely dynamic and unknowable to the kernel. Filesystems currently supported by this patchset include ext3, ext4, btrfs, and ocfs2. Other filesystems (especially those built entirely on VFS) should be easy to add, but should first be thoroughly tested to ensure coherency. Details and a FAQ are provided in Documentation/vm/cleancache.txt This first patch of eight in this cleancache series only adds two new documentation files. [v8: minor documentation changes by author] [v3: akpm@linux-foundation.org: document sysfs API] [v3: hch@infradead.org: move detailed description to Documentation/vm] Signed-off-by: Dan Magenheimer Reviewed-by: Jeremy Fitzhardinge Reviewed-by: Konrad Rzeszutek Wilk Acked-by: Andrew Morton Acked-by: Randy Dunlap Cc: Al Viro Cc: Matthew Wilcox Cc: Nick Piggin Cc: Mel Gorman Cc: Rik Van Riel Cc: Jan Beulich Cc: Chris Mason Cc: Andreas Dilger Cc: Ted Ts'o Cc: Mark Fasheh Cc: Joel Becker Cc: Nitin Gupta --- .../ABI/testing/sysfs-kernel-mm-cleancache | 11 + Documentation/vm/cleancache.txt | 278 +++++++++++++++++++++ 2 files changed, 289 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-kernel-mm-cleancache create mode 100644 Documentation/vm/cleancache.txt (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-cleancache b/Documentation/ABI/testing/sysfs-kernel-mm-cleancache new file mode 100644 index 000000000000..662ae646ea12 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-kernel-mm-cleancache @@ -0,0 +1,11 @@ +What: /sys/kernel/mm/cleancache/ +Date: April 2011 +Contact: Dan Magenheimer +Description: + /sys/kernel/mm/cleancache/ contains a number of files which + record a count of various cleancache operations + (sum across all filesystems): + succ_gets + failed_gets + puts + flushes diff --git a/Documentation/vm/cleancache.txt b/Documentation/vm/cleancache.txt new file mode 100644 index 000000000000..36c367c73084 --- /dev/null +++ b/Documentation/vm/cleancache.txt @@ -0,0 +1,278 @@ +MOTIVATION + +Cleancache is a new optional feature provided by the VFS layer that +potentially dramatically increases page cache effectiveness for +many workloads in many environments at a negligible cost. + +Cleancache can be thought of as a page-granularity victim cache for clean +pages that the kernel's pageframe replacement algorithm (PFRA) would like +to keep around, but can't since there isn't enough memory. So when the +PFRA "evicts" a page, it first attempts to use cleancache code to +put the data contained in that page into "transcendent memory", memory +that is not directly accessible or addressable by the kernel and is +of unknown and possibly time-varying size. + +Later, when a cleancache-enabled filesystem wishes to access a page +in a file on disk, it first checks cleancache to see if it already +contains it; if it does, the page of data is copied into the kernel +and a disk access is avoided. + +Transcendent memory "drivers" for cleancache are currently implemented +in Xen (using hypervisor memory) and zcache (using in-kernel compressed +memory) and other implementations are in development. + +FAQs are included below. + +IMPLEMENTATION OVERVIEW + +A cleancache "backend" that provides transcendent memory registers itself +to the kernel's cleancache "frontend" by calling cleancache_register_ops, +passing a pointer to a cleancache_ops structure with funcs set appropriately. +Note that cleancache_register_ops returns the previous settings so that +chaining can be performed if desired. The functions provided must conform to +certain semantics as follows: + +Most important, cleancache is "ephemeral". Pages which are copied into +cleancache have an indefinite lifetime which is completely unknowable +by the kernel and so may or may not still be in cleancache at any later time. +Thus, as its name implies, cleancache is not suitable for dirty pages. +Cleancache has complete discretion over what pages to preserve and what +pages to discard and when. + +Mounting a cleancache-enabled filesystem should call "init_fs" to obtain a +pool id which, if positive, must be saved in the filesystem's superblock; +a negative return value indicates failure. A "put_page" will copy a +(presumably about-to-be-evicted) page into cleancache and associate it with +the pool id, a file key, and a page index into the file. (The combination +of a pool id, a file key, and an index is sometimes called a "handle".) +A "get_page" will copy the page, if found, from cleancache into kernel memory. +A "flush_page" will ensure the page no longer is present in cleancache; +a "flush_inode" will flush all pages associated with the specified file; +and, when a filesystem is unmounted, a "flush_fs" will flush all pages in +all files specified by the given pool id and also surrender the pool id. + +An "init_shared_fs", like init_fs, obtains a pool id but tells cleancache +to treat the pool as shared using a 128-bit UUID as a key. On systems +that may run multiple kernels (such as hard partitioned or virtualized +systems) that may share a clustered filesystem, and where cleancache +may be shared among those kernels, calls to init_shared_fs that specify the +same UUID will receive the same pool id, thus allowing the pages to +be shared. Note that any security requirements must be imposed outside +of the kernel (e.g. by "tools" that control cleancache). Or a +cleancache implementation can simply disable shared_init by always +returning a negative value. + +If a get_page is successful on a non-shared pool, the page is flushed (thus +making cleancache an "exclusive" cache). On a shared pool, the page +is NOT flushed on a successful get_page so that it remains accessible to +other sharers. The kernel is responsible for ensuring coherency between +cleancache (shared or not), the page cache, and the filesystem, using +cleancache flush operations as required. + +Note that cleancache must enforce put-put-get coherency and get-get +coherency. For the former, if two puts are made to the same handle but +with different data, say AAA by the first put and BBB by the second, a +subsequent get can never return the stale data (AAA). For get-get coherency, +if a get for a given handle fails, subsequent gets for that handle will +never succeed unless preceded by a successful put with that handle. + +Last, cleancache provides no SMP serialization guarantees; if two +different Linux threads are simultaneously putting and flushing a page +with the same handle, the results are indeterminate. Callers must +lock the page to ensure serial behavior. + +CLEANCACHE PERFORMANCE METRICS + +Cleancache monitoring is done by sysfs files in the +/sys/kernel/mm/cleancache directory. The effectiveness of cleancache +can be measured (across all filesystems) with: + +succ_gets - number of gets that were successful +failed_gets - number of gets that failed +puts - number of puts attempted (all "succeed") +flushes - number of flushes attempted + +A backend implementatation may provide additional metrics. + +FAQ + +1) Where's the value? (Andrew Morton) + +Cleancache provides a significant performance benefit to many workloads +in many environments with negligible overhead by improving the +effectiveness of the pagecache. Clean pagecache pages are +saved in transcendent memory (RAM that is otherwise not directly +addressable to the kernel); fetching those pages later avoids "refaults" +and thus disk reads. + +Cleancache (and its sister code "frontswap") provide interfaces for +this transcendent memory (aka "tmem"), which conceptually lies between +fast kernel-directly-addressable RAM and slower DMA/asynchronous devices. +Disallowing direct kernel or userland reads/writes to tmem +is ideal when data is transformed to a different form and size (such +as with compression) or secretly moved (as might be useful for write- +balancing for some RAM-like devices). Evicted page-cache pages (and +swap pages) are a great use for this kind of slower-than-RAM-but-much- +faster-than-disk transcendent memory, and the cleancache (and frontswap) +"page-object-oriented" specification provides a nice way to read and +write -- and indirectly "name" -- the pages. + +In the virtual case, the whole point of virtualization is to statistically +multiplex physical resources across the varying demands of multiple +virtual machines. This is really hard to do with RAM and efforts to +do it well with no kernel change have essentially failed (except in some +well-publicized special-case workloads). Cleancache -- and frontswap -- +with a fairly small impact on the kernel, provide a huge amount +of flexibility for more dynamic, flexible RAM multiplexing. +Specifically, the Xen Transcendent Memory backend allows otherwise +"fallow" hypervisor-owned RAM to not only be "time-shared" between multiple +virtual machines, but the pages can be compressed and deduplicated to +optimize RAM utilization. And when guest OS's are induced to surrender +underutilized RAM (e.g. with "self-ballooning"), page cache pages +are the first to go, and cleancache allows those pages to be +saved and reclaimed if overall host system memory conditions allow. + +And the identical interface used for cleancache can be used in +physical systems as well. The zcache driver acts as a memory-hungry +device that stores pages of data in a compressed state. And +the proposed "RAMster" driver shares RAM across multiple physical +systems. + +2) Why does cleancache have its sticky fingers so deep inside the + filesystems and VFS? (Andrew Morton and Christoph Hellwig) + +The core hooks for cleancache in VFS are in most cases a single line +and the minimum set are placed precisely where needed to maintain +coherency (via cleancache_flush operations) between cleancache, +the page cache, and disk. All hooks compile into nothingness if +cleancache is config'ed off and turn into a function-pointer- +compare-to-NULL if config'ed on but no backend claims the ops +functions, or to a compare-struct-element-to-negative if a +backend claims the ops functions but a filesystem doesn't enable +cleancache. + +Some filesystems are built entirely on top of VFS and the hooks +in VFS are sufficient, so don't require an "init_fs" hook; the +initial implementation of cleancache didn't provide this hook. +But for some filesystems (such as btrfs), the VFS hooks are +incomplete and one or more hooks in fs-specific code are required. +And for some other filesystems, such as tmpfs, cleancache may +be counterproductive. So it seemed prudent to require a filesystem +to "opt in" to use cleancache, which requires adding a hook in +each filesystem. Not all filesystems are supported by cleancache +only because they haven't been tested. The existing set should +be sufficient to validate the concept, the opt-in approach means +that untested filesystems are not affected, and the hooks in the +existing filesystems should make it very easy to add more +filesystems in the future. + +The total impact of the hooks to existing fs and mm files is only +about 40 lines added (not counting comments and blank lines). + +3) Why not make cleancache asynchronous and batched so it can + more easily interface with real devices with DMA instead + of copying each individual page? (Minchan Kim) + +The one-page-at-a-time copy semantics simplifies the implementation +on both the frontend and backend and also allows the backend to +do fancy things on-the-fly like page compression and +page deduplication. And since the data is "gone" (copied into/out +of the pageframe) before the cleancache get/put call returns, +a great deal of race conditions and potential coherency issues +are avoided. While the interface seems odd for a "real device" +or for real kernel-addressable RAM, it makes perfect sense for +transcendent memory. + +4) Why is non-shared cleancache "exclusive"? And where is the + page "flushed" after a "get"? (Minchan Kim) + +The main reason is to free up space in transcendent memory and +to avoid unnecessary cleancache_flush calls. If you want inclusive, +the page can be "put" immediately following the "get". If +put-after-get for inclusive becomes common, the interface could +be easily extended to add a "get_no_flush" call. + +The flush is done by the cleancache backend implementation. + +5) What's the performance impact? + +Performance analysis has been presented at OLS'09 and LCA'10. +Briefly, performance gains can be significant on most workloads, +especially when memory pressure is high (e.g. when RAM is +overcommitted in a virtual workload); and because the hooks are +invoked primarily in place of or in addition to a disk read/write, +overhead is negligible even in worst case workloads. Basically +cleancache replaces I/O with memory-copy-CPU-overhead; on older +single-core systems with slow memory-copy speeds, cleancache +has little value, but in newer multicore machines, especially +consolidated/virtualized machines, it has great value. + +6) How do I add cleancache support for filesystem X? (Boaz Harrash) + +Filesystems that are well-behaved and conform to certain +restrictions can utilize cleancache simply by making a call to +cleancache_init_fs at mount time. Unusual, misbehaving, or +poorly layered filesystems must either add additional hooks +and/or undergo extensive additional testing... or should just +not enable the optional cleancache. + +Some points for a filesystem to consider: + +- The FS should be block-device-based (e.g. a ram-based FS such + as tmpfs should not enable cleancache) +- To ensure coherency/correctness, the FS must ensure that all + file removal or truncation operations either go through VFS or + add hooks to do the equivalent cleancache "flush" operations +- To ensure coherency/correctness, either inode numbers must + be unique across the lifetime of the on-disk file OR the + FS must provide an "encode_fh" function. +- The FS must call the VFS superblock alloc and deactivate routines + or add hooks to do the equivalent cleancache calls done there. +- To maximize performance, all pages fetched from the FS should + go through the do_mpag_readpage routine or the FS should add + hooks to do the equivalent (cf. btrfs) +- Currently, the FS blocksize must be the same as PAGESIZE. This + is not an architectural restriction, but no backends currently + support anything different. +- A clustered FS should invoke the "shared_init_fs" cleancache + hook to get best performance for some backends. + +7) Why not use the KVA of the inode as the key? (Christoph Hellwig) + +If cleancache would use the inode virtual address instead of +inode/filehandle, the pool id could be eliminated. But, this +won't work because cleancache retains pagecache data pages +persistently even when the inode has been pruned from the +inode unused list, and only flushes the data page if the file +gets removed/truncated. So if cleancache used the inode kva, +there would be potential coherency issues if/when the inode +kva is reused for a different file. Alternately, if cleancache +flushed the pages when the inode kva was freed, much of the value +of cleancache would be lost because the cache of pages in cleanache +is potentially much larger than the kernel pagecache and is most +useful if the pages survive inode cache removal. + +8) Why is a global variable required? + +The cleancache_enabled flag is checked in all of the frequently-used +cleancache hooks. The alternative is a function call to check a static +variable. Since cleancache is enabled dynamically at runtime, systems +that don't enable cleancache would suffer thousands (possibly +tens-of-thousands) of unnecessary function calls per second. So the +global variable allows cleancache to be enabled by default at compile +time, but have insignificant performance impact when cleancache remains +disabled at runtime. + +9) Does cleanache work with KVM? + +The memory model of KVM is sufficiently different that a cleancache +backend may have less value for KVM. This remains to be tested, +especially in an overcommitted system. + +10) Does cleancache work in userspace? It sounds useful for + memory hungry caches like web browsers. (Jamie Lokier) + +No plans yet, though we agree it sounds useful, at least for +apps that bypass the page cache (e.g. O_DIRECT). + +Last updated: Dan Magenheimer, April 13 2011 -- cgit v1.2.1 From 23b5c8fa01b723c70a20d6e4ef4ff54c7656d6e1 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 7 Sep 2010 10:38:22 -0700 Subject: rcu: Decrease memory-barrier usage based on semi-formal proof (Note: this was reverted, and is now being re-applied in pieces, with this being the fifth and final piece. See below for the reason that it is now felt to be safe to re-apply this.) Commit d09b62d fixed grace-period synchronization, but left some smp_mb() invocations in rcu_process_callbacks() that are no longer needed, but sheer paranoia prevented them from being removed. This commit removes them and provides a proof of correctness in their absence. It also adds a memory barrier to rcu_report_qs_rsp() immediately before the update to rsp->completed in order to handle the theoretical possibility that the compiler or CPU might move massive quantities of code into a lock-based critical section. This also proves that the sheer paranoia was not entirely unjustified, at least from a theoretical point of view. In addition, the old dyntick-idle synchronization depended on the fact that grace periods were many milliseconds in duration, so that it could be assumed that no dyntick-idle CPU could reorder a memory reference across an entire grace period. Unfortunately for this design, the addition of expedited grace periods breaks this assumption, which has the unfortunate side-effect of requiring atomic operations in the functions that track dyntick-idle state for RCU. (There is some hope that the algorithms used in user-level RCU might be applied here, but some work is required to handle the NMIs that user-space applications can happily ignore. For the short term, better safe than sorry.) This proof assumes that neither compiler nor CPU will allow a lock acquisition and release to be reordered, as doing so can result in deadlock. The proof is as follows: 1. A given CPU declares a quiescent state under the protection of its leaf rcu_node's lock. 2. If there is more than one level of rcu_node hierarchy, the last CPU to declare a quiescent state will also acquire the ->lock of the next rcu_node up in the hierarchy, but only after releasing the lower level's lock. The acquisition of this lock clearly cannot occur prior to the acquisition of the leaf node's lock. 3. Step 2 repeats until we reach the root rcu_node structure. Please note again that only one lock is held at a time through this process. The acquisition of the root rcu_node's ->lock must occur after the release of that of the leaf rcu_node. 4. At this point, we set the ->completed field in the rcu_state structure in rcu_report_qs_rsp(). However, if the rcu_node hierarchy contains only one rcu_node, then in theory the code preceding the quiescent state could leak into the critical section. We therefore precede the update of ->completed with a memory barrier. All CPUs will therefore agree that any updates preceding any report of a quiescent state will have happened before the update of ->completed. 5. Regardless of whether a new grace period is needed, rcu_start_gp() will propagate the new value of ->completed to all of the leaf rcu_node structures, under the protection of each rcu_node's ->lock. If a new grace period is needed immediately, this propagation will occur in the same critical section that ->completed was set in, but courtesy of the memory barrier in #4 above, is still seen to follow any pre-quiescent-state activity. 6. When a given CPU invokes __rcu_process_gp_end(), it becomes aware of the end of the old grace period and therefore makes any RCU callbacks that were waiting on that grace period eligible for invocation. If this CPU is the same one that detected the end of the grace period, and if there is but a single rcu_node in the hierarchy, we will still be in the single critical section. In this case, the memory barrier in step #4 guarantees that all callbacks will be seen to execute after each CPU's quiescent state. On the other hand, if this is a different CPU, it will acquire the leaf rcu_node's ->lock, and will again be serialized after each CPU's quiescent state for the old grace period. On the strength of this proof, this commit therefore removes the memory barriers from rcu_process_callbacks() and adds one to rcu_report_qs_rsp(). The effect is to reduce the number of memory barriers by one and to reduce the frequency of execution from about once per scheduling tick per CPU to once per grace period. This was reverted do to hangs found during testing by Yinghai Lu and Ingo Molnar. Frederic Weisbecker supplied Yinghai with tracing that located the underlying problem, and Frederic also provided the fix. The underlying problem was that the HARDIRQ_ENTER() macro from lib/locking-selftest.c invoked irq_enter(), which in turn invokes rcu_irq_enter(), but HARDIRQ_EXIT() invoked __irq_exit(), which does not invoke rcu_irq_exit(). This situation resulted in calls to rcu_irq_enter() that were not balanced by the required calls to rcu_irq_exit(). Therefore, after these locking selftests completed, RCU's dyntick-idle nesting count was a large number (for example, 72), which caused RCU to to conclude that the affected CPU was not in dyntick-idle mode when in fact it was. RCU would therefore incorrectly wait for this dyntick-idle CPU, resulting in hangs. In contrast, with Frederic's patch, which replaces the irq_enter() in HARDIRQ_ENTER() with an __irq_enter(), these tests don't ever call either rcu_irq_enter() or rcu_irq_exit(), which works because the CPU running the test is already marked as not being in dyntick-idle mode. This means that the rcu_irq_enter() and rcu_irq_exit() calls and RCU then has no problem working out which CPUs are in dyntick-idle mode and which are not. The reason that the imbalance was not noticed before the barrier patch was applied is that the old implementation of rcu_enter_nohz() ignored the nesting depth. This could still result in delays, but much shorter ones. Whenever there was a delay, RCU would IPI the CPU with the unbalanced nesting level, which would eventually result in rcu_enter_nohz() being called, which in turn would force RCU to see that the CPU was in dyntick-idle mode. The reason that very few people noticed the problem is that the mismatched irq_enter() vs. __irq_exit() occured only when the kernel was built with CONFIG_DEBUG_LOCKING_API_SELFTESTS. Signed-off-by: Paul E. McKenney Reviewed-by: Josh Triplett --- Documentation/RCU/trace.txt | 17 +++++------------ 1 file changed, 5 insertions(+), 12 deletions(-) (limited to 'Documentation') diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt index c078ad48f7a1..8173cec473aa 100644 --- a/Documentation/RCU/trace.txt +++ b/Documentation/RCU/trace.txt @@ -99,18 +99,11 @@ o "qp" indicates that RCU still expects a quiescent state from o "dt" is the current value of the dyntick counter that is incremented when entering or leaving dynticks idle state, either by the - scheduler or by irq. The number after the "/" is the interrupt - nesting depth when in dyntick-idle state, or one greater than - the interrupt-nesting depth otherwise. - - This field is displayed only for CONFIG_NO_HZ kernels. - -o "dn" is the current value of the dyntick counter that is incremented - when entering or leaving dynticks idle state via NMI. If both - the "dt" and "dn" values are even, then this CPU is in dynticks - idle mode and may be ignored by RCU. If either of these two - counters is odd, then RCU must be alert to the possibility of - an RCU read-side critical section running on this CPU. + scheduler or by irq. This number is even if the CPU is in + dyntick idle mode and odd otherwise. The number after the first + "/" is the interrupt nesting depth when in dyntick-idle state, + or one greater than the interrupt-nesting depth otherwise. + The number after the second "/" is the NMI nesting depth. This field is displayed only for CONFIG_NO_HZ kernels. -- cgit v1.2.1 From 72eef0f3af410de2c85f236140ddea61b71cfc3e Mon Sep 17 00:00:00 2001 From: Nikanth Karthikesan Date: Thu, 26 May 2011 16:25:13 -0700 Subject: Documentation/atomic_ops.txt: avoid volatile in sample code As declaring counter as volatile is discouraged, it is best not to use it in sample code as well. Signed-off-by: Nikanth Karthikesan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/atomic_ops.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/atomic_ops.txt b/Documentation/atomic_ops.txt index ac4d47187122..3bd585b44927 100644 --- a/Documentation/atomic_ops.txt +++ b/Documentation/atomic_ops.txt @@ -12,7 +12,7 @@ Also, it should be made opaque such that any kind of cast to a normal C integer type will fail. Something like the following should suffice: - typedef struct { volatile int counter; } atomic_t; + typedef struct { int counter; } atomic_t; Historically, counter has been declared volatile. This is now discouraged. See Documentation/volatile-considered-harmful.txt for the complete rationale. -- cgit v1.2.1 From fbdd91a6293f5e7fb1914966f5d244f72619641e Mon Sep 17 00:00:00 2001 From: "Justin P. Mattock" Date: Thu, 26 May 2011 16:25:14 -0700 Subject: Documentation/accounting/getdelays.c: fix unused var warning Fixes Documentation/accounting/getdelays.c: In function `main': Documentation/accounting/getdelays.c:436:7: warning: variable `i' set but not used [-Wunused-but-set-variable] Signed-off-by: Justin P. Mattock Cc: Balbir Singh Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/accounting/getdelays.c | 3 --- 1 file changed, 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/accounting/getdelays.c b/Documentation/accounting/getdelays.c index e9c77788a39d..16e16d57b9eb 100644 --- a/Documentation/accounting/getdelays.c +++ b/Documentation/accounting/getdelays.c @@ -433,8 +433,6 @@ int main(int argc, char *argv[]) } do { - int i; - rep_len = recv(nl_sd, &msg, sizeof(msg), 0); PRINTF("received %d bytes\n", rep_len); @@ -459,7 +457,6 @@ int main(int argc, char *argv[]) na = (struct nlattr *) GENLMSG_DATA(&msg); len = 0; - i = 0; while (len < rep_len) { len += NLA_ALIGN(na->nla_len); switch (na->nla_type) { -- cgit v1.2.1 From 4ed960b14d3b5fd14f1d9eb02f6d7e398317627a Mon Sep 17 00:00:00 2001 From: Andrew Morton Date: Thu, 26 May 2011 16:25:15 -0700 Subject: Documentation/accounting/getdelays.c: handle sendto() failures Fixes Documentation/accounting/getdelays.c: In function `get_family_id': Documentation/accounting/getdelays.c:172:14: warning: variable `rc' set but not used [-Wunused-but-set-variable] Reported-by: "Justin P. Mattock" Cc: Balbir Singh Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/accounting/getdelays.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'Documentation') diff --git a/Documentation/accounting/getdelays.c b/Documentation/accounting/getdelays.c index 16e16d57b9eb..fed225401dd2 100644 --- a/Documentation/accounting/getdelays.c +++ b/Documentation/accounting/getdelays.c @@ -177,6 +177,8 @@ static int get_family_id(int sd) rc = send_cmd(sd, GENL_ID_CTRL, getpid(), CTRL_CMD_GETFAMILY, CTRL_ATTR_FAMILY_NAME, (void *)name, strlen(TASKSTATS_GENL_NAME)+1); + if (rc < 0) + return 0; /* sendto() failure? */ rep_len = recv(sd, &ans, sizeof(ans), 0); if (ans.n.nlmsg_type == NLMSG_ERROR || -- cgit v1.2.1 From 02d54f092697b6046e466e447cc694b0e6ed45d0 Mon Sep 17 00:00:00 2001 From: Wu Fengguang Date: Thu, 26 May 2011 16:25:16 -0700 Subject: getdelays: show average CPU/IO/SWAP/RECLAIM delays I find it very handy to show the average delays in milliseconds. Example output (on 100 concurrent dd reading sparse files): CPU count real total virtual total delay total delay average 986 3223509952 3207643301 38863410579 39.415ms IO count delay total delay average 0 0 0ms SWAP count delay total delay average 0 0 0ms RECLAIM count delay total delay average 1059 5131834899 4ms dd: read=0, write=0, cancelled_write=0 Signed-off-by: Wu Fengguang Cc: Mel Gorman Cc: Balbir Singh Reviewed-by: Satoru Moriya Reviewed-by: KOSAKI Motohiro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/accounting/getdelays.c | 33 ++++++++++++++++++++------------- 1 file changed, 20 insertions(+), 13 deletions(-) (limited to 'Documentation') diff --git a/Documentation/accounting/getdelays.c b/Documentation/accounting/getdelays.c index fed225401dd2..f6318f6d7baf 100644 --- a/Documentation/accounting/getdelays.c +++ b/Documentation/accounting/getdelays.c @@ -193,30 +193,37 @@ static int get_family_id(int sd) return id; } +#define average_ms(t, c) (t / 1000000ULL / (c ? c : 1)) + static void print_delayacct(struct taskstats *t) { - printf("\n\nCPU %15s%15s%15s%15s\n" - " %15llu%15llu%15llu%15llu\n" - "IO %15s%15s\n" - " %15llu%15llu\n" - "SWAP %15s%15s\n" - " %15llu%15llu\n" - "RECLAIM %12s%15s\n" - " %15llu%15llu\n", - "count", "real total", "virtual total", "delay total", + printf("\n\nCPU %15s%15s%15s%15s%15s\n" + " %15llu%15llu%15llu%15llu%15.3fms\n" + "IO %15s%15s%15s\n" + " %15llu%15llu%15llums\n" + "SWAP %15s%15s%15s\n" + " %15llu%15llu%15llums\n" + "RECLAIM %12s%15s%15s\n" + " %15llu%15llu%15llums\n", + "count", "real total", "virtual total", + "delay total", "delay average", (unsigned long long)t->cpu_count, (unsigned long long)t->cpu_run_real_total, (unsigned long long)t->cpu_run_virtual_total, (unsigned long long)t->cpu_delay_total, - "count", "delay total", + average_ms((double)t->cpu_delay_total, t->cpu_count), + "count", "delay total", "delay average", (unsigned long long)t->blkio_count, (unsigned long long)t->blkio_delay_total, - "count", "delay total", + average_ms(t->blkio_delay_total, t->blkio_count), + "count", "delay total", "delay average", (unsigned long long)t->swapin_count, (unsigned long long)t->swapin_delay_total, - "count", "delay total", + average_ms(t->swapin_delay_total, t->swapin_count), + "count", "delay total", "delay average", (unsigned long long)t->freepages_count, - (unsigned long long)t->freepages_delay_total); + (unsigned long long)t->freepages_delay_total, + average_ms(t->freepages_delay_total, t->freepages_count)); } static void task_context_switch_counts(struct taskstats *t) -- cgit v1.2.1 From dcb3a08e69629ea65a3e9647da730bfaf670497d Mon Sep 17 00:00:00 2001 From: Jiri Slaby Date: Thu, 26 May 2011 16:25:17 -0700 Subject: Documentation: configfs examples crash fix When configfs_register_subsystem() fails, we unregister too many subsystems in configfs_example_init. Decrement i by one to not unregister non-registered subsystem. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Jiri Slaby Cc: Joel Becker Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/filesystems/configfs/configfs_example_explicit.c | 6 ++---- Documentation/filesystems/configfs/configfs_example_macros.c | 6 ++---- 2 files changed, 4 insertions(+), 8 deletions(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/configfs/configfs_example_explicit.c b/Documentation/filesystems/configfs/configfs_example_explicit.c index fd53869f5633..1420233dfa55 100644 --- a/Documentation/filesystems/configfs/configfs_example_explicit.c +++ b/Documentation/filesystems/configfs/configfs_example_explicit.c @@ -464,9 +464,8 @@ static int __init configfs_example_init(void) return 0; out_unregister: - for (; i >= 0; i--) { + for (i--; i >= 0; i--) configfs_unregister_subsystem(example_subsys[i]); - } return ret; } @@ -475,9 +474,8 @@ static void __exit configfs_example_exit(void) { int i; - for (i = 0; example_subsys[i]; i++) { + for (i = 0; example_subsys[i]; i++) configfs_unregister_subsystem(example_subsys[i]); - } } module_init(configfs_example_init); diff --git a/Documentation/filesystems/configfs/configfs_example_macros.c b/Documentation/filesystems/configfs/configfs_example_macros.c index d8e30a0378aa..327dfbc640a9 100644 --- a/Documentation/filesystems/configfs/configfs_example_macros.c +++ b/Documentation/filesystems/configfs/configfs_example_macros.c @@ -427,9 +427,8 @@ static int __init configfs_example_init(void) return 0; out_unregister: - for (; i >= 0; i--) { + for (i--; i >= 0; i--) configfs_unregister_subsystem(example_subsys[i]); - } return ret; } @@ -438,9 +437,8 @@ static void __exit configfs_example_exit(void) { int i; - for (i = 0; example_subsys[i]; i++) { + for (i = 0; example_subsys[i]; i++) configfs_unregister_subsystem(example_subsys[i]); - } } module_init(configfs_example_init); -- cgit v1.2.1 From f780bdb7c1c73009cb57adcf99ef50027d80bf3c Mon Sep 17 00:00:00 2001 From: Ben Blum Date: Thu, 26 May 2011 16:25:19 -0700 Subject: cgroups: add per-thread subsystem callbacks Add cgroup subsystem callbacks for per-thread attachment in atomic contexts Add can_attach_task(), pre_attach(), and attach_task() as new callbacks for cgroups's subsystem interface. Unlike can_attach and attach, these are for per-thread operations, to be called potentially many times when attaching an entire threadgroup. Also, the old "bool threadgroup" interface is removed, as replaced by this. All subsystems are modified for the new interface - of note is cpuset, which requires from/to nodemasks for attach to be globally scoped (though per-cpuset would work too) to persist from its pre_attach to attach_task and attach. This is a pre-patch for cgroup-procs-writable.patch. Signed-off-by: Ben Blum Cc: "Eric W. Biederman" Cc: Li Zefan Cc: Matt Helsley Reviewed-by: Paul Menage Cc: Oleg Nesterov Cc: David Rientjes Cc: Miao Xie Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/cgroups/cgroups.txt | 30 ++++++++++++++++++++++-------- 1 file changed, 22 insertions(+), 8 deletions(-) (limited to 'Documentation') diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt index aedf1bd02fdd..b3bd3bdbe202 100644 --- a/Documentation/cgroups/cgroups.txt +++ b/Documentation/cgroups/cgroups.txt @@ -575,7 +575,7 @@ rmdir() will fail with it. From this behavior, pre_destroy() can be called multiple times against a cgroup. int can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp, - struct task_struct *task, bool threadgroup) + struct task_struct *task) (cgroup_mutex held by caller) Called prior to moving a task into a cgroup; if the subsystem @@ -584,9 +584,14 @@ task is passed, then a successful result indicates that *any* unspecified task can be moved into the cgroup. Note that this isn't called on a fork. If this method returns 0 (success) then this should remain valid while the caller holds cgroup_mutex and it is ensured that either -attach() or cancel_attach() will be called in future. If threadgroup is -true, then a successful result indicates that all threads in the given -thread's threadgroup can be moved together. +attach() or cancel_attach() will be called in future. + +int can_attach_task(struct cgroup *cgrp, struct task_struct *tsk); +(cgroup_mutex held by caller) + +As can_attach, but for operations that must be run once per task to be +attached (possibly many when using cgroup_attach_proc). Called after +can_attach. void cancel_attach(struct cgroup_subsys *ss, struct cgroup *cgrp, struct task_struct *task, bool threadgroup) @@ -598,15 +603,24 @@ function, so that the subsystem can implement a rollback. If not, not necessary. This will be called only about subsystems whose can_attach() operation have succeeded. +void pre_attach(struct cgroup *cgrp); +(cgroup_mutex held by caller) + +For any non-per-thread attachment work that needs to happen before +attach_task. Needed by cpuset. + void attach(struct cgroup_subsys *ss, struct cgroup *cgrp, - struct cgroup *old_cgrp, struct task_struct *task, - bool threadgroup) + struct cgroup *old_cgrp, struct task_struct *task) (cgroup_mutex held by caller) Called after the task has been attached to the cgroup, to allow any post-attachment activity that requires memory allocations or blocking. -If threadgroup is true, the subsystem should take care of all threads -in the specified thread's threadgroup. Currently does not support any + +void attach_task(struct cgroup *cgrp, struct task_struct *tsk); +(cgroup_mutex held by caller) + +As attach, but for operations that must be run once per task to be attached, +like can_attach_task. Called before attach. Currently does not support any subsystem that might need the old_cgrp for every thread in the group. void fork(struct cgroup_subsy *ss, struct task_struct *task) -- cgit v1.2.1 From 74a1166dfe1135dcc168d35fa5261aa7e087011b Mon Sep 17 00:00:00 2001 From: Ben Blum Date: Thu, 26 May 2011 16:25:20 -0700 Subject: cgroups: make procs file writable Make procs file writable to move all threads by tgid at once. Add functionality that enables users to move all threads in a threadgroup at once to a cgroup by writing the tgid to the 'cgroup.procs' file. This current implementation makes use of a per-threadgroup rwsem that's taken for reading in the fork() path to prevent newly forking threads within the threadgroup from "escaping" while the move is in progress. Signed-off-by: Ben Blum Cc: "Eric W. Biederman" Cc: Li Zefan Cc: Matt Helsley Reviewed-by: Paul Menage Cc: Oleg Nesterov Cc: David Rientjes Cc: Miao Xie Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/cgroups/cgroups.txt | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt index b3bd3bdbe202..8c4f3466c894 100644 --- a/Documentation/cgroups/cgroups.txt +++ b/Documentation/cgroups/cgroups.txt @@ -236,7 +236,8 @@ containing the following files describing that cgroup: - cgroup.procs: list of tgids in the cgroup. This list is not guaranteed to be sorted or free of duplicate tgids, and userspace should sort/uniquify the list if this property is required. - This is a read-only file, for now. + Writing a thread group id into this file moves all threads in that + group into this cgroup. - notify_on_release flag: run the release agent on exit? - release_agent: the path to use for release notifications (this file exists in the top cgroup only) @@ -430,6 +431,12 @@ You can attach the current shell task by echoing 0: # echo 0 > tasks +You can use the cgroup.procs file instead of the tasks file to move all +threads in a threadgroup at once. Echoing the pid of any task in a +threadgroup to cgroup.procs causes all tasks in that threadgroup to be +be attached to the cgroup. Writing 0 to cgroup.procs moves all tasks +in the writing task's threadgroup. + Note: Since every task is always a member of exactly one cgroup in each mounted hierarchy, to remove a task from its current cgroup you must move it into a new cgroup (possibly the root cgroup) by writing to the -- cgit v1.2.1 From a77aea92010acf54ad785047234418d5d68772e2 Mon Sep 17 00:00:00 2001 From: Daniel Lezcano Date: Thu, 26 May 2011 16:25:23 -0700 Subject: cgroup: remove the ns_cgroup The ns_cgroup is an annoying cgroup at the namespace / cgroup frontier and leads to some problems: * cgroup creation is out-of-control * cgroup name can conflict when pids are looping * it is not possible to have a single process handling a lot of namespaces without falling in a exponential creation time * we may want to create a namespace without creating a cgroup The ns_cgroup was replaced by a compatibility flag 'clone_children', where a newly created cgroup will copy the parent cgroup values. The userspace has to manually create a cgroup and add a task to the 'tasks' file. This patch removes the ns_cgroup as suggested in the following thread: https://lists.linux-foundation.org/pipermail/containers/2009-June/018616.html The 'cgroup_clone' function is removed because it is no longer used. This is a userspace-visible change. Commit 45531757b45c ("cgroup: notify ns_cgroup deprecated") (merged into 2.6.27) caused the kernel to emit a printk warning users that the feature is planned for removal. Since that time we have heard from XXX users who were affected by this. Signed-off-by: Daniel Lezcano Signed-off-by: Serge E. Hallyn Cc: Eric W. Biederman Cc: Jamal Hadi Salim Reviewed-by: Li Zefan Acked-by: Paul Menage Acked-by: Matt Helsley Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/cgroups/cgroups.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt index 8c4f3466c894..0ed99f08f1f3 100644 --- a/Documentation/cgroups/cgroups.txt +++ b/Documentation/cgroups/cgroups.txt @@ -651,7 +651,7 @@ always handled well. void post_clone(struct cgroup_subsys *ss, struct cgroup *cgrp) (cgroup_mutex held by caller) -Called at the end of cgroup_clone() to do any parameter +Called during cgroup_create() to do any parameter initialization which might be required before a task could attach. For example in cpusets, no task may attach before 'cpus' and 'mems' are set up. -- cgit v1.2.1 From 57cc083ad9e1bfeeb4a0ee831e7bb008c8865bf0 Mon Sep 17 00:00:00 2001 From: Jiri Slaby Date: Thu, 26 May 2011 16:25:46 -0700 Subject: coredump: add support for exe_file in core name Now, exe_file is not proc FS dependent, so we can use it to name core file. So we add %E pattern for core file name cration which extract path from mm_struct->exe_file. Then it converts slashes to exclamation marks and pastes the result to the core file name itself. This is useful for environments where binary names are longer than 16 character (the current->comm limitation). Also where there are binaries with same name but in a different path. Further in case the binery itself changes its current->comm after exec. So by doing (s/$/#/ -- # is treated as git comment): $ sysctl kernel.core_pattern='core.%p.%e.%E' $ ln /bin/cat cat45678901234567890 $ ./cat45678901234567890 ^Z $ rm cat45678901234567890 $ fg ^\Quit (core dumped) $ ls core* we now get: core.2434.cat456789012345.!root!cat45678901234567890 (deleted) Signed-off-by: Jiri Slaby Cc: Al Viro Cc: Alan Cox Reviewed-by: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/sysctl/kernel.txt | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index 36f007514db3..5e7cb39ad195 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -161,7 +161,8 @@ core_pattern is used to specify a core dumpfile pattern name. %s signal number %t UNIX time of dump %h hostname - %e executable filename + %e executable filename (may be shortened) + %E executable path % both are dropped . If the first character of the pattern is a '|', the kernel will treat the rest of the pattern as a command to run. The core dump will be -- cgit v1.2.1 From 492c826b9facefa84995f4dea917e301b5ee0884 Mon Sep 17 00:00:00 2001 From: Mark Brown Date: Sun, 8 May 2011 22:30:18 +0100 Subject: regulator: Remove supply_regulator_dev from machine configuration supply_regulator_dev (using a struct pointer) has been deprecated in favour of supply_regulator (using a regulator name) for quite a few releases now with a warning generated if it is used and there are no current in tree users so just remove the code. Signed-off-by: Mark Brown Signed-off-by: Liam Girdwood --- Documentation/power/regulator/machine.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/power/regulator/machine.txt b/Documentation/power/regulator/machine.txt index bdec39b9bd75..b42419b52e44 100644 --- a/Documentation/power/regulator/machine.txt +++ b/Documentation/power/regulator/machine.txt @@ -53,11 +53,11 @@ static struct regulator_init_data regulator1_data = { Regulator-1 supplies power to Regulator-2. This relationship must be registered with the core so that Regulator-1 is also enabled when Consumer A enables its -supply (Regulator-2). The supply regulator is set by the supply_regulator_dev +supply (Regulator-2). The supply regulator is set by the supply_regulator field below:- static struct regulator_init_data regulator2_data = { - .supply_regulator_dev = &platform_regulator1_device.dev, + .supply_regulator = "regulator_name", .constraints = { .min_uV = 1800000, .max_uV = 2000000, -- cgit v1.2.1 From aa38572954ade525817fe88c54faebf85e5a61c0 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 27 May 2011 06:53:02 -0400 Subject: fs: pass exact type of data dirties to ->dirty_inode Tell the filesystem if we just updated timestamp (I_DIRTY_SYNC) or anything else, so that the filesystem can track internally if it needs to push out a transaction for fdatasync or not. This is just the prototype change with no user for it yet. I plan to push large XFS changes for the next merge window, and getting this trivial infrastructure in this window would help a lot to avoid tree interdependencies. Also remove incorrect comments that ->dirty_inode can't block. That has been changed a long time ago, and many implementations rely on it. Signed-off-by: Christoph Hellwig Signed-off-by: Al Viro --- Documentation/filesystems/Locking | 4 ++-- Documentation/filesystems/vfs.txt | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking index 61b31acb9176..57d827d6071d 100644 --- a/Documentation/filesystems/Locking +++ b/Documentation/filesystems/Locking @@ -104,7 +104,7 @@ of the locking scheme for directory operations. prototypes: struct inode *(*alloc_inode)(struct super_block *sb); void (*destroy_inode)(struct inode *); - void (*dirty_inode) (struct inode *); + void (*dirty_inode) (struct inode *, int flags); int (*write_inode) (struct inode *, struct writeback_control *wbc); int (*drop_inode) (struct inode *); void (*evict_inode) (struct inode *); @@ -126,7 +126,7 @@ locking rules: s_umount alloc_inode: destroy_inode: -dirty_inode: (must not sleep) +dirty_inode: write_inode: drop_inode: !!!inode->i_lock!!! evict_inode: diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index 21a7dc467bba..88b9f5519af9 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -211,7 +211,7 @@ struct super_operations { struct inode *(*alloc_inode)(struct super_block *sb); void (*destroy_inode)(struct inode *); - void (*dirty_inode) (struct inode *); + void (*dirty_inode) (struct inode *, int flags); int (*write_inode) (struct inode *, int); void (*drop_inode) (struct inode *); void (*delete_inode) (struct inode *); -- cgit v1.2.1 From 020036678e810be10a352339faa9a1df2bd8f5a3 Mon Sep 17 00:00:00 2001 From: Carlos Corbacho Date: Mon, 2 May 2011 09:57:23 +0100 Subject: acer-wmi: Delete out-of-date documentation The documentation file for acer-wmi is long out of date, and there's not much point in keeping it around either. Signed-off-by: Carlos Corbacho Signed-off-by: Matthew Garrett --- Documentation/laptops/acer-wmi.txt | 184 ------------------------------------- 1 file changed, 184 deletions(-) delete mode 100644 Documentation/laptops/acer-wmi.txt (limited to 'Documentation') diff --git a/Documentation/laptops/acer-wmi.txt b/Documentation/laptops/acer-wmi.txt deleted file mode 100644 index 4beafa663dd6..000000000000 --- a/Documentation/laptops/acer-wmi.txt +++ /dev/null @@ -1,184 +0,0 @@ -Acer Laptop WMI Extras Driver -http://code.google.com/p/aceracpi -Version 0.3 -4th April 2009 - -Copyright 2007-2009 Carlos Corbacho - -acer-wmi is a driver to allow you to control various parts of your Acer laptop -hardware under Linux which are exposed via ACPI-WMI. - -This driver completely replaces the old out-of-tree acer_acpi, which I am -currently maintaining for bug fixes only on pre-2.6.25 kernels. All development -work is now focused solely on acer-wmi. - -Disclaimer -********** - -Acer and Wistron have provided nothing towards the development acer_acpi or -acer-wmi. All information we have has been through the efforts of the developers -and the users to discover as much as possible about the hardware. - -As such, I do warn that this could break your hardware - this is extremely -unlikely of course, but please bear this in mind. - -Background -********** - -acer-wmi is derived from acer_acpi, originally developed by Mark -Smith in 2005, then taken over by Carlos Corbacho in 2007, in order to activate -the wireless LAN card under a 64-bit version of Linux, as acerhk[1] (the -previous solution to the problem) relied on making 32 bit BIOS calls which are -not possible in kernel space from a 64 bit OS. - -[1] acerhk: http://www.cakey.de/acerhk/ - -Supported Hardware -****************** - -NOTE: The Acer Aspire One is not supported hardware. It cannot work with -acer-wmi until Acer fix their ACPI-WMI implementation on them, so has been -blacklisted until that happens. - -Please see the website for the current list of known working hardware: - -http://code.google.com/p/aceracpi/wiki/SupportedHardware - -If your laptop is not listed, or listed as unknown, and works with acer-wmi, -please contact me with a copy of the DSDT. - -If your Acer laptop doesn't work with acer-wmi, I would also like to see the -DSDT. - -To send me the DSDT, as root/sudo: - -cat /sys/firmware/acpi/tables/DSDT > dsdt - -And send me the resulting 'dsdt' file. - -Usage -***** - -On Acer laptops, acer-wmi should already be autoloaded based on DMI matching. -For non-Acer laptops, until WMI based autoloading support is added, you will -need to manually load acer-wmi. - -acer-wmi creates /sys/devices/platform/acer-wmi, and fills it with various -files whose usage is detailed below, which enables you to control some of the -following (varies between models): - -* the wireless LAN card radio -* inbuilt Bluetooth adapter -* inbuilt 3G card -* mail LED of your laptop -* brightness of the LCD panel - -Wireless -******** - -With regards to wireless, all acer-wmi does is enable the radio on the card. It -is not responsible for the wireless LED - once the radio is enabled, this is -down to the wireless driver for your card. So the behaviour of the wireless LED, -once you enable the radio, will depend on your hardware and driver combination. - -e.g. With the BCM4318 on the Acer Aspire 5020 series: - -ndiswrapper: Light blinks on when transmitting -b43: Solid light, blinks off when transmitting - -Wireless radio control is unconditionally enabled - all Acer laptops that support -acer-wmi come with built-in wireless. However, should you feel so inclined to -ever wish to remove the card, or swap it out at some point, please get in touch -with me, as we may well be able to gain some data on wireless card detection. - -The wireless radio is exposed through rfkill. - -Bluetooth -********* - -For bluetooth, this is an internal USB dongle, so once enabled, you will get -a USB device connection event, and a new USB device appears. When you disable -bluetooth, you get the reverse - a USB device disconnect event, followed by the -device disappearing again. - -Bluetooth is autodetected by acer-wmi, so if you do not have a bluetooth module -installed in your laptop, this file won't exist (please be aware that it is -quite common for Acer not to fit bluetooth to their laptops - so just because -you have a bluetooth button on the laptop, doesn't mean that bluetooth is -installed). - -For the adventurously minded - if you want to buy an internal bluetooth -module off the internet that is compatible with your laptop and fit it, then -it will work just fine with acer-wmi. - -Bluetooth is exposed through rfkill. - -3G -** - -3G is currently not autodetected, so the 'threeg' file is always created under -sysfs. So far, no-one in possession of an Acer laptop with 3G built-in appears to -have tried Linux, or reported back, so we don't have any information on this. - -If you have an Acer laptop that does have a 3G card in, please contact me so we -can properly detect these, and find out a bit more about them. - -To read the status of the 3G card (0=off, 1=on): -cat /sys/devices/platform/acer-wmi/threeg - -To enable the 3G card: -echo 1 > /sys/devices/platform/acer-wmi/threeg - -To disable the 3G card: -echo 0 > /sys/devices/platform/acer-wmi/threeg - -To set the state of the 3G card when loading acer-wmi, pass: -threeg=X (where X is 0 or 1) - -Mail LED -******** - -This can be found in most older Acer laptops supported by acer-wmi, and many -newer ones - it is built into the 'mail' button, and blinks when active. - -On newer (WMID) laptops though, we have no way of detecting the mail LED. If -your laptop identifies itself in dmesg as a WMID model, then please try loading -acer_acpi with: - -force_series=2490 - -This will use a known alternative method of reading/ writing the mail LED. If -it works, please report back to me with the DMI data from your laptop so this -can be added to acer-wmi. - -The LED is exposed through the LED subsystem, and can be found in: - -/sys/devices/platform/acer-wmi/leds/acer-wmi::mail/ - -The mail LED is autodetected, so if you don't have one, the LED device won't -be registered. - -Backlight -********* - -The backlight brightness control is available on all acer-wmi supported -hardware. The maximum brightness level is usually 15, but on some newer laptops -it's 10 (this is again autodetected). - -The backlight is exposed through the backlight subsystem, and can be found in: - -/sys/devices/platform/acer-wmi/backlight/acer-wmi/ - -Credits -******* - -Olaf Tauber, who did the real hard work when he developed acerhk -http://www.cakey.de/acerhk/ -All the authors of laptop ACPI modules in the kernel, whose work -was an inspiration in the early days of acer_acpi -Mathieu Segaud, who solved the problem with having to modprobe the driver -twice in acer_acpi 0.2. -Jim Ramsay, who added support for the WMID interface -Mark Smith, who started the original acer_acpi - -And the many people who have used both acer_acpi and acer-wmi. -- cgit v1.2.1 From f62508f68d04adefc4cf9b0177ba02c8818b3eec Mon Sep 17 00:00:00 2001 From: Juri Lelli Date: Thu, 19 May 2011 12:53:53 +0200 Subject: Documentation: Add statistics about nested locks Explain what the trailing "/1" on some lock class names of lock_stat output means. Reviewed-by: Yong Zhang Signed-off-by: Juri Lelli Signed-off-by: Peter Zijlstra Link: http://lkml.kernel.org/r/4DD4F6C1.5090701@gmail.com Signed-off-by: Ingo Molnar --- Documentation/lockstat.txt | 36 ++++++++++++++++++++++++++++++++++-- 1 file changed, 34 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/lockstat.txt b/Documentation/lockstat.txt index 9c0a80d17a23..cef00d42ed5b 100644 --- a/Documentation/lockstat.txt +++ b/Documentation/lockstat.txt @@ -12,8 +12,9 @@ Because things like lock contention can severely impact performance. - HOW Lockdep already has hooks in the lock functions and maps lock instances to -lock classes. We build on that. The graph below shows the relation between -the lock functions and the various hooks therein. +lock classes. We build on that (see Documentation/lockdep-design.txt). +The graph below shows the relation between the lock functions and the various +hooks therein. __acquire | @@ -128,6 +129,37 @@ points are the points we're contending with. The integer part of the time values is in us. +Dealing with nested locks, subclasses may appear: + +32............................................................................................................................................................................................... +33 +34 &rq->lock: 13128 13128 0.43 190.53 103881.26 97454 3453404 0.00 401.11 13224683.11 +35 --------- +36 &rq->lock 645 [] task_rq_lock+0x43/0x75 +37 &rq->lock 297 [] try_to_wake_up+0x127/0x25a +38 &rq->lock 360 [] select_task_rq_fair+0x1f0/0x74a +39 &rq->lock 428 [] scheduler_tick+0x46/0x1fb +40 --------- +41 &rq->lock 77 [] task_rq_lock+0x43/0x75 +42 &rq->lock 174 [] try_to_wake_up+0x127/0x25a +43 &rq->lock 4715 [] double_rq_lock+0x42/0x54 +44 &rq->lock 893 [] schedule+0x157/0x7b8 +45 +46............................................................................................................................................................................................... +47 +48 &rq->lock/1: 11526 11488 0.33 388.73 136294.31 21461 38404 0.00 37.93 109388.53 +49 ----------- +50 &rq->lock/1 11526 [] double_rq_lock+0x4f/0x54 +51 ----------- +52 &rq->lock/1 5645 [] double_rq_lock+0x42/0x54 +53 &rq->lock/1 1224 [] schedule+0x157/0x7b8 +54 &rq->lock/1 4336 [] double_rq_lock+0x4f/0x54 +55 &rq->lock/1 181 [] try_to_wake_up+0x127/0x25a + +Line 48 shows statistics for the second subclass (/1) of &rq->lock class +(subclass starts from 0), since in this case, as line 50 suggests, +double_rq_lock actually acquires a nested lock of two spinlocks. + View the top contending locks: # grep : /proc/lock_stat | head -- cgit v1.2.1 From 526b4af47f44148c9d665e57723ed9f86634c6e3 Mon Sep 17 00:00:00 2001 From: Thomas Renninger Date: Thu, 26 May 2011 12:26:24 +0200 Subject: ACPI: Split out custom_method functionality into an own driver With /sys/kernel/debug/acpi/custom_method root can write to arbitrary memory and increase his priveleges, even if these are restricted. -> Make this an own debug .config option and warn about the security issue in the config description. -> Still keep acpi/debugfs.c which now only creates an empty /sys/kernel/debug/acpi directory. There might be other users of it later. Signed-off-by: Thomas Renninger Acked-by: Rafael J. Wysocki Acked-by: rui.zhang@intel.com Signed-off-by: Len Brown --- Documentation/acpi/method-customizing.txt | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'Documentation') diff --git a/Documentation/acpi/method-customizing.txt b/Documentation/acpi/method-customizing.txt index 3e1d25aee3fb..5f55373dd53b 100644 --- a/Documentation/acpi/method-customizing.txt +++ b/Documentation/acpi/method-customizing.txt @@ -66,3 +66,8 @@ Note: We can use a kernel with multiple custom ACPI method running, But each individual write to debugfs can implement a SINGLE method override. i.e. if we want to insert/override multiple ACPI methods, we need to redo step c) ~ g) for multiple times. + +Note: Be aware that root can mis-use this driver to modify arbitrary + memory and gain additional rights, if root's privileges got + restricted (for example if root is not allowed to load additional + modules after boot). -- cgit v1.2.1 From 3b70b2e5fcf6315eb833a1bcc2b810bdc75484ff Mon Sep 17 00:00:00 2001 From: Len Brown Date: Fri, 1 Apr 2011 15:08:48 -0400 Subject: x86 idle floppy: deprecate disable_hlt() Plan to remove floppy_disable_hlt in 2012, an ancient workaround with comments that it should be removed. This allows us to remove clutter and a run-time branch from the idle code. WARN_ONCE() on invocation until it is removed. cc: x86@kernel.org cc: stable@kernel.org # .39.x Signed-off-by: Len Brown --- Documentation/feature-removal-schedule.txt | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'Documentation') diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index b3f35e5f9c95..5540615ac26c 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -6,6 +6,14 @@ be removed from this file. --------------------------- +What: x86 floppy disable_hlt +When: 2012 +Why: ancient workaround of dubious utility clutters the + code used by everybody else. +Who: Len Brown + +--------------------------- + What: PRISM54 When: 2.6.34 -- cgit v1.2.1 From 99c63221435963e0cee2402686ba99293c2ffa9e Mon Sep 17 00:00:00 2001 From: Len Brown Date: Fri, 1 Apr 2011 15:19:23 -0400 Subject: x86 idle APM: deprecate CONFIG_APM_CPU_IDLE We don't want to export the pm_idle function pointer to modules. Currently CONFIG_APM_CPU_IDLE w/ CONFIG_APM_MODULE forces us to. CONFIG_APM_CPU_IDLE is of dubious value, it runs only on 32-bit uniprocessor laptops that are over 10 years old. It calls into the BIOS during idle, and is known to cause a number of machines to fail. Removing CONFIG_APM_CPU_IDLE and will allow us to stop exporting pm_idle. Any systems that were calling into the APM BIOS at run-time will simply use HLT instead. cc: x86@kernel.org cc: Jiri Kosina cc: stable@kernel.org # .39.x Signed-off-by: Len Brown --- Documentation/feature-removal-schedule.txt | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'Documentation') diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 5540615ac26c..fc505c1b4762 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -14,6 +14,16 @@ Who: Len Brown --------------------------- +What: CONFIG_APM_CPU_IDLE, and its ability to call APM BIOS in idle +When: 2012 +Why: This optional sub-feature of APM is of dubious reliability, + and ancient APM laptops are likely better served by calling HLT. + Deleting CONFIG_APM_CPU_IDLE allows x86 to stop exporting + the pm_idle function pointer to modules. +Who: Len Brown + +---------------------------- + What: PRISM54 When: 2.6.34 -- cgit v1.2.1 From cdaab4a0d330f70c0e5ad8c3f7c65c2e375ea180 Mon Sep 17 00:00:00 2001 From: Len Brown Date: Fri, 1 Apr 2011 15:41:17 -0400 Subject: x86 idle: deprecate "no-hlt" cmdline param We'd rather that modern machines not check if HLT works on every entry into idle, for the benefit of machines that had marginal electricals 15-years ago. If those machines are still running the upstream kernel, they can use "idle=poll". The only difference will be that they'll now invoke HLT in machine_hlt(). cc: x86@kernel.org # .39.x Signed-off-by: Len Brown --- Documentation/feature-removal-schedule.txt | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'Documentation') diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index fc505c1b4762..f1b0eb02aa1d 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -24,6 +24,17 @@ Who: Len Brown ---------------------------- +What: x86_32 "no-hlt" cmdline param +When: 2012 +Why: remove a branch from idle path, simplify code used by everybody. + This option disabled the use of HLT in idle and machine_halt() + for hardware that was flakey 15-years ago. Today we have + "idle=poll" that removed HLT from idle, and so if such a machine + is still running the upstream kernel, "idle=poll" is likely sufficient. +Who: Len Brown + +---------------------------- + What: PRISM54 When: 2.6.34 -- cgit v1.2.1 From 5d4c47e0195b989f284907358bd5c268a44b91c7 Mon Sep 17 00:00:00 2001 From: Len Brown Date: Fri, 1 Apr 2011 15:46:09 -0400 Subject: x86 idle: deprecate mwait_idle() and "idle=mwait" cmdline param mwait_idle() is a C1-only idle loop intended to be more efficient than HLT on SMP hardware that supports it. But mwait_idle() has been replaced by the more general mwait_idle_with_hints(), which handles both C1 and deeper C-states. ACPI uses only mwait_idle_with_hints(), and never uses mwait_idle(). Deprecate mwait_idle() and the "idle=mwait" cmdline param to simplify the x86 idle code. After this change, kernels configured with (!CONFIG_ACPI=n && !CONFIG_INTEL_IDLE=n) when run on hardware that support MWAIT will simply use HLT. If MWAIT is desired on those systems, cpuidle and the cpuidle drivers above can be used. cc: x86@kernel.org cc: stable@kernel.org # .39.x Signed-off-by: Len Brown --- Documentation/feature-removal-schedule.txt | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'Documentation') diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index f1b0eb02aa1d..13f0bb3187bc 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -35,6 +35,13 @@ Who: Len Brown ---------------------------- +What: x86 "idle=mwait" cmdline param +When: 2012 +Why: simplify x86 idle code +Who: Len Brown + +---------------------------- + What: PRISM54 When: 2.6.34 -- cgit v1.2.1