summaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/DocBook/tracepoint.tmpl13
-rw-r--r--Documentation/RCU/NMI-RCU.txt39
-rw-r--r--Documentation/RCU/checklist.txt7
-rw-r--r--Documentation/RCU/lockdep.txt28
-rw-r--r--Documentation/RCU/whatisRCU.txt6
-rw-r--r--Documentation/block/biodoc.txt4
-rw-r--r--Documentation/input/multi-touch-protocol.txt23
-rw-r--r--Documentation/kernel-parameters.txt5
-rw-r--r--Documentation/kvm/api.txt164
-rw-r--r--Documentation/networking/timestamping.txt76
10 files changed, 298 insertions, 67 deletions
diff --git a/Documentation/DocBook/tracepoint.tmpl b/Documentation/DocBook/tracepoint.tmpl
index 8bca1d5cec09..e8473eae2a20 100644
--- a/Documentation/DocBook/tracepoint.tmpl
+++ b/Documentation/DocBook/tracepoint.tmpl
@@ -16,6 +16,15 @@
</address>
</affiliation>
</author>
+ <author>
+ <firstname>William</firstname>
+ <surname>Cohen</surname>
+ <affiliation>
+ <address>
+ <email>wcohen@redhat.com</email>
+ </address>
+ </affiliation>
+ </author>
</authorgroup>
<legalnotice>
@@ -91,4 +100,8 @@
!Iinclude/trace/events/signal.h
</chapter>
+ <chapter id="block">
+ <title>Block IO</title>
+!Iinclude/trace/events/block.h
+ </chapter>
</book>
diff --git a/Documentation/RCU/NMI-RCU.txt b/Documentation/RCU/NMI-RCU.txt
index a6d32e65d222..a8536cb88091 100644
--- a/Documentation/RCU/NMI-RCU.txt
+++ b/Documentation/RCU/NMI-RCU.txt
@@ -34,7 +34,7 @@ NMI handler.
cpu = smp_processor_id();
++nmi_count(cpu);
- if (!rcu_dereference(nmi_callback)(regs, cpu))
+ if (!rcu_dereference_sched(nmi_callback)(regs, cpu))
default_do_nmi(regs);
nmi_exit();
@@ -47,12 +47,13 @@ function pointer. If this handler returns zero, do_nmi() invokes the
default_do_nmi() function to handle a machine-specific NMI. Finally,
preemption is restored.
-Strictly speaking, rcu_dereference() is not needed, since this code runs
-only on i386, which does not need rcu_dereference() anyway. However,
-it is a good documentation aid, particularly for anyone attempting to
-do something similar on Alpha.
+In theory, rcu_dereference_sched() is not needed, since this code runs
+only on i386, which in theory does not need rcu_dereference_sched()
+anyway. However, in practice it is a good documentation aid, particularly
+for anyone attempting to do something similar on Alpha or on systems
+with aggressive optimizing compilers.
-Quick Quiz: Why might the rcu_dereference() be necessary on Alpha,
+Quick Quiz: Why might the rcu_dereference_sched() be necessary on Alpha,
given that the code referenced by the pointer is read-only?
@@ -99,17 +100,21 @@ invoke irq_enter() and irq_exit() on NMI entry and exit, respectively.
Answer to Quick Quiz
- Why might the rcu_dereference() be necessary on Alpha, given
+ Why might the rcu_dereference_sched() be necessary on Alpha, given
that the code referenced by the pointer is read-only?
Answer: The caller to set_nmi_callback() might well have
- initialized some data that is to be used by the
- new NMI handler. In this case, the rcu_dereference()
- would be needed, because otherwise a CPU that received
- an NMI just after the new handler was set might see
- the pointer to the new NMI handler, but the old
- pre-initialized version of the handler's data.
-
- More important, the rcu_dereference() makes it clear
- to someone reading the code that the pointer is being
- protected by RCU.
+ initialized some data that is to be used by the new NMI
+ handler. In this case, the rcu_dereference_sched() would
+ be needed, because otherwise a CPU that received an NMI
+ just after the new handler was set might see the pointer
+ to the new NMI handler, but the old pre-initialized
+ version of the handler's data.
+
+ This same sad story can happen on other CPUs when using
+ a compiler with aggressive pointer-value speculation
+ optimizations.
+
+ More important, the rcu_dereference_sched() makes it
+ clear to someone reading the code that the pointer is
+ being protected by RCU-sched.
diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
index cbc180f90194..790d1a812376 100644
--- a/Documentation/RCU/checklist.txt
+++ b/Documentation/RCU/checklist.txt
@@ -260,7 +260,8 @@ over a rather long period of time, but improvements are always welcome!
The reason that it is permissible to use RCU list-traversal
primitives when the update-side lock is held is that doing so
can be quite helpful in reducing code bloat when common code is
- shared between readers and updaters.
+ shared between readers and updaters. Additional primitives
+ are provided for this case, as discussed in lockdep.txt.
10. Conversely, if you are in an RCU read-side critical section,
and you don't hold the appropriate update-side lock, you -must-
@@ -344,8 +345,8 @@ over a rather long period of time, but improvements are always welcome!
requiring SRCU's read-side deadlock immunity or low read-side
realtime latency.
- Note that, rcu_assign_pointer() and rcu_dereference() relate to
- SRCU just as they do to other forms of RCU.
+ Note that, rcu_assign_pointer() relates to SRCU just as they do
+ to other forms of RCU.
15. The whole point of call_rcu(), synchronize_rcu(), and friends
is to wait until all pre-existing readers have finished before
diff --git a/Documentation/RCU/lockdep.txt b/Documentation/RCU/lockdep.txt
index fe24b58627bd..d7a49b2f6994 100644
--- a/Documentation/RCU/lockdep.txt
+++ b/Documentation/RCU/lockdep.txt
@@ -32,9 +32,20 @@ checking of rcu_dereference() primitives:
srcu_dereference(p, sp):
Check for SRCU read-side critical section.
rcu_dereference_check(p, c):
- Use explicit check expression "c".
+ Use explicit check expression "c". This is useful in
+ code that is invoked by both readers and updaters.
rcu_dereference_raw(p)
Don't check. (Use sparingly, if at all.)
+ rcu_dereference_protected(p, c):
+ Use explicit check expression "c", and omit all barriers
+ and compiler constraints. This is useful when the data
+ structure cannot change, for example, in code that is
+ invoked only by updaters.
+ rcu_access_pointer(p):
+ Return the value of the pointer and omit all barriers,
+ but retain the compiler constraints that prevent duplicating
+ or coalescsing. This is useful when when testing the
+ value of the pointer itself, for example, against NULL.
The rcu_dereference_check() check expression can be any boolean
expression, but would normally include one of the rcu_read_lock_held()
@@ -59,7 +70,20 @@ In case (1), the pointer is picked up in an RCU-safe manner for vanilla
RCU read-side critical sections, in case (2) the ->file_lock prevents
any change from taking place, and finally, in case (3) the current task
is the only task accessing the file_struct, again preventing any change
-from taking place.
+from taking place. If the above statement was invoked only from updater
+code, it could instead be written as follows:
+
+ file = rcu_dereference_protected(fdt->fd[fd],
+ lockdep_is_held(&files->file_lock) ||
+ atomic_read(&files->count) == 1);
+
+This would verify cases #2 and #3 above, and furthermore lockdep would
+complain if this was used in an RCU read-side critical section unless one
+of these two cases held. Because rcu_dereference_protected() omits all
+barriers and compiler constraints, it generates better code than do the
+other flavors of rcu_dereference(). On the other hand, it is illegal
+to use rcu_dereference_protected() if either the RCU-protected pointer
+or the RCU-protected data that it points to can change concurrently.
There are currently only "universal" versions of the rcu_assign_pointer()
and RCU list-/tree-traversal primitives, which do not (yet) check for
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
index 1dc00ee97163..cfaac34c4557 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.txt
@@ -840,6 +840,12 @@ SRCU: Initialization/cleanup
init_srcu_struct
cleanup_srcu_struct
+All: lockdep-checked RCU-protected pointer access
+
+ rcu_dereference_check
+ rcu_dereference_protected
+ rcu_access_pointer
+
See the comment headers in the source code (or the docbook generated
from them) for more information.
diff --git a/Documentation/block/biodoc.txt b/Documentation/block/biodoc.txt
index 6fab97ea7e6b..508b5b2b0289 100644
--- a/Documentation/block/biodoc.txt
+++ b/Documentation/block/biodoc.txt
@@ -1162,8 +1162,8 @@ where a driver received a request ala this before:
As mentioned, there is no virtual mapping of a bio. For DMA, this is
not a problem as the driver probably never will need a virtual mapping.
-Instead it needs a bus mapping (pci_map_page for a single segment or
-use blk_rq_map_sg for scatter gather) to be able to ship it to the driver. For
+Instead it needs a bus mapping (dma_map_page for a single segment or
+use dma_map_sg for scatter gather) to be able to ship it to the driver. For
PIO drivers (or drivers that need to revert to PIO transfer once in a
while (IDE for example)), where the CPU is doing the actual data
transfer a virtual mapping is needed. If the driver supports highmem I/O,
diff --git a/Documentation/input/multi-touch-protocol.txt b/Documentation/input/multi-touch-protocol.txt
index 8490480ce432..c0fc1c75fd88 100644
--- a/Documentation/input/multi-touch-protocol.txt
+++ b/Documentation/input/multi-touch-protocol.txt
@@ -68,6 +68,22 @@ like:
SYN_MT_REPORT
SYN_REPORT
+Here is the sequence after lifting one of the fingers:
+
+ ABS_MT_POSITION_X
+ ABS_MT_POSITION_Y
+ SYN_MT_REPORT
+ SYN_REPORT
+
+And here is the sequence after lifting the remaining finger:
+
+ SYN_MT_REPORT
+ SYN_REPORT
+
+If the driver reports one of BTN_TOUCH or ABS_PRESSURE in addition to the
+ABS_MT events, the last SYN_MT_REPORT event may be omitted. Otherwise, the
+last SYN_REPORT will be dropped by the input core, resulting in no
+zero-finger event reaching userland.
Event Semantics
---------------
@@ -217,11 +233,6 @@ where examples can be found.
difference between the contact position and the approaching tool position
could be used to derive tilt.
[2] The list can of course be extended.
-[3] The multi-touch X driver is currently in the prototyping stage. At the
-time of writing (April 2009), the MT protocol is not yet merged, and the
-prototype implements finger matching, basic mouse support and two-finger
-scrolling. The project aims at improving the quality of current multi-touch
-functionality available in the Synaptics X driver, and in addition
-implement more advanced gestures.
+[3] Multitouch X driver project: http://bitmath.org/code/multitouch/.
[4] See the section on event computation.
[5] See the section on finger tracking.
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index e4cbca58536c..e2202e93b148 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -320,11 +320,6 @@ and is between 256 and 4096 characters. It is defined in the file
amd_iommu= [HW,X86-84]
Pass parameters to the AMD IOMMU driver in the system.
Possible values are:
- isolate - enable device isolation (each device, as far
- as possible, will get its own protection
- domain) [default]
- share - put every device behind one IOMMU into the
- same protection domain
fullflush - enable flushing of IO/TLB entries when
they are unmapped. Otherwise they are
flushed before they will be reused, which
diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index c6416a398163..baa8fde8bd16 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -656,6 +656,7 @@ struct kvm_clock_data {
4.29 KVM_GET_VCPU_EVENTS
Capability: KVM_CAP_VCPU_EVENTS
+Extended by: KVM_CAP_INTR_SHADOW
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_vcpu_event (out)
@@ -676,7 +677,7 @@ struct kvm_vcpu_events {
__u8 injected;
__u8 nr;
__u8 soft;
- __u8 pad;
+ __u8 shadow;
} interrupt;
struct {
__u8 injected;
@@ -688,9 +689,13 @@ struct kvm_vcpu_events {
__u32 flags;
};
+KVM_VCPUEVENT_VALID_SHADOW may be set in the flags field to signal that
+interrupt.shadow contains a valid state. Otherwise, this field is undefined.
+
4.30 KVM_SET_VCPU_EVENTS
Capability: KVM_CAP_VCPU_EVENTS
+Extended by: KVM_CAP_INTR_SHADOW
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_vcpu_event (in)
@@ -709,6 +714,139 @@ current in-kernel state. The bits are:
KVM_VCPUEVENT_VALID_NMI_PENDING - transfer nmi.pending to the kernel
KVM_VCPUEVENT_VALID_SIPI_VECTOR - transfer sipi_vector
+If KVM_CAP_INTR_SHADOW is available, KVM_VCPUEVENT_VALID_SHADOW can be set in
+the flags field to signal that interrupt.shadow contains a valid state and
+shall be written into the VCPU.
+
+4.32 KVM_GET_DEBUGREGS
+
+Capability: KVM_CAP_DEBUGREGS
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_debugregs (out)
+Returns: 0 on success, -1 on error
+
+Reads debug registers from the vcpu.
+
+struct kvm_debugregs {
+ __u64 db[4];
+ __u64 dr6;
+ __u64 dr7;
+ __u64 flags;
+ __u64 reserved[9];
+};
+
+4.33 KVM_SET_DEBUGREGS
+
+Capability: KVM_CAP_DEBUGREGS
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_debugregs (in)
+Returns: 0 on success, -1 on error
+
+Writes debug registers into the vcpu.
+
+See KVM_GET_DEBUGREGS for the data structure. The flags field is unused
+yet and must be cleared on entry.
+
+4.34 KVM_SET_USER_MEMORY_REGION
+
+Capability: KVM_CAP_USER_MEM
+Architectures: all
+Type: vm ioctl
+Parameters: struct kvm_userspace_memory_region (in)
+Returns: 0 on success, -1 on error
+
+struct kvm_userspace_memory_region {
+ __u32 slot;
+ __u32 flags;
+ __u64 guest_phys_addr;
+ __u64 memory_size; /* bytes */
+ __u64 userspace_addr; /* start of the userspace allocated memory */
+};
+
+/* for kvm_memory_region::flags */
+#define KVM_MEM_LOG_DIRTY_PAGES 1UL
+
+This ioctl allows the user to create or modify a guest physical memory
+slot. When changing an existing slot, it may be moved in the guest
+physical memory space, or its flags may be modified. It may not be
+resized. Slots may not overlap in guest physical address space.
+
+Memory for the region is taken starting at the address denoted by the
+field userspace_addr, which must point at user addressable memory for
+the entire memory slot size. Any object may back this memory, including
+anonymous memory, ordinary files, and hugetlbfs.
+
+It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
+be identical. This allows large pages in the guest to be backed by large
+pages in the host.
+
+The flags field supports just one flag, KVM_MEM_LOG_DIRTY_PAGES, which
+instructs kvm to keep track of writes to memory within the slot. See
+the KVM_GET_DIRTY_LOG ioctl.
+
+When the KVM_CAP_SYNC_MMU capability, changes in the backing of the memory
+region are automatically reflected into the guest. For example, an mmap()
+that affects the region will be made visible immediately. Another example
+is madvise(MADV_DROP).
+
+It is recommended to use this API instead of the KVM_SET_MEMORY_REGION ioctl.
+The KVM_SET_MEMORY_REGION does not allow fine grained control over memory
+allocation and is deprecated.
+
+4.35 KVM_SET_TSS_ADDR
+
+Capability: KVM_CAP_SET_TSS_ADDR
+Architectures: x86
+Type: vm ioctl
+Parameters: unsigned long tss_address (in)
+Returns: 0 on success, -1 on error
+
+This ioctl defines the physical address of a three-page region in the guest
+physical address space. The region must be within the first 4GB of the
+guest physical address space and must not conflict with any memory slot
+or any mmio address. The guest may malfunction if it accesses this memory
+region.
+
+This ioctl is required on Intel-based hosts. This is needed on Intel hardware
+because of a quirk in the virtualization implementation (see the internals
+documentation when it pops into existence).
+
+4.36 KVM_ENABLE_CAP
+
+Capability: KVM_CAP_ENABLE_CAP
+Architectures: ppc
+Type: vcpu ioctl
+Parameters: struct kvm_enable_cap (in)
+Returns: 0 on success; -1 on error
+
++Not all extensions are enabled by default. Using this ioctl the application
+can enable an extension, making it available to the guest.
+
+On systems that do not support this ioctl, it always fails. On systems that
+do support it, it only works for extensions that are supported for enablement.
+
+To check if a capability can be enabled, the KVM_CHECK_EXTENSION ioctl should
+be used.
+
+struct kvm_enable_cap {
+ /* in */
+ __u32 cap;
+
+The capability that is supposed to get enabled.
+
+ __u32 flags;
+
+A bitfield indicating future enhancements. Has to be 0 for now.
+
+ __u64 args[4];
+
+Arguments for enabling a feature. If a feature needs initial values to
+function properly, this is the place to put them.
+
+ __u8 pad[64];
+};
5. The kvm_run structure
@@ -820,6 +958,13 @@ executed a memory-mapped I/O instruction which could not be satisfied
by kvm. The 'data' member contains the written data if 'is_write' is
true, and should be filled by application code otherwise.
+NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO and KVM_EXIT_OSI, the corresponding
+operations are complete (and guest state is consistent) only after userspace
+has re-entered the kernel with KVM_RUN. The kernel side will first finish
+incomplete operations and then check for pending signals. Userspace
+can re-enter the guest with an unmasked signal pending to complete
+pending operations.
+
/* KVM_EXIT_HYPERCALL */
struct {
__u64 nr;
@@ -829,7 +974,9 @@ true, and should be filled by application code otherwise.
__u32 pad;
} hypercall;
-Unused.
+Unused. This was once used for 'hypercall to userspace'. To implement
+such functionality, use KVM_EXIT_IO (x86) or KVM_EXIT_MMIO (all except s390).
+Note KVM_EXIT_IO is significantly faster than KVM_EXIT_MMIO.
/* KVM_EXIT_TPR_ACCESS */
struct {
@@ -870,6 +1017,19 @@ s390 specific.
powerpc specific.
+ /* KVM_EXIT_OSI */
+ struct {
+ __u64 gprs[32];
+ } osi;
+
+MOL uses a special hypercall interface it calls 'OSI'. To enable it, we catch
+hypercalls and exit with this exit struct that contains all the guest gprs.
+
+If exit_reason is KVM_EXIT_OSI, then the vcpu has triggered such a hypercall.
+Userspace can now handle the hypercall and when it's done modify the gprs as
+necessary. Upon guest entry all guest GPRs will then be replaced by the values
+in this struct.
+
/* Fix the size of the union. */
char padding[256];
};
diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt
index 0e58b4539176..e8c8f4f06c67 100644
--- a/Documentation/networking/timestamping.txt
+++ b/Documentation/networking/timestamping.txt
@@ -41,11 +41,12 @@ SOF_TIMESTAMPING_SOFTWARE: return system time stamp generated in
SOF_TIMESTAMPING_TX/RX determine how time stamps are generated.
SOF_TIMESTAMPING_RAW/SYS determine how they are reported in the
following control message:
- struct scm_timestamping {
- struct timespec systime;
- struct timespec hwtimetrans;
- struct timespec hwtimeraw;
- };
+
+struct scm_timestamping {
+ struct timespec systime;
+ struct timespec hwtimetrans;
+ struct timespec hwtimeraw;
+};
recvmsg() can be used to get this control message for regular incoming
packets. For send time stamps the outgoing packet is looped back to
@@ -87,12 +88,13 @@ by the network device and will be empty without that support.
SIOCSHWTSTAMP:
Hardware time stamping must also be initialized for each device driver
-that is expected to do hardware time stamping. The parameter is:
+that is expected to do hardware time stamping. The parameter is defined in
+/include/linux/net_tstamp.h as:
struct hwtstamp_config {
- int flags; /* no flags defined right now, must be zero */
- int tx_type; /* HWTSTAMP_TX_* */
- int rx_filter; /* HWTSTAMP_FILTER_* */
+ int flags; /* no flags defined right now, must be zero */
+ int tx_type; /* HWTSTAMP_TX_* */
+ int rx_filter; /* HWTSTAMP_FILTER_* */
};
Desired behavior is passed into the kernel and to a specific device by
@@ -139,42 +141,56 @@ enum {
/* time stamp any incoming packet */
HWTSTAMP_FILTER_ALL,
- /* return value: time stamp all packets requested plus some others */
- HWTSTAMP_FILTER_SOME,
+ /* return value: time stamp all packets requested plus some others */
+ HWTSTAMP_FILTER_SOME,
/* PTP v1, UDP, any kind of event packet */
HWTSTAMP_FILTER_PTP_V1_L4_EVENT,
- ...
+ /* for the complete list of values, please check
+ * the include file /include/linux/net_tstamp.h
+ */
};
DEVICE IMPLEMENTATION
A driver which supports hardware time stamping must support the
-SIOCSHWTSTAMP ioctl. Time stamps for received packets must be stored
-in the skb with skb_hwtstamp_set().
+SIOCSHWTSTAMP ioctl and update the supplied struct hwtstamp_config with
+the actual values as described in the section on SIOCSHWTSTAMP.
+
+Time stamps for received packets must be stored in the skb. To get a pointer
+to the shared time stamp structure of the skb call skb_hwtstamps(). Then
+set the time stamps in the structure:
+
+struct skb_shared_hwtstamps {
+ /* hardware time stamp transformed into duration
+ * since arbitrary point in time
+ */
+ ktime_t hwtstamp;
+ ktime_t syststamp; /* hwtstamp transformed to system time base */
+};
Time stamps for outgoing packets are to be generated as follows:
-- In hard_start_xmit(), check if skb_hwtstamp_check_tx_hardware()
- returns non-zero. If yes, then the driver is expected
- to do hardware time stamping.
+- In hard_start_xmit(), check if skb_tx(skb)->hardware is set no-zero.
+ If yes, then the driver is expected to do hardware time stamping.
- If this is possible for the skb and requested, then declare
- that the driver is doing the time stamping by calling
- skb_hwtstamp_tx_in_progress(). A driver not supporting
- hardware time stamping doesn't do that. A driver must never
- touch sk_buff::tstamp! It is used to store how time stamping
- for an outgoing packets is to be done.
+ that the driver is doing the time stamping by setting the field
+ skb_tx(skb)->in_progress non-zero. You might want to keep a pointer
+ to the associated skb for the next step and not free the skb. A driver
+ not supporting hardware time stamping doesn't do that. A driver must
+ never touch sk_buff::tstamp! It is used to store software generated
+ time stamps by the network subsystem.
- As soon as the driver has sent the packet and/or obtained a
hardware time stamp for it, it passes the time stamp back by
calling skb_hwtstamp_tx() with the original skb, the raw
- hardware time stamp and a handle to the device (necessary
- to convert the hardware time stamp to system time). If obtaining
- the hardware time stamp somehow fails, then the driver should
- not fall back to software time stamping. The rationale is that
- this would occur at a later time in the processing pipeline
- than other software time stamping and therefore could lead
- to unexpected deltas between time stamps.
-- If the driver did not call skb_hwtstamp_tx_in_progress(), then
+ hardware time stamp. skb_hwtstamp_tx() clones the original skb and
+ adds the timestamps, therefore the original skb has to be freed now.
+ If obtaining the hardware time stamp somehow fails, then the driver
+ should not fall back to software time stamping. The rationale is that
+ this would occur at a later time in the processing pipeline than other
+ software time stamping and therefore could lead to unexpected deltas
+ between time stamps.
+- If the driver did not call set skb_tx(skb)->in_progress, then
dev_hard_start_xmit() checks whether software time stamping
is wanted as fallback and potentially generates the time stamp.
OpenPOWER on IntegriCloud