diff options
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/DocBook/tracepoint.tmpl | 13 | ||||
-rw-r--r-- | Documentation/RCU/NMI-RCU.txt | 39 | ||||
-rw-r--r-- | Documentation/RCU/checklist.txt | 7 | ||||
-rw-r--r-- | Documentation/RCU/lockdep.txt | 28 | ||||
-rw-r--r-- | Documentation/RCU/whatisRCU.txt | 6 | ||||
-rw-r--r-- | Documentation/block/biodoc.txt | 4 | ||||
-rw-r--r-- | Documentation/input/multi-touch-protocol.txt | 23 | ||||
-rw-r--r-- | Documentation/kernel-parameters.txt | 5 | ||||
-rw-r--r-- | Documentation/kvm/api.txt | 164 | ||||
-rw-r--r-- | Documentation/networking/timestamping.txt | 76 |
10 files changed, 298 insertions, 67 deletions
diff --git a/Documentation/DocBook/tracepoint.tmpl b/Documentation/DocBook/tracepoint.tmpl index 8bca1d5cec09..e8473eae2a20 100644 --- a/Documentation/DocBook/tracepoint.tmpl +++ b/Documentation/DocBook/tracepoint.tmpl @@ -16,6 +16,15 @@ </address> </affiliation> </author> + <author> + <firstname>William</firstname> + <surname>Cohen</surname> + <affiliation> + <address> + <email>wcohen@redhat.com</email> + </address> + </affiliation> + </author> </authorgroup> <legalnotice> @@ -91,4 +100,8 @@ !Iinclude/trace/events/signal.h </chapter> + <chapter id="block"> + <title>Block IO</title> +!Iinclude/trace/events/block.h + </chapter> </book> diff --git a/Documentation/RCU/NMI-RCU.txt b/Documentation/RCU/NMI-RCU.txt index a6d32e65d222..a8536cb88091 100644 --- a/Documentation/RCU/NMI-RCU.txt +++ b/Documentation/RCU/NMI-RCU.txt @@ -34,7 +34,7 @@ NMI handler. cpu = smp_processor_id(); ++nmi_count(cpu); - if (!rcu_dereference(nmi_callback)(regs, cpu)) + if (!rcu_dereference_sched(nmi_callback)(regs, cpu)) default_do_nmi(regs); nmi_exit(); @@ -47,12 +47,13 @@ function pointer. If this handler returns zero, do_nmi() invokes the default_do_nmi() function to handle a machine-specific NMI. Finally, preemption is restored. -Strictly speaking, rcu_dereference() is not needed, since this code runs -only on i386, which does not need rcu_dereference() anyway. However, -it is a good documentation aid, particularly for anyone attempting to -do something similar on Alpha. +In theory, rcu_dereference_sched() is not needed, since this code runs +only on i386, which in theory does not need rcu_dereference_sched() +anyway. However, in practice it is a good documentation aid, particularly +for anyone attempting to do something similar on Alpha or on systems +with aggressive optimizing compilers. -Quick Quiz: Why might the rcu_dereference() be necessary on Alpha, +Quick Quiz: Why might the rcu_dereference_sched() be necessary on Alpha, given that the code referenced by the pointer is read-only? @@ -99,17 +100,21 @@ invoke irq_enter() and irq_exit() on NMI entry and exit, respectively. Answer to Quick Quiz - Why might the rcu_dereference() be necessary on Alpha, given + Why might the rcu_dereference_sched() be necessary on Alpha, given that the code referenced by the pointer is read-only? Answer: The caller to set_nmi_callback() might well have - initialized some data that is to be used by the - new NMI handler. In this case, the rcu_dereference() - would be needed, because otherwise a CPU that received - an NMI just after the new handler was set might see - the pointer to the new NMI handler, but the old - pre-initialized version of the handler's data. - - More important, the rcu_dereference() makes it clear - to someone reading the code that the pointer is being - protected by RCU. + initialized some data that is to be used by the new NMI + handler. In this case, the rcu_dereference_sched() would + be needed, because otherwise a CPU that received an NMI + just after the new handler was set might see the pointer + to the new NMI handler, but the old pre-initialized + version of the handler's data. + + This same sad story can happen on other CPUs when using + a compiler with aggressive pointer-value speculation + optimizations. + + More important, the rcu_dereference_sched() makes it + clear to someone reading the code that the pointer is + being protected by RCU-sched. diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt index cbc180f90194..790d1a812376 100644 --- a/Documentation/RCU/checklist.txt +++ b/Documentation/RCU/checklist.txt @@ -260,7 +260,8 @@ over a rather long period of time, but improvements are always welcome! The reason that it is permissible to use RCU list-traversal primitives when the update-side lock is held is that doing so can be quite helpful in reducing code bloat when common code is - shared between readers and updaters. + shared between readers and updaters. Additional primitives + are provided for this case, as discussed in lockdep.txt. 10. Conversely, if you are in an RCU read-side critical section, and you don't hold the appropriate update-side lock, you -must- @@ -344,8 +345,8 @@ over a rather long period of time, but improvements are always welcome! requiring SRCU's read-side deadlock immunity or low read-side realtime latency. - Note that, rcu_assign_pointer() and rcu_dereference() relate to - SRCU just as they do to other forms of RCU. + Note that, rcu_assign_pointer() relates to SRCU just as they do + to other forms of RCU. 15. The whole point of call_rcu(), synchronize_rcu(), and friends is to wait until all pre-existing readers have finished before diff --git a/Documentation/RCU/lockdep.txt b/Documentation/RCU/lockdep.txt index fe24b58627bd..d7a49b2f6994 100644 --- a/Documentation/RCU/lockdep.txt +++ b/Documentation/RCU/lockdep.txt @@ -32,9 +32,20 @@ checking of rcu_dereference() primitives: srcu_dereference(p, sp): Check for SRCU read-side critical section. rcu_dereference_check(p, c): - Use explicit check expression "c". + Use explicit check expression "c". This is useful in + code that is invoked by both readers and updaters. rcu_dereference_raw(p) Don't check. (Use sparingly, if at all.) + rcu_dereference_protected(p, c): + Use explicit check expression "c", and omit all barriers + and compiler constraints. This is useful when the data + structure cannot change, for example, in code that is + invoked only by updaters. + rcu_access_pointer(p): + Return the value of the pointer and omit all barriers, + but retain the compiler constraints that prevent duplicating + or coalescsing. This is useful when when testing the + value of the pointer itself, for example, against NULL. The rcu_dereference_check() check expression can be any boolean expression, but would normally include one of the rcu_read_lock_held() @@ -59,7 +70,20 @@ In case (1), the pointer is picked up in an RCU-safe manner for vanilla RCU read-side critical sections, in case (2) the ->file_lock prevents any change from taking place, and finally, in case (3) the current task is the only task accessing the file_struct, again preventing any change -from taking place. +from taking place. If the above statement was invoked only from updater +code, it could instead be written as follows: + + file = rcu_dereference_protected(fdt->fd[fd], + lockdep_is_held(&files->file_lock) || + atomic_read(&files->count) == 1); + +This would verify cases #2 and #3 above, and furthermore lockdep would +complain if this was used in an RCU read-side critical section unless one +of these two cases held. Because rcu_dereference_protected() omits all +barriers and compiler constraints, it generates better code than do the +other flavors of rcu_dereference(). On the other hand, it is illegal +to use rcu_dereference_protected() if either the RCU-protected pointer +or the RCU-protected data that it points to can change concurrently. There are currently only "universal" versions of the rcu_assign_pointer() and RCU list-/tree-traversal primitives, which do not (yet) check for diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt index 1dc00ee97163..cfaac34c4557 100644 --- a/Documentation/RCU/whatisRCU.txt +++ b/Documentation/RCU/whatisRCU.txt @@ -840,6 +840,12 @@ SRCU: Initialization/cleanup init_srcu_struct cleanup_srcu_struct +All: lockdep-checked RCU-protected pointer access + + rcu_dereference_check + rcu_dereference_protected + rcu_access_pointer + See the comment headers in the source code (or the docbook generated from them) for more information. diff --git a/Documentation/block/biodoc.txt b/Documentation/block/biodoc.txt index 6fab97ea7e6b..508b5b2b0289 100644 --- a/Documentation/block/biodoc.txt +++ b/Documentation/block/biodoc.txt @@ -1162,8 +1162,8 @@ where a driver received a request ala this before: As mentioned, there is no virtual mapping of a bio. For DMA, this is not a problem as the driver probably never will need a virtual mapping. -Instead it needs a bus mapping (pci_map_page for a single segment or -use blk_rq_map_sg for scatter gather) to be able to ship it to the driver. For +Instead it needs a bus mapping (dma_map_page for a single segment or +use dma_map_sg for scatter gather) to be able to ship it to the driver. For PIO drivers (or drivers that need to revert to PIO transfer once in a while (IDE for example)), where the CPU is doing the actual data transfer a virtual mapping is needed. If the driver supports highmem I/O, diff --git a/Documentation/input/multi-touch-protocol.txt b/Documentation/input/multi-touch-protocol.txt index 8490480ce432..c0fc1c75fd88 100644 --- a/Documentation/input/multi-touch-protocol.txt +++ b/Documentation/input/multi-touch-protocol.txt @@ -68,6 +68,22 @@ like: SYN_MT_REPORT SYN_REPORT +Here is the sequence after lifting one of the fingers: + + ABS_MT_POSITION_X + ABS_MT_POSITION_Y + SYN_MT_REPORT + SYN_REPORT + +And here is the sequence after lifting the remaining finger: + + SYN_MT_REPORT + SYN_REPORT + +If the driver reports one of BTN_TOUCH or ABS_PRESSURE in addition to the +ABS_MT events, the last SYN_MT_REPORT event may be omitted. Otherwise, the +last SYN_REPORT will be dropped by the input core, resulting in no +zero-finger event reaching userland. Event Semantics --------------- @@ -217,11 +233,6 @@ where examples can be found. difference between the contact position and the approaching tool position could be used to derive tilt. [2] The list can of course be extended. -[3] The multi-touch X driver is currently in the prototyping stage. At the -time of writing (April 2009), the MT protocol is not yet merged, and the -prototype implements finger matching, basic mouse support and two-finger -scrolling. The project aims at improving the quality of current multi-touch -functionality available in the Synaptics X driver, and in addition -implement more advanced gestures. +[3] Multitouch X driver project: http://bitmath.org/code/multitouch/. [4] See the section on event computation. [5] See the section on finger tracking. diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index e4cbca58536c..e2202e93b148 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -320,11 +320,6 @@ and is between 256 and 4096 characters. It is defined in the file amd_iommu= [HW,X86-84] Pass parameters to the AMD IOMMU driver in the system. Possible values are: - isolate - enable device isolation (each device, as far - as possible, will get its own protection - domain) [default] - share - put every device behind one IOMMU into the - same protection domain fullflush - enable flushing of IO/TLB entries when they are unmapped. Otherwise they are flushed before they will be reused, which diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt index c6416a398163..baa8fde8bd16 100644 --- a/Documentation/kvm/api.txt +++ b/Documentation/kvm/api.txt @@ -656,6 +656,7 @@ struct kvm_clock_data { 4.29 KVM_GET_VCPU_EVENTS Capability: KVM_CAP_VCPU_EVENTS +Extended by: KVM_CAP_INTR_SHADOW Architectures: x86 Type: vm ioctl Parameters: struct kvm_vcpu_event (out) @@ -676,7 +677,7 @@ struct kvm_vcpu_events { __u8 injected; __u8 nr; __u8 soft; - __u8 pad; + __u8 shadow; } interrupt; struct { __u8 injected; @@ -688,9 +689,13 @@ struct kvm_vcpu_events { __u32 flags; }; +KVM_VCPUEVENT_VALID_SHADOW may be set in the flags field to signal that +interrupt.shadow contains a valid state. Otherwise, this field is undefined. + 4.30 KVM_SET_VCPU_EVENTS Capability: KVM_CAP_VCPU_EVENTS +Extended by: KVM_CAP_INTR_SHADOW Architectures: x86 Type: vm ioctl Parameters: struct kvm_vcpu_event (in) @@ -709,6 +714,139 @@ current in-kernel state. The bits are: KVM_VCPUEVENT_VALID_NMI_PENDING - transfer nmi.pending to the kernel KVM_VCPUEVENT_VALID_SIPI_VECTOR - transfer sipi_vector +If KVM_CAP_INTR_SHADOW is available, KVM_VCPUEVENT_VALID_SHADOW can be set in +the flags field to signal that interrupt.shadow contains a valid state and +shall be written into the VCPU. + +4.32 KVM_GET_DEBUGREGS + +Capability: KVM_CAP_DEBUGREGS +Architectures: x86 +Type: vm ioctl +Parameters: struct kvm_debugregs (out) +Returns: 0 on success, -1 on error + +Reads debug registers from the vcpu. + +struct kvm_debugregs { + __u64 db[4]; + __u64 dr6; + __u64 dr7; + __u64 flags; + __u64 reserved[9]; +}; + +4.33 KVM_SET_DEBUGREGS + +Capability: KVM_CAP_DEBUGREGS +Architectures: x86 +Type: vm ioctl +Parameters: struct kvm_debugregs (in) +Returns: 0 on success, -1 on error + +Writes debug registers into the vcpu. + +See KVM_GET_DEBUGREGS for the data structure. The flags field is unused +yet and must be cleared on entry. + +4.34 KVM_SET_USER_MEMORY_REGION + +Capability: KVM_CAP_USER_MEM +Architectures: all +Type: vm ioctl +Parameters: struct kvm_userspace_memory_region (in) +Returns: 0 on success, -1 on error + +struct kvm_userspace_memory_region { + __u32 slot; + __u32 flags; + __u64 guest_phys_addr; + __u64 memory_size; /* bytes */ + __u64 userspace_addr; /* start of the userspace allocated memory */ +}; + +/* for kvm_memory_region::flags */ +#define KVM_MEM_LOG_DIRTY_PAGES 1UL + +This ioctl allows the user to create or modify a guest physical memory +slot. When changing an existing slot, it may be moved in the guest +physical memory space, or its flags may be modified. It may not be +resized. Slots may not overlap in guest physical address space. + +Memory for the region is taken starting at the address denoted by the +field userspace_addr, which must point at user addressable memory for +the entire memory slot size. Any object may back this memory, including +anonymous memory, ordinary files, and hugetlbfs. + +It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr +be identical. This allows large pages in the guest to be backed by large +pages in the host. + +The flags field supports just one flag, KVM_MEM_LOG_DIRTY_PAGES, which +instructs kvm to keep track of writes to memory within the slot. See +the KVM_GET_DIRTY_LOG ioctl. + +When the KVM_CAP_SYNC_MMU capability, changes in the backing of the memory +region are automatically reflected into the guest. For example, an mmap() +that affects the region will be made visible immediately. Another example +is madvise(MADV_DROP). + +It is recommended to use this API instead of the KVM_SET_MEMORY_REGION ioctl. +The KVM_SET_MEMORY_REGION does not allow fine grained control over memory +allocation and is deprecated. + +4.35 KVM_SET_TSS_ADDR + +Capability: KVM_CAP_SET_TSS_ADDR +Architectures: x86 +Type: vm ioctl +Parameters: unsigned long tss_address (in) +Returns: 0 on success, -1 on error + +This ioctl defines the physical address of a three-page region in the guest +physical address space. The region must be within the first 4GB of the +guest physical address space and must not conflict with any memory slot +or any mmio address. The guest may malfunction if it accesses this memory +region. + +This ioctl is required on Intel-based hosts. This is needed on Intel hardware +because of a quirk in the virtualization implementation (see the internals +documentation when it pops into existence). + +4.36 KVM_ENABLE_CAP + +Capability: KVM_CAP_ENABLE_CAP +Architectures: ppc +Type: vcpu ioctl +Parameters: struct kvm_enable_cap (in) +Returns: 0 on success; -1 on error + ++Not all extensions are enabled by default. Using this ioctl the application +can enable an extension, making it available to the guest. + +On systems that do not support this ioctl, it always fails. On systems that +do support it, it only works for extensions that are supported for enablement. + +To check if a capability can be enabled, the KVM_CHECK_EXTENSION ioctl should +be used. + +struct kvm_enable_cap { + /* in */ + __u32 cap; + +The capability that is supposed to get enabled. + + __u32 flags; + +A bitfield indicating future enhancements. Has to be 0 for now. + + __u64 args[4]; + +Arguments for enabling a feature. If a feature needs initial values to +function properly, this is the place to put them. + + __u8 pad[64]; +}; 5. The kvm_run structure @@ -820,6 +958,13 @@ executed a memory-mapped I/O instruction which could not be satisfied by kvm. The 'data' member contains the written data if 'is_write' is true, and should be filled by application code otherwise. +NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO and KVM_EXIT_OSI, the corresponding +operations are complete (and guest state is consistent) only after userspace +has re-entered the kernel with KVM_RUN. The kernel side will first finish +incomplete operations and then check for pending signals. Userspace +can re-enter the guest with an unmasked signal pending to complete +pending operations. + /* KVM_EXIT_HYPERCALL */ struct { __u64 nr; @@ -829,7 +974,9 @@ true, and should be filled by application code otherwise. __u32 pad; } hypercall; -Unused. +Unused. This was once used for 'hypercall to userspace'. To implement +such functionality, use KVM_EXIT_IO (x86) or KVM_EXIT_MMIO (all except s390). +Note KVM_EXIT_IO is significantly faster than KVM_EXIT_MMIO. /* KVM_EXIT_TPR_ACCESS */ struct { @@ -870,6 +1017,19 @@ s390 specific. powerpc specific. + /* KVM_EXIT_OSI */ + struct { + __u64 gprs[32]; + } osi; + +MOL uses a special hypercall interface it calls 'OSI'. To enable it, we catch +hypercalls and exit with this exit struct that contains all the guest gprs. + +If exit_reason is KVM_EXIT_OSI, then the vcpu has triggered such a hypercall. +Userspace can now handle the hypercall and when it's done modify the gprs as +necessary. Upon guest entry all guest GPRs will then be replaced by the values +in this struct. + /* Fix the size of the union. */ char padding[256]; }; diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt index 0e58b4539176..e8c8f4f06c67 100644 --- a/Documentation/networking/timestamping.txt +++ b/Documentation/networking/timestamping.txt @@ -41,11 +41,12 @@ SOF_TIMESTAMPING_SOFTWARE: return system time stamp generated in SOF_TIMESTAMPING_TX/RX determine how time stamps are generated. SOF_TIMESTAMPING_RAW/SYS determine how they are reported in the following control message: - struct scm_timestamping { - struct timespec systime; - struct timespec hwtimetrans; - struct timespec hwtimeraw; - }; + +struct scm_timestamping { + struct timespec systime; + struct timespec hwtimetrans; + struct timespec hwtimeraw; +}; recvmsg() can be used to get this control message for regular incoming packets. For send time stamps the outgoing packet is looped back to @@ -87,12 +88,13 @@ by the network device and will be empty without that support. SIOCSHWTSTAMP: Hardware time stamping must also be initialized for each device driver -that is expected to do hardware time stamping. The parameter is: +that is expected to do hardware time stamping. The parameter is defined in +/include/linux/net_tstamp.h as: struct hwtstamp_config { - int flags; /* no flags defined right now, must be zero */ - int tx_type; /* HWTSTAMP_TX_* */ - int rx_filter; /* HWTSTAMP_FILTER_* */ + int flags; /* no flags defined right now, must be zero */ + int tx_type; /* HWTSTAMP_TX_* */ + int rx_filter; /* HWTSTAMP_FILTER_* */ }; Desired behavior is passed into the kernel and to a specific device by @@ -139,42 +141,56 @@ enum { /* time stamp any incoming packet */ HWTSTAMP_FILTER_ALL, - /* return value: time stamp all packets requested plus some others */ - HWTSTAMP_FILTER_SOME, + /* return value: time stamp all packets requested plus some others */ + HWTSTAMP_FILTER_SOME, /* PTP v1, UDP, any kind of event packet */ HWTSTAMP_FILTER_PTP_V1_L4_EVENT, - ... + /* for the complete list of values, please check + * the include file /include/linux/net_tstamp.h + */ }; DEVICE IMPLEMENTATION A driver which supports hardware time stamping must support the -SIOCSHWTSTAMP ioctl. Time stamps for received packets must be stored -in the skb with skb_hwtstamp_set(). +SIOCSHWTSTAMP ioctl and update the supplied struct hwtstamp_config with +the actual values as described in the section on SIOCSHWTSTAMP. + +Time stamps for received packets must be stored in the skb. To get a pointer +to the shared time stamp structure of the skb call skb_hwtstamps(). Then +set the time stamps in the structure: + +struct skb_shared_hwtstamps { + /* hardware time stamp transformed into duration + * since arbitrary point in time + */ + ktime_t hwtstamp; + ktime_t syststamp; /* hwtstamp transformed to system time base */ +}; Time stamps for outgoing packets are to be generated as follows: -- In hard_start_xmit(), check if skb_hwtstamp_check_tx_hardware() - returns non-zero. If yes, then the driver is expected - to do hardware time stamping. +- In hard_start_xmit(), check if skb_tx(skb)->hardware is set no-zero. + If yes, then the driver is expected to do hardware time stamping. - If this is possible for the skb and requested, then declare - that the driver is doing the time stamping by calling - skb_hwtstamp_tx_in_progress(). A driver not supporting - hardware time stamping doesn't do that. A driver must never - touch sk_buff::tstamp! It is used to store how time stamping - for an outgoing packets is to be done. + that the driver is doing the time stamping by setting the field + skb_tx(skb)->in_progress non-zero. You might want to keep a pointer + to the associated skb for the next step and not free the skb. A driver + not supporting hardware time stamping doesn't do that. A driver must + never touch sk_buff::tstamp! It is used to store software generated + time stamps by the network subsystem. - As soon as the driver has sent the packet and/or obtained a hardware time stamp for it, it passes the time stamp back by calling skb_hwtstamp_tx() with the original skb, the raw - hardware time stamp and a handle to the device (necessary - to convert the hardware time stamp to system time). If obtaining - the hardware time stamp somehow fails, then the driver should - not fall back to software time stamping. The rationale is that - this would occur at a later time in the processing pipeline - than other software time stamping and therefore could lead - to unexpected deltas between time stamps. -- If the driver did not call skb_hwtstamp_tx_in_progress(), then + hardware time stamp. skb_hwtstamp_tx() clones the original skb and + adds the timestamps, therefore the original skb has to be freed now. + If obtaining the hardware time stamp somehow fails, then the driver + should not fall back to software time stamping. The rationale is that + this would occur at a later time in the processing pipeline than other + software time stamping and therefore could lead to unexpected deltas + between time stamps. +- If the driver did not call set skb_tx(skb)->in_progress, then dev_hard_start_xmit() checks whether software time stamping is wanted as fallback and potentially generates the time stamp. |