From 26cb63ad11e04047a64309362674bcbbd6a6f246 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra
Date: Tue, 28 May 2013 10:55:48 +0200
Subject: perf: Fix perf mmap bugs

Vince reported a problem found by his perf specific trinity fuzzer.

Al noticed 2 problems with perf's mmap():

 - it has issues against fork() since we use vma->vm_mm for accounting.
 - it has an rb refcount leak on double mmap().

We fix the issues against fork() by using VM_DONTCOPY; I don't think there's code out there that uses this; we didn't hear about weird accounting problems/crashes. If we do need this to work, the previously proposed VM_PINNED could make this work.

Aside from the rb reference leak spotted by Al, Vince's example prog was indeed doing a double mmap() through the use of perf_event_set_output().

This exposes another problem, since we now have 2 events with one buffer, the accounting gets screwy because we account per event. Fix this by making the buffer responsible for its own accounting.

Reported-by: Vince Weaver
Signed-off-by: Peter Zijlstra
Cc: Al Viro
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
Link: http://lkml.kernel.org/r/20130528085548.GA12193@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar
---
 include/linux/perf_event.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

(limited to 'include/linux')

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index f463a46424e2..c5b6dbf9c2fc 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -389,8 +389,7 @@ struct perf_event {
 	/* mmap bits */
 	struct mutex mmap_mutex;
 	atomic_t mmap_count;
-	int mmap_locked;
-	struct user_struct *mmap_user;
+
 	struct ring_buffer *rb;
 	struct list_head rb_entry;

-- cgit v1.2.1

From 45eacc692771bd2b1ea3d384e6345cab3da10861 Mon Sep 17 00:00:00 2001
From: Frederic Weisbecker
Date: Wed, 15 May 2013 22:16:32 +0200
Subject: vtime: Use consistent clocks among nohz accounting

While computing the cputime delta of dynticks CPUs, we are mixing up clocks of
different natures:

* local_clock() which takes care of unstable clock sources and fixes these if needed.

* sched_clock() which is the weaker version of local_clock(). It doesn't compute any fixup in case of unstable source.

If the clock source is stable, those two clocks are the same and we can safely compute the difference against two random points.

Otherwise it results in random deltas as sched_clock() can randomly drift away, back or forward, from local_clock().

As a consequence, some strange behaviour with unstable tsc has been observed such as non progressing constant zero cputime. (The 'top' command showing no load).

Fix this by only using local_clock(), or its irq safe/remote equivalent, in vtime code.

Reported-by: Mike Galbraith
Suggested-by: Mike Galbraith
Cc: Steven Rostedt
Cc: Paul E. McKenney
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Borislav Petkov
Cc: Li Zhong
Cc: Mike Galbraith
Signed-off-by: Frederic Weisbecker
Signed-off-by: Ingo Molnar
---
 include/linux/vtime.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'include/linux')

diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index 71a5782d8c59..b1dd2db80076 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -34,7 +34,7 @@ static inline void vtime_user_exit(struct task_struct *tsk)
 }
 extern void vtime_guest_enter(struct task_struct *tsk);
 extern void vtime_guest_exit(struct task_struct *tsk);
-extern void vtime_init_idle(struct task_struct *tsk);
+extern void vtime_init_idle(struct task_struct *tsk, int cpu);
 #else
 static inline void vtime_account_irq_exit(struct task_struct *tsk)
 {
@@ -45,7 +45,7 @@ static inline void vtime_user_enter(struct task_struct *tsk) { }
 static inline void vtime_user_exit(struct task_struct *tsk) { }
 static inline void vtime_guest_enter(struct task_struct *tsk) { }
 static inline void vtime_guest_exit(struct task_struct *tsk) { }
-static inline void vtime_init_idle(struct task_struct *tsk) { }
+static inline void
vtime_init_idle(struct task_struct *tsk, int cpu) { }
 #endif

 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
-- cgit v1.2.1

From 521921bad1192fb1b8f9b6a5aa673635848b8b5f Mon Sep 17 00:00:00 2001
From: Frederic Weisbecker
Date: Thu, 16 May 2013 01:21:38 +0200
Subject: kvm: Move guest entry/exit APIs to context_tracking

The kvm_host.h header file doesn't handle inclusion well when archs don't support KVM. This results in build failures for such archs when they want to implement context tracking, because this subsystem includes kvm_host.h in order to implement the guest_enter/exit APIs but it doesn't handle the KVM off case.

To fix this, move the guest_enter()/guest_exit() declarations and generic implementation to the context tracking headers. These generic APIs actually belong to this subsystem, alongside other domain boundary tracking like user_enter() et al.

KVM now properly becomes a user of this library, not the other buggy way around.

Reported-by: Kevin Hilman
Reviewed-by: Kevin Hilman
Tested-by: Kevin Hilman
Signed-off-by: Frederic Weisbecker
Cc: Steven Rostedt
Cc: Paul E. McKenney
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Kevin Hilman
Cc: Marcelo Tosatti
Cc: Gleb Natapov
Signed-off-by: Ingo Molnar
---
 include/linux/context_tracking.h | 35 +++++++++++++++++++++++++++++++++++
 include/linux/kvm_host.h | 37 +------------------------------------
 2 files changed, 36 insertions(+), 36 deletions(-)

(limited to 'include/linux')

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index 365f4a61bf04..fc09d7b0dacf 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -3,6 +3,7 @@

 #include
 #include
+#include
 #include

 struct context_tracking {
@@ -19,6 +20,26 @@ struct context_tracking {
 	} state;
 };

+static inline void __guest_enter(void)
+{
+	/*
+	 * This is running in ioctl context so we can avoid
+	 * the call to vtime_account() with its unnecessary idle check.
+	 */
+	vtime_account_system(current);
+	current->flags |= PF_VCPU;
+}
+
+static inline void __guest_exit(void)
+{
+	/*
+	 * This is running in ioctl context so we can avoid
+	 * the call to vtime_account() with its unnecessary idle check.
+	 */
+	vtime_account_system(current);
+	current->flags &= ~PF_VCPU;
+}
+
 #ifdef CONFIG_CONTEXT_TRACKING
 DECLARE_PER_CPU(struct context_tracking, context_tracking);

@@ -35,6 +56,9 @@ static inline bool context_tracking_active(void)
 extern void user_enter(void);
 extern void user_exit(void);

+extern void guest_enter(void);
+extern void guest_exit(void);
+
 static inline enum ctx_state exception_enter(void)
 {
 	enum ctx_state prev_ctx;
@@ -57,6 +81,17 @@ extern void context_tracking_task_switch(struct task_struct *prev,
 static inline bool context_tracking_in_user(void) { return false; }
 static inline void user_enter(void) { }
 static inline void user_exit(void) { }
+
+static inline void guest_enter(void)
+{
+	__guest_enter();
+}
+
+static inline void guest_exit(void)
+{
+	__guest_exit();
+}
+
 static inline enum ctx_state exception_enter(void) { return 0; }
 static inline void exception_exit(enum ctx_state prev_ctx) { }
 static inline void context_tracking_task_switch(struct task_struct *prev,
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f0eea07d2c2b..8db53cfaccdb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -23,6 +23,7 @@
 #include
 #include
 #include
+#include
 #include

 #include
@@ -760,42 +761,6 @@ static inline int kvm_iommu_unmap_guest(struct kvm *kvm)
 }
 #endif

-static inline void __guest_enter(void)
-{
-	/*
-	 * This is running in ioctl context so we can avoid
-	 * the call to vtime_account() with its unnecessary idle check.
-	 */
-	vtime_account_system(current);
-	current->flags |= PF_VCPU;
-}
-
-static inline void __guest_exit(void)
-{
-	/*
-	 * This is running in ioctl context so we can avoid
-	 * the call to vtime_account() with its unnecessary idle check.
-	 */
-	vtime_account_system(current);
-	current->flags &= ~PF_VCPU;
-}
-
-#ifdef CONFIG_CONTEXT_TRACKING
-extern void guest_enter(void);
-extern void guest_exit(void);
-
-#else /* !CONFIG_CONTEXT_TRACKING */
-static inline void guest_enter(void)
-{
-	__guest_enter();
-}
-
-static inline void guest_exit(void)
-{
-	__guest_exit();
-}
-#endif /* !CONFIG_CONTEXT_TRACKING */
-
 static inline void kvm_guest_enter(void)
 {
 	unsigned long flags;
-- cgit v1.2.1

From 29bb9e5a75684106a37593ad75ec75ff8312731b Mon Sep 17 00:00:00 2001
From: Steven Rostedt
Date: Fri, 24 May 2013 15:23:40 -0400
Subject: tracing/context-tracking: Add preempt_schedule_context() for tracing

Dave Jones hit the following bug report:

 ===============================
 [ INFO: suspicious RCU usage. ]
 3.10.0-rc2+ #1 Not tainted
 -------------------------------
 include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!

 other info that might help us debug this:

 RCU used illegally from idle CPU!
 rcu_scheduler_active = 1, debug_locks = 0
 RCU used illegally from extended quiescent state!
 2 locks held by cc1/63645:
  #0: (&rq->lock){-.-.-.}, at: [] __schedule+0xed/0x9b0
  #1: (rcu_read_lock){.+.+..}, at: [] cpuacct_charge+0x5/0x1f0

 CPU: 1 PID: 63645 Comm: cc1 Not tainted 3.10.0-rc2+ #1 [loadavg: 40.57 27.55 13.39 25/277 64369]
 Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010
  0000000000000000 ffff88010f78fcf8 ffffffff816ae383 ffff88010f78fd28
  ffffffff810b698d ffff88011c092548 000000000023d073 ffff88011c092500
  0000000000000001 ffff88010f78fd60 ffffffff8109d7c5 ffffffff8109d645
 Call Trace:
  [] dump_stack+0x19/0x1b
  [] lockdep_rcu_suspicious+0xfd/0x130
  [] cpuacct_charge+0x185/0x1f0
  [] ? cpuacct_charge+0x5/0x1f0
  [] update_curr+0xec/0x240
  [] put_prev_task_fair+0x228/0x480
  [] __schedule+0x161/0x9b0
  [] preempt_schedule+0x51/0x80
  [] ? __cond_resched_softirq+0x60/0x60
  [] ? retint_careful+0x12/0x2e
  [] ftrace_ops_control_func+0x1dc/0x210
  [] ftrace_call+0x5/0x2f
  [] ?
retint_careful+0xb/0x2e
  [] ? schedule_user+0x5/0x70
  [] ? schedule_user+0x5/0x70
  [] ? retint_careful+0x12/0x2e
 ------------[ cut here ]------------

What happened was that the function tracer traced the schedule_user() code that tells RCU that the system is coming back from userspace, and to add the CPU back to the RCU monitoring.

Because the function tracer does preempt_disable/enable_notrace() calls, the preempt_enable_notrace() checks the NEED_RESCHED flag. If it is set, then preempt_schedule() is called. But this is called before the user_exit() function can inform the kernel that the CPU is no longer in user mode and needs to be accounted for by RCU.

The fix is to create a new preempt_schedule_context() that checks if the kernel is still in user mode and if so to switch it to kernel mode before calling schedule. It also switches back to user mode coming back from schedule if need be.

The only user of this currently is the preempt_enable_notrace(), which is only used by the tracing subsystem.

Signed-off-by: Steven Rostedt
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1369423420.6828.226.camel@gandalf.local.home
Signed-off-by: Ingo Molnar
---
 include/linux/preempt.h | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

(limited to 'include/linux')

diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 87a03c746f17..f5d4723cdb3d 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -33,9 +33,25 @@ do { \
 		preempt_schedule(); \
 } while (0)

+#ifdef CONFIG_CONTEXT_TRACKING
+
+void preempt_schedule_context(void);
+
+#define preempt_check_resched_context() \
+do { \
+	if (unlikely(test_thread_flag(TIF_NEED_RESCHED))) \
+		preempt_schedule_context(); \
+} while (0)
+#else
+
+#define preempt_check_resched_context() preempt_check_resched()
+
+#endif /* CONFIG_CONTEXT_TRACKING */
+
 #else /* !CONFIG_PREEMPT */

 #define preempt_check_resched() do { } while (0)
+#define preempt_check_resched_context() do { } while (0)

 #endif /* CONFIG_PREEMPT */

@@ -88,7 +104,7 @@ do { \
 do { \
 	preempt_enable_no_resched_notrace(); \
 	barrier(); \
-	preempt_check_resched(); \
+	preempt_check_resched_context(); \
 } while (0)

 #else /* !CONFIG_PREEMPT_COUNT */
-- cgit v1.2.1

From a7bf58040f4e8a7245cb3e24ab93057be3c46363 Mon Sep 17 00:00:00 2001
From: Olaf Hering
Date: Fri, 14 Jun 2013 17:07:01 +0200
Subject: net: vlan: fix comment for vlan_ethhdr->h_vlan_proto

After addition of 8021AD h_vlan_proto can be either ETH_P_8021Q or ETH_P_8021AD.

Signed-off-by: Olaf Hering
Signed-off-by: David S.
Miller
---
 include/linux/if_vlan.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'include/linux')

diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index 52bd03b38962..637fa71de0c7 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -44,7 +44,7 @@ struct vlan_hdr {
  * struct vlan_ethhdr - vlan ethernet header (ethhdr + vlan_hdr)
  * @h_dest: destination ethernet address
  * @h_source: source ethernet address
- * @h_vlan_proto: ethernet protocol (always 0x8100)
+ * @h_vlan_proto: ethernet protocol
  * @h_vlan_TCI: priority and VLAN ID
  * @h_vlan_encapsulated_proto: packet type ID or len
  */
-- cgit v1.2.1

From 7995bd287134f6c8f80d94bebe7396f05a9bc42b Mon Sep 17 00:00:00 2001
From: Al Viro
Date: Thu, 20 Jun 2013 18:58:36 +0400
Subject: splice: don't pass the address of ->f_pos to methods

Signed-off-by: Al Viro
---
 include/linux/fs.h | 2 --
 include/linux/splice.h | 1 +
 2 files changed, 1 insertion(+), 2 deletions(-)

(limited to 'include/linux')

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 43db02e9c9fa..65c2be22b601 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2414,8 +2414,6 @@ extern ssize_t generic_file_splice_write(struct pipe_inode_info *,
 		struct file *, loff_t *, size_t, unsigned int);
 extern ssize_t generic_splice_sendpage(struct pipe_inode_info *pipe,
 		struct file *out, loff_t *, size_t len, unsigned int flags);
-extern long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
-		size_t len, unsigned int flags);

 extern void file_ra_state_init(struct file_ra_state *ra,
 		struct address_space *mapping);
diff --git a/include/linux/splice.h b/include/linux/splice.h
index 09a545a7dfa3..74575cbf2d6f 100644
--- a/include/linux/splice.h
+++ b/include/linux/splice.h
@@ -35,6 +35,7 @@ struct splice_desc {
 		void *data; /* cookie */
 	} u;
 	loff_t pos; /* file position */
+	loff_t *opos; /* sendfile: output position */
 	size_t num_spliced; /* number of bytes already spliced */
 	bool need_wakeup; /*
need to wake up writer */
 };
-- cgit v1.2.1

From bd8a7036c06cf15779b31a5397d4afcb12be81ea Mon Sep 17 00:00:00 2001
From: Eric Dumazet
Date: Mon, 24 Jun 2013 06:26:00 -0700
Subject: gre: fix a possible skb leak

commit 68c331631143 ("v4 GRE: Add TCP segmentation offload for GRE") added a possible skb leak, because it frees only the head of segment list, in case a skb_linearize() call fails.

This patch adds a kfree_skb_list() helper to fix the bug.

Signed-off-by: Eric Dumazet
Cc: Pravin B Shelar
Cc: Daniel Borkmann
Signed-off-by: David S. Miller
---
 include/linux/skbuff.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include/linux')

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 9c676eae3968..dec1748cd002 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -627,6 +627,7 @@ static inline struct rtable *skb_rtable(const struct sk_buff *skb)
 }

 extern void kfree_skb(struct sk_buff *skb);
+extern void kfree_skb_list(struct sk_buff *segs);
 extern void skb_tx_error(struct sk_buff *skb);
 extern void consume_skb(struct sk_buff *skb);
 extern void __kfree_skb(struct sk_buff *skb);
-- cgit v1.2.1

From 5dbe7c178d3f0a4634f088d9e729f1909b9ddcd1 Mon Sep 17 00:00:00 2001
From: Nicolas Schichan
Date: Wed, 26 Jun 2013 17:23:42 +0200
Subject: net: fix kernel deadlock with interface rename and netdev name retrieval.

When the kernel (compiled with CONFIG_PREEMPT=n) is performing the rename of a network interface, it can end up waiting for a workqueue to complete. If userland is able to invoke a SIOCGIFNAME ioctl or a SO_BINDTODEVICE getsockopt in between, the kernel will deadlock due to the fact that read_seqcount_begin() will spin forever waiting for the writer process (the one doing the interface rename) to update the devnet_rename_seq sequence.

This patch fixes the problem by adding a helper (netdev_get_name()) and using it in the code handling the SIOCGIFNAME ioctl and SO_BINDTODEVICE getsockopt.

The netdev_get_name() helper uses raw_seqcount_begin() to avoid spinning forever, waiting for devnet_rename_seq->sequence to become even. cond_resched() is used in the contended case, before retrying the access to give the writer process a chance to finish.

The use of raw_seqcount_begin() will incur some unneeded work in the reader process in the contended case, but this is better than deadlocking the system.

Signed-off-by: Nicolas Schichan
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
---
 include/linux/netdevice.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include/linux')

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 60584b185a0c..96e4c21e15e0 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1695,6 +1695,7 @@ extern int init_dummy_netdev(struct net_device *dev);
 extern struct net_device *dev_get_by_index(struct net *net, int ifindex);
 extern struct net_device *__dev_get_by_index(struct net *net, int ifindex);
 extern struct net_device *dev_get_by_index_rcu(struct net *net, int ifindex);
+extern int netdev_get_name(struct net *net, char *name, int ifindex);
 extern int dev_restart(struct net_device *dev);
 #ifdef CONFIG_NETPOLL_TRAP
 extern int netpoll_trap(void);
-- cgit v1.2.1
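
Illustration only, not part of the series: the reader-side retry that the netdev_get_name() commit message describes can be sketched in userspace C. This is not the kernel implementation. struct named_dev, dev_init(), rename_begin(), rename_end(), get_name_once() and get_name() are hypothetical names, C11 atomics stand in for seqcount_t, and sched_yield() stands in for cond_resched(). Masking off the low bit of the sequence mimics raw_seqcount_begin(): the reader never spins on an odd (write in progress) count, it takes a snapshot that is guaranteed to fail the retry check instead.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <string.h>

#define NAME_LEN 16

/* Hypothetical stand-in for devnet_rename_seq plus the name it guards. */
struct named_dev {
	atomic_uint seq;	/* odd while a rename is in progress */
	char name[NAME_LEN];
};

static void dev_init(struct named_dev *d, const char *name)
{
	atomic_init(&d->seq, 0);
	memset(d->name, 0, NAME_LEN);
	strncpy(d->name, name, NAME_LEN - 1);
}

/* Writer side: bump the count before and after touching the name. */
static void rename_begin(struct named_dev *d) { atomic_fetch_add(&d->seq, 1); }
static void rename_end(struct named_dev *d) { atomic_fetch_add(&d->seq, 1); }

/*
 * One read attempt. Returns 0 if the copy is consistent, -1 if a writer
 * was active or interfered. Sketch only: a real concurrent version needs
 * proper barriers around the copy of the plain char array.
 */
static int get_name_once(struct named_dev *d, char *out)
{
	/* Like raw_seqcount_begin(): clear the low bit instead of spinning. */
	unsigned int start = atomic_load(&d->seq) & ~1u;

	memcpy(out, d->name, NAME_LEN);
	return atomic_load(&d->seq) == start ? 0 : -1;
}

/* Retry loop: yield between attempts so the writer can finish. */
static int get_name(struct named_dev *d, char *out)
{
	while (get_name_once(d, out) != 0)
		sched_yield();	/* stands in for cond_resched() */
	return 0;
}
```

In a single-threaded walk-through, get_name() succeeds at once on a freshly initialised device, while a get_name_once() attempt made between rename_begin() and rename_end() returns -1 rather than spinning, which is the "unneeded work in the contended case" trade-off the commit message mentions.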