summaryrefslogtreecommitdiffstats
path: root/kernel
Commit message (Collapse)AuthorAgeFilesLines
* perf: Optimize the perf_output() path by removing IRQ-disablesPeter Zijlstra2010-05-181-66/+28
| | | | | | | | | | | | | | | | Since we can now assume there is only a single writer to each buffer, we can remove per-cpu lock thingy and use a simply nest-count to the same effect. This removes the need to disable IRQs. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf: Disallow mmap() on per-task inherited eventsPeter Zijlstra2010-05-181-0/+8
| | | | | | | | | | | | | | | | | Since we now have working per-task-per-cpu events for a while, disallow mmap() on per-task inherited events. Those things were a performance problem anyway, and doing away with it allows us to optimize the buffer somewhat by assuming there is only a single writer. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf: Optimize buffer placement by allocating buffers NUMA awarePeter Zijlstra2010-05-181-2/+15
| | | | | | | | | | | | | Ensure cpu bound buffers live on the right NUMA node. Suggested-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1274114880.5605.5236.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf: Fix errors path in perf_output_begin()Stephane Eranian2010-05-181-1/+1
| | | | | | | | | | | | | | | | | | | | | In case the sampling buffer has no "payload" pages, nr_pages is 0. The problem is that the error path in perf_output_begin() skips to a label which assumes perf_output_lock() has been issued which is not the case. That triggers a WARN_ON() in perf_output_unlock(). This patch fixes the problem by skipping perf_output_unlock() in case data->nr_pages is 0. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4bf13674.014fd80a.6c82.ffffb20c@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf/ftrace: Optimize perf/tracepoint interaction for single eventsPeter Zijlstra2010-05-184-13/+23
| | | | | | | | | | | | | | | When we've got but a single event per tracepoint there is no reason to try and multiplex it so don't. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf: Fix exit() vs event-groupsPeter Zijlstra2010-05-111-13/+13
| | | | | | | | | | | | | | | | | | Corey reported that the value scale times of group siblings are not updated when the monitored task dies. The problem appears to be that we only update the group leader's time values, fix it by updating the whole group. Reported-by: Corey Ashford <cjashfor@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: <stable@kernel.org> # .34.x LKML-Reference: <1273588935.1810.6.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf: Fix exit() vs PERF_FORMAT_GROUPPeter Zijlstra2010-05-111-3/+16
| | | | | | | | | | | | | | | | | | | | | | Both Stephane and Corey reported that PERF_FORMAT_GROUP didn't work as expected if the task the counters were attached to quit before the read() call. The cause is that we unconditionally destroy the grouping when we remove counters from their context. Fix this by splitting off the group destroy from the list removal such that perf_event_remove_from_context() does not do this and change perf_event_release() to do so. Reported-by: Corey Ashford <cjashfor@linux.vnet.ibm.com> Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: <stable@kernel.org> # .34.x LKML-Reference: <1273571513.5605.3527.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Revert "perf: Fix exit() vs PERF_FORMAT_GROUP"Ingo Molnar2010-05-111-5/+0
| | | | | | | | | | | | | | | | | | | | This reverts commit 4fd38e4595e2f6c9d27732c042a0e16b2753049c. It causes various crashes and hangs when events are activated. The cause is not fully understood yet but we need to revert it because the effects are severe. Reported-by: Stephane Eranian <eranian@google.com> Reported-by: Lin Ming <ming.m.lin@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* tracing: Drop the nested field from lock_release eventFrederic Weisbecker2010-05-091-1/+1
| | | | | | | | | | | Drop the nested field as we don't use it. Every nested state can be computed from a state machine on post processing already. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Steven Rostedt <rostedt@goodmis.org>
* tracing: Drop lock_acquired waittime fieldFrederic Weisbecker2010-05-091-1/+1
| | | | | | | | | | | | | | | | Drop the waittime field from the lock_acquired event, we can calculate it by substracting the lock_acquired event timestamp with the matching lock_acquire one. It is not needed and takes useless space in the traces. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Steven Rostedt <rostedt@goodmis.org>
* kprobes: Move enable/disable_kprobe() out from debugfs codeMasami Hiramatsu2010-05-081-66/+66
| | | | | | | | | | | | | | | Move enable/disable_kprobe() API out from debugfs related code, because these interfaces are not related to debugfs interface. This fixes a compiler warning. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: systemtap <systemtap@sources.redhat.com> Cc: DLE <dle-develop@lists.sourceforge.net> LKML-Reference: <20100427223312.2322.60512.stgit@localhost6.localdomain6> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_event: Make software events work againPaul Mackerras2010-05-081-6/+6
| | | | | | | | | | | | | | | | | | | | | | Commit 6bde9b6ce0127e2a56228a2071536d422be31336 ("perf: Add group scheduling transactional APIs") added code to allow a group to be scheduled in a single transaction. However, it introduced a bug in handling events whose pmu does not implement transactions -- at the end of scheduling in the events in the group, in the non-transactional case the code now falls through to the group_error label, and proceeds to unschedule all the events in the group and return failure. This fixes it by returning 0 (success) in the non-transactional case. Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: eranian@gmail.com LKML-Reference: <20100508105800.GB10650@brick.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf: Add group scheduling transactional APIsLin Ming2010-05-071-13/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add group scheduling transactional APIs to struct pmu. These APIs will be implemented in arch code, based on Peter's idea as below. > the idea behind hw_perf_group_sched_in() is to not perform > schedulability tests on each event in the group, but to add the group > as a whole and then perform one test. > > Of course, when that test fails, you'll have to roll-back the whole > group again. > > So start_txn (or a better name) would simply toggle a flag in the pmu > implementation that will make pmu::enable() not perform the > schedulablilty test. > > Then commit_txn() will perform the schedulability test (so note the > method has to have a !void return value. > > This will allow us to use the regular > kernel/perf_event.c::group_sched_in() and all the rollback code. > Currently each hw_perf_group_sched_in() implementation duplicates all > the rolllback code (with various bugs). ->start_txn: Start group events scheduling transaction, set a flag to make pmu::enable() not perform the schedulability test, it will be performed at commit time. ->commit_txn: Commit group events scheduling transaction, perform the group schedulability as a whole ->cancel_txn: Stop group events scheduling transaction, clear the flag so pmu::enable() will perform the schedulability test. Reviewed-by: Stephane Eranian <eranian@google.com> Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Cc: David Miller <davem@davemloft.net> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1272002160.5707.60.camel@minggr.sh.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf: Annotate perf_event_read_group() vs perf_event_release_kernel()Peter Zijlstra2010-05-071-2/+14
| | | | | | | | | | | | | | | | | | Stephane reported a lockdep warning while using PERF_FORMAT_GROUP. The issue is that perf_event_read_group() takes faults while holding the ctx->mutex, while perf_event_release_kernel() can be called from munmap(). Which makes for an AB-BA deadlock. Except we can never establish the deadlock because we'll only ever call perf_event_release_kernel() after all file descriptors are dead so there is no concurrency possible. Reported-by: Stephane Eranian <eranian@google.com> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Merge branch 'perf/urgent' into perf/coreIngo Molnar2010-05-074-7/+25
|\ | | | | | | | | | | Merge reason: Resolve patch dependency Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * perf: Fix exit() vs PERF_FORMAT_GROUPPeter Zijlstra2010-05-071-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Both Stephane and Corey reported that PERF_FORMAT_GROUP didn't work as expected if the task the counters were attached to quit before the read() call. The cause is that we unconditionally destroy the grouping when we remove counters from their context. Fix this by only doing this when we free the counter itself. Reported-by: Corey Ashford <cjashfor@linux.vnet.ibm.com> Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1273160566.5605.404.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds2010-05-051-1/+1
| |\ | | | | | | | | | | | | * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: flush_delayed_work: keep the original workqueue for re-queueing
| | * workqueue: flush_delayed_work: keep the original workqueue for re-queueingOleg Nesterov2010-04-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | flush_delayed_work() always uses keventd_wq for re-queueing, but it should use the workqueue this dwork was queued on. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds2010-05-041-1/+1
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf: Fix resource leak in failure path of perf_event_open()
| | * | perf: Fix resource leak in failure path of perf_event_open()Tejun Heo2010-05-011-1/+1
| | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | perf_event_open() kfrees event after init failure which doesn't release all resources allocated by perf_event_alloc(). Use free_event() instead. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@au1.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: <stable@kernel.org> LKML-Reference: <4BDBE237.1040809@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds2010-05-042-5/+18
| |\ \ | | |/ | |/| | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: rcu: Fix RCU lockdep splat on freezer_fork path rcu: Fix RCU lockdep splat in set_task_cpu on fork path mutex: Don't spin when the owner CPU is offline or other weird cases
| | * rcu: Fix RCU lockdep splat on freezer_fork pathPaul E. McKenney2010-04-301-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add an RCU read-side critical section to suppress this false positive. Located-by: Eric Paris <eparis@parisplace.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com Cc: eric.dumazet@gmail.com LKML-Reference: <1271880131-3951-2-git-send-email-paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * rcu: Fix RCU lockdep splat in set_task_cpu on fork pathPeter Zijlstra2010-04-301-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add an RCU read-side critical section to suppress this false positive. Located-by: Eric Paris <eparis@parisplace.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com Cc: eric.dumazet@gmail.com LKML-Reference: <1271880131-3951-1-git-send-email-paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * mutex: Don't spin when the owner CPU is offline or other weird casesBenjamin Herrenschmidt2010-04-231-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Due to recent load-balancer changes that delay the task migration to the next wakeup, the adaptive mutex spinning ends up in a live lock when the owner's CPU gets offlined because the cpu_online() check lives before the owner running check. This patch changes mutex_spin_on_owner() to return 0 (don't spin) in any case where we aren't sure about the owner struct validity or CPU number, and if the said CPU is offline. There is no point going back & re-evaluate spinning in corner cases like that, let's just go to sleep. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1271212509.13059.135.camel@pasglop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | perf: Fix check at end of event searchDan Carpenter2010-05-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | The original code doesn't work because "call" is never NULL there. Signed-off-by: Dan Carpenter <error27@gmail.com> LKML-Reference: <20100320143911.GF5331@bicker> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* | | hw_breakpoints: Fix percpu build failureFrederic Weisbecker2010-05-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix this build error: kernel/hw_breakpoint.c:58:1: error: pasting "__pcpu_scope_" and "*" does not give a valid preprocessing token It happens if CONFIG_DEBUG_FORCE_WEAK_PER_CPU, because we concatenate someting with the name and we have the "*" in the name. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: K. Prasad <prasad@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Jason Wessel <jason.wessel@windriver.com> LKML-Reference: <20100503133942.GA5497@nowhere> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | hw-breakpoints: Get the number of available registers on boot dynamicallyFrederic Weisbecker2010-05-012-31/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The breakpoint generic layer assumes that archs always know in advance the static number of address registers available to host breakpoints through the HBP_NUM macro. However this is not true for every archs. For example Arm needs to get this information dynamically to handle the compatiblity between different versions. To solve this, this patch proposes to drop the static HBP_NUM macro and let the arch provide the number of available slots through a new hw_breakpoint_slots() function. For archs that have CONFIG_HAVE_MIXED_BREAKPOINTS_REGS selected, it will be called once as the number of registers fits for instruction and data breakpoints together. For the others it will be called first to get the number of instruction breakpoint registers and another time to get the data breakpoint registers, the targeted type is given as a parameter of hw_breakpoint_slots(). Reported-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Paul Mundt <lethal@linux-sh.org> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: K. Prasad <prasad@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Ingo Molnar <mingo@elte.hu>
* | | hw-breakpoints: Handle breakpoint weight in allocation constraintsFrederic Weisbecker2010-05-011-18/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Depending on their nature and on what an arch supports, breakpoints may consume more than one address register. For example a simple absolute address match usually only requires one address register. But an address range match may consume two registers. Currently our slot allocation constraints, that tend to reflect the limited arch's resources, always consider that a breakpoint consumes one slot. Then provide a way for archs to tell us the weight of a breakpoint through a new hw_breakpoint_weight() helper. This weight will be computed against the generic allocation constraints instead of a constant value. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Paul Mundt <lethal@linux-sh.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: K. Prasad <prasad@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu>
* | | hw-breakpoints: Separate constraint space for data and instruction breakpointsFrederic Weisbecker2010-05-011-27/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two outstanding fashions for archs to implement hardware breakpoints. The first is to separate breakpoint address pattern definition space between data and instruction breakpoints. We then have typically distinct instruction address breakpoint registers and data address breakpoint registers, delivered with separate control registers for data and instruction breakpoints as well. This is the case of PowerPc and ARM for example. The second consists in having merged breakpoint address space definition between data and instruction breakpoint. Address registers can host either instruction or data address and the access mode for the breakpoint is defined in a control register. This is the case of x86 and Super H. This patch adds a new CONFIG_HAVE_MIXED_BREAKPOINTS_REGS config that archs can select if they belong to the second case. Those will have their slot allocation merged for instructions and data breakpoints. The others will have a separate slot tracking between data and instruction breakpoints. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Paul Mundt <lethal@linux-sh.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: K. Prasad <prasad@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu>
* | | hw-breakpoints: Change/Enforce some breakpoints policiesFrederic Weisbecker2010-05-011-2/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current policies of breakpoints in x86 and SH are the following: - task bound breakpoints can only break on userspace addresses - cpu wide breakpoints can only break on kernel addresses The former rule prevents ptrace breakpoints to be set to trigger on kernel addresses, which is good. But as a side effect, we can't breakpoint on kernel addresses for task bound breakpoints. The latter rule simply makes no sense, there is no reason why we can't set breakpoints on userspace while performing cpu bound profiles. We want the following new policies: - task bound breakpoint can set userspace address breakpoints, with no particular privilege required. - task bound breakpoints can set kernelspace address breakpoints but must be privileged to do that. - cpu bound breakpoints can do what they want as they are privileged already. To implement these new policies, this patch checks if we are dealing with a kernel address breakpoint, if so and if the exclude_kernel parameter is set, we tell the user that the breakpoint is invalid, which makes a good generic ptrace protection. If we don't have exclude_kernel, ensure the user has the right privileges as kernel breakpoints are quite sensitive (risk of trap recursion attacks and global performance impacts). [ Paul Mundt: keep addr space check for sh signal delivery and fix double function declaration] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: K. Prasad <prasad@linux.vnet.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
* | | hw-breakpoints: Check disabled breakpoints againFrederic Weisbecker2010-05-011-11/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We stopped checking disabled breakpoints because we weren't allowing breakpoints on NULL addresses. And gdb tends to set NULL addresses on inactive breakpoints. But refusing NULL addresses was actually a regression that has been fixed now. There is no reason anymore to not validate inactive breakpoint settings. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Paul Mundt <lethal@linux-sh.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: K. Prasad <prasad@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Ingo Molnar <mingo@elte.hu>
* | | Merge commit 'v2.6.34-rc6' into perf/coreIngo Molnar2010-04-301-1/+1
|\ \ \ | |/ / | | | | | | | | | | | | Merge reason: update to the latest -rc. Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | kernel/sys.c: fix compat uname machineAndreas Schwab2010-04-241-1/+1
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On ppc64 you get this error: $ setarch ppc -R true setarch: ppc: Unrecognized architecture because uname still reports ppc64 as the machine. So mask off the personality flags when checking for PER_LINUX32. Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'linus' into perf/coreIngo Molnar2010-04-234-4/+11
|\ \ | |/ | | | | | | | | Merge reason: merge the latest fixes, update to latest -rc. Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * CRED: Fix a race in creds_are_invalid() in credentials debuggingDavid Howells2010-04-221-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | creds_are_invalid() reads both cred->usage and cred->subscribers and then compares them to make sure the number of processes subscribed to a cred struct never exceeds the refcount of that cred struct. The problem is that this can cause a race with both copy_creds() and exit_creds() as the two counters, whilst they are of atomic_t type, are only atomic with respect to themselves, and not atomic with respect to each other. This means that if creds_are_invalid() can read the values on one CPU whilst they're being modified on another CPU, and so can observe an evolving state in which the subscribers count now is greater than the usage count a moment before. Switching the order in which the counts are read cannot help, so the thing to do is to remove that particular check. I had considered rechecking the values to see if they're in flux if the test fails, but I can't guarantee they won't appear the same, even if they've changed several times in the meantime. Note that this can only happen if CONFIG_DEBUG_CREDENTIALS is enabled. The problem is only likely to occur with multithreaded programs, and can be tested by the tst-eintr1 program from glibc's "make check". The symptoms look like: CRED: Invalid credentials CRED: At include/linux/cred.h:240 CRED: Specified credentials: ffff88003dda5878 [real][eff] CRED: ->magic=43736564, put_addr=(null) CRED: ->usage=766, subscr=766 CRED: ->*uid = { 0,0,0,0 } CRED: ->*gid = { 0,0,0,0 } CRED: ->security is ffff88003d72f538 CRED: ->security {359, 359} ------------[ cut here ]------------ kernel BUG at kernel/cred.c:850! ... RIP: 0010:[<ffffffff81049889>] [<ffffffff81049889>] __invalid_creds+0x4e/0x52 ... Call Trace: [<ffffffff8104a37b>] copy_creds+0x6b/0x23f Note the ->usage=766 and subscr=766. The values appear the same because they've been re-read since the check was made. Reported-by: Roland McGrath <roland@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
| * CRED: Fix double free in prepare_usermodehelper_creds() error handlingDavid Howells2010-04-211-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch 570b8fb505896e007fd3bb07573ba6640e51851d: Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Date: Tue Mar 30 00:04:00 2010 +0100 Subject: CRED: Fix memory leak in error handling attempts to fix a memory leak in the error handling by making the offending return statement into a jump down to the bottom of the function where a kfree(tgcred) is inserted. This is, however, incorrect, as it does a kfree() after doing put_cred() if security_prepare_creds() fails. That will result in a double free if 'error' is jumped to as put_cred() will also attempt to free the new tgcred record by virtue of it being pointed to by the new cred record. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
| * rcu: Make RCU lockdep check the lockdep_recursion variablePaul E. McKenney2010-04-191-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The lockdep facility temporarily disables lockdep checking by incrementing the current->lockdep_recursion variable. Such disabling happens in NMIs and in other situations where lockdep might expect to recurse on itself. This patch therefore checks current->lockdep_recursion, disabling RCU lockdep splats when this variable is non-zero. In addition, this patch removes the "likely()", as suggested by Lai Jiangshan. Reported-by: Frederic Weisbecker <fweisbec@gmail.com> Reported-by: David Miller <davem@davemloft.net> Tested-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com Cc: eric.dumazet@gmail.com LKML-Reference: <20100415195039.GA22623@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * PM / Hibernate: user.c, fix SNAPSHOT_SET_SWAP_AREA handlingJiri Slaby2010-04-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | When CONFIG_DEBUG_BLOCK_EXT_DEVT is set we decode the device improperly by old_decode_dev and it results in an error while hibernating with s2disk. All users already pass the new device number, so switch to new_decode_dev(). Signed-off-by: Jiri Slaby <jslaby@suse.cz> Reported-and-tested-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: "Rafael J. Wysocki" <rjw@sisk.pl>
| * Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds2010-04-081-1/+1
| |\ | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Fix sched_getaffinity()
| | * sched: Fix sched_getaffinity()Anton Blanchard2010-04-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | taskset on 2.6.34-rc3 fails on one of my ppc64 test boxes with the following error: sched_getaffinity(0, 16, 0x10029650030) = -1 EINVAL (Invalid argument) This box has 128 threads and 16 bytes is enough to cover it. Commit cd3d8031eb4311e516329aee03c79a08333141f1 (sched: sched_getaffinity(): Allow less than NR_CPUS length) is comparing this 16 bytes agains nr_cpu_ids. Fix it by comparing nr_cpu_ids to the number of bits in the cpumask we pass in. Signed-off-by: Anton Blanchard <anton@samba.org> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Sharyathi Nagesh <sharyath@in.ibm.com> Cc: Ulrich Drepper <drepper@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jack Steiner <steiner@sgi.com> Cc: Russ Anderson <rja@sgi.com> Cc: Mike Travis <travis@sgi.com> LKML-Reference: <20100406070218.GM5594@kryten> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | perf: Enhance perf to allow for guest statistic collection from hostZhang, Yanmin2010-04-191-1/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | Below patch introduces perf_guest_info_callbacks and related register/unregister functions. Add more PERF_RECORD_MISC_XXX bits meaning guest kernel and guest user space. Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* | | Merge branch 'perf' of ↵Ingo Molnar2010-04-152-220/+331
|\ \ \ | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/core
| * | | tracing/kprobes: Support basic types on dynamic eventsMasami Hiramatsu2010-04-142-220/+331
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support basic types of integer (u8, u16, u32, u64, s8, s16, s32, s64) in kprobe tracer. With this patch, users can specify above basic types on each arguments after ':'. If omitted, the argument type is set as unsigned long (u32 or u64, arch-dependent). e.g. echo 'p account_system_time+0 hardirq_offset=%si:s32' > kprobe_events adds a probe recording hardirq_offset in signed-32bits value on the entry of account_system_time. Cc: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20100412171708.3790.18599.stgit@localhost6.localdomain6> Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | | | perf: Fix hlist related build errorFrederic Weisbecker2010-04-151-30/+30
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | hlist helpers need to be available for all software events, not only trace events. Pull them out outside the ifdef CONFIG_EVENT_TRACING section. Fixes: kernel/perf_event.c:4573: error: implicit declaration of function 'swevent_hlist_put' kernel/perf_event.c:4614: error: implicit declaration of function 'swevent_hlist_get' kernel/perf_event.c:5534: error: implicit declaration of function 'swevent_hlist_release Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1271281338-23491-1-git-send-regression-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | perf: Make clock software events consistent with general exclusion rulesFrederic Weisbecker2010-04-141-8/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The cpu/task clock events implement their own version of exclusion on top of exclude_user and exclude_kernel. The result is that when the event triggered in the kernel but we have exclude_kernel set, we try to rewind using task_pt_regs. There are two side effects of this: - we call task_pt_regs even on kernel threads, which doesn't give us the desired result. - if the event occured in the kernel, we shouldn't rewind to the user context. We want to actually ignore the event. get_irq_regs() will always give us the right interrupted context, so use its result and submit it to perf_exclude_context() that knows when an event must be ignored. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu>
* | | perf: Store active software events in a hashlistFrederic Weisbecker2010-04-141-63/+183
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Each time a software event triggers, we need to walk through the entire list of events from the current cpu and task contexts to retrieve a running perf event that matches. We also need to check a matching perf event is actually counting. This walk is wasteful and makes the event fast path scaling down with a growing number of events running on the same contexts. To solve this, we store the running perf events in a hashlist to get an immediate access to them against their type:event_id when they trigger. v2: - Fix SWEVENT_HLIST_SIZE definition (and re-learn some basic maths along the way) - Only allocate hlist for online cpus, but keep track of the refcount on offline possible cpus too, so that we allocate it if needed when it becomes online. - Drop the kref use as it's not adapted to our tricks anymore. v3: - Fix bad refcount check (address instead of value). Thanks to Eric Dumazet who spotted this. - While exiting cpu, move the hlist release out of the IPI path to lock the hlist mutex sanely. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu>
* | | Merge branch 'linus' into perf/coreIngo Molnar2010-04-0857-181/+284
|\ \ \ | |/ / | | | | | | | | | | | | | | | | | | Semantic conflict: arch/x86/kernel/cpu/perf_event_intel_ds.c Merge reason: pick up latest fixes, fix the conflict Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | mm: avoid null-pointer deref in sync_mm_rss()KAMEZAWA Hiroyuki2010-04-072-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - We weren't zeroing p->rss_stat[] at fork() - Consequently sync_mm_rss() was dereferencing tsk->mm for kernel threads and was oopsing. - Make __sync_task_rss_stat() static, too. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=15648 [akpm@linux-foundation.org: remove the BUG_ON(!mm->rss)] Reported-by: Troels Liebe Bentsen <tlb@rapanden.dk> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> "Michael S. Tsirkin" <mst@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | Merge branch 'irq-core-for-linus' of ↵Linus Torvalds2010-04-061-0/+10
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: genirq: Force MSI irq handlers to run with interrupts disabled
| | * | genirq: Force MSI irq handlers to run with interrupts disabledThomas Gleixner2010-03-311-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Network folks reported that directing all MSI-X vectors of their multi queue NICs to a single core can cause interrupt stack overflows when enough interrupts fire at the same time. This is caused by the fact that we run interrupt handlers by default with interrupts enabled unless the driver reuqests the interrupt with the IRQF_DISABLED set. The NIC handlers do not set this flag, so simultaneous interrupts can nest unlimited and cause the stack overflow. The only safe counter measure is to run the interrupt handlers with interrupts disabled. We can't switch to this mode in general right now, but it is safe to do so for MSI interrupts. Force IRQF_DISABLED for MSI interrupt handlers. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andi Kleen <andi@firstfloor.org> Cc: Linus Torvalds <torvalds@osdl.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Miller <davem@davemloft.net> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: stable@kernel.org
OpenPOWER on IntegriCloud