summaryrefslogtreecommitdiffstats
path: root/tools/perf/util/hist.c
Commit message (Collapse)AuthorAgeFilesLines
* perf hists: Print number of samples, not the period sumArnaldo Carvalho de Melo2011-02-251-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So that we match the header where we state the number of events with the "Samples" column when using 'perf report -n/--show-nr-samples': [root@emilia ~]# perf record -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.111 MB perf.data (~4860 samples) ] [root@emilia ~]# perf report --stdio --show-nr-samples # Events: 11 cycles # # Overhead Samples Command Shared Object Symbol # ........ .......... ........... .................. ............................ # 16.65% 1 sleep [kernel.kallsyms] [k] unmap_vmas 16.10% 1 perf libpthread-2.12.so [.] __pthread_cleanup_push_defer 15.79% 2 perf [kernel.kallsyms] [k] format_decode 12.88% 1 kworker/1:2 [kernel.kallsyms] [k] cache_reap 10.69% 1 swapper [kernel.kallsyms] [k] _raw_spin_lock 7.55% 1 sleep [kernel.kallsyms] [k] prepare_exec_creds 6.00% 1 perf [jbd2] [k] start_this_handle 5.29% 1 perf [kernel.kallsyms] [k] seq_read 4.75% 1 perf [kernel.kallsyms] [k] get_pid_task 4.30% 1 perf [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore # # (For a higher level overview, try: perf report --sort comm,dso) # [root@emilia ~]# Reported-by: Stephane Eranian <eranian@google.com> Reported-by: Cliff Wickman <cpw@sgi.com> Acked-by: Stephane Eranian <eranian@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: <stable@kernel.org> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> [ cherry-picked it from perf/core, as it has been reported by others as well. ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf tools: Fix 64 bit integer format stringsArnaldo Carvalho de Melo2011-01-221-8/+9
| | | | | | | | | | | | | | | | | | | | | | | Using %L[uxd] has issues in some architectures, like on ppc64. Fix it by making our 64 bit integers typedefs of stdint.h types and using PRI[ux]64 like, for instance, git does. Reported by Denis Kirjanov that provided a patch for one case, I went and changed all cases. Reported-by: Denis Kirjanov <dkirjanov@kernel.org> Tested-by: Denis Kirjanov <dkirjanov@kernel.org> LKML-Reference: <20110120093246.GA8031@hera.kernel.org> Cc: Denis Kirjanov <dkirjanov@kernel.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Pingtian Han <phan@redhat.com> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Zanussi <tzanussi@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* Merge commit 'v2.6.37' into perf/coreIngo Molnar2011-01-051-1/+1
|\ | | | | | | | | | | Merge reason: Add the final .37 tree. Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * perf: Fix callchain hit bad cast on ascii displayFrederic Weisbecker2011-01-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ipchain__fprintf_graph() casts the number of hits in a branch as an int, which means we lose its highests bits. This results in meaningless number of callchain hits in perf.data that have a high number of hits recorded, typically those that have callchain branches hits appearing more than INT_MAX. This happens easily as those are pondered by the event period. Reported-by: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org>
* | perf symbols: Add symfs option for off-box analysis using specified treeDavid Ahern2010-12-211-3/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The symfs argument allows analysis of perf.data file using a locally accessible filesystem tree with debug symbols - e.g., tree created during image builds, sshfs mount, loop mounted KVM disk images, USB keys, initrds, etc. Anything with an OS tree can be analyzed from anywhere without the need to populate a local data store with build-ids. Commiter notes: o Fixed up symfs="/" variants handling. o prefixed DSO__ORIG_GUEST_KMODULE case with symfs too, avoiding use of files outside the symfs directory. LKML-Reference: <1291926427-28846-1-git-send-email-daahern@cisco.com> Signed-off-by: David Ahern <daahern@cisco.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf event: Prevent unbound event__name array accessThomas Gleixner2010-12-091-3/+6
|/ | | | | | | | | | | | | | | | event__name[] is missing an entry for PERF_RECORD_FINISHED_ROUND, but we happily access the array from the dump code. Make event__name[] static and provide an accessor function, fix up all callers and add the missing string. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ian Munsie <imunsie@au1.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <20101207124550.432593943@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf: Support for callchains mergeFrederic Weisbecker2010-08-221-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we sort the histograms by comm, which is the default, we need to merge some of them, typically different thread histograms of a same process, or just same comm. But during this merge, we forgot to merge callchains. So imagine we have three threads (tids: 1000, 1001, 1002) that belong to comm "foo". tid 1000 got 100 events tid 1001 got 10 events tid 1002 got 3 events Once we merge these histograms to get a per comm result, we'll finally get: "foo" got 113 events The problem is if we merge 1000 and 1001 histograms into 1002, then the end merge result, wrt callchains, will be only callchains that belong to 1002. This is because we haven't handled callchains in the merge. Only those from one of the threads inside a common comm survive. It means during this merge, we can lose a lot of callchains. Fix this by implementing callchains merge and apply it on histograms that collapse. Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org>
* perf: Keep track of the max depth of a callchainFrederic Weisbecker2010-08-221-1/+1
| | | | | | | | | | | | | | In order to implement callchains collapsing, we need to keep track of the maximum depth in a histogram tree of callchains. This way we'll avoid allocating an arbitrary temporary buffer size on callchain merge time. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Christoph Hellwig <hch@infradead.org>
* perf annotate: Sort by hottest lines in the TUIArnaldo Carvalho de Melo2010-08-101-6/+7
| | | | | | | | | | | | | | | Right now it will just sort and position at the hottest line, i.e. the one where more samples were taken. It will be at the center of the screen and later TAB/shift-TAB will cycle thru the hottest lines. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf hists: Handle verbose in hists__sort_list_widthArnaldo Carvalho de Melo2010-08-051-0/+3
| | | | | | | | | | | Otherwise entries will get chopped up on the window. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Don't keep unreferenced maps when unmaps are detectedArnaldo Carvalho de Melo2010-08-021-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For a file with: [root@emilia linux-2.6-tip]# perf report -D -fi allmodconfig-j32.perf.data | grep events: TOTAL events: 36933 MMAP events: 9056 LOST events: 0 COMM events: 1702 EXIT events: 1887 THROTTLE events: 8 UNTHROTTLE events: 8 FORK events: 1894 READ events: 0 SAMPLE events: 22378 ATTR events: 0 EVENT_TYPE events: 0 TRACING_DATA events: 0 BUILD_ID events: 0 [root@emilia linux-2.6-tip]# Testing with valgrind and making perf_session__delete() a nop, so that we can notice how many maps were actually deleted due to not having any samples on it: ==== HEAP SUMMARY: Before: ==10339== in use at exit: 8,909,997 bytes in 68,690 blocks ==10339== total heap usage: 78,696 allocs, 10,007 frees, 11,925,853 bytes allocated After: ==10506== in use at exit: 8,902,605 bytes in 68,606 blocks ==10506== total heap usage: 78,696 allocs, 10,091 frees, 11,925,853 bytes allocated I.e. just 84 detected unmaps with no hits out of 9056 for this workload, not much, but in some other long running workload this may save more bytes. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* Merge commit 'v2.6.35' into perf/coreIngo Molnar2010-08-021-2/+6
|\ | | | | | | | | | | | | | | | | | | Conflicts: tools/perf/Makefile tools/perf/util/hist.c Merge reason: Resolve the conflicts and update to latest upstream. Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * perf annotate: Fix handling of goto labels that are valid hex numbersArnaldo Carvalho de Melo2010-07-221-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When parsing the objdump disassembly output we can have goto labels that are valid hex numbers and thus get confused with lines with machine code. Handle the common case of a label that has nothing after it and other cases where there is just source code by validating the resulting "ip". It is still possible that we find goto labels that are in the function address range, but only if they are located before the real address we should be OK. A change in the objdump output to have a clear marker separating addresses from the disassembly would come handy, but we would still have to deal with older versions. Reported-by: Gleb Natapov <gleb@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> LKML-Reference: <20100722170541.GF17631@ghostprotocols.net> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf: Fix various display bugs with parent filteringFrederic Weisbecker2010-07-161-5/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hists that have been filtered, because they don't have callchains matching the parent filter, won't be printed. As such, hist_entry__snprintf() returns 0 for them, but we don't control this value and we always print the buffer, which might be untouched and then only made of random stack garbage. Not only does it paint the screen with barf, it also prints the callchains for these hists, even though they have been filtered, since the hist has been filtered as well. We need to check the return value of hist_entry__snprintf() and ignore the hist if it is 0, which means it didn't get any callchain matching the parent filter. This fixes the barf and the undesired callchains. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org>
* | perf ui: New hists tree widgetArnaldo Carvalho de Melo2010-07-271-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The stock newt checkbox tree widget we were using was not really suitable for hist entry + callchain browsing. The problems with it were manifold: - We needed to traverse the whole hist_entry rb_tree to add each entry + callchains beforehand. - No control over the colors used for each row So a new tree widget, based mostly on slang, was written. It extends the ui_browser class already used for annotate to allow the user to fold/unfold branches in the callchains tree, using extra fields in the symbol_map class that is embedded in hist_entry and callchain_node instances to store the folding state and when changing this state calculates the number of rows that are produced when showing a particular hist_entry instance. This greatly speeds up browsing as we don't have to upfront touch all the entries and only calculate callchain related operations when some callchain branch is actually unfolded. The memory footprint is also reduced as the data structure is not duplicated, just some extra fields for controling callchain state and to simplify the process of seeking thru entries (nr_rows, row_offset) were added. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf hist: Introduce routine to measure lenght of formatted entryArnaldo Carvalho de Melo2010-07-271-0/+27
| | | | | | | | | | | | | | | | | | | | | | | | Will be used to figure out the window width needed in the new tree widget. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf sort: Make column width code per hists instanceArnaldo Carvalho de Melo2010-07-231-36/+79
| | | | | | | | | | | | | | | | | | | | | | | | | | They were globals, and since we support multiple hists and sessions at the same time, it doesn't make sense to calculate those values considereing all symbols in all sessions. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf hists: Mark entries filtered by parentArnaldo Carvalho de Melo2010-07-231-5/+16
| | | | | | | | | | | | | | | | | | | | And don't consider them in hists__inc_nr_entries. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf hists: Factor out duplicated codeArnaldo Carvalho de Melo2010-07-171-18/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | Introducing hists__remove_entry_filter. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf report: Implement --sort cpuArun Sharma2010-06-051-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In a shared multi-core environment, users want to analyze why their program was slow. In particular, if the code ran slower only on certain CPUs due to interference from other programs or kernel threads, the user should be able to notice that. Sample usage: perf record -f -a -- sleep 3 perf report --sort cpu,comm Workload: program is running on 16 CPUs Experiencing interference from an antagonist only on 4 CPUs. Samples: 106218177676 cycles Overhead CPU Command ........ ... ............... 6.25% 2 program 6.24% 6 program 6.24% 11 program 6.24% 5 program 6.24% 9 program 6.24% 10 program 6.23% 15 program 6.23% 7 program 6.23% 3 program 6.23% 14 program 6.22% 1 program 6.20% 13 program 3.17% 12 program 3.15% 8 program 3.14% 0 program 3.13% 4 program 3.11% 4 antagonist 3.11% 0 antagonist 3.10% 8 antagonist 3.07% 12 antagonist Cc: David S. Miller <davem@davemloft.net> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <20100505181612.GA5091@sharma-home.net> Signed-off-by: Arun Sharma <aruns@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf annotate: Ask objdump to demangle symbolsStephane Eranian2010-06-051-1/+1
|/ | | | | | | | | | | | | | | | | | | | Perf report is demangling symbols but not annotate. The former uses internal demangling via libbdf or libiberty. The latter executes objdump which by default does not demangle symbols. This patch adds the -C option to the objdump cmdline to enable symbol demangling. Cc: David S. Miller <davem@davemloft.net> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <4c07b323.2126e30a.6245.0e1e@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf hist: fix objdump output parsingKonstantin Stepanyuk2010-06-011-1/+1
| | | | | | | | | | | | | | | | | | | | hist_entry__annotate() runs objdump with -S option so the output may contain lines of any format. If a line starts with a colon strtoull() returns 0 and calculated offset will be negative. This causes perf annotate segfaults. Make sure that strtoull() has parsed at least one digit. Cc: David S. Miller <davem@davemloft.net> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Konstantin Stepanyuk <konstantin.stepanyuk@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf annotate: Fix up usage of the build id cacheArnaldo Carvalho de Melo2010-05-231-2/+11
| | | | | | | | | | | | | | | | It was assuming that the cache was always available and also wasn't checking if the file found in the build id cache was just a kallsyms file, that is not supported by objdump for disassembly. Reported-by: Ingo Molnar <mingo@elte.hu> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf annotate: Add TUI interfaceArnaldo Carvalho de Melo2010-05-221-8/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | When annotating multiple entries, for instance, when running simply as: $ perf annotate the right and left keys, as well as TAB can be used to cycle thru the multiple symbols being annotated. If one doesn't like TUI annotate, disable it by editing ~/.perfconfig and adding: [tui] annotate = off Just like it is possible for report. Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf: Fix getline undeclaredFrederic Weisbecker2010-05-211-1/+1
| | | | | | | | | | | | | | | | | | | | We need to have stdio.h included with _GNU_SOURCEfopr getline, which is broken with the inclusion of build-id.h. Keep util.h included first in hist.c Fixes: util/hist.c: Dans la fonction «hist_entry__parse_objdump_line» : util/hist.c:938: attention : déclaration implicite de la fonction « «getline» » util/hist.c:938: attention : nested extern declaration of «getline» make: *** [util/hist.o] Erreur 1 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1274438919-5104-1-git-send-regression-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf annotate: Use build-ids to find the right DSOArnaldo Carvalho de Melo2010-05-201-8/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We were still using the pathname found on the MMAP event, that could not be the one we used when recording, so use the build-id cache for that, only falling back to use the pathname in the MMAP event if no build-ids are available. With this we now also are able to do secure, seamless offline annotation. Example: [root@doppio linux-2.6-tip]# perf report -g none -v 2> /dev/null | head -10 8.12% Xorg /usr/lib64/libpixman-1.so.0.14.0 0x0000000000026d02 B [.] pixman_rasterize_edges 4.68% firefox /usr/lib64/xulrunner-1.9.1/libxul.so 0x00000000005dbdba B [.] 0x000000005dbdba 3.70% swapper /lib/modules/2.6.34-rc6/build/vmlinux 0xffffffff81022cea ! [k] read_hpet 2.96% init /lib/modules/2.6.34-rc6/build/vmlinux 0xffffffff81022cea ! [k] read_hpet 2.73% swapper /lib/modules/2.6.34-rc6/build/vmlinux 0xffffffff8100a738 ! [k] mwait_idle_with_hints [root@doppio linux-2.6-tip]# perf annotate -v pixman_rasterize_edges 2>&1 | grep Executing Executing: objdump --start-address=0x000000371ce26670 --stop-address=0x000000371ce2709f -dS /root/.debug/.build-id/bd/6ac5199137aaeb279f864717d8d061477466c1|grep -v /root/.debug/.build-id/bd/6ac5199137aaeb279f864717d8d061477466c1|expand [root@doppio linux-2.6-tip]# perf buildid-list | grep libpixman-1.so.0.14.0 bd6ac5199137aaeb279f864717d8d061477466c1 /usr/lib64/libpixman-1.so.0.14.0 [root@doppio linux-2.6-tip]# Reported-by: Stephane Eranian <eranian@google.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf options: Type check all the remaining OPT_ variantsArnaldo Carvalho de Melo2010-05-171-1/+1
| | | | | | | | | | | | | | | | | | OPT_SET_INT was renamed to OPT_SET_UINT since the only use in these tools is to set something that has an enum type, that is builtin compatible with unsigned int. Several string constifications were done to make OPT_STRING require a const char * type. Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf report: Report number of events, not samplesArnaldo Carvalho de Melo2010-05-141-41/+47
| | | | | | | | | | | | | | | | | | | | | | | | Number of samples is meaningless after we switched to auto-freq, so report the number of events, i.e. not the sum of the different periods, but the number PERF_RECORD_SAMPLE emitted by the kernel. While doing this I noticed that naming "count" to the sum of all the event periods can be confusing, so rename it to .period, just like in struct sample.data, so that we become more consistent. This helps with the next step, that was to record in struct hist_entry the number of sample events for each instance, we need that because we use it to generate the number of events when applying filters to the tree of hist entries like it is being done in the TUI report browser. Suggested-by: Ingo Molnar <mingo@elte.hu> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf hist: Clarify events_stats fields usageArnaldo Carvalho de Melo2010-05-141-10/+10
| | | | | | | | | | | | | | | | | | | | | The events_stats.total field is too generic, rename it to .total_period, and also add a comment explaining that it is the sum of all the .period fields in samples, that is needed because we use auto-freq to avoid sampling artifacts. Ditto for events_stats.lost, that is the sum of all lost_event.lost fields, i.e. the number of events the kernel dropped. Looking at the users, builtin-sched.c can make use of these fields and stop doing it again. Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf hist: Make event__totals per histsArnaldo Carvalho de Melo2010-05-141-0/+21
| | | | | | | | | | | | | This is one more thing that started global but are more useful per hist or per session. Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf hist: Fix missing getline declarationFrederic Weisbecker2010-05-131-0/+1
| | | | | | | | | | | | | | | | | | hist.c needs to include util.h so that it gets stdio.h inclusion with __GNU_SOURCE defined. Fixes: util/hist.c: In function ‘hist_entry__parse_objdump_line’: util/hist.c:931: erreur: implicit declaration of function ‘getline’ util/hist.c:931: erreur: nested extern declaration of ‘getline’ Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1273772836-11533-1-git-send-regression-fweisbec@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf report: Librarize the annotation code and use it in the newt browserArnaldo Carvalho de Melo2010-05-111-0/+184
| | | | | | | | | | | | | | | | | | | | | | | | Now we don't anymore use popen to run 'perf annotate' for the selected symbol, instead we collect per address samplings when processing samples in 'perf report' if we're using the newt browser, then we use this data directly to do annotation. Done this way we can actually traverse the objdump_line objects directly, matching the addresses to the collected samples and colouring them appropriately using lower level slang routines. The new ui_browser class will be reused for the main, callchain aware, histogram browser, when it will be made generic and don't assume that the objects are always instances of the objdump_line class maintained using list_heads. Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf hist: Adopt filter by dso and by thread methods from the newt browserArnaldo Carvalho de Melo2010-05-111-0/+59
| | | | | | | | | | | | | Those are really not specific to the newt code, can be used by other UI frontends. Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf hist: Calculate max_sym name len and nr_entriesArnaldo Carvalho de Melo2010-05-101-7/+20
| | | | | | | | | | | | | Better done when we are adding entries, be it initially of when we're re-sorting the histograms. Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf hist: Introduce hists class and move lots of methods to itArnaldo Carvalho de Melo2010-05-101-51/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In cbbc79a we introduced support for multiple events by introducing a new "event_stat_id" struct and then made several perf_session methods receive a point to it instead of a pointer to perf_session, and kept the event_stats and hists rb_tree in perf_session. While working on the new newt based browser, I realised that it would be better to introduce a new class, "hists" (short for "histograms"), renaming the "event_stat_id" struct and the perf_session methods that were really "hists" methods, as they manipulate only struct hists members, not touching anything in the other perf_session members. Other optimizations, such as calculating the maximum lenght of a symbol name present in an hists instance will be possible as we add them, avoiding a re-traversal just for finding that information. The rationale for the name "hists" to replace "event_stat_id" is that we may have multiple sets of hists for the same event_stat id, as, for instance, the 'perf diff' tool has, so event stat id is not what characterizes what this struct and the functions that manipulate it do. Cc: Eric B Munson <ebmunson@us.ibm.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf report: Allow limiting the number of entries to print in callchainsArnaldo Carvalho de Melo2010-05-091-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | Works by adding a third parameter to the '-g' argument, after the graph type and minimum percentage, for example: [root@doppio linux-2.6-tip]# perf report -g fractal,0.5,2 Will show only the first two symbols where at least 0.5% of the samples took place. All the other symbols that don't fall outside these constraints will be put together in the last entry, prefixed with "[...]" and the total percentage for them. Suggested-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf hist: Simplify the insertion of new hist_entry instancesArnaldo Carvalho de Melo2010-05-091-13/+23
| | | | | | | | | | | | | | | | And with that fix at least one bug: The first hit for an entry, the one that calls malloc to create a new instance in __perf_session__add_hist_entry, wasn't adding the count to the per cpumode (PERF_RECORD_MISC_USER, etc) total variable. Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf: 'perf kvm' tool for monitoring guest performance from hostZhang, Yanmin2010-04-191-1/+71
| | | | | | | Here is the patch of userspace perf tool. Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* perf tools: Fix accidentally preprocessed snprintf callbackFrederic Weisbecker2010-04-141-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | struct sort_entry has a callback named snprintf that turns an entry into a string result. But there are glibc versions that implement snprintf through a macro. The following expression is then going to get the snprintf call preprocessed: ent->snprintf(...) to finally end up in a build error: util/hist.c: Dans la fonction «hist_entry__snprintf» : util/hist.c:539: erreur: «struct sort_entry» has no member named «__builtin___snprintf_chk» To fix this, prepend struct sort_entry callbacks with an "se_" prefix. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf hist: Only allocate callchain_node if processing callchainsArnaldo Carvalho de Melo2010-04-021-2/+3
| | | | | | | | | | | | | | The struct callchain_node size is 120 bytes, that are never used when there are no callchains or '-g none' is specified, so conditionally allocate it, reducing sizeof(struct hist_entry) from 210 bytes to only 96, greatly speeding the non-callchain processing. LKML-Reference: <new-submission> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf hist: Replace ->print() routines by ->snprintf() equivalentsArnaldo Carvalho de Melo2010-04-021-17/+37
| | | | | | | | | | | | | | | | | | | | | | | | Then hist_entry__fprintf will just us the newly introduced hist_entry__snprintf, add the newline and fprintf it to the supplied FILE descriptor. This allows us to remove the use_browser checking in the color_printf routines, that now got color_snprintf variants too. The newt TUI browser (and other GUIs that may come in the future) don't have to worry about stdio specific stuff in the strings they get from the se->snprintf routines and instead use whatever means to do the equivalent. Also the newt TUI browser don't have to use the fmemopen() hack, instead it can use the se->snprintf routines directly. For now tho use the hist_entry__snprintf routine to reduce the patch size. Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf report: Add progress barsArnaldo Carvalho de Melo2010-04-021-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | For when we are processing the events and inserting the entries in the browser. Experimentation here: naming "ui_something" we may be treading into creating a TUI/GUI set of routines that can then be implemented in terms of multiple backends. Also the time it takes for adding things to the "browser" takes, visually (I guess I should do some profiling here ;-) ), more time than for processing the events... That means we probably need to create a custom hist_entry browser, so that we reuse the structures we have in place instead of duplicating them in newt. But progress was made and at least we can see something while long files are being loaded, that must be one of UI 101 bullet points :-) Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf symbols: Move more map_groups methods to map.cArnaldo Carvalho de Melo2010-04-021-1/+1
| | | | | | | | | | | While writing a standalone test app that uses the symbol system to find kernel space symbols I noticed these also need to be moved. Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf callchains: Store the map together with the symbolArnaldo Carvalho de Melo2010-03-261-7/+7
| | | | | | | | | | | | | | We need this to know where a symbol in a callchain came from, for various reasons, among them precise annotation from a TUI/GUI tool. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1269459619-982-5-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf tools: Introduce struct map_symbolArnaldo Carvalho de Melo2010-03-261-3/+5
| | | | | | | | | | | | | | | | That will be in both struct hist_entry and struct callchain_list, so that the TUI can store a pointer to the pair (map, symbol) in the trees where hist_entries and callchain_lists are present, to allow precise annotation instead of looking for the first symbol with the selected name. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1269459619-982-4-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf: Fix orphan callchain branchesFrederic Weisbecker2010-03-221-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Callchains have markers inside their capture to tell we enter a context (kernel, user, ...). Those are not displayed in the callchains but they are incidentally an active part of the radix tree where callchains are stored, just like any other address. If we have the two following callchains: addr1 -> addr2 -> user context -> addr3 addr1 -> addr2 -> user context -> addr4 addr1 -> addr2 -> addr 5 This is pretty common if addr1 and addr2 are part of an interrupt path, addr3 and addr4 are user addresses and addr5 is a kernel non interrupt path. This will be stored as follows in the tree: addr1 addr2 / \ / addr5 user context / \ addr3 addr4 But we ignore the context markers in the report, hence the addr3 and addr4 will appear as orphan branches: |--28.30%-- hrtimer_interrupt | smp_apic_timer_interrupt | apic_timer_interrupt | | <------------- here, no parent! | | | | | |--11.11%-- 0x7fae7bccb875 | | | | | |--11.11%-- 0xffffffffff60013b | | | | | |--11.11%-- __pthread_mutex_lock_internal | | | | | |--11.11%-- __errno_location Fix this by removing the context markers when we process the callchains to the tree. Reported-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1269274173-20328-1-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Merge commit 'v2.6.34-rc2' into perf/coreIngo Molnar2010-03-221-1/+1
|\ | | | | | | | | | | Merge reason: Pick up latest perf fixes from upstream. Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds2010-03-181-22/+28
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (35 commits) perf: Fix unexported generic perf_arch_fetch_caller_regs perf record: Don't try to find buildids in a zero sized file perf: export perf_trace_regs and perf_arch_fetch_caller_regs perf, x86: Fix hw_perf_enable() event assignment perf, ppc: Fix compile error due to new cpu notifiers perf: Make the install relative to DESTDIR if specified kprobes: Calculate the index correctly when freeing the out-of-line execution slot perf tools: Fix sparse CPU numbering related bugs perf_event: Fix oops triggered by cpu offline/online perf: Drop the obsolete profile naming for trace events perf: Take a hot regs snapshot for trace events perf: Introduce new perf_fetch_caller_regs() for hot regs snapshot perf/x86-64: Use frame pointer to walk on irq and process stacks lockdep: Move lock events under lockdep recursion protection perf report: Print the map table just after samples for which no map was found perf report: Add multiple event support perf session: Change perf_session post processing functions to take histogram tree perf session: Add storage for seperating event types in report perf session: Change add_hist_entry to take the tree root instead of session perf record: Add ID and to recorded event data when recording multiple events ...
| * | tree-wide: Assorted spelling fixesDaniel Mack2010-02-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In particular, several occurances of funny versions of 'success', 'unknown', 'therefore', 'acknowledge', 'argument', 'achieve', 'address', 'beginning', 'desirable', 'separate' and 'necessary' are fixed. Signed-off-by: Daniel Mack <daniel@caiaq.de> Cc: Joe Perches <joe@perches.com> Cc: Junio C Hamano <gitster@pobox.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* | | perf hist: Don't fprintf the callgraph unconditionallyArnaldo Carvalho de Melo2010-03-121-13/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [root@doppio ~]# perf report -i newt.data | head -10 # Samples: 11999679868 # # Overhead Command Shared Object Symbol # ........ ....... ............................. ...... # 63.61% perf libslang.so.2.1.4 [.] SLsmg_write_chars 6.30% perf perf [.] symbols__find 2.19% perf libnewt.so.0.52.10 [.] newtListboxAppendEntry 2.08% perf libslang.so.2.1.4 [.] SLsmg_write_chars@plt 1.99% perf libc-2.10.2.so [.] _IO_vfprintf_internal [root@doppio ~]# Not good, the newt form for report works, but slang has to eat the cost of the additional callgraph lines everytime it prints a line, and the callgraph doesn't appear on the screen, so move the callgraph printing to a separate function and don't use it in newt.c. Newt tree widgets are being investigated to properly support callgraphs, but till that gets merged, lets remove this huge overhead and show at least the symbol overheads for a callgraph rich perf.data with good performance. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1268408808-13595-2-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
OpenPOWER on IntegriCloud