summaryrefslogtreecommitdiffstats
path: root/tools/perf/bench
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'perf-core-for-mingo' into perf/urgentIngo Molnar2014-04-141-0/+4
|\ | | | | | | | | | | | | | | | | Conflicts: tools/perf/bench/numa.c Pull perf fixes from Jiri Olsa. Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * perf bench: Set more defaults in the 'numa' suiteRamkumar Ramachandra2014-04-141-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, $ perf bench numa mem errors out with usage information. To make this more user-friendly, let us provide a minimum set of default values required for a test run. As an added bonus, $ perf bench all now goes all the way to completion. Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1395964219-22173-2-git-send-email-artagnon@gmail.com Signed-off-by: Jiri Olsa <jolsa@redhat.com>
* | Merge branch 'perf-core-for-linus' of ↵Linus Torvalds2014-03-315-0/+698
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf changes from Ingo Molnar: "Main changes: Kernel side changes: - Add SNB/IVB/HSW client uncore memory controller support (Stephane Eranian) - Fix various x86/P4 PMU driver bugs (Don Zickus) Tooling, user visible changes: - Add several futex 'perf bench' microbenchmarks (Davidlohr Bueso) - Speed up thread map generation (Don Zickus) - Introduce 'perf kvm --list-cmds' command line option for use by scripts (Ramkumar Ramachandra) - Print the evsel name in the annotate stdio output, prep to fix support outputting annotation for multiple events, not just for the first one (Arnaldo Carvalho de Melo) - Allow setting preferred callchain method in .perfconfig (Jiri Olsa) - Show in what binaries/modules 'perf probe's are set (Masami Hiramatsu) - Support distro-style debuginfo for uprobe in 'perf probe' (Masami Hiramatsu) Tooling, internal changes and fixes: - Use tid in mmap/mmap2 events to find maps (Don Zickus) - Record the reason for filtering an address_location (Namhyung Kim) - Apply all filters to an addr_location (Namhyung Kim) - Merge al->filtered with hist_entry->filtered in report/hists (Namhyung Kim) - Fix memory leak when synthesizing thread records (Namhyung Kim) - Use ui__has_annotation() in 'report' (Namhyung Kim) - hists browser refactorings to reuse code accross UIs (Namhyung Kim) - Add support for the new DWARF unwinder library in elfutils (Jiri Olsa) - Fix build race in the generation of bison files (Jiri Olsa) - Further streamline the feature detection display, trimming it a bit to show just the libraries detected, using VF=1 gets a more verbose output, showing the less interesting feature checks as well (Jiri Olsa). - Check compatible symtab type before loading dso (Namhyung Kim) - Check return value of filename__read_debuglink() (Stephane Eranian) - Move some hashing and fs related code from tools/perf/util/ to tools/lib/ so that it can be used by more tools/ living utilities (Borislav Petkov) - Prepare DWARF unwinding code for using an elfutils alternative unwinding library (Jiri Olsa) - Fix DWARF unwind max_stack processing (Jiri Olsa) - Add dwarf unwind 'perf test' entry (Jiri Olsa) - 'perf probe' improvements including memory leak fixes, sharing the intlist class with other tools, uprobes/kprobes code sharing and use of ref_reloc_sym (Masami Hiramatsu) - Shorten sample symbol resolving by adding cpumode to struct addr_location (Arnaldo Carvalho de Melo) - Fix synthesizing mmaps for threads (Don Zickus) - Fix invalid output on event group stdio report (Namhyung Kim) - Fixup header alignment in 'perf sched latency' output (Ramkumar Ramachandra) - Fix off-by-one error in 'perf timechart record' argv handling (Ramkumar Ramachandra) Tooling, cleanups: - Remove unused thread__find_map function (Jiri Olsa) - Remove unused simple_strtoul() function (Ramkumar Ramachandra) Tooling, documentation updates: - Update function names in debug messages (Ramkumar Ramachandra) - Update some code references in design.txt (Ramkumar Ramachandra) - Clarify load-latency information in the 'perf mem' docs (Andi Kleen) - Clarify x86 register naming in 'perf probe' docs (Andi Kleen)" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (96 commits) perf tools: Remove unused simple_strtoul() function perf tools: Update some code references in design.txt perf evsel: Update function names in debug messages perf tools: Remove thread__find_map function perf annotate: Print the evsel name in the stdio output perf report: Use ui__has_annotation() perf tools: Fix memory leak when synthesizing thread records perf tools: Use tid in mmap/mmap2 events to find maps perf report: Merge al->filtered with hist_entry->filtered perf symbols: Apply all filters to an addr_location perf symbols: Record the reason for filtering an address_location perf sched: Fixup header alignment in 'latency' output perf timechart: Fix off-by-one error in 'record' argv handling perf machine: Factor machine__find_thread to take tid argument perf tools: Speed up thread map generation perf kvm: introduce --list-cmds for use by scripts perf ui hists: Pass evsel to hpp->header/width functions explicitly perf symbols: Introduce thread__find_cpumode_addr_location perf session: Change header.misc dump from decimal to hex perf ui/tui: Reuse generic __hpp__fmt() code ...
| * perf bench: Add futex-requeue microbenchmarkDavidlohr Bueso2014-03-143-0/+225
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Block a bunch of threads on a futex and requeue them on another, N at a time. This program is particularly useful to measure the latency of nthread requeues without waking up any tasks -- thus mimicking a regular futex_wait. An example run: $ perf bench futex requeue -r 100 -t 64 Run summary [PID 151011]: Requeuing 64 threads (from 0x7d15c4 to 0x7d15c8), 1 at a time. [Run 1]: Requeued 64 of 64 threads in 0.0400 ms [Run 2]: Requeued 64 of 64 threads in 0.0390 ms [Run 3]: Requeued 64 of 64 threads in 0.0400 ms ... [Run 100]: Requeued 64 of 64 threads in 0.0390 ms Requeued 64 of 64 threads in 0.0399 ms (+-0.37%) Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Acked-by: Darren Hart <dvhart@linux.intel.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Darren Hart <dvhart@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Low <jason.low2@hp.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <Waiman.Long@hp.com> Link: http://lkml.kernel.org/r/1387081917-9102-4-git-send-email-davidlohr@hp.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf bench: Add futex-wake microbenchmarkDavidlohr Bueso2014-03-143-0/+212
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Block a bunch of threads on a futex and wake them up, N at a time. This program is particularly useful to measure the latency of nthread wakeups in non-error situations: all waiters are queued and all wake calls wakeup one or more tasks. An example run: $ perf bench futex wake -t 512 -r 100 Run summary [PID 27823]: blocking on 512 threads (at futex 0x7e10d4), waking up 1 at a time. [Run 1]: Wokeup 512 of 512 threads in 6.0080 ms [Run 2]: Wokeup 512 of 512 threads in 5.2280 ms [Run 3]: Wokeup 512 of 512 threads in 4.8300 ms ... [Run 100]: Wokeup 512 of 512 threads in 5.0100 ms Wokeup 512 of 512 threads in 5.0109 ms (+-2.25%) Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Acked-by: Darren Hart <dvhart@linux.intel.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Darren Hart <dvhart@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Low <jason.low2@hp.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <Waiman.Long@hp.com> Link: http://lkml.kernel.org/r/1387081917-9102-3-git-send-email-davidlohr@hp.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf bench: Add futex-hash microbenchmarkDavidlohr Bueso2014-03-143-0/+261
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce futexes to perf-bench and add a program that stresses and measures the kernel's implementation of the hash table. This is a multi-threaded program that simply measures the amount of failed futex wait calls - we only want to deal with the hashing overhead, so a negative return of futex_wait_setup() is enough to do the trick. An example run: $ perf bench futex hash -t 32 Run summary [PID 10989]: 32 threads, each operating on 1024 [private] futexes for 10 secs. [thread 0] futexes: 0x19d9b10 ... 0x19dab0c [ 418713 ops/sec ] [thread 1] futexes: 0x19daca0 ... 0x19dbc9c [ 469913 ops/sec ] [thread 2] futexes: 0x19dbe30 ... 0x19dce2c [ 479744 ops/sec ] ... [thread 31] futexes: 0x19fbb80 ... 0x19fcb7c [ 464179 ops/sec ] Averaged 454310 operations/sec (+- 0.84%), total secs = 10 Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Acked-by: Darren Hart <dvhart@linux.intel.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Darren Hart <dvhart@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Low <jason.low2@hp.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <Waiman.Long@hp.com> Link: http://lkml.kernel.org/r/1387081917-9102-2-git-send-email-davidlohr@hp.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf bench numa: Make no args mean 'run all tests'Arnaldo Carvalho de Melo2014-03-141-0/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we call just: perf bench numa mem it will present the same output as: perf bench numa mem -h i.e. ask for instructions about what to run. While that is kinda ok, using 'run all tests' as the default, i.e. making 'no parms' be equivalent to: perf bench numa mem -a Will allow: perf bench numa all to actually do what is asked: i.e. run all the 'bench' tests, instead of responding to that by asking what to do. That, in turn, allows: perf bench all to actually complete, for the same reasons. And after that, the tests that come after that, and that at some point hit a NULL deref, will run, allowing me to reproduce a recently reported problem. That when you have the needed numa libraries, which wasn't the case for the reporter, making me a bit confused after trying to reproduce his report. So make no parms mean -a. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Patrick Palka <patrick@parcs.ath.cx> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-x7h0ghx4pef4n0brywg21krk@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Fix bench/numa.c for 32-bit buildAdrian Hunter2013-10-211-2/+2
| | | | | | | | | | | | | | | | | | | | bench/numa.c: In function 'worker_thread': bench/numa.c:1123:20: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare] bench/numa.c:1171:6: error: format '%lx' expects argument of type 'long unsigned int', but argument 5 has type 'u64' [-Werror=format] cc1: all warnings being treated as errors Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1382099356-4918-13-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench: Fix failing assertions in numa benchPetr Holasek2013-10-111-13/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch adds more subtle handling of -C and -N parameters in parse_{cpu,node}_setup_list() functions when there isn't enough NUMA nodes or CPUs present. Instead of assertion and terminating benchmark, partial test is skipped with error message and perf will continue to the next one. Fixed problem can be easily reproduced on machine with only one NUMA node: # Running numa/mem benchmark... # Running main, "perf bench numa mem -a" ... # Running RAM-bw-remote, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0 -M 1 -s perf: bench/numa.c:622: parse_setup_node_list: Assertion `!(bind_node_0 < 0 || bind_node_0 >= g->p.nr_nodes)' failed. Aborted Signed-off-by: Petr Holasek <pholasek@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Petr Benas <pbenas@redhat.com> Link: http://lkml.kernel.org/r/1380821325-4017-1-git-send-email-pholasek@redhat.com Signed-off-by: Petr Benas <pbenas@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench sched: Add --threaded optionIngo Molnar2013-10-111-29/+86
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow the measurement of thread versus process context switch performance. The default stays at 'process' based measurement, like lmbench's lat_ctx benchmark. Sample output: comet:~/tip/tools/perf> taskset 1 ./perf bench sched pipe # Running sched/pipe benchmark... # Executed 1000000 pipe operations between two processes Total time: 4.138 [sec] 4.138729 usecs/op 241620 ops/sec comet:~/tip/tools/perf> taskset 1 ./perf bench sched pipe --threaded # Running sched/pipe benchmark... # Executed 1000000 pipe operations between two threads Total time: 3.667 [sec] 3.667667 usecs/op 272652 ops/sec Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Link: http://lkml.kernel.org/r/20130917114256.GA31159@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* tools/perf: Standardize feature support define names to: HAVE_{FEATURE}_SUPPORTIngo Molnar2013-10-094-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Standardize all the feature flags based on the HAVE_{FEATURE}_SUPPORT naming convention: HAVE_ARCH_X86_64_SUPPORT HAVE_BACKTRACE_SUPPORT HAVE_CPLUS_DEMANGLE_SUPPORT HAVE_DWARF_SUPPORT HAVE_ELF_GETPHDRNUM_SUPPORT HAVE_GTK2_SUPPORT HAVE_GTK_INFO_BAR_SUPPORT HAVE_LIBAUDIT_SUPPORT HAVE_LIBELF_MMAP_SUPPORT HAVE_LIBELF_SUPPORT HAVE_LIBNUMA_SUPPORT HAVE_LIBUNWIND_SUPPORT HAVE_ON_EXIT_SUPPORT HAVE_PERF_REGS_SUPPORT HAVE_SLANG_SUPPORT HAVE_STRLCPY_SUPPORT Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/n/tip-u3zvqejddfZhtrbYbfhi3spa@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* perf bench: Fix memcpy benchmark for large sizesAndi Kleen2013-07-221-0/+2
| | | | | | | | | | | | | | | | | The glibc calloc() function has an optimization to not explicitely memset() very large calloc allocations that just came from mmap(), because they are known to be zero. This could result in the perf memcpy benchmark reading only from the zero page, which gives unrealistic results. Always call memset explicitly on the source area to avoid this problem. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Link: http://lkml.kernel.org/n/tip-pzz2qrdq9eymxda0y8yxdn33@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench: Fix memory allocation fail check in mem{set,cpy} workloadsKirill A. Shutemov2013-07-082-3/+3
| | | | | | | | | | | | | | Addresses of allocated memory areas saved to '*src' and '*dst', so we need to check them for NULL, not 'src' and 'dst'. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1370518503-4230-1-git-send-email-kirill.shutemov@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Fix LIBNUMA build with glibc 2.12 and older.Vinson Lee2013-03-141-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | The tokens MADV_HUGEPAGE and MADV_NOHUGEPAGE are not available with glibc 2.12 and older. Define these tokens if they are not already defined. This patch fixes these build errors with older versions of glibc. CC bench/numa.o bench/numa.c: In function ‘alloc_data’: bench/numa.c:334: error: ‘MADV_HUGEPAGE’ undeclared (first use in this function) bench/numa.c:334: error: (Each undeclared identifier is reported only once bench/numa.c:334: error: for each function it appears in.) bench/numa.c:341: error: ‘MADV_NOHUGEPAGE’ undeclared (first use in this function) make: *** [bench/numa.o] Error 1 Signed-off-by: Vinson Lee <vlee@twitter.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Irina Tirdea <irina.tirdea@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1363214064-4671-2-git-send-email-vlee@twitter.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf: Add 'perf bench numa mem' NUMA performance measurement suiteIngo Molnar2013-01-302-0/+1732
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a suite of NUMA performance benchmarks. The goal was simulate the behavior and access patterns of real NUMA workloads, via a wide range of parameters, so this tool goes well beyond simple bzero() measurements that most NUMA micro-benchmarks use: - It processes the data and creates a chain of data dependencies, like a real workload would. Neither the compiler, nor the kernel (via KSM and other optimizations) nor the CPU can eliminate parts of the workload. - It randomizes the initial state and also randomizes the target addresses of the processing - it's not a simple forward scan of addresses. - It provides flexible options to set process, thread and memory relationship information: -G sets "global" memory shared between all test processes, -P sets "process" memory shared by all threads of a process and -T sets "thread" private memory. - There's a NUMA convergence monitoring and convergence latency measurement option via -c and -m. - Micro-sleeps and synchronization can be injected to provoke lock contention and scheduling, via the -u and -S options. This simulates IO and contention. - The -x option instructs the workload to 'perturb' itself artificially every N seconds, by moving to the first and last CPU of the system periodically. This way the stability of convergence equilibrium and the number of steps taken for the scheduler to reach equilibrium again can be measured. - The amount of work can be specified via the -l loop count, and/or via a -s seconds-timeout value. - CPU and node memory binding options, to test hard binding scenarios. THP can be turned on and off via madvise() calls. - Live reporting of convergence progress in an 'at glance' output format. Printing of convergence and deconvergence events. The 'perf bench numa mem -a' option will start an array of about 30 individual tests that will each output such measurements: # Running 5x5-bw-thread, "perf bench numa mem -p 5 -t 5 -P 512 -s 20 -zZ0q --thp 1" 5x5-bw-thread, 20.276, secs, runtime-max/thread 5x5-bw-thread, 20.004, secs, runtime-min/thread 5x5-bw-thread, 20.155, secs, runtime-avg/thread 5x5-bw-thread, 0.671, %, spread-runtime/thread 5x5-bw-thread, 21.153, GB, data/thread 5x5-bw-thread, 528.818, GB, data-total 5x5-bw-thread, 0.959, nsecs, runtime/byte/thread 5x5-bw-thread, 1.043, GB/sec, thread-speed 5x5-bw-thread, 26.081, GB/sec, total-speed See the help text and the code for more details. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
* perf tools: Use __maybe_used for unused variablesIrina Tirdea2012-09-115-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | perf defines both __used and __unused variables to use for marking unused variables. The variable __used is defined to __attribute__((__unused__)), which contradicts the kernel definition to __attribute__((__used__)) for new gcc versions. On Android, __used is also defined in system headers and this leads to warnings like: warning: '__used__' attribute ignored __unused is not defined in the kernel and is not a standard definition. If __unused is included everywhere instead of __used, this leads to conflicts with glibc headers, since glibc has a variables with this name in its headers. The best approach is to use __maybe_unused, the definition used in the kernel for __attribute__((unused)). In this way there is only one definition in perf sources (instead of 2 definitions that point to the same thing: __used and __unused) and it works on both Linux and Android. This patch simply replaces all instances of __used and __unused with __maybe_unused. Signed-off-by: Irina Tirdea <irina.tirdea@intel.com> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1347315303-29906-7-git-send-email-irina.tirdea@intel.com [ committer note: fixed up conflict with a116e05 in builtin-sched.c ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench: fix assert when NDEBUG is definedIrina Tirdea2012-09-081-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | When NDEBUG is defined, the assert macro will be expanded to nothing. Some assert calls used in perf are also including some functionality (e.g. system calls), not only validity checks. Therefore, if NDEBUG is defined, this functionality will be removed along with the assert. Perf also defines BUG_ON based on assert, so it has the same problem. Define BUG_ON so that the condition will be executed when NDEBUG is defined. Replace the assert statements that have these side effects with BUG_ON. For defining BUG_ON, use "if (cond) {}" insted of "if (cond) ;" because in the latter case build fails with "error: suggest braces around empty body in an ‘if’ statement [-Werror=empty-body]" Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Irina Tirdea <irina.tirdea@intel.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Pekka Enberg <penberg@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1347082551-2394-1-git-send-email-irina.tirdea@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench: Fix confused variable namings and descriptions in mem subsystemHitoshi Mitake2012-07-022-80/+80
| | | | | | | | | | | | | | | | | | | | | | As Namhyung Kim pointed, there are confused namings and descriptions of words "cycle" and "clock" in mem-memset.c and mem-memcpy.c. With the option "-c" (or "--clock", now renamed as "--cycle"), mem subsystem measures cost of memset() and memcpy() with cpu-cycles event. But current mem subsystem source code contains lots of confused variable namings and descriptions with "clock" (e.g. the variable use_clock). This is a very bad style because there is another software event named "cpu-clock". This patch replaces wrong usage of "clock" to "cycle". v2: modified Documentation/perf-bench.txt for the descriptions of --cycle option Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1341236777-18457-1-git-send-email-h.mitake@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench: Documentation updateNamhyung Kim2012-06-272-6/+6
| | | | | | | | | | | | | | | The current perf-bench documentation has a couple of typos and even lacks entire description of mem subsystem. Fix it. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1340172486-17805-1-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tool: Fix perf stack to non executable on x86_64Jiri Olsa2012-02-061-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | By adding following objects: bench/mem-memset-x86-64-asm.o bench/mem-memcpy-x86-64-asm.o the x86_64 perf binary ended up with executable stack. The reason was that above objects are assembler sourced and are missing the GNU-stack note section. In such case the linker assumes that the final binary should not be restricted at all and mark the stack as RWX. Adding section ".note.GNU-stack" definition to mentioned objects, with all flags disabled, thus omiting those objects from linker stack flags decision. Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=783570 Reported-by: Clark Williams <williams@redhat.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1328100848-5630-1-git-send-email-jolsa@redhat.com Signed-off-by: Jiri Olsa <jolsa@redhat.com> [ committer note: Remaining bits after what was already added to perf/urgent ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* Merge branch 'perf/urgent' into perf/coreArnaldo Carvalho de Melo2012-02-061-0/+6
|\ | | | | | | | | | | | | So that we can get the perf bench exec stack fixes and then apply the remaining fix for the files added after what is in perf/urgent. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf tools: Fix perf stack to non executable on x86_64Jiri Olsa2012-02-061-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By adding following objects: bench/mem-memcpy-x86-64-asm.o the x86_64 perf binary ended up with executable stack. The reason was that above object are assembler sourced and is missing the GNU-stack note section. In such case the linker assumes that the final binary should not be restricted at all and mark the stack as RWX. Adding section ".note.GNU-stack" definition to mentioned object, with all flags disabled, thus omiting this object from linker stack flags decision. Problem introduced in: $ git describe ea7872b v2.6.37-rc2-19-gea7872b Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=783570 Reported-by: Clark Williams <williams@redhat.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stable@kernel.org Link: http://lkml.kernel.org/r/1328100848-5630-1-git-send-email-jolsa@redhat.com Signed-off-by: Jiri Olsa <jolsa@redhat.com> [ committer note: Backported fix to perf/urgent (3.3-rc2+) ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf tools: Remove unnecessary ctype.h inclusionNamhyung Kim2012-01-302-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are unnecessary #include <ctype.h> out there, and they might cause a nasty build failure in some environment. As we already have most of ctype macros in util.h, just get rid of them. A few of exceptions are util/symbol.c which needs isupper() macro util.h doesn't provide and perl scripting support code which includes ctype.h internally. Suggested-by: Ingo Molnar <mingo@elte.hu> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1327827356-8786-4-git-send-email-namhyung@gmail.com Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf bench: Allow passing an iteration count to "bench mem mem{cpy,set}"Jan Beulich2012-01-242-4/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "perf stat ... perf bench mem mem..." is pretty meaningless when using small block sizes (as the overhead of the invocation of each test run basically hides the actual test result in the noise). Repeating the actually interesting function's invocation a number of times allows the results to become meaningful. Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4F16D767020000780006D738@nat28.tlf.novell.com Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf bench: Also allow measuring memset()Jan Beulich2012-01-245-0/+322
| | | | | | | | | | | | | | | | | | | | | | | | This simply clones the respective memcpy() implementation. Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/4F16D743020000780006D735@nat28.tlf.novell.com Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf bench: Also allow measuring alternative memcpy implementationsJan Beulich2012-01-242-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Intended to be able to support the current selection of the preferred memcpy() implementation, this patch adds the ability to also measure the two alternative implementations, again by way of using some pre-processsor replacement. While on my Westmere system this proves that the movsb based variant is worse than the movsq based one (since the ERMS feature isn't there), it also shows that here for the default as well as small sizes the unrolled variant outperforms the movsq one. Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4F16D728020000780006D732@nat28.tlf.novell.com Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf bench: Make "default" memcpy() selection actually use glibc's ↵Jan Beulich2012-01-241-1/+1
|/ | | | | | | | | | | | | | | | | | | | | | | implementation Since arch/x86/lib/memcpy_64.S implements not only __memcpy, but also memcpy, without further precautions this function will get chose by the static linker for resolving all references, and hence the "default" measurement didn't really measure anything else than the "x86-64-unrolled" one. Fix this by renaming (through the pre-processor) the conflicting symbol. On my Westmere system, the glibc variant turns out to require about 4% less instructions, but 15% more cycles for the default 1Mb block size measured. Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4F16D6FD020000780006D72F@nat28.tlf.novell.com Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tool: Fix gcc 4.6.0 issuesKyle McMartin2011-02-071-1/+1
| | | | | | | | | | | | | | | | | | | | | GCC 4.6.0 in Fedora rawhide turned up some compile errors in tools/perf due to the -Werror=unused-but-set-variable flag. I've gone through and annotated some of the assignments that had side effects (ie: return value from a function) with the __used annotation, and in some cases, just removed unused code. In a few cases, we were assigning something useful, but not using it in later parts of the function. kyle@dreadnought:~/src% gcc --version gcc (GCC) 4.6.0 20110122 (Red Hat 4.6.0-0.3) Cc: Ingo Molnar <mingo@redhat.com> LKML-Reference: <20110124161304.GK27353@bombadil.infradead.org> Signed-off-by: Kyle McMartin <kyle@redhat.com> [ committer note: Fixed up the annotation fixes, as that code moved recently ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench: Add feature that measures the performance of the ↵Hitoshi Mitake2010-11-263-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | arch/x86/lib/memcpy_64.S memcpy routines via 'perf bench mem' This patch ports arch/x86/lib/memcpy_64.S to perf bench mem memcpy for benchmarking memcpy() in userland with tricky and dirty way. util/include/asm/cpufeature.h, util/include/asm/dwarf2.h, and util/include/linux/linkage.h are mostly dummy files with small wrappers, so that we are able to include memcpy_64.S unmodified. Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: h.mitake@gmail.com Cc: Miao Xie <miaox@cn.fujitsu.com> Cc: Ma Ling <ling.ma@intel.com> Cc: Zhao Yakui <yakui.zhao@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Andi Kleen <andi@firstfloor.org> LKML-Reference: <1290668693-27068-2-git-send-email-mitake@dcl.info.waseda.ac.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf bench: Print both of prefaulted and no prefaulted results by defaultHitoshi Mitake2010-11-261-57/+162
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After applying this patch, perf bench mem memcpy prints both of prefualted and without prefaulted score of memcpy(). New options --no-prefault and --only-prefault are added to print single result, mainly for scripting usage. Usage example: | mitake@X201i:~/linux/.../tools/perf% ./perf bench mem memcpy -l 500MB | # Running mem/memcpy benchmark... | # Copying 500MB Bytes ... | | 634.969014 MB/Sec | 4.828062 GB/Sec (with prefault) | mitake@X201i:~/linux/.../tools/perf% ./perf bench mem memcpy -l 500MB --only-prefault | # Running mem/memcpy benchmark... | # Copying 500MB Bytes ... | | 4.705192 GB/Sec (with prefault) | mitake@X201i:~/linux/.../tools/perf% ./perf bench mem memcpy -l 500MB --no-prefault | # Running mem/memcpy benchmark... | # Copying 500MB Bytes ... | | 642.725568 MB/Sec Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: h.mitake@gmail.com Cc: Miao Xie <miaox@cn.fujitsu.com> Cc: Ma Ling <ling.ma@intel.com> Cc: Zhao Yakui <yakui.zhao@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Andi Kleen <andi@firstfloor.org> LKML-Reference: <1290668693-27068-1-git-send-email-mitake@dcl.info.waseda.ac.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf options: Check v type in OPT_U?INTEGERArnaldo Carvalho de Melo2010-05-171-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | To avoid problems like the one fixed by Stephane Eranian in 3de29ca, now we'll got this instead: bench/sched-messaging.c:259: error: negative width in bit-field ‘<anonymous>’ bench/sched-messaging.c:261: error: negative width in bit-field ‘<anonymous>’ Which is rather cryptic, but is how BUILD_BUG_ON_ZERO works, so kernel hackers should be already used to this. With it in place found some problems, fixed by changing the affected variables to sensible types or changed some OPT_INTEGER to OPT_UINTEGER. Next csets will go thru converting each of the remaining OPT_ so that review can be made easier by grouping changes per type per patch. Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf: Fix endianness argument compatibility with OPT_BOOLEAN() and introduce ↵Ian Munsie2010-04-142-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | OPT_INCR() Parsing an option from the command line with OPT_BOOLEAN on a bool data type would not work on a big-endian machine due to the manner in which the boolean was being cast into an int and incremented. For example, running 'perf probe --list' on a PowerPC machine would fail to properly set the list_events bool and would therefore print out the usage information and terminate. This patch makes OPT_BOOLEAN work as expected with a bool datatype. For cases where the original OPT_BOOLEAN was intentionally being used to increment an int each time it was passed in on the command line, this patch introduces OPT_INCR with the old behaviour of OPT_BOOLEAN (the verbose variable is currently the only such example of this). I have reviewed every use of OPT_BOOLEAN to verify that a true C99 bool was passed. Where integers were used, I verified that they were only being used for boolean logic and changed them to bools to ensure that they would not be mistakenly used as ints. The major exception was the verbose variable which now uses OPT_INCR instead of OPT_BOOLEAN. Signed-off-by: Ian Munsie <imunsie@au.ibm.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: <stable@kernel.org> # NOTE: wont apply to .3[34].x cleanly, please backport Cc: Git development list <git@vger.kernel.org> Cc: Ian Munsie <imunsie@au1.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Eric B Munson <ebmunson@us.ibm.com> Cc: Valdis.Kletnieks@vt.edu Cc: WANG Cong <amwang@redhat.com> Cc: Thiago Farina <tfransosi@gmail.com> Cc: Masami Hiramatsu <mhiramat@redhat.com> Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Cc: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Mike Galbraith <efault@gmx.de> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Anton Blanchard <anton@samba.org> Cc: John Kacur <jkacur@redhat.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1271147857-11604-1-git-send-email-imunsie@au.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf bench: fix spelloRandy Dunlap2010-04-081-1/+1
| | | | | | | | | Fix spello in user message. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>, Cc: Paul Mackerra <paulus@samba.org>s LKML-Reference: <20100331113056.2c7df509.randy.dunlap@oracle.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
* perf tools: Move the prototypes in util/string.h to util.hArnaldo Carvalho de Melo2010-04-031-1/+0
| | | | | | | | | | | | | So that we avoid conflict with libc's string.h header. Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Suggested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf sched: Fix build failure on sparcDavid Miller2009-12-142-4/+8
| | | | | | | | | | | | | | | | Here, tvec->tv_usec is "unsigned int" not "unsigned long". Since the type is different on every platform, it's probably best to just use long printf formats and cast. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20091213.235622.53363059.davem@davemloft.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf bench: Add "all" pseudo subsystem and "all" pseudo suiteHitoshi Mitake2009-12-142-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a new "all" pseudo subsystem and an "all" pseudo suite. These are for testing all subsystem and its all suite, or all suite of one subsystem. (This patch also contains a few trivial comment fixes for bench/* and output style fixes. I judged that there are no necessity to make them into individual patch.) Example of use: | % ./perf bench sched all # Test all suites of sched subsystem | # Running sched/messaging benchmark... | # 20 sender and receiver processes per group | # 10 groups == 400 processes run | | Total time: 0.414 [sec] | | # Running sched/pipe benchmark... | # Extecuted 1000000 pipe operations between two tasks | | Total time: 10.999 [sec] | | 10.999317 usecs/op | 90914 ops/sec | | % ./perf bench all # Test all suites of all subsystems | # Running sched/messaging benchmark... | # 20 sender and receiver processes per group | # 10 groups == 400 processes run | | Total time: 0.420 [sec] | | # Running sched/pipe benchmark... | # Extecuted 1000000 pipe operations between two tasks | | Total time: 11.741 [sec] | | 11.741346 usecs/op | 85169 ops/sec | | # Running mem/memcpy benchmark... | # Copying 1MB Bytes from 0x7ff33e920010 to 0x7ff3401ae010 ... | | 808.407437 MB/Sec Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1260691319-4683-1-git-send-email-mitake@dcl.info.waseda.ac.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf tools: Introduce zalloc() for the common calloc(1, N) caseArnaldo Carvalho de Melo2009-11-241-2/+2
| | | | | | | | | | | | | This way we type less characters and it looks more like the kzalloc kernel counterpart. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1259071517-3242-3-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf bench: Make the mem/memcpy tests more user-friendlyHitoshi Mitake2009-11-221-15/+22
| | | | | | | | | | | | | | | | | | | | | mem-memcpy.c uses perf event system calls to obtain CPU clocks. And it suddenly dies with BUG_ON() when it running on Linux doesn't support perf event. Also fail at calloc() can occur easily when too large length is passed. Fail of calloc() causes sudden death with assert(). These behaviours are not friendly. So I fixed the treating of errors. Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1258688237-3797-1-git-send-email-mitake@dcl.info.waseda.ac.jp> [ v2: improved a few small details ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf bench: Add memcpy() benchmarkHitoshi Mitake2009-11-192-0/+187
| | | | | | | | | | | | | | | | | | | | | 'perf bench mem memcpy' is a benchmark suite for measuring memcpy() performance. Example on a Intel(R) Core(TM)2 Duo CPU E6850 @ 3.00GHz: | % perf bench mem memcpy -l 1GB | # Running mem/memcpy benchmark... | # Copying 1MB Bytes from 0xb7d98008 to 0xb7e99008 ... | | 726.216412 MB/Sec Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1258471212-30281-1-git-send-email-mitake@dcl.info.waseda.ac.jp> [ v2: updated changelog, clarified history of builtin-bench.c ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf bench: Improve sched-message.c with more comfortable outputHitoshi Mitake2009-11-101-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | This patch improves sched-message.c with more comfortable output. Change points are comment style description and formatting numerical values and its units. Example: | % perf bench sched messaging | # Running sched/messaging benchmark... | # 20 sender and receiver processes per group | # 10 groups == 400 processes run | | Total time: 1.490 [sec] Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1257865442-20252-4-git-send-email-mitake@dcl.info.waseda.ac.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org>
* perf bench: Improve sched-pipe.c with more comfortable outputHitoshi Mitake2009-11-101-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | This patch improves sched-pipe.c with more comfortable output. Change points are comment style description and formatting numerical values and its units. Example: | % ./perf bench sched pipe | # Running sched/pipe benchmark... | # Extecuted 1000000 pipe operations between two tasks | | Total time:5.822 [sec] | | 5.822553 usecs/op | 171745 ops/sec Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1257865442-20252-3-git-send-email-mitake@dcl.info.waseda.ac.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf bench: Clean up bench/bench.hIngo Molnar2009-11-101-9/+7
| | | | | | | | | | | | | | | | Clean up initializers in bench.h: - No need to break the line for function prototypes, they are more readable in a single line. (even if checkpatch complains about it - We try to align definitions / structure fields vertically, to make it all a bit more readable. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1257853855-28934-2-git-send-email-mitake@dcl.info.waseda.ac.jp>
* perf bench: Modify builtin-pipe.c for processing common optionsHitoshi Mitake2009-11-101-8/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch modifies builtin-pipe.c for processing common options. The first option added is "--format". Users of perf bench will be able to specify output style by --format. Usage example: % ./perf bench sched pipe # with no style specify (executing 1000000 pipe operations between two tasks) Total time:5.855 sec 5.855061 usecs/op 170792 ops/sec % ./perf bench --format=simple sched pipe # specified simple 5.988 Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1257808802-9420-5-git-send-email-mitake@dcl.info.waseda.ac.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org>
* perf bench: Modify bench/bench-messaging.c to adopt unified output formattingHitoshi Mitake2009-11-101-7/+11
| | | | | | | | | | | | | | | | | | | | | | This patch modifies bench/bench-messaging.c to adopt unified output formatting: --format option. Usage example: % ./perf bench sched messaging # with no style specify (20 sender and receiver processes per group) (10 groups == 400 processes run) Total time:1.431 sec % ./perf bench --format=simple sched messaging # specified simple 1.431 Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1257808802-9420-4-git-send-email-mitake@dcl.info.waseda.ac.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf bench: Add format constants to bench.h for unified output formattingHitoshi Mitake2009-11-101-0/+9
| | | | | | | | | | | | This patch adds some constants and extern declaration to bench.h. These are used for unified output formatting of 'perf bench'. Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1257808802-9420-2-git-send-email-mitake@dcl.info.waseda.ac.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf bench: Fix bench/sched-pipe.c to wait for child processHitoshi Mitake2009-11-091-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Ingo reported this small 'perf bench sched pipe' output problem: | $ ./perf bench sched pipe | (executing 1000000 pipe operations between two tasks) | | Total time:4.898 sec | $ 4.898586 usecs/op | 204140 ops/sec | | the shell prompt came back before the usecs/op and ops/sec line | was printed. Process teardown race, lack of wait() or so? This caused by lack of calling waitpid() by parent process, so I added it. Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Kosina <jkosina@suse.cz> LKML-Reference: <1257737465-7546-1-git-send-email-mitake@dcl.info.waseda.ac.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf bench: Add sched-pipe.c: Benchmark for pipe() system callHitoshi Mitake2009-11-081-0/+113
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds bench/sched-pipe.c. bench/sched-pipe.c is a benchmark program to measure performance of pipe() system call. This benchmark is based on pipe-test-1m.c by Ingo Molnar: http://people.redhat.com/mingo/cfs-scheduler/tools/pipe-test-1m.c Example of use: % perf bench sched pipe (executing 1000000 pipe operations between two tasks) Total time:4.499 sec 4.499179 usecs/op 222262 ops/sec % perf bench sched pipe -s -l 1000 0.015 Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: fweisbec@gmail.com Cc: Jiri Kosina <jkosina@suse.cz> LKML-Reference: <1257381097-4743-4-git-send-email-mitake@dcl.info.waseda.ac.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf bench: Add sched-messaging.c: Benchmark for scheduler and IPC ↵Hitoshi Mitake2009-11-081-0/+332
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mechanisms based on hackbench This patch adds bench/sched-messaging.c. This benchmark measures performance of scheduler and IPC mechanisms, and is based on hackbench by Rusty Russell. Example of usage: % perf bench sched messaging -g 20 -l 1000 -s 5.432 # in sec % perf bench sched messaging # run with default options (20 sender and receiver processes per group) (10 groups == 400 processes run) Total time:0.308 sec % perf bench sched messaging -t -g 20 # # be multi-thread, with 20 groups (20 sender and receiver threads per group) (20 groups == 800 threads run) Total time:0.582 sec ( Rusty is the original author of hackbench.c and he said the code is and was under the GPLv2 so fine to be merged. ) Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: fweisbec@gmail.com Cc: Jiri Kosina <jkosina@suse.cz> LKML-Reference: <1257381097-4743-3-git-send-email-mitake@dcl.info.waseda.ac.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf bench: Add new directory and header for new subcommand 'bench'Hitoshi Mitake2009-11-081-0/+9
This patch adds bench/ directory and bench/bench.h. bench/ directory will contain modules for bench subcommand. bench/bench.h is for listing prototypes of module functions. Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: fweisbec@gmail.com Cc: Jiri Kosina <jkosina@suse.cz> LKML-Reference: <1257381097-4743-2-git-send-email-mitake@dcl.info.waseda.ac.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>
OpenPOWER on IntegriCloud