summaryrefslogtreecommitdiffstats
path: root/tools
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | | | tools/power turbostat: modprobe msr, if neededLen Brown2015-04-181-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some distros (Ubuntu) ship the msr driver as a module. If turbosat is run as root on those systems, and discovers that there is no /dev/cpu/cpu0/msr, it will now "modprobe msr" for the user. If not root, the modprobe attempt will fail, and turbostat will exit as before: turbostat: no /dev/cpu/0/msr, Try "# modprobe msr" : No such file or directory Signed-off-by: Len Brown <len.brown@intel.com>
| * | | | | tools/power turbostat: dump MSR_TURBO_RATIO_LIMIT2Len Brown2015-04-181-36/+105
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | and up to 18 cores of turbo ratio limit when using the turbostat --debug option. Signed-off-by: Len Brown <len.brown@intel.com>
| * | | | | tools/power turbostat: use new MSR_TURBO_RATIO_LIMIT namesLen Brown2015-04-181-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | s/MSR_NHM_TURBO_RATIO_LIMIT/MSR_TURBO_RATIO_LIMIT/ s/MSR_IVT_TURBO_RATIO_LIMIT/MSR_TURBO_RATIO_LIMIT1/ syntax only -- use the documented strings describing these registers. Signed-off-by: Len Brown <len.brown@intel.com>
| * | | | | tools/power turbostat: label base frequencyLen Brown2015-04-131-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syntax only. The cool kids are now using the phrase "base frequency", where in the past we used "max non-turbo frequency" or "TSC frequency". This distinction becomes important when a processor has a TSC that runs at a different speed than the "base frequency". Signed-off-by: Len Brown <len.brown@intel.com>
| * | | | | tools/power turbostat: update PERF_LIMIT_REASONS decodingLen Brown2015-04-131-26/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cosmetic only. order the decoding of MSR_PERF_LIMIT_REASONS bits from MSB to LSB -- which you notice when more than 1 bit is set and you are, say, comparing the output to the documentation... Signed-off-by: Len Brown <len.brown@intel.com>
| * | | | | tools/power turbostat: simplify default outputLen Brown2015-04-132-70/+98
| | |_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Casual turbostat users generally just want to know MHz. So by default, just print enough information to make sense of MHz. All the other configuration data and columns for C-states and temperature etc, are printed with the --debug option. Signed-off-by: Len Brown <len.brown@intel.com>
* | | | | Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2015-04-183-1/+184
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "This tree includes: - an FPU related crash fix - a ptrace fix (with matching testcase in tools/testing/selftests/) - an x86 Kconfig DMA-config defaults tweak to better avoid non-working drivers" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: config: Enable NEED_DMA_MAP_STATE by default when SWIOTLB is selected x86/fpu: Load xsave pointer *after* initialization x86/ptrace: Fix the TIF_FORCED_TF logic in handle_signal() x86, selftests: Add single_step_syscall test
| * | | | | x86, selftests: Add single_step_syscall testAndy Lutomirski2015-04-163-1/+184
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a very simple test that makes system calls with TF set. This test currently fails when running the 32-bit build on a 64-bit kernel on an Intel CPU. This bug will be fixed by the next commit. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Shuah Khan <shuah.kh@samsung.com> Link: http://lkml.kernel.org/r/20e68021155f6ab5c60590dcad81d37c68ea2c4f.1429139075.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | | Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2015-04-185-78/+567
|\ \ \ \ \ \ | |_|_|_|/ / |/| | | | / | | |_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "This update has mostly fixes, but also other bits: - perf tooling fixes - PMU driver fixes - Intel Broadwell PMU driver HW-enablement for LBR callstacks - a late coming 'perf kmem' tool update that enables it to also analyze page allocation data. Note, this comes with MM tracepoint changes that we believe to not break anything: because it changes the formerly opaque 'struct page *' field that uniquely identifies pages to 'pfn' which identifies pages uniquely too, but isn't as opaque and can be used for other purposes as well" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/pt: Fix and clean up error handling in pt_event_add() perf/x86/intel: Add Broadwell support for the LBR callstack perf/x86/intel/rapl: Fix energy counter measurements but supporing per domain energy units perf/x86/intel: Fix Core2,Atom,NHM,WSM cycles:pp events perf/x86: Fix hw_perf_event::flags collision perf probe: Fix segfault when probe with lazy_line to file perf probe: Find compilation directory path for lazy matching perf probe: Set retprobe flag when probe in address-based alternative mode perf kmem: Analyze page allocator events also tracing, mm: Record pfn instead of pointer to struct page
| * | | | perf probe: Fix segfault when probe with lazy_line to fileHe Kuang2015-04-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The first argument passed to find_probe_point_lazy() should be CU die, which will be passed to die_walk_lines() when lazy_line matches. Currently, when we probe with lazy_line pattern to file without function name, NULL pointer is passed and causes a segment fault. Can be reproduced as following: $ perf probe -k vmlinux --add='fs/super.c;s->s_count=1;' [ 1958.984658] perf[1020]: segfault at 10 ip 00007fc6e10d8c71 sp 00007ffcbfaaf900 error 4 in libdw-0.161.so[7fc6e10ce000+34000] Segmentation fault After this patch: $ perf probe -k vmlinux --add='fs/super.c;s->s_count=1;' Added new event: probe:_stext (on @fs/super.c) You can now use it in all perf tools, such as: perf record -e probe:_stext -aR sleep 1 Signed-off-by: He Kuang <hekuang@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1428925290-5623-3-git-send-email-hekuang@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | perf probe: Find compilation directory path for lazy matchingNaohiro Aota2015-04-133-60/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we use lazy matching, it failed to open a souce file if perf command is invoked outside of compilation directory: $ perf probe -a '__schedule;clear_*' Failed to open kernel/sched/core.c: No such file or directory Error: Failed to add events. (-2) OTOH, other commands like "probe -L" can solve the souce directory by themselves. Let's make it possible for lazy matching too! Signed-off-by: Naohiro Aota <naota@elisp.net> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: He Kuang <hekuang@huawei.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1426223923-1493-1-git-send-email-naota@elisp.net Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | perf probe: Set retprobe flag when probe in address-based alternative modeHe Kuang2015-04-131-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When perf probe searched in a debuginfo file and failed, it tried with an alternative, in function get_alternative_probe_event(): memcpy(tmp, &pev->point, sizeof(*tmp)); memset(&pev->point, 0, sizeof(pev->point)); In this case, it drops the retprobe flag and forgets to set it back in find_alternative_probe_point(), so the problem occurs. Can be reproduced as following: $ perf probe -v -k vmlinux --add='sys_write%return' ... Added new event: Writing event: p:probe/sys_write _stext+1584952 probe:sys_write (on sys_write%return) $ cat /sys/kernel/debug/tracing/kprobe_events p:probe/sys_write _stext+1584952 After this patch: $ perf probe -v -k vmlinux --add='sys_write%return' Added new event: Writing event: r:probe/sys_write SyS_write+0 probe:sys_write (on sys_write%return) $ cat /sys/kernel/debug/tracing/kprobe_events r:probe/sys_write SyS_write Signed-off-by: He Kuang <hekuang@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1428925290-5623-1-git-send-email-hekuang@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | perf kmem: Analyze page allocator events alsoNamhyung Kim2015-04-132-17/+491
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The perf kmem command records and analyze kernel memory allocation only for SLAB objects. This patch implement a simple page allocator analyzer using kmem:mm_page_alloc and kmem:mm_page_free events. It adds two new options of --slab and --page. The --slab option is for analyzing SLAB allocator and that's what perf kmem currently does. The new --page option enables page allocator events and analyze kernel memory usage in page unit. Currently, 'stat --alloc' subcommand is implemented only. If none of these --slab nor --page is specified, --slab is implied. First run 'perf kmem record' to generate a suitable perf.data file: # perf kmem record --page sleep 5 Then run 'perf kmem stat' to postprocess the perf.data file: # perf kmem stat --page --alloc --line 10 ------------------------------------------------------------------------------- PFN | Total alloc (KB) | Hits | Order | Mig.type | GFP flags ------------------------------------------------------------------------------- 4045014 | 16 | 1 | 2 | RECLAIM | 00285250 4143980 | 16 | 1 | 2 | RECLAIM | 00285250 3938658 | 16 | 1 | 2 | RECLAIM | 00285250 4045400 | 16 | 1 | 2 | RECLAIM | 00285250 3568708 | 16 | 1 | 2 | RECLAIM | 00285250 3729824 | 16 | 1 | 2 | RECLAIM | 00285250 3657210 | 16 | 1 | 2 | RECLAIM | 00285250 4120750 | 16 | 1 | 2 | RECLAIM | 00285250 3678850 | 16 | 1 | 2 | RECLAIM | 00285250 3693874 | 16 | 1 | 2 | RECLAIM | 00285250 ... | ... | ... | ... | ... | ... ------------------------------------------------------------------------------- SUMMARY (page allocator) ======================== Total allocation requests : 44,260 [ 177,256 KB ] Total free requests : 117 [ 468 KB ] Total alloc+freed requests : 49 [ 196 KB ] Total alloc-only requests : 44,211 [ 177,060 KB ] Total free-only requests : 68 [ 272 KB ] Total allocation failures : 0 [ 0 KB ] Order Unmovable Reclaimable Movable Reserved CMA/Isolated ----- ------------ ------------ ------------ ------------ ------------ 0 32 . 44,210 . . 1 . . . . . 2 . 18 . . . 3 . . . . . 4 . . . . . 5 . . . . . 6 . . . . . 7 . . . . . 8 . . . . . 9 . . . . . 10 . . . . . Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1428298576-9785-4-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | | | | Merge tag 'acpica-4.1-rc1' of ↵Linus Torvalds2015-04-172-2/+2
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPICA updates from Rafael Wysocki: "This updates the kernel's ACPICA code to upstream revision 20150410 and adds a fix for a GPE handling regression introduced during the 3.19 cycle on top of that. Included are two stable-candidate bug fixes (one of them fixing a 3.16 regression), multiple other fixes and a bunch of cleanups. Specifics: - Fix for a GPE handling regression on Dell Latitude D600 that caused GPE signaling to stop working on that machine, which appears to be due to a hardware glitch, but it used to work and it can be made work again in a relativly straightforward way (Rafael J Wysocki). - Fix for a mutex unlock regression related to the handling of ACPI tables introduced during the 3.16 development cycle (Octavian Purdila). - _REV modification to always return 2 which has been done by all versions of Windows since NT and the firmware people started to use it to distinguish between OSes in their AML and do some silly and wrong things on that basis (Bob Moore). - Fixes and cleanups related to the acpi_physicall_address data type including one stable-candidate fix for an issue occasionally occuring on 64-bit machines running 32-bit kernels where using offsets provided by the firmware may lead to address overflows (Lv Zheng). - External() opcode support infrastructure needed for recompiling disassembled ACPI tables in some cases including interpreter modification to ignore that opcode (Bob Moore). - Support for the "Windows 2015" string in _OSI (Bob Moore). - GPE debug interface change to return values read from hardware registers (Lv Zheng). - Removal of the __DATE__ macro usage in tools (Rasmus Villemoes). - Assorted minor fixes and cleanups (Lv Zheng, Rickard Strandqvist, Bob Moore)" * tag 'acpica-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (32 commits) ACPICA: Store GPE register enable masks upfront ACPICA: Update version to 20150410. ACPICA: Fix a couple issues with the local printf module. ACPICA: Disassembler: Some cleanup of the table dump module. ACPICA: iASL: Add support for MSDM ACPI table. ACPICA: Update for SLIC ACPI table. ACPICA: Add "//" before ascii output of buffers. ACPICA: Remove unused internal AML opcode. ACPICA: Permanently set _REV to the value '2'. ACPICA: Add "Windows 2015" string to _OSI support. ACPICA: Add infrastructure for External() opcode. ACPICA: iASL: Enhancement for constant folding. ACPICA: iASL/Disassembler: Add option to assume table contains valid AML. ACPICA: Update AML Debugger global variables. ACPICA: Update Resource descriptor dump module. ACPICA: Fix a sscanf format string. ACPICA: Casting changes around acpi_physical_address/acpi_size. ACPICA: Resources: Correct conditional compilation definitions. ACPICA: Utilities: Correct conditional compilation definitions. ACPICA: Tables: Move an iasl specific table function to iasl source file. ...
| * | | | | ACPICA: Fix a sscanf format string.Bob Moore2015-04-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ACPICA commit 84f3569db7accc576ace2dae81d101467254fe9d Was using %d instead of properly using %u. This patch only affects acpidump tool. Link: https://github.com/acpica/acpica/commit/84f3569d Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | | | | ACPICA: Unix: Cleanup to use ACPI_TO_INTEGER() to calc page offset.Lv Zheng2015-04-141-1/+1
| | |_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ACPICA commit 9e2d8180f4d5e61949b17513bae8aff6412f62dd The offset calculation needn't convert a pointer to a special integer type. So this patch uses ACPI_TO_INTEGER() instead. This patch only affects acpidump tool. Link: https://github.com/acpica/acpica/commit/9e2d8180 Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | | | | Merge tag 'powerpc-4.1-1' of ↵Linus Torvalds2015-04-1621-91/+852
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux Pull powerpc updates from Michael Ellerman: - Numerous minor fixes, cleanups etc. - More EEH work from Gavin to remove its dependency on device_nodes. - Memory hotplug implemented entirely in the kernel from Nathan Fontenot. - Removal of redundant CONFIG_PPC_OF by Kevin Hao. - Rewrite of VPHN parsing logic & tests from Greg Kurz. - A fix from Nish Aravamudan to reduce memory usage by clamping nodes_possible_map. - Support for pstore on powernv from Hari Bathini. - Removal of old powerpc specific byte swap routines by David Gibson. - Fix from Vasant Hegde to prevent the flash driver telling you it was flashing your firmware when it wasn't. - Patch from Ben Herrenschmidt to add an OPAL heartbeat driver. - Fix for an oops causing get/put_cpu_var() imbalance in perf by Jan Stancek. - Some fixes for migration from Tyrel Datwyler. - A new syscall to switch the cpu endian by Michael Ellerman. - Large series from Wei Yang to implement SRIOV, reviewed and acked by Bjorn. - A fix for the OPAL sensor driver from Cédric Le Goater. - Fixes to get STRICT_MM_TYPECHECKS building again by Michael Ellerman. - Large series from Daniel Axtens to make our PCI hooks per PHB rather than per machine. - Small patch from Sam Bobroff to explicitly abort non-suspended transactions on syscalls, plus a test to exercise it. - Numerous reworks and fixes for the 24x7 PMU from Sukadev Bhattiprolu. - Small patch to enable the hard lockup detector from Anton Blanchard. - Fix from Dave Olson for missing L2 cache information on some CPUs. - Some fixes from Michael Ellerman to get Cell machines booting again. - Freescale updates from Scott: Highlights include BMan device tree nodes, an MSI erratum workaround, a couple minor performance improvements, config updates, and misc fixes/cleanup. * tag 'powerpc-4.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (196 commits) powerpc/powermac: Fix build error seen with powermac smp builds powerpc/pseries: Fix compile of memory hotplug without CONFIG_MEMORY_HOTREMOVE powerpc: Remove PPC32 code from pseries specific find_and_init_phbs() powerpc/cell: Fix iommu breakage caused by controller_ops change powerpc/eeh: Fix crash in eeh_add_device_early() on Cell powerpc/perf: Cap 64bit userspace backtraces to PERF_MAX_STACK_DEPTH powerpc/perf/hv-24x7: Fail 24x7 initcall if create_events_from_catalog() fails powerpc/pseries: Correct memory hotplug locking powerpc: Fix missing L2 cache size in /sys/devices/system/cpu powerpc: Add ppc64 hard lockup detector support oprofile: Disable oprofile NMI timer on ppc64 powerpc/perf/hv-24x7: Add missing put_cpu_var() powerpc/perf/hv-24x7: Break up single_24x7_request powerpc/perf/hv-24x7: Define update_event_count() powerpc/perf/hv-24x7: Whitespace cleanup powerpc/perf/hv-24x7: Define add_event_to_24x7_request() powerpc/perf/hv-24x7: Rename hv_24x7_event_update powerpc/perf/hv-24x7: Move debug prints to separate function powerpc/perf/hv-24x7: Drop event_24x7_request() powerpc/perf/hv-24x7: Use pr_devel() to log message ... Conflicts: tools/testing/selftests/powerpc/Makefile tools/testing/selftests/powerpc/tm/Makefile
| * | | | | selftests/powerpc: Add transactional syscall testSam bobroff2015-04-114-1/+153
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Check that a syscall made during an active transaction will fail with the correct failure code and that one made during a suspended transaction will succeed. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | selftests/powerpc: Move get_auxv_entry() to harness.cSam bobroff2015-04-114-49/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move get_auxv_entry() from pmu/lib.c up to harness.c in order to make it available to other tests. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | selftests/powerpc: Add a test of the switch_endian() syscallMichael Ellerman2015-03-286-1/+214
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a test of the switch_endian() syscall we added in the previous commit. We test it by calling the endian switch syscall, and then executing some code in the other endian to check everything went as expected. That code checks registers we expect to be maintained are. If the endian switch failed to happen that code sequence will be illegal and cause the test to abort. We then switch back to the original endian, do the same checks and finally write a success message and exit(0). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | selftests/powerpc: Rename TARGETS in powerpc selftests makefileMichael Ellerman2015-03-181-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch changes the name of the make variable TARGETS, to prevent it from colliding with a value set by the user on the command line (as they are recommended to do by tools/testing/selftests/README.txt). Without this patch, "make -C tools/testing/selftests TARGETS=powerpc" will fail. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | selftests/powerpc: Add test for VPHNGreg Kurz2015-03-187-1/+430
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The goal is to verify vphn_unpack_associativity() parses VPHN numbers correctly. We feed it with a variety of input values and compare with expected results. PAPR+ does not say much about VPHN parsing: I came up with a list of tests that check many simple cases and some corner ones. I wouldn't dare to say the list is exhaustive though. Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> [mpe: Rework harness logic, rename to test-vphn, add -m64] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
| * | | | | selftests/powerpc: Build the copyloops with -maltivecMichael Ellerman2015-03-171-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The recent change to remove the vrX defines exposed the fact that we are building the copyloops tests without altivec enabled. It depends on the toolchain as to whether altivec is on by default or not, so it only breaks on some toolchains. But we should always enable it. Fixes: c2ce6f9f3dc0 ("powerpc: Change vrX register defines to vX to match gcc and glibc") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc: Change vrX register defines to vX to match gcc and glibcAnton Blanchard2015-03-161-33/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As our various loops (copy, string, crypto etc) get more complicated, we want to share implementations between userspace (eg glibc) and the kernel. We also want to write userspace test harnesses to put in tools/testing/selftest. One gratuitous difference between userspace and the kernel is the VMX register definitions - the kernel uses vrX whereas both gcc and glibc use vX. Change the kernel to match userspace. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | | | | | mm, selftests: test return value of munmap for MAP_HUGETLB memoryDavid Rientjes2015-04-153-4/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When MAP_HUGETLB memory is unmapped, the length must be hugepage aligned, otherwise it fails with -EINVAL. All tests currently behave correctly, but it's better to explcitly test the return value for completeness and document the requirement, especially if users copy map_hugetlb.c as a sample implementation. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Hugh Dickins <hughd@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Joern Engel <joern@logfs.org> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Eric B Munson <emunson@akamai.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds2015-04-152-9/+20
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking updates from David Miller: 1) Add BQL support to via-rhine, from Tino Reichardt. 2) Integrate SWITCHDEV layer support into the DSA layer, so DSA drivers can support hw switch offloading. From Floria Fainelli. 3) Allow 'ip address' commands to initiate multicast group join/leave, from Madhu Challa. 4) Many ipv4 FIB lookup optimizations from Alexander Duyck. 5) Support EBPF in cls_bpf classifier and act_bpf action, from Daniel Borkmann. 6) Remove the ugly compat support in ARP for ugly layers like ax25, rose, etc. And use this to clean up the neigh layer, then use it to implement MPLS support. All from Eric Biederman. 7) Support L3 forwarding offloading in switches, from Scott Feldman. 8) Collapse the LOCAL and MAIN ipv4 FIB tables when possible, to speed up route lookups even further. From Alexander Duyck. 9) Many improvements and bug fixes to the rhashtable implementation, from Herbert Xu and Thomas Graf. In particular, in the case where an rhashtable user bulk adds a large number of items into an empty table, we expand the table much more sanely. 10) Don't make the tcp_metrics hash table per-namespace, from Eric Biederman. 11) Extend EBPF to access SKB fields, from Alexei Starovoitov. 12) Split out new connection request sockets so that they can be established in the main hash table. Much less false sharing since hash lookups go direct to the request sockets instead of having to go first to the listener then to the request socks hashed underneath. From Eric Dumazet. 13) Add async I/O support for crytpo AF_ALG sockets, from Tadeusz Struk. 14) Support stable privacy address generation for RFC7217 in IPV6. From Hannes Frederic Sowa. 15) Hash network namespace into IP frag IDs, also from Hannes Frederic Sowa. 16) Convert PTP get/set methods to use 64-bit time, from Richard Cochran. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1816 commits) fm10k: Bump driver version to 0.15.2 fm10k: corrected VF multicast update fm10k: mbx_update_max_size does not drop all oversized messages fm10k: reset head instead of calling update_max_size fm10k: renamed mbx_tx_dropped to mbx_tx_oversized fm10k: update xcast mode before synchronizing multicast addresses fm10k: start service timer on probe fm10k: fix function header comment fm10k: comment next_vf_mbx flow fm10k: don't handle mailbox events in iov_event path and always process mailbox fm10k: use separate workqueue for fm10k driver fm10k: Set PF queues to unlimited bandwidth during virtualization fm10k: expose tx_timeout_count as an ethtool stat fm10k: only increment tx_timeout_count in Tx hang path fm10k: remove extraneous "Reset interface" message fm10k: separate PF only stats so that VF does not display them fm10k: use hw->mac.max_queues for stats fm10k: only show actual queues, not the maximum in hardware fm10k: allow creation of VLAN on default vid fm10k: fix unused warnings ...
| * \ \ \ \ \ Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2015-04-021-0/+8
| |\ \ \ \ \ \ | | | |/ / / / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/usb/asix_common.c drivers/net/usb/sr9800.c drivers/net/usb/usbnet.c include/linux/usb/usbnet.h net/ipv4/tcp_ipv4.c net/ipv6/tcp_ipv6.c The TCP conflicts were overlapping changes. In 'net' we added a READ_ONCE() to the socket cached RX route read, whilst in 'net-next' Eric Dumazet touched the surrounding code dealing with how mini sockets are handled. With USB, it's a case of the same bug fix first going into net-next and then I cherry picked it back into net. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | tools: bpf_asm: cleanup vlan extension related tokenDaniel Borkmann2015-03-242-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We now have K_VLANT, K_VLANP and K_VLANTPID. Clean them up into more descriptive token, namely K_VLAN_TCI, K_VLAN_AVAIL and K_VLAN_TPID. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | filter: introduce SKF_AD_VLAN_TPID BPF extensionMichal Sekletar2015-03-242-1/+12
| | |_|_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If vlan offloading takes place then vlan header is removed from frame and its contents, both vlan_tci and vlan_proto, is available to user space via TPACKET interface. However, only vlan_tci can be used in BPF filters. This commit introduces a new BPF extension. It makes possible to load the value of vlan_proto (vlan TPID) to register A. Support for classic BPF and eBPF is being added, analogous to skb->protocol. Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Jiri Pirko <jpirko@redhat.com> Signed-off-by: Michal Sekletar <msekleta@redhat.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | Merge branch 'perf-core-for-linus' of ↵Linus Torvalds2015-04-14229-2445/+6322
|\ \ \ \ \ \ | | |_|_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf changes from Ingo Molnar: "Core kernel changes: - One of the more interesting features in this cycle is the ability to attach eBPF programs (user-defined, sandboxed bytecode executed by the kernel) to kprobes. This allows user-defined instrumentation on a live kernel image that can never crash, hang or interfere with the kernel negatively. (Right now it's limited to root-only, but in the future we might allow unprivileged use as well.) (Alexei Starovoitov) - Another non-trivial feature is per event clockid support: this allows, amongst other things, the selection of different clock sources for event timestamps traced via perf. This feature is sought by people who'd like to merge perf generated events with external events that were measured with different clocks: - cluster wide profiling - for system wide tracing with user-space events, - JIT profiling events etc. Matching perf tooling support is added as well, available via the -k, --clockid <clockid> parameter to perf record et al. (Peter Zijlstra) Hardware enablement kernel changes: - x86 Intel Processor Trace (PT) support: which is a hardware tracer on steroids, available on Broadwell CPUs. The hardware trace stream is directly output into the user-space ring-buffer, using the 'AUX' data format extension that was added to the perf core to support hardware constraints such as the necessity to have the tracing buffer physically contiguous. This patch-set was developed for two years and this is the result. A simple way to make use of this is to use BTS tracing, the PT driver emulates BTS output - available via the 'intel_bts' PMU. More explicit PT specific tooling support is in the works as well - will probably be ready by 4.2. (Alexander Shishkin, Peter Zijlstra) - x86 Intel Cache QoS Monitoring (CQM) support: this is a hardware feature of Intel Xeon CPUs that allows the measurement and allocation/partitioning of caches to individual workloads. These kernel changes expose the measurement side as a new PMU driver, which exposes various QoS related PMU events. (The partitioning change is work in progress and is planned to be merged as a cgroup extension.) (Matt Fleming, Peter Zijlstra; CPU feature detection by Peter P Waskiewicz Jr) - x86 Intel Haswell LBR call stack support: this is a new Haswell feature that allows the hardware recording of call chains, plus tooling support. To activate this feature you have to enable it via the new 'lbr' call-graph recording option: perf record --call-graph lbr perf report or: perf top --call-graph lbr This hardware feature is a lot faster than stack walk or dwarf based unwinding, but has some limitations: - It reuses the current LBR facility, so LBR call stack and branch record can not be enabled at the same time. - It is only available for user-space callchains. (Yan, Zheng) - x86 Intel Broadwell CPU support and various event constraints and event table fixes for earlier models. (Andi Kleen) - x86 Intel HT CPUs event scheduling workarounds. This is a complex CPU bug affecting the SNB,IVB,HSW families that results in counter value corruption. The mitigation code is automatically enabled and is transparent. (Maria Dimakopoulou, Stephane Eranian) The perf tooling side had a ton of changes in this cycle as well, so I'm only able to list the user visible changes here, in addition to the tooling changes outlined above: User visible changes affecting all tools: - Improve support of compressed kernel modules (Jiri Olsa) - Save DSO loading errno to better report errors (Arnaldo Carvalho de Melo) - Bash completion for subcommands (Yunlong Song) - Add 'I' event modifier for perf_event_attr.exclude_idle bit (Jiri Olsa) - Support missing -f to override perf.data file ownership. (Yunlong Song) - Show the first event with an invalid filter (David Ahern, Arnaldo Carvalho de Melo) User visible changes in individual tools: 'perf data': New tool for converting perf.data to other formats, initially for the CTF (Common Trace Format) from LTTng (Jiri Olsa, Sebastian Siewior) 'perf diff': Add --kallsyms option (David Ahern) 'perf list': Allow listing events with 'tracepoint' prefix (Yunlong Song) Sort the output of the command (Yunlong Song) 'perf kmem': Respect -i option (Jiri Olsa) Print big numbers using thousands' group (Namhyung Kim) Allow -v option (Namhyung Kim) Fix alignment of slab result table (Namhyung Kim) 'perf probe': Support multiple probes on different binaries on the same command line (Masami Hiramatsu) Support unnamed union/structure members data collection. (Masami Hiramatsu) Check kprobes blacklist when adding new events. (Masami Hiramatsu) 'perf record': Teach 'perf record' about perf_event_attr.clockid (Peter Zijlstra) Support recording running/enabled time (Andi Kleen) 'perf sched': Improve the performance of 'perf sched replay' on high CPU core count machines (Yunlong Song) 'perf report' and 'perf top': Allow annotating entries in callchains in the hists browser (Arnaldo Carvalho de Melo) Indicate which callchain entries are annotated in the TUI hists browser (Arnaldo Carvalho de Melo) Add pid/tid filtering to 'report' and 'script' commands (David Ahern) Consider PERF_RECORD_ events with cpumode == 0 in 'perf top', removing one cause of long term memory usage buildup, i.e. not processing PERF_RECORD_EXIT events (Arnaldo Carvalho de Melo) 'perf stat': Report unsupported events properly (Suzuki K. Poulose) Output running time and run/enabled ratio in CSV mode (Andi Kleen) 'perf trace': Handle legacy syscalls tracepoints (David Ahern, Arnaldo Carvalho de Melo) Only insert blank duration bracket when tracing syscalls (Arnaldo Carvalho de Melo) Filter out the trace pid when no threads are specified (Arnaldo Carvalho de Melo) Dump stack on segfaults (Arnaldo Carvalho de Melo) No need to explicitely enable evsels for workload started from perf, let it be enabled via perf_event_attr.enable_on_exec, removing some events that take place in the 'perf trace' before a workload is really started by it. (Arnaldo Carvalho de Melo) Allow mixing with tracepoints and suppressing plain syscalls. (Arnaldo Carvalho de Melo) There's also been a ton of infrastructure work done, such as the split-out of perf's build system into tools/build/ and other changes - see the shortlog and changelog for details" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (358 commits) perf/x86/intel/pt: Clean up the control flow in pt_pmu_hw_init() perf evlist: Fix type for references to data_head/tail perf probe: Check the orphaned -x option perf probe: Support multiple probes on different binaries perf buildid-list: Fix segfault when show DSOs with hits perf tools: Fix cross-endian analysis perf tools: Fix error path to do closedir() when synthesizing threads perf tools: Fix synthesizing fork_event.ppid for non-main thread perf tools: Add 'I' event modifier for exclude_idle bit perf report: Don't call map__kmap if map is NULL. perf tests: Fix attr tests perf probe: Fix ARM 32 building error perf tools: Merge all perf_event_attr print functions perf record: Add clockid parameter perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10 perf sched replay: Support using -f to override perf.data file ownership perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task perf sched replay: Fix the segmentation fault problem caused by pr_err in threads perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations ...
| * | | | | perf evlist: Fix type for references to data_head/tailDavid Ahern2015-04-103-10/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The data_head and data_tail fields are defined as __u64 in linux/perf_event.h, but perf userspace uses int and unsigned int. Convert all references to u64 for consistency. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1428420037-26599-1-git-send-email-dsahern@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | perf probe: Check the orphaned -x optionMasami Hiramatsu2015-04-101-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To avoid probing in unintended binary, the orphaned -x option must be checked and warned. Without this patch, following command sets up the probe in the kernel. ----- # perf probe -a strcpy -x ./perf Added new event: probe:strcpy (on strcpy) You can now use it in all perf tools, such as: perf record -e probe:strcpy -aR sleep 1 ----- But in this case, it seems that the user may want to probe in the perf binary. With this patch, perf-probe correctly handles the orphaned -x. ----- # perf probe -a strcpy -x ./perf Error: -x/-m must follow the probe definitions. ... ----- Reported-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150401102541.17137.75477.stgit@localhost.localdomain Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | perf probe: Support multiple probes on different binariesMasami Hiramatsu2015-04-103-7/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support multiple probes on different binaries with just one command. In the result, this example sets up the probes on icmp_rcv in kernel, on main and set_target in perf, and on pcspkr_event in pcspker.ko driver. ----- # perf probe -a icmp_rcv -x ./perf -a main -a set_target \ -m /lib/modules/4.0.0-rc5+/kernel/drivers/input/misc/pcspkr.ko \ -a pcspkr_event Added new event: probe:icmp_rcv (on icmp_rcv) You can now use it in all perf tools, such as: perf record -e probe:icmp_rcv -aR sleep 1 Added new event: probe_perf:main (on main in /home/mhiramat/ksrc/linux-3/tools/perf/perf) You can now use it in all perf tools, such as: perf record -e probe_perf:main -aR sleep 1 Added new event: probe_perf:set_target (on set_target in /home/mhiramat/ksrc/linux-3/tools/perf/perf) You can now use it in all perf tools, such as: perf record -e probe_perf:set_target -aR sleep 1 Added new event: probe:pcspkr_event (on pcspkr_event in pcspkr) You can now use it in all perf tools, such as: perf record -e probe:pcspkr_event -aR sleep 1 ----- Reported-by: Arnaldo Carvalho de Melo <acme@infradead.org> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150401102539.17137.46454.stgit@localhost.localdomain Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | perf buildid-list: Fix segfault when show DSOs with hitsHe Kuang2015-04-103-9/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit: f3b623b8490a ("perf tools: Reference count struct thread") appends every thread->node to dead_threads in machine__remove_thread() and list_del_init() this node in thread__put(). perf_event__exit_del_thread() releases thread wihout using machine__remove_thread(), and causes a NULL pointer crash when list_del_init(&thread->node) is called. Fix this by using machine_remove_thread() instead of using thread__put() directly. This problem can be reproduced as following: $ perf record ls $ perf buildid-list --with-hits [ 3874.195070] perf[1018]: segfault at 0 ip 00000000004b0b15 sp 00007ffc35b44780 error 6 in perf[400000+166000] Segmentation fault After this patch: $ perf record ls $ perf buildid-list --with-hits bc23e7c3281e542650ba4324421d6acf78f4c23e /proc/kcore 643324cb0e969f30c56d660f167f84a150845511 [vdso] 0000000000000000000000000000000000000000 /bin/busybox ... Signed-off-by: He Kuang <hekuang@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1428658500-6483-1-git-send-email-hekuang@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | perf tools: Fix cross-endian analysisDavid Ahern2015-04-101-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Trying to analyze a big endian data file on little endian system fails with the error: 0xa9b40 [0x70]: failed to process type: 9 The problem is that header parsing is not done correctly because the file attributes are not swapped. Make it so. With this patch able to analyze a sparc64 data file on x86_64. Signed-off-by: David Ahern <david.ahern@oracle.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1428610546-178789-1-git-send-email-david.ahern@oracle.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | perf tools: Fix error path to do closedir() when synthesizing threadsArnaldo Carvalho de Melo2015-04-101-8/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When traversing /proc to synthesize the PERF_RECORD_FORK et al events we were bailing out on errors without calling closedir(), fix it. Reported-by: David Ahern <dsahern@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-vxtp593rfztgbi8noy0m967p@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | perf tools: Fix synthesizing fork_event.ppid for non-main threadDavid Ahern2015-04-101-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit ca6c41c59b9 sets the ppid based on what is read from the /proc/pid/status file when synthesizing fork events. This is correct thing to do for new processes but not threads of a process. Fix ppid for threads to be the main thread when synthesizing fork events (ie., assume main thread spawned all sub-threads in a process). Reported-by: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> Signed-off-by: David Ahern <david.ahern@oracle.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Don Zickus <dzickus@redhat.com> Link: http://lkml.kernel.org/r/1428598107-178999-1-git-send-email-david.ahern@oracle.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | perf tools: Add 'I' event modifier for exclude_idle bitJiri Olsa2015-04-084-2/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adding 'I' event modifier to have complete set of modifiers for perf_event_attr:exclude_* bits. Any event specified with 'I' modifier will have the perf_event_attr:exclude_idle bit set. $ perf record -e cycles:I -vv ls 2>&1 | grep exclude_idle exclude_hv 0 exclude_idle 1 Adding automated tests. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: William Cohen <wcohen@redhat.com> Link: http://lkml.kernel.org/r/1428441919-23099-2-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | perf report: Don't call map__kmap if map is NULL.Wang Nan2015-04-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | report__warn_kptr_restrict() calls map__kmap(kernel_map) before checking kernel_map againest NULL. Which is dangerous, since map__kmap() will return a invalid and not NULL address. It will trigger a warning message in map__kmap() after the patch "perf: kmaps: enforce usage of kmaps to protect futher bugs." was applied. This patch fixes it by adding the missing checking. Signed-off-by: Wang Nan <wangnan0@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Zefan Li <lizefan@huawei.com> Cc: pi3orama@163.com Link: http://lkml.kernel.org/r/1428490772-135393-1-git-send-email-wangnan0@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | perf tests: Fix attr testsJiri Olsa2015-04-082-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Following commit: 1a5941312414 perf: Add wakeup watermark control to the AUX area enlarged perf_event_attr, but did not updated attr tests. Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Kaixu Xia <kaixu.xia@linaro.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Markus T Metzger <markus.t.metzger@intel.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Link: http://lkml.kernel.org/n/20150407171715.GA22603@krava.redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | perf probe: Fix ARM 32 building errorWang Nan2015-04-081-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 9b118acae310f57baee770b5db402500d8695e50 ("perf probe: Fix to handle aliased symbols in glibc") uses an absolute format '%lx' to print u64 argument, which causes compiling error on ARM 32. This patch replaces it with PRIx64. Signed-off-by: Wang Nan <wangnan0@huawei.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Zefan Li <lizefan@huawei.com> Cc: pi3orama@163.com Link: http://lkml.kernel.org/r/1428459274-138470-1-git-send-email-wangnan0@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | perf tools: Merge all perf_event_attr print functionsPeter Zijlstra2015-04-083-182/+141
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently there's 3 (that I found) different and incomplete implementations of printing perf_event_attr. This is quite silly. Merge the lot. While this patch does not retain the exact form all printing that I found is debug output and thus it should not be critical. Also, I cannot find a single print_event_desc() caller. Pre: $ perf record -vv -e cycles -- sleep 1 ------------------------------------------------------------ perf_event_attr: type 0 size 104 config 0 sample_period 4000 sample_freq 4000 sample_type 0x107 read_format 0 disabled 1 inherit 1 pinned 0 exclusive 0 exclude_user 0 exclude_kernel 0 exclude_hv 0 exclude_idle 0 mmap 1 comm 1 mmap2 1 comm_exec 1 freq 1 inherit_stat 0 enable_on_exec 1 task 1 watermark 0 precise_ip 0 mmap_data 0 sample_id_all 1 exclude_host 0 exclude_guest 1 excl.callchain_kern 0 excl.callchain_user 0 wakeup_events 0 wakeup_watermark 0 bp_type 0 bp_addr 0 config1 0 bp_len 0 config2 0 branch_sample_type 0 sample_regs_user 0 sample_stack_user 0 sample_regs_intr 0 ------------------------------------------------------------ $ perf evlist -vv cycles: sample_freq=4000, size: 104, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, mmap2: 1, comm: 1, comm_exec: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1 Post: $ ./perf record -vv -e cycles -- sleep 1 ------------------------------------------------------------ perf_event_attr: size 112 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|PERIOD disabled 1 inherit 1 mmap 1 comm 1 freq 1 enable_on_exec 1 task 1 sample_id_all 1 exclude_guest 1 mmap2 1 comm_exec 1 ------------------------------------------------------------ $ ./perf evlist -vv cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1 Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Ahern <dsahern@gmail.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150407091150.644238729@infradead.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | perf record: Add clockid parameterPeter Zijlstra2015-04-085-4/+154
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Teach perf-record about the new perf_event_attr::{use_clockid, clockid} fields. Add a simple parameter to set the clock (if any) to be used for the events to be recorded into the data file. Since we store the entire perf_event_attr in the EVENT_DESC section we also already store the used clockid in the data file. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: David Ahern <dsahern@gmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yunlong Song <yunlong.song@huawei.com> Link: http://lkml.kernel.org/r/20150407154851.GR23123@twins.programming.kicks-ass.net [ Conditionally define CLOCK_BOOTTIME, at least rhel6 doesn't have it - dsahern Ditto for CLOCK_MONOTONIC_RAW, sles11sp2 doesn't have it - yunlong.song ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | perf sched replay: Use replay_repeat to calculate the runavg of cpu usage ↵Yunlong Song2015-04-081-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | instead of the default value 10 Since sched->replay_repeat is set to 10 as default, the sched->run_avg, sched->runavg_cpu_usage, and sched->runavg_parent_cpu_usage all use 10 to calculate their value. However, the replay_repeat can be changed to other value by using -r option, so the calculation above should use replay_repeat to achieve more accurate results instead of the default value 10. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1427809596-29559-10-git-send-email-yunlong.song@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | perf sched replay: Support using -f to override perf.data file ownershipYunlong Song2015-04-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Enable to use perf.data when it is not owned by current user or root. Example: $ ls -al perf.data -rw------- 1 Yunlong.Song Yunlong.Song 5321918 Mar 25 15:14 perf.data $ sudo id uid=0(root) gid=0(root) groups=0(root),64(pkcs11) Before this patch: $ sudo perf sched replay -f run measurement overhead: 98 nsecs sleep measurement overhead: 52909 nsecs the run test took 1000015 nsecs the sleep test took 1054253 nsecs File perf.data not owned by current user or root (use -f to override) As shown above, the -f option does not work at all. After this patch: $ sudo perf sched replay -f run measurement overhead: 221 nsecs sleep measurement overhead: 40514 nsecs the run test took 1000003 nsecs the sleep test took 1056098 nsecs nr_run_events: 10 nr_sleep_events: 1562 nr_wakeup_events: 5 task 0 ( :1: 1), nr_events: 1 task 1 ( :2: 2), nr_events: 1 task 2 ( :3: 3), nr_events: 1 ... ... task 1549 ( :163132: 163132), nr_events: 1 task 1550 ( :163540: 163540), nr_events: 1 task 1551 ( <unknown>: 0), nr_events: 10 ------------------------------------------------------------ #1 : 50.198, ravg: 50.20, cpu: 2335.18 / 2335.18 #2 : 219.099, ravg: 67.09, cpu: 2835.11 / 2385.17 #3 : 238.626, ravg: 84.24, cpu: 3278.26 / 2474.48 #4 : 200.364, ravg: 95.85, cpu: 2977.41 / 2524.77 #5 : 176.882, ravg: 103.96, cpu: 2801.35 / 2552.43 #6 : 191.093, ravg: 112.67, cpu: 2813.70 / 2578.56 #7 : 189.448, ravg: 120.35, cpu: 2809.21 / 2601.62 #8 : 200.637, ravg: 128.38, cpu: 2849.91 / 2626.45 #9 : 248.338, ravg: 140.37, cpu: 4380.61 / 2801.87 #10 : 511.139, ravg: 177.45, cpu: 3077.73 / 2829.45 As shown above, the -f option really works now. Besides for replay, -f option can also work for latency and map. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1427809596-29559-9-git-send-email-yunlong.song@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | perf sched replay: Fix the EMFILE error caused by the limitation of the ↵Yunlong Song2015-04-081-5/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | maximum open files The soft maximum number of open files for a calling process is 1024, which is defined as INR_OPEN_CUR in include/uapi/linux/fs.h, and the hard maximum number of open files for a calling process is 4096, which is defined as INR_OPEN_MAX in include/uapi/linux/fs.h. Both INR_OPEN_CUR and INR_OPEN_MAX are used to limit the value of RLIMIT_NOFILE in include/asm-generic/resource.h. And the soft maximum number finally decides the limitation of the maximum files which are allowed to be opened. That is to say a process can use at most 1024 file descriptors for its o pened files, or an EMFILE error will happen. This error can be fixed by increasing the soft maximum number, under the constraint that the soft maximum number can not exceed the hard maximum number, or both soft and hard maximum number should be increased simultaneously with privilege. For perf sched replay, it uses sys_perf_event_open to create the file descriptor for each of the tasks in order to handle information of perf events. That is to say each task needs a unique file descriptor. In x86_64, there may be over 1024 or 4096 tasks correspoinding to the record in perf.data, which causes that no enough file descriptors can be used. As a result, EMFILE error happens and stops the replay process. To solve this problem, we adaptively increase the soft and hard maximum number of open files with a '-f' option. Example: Test environment: x86_64 with 160 cores $ cat /proc/sys/kernel/pid_max 163840 $ cat /proc/sys/fs/file-max 6815744 $ ulimit -Sn 1024 $ ulimit -Hn 4096 Before this patch: $ perf sched replay ... task 1549 ( :163132: 163132), nr_events: 1 task 1550 ( :163540: 163540), nr_events: 1 task 1551 ( <unknown>: 0), nr_events: 10 Error: sys_perf_event_open() syscall returned with -1 (Too many open files) After this patch: $ perf sched replay ... task 1549 ( :163132: 163132), nr_events: 1 task 1550 ( :163540: 163540), nr_events: 1 task 1551 ( <unknown>: 0), nr_events: 10 Error: sys_perf_event_open() syscall returned with -1 (Too many open files) Have a try with -f option $ perf sched replay -f ... task 1549 ( :163132: 163132), nr_events: 1 task 1550 ( :163540: 163540), nr_events: 1 task 1551 ( <unknown>: 0), nr_events: 10 ------------------------------------------------------------ #1 : 54.401, ravg: 54.40, cpu: 3285.21 / 3285.21 #2 : 199.548, ravg: 68.92, cpu: 4999.65 / 3456.66 #3 : 170.483, ravg: 79.07, cpu: 1349.94 / 3245.99 #4 : 192.034, ravg: 90.37, cpu: 1322.88 / 3053.67 #5 : 182.929, ravg: 99.62, cpu: 1406.51 / 2888.96 #6 : 152.974, ravg: 104.96, cpu: 1167.54 / 2716.82 #7 : 155.579, ravg: 110.02, cpu: 2992.53 / 2744.39 #8 : 130.557, ravg: 112.08, cpu: 1126.43 / 2582.59 #9 : 138.520, ravg: 114.72, cpu: 1253.22 / 2449.65 #10 : 134.328, ravg: 116.68, cpu: 1587.95 / 2363.48 Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1427809596-29559-8-git-send-email-yunlong.song@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | perf sched replay: Handle the dead halt of sem_wait when create_tasks() ↵Yunlong Song2015-04-081-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fails for any task Since there is sem_wait for each task in the wait_for_tasks(), e.g. sem_wait(&task->work_done_sem). The sem_wait can continue only when work_done_sem is greater than 0, or it will be blocked. For perf sched replay, one task may sem_post the work_done_sem of another task, which causes the work_done_sem of that task processed in a reasonable sequence, e.g. sem_post, sem_wait, sem_wait, sem_post... This sequence simulates the sched process of the running tasks at the time when perf sched record runs. As a result, all the tasks are required and their threads must be successfully created. If any one (task A) of the tasks fails to create its thread, then another task (task B), whose work_done_sem needs sem_post from that failed task A, may likely block itself due to seg_wait. And this is a dead halt, since task B's thread_func cannot continue at all. To solve this problem, perf sched replay should exit once any task fails to create its thread. Example: Test environment: x86_64 with 160 cores Before this patch: $ perf sched replay ... Error: sys_perf_event_open() syscall returned with -1 (Too many open files) ------------------------------------------------------------ <- dead halt After this patch: $ perf sched replay ... task 1551 ( <unknown>: 0), nr_events: 10 Error: sys_perf_event_open() syscall returned with -1 (Too many open files) $ As shown above, perf sched replay finishes the process after printing an error message and does not block itself. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1427809596-29559-7-git-send-email-yunlong.song@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | perf sched replay: Fix the segmentation fault problem caused by pr_err in ↵Yunlong Song2015-04-081-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | threads The pr_err in self_open_counters() prints error message to stderr. Unlike stdout, stderr uses memory buffer on the stack of each calling process. The pr_err in self_open_counters() works in a thread called thread_func created in function create_tasks, which concurrently creates sched->nr_tasks threads. If the error happens and pr_err prints the error message in each of these threads, the stack size of the perf process (default is 8192 kbytes) will quickly run out and the segmentation fault will happen then. To solve this problem, pr_err with self_open_counters() should be moved from newly created threads to the old main thread of the perf process. Then the pr_err can work in a stable situation without the strange segmentation fault problem. Example: Test environment: x86_64 with 160 cores Before this patch: $ perf sched replay ... task 1549 ( :163132: 163132), nr_events: 1 task 1550 ( :163540: 163540), nr_events: 1 task 1551 ( <unknown>: 0), nr_events: 10 Segmentation fault After this patch: $ perf sched replay ... task 1549 ( :163132: 163132), nr_events: 1 task 1550 ( :163540: 163540), nr_events: 1 task 1551 ( <unknown>: 0), nr_events: 10 ... As shown above, the result continues without any segmentation fault. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1427809596-29559-6-git-send-email-yunlong.song@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to ↵Yunlong Song2015-04-081-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the different pid_max configurations Although the memory of pid_to_task can be allocated via calloc according to the value of /proc/sys/kernel/pid_max, it cannot handle the case when pid_max is changed after 'perf sched record' has created its perf.data. If the new pid_max configured in 'perf sched replay' is smaller than the old pid_max configured in 'perf sched record', then it will cause the assertion failure problem. To solve this problem, we realloc the memory of pid_to_task stepwise once the passed-in pid parameter in register_pid is larger than the current pid_max. Example: Test environment: x86_64 with 160 cores $ cat /proc/sys/kernel/pid_max 163840 $ perf sched record ls $ echo 5000 > /proc/sys/kernel/pid_max $ cat /proc/sys/kernel/pid_max 5000 Before this patch: $ perf sched replay run measurement overhead: 221 nsecs sleep measurement overhead: 55356 nsecs the run test took 1000011 nsecs the sleep test took 1060940 nsecs perf: builtin-sched.c:337: register_pid: Assertion `!(pid >= (unsigned long)pid_max)' failed. Aborted After this patch: $ perf sched replay run measurement overhead: 221 nsecs sleep measurement overhead: 55611 nsecs the run test took 1000026 nsecs the sleep test took 1060486 nsecs nr_run_events: 10 nr_sleep_events: 1562 nr_wakeup_events: 5 task 0 ( :1: 1), nr_events: 1 task 1 ( :2: 2), nr_events: 1 task 2 ( :3: 3), nr_events: 1 task 3 ( :5: 5), nr_events: 1 ... Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1427809596-29559-5-git-send-email-yunlong.song@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | perf sched replay: Alloc the memory of pid_to_task dynamically to adapt to ↵Yunlong Song2015-04-081-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the unexpected change of pid_max The current memory allocation of struct task_desc *pid_to_task[MAX_PID] is in a permanent and preset way, and it has two problems: Problem 1: If the pid_max, which is the max number of pids in the system, is much smaller than MAX_PID (1024*1000), then it causes a waste of stack memory. This may happen in the case where the number of cpu cores is much smaller than 1000. Problem 2: If the pid_max is changed from the default value to a value larger than MAX_PID, then it will cause assertion failure problem. The maximum value of pid_max can be set to pid_max_max (see pidmap_init defined in kernel/pid.c), which equals to PID_MAX_LIMIT. In x86_64, PID_MAX_LIMIT is 4*1024*1024 (defined in include/linux/threads.h). This value is much larger than MAX_PID, and will take up 32768 Kbytes (4*1024*1024*8/1024) for memory allocation of pid_to_task, which is much larger than the default 8192 Kbytes of the stack size of calling process. Due to these two problems, we use calloc to allocate the memory of pid_to_task dynamically. Example: Test environment: x86_64 with 160 cores $ cat /proc/sys/kernel/pid_max 163840 $ echo 1025000 > /proc/sys/kernel/pid_max $ cat /proc/sys/kernel/pid_max 1025000 Run some applications until the pid of some process is greater than the value of MAX_PID (1024*1000). Before this patch: $ perf sched replay run measurement overhead: 221 nsecs sleep measurement overhead: 55480 nsecs the run test took 1000008 nsecs the sleep test took 1063151 nsecs perf: builtin-sched.c:330: register_pid: Assertion `!(pid >= 1024000)' failed. Aborted After this patch: $ perf sched replay run measurement overhead: 221 nsecs sleep measurement overhead: 55435 nsecs the run test took 1000004 nsecs the sleep test took 1059312 nsecs nr_run_events: 10 nr_sleep_events: 1562 nr_wakeup_events: 5 task 0 ( :1: 1), nr_events: 1 task 1 ( :2: 2), nr_events: 1 task 2 ( :3: 3), nr_events: 1 task 3 ( :5: 5), nr_events: 1 ... Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1427809596-29559-4-git-send-email-yunlong.song@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
OpenPOWER on IntegriCloud