summaryrefslogtreecommitdiffstats
path: root/kernel/trace
Commit message (Collapse)AuthorAgeFilesLines
* tracing: kdb: Fix ftdump to not sleepDouglas Anderson2019-04-053-6/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 31b265b3baaf55f209229888b7ffea523ddab366 ] As reported back in 2016-11 [1], the "ftdump" kdb command triggers a BUG for "sleeping function called from invalid context". kdb's "ftdump" command wants to call ring_buffer_read_prepare() in atomic context. A very simple solution for this is to add allocation flags to ring_buffer_read_prepare() so kdb can call it without triggering the allocation error. This patch does that. Note that in the original email thread about this, it was suggested that perhaps the solution for kdb was to either preallocate the buffer ahead of time or create our own iterator. I'm hoping that this alternative of adding allocation flags to ring_buffer_read_prepare() can be considered since it means I don't need to duplicate more of the core trace code into "trace_kdb.c" (for either creating my own iterator or re-preparing a ring allocator whose memory was already allocated). NOTE: another option for kdb is to actually figure out how to make it reuse the existing ftrace_dump() function and totally eliminate the duplication. This sounds very appealing and actually works (the "sr z" command can be seen to properly dump the ftrace buffer). The downside here is that ftrace_dump() fully consumes the trace buffer. Unless that is changed I'd rather not use it because it means "ftdump | grep xyz" won't be very useful to search the ftrace buffer since it will throw away the whole trace on the first grep. A future patch to dump only the last few lines of the buffer will also be hard to implement. [1] https://lkml.kernel.org/r/20161117191605.GA21459@google.com Link: http://lkml.kernel.org/r/20190308193205.213659-1-dianders@chromium.org Reported-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* tracing: initialize variable in create_dyn_event()Frank Rowand2019-04-031-1/+1
| | | | | | | | | | | | | | | | | | | | commit 3dee10da2e9ff220e054a8f158cc296c797fbe81 upstream. Fix compile warning in create_dyn_event(): 'ret' may be used uninitialized in this function [-Wuninitialized]. Link: http://lkml.kernel.org/r/1553237900-8555-1-git-send-email-frowand.list@gmail.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Fixes: 5448d44c3855 ("tracing: Add unified dynamic event framework") Signed-off-by: Frank Rowand <frank.rowand@sony.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* tracing/perf: Use strndup_user() instead of buggy open-coded versionJann Horn2019-03-231-9/+7
| | | | | | | | | | | | | | | | | | | | | commit 83540fbc8812a580b6ad8f93f4c29e62e417687e upstream. The first version of this method was missing the check for `ret == PATH_MAX`; then such a check was added, but it didn't call kfree() on error, so there was still a small memory leak in the error case. Fix it by using strndup_user() instead of open-coding it. Link: http://lkml.kernel.org/r/20190220165443.152385-1-jannh@google.com Cc: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Fixes: 0eadcc7a7bc0 ("perf/core: Fix perf_uprobe_init()") Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* tracing: Do not free iter->trace in fail path of tracing_open_pipe()zhangyi (F)2019-03-231-1/+0
| | | | | | | | | | | | | | | | | | | | | commit e7f0c424d0806b05d6f47be9f202b037eb701707 upstream. Commit d716ff71dd12 ("tracing: Remove taking of trace_types_lock in pipe files") use the current tracer instead of the copy in tracing_open_pipe(), but it forget to remove the freeing sentence in the error path. There's an error path that can call kfree(iter->trace) after the iter->trace was assigned to tr->current_trace, which would be bad to free. Link: http://lkml.kernel.org/r/1550060946-45984-1-git-send-email-yi.zhang@huawei.com Cc: stable@vger.kernel.org Fixes: d716ff71dd12 ("tracing: Remove taking of trace_types_lock in pipe files") Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* tracing: Use strncpy instead of memcpy for string keys in hist triggersTom Zanussi2019-03-231-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 9f0bbf3115ca9f91f43b7c74e9ac7d79f47fc6c2 upstream. Because there may be random garbage beyond a string's null terminator, it's not correct to copy the the complete character array for use as a hist trigger key. This results in multiple histogram entries for the 'same' string key. So, in the case of a string key, use strncpy instead of memcpy to avoid copying in the extra bytes. Before, using the gdbus entries in the following hist trigger as an example: # echo 'hist:key=comm' > /sys/kernel/debug/tracing/events/sched/sched_waking/trigger # cat /sys/kernel/debug/tracing/events/sched/sched_waking/hist ... { comm: ImgDecoder #4 } hitcount: 203 { comm: gmain } hitcount: 213 { comm: gmain } hitcount: 216 { comm: StreamTrans #73 } hitcount: 221 { comm: mozStorage #3 } hitcount: 230 { comm: gdbus } hitcount: 233 { comm: StyleThread#5 } hitcount: 253 { comm: gdbus } hitcount: 256 { comm: gdbus } hitcount: 260 { comm: StyleThread#4 } hitcount: 271 ... # cat /sys/kernel/debug/tracing/events/sched/sched_waking/hist | egrep gdbus | wc -l 51 After: # cat /sys/kernel/debug/tracing/events/sched/sched_waking/hist | egrep gdbus | wc -l 1 Link: http://lkml.kernel.org/r/50c35ae1267d64eee975b8125e151e600071d4dc.1549309756.git.tom.zanussi@linux.intel.com Cc: Namhyung Kim <namhyung@kernel.org> Cc: stable@vger.kernel.org Fixes: 79e577cbce4c4 ("tracing: Support string type key properly") Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* tracing: Fix event filters and triggers to handle negative numbersPavel Tikhomirov2019-03-101-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 6a072128d262d2b98d31626906a96700d1fc11eb upstream. Then tracing syscall exit event it is extremely useful to filter exit codes equal to some negative value, to react only to required errors. But negative numbers does not work: [root@snorch sys_exit_read]# echo "ret == -1" > filter bash: echo: write error: Invalid argument [root@snorch sys_exit_read]# cat filter ret == -1 ^ parse_error: Invalid value (did you forget quotes)? Similar thing happens when setting triggers. These is a regression in v4.17 introduced by the commit mentioned below, testing without these commit shows no problem with negative numbers. Link: http://lkml.kernel.org/r/20180823102534.7642-1-ptikhomirov@virtuozzo.com Cc: stable@vger.kernel.org Fixes: 80765597bc58 ("tracing: Rewrite filter logic to be simpler and faster") Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Merge tag 'trace-v5.0-rc4-3' of ↵Linus Torvalds2019-02-182-9/+3
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Two more tracing fixes - Have kprobes not use copy_from_user() to access kernel addresses, because kprobes can legitimately poke at bad kernel memory, which will fault. Copy from user code should never fault in kernel space. Using probe_mem_read() can handle kernel address space faulting. - Put back the entries counter in the tracing output that was accidentally removed" * tag 'trace-v5.0-rc4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix number of entries in trace header kprobe: Do not use uaccess functions to access kernel memory that can fault
| * tracing: Fix number of entries in trace headerQuentin Perret2019-02-151-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following commit 441dae8f2f29 ("tracing: Add support for display of tgid in trace output") removed the call to print_event_info() from print_func_help_header_irq() which results in the ftrace header not reporting the number of entries written in the buffer. As this wasn't the original intent of the patch, re-introduce the call to print_event_info() to restore the orginal behaviour. Link: http://lkml.kernel.org/r/20190214152950.4179-1-quentin.perret@arm.com Acked-by: Joel Fernandes <joelaf@google.com> Cc: stable@vger.kernel.org Fixes: 441dae8f2f29 ("tracing: Add support for display of tgid in trace output") Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * kprobe: Do not use uaccess functions to access kernel memory that can faultChangbin Du2019-02-151-9/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The userspace can ask kprobe to intercept strings at any memory address, including invalid kernel address. In this case, fetch_store_strlen() would crash since it uses general usercopy function, and user access functions are no longer allowed to access kernel memory. For example, we can crash the kernel by doing something as below: $ sudo kprobe 'p:do_sys_open +0(+0(%si)):string' [ 103.620391] BUG: GPF in non-whitelisted uaccess (non-canonical address?) [ 103.622104] general protection fault: 0000 [#1] SMP PTI [ 103.623424] CPU: 10 PID: 1046 Comm: cat Not tainted 5.0.0-rc3-00130-gd73aba1-dirty #96 [ 103.625321] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-2-g628b2e6-dirty-20190104_103505-linux 04/01/2014 [ 103.628284] RIP: 0010:process_fetch_insn+0x1ab/0x4b0 [ 103.629518] Code: 10 83 80 28 2e 00 00 01 31 d2 31 ff 48 8b 74 24 28 eb 0c 81 fa ff 0f 00 00 7f 1c 85 c0 75 18 66 66 90 0f ae e8 48 63 ca 89 f8 <8a> 0c 31 66 66 90 83 c2 01 84 c9 75 dc 89 54 24 34 89 44 24 28 48 [ 103.634032] RSP: 0018:ffff88845eb37ce0 EFLAGS: 00010246 [ 103.635312] RAX: 0000000000000000 RBX: ffff888456c4e5a8 RCX: 0000000000000000 [ 103.637057] RDX: 0000000000000000 RSI: 2e646c2f6374652f RDI: 0000000000000000 [ 103.638795] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 [ 103.640556] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 [ 103.642297] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 103.644040] FS: 0000000000000000(0000) GS:ffff88846f000000(0000) knlGS:0000000000000000 [ 103.646019] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 103.647436] CR2: 00007ffc79758038 CR3: 0000000463360006 CR4: 0000000000020ee0 [ 103.649147] Call Trace: [ 103.649781] ? sched_clock_cpu+0xc/0xa0 [ 103.650747] ? do_sys_open+0x5/0x220 [ 103.651635] kprobe_trace_func+0x303/0x380 [ 103.652645] ? do_sys_open+0x5/0x220 [ 103.653528] kprobe_dispatcher+0x45/0x50 [ 103.654682] ? do_sys_open+0x1/0x220 [ 103.655875] kprobe_ftrace_handler+0x90/0xf0 [ 103.657282] ftrace_ops_assist_func+0x54/0xf0 [ 103.658564] ? __call_rcu+0x1dc/0x280 [ 103.659482] 0xffffffffc00000bf [ 103.660384] ? __ia32_sys_open+0x20/0x20 [ 103.661682] ? do_sys_open+0x1/0x220 [ 103.662863] do_sys_open+0x5/0x220 [ 103.663988] do_syscall_64+0x60/0x210 [ 103.665201] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 103.666862] RIP: 0033:0x7fc22fadccdd [ 103.668034] Code: 48 89 54 24 e0 41 83 e2 40 75 32 89 f0 25 00 00 41 00 3d 00 00 41 00 74 24 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 77 33 f3 c3 66 0f 1f 84 00 00 00 00 00 48 8d 44 [ 103.674029] RSP: 002b:00007ffc7972c3a8 EFLAGS: 00000287 ORIG_RAX: 0000000000000101 [ 103.676512] RAX: ffffffffffffffda RBX: 0000562f86147a21 RCX: 00007fc22fadccdd [ 103.678853] RDX: 0000000000080000 RSI: 00007fc22fae1428 RDI: 00000000ffffff9c [ 103.681151] RBP: ffffffffffffffff R08: 0000000000000000 R09: 0000000000000000 [ 103.683489] R10: 0000000000000000 R11: 0000000000000287 R12: 00007fc22fce90a8 [ 103.685774] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000 [ 103.688056] Modules linked in: [ 103.689131] ---[ end trace 43792035c28984a1 ]--- This can be fixed by using probe_mem_read() instead, as it can handle faulting kernel memory addresses, which kprobes can legitimately do. Link: http://lkml.kernel.org/r/20190125151051.7381-1-changbin.du@gmail.com Cc: stable@vger.kernel.org Fixes: 9da3f2b7405 ("x86/fault: BUG() when uaccess helpers fault on kernel addresses") Signed-off-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* | Merge tag 'trace-v5.0-rc4' of ↵Linus Torvalds2019-02-131-2/+4
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "This fixes kprobes/uprobes dynamic processing of strings, where it processes the args but does not update the remaining length of the buffer that the string arguments will be placed in. It constantly passes in the total size of buffer used instead of passing in the remaining size of the buffer used. This could cause issues if the strings are larger than the max size of an event which could cause the strings to be written beyond what was reserved on the buffer" * tag 'trace-v5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: probeevent: Correctly update remaining space in dynamic area
| * tracing: probeevent: Correctly update remaining space in dynamic areaAndreas Ziegler2019-02-111-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 9178412ddf5a ("tracing: probeevent: Return consumed bytes of dynamic area") improved the string fetching mechanism by returning the number of required bytes after copying the argument to the dynamic area. However, this return value is now only used to increment the pointer inside the dynamic area but misses updating the 'maxlen' variable which indicates the remaining space in the dynamic area. This means that fetch_store_string() always reads the *total* size of the dynamic area from the data_loc pointer instead of the *remaining* size (and passes it along to strncpy_from_{user,unsafe}) even if we're already about to copy data into the middle of the dynamic area. Link: http://lkml.kernel.org/r/20190206190013.16405-1-andreas.ziegler@fau.de Cc: Ingo Molnar <mingo@redhat.com> Cc: stable@vger.kernel.org Fixes: 9178412ddf5a ("tracing: probeevent: Return consumed bytes of dynamic area") Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Andreas Ziegler <andreas.ziegler@fau.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2019-02-081-12/+2
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: "This pull request is dedicated to the upcoming snowpocalypse parts 2 and 3 in the Pacific Northwest: 1) Drop profiles are broken because some drivers use dev_kfree_skb* instead of dev_consume_skb*, from Yang Wei. 2) Fix IWLWIFI kconfig deps, from Luca Coelho. 3) Fix percpu maps updating in bpftool, from Paolo Abeni. 4) Missing station release in batman-adv, from Felix Fietkau. 5) Fix some networking compat ioctl bugs, from Johannes Berg. 6) ucc_geth must reset the BQL queue state when stopping the device, from Mathias Thore. 7) Several XDP bug fixes in virtio_net from Toshiaki Makita. 8) TSO packets must be sent always on queue 0 in stmmac, from Jose Abreu. 9) Fix socket refcounting bug in RDS, from Eric Dumazet. 10) Handle sparse cpu allocations in bpf selftests, from Martynas Pumputis. 11) Make sure mgmt frames have enough tailroom in mac80211, from Felix Feitkau. 12) Use safe list walking in sctp_sendmsg() asoc list traversal, from Greg Kroah-Hartman. 13) Make DCCP's ccid_hc_[rt]x_parse_options always check for NULL ccid, from Eric Dumazet. 14) Need to reload WoL password into bcmsysport device after deep sleeps, from Florian Fainelli. 15) Remove filter from mask before freeing in cls_flower, from Petr Machata. 16) Missing release and use after free in error paths of s390 qeth code, from Julian Wiedmann. 17) Fix lockdep false positive in dsa code, from Marc Zyngier. 18) Fix counting of ATU violations in mv88e6xxx, from Andrew Lunn. 19) Fix EQ firmware assert in qed driver, from Manish Chopra. 20) Don't default Caivum PTP to Y in kconfig, from Bjorn Helgaas" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits) net: dsa: b53: Fix for failure when irq is not defined in dt sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach() geneve: should not call rt6_lookup() when ipv6 was disabled net: Don't default Cavium PTP driver to 'y' net: broadcom: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: via-velocity: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: tehuti: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: sun: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: fsl_ucc_hdlc: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: fec_mpc52xx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: smsc: epic100: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: dscc4: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: tulip: de2104x: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: defxx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net/mlx5e: Don't overwrite pedit action when multiple pedit used net/mlx5e: Update hw flows when encap source mac changed qed*: Advance drivers version to 8.37.0.20 qed: Change verbosity for coalescing message. qede: Fix system crash on configuring channels. qed: Consider TX tcs while deriving the max num_queues for PF. ...
| * bpf: fix potential deadlock in bpf_prog_registerAlexei Starovoitov2019-01-311-12/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lockdep found a potential deadlock between cpu_hotplug_lock, bpf_event_mutex, and cpuctx_mutex: [ 13.007000] WARNING: possible circular locking dependency detected [ 13.007587] 5.0.0-rc3-00018-g2fa53f892422-dirty #477 Not tainted [ 13.008124] ------------------------------------------------------ [ 13.008624] test_progs/246 is trying to acquire lock: [ 13.009030] 0000000094160d1d (tracepoints_mutex){+.+.}, at: tracepoint_probe_register_prio+0x2d/0x300 [ 13.009770] [ 13.009770] but task is already holding lock: [ 13.010239] 00000000d663ef86 (bpf_event_mutex){+.+.}, at: bpf_probe_register+0x1d/0x60 [ 13.010877] [ 13.010877] which lock already depends on the new lock. [ 13.010877] [ 13.011532] [ 13.011532] the existing dependency chain (in reverse order) is: [ 13.012129] [ 13.012129] -> #4 (bpf_event_mutex){+.+.}: [ 13.012582] perf_event_query_prog_array+0x9b/0x130 [ 13.013016] _perf_ioctl+0x3aa/0x830 [ 13.013354] perf_ioctl+0x2e/0x50 [ 13.013668] do_vfs_ioctl+0x8f/0x6a0 [ 13.014003] ksys_ioctl+0x70/0x80 [ 13.014320] __x64_sys_ioctl+0x16/0x20 [ 13.014668] do_syscall_64+0x4a/0x180 [ 13.015007] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 13.015469] [ 13.015469] -> #3 (&cpuctx_mutex){+.+.}: [ 13.015910] perf_event_init_cpu+0x5a/0x90 [ 13.016291] perf_event_init+0x1b2/0x1de [ 13.016654] start_kernel+0x2b8/0x42a [ 13.016995] secondary_startup_64+0xa4/0xb0 [ 13.017382] [ 13.017382] -> #2 (pmus_lock){+.+.}: [ 13.017794] perf_event_init_cpu+0x21/0x90 [ 13.018172] cpuhp_invoke_callback+0xb3/0x960 [ 13.018573] _cpu_up+0xa7/0x140 [ 13.018871] do_cpu_up+0xa4/0xc0 [ 13.019178] smp_init+0xcd/0xd2 [ 13.019483] kernel_init_freeable+0x123/0x24f [ 13.019878] kernel_init+0xa/0x110 [ 13.020201] ret_from_fork+0x24/0x30 [ 13.020541] [ 13.020541] -> #1 (cpu_hotplug_lock.rw_sem){++++}: [ 13.021051] static_key_slow_inc+0xe/0x20 [ 13.021424] tracepoint_probe_register_prio+0x28c/0x300 [ 13.021891] perf_trace_event_init+0x11f/0x250 [ 13.022297] perf_trace_init+0x6b/0xa0 [ 13.022644] perf_tp_event_init+0x25/0x40 [ 13.023011] perf_try_init_event+0x6b/0x90 [ 13.023386] perf_event_alloc+0x9a8/0xc40 [ 13.023754] __do_sys_perf_event_open+0x1dd/0xd30 [ 13.024173] do_syscall_64+0x4a/0x180 [ 13.024519] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 13.024968] [ 13.024968] -> #0 (tracepoints_mutex){+.+.}: [ 13.025434] __mutex_lock+0x86/0x970 [ 13.025764] tracepoint_probe_register_prio+0x2d/0x300 [ 13.026215] bpf_probe_register+0x40/0x60 [ 13.026584] bpf_raw_tracepoint_open.isra.34+0xa4/0x130 [ 13.027042] __do_sys_bpf+0x94f/0x1a90 [ 13.027389] do_syscall_64+0x4a/0x180 [ 13.027727] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 13.028171] [ 13.028171] other info that might help us debug this: [ 13.028171] [ 13.028807] Chain exists of: [ 13.028807] tracepoints_mutex --> &cpuctx_mutex --> bpf_event_mutex [ 13.028807] [ 13.029666] Possible unsafe locking scenario: [ 13.029666] [ 13.030140] CPU0 CPU1 [ 13.030510] ---- ---- [ 13.030875] lock(bpf_event_mutex); [ 13.031166] lock(&cpuctx_mutex); [ 13.031645] lock(bpf_event_mutex); [ 13.032135] lock(tracepoints_mutex); [ 13.032441] [ 13.032441] *** DEADLOCK *** [ 13.032441] [ 13.032911] 1 lock held by test_progs/246: [ 13.033239] #0: 00000000d663ef86 (bpf_event_mutex){+.+.}, at: bpf_probe_register+0x1d/0x60 [ 13.033909] [ 13.033909] stack backtrace: [ 13.034258] CPU: 1 PID: 246 Comm: test_progs Not tainted 5.0.0-rc3-00018-g2fa53f892422-dirty #477 [ 13.034964] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014 [ 13.035657] Call Trace: [ 13.035859] dump_stack+0x5f/0x8b [ 13.036130] print_circular_bug.isra.37+0x1ce/0x1db [ 13.036526] __lock_acquire+0x1158/0x1350 [ 13.036852] ? lock_acquire+0x98/0x190 [ 13.037154] lock_acquire+0x98/0x190 [ 13.037447] ? tracepoint_probe_register_prio+0x2d/0x300 [ 13.037876] __mutex_lock+0x86/0x970 [ 13.038167] ? tracepoint_probe_register_prio+0x2d/0x300 [ 13.038600] ? tracepoint_probe_register_prio+0x2d/0x300 [ 13.039028] ? __mutex_lock+0x86/0x970 [ 13.039337] ? __mutex_lock+0x24a/0x970 [ 13.039649] ? bpf_probe_register+0x1d/0x60 [ 13.039992] ? __bpf_trace_sched_wake_idle_without_ipi+0x10/0x10 [ 13.040478] ? tracepoint_probe_register_prio+0x2d/0x300 [ 13.040906] tracepoint_probe_register_prio+0x2d/0x300 [ 13.041325] bpf_probe_register+0x40/0x60 [ 13.041649] bpf_raw_tracepoint_open.isra.34+0xa4/0x130 [ 13.042068] ? __might_fault+0x3e/0x90 [ 13.042374] __do_sys_bpf+0x94f/0x1a90 [ 13.042678] do_syscall_64+0x4a/0x180 [ 13.042975] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 13.043382] RIP: 0033:0x7f23b10a07f9 [ 13.045155] RSP: 002b:00007ffdef42fdd8 EFLAGS: 00000202 ORIG_RAX: 0000000000000141 [ 13.045759] RAX: ffffffffffffffda RBX: 00007ffdef42ff70 RCX: 00007f23b10a07f9 [ 13.046326] RDX: 0000000000000070 RSI: 00007ffdef42fe10 RDI: 0000000000000011 [ 13.046893] RBP: 00007ffdef42fdf0 R08: 0000000000000038 R09: 00007ffdef42fe10 [ 13.047462] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000 [ 13.048029] R13: 0000000000000016 R14: 00007f23b1db4690 R15: 0000000000000000 Since tracepoints_mutex will be taken in tracepoint_probe_register/unregister() there is no need to take bpf_event_mutex too. bpf_event_mutex is protecting modifications to prog array used in kprobe/perf bpf progs. bpf_raw_tracepoints don't need to take this mutex. Fixes: c4f6699dfcb8 ("bpf: introduce BPF_RAW_TRACEPOINT") Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* | tracing/uprobes: Fix output for multiple string argumentsAndreas Ziegler2019-01-171-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When printing multiple uprobe arguments as strings the output for the earlier arguments would also include all later string arguments. This is best explained in an example: Consider adding a uprobe to a function receiving two strings as parameters which is at offset 0xa0 in strlib.so and we want to print both parameters when the uprobe is hit (on x86_64): $ echo 'p:func /lib/strlib.so:0xa0 +0(%di):string +0(%si):string' > \ /sys/kernel/debug/tracing/uprobe_events When the function is called as func("foo", "bar") and we hit the probe, the trace file shows a line like the following: [...] func: (0x7f7e683706a0) arg1="foobar" arg2="bar" Note the extra "bar" printed as part of arg1. This behaviour stacks up for additional string arguments. The strings are stored in a dynamically growing part of the uprobe buffer by fetch_store_string() after copying them from userspace via strncpy_from_user(). The return value of strncpy_from_user() is then directly used as the required size for the string. However, this does not take the terminating null byte into account as the documentation for strncpy_from_user() cleary states that it "[...] returns the length of the string (not including the trailing NUL)" even though the null byte will be copied to the destination. Therefore, subsequent calls to fetch_store_string() will overwrite the terminating null byte of the most recently fetched string with the first character of the current string, leading to the "accumulation" of strings in earlier arguments in the output. Fix this by incrementing the return value of strncpy_from_user() by one if we did not hit the maximum buffer size. Link: http://lkml.kernel.org/r/20190116141629.5752-1-andreas.ziegler@fau.de Cc: Ingo Molnar <mingo@redhat.com> Cc: stable@vger.kernel.org Fixes: 5baaa59ef09e ("tracing/probes: Implement 'memory' fetch method for uprobes") Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Andreas Ziegler <andreas.ziegler@fau.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* | tracing: uprobes: Fix typo in pr_fmt stringAndreas Ziegler2019-01-171-1/+1
|/ | | | | | | | | | | | | | | The subsystem-specific message prefix for uprobes was also "trace_kprobe: " instead of "trace_uprobe: " as described in the original commit message. Link: http://lkml.kernel.org/r/20190117133023.19292-1-andreas.ziegler@fau.de Cc: Ingo Molnar <mingo@redhat.com> Cc: stable@vger.kernel.org Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Fixes: 7257634135c24 ("tracing/probe: Show subsystem name in messages") Signed-off-by: Andreas Ziegler <andreas.ziegler@fau.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* tracing/kprobes: Fix NULL pointer dereference in trace_kprobe_create()Andrea Righi2019-01-151-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is possible to trigger a NULL pointer dereference by writing an incorrectly formatted string to krpobe_events (trying to create a kretprobe omitting the symbol). Example: echo "r:event_1 " >> /sys/kernel/debug/tracing/kprobe_events That triggers this: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 #PF error: [normal kernel read fault] PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 6 PID: 1757 Comm: bash Not tainted 5.0.0-rc1+ #125 Hardware name: Dell Inc. XPS 13 9370/0F6P3V, BIOS 1.5.1 08/09/2018 RIP: 0010:kstrtoull+0x2/0x20 Code: 28 00 00 00 75 17 48 83 c4 18 5b 41 5c 5d c3 b8 ea ff ff ff eb e1 b8 de ff ff ff eb da e8 d6 36 bb ff 66 0f 1f 44 00 00 31 c0 <80> 3f 2b 55 48 89 e5 0f 94 c0 48 01 c7 e8 5c ff ff ff 5d c3 66 2e RSP: 0018:ffffb5d482e57cb8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff82b12720 RDX: ffffb5d482e57cf8 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffb5d482e57d70 R08: ffffa0c05e5a7080 R09: ffffa0c05e003980 R10: 0000000000000000 R11: 0000000040000000 R12: ffffa0c04fe87b08 R13: 0000000000000001 R14: 000000000000000b R15: ffffa0c058d749e1 FS: 00007f137c7f7740(0000) GS:ffffa0c05e580000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000497d46004 CR4: 00000000003606e0 Call Trace: ? trace_kprobe_create+0xb6/0x840 ? _cond_resched+0x19/0x40 ? _cond_resched+0x19/0x40 ? __kmalloc+0x62/0x210 ? argv_split+0x8f/0x140 ? trace_kprobe_create+0x840/0x840 ? trace_kprobe_create+0x840/0x840 create_or_delete_trace_kprobe+0x11/0x30 trace_run_command+0x50/0x90 trace_parse_run_command+0xc1/0x160 probes_write+0x10/0x20 __vfs_write+0x3a/0x1b0 ? apparmor_file_permission+0x1a/0x20 ? security_file_permission+0x31/0xf0 ? _cond_resched+0x19/0x40 vfs_write+0xb1/0x1a0 ksys_write+0x55/0xc0 __x64_sys_write+0x1a/0x20 do_syscall_64+0x5a/0x120 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fix by doing the proper argument checks in trace_kprobe_create(). Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/lkml/20190111095108.b79a2ee026185cbd62365977@kernel.org Link: http://lkml.kernel.org/r/20190111060113.GA22841@xps-13 Fixes: 6212dd29683e ("tracing/kprobes: Use dyn_event framework for kprobe events") Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Andrea Righi <righi.andrea@gmail.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* Remove 'type' argument from access_ok() functionLinus Torvalds2019-01-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument of the user address range verification function since we got rid of the old racy i386-only code to walk page tables by hand. It existed because the original 80386 would not honor the write protect bit when in kernel mode, so you had to do COW by hand before doing any user access. But we haven't supported that in a long time, and these days the 'type' argument is a purely historical artifact. A discussion about extending 'user_access_begin()' to do the range checking resulted this patch, because there is no way we're going to move the old VERIFY_xyz interface to that model. And it's best done at the end of the merge window when I've done most of my merges, so let's just get this done once and for all. This patch was mostly done with a sed-script, with manual fix-ups for the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form. There were a couple of notable cases: - csky still had the old "verify_area()" name as an alias. - the iter_iov code had magical hardcoded knowledge of the actual values of VERIFY_{READ,WRITE} (not that they mattered, since nothing really used it) - microblaze used the type argument for a debug printout but other than those oddities this should be a total no-op patch. I tried to fix up all architectures, did fairly extensive grepping for access_ok() uses, and the changes are trivial, but I may have missed something. Any missed conversion should be trivially fixable, though. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'trace-v4.21' of ↵Linus Torvalds2018-12-3122-1562/+2199
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: - Rework of the kprobe/uprobe and synthetic events to consolidate all the dynamic event code. This will make changes in the future easier. - Partial rewrite of the function graph tracing infrastructure. This will allow for multiple users of hooking onto functions to get the callback (return) of the function. This is the ground work for having kprobes and function graph tracer using one code base. - Clean up of the histogram code that will facilitate adding more features to the histograms in the future. - Addition of str_has_prefix() and a few use cases. There currently is a similar function strstart() that is used in a few places, but only returns a bool and not a length. These instances will be removed in the future to use str_has_prefix() instead. - A few other various clean ups as well. * tag 'trace-v4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (57 commits) tracing: Use the return of str_has_prefix() to remove open coded numbers tracing: Have the historgram use the result of str_has_prefix() for len of prefix tracing: Use str_has_prefix() instead of using fixed sizes tracing: Use str_has_prefix() helper for histogram code string.h: Add str_has_prefix() helper function tracing: Make function ‘ftrace_exports’ static tracing: Simplify printf'ing in seq_print_sym tracing: Avoid -Wformat-nonliteral warning tracing: Merge seq_print_sym_short() and seq_print_sym_offset() tracing: Add hist trigger comments for variable-related fields tracing: Remove hist trigger synth_var_refs tracing: Use hist trigger's var_ref array to destroy var_refs tracing: Remove open-coding of hist trigger var_ref management tracing: Use var_refs[] for hist trigger reference checking tracing: Change strlen to sizeof for hist trigger static strings tracing: Remove unnecessary hist trigger struct field tracing: Fix ftrace_graph_get_ret_stack() to use task and not current seq_buf: Use size_t for len in seq_buf_puts() seq_buf: Make seq_buf_puts() null-terminate the buffer arm64: Use ftrace_graph_get_ret_stack() instead of curr_ret_stack ...
| * tracing: Use the return of str_has_prefix() to remove open coded numbersSteven Rostedt (VMware)2018-12-223-13/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | There are several locations that compare constants to the beginning of string variables to determine what commands should be done, then the constant length is used to index into the string. This is error prone as the hard coded numbers have to match the size of the constants. Instead, use the len returned from str_has_prefix() and remove the open coded string length sizes. Cc: Joe Perches <joe@perches.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> (for trace_probe part) Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Have the historgram use the result of str_has_prefix() for len of ↵Steven Rostedt (VMware)2018-12-221-4/+5
| | | | | | | | | | | | | | | | | | | | | | prefix As str_has_prefix() returns the length on match, we can use that for the updating of the string pointer instead of recalculating the prefix size. Cc: Tom Zanussi <zanussi@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Use str_has_prefix() instead of using fixed sizesSteven Rostedt (VMware)2018-12-225-6/+6
| | | | | | | | | | | | | | | | | | There are several instances of strncmp(str, "const", 123), where 123 is the strlen of the const string to check if "const" is the prefix of str. But this can be error prone. Use str_has_prefix() instead. Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Use str_has_prefix() helper for histogram codeSteven Rostedt (VMware)2018-12-221-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The tracing histogram code contains a lot of instances of the construct: strncmp(str, "const", sizeof("const") - 1) This can be prone to bugs due to typos or bad cut and paste. Use the str_has_prefix() helper macro instead that removes the need for having two copies of the constant string. Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Make function ‘ftrace_exports’ staticMathieu Malaterre2018-12-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 478409dd683d ("tracing: Add hook to function tracing for other subsystems to use"), a new function ‘ftrace_exports’ was added. Since this function can be made static, make it so. Silence the following warning triggered using W=1: kernel/trace/trace.c:2451:6: warning: no previous prototype for ‘ftrace_exports’ [-Wmissing-prototypes] Link: http://lkml.kernel.org/r/20180516193012.25390-1-malat@debian.org Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Simplify printf'ing in seq_print_symRasmus Villemoes2018-12-221-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | trace_seq_printf(..., "%s", ...) can be done with trace_seq_puts() instead, avoiding printf overhead. In the second instance, the string we're copying was just created from an snprintf() to a stack buffer, so we might as well do that printf directly. This naturally leads to moving the declaration of the str buffer inside the CONFIG_KALLSYMS guard, which in turn will make gcc inline the function for !CONFIG_KALLSYMS (it only has a single caller, but the huge stack frame seems to make gcc not inline it for CONFIG_KALLSYMS). Link: http://lkml.kernel.org/r/20181029223542.26175-4-linux@rasmusvillemoes.dk Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Avoid -Wformat-nonliteral warningRasmus Villemoes2018-12-221-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Building with -Wformat-nonliteral, gcc complains kernel/trace/trace_output.c: In function ‘seq_print_sym’: kernel/trace/trace_output.c:356:3: warning: format not a string literal, argument types not checked [-Wformat-nonliteral] trace_seq_printf(s, fmt, name); But seq_print_sym only has a single caller which passes "%s" as fmt, so we might as well just use that directly. That also paves the way for further cleanups that will actually make that format string go away entirely. Link: http://lkml.kernel.org/r/20181029223542.26175-3-linux@rasmusvillemoes.dk Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Merge seq_print_sym_short() and seq_print_sym_offset()Rasmus Villemoes2018-12-221-27/+7
| | | | | | | | | | | | | | | | | | | | These two functions are nearly identical, so we can avoid some code duplication by moving the conditional into a common implementation. Link: http://lkml.kernel.org/r/20181029223542.26175-2-linux@rasmusvillemoes.dk Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Add hist trigger comments for variable-related fieldsTom Zanussi2018-12-221-0/+36
| | | | | | | | | | | | | | | | | | | | | | | | Add a few comments to help clarify how variable and variable reference fields are used in the code. Link: http://lkml.kernel.org/r/ea857ce948531d7bec712bbb0f38360aa1d378ec.1545161087.git.tom.zanussi@linux.intel.com Acked-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Remove hist trigger synth_var_refsTom Zanussi2018-12-221-18/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | All var_refs are now handled uniformly and there's no reason to treat the synth_refs in a special way now, so remove them and associated functions. Link: http://lkml.kernel.org/r/b4d3470526b8f0426dcec125399dad9ad9b8589d.1545161087.git.tom.zanussi@linux.intel.com Acked-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Use hist trigger's var_ref array to destroy var_refsTom Zanussi2018-12-221-5/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since every var ref for a trigger has an entry in the var_ref[] array, use that to destroy the var_refs, instead of piecemeal via the field expressions. This allows us to avoid having to keep and treat differently separate lists for the action-related references, which future patches will remove. Link: http://lkml.kernel.org/r/fad1a164f0e257c158e70d6eadbf6c586e04b2a2.1545161087.git.tom.zanussi@linux.intel.com Acked-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Remove open-coding of hist trigger var_ref managementTom Zanussi2018-12-221-18/+75
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Have create_var_ref() manage the hist trigger's var_ref list, rather than having similar code doing it in multiple places. This cleans up the code and makes sure var_refs are always accounted properly. Also, document the var_ref-related functions to make what their purpose clearer. Link: http://lkml.kernel.org/r/05ddae93ff514e66fc03897d6665231892939913.1545161087.git.tom.zanussi@linux.intel.com Acked-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Use var_refs[] for hist trigger reference checkingTom Zanussi2018-12-221-57/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since all the variable reference hist_fields are collected into hist_data->var_refs[] array, there's no need to go through all the fields looking for them, or in separate arrays like synth_var_refs[], which will be going away soon anyway. This also allows us to get rid of some unnecessary code and functions currently used for the same purpose. Link: http://lkml.kernel.org/r/1545246556.4239.7.camel@gmail.com Acked-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Change strlen to sizeof for hist trigger static stringsTom Zanussi2018-12-221-19/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | There's no need to use strlen() for static strings when the length is already known, so update trace_events_hist.c with sizeof() for those cases. Link: http://lkml.kernel.org/r/e3e754f2bd18e56eaa8baf79bee619316ebf4cfc.1545161087.git.tom.zanussi@linux.intel.com Acked-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Remove unnecessary hist trigger struct fieldTom Zanussi2018-12-221-1/+0
| | | | | | | | | | | | | | | | | | | | | | hist_field.var_idx is completely unused, so remove it. Link: http://lkml.kernel.org/r/d4e066c0f509f5f13ad3babc8c33ca6e7ddc439a.1545161087.git.tom.zanussi@linux.intel.com Acked-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Fix ftrace_graph_get_ret_stack() to use task and not currentSteven Rostedt (VMware)2018-12-221-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | The function ftrace_graph_get_ret_stack() takes a task struct descriptor but uses current as the task to perform the operations on. In pretty much all cases the task decriptor is the same as current, so this wasn't an issue. But there is a case in the ARM architecture that passes in a task that is not current, and expects a result from that task, and this code breaks it. Fixes: 51584396cff5 ("arm64: Use ftrace_graph_get_ret_stack() instead of curr_ret_stack") Reported-by: James Morse <james.morse@arm.com> Tested-by: James Morse <james.morse@arm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * fgraph: Add comment to describe ftrace_graph_get_ret_stackSteven Rostedt (VMware)2018-12-101-0/+11
| | | | | | | | | | | | | | | | | | | | | | The ret_stack should not be accessed directly via the curr_ret_stack variable on the task_struct. This is because the ret_stack is going to be converted into a series of longs and not an array of ret_stack structures. The way that a ret_stack should be retrieved is via the ftrace_graph_get_ret_stack structure, but it needs to be documented on how to use it. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * ftrace: Allow ftrace_replace_code() to be schedulableSteven Rostedt (VMware)2018-12-101-3/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function ftrace_replace_code() is the ftrace engine that does the work to modify all the nops into the calls to the function callback in all the functions being traced. The generic version which is normally called from stop machine, but an architecture can implement a non stop machine version and still use the generic ftrace_replace_code(). When an architecture does this, ftrace_replace_code() may be called from a schedulable context, where it can allow the code to be preemptible, and schedule out. In order to allow an architecture to make ftrace_replace_code() schedulable, a new command flag is added called: FTRACE_MAY_SLEEP Which can be or'd to the command that is passed to ftrace_modify_all_code() that calls ftrace_replace_code() and will have it call cond_resched() in the loop that modifies the nops into the calls to the ftrace trampolines. Link: http://lkml.kernel.org/r/20181204192903.8193-1-anders.roxell@linaro.org Link: http://lkml.kernel.org/r/20181205183303.828422192@goodmis.org Reported-by: Anders Roxell <anders.roxell@linaro.org> Tested-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Add generic event-name based remove event methodMasami Hiramatsu2018-12-101-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a generic method to remove event from dynamic event list. This is same as other system under ftrace. You just need to pass the event name with '!', e.g. # echo p:new_grp/new_event _do_fork > dynamic_events This creates an event, and # echo '!p:new_grp/new_event _do_fork' > dynamic_events Or, # echo '!p:new_grp/new_event' > dynamic_events will remove new_grp/new_event event. Note that this doesn't check the event prefix (e.g. "p:") strictly, because the "group/event" name must be unique. Link: http://lkml.kernel.org/r/154140869774.17322.8887303560398645347.stgit@devbox Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Consolidate trace_add/remove_event_call back to the nolock functionsSteven Rostedt (VMware)2018-12-104-33/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The trace_add/remove_event_call_nolock() functions were added to allow the tace_add/remove_event_call() code be called when the event_mutex lock was already taken. Now that all callers are done within the event_mutex, there's no reason to have two different interfaces. Remove the current wrapper trace_add/remove_event_call()s and rename the _nolock versions back to the original names. Link: http://lkml.kernel.org/r/154140866955.17322.2081425494660638846.stgit@devbox Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Remove unneeded synth_event_mutexMasami Hiramatsu2018-12-081-23/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rmove unneeded synth_event_mutex. This mutex protects the reference count in synth_event, however, those operational points are already protected by event_mutex. 1. In __create_synth_event() and create_or_delete_synth_event(), those synth_event_mutex clearly obtained right after event_mutex. 2. event_hist_trigger_func() is trigger_hist_cmd.func() which is called by trigger_process_regex(), which is a part of event_trigger_regex_write() and this function takes event_mutex. 3. hist_unreg_all() is trigger_hist_cmd.unreg_all() which is called by event_trigger_regex_open() and it takes event_mutex. 4. onmatch_destroy() and onmatch_create() have long call tree, but both are finally invoked from event_trigger_regex_write() and event_trace_del_tracer(), former takes event_mutex, and latter ensures called under event_mutex locked. Finally, I ensured there is no resource conflict. For safety, I added lockdep_assert_held(&event_mutex) for each function. Link: http://lkml.kernel.org/r/154140864134.17322.4796059721306031894.stgit@devbox Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Use dyn_event framework for synthetic eventsMasami Hiramatsu2018-12-083-98/+176
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use dyn_event framework for synthetic events. This shows synthetic events on "tracing/dynamic_events" file in addition to tracing/synthetic_events interface. User can also define new events via tracing/dynamic_events with "s:" prefix. So, the new syntax is below; s:[synthetic/]EVENT_NAME TYPE ARG; [TYPE ARG;]... To remove events via tracing/dynamic_events, you can use "-:" prefix as same as other events. Link: http://lkml.kernel.org/r/154140861301.17322.15454611233735614508.stgit@devbox Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing/uprobes: Use dyn_event framework for uprobe eventsMasami Hiramatsu2018-12-082-130/+149
| | | | | | | | | | | | | | | | | | | | | | | | | | Use dyn_event framework for uprobe events. This shows uprobe events on "dynamic_events" file. User can also define new uprobe events via dynamic_events. Link: http://lkml.kernel.org/r/154140858481.17322.9091293846515154065.stgit@devbox Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing/kprobes: Use dyn_event framework for kprobe eventsMasami Hiramatsu2018-12-084-145/+204
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Use dyn_event framework for kprobe events. This shows kprobe events on "tracing/dynamic_events" file. User can also define new events via tracing/dynamic_events. Link: http://lkml.kernel.org/r/154140855646.17322.6619219995865980392.stgit@devbox Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Add unified dynamic event frameworkMasami Hiramatsu2018-12-085-0/+337
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add unified dynamic event framework for ftrace kprobes, uprobes and synthetic events. Those dynamic events can be co-exist on same file because those syntax doesn't overlap. This introduces a framework part which provides a unified tracefs interface and operations. Link: http://lkml.kernel.org/r/154140852824.17322.12250362185969352095.stgit@devbox Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Integrate similar probe argument parsersMasami Hiramatsu2018-12-084-96/+50
| | | | | | | | | | | | | | | | | | | | | | | | Integrate similar argument parsers for kprobes and uprobes events into traceprobe_parse_probe_arg(). Link: http://lkml.kernel.org/r/154140850016.17322.9836787731210512176.stgit@devbox Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Simplify creation and deletion of synthetic eventsMasami Hiramatsu2018-12-081-35/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since the event_mutex and synth_event_mutex ordering issue is gone, we can skip existing event check when adding or deleting events, and some redundant code in error path. This changes release_all_synth_events() to abort the process when it hits any error and returns the error code. It succeeds only if it has no error. Link: http://lkml.kernel.org/r/154140847194.17322.17960275728005067803.stgit@devbox Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Lock event_mutex before synth_event_mutexMasami Hiramatsu2018-12-082-20/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | synthetic event is using synth_event_mutex for protecting synth_event_list, and event_trigger_write() path acquires locks as below order. event_trigger_write(event_mutex) ->trigger_process_regex(trigger_cmd_mutex) ->event_hist_trigger_func(synth_event_mutex) On the other hand, synthetic event creation and deletion paths call trace_add_event_call() and trace_remove_event_call() which acquires event_mutex. In that case, if we keep the synth_event_mutex locked while registering/unregistering synthetic events, its dependency will be inversed. To avoid this issue, current synthetic event is using a 2 phase process to create/delete events. For example, it searches existing events under synth_event_mutex to check for event-name conflicts, and unlocks synth_event_mutex, then registers a new event under event_mutex locked. Finally, it locks synth_event_mutex and tries to add the new event to the list. But it can introduce complexity and a chance for name conflicts. To solve this simpler, this introduces trace_add_event_call_nolock() and trace_remove_event_call_nolock() which don't acquire event_mutex inside. synthetic event can lock event_mutex before synth_event_mutex to solve the lock dependency issue simpler. Link: http://lkml.kernel.org/r/154140844377.17322.13781091165954002713.stgit@devbox Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing/uprobes: Add busy check when cleanup all uprobesMasami Hiramatsu2018-12-081-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a busy check loop in cleanup_all_probes() before trying to remove all events in uprobe_events, the same way that kprobe_events does. Without this change, writing null to uprobe_events will try to remove events but if one of them is enabled, it will stop there leaving some events cleared and others not clceared. With this change, writing null to uprobe_events makes sure all events are not enabled before removing events. So, it clears all events, or returns an error (-EBUSY) with keeping all events. Link: http://lkml.kernel.org/r/154140841557.17322.12653952888762532401.stgit@devbox Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Change default buffer_percent to 50Steven Rostedt (VMware)2018-12-081-1/+1
| | | | | | | | | | | | | | | | | | After running several tests, it appears that having the reader wait till half the buffer is full before starting to read (and causing its own events to fill up the ring buffer constantly), works well. It keeps trace-cmd (the main user of this interface) from dominating the traces it records. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * tracing: Add tracefs file buffer_percentageSteven Rostedt (VMware)2018-12-083-17/+77
| | | | | | | | | | | | | | | | Add a "buffer_percentage" file, that allows users to specify how much of the buffer (percentage of pages) need to be filled before waking up a task blocked on a per cpu trace_pipe_raw file. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * ring-buffer: Add percentage of ring buffer full to wake up readerSteven Rostedt (VMware)2018-12-082-9/+70
| | | | | | | | | | | | | | | | | | | | Instead of just waiting for a page to be full before waking up a pending reader, allow the reader to pass in a "percentage" of pages that have content before waking up a reader. This should help keep the process of reading the events not cause wake ups that constantly cause reading of the buffer. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
OpenPOWER on IntegriCloud