| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| | | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The config option for TRACING_MAP has "default n", which is not needed
because the default of configs is 'n'.
Also, since the TRACING_MAP has no config prompt, there's no reason to
include "If in doubt, say N" in the help text.
Fixed a typo in the comments of tracing_map.h.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
| | | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Add tracing_map, a special-purpose lock-free map for tracing.
tracing_map is designed to aggregate or 'sum' one or more values
associated with a specific object of type tracing_map_elt, which
is associated by the map to a given key.
It provides various hooks allowing per-tracer customization and is
separated out into a separate file in order to allow it to be shared
between multiple tracers, but isn't meant to be generally used outside
of that context.
The tracing_map implementation was inspired by lock-free map
algorithms originated by Dr. Cliff Click:
http://www.azulsystems.com/blog/cliff/2007-03-26-non-blocking-hashtable
http://www.azulsystems.com/events/javaone_2007/2007_LockFreeHash.pdf
Link: http://lkml.kernel.org/r/b43d68d1add33582a396f553c8ef705a33a6a748.1449767187.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
| | | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Add the infrastructure needed to have the PIDs in set_event_pid to
automatically add PIDs of the children of the tasks that have their PIDs in
set_event_pid. This will also remove PIDs from set_event_pid when a task
exits
This is implemented by adding hooks into the fork and exit tracepoints. On
fork, the PIDs are added to the list, and on exit, they are removed.
Add a new option called event_fork that when set, PIDs in set_event_pid will
automatically get their children PIDs added when they fork, as well as any
task that exits will have its PID removed from set_event_pid.
This works for instances as well.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
| | | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
In order to add the ability to let tasks that are filtered by the events
have their children also be traced on fork (and then not traced on exit),
convert the array into a pid bitmask. Most of the time the number of pids is
only 32768 pids or a 4k bitmask, which is the same size as the default list
currently is, and that list could grow if more pids are listed.
This also greatly simplifies the code.
Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
| | |/ / /
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The name "check_ignore_pid" is confusing in trying to figure out if the pid
should be ignored or not. Rename it to "ignore_this_task" which is pretty
straight forward, as a task (not a pid) is passed in, and should if true
should be ignored.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
| |\ \ \ \
| |_|_|/
|/| | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull livepatching updates from Jiri Kosina:
- remove of our own implementation of architecture-specific relocation
code and leveraging existing code in the module loader to perform
arch-dependent work, from Jessica Yu.
The relevant patches have been acked by Rusty (for module.c) and
Heiko (for s390).
- live patching support for ppc64le, which is a joint work of Michael
Ellerman and Torsten Duwe. This is coming from topic branch that is
share between livepatching.git and ppc tree.
- addition of livepatching documentation from Petr Mladek
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch: make object/func-walking helpers more robust
livepatch: Add some basic livepatch documentation
powerpc/livepatch: Add live patching support on ppc64le
powerpc/livepatch: Add livepatch stack to struct thread_info
powerpc/livepatch: Add livepatch header
livepatch: Allow architectures to specify an alternate ftrace location
ftrace: Make ftrace_location_range() global
livepatch: robustify klp_register_patch() API error checking
Documentation: livepatch: outline Elf format and requirements for patch modules
livepatch: reuse module loader code to write relocations
module: s390: keep mod_arch_specific for livepatch modules
module: preserve Elf information for livepatch modules
Elf: add livepatch-specific Elf constants
|
| | |\ \ \
| | | |/
| | |/|
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux into for-4.7/livepatching-ppc64le
Pull livepatching support for ppc64 architecture from Michael Ellerman.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
| | | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
In order to support live patching on powerpc we would like to call
ftrace_location_range(), so make it global.
Signed-off-by: Torsten Duwe <duwe@suse.de>
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
| |\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Pull networking updates from David Miller:
"Highlights:
1) Support SPI based w5100 devices, from Akinobu Mita.
2) Partial Segmentation Offload, from Alexander Duyck.
3) Add GMAC4 support to stmmac driver, from Alexandre TORGUE.
4) Allow cls_flower stats offload, from Amir Vadai.
5) Implement bpf blinding, from Daniel Borkmann.
6) Optimize _ASYNC_ bit twiddling on sockets, unless the socket is
actually using FASYNC these atomics are superfluous. From Eric
Dumazet.
7) Run TCP more preemptibly, also from Eric Dumazet.
8) Support LED blinking, EEPROM dumps, and rxvlan offloading in mlx5e
driver, from Gal Pressman.
9) Allow creating ppp devices via rtnetlink, from Guillaume Nault.
10) Improve BPF usage documentation, from Jesper Dangaard Brouer.
11) Support tunneling offloads in qed, from Manish Chopra.
12) aRFS offloading in mlx5e, from Maor Gottlieb.
13) Add RFS and RPS support to SCTP protocol, from Marcelo Ricardo
Leitner.
14) Add MSG_EOR support to TCP, this allows controlling packet
coalescing on application record boundaries for more accurate
socket timestamp sampling. From Martin KaFai Lau.
15) Fix alignment of 64-bit netlink attributes across the board, from
Nicolas Dichtel.
16) Per-vlan stats in bridging, from Nikolay Aleksandrov.
17) Several conversions of drivers to ethtool ksettings, from Philippe
Reynes.
18) Checksum neutral ILA in ipv6, from Tom Herbert.
19) Factorize all of the various marvell dsa drivers into one, from
Vivien Didelot
20) Add VF support to qed driver, from Yuval Mintz"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1649 commits)
Revert "phy dp83867: Fix compilation with CONFIG_OF_MDIO=m"
Revert "phy dp83867: Make rgmii parameters optional"
r8169: default to 64-bit DMA on recent PCIe chips
phy dp83867: Make rgmii parameters optional
phy dp83867: Fix compilation with CONFIG_OF_MDIO=m
bpf: arm64: remove callee-save registers use for tmp registers
asix: Fix offset calculation in asix_rx_fixup() causing slow transmissions
switchdev: pass pointer to fib_info instead of copy
net_sched: close another race condition in tcf_mirred_release()
tipc: fix nametable publication field in nl compat
drivers: net: Don't print unpopulated net_device name
qed: add support for dcbx.
ravb: Add missing free_irq() calls to ravb_close()
qed: Remove a stray tab
net: ethernet: fec-mpc52xx: use phy_ethtool_{get|set}_link_ksettings
net: ethernet: fec-mpc52xx: use phydev from struct net_device
bpf, doc: fix typo on bpf_asm descriptions
stmmac: hardware TX COE doesn't work when force_thresh_dma_mode is set
net: ethernet: fs-enet: use phy_ethtool_{get|set}_link_ksettings
net: ethernet: fs-enet: use phydev from struct net_device
...
|
| | |\ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
In netdevice.h we removed the structure in net-next that is being
changes in 'net'. In macsec.c and rtnetlink.c we have overlaps
between fixes in 'net' and the u64 attribute changes in 'net-next'.
The mlx5 conflicts have to do with vxlan support dependencies.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
This patch adds a new helper for cls/act programs that can push events
to user space applications. For networking, this can be f.e. for sampling,
debugging, logging purposes or pushing of arbitrary wake-up events. The
idea is similar to a43eec304259 ("bpf: introduce bpf_perf_event_output()
helper") and 39111695b1b8 ("samples: bpf: add bpf_perf_event_output example").
The eBPF program utilizes a perf event array map that user space populates
with fds from perf_event_open(), the eBPF program calls into the helper
f.e. as skb_event_output(skb, &my_map, BPF_F_CURRENT_CPU, raw, sizeof(raw))
so that the raw data is pushed into the fd f.e. at the map index of the
current CPU.
User space can poll/mmap/etc on this and has a data channel for receiving
events that can be post-processed. The nice thing is that since the eBPF
program and user space application making use of it are tightly coupled,
they can define their own arbitrary raw data format and what/when they
want to push.
While f.e. packet headers could be one part of the meta data that is being
pushed, this is not a substitute for things like packet sockets as whole
packet is not being pushed and push is only done in a single direction.
Intention is more of a generically usable, efficient event pipe to applications.
Workflow is that tc can pin the map and applications can attach themselves
e.g. after cls/act setup to one or multiple map slots, demuxing is done by
the eBPF program.
Adding this facility is with minimal effort, it reuses the helper
introduced in a43eec304259 ("bpf: introduce bpf_perf_event_output() helper")
and we get its functionality for free by overloading its BPF_FUNC_ identifier
for cls/act programs, ctx is currently unused, but will be made use of in
future. Example will be added to iproute2's BPF example files.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Add a BPF_F_CURRENT_CPU flag to optimize the use-case where user space has
per-CPU ring buffers and the eBPF program pushes the data into the current
CPU's ring buffer which saves us an extra helper function call in eBPF.
Also, make sure to properly reserve the remaining flags which are not used.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Two new functions in bpf contain a cast from a 'u64' to a
pointer. This works on 64-bit architectures but causes a warning
on all 32-bit architectures:
kernel/trace/bpf_trace.c: In function 'bpf_perf_event_output_tp':
kernel/trace/bpf_trace.c:350:13: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
u64 ctx = *(long *)r1;
This changes the cast to first convert the u64 argument into a uintptr_t,
which is guaranteed to be the same size as a pointer.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 9940d67c93b5 ("bpf: support bpf_get_stackid() and bpf_perf_event_output() in tracepoint programs")
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
This patch converts all helpers that can use ARG_PTR_TO_RAW_STACK as argument
type. For tc programs this is bpf_skb_load_bytes(), bpf_skb_get_tunnel_key(),
bpf_skb_get_tunnel_opt(). For tracing, this optimizes bpf_get_current_comm()
and bpf_probe_read(). The check in bpf_skb_load_bytes() for MAX_BPF_STACK can
also be removed since the verifier already makes sure we stay within bounds
on stack buffers.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
during bpf program loading remember the last byte of ctx access
and at the time of attaching the program to tracepoint check that
the program doesn't access bytes beyond defined in tracepoint fields
This also disallows access to __dynamic_array fields, but can be
relaxed in the future.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
programs
needs two wrapper functions to fetch 'struct pt_regs *' to convert
tracepoint bpf context into kprobe bpf context to reuse existing
helper functions
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
register tracepoint bpf program type and let it call the same set
of helper functions as BPF_PROG_TYPE_KPROBE
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
split allows to move expensive update of 'struct trace_entry' to later phase.
Repurpose unused 1st argument of perf_tp_event() to indicate event type.
While splitting use temp variable 'rctx' instead of '*rctx' to avoid
unnecessary loads done by the compiler due to -fno-strict-aliasing
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |_|_|/
| |/| | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
avoid memset in perf_fetch_caller_regs, since it's the critical path of all tracepoints.
It's called from perf_sw_event_sched, perf_event_task_sched_in and all of perf_trace_##call
with this_cpu_ptr(&__perf_regs[..]) which are zero initialized by perpcu init logic and
subsequent call to perf_arch_fetch_caller_regs initializes the same fields on all archs,
so we can safely drop memset from all of the above cases and move it into
perf_ftrace_function_call that calls it with stack allocated pt_regs.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Pull core block layer updates from Jens Axboe:
"This is the core block IO changes for this merge window. Nothing
earth shattering in here, it's mostly just fixes. In detail:
- Fix for a long standing issue where wrong ordering in blk-mq caused
order_to_size() to spew a warning. From Bart.
- Async discard support from Christoph. Basically just splitting our
sync interface into a submit + wait part.
- Add a cleaner interface for flagging whether a device has a write
back cache or not. We've previously overloaded blk_queue_flush()
with this, but let's make it more explicit. Drivers cleaned up and
updated in the drivers pull request. From me.
- Fix for a double check for whether IO accounting is enabled or not.
From Michael Callahan.
- Fix for the async discard from Mike Snitzer, reinstating the early
EOPNOTSUPP return if the device doesn't support discards.
- Also from Mike, export bio_inc_remaining() so dm can drop it's
private copy of it.
- From Ming Lin, add support for passing in an offset for request
payloads.
- Tag function export from Sagi, which will be used in NVMe in the
drivers pull.
- Two blktrace related fixes from Shaohua.
- Propagate NOMERGE flag when making a request from a bio, also from
Shaohua.
- An optimization to not parse cgroup paths in blk-throttle, if we
don't need to. From Shaohua"
* 'for-4.7/core' of git://git.kernel.dk/linux-block:
blk-mq: fix undefined behaviour in order_to_size()
blk-throttle: don't parse cgroup path if trace isn't enabled
blktrace: add missed mask name
blktrace: delete garbage for message trace
block: make bio_inc_remaining() interface accessible again
block: reinstate early return of -EOPNOTSUPP from blkdev_issue_discard
block: Minor blk_account_io_start usage cleanup
block: add __blkdev_issue_discard
block: remove struct bio_batch
block: copy NOMERGE flag from bio to request
block: add ability to flag write back caching on a device
blk-mq: Export tagset iter function
block: add offset in blk_add_request_payload()
writeback: Fix performance regression in wb_over_bg_thresh()
|
| | | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
BLK_TC_NOTIFY is missed in mask_maps, so we can't print out notify or
set mask with 'notify' name.
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | |/ / / /
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
commit f4a1d08ce65 introduces a regression. Originally for
BLK_TN_MESSAGE, we add message in trace and return. The commit ignores
the early return and add garbage info.
Signed-off-by: Shaohua Li <shli@fb.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| |\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing ring-buffer fixes from Steven Rostedt:
"Hao Qin reported an integer overflow possibility with signed and
unsigned numbers in the ring-buffer code.
https://bugzilla.kernel.org/show_bug.cgi?id=118001
At first I did not think this was too much of an issue, because the
overflow would be caught later when either too much data was allocated
or it would trigger RB_WARN_ON() which shuts down the ring buffer.
But looking closer into it, I found that the right settings could
bypass the checks and crash the kernel. Luckily, this is only
accessible by root.
The first fix is to convert all the variables into long, such that we
don't get into issues between 32 bit variables being assigned 64 bit
ones. This fixes the RB_WARN_ON() triggering.
The next fix is to get rid of a duplicate DIV_ROUND_UP() that when
called twice with the right value, can cause a kernel crash.
The first DIV_ROUND_UP() is to normalize the input and it is checked
against the minimum allowable value. But then DIV_ROUND_UP() is
called again, which can overflow due to the (a + b - 1)/b, logic. The
first called upped the value, the second can overflow (with the +b
part).
The second call to DIV_ROUND_UP() came in via a second change a while
ago and the code is cleaned up to remove it"
* tag 'trace-fixes-v4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ring-buffer: Prevent overflow of size in ring_buffer_resize()
ring-buffer: Use long for nr_pages to avoid overflow failures
|
| | | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
If the size passed to ring_buffer_resize() is greater than MAX_LONG - BUF_PAGE_SIZE
then the DIV_ROUND_UP() will return zero.
Here's the details:
# echo 18014398509481980 > /sys/kernel/debug/tracing/buffer_size_kb
tracing_entries_write() processes this and converts kb to bytes.
18014398509481980 << 10 = 18446744073709547520
and this is passed to ring_buffer_resize() as unsigned long size.
size = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
Where DIV_ROUND_UP(a, b) is (a + b - 1)/b
BUF_PAGE_SIZE is 4080 and here
18446744073709547520 + 4080 - 1 = 18446744073709551599
where 18446744073709551599 is still smaller than 2^64
2^64 - 18446744073709551599 = 17
But now 18446744073709551599 / 4080 = 4521260802379792
and size = size * 4080 = 18446744073709551360
This is checked to make sure its still greater than 2 * 4080,
which it is.
Then we convert to the number of buffer pages needed.
nr_page = DIV_ROUND_UP(size, BUF_PAGE_SIZE)
but this time size is 18446744073709551360 and
2^64 - (18446744073709551360 + 4080 - 1) = -3823
Thus it overflows and the resulting number is less than 4080, which makes
3823 / 4080 = 0
an nr_pages is set to this. As we already checked against the minimum that
nr_pages may be, this causes the logic to fail as well, and we crash the
kernel.
There's no reason to have the two DIV_ROUND_UP() (that's just result of
historical code changes), clean up the code and fix this bug.
Cc: stable@vger.kernel.org # 3.5+
Fixes: 83f40318dab00 ("ring-buffer: Make removal of ring buffer pages atomic")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
| | | |/ / /
| |/| | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The size variable to change the ring buffer in ftrace is a long. The
nr_pages used to update the ring buffer based on the size is int. On 64 bit
machines this can cause an overflow problem.
For example, the following will cause the ring buffer to crash:
# cd /sys/kernel/debug/tracing
# echo 10 > buffer_size_kb
# echo 8556384240 > buffer_size_kb
Then you get the warning of:
WARNING: CPU: 1 PID: 318 at kernel/trace/ring_buffer.c:1527 rb_update_pages+0x22f/0x260
Which is:
RB_WARN_ON(cpu_buffer, nr_removed);
Note each ring buffer page holds 4080 bytes.
This is because:
1) 10 causes the ring buffer to have 3 pages.
(10kb requires 3 * 4080 pages to hold)
2) (2^31 / 2^10 + 1) * 4080 = 8556384240
The value written into buffer_size_kb is shifted by 10 and then passed
to ring_buffer_resize(). 8556384240 * 2^10 = 8761737461760
3) The size passed to ring_buffer_resize() is then divided by BUF_PAGE_SIZE
which is 4080. 8761737461760 / 4080 = 2147484672
4) nr_pages is subtracted from the current nr_pages (3) and we get:
2147484669. This value is saved in a signed integer nr_pages_to_update
5) 2147484669 is greater than 2^31 but smaller than 2^32, a signed int
turns into the value of -2147482627
6) As the value is a negative number, in update_pages_handler() it is
negated and passed to rb_remove_pages() and 2147482627 pages will
be removed, which is much larger than 3 and it causes the warning
because not all the pages asked to be removed were removed.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=118001
Cc: stable@vger.kernel.org # 2.6.28+
Fixes: 7a8e76a3829f1 ("tracing: unified trace buffer")
Reported-by: Hao Qin <QEver.cn@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
| |\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"The majority of changes go into the cpufreq subsystem this time.
To me, quite obviously, the biggest ticket item is the new "schedutil"
governor. Interestingly enough, it's the first new cpufreq governor
since the beginning of the git era (except for some out-of-the-tree
ones).
There are two main differences between it and the existing governors.
First, it uses the information provided by the scheduler directly for
making its decisions, so it doesn't have to track anything by itself.
Second, it can invoke drivers (supporting that feature) to adjust CPU
performance right away without having to spawn work items to be
executed in process context or similar. Currently, the acpi-cpufreq
driver is the only one supporting that mode of operation, but then it
is used on a large number of systems.
The "schedutil" governor as included here is very simple and mostly
regarded as a foundation for future work on the integration of the
scheduler with CPU power management (in fact, there is work in
progress on top of it already). Nevertheless it works and the
preliminary results obtained with it are encouraging.
There also is some consolidation of CPU frequency management for ARM
platforms that can add their machine IDs the the new stub dt-platdev
driver now and that will take care of creating the requisite platform
device for cpufreq-dt, so it is not necessary to do that in platform
code any more. Several ARM platforms are switched over to using this
generic mechanism.
In addition to that, the intel_pstate driver is now going to respect
CPU frequency limits set by the platform firmware (or a BMC) and
provided via the ACPI _PPC object.
The devfreq subsystem is getting a new "passive" governor for SoCs
subsystems that will depend on somebody else to manage their voltage
rails and its support for Samsung Exynos SoCs is consolidated.
The rest is support for new hardware (Intel Broxton support in
intel_idle for one example), bug fixes, optimizations and cleanups in
a number of places.
Specifics:
- New cpufreq "schedutil" governor (making decisions based on CPU
utilization information provided by the scheduler and capable of
switching CPU frequencies right away if the underlying driver
supports that) and support for fast frequency switching in the
acpi-cpufreq driver (Rafael Wysocki)
- Consolidation of CPU frequency management on ARM platforms allowing
them to get rid of some platform-specific boilerplate code if they
are going to use the cpufreq-dt driver (Viresh Kumar, Finley Xiao,
Marc Gonzalez)
- Support for ACPI _PPC and CPU frequency limits in the intel_pstate
driver (Srinivas Pandruvada)
- Fixes and cleanups in the cpufreq core and generic governor code
(Rafael Wysocki, Sai Gurrappadi)
- intel_pstate driver optimizations and cleanups (Rafael Wysocki,
Philippe Longepe, Chen Yu, Joe Perches)
- cpufreq powernv driver fixes and cleanups (Akshay Adiga, Shilpasri
Bhat)
- cpufreq qoriq driver fixes and cleanups (Jia Hongtao)
- ACPI cpufreq driver cleanups (Viresh Kumar)
- Assorted cpufreq driver updates (Ashwin Chaugule, Geliang Tang,
Javier Martinez Canillas, Paul Gortmaker, Sudeep Holla)
- Assorted cpufreq fixes and cleanups (Joe Perches, Arnd Bergmann)
- Fixes and cleanups in the OPP (Operating Performance Points)
framework, mostly related to OPP sharing, and reorganization of
OF-dependent code in it (Viresh Kumar, Arnd Bergmann, Sudeep Holla)
- New "passive" governor for devfreq (for SoC subsystems that will
rely on someone else for the management of their power resources)
and consolidation of devfreq support for Exynos platforms, coding
style and typo fixes for devfreq (Chanwoo Choi, MyungJoo Ham)
- PM core fixes and cleanups, mostly to make it work better with the
generic power domains (genpd) framework, and updates for that
framework (Ulf Hansson, Thierry Reding, Colin Ian King)
- Intel Broxton support for the intel_idle driver (Len Brown)
- cpuidle core optimization and fix (Daniel Lezcano, Dave Gerlach)
- ARM cpuidle cleanups (Jisheng Zhang)
- Intel Kabylake support for the RAPL power capping driver (Jacob
Pan)
- AVS (Adaptive Voltage Switching) rockchip-io driver update (Heiko
Stuebner)
- Updates for the cpupower tool (Arjun Sreedharan, Colin Ian King,
Mattia Dongili, Thomas Renninger)"
* tag 'pm-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (112 commits)
intel_pstate: Clean up get_target_pstate_use_performance()
intel_pstate: Use sample.core_avg_perf in get_avg_pstate()
intel_pstate: Clarify average performance computation
intel_pstate: Avoid unnecessary synchronize_sched() during initialization
cpufreq: schedutil: Make default depend on CONFIG_SMP
cpufreq: powernv: del_timer_sync when global and local pstate are equal
cpufreq: powernv: Move smp_call_function_any() out of irq safe block
intel_pstate: Clean up intel_pstate_get()
cpufreq: schedutil: Make it depend on CONFIG_SMP
cpufreq: governor: Fix handling of special cases in dbs_update()
PM / OPP: Move CONFIG_OF dependent code in a separate file
cpufreq: intel_pstate: Ignore _PPC processing under HWP
cpufreq: arm_big_little: use generic OPP functions for {init, free}_opp_table
PM / OPP: add non-OF versions of dev_pm_opp_{cpumask_, }remove_table
cpufreq: tango: Use generic platdev driver
PM / OPP: pass cpumask by reference
cpufreq: Fix GOV_LIMITS handling for the userspace governor
cpupower: fix potential memory leak
PM / devfreq: style/typo fixes
PM / devfreq: exynos: Add the detailed correlation for Exynos5422 bus
..
|
| | |\ \ \ \ \
| | |/ / / /
| |/| | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
* pm-cpufreq: (63 commits)
intel_pstate: Clean up get_target_pstate_use_performance()
intel_pstate: Use sample.core_avg_perf in get_avg_pstate()
intel_pstate: Clarify average performance computation
intel_pstate: Avoid unnecessary synchronize_sched() during initialization
cpufreq: schedutil: Make default depend on CONFIG_SMP
cpufreq: powernv: del_timer_sync when global and local pstate are equal
cpufreq: powernv: Move smp_call_function_any() out of irq safe block
intel_pstate: Clean up intel_pstate_get()
cpufreq: schedutil: Make it depend on CONFIG_SMP
cpufreq: governor: Fix handling of special cases in dbs_update()
cpufreq: intel_pstate: Ignore _PPC processing under HWP
cpufreq: arm_big_little: use generic OPP functions for {init, free}_opp_table
cpufreq: tango: Use generic platdev driver
cpufreq: Fix GOV_LIMITS handling for the userspace governor
cpufreq: mvebu: Move cpufreq code into drivers/cpufreq/
cpufreq: dt: Kill platform-data
mvebu: Use dev_pm_opp_set_sharing_cpus() to mark OPP tables as shared
cpufreq: dt: Identify cpu-sharing for platforms without operating-points-v2
cpufreq: governor: Change confusing struct field and variable names
cpufreq: intel_pstate: Enable PPC enforcement for servers
...
|
| | | |/ / /
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Add a new cpufreq scaling governor, called "schedutil", that uses
scheduler-provided CPU utilization information as input for making
its decisions.
Doing that is possible after commit 34e2c555f3e1 (cpufreq: Add
mechanism for registering utilization update callbacks) that
introduced cpufreq_update_util() called by the scheduler on
utilization changes (from CFS) and RT/DL task status updates.
In particular, CPU frequency scaling decisions may be based on
the the utilization data passed to cpufreq_update_util() by CFS.
The new governor is relatively simple.
The frequency selection formula used by it depends on whether or not
the utilization is frequency-invariant. In the frequency-invariant
case the new CPU frequency is given by
next_freq = 1.25 * max_freq * util / max
where util and max are the last two arguments of cpufreq_update_util().
In turn, if util is not frequency-invariant, the maximum frequency in
the above formula is replaced with the current frequency of the CPU:
next_freq = 1.25 * curr_freq * util / max
The coefficient 1.25 corresponds to the frequency tipping point at
(util / max) = 0.8.
All of the computations are carried out in the utilization update
handlers provided by the new governor. One of those handlers is
used for cpufreq policies shared between multiple CPUs and the other
one is for policies with one CPU only (and therefore it doesn't need
to use any extra synchronization means).
The governor supports fast frequency switching if that is supported
by the cpufreq driver in use and possible for the given policy.
In the fast switching case, all operations of the governor take
place in its utilization update handlers. If fast switching cannot
be used, the frequency switch operations are carried out with the
help of a work item which only calls __cpufreq_driver_target()
(under a mutex) to trigger a frequency update (to a value already
computed beforehand in one of the utilization update handlers).
Currently, the governor treats all of the RT and DL tasks as
"unknown utilization" and sets the frequency to the allowed
maximum when updated from the RT or DL sched classes. That
heavy-handed approach should be replaced with something more
subtle and specifically targeted at RT and DL tasks.
The governor shares some tunables management code with the
"ondemand" and "conservative" governors and uses some common
definitions from cpufreq_governor.h, but apart from that it
is stand-alone.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
|
| |\| | | |
| | | | |
| | | | |
| | | | | |
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | |/ / /
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Currently register functions for events will be called
through the 'reg' field of event class directly without
any check when seting up triggers.
Triggers for events that don't support register through
debug fs (events under events/ftrace are for trace-cmd to
read event format, and most of them don't have a register
function except events/ftrace/functionx) can't be enabled
at all, and an oops will be hit when setting up trigger
for those events, so just not creating them is an easy way
to avoid the oops.
Link: http://lkml.kernel.org/r/1462275274-3911-1-git-send-email-chuhu@redhat.com
Cc: stable@vger.kernel.org # 3.14+
Fixes: 85f2b08268c01 ("tracing: Add basic event trigger framework")
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
| |\| | |
| | | |
| | | |
| | | | |
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
KASAN needs to know whether the allocation happens in an IRQ handler.
This lets us strip everything below the IRQ entry point to reduce the
number of unique stack traces needed to be stored.
Move the definition of __irq_entry to <linux/interrupt.h> so that the
users don't need to pull in <linux/ftrace.h>. Also introduce the
__softirq_entry macro which is similar to __irq_entry, but puts the
corresponding functions to the .softirqentry.text section.
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |\ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"Nothing major this round. Mostly small clean ups and fixes.
Some visible changes:
- A new flag was added to distinguish traces done in NMI context.
- Preempt tracer now shows functions where preemption is disabled but
interrupts are still enabled.
Other notes:
- Updates were done to function tracing to allow better performance
with perf.
- Infrastructure code has been added to allow for a new histogram
feature for recording live trace event histograms that can be
configured by simple user commands. The feature itself was just
finished, but needs a round in linux-next before being pulled.
This only includes some infrastructure changes that will be needed"
* tag 'trace-v4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (22 commits)
tracing: Record and show NMI state
tracing: Fix trace_printk() to print when not using bprintk()
tracing: Remove redundant reset per-CPU buff in irqsoff tracer
x86: ftrace: Fix the misleading comment for arch/x86/kernel/ftrace.c
tracing: Fix crash from reading trace_pipe with sendfile
tracing: Have preempt(irqs)off trace preempt disabled functions
tracing: Fix return while holding a lock in register_tracer()
ftrace: Use kasprintf() in ftrace_profile_tracefs()
ftrace: Update dynamic ftrace calls only if necessary
ftrace: Make ftrace_hash_rec_enable return update bool
tracing: Fix typoes in code comment and printk in trace_nop.c
tracing, writeback: Replace cgroup path to cgroup ino
tracing: Use flags instead of bool in trigger structure
tracing: Add an unreg_all() callback to trigger commands
tracing: Add needs_rec flag to event triggers
tracing: Add a per-event-trigger 'paused' field
tracing: Add get_syscall_name()
tracing: Add event record param to trigger_ops.func()
tracing: Make event trigger functions available
tracing: Make ftrace_event_field checking functions available
...
|
| | | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The latency tracer format has a nice column to indicate IRQ state, but
this is not able to tell us about NMI state.
When tracing perf interrupt handlers (which often run in NMI context)
it is very useful to see how the events nest.
Link: http://lkml.kernel.org/r/20160318153022.105068893@infradead.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
| | | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The trace_printk() code will allocate extra buffers if the compile detects
that a trace_printk() is used. To do this, the format of the trace_printk()
is saved to the __trace_printk_fmt section, and if that section is bigger
than zero, the buffers are allocated (along with a message that this has
happened).
If trace_printk() uses a format that is not a constant, and thus something
not guaranteed to be around when the print happens, the compiler optimizes
the fmt out, as it is not used, and the __trace_printk_fmt section is not
filled. This means the kernel will not allocate the special buffers needed
for the trace_printk() and the trace_printk() will not write anything to the
tracing buffer.
Adding a "__used" to the variable in the __trace_printk_fmt section will
keep it around, even though it is set to NULL. This will keep the string
from being printed in the debugfs/tracing/printk_formats section as it is
not needed.
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Fixes: 07d777fe8c398 "tracing: Add percpu buffers for trace_printk()"
Cc: stable@vger.kernel.org # v3.5+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
| | | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
There is no reason to do it twice: from commit b6f11df26fdc28
("trace: Call tracing_reset_online_cpus before tracer->init()")
resetting of per-CPU buffers done before tracer->init() call.
tracer->init() calls {irqs,preempt,preemptirqs}off_tracer_init() and it
calls __irqsoff_tracer_init(), which resets per-CPU ringbuffer second
time.
It's slowpath, but anyway.
Link: http://lkml.kernel.org/r/1445278226-16187-1-git-send-email-0x7f454c46@gmail.com
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
| | | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
If tracing contains data and the trace_pipe file is read with sendfile(),
then it can trigger a NULL pointer dereference and various BUG_ON within the
VM code.
There's a patch to fix this in the splice_to_pipe() code, but it's also a
good idea to not let that happen from trace_pipe either.
Link: http://lkml.kernel.org/r/1457641146-9068-1-git-send-email-rabin@rab.in
Cc: stable@vger.kernel.org # 2.6.30+
Reported-by: Rabin Vincent <rabin.vincent@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
| | | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Joel Fernandes reported that the function tracing of preempt disabled
sections was not being reported when running either the preemptirqsoff or
preemptoff tracers. This was due to the fact that the function tracer
callback for those tracers checked if irqs were disabled before tracing. But
this fails when we want to trace preempt off locations as well.
Joel explained that he wanted to see funcitons where interrupts are enabled
but preemption was disabled. The expected output he wanted:
<...>-2265 1d.h1 3419us : preempt_count_sub <-irq_exit
<...>-2265 1d..1 3419us : __do_softirq <-irq_exit
<...>-2265 1d..1 3419us : msecs_to_jiffies <-__do_softirq
<...>-2265 1d..1 3420us : irqtime_account_irq <-__do_softirq
<...>-2265 1d..1 3420us : __local_bh_disable_ip <-__do_softirq
<...>-2265 1..s1 3421us : run_timer_softirq <-__do_softirq
<...>-2265 1..s1 3421us : hrtimer_run_pending <-run_timer_softirq
<...>-2265 1..s1 3421us : _raw_spin_lock_irq <-run_timer_softirq
<...>-2265 1d.s1 3422us : preempt_count_add <-_raw_spin_lock_irq
<...>-2265 1d.s2 3422us : _raw_spin_unlock_irq <-run_timer_softirq
<...>-2265 1..s2 3422us : preempt_count_sub <-_raw_spin_unlock_irq
<...>-2265 1..s1 3423us : rcu_bh_qs <-__do_softirq
<...>-2265 1d.s1 3423us : irqtime_account_irq <-__do_softirq
<...>-2265 1d.s1 3423us : __local_bh_enable <-__do_softirq
There's a comment saying that the irq disabled check is because there's a
possible race that tracing_cpu may be set when the function is executed. But
I don't remember that race. For now, I added a check for preemption being
enabled too to not record the function, as there would be no race if that
was the case. I need to re-investigate this, as I'm now thinking that the
tracing_cpu will always be correct. But no harm in keeping the check for
now, except for the slight performance hit.
Link: http://lkml.kernel.org/r/1457770386-88717-1-git-send-email-agnel.joel@gmail.com
Fixes: 5e6d2b9cfa3a "tracing: Use one prologue for the preempt irqs off tracer function tracers"
Cc: stable@vget.kernel.org # 2.6.37+
Reported-by: Joel Fernandes <agnel.joel@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
| | | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
commit d39cdd2036a6 ("tracing: Make tracer_flags use the right set_flag
callback") introduces a potential mutex deadlock issue, as it forgets to
free the mutex when allocaing the tracer_flags gets fail.
The issue was found by Dan Carpenter through Smatch static code check tool.
Link: http://lkml.kernel.org/r/1457958941-30265-1-git-send-email-chuhu@redhat.com
Fixes: d39cdd2036a6 ("tracing: Make tracer_flags use the right set_flag callback")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
| | | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Use kasprintf() instead of kmalloc() and snprintf().
Link: http://lkml.kernel.org/r/135a7bc36e51fd9eaa57124dd2140285b771f738.1458050835.git.geliangtang@163.com
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
| | | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Currently dynamic ftrace calls are updated any time
the ftrace_ops is un/registered. If we do this update
only when it's needed, we save lot of time for perf
system wide ftrace function sampling/counting.
The reason is that for system wide sampling/counting,
perf creates event for each cpu in the system.
Each event then registers separate copy of ftrace_ops,
which ends up in FTRACE_UPDATE_CALLS updates. On servers
with many cpus that means serious stall (240 cpus server):
Counting:
# time ./perf stat -e ftrace:function -a sleep 1
Performance counter stats for 'system wide':
370,663 ftrace:function
1.401427505 seconds time elapsed
real 3m51.743s
user 0m0.023s
sys 3m48.569s
Sampling:
# time ./perf record -e ftrace:function -a sleep 1
[ perf record: Woken up 0 times to write data ]
Warning:
Processed 141200 events and lost 5 chunks!
[ perf record: Captured and wrote 10.703 MB perf.data (135950 samples) ]
real 2m31.429s
user 0m0.213s
sys 2m29.494s
There's no reason to do the FTRACE_UPDATE_CALLS update
for each event in perf case, because all the ftrace_ops
always share the same filter, so the updated calls are
always the same.
It's required that only first ftrace_ops registration
does the FTRACE_UPDATE_CALLS update (also sometimes
the second if the first one used the trampoline), but
the rest can be only cheaply linked into the ftrace_ops
list.
Counting:
# time ./perf stat -e ftrace:function -a sleep 1
Performance counter stats for 'system wide':
398,571 ftrace:function
1.377503733 seconds time elapsed
real 0m2.787s
user 0m0.005s
sys 0m1.883s
Sampling:
# time ./perf record -e ftrace:function -a sleep 1
[ perf record: Woken up 0 times to write data ]
Warning:
Processed 261730 events and lost 9 chunks!
[ perf record: Captured and wrote 19.907 MB perf.data (256293 samples) ]
real 1m31.948s
user 0m0.309s
sys 1m32.051s
Link: http://lkml.kernel.org/r/1458138873-1553-6-git-send-email-jolsa@kernel.org
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
| | | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Change __ftrace_hash_rec_update to return true in case
we need to update dynamic ftrace call records. It return
false in case no update is needed.
Link: http://lkml.kernel.org/r/1458138873-1553-5-git-send-email-jolsa@kernel.org
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
| | | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
echo nop > /sys/kernel/debug/tracing/options/current_tracer
echo 1 > /sys/kernel/debug/tracing/options/test_nop_accept
echo 0 > /sys/kernel/debug/tracing/options/test_nop_accept
echo 1 > /sys/kernel/debug/tracing/options/test_nop_refuse
Before the fix, the dmesg is a bit ugly since a align issue.
[ 191.973081] nop_test_accept flag set to 1: we accept. Now cat trace_options to see the result
[ 195.156942] nop_test_refuse flag set to 1: we refuse.Now cat trace_options to see the result
After the fix, the dmesg will show aligned log for nop_test_refuse and nop_test_accept.
[ 2718.032413] nop_test_refuse flag set to 1: we refuse. Now cat trace_options to see the result
[ 2734.253360] nop_test_accept flag set to 1: we accept. Now cat trace_options to see the result
Link: http://lkml.kernel.org/r/1457444222-8654-2-git-send-email-chuhu@redhat.com
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
| | | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
gcc isn't known for handling bool in structures. Instead of using bool, use
an integer mask and use bit flags instead.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
| | | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Add a new unreg_all() callback that can be used to remove all
command-specific triggers from an event and arrange to have it called
whenever a trigger file is opened with O_TRUNC set.
Commands that don't want truncate semantics, or existing commands that
don't implement this function simply do nothing and their triggers
remain intact.
Link: http://lkml.kernel.org/r/2b7d62854d01f28c19185e1bbb8f826f385edfba.1449767187.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
| | | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Add a new needs_rec flag for triggers that require unconditional
access to trace records in order to function.
Normally a trigger requires access to the contents of a trace record
only if it has a filter associated with it (since filters need the
contents of a record in order to make a filtering decision). Some
types of triggers, such as 'hist' triggers, require access to trace
record contents independent of the presence of filters, so add a new
flag for those triggers.
Link: http://lkml.kernel.org/r/7be8fa38f9b90fdb6c47ca0f98d20a07b9fd512b.1449767187.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
| | | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Add a simple per-trigger 'paused' flag, allowing individual triggers
to pause. We could leave it to individual triggers that need this
functionality to do it themselves, but we also want to allow other
events to control pausing, so add it to the trigger data.
Link: http://lkml.kernel.org/r/fed37e4879684d7dcc57fe00ce0cbf170032b06d.1449767187.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
| | | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Add a utility function to grab the syscall name from the syscall
metadata, given a syscall id.
Link: http://lkml.kernel.org/r/be26a8dfe3f15e16a837799f1c1e2b4d62742843.1449767187.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
| | | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Some triggers may need access to the trace event, so pass it in. Also
fix up the existing trigger funcs and their callers.
Link: http://lkml.kernel.org/r/543e31e9fc445ef61077421ab219033401c39846.1449767187.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
| | | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Make various event trigger utility functions available outside of
trace_events_trigger.c so that new triggers can be defined outside of
that file.
Link: http://lkml.kernel.org/r/4a40c1695dd43cac6cd475d72e13ffe30ba84bff.1449767187.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|