diff options
Diffstat (limited to 'Documentation/trace')
-rw-r--r-- | Documentation/trace/coresight-cpu-debug.txt | 175 | ||||
-rw-r--r-- | Documentation/trace/ftrace.txt | 508 |
2 files changed, 571 insertions, 112 deletions
diff --git a/Documentation/trace/coresight-cpu-debug.txt b/Documentation/trace/coresight-cpu-debug.txt new file mode 100644 index 000000000000..b3da1f90b861 --- /dev/null +++ b/Documentation/trace/coresight-cpu-debug.txt @@ -0,0 +1,175 @@ + Coresight CPU Debug Module + ========================== + + Author: Leo Yan <leo.yan@linaro.org> + Date: April 5th, 2017 + +Introduction +------------ + +Coresight CPU debug module is defined in ARMv8-a architecture reference manual +(ARM DDI 0487A.k) Chapter 'Part H: External debug', the CPU can integrate +debug module and it is mainly used for two modes: self-hosted debug and +external debug. Usually the external debug mode is well known as the external +debugger connects with SoC from JTAG port; on the other hand the program can +explore debugging method which rely on self-hosted debug mode, this document +is to focus on this part. + +The debug module provides sample-based profiling extension, which can be used +to sample CPU program counter, secure state and exception level, etc; usually +every CPU has one dedicated debug module to be connected. Based on self-hosted +debug mechanism, Linux kernel can access these related registers from mmio +region when the kernel panic happens. The callback notifier for kernel panic +will dump related registers for every CPU; finally this is good for assistant +analysis for panic. + + +Implementation +-------------- + +- During driver registration, it uses EDDEVID and EDDEVID1 - two device ID + registers to decide if sample-based profiling is implemented or not. On some + platforms this hardware feature is fully or partially implemented; and if + this feature is not supported then registration will fail. + +- At the time this documentation was written, the debug driver mainly relies on + information gathered by the kernel panic callback notifier from three + sampling registers: EDPCSR, EDVIDSR and EDCIDSR: from EDPCSR we can get + program counter; EDVIDSR has information for secure state, exception level, + bit width, etc; EDCIDSR is context ID value which contains the sampled value + of CONTEXTIDR_EL1. + +- The driver supports a CPU running in either AArch64 or AArch32 mode. The + registers naming convention is a bit different between them, AArch64 uses + 'ED' for register prefix (ARM DDI 0487A.k, chapter H9.1) and AArch32 uses + 'DBG' as prefix (ARM DDI 0487A.k, chapter G5.1). The driver is unified to + use AArch64 naming convention. + +- ARMv8-a (ARM DDI 0487A.k) and ARMv7-a (ARM DDI 0406C.b) have different + register bits definition. So the driver consolidates two difference: + + If PCSROffset=0b0000, on ARMv8-a the feature of EDPCSR is not implemented; + but ARMv7-a defines "PCSR samples are offset by a value that depends on the + instruction set state". For ARMv7-a, the driver checks furthermore if CPU + runs with ARM or thumb instruction set and calibrate PCSR value, the + detailed description for offset is in ARMv7-a ARM (ARM DDI 0406C.b) chapter + C11.11.34 "DBGPCSR, Program Counter Sampling Register". + + If PCSROffset=0b0010, ARMv8-a defines "EDPCSR implemented, and samples have + no offset applied and do not sample the instruction set state in AArch32 + state". So on ARMv8 if EDDEVID1.PCSROffset is 0b0010 and the CPU operates + in AArch32 state, EDPCSR is not sampled; when the CPU operates in AArch64 + state EDPCSR is sampled and no offset are applied. + + +Clock and power domain +---------------------- + +Before accessing debug registers, we should ensure the clock and power domain +have been enabled properly. In ARMv8-a ARM (ARM DDI 0487A.k) chapter 'H9.1 +Debug registers', the debug registers are spread into two domains: the debug +domain and the CPU domain. + + +---------------+ + | | + | | + +----------+--+ | + dbg_clock -->| |**| |<-- cpu_clock + | Debug |**| CPU | + dbg_power_domain -->| |**| |<-- cpu_power_domain + +----------+--+ | + | | + | | + +---------------+ + +For debug domain, the user uses DT binding "clocks" and "power-domains" to +specify the corresponding clock source and power supply for the debug logic. +The driver calls the pm_runtime_{put|get} operations as needed to handle the +debug power domain. + +For CPU domain, the different SoC designs have different power management +schemes and finally this heavily impacts external debug module. So we can +divide into below cases: + +- On systems with a sane power controller which can behave correctly with + respect to CPU power domain, the CPU power domain can be controlled by + register EDPRCR in driver. The driver firstly writes bit EDPRCR.COREPURQ + to power up the CPU, and then writes bit EDPRCR.CORENPDRQ for emulation + of CPU power down. As result, this can ensure the CPU power domain is + powered on properly during the period when access debug related registers; + +- Some designs will power down an entire cluster if all CPUs on the cluster + are powered down - including the parts of the debug registers that should + remain powered in the debug power domain. The bits in EDPRCR are not + respected in these cases, so these designs do not support debug over + power down in the way that the CoreSight / Debug designers anticipated. + This means that even checking EDPRSR has the potential to cause a bus hang + if the target register is unpowered. + + In this case, accessing to the debug registers while they are not powered + is a recipe for disaster; so we need preventing CPU low power states at boot + time or when user enable module at the run time. Please see chapter + "How to use the module" for detailed usage info for this. + + +Device Tree Bindings +-------------------- + +See Documentation/devicetree/bindings/arm/coresight-cpu-debug.txt for details. + + +How to use the module +--------------------- + +If you want to enable debugging functionality at boot time, you can add +"coresight_cpu_debug.enable=1" to the kernel command line parameter. + +The driver also can work as module, so can enable the debugging when insmod +module: +# insmod coresight_cpu_debug.ko debug=1 + +When boot time or insmod module you have not enabled the debugging, the driver +uses the debugfs file system to provide a knob to dynamically enable or disable +debugging: + +To enable it, write a '1' into /sys/kernel/debug/coresight_cpu_debug/enable: +# echo 1 > /sys/kernel/debug/coresight_cpu_debug/enable + +To disable it, write a '0' into /sys/kernel/debug/coresight_cpu_debug/enable: +# echo 0 > /sys/kernel/debug/coresight_cpu_debug/enable + +As explained in chapter "Clock and power domain", if you are working on one +platform which has idle states to power off debug logic and the power +controller cannot work well for the request from EDPRCR, then you should +firstly constraint CPU idle states before enable CPU debugging feature; so can +ensure the accessing to debug logic. + +If you want to limit idle states at boot time, you can use "nohlt" or +"cpuidle.off=1" in the kernel command line. + +At the runtime you can disable idle states with below methods: + +Set latency request to /dev/cpu_dma_latency to disable all CPUs specific idle +states (if latency = 0uS then disable all idle states): +# echo "what_ever_latency_you_need_in_uS" > /dev/cpu_dma_latency + +Disable specific CPU's specific idle state: +# echo 1 > /sys/devices/system/cpu/cpu$cpu/cpuidle/state$state/disable + + +Output format +------------- + +Here is an example of the debugging output format: + +ARM external debug module: +coresight-cpu-debug 850000.debug: CPU[0]: +coresight-cpu-debug 850000.debug: EDPRSR: 00000001 (Power:On DLK:Unlock) +coresight-cpu-debug 850000.debug: EDPCSR: [<ffff00000808e9bc>] handle_IPI+0x174/0x1d8 +coresight-cpu-debug 850000.debug: EDCIDSR: 00000000 +coresight-cpu-debug 850000.debug: EDVIDSR: 90000000 (State:Non-secure Mode:EL1/0 Width:64bits VMID:0) +coresight-cpu-debug 852000.debug: CPU[1]: +coresight-cpu-debug 852000.debug: EDPRSR: 00000001 (Power:On DLK:Unlock) +coresight-cpu-debug 852000.debug: EDPCSR: [<ffff0000087fab34>] debug_notifier_call+0x23c/0x358 +coresight-cpu-debug 852000.debug: EDCIDSR: 00000000 +coresight-cpu-debug 852000.debug: EDVIDSR: 90000000 (State:Non-secure Mode:EL1/0 Width:64bits VMID:0) diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt index fff8ff6d4893..d4601df6e72e 100644 --- a/Documentation/trace/ftrace.txt +++ b/Documentation/trace/ftrace.txt @@ -5,10 +5,11 @@ Copyright 2008 Red Hat Inc. Author: Steven Rostedt <srostedt@redhat.com> License: The GNU Free Documentation License, Version 1.2 (dual licensed under the GPL v2) -Reviewers: Elias Oltmanns, Randy Dunlap, Andrew Morton, - John Kacur, and David Teigland. +Original Reviewers: Elias Oltmanns, Randy Dunlap, Andrew Morton, + John Kacur, and David Teigland. Written for: 2.6.28-rc2 Updated for: 3.10 +Updated for: 4.13 - Copyright 2017 VMware Inc. Steven Rostedt Introduction ------------ @@ -26,9 +27,11 @@ a task is woken to the task is actually scheduled in. One of the most common uses of ftrace is the event tracing. Through out the kernel is hundreds of static event points that -can be enabled via the debugfs file system to see what is +can be enabled via the tracefs file system to see what is going on in certain parts of the kernel. +See events.txt for more information. + Implementation Details ---------------------- @@ -39,34 +42,47 @@ See ftrace-design.txt for details for arch porters and such. The File System --------------- -Ftrace uses the debugfs file system to hold the control files as +Ftrace uses the tracefs file system to hold the control files as well as the files to display output. -When debugfs is configured into the kernel (which selecting any ftrace -option will do) the directory /sys/kernel/debug will be created. To mount +When tracefs is configured into the kernel (which selecting any ftrace +option will do) the directory /sys/kernel/tracing will be created. To mount this directory, you can add to your /etc/fstab file: - debugfs /sys/kernel/debug debugfs defaults 0 0 + tracefs /sys/kernel/tracing tracefs defaults 0 0 Or you can mount it at run time with: - mount -t debugfs nodev /sys/kernel/debug + mount -t tracefs nodev /sys/kernel/tracing For quicker access to that directory you may want to make a soft link to it: - ln -s /sys/kernel/debug /debug + ln -s /sys/kernel/tracing /tracing + + *** NOTICE *** + +Before 4.1, all ftrace tracing control files were within the debugfs +file system, which is typically located at /sys/kernel/debug/tracing. +For backward compatibility, when mounting the debugfs file system, +the tracefs file system will be automatically mounted at: + + /sys/kernel/debug/tracing -Any selected ftrace option will also create a directory called tracing -within the debugfs. The rest of the document will assume that you are in -the ftrace directory (cd /sys/kernel/debug/tracing) and will only concentrate -on the files within that directory and not distract from the content with -the extended "/sys/kernel/debug/tracing" path name. +All files located in the tracefs file system will be located in that +debugfs file system directory as well. + + *** NOTICE *** + +Any selected ftrace option will also create the tracefs file system. +The rest of the document will assume that you are in the ftrace directory +(cd /sys/kernel/tracing) and will only concentrate on the files within that +directory and not distract from the content with the extended +"/sys/kernel/tracing" path name. That's it! (assuming that you have ftrace configured into your kernel) -After mounting debugfs, you can see a directory called -"tracing". This directory contains the control and output files +After mounting tracefs you will have access to the control and output files of ftrace. Here is a list of some of the key files: @@ -92,10 +108,20 @@ of ftrace. Here is a list of some of the key files: writing to the ring buffer, the tracing overhead may still be occurring. + The kernel function tracing_off() can be used within the + kernel to disable writing to the ring buffer, which will + set this file to "0". User space can re-enable tracing by + echoing "1" into the file. + + Note, the function and event trigger "traceoff" will also + set this file to zero and stop tracing. Which can also + be re-enabled by user space using this file. + trace: This file holds the output of the trace in a human - readable format (described below). + readable format (described below). Note, tracing is temporarily + disabled while this file is being read (opened). trace_pipe: @@ -109,7 +135,8 @@ of ftrace. Here is a list of some of the key files: will not be read again with a sequential read. The "trace" file is static, and if the tracer is not adding more data, it will display the same - information every time it is read. + information every time it is read. This file will not + disable tracing while being read. trace_options: @@ -128,12 +155,14 @@ of ftrace. Here is a list of some of the key files: tracing_max_latency: Some of the tracers record the max latency. - For example, the time interrupts are disabled. - This time is saved in this file. The max trace - will also be stored, and displayed by "trace". - A new max trace will only be recorded if the - latency is greater than the value in this - file. (in microseconds) + For example, the maximum time that interrupts are disabled. + The maximum time is saved in this file. The max trace will also be + stored, and displayed by "trace". A new max trace will only be + recorded if the latency is greater than the value in this file + (in microseconds). + + By echoing in a time into this file, no latency will be recorded + unless it is greater than the time in this file. tracing_thresh: @@ -152,32 +181,34 @@ of ftrace. Here is a list of some of the key files: that the kernel uses for allocation, usually 4 KB in size). If the last page allocated has room for more bytes than requested, the rest of the page will be used, - making the actual allocation bigger than requested. + making the actual allocation bigger than requested or shown. ( Note, the size may not be a multiple of the page size due to buffer management meta-data. ) + Buffer sizes for individual CPUs may vary + (see "per_cpu/cpu0/buffer_size_kb" below), and if they do + this file will show "X". + buffer_total_size_kb: This displays the total combined size of all the trace buffers. free_buffer: - If a process is performing the tracing, and the ring buffer - should be shrunk "freed" when the process is finished, even - if it were to be killed by a signal, this file can be used - for that purpose. On close of this file, the ring buffer will - be resized to its minimum size. Having a process that is tracing - also open this file, when the process exits its file descriptor - for this file will be closed, and in doing so, the ring buffer - will be "freed". + If a process is performing tracing, and the ring buffer should be + shrunk "freed" when the process is finished, even if it were to be + killed by a signal, this file can be used for that purpose. On close + of this file, the ring buffer will be resized to its minimum size. + Having a process that is tracing also open this file, when the process + exits its file descriptor for this file will be closed, and in doing so, + the ring buffer will be "freed". It may also stop tracing if disable_on_free option is set. tracing_cpumask: - This is a mask that lets the user only trace - on specified CPUs. The format is a hex string - representing the CPUs. + This is a mask that lets the user only trace on specified CPUs. + The format is a hex string representing the CPUs. set_ftrace_filter: @@ -190,6 +221,9 @@ of ftrace. Here is a list of some of the key files: to be traced. Echoing names of functions into this file will limit the trace to only those functions. + The functions listed in "available_filter_functions" are what + can be written into this file. + This interface also allows for commands to be used. See the "Filter commands" section for more details. @@ -202,7 +236,14 @@ of ftrace. Here is a list of some of the key files: set_ftrace_pid: - Have the function tracer only trace a single thread. + Have the function tracer only trace the threads whose PID are + listed in this file. + + If the "function-fork" option is set, then when a task whose + PID is listed in this file forks, the child's PID will + automatically be added to this file, and the child will be + traced by the function tracer as well. This option will also + cause PIDs of tasks that exit to be removed from the file. set_event_pid: @@ -217,17 +258,28 @@ of ftrace. Here is a list of some of the key files: set_graph_function: - Set a "trigger" function where tracing should start - with the function graph tracer (See the section - "dynamic ftrace" for more details). + Functions listed in this file will cause the function graph + tracer to only trace these functions and the functions that + they call. (See the section "dynamic ftrace" for more details). + + set_graph_notrace: + + Similar to set_graph_function, but will disable function graph + tracing when the function is hit until it exits the function. + This makes it possible to ignore tracing functions that are called + by a specific function. available_filter_functions: - This lists the functions that ftrace - has processed and can trace. These are the function - names that you can pass to "set_ftrace_filter" or - "set_ftrace_notrace". (See the section "dynamic ftrace" - below for more details.) + This lists the functions that ftrace has processed and can trace. + These are the function names that you can pass to + "set_ftrace_filter" or "set_ftrace_notrace". + (See the section "dynamic ftrace" below for more details.) + + dyn_ftrace_total_info: + + This file is for debugging purposes. The number of functions that + have been converted to nops and are available to be traced. enabled_functions: @@ -250,12 +302,21 @@ of ftrace. Here is a list of some of the key files: an 'I' will be displayed on the same line as the function that can be overridden. + If the architecture supports it, it will also show what callback + is being directly called by the function. If the count is greater + than 1 it most likely will be ftrace_ops_list_func(). + + If the callback of the function jumps to a trampoline that is + specific to a the callback and not the standard trampoline, + its address will be printed as well as the function that the + trampoline calls. + function_profile_enabled: When set it will enable all functions with either the function - tracer, or if enabled, the function graph tracer. It will + tracer, or if configured, the function graph tracer. It will keep a histogram of the number of functions that were called - and if run with the function graph tracer, it will also keep + and if the function graph tracer was configured, it will also keep track of the time spent in those functions. The histogram content can be displayed in the files: @@ -283,12 +344,11 @@ of ftrace. Here is a list of some of the key files: printk_formats: This is for tools that read the raw format files. If an event in - the ring buffer references a string (currently only trace_printk() - does this), only a pointer to the string is recorded into the buffer - and not the string itself. This prevents tools from knowing what - that string was. This file displays the string and address for - the string allowing tools to map the pointers to what the - strings were. + the ring buffer references a string, only a pointer to the string + is recorded into the buffer and not the string itself. This prevents + tools from knowing what that string was. This file displays the string + and address for the string allowing tools to map the pointers to what + the strings were. saved_cmdlines: @@ -298,6 +358,22 @@ of ftrace. Here is a list of some of the key files: comms for events. If a pid for a comm is not listed, then "<...>" is displayed in the output. + If the option "record-cmd" is set to "0", then comms of tasks + will not be saved during recording. By default, it is enabled. + + saved_cmdlines_size: + + By default, 128 comms are saved (see "saved_cmdlines" above). To + increase or decrease the amount of comms that are cached, echo + in a the number of comms to cache, into this file. + + saved_tgids: + + If the option "record-tgid" is set, on each scheduling context switch + the Task Group ID of a task is saved in a table mapping the PID of + the thread to its TGID. By default, the "record-tgid" option is + disabled. + snapshot: This displays the "snapshot" buffer and also lets the user @@ -336,6 +412,9 @@ of ftrace. Here is a list of some of the key files: # cat trace_clock [local] global counter x86-tsc + The clock with the square brackets around it is the one + in effect. + local: Default clock, but may not be in sync across CPUs global: This clock is in sync with all CPUs but may @@ -448,6 +527,23 @@ of ftrace. Here is a list of some of the key files: See events.txt for more information. + set_event: + + By echoing in the event into this file, will enable that event. + + See events.txt for more information. + + available_events: + + A list of events that can be enabled in tracing. + + See events.txt for more information. + + hwlat_detector: + + Directory for the Hardware Latency Detector. + See "Hardware Latency Detector" section below. + per_cpu: This is a directory that contains the trace per_cpu information. @@ -539,13 +635,25 @@ Here is the list of current tracers that may be configured. to draw a graph of function calls similar to C code source. + "blk" + + The block tracer. The tracer used by the blktrace user + application. + + "hwlat" + + The Hardware Latency tracer is used to detect if the hardware + produces any latency. See "Hardware Latency Detector" section + below. + "irqsoff" Traces the areas that disable interrupts and saves the trace with the longest max latency. See tracing_max_latency. When a new max is recorded, it replaces the old trace. It is best to view this - trace with the latency-format option enabled. + trace with the latency-format option enabled, which + happens automatically when the tracer is selected. "preemptoff" @@ -571,6 +679,26 @@ Here is the list of current tracers that may be configured. RT tasks (as the current "wakeup" does). This is useful for those interested in wake up timings of RT tasks. + "wakeup_dl" + + Traces and records the max latency that it takes for + a SCHED_DEADLINE task to be woken (as the "wakeup" and + "wakeup_rt" does). + + "mmiotrace" + + A special tracer that is used to trace binary module. + It will trace all the calls that a module makes to the + hardware. Everything it writes and reads from the I/O + as well. + + "branch" + + This tracer can be configured when tracing likely/unlikely + calls within the kernel. It will trace when a likely and + unlikely branch is hit and if it was correct in its prediction + of being correct. + "nop" This is the "trace nothing" tracer. To remove all @@ -582,7 +710,7 @@ Examples of using the tracer ---------------------------- Here are typical examples of using the tracers when controlling -them only with the debugfs interface (without using any +them only with the tracefs interface (without using any user-land utilities). Output format: @@ -674,7 +802,7 @@ why a latency happened. Here is a typical trace. This shows that the current tracer is "irqsoff" tracing the time for which interrupts were disabled. It gives the trace version (which never changes) and the version of the kernel upon which this was executed on -(3.10). Then it displays the max latency in microseconds (259 us). The number +(3.8). Then it displays the max latency in microseconds (259 us). The number of trace entries displayed and the total number (both are four: #4/4). VP, KP, SP, and HP are always zero and are reserved for later use. #P is the number of online CPUs (#P:4). @@ -709,6 +837,8 @@ explains which is which. '.' otherwise. hardirq/softirq: + 'Z' - NMI occurred inside a hardirq + 'z' - NMI is running 'H' - hard irq occurred inside a softirq. 'h' - hard irq is running 's' - soft irq is running @@ -757,24 +887,24 @@ nohex nobin noblock trace_printk -nobranch annotate nouserstacktrace nosym-userobj noprintk-msg-only context-info nolatency-format -sleep-time -graph-time record-cmd +norecord-tgid overwrite nodisable_on_free irq-info markers noevent-fork function-trace +nofunction-fork nodisplay-graph nostacktrace +nobranch To disable one of the options, echo in the option prepended with "no". @@ -830,8 +960,6 @@ Here are the available options: trace_printk - Can disable trace_printk() from writing into the buffer. - branch - Enable branch tracing with the tracer. - annotate - It is sometimes confusing when the CPU buffers are full and one CPU buffer had a lot of events recently, thus a shorter time frame, were another CPU may have only had @@ -850,7 +978,8 @@ Here are the available options: <idle>-0 [001] .Ns3 21169.031485: sub_preempt_count <-_raw_spin_unlock userstacktrace - This option changes the trace. It records a - stacktrace of the current userspace thread. + stacktrace of the current user space thread after + each trace event. sym-userobj - when user stacktrace are enabled, look up which object the address belongs to, and print a @@ -873,29 +1002,21 @@ x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6] context-info - Show only the event data. Hides the comm, PID, timestamp, CPU, and other useful data. - latency-format - This option changes the trace. When - it is enabled, the trace displays - additional information about the - latencies, as described in "Latency - trace format". - - sleep-time - When running function graph tracer, to include - the time a task schedules out in its function. - When enabled, it will account time the task has been - scheduled out as part of the function call. - - graph-time - When running function profiler with function graph tracer, - to include the time to call nested functions. When this is - not set, the time reported for the function will only - include the time the function itself executed for, not the - time for functions that it called. + latency-format - This option changes the trace output. When it is enabled, + the trace displays additional information about the + latency, as described in "Latency trace format". record-cmd - When any event or tracer is enabled, a hook is enabled - in the sched_switch trace point to fill comm cache + in the sched_switch trace point to fill comm cache with mapped pids and comms. But this may cause some overhead, and if you only care about pids, and not the name of the task, disabling this option can lower the - impact of tracing. + impact of tracing. See "saved_cmdlines". + + record-tgid - When any event or tracer is enabled, a hook is enabled + in the sched_switch trace point to fill the cache of + mapped Thread Group IDs (TGID) mapping to pids. See + "saved_tgids". overwrite - This controls what happens when the trace buffer is full. If "1" (default), the oldest events are @@ -935,19 +1056,98 @@ x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6] functions. This keeps the overhead of the tracer down when performing latency tests. + function-fork - When set, tasks with PIDs listed in set_ftrace_pid will + have the PIDs of their children added to set_ftrace_pid + when those tasks fork. Also, when tasks with PIDs in + set_ftrace_pid exit, their PIDs will be removed from the + file. + display-graph - When set, the latency tracers (irqsoff, wakeup, etc) will use function graph tracing instead of function tracing. - stacktrace - This is one of the options that changes the trace - itself. When a trace is recorded, so is the stack - of functions. This allows for back traces of - trace sites. + stacktrace - When set, a stack trace is recorded after any trace event + is recorded. + + branch - Enable branch tracing with the tracer. This enables branch + tracer along with the currently set tracer. Enabling this + with the "nop" tracer is the same as just enabling the + "branch" tracer. Note: Some tracers have their own options. They only appear in this file when the tracer is active. They always appear in the options directory. +Here are the per tracer options: + +Options for function tracer: + + func_stack_trace - When set, a stack trace is recorded after every + function that is recorded. NOTE! Limit the functions + that are recorded before enabling this, with + "set_ftrace_filter" otherwise the system performance + will be critically degraded. Remember to disable + this option before clearing the function filter. + +Options for function_graph tracer: + + Since the function_graph tracer has a slightly different output + it has its own options to control what is displayed. + + funcgraph-overrun - When set, the "overrun" of the graph stack is + displayed after each function traced. The + overrun, is when the stack depth of the calls + is greater than what is reserved for each task. + Each task has a fixed array of functions to + trace in the call graph. If the depth of the + calls exceeds that, the function is not traced. + The overrun is the number of functions missed + due to exceeding this array. + + funcgraph-cpu - When set, the CPU number of the CPU where the trace + occurred is displayed. + + funcgraph-overhead - When set, if the function takes longer than + A certain amount, then a delay marker is + displayed. See "delay" above, under the + header description. + + funcgraph-proc - Unlike other tracers, the process' command line + is not displayed by default, but instead only + when a task is traced in and out during a context + switch. Enabling this options has the command + of each process displayed at every line. + + funcgraph-duration - At the end of each function (the return) + the duration of the amount of time in the + function is displayed in microseconds. + + funcgraph-abstime - When set, the timestamp is displayed at each + line. + + funcgraph-irqs - When disabled, functions that happen inside an + interrupt will not be traced. + + funcgraph-tail - When set, the return event will include the function + that it represents. By default this is off, and + only a closing curly bracket "}" is displayed for + the return of a function. + + sleep-time - When running function graph tracer, to include + the time a task schedules out in its function. + When enabled, it will account time the task has been + scheduled out as part of the function call. + + graph-time - When running function profiler with function graph tracer, + to include the time to call nested functions. When this is + not set, the time reported for the function will only + include the time the function itself executed for, not the + time for functions that it called. + +Options for blk tracer: + + blk_classic - Shows a more minimalistic output. + irqsoff ------- @@ -1711,6 +1911,85 @@ events. <idle>-0 2d..3 6us : 0:120:R ==> [002] 5882: 94:R sleep +Hardware Latency Detector +------------------------- + +The hardware latency detector is executed by enabling the "hwlat" tracer. + +NOTE, this tracer will affect the performance of the system as it will +periodically make a CPU constantly busy with interrupts disabled. + + # echo hwlat > current_tracer + # sleep 100 + # cat trace +# tracer: hwlat +# +# _-----=> irqs-off +# / _----=> need-resched +# | / _---=> hardirq/softirq +# || / _--=> preempt-depth +# ||| / delay +# TASK-PID CPU# |||| TIMESTAMP FUNCTION +# | | | |||| | | + <...>-3638 [001] d... 19452.055471: #1 inner/outer(us): 12/14 ts:1499801089.066141940 + <...>-3638 [003] d... 19454.071354: #2 inner/outer(us): 11/9 ts:1499801091.082164365 + <...>-3638 [002] dn.. 19461.126852: #3 inner/outer(us): 12/9 ts:1499801098.138150062 + <...>-3638 [001] d... 19488.340960: #4 inner/outer(us): 8/12 ts:1499801125.354139633 + <...>-3638 [003] d... 19494.388553: #5 inner/outer(us): 8/12 ts:1499801131.402150961 + <...>-3638 [003] d... 19501.283419: #6 inner/outer(us): 0/12 ts:1499801138.297435289 nmi-total:4 nmi-count:1 + + +The above output is somewhat the same in the header. All events will have +interrupts disabled 'd'. Under the FUNCTION title there is: + + #1 - This is the count of events recorded that were greater than the + tracing_threshold (See below). + + inner/outer(us): 12/14 + + This shows two numbers as "inner latency" and "outer latency". The test + runs in a loop checking a timestamp twice. The latency detected within + the two timestamps is the "inner latency" and the latency detected + after the previous timestamp and the next timestamp in the loop is + the "outer latency". + + ts:1499801089.066141940 + + The absolute timestamp that the event happened. + + nmi-total:4 nmi-count:1 + + On architectures that support it, if an NMI comes in during the + test, the time spent in NMI is reported in "nmi-total" (in + microseconds). + + All architectures that have NMIs will show the "nmi-count" if an + NMI comes in during the test. + +hwlat files: + + tracing_threshold - This gets automatically set to "10" to represent 10 + microseconds. This is the threshold of latency that + needs to be detected before the trace will be recorded. + + Note, when hwlat tracer is finished (another tracer is + written into "current_tracer"), the original value for + tracing_threshold is placed back into this file. + + hwlat_detector/width - The length of time the test runs with interrupts + disabled. + + hwlat_detector/window - The length of time of the window which the test + runs. That is, the test will run for "width" + microseconds per "window" microseconds + + tracing_cpumask - When the test is started. A kernel thread is created that + runs the test. This thread will alternate between CPUs + listed in the tracing_cpumask between each period + (one "window"). To limit the test to specific CPUs + set the mask in this file to only the CPUs that the test + should run on. + function -------- @@ -1821,15 +2100,15 @@ something like this simple program: #define STR(x) _STR(x) #define MAX_PATH 256 -const char *find_debugfs(void) +const char *find_tracefs(void) { - static char debugfs[MAX_PATH+1]; - static int debugfs_found; + static char tracefs[MAX_PATH+1]; + static int tracefs_found; char type[100]; FILE *fp; - if (debugfs_found) - return debugfs; + if (tracefs_found) + return tracefs; if ((fp = fopen("/proc/mounts","r")) == NULL) { perror("/proc/mounts"); @@ -1839,27 +2118,27 @@ const char *find_debugfs(void) while (fscanf(fp, "%*s %" STR(MAX_PATH) "s %99s %*s %*d %*d\n", - debugfs, type) == 2) { - if (strcmp(type, "debugfs") == 0) + tracefs, type) == 2) { + if (strcmp(type, "tracefs") == 0) break; } fclose(fp); - if (strcmp(type, "debugfs") != 0) { - fprintf(stderr, "debugfs not mounted"); + if (strcmp(type, "tracefs") != 0) { + fprintf(stderr, "tracefs not mounted"); return NULL; } - strcat(debugfs, "/tracing/"); - debugfs_found = 1; + strcat(tracefs, "/tracing/"); + tracefs_found = 1; - return debugfs; + return tracefs; } const char *tracing_file(const char *file_name) { static char trace_file[MAX_PATH+1]; - snprintf(trace_file, MAX_PATH, "%s/%s", find_debugfs(), file_name); + snprintf(trace_file, MAX_PATH, "%s/%s", find_tracefs(), file_name); return trace_file; } @@ -1898,12 +2177,12 @@ Or this simple script! ------ #!/bin/bash -debugfs=`sed -ne 's/^debugfs \(.*\) debugfs.*/\1/p' /proc/mounts` -echo nop > $debugfs/tracing/current_tracer -echo 0 > $debugfs/tracing/tracing_on -echo $$ > $debugfs/tracing/set_ftrace_pid -echo function > $debugfs/tracing/current_tracer -echo 1 > $debugfs/tracing/tracing_on +tracefs=`sed -ne 's/^tracefs \(.*\) tracefs.*/\1/p' /proc/mounts` +echo nop > $tracefs/tracing/current_tracer +echo 0 > $tracefs/tracing/tracing_on +echo $$ > $tracefs/tracing/set_ftrace_pid +echo function > $tracefs/tracing/current_tracer +echo 1 > $tracefs/tracing/tracing_on exec "$@" ------ @@ -2145,13 +2424,18 @@ include the -pg switch in the compiling of the kernel.) At compile time every C file object is run through the recordmcount program (located in the scripts directory). This program will parse the ELF headers in the C object to find all -the locations in the .text section that call mcount. (Note, only -white listed .text sections are processed, since processing other -sections like .init.text may cause races due to those sections -being freed unexpectedly). - -A new section called "__mcount_loc" is created that holds -references to all the mcount call sites in the .text section. +the locations in the .text section that call mcount. Starting +with gcc verson 4.6, the -mfentry has been added for x86, which +calls "__fentry__" instead of "mcount". Which is called before +the creation of the stack frame. + +Note, not all sections are traced. They may be prevented by either +a notrace, or blocked another way and all inline functions are not +traced. Check the "available_filter_functions" file to see what functions +can be traced. + +A section called "__mcount_loc" is created that holds +references to all the mcount/fentry call sites in the .text section. The recordmcount program re-links this section back into the original object. The final linking stage of the kernel will add all these references into a single table. @@ -2679,7 +2963,7 @@ in time without stopping tracing. Ftrace swaps the current buffer with a spare buffer, and tracing continues in the new current (=previous spare) buffer. -The following debugfs files in "tracing" are related to this +The following tracefs files in "tracing" are related to this feature: snapshot: @@ -2752,7 +3036,7 @@ cat: snapshot: Device or resource busy Instances --------- -In the debugfs tracing directory is a directory called "instances". +In the tracefs tracing directory is a directory called "instances". This directory can have new directories created inside of it using mkdir, and removing directories with rmdir. The directory created with mkdir in this directory will already contain files and other |