ftrace/scripts: Add helper script to bisect function tracing problem functions

Every so often, with a special config or an architecture change, running function or function_graph tracing can cause the machine to hard reboot, crash, or simply hard lock up. There are some functions in the function graph tracer that cannot be traced; tracing them causes the function tracer to recurse before the recursion protection mechanisms are in place. When this occurs, the dynamic ftrace feature that limits what actually gets traced can be used to bisect down to the problem function.

This adds a script that helps with this process in the scripts/tracing directory, called ftrace-bisect.sh.

The setup is to read all the functions that can be traced from available_filter_functions into a file (full_file). Then run this script, passing it the full_file along with a "test_file" and a "non_test_file", where the test_file will be added to set_ftrace_filter.

What ftrace-bisect.sh does is copy half of the functions in full_file into the test_file and the other half into the non_test_file. This way, one can cat the test_file into set_ftrace_filter and test only the functions that are in that file. If it works, then we copy non_test_file to full_file and run the process again. If the system crashed, then the bad function is in the test_file and, after a reboot, the test_file becomes the new full_file in the next iteration.

When we get down to a single function in the full_file, ftrace-bisect.sh will report that as the bad function.

Full documentation of how to use this simple script is within the script file itself.

Link: http://lkml.kernel.org/r/20160920100716.131d3647@gandalf.local.home
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
author: Steven Rostedt (Red Hat) <rostedt@goodmis.org> 2016-09-21 13:43:48 -0400
committer: Steven Rostedt <rostedt@goodmis.org> 2016-09-21 13:56:55 -0400
commit: 951dbf500aa7df051d7cde15b9ac05608c0bb16f (patch)
tree: 36cefe0aa1b559a7a41ac1c19a418ef636324201 /scripts
parent: f971cc9aabc287120bbe7f3f1abe70c13e61ee94 (diff)
download: talos-op-linux-951dbf500aa7df051d7cde15b9ac05608c0bb16f.tar.gz
talos-op-linux-951dbf500aa7df051d7cde15b9ac05608c0bb16f.zip
1 files changed, 115 insertions, 0 deletions
diff --git a/scripts/tracing/ftrace-bisect.sh b/scripts/tracing/ftrace-bisect.sh
new file mode 100755
index 000000000000..9ff8ac5fc53c
--- /dev/null
+++ b/scripts/tracing/ftrace-bisect.sh
@@ -0,0 +1,115 @@
+#!/bin/bash
+#
+# Here's how to use this:
+#
+# This script is used to help find functions that, when traced by the function
+# tracer or the function graph tracer, cause the machine to reboot, hang, or
+# crash. Here are the steps to take.
+#
+# First, determine if function tracing is working with a single function:
+#
+#   (note, if this is a problem with function_graph tracing, then simply
+#    replace "function" with "function_graph" in the following steps).
+#
+#  # cd /sys/kernel/debug/tracing
+#  # echo schedule > set_ftrace_filter
+#  # echo function > current_tracer
+#
+# If this works, then we know that something is being traced that shouldn't be.
+#
+#  # echo nop > current_tracer
+#
+#  # cat available_filter_functions > ~/full-file
+#  # ftrace-bisect ~/full-file ~/test-file ~/non-test-file
+#  # cat ~/test-file > set_ftrace_filter
+#
+# *** Note *** this will take several minutes. Setting multiple functions is
+# an O(n^2) operation, and we are dealing with thousands of functions. So go
+# have coffee, talk with your coworkers, read facebook. And eventually, this
+# operation will end.
+#
+#  # echo function > current_tracer
+#
+# If it crashes, we know that ~/test-file has a bad function.
+#
+#   Reboot back to test kernel.
+#
+#     # cd /sys/kernel/debug/tracing
+#     # mv ~/test-file ~/full-file
+#
+# If it didn't crash:
+#
+#     # echo nop > current_tracer
+#     # mv ~/non-test-file ~/full-file
+#
+# Get rid of the other test file from previous run (or save them off somewhere).
+#  # rm -f ~/test-file ~/non-test-file
+#
+# And start again:
+#
+#  # ftrace-bisect ~/full-file ~/test-file ~/non-test-file
+#
+# The good thing is, because this cuts the number of functions in ~/test-file
+# by half, the cat of it into set_ftrace_filter takes half as long each
+# iteration, so don't talk so much at the water cooler the second time.
+#
+# Eventually, if you did this correctly, you will get down to the problem
+# function, and all we need to do is to notrace it.
+#
+# To verify that the problem function is the culprit, just do:
+#
+#  # echo <problem-function> > set_ftrace_notrace
+#  # echo > set_ftrace_filter
+#  # echo function > current_tracer
+#
+# And if it doesn't crash, we are done.
+#
+# If it does crash, do this again (there's more than one problem function)
+# but you need to echo the problem function(s) into set_ftrace_notrace before
+# enabling function tracing in the above steps. Or if you can compile the
+# kernel, annotate the problem functions with "notrace" and start again.
+#
+
+
+if [ $# -ne 3 ]; then
+  echo 'usage: ftrace-bisect full-file test-file non-test-file'
+  exit 1
+fi
+
+full=$1
+test=$2
+nontest=$3
+
+x=`cat $full | wc -l`
+if [ $x -eq 1 ]; then
+	echo "There's only one function left, must be the bad one"
+	cat $full
+	exit 0
+fi
+
+let x=$x/2
+let y=$x+1
+
+if [ ! -f $full ]; then
+	echo "$full does not exist"
+	exit 1
+fi
+
+if [ -f $test ]; then
+	echo -n "$test exists, delete it? [y/N]"
+	read a
+	if [ "$a" != "y" -a "$a" != "Y" ]; then
+		exit 1
+	fi
+fi
+
+if [ -f $nontest ]; then
+	echo -n "$nontest exists, delete it? [y/N]"
+	read a
+	if [ "$a" != "y" -a "$a" != "Y" ]; then
+		exit 1
+	fi
+fi
+
+sed -ne "1,${x}p" $full > $test
+sed -ne "$y,\$p" $full > $nontest
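
The iteration described in the commit message can be tried out without touching a real tracer. The following is only a sketch under stated assumptions: it enables no tracing and writes nothing to tracefs, the function names (including "bad_func") are made up, and the crashes() helper is a hypothetical stand-in for "enable tracing and see whether the machine dies". It reuses the same sed split as ftrace-bisect.sh and assumes exactly one problem function.

```shell
#!/bin/bash
# Simulated bisection: nothing here talks to ftrace.
# All function names below are hypothetical stand-ins for kernel symbols.

workdir=$(mktemp -d)
full=$workdir/full-file
test_file=$workdir/test-file
nontest=$workdir/non-test-file
bad=bad_func

# Stand-in for "cat available_filter_functions > full-file".
printf '%s\n' func_a func_b func_c bad_func func_d func_e func_f > "$full"

# Hypothetical crash oracle: the simulated machine "crashes" when the
# file of functions to be traced contains $bad as a whole line.
crashes() {
	grep -qx "$bad" "$1"
}

rounds=0
while [ "$(wc -l < "$full")" -gt 1 ]; do
	n=$(wc -l < "$full")
	x=$((n / 2))
	y=$((x + 1))
	# Same split as ftrace-bisect.sh: first half is traced, the rest is not.
	sed -ne "1,${x}p" "$full" > "$test_file"
	sed -ne "$y,\$p" "$full" > "$nontest"
	if crashes "$test_file"; then
		mv "$test_file" "$full"   # culprit is in the traced half
	else
		mv "$nontest" "$full"     # culprit is in the untraced half
	fi
	rm -f "$test_file" "$nontest"
	rounds=$((rounds + 1))
done

found=$(cat "$full")
echo "found $found after $rounds rounds"
rm -rf "$workdir"
```

With the seven names above, the loop narrows 7 -> 4 -> 2 -> 1 entries and prints "found bad_func after 3 rounds". On real hardware each "crashes" branch involves a reboot, and the round count grows roughly with log2 of the number of traceable functions.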