diff options
Diffstat (limited to 'Documentation/BUG-HUNTING')
-rw-r--r-- | Documentation/BUG-HUNTING | 246 |
1 files changed, 0 insertions, 246 deletions
diff --git a/Documentation/BUG-HUNTING b/Documentation/BUG-HUNTING deleted file mode 100644 index 65022a87bf17..000000000000 --- a/Documentation/BUG-HUNTING +++ /dev/null @@ -1,246 +0,0 @@ -Table of contents -================= - -Last updated: 20 December 2005 - -Contents -======== - -- Introduction -- Devices not appearing -- Finding patch that caused a bug --- Finding using git-bisect --- Finding it the old way -- Fixing the bug - -Introduction -============ - -Always try the latest kernel from kernel.org and build from source. If you are -not confident in doing that please report the bug to your distribution vendor -instead of to a kernel developer. - -Finding bugs is not always easy. Have a go though. If you can't find it don't -give up. Report as much as you have found to the relevant maintainer. See -MAINTAINERS for who that is for the subsystem you have worked on. - -Before you submit a bug report read REPORTING-BUGS. - -Devices not appearing -===================== - -Often this is caused by udev. Check that first before blaming it on the -kernel. - -Finding patch that caused a bug -=============================== - - - -Finding using git-bisect ------------------------- - -Using the provided tools with git makes finding bugs easy provided the bug is -reproducible. - -Steps to do it: -- start using git for the kernel source -- read the man page for git-bisect -- have fun - -Finding it the old way ----------------------- - -[Sat Mar 2 10:32:33 PST 1996 KERNEL_BUG-HOWTO lm@sgi.com (Larry McVoy)] - -This is how to track down a bug if you know nothing about kernel hacking. -It's a brute force approach but it works pretty well. - -You need: - - . A reproducible bug - it has to happen predictably (sorry) - . All the kernel tar files from a revision that worked to the - revision that doesn't - -You will then do: - - . Rebuild a revision that you believe works, install, and verify that. - . Do a binary search over the kernels to figure out which one - introduced the bug. I.e., suppose 1.3.28 didn't have the bug, but - you know that 1.3.69 does. Pick a kernel in the middle and build - that, like 1.3.50. Build & test; if it works, pick the mid point - between .50 and .69, else the mid point between .28 and .50. - . You'll narrow it down to the kernel that introduced the bug. You - can probably do better than this but it gets tricky. - - . Narrow it down to a subdirectory - - - Copy kernel that works into "test". Let's say that 3.62 works, - but 3.63 doesn't. So you diff -r those two kernels and come - up with a list of directories that changed. For each of those - directories: - - Copy the non-working directory next to the working directory - as "dir.63". - One directory at time, try moving the working directory to - "dir.62" and mv dir.63 dir"time, try - - mv dir dir.62 - mv dir.63 dir - find dir -name '*.[oa]' -print | xargs rm -f - - And then rebuild and retest. Assuming that all related - changes were contained in the sub directory, this should - isolate the change to a directory. - - Problems: changes in header files may have occurred; I've - found in my case that they were self explanatory - you may - or may not want to give up when that happens. - - . Narrow it down to a file - - - You can apply the same technique to each file in the directory, - hoping that the changes in that file are self contained. - - . Narrow it down to a routine - - - You can take the old file and the new file and manually create - a merged file that has - - #ifdef VER62 - routine() - { - ... - } - #else - routine() - { - ... - } - #endif - - And then walk through that file, one routine at a time and - prefix it with - - #define VER62 - /* both routines here */ - #undef VER62 - - Then recompile, retest, move the ifdefs until you find the one - that makes the difference. - -Finally, you take all the info that you have, kernel revisions, bug -description, the extent to which you have narrowed it down, and pass -that off to whomever you believe is the maintainer of that section. -A post to linux.dev.kernel isn't such a bad idea if you've done some -work to narrow it down. - -If you get it down to a routine, you'll probably get a fix in 24 hours. - -My apologies to Linus and the other kernel hackers for describing this -brute force approach, it's hardly what a kernel hacker would do. However, -it does work and it lets non-hackers help fix bugs. And it is cool -because Linux snapshots will let you do this - something that you can't -do with vendor supplied releases. - -Fixing the bug -============== - -Nobody is going to tell you how to fix bugs. Seriously. You need to work it -out. But below are some hints on how to use the tools. - -To debug a kernel, use objdump and look for the hex offset from the crash -output to find the valid line of code/assembler. Without debug symbols, you -will see the assembler code for the routine shown, but if your kernel has -debug symbols the C code will also be available. (Debug symbols can be enabled -in the kernel hacking menu of the menu configuration.) For example: - - objdump -r -S -l --disassemble net/dccp/ipv4.o - -NB.: you need to be at the top level of the kernel tree for this to pick up -your C files. - -If you don't have access to the code you can also debug on some crash dumps -e.g. crash dump output as shown by Dave Miller. - -> EIP is at ip_queue_xmit+0x14/0x4c0 -> ... -> Code: 44 24 04 e8 6f 05 00 00 e9 e8 fe ff ff 8d 76 00 8d bc 27 00 00 -> 00 00 55 57 56 53 81 ec bc 00 00 00 8b ac 24 d0 00 00 00 8b 5d 08 -> <8b> 83 3c 01 00 00 89 44 24 14 8b 45 28 85 c0 89 44 24 18 0f 85 -> -> Put the bytes into a "foo.s" file like this: -> -> .text -> .globl foo -> foo: -> .byte .... /* bytes from Code: part of OOPS dump */ -> -> Compile it with "gcc -c -o foo.o foo.s" then look at the output of -> "objdump --disassemble foo.o". -> -> Output: -> -> ip_queue_xmit: -> push %ebp -> push %edi -> push %esi -> push %ebx -> sub $0xbc, %esp -> mov 0xd0(%esp), %ebp ! %ebp = arg0 (skb) -> mov 0x8(%ebp), %ebx ! %ebx = skb->sk -> mov 0x13c(%ebx), %eax ! %eax = inet_sk(sk)->opt - -In addition, you can use GDB to figure out the exact file and line -number of the OOPS from the vmlinux file. If you have -CONFIG_DEBUG_INFO enabled, you can simply copy the EIP value from the -OOPS: - - EIP: 0060:[<c021e50e>] Not tainted VLI - -And use GDB to translate that to human-readable form: - - gdb vmlinux - (gdb) l *0xc021e50e - -If you don't have CONFIG_DEBUG_INFO enabled, you use the function -offset from the OOPS: - - EIP is at vt_ioctl+0xda8/0x1482 - -And recompile the kernel with CONFIG_DEBUG_INFO enabled: - - make vmlinux - gdb vmlinux - (gdb) p vt_ioctl - (gdb) l *(0x<address of vt_ioctl> + 0xda8) -or, as one command - (gdb) l *(vt_ioctl + 0xda8) - -If you have a call trace, such as :- ->Call Trace: -> [<ffffffff8802c8e9>] :jbd:log_wait_commit+0xa3/0xf5 -> [<ffffffff810482d9>] autoremove_wake_function+0x0/0x2e -> [<ffffffff8802770b>] :jbd:journal_stop+0x1be/0x1ee -> ... -this shows the problem in the :jbd: module. You can load that module in gdb -and list the relevant code. - gdb fs/jbd/jbd.ko - (gdb) p log_wait_commit - (gdb) l *(0x<address> + 0xa3) -or - (gdb) l *(log_wait_commit + 0xa3) - - -Another very useful option of the Kernel Hacking section in menuconfig is -Debug memory allocations. This will help you see whether data has been -initialised and not set before use etc. To see the values that get assigned -with this look at mm/slab.c and search for POISON_INUSE. When using this an -Oops will often show the poisoned data instead of zero which is the default. - -Once you have worked out a fix please submit it upstream. After all open -source is about sharing what you do and don't you want to be recognised for -your genius? - -Please do read Documentation/SubmittingPatches though to help your code get -accepted. |