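Compiling the kernel
Download the Linux kernel source and build it, producing both the compressed kernel image (/bak/linux/linux-2.6/arch/x86_64/boot/bzImage) and the uncompressed kernel ELF file used by gdb (/bak/linux/linux-2.6/vmlinux: ELF object file, symbols included, including debug info, which requires CONFIG_DEBUG_INFO=y).
Making the initrd
A kernel that boots with an initrd must be built with CONFIG_BLK_DEV_INITRD=y.
Building the initrd with busybox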
chmod +x initramfs/init
The initramfs/init file looks like this:
#!/bin/sh

#Create the directories this script needs (busybox's _install tree lacks them)
mkdir -p /proc /sys /dev /newroot

#Mount things needed by this script
mount -t proc proc /proc
mount -t sysfs sysfs /sys

#Disable kernel messages from popping onto the screen
echo 0 > /proc/sys/kernel/printk

#Clear the screen
clear

#Create all the symlinks to /bin/busybox
busybox --install -s

#Create device nodes
mknod /dev/null c 1 3
mknod /dev/tty c 5 0
mdev -s

#Function for parsing command line options with "=" in them
# get_opt("init=/sbin/init") will return "/sbin/init"
# (-f 2- keeps values that themselves contain "=", e.g. root=UUID=...)
get_opt() {
    echo "$@" | cut -d "=" -f 2-
}

#Defaults
init="/sbin/init"
root="/dev/hda1"

#Process command line options
for i in $(cat /proc/cmdline); do
    case $i in
        root\=*)
            root=$(get_opt $i)
            ;;
        init\=*)
            init=$(get_opt $i)
            ;;
    esac
done

#Mount the root device
mount "${root}" /newroot

#Check if $init exists and is executable ([ ] instead of the bash-only [[ ]])
if [ -x "/newroot/${init}" ] ; then
    #Unmount all other mounts so that the ram used by
    #the initramfs can be cleared after switch_root
    umount /sys /proc

    #Switch to the new root and execute init
    exec switch_root /newroot "${init}"
fi

#This will only be run if the exec above failed
echo "Failed to switch_root, dropping to a shell"
exec sh
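The option parsing done by get_opt and the case loop can be sketched in C as follows. This is a minimal illustration, not part of the original setup: the function name get_cmdline_opt is hypothetical. It returns everything after the first '=' so that values such as root=UUID=... stay intact, and, like the shell loop, a later occurrence of the same key overrides an earlier one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Find "key=value" in a kernel command line (as read from /proc/cmdline)
 * and copy everything after the first '=' into out (truncated to fit).
 * Returns 1 if the key was found, 0 otherwise. The last match wins.
 */
int get_cmdline_opt(const char *cmdline, const char *key, char *out, size_t outlen)
{
    size_t keylen = strlen(key);
    const char *p = cmdline;
    int found = 0;

    assert(cmdline && key && out && outlen > 0);
    while (*p) {
        /* Skip the separators between parameters. */
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;
        if (!*p)
            break;
        /* Find the end of this whitespace-delimited token. */
        const char *end = p;
        while (*end && *end != ' ' && *end != '\t' && *end != '\n')
            end++;
        size_t toklen = (size_t)(end - p);
        /* Match "key=" exactly, so "rootwait" does not match "root". */
        if (toklen > keylen && strncmp(p, key, keylen) == 0 && p[keylen] == '=') {
            size_t vlen = toklen - keylen - 1;
            if (vlen >= outlen)
                vlen = outlen - 1;
            memcpy(out, p + keylen + 1, vlen);
            out[vlen] = '\0';
            found = 1;
        }
        p = end;
    }
    return found;
}
```

Called with the -append string used later in this post, key "init" yields "sbin/init" and key "root" yields "/dev/sda".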
cd initramfs
find . | cpio -H newc -o > ../initramfs.cpio
cd ..
cat initramfs.cpio | gzip > initramfs.igz
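For reference, each member of the archive written by cpio -H newc starts with a fixed 110-byte ASCII header: the magic 070701 followed by 13 eight-digit hex fields (ino, mode, uid, gid, nlink, mtime, filesize, devmajor, devminor, rdevmajor, rdevminor, namesize, check). The NUL-terminated file name follows, padded so header plus name is a multiple of 4 bytes, then the file data (also padded to 4 bytes); the archive ends with an entry named TRAILER!!!. The sketch below writes one such header. The helper newc_header is hypothetical and leaves uid, gid, mtime and the device numbers at 0.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write one "newc" cpio header plus the NUL-terminated file name into buf,
 * padded to a 4-byte boundary. The file data (padded to 4 bytes) must
 * follow. Returns the number of bytes written, or 0 if buf is too small.
 */
size_t newc_header(char *buf, size_t buflen, const char *name,
                   unsigned long ino, unsigned long mode,
                   unsigned long nlink, unsigned long filesize)
{
    size_t namesize = strlen(name) + 1;        /* c_namesize counts the NUL */
    size_t total = 110 + namesize;             /* 6-byte magic + 13 x 8 hex digits */
    total = (total + 3) & ~(size_t)3;          /* pad header + name to 4 bytes */
    if (buflen < total + 1)                    /* +1 for snprintf's terminator */
        return 0;

    /* ino mode uid gid nlink mtime filesize devmajor devminor
       rdevmajor rdevminor namesize check */
    int n = snprintf(buf, buflen,
        "070701%08lX%08lX%08lX%08lX%08lX%08lX%08lX%08lX%08lX%08lX%08lX%08lX%08lX",
        ino, mode, 0UL, 0UL, nlink, 0UL, filesize,
        0UL, 0UL, 0UL, 0UL, (unsigned long)namesize, 0UL);
    assert(n == 110);
    memcpy(buf + 110, name, namesize);         /* name including its NUL */
    memset(buf + 110 + namesize, 0, total - 110 - namesize);
    return total;
}
```

For the init script (mode 0100755, i.e. 0x81ED) the header plus the 5-byte name "init" is 115 bytes, padded to 116.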
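However, the busybox-1.10.1-static.bz2 binary mentioned above apparently lacks ext2 support and could not recognize the ext2-formatted disk passed in through QEMU's -hda option, so in the end I built busybox-1.24.0 from source instead, with: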
CONFIG_MKFS_EXT2=y
Busybox Settings --->
Build Options --->
[*] Build BusyBox as a static binary (no shared libs) // build statically
make && make install
cp -avR /bak/linux/busybox-1.24.0/_install/* /bak/linux/initramfs/
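Loading the kernel with QEMU
wget http://www.nongnu.org/qemu/linux-0.2.img.bz2
sudo qemu-system-x86_64 -hda /bak/images/linux-0.2.img -hdb /bak/linux/disk.img -kernel /bak/linux/linux-2.6/arch/x86_64/boot/bzImage -initrd /bak/linux/initramfs.igz -append "root=/dev/sda init=sbin/init console=ttyS0" -nographic -smp 1,cores=1 -S -s
The parameters:
- -s opens GDB's debug port 1234 (shorthand for -gdb tcp::1234), and -S freezes the CPU at startup until GDB issues a (c)ontinue.
- console=ttyS0 together with -nographic means no graphical window is opened; the guest's serial console is used directly in the terminal where the command was typed.
- The root= and init= values in -append are parsed by the init script inside the initrd, so they must agree with that script.
- A QEMU built with --enable-debug automatically includes its own symbol table.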
Debugging the kernel with gdb
hua@node1:~$ sudo netstat -anp |grep 1234
tcp 0 0 0.0.0.0:1234 0.0.0.0:* LISTEN 24309/qemu-system-x
hua@node1:~$ /bak/java/gdb/bin/gdb /bak/linux/linux-2.6/vmlinux
...
(gdb) target remote localhost:1234
Remote debugging using localhost:1234
0x0000000000000000 in irq_stack_union ()
(gdb) b start_kernel
Breakpoint 1 at 0xffffffff81d66b09: file init/main.c, line 498.
(gdb) info registers
(gdb) bt
(gdb) c
(gdb) list
(gdb) set architecture
Requires an argument. Valid arguments are i386, i386:x86-64, i386:x64-32, i8086, i386:intel,i386:x86-64:intel, i386:x64-32:intel, auto.
(gdb) b sysrq_handle_crash
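Behind target remote localhost:1234, gdb talks to QEMU's stub using the GDB Remote Serial Protocol, in which every packet is framed as $payload#cs, where cs is the sum of the payload bytes modulo 256 written as two hex digits. A minimal sketch of that framing; rsp_frame is a hypothetical helper, and it does not escape the special characters $, #, } and *, which real payloads must.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Frame a Remote Serial Protocol packet: '$' + payload + '#' + two hex
 * digits holding the payload's byte sum modulo 256.
 * Returns the framed length, or 0 if out is too small.
 */
size_t rsp_frame(const char *payload, char *out, size_t outlen)
{
    size_t len;
    unsigned int sum = 0;

    assert(payload && out);
    len = strlen(payload);
    if (outlen < len + 5)              /* '$' + payload + '#' + 2 hex + NUL */
        return 0;
    for (size_t i = 0; i < len; i++)
        sum += (unsigned char)payload[i];
    snprintf(out, outlen, "$%s#%02x", payload, sum & 0xffu);
    return len + 4;
}
```

For instance, the "read all registers" request that info registers relies on is the one-character payload g, framed as $g#67, and the continue issued by c is $c#63.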
Debugging the kernel with Eclipse
1, Preferences -> General -> Workspace -> Build automatically (disable)
2, Import -> C/C++ -> Existing Code as Makefile Project
3, Create a debug launcher (Debug configurations -> C/C++ Remote Application)
Choose the GDB (DSF) Manual Remote Debugging Launcher
Main TAB -> C/C++ Application: point it at the actual uncompressed kernel, /bak/linux/linux-2.6/vmlinux
Main TAB -> Disable auto build
Debugger TAB -> Stop on startup at 'start_kernel'
Debugger TAB -> connection -> Host Name or IP Address -> = localhost
Debugger TAB -> connection -> Port number = 1234
Building gdb to fix the error "Remote 'g' packet reply is too long"
cd /bak/java && wget http://ftp.gnu.org/gnu/gdb/gdb-7.7.tar.gz
tar xzf gdb-7.7.tar.gz && cd gdb-7.7
Edit gdb/remote.c and, in the function process_g_packet, find the following code:
if (buf_len > 2 * rsa->sizeof_g_packet)
error (_("Remote 'g' packet reply is too long: %s"), rs->buf);
Replace those two lines with the code below, or simply comment them out and add nothing:
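if (buf_len > 2 * rsa->sizeof_g_packet)
  {
    /* The reply is longer than expected for this architecture (buf_len
       counts hex characters, two per register byte): accept it instead
       of raising an error.  */
    rsa->sizeof_g_packet = buf_len;
    for (i = 0; i < gdbarch_num_regs (gdbarch); i++)
      {
        if (rsa->regs[i].pnum == -1)
          continue;
        /* A register is in the 'g' packet only if its offset lies
           inside the (new) packet size.  */
        if (rsa->regs[i].offset >= rsa->sizeof_g_packet)
          rsa->regs[i].in_g_packet = 0;
        else
          rsa->regs[i].in_g_packet = 1;
      }
  }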
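What the original check compares: the payload of a 'g' reply carries every register byte as two hex characters, so gdb rejects a reply longer than twice the register-block size it expects for the current architecture. A commonly reported trigger is the guest CPU switching to 64-bit long mode after gdb has attached, which makes QEMU send a larger register block. The decoder below is a hypothetical sketch of that length check, not gdb's code; it ignores the "xx" marker gdb uses for unavailable registers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Decode a 'g' reply payload (two hex characters per register byte) into
 * regs. expected is the register-block size in bytes (what remote.c keeps
 * in rsa->sizeof_g_packet). Returns the number of bytes decoded, or -1 if
 * the reply has more than 2 * expected characters ("reply is too long")
 * or contains a non-hex character.
 */
long decode_g_reply(const char *reply, unsigned char *regs, size_t expected)
{
    size_t buf_len, n;

    assert(reply && regs);
    buf_len = strlen(reply);
    if (buf_len > 2 * expected)
        return -1;
    for (n = 0; n * 2 + 1 < buf_len; n++) {
        int hi = hexval(reply[2 * n]);
        int lo = hexval(reply[2 * n + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        regs[n] = (unsigned char)(hi << 4 | lo);
    }
    return (long)n;
}
```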
./configure --prefix=/bak/java/gdb && make && make install
Next, reconfigure Eclipse: choose "Run" -> "Debug Configurations…", switch to the "Main" tab under "Debugger" in the dialog, and change "GDB debugger:" to the newly built gdb (/bak/java/gdb/bin/gdb) instead of the default gdb.
Appendix 1: Building an index with cscope
1, Create cscope.files
LNX=/bak/linux/linux-2.6
cd /
find $LNX \
-path "$LNX/arch/*" ! -path "$LNX/arch/i386*" -prune -o \
-path "$LNX/include/asm-*" ! -path "$LNX/include/asm-i386*" -prune -o \
-path "$LNX/tmp*" -prune -o \
-path "$LNX/Documentation*" -prune -o \
-path "$LNX/scripts*" -prune -o \
-path "$LNX/drivers*" -prune -o \
-name "*.[chxsS]" -print >/bak/linux/linux-2.6/cscope/cscope.files
2, Build the index database
cd /bak/linux/linux-2.6/cscope
cscope -bkq -i cscope.files
3, Use the index database
cscope -d
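Appendix 2: The ELF format
ELF (Executable and Linking Format) is a container format for executables and their related data. Logically it is divided into many sections (list them with objdump -h or readelf -S), including:
- executable code & data (.text, .data, .bss, etc) # .text is executable code, .data holds initialized global data, .bss holds uninitialized data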
- symbol tables (.symtab)
- ELF string tables (.strtab, .shstrtab)
- debug information (.debug_info, .debug_line, .eh_frame, etc)
- metadata (.notes, .comment)
- dynamic linking information (.plt, .got, etc)
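All of these are located through the fixed ELF header at the start of the file, which readelf -h (part of readelf -e) prints. A minimal sketch of checking that header in C, using the Elf64_Ehdr type and constants from glibc's <elf.h>; check_elf64_le is a hypothetical helper, and reading e_shnum directly assumes the header's byte order matches the host, as it does for an x86_64 vmlinux read on x86_64.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Check that buf starts with a little-endian ELF64 header and report the
 * number of section headers and the index of the section-name string
 * table (.shstrtab). Returns 0 on success, -1 otherwise.
 */
int check_elf64_le(const unsigned char *buf, size_t len,
                   unsigned *shnum, unsigned *shstrndx)
{
    Elf64_Ehdr eh;

    assert(buf && shnum && shstrndx);
    if (len < sizeof eh)
        return -1;
    memcpy(&eh, buf, sizeof eh);                    /* avoid unaligned access */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0)   /* 7f 'E' 'L' 'F' */
        return -1;
    if (eh.e_ident[EI_CLASS] != ELFCLASS64 || eh.e_ident[EI_DATA] != ELFDATA2LSB)
        return -1;
    *shnum = eh.e_shnum;
    *shstrndx = eh.e_shstrndx;
    return 0;
}
```

Fed the first 64 bytes of the vmlinux examined below, it should report 44 section headers with the string table at index 41, the same values readelf -e prints.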
hua@node1:/bak/linux/linux-2.6$ readelf -n vmlinux
Displaying notes found at file offset 0x0094e5f8 with length 0x00000024:
Owner Data size Description
GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
Build ID: 8930dc42387f290d882a43eafffb3e6105dd4df0
hua@node1:/bak/linux/linux-2.6$ readelf -p .comment vmlinux
String dump of section '.comment':
[ 0] GCC: (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4
#readelf -S lists all section headers of the ELF image
hua@node1:/bak/linux/linux-2.6$ readelf -S vmlinux
There are 44 section headers, starting at offset 0xa081ae0:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .text PROGBITS ffffffff81000000 00200000
000000000074e5f8 0000000000000000 AX 0 0 4096
[ 2] .notes NOTE ffffffff8174e5f8 0094e5f8
0000000000000024 0000000000000000 AX 0 0 4
[ 3] __ex_table PROGBITS ffffffff8174e620 0094e620
0000000000002158 0000000000000000 A 0 0 8
[ 4] .rodata PROGBITS ffffffff81800000 00a00000
000000000033f9fe 0000000000000000 A 0 0 64
[ 5] __bug_table PROGBITS ffffffff81b3fa00 00d3fa00
00000000000072fc 0000000000000000 A 0 0 1
[ 6] .pci_fixup PROGBITS ffffffff81b46d00 00d46d00
0000000000003270 0000000000000000 A 0 0 8
[ 7] .builtin_fw PROGBITS ffffffff81b49f70 00d49f70
0000000000000120 0000000000000000 A 0 0 8
[ 8] .tracedata PROGBITS ffffffff81b4a090 00d4a090
0000000000000078 0000000000000000 A 0 0 1
[ 9] __ksymtab PROGBITS ffffffff81b4a110 00d4a110
00000000000118a0 0000000000000000 A 0 0 16
[10] __ksymtab_gpl PROGBITS ffffffff81b5b9b0 00d5b9b0
000000000000ecc0 0000000000000000 A 0 0 16
[11] __kcrctab PROGBITS ffffffff81b6a670 00d6a670
0000000000008c50 0000000000000000 A 0 0 8
[12] __kcrctab_gpl PROGBITS ffffffff81b732c0 00d732c0
0000000000007660 0000000000000000 A 0 0 8
[13] __ksymtab_strings PROGBITS ffffffff81b7a920 00d7a920
00000000000268c3 0000000000000000 A 0 0 1
[14] __init_rodata PROGBITS ffffffff81ba1200 00da1200
0000000000000240 0000000000000000 A 0 0 32
[15] __param PROGBITS ffffffff81ba1440 00da1440
00000000000025d0 0000000000000000 A 0 0 8
[16] __modver PROGBITS ffffffff81ba3a10 00da3a10
00000000000005f0 0000000000000000 A 0 0 8
[17] .data PROGBITS ffffffff81c00000 00e00000
0000000000144140 0000000000000000 WA 0 0 4096
[18] .vvar PROGBITS ffffffff81d45000 00f45000
0000000000001000 0000000000000000 WA 0 0 16
[19] .data..percpu PROGBITS 0000000000000000 01000000
000000000001f918 0000000000000000 WA 0 0 4096
[20] .init.text PROGBITS ffffffff81d66000 01166000
0000000000060879 0000000000000000 AX 0 0 16
[21] .init.data PROGBITS ffffffff81dc7000 011c7000
00000000000c2e90 0000000000000000 WA 0 0 4096
[22] .x86_cpu_dev.init PROGBITS ffffffff81e89e90 01289e90
0000000000000018 0000000000000000 A 0 0 8
[23] .altinstructions PROGBITS ffffffff81e89ea8 01289ea8
0000000000005f44 0000000000000000 A 0 0 1
[24] .altinstr_replace PROGBITS ffffffff81e8fdec 0128fdec
00000000000017db 0000000000000000 AX 0 0 1
[25] .iommu_table PROGBITS ffffffff81e915c8 012915c8
00000000000000f0 0000000000000000 A 0 0 8
[26] .apicdrivers PROGBITS ffffffff81e916b8 012916b8
0000000000000030 0000000000000000 WA 0 0 8
[27] .exit.text PROGBITS ffffffff81e916e8 012916e8
0000000000001e26 0000000000000000 AX 0 0 1
[28] .smp_locks PROGBITS ffffffff81e94000 01294000
0000000000007000 0000000000000000 A 0 0 4
[29] .data_nosave PROGBITS ffffffff81e9b000 0129b000
0000000000001000 0000000000000000 WA 0 0 4
[30] .bss NOBITS ffffffff81e9c000 0129c000
0000000000142000 0000000000000000 WA 0 0 4096
[31] .brk NOBITS ffffffff81fde000 0129c000
0000000000026000 0000000000000000 WA 0 0 1
[32] .comment PROGBITS 0000000000000000 0129c000
0000000000000029 0000000000000001 MS 0 0 1
[33] .debug_aranges PROGBITS 0000000000000000 0129c030
0000000000023880 0000000000000000 0 0 16
[34] .debug_info PROGBITS 0000000000000000 012bf8b0
000000000724bc4f 0000000000000000 0 0 1
[35] .debug_abbrev PROGBITS 0000000000000000 0850b4ff
00000000002d7be9 0000000000000000 0 0 1
[36] .debug_line PROGBITS 0000000000000000 087e30e8
000000000072232c 0000000000000000 0 0 1
[37] .debug_frame PROGBITS 0000000000000000 08f05418
00000000001f5cd0 0000000000000000 0 0 8
[38] .debug_str PROGBITS 0000000000000000 090fb0e8
00000000002b5264 0000000000000001 MS 0 0 1
[39] .debug_loc PROGBITS 0000000000000000 093b034c
0000000000925080 0000000000000000 0 0 1
[40] .debug_ranges PROGBITS 0000000000000000 09cd53d0
00000000003ac530 0000000000000000 0 0 16
[41] .shstrtab STRTAB 0000000000000000 0a081900
00000000000001dd 0000000000000000 0 0 1
[42] .symtab SYMTAB 0000000000000000 0a0825e0
00000000002490c0 0000000000000018 43 63525 8
[43] .strtab STRTAB 0000000000000000 0a2cb6a0
0000000000219339 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
#Show the debug-info sections
hua@node1:/bak/linux/linux-2.6$ readelf -S vmlinux |grep debug
[33] .debug_aranges PROGBITS 0000000000000000 0129c030
[34] .debug_info PROGBITS 0000000000000000 012bf8b0
[35] .debug_abbrev PROGBITS 0000000000000000 0850b4ff
[36] .debug_line PROGBITS 0000000000000000 087e30e8
[37] .debug_frame PROGBITS 0000000000000000 08f05418
[38] .debug_str PROGBITS 0000000000000000 090fb0e8
[39] .debug_loc PROGBITS 0000000000000000 093b034c
[40] .debug_ranges PROGBITS 0000000000000000 09cd53d0
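The .debug_* sections listed above hold DWARF data (see Appendix 3), which encodes most of its integers as variable-length LEB128 numbers: 7 value bits per byte, least-significant group first, with the high bit set on every byte except the last. A small decoder for the unsigned variant, as an illustrative sketch (read_uleb128 is a hypothetical name):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode an unsigned LEB128 number from p (at most len bytes).
 * Stores the number of bytes consumed in *used and returns the value;
 * on truncated input returns 0 with *used = 0.
 */
uint64_t read_uleb128(const unsigned char *p, size_t len, size_t *used)
{
    uint64_t result = 0;
    unsigned shift = 0;

    assert(p && used);
    for (size_t i = 0; i < len; i++) {
        if (shift < 64)                      /* ignore bits beyond 64 */
            result |= (uint64_t)(p[i] & 0x7f) << shift;
        shift += 7;
        if (!(p[i] & 0x80)) {                /* high bit clear: last byte */
            *used = i + 1;
            return result;
        }
    }
    *used = 0;                               /* input ended mid-number */
    return 0;
}
```

The bytes e5 8e 26 decode to 624485, and 80 01 to 128.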
#readelf -e prints all headers of the ELF image: the ELF header, the section headers and the program headers (segments)
hua@node1:/bak/linux/linux-2.6$ readelf -e vmlinux
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x1000000
Start of program headers: 64 (bytes into file)
Start of section headers: 168303328 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 5
Size of section headers: 64 (bytes)
Number of section headers: 44
Section header string table index: 41
Section Headers: (identical to the readelf -S output above, omitted)
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000200000 0xffffffff81000000 0x0000000001000000
0x0000000000ba4000 0x0000000000ba4000 R E 200000
LOAD 0x0000000000e00000 0xffffffff81c00000 0x0000000001c00000
0x0000000000146000 0x0000000000146000 RW 200000
LOAD 0x0000000001000000 0x0000000000000000 0x0000000001d46000
0x000000000001f918 0x000000000001f918 RW 200000
LOAD 0x0000000001166000 0xffffffff81d66000 0x0000000001d66000
0x0000000000136000 0x000000000029e000 RWE 200000
NOTE 0x000000000094e5f8 0xffffffff8174e5f8 0x000000000174e5f8
0x0000000000000024 0x0000000000000024 4
Section to Segment mapping:
Segment Sections...
00 .text .notes __ex_table .rodata __bug_table .pci_fixup .builtin_fw .tracedata __ksymtab __ksymtab_gpl __kcrctab __kcrctab_gpl __ksymtab_strings __init_rodata __param __modver
01 .data .vvar
02 .data..percpu
03 .init.text .init.data .x86_cpu_dev.init .altinstructions .altinstr_replacement .iommu_table .apicdrivers .exit.text .smp_locks .data_nosave .bss .brk
04 .notes
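Two relations in this output can be checked by hand. A section is listed under a LOAD segment when its address range falls inside the segment's [VirtAddr, VirtAddr + MemSiz) range, and for the kernel image PhysAddr = VirtAddr - 0xffffffff80000000 (__START_KERNEL_map on x86_64), which is why the entry point 0x1000000 is a physical address. The sketch below encodes both; the helper names are hypothetical, and the mapping rule is simplified compared with readelf's (it ignores file offsets and special cases such as TLS or empty sections).

```c
#include <assert.h>
#include <stdint.h>

/* Start of the kernel-image mapping on x86_64 (__START_KERNEL_map). */
#define START_KERNEL_MAP 0xffffffff80000000ULL

/* Does [addr, addr + size) lie inside the segment [vaddr, vaddr + memsz)? */
int section_in_segment(uint64_t addr, uint64_t size, uint64_t vaddr, uint64_t memsz)
{
    assert(vaddr + memsz >= vaddr);          /* segment must not wrap around */
    return addr >= vaddr && addr + size <= vaddr + memsz;
}

/* Physical load address of an address inside the kernel image. */
uint64_t kernel_image_pa(uint64_t va)
{
    assert(va >= START_KERNEL_MAP);
    return va - START_KERNEL_MAP;
}
```

For example, .brk (0xffffffff81fde000 + 0x26000) ends exactly where segment 03 ends (0xffffffff81d66000 + 0x29e000). The .data..percpu segment, linked at virtual address 0, is the one case where the subtraction does not apply.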
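Appendix 3: The DWARF format
DWARF (Debugging With Attributed Record Formats) is the debug-information format normally used together with ELF; it is a companion to ELF, not a synonym for it. Starting with gcc 4.8, DWARF version 4 is the default format (the corresponding Linux kernel option is CONFIG_DEBUG_INFO_DWARF4).
Appendix 4: Kernel debugging example 1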
[158108.522856] general protection fault: 0000 [#1] SMP
#Module information
[158108.531877] Modules linked in: dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag veth xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_crypt gpio_ich xfs x86_pkg_temp_thermal intel_powerclamp coretemp bridge kvm_intel stp kvm joydev llc mei_me mei shpchp lpc_ich ipmi_si acpi_power_meter acpi_pad mac_hid btrfs xor raid6_pq libcrc32c ses enclosure crct10dif_pclmul crc32_pclmul ixgbe igb aesni_intel aes_x86_64 hid_generic dca lrw gf128mul ptp glue_helper usbhid ablk_helper cryptd hid pps_core i2c_algo_bit megaraid_sas mdio wmi
#CPU 20, PID 0, command swapper/20 (the idle task of CPU 20), kernel version; then the hardware information
[158108.654066] CPU: 20 PID: 0 Comm: swapper/20 Not tainted 3.13.0-74-generic #118-Ubuntu
[158108.675000] Hardware name: Cisco Systems Inc UCSC-C240-M4SX/UCSC-C240-M4SX, BIOS C240M4.2.0.8b.0.080620151546 08/06/2015
#task: kernel address of the current task_struct (per-cpu variable current_task); ti: kernel address of current_thread_info
[158108.699921] task: ffff883f2653b000 ti: ffff883f26536000 task.ti: ffff883f26536000
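#Registers. On x86, %cr2 holds the most recent page-fault address; this is a general protection fault, not a page fault, so CR2 is unrelated here. RAX holds an invalid value: dead000000200200 is LIST_POISON2 on x86_64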
[158108.720992] RIP: 0010:[<ffffffff810756a4>] [<ffffffff810756a4>] detach_if_pending+0x34/0xb0
[158108.744725] RSP: 0018:ffff887f7f083d10 EFLAGS: 00010002
[158108.757586] RAX: dead000000200200 RBX: ffffffffa012f040 RCX: 0000000000001896
[158108.779778] RDX: ffff887f25d00938 RSI: ffff887f25eb8000 RDI: ffffffffa012f040
[158108.802864] RBP: ffff887f7f083d30 R08: 0000000000000086 R09: ffff887f25d74000
[158108.826882] R10: 0000000000000002 R11: 0000000000000005 R12: ffffffffa012f040
[158108.851851] R13: ffff887f25eb8000 R14: 0000000000000001 R15: 0000000000000001
[158108.877347] FS: 0000000000000000(0000) GS:ffff887f7f080000(0000) knlGS:0000000000000000
[158108.903997] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[158108.918882] CR2: 00000000006f0e58 CR3: 0000000001c0e000 CR4: 00000000001407e0
#Raw hex dump of the stack
[158108.943906] Stack:
[158108.954323] ffffffffa012f040 0000000000000000 ffff887f25eb8000 ffff883f22d7ea00
[158108.978987] ffff887f7f083d60 ffffffff81075766 0000000000000086 ffffffffa012f020
[158109.003697] ffff887f7f083d98 0000000000000100 ffff887f7f083d88 ffffffff81082369
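#Symbolic call-stack backtrace; together with %rip this is the most useful information. Each entry shows function+offset/size, and a question mark before an entry means the address was found on the stack but the unwinder cannot confirm it belongs to the real call chain.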
[158109.028467] Call Trace:
[158109.039181] <IRQ>
[158109.041507] [<ffffffff81075766>] del_timer+0x46/0x70
[158109.062562] [<ffffffff81082369>] try_to_grab_pending+0xa9/0x160
[158109.076953] [<ffffffff81082453>] mod_delayed_work_on+0x33/0x70
[158109.091233] [<ffffffffa012c3ba>] set_timeout+0x3a/0x40 [ib_addr]
[158109.105194] [<ffffffffa012c559>] netevent_callback+0x29/0x30 [ib_addr]
[158109.120083] [<ffffffff8173125c>] notifier_call_chain+0x4c/0x70
[158109.134153] [<ffffffff81634a60>] ? neigh_table_clear+0x120/0x120
[158109.148004] [<ffffffff817312ba>] atomic_notifier_call_chain+0x1a/0x20
[158109.162487] [<ffffffff8163100b>] call_netevent_notifiers+0x1b/0x20
[158109.176677] [<ffffffff81634b21>] neigh_timer_handler+0xc1/0x2c0
[158109.189976] [<ffffffff810745d6>] call_timer_fn+0x36/0x100
[158109.202723] [<ffffffff81634a60>] ? neigh_table_clear+0x120/0x120
[158109.216443] [<ffffffff8107556f>] run_timer_softirq+0x1ef/0x2f0
[158109.229444] [<ffffffff8106cd2c>] __do_softirq+0xec/0x2c0
[158109.241890] [<ffffffff8106d275>] irq_exit+0x105/0x110
[158109.253555] [<ffffffff81737b15>] smp_apic_timer_interrupt+0x45/0x60
[158109.266647] [<ffffffff8173649d>] apic_timer_interrupt+0x6d/0x80
[158109.279320] <EOI>
[158109.281647] [<ffffffff815d65b2>] ? cpuidle_enter_state+0x52/0xc0
[158109.300117] [<ffffffff815d66d9>] cpuidle_idle_call+0xb9/0x1f0
[158109.312100] [<ffffffff8101d3ee>] arch_cpu_idle+0xe/0x30
[158109.323777] [<ffffffff810bf475>] cpu_startup_entry+0xc5/0x290
[158109.335775] [<ffffffff810415ed>] start_secondary+0x21d/0x2d0
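#Raw bytes of the instruction stream around RIP; only useful once disassembled. The byte in <> is the first byte of the faulting instruction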
[158109.347654] Code: 89 e5 41 56 41 89 d6 41 55 41 54 49 89 fc 53 48 8b 17 48 85 d2 74 55 49 89 f5 0f 1f 44 00 00 49 8b 44 24 08 45 84 f6 48 89 42 08 <48> 89 10 74 08 49 c7 04 24 00 00 00 00 41 f6 44 24 18 01 48 b8
#Reprint of instruction pointer, current function, and stack pointer
[158109.386072] RIP [<ffffffff810756a4>] detach_if_pending+0x34/0xb0
[158109.398404] RSP <ffff887f7f083d10>
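The Code: line can be decoded mechanically: the kernel prints a fixed number of bytes before RIP and wraps the byte at RIP in angle brackets, so the marker's index tells you where the faulting instruction starts. A hypothetical parser, as a sketch:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse the hex bytes of an oops "Code:" line into out[] (at most max
 * bytes) and return the index of the byte shown in angle brackets, i.e.
 * the first byte of the instruction at RIP; -1 if there is no marker.
 * *count receives the number of bytes parsed.
 */
int parse_oops_code(const char *line, unsigned char *out, int max, int *count)
{
    const char *p = strstr(line, "Code:");
    int n = 0, marker = -1;

    assert(line && out && count && max > 0);
    p = p ? p + 5 : line;
    while (*p && n < max) {
        if (*p == ' ') {
            p++;
            continue;
        }
        int bracketed = (*p == '<');
        if (bracketed)
            p++;
        char *end;
        unsigned long v = strtoul(p, &end, 16);
        if (end == p || v > 0xff)
            break;                       /* not a hex byte: stop */
        if (bracketed)
            marker = n;
        out[n++] = (unsigned char)v;
        p = end;
        if (bracketed && *p == '>')
            p++;
    }
    *count = n;
    return marker;
}
```

On the Code: line above it finds 64 bytes with the marker at index 43, so the dump starts at RIP - 43 = 0xffffffff81075679, and the three bytes at RIP, 48 89 10, are the mov %rdx,(%rax) that objdump shows further down.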
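Use the kernel version and the RIP value above to find the corresponding source line (this needs a vmlinux with debug info for exactly that kernel):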
addr2line -e ddeb/vmlinux-3.13.0-74-generic 0xffffffff810756a4
linux-3.13.0/include/linux/list.h:89
static inline void __list_del(struct list_head * prev, struct list_head * next)
{
next->prev = prev;
prev->next = next; <<=== HERE
}
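The high-level C code above does not tell us much, so next find the corresponding assembly through the symbol in the ELF file (vmlinux; System.map can be used to look up the symbol's address). objdump takes the address range through options rather than as a positional argument:
% objdump -d -l --start-address=0xffffffff81075670 --stop-address=0xffffffff81075720 ddeb/vmlinux-3.13.0-74-generic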
[...]
ffffffff81075670 <detach_if_pending>:
[...]
detach_timer():
/build/linux-_xRakU/linux-3.13.0/kernel/timer.c:662
ffffffff81075698: 49 8b 44 24 08 mov 0x8(%r12),%rax
/build/linux-_xRakU/linux-3.13.0/kernel/timer.c:663
ffffffff8107569d: 45 84 f6 test %r14b,%r14b
__list_del():
/build/linux-_xRakU/linux-3.13.0/include/linux/list.h:88
ffffffff810756a0: 48 89 42 08 mov %rax,0x8(%rdx)
/build/linux-_xRakU/linux-3.13.0/include/linux/list.h:89
ffffffff810756a4: 48 89 10 mov %rdx,(%rax)
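The backtrace notation function+offset/size ties the oops to this listing: objdump shows detach_if_pending starting at 0xffffffff81075670, adding the offset 0x34 gives exactly the RIP 0xffffffff810756a4, and 0xb0 is the function's size. A sketch of splitting such an entry (parse_sym_off is a hypothetical helper):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Split a backtrace entry such as "detach_if_pending+0x34/0xb0" into the
 * symbol name, the offset inside the function and the function's size.
 * Trailing text such as " [ib_addr]" is ignored. Returns 0 on success.
 */
int parse_sym_off(const char *s, char *name, size_t namelen,
                  unsigned long long *off, unsigned long long *size)
{
    const char *plus;
    size_t len;

    assert(s && name && namelen > 0 && off && size);
    plus = strrchr(s, '+');
    if (!plus)
        return -1;
    len = (size_t)(plus - s);
    if (len >= namelen)
        return -1;                       /* name does not fit */
    if (sscanf(plus + 1, "%llx/%llx", off, size) != 2)
        return -1;
    memcpy(name, s, len);
    name[len] = '\0';
    return 0;
}
```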
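detach_if_pending -> detach_timer -> __list_del are inline functions nested into one another, so the faulting instruction (mov %rdx,(%rax), i.e. prev->next = next in __list_del) sits inside detach_if_pending. At that point RAX holds prev = entry->prev = dead000000200200, which is LIST_POISON2; detach_timer() writes that value into entry->prev when it removes a timer, so the timer's list entry had already been detached once. Detaching it again is what triggers the panic.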
static int detach_if_pending(struct timer_list *timer, struct tvec_base *base,
bool clear_pending)
{
if (!timer_pending(timer))
return 0;
detach_timer(timer, clear_pending); <== HERE
...
static inline int timer_pending(const struct timer_list * timer)
{
return timer->entry.next != NULL;
}
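The register values fit this code: RDX is entry->next and RAX is entry->prev, and dead000000200200 is LIST_POISON2 (0x00200200 + POISON_POINTER_DELTA, the latter being 0xdead000000000000 on x86_64). One way to reach this state: when detach_timer() runs with clear_pending false, entry.next is left non-NULL while entry.prev is poisoned, so timer_pending() still returns true and a later detach dereferences the poison. The poison is also a non-canonical address on x86_64, which is why the CPU raised a general protection fault rather than a page fault. The userspace model below reproduces these pointer updates; the model_* names are hypothetical, and it assumes a 64-bit host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* x86_64 values from include/linux/poison.h (CONFIG_ILLEGAL_POINTER_VALUE). */
#define POISON_POINTER_DELTA 0xdead000000000000ULL
#define LIST_POISON2_VAL     (0x00200200ULL + POISON_POINTER_DELTA)

struct list_head {
    struct list_head *next, *prev;
};

/* Same two stores as the kernel's __list_del(). */
static void model_list_del(struct list_head *prev, struct list_head *next)
{
    next->prev = prev;   /* list.h:88, mov %rax,0x8(%rdx) */
    prev->next = next;   /* list.h:89, mov %rdx,(%rax): faults if prev is poison */
}

/* Model of detach_timer() in 3.13 kernel/timer.c, debug hooks omitted. */
void model_detach_timer(struct list_head *entry, bool clear_pending)
{
    assert(entry->prev && entry->next);
    model_list_del(entry->prev, entry->next);
    if (clear_pending)
        entry->next = NULL;
    entry->prev = (struct list_head *)(uintptr_t)LIST_POISON2_VAL;
}

/* Same test as timer_pending(): only entry.next is looked at. */
bool model_timer_pending(const struct list_head *entry)
{
    return entry->next != NULL;
}

/* With 48-bit virtual addresses, bits 63..47 must all be equal. */
bool is_canonical48(uint64_t addr)
{
    uint64_t top = addr >> 47;
    return top == 0 || top == 0x1ffff;
}
```

Detaching a one-element list with clear_pending false leaves the timer looking pending with a poisoned prev, exactly the RAX value in the oops; calling model_detach_timer a second time would crash the same way.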
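Read the code together with the stack trace, then search the git log for the keywords gathered along the way to see whether the bug has already been fixed.
Appendix 5: Kernel debugging example 2 (memory)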
[3387282.901263] ceph-osd: page allocation failure: order:2, mode:0x4020
[3387282.901271] Pid: 10125, comm: ceph-osd Tainted: G C 3.2.0-51-generic #77-Ubuntu
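#The stack shows that, contrary to the first guess, the failure was not caused by ceph-osd: a network device (ixgbe) was allocating receive buffers in interrupt context
#order:2 above means an allocation of 2^2 = 4 contiguous pages (16 KB in total) for an MTU 9000 jumbo frame, but no contiguous 16 KB block could be found.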
[3387282.901274] Call Trace:
[3387282.901277] <IRQ> [<ffffffff8111e9a6>] warn_alloc_failed+0xf6/0x150
[3387282.901294] [<ffffffff815349ac>] ? sk_reset_timer+0x1c/0x30
[3387282.901301] [<ffffffff81599773>] ? tcp_send_delayed_ack+0xe3/0xf0
[3387282.901308] [<ffffffff8158d3c0>] ? __tcp_ack_snd_check+0x70/0xa0
[3387282.901314] [<ffffffff81122737>] __alloc_pages_nodemask+0x6d7/0x8f0
[3387282.901320] [<ffffffff8159d7bf>] ? tcp_v4_do_rcv+0xff/0x1d0
[3387282.901330] [<ffffffff8164bf15>] kmalloc_large_node+0x57/0x85
[3387282.901338] [<ffffffff81167bb5>] __kmalloc_node_track_caller+0x195/0x1e0
[3387282.901344] [<ffffffff81538a4b>] ? __alloc_skb+0x4b/0x240
[3387282.901349] [<ffffffff815390c4>] ? __netdev_alloc_skb+0x24/0x50
[3387282.901354] [<ffffffff81538a78>] __alloc_skb+0x78/0x240
[3387282.901359] [<ffffffff815390c4>] __netdev_alloc_skb+0x24/0x50
[3387282.901373] [<ffffffffa00a8909>] ixgbe_alloc_rx_buffers+0x289/0x350 [ixgbe]
[3387282.901380] [<ffffffff81546fc0>] ? napi_skb_finish+0x50/0x70
[3387282.901385] [<ffffffff815475f5>] ? napi_gro_receive+0xf5/0x140
[3387282.901393] [<ffffffffa00a91bb>] ixgbe_clean_rx_irq+0x7eb/0x8a0 [ixgbe]
[3387282.901401] [<ffffffffa00a99ee>] ixgbe_poll+0xae/0x1a0 [ixgbe]
[3387282.901406] [<ffffffff81547844>] net_rx_action+0x134/0x290
[3387282.901412] [<ffffffff8115d753>] ? isolate_migratepages+0x333/0x660
[3387282.901418] [<ffffffff8106f9e8>] __do_softirq+0xa8/0x210
[3387282.901425] [<ffffffff816606be>] ? _raw_spin_lock+0xe/0x20
[3387282.901432] [<ffffffff8166af6c>] call_softirq+0x1c/0x30
[3387282.901439] [<ffffffff810162f5>] do_softirq+0x65/0xa0
[3387282.901444] [<ffffffff8106fdce>] irq_exit+0x8e/0xb0
[3387282.901450] [<ffffffff8166b833>] do_IRQ+0x63/0xe0
[3387282.901455] [<ffffffff81660b6e>] common_interrupt+0x6e/0x6e
[3387282.901458] <EOI> [<ffffffff8115d753>] ? isolate_migratepages+0x333/0x660
[3387282.901467] [<ffffffff8115d74d>] ? isolate_migratepages+0x32d/0x660
[3387282.901472] [<ffffffff8115dadf>] compact_zone.part.14+0x5f/0x270
[3387282.901478] [<ffffffff8115ddd7>] compact_zone+0x37/0x50
[3387282.901482] [<ffffffff8115df63>] compact_zone_order+0x83/0xb0
[3387282.901488] [<ffffffff8115e05d>] try_to_compact_pages+0xcd/0x100
[3387282.901494] [<ffffffff8164b17e>] __alloc_pages_direct_compact+0xb2/0x178
[3387282.901500] [<ffffffff81122595>] __alloc_pages_nodemask+0x535/0x8f0
[3387282.901508] [<ffffffff8164bf15>] kmalloc_large_node+0x57/0x85
[3387282.901514] [<ffffffff81167bb5>] __kmalloc_node_track_caller+0x195/0x1e0
[3387282.901520] [<ffffffff81538a4b>] ? __alloc_skb+0x4b/0x240
[3387282.901526] [<ffffffff81589034>] ? sk_stream_alloc_skb+0x44/0x120
[3387282.901531] [<ffffffff81538a78>] __alloc_skb+0x78/0x240
[3387282.901536] [<ffffffff81589034>] sk_stream_alloc_skb+0x44/0x120
[3387282.901541] [<ffffffff81589518>] tcp_sendmsg+0x408/0xd90
[3387282.901548] [<ffffffff815af564>] inet_sendmsg+0x64/0xb0
[3387282.901554] [<ffffffff81057d15>] ? reweight_entity+0x165/0x180
[3387282.901562] [<ffffffff812d9837>] ? apparmor_socket_sendmsg+0x17/0x20
[3387282.901569] [<ffffffff8152e49e>] sock_sendmsg+0x10e/0x130
[3387282.901574] [<ffffffff8105725d>] ? set_next_entity+0xad/0xd0
[3387282.901580] [<ffffffff810573fa>] ? finish_task_switch+0x4a/0xf0
[3387282.901586] [<ffffffff8165e14c>] ? __schedule+0x3cc/0x6f0
[3387282.901591] [<ffffffff8165e79f>] ? schedule+0x3f/0x60
[3387282.901596] [<ffffffff8153c766>] ? verify_iovec+0x56/0xd0
[3387282.901602] [<ffffffff81530076>] ___sys_sendmsg+0x396/0x3b0
[3387282.901609] [<ffffffff8109fd16>] ? get_futex_key+0x166/0x2d0
[3387282.901614] [<ffffffff816606be>] ? _raw_spin_lock+0xe/0x20
[3387282.901619] [<ffffffff810a02f3>] ? futex_wake+0x113/0x130
[3387282.901624] [<ffffffff8109ff81>] ? futex_wait+0x1/0x210
[3387282.901630] [<ffffffff81532029>] __sys_sendmsg+0x49/0x90
[3387282.901636] [<ffffffff81532089>] sys_sendmsg+0x19/0x20
[3387282.901642] [<ffffffff81668d02>] system_call_fastpath+0x16/0x1b
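The order in the message is a buddy-allocator order: an order-n request asks for 2^n physically contiguous pages, with sizes rounded up the way the kernel's get_order() does. A sketch of that rounding, assuming 4 KB pages (order_for_size is a hypothetical name); a receive buffer for a 9000-byte frame is larger than 8 KB and therefore lands in order 2. For context, with the gfp.h values of 3.2-era kernels, mode:0x4020 decodes to __GFP_COMP | __GFP_HIGH, i.e. a GFP_ATOMIC compound-page request that cannot sleep to reclaim or compact memory.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12                 /* 4 KB pages, as on x86 */

/*
 * Smallest buddy order whose block ((4 KB << order) bytes) can hold size
 * bytes; this is the same rounding get_order() performs.
 */
unsigned int order_for_size(size_t size)
{
    unsigned int order = 0;

    assert(size > 0);
    while (((size_t)1 << (PAGE_SHIFT + order)) < size)
        order++;
    return order;
}
```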
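#Per-NUMA-node information; of limited use, and kernels >= 4.1 no longer print this part.
#DMA: reserved for ISA devices, physical addresses below 16MB
#DMA32: for 32-bit PCI devices, physical addresses below 4GB
#Normal: on x86_64, all remaining memory (above 4GB); on i686, 16MB -> 896MB
#HighMem: on i686, memory above 896MB, which the kernel can only access after mapping it temporarily into its own address space
#Counters other than these four zones, such as active_anon, are omitted.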
[3387282.901645] Mem-Info:
[3387282.901647] Node 0 DMA per-cpu:
[3387282.901651] CPU 0: hi: 0, btch: 1 usd: 0
[3387282.901654] CPU 1: hi: 0, btch: 1 usd: 0
[3387282.901657] CPU 2: hi: 0, btch: 1 usd: 0
[3387282.901660] CPU 3: hi: 0, btch: 1 usd: 0
[3387282.901663] CPU 4: hi: 0, btch: 1 usd: 0
[3387282.901666] CPU 5: hi: 0, btch: 1 usd: 0
[3387282.901669] CPU 6: hi: 0, btch: 1 usd: 0
[3387282.901672] CPU 7: hi: 0, btch: 1 usd: 0
[3387282.901675] CPU 8: hi: 0, btch: 1 usd: 0
[3387282.901677] CPU 9: hi: 0, btch: 1 usd: 0
[3387282.901680] CPU 10: hi: 0, btch: 1 usd: 0
[3387282.901683] CPU 11: hi: 0, btch: 1 usd: 0
[3387282.901686] CPU 12: hi: 0, btch: 1 usd: 0
[3387282.901689] CPU 13: hi: 0, btch: 1 usd: 0
[3387282.901692] CPU 14: hi: 0, btch: 1 usd: 0
[3387282.901695] CPU 15: hi: 0, btch: 1 usd: 0
[3387282.901697] Node 0 DMA32 per-cpu:
[3387282.901701] CPU 0: hi: 186, btch: 31 usd: 86
[3387282.901704] CPU 1: hi: 186, btch: 31 usd: 0
[3387282.901707] CPU 2: hi: 186, btch: 31 usd: 0
[3387282.901710] CPU 3: hi: 186, btch: 31 usd: 0
[3387282.901712] CPU 4: hi: 186, btch: 31 usd: 0
[3387282.901715] CPU 5: hi: 186, btch: 31 usd: 96
[3387282.901718] CPU 6: hi: 186, btch: 31 usd: 0
[3387282.901721] CPU 7: hi: 186, btch: 31 usd: 16
[3387282.901724] CPU 8: hi: 186, btch: 31 usd: 0
[3387282.901727] CPU 9: hi: 186, btch: 31 usd: 0
[3387282.901730] CPU 10: hi: 186, btch: 31 usd: 0
[3387282.901732] CPU 11: hi: 186, btch: 31 usd: 0
[3387282.901735] CPU 12: hi: 186, btch: 31 usd: 0
[3387282.901738] CPU 13: hi: 186, btch: 31 usd: 78
[3387282.901741] CPU 14: hi: 186, btch: 31 usd: 0
[3387282.901744] CPU 15: hi: 186, btch: 31 usd: 0
[3387282.901746] Node 0 Normal per-cpu:
[3387282.901750] CPU 0: hi: 186, btch: 31 usd: 162
[3387282.901753] CPU 1: hi: 186, btch: 31 usd: 29
[3387282.901756] CPU 2: hi: 186, btch: 31 usd: 40
[3387282.901759] CPU 3: hi: 186, btch: 31 usd: 42
[3387282.901762] CPU 4: hi: 186, btch: 31 usd: 42
[3387282.901765] CPU 5: hi: 186, btch: 31 usd: 221
[3387282.901768] CPU 6: hi: 186, btch: 31 usd: 37
[3387282.901771] CPU 7: hi: 186, btch: 31 usd: 182
[3387282.901774] CPU 8: hi: 186, btch: 31 usd: 0
[3387282.901777] CPU 9: hi: 186, btch: 31 usd: 0
[3387282.901780] CPU 10: hi: 186, btch: 31 usd: 29
[3387282.901783] CPU 11: hi: 186, btch: 31 usd: 22
[3387282.901786] CPU 12: hi: 186, btch: 31 usd: 0
[3387282.901789] CPU 13: hi: 186, btch: 31 usd: 156
[3387282.901792] CPU 14: hi: 186, btch: 31 usd: 6
[3387282.901795] CPU 15: hi: 186, btch: 31 usd: 0
[3387282.901802] active_anon:277242 inactive_anon:22700 isolated_anon:0
[3387282.901804] active_file:5468942 inactive_file:9468439 isolated_file:0
[3387282.901805] unevictable:0 dirty:95 writeback:0 unstable:0
[3387282.901807] free:103654 slab_reclaimable:700786 slab_unreclaimable:89064
[3387282.901808] mapped:3932 shmem:22 pagetables:3338 bounce:0
#Per-zone statistics (see also /proc/vmstat and /proc/zoneinfo). If free and slab_reclaimable were both low, physical memory would really be running out.
[3387282.901811] Node 0 DMA free:15896kB min:16kB low:20kB high:24kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15640kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[3387282.901825] lowmem_reserve[]: 0 1936 64432 64432
#free is large here, so the problem is not caused by a lack of physical memory
[3387282.901830] Node 0 DMA32 free:250560kB min:2028kB low:2532kB high:3040kB active_anon:12kB inactive_anon:116kB active_file:31272kB inactive_file:276136kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1982592kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:846404kB slab_unreclaimable:144884kB kernel_stack:3696kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[3387282.901845] lowmem_reserve[]: 0 0 62496 62496
[3387282.901850] Node 0 Normal free:148160kB min:65536kB low:81920kB high:98304kB active_anon:1108956kB inactive_anon:90684kB active_file:21844496kB inactive_file:37597620kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:63995904kB mlocked:0kB dirty:380kB writeback:0kB mapped:15724kB shmem:88kB slab_reclaimable:1956740kB slab_unreclaimable:211372kB kernel_stack:18960kB pagetables:13352kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[3387282.901864] lowmem_reserve[]: 0 0 0 0
[3387282.901869] Node 0 DMA: 0*4kB 1*8kB 1*16kB 0*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15896kB
[3387282.901883] Node 0 DMA32: 2010*4kB 2207*8kB 4168*16kB 2405*32kB 949*64kB 124*128kB 2*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 250560kB
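#Normal zone: almost all free memory sits in single 4kB pages, and 0*16kB means there is not a single free 16kB block, so memory is badly fragmented and the order-2 request fails here.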
[3387282.901897] Node 0 Normal: 36611*4kB 16*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 150668kB
[3387282.901917] 14937512 total pagecache pages
[3387282.901920] 8 pages in swap cache
[3387282.901923] Swap cache stats: add 250, delete 242, find 465/466
[3387282.901925] Free swap = 3902516kB
[3387282.901927] Total swap = 3903484kB
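The Normal zone line also explains why the allocation fails even though that zone still has one free 4096kB block. Before an order-n request is granted, the kernel's __zone_watermark_ok() subtracts the pages held in blocks smaller than the request and requires the remainder to stay above a minimum that is halved at each order. The following simplified model of that check plugs in the numbers from the log (36611 free 4kB pages out of 37667 in total, and min:65536kB = 16384 pages). It ignores the ALLOC_HIGH/ALLOC_HARDER discounts and lowmem_reserve, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ORDER 11

/* Total free pages in a buddy free list: nr_free[o] blocks of 2^o pages. */
long total_free_pages(const long nr_free[MAX_ORDER])
{
    long pages = 0;
    for (int o = 0; o < MAX_ORDER; o++)
        pages += nr_free[o] << o;
    return pages;
}

/*
 * Simplified watermark check: enough pages must remain free after the
 * request, and again after discarding each block order too small for it,
 * with the required minimum halved for every order discarded.
 */
bool watermark_ok(const long nr_free[MAX_ORDER], unsigned order, long min_pages)
{
    long free_pages;

    assert(order < MAX_ORDER);
    free_pages = total_free_pages(nr_free) - (1L << order) + 1;
    if (free_pages <= min_pages)
        return false;
    for (unsigned o = 0; o < order; o++) {
        free_pages -= nr_free[o] << o;   /* blocks too small for this request */
        min_pages >>= 1;
        if (free_pages <= min_pages)
            return false;
    }
    return true;
}
```

With these numbers an order-0 request passes but order 2 fails: only 1053 pages remain once the order-0 pages are excluded, below the halved minimum of 8192 pages. Applying the ALLOC_HIGH/ALLOC_HARDER discounts an atomic request gets lowers that minimum to 3072 pages, which still fails.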