This post records a bug I ran into while working on Exercise 10 of MIT 6.828 Lab 1.
Problem description
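I set a breakpoint at the entry of i386_init and ran the kernel. When execution reaches memset(edata, 0, end - edata); the QEMU window prints the log below and hangs, and the GDB session terminates abnormally. Why?
The code is as follows: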
void i386_init(void)
{
    extern char edata[], end[];
    // Before doing anything else, complete the ELF loading process.
    // Clear the uninitialized global data (BSS) section of our program.
    // This ensures that all static/global variables start out zero.
    memset(edata, 0, end - edata);
    // Initialize the console.
    // Can't call cprintf until after we do this!
    cons_init();
    cprintf("6828 decimal is %o octal!\n", 6828);
    // Test the stack backtrace function (lab 1 only)
    test_backtrace(5);
    // Drop into the kernel monitor.
    while (1)
        monitor(NULL);
}
Error log printed in the QEMU window:
EAX=00000000 EBX=00000000 ECX=000001a9 EDX=00000000
ESI=00000000 EDI=f0113000 EBP=f010ffd8 ESP=f010ffcc
EIP=f010171b EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
CS =0008 00000000 ffffffff 00cf9a00 DPL=0 CS32 [-R-]
SS =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
DS =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
FS =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
GS =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT= 00007c4c 00000017
IDT= 00000000 000003ff
CR0=80010011 CR2=00000040 CR3=00112000 CR4=00000000
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
DR6=ffff0ff0 DR7=00000400
EFER=0000000000000000
Triple fault. Halting for inspection via QEMU monitor.
Error log printed in the GDB window:
Program received signal SIGTRAP, Trace/breakpoint trap.
The target architecture is assumed to be i386
=> 0xf010171b <memset+73>: Error while running hook_stop:
Cannot access memory at address 0xf010171b
0xf010171b in memset (
v=<error reading variable: Cannot access memory at address 0xf010ffd0>,
c=<error reading variable: Cannot access memory at address 0xf010ffd4>,
n=<error reading variable: Cannot access memory at address 0xf010ffd8>) at lib/string.c:131
1: $ebp = (void *) 0xf010ffd8
2: $esp = (void *) 0xf010ffcc
3: /x $eax = 0x0
4: /x $ebx = 0x0
5: $ecx = 488
6: $edx = 0
8: /x $edi = 0xf0112f04
9: /x $esi = 0x0
10: *0xf0111300@10 = <error: Cannot access memory at address 0xf0111300>
11: *0xf0112f00@10 = <error: Cannot access memory at address 0xf0112f00>
12: *0xf01136a0@10 = <error: Cannot access memory at address 0xf01136a0>
asm volatile("cld; rep stosl\n"
Debugging process
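- The assembly implementation of memset repeatedly executes the stosl instruction, writing 0 to the memory range 0xf0111300~0xf01136a4 four bytes at a time, which takes 2281 iterations in total. While debugging I found that the system faults with 488 iterations still to go (2281 - 488 = 1793 done), just as 0 is being written to address 0xf0112f04.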
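- I downloaded a clean copy of the code from the official site, rebuilt it, and ran it; it crashes in memset in the same way. Yet I remember that the code ran fine when I first downloaded it a long time ago, which is strange.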
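- With the memset line commented out, execution continues, but once it reaches monitor the QEMU window keeps printing garbage characters and "Unknown command" messages. Stepping through with gdb shows that readline returns data even though the user typed nothing; the data displays as garbage, so parsing it as a command fails with "Unknown command".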
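- In the afternoon I traced the readline and getchar code with gdb, down to the point where the IN instruction fetches the input data. All I could observe was that IN returns data even with no user input; I could not determine why. I suspected that commenting out memset left memory needed by the I/O code uninitialized, which in turn caused the errors. So I had to go back to finding out why memset fails.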
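- In the evening I decided to check whether only the initialization of address 0xf0112f04 is the problem, so I made memset skip that address. Sure enough, memset then succeeds, but the kernel crashes when it reaches monitor.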
memset(edata, 0, (char *) 0xf0112f04 - edata);
memset((char *) 0xf0112f08, 0, end - (char *) 0xf0112f08);
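The iteration arithmetic from the first step can be double-checked on the host with a small sketch. The function names stosl_count and edi_for_remaining are hypothetical helpers introduced here for illustration; the addresses and the ECX value are the ones quoted in the text and in the GDB display, and the sketch assumes EDI advances by 4 bytes per completed store.

```c
#include <assert.h>
#include <stdint.h>

/* Number of 4-byte stores "rep stosl" needs to cover [start, end). */
uint32_t stosl_count(uint32_t start, uint32_t end)
{
    assert(end >= start && (end - start) % 4 == 0);
    return (end - start) / 4;
}

/*
 * Address EDI points at while `ecx` stores are still outstanding,
 * assuming EDI moves forward 4 bytes after every completed store.
 */
uint32_t edi_for_remaining(uint32_t start, uint32_t end, uint32_t ecx)
{
    uint32_t total = stosl_count(start, end);

    assert(ecx <= total);
    return start + 4u * (total - ecx);
}
```

Calling stosl_count(0xf0111300, 0xf01136a4) gives 2281, and edi_for_remaining(0xf0111300, 0xf01136a4, 488) gives 0xf0112f04, matching $ecx and $edi in the GDB display. The same helper applied to ECX=000001a9 (425) from the QEMU dump gives 0xf0113000, which matches the EDI value printed there.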
- Later, while reading the code comments, I noticed that the purpose of the memset statement is to initialize the BSS section.
// Before doing anything else, complete the ELF loading process.
// Clear the uninitialized global data (BSS) section of our program.
// This ensures that all static/global variables start out zero.
memset(edata, 0, end - edata);
Running objdump -h obj/kern/kernel shows that the .bss section spans 0xf0113060~0xf01136a4, whereas the range we memset is 0xf0111300~0xf01136a4! So besides .bss, the memset also zeroes four other sections: .got, .got.plt, .data.rel.local, and .data.rel.ro.local.
Sections:
Idx Name               Size      VMA       LMA       File off  Algn
  5 .got               00000008  f0111300  00111300  00012300  2**2
                       CONTENTS, ALLOC, LOAD, DATA
  6 .got.plt           0000000c  f0111308  00111308  00012308  2**2
                       CONTENTS, ALLOC, LOAD, DATA
  7 .data.rel.local    00001000  f0112000  00112000  00013000  2**12
                       CONTENTS, ALLOC, LOAD, DATA
  8 .data.rel.ro.local 00000044  f0113000  00113000  00014000  2**2
                       CONTENTS, ALLOC, LOAD, DATA
  9 .bss               00000644  f0113060  00113060  00014044  2**5
                       ALLOC
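To make the overlap concrete, the listing above can be turned into a small host-side check. This is only a sketch: the struct and the helpers section_inside, count_covered, and section_end are hypothetical names introduced for illustration, and the table rows are copied by hand from the objdump output rather than parsed from the ELF file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct section {
    const char *name;
    uint32_t vma;   /* start address (VMA column) */
    uint32_t size;  /* size in bytes (Size column) */
};

/* Rows copied by hand from the objdump -h listing. */
static const struct section sections[] = {
    { ".got",               0xf0111300u, 0x00000008u },
    { ".got.plt",           0xf0111308u, 0x0000000cu },
    { ".data.rel.local",    0xf0112000u, 0x00001000u },
    { ".data.rel.ro.local", 0xf0113000u, 0x00000044u },
    { ".bss",               0xf0113060u, 0x00000644u },
};

#define NSECTIONS (sizeof sections / sizeof sections[0])

/* Nonzero if [vma, vma + size) lies entirely inside [start, end). */
int section_inside(const struct section *s, uint32_t start, uint32_t end)
{
    return s->vma >= start && s->vma + s->size <= end;
}

/* Count the listed sections that a memset over [start, end) fully covers. */
size_t count_covered(uint32_t start, uint32_t end)
{
    size_t n = 0;

    for (size_t i = 0; i < NSECTIONS; i++)
        if (section_inside(&sections[i], start, end))
            n++;
    return n;
}

/* First address past the named section, or 0 if the name is not listed. */
uint32_t section_end(const char *name)
{
    for (size_t i = 0; i < NSECTIONS; i++)
        if (strcmp(sections[i].name, name) == 0)
            return sections[i].vma + sections[i].size;
    return 0;
}
```

With these rows, count_covered(0xf0111300, 0xf01136a4) is 5 (all listed sections), count_covered(0xf0113060, 0xf01136a4) is 1 (.bss only), and section_end(".bss") is 0xf01136a4, the same upper bound the memset uses.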
- I tried changing the memset range to the .bss range (0xf0113060~0xf01136a4), and both memset and monitor then ran normally. I'll leave it here for now and come back to analyze it later.
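As a possible starting point for that later analysis, one observation can be read off the numbers already shown. It is offered as a hypothesis, not a confirmed diagnosis. The QEMU dump shows CR3=00112000, which is the physical base of the page directory, and .data.rel.local has LMA 00112000 (VMA f0112000), so the zeroed range appears to include the active page directory. The GDB display shows $edi = 0xf0112f04 with $ecx = 488, which would put the most recently completed store at 0xf0112f00 if EDI has already been advanced (an assumption). The sketch below assumes standard 32-bit x86 paging without PAE (CR4=00000000 in the dump), that is, 4-byte directory entries each covering 4 MB; the helper names pde_index_at and pde_first_va are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define VMA_MINUS_LMA  0xf0000000u  /* VMA - LMA, from the objdump listing */
#define PDE_SIZE       4u           /* bytes per page-directory entry */
#define PDE_SPAN_SHIFT 22           /* each entry covers 4 MB (no PAE) */

/*
 * Index of the page-directory entry stored at virtual address `va`,
 * assuming the 4 KB directory starts at physical address `cr3`.
 */
uint32_t pde_index_at(uint32_t va, uint32_t cr3)
{
    uint32_t pa = va - VMA_MINUS_LMA;

    assert(pa >= cr3 && pa < cr3 + 4096u);
    return (pa - cr3) / PDE_SIZE;
}

/* First virtual address translated through directory entry `idx`. */
uint32_t pde_first_va(uint32_t idx)
{
    assert(idx < 1024u);
    return idx << PDE_SPAN_SHIFT;
}
```

Here pde_index_at(0xf0112f00, 0x00112000) is 960 and pde_first_va(960) is 0xf0000000, the base of the kernel addresses seen throughout the logs. Zeroing that entry would be consistent with the "Cannot access memory" errors from GDB, but whether it is the actual cause still needs to be verified.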