ESR information printed on a kernel exception



 

<1>[ 7766.006249] Unhandled fault at 0xffffff800188d408
<1>[ 7766.006256] Mem abort info:
<1>[ 7766.006259]   ESR = 0x86000003
<1>[ 7766.006264]   Exception class = IABT (current EL), IL = 32 bits
<1>[ 7766.006268]   SET = 0, FnV = 0
<1>[ 7766.006271]   EA = 0, S1PTW = 0
<1>[ 7766.006277] swapper pgtable: 4k pages, 39-bit VAs, pgdp = 00000000352033d5
<1>[ 7766.006281] [ffffff800188d408] pgd=000000009d7fe003, pud=000000009d7fe003, pmd=00000000625c6003, pte=0040080063544793
<0>[ 7766.006294] Internal error: level 3 address size fault: 86000003 [#1] PREEMPT SMP

 

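Explanation of the ESR information

The ESR (Exception Syndrome Register (EL1)) value printed in the kernel exception above is 0x86000003. Consider the ESR_EL1 register bit assignment.

ESR_EL1 is a 64-bit register. Start with the EC (exception class) field, which occupies bits [31:26] and is 6 bits wide. IL is bit [25], and ISS occupies bits [24:0].

The meaning of ISS depends on the EC value.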

 

In this example the EC value is 0x21 (0b100001). According to the EC table, 0b100001 is an Instruction Abort, so the next step is to look at the ISS encoding for an Instruction Abort.

EC: 0b000000
Meaning: Unknown reason.
ISS: ISS encoding for exceptions with an unknown reason

EC: 0b000001
Meaning: Trapped WF* instruction execution. Conditional WF* instructions that fail their condition code check do not cause an exception.
ISS: ISS encoding for an exception from a WF* instruction

EC: 0b100001
Meaning: Instruction Abort taken without a change in Exception level. Used for MMU faults generated by instruction accesses and synchronous External aborts, including synchronous parity or ECC errors. Not used for debug-related exceptions.
ISS: ISS encoding for an exception from an Instruction Abort
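The field extraction described above can be sketched in a few lines of Python. This is only an illustration of the bit arithmetic; the function name `decode_esr` is made up for this example and is not a kernel API.

```python
def decode_esr(esr):
    """Split an ESR_EL1 value into its EC, IL and ISS fields."""
    return {
        "ec": (esr >> 26) & 0x3F,   # bits [31:26], exception class
        "il": (esr >> 25) & 0x1,    # bit [25], 1 means a 32-bit instruction
        "iss": esr & 0x1FFFFFF,     # bits [24:0], meaning depends on EC
    }


fields = decode_esr(0x86000003)  # ESR value from the log above
print(f"EC={fields['ec']:#x} IL={fields['il']} ISS={fields['iss']:#x}")
# prints "EC=0x21 IL=1 ISS=0x3"
```

EC = 0x21 selects the Instruction Abort entry of the EC table, and the remaining ISS value 0x3 is then interpreted with the Instruction Abort encoding.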

 

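The field that matters most is IFSC. Its values are explained in the table below. In this example IFSC is 3 (0b000011), which means "Address size fault, level 3" and matches the "level 3 address size fault" text in the Internal error line of the log.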

ISS encoding for an exception from an Instruction Abort

Bits     Field
[24:13]  RES0
[12:11]  SET
[10]     FnV
[9]      EA
[8]      RES0
[7]      S1PTW
[6]      RES0
[5:0]    IFSC

IFSC, bits [5:0]

Instruction Fault Status Code.

IFSC      Meaning
0b000000  Address size fault, level 0 of translation or translation table base register.
0b000001  Address size fault, level 1.
0b000010  Address size fault, level 2.
0b000011  Address size fault, level 3.
0b000100  Translation fault, level 0.
0b000101  Translation fault, level 1.
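As a rough sketch, the Instruction Abort ISS can be decoded the same way. The bit positions used here (SET in bits [12:11], FnV in bit [10], EA in bit [9], S1PTW in bit [7], IFSC in bits [5:0]) follow the ESR_EL1 documentation linked in this article; the names `IFSC_MEANING` and `decode_iabt_iss` are hypothetical, and the lookup table covers only the IFSC values excerpted above.

```python
# Only the IFSC encodings quoted in this article; other values are not covered.
IFSC_MEANING = {
    0b000000: "Address size fault, level 0 of translation or translation table base register",
    0b000001: "Address size fault, level 1",
    0b000010: "Address size fault, level 2",
    0b000011: "Address size fault, level 3",
    0b000100: "Translation fault, level 0",
    0b000101: "Translation fault, level 1",
}


def decode_iabt_iss(iss):
    """Decode the ISS field of an Instruction Abort (EC 0x20 or 0x21)."""
    return {
        "set": (iss >> 11) & 0x3,    # bits [12:11]
        "fnv": (iss >> 10) & 0x1,    # bit [10]
        "ea": (iss >> 9) & 0x1,      # bit [9]
        "s1ptw": (iss >> 7) & 0x1,   # bit [7]
        "ifsc": iss & 0x3F,          # bits [5:0]
    }


info = decode_iabt_iss(0x86000003 & 0x1FFFFFF)
print(info, IFSC_MEANING.get(info["ifsc"], "not in this excerpt"))
```

For the ESR in this log every flag decodes to 0 and IFSC to 3, which agrees with the "SET = 0, FnV = 0" and "EA = 0, S1PTW = 0" lines the kernel printed.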

 

The printed "IL = 32 bits" means that the instruction length is 32 bits, i.e. one instruction is 4 bytes long.

 

See the following link for the full description of the ESR_EL1 register:

https://developer.arm.com/documentation/ddi0595/2021-06/AArch64-Registers/ESR-EL1--Exception-Syndrome-Register--EL1-?lang=en#fieldset_0-24_0_14-5_0

 

On a kernel exception, the PGD/PUD/PMD/PTE entries for the current fault address are also printed:

<1>[ 7766.006281] [ffffff800188d408] pgd=000000009d7fe003, pud=000000009d7fe003, pmd=00000000625c6003, pte=0040080063544793

 

pgd=000000009d7fe003,
pud=000000009d7fe003,
pmd=00000000625c6003,
pte=0040080063544793

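This kernel exception (KE) occurred on an ARM64 machine with 2 GB of DRAM, so the PGD/PUD/PMD page table descriptor values look normal. The PTE page table descriptor value, however, is wrong: the physical address it encodes is 0x80063544000, whereas on a machine with 2 GB of DRAM the physical address should be below 0xFFFFFFFF. An out-of-range output address in the level 3 descriptor is consistent with the reported "level 3 address size fault".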
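The physical address can be read out of each descriptor with a simple mask. The sketch below assumes the 4 KB granule shown in the log ("4k pages") and a 48-bit output address held in descriptor bits [47:12]; `OA_MASK` and `output_address` are names invented for this example.

```python
OA_MASK = 0x0000_FFFF_FFFF_F000  # descriptor bits [47:12], assuming 4 KB pages


def output_address(desc):
    """Return the output (physical) address encoded in a table/page descriptor."""
    return desc & OA_MASK


entries = {
    "pgd": 0x000000009D7FE003,
    "pmd": 0x00000000625C6003,
    "pte": 0x0040080063544793,
}
for name, desc in entries.items():
    pa = output_address(desc)
    # bit_length() shows how many address bits the value needs
    print(f"{name}: pa={pa:#x} needs {pa.bit_length()} bits")
```

The pgd and pmd addresses fit in 32 bits, while the pte address 0x80063544000 needs 44 bits, which makes the corrupted entry easy to spot.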

 

The Code line in the kernel oops log

[  794.274311] Code: f946a2c9 12001eea 0b350157 9b1b2789 (39402529) 

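When an oops such as a data abort or instruction abort occurs in the kernel, the instruction that triggered it is printed together with the few instructions before it; the faulting instruction is the one in parentheses. This instruction can be used to locate the corresponding source code position.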
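A minimal sketch of pulling the instruction words out of such a Code line follows. It assumes the arm64 layout seen in this log, where each token is a 32-bit word in hex and the parenthesized token is the faulting one; `parse_code_line` is a hypothetical helper, not part of any kernel tool.

```python
import re


def parse_code_line(line):
    """Return (words_before_fault, faulting_word) from an oops 'Code:' line."""
    m = re.search(r"Code:\s*(.*)", line)
    if m is None:
        raise ValueError("not a Code: line")
    before, fault = [], None
    for tok in m.group(1).split():
        if tok.startswith("(") and tok.endswith(")"):
            fault = int(tok[1:-1], 16)   # the instruction at PC
        elif fault is None:
            before.append(int(tok, 16))  # instructions preceding PC
    return before, fault


line = "[  794.274311] Code: f946a2c9 12001eea 0b350157 9b1b2789 (39402529) "
before, fault = parse_code_line(line)
print([f"{w:08x}" for w in before], f"{fault:08x}")
```

The preceding words are worth keeping, because they help to disambiguate when the faulting encoding appears more than once in a function.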

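For example, if the oops occurred in some function of a ko, search the disassembly of that function for 39402529. The instruction and the ones before it are listed below, so llvm-symbolizer can be run directly with the address shown in front of the 39402529 instruction (0x227c88 here, not the instruction encoding itself) to locate the corresponding source code position:

llvm-symbolizer -e xxx.ko 0x227c88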

227c7c: 12001eea and w10, w23, #0xff
227c80: 0b350157 add w23, w10, w21, uxtb
227c84: 9b1b2789 madd x9, x28, x27, x9
227c88: 39402529 ldrb w9, [x9,#9]
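The search for the instruction in the disassembly can also be scripted. The sketch below assumes objdump-style lines of the form "address: encoding mnemonic ...", as in the listing above, and matches a run of consecutive instruction words ending at the faulting one; `find_sequence` and `DISASM` are names made up for this illustration.

```python
DISASM = """\
227c7c: 12001eea and w10, w23, #0xff
227c80: 0b350157 add w23, w10, w21, uxtb
227c84: 9b1b2789 madd x9, x28, x27, x9
227c88: 39402529 ldrb w9, [x9,#9]
"""


def find_sequence(disasm, words):
    """Return addresses where `words` occur consecutively, keyed by the last word."""
    rows = []
    for line in disasm.splitlines():
        parts = line.split()
        if len(parts) < 2 or not parts[0].endswith(":"):
            continue  # skip labels, blank lines and anything else
        try:
            rows.append((int(parts[0][:-1], 16), int(parts[1], 16)))
        except ValueError:
            continue  # not an "address: encoding" line
    n = len(words)
    hits = []
    for i in range(n - 1, len(rows)):
        if [w for _, w in rows[i - n + 1 : i + 1]] == words:
            hits.append(rows[i][0])
    return hits


hits = find_sequence(DISASM, [0x12001EEA, 0x0B350157, 0x9B1B2789, 0x39402529])
print([hex(a) for a in hits])
```

Each returned address is a candidate to pass to llvm-symbolizer; matching on several words instead of one reduces the chance of multiple hits.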

 

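Before doing this, compare the size of the function that PC points to with the size of the same function in your disassembly. If they are equal, the ko or vmlinux matches the image on which the problem occurred. For example, the function that PC points to below has a size of 0xb10: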

[  794.235944] XXX_OSD_WindowDestroy+0xb0/0xb10 [xxx.ko]
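The symbol, offset and size can be extracted from such an entry mechanically. This sketch assumes the "name+0xoffset/0xsize [module]" form shown above; `parse_symbol_entry` is a hypothetical helper, and the value 0xb10 passed as the disassembled size is a stand-in for whatever size you measure in your own ko or vmlinux.

```python
import re

SYMBOL_RE = re.compile(
    r"(\w+)\+0x([0-9a-fA-F]+)/0x([0-9a-fA-F]+)(?:\s+\[([^\]]+)\])?"
)


def parse_symbol_entry(entry):
    """Return (name, offset, size, module) from a 'sym+0xoff/0xsize [mod]' entry."""
    m = SYMBOL_RE.search(entry)
    if m is None:
        raise ValueError("no symbol+offset/size entry found")
    name, offset, size, module = m.groups()
    return name, int(offset, 16), int(size, 16), module


entry = "[  794.235944] XXX_OSD_WindowDestroy+0xb0/0xb10 [xxx.ko]"
name, offset, size, module = parse_symbol_entry(entry)

disasm_size = 0xB10  # assumption: size of the same function in your disassembly
print(name, hex(offset), hex(size), module, "match:", size == disasm_size)
```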

 

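When searching the disassembled function for the offending instruction, more than one match may turn up. In that case, either analyze the surrounding assembly to determine which one it is, or, once the size reported for the function that PC points to has been confirmed to equal the size of the disassembled function, add the offset to the function's base address and use the sum to locate the corresponding source code position. In the example above, the offset of PC within XXX_OSD_WindowDestroy() is 0xb0.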
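The base-plus-offset calculation is a single addition. In the sketch below the offset 0xb0 comes from the `+0xb0/0xb10` log entry, while the base address 0x227bd8 is an assumption: it is inferred as 0x227c88 - 0xb0 and does not appear in the listing above, so read the real base from your own disassembly. `fault_address` is a name invented for this example.

```python
def fault_address(func_base, offset):
    """Address of the faulting instruction inside the ko or vmlinux."""
    return func_base + offset


FUNC_BASE = 0x227BD8  # assumption: start of XXX_OSD_WindowDestroy in the disassembly
OFFSET = 0xB0         # from "XXX_OSD_WindowDestroy+0xb0/0xb10"

addr = fault_address(FUNC_BASE, OFFSET)
# Build the command line to run by hand; this script does not execute it.
print(f"llvm-symbolizer -e xxx.ko {addr:#x}")
```

Under that assumption the sum is 0x227c88, the same address found by searching for the 39402529 encoding, so the two methods cross-check each other.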

 

