Intel x86 VT virtualization (3): introduction to x64 multi-core code


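    Typically, for Windows kernel and VT testing, we install a VMware or VirtualBox virtual machine on our own physical machine, install Windows inside it, then install windbg on the physical machine and attach it to the virtual machine to debug the virtual machine's Windows kernel. For VT testing, VT must also be enabled for the virtual machine, which means nested virtualization. The overall architecture is as follows: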

       

  L0 = Code that runs on a physical host. Runs a hypervisor;

  L1 = L0’s hypervisor guest. Runs the hypervisor we want to debug;

  L2 = L1’s hypervisor guest;

  As this layering shows, windbg can debug both the guestOS code and the hostOS code;

 

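  The previous article used Zhou He's (周壑) VT framework. Its strengths are concise code and a clear structure, which suits beginners; its weaknesses are that it is 32-bit only, cannot run on 64-bit systems, and supports a single core. This time I recommend another framework, on GitHub at https://github.com/zhuhuibeishadiao , which contains two projects, miniVT64 and PFHook. Start with miniVT64, for the same reasons: simple logic, little code, easy to get into;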

   1. The very first run blue-screened. windbg showed the error type as the familiar C0000005, access violation, meaning memory could not be accessed;

     

  The instruction that faulted: invvpid

     

 To fully understand the cause and fix the bug, here is a brief look at what the invvpid instruction does. The key points:

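 (1) Intel's VPID (Virtual-Processor Identifier) is a 16-bit field. Each TLB entry is tagged with a VPID, which uniquely identifies a VCPU;

   (2) During virtual-to-physical address translation, a TLB entry can be used only if its VPID matches the VPID of the VCPU currently running;

   (3) Because the VPID tells which VCPU a TLB entry belongs to, existing TLB entries can be kept across virtual machine switches, which avoids needless TLB flushes;

   (4) The second operand of invvpid is called the descriptor. It is 128 bits in total: bits 0-15 hold the VPID, bits 16-63 are reserved and must be zero, and bits 64-127 hold a linear address, which only the individual-address invalidation type uses;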
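
The descriptor layout in point (4) can be written down as a C structure. This is only a sketch for illustration: the names `invvpid_type`, `invvpid_descriptor` and `make_invvpid_descriptor` are made up here and do not come from miniVT64. The type values are the ones the instruction reference lists for the register operand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Invalidation types passed in the register operand of INVVPID. */
enum invvpid_type {
    INVVPID_INDIVIDUAL_ADDRESS = 0,
    INVVPID_SINGLE_CONTEXT = 1,
    INVVPID_ALL_CONTEXTS = 2,   /* every VPID except 0000H */
    INVVPID_SINGLE_CONTEXT_RETAINING_GLOBALS = 3
};

/* The 128-bit descriptor that INVVPID reads from memory. */
struct invvpid_descriptor {
    uint16_t vpid;            /* bits 0-15                     */
    uint16_t reserved[3];     /* bits 16-63, must be zero      */
    uint64_t linear_address;  /* bits 64-127, used by type 0   */
};

static_assert(sizeof(struct invvpid_descriptor) == 16,
              "descriptor must be exactly 128 bits");
static_assert(offsetof(struct invvpid_descriptor, linear_address) == 8,
              "linear address must start at bit 64");

/* Build a descriptor with the reserved bits cleared. */
static struct invvpid_descriptor
make_invvpid_descriptor(uint16_t vpid, uint64_t linear_address)
{
    struct invvpid_descriptor d = {0};
    d.vpid = vpid;
    d.linear_address = linear_address;
    return d;
}
```

For the all-contexts type seen in this crash (rcx=2) the VPID and address fields are not used to select what gets invalidated, but the descriptor must still be readable memory.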

    

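     Back to the bug itself: the function takes two parameters, in rcx and rdx. In the context at the time of the crash, rcx=2, which means invalidate the address translations tagged with every VPID (except 0000H). Judging by the access violation, the problem must be the second parameter, the descriptor, because that is the operand that accesses memory;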

      

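  Back in windbg, dumping the memory at the address in rax with dq showed nothing wrong. That was odd: the contents of that address could be read, yet windbg reported an access violation. What was going on? Reading further in the instruction reference at https://www.felixcloutier.com/x86/invvpid turned up one important detail: a page fault while accessing the memory operand raises an exception. That explains why the instruction failed.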

 

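  By the time invvpid executes, VMX has been turned on and we are already in hostOS. But hostOS has only just started and has nothing in place: the VMCS is not set up, and neither are the segment registers, control registers, or GDT/IDT. It is starting from nothing, so if a page fault happens at this point there is no way to bring the missing page back, and the machine simply goes down. The second parameter of invvpid therefore has to live in non-paged memory, which is guaranteed never to be swapped out to disk;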

   

   The fixed code allocates a 128-bit (16-byte) block of non-paged memory and passes it in as the descriptor;

       

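  Even after entering the host, allocating memory and converting addresses to physical addresses (put bluntly, translating a virtual address to a physical one still relies on the page tables maintained by guestOS) all depend on guestOS APIs. At this stage the host is still an empty shell;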

  

 2. Just as single-stepping was going nicely, another problem showed up. The faulting instruction was xsaves [rcx];

  

  The call stack at the time of the fault:

     

  The faulting code in the driver:

    

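    This time the faulting code is in swapcontext, so the error most likely happened during a thread switch. Same approach as before, first look up what the instruction does: https://www.felixcloutier.com/x86/xsaves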

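 “Performs a full or partial save of processor state components to the XSAVE area located at the memory address specified by the destination operand”: in other words, it saves processor state components to the memory area named by the operand. Here that memory is [rcx], so first check whether the memory itself cannot be read or written. Judging from the output below, the region is fine;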

kd> dq ffffd40acb595cc0
ffffd40a`cb595cc0  00000000`00000000 00000000`00000000
ffffd40a`cb595cd0  00000000`00000000 00000000`00001f80
ffffd40a`cb595ce0  00000000`00000000 00000000`00000000
ffffd40a`cb595cf0  00000000`00000000 00000000`00000000
ffffd40a`cb595d00  00000000`00000000 00000000`00000000
ffffd40a`cb595d10  00000000`00000000 00000000`00000000
ffffd40a`cb595d20  00000000`00000000 00000000`00000000
ffffd40a`cb595d30  00000000`00000000 00000000`00000000
kd> r cr3
cr3=00000000001aa000
kd> !vtop 00000000001aa000 fffff800bf80734c
Amd64VtoP: Virt fffff800bf80734c, pagedir 00000000001aa000
Amd64VtoP: PML4E 00000000001aaf80
Amd64VtoP: PDPE 0000000001109010
Amd64VtoP: PDE 000000000110afe0
Amd64VtoP: PTE 0000000001095038
Amd64VtoP: Mapped phys 000000000220734c
Virtual address fffff800bf80734c translates to physical address 220734c.

  The log shows that the exception occurred only after entering guestOS. Given that, it is quite likely that xsaves triggered a vmexit that hostOS did not handle properly;

kd> g
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting control for msr 0x481
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (low): 0x00000016
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (high): 0x0000003f
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting control for msr 0x483
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (low): 0x00036dff
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (high): 0x003fffff
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting control for msr 0x484
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (low): 0x000011ff
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (high): 0x0000f3ff
FGP [VT] : [#0][IRQL=0x2](Virtualize): CPU: 0xFFFF8C02E51DFF60 
FGP [VT] : [#0][IRQL=0x2](Virtualize): rsp: 0xffffd40acb595ac8 
FGP [VT] : [#0][IRQL=0x2](ResumeGuest): Resuming guest...

  Reading on in the Intel manual, “Table 24-7. Definitions of Secondary Processor-Based VM-Execution Controls” contains the key information:

    

   If bit 20 (enable XSAVES/XRSTORS) is 0, any execution of xsaves in the guest causes a #UD (invalid-opcode exception);

        Back in the setupvmcs function, setting this bit to 1 in the value passed to vmwrite fixes it;
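
As a rough sketch of that fix, the requested control bits are normally combined with the capability MSR for the control field before the vmwrite. The low/high split follows the AdjustControls lines in the log above (low dword: bits that must be 1; high dword: bits that may be 1). The names `adjust_controls`, `xsaves_enabled` and `SECONDARY_EXEC_ENABLE_XSAVES` are illustrative assumptions, not identifiers taken from miniVT64, and the capability values used to exercise it are arbitrary examples rather than readings from real hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 20 of the secondary processor-based controls: enable XSAVES/XRSTORS. */
#define SECONDARY_EXEC_ENABLE_XSAVES (1u << 20)

/*
 * Combine the requested control bits with a VMX capability MSR value.
 * Low 32 bits of the MSR: bits that must be 1.
 * High 32 bits of the MSR: bits that are allowed to be 1.
 */
static uint32_t adjust_controls(uint32_t requested, uint64_t capability_msr)
{
    uint32_t must_be_one = (uint32_t)capability_msr;
    uint32_t may_be_one = (uint32_t)(capability_msr >> 32);

    requested |= must_be_one;   /* force the mandatory bits on        */
    requested &= may_be_one;    /* drop bits this CPU does not allow  */
    return requested;
}

/* Returns nonzero if the adjusted value still has the XSAVES bit set. */
static int xsaves_enabled(uint32_t secondary_controls)
{
    return (secondary_controls & SECONDARY_EXEC_ENABLE_XSAVES) != 0;
}
```

If the bit does not survive the adjustment, the processor does not allow XSAVES to be enabled for guests, and setting it anyway would make VM entry fail. The secondary controls are also consulted only when bit 31 of the primary processor-based controls is set.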

 

   3. Running further hit yet another bug. The log:

kd> g
FGP [VT] : [#0][IRQL=0x0](DriverEntry): Dirver is StartFGP [VT] : [#0][IRQL=0x0](DriverEntry): Dirver is Start
FGP [VT] : [#0][IRQL=0x0](VtStart): virtualizing 1 processors ...
FGP [VT] : [#0][IRQL=0x0](VtStart): Allocated g_cpus array @ 0xffff8c02e3f85370, size=0x8
FGP [VT] : [#0][IRQL=0x2](SetupVMX): VMXON region size: 0x0
FGP [VT] : [#0][IRQL=0x2](SetupVMX): VMX revision ID: 0x1
FGP [VT] : [#0][IRQL=0x2](SetupVMX): VMON 內存虛擬地址 ffffa20189ea0000
FGP [VT] : [#0][IRQL=0x2](SetupVMX): VMON 物理地址 7c0e8000
FGP [VT] : [#0][IRQL=0x2](SetupVMX): VMCS 內存虛擬地址 ffffa20189ea6000
FGP [VT] : [#0][IRQL=0x2](SetupVMX): VMCS 物理地址 7c085000
FGP [VT] : [#0][IRQL=0x2](SetupVMCS): GuestRsp=FFFFD40ACA21BB28
FGP [VT] : [#0][IRQL=0x2](SetupVMCS): VMCS PHYSICAL_ADDRESS 7c085000
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting control for msr 0x481
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (low): 0x00000016
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (high): 0x0000003f
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting control for msr 0x483
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (low): 0x00036dff
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (high): 0x003fffff
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting control for msr 0x484
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (low): 0x000011ff
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (high): 0x0000f3ff
FGP [VT] : [#0][IRQL=0x2](Virtualize): CPU: 0xFFFF8C02E2CE7F60 
FGP [VT] : [#0][IRQL=0x2](Virtualize): rsp: 0xffffd40aca21bac8 
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 16
FGP [VT] : [#0][IRQL=0x2](HandleRdtsc): vmx: HandleRdtsc(): rax = 0x0, rdx = 0x80000003
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 28
FGP [VT] : [#0][IRQL=0x2](HandleCrAccess): HandleCrAccess: pExitQualification->ControlRegister = 3
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 10
FGP [VT] : [#0][IRQL=0x2](HandleCpuid): vmx: HandleCpuid(): guest_rip = 0xfffff800bf62a4b4
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 10
FGP [VT] : [#0][IRQL=0x2](HandleCpuid): vmx: HandleCpuid(): guest_rip = 0xfffff800bf63bb39
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 10
FGP [VT] : [#0][IRQL=0x2](HandleCpuid): vmx: HandleCpuid(): guest_rip = 0xfffff800bf63bada
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 31
FGP [VT] : [#0][IRQL=0x2](HandleMsrRead): vmx: HandleMsrRead(): msr = 0x1b, Msr.LowPart = 0xfee00d00, Msr.HighPart = 0x0
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 31
FGP [VT] : [#0][IRQL=0x2](HandleMsrRead): vmx: HandleMsrRead(): msr = 0x1b, Msr.LowPart = 0xfee00d00, Msr.HighPart = 0x0
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 31
FGP [VT] : [#0][IRQL=0x2](HandleMsrRead): vmx: HandleMsrRead(): msr = 0x1b, Msr.LowPart = 0xfee00d00, Msr.HighPart = 0x0
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 31
FGP [VT] : [#0][IRQL=0x2](HandleMsrRead): vmx: HandleMsrRead(): msr = 0x40000105, Msr.LowPart = 0x0, Msr.HighPart = 0x80000000
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 32
FGP [VT] : [#0][IRQL=0x2](HandleMsrWrite): vmx: HandleMsrWrite(): msr = 0x40000100, rax = 0x7f, rdx = 0x0
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 32
FGP [VT] : [#0][IRQL=0x2](HandleMsrWrite): vmx: HandleMsrWrite(): msr = 0x40000101, rax = 0x8, rdx = 0x0
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 32
FGP [VT] : [#0][IRQL=0x2](HandleMsrWrite): vmx: HandleMsrWrite(): msr = 0x40000102, rax = 0xc184de70, rdx = 0xfffff800
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 32
FGP [VT] : [#0][IRQL=0x2](HandleMsrWrite): vmx: HandleMsrWrite(): msr = 0x40000103, rax = 0x10001f, rdx = 0x0
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 32
FGP [VT] : [#0][IRQL=0x2](HandleMsrWrite): vmx: HandleMsrWrite(): msr = 0x40000104, rax = 0xbfe2bc98, rdx = 0xfffff800
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 32
FGP [VT] : [#0][IRQL=0x2](HandleMsrWrite): vmx: HandleMsrWrite(): msr = 0x40000105, rax = 0x0, rdx = 0x80000000
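
The exit codes in this log are VM-exit basic reason numbers, and the handler names printed next to them show which is which. A small lookup, sketched here under the assumption that only these five reasons matter (the function name `vmexit_reason_name` is made up for illustration), makes such a log easier to read:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Map the basic exit reasons that appear in the log to readable names. */
static const char *vmexit_reason_name(uint32_t exit_reason)
{
    /* The basic reason lives in bits 15:0 of the exit-reason field. */
    switch (exit_reason & 0xffffu) {
    case 10: return "CPUID";
    case 16: return "RDTSC";
    case 28: return "control-register access";
    case 31: return "RDMSR";
    case 32: return "WRMSR";
    default: return "unknown";
    }
}
```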

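  This time the virtual machine froze and did not react to mouse clicks. windbg showed running but could not break in, so it seemed frozen as well. The last log line shows guestOS writing to MSR 0x40000105, so I searched on Google and found part of the explanation (https://git.qemu.org/?p=qemu.git;a=blob_plain;f=docs/hyperv.txt;hb=master):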

  “write to HV_X64_MSR_CRASH_CTL causes guest to shutdown. This effectively blocks crash dump generation by Windows”
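
To relate that quote to the log: MSR indices 0x40000100 through 0x40000104 are the Hyper-V guest crash parameter registers and 0x40000105 is the crash control register. The sketch below uses the constant names found in the Linux Hyper-V headers; the helper functions are hypothetical. It checks whether a logged WRMSR is a crash notification, which is the case for the last line of the log (rax = 0x0, rdx = 0x80000000 sets bit 63).

```c
#include <assert.h>
#include <stdint.h>

#define HV_X64_MSR_CRASH_P0       0x40000100u
#define HV_X64_MSR_CRASH_P4       0x40000104u
#define HV_X64_MSR_CRASH_CTL      0x40000105u
#define HV_CRASH_CTL_CRASH_NOTIFY (1ULL << 63)

/* WRMSR takes its 64-bit value split across edx:eax. */
static uint64_t msr_value(uint32_t eax, uint32_t edx)
{
    return ((uint64_t)edx << 32) | eax;
}

/* True for the five crash parameter registers P0..P4. */
static int is_crash_param_msr(uint32_t msr)
{
    return msr >= HV_X64_MSR_CRASH_P0 && msr <= HV_X64_MSR_CRASH_P4;
}

/* True if this write asks the hypervisor to record a guest crash. */
static int is_crash_notification(uint32_t msr, uint32_t eax, uint32_t edx)
{
    return msr == HV_X64_MSR_CRASH_CTL &&
           (msr_value(eax, edx) & HV_CRASH_CTL_CRASH_NOTIFY) != 0;
}
```

Read this way, the five writes to 0x40000100..0x40000104 just before it look like the guest filling in its crash parameters, which would mean guestOS was already crashing by the time it touched 0x40000105. That reading is an inference from the MSR numbers, not something the log states directly.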

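  Writing to MSR 0x40000105 makes the guest shut down, so the next step was to find out what made guestOS write to MSR 0x40000105 in the first place. After stepping through the code line by line and comparing it against other VT frameworks, I finally found the pitfalls in this miniVT code: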

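  • Interrupts are not disabled after entering the VMM. If an interrupt arrives at this point, the CPU jumps to the interrupt routine, because the VMCS has already set HOST_IDTR_BASE (which still points at guestOS's interrupt vector table). That unbalances the stack and scrambles the register values saved on it;
  • The guestOS register context is saved on the stack, and rsp is not saved correctly;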

       

   After switching to the way PFHook does it, everything worked;

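   4. (1) Key multi-core code: iterate over every core and give each core its own memory. Different cores must never share the same block of memory for saved data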

NTSTATUS StartVirtualTechnology()
{
    Asm_int3();    // breakpoint for the debugger
    KeInitializeMutex(&g_GlobalMutex,0);// initialize the mutex
    KeWaitForMutexObject(&g_GlobalMutex,Executive,KernelMode,FALSE,NULL);
    g_Pml4 = EptInitialization();

    for (int i = 0;i<KeNumberProcessors;i++)
    {
        // Pin the current thread to CPU i. Cast before shifting so the mask
        // stays correct for core indexes of 31 and above.
        KeSetSystemAffinityThread((KAFFINITY)1 << i);

        SetupVT(); // set up VT; each core gets its own memory for the VMXON and VMCS regions. Cores must never share a block, or the system blue-screens

        KeRevertToUserAffinityThread();// restore the thread's original affinity
    }

    KeReleaseMutex(&g_GlobalMutex, FALSE);

    KdPrint(("VT Engine has been loaded!\n"));

    return STATUS_SUCCESS;
}
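
The two ideas in this loop, one affinity bit per core and one private set of VMX regions per core, can be tried out in user space. The sketch below is only a model under stated assumptions: `affinity_mask_t` stands in for KAFFINITY on x64, `aligned_alloc` stands in for a kernel non-paged allocation, and every name in it is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define VMX_REGION_SIZE 4096u

typedef uint64_t affinity_mask_t;   /* stand-in for KAFFINITY on x64 */

/* One bit per core; the cast must happen before the shift. */
static affinity_mask_t affinity_for_cpu(unsigned cpu_index)
{
    assert(cpu_index < 64);
    return (affinity_mask_t)1 << cpu_index;
}

/* Each core owns its own page-aligned VMXON and VMCS regions. */
struct per_cpu_vmx {
    void *vmxon_region;
    void *vmcs_region;
};

static void free_per_cpu(struct per_cpu_vmx *cpus, unsigned count)
{
    if (cpus == NULL)
        return;
    for (unsigned i = 0; i < count; i++) {
        free(cpus[i].vmxon_region);   /* free(NULL) is a no-op */
        free(cpus[i].vmcs_region);
    }
    free(cpus);
}

static struct per_cpu_vmx *alloc_per_cpu(unsigned count)
{
    struct per_cpu_vmx *cpus = calloc(count, sizeof *cpus);
    if (cpus == NULL)
        return NULL;
    for (unsigned i = 0; i < count; i++) {
        cpus[i].vmxon_region = aligned_alloc(VMX_REGION_SIZE, VMX_REGION_SIZE);
        cpus[i].vmcs_region = aligned_alloc(VMX_REGION_SIZE, VMX_REGION_SIZE);
        if (cpus[i].vmxon_region == NULL || cpus[i].vmcs_region == NULL) {
            free_per_cpu(cpus, count);
            return NULL;
        }
        memset(cpus[i].vmxon_region, 0, VMX_REGION_SIZE);
        memset(cpus[i].vmcs_region, 0, VMX_REGION_SIZE);
    }
    return cpus;
}
```

With a plain `1 << i` the shift is done in a 32-bit int, so the mask goes wrong from core 31 upward; shifting a 64-bit value avoids that.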

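  (2) Points to note when setting up the VMCS: after vmlaunch the CPU runs as guestOS. Since the goal here is debugging, no extra code needs to run; execution goes straight back to the push EntryRflags below and continues as guestOS, which is what lets the driver finish loading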

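            Note that the general-purpose registers are saved not on the stack but in space set aside in the data segment, which keeps GuestRSP from being modified or corrupted;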

Asm_RunToVMCS Proc
    mov rax,[rsp]
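    mov GuestReturn,rax ; save the return address so that after vmlaunch the guest resumes the driver-loading code and the driver can finish loading
    
    call SetupVMCS    ; fills in the VMCS and then executes vmlaunch; execution resumes at push EntryRflags in Asm_SetupVMCS (now running as guestOS)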
    ret
Asm_RunToVMCS Endp

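Asm_SetupVMCS Proc        ; the first routine called from SetupVT
    cli                    ; disable interrupts so that no interrupt handler runs and corrupts the stack
    mov GuestRSP,rsp    ; after vmlaunch the guest's rsp starts from this value
    
    mov EntryRAX,rax    ; SetupVMCS clobbers registers, so save them first; the stack will change, so save them in the data segment instead of on the stack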
    mov EntryRCX,rcx
    mov EntryRDX,rdx
    mov EntryRBX,rbx
    mov EntryRSP,rsp
    mov EntryEBP,rbp
    mov EntryESI,rsi
    mov EntryRDI,rdi
    mov EntryR8,r8
    mov EntryR9,r9
    mov EntryR10,r10
    mov EntryR11,r11
    mov EntryR12,r12
    mov EntryR13,r13
    mov EntryR14,r14
    mov EntryR15,r15
    
    pushfq
    pop EntryRflags
    
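    call Asm_RunToVMCS    ; a detour whose only purpose is to capture the address of the next line; after vmlaunch the guest resumes there
    
    push EntryRflags    ; this line's address was stored in GuestReturn (see above); after vmlaunch the guest resumes here
    popfq
    mov rax,EntryRAX    ; restore the register values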
    mov rcx,EntryRCX
    mov rdx,EntryRDX
    mov rbx,EntryRBX
    mov rsp,EntryRSP
    mov rbp,EntryEBP
    mov rsi,EntryESI
    mov rdi,EntryRDI
    mov r8,EntryR8
    mov r9,EntryR9
    mov r10,EntryR10
    mov r11,EntryR11
    mov r12,EntryR12
    mov r13,EntryR13
    mov r14,EntryR14
    mov r15,EntryR15
    
    mov rsp,GuestRSP
    sti
    ret
Asm_SetupVMCS Endp

  (3) The complete code is at https://github.com/zhuhuibeishadiao . Read and debug miniVT64 first; once you are familiar with the framework and its flow, move on to debugging PFHook

 

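   Lessons learned:

   1. When you start debugging, set the virtual machine to a single processor with a single core. Otherwise several cores run different code at the same time, and while debugging execution seems to jump all over the place instead of proceeding in order.

   

   2. Do not print too much with DbgPrint. Printing on every MSR read or write, for example, floods the log and makes the virtual machine look frozen (windbg can in fact still break in, which shows the machine has not hung)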
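
One simple way to follow the second lesson is to log only every N-th occurrence of a frequent event. This is a minimal sketch, not code from miniVT64 or PFHook; `log_throttle` and `should_log` are hypothetical names, and in a multi-core hypervisor each core would need its own counter, since this one is not thread-safe.

```c
#include <assert.h>
#include <stdint.h>

/* Counts events and lets one in every `interval` through to the log. */
struct log_throttle {
    uint64_t seen;       /* events observed so far         */
    uint64_t interval;   /* log one event per this many    */
};

/* Returns nonzero when the current event should be printed. */
static int should_log(struct log_throttle *t)
{
    assert(t->interval != 0);
    return (t->seen++ % t->interval) == 0;
}
```

An MSR exit handler could then wrap its DbgPrint call in `if (should_log(&throttle))` so that 5000 exits with an interval of 1000 produce 5 log lines instead of 5000.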

References: 1. https://github.com/zhuhuibeishadiao  miniVT64 and PFHook code

      2. https://github.com/calware/HV-Playground  a collection of VT frameworks

      3. https://www.felixcloutier.com/x86/invvpid  invvpid instruction reference

      4. https://msrc-blog.microsoft.com/2018/12/10/first-steps-in-hyper-v-research/   First Steps in Hyper-V Research

