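Virtual memory is a technique used by virtually every modern operating system.
The basic idea is that each process has its own independent logical address space. That space is divided into equal-sized blocks called pages, and each page is a contiguous range of addresses. From the process's point of view there appears to be a large amount of memory. Some of its pages correspond to a block of physical memory (called a page frame; pages and page frames are normally the same size), while pages that are not currently loaded live on disk. Introducing per-process logical addresses separates the process address space from the actual storage space, which makes memory management more flexible.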
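The two basic concepts, address space and storage space, are defined as follows:
Address space: the object program produced by compiling a source program exists within a bounded range of addresses, and this range is called the address space. An address space is a set of logical addresses.
Storage space: the set of physical units in main memory that store information. The numbers identifying these units are called physical addresses, so a storage space is a set of physical addresses.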
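Three management schemes derive from this:
paged memory management, segmented memory management, and combined segmented-paged memory management. This article mainly covers paging.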
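In a paged system, when a process is created the operating system allocates page frames for all of the process's pages, and when the process is torn down it reclaims every frame that was given to it. If processes may request memory dynamically while running, the operating system must also allocate physical page frames for those requests. To do all this, the operating system has to track which page frames in system memory are actually in use. On a process switch it must also correctly switch the mapping from address space to physical memory between the two processes. To see how the operating system meets these requirements, we first need to understand page tables. Start with a figure, reproduced from 51CTO: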
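An entry in the page table is called a page table entry (PTE). Each page table entry records the mapping from one range of virtual addresses to physical addresses.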
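To make the role of a page table entry concrete, here is a minimal sketch of a single-level lookup. It assumes 4 KiB pages and a toy 16-page address space; `PTE_PRESENT`, `map_page` and `translate` are hypothetical names used only for illustration and do not come from OS161.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE   4096u   /* assumed 4 KiB pages */
#define PAGE_SHIFT  12
#define NUM_PAGES   16u     /* toy address space: 16 virtual pages */
#define PTE_PRESENT 0x1u    /* hypothetical "this mapping is valid" bit */

/* One entry per virtual page: frame address in the high bits, flags low. */
static uint32_t page_table[NUM_PAGES];

/* Record that virtual page `vpn` lives in physical frame number `pfn`. */
void map_page(uint32_t vpn, uint32_t pfn)
{
    assert(vpn < NUM_PAGES);
    page_table[vpn] = (pfn << PAGE_SHIFT) | PTE_PRESENT;
}

/*
 * Translate a virtual address. Returns 0 and stores the result in *paddr,
 * or returns -1 when the page has no valid entry (a real system would
 * raise a page fault at this point).
 */
int translate(uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;          /* which page */
    uint32_t offset = vaddr & (PAGE_SIZE - 1);   /* where inside the page */

    if (vpn >= NUM_PAGES || !(page_table[vpn] & PTE_PRESENT)) {
        return -1;
    }
    *paddr = (page_table[vpn] & ~(PAGE_SIZE - 1)) | offset;
    return 0;
}
```

After `map_page(2, 7)`, translating `0x2abc` gives `0x7abc`: the page number 2 is replaced by frame number 7 and the offset `0xabc` is carried over unchanged.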
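Because the page table itself is stored in memory, every memory read a program performs now costs at least two memory accesses: one to fetch the page table entry and one to reach the data. Without an MMU there would be only one. (MMU stands for Memory Management Unit. It is a hardware unit integrated into the CPU whose main job is to translate virtual addresses into physical addresses, which also gives software a hardware-backed memory protection mechanism.) This is a substantial loss of efficiency, and it becomes even more noticeable when the memory itself is slow. How to keep the benefits of the MMU while minimizing this overhead therefore became a pressing problem.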
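This is where the TLB comes in. The TLB, or translation lookaside buffer, is in practice a small set of relocation registers inside the MMU that temporarily holds translation data. Since the TLB is essentially a set of registers, it is easy to see that reading it is much faster than reading the page table in memory. If the entire page table could be kept in the TLB, the efficiency problem would be solved.
However, manufacturing cost and other constraints make it practically impossible to keep every page table in the TLB. The best we can do is keep a subset of the most frequently used page table entries in the limited-capacity TLB, which improves the MMU's efficiency to some degree.
The reason this works is the principle of locality of memory reference: while a program runs, it is more likely to access addresses close to the ones it has just accessed. In theory, then, the TLB holds most of the page table entries needed during the current time window, and that greatly improves the MMU's efficiency.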
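The effect of locality can be seen with a small simulation. This is only a sketch under stated assumptions: a four-entry, fully associative TLB with round-robin replacement and 4 KiB pages. The type `tlb_sim` and both functions are hypothetical and are unrelated to the OS161 TLB interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_SIZE   4u   /* deliberately tiny, for illustration only */
#define PAGE_SHIFT 12   /* assumed 4 KiB pages */

struct tlb_sim {
    uint32_t vpn[TLB_SIZE];    /* cached virtual page numbers */
    int      valid[TLB_SIZE];
    unsigned next;             /* round-robin replacement cursor */
    unsigned hits, misses;
};

/* Look up the page containing vaddr; on a miss, install it round-robin. */
void tlb_access(struct tlb_sim *t, uint32_t vaddr)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    unsigned i;

    assert(t != NULL);
    for (i = 0; i < TLB_SIZE; i++) {
        if (t->valid[i] && t->vpn[i] == vpn) {
            t->hits++;
            return;
        }
    }
    t->misses++;
    t->vpn[t->next] = vpn;
    t->valid[t->next] = 1;
    t->next = (t->next + 1) % TLB_SIZE;
}

/* Touch `count` consecutive 4-byte words starting at `start`. */
void sequential_scan(struct tlb_sim *t, uint32_t start, unsigned count)
{
    unsigned i;

    for (i = 0; i < count; i++) {
        tlb_access(t, start + 4u * i);
    }
}
```

Scanning 2048 consecutive words from a page-aligned address touches exactly two pages, so a zero-initialized `tlb_sim` records 2 misses and 2046 hits: only the first access to each page has to go to the page table.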
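Here we use a two-level page table. That means address translation is a two-step lookup: the virtual address first indexes an entry in the first table, the contents of that entry are used to search a second table, and only then is the physical address determined. The first table is called the first-level page table and the second is called the second-level page table. The main purpose of the two-step lookup is to reduce the memory occupied by the page tables themselves. The drawback is that it further slows down address translation.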
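As a sketch of the two-step lookup, the function below splits a 32-bit virtual address into a first-level index, a second-level index and a page offset. The 10/10/12 split and the mask values are assumptions: the names `TOP_TEN` and `MID_TEN` appear in the fault handler further down, but their definitions are not shown in this article. `LOW_12`, `va_parts`, `split_vaddr` and `two_level_bytes` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed masks for a 10/10/12 split of a 32-bit address. */
#define TOP_TEN 0xffc00000u   /* bits 31..22: first-level index  */
#define MID_TEN 0x003ff000u   /* bits 21..12: second-level index */
#define LOW_12  0x00000fffu   /* bits 11..0: offset (hypothetical name) */

struct va_parts {
    uint32_t index1;   /* slot in the first-level table  */
    uint32_t index2;   /* slot in the second-level table */
    uint32_t offset;   /* byte offset within the page    */
};

struct va_parts split_vaddr(uint32_t vaddr)
{
    struct va_parts p;

    p.index1 = (vaddr & TOP_TEN) >> 22;
    p.index2 = (vaddr & MID_TEN) >> 12;
    p.offset = vaddr & LOW_12;
    return p;
}

/*
 * Bytes of page table needed to map `npages` contiguous pages that start
 * on a 4 MiB boundary: one first-level table plus one second-level table
 * per 1024 pages, each table holding 1024 four-byte entries.
 */
uint32_t two_level_bytes(uint32_t npages)
{
    uint32_t second_level_tables;

    assert(npages <= 1024u * 1024u);
    second_level_tables = (npages + 1023u) / 1024u;
    return 4096u + second_level_tables * 4096u;
}
```

For `0x00403abc` this yields `index1 = 1`, `index2 = 3` and `offset = 0xabc`. The space saving is visible in `two_level_bytes`: mapping 1024 pages costs 8 KiB of tables, while a flat table covering all 2^20 pages with 4-byte entries would cost 4 MiB regardless of how much is mapped.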
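That is enough background. Now for the real content: implementing VM on OS161, a teaching operating system developed at Harvard University. OS161 runs on MIPS-I hardware.
The code is on GitHub: https://github.com/tian-jiang/OS161-VirtualMemory
First, look at a piece of code from kern/arch/mips/include/vm.h, which defines the memory layout: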
/*
 * MIPS-I hardwired memory layout:
 *    0xc0000000 - 0xffffffff   kseg2 (kernel, tlb-mapped)
 *    0xa0000000 - 0xbfffffff   kseg1 (kernel, unmapped, uncached)
 *    0x80000000 - 0x9fffffff   kseg0 (kernel, unmapped, cached)
 *    0x00000000 - 0x7fffffff   kuseg (user, tlb-mapped)
 *
 * (mips32 is a little different)
 */

#define MIPS_KUSEG  0x00000000
#define MIPS_KSEG0  0x80000000
#define MIPS_KSEG1  0xa0000000
#define MIPS_KSEG2  0xc0000000
The memory layout can be drawn as a figure:
Figure: memory allocation in OS161.
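Because kseg0 is an unmapped window onto low physical memory, the kernel can move between a physical address and a kernel virtual address with plain arithmetic. The sketch below illustrates that, using the segment bases quoted above. The lowercase function names and the `mips_segment` enum are hypothetical stand-ins; the article's code uses the macros `PADDR_TO_KVADDR` and `KVADDR_TO_PADDR`, whose definitions are not shown here.

```c
#include <assert.h>
#include <stdint.h>

#define MIPS_KSEG0 0x80000000u
#define MIPS_KSEG1 0xa0000000u
#define MIPS_KSEG2 0xc0000000u

typedef uint32_t paddr_t;
typedef uint32_t vaddr_t;

enum mips_segment { SEG_KUSEG, SEG_KSEG0, SEG_KSEG1, SEG_KSEG2 };

/* Classify a virtual address by the hardwired segment it falls in. */
enum mips_segment segment_of(vaddr_t vaddr)
{
    if (vaddr >= MIPS_KSEG2) return SEG_KSEG2;
    if (vaddr >= MIPS_KSEG1) return SEG_KSEG1;
    if (vaddr >= MIPS_KSEG0) return SEG_KSEG0;
    return SEG_KUSEG;
}

/* Physical address -> kernel virtual address in kseg0. */
vaddr_t paddr_to_kvaddr(paddr_t paddr)
{
    assert(paddr < MIPS_KSEG1 - MIPS_KSEG0);   /* must fit in the 512 MiB window */
    return paddr + MIPS_KSEG0;
}

/* Kernel virtual address in kseg0 -> physical address. */
paddr_t kvaddr_to_paddr(vaddr_t vaddr)
{
    assert(segment_of(vaddr) == SEG_KSEG0);
    return vaddr - MIPS_KSEG0;
}
```

For example, physical address `0x1000` is visible to the kernel at `0x80001000`, and the kernel address `0x80031000` refers to physical address `0x31000`. No TLB entry is involved in either direction, which is why the kernel can use this window before the VM system is running.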
Let's start from the beginning: kern/startup/main.c
/* Early initialization. */
ram_bootstrap();
.......

/* Late phase of initialization. */
vm_bootstrap();
........
When the operating system boots, it calls ram_bootstrap() and vm_bootstrap() to start the VM management module. Where are these two functions declared and used? Read on.
kern/include/vm.h and kern/arch/mips/include/vm.h
/* Initialization function */
void vm_bootstrap(void);
......
/* Allocate/free kernel heap pages (called by kmalloc/kfree) */
void frametable_bootstrap(void);
/*
 * Interface to the low-level module that looks after the amount of
 * physical memory we have.
 *
 * ram_getsize returns the lowest valid physical address, and one past
 * the highest valid physical address. (Both are page-aligned.) This
 * is the memory that is available for use during operation, and
 * excludes the memory the kernel is loaded into and memory that is
 * grabbed in the very early stages of bootup.
 *
 * ram_stealmem can be used before ram_getsize is called to allocate
 * memory that cannot be freed later. This is intended for use early
 * in bootup before VM initialization is complete.
 */

void ram_bootstrap(void);
paddr_t ram_stealmem(unsigned long npages);
void ram_getsize(paddr_t *lo, paddr_t *hi);
The two functions are declared here. So what do they actually do?
kern/arch/mips/vm/ram.c, kern/arch/mips/vm/vm.c, kern/vm/frametable.c
vaddr_t firstfree;          /* first free virtual address; set by start.S */

static paddr_t firstpaddr;  /* address of first free physical page */
static paddr_t lastpaddr;   /* one past end of last free physical page */

/*
 * Called very early in system boot to figure out how much physical
 * RAM is available.
 */
void
ram_bootstrap(void)
{
    size_t ramsize;

    /* Get size of RAM. */
    ramsize = mainbus_ramsize();

    /*
     * This is the same as the last physical address, as long as
     * we have less than 508 megabytes of memory. If we had more,
     * various annoying properties of the MIPS architecture would
     * force the RAM to be discontiguous. This is not a case we
     * are going to worry about.
     */
    if (ramsize > 508*1024*1024) {
        ramsize = 508*1024*1024;
    }

    lastpaddr = ramsize;

    /*
     * Get first free virtual address from where start.S saved it.
     * Convert to physical address.
     */
    firstpaddr = firstfree - MIPS_KSEG0;

    kprintf("%uk physical memory available\n",
            (lastpaddr-firstpaddr)/1024);
}
/*
 * Initialise the frame table
 */
void
vm_bootstrap(void)
{
    frametable_bootstrap();
}
/*
 * Make the variables static so that other files cannot access them
 */
static struct frame_table_entry *frame_table;
static paddr_t frametop, freeframe;

/*
 * Initialise the frame table
 */
void
frametable_bootstrap(void)
{
    struct frame_table_entry *p;
    paddr_t firsta, lasta, paddr;
    unsigned long framenum, entry_num, frame_table_size, i;

    // get the usable range of physical memory
    ram_getsize(&firsta, &lasta);
    KASSERT((firsta & PAGE_FRAME) == firsta);
    KASSERT((lasta & PAGE_FRAME) == lasta);

    framenum = (lasta - firsta) / PAGE_SIZE;

    // calculate the size of the whole frame table, rounded up to whole pages
    frame_table_size = framenum * sizeof(struct frame_table_entry);
    frame_table_size = ROUNDUP(frame_table_size, PAGE_SIZE);
    entry_num = frame_table_size / PAGE_SIZE;
    KASSERT((frame_table_size & PAGE_FRAME) == frame_table_size);

    frametop = firsta;
    freeframe = firsta + frame_table_size;

    if (freeframe >= lasta) {
        // This should practically never happen
        panic("vm: framemap consume physical memory?\n");
    }

    // keep the frame table at the start (lowest addresses) of the usable
    // range of physical memory; the free frames begin right after it
    frame_table = (struct frame_table_entry *) PADDR_TO_KVADDR(firsta);

    // Initialise the frame list. Each entry corresponds to one frame and
    // stores the address of the next free frame. A next-frame address of
    // zero means the frame is allocated.
    p = frame_table;
    for (i = 0; i < framenum-1; i++) {
        if (i < entry_num) {
            // frames occupied by the frame table itself
            p->next_freeframe = 0;
            p += 1;
            continue;
        }
        paddr = frametop + (i+1) * PAGE_SIZE;
        p->next_freeframe = paddr;
        p += 1;
    }
    // The loop above never writes the entry for the last frame, so
    // terminate the free list explicitly instead of relying on that
    // memory already being zero.
    p->next_freeframe = 0;
}
kern/include/vm.h
struct frame_table_entry {
    // address of next free frame
    size_t next_freeframe;
};
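ram_bootstrap is called during system initialization to find out how much physical memory is available. vm_bootstrap simply calls frametable_bootstrap(), which divides the usable physical memory into pages of 4K each and keeps a linked list of the free pages in memory. The list is stored starting at the first free physical address, but before it can be stored we need to work out how much space the frame table itself takes. That is why the first half of the code computes the size of the frame table and the second half initializes the frame table as a linked list. Since every frame is free at initialization time, each entry simply points to the address of the next page.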
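The sizing step can be reproduced in isolation. The following is a rough sketch of the arithmetic only, assuming 4 KiB pages and a 4-byte table entry (the size of `size_t` on a 32-bit MIPS). `ft_layout` and `plan_frame_table` are hypothetical names, and the addresses in the example are made up.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define ROUNDUP(a, b) ((((a) + (b) - 1) / (b)) * (b))

/* Stand-in for the kernel's entry type, fixed at 4 bytes for portability. */
struct frame_table_entry {
    uint32_t next_freeframe;
};

struct ft_layout {
    uint32_t framenum;     /* frames in [firsta, lasta) */
    uint32_t table_pages;  /* frames consumed by the table itself */
    uint32_t first_free;   /* physical address of the first usable frame */
};

/* Work out how much of the free range the frame table will occupy. */
struct ft_layout plan_frame_table(uint32_t firsta, uint32_t lasta)
{
    struct ft_layout l;
    uint32_t table_bytes;

    assert(firsta % PAGE_SIZE == 0 && lasta % PAGE_SIZE == 0);
    assert(lasta > firsta);

    l.framenum = (lasta - firsta) / PAGE_SIZE;
    table_bytes = l.framenum * (uint32_t)sizeof(struct frame_table_entry);
    table_bytes = ROUNDUP(table_bytes, PAGE_SIZE);
    l.table_pages = table_bytes / PAGE_SIZE;
    l.first_free = firsta + table_bytes;
    return l;
}
```

With a free range of `0x00030000` to `0x01000000` there are 4048 frames, the table needs 16192 bytes, which rounds up to 4 pages, and the first frame available for allocation is at `0x00034000`. The bookkeeping overhead is therefore about one page per thousand.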
That completes the operating system's VM initialization. So how is the VM actually used? See below.
kern/include/vm.h
/* Fault handling function called by trap code */
int vm_fault(int faulttype, vaddr_t faultaddress);

vaddr_t alloc_kpages(int npages);
void free_kpages(vaddr_t addr);
kern/include/addrspace.h, implemented in kern/vm/addrspace.c
/*
 * Address space - data structure associated with the virtual memory
 * space of a process.
 *
 * You write this.
 */

/*
 * A linked list that stores the information for each region (code, text, bss...)
 */
struct as_region {
    vaddr_t as_vbase;                  /* starting virtual address of the region */
    size_t as_npages;                  /* number of pages the region occupies from as_vbase */
    unsigned int as_permissions;       /* is the region readable? writable? executable? */
    struct as_region *as_next_region;  /* address of the following region */
};

struct addrspace {
#if OPT_DUMBVM
    vaddr_t as_vbase1;
    paddr_t as_pbase1;
    size_t as_npages1;
    vaddr_t as_vbase2;
    paddr_t as_pbase2;
    size_t as_npages2;
    paddr_t as_stackpbase;
#else
    /* Put stuff here for your VM system */
    struct as_region *as_regions_start;  /* head of the regions linked list */
    vaddr_t as_pagetable;                /* address of the first-level page table */
#endif
};

/*
 * The structure of a PTE in the page table:
 * |           address            |    PTE_VALID    |      PF_W      |     PF_R      |      PF_X
 *  virtual address of the frame  | valid indicator | writeable flag | readable flag | executable flag
 * I don't use a structure to represent a PTE, just the type vaddr_t. Because the last 12 bits
 * of a frame's virtual address are free, some of them can be used for the flags.
 */

/*
 * Functions in addrspace.c:
 *
 *    as_create - create a new empty address space. You need to make
 *                sure this gets called in all the right places. You
 *                may find you want to change the argument list. May
 *                return NULL on out-of-memory error.
 *
 *    as_copy   - create a new address space that is an exact copy of
 *                an old one. Probably calls as_create to get a new
 *                empty address space and fill it in, but that's up to
 *                you.
 *
 *    as_activate - make the specified address space the one currently
 *                "seen" by the processor. Argument might be NULL,
 *                meaning "no particular address space".
 *
 *    as_destroy - dispose of an address space. You may need to change
 *                the way this works if implementing user-level threads.
 *
 *    as_define_region - set up a region of memory within the address
 *                space.
 *
 *    as_prepare_load - this is called before actually loading from an
 *                executable into the address space.
 *
 *    as_complete_load - this is called when loading from an executable
 *                is complete.
 *
 *    as_define_stack - set up the stack region in the address space.
 *                (Normally called *after* as_complete_load().) Hands
 *                back the initial stack pointer for the new process.
 *
 *    as_zero_region - zero out a newly allocated page.
 *
 *    as_destroy_regions - free all the space allocated for region storage.
 */

struct addrspace *as_create(void);
int               as_copy(struct addrspace *src, struct addrspace **ret);
void              as_activate(struct addrspace *);
void              as_destroy(struct addrspace *);

int               as_define_region(struct addrspace *as,
                                   vaddr_t vaddr, size_t sz,
                                   int readable,
                                   int writeable,
                                   int executable);
int               as_prepare_load(struct addrspace *as);
int               as_complete_load(struct addrspace *as);
int               as_define_stack(struct addrspace *as, vaddr_t *initstackptr);
void              as_zero_region(vaddr_t vaddr, unsigned npages);
void              as_destroy_regions(struct as_region *ar);
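The PTE comment above describes packing flags into the 12 low bits that a page-aligned address leaves free. A minimal sketch of that packing follows. `PAGE_FRAME` and the `PF_*` values match the usual OS161 and ELF definitions, but the bit chosen for `PTE_VALID` is an assumption, since its definition is not shown in this article. The `pte_*` helpers are hypothetical; the article's code manipulates a plain `vaddr_t` directly.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_FRAME 0xfffff000u  /* mask selecting the page-aligned part */
#define PF_X       0x1u         /* ELF segment permission flags */
#define PF_W       0x2u
#define PF_R       0x4u
#define PTE_VALID  0x8u         /* assumed bit position, not from the article */

typedef uint32_t pte_t;

/* Build a PTE from a page-aligned frame address and permission flags. */
pte_t pte_make(uint32_t frame_kvaddr, uint32_t perms)
{
    assert((frame_kvaddr & ~PAGE_FRAME) == 0);        /* page-aligned */
    assert((perms & ~(PF_R | PF_W | PF_X)) == 0);     /* only known flags */
    return frame_kvaddr | PTE_VALID | perms;
}

/* Recover the frame address by masking off the flag bits. */
uint32_t pte_frame(pte_t pte)
{
    return pte & PAGE_FRAME;
}

int pte_is_valid(pte_t pte)
{
    return (pte & PTE_VALID) != 0;
}

int pte_is_writable(pte_t pte)
{
    return (pte & PF_W) != 0;
}
```

Under these assumptions, `pte_make(0x80234000, PF_R | PF_W)` produces `0x8023400e`, and `pte_frame` returns `0x80234000` again. An all-zero entry is never valid, which is what lets a freshly zeroed page table mean "nothing mapped yet".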
kern/vm/frametable.c
/*
 * Allocate n pages.
 * Before the frame table is initialised, fall back to ram_stealmem.
 */
static
paddr_t
getppages(int npages)
{
    paddr_t paddr;
    struct frame_table_entry *p;
    int i;

    spinlock_acquire(&frametable_lock);
    if (frame_table == 0)
        paddr = ram_stealmem(npages);
    else {
        // Once the frame table is in use, only single-page
        // allocations are supported.
        if (npages > 1) {
            spinlock_release(&frametable_lock);
            return 0;
        }

        // freeframe equal to zero means all the frames have been
        // allocated and there is no frame left to use.
        if (freeframe == 0) {
            spinlock_release(&frametable_lock);
            return 0;
        }

        // Get the entry index of the current free frame
        // and advance the list head to the next free frame
        paddr = freeframe;
        i = (freeframe - frametop) / PAGE_SIZE;
        p = frame_table + i;
        freeframe = p->next_freeframe;
        p->next_freeframe = 0;
    }
    spinlock_release(&frametable_lock);

    return paddr;
}

/*
 * Public allocation function.
 * Returns the kernel virtual address of the frame.
 */
vaddr_t
alloc_kpages(int npages)
{
    paddr_t paddr = getppages(npages);

    if (paddr == 0)
        return 0;

    return PADDR_TO_KVADDR(paddr);
}

/*
 * Free a page.
 * Stores the address of the current freeframe in the entry of the frame
 * being freed, then makes the freed frame the new freeframe.
 */
static
void
freeppages(paddr_t paddr)
{
    struct frame_table_entry *p;
    int i;

    spinlock_acquire(&frametable_lock);
    i = (paddr - frametop) / PAGE_SIZE;
    p = frame_table + i;
    p->next_freeframe = freeframe;
    freeframe = paddr;
    spinlock_release(&frametable_lock);
}

/*
 * Public free-page function
 */
void
free_kpages(vaddr_t addr)
{
    KASSERT(addr >= MIPS_KSEG0);

    paddr_t paddr = KVADDR_TO_PADDR(addr);
    if (paddr <= frametop) {
        // memory obtained before the frame table existed is not
        // tracked, so it is leaked rather than freed
    }
    else {
        freeppages(paddr);
    }
}
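The free-list discipline used by getppages and freeppages can be tried outside the kernel. This is a user-space sketch, assuming eight frames at a made-up base address and omitting the spinlock; `next_free`, `frames_init`, `frame_alloc` and `frame_free` are hypothetical names, not the repository's.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define NFRAMES   8u
#define FRAMETOP  0x00100000u   /* made-up physical address of frame 0 */

/*
 * next_free[i] holds the address of the free frame that follows frame i,
 * or 0 when frame i is allocated or is the last free frame.
 */
static uint32_t next_free[NFRAMES];
static uint32_t freeframe;      /* head of the free list, 0 when empty */

/* Chain every frame to its successor; all frames start out free. */
void frames_init(void)
{
    uint32_t i;

    for (i = 0; i + 1 < NFRAMES; i++) {
        next_free[i] = FRAMETOP + (i + 1) * PAGE_SIZE;
    }
    next_free[NFRAMES - 1] = 0;
    freeframe = FRAMETOP;
}

/* Pop the head of the free list; returns 0 when no frame is left. */
uint32_t frame_alloc(void)
{
    uint32_t paddr = freeframe;
    uint32_t i;

    if (paddr == 0) {
        return 0;
    }
    i = (paddr - FRAMETOP) / PAGE_SIZE;
    freeframe = next_free[i];
    next_free[i] = 0;
    return paddr;
}

/* Push a frame back onto the head of the free list. */
void frame_free(uint32_t paddr)
{
    uint32_t i = (paddr - FRAMETOP) / PAGE_SIZE;

    assert(paddr >= FRAMETOP && i < NFRAMES && paddr % PAGE_SIZE == 0);
    next_free[i] = freeframe;
    freeframe = paddr;
}
```

Both operations touch only the list head, so allocation and release are constant time. The price is that the list hands out one frame at a time and cannot find physically contiguous runs, which matches the `npages > 1` rejection in getppages. A freed frame is also the next one to be handed out again.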
kern/arch/mips/vm
This is the most important function: it decides what happens when the virtual page a user program needs cannot be found in the TLB.
/*
 * When a TLB miss happens, a page fault is triggered.
 * It is handled as follows:
 * 1. Check what kind of fault it is. If it is a READONLY fault,
 *    do nothing except raise an exception and kill the process.
 * 2. If it is a read fault or a write fault:
 *    1. First check whether the virtual address is within any of the regions
 *       or the stack of the current addrspace. If it is not, raise an exception
 *       and kill the process. If it is, go on.
 *    2. Then try to find the mapping in the page table.
 *       If a page table entry exists for this virtual address, insert it into the TLB.
 *    3. If this virtual address is not mapped yet, map it,
 *       update the page table, then insert the mapping into the TLB.
 */
int
vm_fault(int faulttype, vaddr_t faultaddress)
{
    vaddr_t *vaddr1, *vaddr2, vaddr, vbase, vtop, faultadd = 0;
    paddr_t paddr;
    struct addrspace *as;
    struct as_region *s;
    uint32_t ehi, elo;
    int i, index1, index2, spl;
    unsigned int permis = 0;

    switch (faulttype) {
        case VM_FAULT_READONLY:
            return EFAULT;
        case VM_FAULT_READ:
        case VM_FAULT_WRITE:
            break;
        default:
            return EINVAL;
    }

    as = curthread->t_addrspace;
    if (as == NULL) {
        return EFAULT;
    }

    // Align faultaddress
    faultaddress &= PAGE_FRAME;

    // Walk the linked list of regions
    // and check that faultaddress is valid
    KASSERT(as->as_regions_start != 0);
    s = as->as_regions_start;
    while (s != 0) {
        KASSERT(s->as_vbase != 0);
        KASSERT(s->as_npages != 0);
        KASSERT((s->as_vbase & PAGE_FRAME) == s->as_vbase);
        vbase = s->as_vbase;
        vtop = vbase + s->as_npages * PAGE_SIZE;
        if (faultaddress >= vbase && faultaddress < vtop) {
            faultadd = faultaddress;
            permis = s->as_permissions;
            break;
        }
        s = s->as_next_region;
    }

    if (faultadd == 0) {
        vtop = USERSTACK;
        vbase = vtop - VM_STACKPAGES * PAGE_SIZE;
        if (faultaddress >= vbase && faultaddress < vtop) {
            faultadd = faultaddress;
            // Stack is readable, writable but not executable
            permis |= (PF_W | PF_R);
        }

        // faultaddress is not within any of the regions or the stack
        if (faultadd == 0) {
            return EFAULT;
        }
    }

    index1 = (faultaddress & TOP_TEN) >> 22;
    index2 = (faultaddress & MID_TEN) >> 12;

    vaddr1 = (vaddr_t *)(as->as_pagetable + index1 * 4);
    if (*vaddr1) {
        vaddr2 = (vaddr_t *)(*vaddr1 + index2 * 4);
        // If the mapping exists in the page table,
        // get the address stored in the PTE,
        // translate it into a physical address,
        // check the writeable flag,
        // and prepare the physical address for TLBLO
        if (*vaddr2 & PTE_VALID) {
            vaddr = *vaddr2 & PAGE_FRAME;
            paddr = KVADDR_TO_PADDR(vaddr);
            if (permis & PF_W) {
                paddr |= TLBLO_DIRTY;
            }
        }
        // If it does not exist, do the mapping,
        // update the PTE in the second-level page table,
        // check the writeable flag,
        // and prepare the physical address for TLBLO
        else {
            vaddr = alloc_kpages(1);
            KASSERT(vaddr != 0);

            as_zero_region(vaddr, 1);
            *vaddr2 |= (vaddr | PTE_VALID);

            paddr = KVADDR_TO_PADDR(vaddr);
            if (permis & PF_W) {
                paddr |= TLBLO_DIRTY;
            }
        }
    }
    // If the second-level page table does not exist either,
    // create the second-level page table,
    // do the mapping,
    // update the PTE,
    // and prepare the physical address.
    else {
        *vaddr1 = alloc_kpages(1);
        KASSERT(*vaddr1 != 0);
        as_zero_region(*vaddr1, 1);

        vaddr2 = (vaddr_t *)(*vaddr1 + index2 * 4);
        vaddr = alloc_kpages(1);
        KASSERT(vaddr != 0);
        as_zero_region(vaddr, 1);
        *vaddr2 |= (vaddr | PTE_VALID);

        paddr = KVADDR_TO_PADDR(vaddr);
        if (permis & PF_W) {
            paddr |= TLBLO_DIRTY;
        }
    }

    spl = splhigh();
    // Update the TLB.
    // If there is still an empty TLB entry, insert the new one there.
    // If not, pick one at random, evict it, and insert the new one.
    for (i=0; i<NUM_TLB; i++) {
        tlb_read(&ehi, &elo, i);
        if (elo & TLBLO_VALID) {
            continue;
        }
        ehi = faultaddress;
        elo = paddr | TLBLO_VALID;
        tlb_write(ehi, elo, i);
        splx(spl);
        return 0;
    }

    // FIXME, TLB replacement algo.
    ehi = faultaddress;
    elo = paddr | TLBLO_VALID;
    tlb_random(ehi, elo);
    splx(spl);
    return 0;
}
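While the system is running, page faults occur constantly. The reason is that even though the system has allocated pages to the running program (the allocation functions are in kern/vm/frametable.c), the TLB does not yet hold the mapping from a page's virtual address to its physical address, so the page cannot be used. When the program actually needs the page, the access goes to the TLB first. If the translation is missing, vm_fault finds it in the page table, or creates it, and loads it into the TLB so that the physical address can be obtained from there.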
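The allocate-on-first-touch behaviour of the fault handler can be imitated in user space. The sketch below is only an illustration under stated assumptions: it uses `calloc` in place of `alloc_kpages` plus `as_zero_region`, host pointers in place of 32-bit PTEs, and leaves out permissions and the TLB entirely. `l2_table`, `pagetable` and `pt_fault` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ENTRIES   1024u   /* slots per table, as in a 10/10/12 split */
#define PAGE_SIZE 4096u

struct l2_table  { void *frame[ENTRIES]; };           /* second-level table */
struct pagetable { struct l2_table *l2[ENTRIES]; };   /* first-level table  */

/*
 * Return the frame backing vaddr. The second-level table and the
 * zero-filled frame are created the first time they are needed, and
 * *frames_allocated counts how many frames were created. Returns NULL
 * if the host runs out of memory.
 *
 * calloc is assumed to yield null pointers for the zeroed table, which
 * holds on mainstream platforms.
 */
void *pt_fault(struct pagetable *pt, uint32_t vaddr, unsigned *frames_allocated)
{
    uint32_t index1 = vaddr >> 22;
    uint32_t index2 = (vaddr >> 12) & (ENTRIES - 1);

    assert(pt != NULL && frames_allocated != NULL);

    if (pt->l2[index1] == NULL) {
        /* no second-level table yet: create an empty one */
        pt->l2[index1] = calloc(1, sizeof(struct l2_table));
        if (pt->l2[index1] == NULL) {
            return NULL;
        }
    }
    if (pt->l2[index1]->frame[index2] == NULL) {
        /* page not mapped yet: back it with a fresh zeroed frame */
        void *frame = calloc(1, PAGE_SIZE);
        if (frame == NULL) {
            return NULL;
        }
        pt->l2[index1]->frame[index2] = frame;
        (*frames_allocated)++;
    }
    return pt->l2[index1]->frame[index2];
}
```

Two faults inside the same page, such as `0x00400123` and `0x00400ff0`, return the same frame and allocate only once, while a fault at `0x00401000` lands in the next page and allocates a second frame. Memory is committed only for pages that are actually touched, which is the point of resolving mappings at fault time.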