Virtual Memory in Operating Systems and an Implementation in Code


Virtual memory is a technique used by nearly every modern operating system.

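The basic idea of virtual memory is that each process has its own independent logical address space, which is divided into equal-sized blocks called pages. Each page covers a contiguous range of addresses. From the process's point of view there appears to be a large amount of memory: some pages are mapped to a block of physical memory (called a page frame; pages and page frames are normally the same size), while pages that are not currently loaded live on disk. Introducing logical addresses separates a process's address space from the actual storage space, which makes memory management more flexible.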
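As a small illustration of what "equal-sized pages" means for an address, the sketch below splits a 32-bit address into a page number and an offset. It assumes 4 KiB pages (the size used later in this article); the `demo_` names are made up for this example and are not part of OS161.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12u                       /* assumes 4 KiB pages */
#define DEMO_PAGE_SIZE  (1u << DEMO_PAGE_SHIFT)

/* Which page an address falls in. */
uint32_t demo_page_number(uint32_t vaddr)
{
    return vaddr >> DEMO_PAGE_SHIFT;
}

/* Position of the byte inside that page. */
uint32_t demo_page_offset(uint32_t vaddr)
{
    return vaddr & (DEMO_PAGE_SIZE - 1u);
}

/* Rebuild an address from a page (or frame) number and an offset. */
uint32_t demo_join(uint32_t number, uint32_t offset)
{
    assert(offset < DEMO_PAGE_SIZE);
    return (number << DEMO_PAGE_SHIFT) | offset;
}
```

Translation only ever replaces the page number with a frame number; the offset is carried over unchanged.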

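The two basic concepts, address space and storage space, are defined as follows:

Address space: the target program produced by compiling a source program lives within a bounded range of addresses, and this range is called the address space. An address space is a set of logical addresses.

Storage space: the set of physical units in main memory that store information. The numbers assigned to these units are called physical addresses, and the storage space is the set of physical addresses.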

Three management schemes derive from this: paging, segmentation, and segmentation with paging. This article focuses on paging.

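In a paged system, when a process is created the operating system allocates page frames for all of the process's pages, and when the process is destroyed it reclaims every frame allocated to it. If processes may request memory dynamically while running, the operating system must also allocate physical frames for that memory. To do all this, the operating system has to track which page frames in physical memory are actually in use. On a process switch it must also switch correctly between the two processes' mappings from address space to physical memory. To understand how the operating system meets these requirements, we first need to understand page tables.

[Figure reproduced from 51CTO]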

Each entry in a page table is called a page table entry (PTE). A PTE records the mapping from one range of virtual addresses to physical addresses.
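To make the role of a page table entry concrete, here is a minimal sketch of a lookup in a flat, single-level page table. It assumes 4 KiB pages and a hypothetical valid bit stored in the low bits of each entry; none of these names come from OS161.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12u
#define DEMO_PAGE_MASK  0xfffu
#define DEMO_PTE_VALID  0x1u   /* hypothetical valid bit kept in the low bits */

/* Translate vaddr through a flat table indexed by virtual page number.
 * Returns false when the page is out of range or not mapped. */
bool demo_translate(const uint32_t *table, size_t nentries,
                    uint32_t vaddr, uint32_t *paddr_out)
{
    assert(table != NULL && paddr_out != NULL);

    uint32_t vpn = vaddr >> DEMO_PAGE_SHIFT;
    if (vpn >= nentries || !(table[vpn] & DEMO_PTE_VALID)) {
        return false;
    }
    uint32_t frame_base = table[vpn] & ~DEMO_PAGE_MASK;   /* drop the flag bits */
    *paddr_out = frame_base | (vaddr & DEMO_PAGE_MASK);   /* keep the offset */
    return true;
}
```

A `false` result corresponds to the case a real system reports as a page fault.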

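Because the page table itself is stored in memory, every memory read a program performs needs at least two memory accesses: one to read the page table entry and one to read the data. Compared with the single access needed without an MMU, this is much slower, and the slowdown is more pronounced when the memory itself is slow. (The MMU, or Memory Management Unit, is a hardware unit integrated into the CPU. Its main job is to translate virtual addresses to physical addresses for the CPU, which also gives software a hardware-backed memory protection mechanism.) How to keep the benefits of the MMU while minimizing its overhead therefore became a pressing problem.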

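This is where the TLB comes in. The TLB, or Translation Lookaside Buffer, is a small set of registers inside the MMU that temporarily holds address translations. Because the TLB is essentially a set of registers, accessing it is much faster than accessing the page table in memory. If the entire page table could be kept in the TLB, the performance problem would be solved.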

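However, manufacturing cost and other constraints make it practically impossible to keep every page table in the TLB. Instead, the limited-capacity TLB holds only the most frequently used page table entries, which improves the MMU's efficiency to a degree.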
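The idea of a small cache of translations can be sketched in software. The example below models a tiny fully associative TLB with round-robin replacement; the size, the replacement policy, and all `demo_` names are assumptions chosen for illustration, not a description of any real MMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_TLB_SIZE 4   /* deliberately tiny so eviction is easy to see */

struct demo_tlb_entry {
    uint32_t vpn;    /* virtual page number */
    uint32_t pfn;    /* physical frame number */
    bool     valid;
};

struct demo_tlb {
    struct demo_tlb_entry e[DEMO_TLB_SIZE];
    unsigned next;   /* slot the next insert will overwrite */
};

/* Look up a virtual page number; returns true on a hit. */
bool demo_tlb_lookup(const struct demo_tlb *t, uint32_t vpn, uint32_t *pfn_out)
{
    assert(t != NULL && pfn_out != NULL);
    for (unsigned i = 0; i < DEMO_TLB_SIZE; i++) {
        if (t->e[i].valid && t->e[i].vpn == vpn) {
            *pfn_out = t->e[i].pfn;
            return true;
        }
    }
    return false;
}

/* Insert a translation, overwriting slots in round-robin order. */
void demo_tlb_insert(struct demo_tlb *t, uint32_t vpn, uint32_t pfn)
{
    assert(t != NULL);
    t->e[t->next].vpn = vpn;
    t->e[t->next].pfn = pfn;
    t->e[t->next].valid = true;
    t->next = (t->next + 1) % DEMO_TLB_SIZE;
}
```

Once more than `DEMO_TLB_SIZE` distinct pages have been inserted, the oldest translation is lost and the next access to that page misses again.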

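The reason this works is the principle of locality of memory references: during execution, a program is more likely to access code near the location it is currently using. In theory, then, the TLB holds most of the page table entries needed during the current period of time, so it can greatly improve the MMU's efficiency.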
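The effect of the hit ratio can be estimated with the usual effective access time calculation. The function below is a rough model for a single-level page table, assuming a TLB hit costs one TLB lookup plus one memory access and a miss costs one extra memory access for the page table read. The timing values in the note after it are illustrative, not measurements.

```c
#include <assert.h>

/* Effective access time in nanoseconds for a single-level page table.
 * hit_ratio is the fraction of accesses whose translation is in the TLB. */
double demo_effective_access_ns(double hit_ratio, double tlb_ns, double mem_ns)
{
    assert(hit_ratio >= 0.0 && hit_ratio <= 1.0);
    double hit_cost  = tlb_ns + mem_ns;         /* TLB hit: one memory access */
    double miss_cost = tlb_ns + 2.0 * mem_ns;   /* miss: page table read, then data */
    return hit_ratio * hit_cost + (1.0 - hit_ratio) * miss_cost;
}
```

With an assumed 1 ns TLB lookup and 100 ns memory access, a 98% hit ratio gives roughly 103 ns per access, against 201 ns when every access misses.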

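The implementation here uses a two-level page table, meaning that translation is a two-step lookup: the virtual address first indexes an entry in the first table, and the contents of that entry are then used to locate the second table, whose entry finally determines the physical address. The first table is called the first-level page table and the second the second-level page table. The main purpose of the two-level scheme is to reduce the memory occupied by the page tables themselves; the drawback is that address translation becomes slower still, because a lookup that misses the TLB needs one more memory access.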
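The two-step lookup can be sketched as follows, assuming a 32-bit address split into 10 + 10 + 12 bits (the split that the `vm_fault` code later in this article also uses). A flat table for this layout needs 2^20 four-byte entries, or 4 MiB per process, whereas the two-level form needs one 4 KiB first-level table plus 4 KiB for each second-level table actually in use. The `demo_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ENTRIES 1024u   /* 2^10 entries in each table */

/* Top 10 bits select the first-level entry. */
uint32_t demo_index1(uint32_t vaddr)
{
    return vaddr >> 22;
}

/* The next 10 bits select the second-level entry. */
uint32_t demo_index2(uint32_t vaddr)
{
    return (vaddr >> 12) & 0x3ffu;
}

/* Walk the tables: level1[i] points to a second-level table or is NULL.
 * Returns the stored entry, or 0 if the second-level table is absent. */
uint32_t demo_walk(uint32_t **level1, uint32_t vaddr)
{
    assert(level1 != NULL);
    const uint32_t *level2 = level1[demo_index1(vaddr)];
    if (level2 == NULL) {
        return 0;
    }
    return level2[demo_index2(vaddr)];
}
```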

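With the background covered, on to the substance: implementing VM on OS161, a teaching operating system developed at Harvard University. OS161 runs on MIPS-I hardware.

The code is on GitHub: https://github.com/tian-jiang/OS161-VirtualMemory

Start with kern/arch/mips/include/vm.h, which defines the memory layout: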

/*
 * MIPS-I hardwired memory layout:
 *    0xc0000000 - 0xffffffff   kseg2 (kernel, tlb-mapped)
 *    0xa0000000 - 0xbfffffff   kseg1 (kernel, unmapped, uncached)
 *    0x80000000 - 0x9fffffff   kseg0 (kernel, unmapped, cached)
 *    0x00000000 - 0x7fffffff   kuseg (user, tlb-mapped)
 *
 * (mips32 is a little different)
 */

#define MIPS_KUSEG  0x00000000
#define MIPS_KSEG0  0x80000000
#define MIPS_KSEG1  0xa0000000
#define MIPS_KSEG2  0xc0000000


The figure below shows how memory is laid out in OS161.

[Figure: memory layout in OS161]
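The segment boundaries in the header above can be exercised with a short sketch. The segment constants are copied from the listing; the `demo_` functions are hypothetical helpers written for this example. The conversions assume that kseg0 is a fixed-offset window onto the first 512 MiB of physical memory, which is what the `firstfree - MIPS_KSEG0` computation in `ram_bootstrap` relies on.

```c
#include <assert.h>
#include <stdint.h>

#define MIPS_KSEG0 0x80000000u
#define MIPS_KSEG1 0xa0000000u
#define MIPS_KSEG2 0xc0000000u

enum demo_segment { DEMO_KUSEG, DEMO_KSEG0, DEMO_KSEG1, DEMO_KSEG2 };

/* Classify a virtual address by the segment it falls in. */
enum demo_segment demo_segment_of(uint32_t vaddr)
{
    if (vaddr < MIPS_KSEG0) return DEMO_KUSEG;
    if (vaddr < MIPS_KSEG1) return DEMO_KSEG0;
    if (vaddr < MIPS_KSEG2) return DEMO_KSEG1;
    return DEMO_KSEG2;
}

/* Physical address -> kernel virtual address in kseg0. */
uint32_t demo_paddr_to_kvaddr(uint32_t paddr)
{
    assert(paddr < 0x20000000u);   /* kseg0 spans 512 MiB */
    return paddr + MIPS_KSEG0;
}

/* Kernel virtual address in kseg0 -> physical address. */
uint32_t demo_kvaddr_to_paddr(uint32_t kvaddr)
{
    assert(demo_segment_of(kvaddr) == DEMO_KSEG0);
    return kvaddr - MIPS_KSEG0;
}
```

Because kseg0 addresses are translated by a fixed offset rather than through the TLB, the kernel can reach the frame table and page tables without taking TLB misses.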

Let's start from the beginning, in kern/startup/main.c:

    /* Early initialization. */
    ram_bootstrap();
        .......

    /* Late phase of initialization. */
    vm_bootstrap();
        ........

At boot, the operating system calls ram_bootstrap() and vm_bootstrap() to start the VM subsystem. Where are these two functions declared and implemented? Read on.

kern/include/vm.h and kern/arch/mips/include/vm.h

/* Initialization function */
void vm_bootstrap(void);
......

  /* Allocate/free kernel heap pages (called by kmalloc/kfree) */

  void frametable_bootstrap(void);

/*
 * Interface to the low-level module that looks after the amount of
 * physical memory we have.
 *
 * ram_getsize returns the lowest valid physical address, and one past
 * the highest valid physical address. (Both are page-aligned.) This
 * is the memory that is available for use during operation, and
 * excludes the memory the kernel is loaded into and memory that is
 * grabbed in the very early stages of bootup.
 *
 * ram_stealmem can be used before ram_getsize is called to allocate
 * memory that cannot be freed later. This is intended for use early
 * in bootup before VM initialization is complete.
 */

void ram_bootstrap(void);
paddr_t ram_stealmem(unsigned long npages);
void ram_getsize(paddr_t *lo, paddr_t *hi);

Both functions are declared here. So what do they actually do?

kern/arch/mips/vm/ram.c, kern/arch/mips/vm/vm.c, kern/vm/frametable.c

vaddr_t firstfree;   /* first free virtual address; set by start.S */

static paddr_t firstpaddr;  /* address of first free physical page */
static paddr_t lastpaddr;   /* one past end of last free physical page */

/*
 * Called very early in system boot to figure out how much physical
 * RAM is available.
 */
void
ram_bootstrap(void)
{
    size_t ramsize;
    
    /* Get size of RAM. */
    ramsize = mainbus_ramsize();

    /*
     * This is the same as the last physical address, as long as
     * we have less than 508 megabytes of memory. If we had more,
     * various annoying properties of the MIPS architecture would
     * force the RAM to be discontiguous. This is not a case we 
     * are going to worry about.
     */
    if (ramsize > 508*1024*1024) {
        ramsize = 508*1024*1024;
    }

    lastpaddr = ramsize;

    /* 
     * Get first free virtual address from where start.S saved it.
     * Convert to physical address.
     */
    firstpaddr = firstfree - MIPS_KSEG0;

    kprintf("%uk physical memory available\n", 
        (lastpaddr-firstpaddr)/1024);
}
/*
 * Initialise the frame table
 */
void
vm_bootstrap(void)
{
    frametable_bootstrap();
}
/*
 * Keep these variables static so that other files cannot access them
 */
static struct frame_table_entry *frame_table;
static paddr_t frametop, freeframe;

/*
 * initialise frame table
 */
void
frametable_bootstrap(void)
{
    struct frame_table_entry *p;
    paddr_t firsta, lasta, paddr;
    unsigned long framenum, entry_num, frame_table_size, i;
    
    // get the useable range of physical memory
    ram_getsize(&firsta, &lasta);
    KASSERT((firsta & PAGE_FRAME) == firsta);
    KASSERT((lasta & PAGE_FRAME) == lasta);
    
    framenum = (lasta - firsta) / PAGE_SIZE;
    
    // calculate the size of the whole framemap
    frame_table_size = framenum * sizeof(struct frame_table_entry);
    frame_table_size = ROUNDUP(frame_table_size, PAGE_SIZE);
    entry_num = frame_table_size / PAGE_SIZE;
    KASSERT((frame_table_size & PAGE_FRAME) == frame_table_size);
    
    frametop = firsta;
    freeframe = firsta + frame_table_size;
    
    if (freeframe >= lasta) {
        // This is impossible for most of the time
        panic("vm: framemap consume physical memory?\n");
    }
    
    // keep the frame state in the top of the useable range of physical memory
    // the free frame page address started from the end of the frame map
    frame_table = (struct frame_table_entry *) PADDR_TO_KVADDR(firsta);
    
    // Initialise the frame list. Each entry corresponds to one frame
    // and stores the address of the next free frame.
    // A next-frame address of zero means the frame is allocated
    // (or, for the last frame, that the free list ends here).
    p = frame_table;
    for (i = 0; i < framenum-1; i++) {
        if (i < entry_num) {
            p->next_freeframe = 0;
            p += 1;
            continue;
        }
        paddr = frametop + (i+1) * PAGE_SIZE;
        p->next_freeframe = paddr;
        p += 1;
    }
    // Terminate the list explicitly: the loop above stops one entry short,
    // so the last frame's entry would otherwise be left uninitialised.
    p->next_freeframe = 0;
}
kern/include/vm.h

struct frame_table_entry {
    // address of next free frame
    size_t next_freeframe;
};

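ram_bootstrap is called during system initialization to find out how much physical memory is available. vm_bootstrap simply calls frametable_bootstrap(), which divides the usable physical memory into 4 KB pages and keeps a linked list of free pages in memory. The list is stored at the start (the lowest addresses) of the free memory, so before storing it the code must work out how much space the frame table itself needs. That is why the first part of the function computes the size of the frame table and the second part initializes the frame table as a linked list. Because every frame is free at initialization, each entry simply points to the address of the next page.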
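The sizing step can be checked with a little arithmetic. The sketch below mirrors the `ROUNDUP` calculation in `frametable_bootstrap` as plain user-space C. The function names are hypothetical, and the 4-byte entry size used in the note afterwards is an assumption (the size of `size_t` on a 32-bit target).

```c
#include <assert.h>

#define DEMO_PAGE_SIZE 4096ul

/* Pages needed to hold one entry of entry_size bytes per frame,
 * rounded up to a whole number of pages. */
unsigned long demo_table_pages(unsigned long nframes, unsigned long entry_size)
{
    unsigned long bytes = nframes * entry_size;
    return (bytes + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE;
}

/* Address of the first frame that can be handed out: the frames at the
 * start of the usable range are occupied by the table itself. */
unsigned long demo_first_free(unsigned long firsta, unsigned long nframes,
                              unsigned long entry_size)
{
    assert(firsta % DEMO_PAGE_SIZE == 0);   /* range must be page-aligned */
    return firsta + demo_table_pages(nframes, entry_size) * DEMO_PAGE_SIZE;
}
```

With 4-byte entries, one page of table describes 1024 frames, so the bookkeeping costs about 0.1% of the memory it manages.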

That completes the VM initialization. How is the VM then used? See below.

kern/include/vm.h

/* Fault handling function called by trap code */
int vm_fault(int faulttype, vaddr_t faultaddress);

vaddr_t alloc_kpages(int npages);
void free_kpages(vaddr_t addr);

kern/include/addrspace.h, implemented in kern/vm/addrspace.c

/* 
 * Address space - data structure associated with the virtual memory
 * space of a process.
 *
 * You write this.
 */

/*
 * A linked list that stores the information for each region (code, text, bss...)
 */
struct as_region {
    vaddr_t as_vbase;    /* starting virtual address of the region */
    size_t as_npages;    /* number of pages the region occupies, starting at as_vbase */
    unsigned int as_permissions;    /* is the region readable? writable? executable? */
    struct as_region *as_next_region;    /* next region in the list */
};

struct addrspace {
#if OPT_DUMBVM
        vaddr_t as_vbase1;
        paddr_t as_pbase1;
        size_t as_npages1;
        vaddr_t as_vbase2;
        paddr_t as_pbase2;
        size_t as_npages2;
        paddr_t as_stackpbase;
#else
        /* Put stuff here for your VM system */
    struct as_region *as_regions_start;    /* header of the regions linked list */
    vaddr_t as_pagetable;               /* address of the first-level page table */
#endif
};

/*
 * The structure of a PTE in the page table:
 * |        address              |  PTE_VALID      |    PF_W        |    PF_R       |    PF_X
 *  the virtual address of frame | valid indicator | writeable flag | readable flag | executable flag
 * A PTE is not represented by a struct; it is just a vaddr_t. Because the low 12 bits
 * of a page-aligned frame address are always zero, some of them can be used for the flags.
 */

/*
 * Functions in addrspace.c:
 *
 *    as_create - create a new empty address space. You need to make 
 *                sure this gets called in all the right places. You
 *                may find you want to change the argument list. May
 *                return NULL on out-of-memory error.
 *
 *    as_copy   - create a new address space that is an exact copy of
 *                an old one. Probably calls as_create to get a new
 *                empty address space and fill it in, but that's up to
 *                you.
 *
 *    as_activate - make the specified address space the one currently
 *                "seen" by the processor. Argument might be NULL, 
 *                meaning "no particular address space".
 *
 *    as_destroy - dispose of an address space. You may need to change
 *                the way this works if implementing user-level threads.
 *
 *    as_define_region - set up a region of memory within the address
 *                space.
 *
 *    as_prepare_load - this is called before actually loading from an
 *                executable into the address space.
 *
 *    as_complete_load - this is called when loading from an executable
 *                is complete.
 *
 *    as_define_stack - set up the stack region in the address space.
 *                (Normally called *after* as_complete_load().) Hands
 *                back the initial stack pointer for the new process.
 *
 *    as_zero_region - zero out a new allocated page.
 *
 *    as_destroy_regions - free all the space allocated for region storage.
 */

struct addrspace *as_create(void);
int               as_copy(struct addrspace *src, struct addrspace **ret);
void              as_activate(struct addrspace *);
void              as_destroy(struct addrspace *);

int               as_define_region(struct addrspace *as, 
                                   vaddr_t vaddr, size_t sz,
                                   int readable, 
                                   int writeable,
                                   int executable);
int               as_prepare_load(struct addrspace *as);
int               as_complete_load(struct addrspace *as);
int               as_define_stack(struct addrspace *as, vaddr_t *initstackptr);
void          as_zero_region(vaddr_t vaddr, unsigned npages);
void          as_destroy_regions(struct as_region *ar);
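The PTE layout described in the header comment above, a page-aligned frame address with flags packed into the unused low 12 bits, can be sketched like this. The bit positions chosen here are hypothetical; the actual values of `PTE_VALID`, `PF_R`, `PF_W` and `PF_X` in the repository are not shown in this article.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_FRAME 0xfffff000u   /* upper 20 bits: page-aligned address */

/* Hypothetical flag positions inside the low 12 bits. */
#define DEMO_PTE_VALID  0x8u
#define DEMO_PF_R       0x4u
#define DEMO_PF_W       0x2u
#define DEMO_PF_X       0x1u

/* Pack a page-aligned frame address and permission flags into one word. */
uint32_t demo_pte_make(uint32_t frame_vaddr, uint32_t perms)
{
    assert((frame_vaddr & ~DEMO_PAGE_FRAME) == 0);   /* must be page-aligned */
    assert((perms & ~(DEMO_PF_R | DEMO_PF_W | DEMO_PF_X)) == 0);
    return frame_vaddr | DEMO_PTE_VALID | perms;
}

/* Recover the frame address by masking off the flag bits. */
uint32_t demo_pte_frame(uint32_t pte)
{
    return pte & DEMO_PAGE_FRAME;
}

/* True if the entry is valid and grants every permission in perm. */
bool demo_pte_allows(uint32_t pte, uint32_t perm)
{
    return (pte & DEMO_PTE_VALID) != 0 && (pte & perm) == perm;
}
```

Packing flags this way keeps each entry at one machine word, at the cost of having to mask before every use of the address.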

kern/vm/frametable.c

/*
 * Allocate n pages. 
 * Before frame table initialisation, using ram_stealmem
 */
static
paddr_t
getppages(int npages)
{
    paddr_t paddr;
    struct frame_table_entry *p;
    int i;
    
    spinlock_acquire(&frametable_lock);
    if (frame_table == 0)
        paddr = ram_stealmem(npages);
    else
    {
        if (npages > 1){
            spinlock_release(&frametable_lock);
            return 0;
        }
        
        // Freeframe equals zero means all the frames have been allocated
        // and there is no frame to use.
        if (freeframe == 0){
            spinlock_release(&frametable_lock);
            return 0;
        }
        
        // Get the current free frame's entry id 
        // and retrieve the next free frame 
        paddr = freeframe;
        i = (freeframe - frametop) / PAGE_SIZE;
        p = frame_table + i;
        
        freeframe = p->next_freeframe;
        p->next_freeframe = 0;
    }
    spinlock_release(&frametable_lock);
    
    return paddr;
}

/*
 * Allocation function for public accessing
 * Returning virtual address of frame
 */
vaddr_t
alloc_kpages(int npages)
{
    paddr_t paddr = getppages(npages);
    
    if(paddr == 0)
        return 0;
    
    return PADDR_TO_KVADDR(paddr);
}

/*
 * Free page
 * Stores the address of the current freeframe into the entry of the frame to be freed
 * and update the address of the freeframe.
 */
static
void
freeppages(paddr_t paddr)
{
    struct frame_table_entry *p;
    int i;
    spinlock_acquire(&frametable_lock);
    i = (paddr - frametop) / PAGE_SIZE;
    p = frame_table + i;
    p->next_freeframe = freeframe;
    freeframe = paddr;
    spinlock_release(&frametable_lock);
}

/*
 * Free page function for public accessing
 */
void
free_kpages(vaddr_t addr)
{
    KASSERT(addr >= MIPS_KSEG0);
    
    paddr_t paddr = KVADDR_TO_PADDR(addr);
    if (paddr <= frametop) {
        // memory leakage
    }
    else {
        freeppages(paddr);
    }
}
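The free-list behaviour of getppages and freeppages above can be tried out in user space. The sketch below uses frame indices instead of physical addresses and leaves out the spinlock, so it illustrates the data structure only and is not a substitute for the kernel code. As in the listing, the value 0 marks both "allocated" and "end of list", which is safe only because frame 0 is reserved for the table and never handed out.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NFRAMES 8
#define DEMO_NONE    0   /* "allocated" marker and end-of-list marker */

struct demo_frames {
    size_t next[DEMO_NFRAMES];   /* next[i]: next free frame after frame i */
    size_t head;                 /* first free frame, or DEMO_NONE */
};

/* Frames 0..reserved-1 model the pages holding the table itself. */
void demo_frames_init(struct demo_frames *f, size_t reserved)
{
    assert(f != NULL && reserved >= 1 && reserved < DEMO_NFRAMES);
    for (size_t i = 0; i < DEMO_NFRAMES; i++) {
        f->next[i] = (i >= reserved && i + 1 < DEMO_NFRAMES) ? i + 1 : DEMO_NONE;
    }
    f->head = reserved;
}

/* Pop the head of the free list; DEMO_NONE means memory is exhausted. */
size_t demo_frame_alloc(struct demo_frames *f)
{
    size_t i = f->head;
    if (i == DEMO_NONE) {
        return DEMO_NONE;
    }
    f->head = f->next[i];
    f->next[i] = DEMO_NONE;
    return i;
}

/* Push a frame back onto the head of the free list. */
void demo_frame_free(struct demo_frames *f, size_t i)
{
    assert(i != DEMO_NONE && i < DEMO_NFRAMES);
    f->next[i] = f->head;
    f->head = i;
}
```

Both operations touch only the list head, so allocation and release take constant time regardless of how much memory is managed.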

kern/arch/mips/vm

This is the most important function: it decides what happens when the TLB does not contain the virtual page a user program needs.

/*
 * A TLB miss triggers a page fault.
 * It is handled as follows:
 * 1. Check the fault type. If it is a READONLY fault,
 *    do nothing except raise an error and kill the process.
 * 2. If it is a read fault or a write fault:
 *    1. First check whether the virtual address lies within one of the regions
 *       or the stack of the current addrspace. If it does not, raise an error and
 *       kill the process; if it does, carry on.
 *    2. Then look for the mapping in the page table.
 *       If a page table entry exists for this virtual address, insert it into the TLB.
 *    3. If the virtual address is not mapped yet, map it,
 *       update the page table, then insert the mapping into the TLB.
 */
int
vm_fault(int faulttype, vaddr_t faultaddress)
{
    vaddr_t *vaddr1, *vaddr2, vaddr, vbase, vtop, faultadd = 0;
    paddr_t paddr;
    struct addrspace *as;
    struct as_region *s;
    uint32_t ehi, elo;
    int i, index1, index2, spl;
    unsigned int permis = 0;
    
    switch (faulttype) {
        case VM_FAULT_READONLY:
            return EFAULT;
        case VM_FAULT_READ:
        case VM_FAULT_WRITE:
            break;
        default:
            return EINVAL;
    }
    
    as = curthread -> t_addrspace;
    if (as == NULL) {
        return EFAULT;
    }
    
    // Align faultaddress
    faultaddress &= PAGE_FRAME;
    
    // Go through the link list of regions 
    // Check the validation of the faultaddress
    KASSERT(as->as_regions_start != 0);
    s = as->as_regions_start;
    while (s != 0) {
        KASSERT(s->as_vbase != 0);
        KASSERT(s->as_npages != 0);
        KASSERT((s->as_vbase & PAGE_FRAME) == s->as_vbase);
        vbase = s->as_vbase;
        vtop = vbase + s->as_npages * PAGE_SIZE;
        if (faultaddress >= vbase && faultaddress < vtop) {
            faultadd = faultaddress;
            permis = s->as_permissions;
            break;
        }
        s = s->as_next_region;
    }
    
    if (faultadd == 0) {
        vtop = USERSTACK;
        vbase = vtop - VM_STACKPAGES * PAGE_SIZE;
        if (faultaddress >= vbase && faultaddress < vtop) {
            faultadd = faultaddress;
            // Stack is readable, writable but not executable
            permis |= (PF_W | PF_R);
        }
        
        // faultaddress is not within any range of the regions and stack
        if (faultadd == 0) {
            return EFAULT;
        }
    }
    
    index1 = (faultaddress & TOP_TEN) >> 22;
    index2 = (faultaddress & MID_TEN) >> 12;

    vaddr1 = (vaddr_t *)(as->as_pagetable + index1 * 4);
    if (*vaddr1) {
        vaddr2 = (vaddr_t *)(*vaddr1 + index2 * 4);
        // If the mapping exists in the page table,
        // get the address stored in the PTE,
        // translate it into a physical address,
        // check the writeable flag,
        // and prepare the physical address for TLBLO
        if (*vaddr2 & PTE_VALID) {
            vaddr = *vaddr2 & PAGE_FRAME;
            paddr = KVADDR_TO_PADDR(vaddr);
            if (permis & PF_W) {
                paddr |= TLBLO_DIRTY;
            }
        }
        // If it does not exist, create the mapping,
        // update the PTE in the second-level page table,
        // check the writeable flag,
        // and prepare the physical address for TLBLO
        else {
            vaddr = alloc_kpages(1);
            KASSERT(vaddr != 0);
            
            as_zero_region(vaddr, 1);
            *vaddr2 |= (vaddr | PTE_VALID);
            
            paddr = KVADDR_TO_PADDR(vaddr);
            if (permis & PF_W) {
                paddr |= TLBLO_DIRTY;
            }
        }
    }
    // If the second-level page table does not exist either,
    // create it,
    // create the mapping,
    // update the PTE,
    // and prepare the physical address.
    else {
        *vaddr1 = alloc_kpages(1);
        KASSERT(*vaddr1 != 0);
        as_zero_region(*vaddr1, 1);
        
        vaddr2 = (vaddr_t *)(*vaddr1 + index2 * 4);
        vaddr = alloc_kpages(1);
        KASSERT(vaddr != 0);
        as_zero_region(vaddr, 1);
        *vaddr2 |= (vaddr | PTE_VALID);

        paddr = KVADDR_TO_PADDR(vaddr);
        if (permis & PF_W) {
            paddr |= TLBLO_DIRTY;
        }
    }
        
    spl = splhigh();
    
    // update TLB entry
    // if there is still an empty TLB entry, put the new entry there
    // if not, let tlb_random pick a slot to overwrite
    for (i=0; i<NUM_TLB; i++) {
        tlb_read(&ehi, &elo, i);
        if (elo & TLBLO_VALID) {
            continue;
        }
        ehi = faultaddress;
        elo = paddr | TLBLO_VALID;
        tlb_write(ehi, elo, i);
        splx(spl);
        return 0;
    }
    
    // FIXME, TLB replacement algo.
    ehi = faultaddress;
    elo = paddr | TLBLO_VALID;
    tlb_random(ehi, elo);
    splx(spl);
    return 0;
}

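While the system runs, page faults occur constantly. Even when a page has been allocated to a running program (the allocation functions are in kern/vm/frametable.c), the TLB may hold no record of that page's virtual-to-physical mapping, so the page cannot be used yet. When the program actually touches the page, the TLB lookup misses, and vm_fault must find the mapping in the page table (creating it if necessary) and load it into the TLB before the access can proceed.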
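The "create on first touch" part of this flow can be modelled in user space. The sketch below keeps a two-level table and fills it lazily, in the same order as vm_fault: create the second-level table if it is missing, then create the mapping if the entry is not valid. It is a simplified model under several assumptions: frames are just numbers from a counter, the region check and the TLB write are left out, and all `demo_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_ENTRIES 1024u
#define DEMO_VALID   0x1u   /* hypothetical valid flag in the low bits */

struct demo_as {
    uint32_t *level1[DEMO_ENTRIES];   /* NULL until a second-level table is needed */
    uint32_t  next_frame;             /* stand-in allocator: frame numbers 1, 2, ... */
    unsigned  tables_created;         /* second-level tables allocated so far */
};

/* Resolve a fault at vaddr and return the page table entry,
 * or 0 if a second-level table could not be allocated. */
uint32_t demo_fault(struct demo_as *as, uint32_t vaddr)
{
    assert(as != NULL);
    uint32_t i1 = vaddr >> 22;
    uint32_t i2 = (vaddr >> 12) & 0x3ffu;

    if (as->level1[i1] == NULL) {
        as->level1[i1] = calloc(DEMO_ENTRIES, sizeof(uint32_t));
        if (as->level1[i1] == NULL) {
            return 0;                 /* out of memory */
        }
        as->tables_created++;
    }

    uint32_t *pte = &as->level1[i1][i2];
    if (!(*pte & DEMO_VALID)) {
        as->next_frame++;             /* map a fresh frame on first touch */
        *pte = (as->next_frame << 12) | DEMO_VALID;
    }
    return *pte;
}

/* Release every second-level table. */
void demo_as_destroy(struct demo_as *as)
{
    assert(as != NULL);
    for (uint32_t i = 0; i < DEMO_ENTRIES; i++) {
        free(as->level1[i]);
        as->level1[i] = NULL;
    }
}
```

A second fault on the same page returns the existing entry without allocating anything, which corresponds to the branch in vm_fault that only reloads the TLB.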

       

 

