Author
彭東林
QQ 405728433
Platform
Linux-4.10.17
Qemu-2.8 + vexpress-a9
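Overview
The previous two posts covered how to use remap_pfn_range. This one looks at how the function is implemented.
Main Text
Prerequisite:
The analysis below assumes 2-level page tables.
remap_pfn_range is implemented in mm/memory.c.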
 1 int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
 2             unsigned long pfn, unsigned long size, pgprot_t prot)
 3 {
 4     pgd_t *pgd;
 5     unsigned long next;
 6     unsigned long end = addr + PAGE_ALIGN(size);
 7     struct mm_struct *mm = vma->vm_mm;
 8     unsigned long remap_pfn = pfn;
 9     int err;
10
11     /*
12      * Physically remapped pages are special. Tell the
13      * rest of the world about it:
14      *   VM_IO tells people not to look at these pages
15      *      (accesses can have side effects).
16      *   VM_PFNMAP tells the core MM that the base pages are just
17      *      raw PFN mappings, and do not have a "struct page" associated
18      *      with them.
19      *   VM_DONTEXPAND
20      *      Disable vma merging and expanding with mremap().
21      *   VM_DONTDUMP
22      *      Omit vma from core dump, even when VM_IO turned off.
23      *
24      * There's a horrible special case to handle copy-on-write
25      * behaviour that some programs depend on. We mark the "original"
26      * un-COW'ed pages by matching them up with "vma->vm_pgoff".
27      * See vm_normal_page() for details.
28      */
29     vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
30
31     BUG_ON(addr >= end);
32     pfn -= addr >> PAGE_SHIFT;
33     pgd = pgd_offset(mm, addr);
34     flush_cache_range(vma, addr, end);
35     do {
36         next = pgd_addr_end(addr, end);
37         err = remap_pud_range(mm, pgd, addr, next,
38                 pfn + (addr >> PAGE_SHIFT), prot);
39         if (err)
40             break;
41     } while (pgd++, addr = next, addr != end);
42
43     return err;
44 }
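Line 2: pfn is the physical page frame number to be mapped, and size is the length of the region to map, in bytes.
Line 6: computes the end virtual address of this mapping, with size rounded up to a whole number of pages by PAGE_ALIGN.
Lines 32 and 38: pfn -= addr >> PAGE_SHIFT on line 32 and pfn + (addr >> PAGE_SHIFT) on line 38 are a convenience for the loop. pfn is biased once by the starting page number, so that adding the page number of the current addr in each iteration yields the correct pfn for that addr without updating pfn inside the loop.
Line 33: computes the address of the first-level page table entry that corresponds to addr. The pgd_offset macro expands to: mm->pgd + (addr >> 21)
Line 34: flushes the cache for the range from addr to end.
Line 36: pgd_addr_end(addr, end) computes the next virtual address to be mapped. If the range from addr to end can be covered by a single pgd entry it returns end, otherwise it returns the start of the region covered by the next pgd entry.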
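The loop arithmetic in these notes can be tried out in user space. The following is a minimal sketch, not kernel code: every `sketch_` name and the `sketch_chunk` struct are hypothetical, the constants assume the 2-level layout used in this article (a 12-bit page shift and the 21-bit pgd shift mentioned above), and the call into remap_pud_range is replaced by simply recording each chunk.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed constants for the 2-level layout discussed in this article. */
#define SKETCH_PAGE_SHIFT   12
#define SKETCH_PAGE_SIZE    (1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_PGDIR_SHIFT  21
#define SKETCH_PGDIR_SIZE   (1UL << SKETCH_PGDIR_SHIFT)
#define SKETCH_PGDIR_MASK   (~(SKETCH_PGDIR_SIZE - 1))

/* One [start, end) piece handed to the next level, with its first pfn. */
struct sketch_chunk {
    unsigned long start;
    unsigned long end;
    unsigned long pfn;
};

/* Round size up to a whole number of pages, like PAGE_ALIGN(). */
static unsigned long sketch_page_align(unsigned long size)
{
    return (size + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Index of addr in the first-level table: addr >> 21. */
static unsigned long sketch_pgd_index(unsigned long addr)
{
    return addr >> SKETCH_PGDIR_SHIFT;
}

/* Next pgd boundary after addr, or end if end comes first. */
static unsigned long sketch_pgd_addr_end(unsigned long addr, unsigned long end)
{
    unsigned long boundary = (addr + SKETCH_PGDIR_SIZE) & SKETCH_PGDIR_MASK;

    /* The "- 1" keeps the comparison correct if boundary wraps to 0. */
    return (boundary - 1 < end - 1) ? boundary : end;
}

/*
 * Walk [addr, addr + size) one pgd entry at a time and record each chunk.
 * Returns the number of chunks written to out (at most max).
 */
static size_t sketch_walk(unsigned long addr, unsigned long pfn,
                          unsigned long size, struct sketch_chunk *out,
                          size_t max)
{
    unsigned long end = addr + sketch_page_align(size);
    unsigned long next;
    size_t n = 0;

    assert(addr < end);               /* stands in for BUG_ON(addr >= end) */
    pfn -= addr >> SKETCH_PAGE_SHIFT; /* bias pfn once, before the loop */
    do {
        next = sketch_pgd_addr_end(addr, end);
        if (n < max) {
            out[n].start = addr;
            out[n].end = next;
            /* Adding the current page number undoes the bias. */
            out[n].pfn = pfn + (addr >> SKETCH_PAGE_SHIFT);
            n++;
        }
    } while (addr = next, addr != end);
    return n;
}
```

For a 3 MB mapping that starts at virtual address 0x40100000 with pfn 0x60000, this walk yields two chunks: [0x40100000, 0x40200000) starting at pfn 0x60000, and [0x40200000, 0x40400000) starting at pfn 0x60100.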
Line 37: remap_pud_range is defined as follows:
 1 static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd,
 2             unsigned long addr, unsigned long end,
 3             unsigned long pfn, pgprot_t prot)
 4 {
 5     pud_t *pud;
 6     unsigned long next;
 7
 8     pfn -= addr >> PAGE_SHIFT;
 9     pud = pud_alloc(mm, pgd, addr);
10     if (!pud)
11         return -ENOMEM;
12     do {
13         next = pud_addr_end(addr, end);
14         if (remap_pmd_range(mm, pud, addr, next,
15                 pfn + (addr >> PAGE_SHIFT), prot))
16             return -ENOMEM;
17     } while (pud++, addr = next, addr != end);
18     return 0;
19 }
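Line 9: with 2-level page tables, pud_alloc(mm, pgd, addr) simply returns the value of pgd.
Line 13: with 2-level page tables, pud_addr_end(addr, end) returns end, so the loop body runs exactly once.
Line 14: remap_pmd_range is defined as follows: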
 1 static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud,
 2             unsigned long addr, unsigned long end,
 3             unsigned long pfn, pgprot_t prot)
 4 {
 5     pmd_t *pmd;
 6     unsigned long next;
 7
 8     pfn -= addr >> PAGE_SHIFT;
 9     pmd = pmd_alloc(mm, pud, addr);
10     if (!pmd)
11         return -ENOMEM;
12     VM_BUG_ON(pmd_trans_huge(*pmd));
13     do {
14         next = pmd_addr_end(addr, end);
15         if (remap_pte_range(mm, pmd, addr, next,
16                 pfn + (addr >> PAGE_SHIFT), prot))
17             return -ENOMEM;
18     } while (pmd++, addr = next, addr != end);
19     return 0;
20 }
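Line 9: with 2-level page tables, pmd_alloc(mm, pud, addr) returns the value of pud, which is in turn the value of pgd.
Line 14: with 2-level page tables, pmd_addr_end(addr, end) returns end, so this loop body also runs exactly once.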
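The effect of the folded pud and pmd levels can be illustrated with a small user-space model. This is only a sketch under the 2-level assumption of this article: the `sketch_` types and functions are hypothetical stand-ins rather than the kernel's definitions, and only pointer identity and the loop count are modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; only addresses matter here. */
typedef struct { uint32_t entry[2]; } sketch_pgd_t;
typedef struct { sketch_pgd_t pgd; } sketch_pud_t;
typedef struct { sketch_pud_t pud; } sketch_pmd_t;

/* Folded pud level: nothing is allocated, the pgd slot is reused. */
static sketch_pud_t *sketch_pud_alloc(sketch_pgd_t *pgd)
{
    return (sketch_pud_t *)pgd;
}

/* Folded pmd level: likewise, the pud (and so the pgd) slot is reused. */
static sketch_pmd_t *sketch_pmd_alloc(sketch_pud_t *pud)
{
    return (sketch_pmd_t *)pud;
}

/* With a folded level there is no intermediate boundary to stop at. */
static unsigned long sketch_folded_addr_end(unsigned long addr,
                                            unsigned long end)
{
    (void)addr;
    return end;
}

/* Count how many times a do/while loop over a folded level runs. */
static unsigned int sketch_folded_iterations(unsigned long addr,
                                             unsigned long end)
{
    unsigned long next;
    unsigned int count = 0;

    assert(addr < end);
    do {
        next = sketch_folded_addr_end(addr, end);
        count++;
    } while (addr = next, addr != end);
    return count;
}
```

Because the "addr end" helper returns end directly, the do/while loops in remap_pud_range and remap_pmd_range each run a single iteration in this model, and the pointer handed down to the next level is the same first-level slot.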
Line 15: remap_pte_range is defined as follows:
 1 static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd,
 2             unsigned long addr, unsigned long end,
 3             unsigned long pfn, pgprot_t prot)
 4 {
 5     pte_t *pte;
 6     spinlock_t *ptl;
 7
 8     pte = pte_alloc_map_lock(mm, pmd, addr, &ptl);
 9     if (!pte)
10         return -ENOMEM;
11     arch_enter_lazy_mmu_mode();
12     do {
13         BUG_ON(!pte_none(*pte));
14         set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot)));
15         pfn++;
16     } while (pte++, addr += PAGE_SIZE, addr != end);
17     arch_leave_lazy_mmu_mode();
18     pte_unmap_unlock(pte - 1, ptl);
19     return 0;
20 }
Line 8: pte_alloc_map_lock is defined as follows:
#define pte_alloc_map_lock(mm, pmd, address, ptlp)    \
    (pte_alloc(mm, pmd, address) ?                    \
        NULL : pte_offset_map_lock(mm, pmd, address, ptlp))
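pte_alloc first checks whether *pmd is empty. If it is, the second-level page table has not been allocated yet, so __pte_alloc is called to allocate one page (it ends up calling alloc_pages for a single page, that is, 4 KB) and stores the table's start address in *pmd, which here is really *pgd. Unless the allocation fails, pte_alloc returns 0, so pte_offset_map_lock is called next and returns the address of the entry for address in the second-level page table.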
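The "allocate only if missing, then index into the table" pattern described above can be modeled in user space. This is a hedged sketch, not the kernel implementation: the `sketch_` names are hypothetical, calloc stands in for the page allocator, the lock taken by pte_offset_map_lock is left out, and 512 entries per second-level table is an assumption derived from the 21-bit pgd shift and 12-bit page shift.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SHIFT    12
#define SKETCH_PTRS_PER_PTE  512   /* assumed entries per second-level table */

typedef unsigned long sketch_pte_t;

/* Hypothetical first-level slot: NULL means no second-level table yet. */
typedef struct { sketch_pte_t *table; } sketch_pmd_t;

/* Like pte_alloc(): returns 0 on success, nonzero if allocation failed. */
static int sketch_pte_alloc(sketch_pmd_t *pmd)
{
    assert(pmd != NULL);
    if (pmd->table != NULL)
        return 0;                 /* table already present, nothing to do */
    pmd->table = calloc(SKETCH_PTRS_PER_PTE, sizeof(sketch_pte_t));
    return pmd->table == NULL;
}

/*
 * Like pte_alloc_map_lock() without the lock: NULL on failure, otherwise
 * a pointer to the entry for addr inside the second-level table.
 */
static sketch_pte_t *sketch_pte_alloc_map(sketch_pmd_t *pmd,
                                          unsigned long addr)
{
    if (sketch_pte_alloc(pmd))
        return NULL;
    return pmd->table +
           ((addr >> SKETCH_PAGE_SHIFT) & (SKETCH_PTRS_PER_PTE - 1));
}
```

Calling sketch_pte_alloc_map twice on the same slot allocates the table only on the first call. The second call reuses it and just returns a different entry pointer.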
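Line 14: pfn_pte(pfn, prot) builds the content of the second-level page table entry and pte_mkspecial marks it as special. set_pte_at then writes that content into the second-level entry that pte points to.
Line 15: advances to the next physical page frame number to be mapped.
Line 16: advances pte to the next entry to be filled in the second-level page table.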
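The fill loop can also be modeled in user space. The sketch below rests on stated assumptions: the `sketch_` names are hypothetical, an entry is formed as the physical address ORed with the protection bits, the pte_mkspecial step and the lazy MMU calls are omitted, and the table is a plain array of 512 entries.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT    12
#define SKETCH_PAGE_SIZE     (1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_PTRS_PER_PTE  512   /* assumed entries per second-level table */

typedef unsigned long sketch_pte_t;

/* Build an entry from a frame number and protection bits. */
static sketch_pte_t sketch_pfn_pte(unsigned long pfn, unsigned long prot)
{
    return (pfn << SKETCH_PAGE_SHIFT) | prot;
}

/* Index of addr inside its second-level table. */
static unsigned long sketch_pte_index(unsigned long addr)
{
    return (addr >> SKETCH_PAGE_SHIFT) & (SKETCH_PTRS_PER_PTE - 1);
}

/*
 * Fill the entries covering [addr, end). Both addresses must be page
 * aligned and the range must lie within one table.
 * Returns the number of entries written.
 */
static unsigned long sketch_remap_pte_range(sketch_pte_t *table,
                                            unsigned long addr,
                                            unsigned long end,
                                            unsigned long pfn,
                                            unsigned long prot)
{
    sketch_pte_t *pte = table + sketch_pte_index(addr);
    unsigned long written = 0;

    assert(addr < end);
    assert(((addr | end) & (SKETCH_PAGE_SIZE - 1)) == 0);
    do {
        assert(*pte == 0);        /* stands in for BUG_ON(!pte_none(*pte)) */
        *pte = sketch_pfn_pte(pfn, prot);
        pfn++;                    /* next physical page frame */
        written++;
    } while (pte++, addr += SKETCH_PAGE_SIZE, addr != end);
    return written;
}
```

Mapping the three pages from 0x40101000 to 0x40104000 starting at pfn 0x60001 fills entries 0x101 through 0x103 of the table and leaves the neighbouring entries untouched.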