The device layer is the bridge between the file system and the flash. On top of the MTD raw layer it provides two ways for upper layers to access flash: the MTD character device and the MTD block device. The character device registers a file_operations structure with the kernel to implement reading, writing, and controlling MTD devices, providing raw character access to the flash; its associated nodes are /dev/mtd*. The block device side defines mtdblock_tr, a structure describing the MTD block-device translator; its associated nodes are /dev/mtdblock*. The following looks at how each is implemented.
1. MTD character device
The MTD character device lives mainly in drivers/mtd/mtdchar.c. Its flow is fairly simple: it registers a character device with the kernel (major number 90) and supplies the file_operations set for it. The code is:
```c
int __init init_mtdchar(void)
{
	int ret;

	ret = __register_chrdev(MTD_CHAR_MAJOR, 0, 1 << MINORBITS,
				"mtd", &mtd_fops);
	if (ret < 0) {
		pr_err("Can't allocate major number %d for MTD\n",
		       MTD_CHAR_MAJOR);
		return ret;
	}
	return ret;
}
```
It is just one call to __register_chrdev, the completely standard character-device interface. This creates the mtd* nodes under /dev and supplies mtd_fops: when the system treats the flash as a character device and accesses /dev/mtd* (the character device corresponding to an MTD partition), the matching function in the operations set is invoked to move data into or out of the flash.
```c
static const struct file_operations mtd_fops = {
	.owner		= THIS_MODULE,
	.llseek		= mtdchar_lseek,
	.read		= mtdchar_read,
	.write		= mtdchar_write,
	.unlocked_ioctl	= mtdchar_unlocked_ioctl,
#ifdef CONFIG_COMPAT
	.compat_ioctl	= mtdchar_compat_ioctl,
#endif
	.open		= mtdchar_open,
	.release	= mtdchar_close,
	.mmap		= mtdchar_mmap,
#ifndef CONFIG_MMU
	.get_unmapped_area = mtdchar_get_unmapped_area,
	.mmap_capabilities = mtdchar_mmap_capabilities,
#endif
};
```
This simply implements the character-device interface, so user space can operate on the flash directly through the read/write system calls. Tracing a read shows that, depending on the flash type in use, it ultimately calls the corresponding interface of the MTD raw layer's mtd_info.
Note that flash must be erased before it can be written, so further commands are needed: erasing, querying flash information, writing ECC, and so on. These interfaces are exposed through ioctl, and they too end up operating on the mtd_info structure.
2. MTD block device layer
The MTD block-device code is in mtd_blkdevs.c; its job is to provide buffered read/write operations for MTD block devices toward the block layer. There is also mtdblock_ro.c, which defines the read-only variant of the mtdblock buffering. First, let's see how the block layer registers an MTD block device:
```c
static int __init init_mtdblock(void)
{
	return register_mtd_blktrans(&mtdblock_tr);
}
```
It merely calls register_mtd_blktrans, passing in the block-device operation set mtdblock_tr. So the key function has appeared: what exactly does register_mtd_blktrans do, and how does it make block access possible?
```c
int register_mtd_blktrans(struct mtd_blktrans_ops *tr)
{
	struct mtd_info *mtd;
	int ret;

	/* Register the notifier if/when the first device type is
	   registered, to prevent the link/init ordering from fucking
	   us over. */
	if (!blktrans_notifier.list.next)
		register_mtd_user(&blktrans_notifier);	/* register the notifier */

	mutex_lock(&mtd_table_mutex);

	ret = register_blkdev(tr->major, tr->name);	/* register a block device, major 31 */
	if (ret < 0) {
		printk(KERN_WARNING "Unable to register %s block device on major %d: %d\n",
		       tr->name, tr->major, ret);
		mutex_unlock(&mtd_table_mutex);
		return ret;
	}
	if (ret)
		tr->major = ret;

	tr->blkshift = ffs(tr->blksize) - 1;	/* block size as a shift count */

	INIT_LIST_HEAD(&tr->devs);		/* init this translator's device list */
	list_add(&tr->list, &blktrans_majors);	/* add to the blktrans_majors list */

	mtd_for_each_device(mtd)	/* walk the mtd_idr tree (32-ary idr) for every registered mtd_info */
		if (mtd->type != MTD_ABSENT)
			tr->add_mtd(tr, mtd);	/* calls mtdblock_add_mtd */

	mutex_unlock(&mtd_table_mutex);
	return 0;
}
```
1. Register the blktrans notifier (only once, when the first translator type appears) and add the translator to the blktrans_majors list.
2. Register a block device with register_blkdev, major number 31.
3. Finally, walk the registered mtd_info entries and, for each valid MTD device, call the translator's add_mtd to add it.
```c
void register_mtd_user(struct mtd_notifier *new)
{
	struct mtd_info *mtd;

	mutex_lock(&mtd_table_mutex);

	list_add(&new->list, &mtd_notifiers);	/* add blktrans_notifier to mtd_notifiers */

	__module_get(THIS_MODULE);

	mtd_for_each_device(mtd)	/* replay: call add() (-> mtdblock_add_mtd) for every existing mtd_info */
		new->add(mtd);

	mutex_unlock(&mtd_table_mutex);
}
```
This adds the mtd_notifier to the mtd_notifiers list. Later, add_mtd_device walks that list and also calls each notifier's add interface; that in turn looks up the mtd_info nodes registered in the tree and calls the corresponding mtd_blktrans_ops add_mtd for each one.
```c
static void blktrans_notify_add(struct mtd_info *mtd)
{
	struct mtd_blktrans_ops *tr;

	if (mtd->type == MTD_ABSENT)
		return;

	list_for_each_entry(tr, &blktrans_majors, list)
		tr->add_mtd(tr, mtd);
}
```
Eventually mtdblock_add_mtd is called to add the MTD partition. Since mtdblock is a block device, this part follows the normal block-device flow. The code:
```c
static void mtdblock_add_mtd(struct mtd_blktrans_ops *tr, struct mtd_info *mtd)
{
	struct mtdblk_dev *dev = kzalloc(sizeof(*dev), GFP_KERNEL);

	if (!dev)
		return;

	dev->mbd.mtd = mtd;		/* the mtd partition object */
	dev->mbd.devnum = mtd->index;	/* mtd partition number */
	dev->mbd.size = mtd->size >> 9;	/* size in 512-byte sectors */
	dev->mbd.tr = tr;		/* the mtd_blktrans_ops */

	if (!(mtd->flags & MTD_WRITEABLE))
		dev->mbd.readonly = 1;

	if (add_mtd_blktrans_dev(&dev->mbd))	/* add the mtd_blktrans_dev */
		kfree(dev);
}
```
This function simply allocates an mtd_blktrans_dev structure, initializes it, and then calls add_mtd_blktrans_dev; the real action is inside that function.
```c
int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new)
{
	struct mtd_blktrans_ops *tr = new->tr;
	struct mtd_blktrans_dev *d;
	int last_devnum = -1;
	struct gendisk *gd;
	int ret;

	if (mutex_trylock(&mtd_table_mutex)) {
		mutex_unlock(&mtd_table_mutex);
		BUG();
	}

	/* This block only checks whether the minor number has already
	 * been handed out in the mtd block layer. From the earlier
	 * analysis, the mtd block major is 31, and the index an mtd
	 * device was given when it joined the mtd table serves as its
	 * minor, so the minor must first be checked for collisions. */
	mutex_lock(&blktrans_ref_mutex);
	list_for_each_entry(d, &tr->devs, list) {
		if (new->devnum == -1) {
			/* Use first free number */
			if (d->devnum != last_devnum+1) {
				/* Found a free devnum. Plug it in here */
				new->devnum = last_devnum+1;
				list_add_tail(&new->list, &d->list);
				goto added;
			}
		} else if (d->devnum == new->devnum) {
			/* Required number taken */
			mutex_unlock(&blktrans_ref_mutex);
			return -EBUSY;
		} else if (d->devnum > new->devnum) {
			/* Required number was free */
			list_add_tail(&new->list, &d->list);
			goto added;
		}
		last_devnum = d->devnum;
	}

	ret = -EBUSY;
	if (new->devnum == -1)
		new->devnum = last_devnum+1;

	/* Check that the device and any partitions will get valid
	 * minor numbers and that the disk naming code below can cope
	 * with this number. */
	if (new->devnum > (MINORMASK >> tr->part_bits) ||
	    (tr->part_bits && new->devnum >= 27 * 26)) {
		mutex_unlock(&blktrans_ref_mutex);
		goto error1;
	}

	/* 1. Minor number allocated; link the mtd_blktrans_dev onto
	 *    mtdblock_tr->devs */
	list_add_tail(&new->list, &tr->devs);

 added:
	mutex_unlock(&blktrans_ref_mutex);

	mutex_init(&new->lock);
	kref_init(&new->ref);
	if (!tr->writesect)
		new->readonly = 1;

	/* Create gendisk */
	ret = -ENOMEM;
	/* 2. Now abstract the mtd block device as a generic block
	 *    device and register it with the block layer */
	gd = alloc_disk(1 << tr->part_bits);
	if (!gd)
		goto error2;

	/* 3. Set up the gendisk structure */
	new->disk = gd;
	gd->private_data = new;
	gd->major = tr->major;
	gd->first_minor = (new->devnum) << tr->part_bits;
	gd->fops = &mtd_block_ops;

	if (tr->part_bits)
		if (new->devnum < 26)
			snprintf(gd->disk_name, sizeof(gd->disk_name),
				 "%s%c", tr->name, 'a' + new->devnum);
		else
			snprintf(gd->disk_name, sizeof(gd->disk_name),
				 "%s%c%c", tr->name,
				 'a' - 1 + new->devnum / 26,
				 'a' + new->devnum % 26);
	else
		snprintf(gd->disk_name, sizeof(gd->disk_name),
			 "%s%d", tr->name, new->devnum);

	/* 4. Set the capacity */
	set_capacity(gd, (new->size * tr->blksize) >> 9);

	/* 5. Initialize the block request queue; every mtd block
	 *    device completes its IO through this queue */
	spin_lock_init(&new->queue_lock);
	new->rq = blk_init_queue(mtd_blktrans_request, &new->queue_lock);

	if (!new->rq)
		goto error3;

	if (tr->flush)
		blk_queue_flush(new->rq, REQ_FLUSH);

	new->rq->queuedata = new;	/* important: used later by the request function */
	blk_queue_logical_block_size(new->rq, tr->blksize);

	queue_flag_set_unlocked(QUEUE_FLAG_NONROT, new->rq);
	queue_flag_clear_unlocked(QUEUE_FLAG_ADD_RANDOM, new->rq);

	if (tr->discard) {
		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, new->rq);
		new->rq->limits.max_discard_sectors = UINT_MAX;
	}

	gd->queue = new->rq;

	/* 6. Create the mtd block workqueue that services the
	 *    device's IO requests */
	new->wq = alloc_workqueue("%s%d", 0, 0, tr->name, new->mtd->index);
	if (!new->wq)
		goto error4;
	INIT_WORK(&new->work, mtd_blktrans_work);

	gd->driverfs_dev = &new->mtd->dev;

	if (new->readonly)
		set_disk_ro(gd, 1);

	/* 7. Add the disk */
	add_disk(gd);

	if (new->disk_attributes) {
		ret = sysfs_create_group(&disk_to_dev(gd)->kobj,
					 new->disk_attributes);
		WARN_ON(ret);
	}
	return 0;

 error4:
	blk_cleanup_queue(new->rq);
 error3:
	put_disk(new->disk);
 error2:
	list_del(&new->list);
 error1:
	return ret;
}
```
Reading this gives a sense of déjà vu: it is the standard block-device registration flow. Allocate a gendisk, set it up, then register it, all the familiar steps. At this point the MTD has been added to the kernel as a block device; so how does it service requests?
```c
static void mtd_blktrans_request(struct request_queue *rq)
{
	struct mtd_blktrans_dev *dev;
	struct request *req = NULL;

	dev = rq->queuedata;	/* dev == new, set in add_mtd_blktrans_dev */

	if (!dev)
		while ((req = blk_fetch_request(rq)) != NULL)
			__blk_end_request_all(req, -ENODEV);	/* no device: fail all requests */
	else
		queue_work(dev->wq, &dev->work);	/* hand the queue to the workqueue */
}
```
The registered request function checks dev to decide whether to fail the pending requests outright or hand them to the workqueue. Since add_mtd_blktrans_dev already set rq->queuedata = new, dev is non-NULL, so requests are processed through the workqueue.
```c
static void mtd_blktrans_work(struct work_struct *work)
{
	struct mtd_blktrans_dev *dev =
		container_of(work, struct mtd_blktrans_dev, work);
	struct mtd_blktrans_ops *tr = dev->tr;
	struct request_queue *rq = dev->rq;
	struct request *req = NULL;
	int background_done = 0;

	spin_lock_irq(rq->queue_lock);

	while (1) {
		int res;

		dev->bg_stop = false;
		/* Is there a current request, or can a valid one be
		 * fetched from the queue? */
		if (!req && !(req = blk_fetch_request(rq))) {
			if (tr->background && !background_done) {
				spin_unlock_irq(rq->queue_lock);
				mutex_lock(&dev->lock);
				tr->background(dev);
				mutex_unlock(&dev->lock);
				spin_lock_irq(rq->queue_lock);
				/*
				 * Do background processing just once per idle
				 * period.
				 */
				background_done = !dev->bg_stop;
				continue;
			}
			break;
		}

		spin_unlock_irq(rq->queue_lock);

		mutex_lock(&dev->lock);
		res = do_blktrans_request(dev->tr, dev, req);	/* actual request handling */
		mutex_unlock(&dev->lock);

		spin_lock_irq(rq->queue_lock);

		/* If this was not the last chunk of the request,
		 * keep req and continue with it */
		if (!__blk_end_request_cur(req, res))
			req = NULL;

		background_done = 0;
	}

	spin_unlock_irq(rq->queue_lock);
}
```
The work handler is also quite simple: it checks whether there is a valid request and, if so, calls do_blktrans_request to process it. Because the block-layer driver merges and reorders IO before submission, a request may cover several chunks; after each chunk completes, the handler checks whether the request is finished and, if not, calls do_blktrans_request again until the whole request is done.
```c
static int do_blktrans_request(struct mtd_blktrans_ops *tr,
			       struct mtd_blktrans_dev *dev,
			       struct request *req)
{
	unsigned long block, nsect;
	char *buf;

	block = blk_rq_pos(req) << 9 >> tr->blkshift;	/* starting block */
	nsect = blk_rq_cur_bytes(req) >> tr->blkshift;	/* blocks to transfer */
	buf = bio_data(req->bio);

	if (req->cmd_type != REQ_TYPE_FS)
		return -EIO;

	if (req->cmd_flags & REQ_FLUSH)
		return tr->flush(dev);

	if (blk_rq_pos(req) + blk_rq_cur_sectors(req) >
	    get_capacity(req->rq_disk))		/* bounds check against capacity */
		return -EIO;

	if (req->cmd_flags & REQ_DISCARD)
		return tr->discard(dev, block, nsect);

	switch (rq_data_dir(req)) {
	case READ:
		for (; nsect > 0; nsect--, block++, buf += tr->blksize)
			if (tr->readsect(dev, block, buf))	/* read one block */
				return -EIO;
		rq_flush_dcache_pages(req);	/* flush all pages in the request */
		return 0;
	case WRITE:
		if (!tr->writesect)
			return -EIO;
		rq_flush_dcache_pages(req);	/* flush all pages in the request */
		for (; nsect > 0; nsect--, block++, buf += tr->blksize)
			if (tr->writesect(dev, block, buf))	/* write one block */
				return -EIO;
		return 0;
	default:
		printk(KERN_NOTICE "Unknown request %u\n", rq_data_dir(req));
		return -EIO;
	}
}
```
The request-handling flow really is that simple: after a few parameter checks, it calls the corresponding readsect/writesect interface, and the block-device read or write is done.