Source code and NVMe specification versions
- SPDK : spdk-17.07.1
- DPDK : dpdk-17.08
- NVMe Spec: 1.2.1
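Basic analysis method
- 01 - Download spdk-17.07.1.tar.gz from the official site http://www.spdk.io/
- 02 - Download dpdk-17.08.tar.xz from the official site http://www.dpdk.org/
- 03 - Create the directory nvme/src, extract spdk-17.07.1.tar.gz and dpdk-17.08.tar.xz into nvme/src, then use OpenGrok to build a browsable web version of the source tree
- 04 - Read the SPDK/NVMe driver source code, using NVMeDirect and the Linux kernel NVMe driver as references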
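1. How to identify NVMe SSDs
An NVMe SSD is a PCIe device, so how can this type of device be identified? There are two methods.
Method 1: by Vendor ID + Device ID
Method 2: by Class Code
The Linux kernel NVMe driver's ID table lists many specific Vendor ID + Device ID pairs (mostly to attach device-specific quirks) in addition to a catch-all Class Code entry, while SPDK (built against DPDK 16.07 or later) uses the second method. Here is the code: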
- src/spdk-17.07.1/include/spdk/pci_ids.h
52 /**
53  * PCI class code for NVMe devices.
54  *
55  * Base class code 01h: mass storage
56  * Subclass code 08h: non-volatile memory
57  * Programming interface 02h: NVM Express
58  */
59 #define SPDK_PCI_CLASS_NVME 0x010802
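The NVMe Specification defines this Class Code (0x010802) in the CC register of the controller's PCI header as follows: base class 01h (mass storage), subclass 08h (non-volatile memory), programming interface 02h (NVM Express).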
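To make the byte layout concrete, here is a minimal sketch (not SPDK code) that splits the 24-bit class code into its three bytes. It also shows how the class code is obtained from the 32-bit PCI configuration dword at offset 08h, whose low byte is the Revision ID. The helper names decode_class_code(), class_code_from_config_dword() and is_nvme_class_code() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of SPDK or DPDK. */
#define NVME_CLASS_CODE 0x010802u /* same value as SPDK_PCI_CLASS_NVME */

struct pci_class_code {
    uint8_t base_class; /* 01h for NVMe: mass storage */
    uint8_t sub_class;  /* 08h for NVMe: non-volatile memory */
    uint8_t prog_if;    /* 02h for NVMe: NVM Express */
};

/* Split a 24-bit class code into base class, subclass and programming interface. */
static struct pci_class_code decode_class_code(uint32_t class_code)
{
    struct pci_class_code cc;

    assert(class_code <= 0xffffffu); /* a class code is 3 bytes wide */
    cc.base_class = (uint8_t)((class_code >> 16) & 0xffu);
    cc.sub_class = (uint8_t)((class_code >> 8) & 0xffu);
    cc.prog_if = (uint8_t)(class_code & 0xffu);
    return cc;
}

/* The config-space dword at offset 08h holds the Revision ID in bits 7:0 and
 * the class code in bits 31:8, so shifting out the low byte yields the class code. */
static uint32_t class_code_from_config_dword(uint32_t dword_at_08h)
{
    return dword_at_08h >> 8;
}

/* True if the class code identifies an NVM Express controller. */
static bool is_nvme_class_code(uint32_t class_code)
{
    return class_code == NVME_CLASS_CODE;
}
```

Matching on all three bytes is what makes the check vendor-neutral: any controller that advertises the NVM Express programming interface qualifies, whoever made it.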
2. Hello World
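When learning a new programming language or development kit, one almost always starts with "Hello World", and SPDK is no exception. Let's start with hello_world.c and see how main() uses the API of the SPDK/NVMe driver; this will help us find the main logic for using NVMe SSDs: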
- src/spdk-17.07.1/examples/nvme/hello_world/hello_world.c
306 int main(int argc, char **argv)
307 {
308     int rc;
309     struct spdk_env_opts opts;
310
311     /*
312      * SPDK relies on an abstraction around the local environment
313      * named env that handles memory allocation and PCI device operations.
314      * This library must be initialized first.
315      *
316      */
317     spdk_env_opts_init(&opts);
318     opts.name = "hello_world";
319     opts.shm_id = 0;
320     spdk_env_init(&opts);
321
322     printf("Initializing NVMe Controllers\n");
323
324     /*
325      * Start the SPDK NVMe enumeration process. probe_cb will be called
326      * for each NVMe controller found, giving our application a choice on
327      * whether to attach to each controller. attach_cb will then be
328      * called for each controller after the SPDK NVMe driver has completed
329      * initializing the controller we chose to attach.
330      */
331     rc = spdk_nvme_probe(NULL, NULL, probe_cb, attach_cb, NULL);
332     if (rc != 0) {
333         fprintf(stderr, "spdk_nvme_probe() failed\n");
334         cleanup();
335         return 1;
336     }
337
338     if (g_controllers == NULL) {
339         fprintf(stderr, "no NVMe controllers found\n");
340         cleanup();
341         return 1;
342     }
343
344     printf("Initialization complete.\n");
345     hello_world();
346     cleanup();
347     return 0;
348 }
The processing flow of main() is:
001 - 317 spdk_env_opts_init(&opts);
002 - 320 spdk_env_init(&opts);
003 - 331 rc = spdk_nvme_probe(NULL, NULL, probe_cb, attach_cb, NULL);
004 - 345 hello_world();
005 - 346 cleanup();
- 001-002: initialize the SPDK runtime environment
- 003: call spdk_nvme_probe() to actively discover NVMe SSDs. Clearly, the key function to analyze next is spdk_nvme_probe().
- 004: call hello_world() to perform simple read/write operations
- 005: call cleanup() to free memory resources, detach the NVMe SSDs, and so on.
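Before analyzing the key function spdk_nvme_probe(), let's first answer two questions:
- Question 1: Every NVMe SSD contains a controller. How are all the discovered NVMe SSDs (that is, NVMe controllers) organized?
- Question 2: Every NVMe SSD can be divided into multiple namespaces (similar to logical partitions). How are these namespaces organized?
For an experienced C programmer both answers are easy: linked lists. That is exactly what hello_world.c does. Look at the code: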
39 struct ctrlr_entry {
40     struct spdk_nvme_ctrlr  *ctrlr;
41     struct ctrlr_entry      *next;
42     char                    name[1024];
43 };
44
45 struct ns_entry {
46     struct spdk_nvme_ctrlr  *ctrlr;
47     struct spdk_nvme_ns     *ns;
48     struct ns_entry         *next;
49     struct spdk_nvme_qpair  *qpair;
50 };
51
52 static struct ctrlr_entry *g_controllers = NULL;
53 static struct ns_entry *g_namespaces = NULL;
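Here:
- g_controllers is the head of the global linked list of all NVMe SSDs (i.e. NVMe controllers).
- g_namespaces is the head of the global linked list of all namespaces.
With that, L338-342 of main() are easy to understand: if g_controllers is still NULL after spdk_nvme_probe() returns, no NVMe SSD was attached, so main() calls cleanup() and exits.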
338     if (g_controllers == NULL) {
339         fprintf(stderr, "no NVMe controllers found\n");
340         cleanup();
341         return 1;
342     }
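A self-contained sketch of how such a list can be grown by pushing each new entry onto the head, which is one way a registration helper called from attach_cb could populate g_controllers. The controller pointer is replaced by an opaque void *, and register_ctrlr_entry(), count_controllers() and free_controllers() are hypothetical names; hello_world.c's own helpers may differ in detail.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for hello_world.c's struct ctrlr_entry;
 * the real one stores a struct spdk_nvme_ctrlr *. */
struct ctrlr_entry {
    void *ctrlr;               /* opaque controller handle (assumption) */
    struct ctrlr_entry *next;
    char name[1024];
};

static struct ctrlr_entry *g_controllers = NULL;

/* Hypothetical helper: allocate an entry and push it onto the list head. */
static int register_ctrlr_entry(void *ctrlr, const char *name)
{
    struct ctrlr_entry *entry;

    assert(name != NULL);
    entry = malloc(sizeof(*entry));
    if (entry == NULL) {
        return -1;
    }
    entry->ctrlr = ctrlr;
    snprintf(entry->name, sizeof(entry->name), "%s", name);
    entry->next = g_controllers;   /* new entry points at the old head */
    g_controllers = entry;         /* and becomes the new head */
    return 0;
}

/* Count entries; an empty list (g_controllers == NULL) means nothing was attached. */
static size_t count_controllers(void)
{
    size_t n = 0;

    for (struct ctrlr_entry *e = g_controllers; e != NULL; e = e->next) {
        n++;
    }
    return n;
}

/* Free every entry; the real cleanup() also detaches each controller. */
static void free_controllers(void)
{
    while (g_controllers != NULL) {
        struct ctrlr_entry *next = g_controllers->next;

        free(g_controllers);
        g_controllers = next;
    }
}
```

Head insertion is O(1) and needs no tail pointer; the cost is that entries are visited in reverse order of registration.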
Now let's see how hello_world.c uses spdk_nvme_probe():
331 rc = spdk_nvme_probe(NULL, NULL, probe_cb, attach_cb, NULL);
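Clearly, probe_cb and attach_cb are two callback functions (there is actually a third one, remove_cb, which L331 does not use: it passes NULL):
- probe_cb: called when an NVMe device is enumerated
- attach_cb: called when an NVMe device has been attached to the userspace NVMe driver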
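The contract between the two callbacks can be modeled without SPDK: the enumerator calls probe_cb once per device found, then calls attach_cb only for the devices whose probe_cb returned true. Everything below (struct fake_dev, enumerate_devices(), the callback typedefs and the example callbacks) is hypothetical and stands in for spdk_nvme_probe(); the real driver also initializes each accepted controller before attach_cb runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_DEVS 16

/* Hypothetical stand-ins for the SPDK types and callback signatures. */
struct fake_dev {
    const char *traddr;   /* e.g. a PCI address string */
};

typedef bool (*fake_probe_cb)(void *cb_ctx, const struct fake_dev *dev);
typedef void (*fake_attach_cb)(void *cb_ctx, const struct fake_dev *dev);

/* Two-phase model: probe every device, then attach the accepted ones.
 * Devices beyond DEMO_MAX_DEVS are ignored to keep the bookkeeping fixed-size.
 * Returns the number of devices attached. */
static size_t enumerate_devices(const struct fake_dev *devs, size_t n, void *cb_ctx,
                                fake_probe_cb probe_cb, fake_attach_cb attach_cb)
{
    bool accepted[DEMO_MAX_DEVS] = { false };
    size_t attached = 0;

    assert(probe_cb != NULL && attach_cb != NULL);
    if (n > DEMO_MAX_DEVS) {
        n = DEMO_MAX_DEVS;
    }
    /* Phase 1: ask the application about every device found. */
    for (size_t i = 0; i < n; i++) {
        accepted[i] = probe_cb(cb_ctx, &devs[i]);
    }
    /* Phase 2: report each accepted device through attach_cb. */
    for (size_t i = 0; i < n; i++) {
        if (accepted[i]) {
            attach_cb(cb_ctx, &devs[i]);
            attached++;
        }
    }
    return attached;
}

/* Example callbacks: accept every device and count the attach notifications in cb_ctx. */
static bool probe_all(void *cb_ctx, const struct fake_dev *dev)
{
    (void)cb_ctx;
    (void)dev;
    return true;
}

static void count_attach(void *cb_ctx, const struct fake_dev *dev)
{
    (void)dev;
    (*(int *)cb_ctx)++;
}
```

The two phases mirror the structure of spdk_nvme_probe() itself: probe_cb runs during the scan (L423), and attach_cb runs later from nvme_init_controllers() (L448).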
The definitions of probe_cb, attach_cb and remove_cb are as follows:
- src/spdk-17.07.1/include/spdk/nvme.h
268 /**
269  * Callback for spdk_nvme_probe() enumeration.
270  *
271  * \param opts NVMe controller initialization options. This structure will be populated with the
272  * default values on entry, and the user callback may update any options to request a different
273  * value. The controller may not support all requested parameters, so the final values will be
274  * provided during the attach callback.
275  * \return true to attach to this device.
276  */
277 typedef bool (*spdk_nvme_probe_cb)(void *cb_ctx, const struct spdk_nvme_transport_id *trid,
278                                    struct spdk_nvme_ctrlr_opts *opts);
279
280 /**
281  * Callback for spdk_nvme_probe() to report a device that has been attached to the userspace NVMe driver.
282  *
283  * \param opts NVMe controller initialization options that were actually used. Options may differ
284  * from the requested options from the probe call depending on what the controller supports.
285  */
286 typedef void (*spdk_nvme_attach_cb)(void *cb_ctx, const struct spdk_nvme_transport_id *trid,
287                                     struct spdk_nvme_ctrlr *ctrlr,
288                                     const struct spdk_nvme_ctrlr_opts *opts);
289
290 /**
291  * Callback for spdk_nvme_probe() to report that a device attached to the userspace NVMe driver
292  * has been removed from the system.
293  *
294  * The controller will remain in a failed state (any new I/O submitted will fail).
295  *
296  * The controller must be detached from the userspace driver by calling spdk_nvme_detach()
297  * once the controller is no longer in use. It is up to the library user to ensure that
298  * no other threads are using the controller before calling spdk_nvme_detach().
299  *
300  * \param ctrlr NVMe controller instance that was removed.
301  */
302 typedef void (*spdk_nvme_remove_cb)(void *cb_ctx, struct spdk_nvme_ctrlr *ctrlr);
303
304 /**
305  * \brief Enumerate the bus indicated by the transport ID and attach the userspace NVMe driver
306  * to each device found if desired.
307  *
308  * \param trid The transport ID indicating which bus to enumerate. If the trtype is PCIe or trid is NULL,
309  * this will scan the local PCIe bus. If the trtype is RDMA, the traddr and trsvcid must point at the
310  * location of an NVMe-oF discovery service.
311  * \param cb_ctx Opaque value which will be passed back in cb_ctx parameter of the callbacks.
312  * \param probe_cb will be called once per NVMe device found in the system.
313  * \param attach_cb will be called for devices for which probe_cb returned true once that NVMe
314  * controller has been attached to the userspace driver.
315  * \param remove_cb will be called for devices that were attached in a previous spdk_nvme_probe()
316  * call but are no longer attached to the system. Optional; specify NULL if removal notices are not
317  * desired.
318  *
319  * This function is not thread safe and should only be called from one thread at a time while no
320  * other threads are actively using any NVMe devices.
321  *
322  * If called from a secondary process, only devices that have been attached to the userspace driver
323  * in the primary process will be probed.
324  *
325  * If called more than once, only devices that are not already attached to the SPDK NVMe driver
326  * will be reported.
327  *
328  * To stop using the the controller and release its associated resources,
329  * call \ref spdk_nvme_detach with the spdk_nvme_ctrlr instance returned by this function.
330  */
331 int spdk_nvme_probe(const struct spdk_nvme_transport_id *trid,
332                     void *cb_ctx,
333                     spdk_nvme_probe_cb probe_cb,
334                     spdk_nvme_attach_cb attach_cb,
335                     spdk_nvme_remove_cb remove_cb);
To avoid being sidetracked by probe_cb, attach_cb and remove_cb, let's next look at struct spdk_nvme_transport_id and the main logic of spdk_nvme_probe().
- src/spdk-17.07.1/include/spdk/nvme.h
142 /**
143  * NVMe transport identifier.
144  *
145  * This identifies a unique endpoint on an NVMe fabric.
146  *
147  * A string representation of a transport ID may be converted to this type using
148  * spdk_nvme_transport_id_parse().
149  */
150 struct spdk_nvme_transport_id {
151     /**
152      * NVMe transport type.
153      */
154     enum spdk_nvme_transport_type trtype;
155
156     /**
157      * Address family of the transport address.
158      *
159      * For PCIe, this value is ignored.
160      */
161     enum spdk_nvmf_adrfam adrfam;
162
163     /**
164      * Transport address of the NVMe-oF endpoint. For transports which use IP
165      * addressing (e.g. RDMA), this should be an IP address. For PCIe, this
166      * can either be a zero length string (the whole bus) or a PCI address
167      * in the format DDDD:BB:DD.FF or DDDD.BB.DD.FF
168      */
169     char traddr[SPDK_NVMF_TRADDR_MAX_LEN + 1];
170
171     /**
172      * Transport service id of the NVMe-oF endpoint. For transports which use
173      * IP addressing (e.g. RDMA), this field shoud be the port number. For PCIe,
174      * this is always a zero length string.
175      */
176     char trsvcid[SPDK_NVMF_TRSVCID_MAX_LEN + 1];
177
178     /**
179      * Subsystem NQN of the NVMe over Fabrics endpoint. May be a zero length string.
180      */
181     char subnqn[SPDK_NVMF_NQN_MAX_LEN + 1];
182 };
For NVMe over PCIe, we only need to care about the "NVMe transport type" field:
154 enum spdk_nvme_transport_type trtype;
Currently, two transport types are supported: PCIe and RDMA.
130 enum spdk_nvme_transport_type {
131     /**
132      * PCIe Transport (locally attached devices)
133      */
134     SPDK_NVME_TRANSPORT_PCIE = 256,
135
136     /**
137      * RDMA Transport (RoCE, iWARP, etc.)
138      */
139     SPDK_NVME_TRANSPORT_RDMA = SPDK_NVMF_TRTYPE_RDMA,
140 };
We will not discuss RDMA for now, since our main concern is NVMe over PCIe.
Next, let's look at the code of spdk_nvme_probe():
- src/spdk-17.07.1/lib/nvme/nvme.c
396 int
397 spdk_nvme_probe(const struct spdk_nvme_transport_id *trid, void *cb_ctx,
398                 spdk_nvme_probe_cb probe_cb, spdk_nvme_attach_cb attach_cb,
399                 spdk_nvme_remove_cb remove_cb)
400 {
401     int rc;
402     struct spdk_nvme_ctrlr *ctrlr;
403     struct spdk_nvme_transport_id trid_pcie;
404
405     rc = nvme_driver_init();
406     if (rc != 0) {
407         return rc;
408     }
409
410     if (trid == NULL) {
411         memset(&trid_pcie, 0, sizeof(trid_pcie));
412         trid_pcie.trtype = SPDK_NVME_TRANSPORT_PCIE;
413         trid = &trid_pcie;
414     }
415
416     if (!spdk_nvme_transport_available(trid->trtype)) {
417         SPDK_ERRLOG("NVMe trtype %u not available\n", trid->trtype);
418         return -1;
419     }
420
421     nvme_robust_mutex_lock(&g_spdk_nvme_driver->lock);
422
423     nvme_transport_ctrlr_scan(trid, cb_ctx, probe_cb, remove_cb);
424
425     if (!spdk_process_is_primary()) {
426         TAILQ_FOREACH(ctrlr, &g_spdk_nvme_driver->attached_ctrlrs, tailq) {
427             nvme_ctrlr_proc_get_ref(ctrlr);
428
429             /*
430              * Unlock while calling attach_cb() so the user can call other functions
431              * that may take the driver lock, like nvme_detach().
432              */
433             nvme_robust_mutex_unlock(&g_spdk_nvme_driver->lock);
434             attach_cb(cb_ctx, &ctrlr->trid, ctrlr, &ctrlr->opts);
435             nvme_robust_mutex_lock(&g_spdk_nvme_driver->lock);
436         }
437
438         nvme_robust_mutex_unlock(&g_spdk_nvme_driver->lock);
439         return 0;
440     }
441
442     nvme_robust_mutex_unlock(&g_spdk_nvme_driver->lock);
443     /*
444      * Keep going even if one or more nvme_attach() calls failed,
445      * but maintain the value of rc to signal errors when we return.
446      */
447
448     rc = nvme_init_controllers(cb_ctx, attach_cb);
449
450     return rc;
451 }
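The processing flow of spdk_nvme_probe() is:
001 405:     rc = nvme_driver_init();
002 410-414: if trid is NULL, use a zeroed transport ID whose trtype is PCIe (i.e. scan the local PCIe bus)
003 416:     check that the NVMe trtype is available via spdk_nvme_transport_available(trid->trtype)
004 423:     nvme_transport_ctrlr_scan(trid, cb_ctx, probe_cb, remove_cb);
005 425:     if this SPDK process is not the primary process, run L426-440 (call attach_cb for the controllers already attached by the primary process) and return
006 448:     rc = nvme_init_controllers(cb_ctx, attach_cb);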
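Steps 002 and 003 amount to "default to the local PCIe bus, then reject transports that are not available". A minimal model follows; struct demo_trid, demo_transport_available() and resolve_trid() are hypothetical, the enum copies only the two values shown at L130-140, and DEMO_TRANSPORT_RDMA = 1 assumes SPDK_NVMF_TRTYPE_RDMA equals 1. The availability check pretends that only PCIe support is present.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical mirror of the transport types at L130-140.
 * DEMO_TRANSPORT_RDMA = 1 assumes SPDK_NVMF_TRTYPE_RDMA == 1. */
enum demo_transport_type {
    DEMO_TRANSPORT_PCIE = 256,
    DEMO_TRANSPORT_RDMA = 1,
};

struct demo_trid {
    enum demo_transport_type trtype;
    char traddr[257];   /* empty string means "the whole bus" for PCIe */
};

/* Hypothetical availability check: pretend only PCIe support is built in. */
static bool demo_transport_available(enum demo_transport_type t)
{
    return t == DEMO_TRANSPORT_PCIE;
}

/* Models L410-419: substitute a zeroed PCIe trid for NULL, then validate it.
 * Returns NULL when the transport is not available. */
static const struct demo_trid *resolve_trid(const struct demo_trid *trid,
                                            struct demo_trid *storage)
{
    assert(storage != NULL);
    if (trid == NULL) {
        memset(storage, 0, sizeof(*storage));   /* traddr becomes "" */
        storage->trtype = DEMO_TRANSPORT_PCIE;
        trid = storage;
    }
    if (!demo_transport_available(trid->trtype)) {
        return NULL;
    }
    return trid;
}
```

Because the zeroed trid leaves traddr empty, a NULL trid ends up meaning "enumerate every NVMe device on the local PCIe bus", which is how hello_world.c calls it.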
Next, let's look at nvme_transport_ctrlr_scan():
423 nvme_transport_ctrlr_scan(trid, cb_ctx, probe_cb, remove_cb);
/* src/spdk-17.07.1/lib/nvme/nvme_transport.c#92 */
91 int
92 nvme_transport_ctrlr_scan(const struct spdk_nvme_transport_id *trid,
93                           void *cb_ctx,
94                           spdk_nvme_probe_cb probe_cb,
95                           spdk_nvme_remove_cb remove_cb)
96 {
97     NVME_TRANSPORT_CALL(trid->trtype, ctrlr_scan, (trid, cb_ctx, probe_cb, remove_cb));
98 }
The macro NVME_TRANSPORT_CALL is defined as:
/* src/spdk-17.07.1/lib/nvme/nvme_transport.c#60 */
52 #define TRANSPORT_PCIE(func_name, args) case SPDK_NVME_TRANSPORT_PCIE: return nvme_pcie_ ## func_name args;
..
60 #define NVME_TRANSPORT_CALL(trtype, func_name, args)   \
61     do {                                               \
62         switch (trtype) {                              \
63         TRANSPORT_PCIE(func_name, args)                \
64         TRANSPORT_FABRICS_RDMA(func_name, args)        \
65         TRANSPORT_DEFAULT(trtype)                      \
66         }                                              \
67         SPDK_UNREACHABLE();                            \
68     } while (0)
..
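The dispatch relies on the preprocessor's ## operator: TRANSPORT_PCIE(ctrlr_scan, (...)) pastes nvme_pcie_ and ctrlr_scan into one identifier and produces a case label that returns nvme_pcie_ctrlr_scan(...). The following self-contained imitation uses the same pattern; all names (DEMO_*, demo_pcie_ctrlr_scan(), demo_rdma_ctrlr_scan(), demo_transport_ctrlr_scan()) are hypothetical, and its default branch simply returns -1 instead of reproducing SPDK's TRANSPORT_DEFAULT/SPDK_UNREACHABLE handling.

```c
#include <assert.h>

enum demo_trtype { DEMO_PCIE = 256, DEMO_RDMA = 1 };

/* Per-transport implementations; their prefixes are what ## pastes onto. */
static int demo_pcie_ctrlr_scan(int arg) { return 100 + arg; }
static int demo_rdma_ctrlr_scan(int arg) { return 200 + arg; }

/* Each macro contributes one case label; `args` is a parenthesized argument list. */
#define DEMO_TRANSPORT_PCIE(func_name, args) case DEMO_PCIE: return demo_pcie_ ## func_name args;
#define DEMO_TRANSPORT_RDMA(func_name, args) case DEMO_RDMA: return demo_rdma_ ## func_name args;

/* Same shape as NVME_TRANSPORT_CALL: a switch whose cases return directly. */
#define DEMO_TRANSPORT_CALL(trtype, func_name, args) \
    do {                                             \
        switch (trtype) {                            \
        DEMO_TRANSPORT_PCIE(func_name, args)         \
        DEMO_TRANSPORT_RDMA(func_name, args)         \
        default:                                     \
            return -1;                               \
        }                                            \
    } while (0)

/* Generic entry point, analogous to nvme_transport_ctrlr_scan(). */
static int demo_transport_ctrlr_scan(enum demo_trtype trtype, int arg)
{
    DEMO_TRANSPORT_CALL(trtype, ctrlr_scan, (arg));
}
```

The benefit of this design is that each generic nvme_transport_*() wrapper is a one-liner, and adding a transport means adding one TRANSPORT_xxx macro rather than editing every wrapper.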
Therefore, for NVMe over PCIe, the call to nvme_transport_ctrlr_scan() becomes a call to nvme_pcie_ctrlr_scan():
/* src/spdk-17.07.1/lib/nvme/nvme_pcie.c#620 */
619 int
620 nvme_pcie_ctrlr_scan(const struct spdk_nvme_transport_id *trid,
621                      void *cb_ctx,
622                      spdk_nvme_probe_cb probe_cb,
623                      spdk_nvme_remove_cb remove_cb)
624 {
625     struct nvme_pcie_enum_ctx enum_ctx = {};
626
627     enum_ctx.probe_cb = probe_cb;
628     enum_ctx.cb_ctx = cb_ctx;
629
630     if (strlen(trid->traddr) != 0) {
631         if (spdk_pci_addr_parse(&enum_ctx.pci_addr, trid->traddr)) {
632             return -1;
633         }
634         enum_ctx.has_pci_addr = true;
635     }
636
637     if (hotplug_fd < 0) {
638         hotplug_fd = spdk_uevent_connect();
639         if (hotplug_fd < 0) {
640             SPDK_TRACELOG(SPDK_TRACE_NVME, "Failed to open uevent netlink socket\n");
641         }
642     } else {
643         _nvme_pcie_hotplug_monitor(cb_ctx, probe_cb, remove_cb);
644     }
645
646     if (enum_ctx.has_pci_addr == false) {
647         return spdk_pci_nvme_enumerate(pcie_nvme_enum_cb, &enum_ctx);
648     } else {
649         return spdk_pci_nvme_device_attach(pcie_nvme_enum_cb, &enum_ctx, &enum_ctx.pci_addr);
650     }
651 }
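L630-650 choose between enumerating the whole bus (empty traddr) and attaching a single device (traddr holds a PCI address such as DDDD:BB:DD.F). The sketch below models that decision. parse_pci_addr(), choose_scan_mode() and struct demo_pci_addr are hypothetical; the parser is a simple sscanf-based stand-in that accepts both separators mentioned at L167, not a reproduction of spdk_pci_addr_parse(), whose exact rules may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct demo_pci_addr {
    unsigned int domain, bus, dev, func;
};

/* Hypothetical parser for "DDDD:BB:DD.F" or "DDDD.BB.DD.F" (hex fields).
 * Returns 0 on success, -1 on malformed input. */
static int parse_pci_addr(struct demo_pci_addr *addr, const char *str)
{
    int consumed = 0;

    assert(addr != NULL && str != NULL);
    if (sscanf(str, "%x:%x:%x.%x%n", &addr->domain, &addr->bus,
               &addr->dev, &addr->func, &consumed) != 4 || str[consumed] != '\0') {
        consumed = 0;
        if (sscanf(str, "%x.%x.%x.%x%n", &addr->domain, &addr->bus,
                   &addr->dev, &addr->func, &consumed) != 4 || str[consumed] != '\0') {
            return -1;
        }
    }
    /* PCI limits: 8-bit bus number, 5-bit device number, 3-bit function number. */
    if (addr->bus > 0xff || addr->dev > 0x1f || addr->func > 0x7) {
        return -1;
    }
    return 0;
}

enum scan_mode { SCAN_WHOLE_BUS, ATTACH_ONE_DEVICE, SCAN_ERROR };

/* Models L630-650: empty traddr -> enumerate the bus; otherwise parse and attach one device. */
static enum scan_mode choose_scan_mode(const char *traddr, struct demo_pci_addr *addr)
{
    if (strlen(traddr) == 0) {
        return SCAN_WHOLE_BUS;
    }
    if (parse_pci_addr(addr, traddr) != 0) {
        return SCAN_ERROR;
    }
    return ATTACH_ONE_DEVICE;
}
```

The %n conversion records how many characters were consumed, which lets the parser reject trailing garbage that sscanf alone would silently ignore.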
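Next we only need to focus on spdk_pci_nvme_enumerate() at L647 (hello_world.c passes trid == NULL, so traddr is an empty string and this branch is taken), because our goal is to understand how the Class Code is used to discover SSDs.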
647 return spdk_pci_nvme_enumerate(pcie_nvme_enum_cb, &enum_ctx);
/* src/spdk-17.07.1/lib/env_dpdk/pci_nvme.c */
81 int
82 spdk_pci_nvme_enumerate(spdk_pci_enum_cb enum_cb, void *enum_ctx)
83 {
84     return spdk_pci_enumerate(&g_nvme_pci_drv, enum_cb, enum_ctx);
85 }
Note: the first argument at L84 is the address of the global variable g_nvme_pci_drv. (Spotting a global struct variable is always exciting :-) )
/* src/spdk-17.07.1/lib/env_dpdk/pci_nvme.c */
38 static struct rte_pci_id nvme_pci_driver_id[] = {
39 #if RTE_VERSION >= RTE_VERSION_NUM(16, 7, 0, 1)
40     {
41         .class_id = SPDK_PCI_CLASS_NVME,
42         .vendor_id = PCI_ANY_ID,
43         .device_id = PCI_ANY_ID,
44         .subsystem_vendor_id = PCI_ANY_ID,
45         .subsystem_device_id = PCI_ANY_ID,
46     },
47 #else
48     {RTE_PCI_DEVICE(0x8086, 0x0953)},
49 #endif
50     { .vendor_id = 0, /* sentinel */ },
51 };
..
53 static struct spdk_pci_enum_ctx g_nvme_pci_drv = {
54     .driver = {
55         .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
56         .id_table = nvme_pci_driver_id,
..
66     },
67
68     .cb_fn = NULL,
69     .cb_arg = NULL,
70     .mtx = PTHREAD_MUTEX_INITIALIZER,
71     .is_registered = false,
72 };
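Aha! Finally we have a link to the Class Code (SPDK_PCI_CLASS_NVME = 0x010802). The global variable g_nvme_pci_drv is defined at L53, and the table that g_nvme_pci_drv.driver.id_table points to, nvme_pci_driver_id, is defined at L38. (With DPDK older than 16.07, the #else branch at L48 falls back to a single Vendor ID + Device ID pair, 8086:0953.)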
38 static struct rte_pci_id nvme_pci_driver_id[] = {
..
41         .class_id = SPDK_PCI_CLASS_NVME,
..
53 static struct spdk_pci_enum_ctx g_nvme_pci_drv = {
54     .driver = {
..
56         .id_table = nvme_pci_driver_id,
..
So we only need to dig deeper into spdk_pci_enumerate() to see how SSDs are actually discovered...
/* src/spdk-17.07.1/lib/env_dpdk/pci.c#150 */
149 int
150 spdk_pci_enumerate(struct spdk_pci_enum_ctx *ctx,
151                    spdk_pci_enum_cb enum_cb,
152                    void *enum_ctx)
153 {
...
168
169 #if RTE_VERSION >= RTE_VERSION_NUM(17, 05, 0, 4)
170     if (rte_pci_probe() != 0) {
171 #else
172     if (rte_eal_pci_probe() != 0) {
173 #endif
...
184     return 0;
185 }
Some code is omitted above; next we focus on L170:
170 if (rte_pci_probe() != 0) {
With rte_pci_probe() we enter the internals of DPDK. Its implementation is as follows:
/* src/dpdk-17.08/lib/librte_eal/common/eal_common_pci.c#413 */
407 /*
408  * Scan the content of the PCI bus, and call the probe() function for
409  * all registered drivers that have a matching entry in its id_table
410  * for discovered devices.
411  */
412 int
413 rte_pci_probe(void)
414 {
415     struct rte_pci_device *dev = NULL;
416     size_t probed = 0, failed = 0;
417     struct rte_devargs *devargs;
418     int probe_all = 0;
419     int ret = 0;
420
421     if (rte_pci_bus.bus.conf.scan_mode != RTE_BUS_SCAN_WHITELIST)
422         probe_all = 1;
423
424     FOREACH_DEVICE_ON_PCIBUS(dev) {
425         probed++;
426
427         devargs = dev->device.devargs;
428         /* probe all or only whitelisted devices */
429         if (probe_all)
430             ret = pci_probe_all_drivers(dev);
431         else if (devargs != NULL &&
432             devargs->policy == RTE_DEV_WHITELISTED)
433             ret = pci_probe_all_drivers(dev);
434         if (ret < 0) {
435             RTE_LOG(ERR, EAL, "Requested device " PCI_PRI_FMT
436                 " cannot be used\n", dev->addr.domain, dev->addr.bus,
437                 dev->addr.devid, dev->addr.function);
438             rte_errno = errno;
439             failed++;
440             ret = 0;
441         }
442     }
443
444     return (probed && probed == failed) ? -1 : 0;
445 }
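The return statement at L444 is easy to misread: rte_pci_probe() reports failure only when at least one device was visited and every visited device failed. In whitelist mode a non-whitelisted device still increments probed but can never increment failed. A sketch of this counting logic, with the bus replaced by an array of hypothetical per-device records (struct demo_device and demo_pci_probe() are not DPDK names):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device record: whether the device is whitelisted and what
 * pci_probe_all_drivers() would return for it (<0 error, 0 bound, 1 no driver). */
struct demo_device {
    bool whitelisted;
    int probe_result;
};

/* Models L421-444 of rte_pci_probe(): returns -1 only when at least one device
 * was visited and every visited device failed. */
static int demo_pci_probe(const struct demo_device *devs, size_t n, bool whitelist_mode)
{
    size_t probed = 0, failed = 0;

    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int ret = 0;

        probed++;
        /* probe all devices, or only whitelisted ones in whitelist mode */
        if (!whitelist_mode || devs[i].whitelisted) {
            ret = devs[i].probe_result;
        }
        if (ret < 0) {
            failed++;
        }
    }
    return (probed && probed == failed) ? -1 : 0;
}
```

Note that a device for which no driver matched (result 1) does not count as a failure, so a bus full of non-NVMe devices still makes rte_pci_probe() return 0.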
L430 is the line to focus on:
430 ret = pci_probe_all_drivers(dev);
pci_probe_all_drivers() is implemented as follows:
/* src/dpdk-17.08/lib/librte_eal/common/eal_common_pci.c#307 */
301 /*
302  * If vendor/device ID match, call the probe() function of all
303  * registered driver for the given device. Return -1 if initialization
304  * failed, return 1 if no driver is found for this device.
305  */
306 static int
307 pci_probe_all_drivers(struct rte_pci_device *dev)
308 {
309     struct rte_pci_driver *dr = NULL;
310     int rc = 0;
311
312     if (dev == NULL)
313         return -1;
314
315     /* Check if a driver is already loaded */
316     if (dev->driver != NULL)
317         return 0;
318
319     FOREACH_DRIVER_ON_PCIBUS(dr) {
320         rc = rte_pci_probe_one_driver(dr, dev);
321         if (rc < 0)
322             /* negative value is an error */
323             return -1;
324         if (rc > 0)
325             /* positive value means driver doesn't support it */
326             continue;
327         return 0;
328     }
329     return 1;
330 }
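The loop at L319-329 treats each driver's return value as a three-way signal. Modeled with an array of hypothetical per-driver results instead of DPDK's driver list (demo_probe_all_drivers() is not a DPDK function, and the "driver already loaded" check at L316 is left out), the rule is: the first negative result aborts with -1, positive results are skipped, the first zero wins, and 1 means no driver claimed the device.

```c
#include <assert.h>
#include <stddef.h>

/* Models L319-329: results[i] is what rte_pci_probe_one_driver() would return
 * for driver i. Returns -1 on error, 0 when a driver claimed the device, 1 when none did. */
static int demo_probe_all_drivers(const int *results, size_t n_drivers, size_t *claimed_by)
{
    assert(results != NULL || n_drivers == 0);
    for (size_t i = 0; i < n_drivers; i++) {
        if (results[i] < 0) {
            return -1;          /* negative value is an error */
        }
        if (results[i] > 0) {
            continue;           /* this driver does not support the device */
        }
        if (claimed_by != NULL) {
            *claimed_by = i;    /* first driver that returned 0 owns the device */
        }
        return 0;
    }
    return 1;                   /* no driver found for this device */
}
```

For an NVMe SSD, every registered DPDK driver whose id_table does not match returns 1 and is skipped, until SPDK's g_nvme_pci_drv.driver matches on the class code.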
L320 is the line to focus on:
320 rc = rte_pci_probe_one_driver(dr, dev);
/* src/dpdk-17.08/lib/librte_eal/common/eal_common_pci.c#200 */
195 /*
196  * If vendor/device ID match, call the probe() function of the
197  * driver.
198  */
199 static int
200 rte_pci_probe_one_driver(struct rte_pci_driver *dr,
201                          struct rte_pci_device *dev)
202 {
203     int ret;
204     struct rte_pci_addr *loc;
205
206     if ((dr == NULL) || (dev == NULL))
207         return -EINVAL;
208
209     loc = &dev->addr;
210
211     /* The device is not blacklisted; Check if driver supports it */
212     if (!rte_pci_match(dr, dev))
213         /* Match of device and driver failed */
214         return 1;
215
216     RTE_LOG(INFO, EAL, "PCI device "PCI_PRI_FMT" on NUMA socket %i\n",
217             loc->domain, loc->bus, loc->devid, loc->function,
218             dev->device.numa_node);
219
220     /* no initialization when blacklisted, return without error */
221     if (dev->device.devargs != NULL &&
222         dev->device.devargs->policy ==
223             RTE_DEV_BLACKLISTED) {
224         RTE_LOG(INFO, EAL, " Device is blacklisted, not"
225             " initializing\n");
226         return 1;
227     }
228
229     if (dev->device.numa_node < 0) {
230         RTE_LOG(WARNING, EAL, " Invalid NUMA socket, default to 0\n");
231         dev->device.numa_node = 0;
232     }
233
234     RTE_LOG(INFO, EAL, " probe driver: %x:%x %s\n", dev->id.vendor_id,
235         dev->id.device_id, dr->driver.name);
236
237     if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
238         /* map resources for devices that use igb_uio */
239         ret = rte_pci_map_device(dev);
240         if (ret != 0)
241             return ret;
242     }
243
244     /* reference driver structure */
245     dev->driver = dr;
246     dev->device.driver = &dr->driver;
247
248     /* call the driver probe() function */
249     ret = dr->probe(dr, dev);
250     if (ret) {
251         dev->driver = NULL;
252         dev->device.driver = NULL;
253         if ((dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) &&
254             /* Don't unmap if device is unsupported and
255              * driver needs mapped resources.
256              */
257             !(ret > 0 &&
258                 (dr->drv_flags & RTE_PCI_DRV_KEEP_MAPPED_RES)))
259             rte_pci_unmap_device(dev);
260     }
261
262     return ret;
263 }
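Once rte_pci_match() succeeds, L237-260 follow an acquire-then-roll-back pattern: map the device's resources if the driver asks for it, bind the driver, call its probe(), and undo the binding (and usually the mapping) if probe() fails. Since L55 sets RTE_PCI_DRV_NEED_MAPPING for g_nvme_pci_drv, an NVMe SSD takes the mapping branch. Below is a condensed model; the DEMO_* flags, struct demo_dev, struct demo_drv and demo_probe_one_driver() are hypothetical, mapping is assumed to succeed, and blacklisting, the NUMA fix-up and logging are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NEED_MAPPING     0x1u  /* stands in for RTE_PCI_DRV_NEED_MAPPING */
#define DEMO_KEEP_MAPPED_RES  0x2u  /* stands in for RTE_PCI_DRV_KEEP_MAPPED_RES */

struct demo_dev {
    bool mapped;          /* resources currently mapped */
    const void *driver;   /* non-NULL once a driver is bound */
};

struct demo_drv {
    unsigned int flags;
    int probe_ret;        /* what this driver's probe() returns: 0 ok, <0 error, >0 unsupported */
};

/* Models L237-262 for a device that already matched the driver's id_table. */
static int demo_probe_one_driver(const struct demo_drv *dr, struct demo_dev *dev)
{
    int ret;

    assert(dr != NULL && dev != NULL);
    if (dr->flags & DEMO_NEED_MAPPING) {
        dev->mapped = true;           /* rte_pci_map_device(), assumed to succeed */
    }
    dev->driver = dr;                 /* bind before calling probe() */

    ret = dr->probe_ret;              /* stands in for dr->probe(dr, dev) */
    if (ret != 0) {
        dev->driver = NULL;           /* roll back the binding */
        if ((dr->flags & DEMO_NEED_MAPPING) &&
            !(ret > 0 && (dr->flags & DEMO_KEEP_MAPPED_RES))) {
            dev->mapped = false;      /* rte_pci_unmap_device() */
        }
    }
    return ret;
}
```

Binding before calling probe() lets the driver's probe() see its own rte_pci_driver through the device; the rollback keeps a failed probe from leaving a half-initialized device behind.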
L212 is the line to focus on:
212 if (!rte_pci_match(dr, dev))
rte_pci_match() is implemented as follows:
/* src/dpdk-17.08/lib/librte_eal/common/eal_common_pci.c#163 */
151 /*
152  * Match the PCI Driver and Device using the ID Table
153  *
154  * @param pci_drv
155  *      PCI driver from which ID table would be extracted
156  * @param pci_dev
157  *      PCI device to match against the driver
158  * @return
159  *      1 for successful match
160  *      0 for unsuccessful match
161  */
162 static int
163 rte_pci_match(const struct rte_pci_driver *pci_drv,
164               const struct rte_pci_device *pci_dev)
165 {
166     const struct rte_pci_id *id_table;
167
168     for (id_table = pci_drv->id_table; id_table->vendor_id != 0;
169          id_table++) {
170         /* check if device's identifiers match the driver's ones */
171         if (id_table->vendor_id != pci_dev->id.vendor_id &&
172                 id_table->vendor_id != PCI_ANY_ID)
173             continue;
174         if (id_table->device_id != pci_dev->id.device_id &&
175                 id_table->device_id != PCI_ANY_ID)
176             continue;
177         if (id_table->subsystem_vendor_id !=
178             pci_dev->id.subsystem_vendor_id &&
179             id_table->subsystem_vendor_id != PCI_ANY_ID)
180             continue;
181         if (id_table->subsystem_device_id !=
182             pci_dev->id.subsystem_device_id &&
183             id_table->subsystem_device_id != PCI_ANY_ID)
184             continue;
185         if (id_table->class_id != pci_dev->id.class_id &&
186                 id_table->class_id != RTE_CLASS_ANY_ID)
187             continue;
188
189         return 1;
190     }
191
192     return 0;
193 }
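At this point we have finally found how SSDs are discovered. L185-187 are the three lines we most wanted to see: a table entry matches a device only if its class_id equals the device's class code or is the wildcard RTE_CLASS_ANY_ID.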
185         if (id_table->class_id != pci_dev->id.class_id &&
186                 id_table->class_id != RTE_CLASS_ANY_ID)
187             continue;
The structures struct rte_pci_id, struct rte_pci_device and struct rte_pci_driver are defined as:
/* src/dpdk-17.08/lib/librte_eal/common/include/rte_pci.h#100 */
96 /**
97  * A structure describing an ID for a PCI driver. Each driver provides a
98  * table of these IDs for each device that it supports.
99  */
100 struct rte_pci_id {
101     uint32_t class_id;            /**< Class ID (class, subclass, pi) or RTE_CLASS_ANY_ID. */
102     uint16_t vendor_id;           /**< Vendor ID or PCI_ANY_ID. */
103     uint16_t device_id;           /**< Device ID or PCI_ANY_ID. */
104     uint16_t subsystem_vendor_id; /**< Subsystem vendor ID or PCI_ANY_ID. */
105     uint16_t subsystem_device_id; /**< Subsystem device ID or PCI_ANY_ID. */
106 };
/* src/dpdk-17.08/lib/librte_eal/common/include/rte_pci.h#120 */
120 /**
121  * A structure describing a PCI device.
122  */
123 struct rte_pci_device {
124     TAILQ_ENTRY(rte_pci_device) next;   /**< Next probed PCI device. */
125     struct rte_device device;           /**< Inherit core device */
126     struct rte_pci_addr addr;           /**< PCI location. */
127     struct rte_pci_id id;               /**< PCI ID. */
128     struct rte_mem_resource mem_resource[PCI_MAX_RESOURCE];
129                                         /**< PCI Memory Resource */
130     struct rte_intr_handle intr_handle; /**< Interrupt handle */
131     struct rte_pci_driver *driver;      /**< Associated driver */
132     uint16_t max_vfs;                   /**< sriov enable if not zero */
133     enum rte_kernel_driver kdrv;        /**< Kernel driver passthrough */
134     char name[PCI_PRI_STR_SIZE+1];      /**< PCI location (ASCII) */
135 };
/* src/dpdk-17.08/lib/librte_eal/common/include/rte_pci.h#178 */
175 /**
176  * A structure describing a PCI driver.
177  */
178 struct rte_pci_driver {
179     TAILQ_ENTRY(rte_pci_driver) next;   /**< Next in list. */
180     struct rte_driver driver;           /**< Inherit core driver. */
181     struct rte_pci_bus *bus;            /**< PCI bus reference. */
182     pci_probe_t *probe;                 /**< Device Probe function. */
183     pci_remove_t *remove;               /**< Device Remove function. */
184     const struct rte_pci_id *id_table;  /**< ID table, NULL terminated. */
185     uint32_t drv_flags;                 /**< Flags contolling handling of device. */
186 };
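So far, SSD discovery can be summarized as follows:
- 01 - The Class Code (0x010802) is the basis for discovering SSDs.
- 02 - During discovery, execution passes from SPDK into DPDK; the function call stack is: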
00 hello_world.c
01 -> main()
02 --> spdk_nvme_probe()
03 ---> nvme_transport_ctrlr_scan()
04 ----> nvme_pcie_ctrlr_scan()
05 -----> spdk_pci_nvme_enumerate()
06 ------> spdk_pci_enumerate(&g_nvme_pci_drv, ...)     | SPDK |
=========================================================================
07 -------> rte_pci_probe()                              | DPDK |
08 --------> pci_probe_all_drivers()
09 ---------> rte_pci_probe_one_driver()
10 ----------> rte_pci_match()
- 03 - The function rte_pci_match() in DPDK's EAL (Environment Abstraction Layer) is the key to discovering SSDs.
- 04 - The position of the EAL within the DPDK architecture is shown in the figure below: