[SPDK/NVMe Storage Technology Analysis] 004 - Discovering SSD Devices


Source code and NVMe specification versions

  • SPDK : spdk-17.07.1
  • DPDK : dpdk-17.08
  • NVMe Spec: 1.2.1

Basic analysis approach

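  • 01 - Download spdk-17.07.1.tar.gz from the official site http://www.spdk.io/
  • 02 - Download dpdk-17.08.tar.xz from the official site http://www.dpdk.org/
  • 03 - Create the directory nvme/src, extract spdk-17.07.1.tar.gz and dpdk-17.08.tar.xz into it, and then use OpenGrok to build a browsable web version of the source tree
  • 04 - Read the SPDK/NVMe driver source code, using NVMeDirect and the Linux kernel NVMe driver as references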

1. How to identify an NVMe SSD

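An NVMe SSD is a PCIe device, so how can a device of this type be identified? There are two ways.

Method 1: by Device ID + Vendor ID

Method 2: by Class Code

The Linux kernel NVMe driver lists many specific Vendor ID + Device ID pairs in its ID table (method 1), mainly so that per-device quirks can be applied, although that table also ends with a catch-all class-code entry. SPDK, on the other hand, uses the second method. Here is the code: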

  • src/spdk-17.07.1/include/spdk/pci_ids.h
52 /**
53  * PCI class code for NVMe devices.
54  *
55  * Base class code 01h: mass storage
56  * Subclass code 08h: non-volatile memory
57  * Programming interface 02h: NVM Express
58  */
59 #define SPDK_PCI_CLASS_NVME          0x010802

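In the NVMe Specification, the Class Code register (CC, offset 09h of the PCI header) is defined as follows: bits 23:16, Base Class Code (BCC) = 01h, mass storage controller; bits 15:08, Sub Class Code (SCC) = 08h, non-volatile memory controller; bits 07:00, Programming Interface (PI) = 02h, NVM Express. Together the three fields form 0x010802.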
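To make the difference between the two methods concrete, the following C sketch decodes the three fields of the class code and contrasts a class-code check with an explicit Vendor ID + Device ID table. The helper names (class_base(), is_nvme_by_class(), struct vid_did, and so on) are hypothetical and are not part of SPDK or DPDK; only the value 0x010802 and the ID pair 0x8086:0x0953 (which also appears in SPDK's fallback table for old DPDK versions) come from the source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value taken from include/spdk/pci_ids.h */
#define SPDK_PCI_CLASS_NVME 0x010802u

/* Hypothetical helpers: split the 24-bit class code into its three fields. */
static uint8_t class_base(uint32_t cc)   { return (uint8_t)((cc >> 16) & 0xff); } /* BCC, bits 23:16 */
static uint8_t class_sub(uint32_t cc)    { return (uint8_t)((cc >> 8) & 0xff); }  /* SCC, bits 15:08 */
static uint8_t class_progif(uint32_t cc) { return (uint8_t)(cc & 0xff); }         /* PI,  bits 07:00 */

/* Method 2: any device reporting 01h/08h/02h is treated as an NVMe controller. */
static bool is_nvme_by_class(uint32_t class_code)
{
    return (class_code & 0xffffffu) == SPDK_PCI_CLASS_NVME;
}

/* Method 1: only devices listed in an explicit (vendor, device) table match. */
struct vid_did {
    uint16_t vendor_id;
    uint16_t device_id;
};

static bool is_nvme_by_id(const struct vid_did *table, int n,
                          uint16_t vendor_id, uint16_t device_id)
{
    for (int i = 0; i < n; i++)
        if (table[i].vendor_id == vendor_id && table[i].device_id == device_id)
            return true;
    return false;
}
```

With method 2, a controller from any vendor that reports 01h/08h/02h matches without a table update; with method 1, every new device needs a new table entry.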

2. Hello World

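When learning a new programming language or development kit, one always starts with "Hello World". SPDK is no exception. Let us start with hello_world.c and see how main() uses the SPDK/NVMe driver API, which will help us uncover the main logic for using NVMe SSDs.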

  • src/spdk-17.07.1/examples/nvme/hello_world/hello_world.c
306 int main(int argc, char **argv)
307 {
308     int rc;
309     struct spdk_env_opts opts;
310
311     /*
312      * SPDK relies on an abstraction around the local environment
313      * named env that handles memory allocation and PCI device operations.
314      * This library must be initialized first.
315      *
316      */
317     spdk_env_opts_init(&opts);
318     opts.name = "hello_world";
319     opts.shm_id = 0;
320     spdk_env_init(&opts);
321
322     printf("Initializing NVMe Controllers\n");
323
324     /*
325      * Start the SPDK NVMe enumeration process.  probe_cb will be called
326      *  for each NVMe controller found, giving our application a choice on
327      *  whether to attach to each controller.  attach_cb will then be
328      *  called for each controller after the SPDK NVMe driver has completed
329      *  initializing the controller we chose to attach.
330      */
331     rc = spdk_nvme_probe(NULL, NULL, probe_cb, attach_cb, NULL);
332     if (rc != 0) {
333             fprintf(stderr, "spdk_nvme_probe() failed\n");
334             cleanup();
335             return 1;
336     }
337
338     if (g_controllers == NULL) {
339             fprintf(stderr, "no NVMe controllers found\n");
340             cleanup();
341             return 1;
342     }
343
344     printf("Initialization complete.\n");
345     hello_world();
346     cleanup();
347     return 0;
348 }

The processing flow of main() is:

001 - 317     spdk_env_opts_init(&opts);
002 - 320     spdk_env_init(&opts);
003 - 331     rc = spdk_nvme_probe(NULL, NULL, probe_cb, attach_cb, NULL);
004 - 345     hello_world();
005 - 346     cleanup();
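  • 001-002: initialize the SPDK runtime environment
  • 003: call spdk_nvme_probe() to actively discover NVMe SSDs. Clearly, spdk_nvme_probe() is the key function to analyze next
  • 004: call hello_world() to perform simple read and write operations
  • 005: call cleanup() to free memory, detach the NVMe SSDs, and so on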

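Before analyzing the key function spdk_nvme_probe(), let us first answer two questions:

  • Question 1: Every NVMe SSD contains a controller. How are all the discovered NVMe SSDs (that is, NVMe controllers) organized?
  • Question 2: Every NVMe SSD can be divided into multiple namespaces (similar in concept to logical partitions). How are these namespaces organized?

For an experienced C programmer both answers are easy: linked lists. That is exactly what hello_world.c does. Look at the code: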

39 struct ctrlr_entry {
40      struct spdk_nvme_ctrlr  *ctrlr;
41      struct ctrlr_entry      *next;
42      char                    name[1024];
43 };
44
45 struct ns_entry {
46      struct spdk_nvme_ctrlr  *ctrlr;
47      struct spdk_nvme_ns     *ns;
48      struct ns_entry         *next;
49      struct spdk_nvme_qpair  *qpair;
50 };
51
52 static struct ctrlr_entry *g_controllers = NULL;
53 static struct ns_entry *g_namespaces = NULL;

Here,

  • g_controllers is the head of the global linked list of all NVMe SSDs (i.e. NVMe controllers).
  • g_namespaces is the head of the global linked list of all namespaces.

With this in mind, L338-342 of main() are easy to understand: if g_controllers is still NULL after probing, no NVMe SSD was attached, so the program calls cleanup() and exits.

338     if (g_controllers == NULL) {
339             fprintf(stderr, "no NVMe controllers found\n");
340             cleanup();
341             return 1;
342     }
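A minimal sketch of how such a list can be built and torn down: each newly attached controller is pushed onto the head of g_controllers, and a cleanup routine walks the list to free it. This assumes hello_world.c's attach callback prepends entries in this way; struct fake_ctrlr is a hypothetical stand-in for the opaque struct spdk_nvme_ctrlr, and register_ctrlr(), count_ctrlrs() and free_ctrlrs() are illustrative names only.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the opaque struct spdk_nvme_ctrlr. */
struct fake_ctrlr {
    int id;
};

struct ctrlr_entry {
    struct fake_ctrlr  *ctrlr;
    struct ctrlr_entry *next;
    char                name[1024];
};

static struct ctrlr_entry *g_controllers = NULL;

/* Prepend a controller to the global list, as an attach callback might do. */
static int register_ctrlr(struct fake_ctrlr *ctrlr, const char *name)
{
    struct ctrlr_entry *entry = malloc(sizeof(*entry));

    if (entry == NULL)
        return -1;
    entry->ctrlr = ctrlr;
    snprintf(entry->name, sizeof(entry->name), "%s", name);
    entry->next = g_controllers;    /* the new entry becomes the head */
    g_controllers = entry;
    return 0;
}

static int count_ctrlrs(void)
{
    int n = 0;

    for (struct ctrlr_entry *e = g_controllers; e != NULL; e = e->next)
        n++;
    return n;
}

/* Free every entry; g_controllers is NULL again afterwards. */
static void free_ctrlrs(void)
{
    while (g_controllers != NULL) {
        struct ctrlr_entry *next = g_controllers->next;
        free(g_controllers);
        g_controllers = next;
    }
}
```

Prepending keeps insertion O(1), at the cost of the list being in reverse discovery order.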

Now let us see how hello_world.c uses spdk_nvme_probe():

331     rc = spdk_nvme_probe(NULL, NULL, probe_cb, attach_cb, NULL);

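Clearly, probe_cb and attach_cb are two callback functions (there is also remove_cb, which is not used at L331):

  • probe_cb: called each time an NVMe device is enumerated
  • attach_cb: called after an NVMe device has been attached to the userspace NVMe driver

The related definitions of probe_cb, attach_cb and remove_cb are as follows: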

  • src/spdk-17.07.1/include/spdk/nvme.h
268 /**
269  * Callback for spdk_nvme_probe() enumeration.
270  *
271  * \param opts NVMe controller initialization options.  This structure will be populated with the
272  * default values on entry, and the user callback may update any options to request a different
273  * value.  The controller may not support all requested parameters, so the final values will be
274  * provided during the attach callback.
275  * \return true to attach to this device.
276  */
277 typedef bool (*spdk_nvme_probe_cb)(void *cb_ctx, const struct spdk_nvme_transport_id *trid,
278                                struct spdk_nvme_ctrlr_opts *opts);
279
280 /**
281  * Callback for spdk_nvme_probe() to report a device that has been attached to the userspace NVMe driver.
282  *
283  * \param opts NVMe controller initialization options that were actually used.  Options may differ
284  * from the requested options from the probe call depending on what the controller supports.
285  */
286 typedef void (*spdk_nvme_attach_cb)(void *cb_ctx, const struct spdk_nvme_transport_id *trid,
287                                 struct spdk_nvme_ctrlr *ctrlr,
288                                 const struct spdk_nvme_ctrlr_opts *opts);
289
290 /**
291  * Callback for spdk_nvme_probe() to report that a device attached to the userspace NVMe driver
292  * has been removed from the system.
293  *
294  * The controller will remain in a failed state (any new I/O submitted will fail).
295  *
296  * The controller must be detached from the userspace driver by calling spdk_nvme_detach()
297  * once the controller is no longer in use.  It is up to the library user to ensure that
298  * no other threads are using the controller before calling spdk_nvme_detach().
299  *
300  * \param ctrlr NVMe controller instance that was removed.
301  */
302 typedef void (*spdk_nvme_remove_cb)(void *cb_ctx, struct spdk_nvme_ctrlr *ctrlr);
303
304 /**
305  * \brief Enumerate the bus indicated by the transport ID and attach the userspace NVMe driver
306  * to each device found if desired.
307  *
308  * \param trid The transport ID indicating which bus to enumerate. If the trtype is PCIe or trid is NULL,
309  * this will scan the local PCIe bus. If the trtype is RDMA, the traddr and trsvcid must point at the
310  * location of an NVMe-oF discovery service.
311  * \param cb_ctx Opaque value which will be passed back in cb_ctx parameter of the callbacks.
312  * \param probe_cb will be called once per NVMe device found in the system.
313  * \param attach_cb will be called for devices for which probe_cb returned true once that NVMe
314  * controller has been attached to the userspace driver.
315  * \param remove_cb will be called for devices that were attached in a previous spdk_nvme_probe()
316  * call but are no longer attached to the system. Optional; specify NULL if removal notices are not
317  * desired.
318  *
319  * This function is not thread safe and should only be called from one thread at a time while no
320  * other threads are actively using any NVMe devices.
321  *
322  * If called from a secondary process, only devices that have been attached to the userspace driver
323  * in the primary process will be probed.
324  *
325  * If called more than once, only devices that are not already attached to the SPDK NVMe driver
326  * will be reported.
327  *
328  * To stop using the the controller and release its associated resources,
329  * call \ref spdk_nvme_detach with the spdk_nvme_ctrlr instance returned by this function.
330  */
331 int spdk_nvme_probe(const struct spdk_nvme_transport_id *trid,
332                 void *cb_ctx,
333                 spdk_nvme_probe_cb probe_cb,
334                 spdk_nvme_attach_cb attach_cb,
335                 spdk_nvme_remove_cb remove_cb);
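The two-phase contract described in these comments (probe_cb decides whether to attach, attach_cb reports the controller afterwards) can be modeled without SPDK. In this hypothetical sketch, struct mock_dev, mock_nvme_probe() and the callbacks are invented stand-ins rather than the real API, the device count is capped at 16 for simplicity, and the real driver additionally initializes each accepted controller between the two phases.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the SPDK types; not the real API. */
struct mock_dev {
    const char *traddr;
    unsigned    class_code;
};

typedef bool (*mock_probe_cb)(void *cb_ctx, const struct mock_dev *dev);
typedef void (*mock_attach_cb)(void *cb_ctx, const struct mock_dev *dev);

/* Phase 1: ask probe_cb about every device. Phase 2: attach_cb for accepted ones. */
static int mock_nvme_probe(const struct mock_dev *devs, size_t n, void *cb_ctx,
                           mock_probe_cb probe_cb, mock_attach_cb attach_cb)
{
    bool accepted[16] = { false };
    int attached = 0;

    if (n > 16)
        return -1;
    for (size_t i = 0; i < n; i++)
        accepted[i] = probe_cb(cb_ctx, &devs[i]);
    /* In SPDK, controller initialization happens here, between the two phases. */
    for (size_t i = 0; i < n; i++) {
        if (accepted[i]) {
            attach_cb(cb_ctx, &devs[i]);
            attached++;
        }
    }
    return attached;
}

/* Example callbacks; cb_ctx carries an int counter for the attach callback. */
static bool accept_all(void *cb_ctx, const struct mock_dev *dev)
{
    (void)cb_ctx;
    (void)dev;
    return true;
}

static bool reject_all(void *cb_ctx, const struct mock_dev *dev)
{
    (void)cb_ctx;
    (void)dev;
    return false;
}

static void count_attach(void *cb_ctx, const struct mock_dev *dev)
{
    (void)dev;
    (*(int *)cb_ctx)++;
}
```

A probe callback that returns false for a device simply means that device is never attached, which is how an application can skip disks it does not want to own.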

So as not to get sidetracked by probe_cb, attach_cb and remove_cb, let us next look at struct spdk_nvme_transport_id and the main logic of spdk_nvme_probe().

  • src/spdk-17.07.1/include/spdk/nvme.h
142 /**
143  * NVMe transport identifier.
144  *
145  * This identifies a unique endpoint on an NVMe fabric.
146  *
147  * A string representation of a transport ID may be converted to this type using
148  * spdk_nvme_transport_id_parse().
149  */
150 struct spdk_nvme_transport_id {
151     /**
152      * NVMe transport type.
153      */
154     enum spdk_nvme_transport_type trtype;
155
156     /**
157      * Address family of the transport address.
158      *
159      * For PCIe, this value is ignored.
160      */
161     enum spdk_nvmf_adrfam adrfam;
162
163     /**
164      * Transport address of the NVMe-oF endpoint. For transports which use IP
165      * addressing (e.g. RDMA), this should be an IP address. For PCIe, this
166      * can either be a zero length string (the whole bus) or a PCI address
167      * in the format DDDD:BB:DD.FF or DDDD.BB.DD.FF
168      */
169     char traddr[SPDK_NVMF_TRADDR_MAX_LEN + 1];
170
171     /**
172      * Transport service id of the NVMe-oF endpoint.  For transports which use
173      * IP addressing (e.g. RDMA), this field shoud be the port number. For PCIe,
174      * this is always a zero length string.
175      */
176     char trsvcid[SPDK_NVMF_TRSVCID_MAX_LEN + 1];
177
178     /**
179      * Subsystem NQN of the NVMe over Fabrics endpoint. May be a zero length string.
180      */
181     char subnqn[SPDK_NVMF_NQN_MAX_LEN + 1];
182 };

For NVMe over PCIe, we only need to care about the "NVMe transport type" field:

154    enum spdk_nvme_transport_type trtype;

Currently, two transport types are supported: PCIe and RDMA.

130 enum spdk_nvme_transport_type {
131     /**
132      * PCIe Transport (locally attached devices)
133      */
134     SPDK_NVME_TRANSPORT_PCIE = 256,
135
136     /**
137      * RDMA Transport (RoCE, iWARP, etc.)
138      */
139     SPDK_NVME_TRANSPORT_RDMA = SPDK_NVMF_TRTYPE_RDMA,
140 };

We will not discuss RDMA for now, since our main concern is NVMe over PCIe.

Next, let us look at the code of spdk_nvme_probe():

  • src/spdk-17.07.1/lib/nvme/nvme.c
396 int
397 spdk_nvme_probe(const struct spdk_nvme_transport_id *trid, void *cb_ctx,
398             spdk_nvme_probe_cb probe_cb, spdk_nvme_attach_cb attach_cb,
399             spdk_nvme_remove_cb remove_cb)
400 {
401     int rc;
402     struct spdk_nvme_ctrlr *ctrlr;
403     struct spdk_nvme_transport_id trid_pcie;
404
405     rc = nvme_driver_init();
406     if (rc != 0) {
407             return rc;
408     }
409
410     if (trid == NULL) {
411             memset(&trid_pcie, 0, sizeof(trid_pcie));
412             trid_pcie.trtype = SPDK_NVME_TRANSPORT_PCIE;
413             trid = &trid_pcie;
414     }
415
416     if (!spdk_nvme_transport_available(trid->trtype)) {
417             SPDK_ERRLOG("NVMe trtype %u not available\n", trid->trtype);
418             return -1;
419     }
420
421     nvme_robust_mutex_lock(&g_spdk_nvme_driver->lock);
422
423     nvme_transport_ctrlr_scan(trid, cb_ctx, probe_cb, remove_cb);
424
425     if (!spdk_process_is_primary()) {
426             TAILQ_FOREACH(ctrlr, &g_spdk_nvme_driver->attached_ctrlrs, tailq) {
427                     nvme_ctrlr_proc_get_ref(ctrlr);
428
429                     /*
430                      * Unlock while calling attach_cb() so the user can call other functions
431                      *  that may take the driver lock, like nvme_detach().
432                      */
433                     nvme_robust_mutex_unlock(&g_spdk_nvme_driver->lock);
434                     attach_cb(cb_ctx, &ctrlr->trid, ctrlr, &ctrlr->opts);
435                     nvme_robust_mutex_lock(&g_spdk_nvme_driver->lock);
436             }
437
438             nvme_robust_mutex_unlock(&g_spdk_nvme_driver->lock);
439             return 0;
440     }
441
442     nvme_robust_mutex_unlock(&g_spdk_nvme_driver->lock);
443     /*
444      * Keep going even if one or more nvme_attach() calls failed,
445      *  but maintain the value of rc to signal errors when we return.
446      */
447
448     rc = nvme_init_controllers(cb_ctx, attach_cb);
449
450     return rc;
451 }

The processing flow of spdk_nvme_probe() is:

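001 405:     rc = nvme_driver_init();
002 410-414: if trid is NULL, use a zeroed trid whose trtype is SPDK_NVME_TRANSPORT_PCIE (scan the local PCIe bus)
003 416:     check that the transport is available via spdk_nvme_transport_available(trid->trtype)
004 423:     nvme_transport_ctrlr_scan(trid, cb_ctx, probe_cb, remove_cb);
005 425:     if this is not the primary process, call attach_cb for each controller already attached by the primary process (L426-436), then return (L438-439)
006 448:     rc = nvme_init_controllers(cb_ctx, attach_cb);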
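Steps 002 and 003 can be sketched in isolation. The types below are simplified, hypothetical copies (struct mock_trid, MOCK_TRANSPORT_*); transport_available() assumes a build in which only PCIe is available, and MOCK_TRANSPORT_RDMA = 1 assumes SPDK_NVMF_TRTYPE_RDMA follows the NVMe-oF TRTYPE encoding.

```c
#include <stdbool.h>
#include <string.h>

/* Simplified, hypothetical copies of the SPDK types shown above. */
enum mock_transport_type {
    MOCK_TRANSPORT_RDMA = 1,    /* assumed NVMe-oF TRTYPE value for RDMA */
    MOCK_TRANSPORT_PCIE = 256,  /* same value as SPDK_NVME_TRANSPORT_PCIE */
};

struct mock_trid {
    enum mock_transport_type trtype;
    char traddr[257];           /* empty string = scan the whole bus */
};

/* Step 002 (L410-414): a NULL trid means "scan the local PCIe bus". */
static const struct mock_trid *default_trid(const struct mock_trid *trid,
                                            struct mock_trid *storage)
{
    if (trid == NULL) {
        memset(storage, 0, sizeof(*storage));
        storage->trtype = MOCK_TRANSPORT_PCIE;
        trid = storage;
    }
    return trid;
}

/* Step 003 (L416): assume a build in which only the PCIe transport exists. */
static bool transport_available(enum mock_transport_type trtype)
{
    return trtype == MOCK_TRANSPORT_PCIE;
}
```

Zeroing the whole structure is what leaves traddr empty, which later makes the PCIe scan enumerate every device instead of attaching a single address.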

Next, let us look at nvme_transport_ctrlr_scan():

423     nvme_transport_ctrlr_scan(trid, cb_ctx, probe_cb, remove_cb);
/* src/spdk-17.07.1/lib/nvme/nvme_transport.c#92 */

91 int
92 nvme_transport_ctrlr_scan(const struct spdk_nvme_transport_id *trid,
93                        void *cb_ctx,
94                        spdk_nvme_probe_cb probe_cb,
95                        spdk_nvme_remove_cb remove_cb)
96 {
97      NVME_TRANSPORT_CALL(trid->trtype, ctrlr_scan, (trid, cb_ctx, probe_cb, remove_cb));
98 }

The macro NVME_TRANSPORT_CALL is defined as:

/* src/spdk-17.07.1/lib/nvme/nvme_transport.c#60 */
52 #define TRANSPORT_PCIE(func_name, args)      case SPDK_NVME_TRANSPORT_PCIE: return nvme_pcie_ ## func_name args;
..
60 #define NVME_TRANSPORT_CALL(trtype, func_name, args)         \
61      do {                                                    \
62              switch (trtype) {                               \
63              TRANSPORT_PCIE(func_name, args)                 \
64              TRANSPORT_FABRICS_RDMA(func_name, args)         \
65              TRANSPORT_DEFAULT(trtype)                       \
66              }                                               \
67              SPDK_UNREACHABLE();                             \
68      } while (0)
..
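The token-pasting trick behind NVME_TRANSPORT_CALL can be reproduced in a few lines. This is a hypothetical sketch: the enum values, the demo_pcie_ctrlr_scan()/demo_rdma_ctrlr_scan() stubs and the default branch returning -1 are simplifications, not SPDK's definitions (SPDK uses TRANSPORT_DEFAULT and SPDK_UNREACHABLE instead).

```c
/* Hypothetical transport values and per-transport stubs, for illustration only. */
enum demo_transport { DEMO_TRANSPORT_RDMA = 1, DEMO_TRANSPORT_PCIE = 256 };

static int demo_pcie_ctrlr_scan(int arg) { return 100 + arg; }
static int demo_rdma_ctrlr_scan(int arg) { return 200 + arg; }

/* Each helper expands to one case label that calls demo_<transport>_<func_name>. */
#define DEMO_CASE_PCIE(func_name, args) \
    case DEMO_TRANSPORT_PCIE: return demo_pcie_ ## func_name args;
#define DEMO_CASE_RDMA(func_name, args) \
    case DEMO_TRANSPORT_RDMA: return demo_rdma_ ## func_name args;

#define DEMO_TRANSPORT_CALL(trtype, func_name, args) \
    do {                                             \
        switch (trtype) {                            \
        DEMO_CASE_PCIE(func_name, args)              \
        DEMO_CASE_RDMA(func_name, args)              \
        default: return -1;                          \
        }                                            \
    } while (0)

/* DEMO_TRANSPORT_CALL(t, ctrlr_scan, (arg)) becomes a switch that calls
 * demo_pcie_ctrlr_scan(arg) or demo_rdma_ctrlr_scan(arg). */
static int demo_transport_ctrlr_scan(enum demo_transport trtype, int arg)
{
    DEMO_TRANSPORT_CALL(trtype, ctrlr_scan, (arg));
}
```

Because the argument list is passed as a single parenthesized macro argument, the same dispatch macro works for functions with any number of parameters.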

So, for NVMe over PCIe, the call to nvme_transport_ctrlr_scan() is turned into a call to nvme_pcie_ctrlr_scan():

/* src/spdk-17.07.1/lib/nvme/nvme_pcie.c#620 */
619 int
620 nvme_pcie_ctrlr_scan(const struct spdk_nvme_transport_id *trid,
621                  void *cb_ctx,
622                  spdk_nvme_probe_cb probe_cb,
623                  spdk_nvme_remove_cb remove_cb)
624 {
625     struct nvme_pcie_enum_ctx enum_ctx = {};
626
627     enum_ctx.probe_cb = probe_cb;
628     enum_ctx.cb_ctx = cb_ctx;
629
630     if (strlen(trid->traddr) != 0) {
631             if (spdk_pci_addr_parse(&enum_ctx.pci_addr, trid->traddr)) {
632                     return -1;
633             }
634             enum_ctx.has_pci_addr = true;
635     }
636
637     if (hotplug_fd < 0) {
638             hotplug_fd = spdk_uevent_connect();
639             if (hotplug_fd < 0) {
640                     SPDK_TRACELOG(SPDK_TRACE_NVME, "Failed to open uevent netlink socket\n");
641             }
642     } else {
643             _nvme_pcie_hotplug_monitor(cb_ctx, probe_cb, remove_cb);
644     }
645
646     if (enum_ctx.has_pci_addr == false) {
647             return spdk_pci_nvme_enumerate(pcie_nvme_enum_cb, &enum_ctx);
648     } else {
649             return spdk_pci_nvme_device_attach(pcie_nvme_enum_cb, &enum_ctx, &enum_ctx.pci_addr);
650     }
651 }
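The branch at L630-650 chooses between scanning the whole bus and attaching a single device, depending on whether trid->traddr is empty. Below is a hypothetical sketch of that decision; parse_pci_addr() is a simplified stand-in for spdk_pci_addr_parse() that only accepts the colon form DDDD:BB:DD.F, and struct pci_addr and the enum names are illustrative.

```c
#include <stdio.h>

/* Hypothetical parsed form of a PCI address; not SPDK's struct spdk_pci_addr. */
struct pci_addr {
    unsigned domain, bus, dev, func;
};

/* Accepts "DDDD:BB:DD.F" with hex fields; returns 0 on success, -1 on error. */
static int parse_pci_addr(const char *s, struct pci_addr *out)
{
    char tail;

    /* Exactly four fields must be converted; a fifth (%c) means trailing junk. */
    if (sscanf(s, "%x:%x:%x.%x%c", &out->domain, &out->bus,
               &out->dev, &out->func, &tail) != 4)
        return -1;
    if (out->domain > 0xffff || out->bus > 0xff || out->dev > 0x1f || out->func > 0x7)
        return -1;
    return 0;
}

/* Mirrors L630-650: empty traddr -> enumerate the bus, otherwise attach one device. */
enum scan_path { SCAN_WHOLE_BUS, ATTACH_ONE, SCAN_ERROR };

static enum scan_path choose_scan_path(const char *traddr, struct pci_addr *addr)
{
    if (traddr[0] == '\0')
        return SCAN_WHOLE_BUS;
    return parse_pci_addr(traddr, addr) == 0 ? ATTACH_ONE : SCAN_ERROR;
}
```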

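Next, we only need to focus on spdk_pci_nvme_enumerate() at L647, the path taken when trid->traddr is empty (i.e. the whole bus is scanned), because our goal is to understand how the Class Code is used to discover SSDs.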

647         return spdk_pci_nvme_enumerate(pcie_nvme_enum_cb, &enum_ctx);
/* src/spdk-17.07.1/lib/env_dpdk/pci_nvme.c */

81 int
82 spdk_pci_nvme_enumerate(spdk_pci_enum_cb enum_cb, void *enum_ctx)
83 {
84      return spdk_pci_enumerate(&g_nvme_pci_drv, enum_cb, enum_ctx);
85 }

Note: the first argument at L84 is the address of the global variable g_nvme_pci_drv (finding a global struct variable is always exciting :-) ).

/* src/spdk-17.07.1/lib/env_dpdk/pci_nvme.c */

38 static struct rte_pci_id nvme_pci_driver_id[] = {
39 #if RTE_VERSION >= RTE_VERSION_NUM(16, 7, 0, 1)
40      {
41              .class_id = SPDK_PCI_CLASS_NVME,
42              .vendor_id = PCI_ANY_ID,
43              .device_id = PCI_ANY_ID,
44              .subsystem_vendor_id = PCI_ANY_ID,
45              .subsystem_device_id = PCI_ANY_ID,
46      },
47 #else
48      {RTE_PCI_DEVICE(0x8086, 0x0953)},
49 #endif
50      { .vendor_id = 0, /* sentinel */ },
51 };
..
53 static struct spdk_pci_enum_ctx g_nvme_pci_drv = {
54      .driver = {
55              .drv_flags      = RTE_PCI_DRV_NEED_MAPPING,
56              .id_table       = nvme_pci_driver_id,
..
66      },
67
68      .cb_fn = NULL,
69      .cb_arg = NULL,
70      .mtx = PTHREAD_MUTEX_INITIALIZER,
71      .is_registered = false,
72 };

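Aha! Here, at last, is the link to the Class Code (SPDK_PCI_CLASS_NVME = 0x010802). The global variable g_nvme_pci_drv is defined at L53, and its driver.id_table field points to nvme_pci_driver_id, defined at L38. The class-code entry is used with DPDK 16.07 and later; with older DPDK versions the table falls back to a single Vendor ID + Device ID entry (0x8086, 0x0953).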

38 static struct rte_pci_id nvme_pci_driver_id[] = {
..
41              .class_id = SPDK_PCI_CLASS_NVME,
..
53 static struct spdk_pci_enum_ctx g_nvme_pci_drv = {
54      .driver = {
..
56              .id_table       = nvme_pci_driver_id,
..
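The NVMe entry relies on designated initializers: the five ID fields are set explicitly, and the sentinel entry { .vendor_id = 0 } has every member zero-initialized, which is what lets a loop stop at the end of the table. A minimal sketch, with a hypothetical struct pci_id mirroring struct rte_pci_id and PCI_ANY_ID assumed to be 0xffff as in DPDK:

```c
#include <stdint.h>

#define PCI_ANY_ID 0xffff   /* assumed to match DPDK's wildcard value */

/* Hypothetical mirror of struct rte_pci_id. */
struct pci_id {
    uint32_t class_id;
    uint16_t vendor_id;
    uint16_t device_id;
    uint16_t subsystem_vendor_id;
    uint16_t subsystem_device_id;
};

static const struct pci_id nvme_ids[] = {
    {
        .class_id            = 0x010802,
        .vendor_id           = PCI_ANY_ID,
        .device_id           = PCI_ANY_ID,
        .subsystem_vendor_id = PCI_ANY_ID,
        .subsystem_device_id = PCI_ANY_ID,
    },
    { .vendor_id = 0 },     /* sentinel: all other members are zero-initialized */
};

/* Count real entries by walking until the vendor_id == 0 sentinel. */
static int count_ids(const struct pci_id *table)
{
    int n = 0;

    for (; table->vendor_id != 0; table++)
        n++;
    return n;
}
```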

So we only need to dig further into spdk_pci_enumerate() to see how SSDs are discovered...

/* src/spdk-17.07.1/lib/env_dpdk/pci.c#150 */

149 int
150 spdk_pci_enumerate(struct spdk_pci_enum_ctx *ctx,
151                spdk_pci_enum_cb enum_cb,
152                void *enum_ctx)
153 {
...
168
169 #if RTE_VERSION >= RTE_VERSION_NUM(17, 05, 0, 4)
170     if (rte_pci_probe() != 0) {
171 #else
172     if (rte_eal_pci_probe() != 0) {
173 #endif
...
184     return 0;
185 }

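Some code is omitted above; next we focus on L170 (with DPDK 17.05 and later the call is rte_pci_probe(), with older versions rte_eal_pci_probe()):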

170     if (rte_pci_probe() != 0) {

With the implementation of rte_pci_probe(), we enter the internals of DPDK. The code is as follows:

/* src/dpdk-17.08/lib/librte_eal/common/eal_common_pci.c#413 */

407 /*
408  * Scan the content of the PCI bus, and call the probe() function for
409  * all registered drivers that have a matching entry in its id_table
410  * for discovered devices.
411  */
412 int
413 rte_pci_probe(void)
414 {
415     struct rte_pci_device *dev = NULL;
416     size_t probed = 0, failed = 0;
417     struct rte_devargs *devargs;
418     int probe_all = 0;
419     int ret = 0;
420
421     if (rte_pci_bus.bus.conf.scan_mode != RTE_BUS_SCAN_WHITELIST)
422             probe_all = 1;
423
424     FOREACH_DEVICE_ON_PCIBUS(dev) {
425             probed++;
426
427             devargs = dev->device.devargs;
428             /* probe all or only whitelisted devices */
429             if (probe_all)
430                     ret = pci_probe_all_drivers(dev);
431             else if (devargs != NULL &&
432                     devargs->policy == RTE_DEV_WHITELISTED)
433                     ret = pci_probe_all_drivers(dev);
434             if (ret < 0) {
435                     RTE_LOG(ERR, EAL, "Requested device " PCI_PRI_FMT
436                              " cannot be used\n", dev->addr.domain, dev->addr.bus,
437                              dev->addr.devid, dev->addr.function);
438                     rte_errno = errno;
439                     failed++;
440                     ret = 0;
441             }
442     }
443
444     return (probed && probed == failed) ? -1 : 0;
445 }

L430 is what we focus on:

430             ret = pci_probe_all_drivers(dev);

pci_probe_all_drivers() is implemented as follows:

/* src/dpdk-17.08/lib/librte_eal/common/eal_common_pci.c#307 */

301 /*
302  * If vendor/device ID match, call the probe() function of all
303  * registered driver for the given device. Return -1 if initialization
304  * failed, return 1 if no driver is found for this device.
305  */
306 static int
307 pci_probe_all_drivers(struct rte_pci_device *dev)
308 {
309     struct rte_pci_driver *dr = NULL;
310     int rc = 0;
311
312     if (dev == NULL)
313             return -1;
314
315     /* Check if a driver is already loaded */
316     if (dev->driver != NULL)
317             return 0;
318
319     FOREACH_DRIVER_ON_PCIBUS(dr) {
320             rc = rte_pci_probe_one_driver(dr, dev);
321             if (rc < 0)
322                     /* negative value is an error */
323                     return -1;
324             if (rc > 0)
325                     /* positive value means driver doesn't support it */
326                     continue;
327             return 0;
328     }
329     return 1;
330 }
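The return-code convention of pci_probe_all_drivers() (negative = error, positive = the driver does not support the device, zero = claimed) is easy to see in a reduced model. The driver functions here are hypothetical stubs that match on class code only; the real code passes struct rte_pci_driver and struct rte_pci_device, and first checks whether a driver is already bound.

```c
#include <stddef.h>

/* Hypothetical per-driver probe: <0 error, >0 "not my device", 0 "device claimed". */
typedef int (*mock_probe_one)(unsigned class_code);

static int mock_probe_all_drivers(const mock_probe_one *drivers, size_t n,
                                  unsigned class_code)
{
    for (size_t i = 0; i < n; i++) {
        int rc = drivers[i](class_code);

        if (rc < 0)
            return -1;  /* a driver failed: stop and report the error */
        if (rc > 0)
            continue;   /* this driver does not support the device: try the next */
        return 0;       /* the first driver that accepts the device wins */
    }
    return 1;           /* no registered driver claimed the device */
}

/* Example drivers that match on class code only. */
static int nvme_driver(unsigned class_code) { return class_code == 0x010802 ? 0 : 1; }
static int net_driver(unsigned class_code)  { return class_code == 0x020000 ? 0 : 1; }
static int broken_driver(unsigned class_code) { (void)class_code; return -5; }
```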

L320 is what we focus on:

320             rc = rte_pci_probe_one_driver(dr, dev);
/* src/dpdk-17.08/lib/librte_eal/common/eal_common_pci.c#200 */

195 /*
196  * If vendor/device ID match, call the probe() function of the
197  * driver.
198  */
199 static int
200 rte_pci_probe_one_driver(struct rte_pci_driver *dr,
201                      struct rte_pci_device *dev)
202 {
203     int ret;
204     struct rte_pci_addr *loc;
205
206     if ((dr == NULL) || (dev == NULL))
207             return -EINVAL;
208
209     loc = &dev->addr;
210
211     /* The device is not blacklisted; Check if driver supports it */
212     if (!rte_pci_match(dr, dev))
213             /* Match of device and driver failed */
214             return 1;
215
216     RTE_LOG(INFO, EAL, "PCI device "PCI_PRI_FMT" on NUMA socket %i\n",
217                     loc->domain, loc->bus, loc->devid, loc->function,
218                     dev->device.numa_node);
219
220     /* no initialization when blacklisted, return without error */
221     if (dev->device.devargs != NULL &&
222             dev->device.devargs->policy ==
223                     RTE_DEV_BLACKLISTED) {
224             RTE_LOG(INFO, EAL, "  Device is blacklisted, not"
225                     " initializing\n");
226             return 1;
227     }
228
229     if (dev->device.numa_node < 0) {
230             RTE_LOG(WARNING, EAL, "  Invalid NUMA socket, default to 0\n");
231             dev->device.numa_node = 0;
232     }
233
234     RTE_LOG(INFO, EAL, "  probe driver: %x:%x %s\n", dev->id.vendor_id,
235             dev->id.device_id, dr->driver.name);
236
237     if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
238             /* map resources for devices that use igb_uio */
239             ret = rte_pci_map_device(dev);
240             if (ret != 0)
241                     return ret;
242     }
243
244     /* reference driver structure */
245     dev->driver = dr;
246     dev->device.driver = &dr->driver;
247
248     /* call the driver probe() function */
249     ret = dr->probe(dr, dev);
250     if (ret) {
251             dev->driver = NULL;
252             dev->device.driver = NULL;
253             if ((dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) &&
254                     /* Don't unmap if device is unsupported and
255                      * driver needs mapped resources.
256                      */
257                     !(ret > 0 &&
258                             (dr->drv_flags & RTE_PCI_DRV_KEEP_MAPPED_RES)))
259                     rte_pci_unmap_device(dev);
260     }
261
262     return ret;
263 }
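The order of operations in rte_pci_probe_one_driver() (match, map resources, bind the driver, call probe(), roll back on failure) can be modeled as follows. This is a hypothetical simplification: struct fake_pci_dev and struct fake_pci_drv are invented, matching is reduced to a class comparison, the blacklist and NUMA checks are omitted, and the mapping is a boolean instead of rte_pci_map_device().

```c
#include <stdbool.h>
#include <stddef.h>

struct fake_pci_drv;

/* Hypothetical, simplified device and driver records. */
struct fake_pci_dev {
    unsigned                   class_id;
    bool                       mapped;   /* stands in for mapped BAR resources */
    const struct fake_pci_drv *driver;   /* bound driver, or NULL */
};

struct fake_pci_drv {
    unsigned class_id;                       /* what the id_table would match */
    bool     need_mapping;                   /* like RTE_PCI_DRV_NEED_MAPPING */
    int    (*probe)(struct fake_pci_dev *dev);
};

static int fake_probe_one_driver(const struct fake_pci_drv *dr, struct fake_pci_dev *dev)
{
    int ret;

    if (dr->class_id != dev->class_id)
        return 1;                /* no match: the caller tries the next driver */
    if (dr->need_mapping)
        dev->mapped = true;      /* stands in for rte_pci_map_device() */
    dev->driver = dr;            /* bind before calling probe() */
    ret = dr->probe(dev);
    if (ret) {                   /* roll back on failure */
        dev->driver = NULL;
        /* The real code keeps the mapping when probe() returns > 0 and
         * RTE_PCI_DRV_KEEP_MAPPED_RES is set; this model always unmaps. */
        if (dr->need_mapping)
            dev->mapped = false;
    }
    return ret;
}

static int probe_ok(struct fake_pci_dev *dev)   { (void)dev; return 0; }
static int probe_fail(struct fake_pci_dev *dev) { (void)dev; return -1; }
```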

L212 is what we focus on:

212     if (!rte_pci_match(dr, dev))

rte_pci_match() is implemented as follows:

/* src/dpdk-17.08/lib/librte_eal/common/eal_common_pci.c#163 */

151 /*
152  * Match the PCI Driver and Device using the ID Table
153  *
154  * @param pci_drv
155  *  PCI driver from which ID table would be extracted
156  * @param pci_dev
157  *  PCI device to match against the driver
158  * @return
159  *  1 for successful match
160  *  0 for unsuccessful match
161  */
162 static int
163 rte_pci_match(const struct rte_pci_driver *pci_drv,
164               const struct rte_pci_device *pci_dev)
165 {
166     const struct rte_pci_id *id_table;
167
168     for (id_table = pci_drv->id_table; id_table->vendor_id != 0;
169          id_table++) {
170             /* check if device's identifiers match the driver's ones */
171             if (id_table->vendor_id != pci_dev->id.vendor_id &&
172                             id_table->vendor_id != PCI_ANY_ID)
173                     continue;
174             if (id_table->device_id != pci_dev->id.device_id &&
175                             id_table->device_id != PCI_ANY_ID)
176                     continue;
177             if (id_table->subsystem_vendor_id !=
178                 pci_dev->id.subsystem_vendor_id &&
179                 id_table->subsystem_vendor_id != PCI_ANY_ID)
180                     continue;
181             if (id_table->subsystem_device_id !=
182                 pci_dev->id.subsystem_device_id &&
183                 id_table->subsystem_device_id != PCI_ANY_ID)
184                     continue;
185             if (id_table->class_id != pci_dev->id.class_id &&
186                             id_table->class_id != RTE_CLASS_ANY_ID)
187                     continue;
188
189             return 1;
190     }
191
192     return 0;
193 }

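At this point we have finally found how SSDs are discovered. L185-187 are the three lines we most wanted to see: because the NVMe entry in nvme_pci_driver_id sets the vendor, device and both subsystem IDs to PCI_ANY_ID, the class ID comparison is the only check that can reject a device.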

185             if (id_table->class_id != pci_dev->id.class_id &&
186                             id_table->class_id != RTE_CLASS_ANY_ID)
187                     continue;
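The same per-field logic can be re-implemented on a hypothetical struct pci_id to check that a class-only table entry accepts an NVMe device from any vendor and rejects a SATA AHCI controller (class 0x010601). The wildcard values 0xffff and 0xffffff are assumed to match DPDK's PCI_ANY_ID and RTE_CLASS_ANY_ID, and the vendor/device IDs in the usage are arbitrary.

```c
#include <stdint.h>

#define PCI_ANY_ID       0xffff     /* assumed DPDK wildcard for 16-bit IDs */
#define RTE_CLASS_ANY_ID 0xffffff   /* assumed DPDK wildcard for the 24-bit class */

/* Hypothetical mirror of struct rte_pci_id. */
struct pci_id {
    uint32_t class_id;
    uint16_t vendor_id;
    uint16_t device_id;
    uint16_t subsystem_vendor_id;
    uint16_t subsystem_device_id;
};

/* Same checks as rte_pci_match(): each field must be equal or a wildcard. */
static int id_matches(const struct pci_id *tbl, const struct pci_id *dev)
{
    for (; tbl->vendor_id != 0; tbl++) {
        if (tbl->vendor_id != dev->vendor_id && tbl->vendor_id != PCI_ANY_ID)
            continue;
        if (tbl->device_id != dev->device_id && tbl->device_id != PCI_ANY_ID)
            continue;
        if (tbl->subsystem_vendor_id != dev->subsystem_vendor_id &&
            tbl->subsystem_vendor_id != PCI_ANY_ID)
            continue;
        if (tbl->subsystem_device_id != dev->subsystem_device_id &&
            tbl->subsystem_device_id != PCI_ANY_ID)
            continue;
        if (tbl->class_id != dev->class_id && tbl->class_id != RTE_CLASS_ANY_ID)
            continue;
        return 1;   /* every field matched this table entry */
    }
    return 0;       /* reached the vendor_id == 0 sentinel without a match */
}
```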

The definitions of struct rte_pci_id, struct rte_pci_device and struct rte_pci_driver are:

/* src/dpdk-17.08/lib/librte_eal/common/include/rte_pci.h#100 */

96  /**
97   * A structure describing an ID for a PCI driver. Each driver provides a
98   * table of these IDs for each device that it supports.
99   */
100 struct rte_pci_id {
101     uint32_t class_id;            /**< Class ID (class, subclass, pi) or RTE_CLASS_ANY_ID. */
102     uint16_t vendor_id;           /**< Vendor ID or PCI_ANY_ID. */
103     uint16_t device_id;           /**< Device ID or PCI_ANY_ID. */
104     uint16_t subsystem_vendor_id; /**< Subsystem vendor ID or PCI_ANY_ID. */
105     uint16_t subsystem_device_id; /**< Subsystem device ID or PCI_ANY_ID. */
106 };

/* src/dpdk-17.08/lib/librte_eal/common/include/rte_pci.h#120 */

120 /**
121  * A structure describing a PCI device.
122  */
123 struct rte_pci_device {
124     TAILQ_ENTRY(rte_pci_device) next;       /**< Next probed PCI device. */
125     struct rte_device device;               /**< Inherit core device */
126     struct rte_pci_addr addr;               /**< PCI location. */
127     struct rte_pci_id id;                   /**< PCI ID. */
128     struct rte_mem_resource mem_resource[PCI_MAX_RESOURCE];
129                                             /**< PCI Memory Resource */
130     struct rte_intr_handle intr_handle;     /**< Interrupt handle */
131     struct rte_pci_driver *driver;          /**< Associated driver */
132     uint16_t max_vfs;                       /**< sriov enable if not zero */
133     enum rte_kernel_driver kdrv;            /**< Kernel driver passthrough */
134     char name[PCI_PRI_STR_SIZE+1];          /**< PCI location (ASCII) */
135 };

/* src/dpdk-17.08/lib/librte_eal/common/include/rte_pci.h#178 */

175 /**
176  * A structure describing a PCI driver.
177  */
178 struct rte_pci_driver {
179     TAILQ_ENTRY(rte_pci_driver) next;       /**< Next in list. */
180     struct rte_driver driver;               /**< Inherit core driver. */
181     struct rte_pci_bus *bus;                /**< PCI bus reference. */
182     pci_probe_t *probe;                     /**< Device Probe function. */
183     pci_remove_t *remove;                   /**< Device Remove function. */
184     const struct rte_pci_id *id_table;      /**< ID table, NULL terminated. */
185     uint32_t drv_flags;                     /**< Flags contolling handling of device. */
186 };

 

To sum up, SSD discovery works as follows:

  • 01 - The Class Code (0x010802) is the basis for discovering SSDs
  • 02 - During discovery, control passes from SPDK into DPDK; the call stack is:
00 hello_world.c
01 -> main()
02 --> spdk_nvme_probe()
03 ---> nvme_transport_ctrlr_scan()
04 ----> nvme_pcie_ctrlr_scan()
05 -----> spdk_pci_nvme_enumerate()
06 ------> spdk_pci_enumerate(&g_nvme_pci_drv, ...)                 | SPDK |
   =========================================================================
07 -------> rte_pci_probe()                                         | DPDK |
08 --------> pci_probe_all_drivers()
09 ---------> rte_pci_probe_one_driver()
10 ----------> rte_pci_match()
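  • 03 - The function rte_pci_match() in DPDK's Environment Abstraction Layer (EAL) is the key to discovering SSDs.
  • 04 - Within the DPDK architecture, the EAL is the layer that gives the upper-level libraries access to low-level resources such as hardware (including PCI devices) and memory.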

Your greatness is measured by your horizons.

