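These are some notes from recently reading the SCSI-related code in the kernel. They are fragmentary and meant for reference only.
Let's start with the most visible output, the listing printed by lsscsi: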
[0:0:0:0]  disk    ATA      INTEL SSDSC2BX20 0150  -
[0:0:1:0]  disk    ATA      INTEL SSDSC2BX20 0150  -
[0:1:0:0]  disk    LSI      Logical Volume   3000  /dev/sda
[5:0:0:0]  enclosu AIC 12G  4U60: Hub        0c29  -
[5:0:1:0]  disk    SEAGATE  ST4000NM0025     N003  /dev/sdb
[5:0:2:0]  disk    SEAGATE  ST4000NM0025     N004  /dev/sdc
[5:0:3:0]  disk    SEAGATE  ST4000NM0025     N003  /dev/sdd
[5:0:4:0]  disk    SEAGATE  ST4000NM0025     N003  /dev/sde
[5:0:5:0]  disk    SEAGATE  ST4000NM0025     N003  /dev/sdf
[5:0:6:0]  disk    SEAGATE  ST4000NM0025     N003  /dev/sdg
[5:0:7:0]  disk    SEAGATE  ST4000NM0025     N004  /dev/sdh
[5:0:8:0]  disk    SEAGATE  ST4000NM0025     N003  /dev/sdi
[5:0:9:0]  disk    SEAGATE  ST4000NM0025     N004  /dev/sdj
[5:0:10:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdk
[5:0:11:0] disk    SEAGATE  ST4000NM0025     N004  /dev/sdl
[5:0:12:0] disk    SEAGATE  ST4000NM0025     N004  /dev/sdm
[5:0:13:0] disk    SEAGATE  ST4000NM0025     N004  /dev/sdn
[5:0:14:0] disk    SEAGATE  ST4000NM0025     N004  /dev/sdo
[5:0:15:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdp
[5:0:16:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdq
[5:0:17:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdr
[5:0:18:0] disk    SEAGATE  ST4000NM0025     N004  /dev/sds
[5:0:19:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdt
[5:0:20:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdu
[5:0:21:0] enclosu AIC 12G  4U60: Edge-C     0c2a  -
[5:0:22:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdv
[5:0:23:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdw
[5:0:24:0] disk    SEAGATE  ST4000NM0025     N004  /dev/sdx
[5:0:25:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdy
[5:0:26:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdz
[5:0:27:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdaa
[5:0:28:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdab
[5:0:29:0] disk    SEAGATE  ST4000NM0025     N004  /dev/sdac
[5:0:30:0] disk    SEAGATE  ST4000NM0025     N004  /dev/sdad
[5:0:31:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdae
[5:0:32:0] disk    SEAGATE  ST4000NM0025     N004  /dev/sdaf
[5:0:33:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdag
[5:0:34:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdah
[5:0:35:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdai
[5:0:36:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdaj
[5:0:37:0] disk    SEAGATE  ST4000NM0025     N004  /dev/sdak
[5:0:38:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdal
[5:0:39:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdam
[5:0:40:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdan
[5:0:41:0] disk    SEAGATE  ST4000NM0025     N004  /dev/sdao
[5:0:42:0] enclosu AIC 12G  4U60: Edge-R     0c2a  -
[5:0:43:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdap
[5:0:44:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdaq
[5:0:45:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdar
[5:0:46:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdas
[5:0:47:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdat
[5:0:48:0] disk    SEAGATE  ST4000NM0025     N004  /dev/sdau
[5:0:49:0] disk    SEAGATE  ST4000NM0025     N004  /dev/sdav
[5:0:50:0] disk    SEAGATE  ST4000NM0025     N004  /dev/sdaw
[5:0:51:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdax
[5:0:52:0] disk    SEAGATE  ST4000NM0025     N004  /dev/sday
[5:0:53:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdaz
[5:0:54:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdba
[5:0:55:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdbb
[5:0:56:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdbc
[5:0:57:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdbd
[5:0:58:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdbe
[5:0:59:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdbf
[5:0:60:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdbg
[5:0:61:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdbh
[5:0:62:0] disk    SEAGATE  ST4000NM0025     N003  /dev/sdbi
[5:0:63:0] enclosu AIC 12G  4U60: Edge-L     0c2a  -
[6:0:0:0]  disk    HGST     SDLL1DLR960GCAA1 W150  /dev/sdbj
[6:0:1:0]  disk    HGST     SDLL1DLR960GCAA1 W150  /dev/sdbk
[6:0:2:0]  disk    HGST     SDLL1DLR960GCAA1 W150  /dev/sdbl
[6:0:3:0]  disk    HGST     SDLL1DLR960GCAA1 W150  /dev/sdbm
[6:0:4:0]  disk    HGST     SDLL1DLR960GCAA1 W150  /dev/sdbn
[6:0:5:0]  disk    HGST     SDLL1DLR960GCAA1 W150  /dev/sdbo
[6:0:6:0]  disk    HGST     SDLL1DLR960GCAA1 W150  /dev/sdbp
[6:0:7:0]  disk    HGST     SDLL1DLR960GCAA1 W150  /dev/sdbq
[7:0:0:0]  disk    HGST     SDLL1DLR960GCAA1 W150  /dev/sdbr
[7:0:1:0]  disk    HGST     SDLL1DLR960GCAA1 W150  /dev/sdbs
[7:0:2:0]  disk    HGST     SDLL1DLR960GCAA1 W150  /dev/sdbt
[7:0:3:0]  disk    HGST     SDLL1DLR960GCAA1 W150  /dev/sdbu
[7:0:4:0]  disk    HGST     SDLL1DLR960GCAA1 W150  /dev/sdbv
[7:0:5:0]  disk    HGST     SDLL1DLR960GCAA1 W150  /dev/sdbw
[7:0:6:0]  disk    HGST     SDLL1DLR960GCAA1 W150  /dev/sdbx
[7:0:7:0]  disk    HGST     SDLL1DLR960GCAA1 W150  /dev/sdby
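What is the first column of numbers? How are the individual numbers related? How does the kernel abstract the SCSI layer? How is a SCSI command abstracted?
What happens when an issued SCSI command hits an error, or times out? What does the normal completion path look like? With these questions in mind, let's read the code.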
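What is the first column of numbers?
The first column printed by lsscsi is the multi-level number under which the kernel exposes a SCSI device. The number uniquely identifies one device.
It is somewhat easier to understand when viewed with cat /proc/scsi/scsi: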
cat /proc/scsi/scsi Attached devices: Host: scsi0 Channel: 01 Id: 00 Lun: 00 Vendor: LSI Model: Logical Volume Rev: 3000 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi0 Channel: 00 Id: 00 Lun: 00 Vendor: ATA Model: INTEL SSDSC2BX20 Rev: 0150 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi0 Channel: 00 Id: 01 Lun: 00 Vendor: ATA Model: INTEL SSDSC2BX20 Rev: 0150 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 00 Lun: 00 Vendor: AIC 12G Model: 4U60: Hub Rev: 0c29 Type: Enclosure ANSI SCSI revision: 05 Host: scsi5 Channel: 00 Id: 01 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 02 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 03 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 04 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 05 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 06 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 07 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 08 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 09 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 10 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 11 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 12 Lun: 00 Vendor: 
SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 13 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 14 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 15 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 16 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 17 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 18 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 19 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 20 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 21 Lun: 00 Vendor: AIC 12G Model: 4U60: Edge-C Rev: 0c2a Type: Enclosure ANSI SCSI revision: 05 Host: scsi5 Channel: 00 Id: 22 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 23 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 24 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 25 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 26 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 27 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 
Host: scsi5 Channel: 00 Id: 28 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 29 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 30 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 31 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 32 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 33 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 34 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 35 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 36 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 37 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 38 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 39 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 40 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 41 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 42 Lun: 00 Vendor: AIC 12G Model: 4U60: Edge-R Rev: 0c2a Type: Enclosure ANSI SCSI revision: 05 Host: scsi5 Channel: 00 Id: 43 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: 
N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 44 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 45 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 46 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 47 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 48 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 49 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 50 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 51 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 52 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 53 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 54 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 55 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 56 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 57 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 58 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 59 
Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 60 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 61 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 62 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 63 Lun: 00 Vendor: AIC 12G Model: 4U60: Edge-L Rev: 0c2a Type: Enclosure ANSI SCSI revision: 05 Host: scsi6 Channel: 00 Id: 00 Lun: 00 Vendor: HGST Model: SDLL1DLR960GCAA1 Rev: W150 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi6 Channel: 00 Id: 01 Lun: 00 Vendor: HGST Model: SDLL1DLR960GCAA1 Rev: W150 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi6 Channel: 00 Id: 02 Lun: 00 Vendor: HGST Model: SDLL1DLR960GCAA1 Rev: W150 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi6 Channel: 00 Id: 03 Lun: 00 Vendor: HGST Model: SDLL1DLR960GCAA1 Rev: W150 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi6 Channel: 00 Id: 04 Lun: 00 Vendor: HGST Model: SDLL1DLR960GCAA1 Rev: W150 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi6 Channel: 00 Id: 05 Lun: 00 Vendor: HGST Model: SDLL1DLR960GCAA1 Rev: W150 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi6 Channel: 00 Id: 06 Lun: 00 Vendor: HGST Model: SDLL1DLR960GCAA1 Rev: W150 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi6 Channel: 00 Id: 07 Lun: 00 Vendor: HGST Model: SDLL1DLR960GCAA1 Rev: W150 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi7 Channel: 00 Id: 00 Lun: 00 Vendor: HGST Model: SDLL1DLR960GCAA1 Rev: W150 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi7 Channel: 00 Id: 01 Lun: 00 Vendor: HGST Model: SDLL1DLR960GCAA1 Rev: W150 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi7 Channel: 00 Id: 02 Lun: 00 Vendor: HGST Model: SDLL1DLR960GCAA1 Rev: W150 Type: 
Direct-Access ANSI SCSI revision: 06 Host: scsi7 Channel: 00 Id: 03 Lun: 00 Vendor: HGST Model: SDLL1DLR960GCAA1 Rev: W150 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi7 Channel: 00 Id: 04 Lun: 00 Vendor: HGST Model: SDLL1DLR960GCAA1 Rev: W150 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi7 Channel: 00 Id: 05 Lun: 00 Vendor: HGST Model: SDLL1DLR960GCAA1 Rev: W150 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi7 Channel: 00 Id: 06 Lun: 00 Vendor: HGST Model: SDLL1DLR960GCAA1 Rev: W150 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi7 Channel: 00 Id: 07 Lun: 00 Vendor: HGST Model: SDLL1DLR960GCAA1 Rev: W150 Type: Direct-Access ANSI SCSI revision: 06
From the numbering we can see that the first level is the host, the second the channel, the third the target ID, and the fourth the LUN:
h == hostadapter id (first one being 0)
c == SCSI channel on hostadapter (first one being 0)
t == ID
l == LUN (first one being 0)
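To make the correspondence between the two listings concrete, here is a minimal sketch that turns the header line of a /proc/scsi/scsi entry into the h:c:t:l form that lsscsi prints. The function names proc_entry_to_hctl and format_hctl are made up for this illustration, and the regular expression only assumes the line layout seen in the output above.

```python
import re

# Matches the "Host: scsiN Channel: NN Id: NN Lun: NN" header of one entry.
_HEADER = re.compile(
    r"Host:\s*scsi(\d+)\s+Channel:\s*(\d+)\s+Id:\s*(\d+)\s+Lun:\s*(\d+)"
)


def proc_entry_to_hctl(line):
    """Return (host, channel, target, lun) parsed from a header line."""
    match = _HEADER.search(line)
    if match is None:
        raise ValueError(f"not a /proc/scsi/scsi header line: {line!r}")
    return tuple(int(field) for field in match.groups())


def format_hctl(hctl):
    """Render the tuple the way the first lsscsi column shows it."""
    return "[{}:{}:{}:{}]".format(*hctl)


if __name__ == "__main__":
    sample = "Host: scsi5 Channel: 00 Id: 04 Lun: 00"
    print(format_hctl(proc_entry_to_hctl(sample)))
```

The zero-padded fields in /proc/scsi/scsi (for example Id: 04) and the unpadded fields in lsscsi describe the same numbers, which is why the sketch converts them to integers before formatting.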
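How are the numbers related?
A motherboard may carry several hosts. On the server above, which has several SAS chips, there are necessarily several hosts. One SAS chip can in turn be divided into several channels (a channel is also called a bus). A channel has multiple targets, and a target has multiple LUNs.
If a disk supports two channels (is dual-ported), it appears at the SCSI layer under two SCSI addresses.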
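The host, channel, target and LUN nesting described above can be sketched by grouping a few addresses into a tree. This is only an illustration in Python; build_tree is a hypothetical helper, and the sample addresses are taken from the lsscsi listing at the top.

```python
from collections import defaultdict


def build_tree(addresses):
    """Group 'H:C:T:L' strings into host -> channel -> target -> [luns]."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for address in addresses:
        host, channel, target, lun = (
            int(part) for part in address.strip("[]").split(":")
        )
        tree[host][channel][target].append(lun)
    # Convert to plain dicts so the result compares and prints cleanly.
    return {
        host: {
            channel: {target: sorted(luns) for target, luns in targets.items()}
            for channel, targets in channels.items()
        }
        for host, channels in tree.items()
    }


if __name__ == "__main__":
    sample = ["[0:0:0:0]", "[0:0:1:0]", "[0:1:0:0]", "[5:0:4:0]"]
    for host, channels in build_tree(sample).items():
        print("host", host, "->", channels)
```

In the sample, host 0 ends up with two channels (0 and 1), which matches the listing where the LSI logical volume sits on channel 1 while the two ATA disks sit on channel 0.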
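How does the kernel abstract the SCSI layer?
A device is represented by struct scsi_device. Its host member points to the Scsi_Host it belongs to, and its siblings member is linked into the host's __devices list. In addition, the parent of its sdev_gendev member points to the dev member of the corresponding scsi_target.
This is easy to follow once you are familiar with the Linux driver model.
Here is a real scsi_device instance: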
crash> scsi_device ffff881fcee44800
struct scsi_device {
  host = 0xffff883fd0e38000,  ---- points to the Scsi_Host, described later
  request_queue = 0xffff883fc1e28828,  ---- the request_queue allocated earlier to hold issued I/O; note the distinction between single-queue and multi-queue
  siblings = {  ---- all scsi_device instances under the current host are chained through this; they are siblings, hence the member name
    next = 0xffff881fcece9810,
    prev = 0xffff881fcee44010
  },
  same_target_siblings = {  ---- chains the scsi_device instances under the same target; one issue is that linking this also requires taking the host lock, which could be optimized
    next = 0xffff883fc1e21c18,
    prev = 0xffff883fc1e21c18
  },
  {
    device_busy = { counter = 6 },
    __UNIQUE_ID_rh_kabi_hide20 = { device_busy = 6 },
    {<No data fields>}
  },
  list_lock = { { rlock = { raw_lock = { { head_tail = 1215842424, tickets = { head = 18552, tail = 18552 } } } } } },
  cmd_list = { next = 0xffff881f49a2d508, prev = 0xffff883eeccee308 },
  starved_entry = { next = 0xffff881fcee44848, prev = 0xffff881fcee44848 },
  current_cmnd = 0x0,
  queue_depth = 254,
  max_queue_depth = 254,
  last_queue_full_depth = 0,
  last_queue_full_count = 0,
  last_queue_full_time = 0,
  queue_ramp_up_period = 120000,
  last_queue_ramp_up = 0,
  id = 4,  ---- usually set to the target's id
  lun = 0,  ---- the last level of the four-level number, the LUN
  channel = 0,  ---- the channel number
  manufacturer = 0,
  sector_size = 512,
  hostdata = 0xffff883fca92ed20,
  type = 0 '\000',
  scsi_level = 7 '\a',
  inq_periph_qual = 0 '\000',
  inquiry_len = 144 '\220',
  inquiry = 0xffff883fc1e60b40 "",
  vendor = 0xffff883fc1e60b48 "SEAGATE ST4000NM0025 N003ZC18ASFP",
  model = 0xffff883fc1e60b50 "ST4000NM0025 N003ZC18ASFP",
  rev = 0xffff883fc1e60b60 "N003ZC18ASFP",
  current_tag = 0 '\000',
  sdev_target = 0xffff883fc1e21c00,  ---- points to the scsi_target; according to the comment it is only valid for single-LUN targets, yet the target's single_lun here is 0, which is odd; to be safe, it is better not to use this to find the scsi_target
  sdev_bflags = 0,
  eh_timeout = 10000,
  writeable = 1,
  removable = 0,
  changed = 0,
  busy = 0,
  lockable = 0,
  locked = 0,
  borken = 0,
  disconnect = 0,
  soft_reset = 0,
  sdtr = 0,
  wdtr = 0,
  ppr = 1,
  tagged_supported = 1,
  simple_tags = 0,
  ordered_tags = 0,
  was_reset = 0,
  expecting_cc_ua = 0,
  use_10_for_rw = 1,
  use_10_for_ms = 0,
  no_report_opcodes = 0,
  no_write_same = 0,
  use_16_for_rw = 1,
  skip_ms_page_8 = 0,
  skip_ms_page_3f = 0,
  skip_vpd_pages = 0,
  use_192_bytes_for_3f = 0,
  no_start_on_add = 0,
  allow_restart = 0,
  manage_start_stop = 0,
  start_stop_pwr_cond = 0,
  no_uld_attach = 0,
  select_no_atn = 0,
  fix_capacity = 0,
  guess_capacity = 0,
  retry_hwerror = 0,
  last_sector_bug = 0,
  no_read_disc_info = 0,
  no_read_capacity_16 = 0,
  try_rc_10_first = 0,
  is_visible = 1,
  wce_default_on = 0,
  no_dif = 0,
  broken_fua = 0,
  vpd_reserved = 0,
  xcopy_reserved = 0,
  lun_in_cdb = 0,
  disk_events_disable_depth = { counter = 0 },
  supported_events = {0},
  pending_events = {0},
  event_list = { next = 0xffff881fcee44900, prev = 0xffff881fcee44900 },
  event_work = {
    data = { counter = 68719476704 },
    entry = { next = 0xffff881fcee44918, prev = 0xffff881fcee44918 },
    func = 0xffffffff814241a0 <scsi_evt_thread>
  },
  {
    device_blocked = { counter = 0 },
    __UNIQUE_ID_rh_kabi_hide21 = { device_blocked = 0 },
    {<No data fields>}
  },
  max_device_blocked = 3,
  iorequest_cnt = {  ---- I/Os issued
    counter = 4641
  },
  iodone_cnt = {  ---- I/Os completed
    counter = 4635
  },
  ioerr_cnt = {
    counter = 283  ---- worth watching: the count of failed I/Os, exported through sysfs as the device's ioerr_cnt attribute
  },
  sdev_gendev = {  ---- device model: the parent of a scsi_device's sdev_gendev points to the dev member of the scsi_target, which is how the driver-model tree shows up here
    parent = 0xffff883fc1e21c28,
    p = 0xffff883fd0cf2b40,
    kobj = {
      name = 0xffff883fc21d9790 "5:0:4:0",  ---- the four-level name: host_no is 5, channel is 0, target id is 4, lun is 0
      entry = { next = 0xffff881fcee44c00, prev = 0xffff883fc1e21c40 },
      parent = 0xffff883fc1e21c38,
      kset = 0xffff881fff86b6c0,
      ktype = 0xffffffff81a14e40 <device_ktype>,
      sd = 0xffff883fccc5ee70,
      kref = { refcount = { counter = 25 } },
      state_initialized = 1,
      state_in_sysfs = 1,
      state_add_uevent_sent = 1,
      state_remove_uevent_sent = 0,
      uevent_suppress = 0
    },
    init_name = 0x0,
    type = 0xffffffff81a19760 <scsi_dev_type>,
    mutex = {
      count = { counter = 1 },
      wait_lock = { { rlock = {
        raw_lock = { { head_tail = 0, tickets = { head = 0, tail = 0 } } } } } },
      wait_list = { next = 0xffff881fcee449b0, prev = 0xffff881fcee449b0 },
      owner = 0x0,
      { osq = 0x0, __UNIQUE_ID_rh_kabi_hide0 = { spin_mlock = 0x0 }, {<No data fields>} }
    },
    bus = 0xffffffff81a19320 <scsi_bus_type>,  ---- pointer to the bus type, also a member of sdev_gendev
    driver = 0xffffffffa011e008 <sd_template+8>,
    platform_data = 0x0,
    power = {
      power_state = { event = 0 },
      can_wakeup = 0,
      async_suspend = 1,
      is_prepared = false,
      is_suspended = false,
      ignore_children = false,
      early_init = true,
      lock = { { rlock = { raw_lock = { { head_tail = 1310740, tickets = { head = 20, tail = 20 } } } } } },
      entry = { next = 0xffff881fcee44c98, prev = 0xffff883fc1e21cd8 },
      completion = {
        done = 2147483647,
        wait = {
          lock = { { rlock = { raw_lock = { { head_tail = 131074, tickets = { head = 2, tail = 2 } } } } } },
          task_list = { next = 0xffff881fcee44a18, prev = 0xffff881fcee44a18 }
        }
      },
      wakeup = 0x0,
      wakeup_path = false,
      syscore = false,
      suspend_timer = {
        entry = { next = 0x0, prev = 0x0 },
        expires = 0,
        base = 0xffff883fd0c94000,
        function = 0xffffffff81402e90 <pm_suspend_timer_fn>,
        data = 18446612268929272136,
        slack = -1,
        start_pid = -1,
        start_site = 0x0,
        start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
      },
      timer_expires = 0,
      work = {
        data = { counter = 68719476704 },
        entry = { next = 0xffff881fcee44a98, prev = 0xffff881fcee44a98 },
        func = 0xffffffff81402f10 <pm_runtime_work>
      },
      wait_queue = {
        lock = { { rlock = { raw_lock = { { head_tail = 0, tickets = { head = 0, tail = 0 } } } } } },
        task_list = { next = 0xffff881fcee44ab8, prev = 0xffff881fcee44ab8 }
      },
      usage_count = { counter = 2 },
      child_count = { counter = 0 },
      disable_depth = 0,
      idle_notification = 0,
      request_pending = 0,
      deferred_resume = 0,
      run_wake = 0,
      runtime_auto = 0,
      no_callbacks = 0,
      irq_safe = 0,
      use_autosuspend = 1,
      timer_autosuspends = 0,
      memalloc_noio = 1,
      request = RPM_REQ_NONE,
      runtime_status = RPM_ACTIVE,
      runtime_error = 0,
      autosuspend_delay = -1,
      last_busy = 4295244282,
      active_jiffies = 0,
      suspended_jiffies = 0,
      accounting_timestamp = 4294683149,
      subsys_data = 0x0,
      qos = 0x0
    },
    pm_domain = 0x0,
    numa_node = 1,
    dma_mask = 0x0,
    coherent_dma_mask = 0,
    dma_parms = 0x0,
    dma_pools = { next = 0xffff881fcee44b40, prev = 0xffff881fcee44b40 },
    dma_mem = 0x0,
    archdata = { dma_ops = 0x0, iommu = 0x0 },
    of_node = 0x0,
    acpi_node = { { companion = 0x0, __UNIQUE_ID_rh_kabi_hide9 = { handle = 0x0 }, {<No data fields>} } },
    devt = 0,
    id = 0,
    devres_lock = { { rlock = { raw_lock = { { head_tail = 0, tickets = { head = 0, tail = 0 } } } } } },
    devres_head = { next = 0xffff881fcee44b88, prev = 0xffff881fcee44b88 },
    knode_class = {
      n_klist = 0x0,
      n_node = { next = 0x0, prev = 0x0 },
      n_ref = { refcount = { counter = 0 } }
    },
    class = 0x0,
    groups = 0x0,
    release = 0x0,
    iommu_group = 0x0,
    offline_disabled = false,
    offline = false,
    device_rh = 0xffff881fcee33378
  },
  sdev_dev = {
    parent = 0xffff881fcee44948,
    p = 0xffff883fd0cf2cc0,
    kobj = {
      name = 0xffff883fc21d9798 "5:0:4:0",  ---- set in scsi_sysfs_device_initialize to the same name as scsi_device.sdev_gendev
      entry = { next = 0xffff883fc1854418, prev = 0xffff881fcee44960 },
      parent = 0xffff881fcee4cde0,
      kset = 0xffff881fff86b6c0,
      ktype = 0xffffffff81a14e40 <device_ktype>,
      sd = 0xffff883fcac120e0,
      kref = { refcount = { counter = 3 } },
      state_initialized = 1,
      state_in_sysfs = 1,
      state_add_uevent_sent = 1,
      state_remove_uevent_sent = 0,
      uevent_suppress = 0
    },
    init_name = 0x0,
    type = 0x0,
    mutex = {
      count = { counter = 1 },
      wait_lock = { { rlock = { raw_lock = { { head_tail = 0, tickets = { head = 0, tail = 0 } } } } } },
      wait_list = { next = 0xffff881fcee44c50, prev = 0xffff881fcee44c50 },
      owner = 0x0,
      { osq = 0x0, __UNIQUE_ID_rh_kabi_hide0 = { spin_mlock = 0x0 }, {<No data fields>} }
    },
    bus = 0x0,
    driver = 0x0,
    platform_data = 0x0,
    power = {
      power_state = { event = 0 },
      can_wakeup = 0,
      async_suspend = 1,
      is_prepared = false,
      is_suspended = false,
      ignore_children = false,
early_init = true, lock = { { rlock = { raw_lock = { { head_tail = 0, tickets = { head = 0, tail = 0 } } } } } }, entry = { next = 0xffff883fc18544b0, prev = 0xffff881fcee449f8 }, completion = { done = 2147483647, wait = { lock = { { rlock = { raw_lock = { { head_tail = 131074, tickets = { head = 2, tail = 2 } } } } } }, task_list = { next = 0xffff881fcee44cb8, prev = 0xffff881fcee44cb8 } } }, wakeup = 0x0, wakeup_path = false, syscore = false, suspend_timer = { entry = { next = 0x0, prev = 0x0 }, expires = 0, base = 0xffff883fd0c94000, function = 0xffffffff81402e90 <pm_suspend_timer_fn>, data = 18446612268929272808, slack = -1, start_pid = -1, start_site = 0x0, start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000" }, timer_expires = 0, work = { data = { counter = 68719476704 }, entry = { next = 0xffff881fcee44d38, prev = 0xffff881fcee44d38 }, func = 0xffffffff81402f10 <pm_runtime_work> }, wait_queue = { lock = { { rlock = { raw_lock = { { head_tail = 0, tickets = { head = 0, tail = 0 } } } } } }, task_list = { next = 0xffff881fcee44d58, prev = 0xffff881fcee44d58 } }, usage_count = { counter = 0 }, child_count = { counter = 0 }, disable_depth = 1, idle_notification = 0, request_pending = 0, deferred_resume = 0, run_wake = 0, runtime_auto = 1, no_callbacks = 0, irq_safe = 0, use_autosuspend = 0, timer_autosuspends = 0, memalloc_noio = 0, request = RPM_REQ_NONE, runtime_status = RPM_SUSPENDED, runtime_error = 0, autosuspend_delay = 0, last_busy = 0, active_jiffies = 0, suspended_jiffies = 0, accounting_timestamp = 4294680236, subsys_data = 0x0, qos = 0x0 }, pm_domain = 0x0, numa_node = 1, dma_mask = 0x0, coherent_dma_mask = 0, dma_parms = 0x0, dma_pools = { next = 0xffff881fcee44de0, prev = 0xffff881fcee44de0 }, dma_mem = 0x0, archdata = { dma_ops = 0x0, iommu = 0x0 }, of_node = 0x0, acpi_node = { { companion = 0x0, __UNIQUE_ID_rh_kabi_hide9 = { handle = 0x0 }, {<No data fields>} } }, devt = 0, id = 0, devres_lock = { { rlock = { raw_lock = { { 
      head_tail = 0, tickets = { head = 0, tail = 0 } } } } } },
    devres_head = { next = 0xffff881fcee44e28, prev = 0xffff881fcee44e28 },
    knode_class = {
      n_klist = 0xffff883fcff106a8,
      n_node = { next = 0xffff881fcece9e40, prev = 0xffff881fcee44640 },
      n_ref = { refcount = { counter = 1 } }
    },
    class = 0xffffffff81a193e0 <sdev_class>,
    groups = 0x0,
    release = 0x0,
    iommu_group = 0x0,
    offline_disabled = false,
    offline = false,
    device_rh = 0xffff881fcee33398
  },
  ew = { work = { data = { counter = 0 }, entry = { next = 0x0, prev = 0x0 }, func = 0x0 } },
  requeue_work = {
    data = { counter = 68719476704 },
    entry = { next = 0xffff881fcee44eb0, prev = 0xffff881fcee44eb0 },
    func = 0xffffffff814236e0 <scsi_requeue_run_queue>
  },
  scsi_dh_data = 0x0,
  sdev_state = SDEV_RUNNING,  ---- the device is currently in the running state
  {
    vpd_pg83 = 0xffff883fc1e62400 "",
    __UNIQUE_ID_rh_kabi_hide22 = { vpd_reserved1 = 0xffff883fc1e62400 },
    {<No data fields>}
  },
  {
    vpd_pg83_len = 76,
    __UNIQUE_ID_rh_kabi_hide23 = { vpd_reserved2 = 0x4c },
    {<No data fields>}
  },
  {
    vpd_pg80 = 0xffff883fc1e62300 "",
    __UNIQUE_ID_rh_kabi_hide24 = { vpd_reserved3 = 0xffff883fc1e62300 },
    {<No data fields>}
  },
  {
    vpd_pg80_len = 24,
    __UNIQUE_ID_rh_kabi_hide25 = { vpd_reserved4 = 0x18 },
    {<No data fields>}
  },
  vpd_reserved5 = 0 '\000',
  vpd_reserved6 = 0 '\000',
  vpd_reserved7 = 0 '\000',
  vpd_reserved8 = 0 '\000',
  vpd_reserved9 = { { rlock = { raw_lock = { { head_tail = 0, tickets = { head = 0, tail = 0 } } } } } },
  rh_reserved1 = 0x0,
  rh_reserved2 = 0x0,
  rh_reserved3 = 0x0,
  rh_reserved4 = 0x0,
  rh_reserved5 = 0x0,
  rh_reserved6 = 0x0,
  scsi_mq_reserved1 = { counter = 0 },
  scsi_mq_reserved2 = { counter = 0 },
  sdev_data = 0xffff881fcee44f38
}
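The three I/O counters in the dump above (iorequest_cnt, iodone_cnt, ioerr_cnt) lend themselves to a quick sanity check. The sketch below is one possible way to summarize them, not something taken from the kernel; the helper names are hypothetical, and the values are copied from the dump.

```python
def parse_counter(text):
    """Accept a counter written either as decimal or as 0x-prefixed hex."""
    return int(text.strip(), 0)


def in_flight(iorequest_cnt, iodone_cnt):
    """I/Os that were issued but have not completed yet."""
    return iorequest_cnt - iodone_cnt


def error_ratio(ioerr_cnt, iodone_cnt):
    """Fraction of completed I/Os that were counted as errors."""
    return ioerr_cnt / iodone_cnt if iodone_cnt else 0.0


if __name__ == "__main__":
    issued = parse_counter("4641")
    done = parse_counter("4635")
    errors = parse_counter("283")
    print("in flight:", in_flight(issued, done))
    print("error ratio: {:.3f}".format(error_ratio(errors, done)))
```

For this device the difference between issued and completed I/Os is 6, which agrees with device_busy = 6 in the same dump. That agreement is an observation about this particular dump rather than a guaranteed invariant.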
How do we get from a scsi_device to the scsi_target it belongs to? Working from the dump above:
crash> scsi_device.sdev_target ffff881fcee44800
  sdev_target = 0xffff883fc1e21c00
crash> struct -xo scsi_device.sdev_gendev ffff881fcee44800
struct scsi_device {
  [ffff881fcee44948] struct device sdev_gendev;
}
crash> device.parent ffff881fcee44948
  parent = 0xffff883fc1e21c28
crash> struct -xo scsi_target.dev
struct scsi_target {
  [0x28] struct device dev;
}
crash> px 0xffff883fc1e21c28-0x28
$4 = 0xffff883fc1e21c00  ---- same as the sdev_target read directly, but the second method is still the recommended one
You can also read the parent directly, without walking level by level:
crash> scsi_device.sdev_gendev.parent ffff881fcee44800
sdev_gendev.parent = 0xffff883fc1e21c28,
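The px step in the crash session is the same subtraction that the kernel's container_of macro performs: take the address of an embedded member and subtract the member's offset to get the enclosing structure. A small sketch of that arithmetic, using only the addresses and the 0x28 offset printed above (the function name is borrowed from the macro for illustration):

```python
def container_of(member_address, member_offset):
    """Address of the enclosing struct, given a member's address and offset."""
    return member_address - member_offset


# Values copied from the crash session above.
SDEV = 0xFFFF881FCEE44800            # the scsi_device
SDEV_GENDEV = 0xFFFF881FCEE44948     # &sdev->sdev_gendev
GENDEV_PARENT = 0xFFFF883FC1E21C28   # sdev_gendev.parent == &starget->dev
STARGET_DEV_OFFSET = 0x28            # offset of dev inside scsi_target

if __name__ == "__main__":
    starget = container_of(GENDEV_PARENT, STARGET_DEV_OFFSET)
    print(hex(starget))
    # The offset of sdev_gendev inside scsi_device follows the same way.
    print(hex(SDEV_GENDEV - SDEV))
```

The result equals the sdev_target pointer read directly, and the second print shows that sdev_gendev sits 0x148 bytes into this build's scsi_device.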
A target is represented by struct scsi_target. Its starget_sdev_user member points to the currently active LUN:
/*
 * scsi_target: representation of a scsi target, for now, this is only
 * used for single_lun devices. If no one has active IO to the target,  ---- is this comment out of date?
 * starget_sdev_user is NULL, else it points to the active sdev.
 */
struct scsi_target {
	struct scsi_device *starget_sdev_user;  ---- either points to the currently active scsi_device or is NULL; used when the target supports only one LUN
...
Here is a scsi_target instance:
crash> scsi_target 0xffff883fc1e21c00  ---- the scsi_target that the scsi_device above belongs to
struct scsi_target {
  starget_sdev_user = 0x0,
  siblings = {  ---- this member is linked into the host's __targets list
    next = 0xffff883fc1e5f008,
    prev = 0xffff883fcaa4b408
  },
  devices = {  ---- the list of scsi_device instances under this target
    next = 0xffff881fcee44820,
    prev = 0xffff881fcee44820
  },
  dev = {  ---- in driver-model terms, the parent of a scsi_device's sdev_gendev points to this dev
    parent = 0xffff883fcaa4fc00,
    p = 0xffff883fd0cf29c0,
    kobj = {
      name = 0xffff883fc1e00660 "target5:0:4",
      entry = { next = 0xffff881fcee44960, prev = 0xffff883fc1853018 },
      parent = 0xffff883fcaa4fc10,
      kset = 0xffff881fff86b6c0,
      ktype = 0xffffffff81a14e40 <device_ktype>,
      sd = 0xffff883fccc5ea10,
      ......  ---- the rest of the long-winded device-model struct is omitted
  reap_ref = 0,
  channel = 0,
  id = 4,
  create = 0,
  single_lun = 0,
  pdt_1f_for_no_lun = 0,
  no_report_luns = 0,
  expecting_lun_change = 0,
  {
    target_busy = { counter = 0 },
    __UNIQUE_ID_rh_kabi_hide19 = { target_busy = 0 },
    {<No data fields>}
  },
  can_queue = 0,
  {
    target_blocked = { counter = 0 },
    __UNIQUE_ID_rh_kabi_hide20 = { target_blocked = 0 },
    {<No data fields>}
  },
  max_target_blocked = 3,
  scsi_level = 7 '\a',
  ew = { work = { data = { counter = 0 }, entry = { next = 0x0, prev = 0x0 }, func = 0x0 } },
  state = STARGET_RUNNING,
  hostdata = 0xffff883fc1e22000,
  rh_reserved1 = 0x0,
  rh_reserved2 = 0x0,
  rh_reserved3 = 0x0,
  rh_reserved4 = 0x0,
  scsi_mq_reserved1 = { counter = 0 },
  scsi_mq_reserved2 = { counter = 0 },
  starget_data = 0xffff883fc1e21f48
}
When scsi_request_fn sends an I/O request to a device, it also checks whether the scsi_target that the device belongs to is busy:
static void scsi_request_fn(struct request_queue *q)
	__releases(q->queue_lock)
	__acquires(q->queue_lock)
{
	...
	if (!scsi_target_queue_ready(shost, sdev))
		goto not_ready;
	...
}
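To get a feel for what such a "is the target ready" check has to decide, here is a deliberately simplified toy model in Python. It is not the kernel's scsi_target_queue_ready. The rules it encodes (a single-LUN target is claimed by one device at a time, and a can_queue value of 0 or less means no per-target limit) are assumptions made for this sketch, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyTarget:
    single_lun: bool = False
    can_queue: int = 0               # assumption: <= 0 means "no limit"
    target_busy: int = 0             # commands currently outstanding
    sdev_user: Optional[str] = None  # stands in for starget_sdev_user


def target_queue_ready(target, sdev):
    """Return True if a command for `sdev` may be sent to `target`."""
    if target.single_lun:
        # Only one device may be active on a single-LUN target.
        if target.sdev_user is not None and target.sdev_user != sdev:
            return False
        target.sdev_user = sdev
    if target.can_queue <= 0:
        return True
    if target.target_busy >= target.can_queue:
        return False
    target.target_busy += 1
    return True


if __name__ == "__main__":
    # Mirrors the dumped target: single_lun = 0, can_queue = 0.
    dumped = ToyTarget()
    print(target_queue_ready(dumped, "5:0:4:0"), dumped.sdev_user)
```

Under these assumptions a target like the dumped one (single_lun = 0, can_queue = 0) is always ready and never records an active device, which would be consistent with starget_sdev_user being 0x0 in the dump.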
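From the kernel's management point of view, scsi_target to scsi_device is a one-to-many relationship, but what I actually observed is one-to-one. The starget_sdev_user member points to the active scsi_device, but only transiently.
Most of the time it is NULL.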
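A bus/channel has no dedicated abstraction; it is represented by an ID only. The host has a max_channel member, the largest channel number, which is used to tell apart the channels under one host.
A SCSI host is represented by struct Scsi_Host. Its __devices member links all the scsi_device instances it manages, its __targets member links all its targets, and the scsi_add_host function registers a host with the system.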
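The relationships between host, target and device can be modelled in a few lines. The sketch below is a toy Python model, not kernel code: the class names, the add_device method and the dictionary keyed by (channel, id) are all inventions for illustration. Only the naming pattern ("5:0:4:0", "target5:0:4") and the role of max_channel come from the notes above.

```python
class ToyDevice:
    def __init__(self, host, target, lun):
        self.host, self.target, self.lun = host, target, lun

    @property
    def name(self):
        t = self.target
        return f"{self.host.host_no}:{t.channel}:{t.id}:{self.lun}"


class ToyTargetNode:
    def __init__(self, host, channel, target_id):
        self.host, self.channel, self.id = host, channel, target_id
        self.devices = []  # plays the role of scsi_target.devices

    @property
    def name(self):
        return f"target{self.host.host_no}:{self.channel}:{self.id}"


class ToyHost:
    def __init__(self, host_no, max_channel=0):
        self.host_no, self.max_channel = host_no, max_channel
        self.devices = []  # plays the role of Scsi_Host.__devices
        self.targets = {}  # plays the role of Scsi_Host.__targets

    def add_device(self, channel, target_id, lun):
        if not 0 <= channel <= self.max_channel:
            raise ValueError(f"channel {channel} exceeds max_channel")
        key = (channel, target_id)
        if key not in self.targets:
            self.targets[key] = ToyTargetNode(self, channel, target_id)
        target = self.targets[key]
        device = ToyDevice(self, target, lun)
        target.devices.append(device)  # same-target siblings
        self.devices.append(device)    # host-wide siblings
        return device


if __name__ == "__main__":
    host5 = ToyHost(host_no=5, max_channel=0)
    sde = host5.add_device(channel=0, target_id=4, lun=0)
    print(sde.name, sde.target.name, len(host5.devices))
```

Each device is reachable both from the host-wide list and from its target's list, which mirrors the two list memberships (siblings and same_target_siblings) seen in the scsi_device dump.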
Here is a host instance:
crash> struct Scsi_Host 0xffff883fd0e38000
struct Scsi_Host {
  __devices = { next = 0xffff881fcee42810, prev = 0xffff883fc18a6010 },
  __targets = { next = 0xffff883fc1e19008, prev = 0xffff883fc18f2408 },
  cmd_pool = 0xffffffff81a18680 <scsi_cmd_pool>,
  free_list_lock = { { rlock = { raw_lock = { { head_tail = 0, tickets = { head = 0, tail = 0 } } } } } },
  free_list = { next = 0xffff883fc8db0008, prev = 0xffff883fc8db0008 },
  starved_list = { next = 0xffff883fd0e38040, prev = 0xffff883fd0e38040 },
  default_lock = { { rlock = { raw_lock = { { head_tail = 93193614, tickets = { head = 1422, tail = 1422 } } } } } },
  host_lock = 0xffff883fd0e38050,
  scan_mutex = {
    count = { counter = 1 },
    wait_lock = { { rlock = { raw_lock = { { head_tail = 0, tickets = { head = 0, tail = 0 } } } } } },
    wait_list = { next = 0xffff883fd0e38068, prev = 0xffff883fd0e38068 },
    owner = 0x0,
    { osq = 0x0, __UNIQUE_ID_rh_kabi_hide1 = { spin_mlock = 0x0 }, {<No data fields>} }
  },
  eh_cmd_q = { next = 0xffff883fd0e38088, prev = 0xffff883fd0e38088 },
  ehandler = 0xffff881fcc24a280,  ---- this corresponds to PID: 680 TASK: ffff881fcc24a280 CPU: 37 COMMAND: "scsi_eh_5"
  eh_action = 0x0,
  host_wait = {
    lock = { { rlock = { raw_lock = { { head_tail = 0, tickets = { head = 0, tail = 0 } } } } } },
    task_list = { next = 0xffff883fd0e380b0, prev = 0xffff883fd0e380b0 }
  },
  hostt = 0xffffffffa00d01c0,  ---- the host's own template
  transportt = 0xffff883fcd090000,  ---- this is mpt3sas_transport_template; different host types have different transport templates
  { bqt = 0x0, tag_set = 0x0 },
  {
    host_busy = {
      counter = 8  ---- 8 busy I/Os, i.e. the count of I/Os that have already left the request_queue
    },
    __UNIQUE_ID_rh_kabi_hide30 = { host_busy = 8 },
    {<No data fields>}
  },
  host_failed = 0,  ---- none have failed at the moment
  host_eh_scheduled = 0,
  host_no = 5,  ---- ties in with the error-handling kernel thread; there are as many error-handling threads as hosts -- 680 2 37 ffff881fcc24a280 IN 0.0 0 0 [scsi_eh_5]
  eh_deadline = -1,
  last_reset = 0,
  max_id = 4294967295,
  max_lun = 16895,
  max_channel = 0,
  unique_id = 1,
  max_cmd_len = 32,
  this_id = -1,
  can_queue = 2936,
  cmd_per_lun = 7,
sg_tablesize = 128, sg_prot_tablesize = 0, max_sectors = 32767, dma_boundary = 4294967295, cmd_serial_number = 0, active_mode = 1, unchecked_isa_dma = 0, use_clustering = 1, use_blk_tcq = 0, host_self_blocked = 0, reverse_ordering = 0, ordered_tag = 0, tmf_in_progress = 0, async_scan = 0, eh_noresume = 0, no_write_same = 0,
use_blk_mq = 0,------------------whether blk-mq (multi-queue) is used
no_scsi2_lun_in_cdb = 0, work_q_name = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000", work_q = 0x0, tmf_work_q = 0xffff881fcef98800, { host_blocked = { counter = 0 }, __UNIQUE_ID_rh_kabi_hide31 = { host_blocked = 0 }, {<No data fields>} }, max_host_blocked = 7, prot_capabilities = 7, prot_guard_type = 3 '\003', uspace_req_q = 0x0, base = 0, io_port = 0, n_io_port = 0 '\000', dma_channel = 255 '\377', irq = 0,
shost_state = SHOST_RUNNING,---------the host with 60 disks is in the running state as well
shost_gendev = { parent = 0xffff883fcfded098, p = 0xffff883fd0c8b500, kobj = {
name = 0xffff883fcd0661c0 "host5",---------the name used in the device driver model
entry = { next = 0xffff883fd0e38448, prev = 0xffff881fca1cd818 }, parent = 0xffff883fcfded0a8, kset = 0xffff881fff86b6c0, ktype = 0xffffffff81a14e40 <device_ktype>, sd = 0xffff883fccc9e690, kref = { refcount = { counter = 39 } }, state_initialized = 1, state_in_sysfs = 1, state_add_uevent_sent = 1, state_remove_uevent_sent = 0, uevent_suppress = 0 }, init_name = 0x0, type = 0xffffffff81a18a80 <scsi_host_type>, mutex = { count = { counter = 1 }, wait_lock = { { rlock = { raw_lock = { { head_tail = 0, tickets = { head = 0, tail = 0 } } } } } }, wait_list = { next = 0xffff883fd0e381f8, prev = 0xffff883fd0e381f8 }, owner = 0x0, { osq = 0x0, __UNIQUE_ID_rh_kabi_hide1 = { spin_mlock = 0x0 }, {<No data fields>} } }, bus = 0xffffffff81a19320 <scsi_bus_type>, driver = 0x0, platform_data = 0x0, power = { power_state = { event = 0 }, can_wakeup = 0, async_suspend = 1, is_prepared = false, is_suspended = false, ignore_children = false, early_init = true, lock = { { rlock = { raw_lock = { { 
head_tail = 9568402, tickets = { head = 146, tail = 146 } } } } } }, entry = { next = 0xffff883fd0e384e0, prev = 0xffff881fca1cd8b0 }, completion = { done = 2147483647, wait = { lock = { { rlock = { raw_lock = { { head_tail = 131074, tickets = { head = 2, tail = 2 } } } } } }, task_list = { next = 0xffff883fd0e38260, prev = 0xffff883fd0e38260 } } }, wakeup = 0x0, wakeup_path = false, syscore = false, suspend_timer = { entry = { next = 0x0, prev = 0x0 }, expires = 0, base = 0xffff883fd0e24000, function = 0xffffffff81402e90 <pm_suspend_timer_fn>, data = 18446612406401728912, slack = -1, start_pid = -1, start_site = 0x0, start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000" }, timer_expires = 0, work = { data = { counter = 68719476704 }, entry = { next = 0xffff883fd0e382e0, prev = 0xffff883fd0e382e0 }, func = 0xffffffff81402f10 <pm_runtime_work> }, wait_queue = { lock = { { rlock = { raw_lock = { { head_tail = 262148, tickets = { head = 4, tail = 4 } } } } } }, task_list = { next = 0xffff883fd0e38300, prev = 0xffff883fd0e38300 } }, usage_count = { counter = 0 }, child_count = { counter = 0 }, disable_depth = 0, idle_notification = 0, request_pending = 0, deferred_resume = 0, run_wake = 0, runtime_auto = 1, no_callbacks = 0, irq_safe = 0, use_autosuspend = 0, timer_autosuspends = 0, memalloc_noio = 1, request = RPM_REQ_NONE, runtime_status = RPM_SUSPENDED, runtime_error = 0, autosuspend_delay = 0, last_busy = 0, active_jiffies = 9752, suspended_jiffies = 0, accounting_timestamp = 4294683155, subsys_data = 0x0, qos = 0x0 }, pm_domain = 0x0, numa_node = 1, dma_mask = 0x0, coherent_dma_mask = 0, dma_parms = 0x0, dma_pools = { next = 0xffff883fd0e38388, prev = 0xffff883fd0e38388 }, dma_mem = 0x0, archdata = { dma_ops = 0x0, iommu = 0x0 }, of_node = 0x0, acpi_node = { { companion = 0x0, __UNIQUE_ID_rh_kabi_hide7 = { handle = 0x0 }, {<No data fields>} } }, devt = 0, id = 0, devres_lock = { { rlock = { raw_lock = { { head_tail = 0, tickets = { head = 0, 
tail = 0 } } } } } }, devres_head = { next = 0xffff883fd0e383d0, prev = 0xffff883fd0e383d0 }, knode_class = { n_klist = 0x0, n_node = { next = 0x0, prev = 0x0 }, n_ref = { refcount = { counter = 0 } } }, class = 0x0, groups = 0x0, release = 0x0, iommu_group = 0x0, offline_disabled = false, offline = false, device_rh = 0xffff883fccc84738 }, shost_dev = { parent = 0xffff883fd0e38190, p = 0xffff883fd0c8b5c0, kobj = {
name = 0xffff883fcd0661c8 "host5",----------the name used in the device driver model
entry = { next = 0xffff881fce507818, prev = 0xffff883fd0e381a8 }, parent = 0xffff883fcd039180, kset = 0xffff881fff86b6c0, ktype = 0xffffffff81a14e40 <device_ktype>, sd = 0xffff883fccc9eb60, kref = { refcount = { counter = 3 } }, state_initialized = 1, state_in_sysfs = 1, state_add_uevent_sent = 1, state_remove_uevent_sent = 0, uevent_suppress = 0 }, init_name = 0x0, type = 0x0, mutex = { count = { counter = 1 }, wait_lock = { { rlock = { raw_lock = { { head_tail = 0, tickets = { head = 0, tail = 0 } } } } } }, wait_list = { next = 0xffff883fd0e38498, prev = 0xffff883fd0e38498 }, owner = 0x0, { osq = 0x0, __UNIQUE_ID_rh_kabi_hide1 = { spin_mlock = 0x0 }, {<No data fields>} } }, bus = 0x0, driver = 0x0, platform_data = 0x0, power = { power_state = { event = 0 }, can_wakeup = 0, async_suspend = 1, is_prepared = false, is_suspended = false, ignore_children = false, early_init = true, lock = { { rlock = { raw_lock = { { head_tail = 0, tickets = { head = 0, tail = 0 } } } } } }, entry = { next = 0xffff881fce5078b0, prev = 0xffff883fd0e38240 }, completion = { done = 2147483647, wait = { lock = { { rlock = { raw_lock = { { head_tail = 131074, tickets = { head = 2, tail = 2 } } } } } }, task_list = { next = 0xffff883fd0e38500, prev = 0xffff883fd0e38500 } } }, wakeup = 0x0, wakeup_path = false, syscore = false, suspend_timer = { entry = { next = 0x0, prev = 0x0 }, expires = 0, base = 0xffff883fd0e24000, function = 0xffffffff81402e90 <pm_suspend_timer_fn>, data = 18446612406401729584, slack = -1, start_pid = -1, 
start_site = 0x0, start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000" }, timer_expires = 0, work = { data = { counter = 68719476704 }, entry = { next = 0xffff883fd0e38580, prev = 0xffff883fd0e38580 }, func = 0xffffffff81402f10 <pm_runtime_work> }, wait_queue = { lock = { { rlock = { raw_lock = { { head_tail = 0, tickets = { head = 0, tail = 0 } } } } } }, task_list = { next = 0xffff883fd0e385a0, prev = 0xffff883fd0e385a0 } }, usage_count = { counter = 0 }, child_count = { counter = 0 }, disable_depth = 1, idle_notification = 0, request_pending = 0, deferred_resume = 0, run_wake = 0, runtime_auto = 1, no_callbacks = 0, irq_safe = 0, use_autosuspend = 0, timer_autosuspends = 0, memalloc_noio = 0, request = RPM_REQ_NONE, runtime_status = RPM_SUSPENDED, runtime_error = 0, autosuspend_delay = 0, last_busy = 0, active_jiffies = 0, suspended_jiffies = 0, accounting_timestamp = 4294671976, subsys_data = 0x0, qos = 0x0 }, pm_domain = 0x0, numa_node = 1, dma_mask = 0x0, coherent_dma_mask = 0, dma_parms = 0x0, dma_pools = { next = 0xffff883fd0e38628, prev = 0xffff883fd0e38628 }, dma_mem = 0x0, archdata = { dma_ops = 0x0, iommu = 0x0 }, of_node = 0x0, acpi_node = { { companion = 0x0, __UNIQUE_ID_rh_kabi_hide7 = { handle = 0x0 }, {<No data fields>} } }, devt = 0, id = 0, devres_lock = { { rlock = { raw_lock = { { head_tail = 0, tickets = { head = 0, tail = 0 } } } } } }, devres_head = { next = 0xffff883fd0e38670, prev = 0xffff883fd0e38670 }, knode_class = { n_klist = 0xffff883fcff102a8, n_node = { next = 0xffff883fd0cb2688, prev = 0xffff881fce2b6688 }, n_ref = { refcount = { counter = 1 } } }, class = 0xffffffff81a18ac0 <shost_class>, groups = 0xffffffff81a19470 <scsi_sysfs_shost_attr_groups>, release = 0x0, iommu_group = 0x0, offline_disabled = false, offline = false, device_rh = 0xffff883fccc84758 }, sht_legacy_list = { next = 0x0, prev = 0x0 }, shost_data = 0xffff883fcd0391e0, dma_dev = 0xffff883fcfded098, rh_reserved1 = 0x0, rh_reserved2 = 0x0, 
rh_reserved3 = 0x0, rh_reserved4 = 0x0, rh_reserved5 = 0x0, rh_reserved6 = 0x0, scsi_mq_reserved1 = 0, scsi_mq_reserved2 = 0, scsi_mq_reserved3 = 0x0, scsi_mq_reserved4 = 0x0, scsi_mq_reserved5 = { counter = 0 }, scsi_mq_reserved6 = { counter = 0 },
hostdata = 0xffff883fd0e38740---this usually holds the controller's private data, e.g. MPT3SAS_ADAPTER or MPT2SAS_ADAPTER
}
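In the probe routine of a SCSI host adapter driver, the usual sequence is scsi_host_alloc(), then scsi_add_host(), immediately followed by scsi_scan_host() to scan the SCSI bus.
The purpose of the bus scan is to discover, in a protocol-specific or chip-specific way, the target nodes and logical units attached behind the host adapter, build the corresponding in-memory data structures for them, and add them to the system.
The SCSI mid layer builds an INQUIRY command for each possible ID and LUN in turn and submits these commands to the block I/O system, which eventually calls the mid layer's strategy routine; that routine extracts the SCSI command again and calls the queuecommand callback of the SCSI low-level driver. In fact, almost anything in the kernel that involves registration also involves setting up relationships with both the layer above and the layer below.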
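To make the INQUIRY step a bit more concrete, here is a minimal user-space sketch of the 6-byte INQUIRY CDB that gets built for each candidate ID/LUN. It is not the kernel's scan code; build_inquiry_cdb and the two macros are names made up for this note. The layout (opcode 0x12, 16-bit big-endian allocation length in bytes 3-4) follows the SPC definition of a standard INQUIRY.

```c
#include <stdint.h>
#include <string.h>

#define INQUIRY_OPCODE  0x12  /* SPC operation code for INQUIRY */
#define INQUIRY_CDB_LEN 6

/*
 * Hypothetical helper: fill cdb[] with a standard INQUIRY (EVPD = 0)
 * asking the device for at most alloc_len bytes of response data.
 */
void build_inquiry_cdb(uint8_t cdb[INQUIRY_CDB_LEN], uint16_t alloc_len)
{
    memset(cdb, 0, INQUIRY_CDB_LEN);
    cdb[0] = INQUIRY_OPCODE;
    cdb[3] = (uint8_t)(alloc_len >> 8);    /* allocation length, high byte */
    cdb[4] = (uint8_t)(alloc_len & 0xff);  /* allocation length, low byte */
    /* cdb[1] (EVPD), cdb[2] (page code) and cdb[5] (control) stay 0 */
}
```

A first probe typically asks for 36 bytes, the minimum length of standard INQUIRY data, so the call would be build_inquiry_cdb(cdb, 36).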
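How are the individual Scsi_Host objects related?
From the device driver model's point of view, each host's shost_gendev.parent points to the device of the adapter it sits on (in the dump above it has the same value as dma_dev); apart from that the hosts are unrelated. The addresses queried below are the hosts' shost_gendev members (for host5, 0xffff883fd0e38190 is exactly what shost_dev.parent holds in the dump above), and each of them has a different parent: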
crash> device.parent ffff883fd0cb4190
  parent = 0xffff883fcfdef098
crash> device.parent 0xffff883fd0e38190
  parent = 0xffff883fcfded098
crash> device.parent 0xffff883fd0cb2190
  parent = 0xffff883fcfdee098
crash> device.parent 0xffff881fce2b6190
  parent = 0xffff883fcfdb0098
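How the SCSI subsystem handles block access requests
After the generic block layer calls the request queue handler of the SCSI subsystem, the SCSI mid layer generates, initializes and submits a SCSI command (struct scsi_cmnd) to the SCSI target, based on the content of the block access request.
SCSI describes the devices it talks to as a hierarchy: host level, bus level, target level and device level. The scsi_device mentioned earlier is the device-level abstraction and corresponds to a LUN, which may be a disk, an optical drive, and so on.
For a disk, a scsi_disk object is created as well; for an optical drive, a scsi_cd object is created to go with the scsi_device.
During the bus scan, every time a device is detected, scsi_alloc_sdev() is called, which in turn calls scsi_alloc_queue(). In other words, once the kernel has recognized a SCSI device it has to set up a request_queue for it, and that is done in the function below. How exactly a scsi_device gets recognized involves a whole probing flow that is not covered here.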
struct request_queue *scsi_alloc_queue(struct scsi_device *sdev)
{
    struct request_queue *q;

    /* allocate an ordinary request_queue and set up its members;
     * scsi_request_fn is the function that executes the requests */
    q = __scsi_alloc_queue(sdev->host, scsi_request_fn);
    if (!q)
        return NULL;

    blk_queue_prep_rq(q, scsi_prep_fn);    /* scsi_prep_fn prepares the SCSI command */
    blk_queue_unprep_rq(q, scsi_unprep_fn);
    blk_queue_softirq_done(q, scsi_softirq_done);
    blk_queue_rq_timed_out(q, scsi_times_out);
    blk_queue_lld_busy(q, scsi_lld_busy);
    return q;
}
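The abstraction of a SCSI command:
The kernel uses scsi_cmnd to manage a generated SCSI command, including its timestamps, retry count, context pointers, the command body that carries the CDB, and so on. A typical scsi_cmnd belonging to a request issued by a filesystem looks like this: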
crash> scsi_cmnd 0xffff881f49a2d500
struct scsi_cmnd {
device = 0xffff881fcee44800,------pointer to the scsi_device this command belongs to
list = { next = 0xffff881f49a2cfc8, prev = 0xffff881fcee44838 },
eh_entry = {----member linked into the error-handling list; used when this command fails or times out
next = 0x0, prev = 0x0 },
abort_work = {----used when the command times out; it is queued on a workqueue of the scsi_host for processing
work = { data = { counter = 68719476704 }, entry = { next = 0xffff881f49a2d530, prev = 0xffff881f49a2d530 },
func = 0xffffffff8141eee0 <scmd_eh_abort_handler>----the handler function of the work_struct
}, timer = { entry = { next = 0x0, prev = 0x0 }, expires = 0, base = 0xffff881fd2d8c002, function = 0xffffffff8109c100 <delayed_work_timer_fn>, data = 18446612266693612840, slack = -1, start_pid = -1, start_site = 0x0, start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000" }, wq = 0x0, cpu = 0 }, eh_eflags = 0,
serial_number = 0,------------------command serial number
jiffies_at_alloc = 4298774713,------timestamp (jiffies) at which this command was allocated
retries = 0, allowed = 5, prot_op = 0 '\000', prot_type = 0 '\000', cmd_len = 16, sc_data_direction = DMA_FROM_DEVICE, cmnd = 0xffff883e9d3f7e98 "\210", sdb = { table = { sgl = 0xffff880cdf50fe00, nents = 1, orig_nents = 1 }, length = 4096, resid = 0 }, prot_sdb = 0x0, underflow = 4096, transfersize = 512,
request = 0xffff883e9d3f7d80,------------------the block-layer request this command corresponds to
sense_buffer = 0xffff880168be0f00 "",
scsi_done = 0xffffffff81420a90 <scsi_done>,---callback invoked after the command has been executed
SCp = { ptr = 0x0, this_residual = 0, buffer = 0x0, buffers_residual = 0, dma_handle = 0, Status = 0, Message = 0, have_data_in = 0, sent_command = 0, phase = 0 }, host_scribble = 0x0, result = 0, tag = 255 '\377', rh_reserved1 = 0x0, rh_reserved2 = 0x0, rh_reserved3 = 0x0, rh_reserved4 = 0x0 }
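In the dump above, cmnd starts with "\210" (octal for 0x88) and cmd_len is 16, which is a READ(16). As a rough illustration of what the rest of such a CDB carries, here is a small user-space decoder; decode_read16 and struct rw16 are hypothetical names for this note, and the byte layout (8-byte big-endian LBA in bytes 2-9, 4-byte transfer length in bytes 10-13) is taken from the SBC definition of READ(16), not from this crash dump.

```c
#include <stdint.h>

#define READ_16 0x88  /* SBC operation code for READ(16) */

struct rw16 {
    uint64_t lba;      /* starting logical block address */
    uint32_t nblocks;  /* transfer length in logical blocks */
};

/* Returns 0 on success, -1 if cdb is not a 16-byte READ(16). */
int decode_read16(const uint8_t *cdb, unsigned int cmd_len, struct rw16 *out)
{
    int i;

    if (cmd_len != 16 || cdb[0] != READ_16)
        return -1;

    out->lba = 0;
    for (i = 2; i <= 9; i++)        /* bytes 2..9: LBA, big-endian */
        out->lba = (out->lba << 8) | cdb[i];

    out->nblocks = 0;
    for (i = 10; i <= 13; i++)      /* bytes 10..13: transfer length */
        out->nblocks = (out->nblocks << 8) | cdb[i];

    return 0;
}
```

For the command above, sdb.length = 4096 with 512-byte blocks would correspond to a transfer length of 8, assuming the device really uses 512-byte logical blocks.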
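SCSI command initialization and submission
Besides the SCSI commands issued by the generic block layer, SCSI commands can also be issued through sg.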
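For reference, this is roughly what issuing a command through sg looks like from user space. It is a minimal sketch using the SG_IO ioctl and sg_io_hdr_t from <scsi/sg.h>; sg_inquiry is a made-up helper name, the 5 second timeout is an arbitrary choice, and error reporting is reduced to 0 / -1. It needs a Linux system and sufficient permissions on the device node.

```c
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <scsi/sg.h>

/*
 * Hypothetical helper: send a standard INQUIRY to dev_path (e.g. an sg or
 * sd node) via SG_IO. Returns 0 if the command completed cleanly, -1 otherwise.
 */
int sg_inquiry(const char *dev_path, uint8_t *resp, uint8_t resp_len)
{
    uint8_t cdb[6] = { 0x12, 0, 0, 0, resp_len, 0 };  /* INQUIRY, EVPD = 0 */
    uint8_t sense[32];
    sg_io_hdr_t io;
    int fd, rc;

    fd = open(dev_path, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    memset(&io, 0, sizeof(io));
    io.interface_id = 'S';                   /* required by the sg v3 interface */
    io.dxfer_direction = SG_DXFER_FROM_DEV;  /* data flows device -> host */
    io.cmdp = cdb;
    io.cmd_len = sizeof(cdb);
    io.dxferp = resp;
    io.dxfer_len = resp_len;
    io.sbp = sense;                          /* sense data lands here on error */
    io.mx_sb_len = sizeof(sense);
    io.timeout = 5000;                       /* milliseconds */

    rc = ioctl(fd, SG_IO, &io);
    close(fd);
    if (rc < 0)
        return -1;

    /* SG_INFO_OK means no SCSI, host or driver error was reported */
    return ((io.info & SG_INFO_OK_MASK) == SG_INFO_OK) ? 0 : -1;
}
```

A call such as sg_inquiry("/dev/sg0", buf, 36) would be typical; the device path is only an example.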
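Error handling in the SCSI subsystem
Low-level disk drivers are implemented by the vendors themselves, so they are not discussed here. Apart from that, error handling in the SCSI subsystem is mainly done by the SCSI mid layer. In the first callback, the low-level driver returns the result of the SCSI command and the SCSI status information it obtained to the mid layer. The mid layer first judges the command result returned by the low-level driver; if no definite conclusion can be reached, it then examines the SCSI status, sense data and so on. A command judged to have succeeded goes straight to the second callback; a command judged to need a retry is put back on the block device request queue and processed again. This can be called the mid layer's basic method of judging the result of a SCSI command.
It all looks simple, but in practice it is not: some errors have no clear basis for judgment, such as sense data errors or TIMEOUT errors. To deal with this, the Linux SCSI subsystem introduces a thread dedicated to error handling, and every SCSI command whose error cause cannot be determined is handed to it. The thread's work revolves around two queues: the error handling queue (eh_work_q) and the error handling done queue (eh_done_q). The error handling queue records the SCSI commands that need error handling; the done queue records the commands that have been dealt with during error handling. The thread processes the commands on the error handling queue as follows.
The error handling process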
static void scsi_unjam_host(struct Scsi_Host *shost)
{
    unsigned long flags;
    LIST_HEAD(eh_work_q);
    LIST_HEAD(eh_done_q);

    spin_lock_irqsave(shost->host_lock, flags);
    list_splice_init(&shost->eh_cmd_q, &eh_work_q);
    spin_unlock_irqrestore(shost->host_lock, flags);

    SCSI_LOG_ERROR_RECOVERY(1, scsi_eh_prt_fail_stats(shost, &eh_work_q));

    if (!scsi_eh_get_sense(&eh_work_q, &eh_done_q))
        if (!scsi_eh_abort_cmds(&eh_work_q, &eh_done_q))
            scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q);

    spin_lock_irqsave(shost->host_lock, flags);
    if (shost->eh_deadline != -1)
        shost->last_reset = 0;
    spin_unlock_irqrestore(shost->host_lock, flags);
    scsi_eh_flush_done_q(&eh_done_q);
}
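The whole process can be summarized in four stages:
- Sense data query stage
Querying the sense data gives the mid layer a fresh basis for judging the SCSI command, and the basic judgment method described earlier is applied again. If the verdict is success or retry, the command is moved from the error handling queue to the error handling done queue. If no verdict can be reached, the command stays on the error handling queue and error handling moves on to the ABORT stage.
- ABORT stage
In this stage the SCSI commands on the error handling queue are actively aborted. Aborted commands are added to the error handling done queue. If commands are still left on the error handling queue when the ABORT pass ends, the START STOP UNIT stage is entered.
- START STOP UNIT stage
In this stage a START STOP UNIT command is sent to the SCSI devices related to the commands on the error handling queue, in an attempt to recover those devices. If commands are still on the error handling queue after this stage, the RESET stage is entered.
- RESET stage
The RESET stage has four levels: DEVICE RESET, TARGET RESET, BUS RESET and HOST RESET. First the SCSI devices related to the commands on the error queue are reset; if a device is back in a normal state after DEVICE RESET, the failed commands of that device are moved from the error handling queue to the done queue. If DEVICE RESET cannot clear all failed commands, TARGET RESET is tried, and if that fails too, BUS RESET, which resets the buses related to the commands on the error handling queue. If BUS RESET still cannot clear every command on the error handling queue, HOST RESET follows and resets the hosts related to those commands. Of course HOST RESET may well fail to clear all failed commands too, in which case the SCSI devices related to the remaining commands can only be considered unusable. Those devices are marked as unusable, and the related failed commands are all added to the error handling done queue. The START STOP UNIT and RESET stages are both driven from scsi_eh_ready_devs(), which scsi_unjam_host() above calls.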
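The escalation through these stages can be sketched as a simple ladder. This is a toy user-space model, not the kernel's implementation: eh_step, eh_escalate and the two demo callbacks are hypothetical names, and a "step" here just reports how many commands are still failing afterwards.

```c
/* Steps in escalation order; EH_OFFLINE means "give up and offline the devices". */
enum eh_step {
    EH_GET_SENSE,
    EH_ABORT,
    EH_START_UNIT,
    EH_DEVICE_RESET,
    EH_TARGET_RESET,
    EH_BUS_RESET,
    EH_HOST_RESET,
    EH_OFFLINE
};

/* Hypothetical callback: run one step, return the number of commands still failing. */
typedef int (*eh_try_fn)(enum eh_step step, int pending);

/* Walk the ladder until no command is pending; return the step that finished the job. */
enum eh_step eh_escalate(int pending, eh_try_fn try_step)
{
    int step;

    for (step = EH_GET_SENSE; step < EH_OFFLINE; step++) {
        pending = try_step((enum eh_step)step, pending);
        if (pending == 0)
            return (enum eh_step)step;
    }
    return EH_OFFLINE;  /* nothing helped: remaining devices get marked unusable */
}

/* Demo policy: ABORT recovers one command, TARGET RESET recovers everything. */
int fake_try_step(enum eh_step step, int pending)
{
    if (step == EH_ABORT && pending > 0)
        return pending - 1;
    if (step == EH_TARGET_RESET)
        return 0;
    return pending;
}

/* Demo policy: no step ever recovers anything. */
int never_recovers(enum eh_step step, int pending)
{
    (void)step;
    return pending;
}
```

With three failed commands, eh_escalate(3, fake_try_step) stops at EH_TARGET_RESET, while eh_escalate(3, never_recovers) falls through to EH_OFFLINE. The early return mirrors how each kernel stage reports whether the work queue has been emptied.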
About those abbreviations:
void blk_rq_timed_out_timer(unsigned long data)
{
    struct request_queue *q = (struct request_queue *) data;
    unsigned long flags, next = 0;
    struct request *rq, *tmp;
    int next_set = 0;

    spin_lock_irqsave(q->queue_lock, flags);

    /* walk the requests already handed to the driver, all of which are
     * linked on timeout_list, and check whether each one has timed out */
    list_for_each_entry_safe(rq, tmp, &q->timeout_list, timeout_list)
        blk_rq_check_expired(rq, &next, &next_set);

    if (next_set)
        mod_timer(&q->timeout, round_jiffies_up(next));

    spin_unlock_irqrestore(q->queue_lock, flags);
}
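One thing to note here: from what I found online, there used to be one timer per request. That meant a great many timers, most of which would never fire since timeouts are fairly rare, yet they had to be created, inserted and deleted all the time. Now there is one timer per request_queue, which scans for expired requests, and that timer stays resident in memory.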
static void blk_rq_check_expired(struct request *rq, unsigned long *next_timeout,
                                 unsigned int *next_set)
{
    if (time_after_eq(jiffies, rq->deadline)) {    /* this request has timed out */
        list_del_init(&rq->timeout_list);    /* take it off the request_queue's timeout_list */

        /*
         * Check if we raced with end io completion
         */
        if (!blk_mark_rq_complete(rq))    /* guard against a concurrent completion */
            blk_rq_timed_out(rq);    /* handle the timed-out request */
    } else if (!*next_set || time_after(*next_timeout, rq->deadline)) {
        *next_timeout = rq->deadline;
        *next_set = 1;
    }
}
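The scan above relies on time_after_eq() and time_after() staying correct when jiffies wraps around. A small user-space sketch of the same idea, using a 32-bit tick counter: tick_after, tick_after_eq, scan_deadlines and struct scan_result are names invented for this note, and the signed-difference trick is the one the kernel macros are based on.

```c
#include <stdint.h>

/* Wraparound-safe "a is later than b" for a free-running 32-bit tick counter. */
int tick_after(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

/* Wraparound-safe "a is later than or equal to b". */
int tick_after_eq(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) >= 0;
}

struct scan_result {
    int expired;     /* how many deadlines have already passed */
    int next_set;    /* 1 if 'next' holds a valid value */
    uint32_t next;   /* earliest deadline that has not passed yet */
};

/* One pass over all deadlines, the way a single per-queue timer would do it. */
struct scan_result scan_deadlines(const uint32_t *deadline, int n, uint32_t now)
{
    struct scan_result r = { 0, 0, 0 };
    int i;

    for (i = 0; i < n; i++) {
        if (tick_after_eq(now, deadline[i])) {
            r.expired++;
        } else if (!r.next_set || tick_after(r.next, deadline[i])) {
            r.next = deadline[i];   /* remember the earliest pending deadline */
            r.next_set = 1;
        }
    }
    return r;
}
```

The value left in next is what the single timer would be re-armed with, which is the point of scanning once per queue instead of keeping one timer per request.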
static void blk_rq_timed_out(struct request *req)
{
    struct request_queue *q = req->q;
    enum blk_eh_timer_return ret;

    ret = q->rq_timed_out_fn(req);    /* in our case this calls scsi_times_out */
    switch (ret) {
    case BLK_EH_HANDLED:
        /* Can we use req->errors here? */
        __blk_complete_request(req);
        break;
    case BLK_EH_RESET_TIMER:
        blk_add_timer(req);
        blk_clear_rq_complete(req);
        break;
    case BLK_EH_NOT_HANDLED:
        /*
         * LLD handles this for now but in the future
         * we can send a request msg to abort the command
         * and we can move more of the generic scsi eh code to
         * the blk layer.
         */
        break;
    default:
        printk(KERN_ERR "block: bad eh return: %d\n", ret);
        break;
    }
}
If all scmds either complete or fail, the number of in-flight scmds
becomes equal to the number of failed scmds - i.e. shost->host_busy ==
shost->host_failed. This wakes up SCSI EH thread. So, once woken up,
SCSI EH thread can expect that all in-flight commands have failed and
are linked on shost->eh_cmd_q.
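The wake-up condition quoted above boils down to a comparison of two counters. A deliberately simplified sketch (host_counters and eh_should_wake are hypothetical names; the real thread also looks at host_eh_scheduled, which is left out here):

```c
/* Just the two counters from struct Scsi_Host that matter for this check. */
struct host_counters {
    int host_busy;    /* commands currently in flight on the host */
    int host_failed;  /* commands that failed and are waiting for EH */
};

/* The EH thread may run only once every in-flight command has failed. */
int eh_should_wake(const struct host_counters *h)
{
    return h->host_failed > 0 && h->host_busy == h->host_failed;
}
```

For the host dumped earlier (host_busy = 8, host_failed = 0) this returns 0, which matches the scsi_eh_5 thread sitting idle.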
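A LUN is defined by the mid layer's scsi_device structure, and a node (target) by the mid layer's scsi_target structure; a channel has no structure of its own. If the LUN is a hard disk there is an additional scsi_disk abstraction; for an optical drive there is a similar scsi_cd structure.
A system may also contain several SCSI controller chips, i.e. several SCSI hosts, for example the common case of a server attaching storage through a JBOD. Locating each LUN device therefore needs an addressing scheme. From the topology it is easy to see that the address is host_id:channel_id:node_id:lun_id. How these IDs are generated is not discussed here, but with these four numbers a single LUN device can be located.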
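As a small illustration of this addressing, here is a sketch that parses an address such as 5:0:36:0 (the form shown in brackets in the listing at the top of these notes) into its four parts. struct scsi_addr and parse_hctl are names made up for this example.

```c
#include <stdio.h>

/* host:channel:target(node):lun, as in the bracketed addresses of the listing. */
struct scsi_addr {
    unsigned int host;
    unsigned int channel;
    unsigned int target;
    unsigned int lun;
};

/* Parse "H:C:T:L". Returns 0 on success, -1 on malformed input. */
int parse_hctl(const char *s, struct scsi_addr *a)
{
    char extra;

    /* exactly four numbers must convert; a fifth conversion means trailing junk */
    if (sscanf(s, "%u:%u:%u:%u%c",
               &a->host, &a->channel, &a->target, &a->lun, &extra) != 4)
        return -1;
    return 0;
}
```

Parsing "5:0:36:0" gives host 5, channel 0, target 36, lun 0, i.e. the last SEAGATE disk behind host5 in the listing.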
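For requests that have been added to the error handling done queue: if the device state is correct and the command's retry count is below the allowed count, the command is put back on the block request queue and processed again; otherwise the second callback is performed directly, completing the SCSI subsystem's handling of the block access request. With that, the SCSI subsystem has gone through the whole error handling process for a SCSI command.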
static void scsi_softirq_done(struct request *rq)
{
    struct scsi_cmnd *cmd = rq->special;
    unsigned long wait_for = (cmd->allowed + 1) * rq->timeout;
    int disposition;

    INIT_LIST_HEAD(&cmd->eh_entry);

    atomic_inc(&cmd->device->iodone_cnt);
    if (cmd->result)
        atomic_inc(&cmd->device->ioerr_cnt);

    disposition = scsi_decide_disposition(cmd);
    if (disposition != SUCCESS &&
        time_before(cmd->jiffies_at_alloc + wait_for, jiffies)) {
        sdev_printk(KERN_ERR, cmd->device,
                    "timing out command, waited %lus\n",
                    wait_for/HZ);
        disposition = SUCCESS;
    }

    scsi_log_completion(cmd, disposition);

    switch (disposition) {
    case SUCCESS:
        scsi_finish_command(cmd);
        break;
    case NEEDS_RETRY:
        scsi_queue_insert(cmd, SCSI_MLQUEUE_EH_RETRY);
        break;
    case ADD_TO_MLQUEUE:
        scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
        break;
    default:
        if (!scsi_eh_scmd_add(cmd, 0))
            scsi_finish_command(cmd);
    }
}
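Two limits decide whether a command gets another try: the retry count against allowed, and the overall time budget wait_for = (allowed + 1) * timeout computed in scsi_softirq_done(). The following user-space sketch combines the two in one place purely for illustration; in the kernel the checks live in different functions, and cmd_state, total_budget and retry_or_finish are hypothetical names.

```c
enum action { ACT_FINISH, ACT_REQUEUE };

struct cmd_state {
    int retries;                  /* retries already used */
    int allowed;                  /* retries permitted */
    unsigned long alloc_tick;     /* tick at which the command was allocated */
    unsigned long timeout_ticks;  /* per-attempt timeout */
};

/* Same formula as wait_for in scsi_softirq_done(): (allowed + 1) * timeout. */
unsigned long total_budget(const struct cmd_state *c)
{
    return (unsigned long)(c->allowed + 1) * c->timeout_ticks;
}

/* Requeue only while both the retry count and the time budget allow it. */
enum action retry_or_finish(const struct cmd_state *c, unsigned long now)
{
    /* wraparound-safe "now is past alloc_tick + budget" */
    if ((long)(now - (c->alloc_tick + total_budget(c))) > 0)
        return ACT_FINISH;
    if (c->retries < c->allowed)
        return ACT_REQUEUE;
    return ACT_FINISH;
}
```

With allowed = 5, as in the scsi_cmnd dump earlier, and an assumed per-attempt timeout of 30 ticks, the budget is 180 ticks from allocation.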
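sd stands for disk (think of it as short for scsi_disk; the module is sd_mod), sr for optical drives, st for tape and sg for generic. When a filesystem reads or writes files on a disk it goes through sd, whereas the sg kernel driver lets us bypass the filesystem and issue SCSI commands directly from user space. For example, in one crash dump most commands were REQ_TYPE_FS, but one came from dfs accessing the disk directly via ioctl, and its command type was REQ_TYPE_BLOCK_PC. A LUN may correspond to an sd or an sr, or to a cascaded phy port. The SCSI layer in Linux appears to deal only with SCSI commands and does not implement the full SCSI standard; you can think of it as a protocol-conforming control layer for building commands, executing them and returning their results.
sd, sr and the like each need to instantiate a scsi_driver object: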
struct scsi_driver {
    struct module *owner;
    struct device_driver gendrv;

    void (*rescan)(struct device *);
    int (*init_command)(struct scsi_cmnd *);
    void (*uninit_command)(struct scsi_cmnd *);
    int (*done)(struct scsi_cmnd *);
    int (*eh_action)(struct scsi_cmnd *, int);
    int (*scsi_mq_reserved1)(struct scsi_cmnd *);
    void (*scsi_mq_reserved2)(struct scsi_cmnd *);
    void (*rh_reserved)(void);
};
For sd, for example, the instance is:
static struct scsi_driver sd_template = {
    .owner = THIS_MODULE,
    .gendrv = {
        .name = "sd",
        .probe = sd_probe,
        .remove = sd_remove,
        .shutdown = sd_shutdown,
        .pm = &sd_pm_ops,
    },
    .rescan = sd_rescan,
    .init_command = sd_init_command,
    .uninit_command = sd_uninit_command,
    .done = sd_done,    /* softirq callback */
    .eh_action = sd_eh_action,
};
On normal completion:
0xffffffffc0273860 : sd_done+0x0/0x350 [sd_mod]
0xffffffff8147527d : scsi_finish_command+0xcd/0x140 [kernel]
0xffffffff8147f7b2 : scsi_softirq_done+0x142/0x190 [kernel]---this is req->q->softirq_done_fn
0xffffffff8130ec66 : blk_done_softirq+0x96/0xc0 [kernel]------the softirq that handles I/O completion
0xffffffff810960ed : __do_softirq+0xfd/0x290 [kernel]
0xffffffff816cf45c : call_softirq+0x1c/0x30 [kernel]
0xffffffff8102d465 : do_softirq+0x65/0xa0 [kernel]
0xffffffff81096535 : irq_exit+0x175/0x180 [kernel]
0xffffffff810522b9 : smp_call_function_single_interrupt+0x39/0x40 [kernel]
0xffffffff816ceb77 : call_function_single_interrupt+0x87/0x90 [kernel]
scsi_finish_command is a key function; for example, actions such as clearing the timer of the upper-layer request are invoked from within it.
0xffffffff81307d10 : blk_finish_request+0x0/0x100 [kernel]
0xffffffff814800f6 : scsi_end_request+0x116/0x1e0 [kernel]
0xffffffff81480388 : scsi_io_completion+0x168/0x6a0 [kernel]
0xffffffff8147528c : scsi_finish_command+0xdc/0x140 [kernel]
0xffffffff8147f7b2 : scsi_softirq_done+0x142/0x190 [kernel]
0xffffffff8130ec66 : blk_done_softirq+0x96/0xc0 [kernel]
0xffffffff810960ed : __do_softirq+0xfd/0x290 [kernel]
0xffffffff816cf45c : call_softirq+0x1c/0x30 [kernel]
0xffffffff8102d465 : do_softirq+0x65/0xa0 [kernel]
0xffffffff81096535 : irq_exit+0x175/0x180 [kernel]
0xffffffff810522b9 : smp_call_function_single_interrupt+0x39/0x40 [kernel]
0xffffffff816ceb77 : call_function_single_interrupt+0x87/0x90 [kernel]
The hard-interrupt callback:
0xffffffff8147ec70 : scsi_done+0x0/0x60 [kernel]
0xffffffffc0166fd7 : _scsih_io_done+0x117/0x11a0 [mpt3sas]
0xffffffffc0156ad7 : _base_interrupt+0x247/0xc80 [mpt3sas]
0xffffffff81138c74 : __handle_irq_event_percpu+0x44/0x1c0 [kernel]
0xffffffff81138e22 : handle_irq_event_percpu+0x32/0x80 [kernel]
0xffffffff81138eac : handle_irq_event+0x3c/0x60 [kernel]
0xffffffff8113bbaf : handle_edge_irq+0x7f/0x150 [kernel]
0xffffffff8102d321 : handle_irq+0xe1/0x1c0 [kernel]
0xffffffff816d058d : __irqentry_text_start+0x4d/0xf0 [kernel]
0xffffffff816c4287 : ret_from_intr+0x0/0x15 [kernel]
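For SCSI commands used in error recovery, e.g. in scsi_send_eh_cmnd(), scmd->scsi_done is set to scsi_eh_done, whereas for normally issued commands it is usually scsi_done.
The upper layer receives a data access request from the generic block layer and turns it into a SCSI command, which is defined as the scsi_cmnd structure. The queuecommand interface defined in the scsi_host_template structure (declared by the mid layer, implemented by the low-level driver) is then called to hand the command down for processing. When the command finishes, this layer's callback is invoked as a softirq to do the follow-up work for the command and to notify the generic block layer of its result.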
root 1007 2 0 Feb26 ? 00:00:00 [scsi_eh_0]
root 1019 2 0 Feb26 ? 00:00:00 [scsi_eh_1]
root 1030 2 0 Feb26 ? 00:00:00 [scsi_eh_2]
root 1036 2 0 Feb26 ? 00:00:00 [scsi_eh_3]
root 1046 2 0 Feb26 ? 00:00:00 [scsi_eh_4]
root 1054 2 0 Feb26 ? 00:00:00 [scsi_eh_5]
response
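The response to a CDB command is called sense. This response is not produced automatically, however; the initiator has to fetch it explicitly with the REQUEST SENSE command. So from the requester's point of view, completion of a command has two stages: the command was sent successfully, and the disk device executed it successfully. The status at the end of the function call only reflects the result of sending the command from this machine, not how the disk actually executed it. To learn the execution result, the sense data has to be fetched explicitly.
The current Linux SCSI implementation is built on these two callback stages: one handles the local processing result, and the other sends REQUEST SENSE to query the device's execution result before processing continues.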
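Once sense data has been fetched, the interesting part is usually the sense key plus the ASC/ASCQ pair. Here is a minimal sketch of pulling those out of fixed-format sense data; parse_fixed_sense and struct sense_info are names invented for this note, the offsets (sense key in the low nibble of byte 2, ASC in byte 12, ASCQ in byte 13) come from the SPC fixed-format layout, and descriptor-format sense (response codes 0x72/0x73) is deliberately not handled.

```c
#include <stddef.h>
#include <stdint.h>

struct sense_info {
    uint8_t key;   /* sense key, e.g. 0x3 = MEDIUM ERROR */
    uint8_t asc;   /* additional sense code */
    uint8_t ascq;  /* additional sense code qualifier */
};

/* Returns 0 on success, -1 if buf is not usable fixed-format sense data. */
int parse_fixed_sense(const uint8_t *buf, size_t len, struct sense_info *out)
{
    uint8_t code;

    if (len < 14)                      /* need bytes up to ASCQ at offset 13 */
        return -1;

    code = buf[0] & 0x7f;              /* response code, valid bit masked off */
    if (code != 0x70 && code != 0x71)  /* fixed format: current or deferred */
        return -1;

    out->key = buf[2] & 0x0f;
    out->asc = buf[12];
    out->ascq = buf[13];
    return 0;
}
```

For example, key 0x3 with ASC/ASCQ 0x11/0x00 is what a disk reports for an unrecovered read error.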
SCSI Enclosure Services (SES)
References:
Documentation/scsi/scsi_eh.txt
彪哥's blog: http://blog.chinaunix.net/uid-14528823-id-4924157.html