blktrace is a tracing tool for the block layer. The block layer's position in the I/O path:
The life cycle of an I/O is roughly:
● I/O enters block layer – it can be:
– Remapped onto another device (MD, DM)
– Split into 2 separate I/Os (alignment, size, ...)
– Added to the request queue
– Merged with a previous entry on the queue
All I/Os end up on a request queue at some point
● At some later time, the I/O is issued to a device driver, and submitted to a device
● Later, the I/O is completed by the device, and its driver
The trace points shown by blkparse relate to each other as follows:
Q------->G------------>I--------->M------------------->D----------------------------->C
|-Q time-|-Insert time-|
|--------- merge time ------------|-merge with other IO|
|--------------------scheduler time--------------------|--driver,adapter,storage time--|
|----------------------- await time in iostat output ----------------------------------|
Where:
Q2Q: time between requests sent to the block layer
Q2G: time from when a block I/O is queued to when it gets a request allocated for it
G2I: time from when a request is allocated to when it is inserted into the device's queue
Q2M: time from when a block I/O is queued to when it gets merged with an existing request
I2D: time from when a request is inserted into the device's queue to when it is actually issued to the device
M2D: time from when a block I/O is merged with an existing request until the request is issued to the device
D2C: service time of the request by the device
Q2C: total time spent in the block layer for a request
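The definitions above amount to subtracting the timestamps of two trace points. A minimal sketch, assuming one I/O that goes through Q, G, I, D and C without being merged; the timestamps are made up for illustration and do not come from a real trace:

```python
# Hypothetical timestamps (seconds) for one I/O at each trace point.
# These values are invented for illustration only.
events = {"Q": 0.000000, "G": 0.000002, "I": 0.000005, "D": 0.000020, "C": 0.000220}


def interval(events, start, end):
    """Return the time between two trace points, or None if either is missing."""
    if start not in events or end not in events:
        return None
    return events[end] - events[start]


def breakdown(events):
    """Compute the X2Y intervals for a single, unmerged I/O."""
    names = ["Q2G", "G2I", "I2D", "D2C", "Q2C"]
    # name[0] is the start trace point, name[2] the end trace point.
    return {name: interval(events, name[0], name[2]) for name in names}


if __name__ == "__main__":
    for name, value in breakdown(events).items():
        print(f"{name}: {value * 1e6:.1f} us")
```

For this unmerged path, Q2G + G2I + I2D + D2C adds up to Q2C; a merged I/O would use Q2M and M2D in place of the G and I steps.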
The following example briefly shows the general way to analyze I/O with the blktrace tool chain:
1. Use blktrace to capture I/O information on the device:
blktrace -w 120 -d /dev/nvme0n1
This creates a set of binary files in the current directory, named device.blktrace.cpu (one per CPU).
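To check which per-CPU files a capture produced, the naming scheme can be matched directly. A small sketch, assuming the files sit in one directory and follow the device.blktrace.cpu pattern described above; the helper name per_cpu_trace_files is hypothetical:

```python
from pathlib import Path


def per_cpu_trace_files(directory, device):
    """Return blktrace output files for `device`, sorted by CPU number.

    Assumes the naming scheme <device>.blktrace.<cpu>.
    """
    prefix = f"{device}.blktrace."
    found = []
    for path in Path(directory).iterdir():
        suffix = path.name[len(prefix):]
        # Keep only names that start with the prefix and end in a CPU number.
        if path.name.startswith(prefix) and suffix.isdigit():
            found.append((int(suffix), path))
    return [path for _, path in sorted(found)]


if __name__ == "__main__":
    for path in per_cpu_trace_files(".", "nvme0n1"):
        print(path.name, path.stat().st_size, "bytes")
```

Sorting by the numeric suffix keeps CPU 10 after CPU 2, which a plain string sort would not.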
2. Use blkparse to read the binary files generated by blktrace:
blkparse -i nvme0n1 -d blkparse.out
This command prints the parsed events to the screen and also writes the binary form of the data to blkparse.out.
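The text that blkparse prints can also be post-processed. The sketch below assumes the default per-event layout (device, CPU, sequence number, timestamp, PID, action, RWBS, then "sector + blocks"); the sample line is illustrative, and lines that do not fit this shape (summaries, events without a sector) are simply skipped:

```python
import re

# Pattern for the assumed default blkparse event layout.
LINE_RE = re.compile(
    r"^\s*(?P<dev>\d+,\d+)\s+(?P<cpu>\d+)\s+(?P<seq>\d+)\s+"
    r"(?P<time>\d+\.\d+)\s+(?P<pid>\d+)\s+(?P<action>[A-Z]+)\s+"
    r"(?P<rwbs>[A-Z]+)\s+(?P<sector>\d+)\s+\+\s+(?P<nblocks>\d+)"
)


def parse_line(line):
    """Parse one event line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "dev": m.group("dev"),
        "cpu": int(m.group("cpu")),
        "seq": int(m.group("seq")),
        "time": float(m.group("time")),
        "pid": int(m.group("pid")),
        "action": m.group("action"),
        "rwbs": m.group("rwbs"),
        "sector": int(m.group("sector")),
        "nblocks": int(m.group("nblocks")),
    }


if __name__ == "__main__":
    # Illustrative sample line, not captured from the device in this article.
    sample = "  8,0    3        1     0.000000000   697  G   W 223490 + 8 [kjournald]"
    print(parse_line(sample))
```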
3. Use btt to view and analyze various I/O statistics
3.1 Use btt to view the overall I/O picture:
btt -i blkparse.out
The X2Y entries in the figure above mean:
Q2I – time it takes to process an I/O prior to it being inserted or merged onto a request queue – Includes split, and remap time
I2D – time the I/O is “idle” on the request queue
D2C – time the I/O is “active” in the driver and on the device
Q2I + I2D + D2C = Q2C
Q2C: Total processing time of the I/O
As can be seen, the device service time (D2C) accounts for 91.95% of the total processing time (Q2C).
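The percentage follows directly from Q2I + I2D + D2C = Q2C. A short sketch of that arithmetic; the function name phase_shares is hypothetical, and the inputs in the demo are placeholder averages rather than the values from the trace above:

```python
def phase_shares(q2i, i2d, d2c):
    """Return each phase's share of Q2C in percent, using Q2C = Q2I + I2D + D2C."""
    q2c = q2i + i2d + d2c
    if q2c <= 0:
        raise ValueError("total Q2C time must be positive")
    return {
        "Q2I": 100.0 * q2i / q2c,
        "I2D": 100.0 * i2d / q2c,
        "D2C": 100.0 * d2c / q2c,
    }


if __name__ == "__main__":
    # Placeholder averages in seconds, chosen only to show the calculation.
    for phase, share in phase_shares(0.000002, 0.000010, 0.000150).items():
        print(f"{phase}: {share:.2f}%")
```

A large D2C share points at the driver or device, while a large I2D share points at time spent waiting on the request queue.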
3.3 Use btt to view the latency of each request in detail:
btt -i blkparse.out -q q2c.lat
It generates the following files:
-rw-r--r-- 1 root root 876 Jun 13 18:14 sys_mbps_fp.dat
-rw-r--r-- 1 root root 451 Jun 13 18:14 sys_iops_fp.dat
-rw-r--r-- 1 root root 429815 Jun 13 18:14 q2c.lat_259,6_q2c.dat
-rw-r--r-- 1 root root 876 Jun 13 18:14 259,6_mbps_fp.dat
-rw-r--r-- 1 root root 451 Jun 13 18:14 259,6_iops_fp.dat
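sys_mbps_fp.dat holds the throughput of all devices in this trace, sys_iops_fp.dat holds the IOPS of all devices, and q2c.lat_259,6_q2c.dat holds the Q2C latency of each request:
The first column is the time (in seconds) and the second column is the Q2C processing time of that request.
The -l option can be used in the same way to view the D2C latency.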
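Because the latency file is plain two-column text, it is easy to summarize. A minimal sketch, assuming whitespace-separated columns as described above (time, then latency, both in seconds); the 99th percentile here uses a simple nearest-rank estimate, and the function names are hypothetical:

```python
import statistics


def load_latencies(path):
    """Read (time, latency) pairs from a two-column btt latency file."""
    samples = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank or short lines
            samples.append((float(parts[0]), float(parts[1])))
    return samples


def summarize(samples):
    """Return count, mean, max and an approximate p99 of the latencies."""
    lat = sorted(latency for _, latency in samples)
    if not lat:
        raise ValueError("no latency samples")
    p99_index = min(len(lat) - 1, int(0.99 * len(lat)))
    return {
        "count": len(lat),
        "mean": statistics.mean(lat),
        "max": lat[-1],
        "p99": lat[p99_index],
    }


if __name__ == "__main__":
    import sys
    # Pass the latency file path as the first argument.
    print(summarize(load_latencies(sys.argv[1])))
```

For example, running it with q2c.lat_259,6_q2c.dat as the argument would summarize the Q2C latencies from the capture above.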
3.4 Use btt to view the I/O pattern
btt -i blkparse.out -B offset
It generates three files:
-rw-r--r-- 1 root root 819132 Jun 13 18:21 offset_259,6_w.dat
-rw-r--r-- 1 root root 108 Jun 13 18:21 offset_259,6_r.dat
-rw-r--r-- 1 root root 819240 Jun 13 18:21 offset_259,6_c.dat
prefix_device_r.dat
All read block numbers are output, first column is time (seconds), second is the block number, and the third column is the ending block number.
prefix_device_w.dat
All write block numbers are output, first column is time (seconds), second is the block number, and the third column is the ending block number.
prefix_device_c.dat
All block numbers (read and write) are output, first column is time (seconds), second is the block number, and the third column is the ending block number.
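One way to read an I/O pattern out of these files is to check how often an I/O starts where the previous one ended. The sketch below assumes the three columns described above and, as a guess that should be checked against real output, that the ending block is exclusive (the next sequential I/O starts exactly at it); pass end_inclusive=True if that turns out not to hold. The function names are hypothetical:

```python
def load_offsets(path):
    """Read (time, start_block, end_block) rows from a btt offset file."""
    rows = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 3:
                continue  # skip blank or short lines
            rows.append((float(parts[0]), int(parts[1]), int(parts[2])))
    return rows


def sequential_fraction(rows, end_inclusive=False):
    """Fraction of I/Os that start right after the previous I/O ended.

    rows must be ordered by time. Returns 0.0 when there are fewer than two rows.
    """
    if len(rows) < 2:
        return 0.0
    step = 1 if end_inclusive else 0
    hits = 0
    for prev, cur in zip(rows, rows[1:]):
        if cur[1] == prev[2] + step:
            hits += 1
    return hits / (len(rows) - 1)


if __name__ == "__main__":
    import sys
    # Pass an offset file path (for example the write file) as the first argument.
    print(f"sequential fraction: {sequential_fraction(load_offsets(sys.argv[1])):.2%}")
```

A value near 1 suggests a mostly sequential workload, and a value near 0 a mostly random one.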
4. Advanced features
The -f option of blkparse extracts specific fields from the trace data.
For example:
blkparse -i nvme0n1.blktrace.* -f "%5T.%9t, %p, %C, %a, %d, %N\n" -a complete -o output.txt
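This writes the process ID (%p), process name (%C), action (%a), RWBS field (%d, the operation type) and number of bytes (%N) to output.txt. Note that, according to the blkparse man page, the starting sector (LBA) is %S and the number of blocks is %n, so add those to the format string if the LBA and block count are needed.
For the other format specifiers, see man blkparse.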
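Output written with the format string above is comma-separated and easy to load. A minimal sketch, assuming the six-field format from the example and naming the fields after the blkparse man page's description of the specifiers (%d as the RWBS field, %N as the byte count); the sample line is invented:

```python
def parse_custom(line):
    """Parse one line written with -f "%5T.%9t, %p, %C, %a, %d, %N\\n".

    Returns a dict, or None if the line does not have the expected shape.
    The first two fields are taken from the left and the last three from the
    right, so a process name that itself contains a comma still parses.
    """
    head = line.strip().split(",", 2)
    if len(head) != 3:
        return None
    tail = head[2].rsplit(",", 3)
    if len(tail) != 4:
        return None
    comm, action, rwbs, nbytes = (field.strip() for field in tail)
    try:
        return {
            "time": float(head[0]),
            "pid": int(head[1]),
            "comm": comm,
            "action": action,
            "rwbs": rwbs,
            "bytes": int(nbytes),
        }
    except ValueError:
        return None


if __name__ == "__main__":
    # Invented sample line in the shape produced by the format string.
    print(parse_custom("    0.000123456, 1234, fio, C, WS, 4096"))
```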
For more usage, see man blktrace and man blkparse.
It is worth noting that blktrace has very little impact on application performance. In the authors' words: "Seeing less than 2% hits to application performance in relatively stressful I/O situations."
