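I. Problem description
A disk on one of our batch-processing servers could no longer be read or written and reported "input/output error". The server is needed every day. After asking around, I learned that the server had first suffered a disk failure, and that this error appeared after the disk was replaced (the RAID had already resynchronized normally).
II. Troubleshooting
When something goes wrong, check the logs first. The kernel log showed the following: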
[12922471.544897] smartpqi 0000:5e:00.0: reset of scsi 14:1:0:3: SUCCESS
[12922471.545034] sd 14:1:0:3: [sdd] Medium access timeout failure. Offlining disk!
...
[12922471.546144] blk_update_request: I/O error, dev sdd, sector 2351217920
[12922471.546473] sd 14:1:0:3: rejecting I/O to offline device
[12922471.547836] XFS (sdd1): metadata I/O error: block 0x8bbac400 ("xlog_iodone") error 5 numblks 512
[12922471.547840] XFS (sdd1): xfs_do_force_shutdown(0x2) called from line 1200 of file fs/xfs/xfs_log.c. Return address = 0xffffffffc07a1ea0
[12922471.547866] XFS (sdd1): Log I/O Error Detected. Shutting down filesystem
[12922471.547868] XFS (sdd1): Please umount the filesystem and rectify the problem(s)
[12922471.547870] XFS (sdd1): metadata I/O error: block 0x8bbac600 ("xlog_iodone") error 5 numblks 512
[12922471.547872] XFS (sdd1): xfs_do_force_shutdown(0x2) called from line 1200 of file fs/xfs/xfs_log.c. Return address = 0xffffffffc07a1ea0
[12922471.547891] XFS (sdd1): metadata I/O error: block 0x2bc1a6c0 ("xfs_trans_read_buf_map") error 5 numblks 32
[12922471.547898] XFS (sdd1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
[12922471.548349] XFS (sdd1): metadata I/O error: block 0xc65b63f8 ("xfs_trans_read_buf_map") error 5 numblks 8
[12922471.548390] XFS (sdd1): metadata I/O error: block 0x8bdb5820 ("xfs_trans_read_buf_map") error 5 numblks 32
[12922471.548408] XFS (sdd1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
[12922471.548412] XFS (sdd1): metadata I/O error: block 0x11771540 ("xfs_trans_read_buf_map") error 5 numblks 32
[12922471.548417] XFS (sdd1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
...
[15351852.339037] sd 14:1:0:3: rejecting I/O to offline device
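- The logs show that the kernel took sdd offline after a medium access timeout, and that the XFS filesystem on sdd1 then shut itself down after log I/O errors (error 5, i.e. EIO).
III. Solution
- 1. Manually set the disk back to online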
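To make the same check repeatable, here is a minimal sketch that pulls the affected device names out of a saved kernel log. The function names `offlined_disks` and `xfs_shutdown_filesystems` are hypothetical helpers, and the sketch assumes the log was saved to a file first (for example with `dmesg > kernel.log`) and uses the same message wording as the excerpt above.

```shell
# Hypothetical helpers; they only read a saved log file and change nothing.

# Print each disk that the kernel reported as offlined, e.g. "sdd".
# $1: path to a saved kernel log
offlined_disks() {
    grep 'Offlining disk' "$1" \
        | sed -n 's/.*\[\(sd[a-z]*\)\].*/\1/p' \
        | sort -u
}

# Print each XFS filesystem that reported a forced shutdown, e.g. "sdd1".
# $1: path to a saved kernel log
xfs_shutdown_filesystems() {
    grep 'Shutting down filesystem' "$1" \
        | sed -n 's/.*XFS (\([^)]*\)).*/\1/p' \
        | sort -u
}
```

Run against the log shown above, `offlined_disks kernel.log` would report the disk to bring back online and `xfs_shutdown_filesystems kernel.log` the partition that needs repair.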
# echo running > /sys/block/sdd/device/state
- 2. Check that the state is now running
cat /sys/block/sdd/device/state
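- 3. Before repairing the filesystem, make sure the disk is unmounted (this depends on the situation; if it cannot be unmounted, the only option is a reboot, which is what I did)
- 4. Start the repair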
XFS : Corruption detected. Unmount and run xfs_repair
Official documentation: https://access.redhat.com/solutions/1194613
- 5. Once the repair has completed using the method above, mount the filesystem again
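The steps above can be sketched as a small set of shell functions. This is only an illustration under stated assumptions, not the exact commands from the Red Hat article: the function names, the `SYSFS_ROOT` variable, and the mount point `/data` are all hypothetical, and `sdd` / `/dev/sdd1` are taken from the log. Note that `xfs_repair` refuses to run on a filesystem with a dirty log and asks you to mount it first to replay the log; its `-L` option zeroes the log and can lose data, so it is deliberately left out here.

```shell
# Sketch only: defines functions, runs nothing until you call them as root.
# SYSFS_ROOT is a hypothetical override so the helpers can be tried on a fake tree.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print the SCSI state of a disk (e.g. "running" or "offline"). $1: disk name
device_state() {
    cat "$SYSFS_ROOT/block/$1/device/state"
}

# Steps 1 and 2: set the disk to running, then confirm it. $1: disk name
set_running() {
    echo running > "$SYSFS_ROOT/block/$1/device/state" || return 1
    [ "$(device_state "$1")" = "running" ]
}

# Steps 3 to 5: unmount if needed, repair, mount again.
# $1: disk name, $2: partition device, $3: mount point (assumed)
repair_xfs() {
    set_running "$1" || return 1
    # Unmount only if the partition is currently mounted.
    if grep -q "^$2 " /proc/mounts; then
        umount "$2" || return 1   # if this fails, a reboot may be required
    fi
    xfs_repair "$2" || return 1
    mount "$2" "$3"
}

# Example call (assumed mount point):
#   repair_xfs sdd /dev/sdd1 /data
```

Running `xfs_repair -n /dev/sdd1` first gives a read-only check that reports problems without modifying the filesystem, which can be a safer way to see what the repair would do.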