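1 What triggered the incident
A white-box monitoring alert came in: on the production machine ip:x.x.x.x, the filesystem at /search/odin was not mounted.
Time to log in to the machine and investigate.
2 Check the situation with df -h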
[@djt_22_168 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 40G 5.4G 32G 15% /
devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs 3.9G 8.6M 3.9G 1% /run
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
tmpfs 783M 0 783M 0% /run/user/0
Sure enough, /search/odin is missing from the list.
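For a repeatable check that does not depend on reading the df output by eye, something like the following could be used. It is only a minimal sketch: the function name is_mounted and its optional second argument (a mounts file, added purely so the function can be tried against a sample file) are assumptions made for illustration.

```shell
# is_mounted MOUNT_POINT [MOUNTS_FILE]
# Exits 0 if MOUNT_POINT is listed as a mount target, non-zero otherwise.
# MOUNTS_FILE defaults to /proc/mounts (hypothetical extra parameter for testing).
is_mounted() {
    target=${1%/}                 # drop one trailing slash: /search/odin/ -> /search/odin
    [ -n "$target" ] || target=/  # "/" itself would otherwise become empty
    mounts=${2:-/proc/mounts}
    # Field 2 of every /proc/mounts line is the mount point.
    awk -v t="$target" '$2 == t { found = 1 } END { exit !found }' "$mounts"
}

if is_mounted /search/odin; then
    echo "/search/odin is mounted"
else
    echo "/search/odin is NOT mounted"
fi
```

On systems with util-linux, mountpoint -q /search/odin answers the same question in one command.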
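3 Check the logs with journalctl
- journalctl queries the logs collected by the systemd-journald service, which is the system log collector provided by the systemd init system.
- Run journalctl -xb. The errors it showed were I/O related, so the first suspicion was a disk problem. Inside the pager, type /mount to search, then press n to step through the matches and look for errors.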
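The interactive search above can also be done non-interactively. The sketch below is one possible way, not the exact procedure used in this incident: filter_disk_errors is a hypothetical helper, and its pattern list (I/O error, XFS, the device name) is an assumption rather than a complete set of kernel messages.

```shell
# filter_disk_errors [DEVICE]
# Reads log lines on stdin and keeps those that mention an I/O error, XFS,
# or the given device name (defaults to vdb, the disk in this incident).
filter_disk_errors() {
    dev=${1:-vdb}
    grep -iE "i/o error|xfs|$dev" || true   # "no match" is not treated as a failure
}

# Kernel messages (-k) from the current boot (-b), without the pager.
if command -v journalctl >/dev/null 2>&1; then
    journalctl -k -b --no-pager | filter_disk_errors vdb
fi
```

Adding -p err to the journalctl call narrows the output to messages of priority "err" and worse, which may be enough on a noisy machine.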

4 Try to mount
[@djt_22_168 ~]# mount /dev/vdb /search/odin/
mount: mount /dev/vdb on /search/odin failed: Structure needs cleaning
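This is an XFS filesystem. "Structure needs cleaning" means the kernel found corrupted metadata, so the filesystem has to be repaired before it can be mounted.
5 Repair the disk
- For an ext4 filesystem, repair with fsck.ext4 /dev/xxx;
- For an XFS filesystem, repair with xfs_repair /dev/xxx. Add -L only when the log cannot be replayed by mounting, because -L discards the log and can lose recent metadata changes.
- In most cases the filesystem mounts again after the repair. If the disk itself is faulty or the RAID array has a problem, the repair may fail and the mount will still fail; in that case there is little choice but to reformat the disk.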
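The choice between the two repair tools can be written down as a small helper. This is a sketch under assumptions: repair_command is a hypothetical function that only prints the suggested command and never runs it, so it is safe to try anywhere.

```shell
# repair_command FSTYPE DEVICE
# Prints (does not run) the repair command suggested for the filesystem type.
repair_command() {
    case "$1" in
        ext2|ext3|ext4) echo "fsck.$1 $2" ;;
        xfs)            echo "xfs_repair $2" ;;   # -L is deliberately left out: last resort only
        *)              echo "unsupported filesystem type: $1" >&2; return 1 ;;
    esac
}

# Possible usage (as root): detect the type with blkid, then print the command.
# dev=/dev/xxx
# repair_command "$(blkid -o value -s TYPE "$dev")" "$dev"
```

Printing instead of executing keeps the decision reviewable; the operator still runs the repair by hand against an unmounted device.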
[@djt_22_168 ~]# xfs_repair /dev/vdb
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair. If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
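The repair stops with an error and suggests the -L option. The mount in step 4 had already failed, so the log could not be replayed, which leaves -L as the remaining choice: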
[@djt_22_168 ~]# xfs_repair -L /dev/vdb
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
- scan filesystem freespace and inode maps...
agi unlinked bucket 11 is 7499 in ag 3 (inode=805313867)
sb_icount 7296, counted 13184
sb_ifree 111, counted 644
sb_fdblocks 78500862, counted 59430965
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 3
- agno = 2
Phase 5 - rebuild AG headers and trees...
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
disconnected inode 805313867, moving to lost+found
Phase 7 - verify and correct link counts...
Maximum metadata LSN (815:101693) is ahead of log (1:2).
Format log to cycle 818.
done
The repair succeeded. Note that one disconnected inode (805313867) was moved to lost+found.
6 Mount again
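The order recommended by the xfs_repair message (mount to replay the log, unmount, repair, and only then fall back to -L) could be captured roughly as follows. This is a hedged sketch, not the procedure actually scripted in this incident: xfs_recover and the ALLOW_ZERO_LOG switch are hypothetical names, and the function must run as root against a device that is not in use.

```shell
# xfs_recover DEVICE MOUNT_POINT
# 1. Try to mount, so the kernel replays the log; then unmount and run a normal repair.
# 2. If the mount fails, run xfs_repair -L only when ALLOW_ZERO_LOG=1 is set,
#    because -L destroys the log and recent metadata changes may be lost.
xfs_recover() {
    dev=$1
    mnt=$2
    if mount "$dev" "$mnt"; then
        umount "$mnt" && xfs_repair "$dev"
    elif [ "${ALLOW_ZERO_LOG:-0}" = "1" ]; then
        echo "mount failed; zeroing the log with xfs_repair -L" >&2
        xfs_repair -L "$dev"
    else
        echo "mount failed; set ALLOW_ZERO_LOG=1 to permit xfs_repair -L" >&2
        return 1
    fi
}
```

For this incident the call would be ALLOW_ZERO_LOG=1 xfs_recover /dev/vdb /search/odin. Requiring the explicit switch is a design choice of the sketch, so that the destructive step is never taken by accident.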
[@djt_22_168 ~]# mount /dev/vdb /search/odin
[@djt_22_168 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 40G 5.4G 32G 15% /
devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs 3.9G 8.5M 3.9G 1% /run
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
tmpfs 783M 0 783M 0% /run/user/0
/dev/vdb 300G 74G 227G 25% /search/odin
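The mount succeeded.
7 Restore the service
My production stack is php + nginx, and the access layer in front of it is load balanced. Now that the filesystem is repaired, php-fpm and nginx need to be restarted, and after that everything is OK.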
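On a systemd machine the restart could look roughly like the sketch below. The unit names php-fpm and nginx are assumptions (they vary by distribution and packaging), and restart_services is a hypothetical helper.

```shell
# restart_services UNIT...
# Restarts each unit in order and stops at the first one that fails to
# restart or is not active afterwards.
restart_services() {
    for unit in "$@"; do
        systemctl restart "$unit" || return 1
        systemctl is-active --quiet "$unit" || {
            echo "$unit did not come back up" >&2
            return 1
        }
    done
}

# Possible usage (as root); php-fpm first, so nginx has a backend to talk to:
# restart_services php-fpm nginx
```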
