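On Linux systems, frequent read/write activity can lead to faults such as a filesystem being remounted read-only or the disk becoming corrupted. The approaches below address this:
1. How to determine whether a disk already has a fault
Method 1 (save the following as Repair.sh and run it):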
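#!/bin/sh
# Check the largest filesystem for recorded errors and repair it if needed.
cd /root || exit 1
# Device of the largest filesystem: sort df output by size (KB) and take the last row.
DiskFlag=$(df -kP | awk '{print $1"\t"$2}' | sort -k 2 -n | awk 'END{print $1}')
# Count occurrences of "clean with errors" in the superblock's filesystem state.
num=$(tune2fs -l "$DiskFlag" | grep -c "clean with errors")
echo "$num"
if [ "$num" -lt 1 ]; then
    date >> RepairDisk.log
    echo "System Is OK ! " >> RepairDisk.log
    echo >> RepairDisk.log
    exit 0
else
    # printf is used because echo -e is not portable under /bin/sh.
    printf '\033[0;31;1m Repairing Operating System!\033[0m\n'
    date >> RepairDisk.log
    echo "Start Repairing Disk ! " >> RepairDisk.log
    # Repair the same device that was checked; it should not be mounted read-write.
    fsck.ext3 -y "$DiskFlag" >> RepairDisk.log
    echo "Repairing Disk End! " >> RepairDisk.log
    date >> RepairDisk.log
fi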
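==== The script above finds the largest disk partition and repairs it automatically when a fault is detected. This kind of repair is especially effective for logical-level (filesystem) corruption.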
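The check in the script relies on the "Filesystem state" line printed by tune2fs -l. As a minimal sketch, that state can be extracted on its own. Here fs_state is a hypothetical helper name that reads tune2fs output on standard input, and /dev/sda6 is simply the partition used elsewhere in this article.

```shell
#!/bin/sh
# fs_state (hypothetical helper): print the value of the "Filesystem state:"
# line from `tune2fs -l` output supplied on standard input.
fs_state() {
    sed -n 's/^Filesystem state:[[:space:]]*//p'
}

# Example (needs root):
#   tune2fs -l /dev/sda6 | fs_state
```

A value such as "clean with errors" means the kernel recorded filesystem errors, which is the condition the script greps for.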
Method 2 (check the mount information to determine whether a disk is read-only; when it is, the options field of its entry contains ro):
cat /proc/mounts
rootfs / rootfs rw 0 0
/dev/root / ext3 rw,data=ordered 0 0
/dev /dev tmpfs rw 0 0
/proc /proc proc rw 0 0
/sys /sys sysfs rw 0 0
/proc/bus/usb /proc/bus/usb usbfs rw 0 0
devpts /dev/pts devpts rw 0 0
/dev/sda2 /b ext3 rw,data=ordered 0 0
/dev/sda1 /boot ext3 rw,data=ordered 0 0
tmpfs /dev/shm tmpfs rw 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0
/etc/auto.misc /misc autofs rw,fd=7,pgrp=2664,timeout=300,minproto=5,maxproto=5,indirect 0 0
-hosts /net autofs rw,fd=13,pgrp=2664,timeout=300,minproto=5,maxproto=5,indirect 0 0
/dev/sda6 /usr/share/TSMIS ext3 rw,data=ordered 0 0
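Reading the whole table by eye is error-prone, so the check can be narrowed to entries whose options field contains ro. The following is a rough sketch, not a definitive tool. list_ro_mounts is a hypothetical helper name, and the optional file argument exists only so the function can be tried against a saved copy of the table.

```shell
#!/bin/sh
# list_ro_mounts (hypothetical helper): print "device mountpoint" for every
# entry whose option list (4th column) contains the exact option "ro".
# The optional argument is a mounts table; it defaults to /proc/mounts.
list_ro_mounts() {
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $1, $2; break }
    }' "${1:-/proc/mounts}"
}
```

Calling list_ro_mounts with no argument inspects the running system. Splitting on commas avoids false matches on options such as errors=remount-ro.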
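Method 3 (determine whether there is a hardware fault; the default scan is a read-only test):
- # badblocks /dev/sda1    scan the disk at the physical level for bad blocks
- # badblocks -v /dev/sda1    same as above, with verbose output while running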
- Checking blocks 0 to 200781
- Checking for bad blocks (read-only test): done
- Pass completed, 0 bad blocks found.
To show progress (-s) and run a non-destructive read-write test (-n):
- # badblocks -vsn /dev/sda1    check for bad blocks, non-destructive
- Checking for bad blocks in non-destructive read-write mode
- From block 0 to 200781
- Testing with random pattern: Pass completed, 0 bad blocks found.
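When a scan does find bad blocks, it helps to keep the list. badblocks accepts -o FILE to write the bad block numbers to a file, one per line. The sketch below counts the entries in such a file. count_bad_blocks is a hypothetical helper name, and the path /tmp/sda1.bad is only an example.

```shell
#!/bin/sh
# count_bad_blocks (hypothetical helper): count the block numbers in a list
# file written by `badblocks -o FILE DEVICE` (one number per line).
count_bad_blocks() {
    # grep -c exits non-zero when the count is 0, so the status is ignored.
    grep -c '^[0-9][0-9]*$' "$1" || true
}

# Example (read-only scan, needs root):
#   badblocks -v -o /tmp/sda1.bad /dev/sda1
#   count_bad_blocks /tmp/sda1.bad
```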
Method 4 (destructive test; it wipes all data on the disk):
Warning: this command erases all data on the disk partition.
- # badblocks -vsw /dev/sda1    check for bad blocks, destructive
- Checking for bad blocks in read-write mode
- From block 0 to 200781
- Testing with pattern 0xaa: done
- Reading and comparing: done
- Testing with pattern 0x55: done
- Reading and comparing: done
- Testing with pattern 0xff: done
- Reading and comparing: done
- Testing with pattern 0x00: done
- Reading and comparing: done
- Pass completed, 0 bad blocks found.
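Because the write test destroys data, it is worth confirming that the partition is not mounted before starting. The following is a minimal sketch of such a guard. is_mounted is a hypothetical helper name, and it only compares against the first column of the mounts table, so a root device listed as /dev/root would not be recognised by its /dev/sdXN name.

```shell
#!/bin/sh
# is_mounted (hypothetical helper): succeed if the given device appears in the
# first column of the mounts table (defaults to /proc/mounts).
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Example (needs root):
#   if is_mounted /dev/sda1; then
#       echo "/dev/sda1 is mounted; unmount it first"
#   else
#       badblocks -vsw /dev/sda1
#   fi
```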
Method 5 (for an ext3 filesystem, fsck can be used to check it; the filesystem should be unmounted first):
- # fsck -TVy /dev/sda1
- [/sbin/fsck.ext3 (1) -- /mnt/mymount] fsck.ext3 -y /dev/sda1
- e2fsck 1.39 (29-May-2006)
- Couldn't find ext2 superblock, trying backup blocks...
- Resize inode not valid. Recreate? yes
- mypart was not cleanly unmounted, check forced.
- Pass 1: Checking inodes, blocks, and sizes
- Pass 2: Checking directory structure
- Pass 3: Checking directory connectivity
- Pass 4: Checking reference counts
- Pass 5: Checking group summary information
- Free blocks count wrong for group #0 (3552, counted=3553).
- Fix? yes
- Free blocks count wrong (188777, counted=188778).
- Fix? yes
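When fsck is run from a script, its exit status says more than the log does: per fsck(8) it is a bitwise OR of 1 (errors corrected), 2 (system should be rebooted), 4 (errors left uncorrected) and 8 (operational error), among others. A rough sketch of decoding it follows. describe_fsck_status is a hypothetical helper name and covers only those four bits.

```shell
#!/bin/sh
# describe_fsck_status (hypothetical helper): decode the exit status of fsck,
# which is a bitwise OR of the conditions documented in fsck(8).
describe_fsck_status() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    [ $((code & 1)) -ne 0 ] && echo "errors corrected"
    [ $((code & 2)) -ne 0 ] && echo "system should be rebooted"
    [ $((code & 4)) -ne 0 ] && echo "errors left uncorrected"
    [ $((code & 8)) -ne 0 ] && echo "operational error"
    return 0
}

# Example (needs root):
#   fsck -TVy /dev/sda1
#   describe_fsck_status $?
```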
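Solutions:
1. Tune the mount options, for example the journaling mode and not updating file access times
2. tune2fs -c 5 /dev/sda1    force a disk check after a set number of mounts (here every 5, so after several reboots)
3. Disable the drive's write cache, especially in environments with unstable power: hdparm -W 0 /dev/sda6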
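The three measures can be collected into one script. This is only a sketch under stated assumptions: "not updating files" is read here as the noatime mount option, the mount point /usr/share/TSMIS is taken from the /proc/mounts listing earlier, and run and apply_tuning are hypothetical helper names. By default the script only prints the commands.

```shell
#!/bin/sh
# With DRY_RUN=1 (the default) commands are printed instead of executed.
# Set DRY_RUN=0 and run as root to apply them.
DRY_RUN="${DRY_RUN:-1}"

# run (hypothetical helper): print or execute a command depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# apply_tuning (hypothetical helper): the three measures from the list above.
apply_tuning() {
    # Assumption: stop access-time updates with the noatime mount option.
    run mount -o remount,noatime /usr/share/TSMIS
    # Force a filesystem check after every 5 mounts.
    run tune2fs -c 5 /dev/sda1
    # Turn off the drive's write cache.
    run hdparm -W 0 /dev/sda6
}
```

Calling apply_tuning in the default dry-run mode lists the commands so they can be reviewed before anything is changed.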