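The remote management console showed several failed disks.

Replacing a disk in a RAID 10 array is straightforward: the drives are hot-swappable, so the bad one can simply be pulled and replaced. The steps are below.
Find the failed disk by its serial number (the SN is shown in the remote management console)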
/opt/MegaRAID/MegaCli/MegaCli64 -PDList -aAll -NoLog | grep -B 25 3SL1KEF2
Take the failed disk offline
/opt/MegaRAID/MegaCli/MegaCli64 -PDOffline -PhysDrv[32:7] -a0
How 32, 7, and -a0 in the command above correspond to the PDList output:
Adapter #0
Enclosure Device ID: 32
Slot Number: 7
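The serial-number lookup and the [Enclosure:Slot] mapping above can be combined into one step. What follows is only a minimal sketch, not part of the original procedure: `find_slot_by_sn` is a hypothetical helper name, and it assumes the serial number appears somewhere in the disk's block of PDList output (typically on the Inquiry Data line), after that disk's Enclosure Device ID and Slot Number lines.

```shell
# find_slot_by_sn SERIAL
# Hypothetical helper: reads `MegaCli64 -PDList -aAll -NoLog` output on stdin
# and prints the [Enclosure:Slot] pair of the first disk whose block mentions
# SERIAL. Assumes the serial appears after that disk's "Slot Number" line.
find_slot_by_sn() {
  [ -n "$1" ] || { echo "usage: find_slot_by_sn SERIAL" >&2; return 2; }
  awk -v sn="$1" '
    /^Enclosure Device ID:/ { encl = $NF }          # remember current enclosure
    /^Slot Number:/         { slot = $NF }          # remember current slot
    index($0, sn)           { printf "[%s:%s]\n", encl, slot; found = 1; exit }
    END                     { exit found ? 0 : 1 }  # non-zero if nothing matched
  '
}
```

Piping the PDList output into it, for example `/opt/MegaRAID/MegaCli/MegaCli64 -PDList -aAll -NoLog | find_slot_by_sn 3SL1KEF2`, should print the value to paste after `-PhysDrv` in the commands that follow.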
Light up the target disk (make its LED blink so it can be located)
/opt/MegaRAID/MegaCli/MegaCli64 -PdLocate -start -physdrv[32:7] -a0
Note: after the disk has been replaced, turn its locate LED off
/opt/MegaRAID/MegaCli/MegaCli64 -PdLocate -stop -physdrv[32:7] -a0
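Replace the failed disk
The failed disk is now OFFLINE. On site, its LED blinks amber while healthy disks show green. Pull the failed disk and insert a good one: the new disk's LED blinks green and the disk spins up, which means it is rebuilding. Check the state as follows: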
$ MegaCli -PDList -aAll -NoLog
...
Enclosure Device ID: 32
Slot Number: 7
...
Firmware state: Rebuild
Check rebuild progress
# /opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -ShowProg -PhysDrv[32:7] -aAll
Rebuild Progress on Device at Enclosure 32, Slot 3 Completed 16% in 94 Minutes.
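The progress line above is enough for a rough estimate of the remaining time. The sketch below is an illustration only: `rebuild_eta` is a hypothetical helper, and it assumes the rebuild continues at its average pace so far, which real rebuilds under changing I/O load may not do.

```shell
# rebuild_eta
# Hypothetical helper: reads the output of
# `MegaCli64 -PDRbld -ShowProg -PhysDrv[E:S] -aN` on stdin and prints
# "<percent> <elapsed_minutes> <estimated_minutes_left>".
# Assumption: the remaining work proceeds at the average rate seen so far.
rebuild_eta() {
  sed -n 's/.*Completed \([0-9][0-9]*\)% in \([0-9][0-9]*\) Minutes.*/\1 \2/p' |
  while read -r pct elapsed; do
    if [ "$pct" -gt 0 ]; then
      left=$(( elapsed * (100 - pct) / pct ))   # integer minutes, rounded down
    else
      left="unknown"                            # no progress yet, nothing to extrapolate
    fi
    echo "$pct $elapsed $left"
  done
}
```

Fed the sample line above (16% in 94 minutes), it prints `16 94 493`, i.e. roughly eight more hours at the same pace.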
Or display it as a live text progress bar
# /opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -ProgDsply -PhysDrv[32:7] -a0
Rebuild progress of physical drives...
Enclosure:Slot Percent Complete Time Elps
032 :07 #######****************15 %*********************** 00:24:37
Press <ESC> key to quit...
Disk replacement complete
# /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aAll -NoLog | grep 'Firmware state'
Firmware state: Copyback
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Hotspare, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Offline
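With many disks, a per-state summary is easier to scan than the raw list above. The following is a small sketch under the assumption that each disk contributes exactly one `Firmware state:` line to the PDList output; `count_states` and `needs_attention` are hypothetical helper names.

```shell
# count_states: reads PDList output on stdin and prints "<count> <state>"
# for every distinct firmware state.
count_states() {
  awk -F': *' '/^Firmware state/ { n[$2]++ }
               END { for (s in n) print n[s], s }' | sort -k2
}

# needs_attention: prints the state lines that are neither "Online..." nor
# "Hotspare..."; its exit status is 0 only if at least one such line exists.
needs_attention() {
  grep '^Firmware state' | grep -Ev ': *(Online|Hotspare)'
}
```

For the listing above, `count_states` would report one Copyback, one Hotspare, one Offline, and eight Online disks, and `needs_attention` would print the Copyback and Offline lines.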
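Set up a hot spare
To guard against further disk failures, assign a hot spare to the RAID array
# /opt/MegaRAID/MegaCli/MegaCli64 -PDHSP -Set -Dedicated -Array1 -physdrv[32:9] -a0 # add a dedicated hot spare; Array1 is RAID group 1 (Target Id: 1)
After adding it, check where the hot spare is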
# /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name :Virtual Disk 0
RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0
Size : 223.0 GB
Sector Size : 512
Mirror Data : 223.0 GB
State : Optimal
Strip Size : 64 KB
Number Of Drives : 2
Span Depth : 1
Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disk's Default
Encryption Type : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: No
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No
Virtual Drive: 1 (Target Id: 1)
Name :
RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0
Size : 1.635 TB
Sector Size : 512
Mirror Data : 1.635 TB
State : Degraded
Strip Size : 64 KB
Number Of Drives per span:2
Span Depth : 3
Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disk's Default
Encryption Type : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: Yes
LD has drives that support T10 power conditions: Yes
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No
Number of Dedicated Hot Spares: 1
0 : EnclId - 32 SlotId - 9
Exit Code: 0x00
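The LDInfo output is long, and usually only the state of each virtual drive matters. A minimal sketch for pulling that out, assuming the `Virtual Drive:` and `State :` line formats shown above (`vd_states` is a hypothetical helper name):

```shell
# vd_states: reads `MegaCli64 -LDInfo -Lall -aALL` output on stdin and prints
# "<virtual drive id> <state>" for each virtual drive.
vd_states() {
  awk '
    /^Virtual Drive:/ { vd = $3 }                 # "Virtual Drive: 1 (Target Id: 1)"
    /^State[ \t]*:/ {
      s = $0
      sub(/^State[ \t]*:[ \t]*/, "", s)           # keep only the text after the colon
      print vd, s
    }
  '
}
```

For the output above it prints `0 Optimal` and `1 Degraded`, which matches the array that is still being repaired.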
# Show detailed logical drive information
sudo /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -aAll -NoLog
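When the RAID has a hot spare, a replaced disk goes into Firmware state: Copyback (data is copied back from the spare to the new disk)
Copyback progress can be read directly from the controller log
# watch -n 30 'MegaCli -FwTermLog -Dsply -aALL | tail'
Every 30.0s: MegaCli -FwTermLog -Dsply -aALL | tail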
07/29/19 13:16:36: Load Balance Statistics Path0PDs d Path1PDs 0
07/29/19 13:16:36: EVT#25896-07/29/19 13:16:36: 91=Inserted: PD 00(e0x20/s0)
07/29/19 13:16:36: EVT#25897-07/29/19 13:16:36: 247=Inserted: PD 00(e0x20/s0) Info: enclPd=20, scsiType=0, portMap=00, sasAddr=5000c500720794fd,0000000000000000
07/29/19 13:16:37: request temp sensor i2c failed
07/29/19 13:16:37: PD_InsertionPostProcess: Setting foreign DDF type on pd=0
07/29/19 13:16:37: EVT#25898-07/29/19 13:16:37: 114=State change on PD 00(e0x20/s0) from UNCONFIGURED_BAD(1) to UNCONFIGURED_GOOD(0)
07/29/19 13:16:37: pdHspHistCheckInsertedPdCallback: Start copy back from sparePd=03 to pd=0, changing entryType to ok
07/29/19 13:16:37: ArDiskTypeMisMatch : NO_MIXING_VIOLATION array=1 destPD=0
07/29/19 13:16:37: EVT#25899-07/29/19 13:16:37: 281=CopyBack automatically started on PD 00(e0x20/s0) from PD 03(e0x20/s3)
07/29/19 13:16:37: EVT#25900-07/29/19 13:16:37: 114=State change on PD 00(e0x20/s0) from UNCONFIGURED_GOOD(0) to COPYBACK(20)
07/29/19 13:18:18: EVT#25901-07/29/19 13:18:18: 279=CopyBack progress on PD 00(e0x20/s0) is 0.99%(99s)
07/29/19 13:19:57: EVT#25902-07/29/19 13:19:57: 279=CopyBack progress on PD 00(e0x20/s0) is 1.99%(197s)
07/29/19 13:21:37: EVT#25903-07/29/19 13:21:37: 279=CopyBack progress on PD 00(e0x20/s0) is 2.99%(297s)
07/29/19 13:23:17: EVT#25904-07/29/19 13:23:17: 279=CopyBack progress on PD 00(e0x20/s0) is 3.99%(397s)
07/29/19 13:24:57: EVT#25905-07/29/19 13:24:57: 279=CopyBack progress on PD 00(e0x20/s0) is 4.99%(497s)
07/29/19 13:26:39: EVT#25906-07/29/19 13:26:39: 279=CopyBack progress on PD 00(e0x20/s0) is 5.99%(598s)
Exit Code: 0x00
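Only the last `CopyBack progress` event in that log is of interest. The sketch below extracts it; `copyback_latest` is a hypothetical helper, and it assumes the event format shown above. The log writes the enclosure ID in hex (`e0x20`), which is 32 in decimal and matches the Enclosure Device ID used elsewhere in these notes; the slot is passed through exactly as the log prints it.

```shell
# copyback_latest: reads `MegaCli -FwTermLog -Dsply -aALL` output on stdin and
# prints the most recent CopyBack progress as "<enclosure>:<slot> <percent>".
copyback_latest() {
  sed -n 's|.*CopyBack progress on PD [0-9a-fA-F]*(e\(0x[0-9a-fA-F]*\)/s\([0-9]*\)) is \([0-9.]*%\).*|\1 \2 \3|p' |
  tail -n 1 |
  while read -r encl_hex slot pct; do
    # Convert the hex enclosure ID (0x20) to decimal (32); slot is left as logged.
    printf '%d:%s %s\n' "$encl_hex" "$slot" "$pct"
  done
}
```

Run against the log excerpt above, it prints `32:0 5.99%`.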
Basic megacli usage
# Show the RAID level
$ megacli -LDInfo -Lall -aALL
# Show detailed logical drive information
$ /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -aAll -NoLog
# Show RAID controller information
$ megacli -AdpAllInfo -aALL
# Show physical disk information
$ /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL
# Show battery (BBU) information
$ megacli -AdpBbuCmd -aAll
# Show the RAID controller log
$ /opt/MegaRAID/MegaCli/MegaCli64 -FwTermLog -Dsply -aALL
# Show the number of adapters
$ megacli -adpCount
# Show the adapter time
$ megacli -AdpGetTime -aALL
# Show all adapter information
$ megacli -AdpAllInfo -aAll
# Show information for all logical drive groups
$ megacli -LDInfo -LALL -aAll
# Show information for all physical disks
$ megacli -PDList -aAll
# Show the charger status
$ megacli -AdpBbuCmd -GetBbuStatus -aALL |grep 'Charger Status'
# Show BBU status
$ megacli -AdpBbuCmd -GetBbuStatus -aALL
# Show BBU capacity information
$ megacli -AdpBbuCmd -GetBbuCapacityInfo -aALL
# Show BBU design parameters
$ megacli -AdpBbuCmd -GetBbuDesignInfo -aALL
# Show the current BBU properties
$ megacli -AdpBbuCmd -GetBbuProperties -aALL
# Show the RAID controller model, RAID settings, and disk information
$ megacli -cfgdsply -aALL
## How the disk state changes from pulling the failed disk to inserting the new one.
Device |Normal |Damage |Rebuild |Normal
Virtual Drive |Optimal|Degraded|Degraded|Optimal
Physical Drive |Online |Failed Unconfigured|Rebuild|Online
# Check the physical disk state:
$ megacli -PDInfo -PhysDrv[Enclosure Device ID:Slot Number] -a0
## While a disk is rebuilding, its state shows: "Firmware state: Rebuild"
# Query rebuild progress:
$ megacli -pdrbld -showprog -physdrv[E:S] -aALL
## The output looks like this:
Rebuild Progress on Device at Enclosure 32, Slot 5 Completed 77% in 101 Minutes.
# Show rebuild progress as a text progress bar:
$ megacli -pdrbld -progdsply -physdrv[E:S] -aALL
## The screen shows something like this:
Rebuild progress of physical drives...
Enclosure:Slot Percent Complete Time Elps
032 :05 #######################87 %################******* 01:59:07
Press <ESC> key to quit...
# Show the RAID controller's rebuild settings:
$ megacli -AdpAllinfo -aALL | grep -i rebuild
## The output looks like this
Rebuild Rate : 30%
Auto Rebuild : Enabled
Rebuild Rate : Yes
Force Rebuild : Yes
# Set the controller's rebuild rate to 60% (speeds up the rebuild):
$ /opt/MegaRAID/MegaCli/MegaCli64 -AdpSetProp RebuildRate -60 -a0
## On success it returns:
Adapter 0: Set rebuild rate to 60% success.
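Before raising the rate, it can help to read the current value and sanity-check the new one. This is a sketch only: `current_rebuild_rate` and `rebuild_rate_cmd` are hypothetical helpers, the 0 to 100 range is an assumption based on the value being a percentage, and the second helper only prints the command instead of running it.

```shell
MEGACLI=/opt/MegaRAID/MegaCli/MegaCli64   # path used throughout these notes

# current_rebuild_rate: reads `-AdpAllInfo -aALL` output on stdin and prints
# the configured rate as a bare number (30 for "Rebuild Rate : 30%").
current_rebuild_rate() {
  sed -n 's/^[[:space:]]*Rebuild Rate[[:space:]]*:[[:space:]]*\([0-9][0-9]*\)%.*/\1/p' |
  head -n 1
}

# rebuild_rate_cmd RATE [ADAPTER]
# Prints (does not run) the command that would set the rate, after checking
# that RATE is an integer from 0 to 100 (assumed valid range).
rebuild_rate_cmd() {
  case "$1" in
    ''|*[!0-9]*) echo "rate must be an integer 0-100" >&2; return 1 ;;
  esac
  [ "$1" -le 100 ] || { echo "rate must be an integer 0-100" >&2; return 1; }
  echo "$MEGACLI -AdpSetProp RebuildRate -$1 -a${2:-0}"
}
```

A higher rebuild rate gives the rebuild a larger share of controller bandwidth at the expense of host I/O, so it is usually worth setting it back once the array is Optimal again.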
# Configure a hot spare (square brackets other than PhysDrv[E:S] mark optional parts)
/opt/MegaRAID/MegaCli/MegaCli64 -pdhsp -set [-Dedicated [-Array2]] [-EnclAffinity] [-nonRevertible] -PhysDrv[4:11] -a0
/opt/MegaRAID/MegaCli/MegaCli64 -pdhsp -set [-EnclAffinity] [-nonRevertible] -PhysDrv[32:1] -a0
MegaCli -PDHSP -Set -Dedicated -Array0 -physdrv[E:S] -a0 adds a dedicated hot spare; Array0 is RAID group 0 (Target Id: 0)
Example: sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDHSP -Set -Dedicated -Array1 -physdrv[32:9] -a0 # add a dedicated hot spare; Array1 is RAID group 1 (Target Id: 1)
MegaCli -pdhsp -set -physdrv[E:S] -a0 adds a global hot spare
MegaCli -pdhsp -rmv -physdrv[E:S] -a0 removes a hot spare, whether global or dedicated
Example: sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDHSP -rmv -physdrv[32:9] -a0
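The difference between the dedicated and global forms is only whether `-Dedicated -ArrayN` is present. As an illustration, the hypothetical helper below composes either form and prints it for review rather than executing it; the adapter is fixed at `-a0` as in the examples above.

```shell
# hotspare_cmd ENCLOSURE SLOT [ARRAY]
# Prints (does not run) a PDHSP command. With ARRAY the disk becomes a
# dedicated hot spare for that array; without it, a global hot spare.
hotspare_cmd() {
  [ $# -ge 2 ] || { echo "usage: hotspare_cmd ENCLOSURE SLOT [ARRAY]" >&2; return 2; }
  if [ -n "${3:-}" ]; then
    echo "/opt/MegaRAID/MegaCli/MegaCli64 -PDHSP -Set -Dedicated -Array$3 -PhysDrv[$1:$2] -a0"
  else
    echo "/opt/MegaRAID/MegaCli/MegaCli64 -PDHSP -Set -PhysDrv[$1:$2] -a0"
  fi
}
```

`hotspare_cmd 32 9 1` prints the dedicated form used in these notes, and `hotspare_cmd 32 9` prints the global form.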
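# Delete an array
/opt/MegaRAID/MegaCli/MegaCli64 -cfglddel -L2 -Force -a0 force-deletes the RAID group with Target Id: 2; the ID comes from "Show detailed logical drive information" above. (Without the force option the command sometimes fails with: Virtual Disk is associate with Cache Cade. Please Use force option to delete)
/opt/MegaRAID/MegaCli/MegaCli64 -cfgclr -a0 clears the configuration of all RAID groups
# Clear the foreign configuration
/opt/MegaRAID/MegaCli/MegaCli64 -cfgforeign -clear -a0
# Scan again for the number of foreign configurations
/opt/MegaRAID/MegaCli/MegaCli64 -cfgforeign -scan -a0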
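Common problems:
1. Firmware state: Unconfigured(good), Spun Up (iDRAC reports an error: after logging in to iDRAC, the disk shows an exclamation mark and its state is Foreign)
Fix: /opt/MegaRAID/MegaCli/MegaCli64 -CfgForeign -Import -aall
After the import another problem appeared: the disk ended up in a RAID group of its own, containing only that one disk, which defeats my original goal of making it a hot spare
So we delete the newly created RAID group
/opt/MegaRAID/MegaCli/MegaCli64 -cfglddel -L2 -Force -a0 force-deletes the RAID group with Target Id: 2; the ID comes from "Show detailed logical drive information" above. (Without the force option the command sometimes fails with: Virtual Disk is associate with Cache Cade. Please Use force option to delete)
Finally, set the drive as a hot spare.
sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDHSP -Set -Dedicated -Array1 -physdrv[32:9] -a0
2. Firmware state: Unconfigured(bad): how to fix it when a new disk is meant to become the array's hot spare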
Enclosure Device ID: 32
Slot Number: 9
Enclosure position: 1
Device Id: 9
Firmware state: Unconfigured(bad)
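A server disk can show Unconfigured Bad because the drive ran into an error. Proceed as follows:
1. From the command line, mark the drive as good (Unconfigured Good).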
sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDMakeGood -physdrv[32:9] -a0
2. Check again that 32:9 is now in the Unconfigured(good) state.
Enclosure Device ID: 32
Slot Number: 9
Enclosure position: 1
Device Id: 9
Firmware state: Unconfigured(good), Spun Up
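3. Next, deal with the foreign config. (This one hurt: the whole server went down. Do NOT clear the foreign config; if you do, the only way to recover is to import the foreign config from the BIOS.)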
### sudo /opt/MegaRAID/MegaCli/MegaCli64 -cfgforeign -clear -a0
/opt/MegaRAID/MegaCli/MegaCli64 -CfgForeign -Import -aall # use with caution
Reference:
http://www.51niux.com/?id=77 (Using the MegaCLI tool)
4. Finally, with the old foreign configuration dealt with, set the drive as a hot spare.
sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDHSP -Set -Dedicated -Array1 -physdrv[32:9] -a0
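Both problems above start from spotting a disk in an unexpected firmware state. As a rough aid, the sketch below lists the [Enclosure:Slot] of every disk whose state contains a given string. `slots_in_state` is a hypothetical helper; it assumes each disk's block in the PDList output has its Enclosure Device ID and Slot Number lines before a single-line `Firmware state:` entry.

```shell
# slots_in_state PATTERN
# Reads `MegaCli64 -PDList -aAll -NoLog` output on stdin and prints
# "[E:S] <state>" for each disk whose firmware state contains PATTERN
# (plain substring match, not a regular expression).
slots_in_state() {
  [ -n "$1" ] || { echo "usage: slots_in_state PATTERN" >&2; return 2; }
  awk -v pat="$1" '
    /^Enclosure Device ID:/ { encl = $NF }
    /^Slot Number:/         { slot = $NF }
    /^Firmware state:/ {
      state = $0
      sub(/^Firmware state:[ \t]*/, "", state)    # keep only the state text
      if (index(state, pat)) printf "[%s:%s] %s\n", encl, slot, state
    }
  '
}
```

For example, `sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aAll -NoLog | slots_in_state 'Unconfigured(bad)'` should list the disks that still need `-PDMakeGood`.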