I. MCC overview
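Clustered MetroCluster (MCC) is NetApp's active-active storage solution for Data ONTAP. The original design stretched one dual-controller FAS/V-Series system across two data centers to form a geographically separated HA pair, with a single controller node at each site. The two sites were linked by additional FC-VI cluster adapters, and the SAS disk shelves were connected across sites through FibreBridge SAS-to-FC bridges. Within 500 m (for example, inside the same machine room), the sites are connected directly over Fibre Channel; beyond 500 m (up to 100 km), they are connected through Fibre Channel switches and DWDM equipment.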
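MetroCluster later evolved from this architecture. A dual-controller FAS/V-Series array is placed at each of sites A and B. Controller A of array A pairs with controller A of array B, and controller B of array A pairs with controller B of array B, each pair forming a cluster. This makes full use of the resources in both data centers, which serve storage at the same time. However, the two controllers within one array are not clustered with each other. If either node of a cross-site pair fails, hosts at the failed site must access the surviving controller node at the remote site; if both nodes of a cross-site pair fail at the same time, service is interrupted.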
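Data ONTAP 8.3 introduced a four-node active-active solution that supports distances of up to 200 km. In a four-node MetroCluster, two HA pairs first form two local clusters, and the two clusters are then combined into a four-node MetroCluster configuration. Write logs are held in NVRAM, and NVRAM mirrors the logs that have not yet been written to disk. As a result, if a node fails, its HA partner in the local cluster can take over its workload; if a whole site fails, the HA pair at the remote site can take over. When the logs reach a certain watermark, or a system operation triggers a flush to disk, SyncMirror writes the data to both the primary and the secondary site. This ensures that if the disks at one site fail, the disks at the other site can still serve data, so service can be switched over between sites without interruption.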
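MetroCluster protects data with mirroring and clustering across two separate locations: each cluster synchronously mirrors its data and its Storage Virtual Machine (SVM) configuration to the other cluster. When a disaster strikes one site, an administrator can activate the mirrored SVMs at the remote site and resume service there. In addition, the nodes in each cluster are configured as local HA pairs, which provides local failover.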
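The failure handling described above can be illustrated with a toy Python model. It is a sketch under stated assumptions, not NetApp's takeover logic: the node names reuse those in the inspection output later in this article, the DR pairing (cluster1-01 with cluster1_dr-01, cluster1-02 with cluster1_dr-02) is assumed, and `serving_node` is a hypothetical helper. It only answers "which surviving node would end up serving this node's workload"; as noted above, a site-level switchover is activated by an administrator rather than happening on its own.

```python
from __future__ import annotations

# Hypothetical four-node layout. Names follow the sample output later in
# this article; the DR pairings below are assumptions for illustration.
HA_PARTNER = {
    "cluster1-01": "cluster1-02", "cluster1-02": "cluster1-01",
    "cluster1_dr-01": "cluster1_dr-02", "cluster1_dr-02": "cluster1_dr-01",
}
DR_PARTNER = {
    "cluster1-01": "cluster1_dr-01", "cluster1_dr-01": "cluster1-01",
    "cluster1-02": "cluster1_dr-02", "cluster1_dr-02": "cluster1-02",
}


def serving_node(node: str, failed: set[str]) -> str | None:
    """Return the node that would end up serving `node`'s workload.

    Order of preference (toy model, not ONTAP's actual logic):
    the node itself, its local HA partner, its DR partner at the other
    site, then the DR partner's HA partner. None means an outage.
    """
    dr = DR_PARTNER[node]
    for candidate in (node, HA_PARTNER[node], dr, HA_PARTNER[dr]):
        if candidate not in failed:
            return candidate
    return None


# A whole-site failure at site A moves cluster1-01's workload to site B.
print(serving_node("cluster1-01", {"cluster1-01", "cluster1-02"}))  # cluster1_dr-01
```

Keeping only the first two candidates, with the HA partner placed at the other site, reduces the model to the earlier two-node design, in which losing both nodes of a cross-site pair interrupts service.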
NetApp MetroCluster is built on NetApp SyncMirror, combined with the ClusterRemote and controller Clustered Failover features:
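- Clustered Failover: provides high-availability failover between the primary storage and the disaster-recovery storage; the decision to take over is made by an administrator with a single command.
- SyncMirror: keeps a real-time copy of the data on the remote storage, so that after a takeover the data can be accessed from the remote storage alone.
- ClusterRemote: provides the management mechanism for determining that a disaster has occurred and for initiating takeover by the remote storage.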
II. Common MCC health-check commands
1. Check overall system health
cluster1::> system health status show
Status
---------------
ok
2. Check cluster status
cluster1::> cluster show
Node                  Health  Eligibility
--------------------- ------- ------------
cluster1-01           true    true
cluster1-02           true    true
2 entries were displayed.
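Most of these `show` commands print a header line, a dashed rule, and one row per object, so a simple inspection script can parse them. The Python sketch below uses the dashed rule to find the column boundaries and then flags nodes whose Health or Eligibility is not `true`. `parse_table` and `unhealthy` are hypothetical helpers, not part of any NetApp tool; the parser assumes a single header line and cells that do not wrap, so outputs where long values spill onto extra lines would need more handling.

```python
from __future__ import annotations

SAMPLE = """\
Node                  Health  Eligibility
--------------------- ------- ------------
cluster1-01           true    true
cluster1-02           true    true
2 entries were displayed.
"""


def parse_table(text: str) -> list[dict[str, str]]:
    """Parse a one-line-header ONTAP-style table, using the dashed rule
    line to locate the column boundaries."""
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    rule_idx = next(i for i, ln in enumerate(lines)
                    if set(ln.replace(" ", "")) == {"-"})
    rule = lines[rule_idx]
    # Each run of dashes starts a column; a column extends to the next run.
    starts = [i for i, ch in enumerate(rule)
              if ch == "-" and (i == 0 or rule[i - 1] == " ")]
    bounds = list(zip(starts, starts[1:] + [None]))
    header = [lines[rule_idx - 1][a:b].strip() for a, b in bounds]
    rows = []
    for ln in lines[rule_idx + 1:]:
        if ln.endswith("displayed."):  # trailing "N entries were displayed."
            continue
        rows.append({h: ln[a:b].strip() for h, (a, b) in zip(header, bounds)})
    return rows


def unhealthy(rows: list[dict[str, str]]) -> list[str]:
    """Nodes whose Health or Eligibility is not 'true'."""
    return [r["Node"] for r in rows
            if r["Health"] != "true" or r["Eligibility"] != "true"]


rows = parse_table(SAMPLE)
print(unhealthy(rows))  # []
```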
3. Check cluster statistics
cluster1::> cluster statistics show
Counter          Value             Delta
---------------- ----------------- -------------
CPU Busy:        0%                -
Operations:
  Total:         0                 -
  NFS:           0                 -
  CIFS:          0                 -
Data Network:
  Busy:          0%                -
  Received:      5.78GB            -
  Sent:          13.7GB            -
Cluster Network:
  Busy:          0%                -
  Received:      967KB             -
  Sent:          979KB             -
Storage Disk:
  Read:          6.38PB            -
  Write:         6.26PB            -
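Values such as 5.78GB, 967KB, and 6.38PB are human-readable, so comparing them across inspection runs or against thresholds is easier after converting them to bytes. The Python sketch below does that with a hypothetical `to_bytes` helper. It assumes the units are powers of 1024; that is an assumption, though it is consistent with the disk output later in this article, where 1,142,352 MB is displayed as 1.09TB.

```python
import re

# Powers of 1024 are an assumption (see the lead-in).
_EXPONENT = {"B": 0, "KB": 1, "MB": 2, "GB": 3, "TB": 4, "PB": 5}


def to_bytes(value: str) -> int:
    """Convert a size string such as '5.78GB' or '967KB' to bytes."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGTP]?B)\s*", value)
    if m is None:
        raise ValueError(f"unrecognised size: {value!r}")
    number, unit = m.groups()
    return round(float(number) * 1024 ** _EXPONENT[unit])


# Data-network traffic from the sample output: sent vs. received.
ratio = to_bytes("13.7GB") / to_bytes("5.78GB")
print(f"{ratio:.2f}")  # 2.37
```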
4. View aggregate information (RAID type, mirroring, and status)
cluster1::> aggr show
Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
--------- -------- --------- ----- ------- ------ ---------------- ------------
aggr0_A1   953.8GB   247.3GB   74% online       1 cluster1-01      raid4, mirrored, normal
aggr0_A2   953.8GB   247.3GB   74% online       1 cluster1-02      raid4, mirrored, normal
aggr_data_A1
           68.93TB   16.04TB   77% online      32 cluster1-01      mixed_raid_type, mirrored, hybrid, normal
aggr_data_A2
           68.93TB   14.77TB   79% online      31 cluster1-02      mixed_raid_type, mirrored, hybrid, normal
4 entries were displayed.
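In this MetroCluster every aggregate reports `mirrored` and `normal`, so an aggregate that loses either flag, or one that fills up, is worth a closer look. The Python sketch below recomputes Used% from the Size and Available columns (it matches the displayed 74%, 77%, and 79%) and flags aggregates above a threshold. The 75% threshold, the `Aggregate` class, and `findings` are hypothetical choices for illustration; the rows are transcribed from the output above.

```python
from dataclasses import dataclass


@dataclass
class Aggregate:
    name: str
    size: float        # Size column (same unit as `available`)
    available: float   # Available column
    raid_status: str   # RAID Status column

    @property
    def used_pct(self) -> float:
        return 100 * (self.size - self.available) / self.size


# Transcribed from the `aggr show` output (GB for aggr0, TB for data aggregates).
AGGRS = [
    Aggregate("aggr0_A1", 953.8, 247.3, "raid4, mirrored, normal"),
    Aggregate("aggr0_A2", 953.8, 247.3, "raid4, mirrored, normal"),
    Aggregate("aggr_data_A1", 68.93, 16.04, "mixed_raid_type, mirrored, hybrid, normal"),
    Aggregate("aggr_data_A2", 68.93, 14.77, "mixed_raid_type, mirrored, hybrid, normal"),
]


def findings(aggrs, threshold_pct=75.0):
    """Flag aggregates above a (hypothetical) usage threshold, and any
    aggregate whose RAID status lacks 'mirrored' or 'normal'."""
    out = []
    for a in aggrs:
        flags = {s.strip() for s in a.raid_status.split(",")}
        if a.used_pct > threshold_pct:
            out.append(f"{a.name}: {a.used_pct:.0f}% used")
        if not {"mirrored", "normal"} <= flags:
            out.append(f"{a.name}: RAID status is {a.raid_status!r}")
    return out


for line in findings(AGGRS):
    print(line)
```

With the sample values, the two data aggregates (77% and 79% used) are flagged and the root aggregates (74%) are not.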
5. View node information
cluster1::> node show
Node      Health Eligibility Uptime        Model       Owner    Location
--------- ------ ----------- ------------- ----------- -------- ---------------
cluster1-01
          true   true        369 days 19:12 FAS8040             gz_idc
cluster1-02
          true   true        369 days 19:23 FAS8040             gz_idc
2 entries were displayed.
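A node whose uptime is much shorter than its partner's may have rebooted unexpectedly, so the Uptime column is worth tracking between inspections. The Python sketch below parses values such as `369 days 19:12` into a `timedelta` and lists nodes below a minimum uptime. `parse_uptime`, `recently_rebooted`, and the one-day minimum are hypothetical; the form used for uptimes under one day (days omitted, or `1 day`) is an assumption the regex tolerates rather than a documented format.

```python
from __future__ import annotations

import re
from datetime import timedelta


def parse_uptime(text: str) -> timedelta:
    """Parse an uptime such as '369 days 19:12' (days, then hours:minutes).
    Accepting a missing days part is an assumption."""
    m = re.fullmatch(r"(?:(\d+) days? )?(\d+):(\d+)", text.strip())
    if m is None:
        raise ValueError(f"unrecognised uptime: {text!r}")
    days, hours, minutes = (int(g) if g else 0 for g in m.groups())
    return timedelta(days=days, hours=hours, minutes=minutes)


def recently_rebooted(uptimes: dict[str, str],
                      minimum: timedelta = timedelta(days=1)) -> list[str]:
    """Nodes whose uptime is below `minimum` (hypothetical threshold)."""
    return [node for node, up in uptimes.items() if parse_uptime(up) < minimum]


UPTIMES = {"cluster1-01": "369 days 19:12", "cluster1-02": "369 days 19:23"}
print(recently_rebooted(UPTIMES))  # []
```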
6. View the ONTAP version
cluster1::> version
NetApp Release 8.3.2P9: Fri Jan 06 05:54:05 UTC 2017
7. View serial numbers and licenses
cluster1::> system license show
Serial Number: 1-80-023992
Owner: cluster1
Package           Type    Description           Expiration
----------------- ------- --------------------- --------------------
Base              license Cluster Base License  -
Serial Number: 1-81-0000000000000451515******
Package           Type    Description           Expiration
----------------- ------- --------------------- --------------------
NFS               license NFS License           -
iSCSI             license iSCSI License         -
Serial Number: 1-81-0000000000000451515******
Owner: cluster1-02
Package           Type    Description           Expiration
----------------- ------- --------------------- --------------------
NFS               license NFS License           -
iSCSI             license iSCSI License         -
5 entries were displayed.
8. View subsystem health
cluster1::> system health subsystem show
Subsystem         Health
----------------- ------------------
SAS-connect       ok
Environment       ok
Memory            ok
Service-Processor ok
Switch-Health     ok
CIFS-NDO          ok
Motherboard       ok
IO                ok
MetroCluster      ok
MetroCluster_Node ok
FHM-Switch        ok
FHM-Bridge        ok
12 entries were displayed.
9. View the MetroCluster configuration and node status
cluster1::> metrocluster show
Configuration: fabric
Cluster                        Configuration State    Mode
------------------------------ ---------------------- ------------------------
 Local: cluster1               configured             normal
Remote: cluster1_dr            configured             normal
cluster1::> metrocluster node show
DR                               Configuration  DR
Group Cluster Node               State          Mirroring Mode
----- ------- ------------------ -------------- --------- --------------------
1     cluster1
              cluster1-01        configured     enabled   normal
              cluster1-02        configured     enabled   normal
      cluster1_dr
              cluster1_dr-01     configured     enabled   normal
              cluster1_dr-02     configured     enabled   normal
4 entries were displayed.
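`metrocluster node show` prints the DR group and cluster name only once, above the node rows they apply to, so a column-slicing parser does not fit it well. The Python sketch below splits each line on whitespace instead and carries the current group and cluster forward, then reports any node that differs from the healthy values seen above (`configured`, `enabled`, `normal`). `parse_mcc_nodes`, `problems`, and `EXPECTED` are hypothetical names; the parser assumes the layout shown here.

```python
from __future__ import annotations

SAMPLE = """\
DR                               Configuration  DR
Group Cluster Node               State          Mirroring Mode
----- ------- ------------------ -------------- --------- --------------------
1     cluster1
              cluster1-01        configured     enabled   normal
              cluster1-02        configured     enabled   normal
      cluster1_dr
              cluster1_dr-01     configured     enabled   normal
              cluster1_dr-02     configured     enabled   normal
4 entries were displayed.
"""

EXPECTED = {"state": "configured", "mirroring": "enabled", "mode": "normal"}


def parse_mcc_nodes(text: str) -> list[dict[str, str]]:
    """Parse `metrocluster node show`, where the DR group and cluster
    name are printed once and apply to the node rows beneath them."""
    lines = text.splitlines()
    first = next(i for i, ln in enumerate(lines) if ln.startswith("-----")) + 1
    rows, group, cluster = [], None, None
    for ln in lines[first:]:
        tokens = ln.split()
        if not tokens or ln.rstrip().endswith("displayed."):
            continue
        if tokens[0].isdigit():        # "1     cluster1": a new DR group
            group, tokens = tokens[0], tokens[1:]
        if len(tokens) == 1:           # a cluster name on its own line
            cluster = tokens[0]
            continue
        node, state, mirroring, mode = tokens
        rows.append({"group": group, "cluster": cluster, "node": node,
                     "state": state, "mirroring": mirroring, "mode": mode})
    return rows


def problems(rows: list[dict[str, str]]) -> list[str]:
    """One message per field that differs from EXPECTED."""
    return [f"{r['node']}: {key}={r[key]}"
            for r in rows for key, want in EXPECTED.items() if r[key] != want]


rows = parse_mcc_nodes(SAMPLE)
print(len(rows), problems(rows))  # 4 []
```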
10. View controller status
cluster1::> system controller show
Controller Name           System ID     Serial Number     Model    Status
------------------------- ------------- ----------------- -------- -----------
cluster1-01               536964819     451515******      FAS8040  ok
cluster1-02               536961600     451515******      FAS8040  ok
2 entries were displayed.
11. View failed disks
cluster1::> storage disk show -broken
There are no entries matching your query.
12. View spare disks
cluster1::> storage disk show -spare
Original Owner: cluster1-01
Checksum Compatibility: block
                                                        Usable Physical
Disk            HA Shelf Bay Chan   Pool Type     RPM     Size     Size Owner
--------------- ------------ ---- ------ ----- ------ -------- -------- --------
1.30.11         3a    30  11 A     Pool0 SAS    10000   1.09TB   1.09TB cluster1-01
1.30.13         3a    30  13 A     Pool0 SAS    10000   1.09TB   1.09TB cluster1-01
1.31.4          3a    31   4 A     Pool0 SAS    10000   1.09TB   1.09TB cluster1-01
1.32.20         4b    32  20 B     Pool0 SAS    10000   1.09TB   1.09TB cluster1-01
1.32.23         3a    32  23 A     Pool0 SAS    10000   1.09TB   1.09TB cluster1-01
1.33.0          3a    33   0 A     Pool0 SAS    10000   1.09TB   1.09TB cluster1-01
1.33.1          3a    33   1 A     Pool0 SAS    10000   1.09TB   1.09TB cluster1-01
1.33.10         4b    33  10 B     Pool0 SAS    10000   1.09TB   1.09TB cluster1-01
2.42.22         3a    42  22 A     Pool1 SAS    10000   1.09TB   1.09TB cluster1-01
2.42.23         4b    42  23 B     Pool1 SAS    10000   1.09TB   1.09TB cluster1-01
2.43.2          4b    43   2 B     Pool1 SAS    10000   1.09TB   1.09TB cluster1-01
2.43.22         3b    43  22 A     Pool1 SAS    10000   1.09TB   1.09TB cluster1-01
2.43.23         4b    43  23 B     Pool1 SAS    10000   1.09TB   1.09TB cluster1-01
3.11.21         4b    11  21 B     Pool0 SSD        -  372.4GB  372.6GB cluster1-01
4.20.21         3a    20  21 A     Pool1 SSD        -  372.4GB  372.6GB cluster1-01
4.21.14         3a    21  14 A     Pool1 SAS    10000   1.09TB   1.09TB cluster1-01
Original Owner: cluster1-02
Checksum Compatibility: block
                                                        Usable Physical
Disk            HA Shelf Bay Chan   Pool Type     RPM     Size     Size Owner
--------------- ------------ ---- ------ ----- ------ -------- -------- --------
2.44.23         3b    44  23 A     Pool1 SAS    10000   1.09TB   1.09TB cluster1-02
3.12.21         4a    12  21 B     Pool0 SSD        -  372.4GB  372.6GB cluster1-02
4.23.21         3b    23  21 A     Pool1 SSD        -  372.4GB  372.6GB cluster1-02
5.60.23         3b    60  23 B     Pool1 SAS    10000   1.09TB   1.09TB cluster1-02
20 entries were displayed.
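In a MetroCluster, Pool0 holds the disks at a node's own site and Pool1 the disks at the remote site (in the nodeshell output further below, cluster1-01's Pool0 spares sit behind the SW_A switches and its Pool1 spares behind SW_B), and each mirrored aggregate has a plex in each pool. It therefore makes sense to count spares per owner, pool, and disk type. The Python sketch below does that for the rows transcribed from the output above and lists combinations with no spare at all; in this sample, cluster1-02 has no Pool0 SAS spare. Whether that matters depends on which disk types each node's aggregates use and on the site's spare policy; `spare_counts` and `missing` are hypothetical helpers that only surface the gap.

```python
from collections import Counter

# (disk, pool, type, original owner), transcribed from `storage disk show -spare`.
SPARES = [
    ("1.30.11", "Pool0", "SAS", "cluster1-01"),
    ("1.30.13", "Pool0", "SAS", "cluster1-01"),
    ("1.31.4", "Pool0", "SAS", "cluster1-01"),
    ("1.32.20", "Pool0", "SAS", "cluster1-01"),
    ("1.32.23", "Pool0", "SAS", "cluster1-01"),
    ("1.33.0", "Pool0", "SAS", "cluster1-01"),
    ("1.33.1", "Pool0", "SAS", "cluster1-01"),
    ("1.33.10", "Pool0", "SAS", "cluster1-01"),
    ("2.42.22", "Pool1", "SAS", "cluster1-01"),
    ("2.42.23", "Pool1", "SAS", "cluster1-01"),
    ("2.43.2", "Pool1", "SAS", "cluster1-01"),
    ("2.43.22", "Pool1", "SAS", "cluster1-01"),
    ("2.43.23", "Pool1", "SAS", "cluster1-01"),
    ("3.11.21", "Pool0", "SSD", "cluster1-01"),
    ("4.20.21", "Pool1", "SSD", "cluster1-01"),
    ("4.21.14", "Pool1", "SAS", "cluster1-01"),
    ("2.44.23", "Pool1", "SAS", "cluster1-02"),
    ("3.12.21", "Pool0", "SSD", "cluster1-02"),
    ("4.23.21", "Pool1", "SSD", "cluster1-02"),
    ("5.60.23", "Pool1", "SAS", "cluster1-02"),
]


def spare_counts(spares):
    """Number of spares per (owner, pool, disk type)."""
    return Counter((owner, pool, dtype) for _, pool, dtype, owner in spares)


def missing(spares, owners, pools=("Pool0", "Pool1"), types=("SAS", "SSD")):
    """(owner, pool, type) combinations that have no spare at all."""
    counts = spare_counts(spares)
    return [(o, p, t) for o in owners for p in pools for t in types
            if counts[(o, p, t)] == 0]


print(missing(SPARES, ["cluster1-01", "cluster1-02"]))  # [('cluster1-02', 'Pool0', 'SAS')]
```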
13. Check the FibreBridge (SAS-to-FC) bridges for faults
cluster1::> storage bridge show
                                       Is        Monitor
Bridge                   Symbolic Name Monitored Status  Vendor Model                 Bridge WWN
------------------------ ------------- --------- ------- ------ --------------------- ----------------
ATTO_10.0.15.17          BRIDGE_B_1    true      ok      Atto   FibreBridge 6500N     2000001086627bc0
ATTO_10.0.15.18          BRIDGE_B_2    true      ok      Atto   FibreBridge 6500N     2000001086630f0e
ATTO_10.0.15.19          BRIDGE_B_3    true      ok      Atto   FibreBridge 6500N     2000001086630edc
ATTO_10.0.15.20          BRIDGE_B_4    true      ok      Atto   FibreBridge 6500N     2000001086630ed2
ATTO_10.0.15.6           BRIDGE_A_1    true      ok      Atto   FibreBridge 6500N     2000001086630eb4
ATTO_10.0.15.7           BRIDGE_A_2    true      ok      Atto   FibreBridge 6500N     2000001086630efa
ATTO_10.0.15.8           BRIDGE_A_3    true      ok      Atto   FibreBridge 6500N     2000001086630f18
ATTO_10.0.15.9           BRIDGE_A_4    true      ok      Atto   FibreBridge 6500N     2000001086630ef0
ATTO_FibreBridge6500N_10 -             false     -       Atto   FibreBridge6500N      200000108663e514
ATTO_FibreBridge6500N_11 -             false     -       Atto   FibreBridge6500N      200000108663e3f2
ATTO_FibreBridge6500N_12 -             false     -       Atto   FibreBridge6500N      200000108663e488
ATTO_FibreBridge6500N_13 -             false     -       Atto   FibreBridge6500N      20000010866114ec
ATTO_FibreBridge6500N_14 -             false     -       Atto   FibreBridge6500N      2000001086627bc0
ATTO_FibreBridge6500N_7  -             false     -       Atto   FibreBridge6500N      2000001086630e96
ATTO_FibreBridge6500N_9  -             false     -       Atto   FibreBridge6500N      200000108663e4c4
15 entries were displayed.
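In the output above all eight monitored bridges report `ok`, and seven further entries are not monitored. One of those, ATTO_FibreBridge6500N_14, shares its WWN with the monitored ATTO_10.0.15.17, which may mean the same bridge was discovered twice; that is worth confirming rather than assuming. The Python sketch below (hypothetical `bridge_findings` helper, rows transcribed from the output) reports monitored bridges that are not `ok` and any WWN listed more than once.

```python
from collections import defaultdict

# (bridge, is_monitored, monitor_status, wwn), transcribed from `storage bridge show`.
BRIDGES = [
    ("ATTO_10.0.15.17", "true", "ok", "2000001086627bc0"),
    ("ATTO_10.0.15.18", "true", "ok", "2000001086630f0e"),
    ("ATTO_10.0.15.19", "true", "ok", "2000001086630edc"),
    ("ATTO_10.0.15.20", "true", "ok", "2000001086630ed2"),
    ("ATTO_10.0.15.6", "true", "ok", "2000001086630eb4"),
    ("ATTO_10.0.15.7", "true", "ok", "2000001086630efa"),
    ("ATTO_10.0.15.8", "true", "ok", "2000001086630f18"),
    ("ATTO_10.0.15.9", "true", "ok", "2000001086630ef0"),
    ("ATTO_FibreBridge6500N_10", "false", "-", "200000108663e514"),
    ("ATTO_FibreBridge6500N_11", "false", "-", "200000108663e3f2"),
    ("ATTO_FibreBridge6500N_12", "false", "-", "200000108663e488"),
    ("ATTO_FibreBridge6500N_13", "false", "-", "20000010866114ec"),
    ("ATTO_FibreBridge6500N_14", "false", "-", "2000001086627bc0"),
    ("ATTO_FibreBridge6500N_7", "false", "-", "2000001086630e96"),
    ("ATTO_FibreBridge6500N_9", "false", "-", "200000108663e4c4"),
]


def bridge_findings(bridges):
    """Report monitored bridges that are not 'ok' and WWNs listed more than once."""
    out, names_by_wwn = [], defaultdict(list)
    for name, monitored, status, wwn in bridges:
        names_by_wwn[wwn].append(name)
        if monitored == "true" and status != "ok":
            out.append(f"{name}: monitor status {status}")
    for wwn, names in sorted(names_by_wwn.items()):
        if len(names) > 1:
            out.append(f"WWN {wwn}: {', '.join(names)}")
    return out


for line in bridge_findings(BRIDGES):
    print(line)  # WWN 2000001086627bc0: ATTO_10.0.15.17, ATTO_FibreBridge6500N_14
```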
14. Check the FC switches for faults
cluster1::> storage switch show
                      Symbolic                                      Is        Monitor
Switch                Name     Vendor  Model       Switch WWN       Monitored Status
--------------------- -------- ------- ----------- ---------------- --------- -------
Brocade_10.0.15.10    SW_A_1   Brocade Brocade6505 100050eb1a88327f true      ok
Brocade_10.0.15.11    SW_A_2   Brocade Brocade6505 100050eb1a881582 true      ok
Brocade_10.0.15.21    SW_B_3   Brocade Brocade6505 100050eb1a882f69 true      ok
Brocade_10.0.15.22    SW_B_4   Brocade Brocade6505 100050eb1a881522 true      ok
4 entries were displayed.
15. View storage failover status
cluster1::> storage failover show
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
cluster1-01    cluster1-02    true     Connected to cluster1-02
cluster1-02    cluster1-01    true     Connected to cluster1-01
2 entries were displayed.
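`storage failover show` covers the local HA pair only; the cross-site DR relationship is what `metrocluster node show` reports. For the HA pair, a healthy result has Takeover Possible set to `true` on both nodes and partners that point at each other. The Python sketch below checks both conditions on rows transcribed from the output above; `failover_findings` is a hypothetical helper.

```python
from __future__ import annotations

# Rows transcribed from the `storage failover show` output.
ROWS = [
    {"node": "cluster1-01", "partner": "cluster1-02", "possible": "true",
     "description": "Connected to cluster1-02"},
    {"node": "cluster1-02", "partner": "cluster1-01", "possible": "true",
     "description": "Connected to cluster1-01"},
]


def failover_findings(rows: list[dict[str, str]]) -> list[str]:
    """Flag nodes that cannot take over, or whose partner does not list them back."""
    partner_of = {r["node"]: r["partner"] for r in rows}
    out = []
    for r in rows:
        if r["possible"] != "true":
            out.append(f"{r['node']}: takeover not possible ({r['description']})")
        if partner_of.get(r["partner"]) != r["node"]:
            out.append(f"{r['node']}: partner {r['partner']} does not list it back")
    return out


print(failover_findings(ROWS))  # []
```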
16. View critical and error event logs
cluster1::> event log show -severity critical
There are no entries matching your query.
cluster1::> event log show -severity error
Time                Node             Severity      Event
------------------- ---------------- ------------- ---------------------------
3/6/2018 02:28:30   cluster1-02      ERROR         asup.post.drop: AutoSupport message (HA Group Notification from cluster1-02 (MANAGEMENT_LOG) INFO) for host (0) was not posted to NetApp. The system will drop the message.
3/6/2018 01:28:18   cluster1-02      ERROR         asup.post.drop: AutoSupport message (HA Group Notification from cluster1-02 (PERFORMANCE DATA) INFO) for host (0) was not posted to NetApp. The system will drop the message.
3/6/2018 00:00:07   cluster1-02      ERROR         mgmtgwd.certificate.expired: A digital certificate with Fully Qualified Domain Name (FQDN) cluster1, Serial Number 5589765F, Certificate Authority 'cluster1' and type server for Vserver cluster1 has expired.
3/6/2018 00:00:07   cluster1-02      ERROR         mgmtgwd.certificate.expired: A digital certificate with Fully Qualified Domain Name (FQDN) UC_SVM2, Serial Number 55A03966, Certificate Authority 'SVM2' and type server for Vserver SVM2 has expired.
3/6/2018 00:00:07   cluster1-02      ERROR         mgmtgwd.certificate.expired: A digital certificate with Fully Qualified Domain Name (FQDN) UC_SVM, Serial Number 559FFD76, Certificate Authority 'SVM' and type server for Vserver SVM has expired.
3/6/2018 00:00:07   cluster1-02      ERROR         mgmtgwd.certificate.expired: A digital certificate with Fully Qualified Domain Name (FQDN) UCS_SVM_DR, Serial Number 545845C16E278, Certificate Authority 'SVM_DR' and type server for Vserver SVM_DR-mc has expired.
3/6/2018 00:00:07   cluster1-02      ERROR         mgmtgwd.certificate.expired: A digital certificate with Fully Qualified Domain Name (FQDN) UCS_SVM2_DR, Serial Number 545845A7B01FA, Certificate Authority 'SVM2_DR' and type server for Vserver SVM2_DR-mc has expired.
7 entries were displayed.
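The ERROR entries above fall into two groups: AutoSupport messages that could not be posted to NetApp, and expired certificates for the cluster and several SVMs. The Python sketch below counts entries per (node, event name) and collects the SVM names whose certificates have expired, which turns a long log into a short to-do list. `summarize` and both regular expressions are hypothetical; they assume one entry per line with the layout and message wording shown above, and the sample messages are abbreviated with "...".

```python
import re
from collections import Counter

# ERROR entries from the output above; long messages abbreviated with "...".
EVENTS = [
    "3/6/2018 02:28:30 cluster1-02 ERROR asup.post.drop: AutoSupport message ... was not posted to NetApp.",
    "3/6/2018 01:28:18 cluster1-02 ERROR asup.post.drop: AutoSupport message ... was not posted to NetApp.",
    "3/6/2018 00:00:07 cluster1-02 ERROR mgmtgwd.certificate.expired: ... for Vserver cluster1 has expired.",
    "3/6/2018 00:00:07 cluster1-02 ERROR mgmtgwd.certificate.expired: ... for Vserver SVM2 has expired.",
    "3/6/2018 00:00:07 cluster1-02 ERROR mgmtgwd.certificate.expired: ... for Vserver SVM has expired.",
    "3/6/2018 00:00:07 cluster1-02 ERROR mgmtgwd.certificate.expired: ... for Vserver SVM_DR-mc has expired.",
    "3/6/2018 00:00:07 cluster1-02 ERROR mgmtgwd.certificate.expired: ... for Vserver SVM2_DR-mc has expired.",
]

# date time, node, severity, event name, then the free-text message
LINE_RE = re.compile(r"(\S+ \S+)\s+(\S+)\s+(\S+)\s+([\w.]+):\s*(.*)")
VSERVER_RE = re.compile(r"for Vserver (\S+) has expired")


def summarize(lines):
    """Count events per (node, event) and list SVMs with expired certificates."""
    by_event, expired = Counter(), []
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue
        _, node, _, event, message = m.groups()
        by_event[(node, event)] += 1
        v = VSERVER_RE.search(message)
        if v:
            expired.append(v.group(1))
    return by_event, expired


counts, expired = summarize(EVENTS)
print(expired)  # ['cluster1', 'SVM2', 'SVM', 'SVM_DR-mc', 'SVM2_DR-mc']
```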
17. View the status of the volumes in a given aggregate
cluster1::> vol show -aggregate aggr_data_A1
18. View LUNs and LUN details
cluster1::> lun show
cluster1::> lun show -v
19. View the initiator groups (igroups) used for LUN mapping, and their details
cluster1::> igroup show
cluster1::> igroup show -v
20. View LUN mappings
cluster1::> lun show -m
21. Enter the nodeshell of a specific node
cluster1::> run -node cluster1-01
Type 'exit' or 'Ctrl-D' to return to the CLI
cluster1-01>
22. View spare disks from the nodeshell
cluster1-01> vol status -s
Local spares
Pool1 spare disks
RAID Disk Device            HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)     Phys (MB/blks)
--------- ------            ------------- ---- ---- ---- ----- --------------     --------------
Spare disks for block checksum
spare     SW_B_3:6.126L41   3a    21  14  FC:A   1  SAS  10000 1142352/2339537408 1144641/2344225968 (not zeroed)
spare     SW_B_3:7.126L75   3a    42  22  FC:A   1  SAS  10000 1142352/2339537408 1144641/2344225968
spare     SW_B_3:7.126L101  3b    43  22  FC:A   1  SAS  10000 1142352/2339537408 1144641/2344225968
spare     SW_B_4:7.126L76   4b    42  23  FC:B   1  SAS  10000 1142352/2339537408 1144641/2344225968
spare     SW_B_4:7.126L29   4b    43   2  FC:B   1  SAS  10000 1142352/2339537408 1144641/2344225968
spare     SW_B_4:7.126L50   4b    43  23  FC:B   1  SAS  10000 1142352/2339537408 1144641/2344225968
spare     SW_B_3:6.126L22   3a    20  21  FC:A   1  SSD   N/A  381304/780910592   381554/781422768
Pool0 spare disks
RAID Disk Device            HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)     Phys (MB/blks)
--------- ------            ------------- ---- ---- ---- ----- --------------     --------------
Spare disks for block checksum
spare     SW_A_1:7.126L12   3a    30  11  FC:A   0  SAS  10000 1142352/2339537408 1144641/2344225968
spare     SW_A_1:7.126L14   3a    30  13  FC:A   0  SAS  10000 1142352/2339537408 1144641/2344225968
spare     SW_A_1:7.126L31   3a    31   4  FC:A   0  SAS  10000 1142352/2339537408 1144641/2344225968
spare     SW_A_1:7.126L76   3a    32  23  FC:A   0  SAS  10000 1142352/2339537408 1144641/2344225968
spare     SW_A_1:7.126L79   3a    33   0  FC:A   0  SAS  10000 1142352/2339537408 1144641/2344225968
spare     SW_A_1:7.126L80   3a    33   1  FC:A   0  SAS  10000 1142352/2339537408 1144641/2344225968
spare     SW_A_2:7.126L73   4b    32  20  FC:B   0  SAS  10000 1142352/2339537408 1144641/2344225968
spare     SW_A_2:7.126L37   4b    33  10  FC:B   0  SAS  10000 1142352/2339537408 1144641/2344225968
spare     SW_A_2:6.126L74   4b    11  21  FC:B   0  SSD   N/A  381304/780910592   381554/781422768
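The Used and Phys columns above print "MB/blks". The figures are consistent with a blk being a 512-byte sector and MB meaning MiB: 2,339,537,408 × 512 bytes is 1,142,352.25 MiB, printed as 1142352. The resulting sizes also match the 1.09TB and 372.4GB/372.6GB that `storage disk show -spare` reports for the same disks. Both readings are inferences from the numbers rather than documented facts, so the Python sketch below labels them as assumptions; `blocks_to_mib` and `fmt_binary` are hypothetical helpers, and the output formatting simply imitates the values shown in this article.

```python
SECTOR_BYTES = 512  # assumed size of one "blk" (inferred from the numbers above)
MIB = 2**20


def blocks_to_mib(blocks: int) -> int:
    """Convert a blk count to whole MiB, truncating like the MB column."""
    return blocks * SECTOR_BYTES // MIB


def fmt_binary(mib: int) -> str:
    """Format MiB as 1024-based TB/GB, imitating `storage disk show`."""
    if mib >= MIB:
        return f"{mib / MIB:.2f}TB"
    return f"{mib / 1024:.1f}GB"


# (used_blks, phys_blks) for the SAS and SSD spares shown above.
for kind, used, phys in [("SAS", 2339537408, 2344225968),
                         ("SSD", 780910592, 781422768)]:
    used_mib, phys_mib = blocks_to_mib(used), blocks_to_mib(phys)
    print(kind, used_mib, phys_mib, fmt_binary(used_mib), fmt_binary(phys_mib))
```

This prints `SAS 1142352 1144641 1.09TB 1.09TB` and `SSD 381304 381554 372.4GB 372.6GB`, matching both outputs; the usable capacity is slightly smaller than the physical capacity in each case.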
23. View failed disks from the nodeshell
cluster1-01> vol status -f
Broken disks (empty)
24. Show disks that have no ownership assigned
cluster1-01> disk show -n
disk show : No unassigned disks
25. Assign disk ownership (commonly needed after replacing a disk)
cluster1-01> disk assign all
26. View path and location information for all disks
cluster1-01> storage show disk -p