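Case scenario:
As shown in the figure, 7609-1 and 7609-2 are the core devices in the network and run HSRP. 7609-1 connects to WLC-1 and 7609-2 connects to WLC-2. The Redundancy Ports (RP) of WLC-1 and WLC-2 are connected directly to each other.
The WLC management address is 192.168.53.1/24, and the RMI addresses are 192.168.53.3 and 192.168.53.4 respectively.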
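As a quick sanity check on this addressing: the RMI is expected to sit in the same subnet as the management interface (see the RMI description below). A minimal sketch using Python's standard-library `ipaddress` module, with the addresses from this case hard-coded; the function name `rmi_in_mgmt_subnet` is hypothetical.

```python
import ipaddress

# Addresses from this case: the management interface and the two RMI addresses.
MGMT_INTERFACE = ipaddress.ip_interface("192.168.53.1/24")
RMI_ADDRESSES = {"WLC-1": "192.168.53.3", "WLC-2": "192.168.53.4"}


def rmi_in_mgmt_subnet(mgmt: ipaddress.IPv4Interface, rmi: str) -> bool:
    """Return True if the RMI address lies inside the management interface's subnet."""
    return ipaddress.ip_address(rmi) in mgmt.network


for wlc, rmi in RMI_ADDRESSES.items():
    verdict = "OK" if rmi_in_mgmt_subnet(MGMT_INTERFACE, rmi) else "NOT in management subnet"
    print(f"{wlc} RMI {rmi}: {verdict}")
```

Both RMI addresses fall inside 192.168.53.0/24, so both checks pass for this deployment.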
Key concepts: RMI and RP
Redundancy Management Interface
The IP address on this interface should be configured in the same subnet as the management interface. This interface checks the health of the Active WLC via the network infrastructure once the Active WLC does not respond to keepalive messages on the Redundancy Port. This provides an additional health check of the network and the Active WLC, and confirms whether a switchover should be executed. The Standby WLC also uses this interface to source ICMP ping packets to check gateway reachability. This interface is also used to send notifications from the Active WLC to the Standby WLC in the event of a box failure or manual reset. The Standby WLC uses this interface to communicate with the syslog, NTP, and TFTP servers (the latter for any configuration upload).
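The switchover check described above can be modeled as a small decision function from the Standby WLC's point of view. This is a minimal sketch of a simplified reading of the text, not Cisco's implementation: the names `Decision`, `PeerHealth`, and `standby_decision` are hypothetical, and the real controller's timers, retry counts, and any further action it takes when only the RP fails are not modeled.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    """Outcomes of the simplified check (names are hypothetical)."""
    PEER_OK = "Active answers RP keepalives: nothing to do"
    NO_SWITCHOVER = "RP keepalives lost, but Active still reachable via RMI: no switchover"
    SWITCHOVER = "Active silent on both RP and RMI: switch over"


@dataclass
class PeerHealth:
    # Hypothetical inputs; the real WLC derives these from its own timers and retries.
    rp_keepalive_ok: bool  # keepalive exchange with the Active over the Redundancy Port succeeds
    rmi_reachable: bool    # the Active answers the health check sent via the RMI


def standby_decision(health: PeerHealth) -> Decision:
    """RMI is consulted only after RP keepalives fail, as the text describes."""
    if health.rp_keepalive_ok:
        return Decision.PEER_OK
    if health.rmi_reachable:
        return Decision.NO_SWITCHOVER
    return Decision.SWITCHOVER


for rp_ok, rmi_ok in [(True, True), (False, True), (False, False)]:
    print(rp_ok, rmi_ok, "->", standby_decision(PeerHealth(rp_ok, rmi_ok)).value)
```

The point of the second check is that a lost RP link alone does not prove the Active is dead; only when the Active is also silent across the network path does the sketch return a switchover.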
Redundancy Port
This interface has a very important role in the new HA architecture. Bulk configuration during bootup and incremental configuration changes are synced from the Active WLC to the Standby WLC over the Redundancy Port. WLCs in an HA setup use this port to perform HA role negotiation. The Redundancy Port is also used to check peer reachability: the Standby WLC sends UDP keepalive messages to the Active WLC every 100 ms (default timer). In the event of a box failure, the Active WLC sends a notification to the Standby WLC via the Redundancy Port. If no NTP server is configured, a manual time sync is performed from the Active WLC to the Standby WLC over the Redundancy Port. On a standalone controller this port (and on a WiSM2, the redundancy VLAN) is assigned an auto-generated IP address whose first two octets are always 169.254 and whose last two octets are taken from the last two octets of the Redundancy Management Interface.
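The auto-generated RP address rule above is simple byte arithmetic. A minimal sketch applying it to this case's RMI addresses; the function name `rp_address_from_rmi` is hypothetical, and it only reproduces the rule as stated here.

```python
import ipaddress


def rp_address_from_rmi(rmi: str) -> ipaddress.IPv4Address:
    """Derive the auto-generated Redundancy Port address from an RMI address:
    the first two octets are always 169.254, the last two are copied from the RMI."""
    rmi_bytes = ipaddress.IPv4Address(rmi).packed  # 4 raw bytes of the RMI address
    return ipaddress.IPv4Address(bytes([169, 254]) + rmi_bytes[2:])


for wlc, rmi in {"WLC-1": "192.168.53.3", "WLC-2": "192.168.53.4"}.items():
    print(f"{wlc}: RMI {rmi} -> RP {rp_address_from_rmi(rmi)}")
```

For this deployment the two RP addresses come out as 169.254.53.3 and 169.254.53.4.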
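Failure:
Resource utilization on the 7609s (for example, CPU) reached 100%, and wireless service traffic was affected and stopped working properly. Management traffic was probably affected as well, which may have disrupted RMI communication between the WLCs and left the HA state between the Active and Standby units abnormal. An SSO switchover may also have occurred.
To relieve the problem on the core, the 7609s were rebooted. After the reboot, the line cards connecting both WLCs were down. The primary WLC therefore found all of its uplink interfaces down and could reach neither the gateway nor the standby unit's RMI, so it entered maintenance mode on its own, with all interfaces administratively down. As a result, the management IP was unreachable. As a stopgap, the standby WLC was used to keep the wireless network running while waiting for the 7609 line cards to be replaced.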
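Recovery:
After the 7609 line cards were replaced, the next step was to try to restore WLC HA.
While the 7609 line cards were being replaced, we had tried rebooting the primary WLC (with no cables connected), but it still ended up in maintenance mode because all of its ports were down.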
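Recovery steps:
1. Check whether the primary WLC boots normally in standalone mode. Disable SSO (config redundancy mode disable), then reboot the controller (reset system). After the reboot, the controller came up normally and its configuration was intact.
2. Prepare to restore WLC HA: power off the standby WLC and restore its original cabling to 7609-2.
3. On the primary WLC, enable the ports (config port adminmode all enable), then enable SSO (config redundancy mode sso).
4. Immediately restore the cable between the primary WLC and 7609-1, then immediately power on the standby WLC.
5. Observe the boot and role negotiation of the primary and standby WLCs.
6. Check the WLC HA state, AP join status, service traffic, and so on.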
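The procedure above is order-sensitive: ports are enabled before SSO, and the primary's uplink is restored before the standby is powered on. As a sketch, the sequence can be kept as data and printed as a checklist. The structure and names (`Step`, `RECOVERY_STEPS`, `format_runbook`) are hypothetical; the config and reset commands are the ones quoted in the steps, while the two show commands for step 6 are assumed verification checks that the original does not specify.

```python
from typing import NamedTuple, Optional


class Step(NamedTuple):
    source_step: int          # step number in the procedure above
    device: str               # where the action is performed
    action: str               # what to do
    command: Optional[str]    # AireOS CLI command, or None for physical/observation steps


RECOVERY_STEPS = [
    Step(1, "Primary WLC", "Disable SSO to test a standalone boot", "config redundancy mode disable"),
    Step(1, "Primary WLC", "Reboot and confirm the configuration is intact", "reset system"),
    Step(2, "Standby WLC", "Power off; restore cabling to 7609-2", None),
    Step(3, "Primary WLC", "Enable all ports", "config port adminmode all enable"),
    Step(3, "Primary WLC", "Re-enable SSO", "config redundancy mode sso"),
    Step(4, "Primary WLC", "Immediately reconnect the cable to 7609-1", None),
    Step(4, "Standby WLC", "Immediately power on", None),
    Step(5, "Both WLCs", "Watch boot and role negotiation", None),
    # Assumed verification commands (not from the original text):
    Step(6, "Active WLC", "Check HA state", "show redundancy summary"),
    Step(6, "Active WLC", "Check AP join", "show ap summary"),
]


def format_runbook(steps: list[Step]) -> list[str]:
    """Render each step as one checklist line, appending its CLI command if any."""
    lines = []
    for step in steps:
        suffix = f"  -> {step.command}" if step.command else ""
        lines.append(f"Step {step.source_step}: [{step.device}] {step.action}{suffix}")
    return lines


for line in format_runbook(RECOVERY_STEPS):
    print(line)
```

Keeping the steps as data makes the ordering constraints easy to review before touching production hardware.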
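Other: If the standby unit is working normally, you can simply reconnect the primary WLC and reboot it. The primary WLC should then come up in standby-hot mode; afterwards, run config redundancy force-switchover on the Active unit to switch roles manually.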