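Troubleshooting approach
This kind of problem is usually caused by a wrong NIC MAC address: either the MAC address in the NIC configuration file is wrong, or the one in /etc/udev/rules.d/70-persistent-net.rules is.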
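Symptoms
- The bond interface address cannot be pinged;
- On the switch side, the corresponding ports show the following state (irrelevant output omitted):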
<CL202-R04F02-H3CS7610-SW01>display interface Ten-GigabitEthernet 1/2/0/4
Ten-GigabitEthernet1/2/0/4
Current state: UP
Line protocol state: UP
IP packet frame type: Ethernet II, hardware address: 7057-bf25-8a00
......
<CL202-R04F02-H3CS7610-SW01>display interface Ten-GigabitEthernet 2/2/0/4
Ten-GigabitEthernet2/2/0/4
Current state: UP
Line protocol state: DOWN(LAGG)
IP packet frame type: Ethernet II, hardware address: 7057-bf24-b800
......
- Running ifconfig eth2 up and ifconfig eth3 up on the two NICs configured in the bond fails in both cases with an error like: eth2: unknown interface: No such device;
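Fault analysis
- Judging from the error reported when bringing the NICs up by hand (the third symptom above) and the port information on the switch side, a switch fault or a physical link fault can largely be ruled out, so the investigation focuses on the server side. This problem is usually caused by wrong MAC addresses for the server's NICs.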
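Before looking at any configuration file, it can help to see which interface names the kernel actually has, since "No such device" means the name itself is missing. The following is a minimal sketch, assuming a Linux host that exposes interfaces under /sys/class/net; the function name list_nics and its optional directory argument are made up for this illustration.

```shell
# Print "<name> <mac>" for every interface found under a sysfs net directory.
# The directory defaults to /sys/class/net; the optional argument is only
# there so the function can be pointed at a copy (an assumption of this sketch).
list_nics() {
    local root="${1:-/sys/class/net}"
    local dev
    for dev in "$root"/*; do
        # Skip entries that are not interfaces (e.g. the bonding_masters file).
        [ -r "$dev/address" ] || continue
        printf '%s %s\n' "${dev##*/}" "$(cat "$dev/address")"
    done
}
```

Calling list_nics with no argument on the affected host would show whether eth2 and eth3 exist at all, or whether the ports appear under other names.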
Troubleshooting process
- Check the NICs
As shown below, the network-scripts directory has configuration files for 4 NICs: eth0, eth1, eth2, and eth3:
[root@CL202-R04F05-NRGL-MONGODB-SLAVE-NF5180M5-SV01 network-scripts]# ll ifcfg-*
-rw-r--r--. 1 root root 196 Mar 23 15:34 ifcfg-bond0
-rw-r--r-- 1 root root 328 Mar 23 21:02 ifcfg-eth0
-rw-r--r--. 1 root root 212 Mar 23 15:30 ifcfg-eth1
-rw-r--r-- 1 root root 117 May 7 16:58 ifcfg-eth2
-rw-r--r-- 1 root root 117 May 7 16:58 ifcfg-eth3
-rw-r--r--. 1 root root 254 Apr 27 2018 ifcfg-lo
[root@CL202-R04F05-NRGL-MONGODB-SLAVE-NF5180M5-SV01 network-scripts]#
- The contents of /etc/udev/rules.d/70-persistent-net.rules are as follows:
[root@CL202-R04F05-NRGL-MONGODB-SLAVE-NF5180M5-SV01 network-scripts]# more /etc/udev/rules.d/70-persistent-net.rules
# This file was automatically generated by the /lib/udev/write_net_rules
# program, run by the persistent-net-generator.rules rules file.
#
# You can modify it, as long as you keep each rule on a single
# line, and change only the value of the NAME= key.
# PCI device 0x8086:0x1521 (igb)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="50:af:73:2e:5c:37", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
# PCI device 0x8086:0x1521 (igb)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="50:af:73:2e:5c:38", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"
# PCI device 0x8086:0x37d3 (i40e)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="6c:92:bf:c5:a8:28", ATTR{type}=="1", KERNEL=="eth*", NAME="eth2"
# PCI device 0x8086:0x37d3 (i40e)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="6c:92:bf:c5:a8:29", ATTR{type}=="1", KERNEL=="eth*", NAME="eth3"
# PCI device 0x8086:0x37d3 (i40e)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="6c:92:bf:a3:ac:49", ATTR{type}=="1", KERNEL=="eth*", NAME="eth4"
# PCI device 0x8086:0x37d3 (i40e)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="6c:92:bf:a3:ac:48", ATTR{type}=="1", KERNEL=="eth*", NAME="eth5"
[root@CL202-R04F05-NRGL-MONGODB-SLAVE-NF5180M5-SV01 network-scripts]#
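- The problem: the network-scripts directory only has configuration files for 4 NICs (eth0, eth1, eth2, and eth3), yet /etc/udev/rules.d/70-persistent-net.rules contains two extra entries, eth4 and eth5. In addition, the MAC addresses recorded in the eth2 and eth3 configuration files (in the commented-out HWADDR lines) are the ones the rules file still binds to eth2 and eth3, not the ones listed under eth4 and eth5. The eth2 and eth3 configuration files are as follows: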
[root@CL202-R04F05-NRGL-MONGODB-SLAVE-NF5180M5-SV01 network-scripts]# cat ifcfg-eth2
DEVICE="eth2"
#HWADDR="6c:92:bf:c5:a8:28"
ONBOOT=yes
BOOTPROTO=none
TYPE=Ethernet
NAME="eth2"
MASTER=bond0
SLAVE=yes
[root@CL202-R04F05-NRGL-MONGODB-SLAVE-NF5180M5-SV01 network-scripts]#
[root@CL202-R04F05-NRGL-MONGODB-SLAVE-NF5180M5-SV01 network-scripts]# cat ifcfg-eth3
DEVICE="eth3"
#HWADDR="6c:92:bf:c5:a8:29"
ONBOOT=yes
BOOTPROTO=none
TYPE=Ethernet
NAME="eth3"
MASTER=bond0
SLAVE=yes
[root@CL202-R04F05-NRGL-MONGODB-SLAVE-NF5180M5-SV01 network-scripts]#
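- As the output above shows, the MAC addresses recorded for eth2 and eth3 (6c:92:bf:c5:a8:28 and 6c:92:bf:c5:a8:29, in both the commented-out HWADDR lines and the eth2/eth3 rules) differ from the ones listed under eth4 and eth5 in /etc/udev/rules.d/70-persistent-net.rules (6c:92:bf:a3:ac:49 and 6c:92:bf:a3:ac:48);
- Logging in to the IPMI remotely shows the host's MAC address information, as in the figure below:
- From the information above it can be concluded that the eth2 and eth3 MAC addresses in the configuration files are wrong.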
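The comparison done by eye above can also be scripted. This is a rough sketch, assuming the rules file uses the same one-rule-per-line layout as the listing above; the helper names rules_pairs and names_without_ifcfg are invented for this example.

```shell
# Print "<name> <mac>" for each rule in a 70-persistent-net.rules style file.
rules_pairs() {
    sed -n 's/.*ATTR{address}=="\([^"]*\)".*NAME="\([^"]*\)".*/\2 \1/p' "$1"
}

# Print the rules whose interface name has no ifcfg-<name> file in cfg_dir.
# Arguments: the rules file, then the directory holding the ifcfg-* files.
names_without_ifcfg() {
    local rules="$1" cfg_dir="$2" name mac
    rules_pairs "$rules" | while read -r name mac; do
        [ -e "$cfg_dir/ifcfg-$name" ] || printf '%s %s\n' "$name" "$mac"
    done
}
```

Run as names_without_ifcfg /etc/udev/rules.d/70-persistent-net.rules /etc/sysconfig/network-scripts, this would list eth4 and eth5 on the host described here, which is the hint that the hardware changed underneath the existing configuration.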
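Why the MAC addresses were wrong
This machine had previously been sent in for repair and its NIC was replaced, so the NIC MAC addresses changed. The eth2 and eth3 MAC addresses in /etc/udev/rules.d/70-persistent-net.rules and in the NIC configuration files were not updated to match; instead, two extra entries, eth4 and eth5, were added to the rules file for the new MAC addresses. The bond configuration still uses eth2 and eth3, and no device with those names exists any more, so the network fails and bringing the NICs up by hand reports unknown interface: No such device.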
Solution
Edit the NIC configuration files and /etc/udev/rules.d/70-persistent-net.rules. The corrected configuration is as follows:
- /etc/udev/rules.d/70-persistent-net.rules
[root@CL202-R04F05-NRGL-MONGODB-SLAVE-NF5180M5-SV01 network-scripts]# cat /etc/udev/rules.d/70-persistent-net.rules
# This file was automatically generated by the /lib/udev/write_net_rules
# program, run by the persistent-net-generator.rules rules file.
#
# You can modify it, as long as you keep each rule on a single
# line, and change only the value of the NAME= key.
# PCI device 0x8086:0x1521 (igb)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="50:af:73:2e:5c:37", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
# PCI device 0x8086:0x1521 (igb)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="50:af:73:2e:5c:38", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"
# PCI device 0x8086:0x37d3 (i40e)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="6c:92:bf:a3:ac:48", ATTR{type}=="1", KERNEL=="eth*", NAME="eth2"
# PCI device 0x8086:0x37d3 (i40e)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="6c:92:bf:a3:ac:49", ATTR{type}=="1", KERNEL=="eth*", NAME="eth3"
[root@CL202-R04F05-NRGL-MONGODB-SLAVE-NF5180M5-SV01 network-scripts]#
- ifcfg-eth2
[root@CL202-R04F05-NRGL-MONGODB-SLAVE-NF5180M5-SV01 network-scripts]# cat ifcfg-eth2
DEVICE="eth2"
#HWADDR="6c:92:bf:a3:ac:48"
ONBOOT=yes
BOOTPROTO=none
TYPE=Ethernet
NAME="eth2"
MASTER=bond0
SLAVE=yes
[root@CL202-R04F05-NRGL-MONGODB-SLAVE-NF5180M5-SV01 network-scripts]#
- ifcfg-eth3
[root@CL202-R04F05-NRGL-MONGODB-SLAVE-NF5180M5-SV01 network-scripts]# cat ifcfg-eth3
DEVICE="eth3"
#HWADDR="6c:92:bf:a3:ac:49"
ONBOOT=yes
BOOTPROTO=none
TYPE=Ethernet
NAME="eth3"
MASTER=bond0
SLAVE=yes
[root@CL202-R04F05-NRGL-MONGODB-SLAVE-NF5180M5-SV01 network-scripts]#
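The rules file above was edited by hand. For anyone who prefers to script the same change, here is one possible sketch using sed filters; drop_rule and rename_rule are hypothetical helper names, and the approach assumes each rule sits on a single line, as the file's own header requires.

```shell
# Both helpers read rules on stdin and write the result to stdout,
# so the real file is never modified in place.

# Remove the rule that matches the given (stale) MAC address.
drop_rule() {
    sed "/ATTR{address}==\"$1\"/d"
}

# Change the NAME= value of the rule that matches the given MAC address.
rename_rule() {
    sed "/ATTR{address}==\"$1\"/ s/NAME=\"[^\"]*\"/NAME=\"$2\"/"
}
```

For the host in this post the pipeline would be drop_rule 6c:92:bf:c5:a8:28 and drop_rule 6c:92:bf:c5:a8:29, followed by rename_rule 6c:92:bf:a3:ac:48 eth2 and rename_rule 6c:92:bf:a3:ac:49 eth3, with the output written to a temporary file and reviewed before it replaces the original. The "# PCI device" comment lines of the dropped rules are left behind; they are harmless but can be deleted by hand.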
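The crucial last step: reboot the host
After editing the configuration files I tried restarting the network, but that still did not work. Only after rebooting the host did everything fall into place, and the network came back right away.
Note: restarting the network before fixing the MAC addresses had no effect either.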
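After the reboot, one way to confirm that both ports really joined the bond is to read the bonding driver's status file. The sketch below assumes the bond is named bond0 and that the kernel exposes /proc/net/bonding/bond0 in its usual "Key: value" layout; the function name bond_slaves is made up here.

```shell
# Print "<slave> <MII status> <permanent MAC>" for each slave of a bond.
# The status file defaults to /proc/net/bonding/bond0; another path can be
# passed as the first argument.
bond_slaves() {
    awk -F': ' '
        /^Slave Interface:/   { name = $2 }
        # The bond itself also has an MII Status line; ignore it until
        # the first slave section has started.
        /^MII Status:/        { if (name != "") status = $2 }
        /^Permanent HW addr:/ { if (name != "") print name, status, $2 }
    ' "${1:-/proc/net/bonding/bond0}"
}
```

On a healthy host this should print one line each for eth2 and eth3, with status up and the permanent MAC addresses matching the ones shown by the IPMI.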