Reference: http://www.just4coding.com/2020/04/20/vxlan-fdb/
See also, VXLAN communication in practice with BGP EVPN:
http://www.just4coding.com/2020/04/26/vxlan-evpn/
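Below we work through an example that updates the FDB table by hand to get VXLAN communication working. The lab environment is as follows: each VTEP uses a local Linux bridge to attach the veth pair virtual NIC that leads into a network namespace. The goal is Layer 2 reachability from 3.3.3.3 to 3.3.3.4.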
Run the setup commands on each of the two hosts to build the environment. The commands on Host1 are:
sysctl -w net.ipv4.ip_forward=1
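The VXLAN device is created with nolearning, which disables source address learning. Host2 is configured the same way, with the IP addresses changed accordingly:
sysctl -w net.ipv4.ip_forward=1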
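The rest of the setup script is not reproduced here, so the following is only a sketch of what such a setup could look like. The names br1, ns1 and vxlan100, the VNI 100, port 4789, and the addresses 192.168.33.15 and 3.3.3.3/24 all appear in commands and output later in this walkthrough; the veth names veth0 and veth1 and the function name vtep_setup_cmds are assumptions. The function only prints the commands, so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Print (do not run) the commands that would build one VTEP host.
# Usage: vtep_setup_cmds LOCAL_UNDERLAY_IP NAMESPACE_IP/PREFIX
vtep_setup_cmds() {
    local local_ip ns_ip
    local_ip=$1
    ns_ip=$2
    cat <<EOF
sysctl -w net.ipv4.ip_forward=1
ip netns add ns1
ip link add veth0 type veth peer name veth1
ip link set veth1 netns ns1
ip netns exec ns1 ip link set veth1 name eth0
ip netns exec ns1 ip addr add ${ns_ip} dev eth0
ip netns exec ns1 ip link set lo up
ip netns exec ns1 ip link set eth0 up
ip link add br1 type bridge
ip link set br1 up
ip link set veth0 master br1
ip link set veth0 up
ip link add vxlan100 type vxlan id 100 dstport 4789 local ${local_ip} nolearning
ip link set vxlan100 master br1
ip link set vxlan100 up
EOF
}

# Host1 in this walkthrough (veth0/veth1 are hypothetical names):
vtep_setup_cmds 192.168.33.15 3.3.3.3/24
```

For Host2 the same function would be called with 192.168.33.16 and 3.3.3.4/24.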
From ns1 on Host1, try to reach 3.3.3.4 on Host2. At this point there is no connectivity:
[root@bogon ~]# ip netns exec ns1 ping -c2 3.3.3.4
PING 3.3.3.4 (3.3.3.4) 56(84) bytes of data.
From 3.3.3.3 icmp_seq=1 Destination Host Unreachable
From 3.3.3.3 icmp_seq=2 Destination Host Unreachable
--- 3.3.3.4 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1058ms
pipe 2
A capture on vxlan100 shows only ARP requests, with no reply:
[root@bogon ~]# tcpdump -i vxlan100 arp -env
tcpdump: listening on vxlan100, link-type EN10MB (Ethernet), capture size 262144 bytes
09:15:25.874319 3a:c1:45:47:6a:53 > Broadcast, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 3.3.3.4 tell 3.3.3.3, length 28
09:15:26.932482 3a:c1:45:47:6a:53 > Broadcast, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 3.3.3.4 tell 3.3.3.3, length 28
09:15:27.972485 3a:c1:45:47:6a:53 > Broadcast, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 3.3.3.4 tell 3.3.3.3, length 28
The interfaces inside ns1 on each host (ubuntu is Host2, bogon is Host1). Note the MAC address of each eth0:
root@ubuntu:/home/ubuntu# ip netns exec ns1 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0@if442: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 6e:5c:e1:bb:30:82 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 3.3.3.4/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::6c5c:e1ff:febb:3082/64 scope link
       valid_lft forever preferred_lft forever
[root@bogon ~]# ip netns exec ns1 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0@if18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 3a:c1:45:47:6a:53 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 3.3.3.3/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::38c1:45ff:fe47:6a53/64 scope link
       valid_lft forever preferred_lft forever
On each host, add a unicast FDB entry that maps the remote MAC address to the remote VTEP:
[root@bogon ~]# bridge fdb append 6e:5c:e1:bb:30:82 dev vxlan100 dst 192.168.33.16
root@ubuntu:/home/ubuntu# bridge fdb append 3a:c1:45:47:6a:53 dev vxlan100 dst 192.168.33.15
The ping still fails, because the ARP request is a broadcast and no FDB entry covers it yet:
[root@bogon ~]# ip netns exec ns1 ping -c2 3.3.3.4
PING 3.3.3.4 (3.3.3.4) 56(84) bytes of data.
^C
--- 3.3.3.4 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1017ms
Add one more FDB entry on each host, this time an all-zero entry:
root@ubuntu:/home/ubuntu# bridge fdb append 00:00:00:00:00:00 dev vxlan100 dst 192.168.33.15
[root@bogon ~]# bridge fdb append 00:00:00:00:00:00 dev vxlan100 dst 192.168.33.16
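The all-zero entry means that when no MAC address matches, the frame is sent to the VTEP named in that entry. It is used to handle BUM (broadcast, unknown unicast, multicast) traffic.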
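The fallback can be modelled in a few lines of shell. This is a toy lookup over a text table, not the kernel's implementation; the function name fdb_lookup and the "MAC VTEP_IP" table format are assumptions made for illustration.

```shell
#!/bin/sh
# Toy model of the VXLAN FDB lookup: print the VTEP(s) that a frame with
# destination MAC $1 would be sent to.
# $2 is the table, one "MAC VTEP_IP" pair per line (assumed format).
fdb_lookup() {
    local dst_mac table hits
    dst_mac=$1
    table=$2
    # Try an exact match on the destination MAC first.
    hits=$(printf '%s\n' "$table" | awk -v m="$dst_mac" '$1 == m { print $2 }')
    if [ -z "$hits" ]; then
        # No match: fall back to every all-zero entry (BUM handling).
        hits=$(printf '%s\n' "$table" |
            awk '$1 == "00:00:00:00:00:00" { print $2 }')
    fi
    [ -n "$hits" ] || return 1   # no entry at all: nowhere to send the frame
    printf '%s\n' "$hits"
}

# FDB of Host1 after the two "bridge fdb append" commands in this walkthrough.
fdb='6e:5c:e1:bb:30:82 192.168.33.16
00:00:00:00:00:00 192.168.33.16'

fdb_lookup 6e:5c:e1:bb:30:82 "$fdb"   # known unicast MAC
fdb_lookup ff:ff:ff:ff:ff:ff "$fdb"   # ARP broadcast, served by the all-zero entry
```

With several all-zero entries in the table, the function prints one VTEP per line, which corresponds to replicating the frame to each of them.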
Now 3.3.3.4 is reachable:
[root@bogon ~]# ip netns exec ns1 ping -c2 3.3.3.4
PING 3.3.3.4 (3.3.3.4) 56(84) bytes of data.
64 bytes from 3.3.3.4: icmp_seq=1 ttl=64 time=0.542 ms
64 bytes from 3.3.3.4: icmp_seq=2 ttl=64 time=0.136 ms
--- 3.3.3.4 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1038ms
rtt min/avg/max/mdev = 0.136/0.339/0.542/0.203 ms
[root@bogon ~]# bridge fdb show brport vxlan100
66:7c:ea:83:54:db master br1
6e:5c:e1:bb:30:82 master br1
3a:01:87:9f:cb:e3 vlan 1 master br1 permanent
3a:01:87:9f:cb:e3 master br1 permanent
00:00:00:00:00:00 dst 192.168.33.16 self permanent
6e:5c:e1:bb:30:82 dst 192.168.33.16 self permanent
Delete the neighbor entry inside ns1 and ping again, so that a fresh ARP exchange takes place:
[root@bogon ~]# ip netns exec ns1 ip n
3.3.3.4 dev eth0 lladdr 6e:5c:e1:bb:30:82 STALE
[root@bogon ~]# ip netns exec ns1 ip n del 3.3.3.4 dev eth0 lladdr 6e:5c:e1:bb:30:82
[root@bogon ~]# ip netns exec ns1 ip n
[root@bogon ~]# ip netns exec ns1 ping -c2 3.3.3.4
PING 3.3.3.4 (3.3.3.4) 56(84) bytes of data.
64 bytes from 3.3.3.4: icmp_seq=1 ttl=64 time=0.521 ms
64 bytes from 3.3.3.4: icmp_seq=2 ttl=64 time=0.129 ms
--- 3.3.3.4 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1048ms
rtt min/avg/max/mdev = 0.129/0.325/0.521/0.196 ms
The ARP packets seen on vxlan100 during that ping:
[root@bogon ~]# tcpdump -i vxlan100 arp -env
tcpdump: listening on vxlan100, link-type EN10MB (Ethernet), capture size 262144 bytes
09:51:52.604485 3a:c1:45:47:6a:53 > Broadcast, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 3.3.3.4 tell 3.3.3.3, length 28
09:51:52.604698 6e:5c:e1:bb:30:82 > 3a:c1:45:47:6a:53, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Reply 3.3.3.4 is-at 6e:5c:e1:bb:30:82, length 28
09:51:57.818120 6e:5c:e1:bb:30:82 > 3a:c1:45:47:6a:53, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 3.3.3.3 tell 3.3.3.4, length 28
09:51:57.818161 3a:c1:45:47:6a:53 > 6e:5c:e1:bb:30:82, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Reply 3.3.3.3 is-at 3a:c1:45:47:6a:53, length 28
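If we know which VTEP each MAC address sits behind, the VXLAN device can answer ARP requests on behalf of remote hosts. This keeps ARP broadcasts local instead of flooding them across the whole VXLAN network. The Linux VXLAN device supports this ARP proxying through the proxy parameter.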
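A proxying device needs two static entries per remote host: a neighbor entry (IP to MAC) so it can answer the ARP request, and an FDB entry (MAC to VTEP) so the unicast traffic that follows can be tunnelled. As a sketch, the helper below turns one "IP MAC VTEP_IP" record per line into the matching commands. The record format and the name proxy_entry_cmds are assumptions; the commands are printed rather than executed.

```shell
#!/bin/sh
# Print the static entries a proxying VXLAN device ($1) needs for each
# remote host. Input on stdin: "IP MAC VTEP_IP" per line (assumed format).
proxy_entry_cmds() {
    local dev ip mac vtep
    dev=$1
    while read -r ip mac vtep; do
        [ -n "$ip" ] || continue   # skip blank lines
        # IP -> MAC: lets the device answer ARP requests locally.
        echo "ip neighbor replace $ip lladdr $mac dev $dev nud permanent"
        # MAC -> VTEP: tunnels the unicast frames that follow.
        echo "bridge fdb append $mac dev $dev dst $vtep"
    done
}

# The remote host as seen from Host1 in this walkthrough.
echo "3.3.3.4 6e:5c:e1:bb:30:82 192.168.33.16" | proxy_entry_cmds vxlan100
```

Piping the output to a root shell would apply the entries; keeping generation and execution separate makes it easy to inspect what will change first.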
Re-create vxlan100 with proxy enabled on both hosts:
root@ubuntu:/home/ubuntu# vi vxlan2.sh
ip link del vxlan100
ip link add vxlan100 type vxlan id 100 dstport 4789 local 192.168.33.16 nolearning proxy
ip link set vxlan100 master br1
ip link set up vxlan100
bridge fdb append 00:00:00:00:00:00 dev vxlan100 dst 192.168.33.15
bridge fdb append 3a:c1:45:47:6a:53 dev vxlan100 dst 192.168.33.15
[root@bogon ~]# vi vxlan2.sh
ip link del vxlan100
ip link add vxlan100 type vxlan id 100 dstport 4789 local 192.168.33.15 nolearning proxy
ip link set vxlan100 master br1
ip link set up vxlan100
bridge fdb append 00:00:00:00:00:00 dev vxlan100 dst 192.168.33.16
bridge fdb append 6e:5c:e1:bb:30:82 dev vxlan100 dst 192.168.33.16
With proxy enabled but no neighbor entries on vxlan100 yet, the ARP request goes unanswered:
[root@bogon ~]# ip netns exec ns1 ping -c2 3.3.3.4
PING 3.3.3.4 (3.3.3.4) 56(84) bytes of data.
From 3.3.3.3 icmp_seq=1 Destination Host Unreachable
From 3.3.3.3 icmp_seq=2 Destination Host Unreachable
--- 3.3.3.4 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1037ms
pipe 2
[root@bogon ~]# tcpdump -i vxlan100 arp -env
tcpdump: listening on vxlan100, link-type EN10MB (Ethernet), capture size 262144 bytes
10:02:08.774777 3a:c1:45:47:6a:53 > Broadcast, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 3.3.3.4 tell 3.3.3.3, length 28
10:02:09.812478 3a:c1:45:47:6a:53 > Broadcast, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 3.3.3.4 tell 3.3.3.3, length 28
10:02:10.852463 3a:c1:45:47:6a:53 > Broadcast, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 3.3.3.4 tell 3.3.3.3, length 28
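For the VXLAN device to answer ARP itself, add a neighbor entry for the remote host on vxlan100 (the entry for 3.3.3.4 on bogon, the entry for 3.3.3.3 on ubuntu):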
ip neighbor add 3.3.3.4 lladdr 6e:5c:e1:bb:30:82 dev vxlan100
ip neighbor add 3.3.3.3 lladdr 3a:c1:45:47:6a:53 dev vxlan100
10:06:28.394594 3a:c1:45:47:6a:53 > Broadcast, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 3.3.3.4 tell 3.3.3.3, length 28
10:06:28.394608 6e:5c:e1:bb:30:82 > 3a:c1:45:47:6a:53, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Reply 3.3.3.4 is-at 6e:5c:e1:bb:30:82, length 28
Note that 6e:5c:e1:bb:30:82 is not the MAC address of vxlan100; it is the MAC address of 3.3.3.4:
[root@bogon ~]# ip a show vxlan100
21: vxlan100: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br1 state UNKNOWN group default qlen 1000
    link/ether da:96:9f:d2:0c:5e brd ff:ff:ff:ff:ff:ff
    inet6 fe80::d896:9fff:fed2:c5e/64 scope link
       valid_lft forever preferred_lft forever
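The Linux VXLAN device also supports notifications for lookup misses. When the kernel cannot find a matching entry in the ARP table or the FDB table, it can send a notification as a NETLINK message. A user-space process can listen for these messages and add the missing entries, which makes dynamic table maintenance possible. The VXLAN device supports two kinds of message:
- L2MISS: the VXLAN device cannot find, in the FDB table, the VTEP IP address that the destination MAC address belongs to. An L2MISS message is sent only when all of the following hold:
  - The destination MAC address is unknown, that is, the FDB table has no entry for it
  - The FDB table has no all-zero entry
  - The destination MAC address is not a multicast or broadcast address
- L3MISS: the VXLAN device cannot find, in the ARP table, the MAC address for the destination IP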
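The three L2MISS conditions listed above can be written as a small predicate. This is a toy model of the rule as stated, not kernel code; the names fdb_has and would_raise_l2miss and the "MAC VTEP_IP" table format are assumptions, and MAC addresses are assumed to be lowercase.

```shell
#!/bin/sh
# Succeeds if table $2 ("MAC VTEP_IP" per line) has an entry for MAC $1.
fdb_has() {
    printf '%s\n' "$2" |
        awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Succeeds (exit status 0) when an L2MISS would be raised for destination
# MAC $1 against FDB table $2, following the three conditions above.
would_raise_l2miss() {
    local dst table
    dst=$1
    table=$2
    if fdb_has "$dst" "$table"; then return 1; fi              # entry exists
    if fdb_has 00:00:00:00:00:00 "$table"; then return 1; fi   # all-zero entry exists
    # Multicast and broadcast MACs have the lowest bit of the first octet set.
    [ $(( 0x${dst%%:*} & 1 )) -eq 0 ]
}

empty=''
if would_raise_l2miss 6e:5c:e1:bb:30:82 "$empty"; then
    echo "unknown unicast, empty FDB: L2MISS"
fi
if ! would_raise_l2miss ff:ff:ff:ff:ff:ff "$empty"; then
    echo "broadcast: no L2MISS"
fi
```

The last check relies on the group bit of the MAC address, which covers broadcast as well as multicast destinations.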
On bogon, delete vxlan100 and re-create the vxlan100 interface with l2miss and l3miss enabled:
[root@bogon ~]# vi vxlan3.sh
ip link del vxlan100
ip link add vxlan100 type vxlan id 100 dstport 4789 local 192.168.33.15 nolearning proxy l2miss l3miss
ip link set vxlan100 master br1
ip link set up vxlan100
[root@bogon ~]# ip netns exec ns1 ping -c2 3.3.3.4
PING 3.3.3.4 (3.3.3.4) 56(84) bytes of data.
From 3.3.3.3 icmp_seq=1 Destination Host Unreachable
From 3.3.3.3 icmp_seq=2 Destination Host Unreachable
--- 3.3.3.4 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1077ms
pipe 2
ip monitor on bogon shows the miss notifications for 3.3.3.4:
[root@bogon ~]# ip monitor all dev vxlan100
[NEIGH][NEIGH]miss 3.3.3.4 STALE
[NEIGH][NEIGH]lladdr 22:e3:36:8e:f1:d9 REACHABLE
[NEIGH][NEIGH]miss 3.3.3.4 STALE
[NEIGH]miss 3.3.3.4 STALE
The FDB on each host at this point:
[root@bogon ~]# bridge fdb show brport vxlan100
0a:c5:33:d1:e9:e7 vlan 1 master br1 permanent
0a:c5:33:d1:e9:e7 master br1 permanent
22:e3:36:8e:f1:d9 master br1
root@ubuntu:/home/ubuntu# bridge fdb show brport vxlan100
22:e3:36:8e:f1:d9 vlan 1 master br1 permanent
22:e3:36:8e:f1:d9 master br1 permanent
00:00:00:00:00:00 dst 192.168.33.15 self permanent
3a:c1:45:47:6a:53 dst 192.168.33.15 self permanent
[root@bogon ~]# ip n show dev vxlan100
[root@bogon ~]# ip neighbor add 3.3.3.4 lladdr 6e:5c:e1:bb:30:82 dev vxlan100 nud reachable
[root@bogon ~]# ip n show dev vxlan100
3.3.3.4 lladdr 6e:5c:e1:bb:30:82 REACHABLE
[root@bogon ~]# ip monitor all dev vxlan100 [NEIGH][NEIGH]miss 3.3.3.4 STALE [NEIGH][NEIGH]lladdr 22:e3:36:8e:f1:d9 REACHABLE [NEIGH][NEIGH]miss 3.3.3.4 STALE [NEIGH]miss 3.3.3.4 STALE [NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH]3.3.3.4 lladdr 6e:5c:e1:bb:30:82 REACHABLE [NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH]3.3.3.4 lladdr 6e:5c:e1:bb:30:82 STALE
[root@bogon ~]# ip neighbor replace 3.3.3.4 lladdr 6e:5c:e1:bb:30:82 dev vxlan100 nud reachable
[root@bogon ~]#
[NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH]3.3.3.4 lladdr 6e:5c:e1:bb:30:82 REACHABLE
[NEIGH][NEIGH][NEIGH]Deleted lladdr 22:e3:36:8e:f1:d9 STALE
[root@bogon ~]# ip neighbor replace 3.3.3.4 lladdr 6e:5c:e1:bb:30:82 dev vxlan100 nud reachable
[root@bogon ~]# ip netns exec ns1 ping -c2 3.3.3.4
PING 3.3.3.4 (3.3.3.4) 56(84) bytes of data.
^C
--- 3.3.3.4 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1047ms
[root@bogon ~]# bridge fdb append 6e:5c:e1:bb:30:82 dev vxlan100 dst 192.168.33.16
[root@bogon ~]# ip netns exec ns1 ping -c2 3.3.3.4
PING 3.3.3.4 (3.3.3.4) 56(84) bytes of data.
64 bytes from 3.3.3.4: icmp_seq=1 ttl=64 time=0.332 ms
64 bytes from 3.3.3.4: icmp_seq=2 ttl=64 time=0.127 ms
--- 3.3.3.4 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1047ms
rtt min/avg/max/mdev = 0.127/0.229/0.332/0.103 ms
[root@bogon ~]# ip monitor all dev vxlan100 [NEIGH][NEIGH]miss 3.3.3.4 STALE ------------------- L3MISS的消息,通過配置neighbor [NEIGH][NEIGH]lladdr 22:e3:36:8e:f1:d9 REACHABLE [NEIGH][NEIGH]miss 3.3.3.4 STALE [NEIGH]miss 3.3.3.4 STALE [NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH]3.3.3.4 lladdr 6e:5c:e1:bb:30:82 REACHABLE [NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH]3.3.3.4 lladdr 6e:5c:e1:bb:30:82 STALE [NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH]3.3.3.4 lladdr 6e:5c:e1:bb:30:82 REACHABLE [NEIGH][NEIGH][NEIGH]Deleted lladdr 22:e3:36:8e:f1:d9 STALE [NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH]3.3.3.4 lladdr 6e:5c:e1:bb:30:82 STALE [NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH]3.3.3.4 lladdr 6e:5c:e1:bb:30:82 REACHABLE [NEIGH]lladdr 6e:5c:e1:bb:30:82 REACHABLE [NEIGH]miss lladdr 6e:5c:e1:bb:30:82 STALE -----------------------------通過配置 fdb [NEIGH]miss lladdr 6e:5c:e1:bb:30:82 STALE [NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH]3.3.3.4 lladdr 6e:5c:e1:bb:30:82 STALE [NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH]Deleted ff02::1:ffd1:e9e7 lladdr 33:33:ff:d1:e9:e7 NOARP 
[NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH]Deleted ff02::16 lladdr 33:33:00:00:00:16 NOARP [NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH][NEIGH]??? lladdr 6e:5c:e1:bb:30:82 NOARP,PERMANENT [NEIGH]lladdr 6e:5c:e1:bb:30:82 NOARP,PERMANENT
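The manual steps above (see a miss, add the entry) are what a user-space agent would automate. As a rough sketch, the function below reads ip monitor lines and prints the command that would fill in each missing entry. It assumes the two line shapes seen in the output above, "miss IP ..." for L3MISS and "miss lladdr MAC ..." for L2MISS. The HOSTS table, its "IP MAC VTEP_IP" format, and the name handle_misses are hypothetical; a real agent would get this mapping from a controller or database and would more likely use the netlink API directly.

```shell
#!/bin/sh
# Known remote hosts, "IP MAC VTEP_IP" per line (hypothetical source of truth).
HOSTS='3.3.3.4 6e:5c:e1:bb:30:82 192.168.33.16'

# Read "ip monitor" lines on stdin and print the command that would add
# the missing entry on device $1. Lines that are not misses are ignored.
handle_misses() {
    local dev line key
    dev=$1
    while read -r line; do
        # Drop the "[NEIGH]" labels that "ip monitor all" prepends.
        line=$(printf '%s\n' "$line" | sed 's/^\(\[NEIGH]\)*//')
        case $line in
            "miss lladdr "*)   # L2MISS: the MAC has no FDB entry
                key=${line#miss lladdr }
                key=${key%% *}
                printf '%s\n' "$HOSTS" | awk -v k="$key" -v d="$dev" \
                    '$2 == k { print "bridge fdb append " $2 " dev " d " dst " $3 }'
                ;;
            "miss "*)          # L3MISS: the IP has no neighbor entry
                key=${line#miss }
                key=${key%% *}
                printf '%s\n' "$HOSTS" | awk -v k="$key" -v d="$dev" \
                    '$1 == k { print "ip neighbor replace " $1 " lladdr " $2 " dev " d " nud reachable" }'
                ;;
        esac
    done
}

# Replay two lines taken from the monitor output above.
printf '%s\n' \
    '[NEIGH][NEIGH]miss 3.3.3.4 STALE' \
    '[NEIGH]miss lladdr 6e:5c:e1:bb:30:82 STALE' | handle_misses vxlan100
```

On a live system the input could come from something like `ip monitor neigh dev vxlan100`, with the printed commands reviewed or piped to a root shell; that wiring is left out here on purpose.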