Docker networking: overlay


The benefit of using a docker network is that containers attached to the same network can talk to each other directly, without having to expose or publish ports.
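
For example, on a single host, two containers on the same user-defined network can reach each other by name through Docker's embedded DNS, with nothing published to the host (a minimal sketch; the image and network names are only illustrative):

# docker network create mynet
# docker run -d --name web --network mynet nginx
# docker run --rm --network mynet busybox ping -c 1 web   # resolves "web" and pings it, no exposed ports needed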

This article analyzes overlay networking using docker swarm. Two VMware virtual machines act as the two nodes. Swarm itself is not the focus here, so unrelated information (such as irrelevant interfaces and ports) has been trimmed from the command output.

  • Use node1 as the master: run swarm init on node1 to initialize the swarm environment
# docker swarm init

After the swarm is initialized, two listening ports appear on the host: 2377 and 7946. Port 2377 is the cluster management port and 7946 is used for node discovery. A swarm overlay network uses three ports in total; since no overlay network has been created yet, port 4789 is not open (note: 4789 is the UDP port assigned to VXLAN by IANA). For the official description, see the "use overlay network" documentation in the references.

# netstat -ntpl
tcp6       0      0 :::2377                 :::*                    LISTEN      3333/dockerd-curren 
tcp6       0      0 :::7946                 :::*                    LISTEN      3333/dockerd-curren
  • Have node2 join the swarm. This node also opens port 7946 to communicate with the swarm managers.
docker swarm join --token SWMTKN-1-62iqlof4q9xj2vmlwlm2s03xfncq6v9urgysg96f4npe2qeuac-4iymsra0xv4f7ujtg1end3qva 192.168.80.130:2377
  • List the nodes on node1; two nodes are shown, node1 and node2
# docker node ls 
ID                           HOSTNAME               STATUS  AVAILABILITY  MANAGER STATUS
50u2a7anjo59k5yw4p4namzv4    localhost.localdomain  Ready   Active        
qetcamqa2xgk5rvpmhf03nxu1 *  localhost.localdomain  Ready   Active        Leader

Looking at the network list, two new networks have appeared: docker_gwbridge and ingress. The former provides bridged connectivity between containers and the host; the latter by default provides overlay connectivity to containers on other hosts.

# docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
6405d33c608c        docker_gwbridge     bridge              local
lgcns0epsksl        ingress             overlay             swarm
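
To see the address ranges these networks use, you can inspect them; a sketch (the --format filter only trims the output, a plain docker network inspect shows the same information):

# docker network inspect -f '{{json .IPAM.Config}}' docker_gwbridge
# docker network inspect -f '{{json .IPAM.Config}}' ingress
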
  • Create a custom, attachable overlay network on node1
docker network create -d overlay --attachable my-overlay 
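
Note that a swarm-scoped overlay network is normally only instantiated on a worker once a container or task attached to it actually runs there; at this point my-overlay is typically not yet visible on node2, but it appears after CT2 (created below) starts:

# docker network ls | grep my-overlay   # on node2: likely empty until a container on my-overlay runs here
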
  • On node1, create a container attached to my-overlay
# docker run -itd --network=my-overlay --name=CT1 centos /bin/sh

Create a container attached to my-overlay on node2

# docker run -itd --network=my-overlay --name=CT2 centos /bin/sh

  Pinging CT1's address from CT2 succeeds

sh-4.2# ping 10.0.0.2
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=1.20 ms
64 bytes from 10.0.0.2: icmp_seq=2 ttl=64 time=0.908 ms
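
Here 10.0.0.2 is CT1's address on my-overlay. One way to look it up (run on node1; output omitted) is to inspect the network or the container:

# docker network inspect my-overlay    # the Containers section lists CT1 with its IPv4Address
# docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}} {{end}}' CT1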

  Looking at CT2's network configuration on node2, the overlay interface is eth0, whose veth peer has ifindex 26; eth1's peer has ifindex 28, and that peer is attached to the docker_gwbridge bridge

sh-4.2# ip a 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
25: eth0@if26: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether 02:42:0a:00:00:03 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.0.0.3/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:aff:fe00:3/64 scope link 
       valid_lft forever preferred_lft forever
27: eth1@if28: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:ac:12:00:03 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet 172.18.0.3/16 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::42:acff:fe12:3/64 scope link 
       valid_lft forever preferred_lft forever
sh-4.2# ip route
default via 172.18.0.1 dev eth1 
10.0.0.0/24 dev eth0 proto kernel scope link src 10.0.0.3 
172.18.0.0/16 dev eth1 proto kernel scope link src 172.18.0.3
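
The @ifN suffix in the ip output is the ifindex of the veth peer, so a peer can be located by searching for that index. For example, eth1's peer (ifindex 28 in this run) is on the host, while eth0's peer (ifindex 26) is not visible on the host at all; it lives in the overlay namespace, as shown below:

# ip -o link | grep '^28:'   # on the node2 host: matches vethd4eadd9@if27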

  The host-side peer of CT2's eth1 is vethd4eadd9, attached to docker_gwbridge; non-overlay traffic is forwarded through that bridge (for the forwarding flow, see "docker networking: bridge")

# ip link show master docker_gwbridge
23: veth5d40a11@if22: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker_gwbridge state UP mode DEFAULT group default 
    link/ether 6a:4a:2a:95:dc:73 brd ff:ff:ff:ff:ff:ff link-netnsid 1
28: vethd4eadd9@if27: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker_gwbridge state UP mode DEFAULT group default 
    link/ether d6:90:57:a6:44:e0 brd ff:ff:ff:ff:ff:ff link-netnsid 3

 Overlay traffic leaves CT2 through eth0, so where is eth0's peer? Since CT2 is attached to the network named my-overlay, look under /var/run/docker/netns for that network's namespace (1-9gtpq8ds3g); there you can see that eth0's peer is veth2 inside the my-overlay namespace, and that it is attached to the bridge br0

# docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
2cd167c9fb17        bridge              bridge              local
d84c003d86f9        docker_gwbridge     bridge              local
e8476b504e33        host                host                local
lgcns0epsksl        ingress             overlay             swarm
9gtpq8ds3gsu        my-overlay          overlay             swarm
96a70c1a9516        none                null                local

# ll /var/run/docker/netns
total 0
-r--r--r--. 1 root root 0 Nov 25 08:05 1-9gtpq8ds3g
-r--r--r--. 1 root root 0 Nov 25 08:05 1-lgcns0epsk
-r--r--r--. 1 root root 0 Nov 25 08:05 dd845ad1c97a
-r--r--r--. 1 root root 0 Nov 25 08:05 ingress_sbox

# nsenter --net=1-9gtpq8ds3g ip a 
2: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether 8a:11:66:30:16:aa brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.1/24 scope global br0
       valid_lft forever preferred_lft forever
    inet6 fe80::6823:3dff:fe77:c01c/64 scope link 
       valid_lft forever preferred_lft forever
24: vxlan1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master br0 state UNKNOWN group default 
    link/ether 8a:11:66:30:16:aa brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::8811:66ff:fe30:16aa/64 scope link 
       valid_lft forever preferred_lft forever
26: veth2@if25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master br0 state UP group default 
    link/ether fe:41:38:d6:00:c6 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::fc41:38ff:fed6:c6/64 scope link 
       valid_lft forever preferred_lft forever
# nsenter --net=1-9gtpq8ds3g ip link show master br0
24: vxlan1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master br0 state UNKNOWN mode DEFAULT group default 
    link/ether 8a:11:66:30:16:aa brd ff:ff:ff:ff:ff:ff link-netnsid 0
26: veth2@if25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master br0 state UP mode DEFAULT group default 
    link/ether fe:41:38:d6:00:c6 brd ff:ff:ff:ff:ff:ff link-netnsid 1
# nsenter --net=1-9gtpq8ds3g ip route 
10.0.0.0/24 dev br0 proto kernel scope link src 10.0.0.1 

  The packet path for overlay traffic on CT2 is therefore: every container is bridged directly to the default docker_gwbridge, while overlay traffic is forwarded by br0 inside the my-overlay namespace. The advantage of this design is that when several containers run on the same host, only one VXLAN UDP port is needed; all VXLAN traffic is forwarded by br0.
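
One way to confirm how br0 reaches containers on the other node is to dump the forwarding database of the vxlan interface inside the overlay namespace (a sketch; the namespace name 1-9gtpq8ds3g and interface vxlan1 are specific to this run, output omitted):

# cd /var/run/docker/netns
# nsenter --net=1-9gtpq8ds3g bridge fdb show dev vxlan1   # remote container MACs should appear with dst 192.168.80.130 (node1)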

 

  The vxlan interface attached to br0 (created by swarm) opens UDP port 4789 on the host; packets travel between hosts through this port

# netstat -anup
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name       
udp        0      0 0.0.0.0:4789            0.0.0.0:*  
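
The VNI and UDP port used by that interface can also be read from the interface details inside the overlay namespace (sketch; output omitted, but it should include "vxlan id 4097 ... dstport 4789", matching the capture below):

# nsenter --net=1-9gtpq8ds3g ip -d link show vxlan1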

  Ping CT1 from CT2 while running tcpdump on node2; VXLAN packets can be seen

# tcpdump -i ens33 udp and port 4789
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens33, link-type EN10MB (Ethernet), capture size 262144 bytes
16:31:41.604930 IP localhost.localdomain.57170 > 192.168.80.130.4789: VXLAN, flags [I] (0x08), vni 4097
IP 10.0.0.3 > 10.0.0.2: ICMP echo request, id 138, seq 1, length 64
16:31:41.606434 IP 192.168.80.130.45035 > localhost.localdomain.4789: VXLAN, flags [I] (0x08), vni 4097
IP 10.0.0.2 > 10.0.0.3: ICMP echo reply, id 138, seq 1, length 64

In the captured ping packets from CT2 to CT1, the outer IP header carries the node addresses; UDP destination port 4789 delivers the packet to node1's VXLAN endpoint, where it is decapsulated and forwarded to CT1.
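
Reading the capture above, the encapsulation stack of one echo request is roughly:

outer IP     node2 address -> 192.168.80.130 (node1), i.e. the underlay/host addresses
outer UDP    ephemeral source port -> destination port 4789 (VXLAN)
VXLAN        flags [I], VNI 4097 (the VNI assigned to my-overlay in this run)
inner Ether  CT2 eth0 MAC -> CT1 eth0 MAC
inner IP     10.0.0.3 -> 10.0.0.2
ICMP         echo request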

Implementing a custom overlay network by hand

The topology: on each of the two nodes, create a bridge and a network namespace, and run VXLAN in unicast mode with the peer node specified explicitly (for VXLAN in multicast mode, see the article on implementing a VXLAN network on Linux).

  • node1 configuration (it is best to use the IANA-assigned port 4789; by default Wireshark does not decode VXLAN on other ports):
ip netns add netns1
ip link add my-br type bridge
ip link add veth_C type veth peer name veth_H
ip link add vxlan100 type vxlan id 100 dstport 4789 remote 192.168.80.137 local 192.168.80.134 dev ens33

ip link set veth_C netns netns1
ip netns exec netns1 ip addr add 1.1.1.1/24 dev veth_C
ip netns exec netns1 ip link set dev veth_C up

ip link set dev my-br up
ip link set dev vxlan100 up
ip link set dev veth_H up
ip link set dev veth_H master my-br
ip link set dev vxlan100 master my-br
  • node2 configuration:
ip netns add netns2
ip link add my-br type bridge
ip link add veth_C type veth peer name veth_H
ip link add vxlan100 type vxlan id 100 dstport 4789 remote 192.168.80.134 local 192.168.80.137 dev ens33 

ip link set veth_C netns netns2
ip netns exec netns2 ip addr add 1.1.1.2/24 dev veth_C
ip netns exec netns2 ip link set dev veth_C up

ip link set dev my-br up
ip link set dev vxlan100 up
ip link set dev veth_H up
ip link set dev veth_H master my-br
ip link set dev vxlan100 master my-br

Now pinging netns1 on node1 from netns2 on node2 succeeds

# ip netns exec netns2 ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=64 time=0.342 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=64 time=0.460 ms
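
As with the swarm overlay, you can confirm the encapsulation by capturing on the underlay interface while pinging (a sketch; the interface name ens33 is specific to this setup):

# tcpdump -ni ens33 udp and port 4789   # expect VXLAN packets with vni 100 carrying the ICMP echo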

 TIPS:

You can use ip -d link show to view detailed interface information; for example, the output for vxlan100 on node2 is as follows:

# ip -d link show vxlan100
9: vxlan100: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master my-br state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether a6:53:2a:b4:8f:d7 brd ff:ff:ff:ff:ff:ff promiscuity 1 
    vxlan id 100 remote 192.168.80.134 local 192.168.80.137 dev ens33 srcport 0 0 dstport 4789 ageing 300 noudpcsum noudp6zerocsumtx noudp6zerocsumrx 
    bridge_slave state forwarding priority 32 cost 100 hairpin off guard off root_block off fastleave off learning on flood on port_id 0x8002 port_no 0x2 designated_port 32770 designated_cost 0 designated_bridge 8000.a6:53:2a:b4:8f:d7 designated_root 8000.a6:53:2a:b4:8f:d7 hold_timer    0.00 message_age_timer    0.00 forward_delay_timer    0.00 topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off mcast_router 1 mcast_fast_leave off mcast_flood on addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 
  • Use bridge -s fdb show br <bridge> to display a bridge's MAC forwarding table; the table for my-br on node2 is shown below:

The first column is the destination MAC address; the all-zero MAC is the default entry, analogous to a default route. The dev field gives the outgoing interface used to reach that MAC. "self" means the entry belongs to the device itself, roughly like a route to a local interface, and "permanent" means the entry never ages out. From the last, all-zero entry you can see that traffic for node1 (192.168.80.134) leaves through vxlan100 via the underlying device ens33 (this was set when the vxlan interface was created). "used" shows when the entry was last used/updated.

# bridge -s fdb show br my-br
33:33:00:00:00:01 dev my-br self permanent
01:00:5e:00:00:01 dev my-br self permanent
33:33:ff:54:9c:74 dev my-br self permanent
e6:2f:83:c4:88:e0 dev veth_H used 7352/7352 master my-br permanent
e6:2f:83:c4:88:e0 dev veth_H vlan 1 used 7352/7352 master my-br permanent
33:33:00:00:00:01 dev veth_H self permanent
01:00:5e:00:00:01 dev veth_H self permanent
33:33:ff:c4:88:e0 dev veth_H self permanent
a6:53:2a:b4:8f:d7 dev vxlan100 used 6060/6060 master my-br permanent
a6:53:2a:b4:8f:d7 dev vxlan100 vlan 1 used 6060/6060 master my-br permanent
00:00:00:00:00:00 dev vxlan100 dst 192.168.80.134 via ens33 used 3156/6092 self permanent
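
The all-zero entry is exactly what the remote option of ip link add ... type vxlan programs. Unicast VXLAN is not limited to a single peer: if a third node were added (the address below is purely hypothetical), another default entry could be appended so that unknown-unicast and broadcast frames are head-end replicated to both remotes:

# bridge fdb append to 00:00:00:00:00:00 dst 192.168.80.140 dev vxlan100   # 192.168.80.140: hypothetical third node
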
  • Note: in this hand-built overlay scenario it is best to disable the firewall, otherwise packets may never reach the VXLAN port. Alternatively, open just the VXLAN port with firewall-cmd, as follows for UDP 4789:
firewall-cmd --zone=public --add-port=4789/udp --permanent
firewall-cmd --reload

 

References:

https://docs.docker.com/v17.12/network/overlay/#create-an-overlay-network

https://www.securitynik.com/2016/12/docker-networking-internals-container.html

http://blog.nigelpoulton.com/demystifying-docker-overlay-networking/

https://github.com/docker/labs/blob/master/networking/concepts/06-overlay-networks.md

https://cumulusnetworks.com/blog/5-ways-design-container-network/

https://neuvector.com/network-security/docker-swarm-container-networking/

http://man7.org/linux/man-pages/man8/bridge.8.html

