Docker networking: bridge


Before reading this article it is recommended to read the following articles; this one does not cover the basics of bridges.

https://blog.csdn.net/u014027051/article/details/53908878/ 

http://williamherry.blogspot.com/2012/05/linux.html

https://tonybai.com/2016/01/15/understanding-container-networking-on-single-host/

 

Linux bridge

  • Create two netns
ip netns add ns0
ip netns add ns1
  • Add one NIC of type veth to each netns
ip link add veth0_ns0 type veth peer name veth_ns0
ip link add veth0_ns1 type veth peer name veth_ns1
ip link set veth0_ns0 netns ns0
ip link set veth0_ns1 netns ns1

  Checking the network inside each netns, you can see ns0 and ns1 have gained the interfaces veth0_ns0 and veth0_ns1 respectively

[root@localhost home]# ip netns exec ns0 ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
12: veth0_ns0@if11: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 82:87:07:8f:59:a9 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    
[root@localhost home]# ip netns exec ns1 ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
14: veth0_ns1@if13: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 9a:14:d8:63:56:45 brd ff:ff:ff:ff:ff:ff link-netnsid 0

Checking the interfaces on the host, the interface indexes show that veth0_ns0 (12) and veth_ns0 (11) form one veth pair, and veth0_ns1 (14) and veth_ns1 (13) form the other

[root@localhost home]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:12:5d:af brd ff:ff:ff:ff:ff:ff
5: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default 
    link/ether 02:42:37:84:0b:5f brd ff:ff:ff:ff:ff:ff
11: veth_ns0@if12: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether c2:e3:ef:a8:9c:08 brd ff:ff:ff:ff:ff:ff link-netnsid 2
13: veth_ns1@if14: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether a6:77:5d:48:10:81 brd ff:ff:ff:ff:ff:ff link-netnsid 3
  • Assign an IP to the interface in ns0 and in ns1 and bring them up
ip netns exec ns0 ip addr add 1.1.1.1/24 dev veth0_ns0
ip netns exec ns0 ip link set dev veth0_ns0 up

ip netns exec ns1 ip addr add 1.1.1.2/24 dev veth0_ns1
ip netns exec ns1 ip link set dev veth0_ns1 up

Check the NIC information in ns0 and ns1:

[root@localhost home]# ip netns exec ns0 ip a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
12: veth0_ns0@if11: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN group default qlen 1000
    link/ether 82:87:07:8f:59:a9 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 1.1.1.1/24 scope global veth0_ns0
       valid_lft forever preferred_lft forever
[root@localhost home]# ip netns exec ns1 ip a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
14: veth0_ns1@if13: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN group default qlen 1000
    link/ether 9a:14:d8:63:56:45 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 1.1.1.2/24 scope global veth0_ns1
       valid_lft forever preferred_lft forever

Because the two namespaces are isolated from each other, at this point ns0 cannot ping ns1

[root@localhost home]# ip netns exec ns0 ping 1.1.1.2
PING 1.1.1.2 (1.1.1.2) 56(84) bytes of data.
  • Create a Linux bridge and attach the host-side ends of the veth pairs, veth_ns0 and veth_ns1, to it
ip link add br0 type bridge
ip link set dev br0 up
ip link set dev veth_ns0 up
ip link set dev veth_ns1 up
ip link set dev veth_ns0 master br0
ip link set dev veth_ns1 master br0

Check br0: the veth peers of ns0 and ns1 are both attached to it

[root@localhost home]# ip a show br0
15: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether a6:77:5d:48:10:81 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::42e:2dff:fe70:43d7/64 scope link 
       valid_lft forever preferred_lft forever

[root@localhost home]# ip a show master br0
11: veth_ns0@if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br0 state UP group default qlen 1000
    link/ether c2:e3:ef:a8:9c:08 brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet6 fe80::c0e3:efff:fea8:9c08/64 scope link 
       valid_lft forever preferred_lft forever
13: veth_ns1@if14: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br0 state UP group default qlen 1000
    link/ether a6:77:5d:48:10:81 brd ff:ff:ff:ff:ff:ff link-netnsid 3
    inet6 fe80::a477:5dff:fe48:1081/64 scope link 
       valid_lft forever preferred_lft forever

  Now ns0 can ping ns1

[root@localhost netns]# ip netns exec ns0 ping 1.1.1.2
PING 1.1.1.2 (1.1.1.2) 56(84) bytes of data.
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.148 ms

The topology at this point: ns0 and ns1 each attach to br0 through a veth pair (original diagram omitted)

  • Pinging the external gateway (192.168.80.2 in this environment) fails, because br0 only connects the two small networks (ns0 and ns1) and has no path to the outside. Attach the host NIC ens33 to the bridge (this will break remote connections to the host, so do it only in a test environment), and delete ens33's address and related routes (for the reason, see the "connectivity" section of https://blog.csdn.net/sld880311/article/details/77840343)
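The attach-and-clean-up step above can be sketched as follows (a guarded sketch: 192.168.80.128/24 is this article's example host address on ens33, the commands need root, and moving ens33 onto br0 drops remote connections, so run it only in a test environment):

```shell
# Example host address on ens33, taken from this article's route table output
host_ip="192.168.80.128/24"
# Only act when running as root on a host that actually has br0
if [ "$(id -u)" -eq 0 ] && ip link show br0 >/dev/null 2>&1; then
    ip link set dev ens33 master br0    # attach ens33 to the bridge
    ip addr del "$host_ip" dev ens33    # deleting the address also drops its connected route
fi
```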
[root@localhost home]# ip link show master br0
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:12:5d:af brd ff:ff:ff:ff:ff:ff
11: veth_ns0@if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br0 state UP mode DEFAULT group default qlen 1000
    link/ether c2:e3:ef:a8:9c:08 brd ff:ff:ff:ff:ff:ff link-netnsid 2
13: veth_ns1@if14: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br0 state UP mode DEFAULT group default qlen 1000
    link/ether a6:77:5d:48:10:81 brd ff:ff:ff:ff:ff:ff link-netnsid 3

 

You can see ens33 is now attached to br0, yet pinging the gateway from ns0 still fails. The reason: br0 currently connects the two namespace interfaces, and once the host interface ens33 joins br0 its IP address no longer takes effect, so there is no connectivity

[root@localhost netns]# ip netns exec ns0 ping 192.168.80.2 -I veth0_ns0
PING 192.168.80.2 (192.168.80.2) from 1.1.1.1 veth0_ns0: 56(84) bytes of data.

  A simple workaround is to give ns0 and ns1 each an additional IP in the gateway's subnet, and to give br0 an address in that subnet as well:

[root@localhost netns]# ip netns exec ns0 ip addr add 192.168.80.80/24 dev veth0_ns0
[root@localhost netns]# ip netns exec ns1 ip addr add 192.168.80.81/24 dev veth0_ns1
[root@localhost netns]# ip addr add 192.168.80.82/24 dev br0

Now both ns0 and ns1 can ping the gateway, but there is a problem: the host itself can no longer reach the outside world, so this is not a good solution

[root@localhost netns]# ip netns exec ns0 ping 192.168.80.2 -I veth0_ns0
PING 192.168.80.2 (192.168.80.2) from 192.168.80.80 veth0_ns0: 56(84) bytes of data.
64 bytes from 192.168.80.2: icmp_seq=1 ttl=128 time=0.236 ms
64 bytes from 192.168.80.2: icmp_seq=2 ttl=128 time=0.239 ms

[root@localhost netns]# ip netns exec ns1 ping 192.168.80.2 -I veth0_ns1
PING 192.168.80.2 (192.168.80.2) from 192.168.80.81 veth0_ns1: 56(84) bytes of data.
64 bytes from 192.168.80.2: icmp_seq=1 ttl=128 time=0.288 ms
64 bytes from 192.168.80.2: icmp_seq=2 ttl=128 time=0.280 ms

 

Docker bridge:

  On CentOS, Docker keeps its netns files under /var/run/docker/netns; each container created adds a corresponding namespace file there. Entering that namespace with nsenter shows the same network state as inside the container.
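As a sketch of the above (the sandbox path below is a hypothetical example of the CentOS layout; the commented docker/nsenter lines need root and a running daemon):

```shell
# Hypothetical example of a Docker netns file path on CentOS
sandbox="/var/run/docker/netns/5d1a7d2a9b3c"
nsid="${sandbox##*/}"    # strip the directory to get the netns file name
echo "$nsid"

# On a real host (root required), the path can be read from the container itself:
#   sandbox=$(docker inspect -f '{{.NetworkSettings.SandboxKey}}' centos0)
#   nsenter --net="$sandbox" ip addr    # should match: docker exec centos0 ip addr
```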

First create a bridge network and start two containers

[root@localhost home]# docker network create -d bridge --subnet 172.1.1.0/24 my_br
[root@localhost home]# docker run -itd --net=my_br --name=centos0 centos /bin/sh
[root@localhost home]# docker run -itd --net=my_br --name=centos1 centos /bin/sh

Inspecting my_br shows the following: centos0 has IP 172.1.1.2 and centos1 has 172.1.1.3, both within my_br's subnet

[root@localhost home]# docker network inspect my_br 
[
    {
        "Name": "my_br",
        "Id": "f830aee4b13fa17479f850ea62d570ea61bc1c7d182a88010709a7285193bb64",
        "Created": "2018-10-17T07:23:53.31341481+08:00",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "172.1.1.0/24"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Containers": {
            "03cda0f3fdd1fc65d198adb832998e11098bcc8c1bb5a8379f9c2ee82a14be07": {
                "Name": "centos1",
                "EndpointID": "d608d888da293967949340c1d946e92a6be06d525bcec611d0f20a6188de01ff",
                "MacAddress": "02:42:ac:01:01:03",
                "IPv4Address": "172.1.1.3/24",
                "IPv6Address": ""
            },
            "c739d26d51b08a36d3402e32fbe83656a7ac1b3f611a6c228f8ec80c84423439": {
                "Name": "centos0",
                "EndpointID": "9b38292d043fba31a5d04076c9d6a333c5beac08aba68dadeb84a5e17fed4dd6",
                "MacAddress": "02:42:ac:01:01:02",
                "IPv4Address": "172.1.1.2/24",
                "IPv6Address": ""
            }
        },
        "Options": {},
        "Labels": {}
    }
]

The container centos0 can, of course, ping the gateway directly

[root@localhost home]# docker exec centos0 /bin/sh -c "ping 192.168.80.2"
PING 192.168.80.2 (192.168.80.2) 56(84) bytes of data.
64 bytes from 192.168.80.2: icmp_seq=1 ttl=127 time=0.273 ms
64 bytes from 192.168.80.2: icmp_seq=2 ttl=127 time=0.642 ms

Checking the NICs of centos0, centos1, and the host: centos0's eth0 (7) and the host's veth00f659d (8) form one veth pair, and centos1's eth0 (9) and the host's veth05377ae (10) form the other

[root@localhost home]# docker exec centos0 /bin/sh -c "ip link"
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
7: eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
    link/ether 02:42:ac:01:01:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
[root@localhost home]# docker exec centos1 /bin/sh -c "ip link"
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
9: eth0@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
    link/ether 02:42:ac:01:01:03 brd ff:ff:ff:ff:ff:ff link-netnsid 0
[root@localhost home]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:12:5d:af brd ff:ff:ff:ff:ff:ff
3: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:48:2d:4c brd ff:ff:ff:ff:ff:ff
4: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:48:2d:4c brd ff:ff:ff:ff:ff:ff
5: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default 
    link/ether 02:42:37:84:0b:5f brd ff:ff:ff:ff:ff:ff
6: br-f830aee4b13f: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
    link/ether 02:42:af:60:4b:4e brd ff:ff:ff:ff:ff:ff
8: veth00f659d@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-f830aee4b13f state UP mode DEFAULT group default 
    link/ether 0e:45:69:f8:34:57 brd ff:ff:ff:ff:ff:ff link-netnsid 0
10: veth05377ae@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-f830aee4b13f state UP mode DEFAULT group default 
    link/ether aa:ae:fc:5c:dd:06 brd ff:ff:ff:ff:ff:ff link-netnsid 1

  Check centos0's routes: the default gateway is 172.1.1.1, and the interface holding that address is the bridge of the network named my_br

[root@localhost home]# docker exec centos0 /bin/bash -c "ip route"
default via 172.1.1.1 dev eth0 
172.1.1.0/24 dev eth0 proto kernel scope link src 172.1.1.2
[root@localhost home]# docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
8678329d58ab        bridge              bridge              local
e8476b504e33        host                host                local
f830aee4b13f        my_br               bridge              local
96a70c1a9516        none                null                local
[root@localhost home]# ip a |grep f830aee4b13f
6: br-f830aee4b13f: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    inet 172.1.1.1/24 scope global br-f830aee4b13f

The host routes relevant to centos0 are shown below: the 172.1.1.0/24 entry routes traffic inward to centos0, and the default route carries traffic outward

[root@localhost home]# ip route
default via 192.168.80.2 dev ens33 proto dhcp metric 100 
172.1.1.0/24 dev br-f830aee4b13f proto kernel scope link src 172.1.1.1 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 
192.168.80.0/24 dev ens33 proto kernel scope link src 192.168.80.128 metric 100 

Checking the iptables rules related to 172.1.1.0/24, the nat table contains the following entry: packets with a source in 172.1.1.0/24 that leave through any interface other than the bridge are MASQUERADEd, i.e. packets sent by the containers are SNATed to the host NIC's address

Chain POSTROUTING (policy ACCEPT 332 packets, 21915 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 MASQUERADE  all  --  *      !br-f830aee4b13f  172.1.1.0/24         0.0.0.0/0
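A rule of this shape can also be written by hand; the sketch below just assembles the command (the bridge name br-f830aee4b13f comes from this host's output and will differ elsewhere; listing the live rules needs root, hence the guard):

```shell
bridge="br-f830aee4b13f"    # this host's bridge for my_br; will differ elsewhere
subnet="172.1.1.0/24"
# Assemble the MASQUERADE rule matching the nat-table entry above
rule="iptables -t nat -A POSTROUTING -s $subnet ! -o $bridge -j MASQUERADE"
if [ "$(id -u)" -eq 0 ] && command -v iptables >/dev/null 2>&1; then
    iptables -t nat -nvL POSTROUTING    # view the live NAT rules
fi
echo "$rule"
```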

To summarize the path of a ping from centos0 to the external gateway (192.168.80.2): the ICMP packet's destination is 192.168.80.2; with no more specific route it takes the default route and leaves the container through eth0, arriving at the default gateway my_br (172.1.1.1). Following the host's routing table, the bridge sends the packet out through ens33, and MASQUERADE SNATs the source address to ens33's address. This is Docker's bridge forwarding flow.
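The first hop of that flow can be checked from inside the container (a sketch, guarded on the centos0 container from earlier being present):

```shell
# Ask the container's routing table which hop it would use for the gateway
if command -v docker >/dev/null 2>&1 && docker inspect centos0 >/dev/null 2>&1; then
    docker exec centos0 ip route get 192.168.80.2    # expect a hop via 172.1.1.1 dev eth0
fi
gw="172.1.1.1"    # the bridge address the route should point at
echo "$gw"
```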

 

Reworking the custom bridge

  The networking scheme in the first part is flawed: it disables one of the host's NICs. Following the analysis of Docker's networking, rework it as follows:

  • First create a veth pair and attach one end to br0
ip link add veth0 type veth peer name veth1
ip link set dev veth0 up
ip link set dev veth1 up
ip link set dev veth1 master br0
  • On the host, add a route to the 1.1.1.0/24 network, where 1.1.1.3 is br0's address
[root@localhost home]# ip route add 1.1.1.0/24 via 1.1.1.3
  • Inside ns0, add a default gateway route (note: it must be a gateway route)
ip netns exec ns0 ip route add default via 1.1.1.3 dev veth0_ns0
  • On the host, add SNAT for the 1.1.1.0/24 subnet
iptables -t nat -A POSTROUTING -s 1.1.1.0/24 ! -o br0 -j MASQUERADE

This builds a network that mimics Docker's bridge, and ns0 can now ping the external gateway

[root@localhost home]# ip netns exec ns0 ip route
default via 1.1.1.3 dev veth0_ns0 
1.1.1.0/24 dev veth0_ns0 proto kernel scope link src 1.1.1.1 

[root@localhost home]# ip netns exec ns0 ping 192.168.80.2
PING 192.168.80.2 (192.168.80.2) 56(84) bytes of data.
64 bytes from 192.168.80.2: icmp_seq=1 ttl=127 time=0.439 ms
64 bytes from 192.168.80.2: icmp_seq=2 ttl=127 time=0.533 ms

So when MASQUERADE is in use, how does iptables tell apart the return packets of different pings? The ICMP replies for the host and for ns0 carry the same addresses and the same protocol, and ICMP has no port numbers. The answer: ip_conntrack tracks the id field of the ICMP header to distinguish different ping processes (see "ICMP connections"); ip_conntrack is the foundation NAT is built on. (Newer kernels use nf_conntrack.)
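A rough illustration of what conntrack matches on: ICMP entries in /proc/net/nf_conntrack carry an id= field copied from the ICMP header. The entry below is hypothetical sample text in the usual format; only the id extraction is real logic:

```shell
# Hypothetical /proc/net/nf_conntrack line for a ping from ns0 through MASQUERADE:
# the request tuple and the reply tuple both carry the ICMP id
entry="ipv4 2 icmp 1 29 src=1.1.1.1 dst=192.168.80.2 type=8 code=0 id=3487 src=192.168.80.2 dst=192.168.80.128 type=0 code=0 id=3487 mark=0 use=2"
# Pull out the first id= field -- the value conntrack uses to tell pings apart
id=$(printf '%s\n' "$entry" | grep -o 'id=[0-9]*' | head -n 1 | cut -d= -f2)
echo "$id"    # -> 3487
```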

 

 TIPS:

  • You can use iptstate to view the connection states from /proc/net/nf_conntrack (original screenshot omitted)
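If iptstate is not installed, the raw table behind it can be read directly; a minimal guarded sketch (the file is usually root-only, and absent when the conntrack module is not loaded):

```shell
src="/proc/net/nf_conntrack"
# Show a few raw conntrack entries when the file is readable
if [ -r "$src" ]; then
    head -n 5 "$src"
fi
```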

 

References:

https://blog.csdn.net/sld880311/article/details/77840343

https://docs.docker.com/network/bridge/#differences-between-user-defined-bridges-and-the-default-bridge

http://success.docker.com/article/networking#dockerbridgenetworkdriver

http://vinllen.com/linux-bridgeshe-ji-yu-shi-xian/

