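Background reading:
For background on VXLAN networking: https://www.cnblogs.com/shuiguizi/p/10923841.html
For background on how Docker networking works: https://www.cnblogs.com/shuiguizi/p/10922049.html
Preparation:
The images used are centos and nginx. For convenience, I installed a few tools into the images downloaded from the official registry and then ran commit to produce new images: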
yum install net-tools
yum install iputils
yum install iproute
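Configuration steps:
0. Install and start etcd; see online guides for the steps.
etcd is deployed on both hosts here. It elects a leader automatically, and the etcd store can be reached through either address.
1. Reconfigure the docker daemon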
vi /usr/lib/systemd/system/docker.service
minion:
--cluster-store=etcd://106.y.y.3:2379 \
--cluster-advertise=106.y.y.31:2375 \
master:
--cluster-store=etcd://188.x.x.113:2379 \
--cluster-advertise=188.x.x.113:2375 \
systemctl daemon-reload
systemctl restart docker.service
2. Create the docker network
#docker network create ov_net2 -d overlay
[root@master ~]# docker network ls
NETWORK ID NAME DRIVER SCOPE
731d1b63b387 ov_net2 overlay global
3. Create one docker container on each of the two nodes
master:
docker run -ti -d --network=ov_net2 --name=centos21 centos:wxy /bin/sh
minion:
docker run -d --name nginx22 --network=ov_net2 nginx
Analysis
1. Details of this overlay network
[root@master ~]# docker network inspect ov_net2
[
{
"Name": "ov_net2",
"Id": "731d1b63b38768022160534b619d09d2e0fb139a7504070bf370a7706ed8ee9e",
"Created": "2019-05-14T20:08:29.045284861+08:00",
"Scope": "global",
"Driver": "overlay",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": {},
"Config": [
{
"Subnet": "10.0.1.0/24",
"Gateway": "10.0.1.1"
}
]
},
"Internal": false,
"Attachable": false,
"Containers": {
"d7dc5bf71ccb2e5fb1f6b98dd47b4f79f44a8b73e9536f3f15b036ba0c94f55d": {
"Name": "centos21",
"EndpointID": "c7771dca216130e46b60cb921d4488eb82f9a5e1e168ec4d7a9d91f183e82ea6",
"MacAddress": "02:42:0a:00:01:02",
"IPv4Address": "10.0.1.2/24",
"IPv6Address": ""
},
"ep-611f45864389f90630fa70340dddd4c76b16ac070c49f60aa1679c753b41db7f": {
"Name": "nginx22",
"EndpointID": "611f45864389f90630fa70340dddd4c76b16ac070c49f60aa1679c753b41db7f",
"MacAddress": "02:42:0a:00:01:03",
"IPv4Address": "10.0.1.3/24",
"IPv6Address": ""
}
},
"Options": {},
"Labels": {}
}
]
2. How do containers on different hosts learn each other's addresses? Answer: the information is read from etcd.
[root@minion ~]# etcdctl get /docker/network/v1.0/endpoint/731d1b63b38768022160534b619d09d2e0fb139a7504070bf370a7706ed8ee9e/611f45864389f90630fa70340dddd4c76b16ac070c49f60aa1679c753b41db7f
{"anonymous":false,"disableResolution":false,"ep_iface":{"addr":"10.0.1.3/24","dstPrefix":"eth","mac":"02:42:0a:00:01:03","routes":null,"srcName":"vethb5c341b","v4PoolID":"GlobalDefault/10.0.1.0/24","v6PoolID":""},"exposed_ports":[{"Proto":6,"Port":80}],"generic":{"com.docker.network.endpoint.exposedports":[{"Proto":6,"Port":80}],"com.docker.network.portmap":[]},"id":"611f45864389f90630fa70340dddd4c76b16ac070c49f60aa1679c753b41db7f","ingressPorts":null,"joinInfo":{"StaticRoutes":null,"disableGatewayService":false},"locator":"106.13.146.31","myAliases":["42c3eff8768d"],"name":"nginx22","sandbox":"f7a0ce169bd7690b45887a462efc169953150311dbb03e4bb2ccaf17ab75add8","svcAliases":null,"svcID":"","svcName":"","virtualIP":"\u003cnil\u003e"}
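The key used in the lookup above appears to be built from the network ID and the endpoint ID. A minimal sketch of that layout follows; the function name `etcd_endpoint_key` is made up for illustration, and the path pattern is inferred only from the single command in these notes, not from any documented contract.

```shell
# Build the etcd key under which an overlay endpoint record was found above.
# Assumption: the layout /docker/network/v1.0/endpoint/<network-id>/<endpoint-id>
# is taken from the one lookup in these notes and may differ between versions.
etcd_endpoint_key() {
  network_id=$1
  endpoint_id=$2
  printf '/docker/network/v1.0/endpoint/%s/%s\n' "$network_id" "$endpoint_id"
}

# Possible usage (needs a reachable etcd, so it is left commented out):
# etcdctl get "$(etcd_endpoint_key "$NETWORK_ID" "$ENDPOINT_ID")"
```

The full 64-character IDs are the ones printed by docker network inspect, as "Id" for the network and "EndpointID" for each container.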
3. Verify connectivity
[root@master ~]# ip netns exec a740da7c2043 ping 10.0.1.3 -c 2
PING 10.0.1.3 (10.0.1.3) 56(84) bytes of data.
From 10.0.1.2 icmp_seq=1 Destination Host Unreachable
From 10.0.1.2 icmp_seq=2 Destination Host Unreachable
4. Anyone who has read up on how Docker works knows that a container is essentially a newly created namespace. In addition, for an overlay network Docker creates one more namespace on each host, used for the VXLAN connection.
[root@master ~]# ip netns exec 1-731d1b63b3 tcpdump -i vxlan1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vxlan1, link-type EN10MB (Ethernet), capture size 262144 bytes
20:36:47.283514 ARP, Request who-has 10.0.1.3 tell 10.0.1.2, length 28
20:36:48.304014 ARP, Request who-has 10.0.1.3 tell 10.0.1.2, length 28
20:36:49.328182 ARP, Request who-has 10.0.1.3 tell 10.0.1.2, length 28
[root@master ~]# ip netns exec 1-731d1b63b3 netstat -i
Kernel Interface table
Iface MTU RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
br0 1450 35 0 0 0 14 0 0 0 BMRU
lo 65536 0 0 0 0 0 0 0 0 LRU
veth2 1450 35 0 0 0 25 0 0 0 BMRU
vxlan1 1450 0 0 0 0 0 0 39 0 BMRU
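Analysis:
tcpdump shows the ARP requests reaching the vxlan1 interface on their way out.
In the netstat counters, RX-OK on veth2 increases, but vxlan1 shows no successful transmissions: TX-OK stays at 0 while TX-DRP is 39.
What does this tell us?
A brief description of how a bridge forwards packets:
Interfaces are added to the bridge as ports. When one port receives a frame, the bridge decides from the destination address whether to pass it up to the upper protocol stack or to forward it. Here both the ARP and the ICMP packets are to be forwarded.
First, after the bridge's internal processing, the frame leaves through the bridge's other ports, including vxlan1. This code belongs to the kernel's L2 layer, and if tcpdump is listening on the interface, a copy of the frame is handed to tcpdump.
Then the frame enters the driver, here the vxlan driver. The driver can be thought of as sitting between the L2 link layer and the physical device, and netstat -i reports the send and receive counters at the driver.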
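To spot this kind of drop quickly, the netstat -i table can be filtered for interfaces with a non-zero TX-DRP counter. This is only a sketch: the helper name `tx_drops` is made up here, and it assumes the column header is spelled TX-DRP as in the output above.

```shell
# Print "<iface> <tx-drp>" for every interface whose TX-DRP counter is non-zero.
# Reads `netstat -i` output on stdin and finds the column by its header name,
# so it does not depend on the column position.
tx_drops() {
  awk '
    $1 == "Iface" { for (i = 1; i <= NF; i++) if ($i == "TX-DRP") col = i; next }
    col && $col + 0 > 0 { print $1, $col }
  '
}

# Possible usage inside the overlay namespace (left commented out):
# ip netns exec 1-731d1b63b3 netstat -i | tx_drops
```

Fed the table captured above, this reports only vxlan1, which matches the reading that the frames are dropped in the vxlan driver.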
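Back to the problem. From the above we know the data is dropped in the vxlan driver. So what does the driver do? Put simply, the vxlan driver looks up the fdb table to decide where to forward, and the table looks like this:
[root@master ~]# ip netns exec 1-731d1b63b3 bridge fdb
12:54:57:62:92:74 dev vxlan1 vlan 1 master br0 permanent
12:54:57:62:92:74 dev vxlan1 master br0 permanent
To explain: 12:54:57:62:92:74 is the MAC address of vxlan1, which is also a port of bridge br0.
First entry: the destination MAC is associated with the vxlan1 interface (you can simply regard this MAC as belonging to that interface). The remaining fields qualify vxlan1: it is in VLAN 1 and is enslaved to bridge br0.
Compare this with the fdb table inside the namespace from when I built a VXLAN network by hand:
[root@minion ~]# ip netns exec ns200 bridge fdb
f2:4d:be:62:09:50 dev vxlan20 vlan 1 master br-vx2 permanent
f2:4d:be:62:09:50 dev vxlan20 master br-vx2 permanent
00:00:00:00:00:00 dev vxlan20 dst 188.131.210.113 via ifindex 2 link-netnsid 0 self permanent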
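The difference between the two tables is the entry carrying a dst field, which names the remote tunnel endpoint. A small filter makes that comparison mechanical; the helper name `fdb_remotes` is made up for this sketch, and it assumes the remote address is the token right after the word dst, as in the output above.

```shell
# Print "<mac> <remote-ip>" for every fdb entry that has a "dst" field.
# Reads `bridge fdb` output on stdin; entries without "dst" are skipped.
fdb_remotes() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "dst") print $1, $(i + 1) }'
}

# Possible usage (left commented out):
# ip netns exec ns200 bridge fdb | fdb_remotes
```

On the hand-built table this prints the all-zeros entry with 188.131.210.113, and on the Docker-created table it prints nothing, so the vxlan driver there has no remote endpoint to send to.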
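round1: manually add a default entry to the fdb table
Compared with the VXLAN tunnel I emulated earlier with namespaces, this table is missing the default (all-zeros) entry, so the data does not know where to exit. It should leave through the real physical interface eth0, so I tried adding the entry by hand on both nodes: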
ip netns exec 1-731d1b63b3 bridge fdb add 00:00:00:00:00:00 dev vxlan1 dst 106.13.146.3
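Result:
It still does not work. The data still has nowhere to go, because no egress interface is specified, and an fdb entry cannot name an interface that lives outside the current namespace, which left me speechless....
round2: create a new namespace to replace the one Docker created for the VXLAN connection, then add the containers' interfaces to the bridge in that namespace
---------------------------unfinished, to be continued--------------------------------------------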
============================================
The gossip protocol used for node discovery only starts once swarm is enabled.
vi /usr/lib/systemd/system/docker.service
systemctl daemon-reload
systemctl restart docker
# docker swarm init --advertise-addr=188.131.210.113:2377
Swarm initialized: current node (m5jpgmwxow5ec256vw8bpgxi9) is now a manager.
To add a worker to this swarm, run the following command:
docker swarm join \
--token SWMTKN-1-5vpumm8bssl8tsa7aq9bc9ytpgxdob5rp5dm0y4b8zed3ef5e9-eckhrx62u9vb0o95xxam98qjc \
188.131.210.113:2377
Pitfall 1: the port was previously set to 2375, so when the node tried to join the master the following happened:
[root@minion ~]# docker swarm join --token SWMTKN-1-4dt7opomolsz9q2kdykheknj2cbmj8sgydxcljl99h07ob9dtj-0acex2ahyx98wm8fuc6rt0k4i 188.131.210.113:2375
Error response from daemon: rpc error: code = 14 desc = grpc: the connection is unavailable
The reason is that swarm expects specific ports. Quoting the Docker overlay networking tutorial:
For this test, you need two different Docker hosts that can communicate with each other. Each host must have Docker 17.06 or higher with the following ports open between the two Docker hosts:
TCP port 2377 for cluster management communications
TCP and UDP port 7946 for communication among nodes
UDP port 4789 for overlay network traffic
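Here is how these ports work:
Port 2377
Used for cluster management traffic. For example, when a node wants to join the cluster, it sends its join request to this port on the master node.
Port 7946
Used for communication among cluster nodes. A node first sends a message to its peer over tcp:7946; a correct reply means the path is clear and the two can stay in contact.
It then uses udp:7946 to spread the network information it needs to share. (The daemon logs further down in these notes show the periodic bulk sync itself going over TCP on the same port.)
As a result, a network created on the master can be discovered by the other nodes.
--wxy: this is presumably the so-called network control plane; does it take the place of a k-v store such as etcd?
Port 4789
Look familiar? That's right, it is the default port used by VXLAN, i.e. the one that carries the overlay network's data. It is the port used by the VTEPs at the two ends of a VXLAN tunnel, and containers on different hosts talk to each other through it.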
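The port list above translates directly into firewall rules. The sketch below only prints the iptables commands rather than running them, so they can be reviewed first; the helper name `swarm_port_rules` is made up, and appending ACCEPT rules to the INPUT chain is an assumption about the host's firewall layout, mirroring the single rule used later in these notes.

```shell
# Print one iptables command per port that swarm overlay networking needs:
# TCP 2377 (cluster management), TCP and UDP 7946 (node-to-node), UDP 4789 (VXLAN).
# Nothing is executed; pipe the output to `sh` as root only after checking it.
swarm_port_rules() {
  for spec in tcp:2377 tcp:7946 udp:7946 udp:4789; do
    proto=${spec%%:*}   # part before the colon
    port=${spec##*:}    # part after the colon
    echo "iptables -A INPUT -p $proto --dport $port -j ACCEPT"
  done
}

# Possible usage (left commented out):
# swarm_port_rules          # review
# swarm_port_rules | sh     # apply
```

If the hosts sit behind a cloud security group, the same ports would need to be opened there as well; that part is outside this sketch.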
Fix: change the port number to 2377.
[root@master ~]# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
m5jpgmwxow5ec256vw8bpgxi9 * master.wxy Ready Active Leader
z50ektieet4esj1gwlfaisdc4 minion.wxy Ready Active
[root@master ~]# docker network ls
NETWORK ID NAME DRIVER SCOPE
8808070a27e1 bridge bridge local
e0407f6da9d8 docker_gwbridge bridge local
e866d30f43bf host host local
7j887qknji9s ingress overlay swarm
424fac469906 none null local
[root@master ~]#
[root@master ~]# docker network create -d overlay nginx-net
s8o2cknp3hc8l9cs5uqm4gskd
[root@master ~]# docker network ls
NETWORK ID NAME DRIVER SCOPE
8808070a27e1 bridge bridge local
e0407f6da9d8 docker_gwbridge bridge local
e866d30f43bf host host local
7j887qknji9s ingress overlay swarm
s8o2cknp3hc8 nginx-net overlay swarm
424fac469906 none null local
docker network create --driver=overlay --attachable ov-test
[root@master 5.0.10-1.el7.elrepo.x86_64]# ip netns exec 1-po3p8i2id3 bridge fdb show dev vxlan1
02:42:0a:00:00:04 master br0
9e:7e:18:eb:71:77 vlan 1 master br0 permanent
9e:7e:18:eb:71:77 master br0 permanent
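02:42:0a:00:00:04 dst 172.16.0.4 link-netnsid 0 self permanent ---important
Notes:
The peer's MAC address has already been learned through the network control plane, but the destination address is wrong: it must not be the peer's private address (172.16.0.4 is exactly the minion's private address). I need to find a way to change it. Could the address I used when joining the swarm be wrong? Let me change it.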
[root@master ~]# docker network inspect ov-test
[
{
"Name": "ov-test",
"Id": "po3p8i2id3f9thvd7b8qiuu1o",
"Created": "2019-05-19T19:12:30.411636993+08:00",
"Scope": "swarm",
"Driver": "overlay",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "10.0.0.0/24",
"Gateway": "10.0.0.1"
}
]
},
"Internal": false,
"Attachable": true,
"Containers": {
"18d6f2f74b8d615f67a2c270f846102bc043d9ee63514a1edaa1087964e0486f": {
"Name": "centos",
"EndpointID": "65e8f927972704b27e19647040d5abb999d7142f327a26234ff01775fdf95991",
"MacAddress": "02:42:0a:00:00:02",
"IPv4Address": "10.0.0.2/24",
"IPv6Address": ""
}
},
"Options": {
"com.docker.network.driver.overlay.vxlanid_list": "4097"
},
"Labels": {},
"Peers": [
{
"Name": "master.wxy-61c0f37910dd",
"IP": "188.131.210.113"
},
{
"Name": "minion.wxy-362fc1004972",
"IP": "172.16.0.4"
}
]
}
]
[root@minion ~]# docker network inspect ov-test
[
{
"Name": "ov-test",
"Id": "po3p8i2id3f9thvd7b8qiuu1o",
"Created": "2019-05-19T19:13:41.541163575+08:00",
"Scope": "swarm",
"Driver": "overlay",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "10.0.0.0/24",
"Gateway": "10.0.0.1"
}
]
},
"Internal": false,
"Attachable": true,
"Containers": {
"12459f10b7f68e5eba856a446c46a24bce7d2a7f75b7db0cbd896bc72772455b": {
"Name": "nginx",
"EndpointID": "8ace966fe08b40b75f1e67d7c7a77e0e99abb3b5f9dc56f6df5c45b695698a85",
"MacAddress": "02:42:0a:00:00:04",
"IPv4Address": "10.0.0.4/24",
"IPv6Address": ""
}
},
"Options": {
"com.docker.network.driver.overlay.vxlanid_list": "4097"
},
"Labels": {},
"Peers": [
{
"Name": "master.wxy-61c0f37910dd",
"IP": "188.131.210.113"
},
{
"Name": "minion.wxy-362fc1004972",
"IP": "172.16.0.4"
}
]
}
]
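Both Peers lists above report the minion as 172.16.0.4, a private (RFC 1918) address that the master cannot reach across the public network. A quick check for that situation can be sketched as follows; the helper name `is_private_ipv4` is made up, and it only covers the three RFC 1918 ranges, not loopback or link-local addresses.

```shell
# Succeed (exit status 0) if the IPv4 address falls in an RFC 1918 private range:
# 10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16. Input validation is not attempted.
is_private_ipv4() {
  case "$1" in
    10.*|192.168.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
    *) return 1 ;;
  esac
}

# Possible usage on the peer addresses shown above:
# for ip in 188.131.210.113 172.16.0.4; do
#   is_private_ipv4 "$ip" && echo "$ip is private; set --advertise-addr on that node"
# done
```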
After the fix:
docker swarm init --advertise-addr=188.131.210.113:2377
docker swarm join \
--token SWMTKN-1-4m6s7rs3hkhh4bhdv78gymjc7bbq663cfw6qy4yh9ysja2nuvb-bknjkrcu52tvo50dl9ermvxnr \
--advertise-addr 106.13.146.3:2377 \
188.131.210.113:2377
docker swarm join \
--token SWMTKN-1-1qvdgad5v0q5jafdw2ym20lcffi2cd0d4fsuxlr9p5dg8ogqrp-48ti1h5gccibyqx0phzboanq5 \
--advertise-addr 106.13.146.3:2377 \
188.131.210.113:2377
docker network create --driver=overlay --attachable ov-test
[root@master ~]# docker run -d -ti --network=ov-test2 --name=centos2 centos:wxy /bin/sh
[root@minion ~]# docker run -d --network=ov-test2 --name=nginx2 nginx
journalctl -u docker.service
1. First error: the node's IP address was configured incorrectly
evel=error msg="periodic bulk sync failure for network 5s2fbdra2g5m9qt14wmiveir8: bulk sync failed on node minion.wxy-611e1a90a99f: failed to send a TCP message during bulk sync: dial tcp 106.13.146.31:7946: connect: connection refused"
2. Second error: the node did not specify an advertise IP when joining the cluster, so during sync the master tried to reach the node's private IP
vel=error msg="Error in responding to bulk sync from node 172.16.0.4: failed to send a TCP message during bulk sync: dial tcp 172.16.0.4:7946: i/o timeout"
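Both errors name the address the daemon tried to dial, which is the quickest clue to a wrong advertise address. A small filter can pull those addresses out of the log; the helper name `dial_targets` is made up, and the pattern assumes the "dial tcp <ip>:<port>" wording seen in the two messages above.

```shell
# Print each distinct "<ip>:<port>" that appears after "dial tcp" in the input.
# Reads log text on stdin; lines without a match produce no output.
dial_targets() {
  grep -oE 'dial tcp [0-9.]+:[0-9]+' | awk '{ print $3 }' | sort -u
}

# Possible usage (left commented out):
# journalctl -u docker.service | dial_targets
```

Any address in the output that is not the peer's reachable public address points at a misconfigured or missing --advertise-addr.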
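3. Problem 2: the network is not shared
Diagnosis: the master sent tcp:7946 and got a reply, then sent udp:7946, which the node also received. However, the payload was small, so my guess is that the network information was not being packed into it.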
Server Version: 1.13.1
iptables -A INPUT -p udp --dport 4789 -j ACCEPT
================
Enable swarm on the minion node
docker swarm init --advertise-addr=106.13.146.3:2377
The iptables rule and the container port conflict.
Try:
sudo iptables -t nat -L -n --line-numbers | grep 7946
sudo iptables -t nat -D DOCKER 6926
===A look at the namespaces=======================
[root@minion ~]# ip addr
3: docker_gwbridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:93:81:c9:8d brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 scope global docker_gwbridge
valid_lft forever preferred_lft forever
inet6 fe80::42:93ff:fe81:c98d/64 scope link
valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:d5:cc:f2:db brd ff:ff:ff:ff:ff:ff
inet 10.0.78.1/24 scope global docker0
valid_lft forever preferred_lft forever
10: vethd59e050@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker_gwbridge state UP group default
link/ether 22:8d:34:2b:9a:c6 brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet6 fe80::208d:34ff:fe2b:9ac6/64 scope link
valid_lft forever preferred_lft forever
[root@minion ~]# ip netns exec ingress_sbox ip addr
7: eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
inet 10.255.0.4/16 scope global eth0 ---sandbox container (network namespace)
9: eth1@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
inet 172.17.0.2/16 scope global eth1
[root@minion ~]# ip netns exec 1-ystd5xxiui ip addr
2: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
link/ether aa:cb:0d:a4:a6:b9 brd ff:ff:ff:ff:ff:ff
inet 10.255.0.1/16 scope global br0
6: vxlan1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master br0 state UNKNOWN group default
link/ether aa:cb:0d:a4:a6:b9 brd ff:ff:ff:ff:ff:ff link-netnsid 0
8: veth2@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master br0 state UP group default
link/ether d2:04:e3:eb:fc:52 brd ff:ff:ff:ff:ff:ff link-netnsid 1
[root@minion ~]# ip netns
1-ystd5xxiui (id: 0)
ingress_sbox (id: 1)
The master looks the same.
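The overlay namespaces seen in these notes (1-731d1b63b3, 1-po3p8i2id3, 1-ystd5xxiui) all look like "1-" followed by the first ten characters of the network ID. Assuming that pattern holds, the namespace name can be derived from the ID instead of being read off ip netns; the helper name `overlay_ns_name` is made up, and the pattern is an observation from this setup rather than a documented guarantee.

```shell
# Derive the overlay namespace name from a Docker network ID.
# Assumption: the name is "1-" plus the first 10 characters of the ID,
# as observed for the three networks in these notes.
overlay_ns_name() {
  printf '1-%.10s\n' "$1"
}

# Possible usage (left commented out):
# ip netns exec "$(overlay_ns_name ystd5xxiuiuqjbpn76bbzcaws)" bridge fdb show dev vxlan1
```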
[root@minion ~]# docker network inspect ingress
[
{
"Name": "ingress",
"Id": "ystd5xxiuiuqjbpn76bbzcaws",
"Created": "2019-05-20T18:35:25.014017659+08:00",
"Scope": "swarm",
"Driver": "overlay",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "10.255.0.0/16",
"Gateway": "10.255.0.1"
}
]
},
"Internal": false,
"Attachable": false,
"Containers": {
"ingress-sbox": {
"Name": "ingress-endpoint",
"EndpointID": "081aaac8d188d9cb4c190bbb5863be933dcc2b98cde071dce9b035c4ea6df957",
"MacAddress": "02:42:0a:ff:00:04",
"IPv4Address": "10.255.0.4/16", -----note this IP
"IPv6Address": ""
}
},
"Options": {
"com.docker.network.driver.overlay.vxlanid_list": "4096"
},
"Labels": {},
"Peers": [
{
"Name": "master.wxy-b33a341ba33b",
"IP": "188.131.210.113"
},
{
"Name": "minion.wxy-47481bf33513",
"IP": "106.13.146.3"
}
]
}
]
[root@master ~]# docker network inspect ingress
[
{
"Name": "ingress",
"Id": "ystd5xxiuiuqjbpn76bbzcaws",
"Created": "2019-05-20T18:34:27.512773865+08:00",
"Scope": "swarm",
"Driver": "overlay",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "10.255.0.0/16",
"Gateway": "10.255.0.1"
}
]
},
"Internal": false,
"Attachable": false,
"Containers": {
"ingress-sbox": {
"Name": "ingress-endpoint",
"EndpointID": "77cc852efe15111e141ae78ee9c29ccf88637599068044677d64f0107f5db78a",
"MacAddress": "02:42:0a:ff:00:03",
"IPv4Address": "10.255.0.3/16",
"IPv6Address": ""
}
},
"Options": {
"com.docker.network.driver.overlay.vxlanid_list": "4096"
},
"Labels": {},
"Peers": [
{
"Name": "master.wxy-b33a341ba33b",
"IP": "188.131.210.113"
},
{
"Name": "minion.wxy-47481bf33513",
"IP": "106.13.146.3"
}
]
}
]
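Notes: ingress-sbox is in effect a container; let's call it the sandbox container. At a deeper level it is a namespace named ingress_sbox, so, as I understand it, a container that later joins this network actually shares this (network-type) namespace. Each sandbox container is also paired with one more namespace that holds the VXLAN pieces, i.e. the namespace that provides the overlay network's connectivity.
Can the two ends of this network reach each other? Let's verify.
master's sandbox: 10.255.0.3
minion's sandbox: 10.255.0.4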
[root@minion ~]# ip netns exec ingress_sbox ping 10.255.0.3
PING 10.255.0.3 (10.255.0.3) 56(84) bytes of data.
64 bytes from 10.255.0.3: icmp_seq=1 ttl=64 time=9.64 ms
64 bytes from 10.255.0.3: icmp_seq=2 ttl=64 time=9.55 ms
64 bytes from 10.255.0.3: icmp_seq=3 ttl=64 time=9.75 ms
[root@minion ~]# ip netns exec 1-ystd5xxiui bridge fdb show dev vxlan1
02:42:0a:ff:00:03 master br0
aa:cb:0d:a4:a6:b9 vlan 1 master br0 permanent
aa:cb:0d:a4:a6:b9 master br0 permanent
02:42:0a:ff:00:03 dst 188.131.210.113 link-netnsid 0 self permanent --this is exactly the master sandbox's entry
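Every MAC address in these notes appears to be 02:42 followed by the container's four IPv4 octets in hex (10.255.0.3 gives 02:42:0a:ff:00:03), which is how the fdb entry can be matched to the master's sandbox at a glance. A sketch of that mapping follows; the helper name `ip_to_docker_mac` is made up, and the pattern is an observation from the outputs here, not something to rely on when a MAC is set explicitly.

```shell
# Convert an IPv4 address to the MAC pattern observed in these notes:
# 02:42 followed by the four octets in two-digit hex.
# The body runs in a subshell so the IFS change does not leak to the caller.
ip_to_docker_mac() (
  IFS=.
  set -- $1   # split the dotted quad into $1..$4
  printf '02:42:%02x:%02x:%02x:%02x\n' "$1" "$2" "$3" "$4"
)

# Possible usage (left commented out):
# ip netns exec 1-ystd5xxiui bridge fdb show dev vxlan1 | grep "$(ip_to_docker_mac 10.255.0.3)"
```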
Step 2: start a new container and attach it to the ingress network
/usr/bin/docker-current: Error response from daemon: Could not attach to network ingress: rpc error: code = 7 desc = network ingress not manually attachable.
Step 3: