现有集群是docker默认的bridge网络模型,不支持跨节点通信。因此部署网络插件calico. 另外需要把kubelet的网络模型改成cni(--network-plugin=cni).calico官网(https://docs.projectcalico.org/getting-started/kubernetes/self-managed-onprem/onpremises)给出的安装步骤如下:
- Download the Calico networking manifest for the Kubernetes API datastore.
curl https://docs.projectcalico.org/manifests/calico.yaml -O
- 修改CALICO_IPV4POOL_CIDR 字段为你所要使用的网段
- 按需客制化manifest
- CALICO_DISABLE_FILE_LOGGING 默认为true,表示除了cni的log都通过kubectl logs打印;如果想在/var/log/calico/目录下的文件查看log,需要把该值设为false.并且需要共享主机目录/var/log/calico
- BGP_LOGSEVERITYSCREEN 设置log level,默认为info. 还可以设为debug,error等。
- FELIX_LOGSEVERITYSCREEN 设置felix的log level
- Apply the manifest using the following command.
kubectl apply -f calico.yaml
但是在最后一步时,calico-kube-controllers容器起不来,同时calico-node容器也一直在重启。查看calico-kube-controllers的logs,如下所示:
2020-09-29 09:39:55.356 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
W0929 09:39:55.359900 1 client_config.go:543] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2020-09-29 09:39:55.362 [INFO][1] main.go 109: Ensuring Calico datastore is initialized
2020-09-29 09:39:55.372 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get "https://10.0.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": x509: certificate is valid for 127.0.0.1, 172.171.19.210, not 10.0.0.1
2020-09-29 09:39:55.373 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get "https://10.0.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": x509: certificate is valid for 127.0.0.1, 172.171.19.210, not 10.0.0.1
判断是kubeconfig未配置好,但是我不清楚它默认的kubeconfig是从哪里读的,因此直接修改yaml文件中关于calico-kube-controllers容器的配置,用挂载卷的方式从主机的 /root/.kube/目录下读取配置文件,从主机的/opt/kubernetes/ssl目录下读取Etcd认证文件(注意该目录下要有文件),显示配置KUBECONFIG如下所示:
containers:
- name: calico-kube-controllers
image: calico/kube-controllers:v3.16.1
volumeMounts:
- mountPath: /test-pd
name: test-volume
- mountPath: /opt/kubernetes/ssl
name: test-etcd
env:
# Choose which controllers to run.
- name: ENABLED_CONTROLLERS
value: node
- name: DATASTORE_TYPE
value: kubernetes
- name: KUBECONFIG
value: /test-pd/config
readinessProbe:
exec:
command:
- /usr/bin/check-status
- -r
volumes:
- name: test-volume
hostPath:
# directory location on host
path: /root/.kube/
- name: test-etcd
hostPath:
path: /opt/kubernetes/ssl/
重新创建后calico-kube-controller可正确启动,但这是看calico-node仍然不停重启,查看log如下所示:
2020-09-30 01:43:32.539 [INFO][8] startup/startup.go 361: Early log level set to info
2020-09-30 01:43:32.539 [INFO][8] startup/startup.go 377: Using NODENAME environment for node name
2020-09-30 01:43:32.540 [INFO][8] startup/startup.go 389: Determined node name: k8s-node1
2020-09-30 01:43:32.543 [INFO][8] startup/startup.go 421: Checking datastore connection
2020-09-30 01:43:32.552 [INFO][8] startup/startup.go 436: Hit error connecting to datastore - retry error=Get "https://10.0.0.1:443/api/v1/nodes/foo": x509: certificate is valid for 127.0.0.1, 172.171.19.210, not 10.0.0.1
calico node工作节点启动找不到apiserver的地址,检查一下calico的配置文件,要把apiserver的IP和端口配置上,如果不配置的话,calico默认将设置默认的calico网段和443端口。字段名:KUBERNETES_SERVICE_HOST、KUBERNETES_SERVICE_PORT、KUBERNETES_SERVICE_PORT_HTTPS如下:
再重新创建,查看log,运行正常。
修改kubernetes数据存储类型为etcdv3
从官网上下载的Calico.yaml对calico-node和calico-kube-controller的数据存储类型定义如下,如果屏蔽改值那么使用默认值etcdv3.但是数据存储在kubernetes上不方便查看,因此改为etcdv3. (官方推荐使用k8s数据存储)
env:
# Use Kubernetes API as the backing datastore.
- name: DATASTORE_TYPE
value: "kubernetes"
但是使用该方法需要配置证书等,我一直没有配置成功。其实calico已经提供了一种便捷的使用etcd数据库的方法,并且官网也有使用etcd数据库的模板YAML文件。步骤如下:
下载etcd数据存储类型的calico yaml文件
curl https://docs.projectcalico.org/v3.16/manifests/calico-etcd.yaml -o calico-etcd.yaml
生成密钥
- mkdir /opt/calico
- cp -fr /opt/etcd/ssl /opt/calico/
- cd /opt/calico/ssl
- cat server.pem | base64 -w 0 > etcd-cert
- cat server-key.pem | base64 -w 0 > etcd-key
- cat ca.pem | base64 -w 0 > etcd-ca
把密钥填写到calico-etcd.yaml文件中
将上述base64加密的字符串修改至文件中声明:ca.pem对应etcd-ca、server-key.pem对应etcd-key、server.pem对应etcd-cert;修改etcd证书的位置(我没修改,就用的默认值,不知道为啥也可以);修改etcd的连接地址(与api-server中配置/opt/kubernetes/cfg/kube-apiserver.conf中相同)
# vim calico-etcd.yaml
...
apiVersion: v1
kind: Secret
type: Opaque
metadata:
name: calico-etcd-secrets
namespace: kube-system
data:
# Populate the following with etcd TLS configuration if desired, but leave blank if
# not using TLS for etcd.
# The keys below should be uncommented and the values populated with the base64
# encoded contents of each file that would be associated with the TLS data.
# Example command for encoding a file contents: cat <file> | base64 -w 0
etcd-key: 填写上面的加密字符串
etcd-cert: 填写上面的加密字符串
etcd-ca: 填写上面的加密字符串
...
kind: ConfigMap
apiVersion: v1
metadata:
name: calico-config
namespace: kube-system
data:
# Configure this with the location of your etcd cluster.
etcd_endpoints: "https://192.168.1.2:2379"
# If you're using TLS enabled etcd uncomment the following.
# You must also populate the Secret below with these files.
etcd_ca: "/calico-secrets/etcd-ca" #这三个值不需要修改
etcd_cert: "/calico-secrets/etcd-cert" #这三个值不需要修改
etcd_key: "/calico-secrets/etcd-key" #这三个值不需要修改
重新创建Calico相关资源
kubectl delete -f calico.yaml
kubectl create -f calico-etcd.yaml
修改/root/.bashrc,添加如下一行
alias etcdctl='ETCDCTL_API=3 etcdctl --endpoints https://192.168.1.2:2379 --cacert /opt/etcd/ssl/ca.pem --key /opt/etcd/ssl/server-key.pem --cert /opt/etcd/ssl/server.pem'
并执行命令source ~/.bashrc
验证calico数据是否在Etcd上
创建一个pod,并在etcd上查找给pod,可以看到的以/calico/resources/v3/打头的pod信息。
安装calicoctl工具
(参考https://docs.projectcalico.org/getting-started/clis/calicoctl/install)
(1) curl -O -L https://github.com/projectcalico/calicoctl/releases/download/v3.16.1/calicoctl 放到/usr/local/bin/目录下,chmod +x
(2) 配置Etcd: cat << EOF > /etc/calico/calicoctl.cfg
apiVersion: projectcalico.org/v3
kind: CalicoAPIConfig
metadata:
spec:
datastoreType: "kubernetes"
kubeconfig: "/root/.kube/config"
EOF
在非集群节点以容器的形式安装calico-node
- 创建/etc/calico/calico.env 配置文件
# cat /etc/calico/calico.env
CALICO_NODENAME=""
CALICO_K8S_NODE_REF="192-168-1-210"
CALICO_IPV4POOL_IPIP="Always"
CALICO_IP=""
CALICO_IP6=""
CALICO_NETWORKING_BACKEND="bird"
DATASTORE_TYPE="etcdv3"
ETCD_ENDPOINTS="https://xxx1:2379,https://xxx2:2379,https:/xxx3:2379"
ETCD_CA_CERT_FILE="/etc/calico/pki/etcd-ca"
ETCD_CERT_FILE="/etc/calico/pki/etcd-cert"
ETCD_KEY_FILE="/etc/calico/pki/etcd-key"
KUBERNETES_SERVICE_HOST="192.168.1.130"
KUBERNETES_SERVICE_PORT="6443"
KUBECONFIG="/etc/calico/config"
WAIT_FOR_DATASTORE="true"
BGP_LOGSEVERITYSCREEN="info"
- 创建calico-node守护进程配置文件
# cat lib/systemd/system/calico-node.service
[Unit]
Description=calico-node
After=docker.service
Requires=docker.service
[Service]
EnvironmentFile=/etc/calico/calico.env
ExecStartPre=-/usr/bin/docker rm -f calico-node
ExecStart=/usr/bin/docker run --net=host --privileged \
--name=calico-node \
-e NODENAME=${CALICO_NODENAME} \
-e IP=${CALICO_IP} \
-e IP6=${CALICO_IP6} \
-e CALICO_NETWORKING_BACKEND=${CALICO_NETWORKING_BACKEND} \
-e AS=${CALICO_AS} \
-e CALICO_IPV4POOL_IPIP=${CALICO_IPV4POOL_IPIP} \
-e DATASTORE_TYPE=${DATASTORE_TYPE} \
-e ETCD_ENDPOINTS=${ETCD_ENDPOINTS} \
-e ETCD_CA_CERT_FILE=${ETCD_CA_CERT_FILE} \
-e ETCD_CERT_FILE=${ETCD_CERT_FILE} \
-e ETCD_KEY_FILE=${ETCD_KEY_FILE} \
-e KUBERNETES_SERVICE_HOST=${KUBERNETES_SERVICE_HOST} \
-e KUBERNETES_SERVICE_PORT=${KUBERNETES_SERVICE_PORT} \
-e KUBECONFIG=${KUBECONFIG} \
-e WAIT_FOR_DATASTORE=${WAIT_FOR_DATASTORE} \
-e BGP_LOGSEVERITYSCREEN=${BGP_LOGSEVERITYSCREEN} \
-v /var/log/calico:/var/log/calico \
-v /run/docker/plugins:/run/docker/plugins \
-v /lib/modules:/lib/modules \
-v /var/run/calico:/var/run/calico \
-v /etc/calico:/etc/calico \
-v /var/lib/calico:/var/lib/calico \
calico/node:v3.16.5
ExecStop=-/usr/bin/docker stop calico-node
Restart=on-failure
StartLimitBurst=3
StartLimitInterval=60s
[Install]
WantedBy=multi-user.target
然后systemctl daemon-reload, systemctl enable calico-node, systemctl start calico-node
其他常见问题:
- 新加节点后需不需要额外部署calico?
答:不需要,因为这里calico的部署方式是daemonset,当有新节点加进来时,会启动一个calico-node pod. - calico-node一直处于Init状态,是怎么回事?
答:应该是连接不了外网,导致镜像下载不了。 - 如果是多节点集群,多数节点的calico-node都是ready的,只有个别calico-node Ready的个数为“0/1”,再用命令calicoctl node status查看节点间连接建立的情况,如果连接没有建立,查看各个需要建立连接网卡的名称是否一致,如果不一致需要改成一致。当然这些网卡的IP地址需要在同一个网段。如果是多个网卡,可以指定网卡,如下所示:
# IP automatic detection
- name: IP_AUTODETECTION_METHOD
value: "interface=eth2"
另外,还有一种指定IP地址段的方法更加方便,而且一般集群中的业务网卡都在同一个网段。如下设置:
IP_AUTODETECTION_METHOD=cidr=10.0.1.0/24,10.0.2.0/24
IP6_AUTODETECTION_METHOD=cidr=2001:4860::0/64
- 如果calico-node容器显示Init:CrashLoopBackOff,说明初始化失败,那么使用命令kubectl describe 该容器,查看是initContainer中哪个步骤出错了。如下图所示是kubectl describe 初始化失败的容器的InitContainers的部分显示
Init Containers:
upgrade-ipam:
Container ID: docker://caaa485d0880c1cb022873c5017ec60ba1970ed8dc897a0b458fa6bb4b6b4179
Image: 192.168.3.224:5000/library/calico/cni:v3.18.1
Image ID: docker-pullable://192.168.3.224:5000/library/calico/cni@sha256:bc6507d4c122b69609fed5839d899569a80b709358836dd9cd1052470dfdd47a
Port: <none>
Host Port: <none>
Command:
/opt/cni/bin/calico-ipam
-upgrade
State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 27 Oct 2021 16:16:15 +0800
Finished: Wed, 27 Oct 2021 16:16:15 +0800
Ready: True
Restart Count: 0
Environment Variables from:
kubernetes-services-endpoint ConfigMap Optional: true
Environment:
KUBERNETES_NODE_NAME: (v1:spec.nodeName)
CALICO_NETWORKING_BACKEND: <set to the key 'calico_backend' of config map 'calico-config'> Optional: false
Mounts:
/host/opt/cni/bin from cni-bin-dir (rw)
/var/lib/cni/networks from host-local-net-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-vz89c (ro)
install-cni:
Container ID: docker://87bfc3cea39247d10796148bf88f94a552d327bf3038f87f4e981feb02393cb8
Image: 192.168.3.224:5000/library/calico/cni:v3.18.1
Image ID: docker-pullable://192.168.3.224:5000/library/calico/cni@sha256:bc6507d4c122b69609fed5839d899569a80b709358836dd9cd1052470dfdd47a
Port: <none>
Host Port: <none>
Command:
/opt/cni/bin/install
State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 27 Oct 2021 16:16:16 +0800
Finished: Wed, 27 Oct 2021 16:16:17 +0800
Ready: True
Restart Count: 0
Environment Variables from:
kubernetes-services-endpoint ConfigMap Optional: true
Environment:
CNI_CONF_NAME: 10-calico.conflist
CNI_NETWORK_CONFIG: <set to the key 'cni_network_config' of config map 'calico-config'> Optional: false
KUBERNETES_NODE_NAME: (v1:spec.nodeName)
CNI_MTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
SLEEP: false
Mounts:
/host/etc/cni/net.d from cni-net-dir (rw)
/host/opt/cni/bin from cni-bin-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-vz89c (ro)
flexvol-driver:
Container ID: docker://d095ec7e2ca15c90e234b207890caec380e0ae1491556e4b61f58e0db0e0df00
Image: 192.168.3.224:5000/library/calico/pod2daemon-flexvol:v3.18.1
Image ID: docker-pullable://192.168.3.224:5000/library/calico/pod2daemon-flexvol@sha256:4ac1844531e0592b2c609a0b0d2e8f740f4c66c7e27c7e5dda994dec98d7fb28
Port: <none>
Host Port: <none>
State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 27 Oct 2021 16:16:18 +0800
Finished: Wed, 27 Oct 2021 16:16:18 +0800
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/host/driver from flexvol-driver-host (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-vz89c (ro)
Containers:
可以看出容器初始化时用了三个容器,分别是upgrade-ipam、install-cni和flexvol-driver.其中upgrade-ipam容器的作用是查看是否有/var/lib/cni/networks/k8s-pod-network数据,如果有就把本地的ipam数据迁移到Calico-ipam;install-cni是cni-plugin项目里编译出来的一个二进制文件,用来拷贝二进制文件到各个主机的/opt/cni/bin下面的,并生成calico配置文件拷贝到/etc/cni/net.d下面;flexvol-driver使用的镜像是pod2daemon-flexvol,它的作用是 Adds a Flex Volume Driver that creates a per-pod Unix Domain Socket to allow Dikastes to communicate with Felix over the Policy Sync API.如果容器初试化错误,查看calico-node看不出问题,但可以通过查看这三个容器的log来分析,例如查看第二容器初始化install-cni的log:
kubectl logs -n kube-system calico-node-123xx -c install-cni
有时这个三个步骤有个别步骤出现系统级别无法排除的错误,在不影响功能的前提下可以删除这个容器初始化的部分yaml代码,或者手动完成初始化的功能,那么容器就能起来了。