Pitfalls encountered while deploying Calico in an existing cluster


The existing cluster uses Docker's default bridge network model, which does not support cross-node communication, so I deployed the Calico network plugin. The kubelet's network model also has to be switched to CNI (--network-plugin=cni). The Calico website (https://docs.projectcalico.org/getting-started/kubernetes/self-managed-onprem/onpremises) gives the following installation steps:
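A sketch of the kubelet change (the flag names are real dockershim-era kubelet flags; the two CNI directory paths shown are the conventional defaults and an assumption here — adjust them to your install, then restart kubelet):

```
--network-plugin=cni
--cni-conf-dir=/etc/cni/net.d
--cni-bin-dir=/opt/cni/bin
```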

  1. Download the Calico networking manifest for the Kubernetes API datastore.
curl https://docs.projectcalico.org/manifests/calico.yaml -O
  2. Set the CALICO_IPV4POOL_CIDR field to the pod network CIDR you want to use.
  3. Customize the manifest as needed:
    • CALICO_DISABLE_FILE_LOGGING defaults to true, meaning all logs except the CNI plugin's are viewed through kubectl logs. To also get log files under /var/log/calico/, set it to false and share the host directory /var/log/calico into the container.
    • BGP_LOGSEVERITYSCREEN sets the BGP log level; the default is info, and debug, error, etc. are also accepted.
    • FELIX_LOGSEVERITYSCREEN sets Felix's log level.
  4. Apply the manifest using the following command.
kubectl apply -f calico.yaml
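The log-related customizations from step 3 end up in the calico-node container's env: section. A sketch (the values shown are illustrative choices, not the manifest's shipped defaults):

```yaml
# In the calico-node container's env: section of calico.yaml.
- name: CALICO_DISABLE_FILE_LOGGING
  value: "false"        # also write log files under /var/log/calico
- name: FELIX_LOGSEVERITYSCREEN
  value: "info"
- name: BGP_LOGSEVERITYSCREEN
  value: "info"
```

With file logging enabled, the DaemonSet also needs a hostPath volume for /var/log/calico, as noted above.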

At the last step, however, the calico-kube-controllers container would not come up, and the calico-node containers kept restarting. The calico-kube-controllers logs looked like this:

2020-09-29 09:39:55.356 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
W0929 09:39:55.359900       1 client_config.go:543] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2020-09-29 09:39:55.362 [INFO][1] main.go 109: Ensuring Calico datastore is initialized
2020-09-29 09:39:55.372 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get "https://10.0.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": x509: certificate is valid for 127.0.0.1, 172.171.19.210, not 10.0.0.1
2020-09-29 09:39:55.373 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get "https://10.0.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": x509: certificate is valid for 127.0.0.1, 172.171.19.210, not 10.0.0.1
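The x509 error means the apiserver's serving certificate does not list 10.0.0.1 (the ClusterIP of the kubernetes service) among its subject alternative names. One way to see exactly which IPs a certificate covers is openssl's -ext option. The sketch below generates a throwaway certificate with the same SANs as in the error purely so the command is runnable anywhere; on a real cluster you would point the second command at the apiserver's serving certificate file instead:

```shell
# Throwaway cert mimicking the SANs from the error message.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/demo-key.pem -out /tmp/demo-cert.pem \
  -subj "/CN=kube-apiserver" \
  -addext "subjectAltName=IP:127.0.0.1,IP:172.171.19.210"

# Print the SAN list. If 10.0.0.1 is absent, in-cluster clients
# connecting to https://10.0.0.1:443 fail exactly as in the log above.
openssl x509 -in /tmp/demo-cert.pem -noout -ext subjectAltName
```

(-addext and -ext need OpenSSL 1.1.1 or later.) The lasting fix is to re-issue the apiserver certificate with the service ClusterIP in its SANs; the workaround taken here instead hands the controller an explicit kubeconfig.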

This pointed to a kubeconfig problem, but I wasn't sure where the controller reads its kubeconfig from by default, so I edited the calico-kube-controllers container spec in the YAML directly: mount the host's /root/.kube/ directory to read the kubeconfig, and the host's /opt/kubernetes/ssl directory for the etcd certificates (make sure the files actually exist there). The KUBECONFIG setup looks like this:

      containers:
        - name: calico-kube-controllers
          image: calico/kube-controllers:v3.16.1
          volumeMounts:
            - mountPath: /test-pd
              name: test-volume
            - mountPath: /opt/kubernetes/ssl
              name: test-etcd
          env:
            # Choose which controllers to run.
            - name: ENABLED_CONTROLLERS
              value: node
            - name: DATASTORE_TYPE
              value: kubernetes
            - name: KUBECONFIG
              value: /test-pd/config
          readinessProbe:
            exec:
              command:
                - /usr/bin/check-status
                - -r
      volumes:
        - name: test-volume
          hostPath:
            # directory location on host
            path: /root/.kube/
        - name: test-etcd
          hostPath:
            path: /opt/kubernetes/ssl/

After recreating the resources, calico-kube-controllers started correctly, but calico-node was still restarting continuously. Its log showed:

2020-09-30 01:43:32.539 [INFO][8] startup/startup.go 361: Early log level set to info
2020-09-30 01:43:32.539 [INFO][8] startup/startup.go 377: Using NODENAME environment for node name
2020-09-30 01:43:32.540 [INFO][8] startup/startup.go 389: Determined node name: k8s-node1
2020-09-30 01:43:32.543 [INFO][8] startup/startup.go 421: Checking datastore connection
2020-09-30 01:43:32.552 [INFO][8] startup/startup.go 436: Hit error connecting to datastore - retry error=Get "https://10.0.0.1:443/api/v1/nodes/foo": x509: certificate is valid for 127.0.0.1, 172.171.19.210, not 10.0.0.1

So calico-node cannot reach the apiserver either. The fix is to set the apiserver's address and port explicitly in the Calico manifest: if they are not configured, Calico falls back to the in-cluster default (the kubernetes service ClusterIP on port 443), which is exactly the address this cluster's certificate does not cover. The relevant fields are the environment variables KUBERNETES_SERVICE_HOST, KUBERNETES_SERVICE_PORT, and KUBERNETES_SERVICE_PORT_HTTPS.
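A sketch of those variables in the calico-node DaemonSet's env: section (the host here is an assumption taken from the certificate SANs in the log above, and 6443 is the usual secure port — substitute your apiserver's real address and port):

```yaml
# In the env: section of calico-node (and calico-kube-controllers).
- name: KUBERNETES_SERVICE_HOST
  value: "172.171.19.210"
- name: KUBERNETES_SERVICE_PORT
  value: "6443"
- name: KUBERNETES_SERVICE_PORT_HTTPS
  value: "6443"
```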

After recreating once more, the logs showed everything running normally.

Changing the datastore type from kubernetes to etcdv3

The calico.yaml downloaded from the official site defines the datastore type for calico-node and calico-kube-controllers as below; if the value is omitted, the default etcdv3 is used. Data stored through the Kubernetes API is inconvenient to inspect directly, so I switched to etcdv3 (note that the official recommendation is the Kubernetes datastore).

          env:
            # Use Kubernetes API as the backing datastore.
            - name: DATASTORE_TYPE
              value: "kubernetes"

But doing this by hand requires wiring up certificates and so on, and I never managed to get it working. In fact, Calico already provides a convenient way to use an etcd datastore, with a ready-made template YAML on the official site. The steps are as follows:

Download the etcd-datastore variant of the Calico YAML

curl https://docs.projectcalico.org/v3.16/manifests/calico-etcd.yaml -o calico-etcd.yaml

Base64-encode the etcd certificates

  • mkdir /opt/calico
  • cp -fr /opt/etcd/ssl /opt/calico/
  • cd /opt/calico/ssl
  • cat server.pem | base64 -w 0 > etcd-cert
  • cat server-key.pem | base64 -w 0 > etcd-key
  • cat ca.pem | base64 -w 0 > etcd-ca
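The -w 0 flag disables base64's default 76-column line wrapping, so each file becomes a single-line string — which is what the single-line YAML Secret values require. A quick sanity check of the flag:

```shell
# Without -w 0, base64 wraps long output across multiple lines, which
# would corrupt the single-line values pasted into the Secret below.
printf 'hello' | base64 -w 0            # -> aGVsbG8=
# Round trip: decoding must reproduce the original bytes.
printf 'hello' | base64 -w 0 | base64 -d
```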

Fill the encoded values into calico-etcd.yaml

Paste the base64-encoded strings produced above into the file: ca.pem goes into etcd-ca, server-key.pem into etcd-key, and server.pem into etcd-cert. Adjust the etcd certificate paths if needed — I left the defaults and it still worked, which makes sense because the /calico-secrets/ paths in the ConfigMap refer to where the Secret is mounted inside the containers, not to host paths. Finally, set the etcd connection address (the same endpoints configured for the apiserver in /opt/kubernetes/cfg/kube-apiserver.conf).

# vim calico-etcd.yaml
...
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: calico-etcd-secrets
  namespace: kube-system
data:
  # Populate the following with etcd TLS configuration if desired, but leave blank if
  # not using TLS for etcd.
  # The keys below should be uncommented and the values populated with the base64
  # encoded contents of each file that would be associated with the TLS data.
  # Example command for encoding a file contents: cat <file> | base64 -w 0
  etcd-key: (paste the base64 string generated above)
  etcd-cert: (paste the base64 string generated above)
  etcd-ca: (paste the base64 string generated above)
...
kind: ConfigMap
apiVersion: v1
metadata:
  name: calico-config
  namespace: kube-system
data:
  # Configure this with the location of your etcd cluster.
  etcd_endpoints: "https://192.168.1.2:2379"
  # If you're using TLS enabled etcd uncomment the following.
  # You must also populate the Secret below with these files.
  etcd_ca: "/calico-secrets/etcd-ca"       # leave these three values unchanged
  etcd_cert: "/calico-secrets/etcd-cert"   # leave these three values unchanged
  etcd_key: "/calico-secrets/etcd-key"     # leave these three values unchanged
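The manual paste can also be scripted. In the stock calico-etcd.yaml the Secret values ship as commented-out placeholders (`# etcd-key: null` and friends — verify the exact placeholder text in your copy), so one sed line per value does the substitution. The sketch below runs the pattern against a minimal stand-in fragment so it is self-contained; on the real file you would run the same sed line against calico-etcd.yaml with the etcd-key/etcd-cert/etcd-ca files from the previous step:

```shell
# Work in a fixed scratch directory so the demo is repeatable.
workdir=/tmp/calico-sed-demo
mkdir -p "$workdir" && cd "$workdir"

# Stand-ins for the real inputs: one base64 value and a Secret fragment
# containing the stock manifest's commented-out placeholder line.
printf 'demo-key-bytes' | base64 -w 0 > etcd-key
cat > fragment.yaml <<'EOF'
data:
  # etcd-key: null
EOF

# The substitution itself -- the same sed line works against the real
# calico-etcd.yaml for etcd-key, etcd-cert and etcd-ca in turn.
# ('|' is safe as the delimiter: it never appears in base64 output.)
sed -i "s|# etcd-key: null|etcd-key: $(cat etcd-key)|" fragment.yaml
cat fragment.yaml
```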

Recreate the Calico resources

kubectl delete -f calico.yaml
kubectl create -f calico-etcd.yaml

Edit /root/.bashrc and add the following line

alias etcdctl='ETCDCTL_API=3 etcdctl --endpoints https://192.168.1.2:2379 --cacert /opt/etcd/ssl/ca.pem --key /opt/etcd/ssl/server-key.pem --cert /opt/etcd/ssl/server.pem'
Then run source ~/.bashrc to load it.

Verify that Calico's data is in etcd

Create a pod, then look it up in etcd; its data appears under keys beginning with /calico/resources/v3/. For example, using the alias above:

etcdctl get /calico/resources/v3/ --prefix --keys-only

Install the calicoctl tool

(See https://docs.projectcalico.org/getting-started/clis/calicoctl/install)
(1) curl -O -L https://github.com/projectcalico/calicoctl/releases/download/v3.16.1/calicoctl, move the binary into /usr/local/bin/, and chmod +x it.
(2) Configure the datastore connection: cat << EOF > /etc/calico/calicoctl.cfg
apiVersion: projectcalico.org/v3
kind: CalicoAPIConfig
metadata:
spec:
  datastoreType: "kubernetes"
  kubeconfig: "/root/.kube/config"
EOF
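Note that this calicoctl.cfg points at the Kubernetes API datastore. Since the cluster above was switched to etcdv3, an etcd-backed config is more consistent; a sketch (the field names are the documented CalicoAPIConfig spec fields, while the endpoint and cert paths are this post's values — substitute yours):

```yaml
apiVersion: projectcalico.org/v3
kind: CalicoAPIConfig
metadata:
spec:
  datastoreType: "etcdv3"
  etcdEndpoints: "https://192.168.1.2:2379"
  etcdCACertFile: "/opt/calico/ssl/ca.pem"
  etcdCertFile: "/opt/calico/ssl/server.pem"
  etcdKeyFile: "/opt/calico/ssl/server-key.pem"
```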

Installing calico-node as a container on a node outside the cluster

  • Create the /etc/calico/calico.env configuration file
# cat /etc/calico/calico.env 
CALICO_NODENAME=""
CALICO_K8S_NODE_REF="192-168-1-210"
CALICO_IPV4POOL_IPIP="Always" 
CALICO_IP="" 
CALICO_IP6=""
CALICO_NETWORKING_BACKEND="bird"
DATASTORE_TYPE="etcdv3"
ETCD_ENDPOINTS="https://xxx1:2379,https://xxx2:2379,https://xxx3:2379"
ETCD_CA_CERT_FILE="/etc/calico/pki/etcd-ca"
ETCD_CERT_FILE="/etc/calico/pki/etcd-cert"
ETCD_KEY_FILE="/etc/calico/pki/etcd-key"
KUBERNETES_SERVICE_HOST="192.168.1.130"
KUBERNETES_SERVICE_PORT="6443"
KUBECONFIG="/etc/calico/config"
WAIT_FOR_DATASTORE="true"
BGP_LOGSEVERITYSCREEN="info"

  • Create the calico-node systemd unit file
# cat /lib/systemd/system/calico-node.service 
[Unit]
Description=calico-node
After=docker.service
Requires=docker.service

[Service]
EnvironmentFile=/etc/calico/calico.env
ExecStartPre=-/usr/bin/docker rm -f calico-node
ExecStart=/usr/bin/docker run --net=host --privileged \
 --name=calico-node \
 -e NODENAME=${CALICO_NODENAME} \
 -e IP=${CALICO_IP} \
 -e IP6=${CALICO_IP6} \
 -e CALICO_NETWORKING_BACKEND=${CALICO_NETWORKING_BACKEND} \
 -e AS=${CALICO_AS} \
 -e CALICO_IPV4POOL_IPIP=${CALICO_IPV4POOL_IPIP} \
 -e DATASTORE_TYPE=${DATASTORE_TYPE} \
 -e ETCD_ENDPOINTS=${ETCD_ENDPOINTS} \
 -e ETCD_CA_CERT_FILE=${ETCD_CA_CERT_FILE} \
 -e ETCD_CERT_FILE=${ETCD_CERT_FILE} \
 -e ETCD_KEY_FILE=${ETCD_KEY_FILE} \
 -e KUBERNETES_SERVICE_HOST=${KUBERNETES_SERVICE_HOST} \
 -e KUBERNETES_SERVICE_PORT=${KUBERNETES_SERVICE_PORT} \
 -e KUBECONFIG=${KUBECONFIG} \
 -e WAIT_FOR_DATASTORE=${WAIT_FOR_DATASTORE} \
 -e BGP_LOGSEVERITYSCREEN=${BGP_LOGSEVERITYSCREEN} \
 -v /var/log/calico:/var/log/calico \
 -v /run/docker/plugins:/run/docker/plugins \
 -v /lib/modules:/lib/modules \
 -v /var/run/calico:/var/run/calico \
 -v /etc/calico:/etc/calico \
 -v /var/lib/calico:/var/lib/calico \
 calico/node:v3.16.5

ExecStop=-/usr/bin/docker stop calico-node

Restart=on-failure
StartLimitBurst=3
StartLimitInterval=60s

[Install]
WantedBy=multi-user.target

Then run systemctl daemon-reload, systemctl enable calico-node, and systemctl start calico-node.

Other common questions:

  1. Does a newly added node need an extra Calico deployment?
    A: No. Calico is deployed here as a DaemonSet, so when a new node joins, a calico-node pod is started on it automatically.
  2. Why does calico-node stay in the Init state forever?
    A: Most likely the node cannot reach the internet, so the images cannot be pulled.
  3. In a multi-node cluster, if most calico-node pods are ready but a few show "0/1", run calicoctl node status to check whether the sessions between nodes are established. If they are not, check whether the interfaces used to establish the sessions have the same name on every node; if the names differ, make them consistent. The interface IPs also need to be in the same subnet. With multiple NICs, you can pin the interface explicitly:
            # IP automatic detection
            - name: IP_AUTODETECTION_METHOD
              value: "interface=eth2"

Alternatively, specifying an IP range is often more convenient, since a cluster's data-plane NICs are usually in the same subnet:

IP_AUTODETECTION_METHOD=cidr=10.0.1.0/24,10.0.2.0/24
IP6_AUTODETECTION_METHOD=cidr=2001:4860::0/64
  4. If a calico-node pod shows Init:CrashLoopBackOff, initialization failed; kubectl describe the pod to see which init container step went wrong. Below is the Init Containers portion of kubectl describe for such a pod:
Init Containers:
  upgrade-ipam:
    Container ID:  docker://caaa485d0880c1cb022873c5017ec60ba1970ed8dc897a0b458fa6bb4b6b4179
    Image:         192.168.3.224:5000/library/calico/cni:v3.18.1
    Image ID:      docker-pullable://192.168.3.224:5000/library/calico/cni@sha256:bc6507d4c122b69609fed5839d899569a80b709358836dd9cd1052470dfdd47a
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/cni/bin/calico-ipam
      -upgrade
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 27 Oct 2021 16:16:15 +0800
      Finished:     Wed, 27 Oct 2021 16:16:15 +0800
    Ready:          True
    Restart Count:  0
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      KUBERNETES_NODE_NAME:        (v1:spec.nodeName)
      CALICO_NETWORKING_BACKEND:  <set to the key 'calico_backend' of config map 'calico-config'>  Optional: false
    Mounts:
      /host/opt/cni/bin from cni-bin-dir (rw)
      /var/lib/cni/networks from host-local-net-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-vz89c (ro)
  install-cni:
    Container ID:  docker://87bfc3cea39247d10796148bf88f94a552d327bf3038f87f4e981feb02393cb8
    Image:         192.168.3.224:5000/library/calico/cni:v3.18.1
    Image ID:      docker-pullable://192.168.3.224:5000/library/calico/cni@sha256:bc6507d4c122b69609fed5839d899569a80b709358836dd9cd1052470dfdd47a
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/cni/bin/install
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 27 Oct 2021 16:16:16 +0800
      Finished:     Wed, 27 Oct 2021 16:16:17 +0800
    Ready:          True
    Restart Count:  0
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      CNI_CONF_NAME:         10-calico.conflist
      CNI_NETWORK_CONFIG:    <set to the key 'cni_network_config' of config map 'calico-config'>  Optional: false
      KUBERNETES_NODE_NAME:   (v1:spec.nodeName)
      CNI_MTU:               <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      SLEEP:                 false
    Mounts:
      /host/etc/cni/net.d from cni-net-dir (rw)
      /host/opt/cni/bin from cni-bin-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-vz89c (ro)
  flexvol-driver:
    Container ID:   docker://d095ec7e2ca15c90e234b207890caec380e0ae1491556e4b61f58e0db0e0df00
    Image:          192.168.3.224:5000/library/calico/pod2daemon-flexvol:v3.18.1
    Image ID:       docker-pullable://192.168.3.224:5000/library/calico/pod2daemon-flexvol@sha256:4ac1844531e0592b2c609a0b0d2e8f740f4c66c7e27c7e5dda994dec98d7fb28
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 27 Oct 2021 16:16:18 +0800
      Finished:     Wed, 27 Oct 2021 16:16:18 +0800
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /host/driver from flexvol-driver-host (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-vz89c (ro)
Containers:

As shown, initialization uses three init containers: upgrade-ipam, install-cni, and flexvol-driver. upgrade-ipam checks whether /var/lib/cni/networks/k8s-pod-network data exists and, if so, migrates the local host IPAM data to calico-ipam. install-cni is a binary built from the cni-plugin project; it copies the CNI binaries into each host's /opt/cni/bin and generates the Calico CNI configuration under /etc/cni/net.d. flexvol-driver uses the pod2daemon-flexvol image; it adds a FlexVolume driver that creates a per-pod Unix domain socket so Dikastes can communicate with Felix over the Policy Sync API. When initialization fails, the calico-node logs alone may not reveal the problem, but the logs of these three init containers will. For example, to read the second one's (install-cni) logs:

kubectl logs -n kube-system   calico-node-123xx -c install-cni

Occasionally one of these steps hits a system-level error that cannot be resolved. If the step is not needed for your setup, you can remove that init container's section from the YAML, or perform its work by hand, after which the pod will start.
