(Note: the Kubernetes deployment itself succeeds, but installing the Dashboard (web UI) on top of it never did; if you need the Dashboard, v1.16.3 is suggested.)
I. Why k8s v1.16.0?
The latest v1.16.2 was tried first, but the installation never completed: after running kubeadm init it reported many errors, such as "node xxx not found". CentOS 7 was reinstalled several times and the problem persisted; a whole day was spent on it and I nearly gave up. Most of the installation guides online were for v1.16.0. I did not really believe v1.16.2 itself was the problem, so I had not planned to downgrade, but with no other option left I tried v1.16.0 and it worked. The process is recorded here so that others can avoid the same pitfalls.
The major steps in this article are:
- Install docker-ce 18.09.9 (all machines)
- Prepare the environment for k8s (all machines)
- Install the k8s v1.16.0 master (control-plane) node
- Install the k8s v1.16.0 worker node
- Install flannel (master)
One important point: note the IP addresses the master and node use to talk to each other. In my setup the master's IP is 192.168.237.143 and the node's IP is 192.168.237.144. Make sure the master and node can ping each other on these two IPs; the master IP (192.168.237.143) is needed later when configuring k8s.
My environment:
- OS: Windows 10
- Virtual machine: VirtualBox
- Linux distribution: CentOS 7
- Linux kernel (check with uname -r): 3.10.0-957.el7.x86_64
- IP used for master/node communication (master): 192.168.99.104
- Set the master hostname: hostnamectl --static set-hostname k8s-master
- Set the node hostname: hostnamectl --static set-hostname k8s-node1
- vi /etc/hosts
- and add:
- 192.168.237.143 k8s-master
- 192.168.237.144 k8s-node1
- 127.0.0.1 k8s-master
II. Install docker-ce 18.09.9 (all machines)
Every machine that will run k8s needs docker. The commands are:
# Install the tools docker depends on
yum install -y yum-utils device-mapper-persistent-data lvm2
# Configure the Aliyun docker repository
yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
# Install this specific version of docker-ce
yum install -y docker-ce-18.09.9-3.el7
# Enable and start docker
systemctl enable docker && systemctl start docker
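A quick sanity check that the expected docker version is installed and running (a sketch; output details will vary):
# Should report version 18.09.9 and an active service
docker --version
systemctl is-active docker
docker info | grep -i 'server version'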
III. Prepare the environment for k8s (all machines)
A machine running k8s needs at least 2 CPUs and 2 GB of RAM, which is easy to set in the VM configuration. Then run the commands below to prepare the system; every machine that will run k8s needs this step.
# Disable the firewall
systemctl disable firewalld
systemctl stop firewalld
# Disable selinux
# Temporarily disable it
setenforce 0
# Permanently disable it by editing /etc/sysconfig/selinux
sed -i 's/SELINUX=permissive/SELINUX=disabled/' /etc/sysconfig/selinux
sed -i "s/SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config
# Disable swap
swapoff -a
# To disable it permanently, comment out the swap line in /etc/fstab:
sed -i 's/.*swap.*/#&/' /etc/fstab
# Adjust kernel parameters
cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl --system
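Optionally verify that the bridge settings took effect. On some kernels the br_netfilter module has to be loaded first; that step is an assumption not covered by the script above:
# Load the bridge netfilter module if the keys below are missing (assumption)
modprobe br_netfilter
# Both values should print 1
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables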
IV. Install the k8s v1.16.0 master (control-plane) node
If docker is not installed yet, follow step II (install docker-ce 18.09.9, all machines) first.
If the environment has not been prepared yet, follow step III (prepare the environment, all machines) first.
Once both are done, continue with the steps below.
1. Install kubeadm, kubelet and kubectl
kubeadm: the command-line tool that bootstraps the k8s cluster
kubelet: the agent that runs on every node and manages its containers
kubectl: the command-line tool for operating the cluster
The official k8s yum repository is hosted by Google and unreachable from mainland China, so the Aliyun mirror is used instead.
# Configure the Aliyun k8s yum repository
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
# Install kubeadm, kubectl and kubelet
yum install -y kubectl-1.16.0-0 kubeadm-1.16.0-0 kubelet-1.16.0-0
# Enable and start the kubelet service
systemctl enable kubelet && systemctl start kubelet
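Before initializing, confirm the expected versions were installed (a quick sketch):
# All three should report v1.16.0
kubeadm version -o short
kubelet --version
kubectl version --client --short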
2. Initialize k8s
The command below pulls the docker images the control plane needs. Because the official registry cannot be reached, it uses the Aliyun mirror (registry.aliyuncs.com/google_containers). Just as important: --apiserver-advertise-address must be the IP the master and node can reach each other on (192.168.99.104 in my case; getting this wrong cost me an evening), so change it to your own IP before running. The command will sit at "[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'" for about two minutes; be patient.
# Pull the 6 docker images used by the control plane; you can list them later with docker images
# This takes roughly two minutes and appears stuck at [preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
kubeadm init --image-repository registry.aliyuncs.com/google_containers --kubernetes-version v1.16.0 --apiserver-advertise-address 192.168.237.143 --pod-network-cidr=10.244.0.0/16 --token-ttl 0
When the init finishes, k8s prints the following commands; copy and run them.
# After init completes, run the commands it prints:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
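At this point kubectl should be able to reach the API server; a quick check (as a sketch):
# Shows the control-plane endpoint and the master node (NotReady for now)
kubectl cluster-info
kubectl get nodes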
3. Save the command for joining nodes to the cluster
A successful kubeadm init prints the kubeadm join command that worker nodes will run later; save it. If you lose it, regenerate it with:
kubeadm token create --print-join-command
That completes the master node. You can check it with kubectl get nodes; the master will show NotReady for now, which is expected and can be ignored until the network add-on is installed.
V. Install the k8s v1.16.0 worker node
If docker is not installed yet, follow step II (install docker-ce 18.09.9, all machines) first.
If the environment has not been prepared yet, follow step III (prepare the environment, all machines) first.
Once both are done, continue with the steps below.
1. Install kubeadm and kubelet
# Configure the Aliyun k8s yum repository
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
# Install kubeadm and kubelet
yum install -y kubeadm-1.16.0-0 kubelet-1.16.0-0
# Enable and start the kubelet service
systemctl enable kubelet && systemctl start kubelet
2. Join the cluster
The join command is unique to each cluster. Log in to the master and run kubeadm token create --print-join-command to get yours, then run it on the worker, for example:
# Join the cluster; if you don't have the join command, run kubeadm token create --print-join-command on the master to get it
kubeadm join 192.168.99.104:6443 --token ncfrid.7ap0xiseuf97gikl \
--discovery-token-ca-cert-hash sha256:47783e9851a1a517647f1986225f104e81dbfd8fb256ae55ef6d68ce9334c6a2
After the node joins, kubectl get nodes on the master will list it.
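It is also worth confirming that each node registered with the expected communication IP (a sketch):
# The INTERNAL-IP column should show the master/node IPs chosen earlier
kubectl get nodes -o wide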
- Removing a node
First drain the node (named bogon here) to release its resources:
kubectl drain bogon --delete-local-data --force --ignore-daemonsets
Then delete the bogon node:
kubectl delete node bogon
Check the nodes:
kubectl get nodes
No resources found.
VI. Install flannel (on the master)
After the steps above the cluster is assembled, but the nodes are still in the NotReady state: the master needs the flannel network add-on.
1. Get the official flannel manifest
It is normally downloaded with wget from https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml, but that address is not reachable from mainland China, so the content is reproduced in the appendix (section VIII) to keep this section short. The manifest originally references an unreachable registry (quay.io); in the appendix it has already been changed to a reachable mirror (quay-mirror.qiniu.com). Create a file named kube-flannel.yml and paste that content into it.
2. Install flannel
kubectl apply -f kube-flannel.yml
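After applying the manifest, give it a few minutes and check that the flannel and coredns pods come up (a sketch):
# Wait until the kube-flannel and coredns pods are Running
kubectl get pods -n kube-system
# The nodes should then move from NotReady to Ready
kubectl get nodes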
VII. Done
At this point the k8s cluster is up and the nodes show Ready. That's it.
VIII. Appendix
This is the content of kube-flannel.yml, with every reference to the unreachable registry (quay.io) replaced by a reachable mirror (quay-mirror.qiniu.com). Create a kube-flannel.yml file and paste this content into it.
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
name: psp.flannel.unprivileged
annotations:
seccomp.security.alpha.kubernetes.io/allowedProfileNames: docker/default
seccomp.security.alpha.kubernetes.io/defaultProfileName: docker/default
apparmor.security.beta.kubernetes.io/allowedProfileNames: runtime/default
    apparmor.security.beta.kubernetes.io/defaultProfileName: runtime/default
spec:
privileged: false
volumes:
- configMap
- secret
- emptyDir
- hostPath
allowedHostPaths:
- pathPrefix: "/etc/cni/net.d"
- pathPrefix: "/etc/kube-flannel"
- pathPrefix: "/run/flannel"
readOnlyRootFilesystem: false
# Users and groups
runAsUser:
rule: RunAsAny
supplementalGroups:
rule: RunAsAny
fsGroup:
rule: RunAsAny
# Privilege Escalation
allowPrivilegeEscalation: false
defaultAllowPrivilegeEscalation: false
# Capabilities
allowedCapabilities: ['NET_ADMIN']
defaultAddCapabilities: []
requiredDropCapabilities: []
# Host namespaces
hostPID: false
hostIPC: false
hostNetwork: true
hostPorts:
- min: 0
max: 65535
# SELinux
seLinux:
# SELinux is unused in CaaSP
    rule: 'RunAsAny'
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
rules:
- apiGroups: ['extensions']
resources: ['podsecuritypolicies']
verbs: ['use']
resourceNames: ['psp.flannel.unprivileged']
- apiGroups:
- ""
resources:
- pods
verbs:
- get
- apiGroups:
- ""
resources:
- nodes
verbs:
- list
- watch
- apiGroups:
- ""
resources:
- nodes/status
verbs:
      - patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
name: flannel
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: flannel
  namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
name: kube-flannel-cfg
namespace: kube-system
labels:
tier: node
    app: flannel
data:
cni-conf.json: |
{
"name": "cbr0",
"cniVersion": "0.3.1",
"plugins": [
{
"type": "flannel",
"delegate": {
"hairpinMode": true,
"isDefaultGateway": true
}
},
{
"type": "portmap",
"capabilities": {
"portMappings": true
}
}
]
}
net-conf.json: |
{
"Network": "10.244.0.0/16",
"Backend": {
"Type": "vxlan"
}
    }
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: kube-flannel-ds-amd64
namespace: kube-system
labels:
tier: node
    app: flannel
spec:
selector:
matchLabels:
app: flannel
template:
metadata:
labels:
tier: node
app: flannel
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: beta.kubernetes.io/os
operator: In
values:
- linux
- key: beta.kubernetes.io/arch
operator: In
values:
- amd64
hostNetwork: true
tolerations:
- operator: Exists
effect: NoSchedule
serviceAccountName: flannel
initContainers:
- name: install-cni
image: quay-mirror.qiniu.com/coreos/flannel:v0.11.0-amd64
command:
- cp
args:
- -f
- /etc/kube-flannel/cni-conf.json
- /etc/cni/net.d/10-flannel.conflist
volumeMounts:
- name: cni
mountPath: /etc/cni/net.d
- name: flannel-cfg
mountPath: /etc/kube-flannel/
containers:
- name: kube-flannel
image: quay-mirror.qiniu.com/coreos/flannel:v0.11.0-amd64
command:
- /opt/bin/flanneld
args:
- --ip-masq
- --kube-subnet-mgr
resources:
requests:
cpu: "100m"
memory: "50Mi"
limits:
cpu: "100m"
memory: "50Mi"
securityContext:
privileged: false
capabilities:
add: ["NET_ADMIN"]
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
volumeMounts:
- name: run
mountPath: /run/flannel
- name: flannel-cfg
mountPath: /etc/kube-flannel/
volumes:
- name: run
hostPath:
path: /run/flannel
- name: cni
hostPath:
path: /etc/cni/net.d
- name: flannel-cfg
configMap:
            name: kube-flannel-cfg
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: kube-flannel-ds-arm64
namespace: kube-system
labels:
tier: node
    app: flannel
spec:
selector:
matchLabels:
app: flannel
template:
metadata:
labels:
tier: node
app: flannel
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: beta.kubernetes.io/os
operator: In
values:
- linux
- key: beta.kubernetes.io/arch
operator: In
values:
- arm64
hostNetwork: true
tolerations:
- operator: Exists
effect: NoSchedule
serviceAccountName: flannel
initContainers:
- name: install-cni
image: quay-mirror.qiniu.com/coreos/flannel:v0.11.0-arm64
command:
- cp
args:
- -f
- /etc/kube-flannel/cni-conf.json
- /etc/cni/net.d/10-flannel.conflist
volumeMounts:
- name: cni
mountPath: /etc/cni/net.d
- name: flannel-cfg
mountPath: /etc/kube-flannel/
containers:
- name: kube-flannel
image: quay-mirror.qiniu.com/coreos/flannel:v0.11.0-arm64
command:
- /opt/bin/flanneld
args:
- --ip-masq
- --kube-subnet-mgr
resources:
requests:
cpu: "100m"
memory: "50Mi"
limits:
cpu: "100m"
memory: "50Mi"
securityContext:
privileged: false
capabilities:
add: ["NET_ADMIN"]
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
volumeMounts:
- name: run
mountPath: /run/flannel
- name: flannel-cfg
mountPath: /etc/kube-flannel/
volumes:
- name: run
hostPath:
path: /run/flannel
- name: cni
hostPath:
path: /etc/cni/net.d
- name: flannel-cfg
configMap:
            name: kube-flannel-cfg
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: kube-flannel-ds-arm
namespace: kube-system
labels:
tier: node
    app: flannel
spec:
selector:
matchLabels:
app: flannel
template:
metadata:
labels:
tier: node
app: flannel
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: beta.kubernetes.io/os
operator: In
values:
- linux
- key: beta.kubernetes.io/arch
operator: In
values:
- arm
hostNetwork: true
tolerations:
- operator: Exists
effect: NoSchedule
serviceAccountName: flannel
initContainers:
- name: install-cni
image: quay-mirror.qiniu.com/coreos/flannel:v0.11.0-arm
command:
- cp
args:
- -f
- /etc/kube-flannel/cni-conf.json
- /etc/cni/net.d/10-flannel.conflist
volumeMounts:
- name: cni
mountPath: /etc/cni/net.d
- name: flannel-cfg
mountPath: /etc/kube-flannel/
containers:
- name: kube-flannel
image: quay-mirror.qiniu.com/coreos/flannel:v0.11.0-arm
command:
- /opt/bin/flanneld
args:
- --ip-masq
- --kube-subnet-mgr
resources:
requests:
cpu: "100m"
memory: "50Mi"
limits:
cpu: "100m"
memory: "50Mi"
securityContext:
privileged: false
capabilities:
add: ["NET_ADMIN"]
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
volumeMounts:
- name: run
mountPath: /run/flannel
- name: flannel-cfg
mountPath: /etc/kube-flannel/
volumes:
- name: run
hostPath:
path: /run/flannel
- name: cni
hostPath:
path: /etc/cni/net.d
- name: flannel-cfg
configMap:
            name: kube-flannel-cfg
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: kube-flannel-ds-ppc64le
namespace: kube-system
labels:
tier: node
    app: flannel
spec:
selector:
matchLabels:
app: flannel
template:
metadata:
labels:
tier: node
app: flannel
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: beta.kubernetes.io/os
operator: In
values:
- linux
- key: beta.kubernetes.io/arch
operator: In
values:
- ppc64le
hostNetwork: true
tolerations:
- operator: Exists
effect: NoSchedule
serviceAccountName: flannel
initContainers:
- name: install-cni
image: quay-mirror.qiniu.com/coreos/flannel:v0.11.0-ppc64le
command:
- cp
args:
- -f
- /etc/kube-flannel/cni-conf.json
- /etc/cni/net.d/10-flannel.conflist
volumeMounts:
- name: cni
mountPath: /etc/cni/net.d
- name: flannel-cfg
mountPath: /etc/kube-flannel/
containers:
- name: kube-flannel
image: quay-mirror.qiniu.com/coreos/flannel:v0.11.0-ppc64le
command:
- /opt/bin/flanneld
args:
- --ip-masq
- --kube-subnet-mgr
resources:
requests:
cpu: "100m"
memory: "50Mi"
limits:
cpu: "100m"
memory: "50Mi"
securityContext:
privileged: false
capabilities:
add: ["NET_ADMIN"]
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
volumeMounts:
- name: run
mountPath: /run/flannel
- name: flannel-cfg
mountPath: /etc/kube-flannel/
volumes:
- name: run
hostPath:
path: /run/flannel
- name: cni
hostPath:
path: /etc/cni/net.d
- name: flannel-cfg
configMap:
            name: kube-flannel-cfg
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: kube-flannel-ds-s390x
namespace: kube-system
labels:
tier: node
    app: flannel
spec:
selector:
matchLabels:
app: flannel
template:
metadata:
labels:
tier: node
app: flannel
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: beta.kubernetes.io/os
operator: In
values:
- linux
- key: beta.kubernetes.io/arch
operator: In
values:
- s390x
hostNetwork: true
tolerations:
- operator: Exists
effect: NoSchedule
serviceAccountName: flannel
initContainers:
- name: install-cni
image: quay-mirror.qiniu.com/coreos/flannel:v0.11.0-s390x
command:
- cp
args:
- -f
- /etc/kube-flannel/cni-conf.json
- /etc/cni/net.d/10-flannel.conflist
volumeMounts:
- name: cni
mountPath: /etc/cni/net.d
- name: flannel-cfg
mountPath: /etc/kube-flannel/
containers:
- name: kube-flannel
image: quay-mirror.qiniu.com/coreos/flannel:v0.11.0-s390x
command:
- /opt/bin/flanneld
args:
- --ip-masq
- --kube-subnet-mgr
resources:
requests:
cpu: "100m"
memory: "50Mi"
limits:
cpu: "100m"
memory: "50Mi"
securityContext:
privileged: false
capabilities:
add: ["NET_ADMIN"]
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
volumeMounts:
- name: run
mountPath: /run/flannel
- name: flannel-cfg
mountPath: /etc/kube-flannel/
volumes:
- name: run
hostPath:
path: /run/flannel
- name: cni
hostPath:
path: /etc/cni/net.d
- name: flannel-cfg
configMap:
name: kube-flannel-cfg
IX. Assorted k8s cluster deployment problems
If you are interested in Kubernetes, you can join QQ group 885763297 to discuss it.
1. hostname "master" could not be reached
The hostname had no entry in /etc/hosts.
2. curl -sSL http://localhost:10248/healthz
curl: (7) Failed connect to localhost:10248; Connection refused: caused by a missing localhost entry in /etc/hosts.
3、Error starting daemon: SELinux is not supported with the overlay2 graph driver on this kernel. Either boot into a newer kernel or…abled=false)
Edit /etc/sysconfig/docker and set --selinux-enabled=false.
4. Persisting the bridge-nf-call-iptables settings:
# Bridge-related settings:
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 1   # bridged layer-2 traffic is filtered by the iptables FORWARD rules
net.bridge.bridge-nf-call-arptables = 0
5. The connection to the server localhost:8080 was refused - did you specify the right host or port?
unable to recognize "kube-flannel.yml": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
The error does not occur once the kubeconfig has been set up; run the following (as root or via sudo):
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
6. error: unable to recognize "mycronjob.yml": no matches for kind "CronJob" in version "batch/v2alpha1"
Add - --runtime-config=batch/v2alpha1=true to the kube-apiserver.yaml manifest, then restart the kubelet service.
7、Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized Unable to update cni config: No networks found in /etc/cni/net.d Failed to get system container stats for “/system.slice/kubelet.service”: failed to get cgroup stats for “/system.slice/kubelet.service”: failed to get container info for “/system.slice/kubelet.service”: unknown container “/system.slice/kubelet.service”
docker pull quay.io/coreos/flannel:v0.10.0-amd64
mkdir -p /etc/cni/net.d/
cat <<EOF> /etc/cni/net.d/10-flannel.conf
{"name":"cbr0","type":"flannel","delegate": {"isDefaultGateway": true}}
EOF
mkdir /usr/share/oci-umount/oci-umount.d -p
mkdir /run/flannel/
cat <<EOF> /run/flannel/subnet.env
FLANNEL_NETWORK=172.100.0.0/16
FLANNEL_SUBNET=172.100.1.0/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
EOF
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.9.1/Documentation/kube-flannel.yml
8、Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of “crypto/rsa: verification error” while trying to verify candidate authority certificate “kubernetes”)
export KUBECONFIG=/etc/kubernetes/kubelet.conf
9、Failed to get system container stats for “/system.slice/docker.service”: failed to get cgroup stats for “/system.slice/docker.service”: failed to get container info for “/system.slice/docker.service”: unknown container “/system.slice/docker.service”
# Edit /etc/sysconfig/kubelet and add:
--runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice
# then restart kubelet:
systemctl restart kubelet
This roughly means the --cgroup-driver / --kubelet-cgroups flags are deprecated; these settings should be supplied through the kubelet configuration file instead.
10、The HTTP call equal to ‘curl -sSL http://localhost:10255/healthz’ failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
# Edit /etc/systemd/system/kubelet.service.d/10-kubeadm.conf and set:
Environment="KUBELET_SYSTEM_PODS_ARGS=--pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true --fail-swap-on=false"
11. failed to run Kubelet: failed to create kubelet: misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs"
# kubelet: in its systemd drop-in, set
Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=systemd"
# docker: in /lib/systemd/system/docker.service, add to the dockerd command line
--exec-opt native.cgroupdriver=systemd
12、[ERROR CRI]: unable to check if the container runtime at “/var/run/dockershim.sock” is running: exit status 1
rm -f /usr/bin/crictl
13、 Warning FailedScheduling 2s (x7 over 33s) default-scheduler 0/4 nodes are available: 4 node(s) didn’t match node selector.
If the specified label does not match any node, pod creation fails and the scheduler reports that the pod cannot be scheduled.
14. Adding a node to the cluster after the kubeadm-generated token has expired
kubeadm token create
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null |
openssl dgst -sha256 -hex | sed 's/^.* //'
kubeadm join --token aa78f6.8b4cafc8ed26c34f --discovery-token-ca-cert-hash sha256:0fd95a9bc67a7bf0ef42da968a0d55d92e52898ec37c971bd77ee501d845b538 172.16.6.79:6443 --skip-preflight-checks
15. Warnings from systemctl status kubelet
cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
May 29 06:30:28 fnode kubelet[4136]: E0529 06:30:28.935309 4136 kubelet.go:2130] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Removing KUBELET_NETWORK_ARGS from /etc/systemd/system/kubelet.service.d/10-kubeadm.conf and restarting kubelet is only a temporary workaround and does not really help.
The root cause is that the image k8s.gcr.io/pause-amd64:3.1 is missing.
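A hedged workaround is to pull the pause image from a reachable mirror and retag it; the mirror path below is an assumption, and the tag should match whatever image your kubelet reports as missing:
# Pull from the Aliyun mirror (assumed to host this image) and retag it for the kubelet
docker pull registry.aliyuncs.com/google_containers/pause:3.1
docker tag registry.aliyuncs.com/google_containers/pause:3.1 k8s.gcr.io/pause-amd64:3.1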
16. Removing the flannel network:
ifconfig cni0 down
ifconfig flannel.1 down
ip link del flannel.1
ip link del cni0
yum install bridge-utils
brctl delbr flannel.1
brctl delbr cni0
rm -rf /var/lib/cni/flannel/* && rm -rf /var/lib/cni/networks/cbr0/* && ip link delete cni0 && rm -rf /var/lib/cni/network/cni0/*
17、E0906 15:10:55.415662 1 leaderelection.go:234] error retrieving resource lock default/ceph.com-rbd: endpoints “ceph.com-rbd” is forbidden: User “system:serviceaccount:default:rbd-provisioner” cannot get endpoints in the namespace “default”
Add the following section to ceph/rbd/deploy/rbac/clusterrole.yaml and apply it again with kubectl apply -f ceph/rbd/deploy/rbac/clusterrole.yaml (the resources will be re-requested):
- apiGroups: [""]
  resources: ["endpoints"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
18. Pinning flannel to a specific network interface (extra argument for the flanneld container):
- --iface=eth0
21、 Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container “957541888b8a0e5b9ad65da932f688eb02cc182808e10d1a89a6e8db2132c253” network for pod “coredns-7655b945bc-6hgj9”: NetworkPlugin cni failed to set up pod “coredns-7655b945bc-6hgj9_kube-system” network: failed to find plugin “loopback” in path [/opt/cni/bin], failed to clean up sandbox container “957541888b8a0e5b9ad65da932f688eb02cc182808e10d1a89a6e8db2132c253” network for pod “coredns-7655b945bc-6hgj9”: NetworkPlugin cni failed to teardown pod “coredns-7655b945bc-6hgj9_kube-system” network: failed to find plugin “portmap” in path [/opt/cni/bin]]
https://kubernetes.io/docs/setup/independent/troubleshooting-kubeadm/#coredns-pods-have-crashloopbackoff-or-error-state
If your network provider does not support the portmap CNI plugin, you may need to use the service's NodePort feature or set hostNetwork=true.
22. Problem: kubelet was configured with system-reserved (800m), kube-reserved (500m) and eviction-hard (800), so the memory actually usable by the cluster is the total minus 800m minus 800m minus 500m, yet system-level OOM kills were still being triggered.
Investigation: top showed etcd using more than 500 MB, kubelet about 200 MB and the ceph processes a bit over 200 MB, i.e. roughly 900 MB of overhead outside of what k8s manages, well beyond the reserved amounts, which is why the system-level OOM killer could still fire.
23. How do you access the api-server?
Use kubectl proxy.
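For example, a minimal sketch:
# Open a local, unauthenticated proxy to the API server and query it
kubectl proxy --port=8001 &
curl http://127.0.0.1:8001/api/v1/namespaces/default/pods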
24. When using a Service whose Endpoints point at a service outside the cluster, the endpoints kept disappearing.
Fix: remove the service.spec.selector field; with a selector present, k8s keeps rewriting the manually created endpoints.
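A minimal sketch of such a selector-less Service plus a manually maintained Endpoints object (the name and external IP are hypothetical):
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: external-db          # hypothetical name
spec:                        # note: no selector, so k8s will not overwrite the endpoints
  ports:
  - port: 3306
---
apiVersion: v1
kind: Endpoints
metadata:
  name: external-db          # must match the Service name
subsets:
- addresses:
  - ip: 192.168.237.200      # hypothetical external address
  ports:
  - port: 3306
EOF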
25. A cluster "avalanche": nodes occasionally dropped into the NotReady state.
Investigation: CPU usage on the affected node was very high.
1. The node's CPUPressure condition was not triggered, so the load was not coming from pods managed by k8s; it had to be the system/kube components in the reserved space.
2. Looking at the cpu and memory cgroups showed kubelet and friends all sitting under system.slice, so the kube reserved-resource settings were not actually taking effect.
3. The relevant kubelet flags:
--enforce-node-allocatable=pods,kube-reserved,system-reserved   # hard enforcement; exceeding a limit triggers OOM
--system-reserved-cgroup=/system.slice   # which cgroup the system-reserved limits apply to
--kube-reserved-cgroup=/system.slice/kubelet.service   # which services' cgroup the kube-reserved limits apply to
--system-reserved=memory=1Gi,cpu=500m
--kube-reserved=memory=500Mi,cpu=500m,ephemeral-storage=10Gi
26. [etcd] Checking Etcd cluster health
etcd cluster is not healthy: context deadline exceeded
X. More k8s deployment troubleshooting
Initialization problems caused by installing from snap
Because the package mirrors were not configured properly at first, apt could not find the three k8s packages, and since ubuntu helpfully suggested try sudo snap install kubelet ..., I installed all three tools with snap, using:
snap install kubelet --classic
snap install kubeadm --classic
snap install kubectl --classic
Although there are plenty of examples online of successful snap-based deployments, I could not resolve the problems that came up, and after switching to apt the installation went smoothly. The problems encountered with the snap install were as follows:
kubelet isn't running or healthy
kubeadm init failed with the error below, which repeated four times before timing out:
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
The official advice is to check the kubelet with systemctl status kubelet, but that reported that kubelet.service could not be found, so the failed units were listed with:
systemctl list-units --failed
A unit named snap.kubelet.daemon.service had failed to start, and nothing I tried brought it back, so I gave up on the snap install. (If anyone knows how to fix this, please let me know.) The remaining problems are described below.
Warnings during initialization
Right at the start of kubeadm init there is a preflight phase that runs a series of checks. If it prints WARNINGs like the ones below and the initialization then fails, they need a closer look. Both warnings are dealt with next:
# kubeadm init ...
[init] Using Kubernetes version: v1.15.0
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[WARNING FileExisting-socat]: socat not found in system path
WARNING IsDockerSystemdCheck
Edit or create /etc/docker/daemon.json and add:
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
Restart docker:
systemctl restart docker
Check the result:
docker info | grep Cgroup
WARNING FileExisting-socat
socat is a networking tool that k8s uses to relay pod data; simply install it:
apt-get install socat
Node status is NotReady
kubectl get nodes showed the joined nodes with a Status of NotReady:
root@master1:~# kubectl get nodes
NAME      STATUS     ROLES    AGE    VERSION
master1 NotReady master 152m v1.15.0
worker1 NotReady <none> 94m v1.15.0
This happens when some essential pods are not running. First check the status of the pods in kube-system:
kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-bccdc95cf-792px 1/1 Pending 0 3h11m
coredns-bccdc95cf-bc76j 1/1 Pending 0 3h11m
etcd-master1 1/1 Running 2 3h10m
kube-apiserver-master1 1/1 Running 2 3h11m
kube-controller-manager-master1 1/1 Running 2 3h10m
kube-flannel-ds-amd64-9trbq 0/1 ImagePullBackoff 0 133m
kube-flannel-ds-amd64-btt74 0/1 ImagePullBackoff 0 174m
kube-proxy-27zfk 1/1 Pending 2 3h11m
kube-proxy-lx4gk 1/1 Pending 0 133m
kube-scheduler-master1 1/1 Running 2 3h11m
As shown above, the kube-flannel pods are in ImagePullBackoff, meaning the image pull failed, so the image has to be pulled by hand. Some pods have two instances because there are two nodes in this cluster.
You can also run kubectl describe pod -n kube-system <pod-name> to inspect a single pod; if it has problems, the Events section at the bottom of the output shows them, for example:
root@master1:~# kubectl describe pod kube-flannel-ds-amd64-9trbq -n kube-system
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Killing 29m kubelet, worker1 Stopping container kube-flannel
Warning FailedCreatePodSandBox 27m (x12 over 29m) kubelet, worker1 Failed create pod sandbox: rpc error: code = Unknown desc = failed to create a sandbox for pod "kube-flannel-ds-amd64-9trbq": Error response from daemon: cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice"
Normal SandboxChanged 19m (x48 over 29m) kubelet, worker1 Pod sandbox changed, it will be killed and re-created.
Normal Pulling 42s kubelet, worker1 Pulling image "quay.io/coreos/flannel:v0.11.0-amd64"
Pulling the image manually
The flannel image can be fetched with the command below. If a different image failed for you, a quick search will turn up a domestic mirror for it; remember to change the version tag at the end to match yours (it is visible in the kubectl describe output mentioned above):
docker pull quay-mirror.qiniu.com/coreos/flannel:v0.11.0-amd64
Once the pull finishes, retag the image with the name k8s expected but could not pull. The name and version here match my setup and may differ from yours, so adjust them:
docker tag quay-mirror.qiniu.com/coreos/flannel:v0.11.0-amd64 quay.io/coreos/flannel:v0.11.0-amd64
A few minutes later k8s retries automatically; not only does flannel recover, the other pods move to Running as well, and the node status confirms the problem is solved:
root@master1:~# kubectl get nodes
NAME      STATUS   ROLES    AGE     VERSION
master1 Ready master 3h27m v1.15.0
worker1 Ready <none> 149m v1.15.0
A worker node fails to join
Running kubeadm join on the worker returned a timeout error:
root@worker2:~# kubeadm join 192.168.56.11:6443 --token wbryr0.am1n476fgjsno6wa --discovery-token-ca-cert-hash sha256:7640582747efefe7c2d537655e428faa6275dbaff631de37822eb8fd4c054807
[preflight] Running pre-flight checks
error execution phase preflight: couldn't validate the identity of the API Server: abort connecting to API servers after timeout of 5m0s
Run kubeadm token create --print-join-command on the master to generate a fresh join command, then run the new command on the worker node.