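While wrestling with fluentd-elasticsearch logging I ran into a whole pile of problems, and working through them taught me quite a lot.
1. How to delete an rc, deployment, or service stuck in an inconsistent state
In some situations the kubectl process hangs, and a later get shows that the deletion only went halfway while the remaining objects cannot be deleted.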
[root@k8s-master ~]# kubectl get -f fluentd-elasticsearch/
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
The commands for deleting these deployments, services, or rcs are as follows:
kubectl delete deployment kibana-logging -n kube-system --cascade=false
kubectl delete deployment kibana-logging -n kube-system --ignore-not-found
kubectl delete rc elasticsearch-logging-v1 -n kube-system --force --grace-period=0
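The force-delete pattern above can be wrapped in a small helper so the same flags are applied to every leftover object. This is only a sketch: the function name force_delete and the default namespace argument are made up for illustration, while the kubectl flags are the ones already used above.

```shell
# force_delete KIND NAME [NAMESPACE]
# Hypothetical helper: deletes one object without waiting for graceful
# termination and without failing if it is already gone.
force_delete() {
    kind="$1"
    name="$2"
    namespace="${3:-kube-system}"   # default taken from the examples above
    kubectl delete "$kind" "$name" -n "$namespace" \
        --ignore-not-found --force --grace-period=0
}
```

For example, force_delete rc elasticsearch-logging-v1 followed by force_delete deployment kibana-logging clears the two objects named above.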
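2. How to reset etcd when the objects still cannot be deleted
As a last resort, wipe the etcd data directory. This discards all cluster state stored in etcd.
rm -rf /var/lib/etcd/*
After deleting, reboot the master node.
After resetting etcd, the network configuration has to be set again: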
etcdctl mk /atomic.io/network/config '{ "Network": "192.168.0.0/16" }'
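Since wiping the data directory cannot be undone, a slightly safer variant is to move the contents aside instead of deleting them. The helper below is a sketch under that assumption. The name backup_etcd_data and the backup location are hypothetical, and the surrounding systemctl steps are shown only as comments.

```shell
# backup_etcd_data DATA_DIR
# Hypothetical helper: moves everything inside DATA_DIR into a timestamped
# sibling directory and prints the backup path.
backup_etcd_data() {
    data_dir="${1%/}"
    backup_dir="${data_dir}.bak.$(date +%Y%m%d%H%M%S)"
    mkdir -p "$backup_dir" || return 1
    # Move top-level entries only; nested files travel with their parents.
    find "$data_dir" -mindepth 1 -maxdepth 1 -exec mv {} "$backup_dir"/ \;
    echo "$backup_dir"
}

# Intended order of operations (not executed here):
#   systemctl stop etcd
#   backup_etcd_data /var/lib/etcd
#   reboot   (or start etcd again)
#   etcdctl mk /atomic.io/network/config '{ "Network": "192.168.0.0/16" }'
```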
3. kube-apiserver fails to start
Every start attempt reports:
start request repeated too quickly for kube-apiserver.service
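This is not really a start-frequency problem, though. Check /var/log/messages for the actual cause. In my case, after enabling ServiceAccount, files such as ca.crt could not be found, which made the startup fail: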
May 21 07:56:41 k8s-master kube-apiserver: Flag --port has been deprecated, see --insecure-port instead.
May 21 07:56:41 k8s-master kube-apiserver: F0521 07:56:41.692480 4299 universal_validation.go:104] Validate server run options failed: unable to load client CA file: open /var/run/kubernetes/ca.crt: no such file or directory
May 21 07:56:41 k8s-master systemd: kube-apiserver.service: main process exited, code=exited, status=255/n/a
May 21 07:56:41 k8s-master systemd: Failed to start Kubernetes API Server.
May 21 07:56:41 k8s-master systemd: Unit kube-apiserver.service entered failed state.
May 21 07:56:41 k8s-master systemd: kube-apiserver.service failed.
May 21 07:56:41 k8s-master systemd: kube-apiserver.service holdoff time over, scheduling restart.
May 21 07:56:41 k8s-master systemd: start request repeated too quickly for kube-apiserver.service
May 21 07:56:41 k8s-master systemd: Failed to start Kubernetes API Server.
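In a log like the one above, the line that matters is the one kube-apiserver itself logs at fatal level, which starts with F followed by the month and day. A rough way to pull out just those lines is sketched below. The function name apiserver_fatal_lines is made up, and the pattern assumes the syslog format shown in this post.

```shell
# apiserver_fatal_lines [LOGFILE]
# Hypothetical helper: prints only the fatal-level lines written by
# kube-apiserver (message starts with F plus a four-digit date, e.g. F0521).
apiserver_fatal_lines() {
    grep 'kube-apiserver: F[0-9][0-9][0-9][0-9] ' "${1:-/var/log/messages}"
}
```

On a systemd host, journalctl -u kube-apiserver shows the same messages without the unrelated syslog noise.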
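When deploying logging components such as fluentd, many of the problems come from the fact that ServiceAccount has to be enabled, which in turn requires the security settings to be in place. In the end there is no way around configuring ServiceAccount properly.
4. Permission denied errors
While configuring fluentd, the error cannot create /var/log/fluentd.log: Permission denied appeared. This happened because SELinux had not been turned off.
In /etc/selinux/config, change SELINUX=enforcing to SELINUX=disabled, then reboot.
5. ServiceAccount-based configuration
First generate the various keys that are needed. Replace k8s-master with the hostname of your master.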
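The config change can be scripted with sed. This is a minimal sketch: the function name disable_selinux_in_config is made up, and setenforce 0 is an extra step not in my original notes that switches SELinux to permissive mode immediately, until the reboot.

```shell
# disable_selinux_in_config [CONFIG_FILE]
# Hypothetical helper: rewrites SELINUX=enforcing to SELINUX=disabled in place.
disable_selinux_in_config() {
    config="${1:-/etc/selinux/config}"
    sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' "$config"
}

# Intended use as root (not executed here):
#   setenforce 0                 # permissive right away, lasts until reboot
#   disable_selinux_in_config    # permanent after the next reboot
#   reboot
```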
openssl genrsa -out ca.key 2048
echo subjectAltName=IP:10.254.0.1 > extfile.cnf
# The IP is determined by the following command:
# kubectl get services --all-namespaces | grep 'default' | grep 'kubernetes' | grep '443' | awk '{print $3}'
openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key -CAcreateserial -extfile extfile.cnf -out server.crt -days 10000
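The commands above reference ca.crt, server.csr, and a server key, but the steps that create them are not shown. One plausible complete sequence is sketched below. The openssl req steps, the server.key generation, the function name make_apiserver_keys, and the CA subject kube-ca are all assumptions and may differ from what was actually run. The CA is given a different subject from the server certificate on purpose, because identical subjects can make the server certificate look self-signed to openssl.

```shell
# make_apiserver_keys MASTER_NAME SERVICE_IP
# Hypothetical helper: writes ca.key, ca.crt, server.key, server.csr,
# extfile.cnf and server.crt into the current directory.
make_apiserver_keys() {
    master_name="$1"   # e.g. k8s-master
    service_ip="$2"    # e.g. 10.254.0.1, the kubernetes service cluster IP
    openssl genrsa -out ca.key 2048 &&
    # Self-signed CA certificate (subject name is an assumption).
    openssl req -x509 -new -key ca.key -subj "/CN=kube-ca" -days 10000 -out ca.crt &&
    openssl genrsa -out server.key 2048 &&
    # Signing request for the API server, named after the master host.
    openssl req -new -key server.key -subj "/CN=${master_name}" -out server.csr &&
    echo "subjectAltName=IP:${service_ip}" > extfile.cnf &&
    openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
        -extfile extfile.cnf -out server.crt -days 10000
}
```

Running openssl x509 -in server.crt -noout -text afterwards should list the service IP under Subject Alternative Name.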
If you change the parameters in the /etc/kubernetes/apiserver config file, starting the service with systemctl start kube-apiserver fails with this error:
Validate server run options failed: unable to load client CA file: open /root/keys/ca.crt: permission denied
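The API server can still be started from the command line as root, which suggests that the account the systemd unit runs under cannot read /root/keys: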
/usr/bin/kube-apiserver --logtostderr=true --v=0 --etcd-servers=http://k8s-master:2379 \
  --address=0.0.0.0 --port=8080 --kubelet-port=10250 --allow-privileged=true \
  --service-cluster-ip-range=10.254.0.0/16 --admission-control=ServiceAccount \
  --insecure-bind-address=0.0.0.0 --client-ca-file=/root/keys/ca.crt \
  --tls-cert-file=/root/keys/server.crt --tls-private-key-file=/root/keys/server.key \
  --basic-auth-file=/root/keys/basic_auth.csv --secure-port=443 \
  &>> /var/log/kubernetes/kube-apiserver.log &
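To check whether the permission denied error under systemd comes from the service running as a non-root account, the User= setting of the unit can be inspected. This is a sketch: the function name unit_user is made up, and the account name kube in the comments is an assumption about how the package sets up the unit.

```shell
# unit_user
# Hypothetical helper: reads unit file text on stdin and prints the value of
# the User= setting, if there is one.
unit_user() {
    sed -n 's/^User=//p'
}

# Intended use (not executed here):
#   systemctl cat kube-apiserver | unit_user
#   sudo -u kube test -r /root/keys/ca.crt || echo "kube cannot read ca.crt"
```

If an account shows up and it cannot read the files, keeping the keys in a directory that account can read is likely cleaner than starting the daemons by hand.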
Starting the controller manager from the command line:
/usr/bin/kube-controller-manager --logtostderr=true --v=0 --master=http://k8s-master:8080 \
  --root-ca-file=/root/keys/ca.crt --service-account-private-key-file=/root/keys/server.key \
  &>> /var/log/kubernetes/kube-controller-manage.log &
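6. etcd will not start
etcd plays the role in a Kubernetes cluster that ZooKeeper plays in other distributed systems. Almost every service depends on etcd being up, for example flanneld, apiserver, and docker.
The error log when starting etcd was as follows: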
May 24 13:39:09 k8s-master systemd: Stopped Flanneld overlay address etcd agent.
May 24 13:39:28 k8s-master systemd: Starting Etcd Server...
May 24 13:39:28 k8s-master etcd: recognized and used environment variable ETCD_ADVERTISE_CLIENT_URLS=http://etcd:2379,http://etcd:4001
May 24 13:39:28 k8s-master etcd: recognized environment variable ETCD_NAME, but unused: shadowed by corresponding flag
May 24 13:39:28 k8s-master etcd: recognized environment variable ETCD_DATA_DIR, but unused: shadowed by corresponding flag
May 24 13:39:28 k8s-master etcd: recognized environment variable ETCD_LISTEN_CLIENT_URLS, but unused: shadowed by corresponding flag
May 24 13:39:28 k8s-master etcd: etcd Version: 3.1.3
May 24 13:39:28 k8s-master etcd: Git SHA: 21fdcc6
May 24 13:39:28 k8s-master etcd: Go Version: go1.7.4
May 24 13:39:28 k8s-master etcd: Go OS/Arch: linux/amd64
May 24 13:39:28 k8s-master etcd: setting maximum number of CPUs to 1, total number of available CPUs is 1
May 24 13:39:28 k8s-master etcd: the server is already initialized as member before, starting as etcd member...
May 24 13:39:28 k8s-master etcd: listening for peers on http://localhost:2380
May 24 13:39:28 k8s-master etcd: listening for client requests on 0.0.0.0:2379
May 24 13:39:28 k8s-master etcd: listening for client requests on 0.0.0.0:4001
May 24 13:39:28 k8s-master etcd: recovered store from snapshot at index 140014
May 24 13:39:28 k8s-master etcd: name = master
May 24 13:39:28 k8s-master etcd: data dir = /var/lib/etcd/default.etcd
May 24 13:39:28 k8s-master etcd: member dir = /var/lib/etcd/default.etcd/member
May 24 13:39:28 k8s-master etcd: heartbeat = 100ms
May 24 13:39:28 k8s-master etcd: election = 1000ms
May 24 13:39:28 k8s-master etcd: snapshot count = 10000
May 24 13:39:28 k8s-master etcd: advertise client URLs = http://etcd:2379,http://etcd:4001
May 24 13:39:28 k8s-master etcd: ignored file 0000000000000001-0000000000012700.wal.broken in wal
May 24 13:39:29 k8s-master etcd: restarting member 8e9e05c52164694d in cluster cdf818194e3a8c32 at commit index 148905
May 24 13:39:29 k8s-master etcd: 8e9e05c52164694d became follower at term 12
May 24 13:39:29 k8s-master etcd: newRaft 8e9e05c52164694d [peers: [8e9e05c52164694d], term: 12, commit: 148905, applied: 140014, lastindex: 148905, lastterm: 12]
May 24 13:39:29 k8s-master etcd: enabled capabilities for version 3.1
May 24 13:39:29 k8s-master etcd: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32 from store
May 24 13:39:29 k8s-master etcd: set the cluster version to 3.1 from store
May 24 13:39:29 k8s-master etcd: starting server... [version: 3.1.3, cluster version: 3.1]
May 24 13:39:29 k8s-master etcd: raft save state and entries error: open /var/lib/etcd/default.etcd/member/wal/0.tmp: is a directory
May 24 13:39:29 k8s-master systemd: etcd.service: main process exited, code=exited, status=1/FAILURE
May 24 13:39:29 k8s-master systemd: Failed to start Etcd Server.
May 24 13:39:29 k8s-master systemd: Unit etcd.service entered failed state.
May 24 13:39:29 k8s-master systemd: etcd.service failed.
May 24 13:39:29 k8s-master systemd: etcd.service holdoff time over, scheduling restart.
The key line is:
raft save state and entries error: open /var/lib/etcd/default.etcd/member/wal/0.tmp: is a directory
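Go into that directory and delete 0.tmp, and etcd will start again.
7. Setting up SSH trust between hosts on CentOS
- On each server, as the user that needs the trust relationship, run the following command to generate a public/private key pair. Pressing Enter to accept the defaults is fine.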
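The cleanup can be written so that it only touches 0.tmp when it really is a leftover directory, and moves it aside rather than deleting it. This is a sketch: the function name move_stale_wal_tmp and the /var/tmp destination are made up, while the default WAL path is the one from the log above.

```shell
# move_stale_wal_tmp [WAL_DIR] [DEST_DIR]
# Hypothetical helper: if WAL_DIR/0.tmp is a directory, move it into DEST_DIR
# under a timestamped name; otherwise do nothing.
move_stale_wal_tmp() {
    wal_dir="${1:-/var/lib/etcd/default.etcd/member/wal}"
    dest_dir="${2:-/var/tmp}"
    stale="$wal_dir/0.tmp"
    if [ -d "$stale" ]; then
        target="$dest_dir/etcd-wal-0.tmp.$(date +%Y%m%d%H%M%S)"
        mv "$stale" "$target" && echo "moved $stale to $target"
    else
        echo "no stale 0.tmp directory in $wal_dir"
    fi
}

# Intended use (not executed here):
#   move_stale_wal_tmp && systemctl start etcd
```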
ssh-keygen -t rsa
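You can see that a public key file has been generated.
- Exchange the public keys. The first time you have to enter the password, and after that it just works.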
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.199.132 (-p 2222)
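-p sets the port. Leave -p out when using the default port. If the port has been changed, -p is required.
You can see that an authorized_keys file has been created under .ssh/ on the target server. It records the public keys of the other servers that are allowed to log in to this one.
- Test whether you can log in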
ssh 192.168.199.132 (-p 2222)
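With more than a couple of nodes, the key exchange above can be looped over a host list. This is a sketch: the function name push_key_to_hosts and the SSH_COPY_ID override variable are made up, and the second address in the usage comment is only a placeholder.

```shell
# push_key_to_hosts PORT HOST...
# Hypothetical helper: copies root's public key to every listed host.
# SSH_COPY_ID can be set to another command for a dry run (e.g. echo).
push_key_to_hosts() {
    port="$1"
    shift
    for host in "$@"; do
        "${SSH_COPY_ID:-ssh-copy-id}" -i /root/.ssh/id_rsa.pub -p "$port" "root@${host}"
    done
}

# Intended use (not executed here):
#   push_key_to_hosts 22 192.168.199.132 192.168.199.133
```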
8. Changing the hostname on CentOS
hostnamectl set-hostname k8s-master1
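9. Enabling copy and paste for CentOS in VirtualBox
If a package turns out not to be installed, or a command produces no output, change update to install and run it again.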
yum update
yum update kernel
yum update kernel-devel
yum install kernel-headers
yum install gcc
yum install gcc make
Once these have finished, run sh VBoxLinuxAdditions.run
10. A deleted Pod stays stuck in the Terminating state
It can be force-deleted with the following command:
kubectl delete pod NAME --grace-period=0 --force
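When several Pods are stuck at once, the same command can be applied to everything that kubectl reports as Terminating. This is a sketch: both function names are made up, and the awk filter assumes the usual column order of kubectl get pods --all-namespaces (NAMESPACE, NAME, READY, STATUS, ...).

```shell
# list_terminating_pods
# Hypothetical helper: reads `kubectl get pods --all-namespaces` output on
# stdin and prints "namespace name" for each Pod whose STATUS is Terminating.
list_terminating_pods() {
    awk 'NR > 1 && $4 == "Terminating" { print $1, $2 }'
}

# force_delete_terminating
# Hypothetical helper: force-deletes every Pod found by the filter above.
force_delete_terminating() {
    kubectl get pods --all-namespaces | list_terminating_pods |
    while read -r namespace name; do
        kubectl delete pod "$name" -n "$namespace" --grace-period=0 --force
    done
}
```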