You can check the status of the Kubernetes components with kubectl:
```
[root@wecloud-test-k8s-1 ~]# kubectl get cs
NAME                 STATUS    MESSAGE              ERROR
controller-manager   Healthy   ok
scheduler            Healthy   ok
etcd-2               Healthy   {"health": "true"}
etcd-1               Healthy   {"health": "true"}
etcd-0               Healthy   {"health": "true"}
```
Here is a problem I ran into and how I fixed it: when I ran the status check repeatedly, some of the etcd members kept flapping into an Unhealthy state.
```
[root@wecloud-test-k8s-1 ~]# kubectl get componentstatuses
NAME                 STATUS      MESSAGE                                  ERROR
controller-manager   Healthy     ok
scheduler            Healthy     ok
etcd-0               Healthy     {"health": "true"}
etcd-2               Healthy     {"health": "true"}
etcd-1               Unhealthy   HTTP probe failed with statuscode: 503
[root@wecloud-test-k8s-1 ~]# kubectl get componentstatuses
NAME                 STATUS      MESSAGE                                  ERROR
scheduler            Healthy     ok
controller-manager   Healthy     ok
etcd-0               Healthy     {"health": "true"}
etcd-2               Unhealthy   HTTP probe failed with statuscode: 503
etcd-1               Unhealthy   HTTP probe failed with statuscode: 503
```
The symptom was that etcd's reported health was very unstable. The logs showed that heartbeats between the etcd members were failing:
```
root@zhangchi-ThinkPad-T450s:~# ssh 192.168.99.189
[root@wecloud-test-k8s-2 ~]# systemctl status etcd
● etcd.service - Etcd Server
   Loaded: loaded (/usr/lib/systemd/system/etcd.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2018-04-09 22:56:31 CST; 1 day 10h ago
     Docs: https://github.com/coreos
 Main PID: 17478 (etcd)
   CGroup: /system.slice/etcd.service
           └─17478 /usr/local/bin/etcd --name infra1 --cert-file=/etc/kubernetes/ssl/kubernetes.pem --key-file=/etc/kubernetes/ssl/kubern...

Apr 11 09:33:35 wecloud-test-k8s-2.novalocal etcd[17478]: e23bf6fd185b2dc5 [quorum:2] has received 1 MsgVoteResp votes and 1 vote ...ctions
Apr 11 09:33:36 wecloud-test-k8s-2.novalocal etcd[17478]: e23bf6fd185b2dc5 received MsgVoteResp from c9b9711086e865e3 at term 337
Apr 11 09:33:36 wecloud-test-k8s-2.novalocal etcd[17478]: e23bf6fd185b2dc5 [quorum:2] has received 2 MsgVoteResp votes and 1 vote ...ctions
Apr 11 09:33:36 wecloud-test-k8s-2.novalocal etcd[17478]: e23bf6fd185b2dc5 became leader at term 337
Apr 11 09:33:36 wecloud-test-k8s-2.novalocal etcd[17478]: raft.node: e23bf6fd185b2dc5 elected leader e23bf6fd185b2dc5 at term 337
Apr 11 09:33:41 wecloud-test-k8s-2.novalocal etcd[17478]: timed out waiting for read index response
Apr 11 09:33:46 wecloud-test-k8s-2.novalocal etcd[17478]: failed to send out heartbeat on time (exceeded the 100ms timeout for 401...516ms)
Apr 11 09:33:46 wecloud-test-k8s-2.novalocal etcd[17478]: server is likely overloaded
Apr 11 09:33:46 wecloud-test-k8s-2.novalocal etcd[17478]: failed to send out heartbeat on time (exceeded the 100ms timeout for 401.80886ms)
Apr 11 09:33:46 wecloud-test-k8s-2.novalocal etcd[17478]: server is likely overloaded
Hint: Some lines were ellipsized, use -l to show in full.
```
The key error message is: failed to send out heartbeat on time (exceeded the 100ms timeout for 401.80886ms)
Heartbeat failures usually trace back to one of three factors: disk speed, CPU starvation, or an unstable network.
etcd uses the Raft algorithm: the leader periodically sends heartbeats to every follower, and if the leader fails to send a heartbeat for two consecutive heartbeat intervals, etcd prints this log line as a warning. Most commonly the issue is a slow disk. The leader piggybacks some metadata on each heartbeat, and it must persist that data to disk before sending. The disk write may be competing with other applications, or the disk may be virtual or a slow SATA device; in those cases only better, faster disk hardware truly solves the problem. The metric etcd exposes to Prometheus, wal_fsync_duration_seconds, shows the average time spent fsyncing the WAL; normally this should stay below 10ms.
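As a quick sanity check, the WAL fsync histogram that etcd exports can be reduced to an average latency with a few lines of shell. This is a sketch over a hypothetical metrics sample; on a live member you would pull the real data with something like `curl -s http://127.0.0.1:2379/metrics | grep wal_fsync` (the endpoint and port are assumptions about your deployment):

```shell
# Hypothetical sample of etcd's etcd_disk_wal_fsync_duration_seconds histogram;
# the sum/count values below are made up for illustration.
cat > /tmp/etcd_metrics.txt <<'EOF'
etcd_disk_wal_fsync_duration_seconds_sum 2.5
etcd_disk_wal_fsync_duration_seconds_count 500
EOF

# Average fsync latency in ms = sum / count * 1000; it should stay below ~10ms.
awk '/_sum/ {s=$2} /_count/ {c=$2} END {printf "%.1f ms\n", s/c*1000}' /tmp/etcd_metrics.txt
```

With the sample values above this prints 5.0 ms, i.e. within the healthy range; a result well past 10ms points at the disk.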
The second possible cause is insufficient CPU. If your monitoring shows that CPU utilization really is high, move etcd to a more capable machine, use cgroups to give the etcd process exclusive use of certain cores, or raise etcd's scheduling priority.
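On a systemd-managed host, one way to reserve cores for etcd is a unit drop-in. This is only a sketch: the drop-in path, core IDs, and nice value are all hypothetical and need adapting to your machine (followed by systemctl daemon-reload and a service restart):

```
# /etc/systemd/system/etcd.service.d/cpu.conf  (hypothetical drop-in)
[Service]
CPUAffinity=2 3
Nice=-10
```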
The third possible cause is a slow network. If Prometheus shows poor network quality, such as high latency or a high packet-loss rate, moving etcd to an uncongested network will solve the problem. But if etcd is deployed across machine rooms, long latency is unavoidable; in that case tune heartbeat-interval according to the RTT between the rooms, and set election-timeout to at least 5x heartbeat-interval.
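That rule of thumb can be turned into a tiny calculation. The 1.5x-RTT factor for the heartbeat below is an assumption (a heartbeat somewhere around the inter-member RTT); the 5x multiplier for the election timeout comes from the text, and the RTT value is just an example:

```shell
# Example round-trip time between machine rooms, in milliseconds (assumed value)
RTT_MS=120

# heartbeat-interval ~ 1.5x RTT (assumption); election-timeout >= 5x heartbeat-interval
HEARTBEAT=$((RTT_MS * 3 / 2))
ELECTION=$((HEARTBEAT * 5))
echo "heartbeat-interval=${HEARTBEAT}ms election-timeout=${ELECTION}ms"
```

For a 120ms RTT this suggests a 180ms heartbeat-interval and a 900ms election-timeout.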
This experiment ran on OpenStack cloud instances, where limited disk I/O is a known issue, so the workaround here is to raise the heartbeat-interval.
On each etcd node, add the following to /etc/etcd/etcd.conf:
```
# 6-second heartbeat interval, 30-second election timeout
ETCD_HEARTBEAT_INTERVAL=6000
ETCD_ELECTION_TIMEOUT=30000
```
Then restart the etcd service on each node (systemctl daemon-reload && systemctl restart etcd) and check the component status again with kubectl get cs.