Why Resource Reservation Is Necessary
Take a typical cluster installed with kubeadm: by default, the kubelet has neither kube-reserved nor system-reserved resource reservations configured, so the pod workload on a worker node can, in principle, consume all of the CPU and memory on that node. Suppose a pod managed by a Deployment controller has a bug and cannot release memory properly at runtime: the kubelet process on that worker node will eventually be unable to claim enough memory to sync its heartbeat with kube-apiserver, and the node will be marked NotReady. The Deployment controller then creates a replacement pod replica on another worker node, the same sequence repeats and overwhelms the second node, and the entire cluster ultimately risks a cascading "avalanche" failure.
Resource Categories
Node capacity: the total resources of the node
kube-reserved: resources reserved for Kubernetes daemons (e.g. kubelet, container runtime, node problem detector)
system-reserved: resources reserved for the operating system (e.g. sshd, udev)
eviction-threshold: the threshold at which kubelet eviction is triggered
allocatable: resources available to pods = Node capacity - kube-reserved - system-reserved - eviction-threshold
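As a quick sanity check of this formula, here is a worked example in shell. The 128Gi node capacity is illustrative; the reservation sizes are the ones configured later in this article, and 100Mi is kubelet's default hard eviction threshold for memory.available:

```shell
# allocatable = capacity - kube-reserved - system-reserved - eviction threshold
# (all values in KiB; capacity of 128Gi is hypothetical)
capacity_kib=$((128 * 1024 * 1024))        # 128Gi node
kube_reserved_kib=$((2 * 1024 * 1024))     # kube-reserved: 2Gi
system_reserved_kib=$((4 * 1024 * 1024))   # system-reserved: 4Gi
eviction_kib=$((100 * 1024))               # default eviction hard threshold: memory.available<100Mi
allocatable_kib=$((capacity_kib - kube_reserved_kib - system_reserved_kib - eviction_kib))
echo "${allocatable_kib}Ki"                # → 127823872Ki
```

Note that eviction thresholds apply only to memory and storage; for CPU, allocatable is simply capacity minus the two reservations.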
Configuration Steps
The following uses Ubuntu 16.04 with Kubernetes v1.14 as an example.
1. Edit /var/lib/kubelet/config.yaml
enforceNodeAllocatable:
- pods
- kube-reserved
- system-reserved
systemReserved:
  cpu: "2"
  memory: "4Gi"
kubeReserved:
  cpu: "1"
  memory: "2Gi"
systemReservedCgroup: /system.slice
kubeReservedCgroup: /system.slice/kubelet.service
Parameter explanations:
enforce-node-allocatable=pods,kube-reserved,system-reserved # defaults to pods only; kube-reserved and system-reserved must be added here so the reservations for Kubernetes daemons and the system are actually enforced.
kube-reserved-cgroup=/system.slice/kubelet.service # the cgroup that holds the Kubernetes components
system-reserved-cgroup=/system.slice # the cgroup that holds the system components
kube-reserved=cpu=1,memory=2Gi # amount of resources reserved for Kubernetes components
system-reserved=cpu=2,memory=4Gi # amount of resources reserved for system components. Determine the actual values by testing, based on the host specification and monitoring of the system's idle resource usage.
Note: ephemeral-storage can also be reserved if needed, e.g. kube-reserved=cpu=1,memory=2Gi,ephemeral-storage=10Gi
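For reference, on kubelet versions that still accept flag-based configuration (the flags are deprecated in favor of the config file), the same settings would look like this sketch:

```
--enforce-node-allocatable=pods,kube-reserved,system-reserved
--kube-reserved-cgroup=/system.slice/kubelet.service
--system-reserved-cgroup=/system.slice
--kube-reserved=cpu=1,memory=2Gi
--system-reserved=cpu=2,memory=4Gi
```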
2. Edit /lib/systemd/system/kubelet.service
Because the cpuset and hugetlb cgroup subsystems do not initialize a system.slice hierarchy by default, the corresponding directories must be created before the kubelet process starts.
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=https://kubernetes.io/docs/home/

[Service]
ExecStartPre=/bin/mkdir -p /sys/fs/cgroup/cpuset/system.slice/kubelet.service
ExecStartPre=/bin/mkdir -p /sys/fs/cgroup/hugetlb/system.slice/kubelet.service
ExecStart=/usr/bin/kubelet
Restart=always
StartLimitInterval=0
RestartSec=10

[Install]
WantedBy=multi-user.target
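As an alternative (a judgment call, not part of the original setup), the two ExecStartPre lines can live in a systemd drop-in instead of the packaged unit file, so they survive package upgrades of kubelet.service. The drop-in file name below is arbitrary, e.g. /etc/systemd/system/kubelet.service.d/10-cgroup-init.conf:

```
[Service]
ExecStartPre=/bin/mkdir -p /sys/fs/cgroup/cpuset/system.slice/kubelet.service
ExecStartPre=/bin/mkdir -p /sys/fs/cgroup/hugetlb/system.slice/kubelet.service
```

Run systemctl daemon-reload before restarting the kubelet so systemd picks up the drop-in.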
3. Restart the kubelet
systemctl restart kubelet
systemctl status kubelet
4. Check the worker node's allocatable resources
kubectl describe node [Your-NodeName]
Capacity:
  cpu:                40
  ephemeral-storage:  197608716Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             131595532Ki
  pods:               110
Allocatable:
  cpu:                37
  ephemeral-storage:  182116192365
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             125201676Ki
  pods:               110
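The gap between Capacity and Allocatable above is consistent with the reservations configured earlier: 1 + 2 = 3 CPUs, and for memory 2Gi + 4Gi plus kubelet's default hard eviction threshold of memory.available<100Mi. The memory figure can be verified with shell arithmetic:

```shell
# capacity minus (kube-reserved 2Gi + system-reserved 4Gi + 100Mi eviction)
# should match the Allocatable memory reported by kubectl describe node
allocatable_kib=$((131595532 - 2*1024*1024 - 4*1024*1024 - 100*1024))
echo "${allocatable_kib}Ki"   # → 125201676Ki
```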
References:
https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/
https://my.oschina.net/jxcdwangtao/blog/1629059