Correctly Identifying Pod Resources in Kubernetes


1. Overview of Container Resource Limits

When using Docker as the container engine, you can limit the CPU and memory available to a container by passing flags such as --memory and --cpus; see Docker's resource-limit documentation for the full list. Under the hood, Docker enforces these limits with the Linux kernel's cgroups. cgroups can limit, account for, and isolate the physical resources (CPU, memory, I/O, etc.) used by a group of processes; they provide the basic guarantee for container virtualization and are the foundation on which Docker and a whole series of virtualization management tools are built.
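
For example, the following limits a container to one CPU and 512 MiB of memory (the container name limited-nginx is made up for illustration):

# Run an nginx container limited to 1 CPU and 512 MiB of memory
docker run -d --name limited-nginx --cpus="1" --memory="512m" nginx
# Confirm what Docker recorded: NanoCpus (1000000000 = 1 CPU) and Memory in bytes
docker inspect limited-nginx --format '{{.HostConfig.NanoCpus}} {{.HostConfig.Memory}}'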

For details on how cgroups implement resource limits, see "Docker背后的內核知識—cgroups資源限制" (Kernel Knowledge Behind Docker: cgroups Resource Limits).

2. Problem Background

Services running inside containers often detect the amount of resources available in their environment automatically, then use that data to size themselves accordingly.

Take an nginx container as an example. nginx decides how many worker processes to start from the worker_processes directive in its configuration file; the default value is 1, meaning nginx starts only a single worker.
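
For reference, the directive sits at the top level of nginx.conf; a minimal sketch:

# Start a fixed number of worker processes
worker_processes 4;
# Or let nginx detect the CPU count itself -- problematic in containers, as explained below
# worker_processes auto;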

To tune nginx for high concurrency, you can either set this value manually to the CPU core count of the target environment, or set it to auto and let nginx decide by itself. The former requires mounting the configuration file and editing it by hand; the latter is more flexible but runs into a problem in containers: whether the container is run directly by Docker or inside a Kubernetes Pod, the CPU and memory it detects are those of the node it runs on, so nginx cannot correctly auto-detect the CPU count via auto. For example, here is the resource information of one of my nodes and of a pod on that node:

# kubectl describe nodes k8s-node-07|grep -A 5 "Capacity"
Capacity:
  cpu:                16
  ephemeral-storage:  74408452Ki
  hugepages-2Mi:      0
  memory:             16430184Ki
  pods:               110
# docker info|grep -A 6 "Kernel"
Kernel Version: 4.4.247-1.el7.elrepo.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 15.67GiB
Name: k8s-node-07
# kubectl exec -it test-pod-5dff4b89fd-bsh6b -- bash
root@test-pod-5dff4b89fd-bsh6b:/# free -m
              total        used        free      shared  buff/cache   available
Mem:          16045        7915        2354        1002        5775        6222
Swap:             0           0           0
root@test-pod-5dff4b89fd-bsh6b:/# head -2 /proc/meminfo
MemTotal:       16430184 kB
MemFree:         2374064 kB

If a Pod's CPU and memory are limited via resources in Kubernetes, for example:

        resources:
          limits:
            cpu: "1"
            memory: 2Gi
          requests:
            cpu: 200m
            memory: 512Mi

You can inspect the resulting resource settings with docker commands on the node where the pod was scheduled:

# docker inspect b1f4bfb53a2c|grep -i cgroup
            "Cgroup": "",
            "CgroupParent": "/kubepods/burstable/podc4a25564-225b-4562-afee-fab8cc5d694f",
            "DeviceCgroupRules": null,
# cat /sys/fs/cgroup/cpu/kubepods/burstable/podc4a25564-225b-4562-afee-fab8cc5d694f/cpu.cfs_quota_us
100000
# cat /sys/fs/cgroup/cpu/kubepods/burstable/podc4a25564-225b-4562-afee-fab8cc5d694f/cpu.cfs_period_us
100000
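
The CPU limit of "1" above translates to cfs_quota_us / cfs_period_us = 100000 / 100000 = one full CPU. The 2Gi memory limit can be checked the same way; a sketch assuming cgroup v1 as on this node, using the pod UID path from the output above:

# 2Gi = 2 * 1024^3 = 2147483648 bytes
cat /sys/fs/cgroup/memory/kubepods/burstable/podc4a25564-225b-4562-afee-fab8cc5d694f/memory.limit_in_bytes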

Some digging into the relevant sources shows that nginx obtains the CPU core count by calling sysconf(_SC_NPROCESSORS_ONLN), which glibc implements by reading /sys/devices/system/cpu/online. By default, this file inside a pod is identical to the host's, so if worker_processes is set to auto, nginx ends up starting 16 worker processes. When the Pod's own CPU limit is small, each worker is allotted only a small slice of CPU time, which causes noticeably slow responses.

# kubectl exec -it test-pod-5dff4b89fd-bsh6b -- cat /sys/devices/system/cpu/online
0-15
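
You can reproduce what nginx sees with getconf, which reports the same sysconf value (assuming getconf is available in the container image):

# Inside the pod this reports the host's online CPU count (16 here), not the Pod's limit
kubectl exec -it test-pod-5dff4b89fd-bsh6b -- getconf _NPROCESSORS_ONLN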

3. Introducing lxcfs

lxcfs is a small FUSE filesystem designed to make Linux containers feel more like virtual machines. It helps containers correctly identify their own resources by handling the information in the following files:

/proc/cpuinfo
/proc/diskstats
/proc/loadavg
/proc/meminfo
/proc/stat
/proc/swaps
/proc/uptime
/sys/devices/system/cpu/online

When a container starts, these /proc files inside the container are mounted from the lxcfs directory on the host. When an application in the container reads /proc/meminfo, for example, the request is routed to lxcfs, which uses the container's cgroup information to return the correct values, so the application ends up identifying its resources correctly.
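
Outside Kubernetes, the same effect can be had by bind-mounting the lxcfs files manually; a sketch assuming lxcfs is running on the host with its default mount point /var/lib/lxcfs:

# With the lxcfs view of meminfo, MemTotal reflects the 512 MiB limit rather than the host total
docker run --rm -m 512m \
  -v /var/lib/lxcfs/proc/meminfo:/proc/meminfo:rw \
  ubuntu head -2 /proc/meminfo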

3.1 Deploying lxcfs in Kubernetes

The project for deploying the lxcfs filesystem on Kubernetes lives at: https://github.com/denverdino/lxcfs-admission-webhook

It ultimately works through Kubernetes dynamic admission control, i.e. a mutating admission webhook.
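
Conceptually, the webhook mutates every new Pod spec by adding hostPath volumes for the lxcfs-maintained files plus matching volumeMounts; a simplified sketch of one injected pair (paths assume the DaemonSet's default /var/lib/lxcfs):

volumes:
- name: lxcfs-proc-meminfo
  hostPath:
    path: /var/lib/lxcfs/proc/meminfo
containers:
- name: app
  volumeMounts:
  - name: lxcfs-proc-meminfo
    mountPath: /proc/meminfo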

My Kubernetes cluster versions are as follows:

# kubectl version -o yaml
clientVersion:
  buildDate: "2020-12-08T17:59:43Z"
  compiler: gc
  gitCommit: af46c47ce925f4c4ad5cc8d1fca46c7b77d13b38
  gitTreeState: clean
  gitVersion: v1.20.0
  goVersion: go1.15.5
  major: "1"
  minor: "20"
  platform: darwin/amd64
serverVersion:
  buildDate: "2019-06-19T16:32:14Z"
  compiler: gc
  gitCommit: e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529
  gitTreeState: clean
  gitVersion: v1.15.0
  goVersion: go1.12.5
  major: "1"
  minor: "15"
  platform: linux/amd64

First, fetch the manifests and deploy everything with the provided one-shot script:

# git clone https://github.com/denverdino/lxcfs-admission-webhook.git
# cd lxcfs-admission-webhook
# ls deployment 
deployment.yaml                lxcfs-daemonset.yaml           mutatingwebhook.yaml           uninstall.sh                   web.yaml                       webhook-patch-ca-bundle.sh
install.sh                     mutatingwebhook-ca-bundle.yaml service.yaml                   validatingwebhook.yaml         webhook-create-signed-cert.sh
# kubectl apply -f deployment/lxcfs-daemonset.yaml                    
daemonset.apps/lxcfs created
# ./deployment/install.sh  
creating certs in tmpdir /var/folders/8n/11ndbfq95jv79gds8wqj2scc0000gn/T/tmp.c6OKXi4L 
Generating RSA private key, 2048 bit long modulus
.......................................+++
...............+++
e is 65537 (0x10001)
certificatesigningrequest.certificates.k8s.io/lxcfs-admission-webhook-svc.default created
NAME                                  AGE   REQUESTOR   CONDITION
lxcfs-admission-webhook-svc.default   0s    admin       Pending
certificatesigningrequest.certificates.k8s.io/lxcfs-admission-webhook-svc.default approved
W0327 16:35:14.764281    8953 helpers.go:553] --dry-run is deprecated and can be replaced with --dry-run=client.
secret/lxcfs-admission-webhook-certs created
NAME                            TYPE     DATA   AGE
lxcfs-admission-webhook-certs   Opaque   2      0s
deployment.apps/lxcfs-admission-webhook-deployment created
service/lxcfs-admission-webhook-svc created
mutatingwebhookconfiguration.admissionregistration.k8s.io/mutating-lxcfs-admission-webhook-cfg created

Check the deployment result: a pod named lxcfs-admission-webhook-deployment-* is running, and an lxcfs pod runs on every node via a DaemonSet:

# kubectl get pods -o wide|grep lxcfs
lxcfs-admission-webhook-deployment-6896958c4c-56k54   1/1     Running   0          80s     172.20.7.51    172.16.1.111   <none>           <none>
lxcfs-67cgk                                           1/1     Running   0          94s     172.20.0.25    172.16.1.100   <none>           <none>
lxcfs-c4lkx                                           1/1     Running   0          93s     172.20.1.25    172.16.1.101   <none>           <none>
...

3.2 Enabling Namespace Injection

# kubectl label namespace default lxcfs-admission-webhook=enabled

This enables lxcfs injection for the given namespace; once enabled, every newly created Pod in that namespace will have the lxcfs mounts injected.
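
To confirm the label took effect, a quick sanity check:

# The default namespace should now carry lxcfs-admission-webhook=enabled
kubectl get namespace default --show-labels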

3.3 Uninstalling

To restore the environment, just run the uninstall script in the deployment directory:

# ./deployment/uninstall.sh 
mutatingwebhookconfiguration.admissionregistration.k8s.io "mutating-lxcfs-admission-webhook-cfg" deleted
service "lxcfs-admission-webhook-svc" deleted
deployment.apps "lxcfs-admission-webhook-deployment" deleted
secret "lxcfs-admission-webhook-certs" deleted
# kubectl delete -f deployment/lxcfs-daemonset.yaml        
daemonset.apps "lxcfs" deleted

4. Testing

The cloned repository ships a YAML manifest for a test httpd pod, which can be deployed directly:

# kubectl apply -f deployment/web.yaml 
deployment.apps/web created
# kubectl get pods -l app=web
NAME                   READY   STATUS    RESTARTS   AGE
web-5ff5cd75f8-74pr6   1/1     Running   0          27s
web-5ff5cd75f8-bcm2x   1/1     Running   0          27s

Enter the container and check its resources:

# kubectl exec -it web-5ff5cd75f8-74pr6 -- bash
root@web-5ff5cd75f8-74pr6:/usr/local/apache2# free -m
             total       used       free     shared    buffers     cached
Mem:           256         15        240          0          0          0
-/+ buffers/cache:         14        241
Swap:            0          0          0
root@web-5ff5cd75f8-74pr6:/usr/local/apache2# cat /proc/cpuinfo| grep "processor"| wc -l
1

In fact, with lxcfs plus dynamic admission control, the relevant host lxcfs files are mounted automatically whenever a new pod is created, which can be seen as follows:

# kubectl describe pods web-5ff5cd75f8-74pr6
...
    Mounts:
      /proc/cpuinfo from lxcfs-proc-cpuinfo (rw)
      /proc/diskstats from lxcfs-proc-diskstats (rw)
      /proc/loadavg from lxcfs-proc-loadavg (rw)
      /proc/meminfo from lxcfs-proc-meminfo (rw)
      /proc/stat from lxcfs-proc-stat (rw)
      /proc/swaps from lxcfs-proc-swaps (rw)
      /proc/uptime from lxcfs-proc-uptime (rw)
      /sys/devices/system/cpu/online from lxcfs-sys-devices-system-cpu-online (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-jtj98 (ro)
...

5. Summary

Containers in the pod can now correctly read the CPU and memory limit values. If your own application reads the resource configuration of its environment and something looks wrong, make sure you work out, from the bottom up, exactly how it obtains that environment information.

The test above also shows that lxcfs automatically mounts /sys/devices/system/cpu/online, the file nginx needs, into the pod, so the problem of worker_processes auto-detection in nginx containers has been verified as solved as well.
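
As a final check (my-nginx-pod is a hypothetical pod name; assumes an nginx pod with worker_processes auto and a CPU limit of 1, created in an lxcfs-enabled namespace):

# Expect the limited view (e.g. "0" for a single CPU) instead of the host's 0-15
kubectl exec my-nginx-pod -- cat /sys/devices/system/cpu/online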

References:

https://github.com/denverdino/lxcfs-admission-webhook

