This post describes how to connect a Kubernetes cluster to a Ceph cluster, using Ceph as the Kubernetes storage backend to provide dynamic PVC provisioning. In this setup, the Ceph and Kubernetes clusters run on the same set of hosts.
Host list
K8s cluster role | Ceph cluster role | IP | Kernel |
---|---|---|---|
master-1 | mon, osd node | 172.16.200.101 | 4.4.247-1.el7.elrepo.x86_64 |
master-2 | mon, osd node | 172.16.200.102 | 4.4.247-1.el7.elrepo.x86_64 |
master-3 | mon, osd node | 172.16.200.103 | 4.4.247-1.el7.elrepo.x86_64 |
node-1 | osd node | 172.16.200.104 | 4.4.247-1.el7.elrepo.x86_64 |
Reason for upgrading the kernel: the Ceph project recommends kernel 4.1.4 or later.
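To confirm that every node is already running a new enough kernel, a quick check such as the following can be run (a minimal sketch; the IPs are the hosts from the table above and assume password-less ssh):

# Print the running kernel on each node
for host in 172.16.200.101 172.16.200.102 172.16.200.103 172.16.200.104; do
    ssh root@$host 'uname -r'
done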
Before integrating, make sure the Ceph cluster is in a healthy state:
root ~ >>> ceph health
HEALTH_OK
Integration steps
Create the storage pool mypool
root ~ >>> ceph osd pool create mypool 128 128 # set PG and PGP to 128
pool 'mypool' created
root ~ >>> ceph osd pool ls
mypool
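Depending on the Ceph release (Luminous and later warn when a pool has no application tag), it may also be worth tagging and initializing the pool for RBD use; a small sketch, adjust to your Ceph version:

# Tag the pool for RBD usage and initialize it (Luminous+ only)
ceph osd pool application enable mypool rbd
rbd pool init mypool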
Create the Secret objects
Create a kube user in Ceph and grant it the appropriate permissions:
root ~ >>> ceph auth get-or-create client.kube mon 'allow r' osd 'allow class-read object_prefix rbd_children,allow rwx pool=mypool'
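The Secret manifests below need the base64-encoded keys of the two Ceph users. A sketch of how those values can be produced; the command output is what goes into the key field:

# Base64-encode the keys; paste the output into the Secret manifests below
ceph auth get-key client.admin | base64
ceph auth get-key client.kube | base64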
root ~/k8s/ceph >>> cat ceph-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: ceph-kube-secret
  namespace: default
data:
  key: "<output of: ceph auth get-key client.kube | base64>"
type: kubernetes.io/rbd
---
apiVersion: v1
kind: Secret
metadata:
  name: ceph-admin-secret
  namespace: default
data:
  key: "<output of: ceph auth get-key client.admin | base64>"
type: kubernetes.io/rbd
Create the above resources:
root ~/k8s/ceph >>> kubectl apply -f ceph-secret.yaml
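As a quick sanity check, the two Secrets can be listed and the stored key decoded back (the jsonpath expression here is just one way to do it):

# Confirm both secrets exist and the stored key decodes back to the original Ceph key
kubectl get secret ceph-kube-secret ceph-admin-secret
kubectl get secret ceph-kube-secret -o jsonpath='{.data.key}' | base64 -d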
Create the StorageClass
root ~/k8s/ceph >>> cat ceph-storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-storageclass
provisioner: kubernetes.io/rbd
parameters:
  monitors: 172.16.200.101:6789,172.16.200.102:6789,172.16.200.103:6789 # mon nodes; ceph-mon listens on port 6789 by default
  adminId: admin
  adminSecretName: ceph-admin-secret
  adminSecretNamespace: default
  pool: mypool # the Ceph storage pool
  userId: kube
  userSecretName: ceph-kube-secret
  userSecretNamespace: default
  fsType: ext4
  imageFormat: "2"
  imageFeatures: "layering"
root ~/k8s/ceph >>> kubectl apply -f ceph-storageclass.yaml
root ~/k8s/ceph >>> kubectl get sc
NAME PROVISIONER AGE
ceph-storageclass kubernetes.io/rbd 87s
Create the PVC
root ~/k8s/ceph >>> cat ceph-storageclass-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ceph-test-claim
spec:
  storageClassName: ceph-storageclass # must match the StorageClass name
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi # request 5Gi of storage
- ReadWriteOnce: read-write, can be mounted by a single node only
- ReadOnlyMany: read-only, can be mounted by multiple nodes
- ReadWriteMany: read-write, can be mounted by multiple nodes
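After applying the claim, its status can be watched; with a working provisioner it moves to Bound (in this walkthrough that only happens after the provisioner fix described later):

# Create the claim and watch whether the provisioner binds it
kubectl apply -f ceph-storageclass-pvc.yaml
kubectl get pvc ceph-test-claim
kubectl describe pvc ceph-test-claim # events show provisioning progress or errors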
Create a test pod
root ~/k8s/ceph >>> cat ceph-busybox-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: ceph-pod
spec:
  containers:
    - name: ceph-busybox
      image: busybox:1.32.0
      command: ["/bin/sh","-c","tail -f /etc/resolv.conf"]
      volumeMounts:
        - name: ceph-volume
          mountPath: /usr/share/busybox
          readOnly: false
  volumes:
    - name: ceph-volume
      persistentVolumeClaim:
        claimName: ceph-test-claim
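A sketch of creating the pod and checking how the volume attach and mount are progressing (describe surfaces the events referenced in the next section):

# Create the test pod and inspect its status and events
kubectl apply -f ceph-busybox-pod.yaml
kubectl get pod ceph-pod
kubectl describe pod ceph-pod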
Check the pod status. If the pod is not in the Running state and its events contain the error below, even though every node in the Kubernetes cluster has a ceph-common package matching the Ceph version installed, you need to change the StorageClass provisioner: replace the built-in kubernetes.io/rbd with an external provisioner so that dynamic PVC provisioning can work.
rbd: create volume failed, err: failed to create rbd image: executable file not found in $PATH:
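Since the prerequisite above is that ceph-common is installed on every node, it may be worth double-checking that first (a quick sketch; the package name is for CentOS, adjust for your distribution):

# Verify the rbd client on each node
which rbd
rpm -q ceph-common # on Debian/Ubuntu: dpkg -l ceph-common
rbd --version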
Troubleshooting
The cause of this error is actually simple: the kube-controller-manager image from gcr.io does not ship the rbd subcommand, so any kube-controller-manager started from that image will run into this problem.
Related GitHub issue: Error creating rbd image: executable file not found in $PATH · Issue #38923 · kubernetes/kubernetes
Deploy an external provisioner
Define an external provisioner that uses the secrets defined earlier to create the Ceph images and registers itself as the ceph.com/rbd provisioner. The definition is as follows:
root ~/k8s/ceph >>> cat storageclass-fix-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rbd-provisioner
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rbd-provisioner
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: rbd-provisioner
    spec:
      containers:
        - name: rbd-provisioner
          image: "quay.io/external_storage/rbd-provisioner:latest"
          env:
            - name: PROVISIONER_NAME
              value: ceph.com/rbd
      serviceAccountName: persistent-volume-binder
The serviceAccountName here must be explicitly set to persistent-volume-binder; the default service account does not have permission to list the required resources.
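If the cluster does not already ship a usable persistent-volume-binder ServiceAccount in kube-system, a dedicated account with equivalent permissions can be created instead. The following is a minimal sketch with a hypothetical rbd-provisioner ServiceAccount; if you use it, change serviceAccountName in the Deployment accordingly:

# Hypothetical ServiceAccount + RBAC for the external provisioner (adjust to your policy)
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rbd-provisioner
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: rbd-provisioner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: rbd-provisioner
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: rbd-provisioner
subjects:
  - kind: ServiceAccount
    name: rbd-provisioner
    namespace: kube-system
EOF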
Create the Deployment:
root ~/k8s/ceph >>> kubectl apply -f storageclass-fix-deployment.yaml
Update the original StorageClass
...
metadata:
  name: ceph-storageclass
#provisioner: kubernetes.io/rbd
provisioner: ceph.com/rbd
...
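Most StorageClass fields, including provisioner, are immutable, so an in-place kubectl apply of this change is likely to be rejected; the simplest approach is to delete and re-create the object:

# Re-create the StorageClass with the new provisioner
kubectl delete -f ceph-storageclass.yaml
kubectl apply -f ceph-storageclass.yaml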
root ~/k8s/ceph >>> kubectl get sc
NAME PROVISIONER AGE
ceph-storageclass ceph.com/rbd 64m
Re-create the PVC and the pod
root ~/k8s/ceph >>> kubectl delete -f ceph-storageclass-pvc.yaml && kubectl apply -f ceph-storageclass-pvc.yaml
root ~/k8s/ceph >>> kubectl delete -f ceph-busybox-pod.yaml && kubectl apply -f ceph-busybox-pod.yaml
The pod successfully enters the Running state:
root ~/k8s/ceph >>> kubectl get pods
NAME READY STATUS RESTARTS AGE
ceph-pod 1/1 Running 0 47s
Verify the PV
root ~/k8s/ceph >>> kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/pvc-9579436e-5122-48c1-871d-2e18b8e42863 5Gi RWO Delete Bound default/ceph-test-claim ceph-storageclass 112s
The corresponding image has been mapped as a block device and mounted at /usr/share/busybox/:
root ~/k8s/ceph >>> kubectl exec -it ceph-pod -- /bin/sh
/ # mount | grep share
/dev/rbd0 on /usr/share/busybox type ext4 (rw,seclabel,relatime,stripe=1024,data=ordered)
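A quick read/write check inside the pod can confirm the RBD-backed mount actually works (the file name here is hypothetical, purely for illustration):

# Inside the pod: verify the mount is writable (hypothetical test file)
echo "hello ceph" > /usr/share/busybox/test.txt
cat /usr/share/busybox/test.txt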
The corresponding image has been created in the mypool pool:
root ~/k8s/ceph >>> rbd ls -p mypool
kubernetes-dynamic-pvc-54b5fec8-3e1d-11eb-ab76-925ce2bb963b
Common errors
1. The rbd image cannot be mapped as a block device
Warning FailedMount 72s (x2 over 2m13s) kubelet, node-1 MountVolume.WaitForAttach failed for volume "pvc-c0683658-308a-4fc0-b330-c813c1cad850" : rbd: map failed exit status 110, rbd output: rbd: sysfs write failed
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: map failed: (110) Connection timed out
# dmesg log on the affected node:
[843372.414046] libceph: mon1 172.16.200.102:6789 feature set mismatch, my 106b84a842a42 < server's 40106b84a842a42, missing 400000000000000
[843372.415663] libceph: mon1 172.16.200.102:6789 missing required protocol features
[843382.430314] libceph: mon1 172.16.200.102:6789 feature set mismatch, my 106b84a842a42 < server's 40106b84a842a42, missing 400000000000000
[843382.432031] libceph: mon1 172.16.200.102:6789 missing required protocol features
[843393.566967] libceph: mon1 172.16.200.102:6789 feature set mismatch, my 106b84a842a42 < server's 40106b84a842a42, missing 400000000000000
[843393.569506] libceph: mon1 172.16.200.102:6789 missing required protocol features
[843403.421791] libceph: mon2 172.16.200.103:6789 feature set mismatch, my 106b84a842a42 < server's 40106b84a842a42, missing 400000000000000
[843403.424892] libceph: mon2 172.16.200.103:6789 missing required protocol features
[843413.405831] libceph: mon0 172.16.200.101:6789 feature set mismatch, my 106b84a842a42 < server's 40106b84a842a42, missing 400000000000000
Cause: Linux kernels older than 4.5 do not support the feature flag 400000000000000; disable it manually:
ceph osd crush tunables hammer
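To confirm the change took effect, the tunables profile can be inspected afterwards (a small sketch; run on any Ceph admin node, then delete and re-create the affected pod so the rbd map is retried):

# Verify the CRUSH tunables profile after switching to hammer
ceph osd crush show-tunables # the profile field should now report hammer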