Ceph configuration
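# --size is given in MiB, so this creates a 100 MiB image in the default rbd pool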
rbd create --size 100 rbd/nginx-image
[root@localhost my-cluster]# rbd list
nginx-image
[root@localhost my-cluster]# rbd info nginx-image
rbd image 'nginx-image':
    size 100MiB in 25 objects
    order 22 (4MiB objects)
    block_name_prefix: rbd_data.5e4d6b8b4567
    format: 2
    features: layering
    flags:
    create_timestamp: Tue Apr 30 18:10:05 2019
[root@localhost my-cluster]#
# Get the admin key (base64-encoded)
[root@localhost my-cluster]# ceph auth get-key client.admin | base64
QVFDRTQ4ZGNLRFVIRFJBQTVGd2J5QzU0d3B0cGJuOTREcjM1VHc9PQ==
k8s configuration
Using Ceph via static PV and PVC
Each rebuild requires running rbd map on the node first to attach the image. An rbd volume mounts a Rados Block Device into a Pod. Unlike emptyDir, which is wiped when its Pod is removed, an rbd volume's contents are preserved and the volume is merely unmounted, so an RBD volume can be pre-populated with data, and data can be "handed off" between pods. Ceph RBD supports only single-node read-write or multi-node read-only, not multi-node read-write; workloads that need multi-node read-write can use CephFS instead.
- Install ceph-common
yum -y install ceph-common
- Copy the Ceph configuration to each k8s node
Copy ceph.conf and ceph.client.admin.keyring to /etc/ceph/.
- Format the image as xfs
[root@localhost my-cluster]# rbd map nginx-image
/dev/rbd0
[root@localhost my-cluster]# mkfs.xfs /dev/rbd0
[root@localhost my-cluster]# rbd unmap nginx-image
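To sanity-check the new filesystem before handing it to k8s, the device can be re-mapped and mounted briefly (a minimal sketch; /mnt is an arbitrary mount point):
rbd map nginx-image
mount /dev/rbd0 /mnt
df -h /mnt
umount /mnt
rbd unmap nginx-image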
- Create the Ceph Secret
cat ceph-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret
type: "kubernetes.io/rbd"
data:
  key: QVFDTTlXOWFOMk9IR3hBQXZyUjFjdGJDSFpoZUtmckY0N2tZOUE9PQ==
kubectl create -f ceph-secret.yaml
[root@node1 work]# kubectl get secret
NAME TYPE DATA AGE
ceph-secret Opaque 1 8d
default-token-7s88r kubernetes.io/service-account-token 3 11d
- Create the PV
[root@node1 work]# cat nginx-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nginx-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  rbd:
    monitors:
      - 192.168.6.156:6789
      - 192.168.6.157:6789
      - 192.168.6.158:6789
    pool: rbd
    image: nginx-image
    user: admin
    secretRef:
      name: ceph-secret
    fsType: xfs
    readOnly: false
  persistentVolumeReclaimPolicy: Recycle
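The PV manifest still needs to be applied before the claim in the next step can bind; a quick create-and-verify sketch (output omitted):
kubectl create -f nginx-pv.yaml
kubectl get pv nginx-pv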
- Create the PVC
[root@node1 work]# cat nginx-pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nginx-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
kubectl create -f nginx-pvc.yaml
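To confirm the claim bound to nginx-pv (STATUS should show Bound):
kubectl get pvc nginx-pvc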
- Create the Deployment and mount the volume
[root@node1 work]# cat nginx-deploy.yml
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: demo
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
      - name: demo
        image: mritd/demo
        ports:
        - containerPort: 80
        volumeMounts:
        - mountPath: "/data"
          name: data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: nginx-pvc
[root@node1 work]# rbd map nginx-image
/dev/rbd0
[root@node1 work]# kubectl create -f nginx-deploy.yml
deployment "demo" created
- Create the Service
[root@node1 work]# kubectl expose deployment/demo
service "demo" exposed
[root@node1 work]# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
demo ClusterIP 10.254.170.53 <none> 80/TCP 6m
kubernetes ClusterIP 10.254.0.1 <none> 443/TCP 11d
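Since the demo container exposes port 80, the Service can be smoke-tested from any cluster node via its ClusterIP (address taken from the output above):
curl http://10.254.170.53/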
The current k8s 1.8 release appears to have a bug around RWO PVs: all three pods mounted the PV and could write, yet none saw the others' files, whereas normally only one pod should be created successfully. Having to run rbd map on every node before using RBD is also inconvenient.
https://github.com/kubernetes/kubernetes/issues/60903
Testing data inconsistency when multiple pods mount a static PV
[root@node1 work]# ansible k8s -a 'rbd map nginx-image'
192.168.6.161 | SUCCESS | rc=0 >>
/dev/rbd0
192.168.6.162 | SUCCESS | rc=0 >>
/dev/rbd0
192.168.6.163 | SUCCESS | rc=0 >>
/dev/rbd1
[root@node1 work]# kubectl create -f nginx-deploy.yml
deployment "demo" created
[root@node1 work]# kubectl get pods
NAME READY STATUS RESTARTS AGE
ceph-mysql-pod 1/1 Running 0 8d
demo-579d6c87d6-5c59l 1/1 Running 0 10s
demo-579d6c87d6-dck74 1/1 Running 0 10s
demo-579d6c87d6-gg9jf 1/1 Running 0 10s
[root@node1 work]# for i in `kubectl get pods -l app=demo|grep demo|awk '{print $1}'`; do kubectl exec -it $i touch /data/$i.txt ;done;
[root@node1 work]# for i in `kubectl get pods -l app=demo|grep demo|awk '{print $1}'`; do kubectl exec -it $i ls /data ;done;
3.txt demo-579d6c87d6-5c59l.txt
3.txt demo-579d6c87d6-dck74.txt
3.txt demo-579d6c87d6-gg9jf.txt
The file listing differs from pod to pod.
Delete the Deployment and recreate the pods
[root@node1 work]# kubectl delete -f nginx-deploy.yml
deployment "demo" deleted
[root@node1 work]# ansible k8s -a 'rbd map nginx-image'
192.168.6.161 | SUCCESS | rc=0 >>
/dev/rbd0
192.168.6.162 | SUCCESS | rc=0 >>
/dev/rbd0
192.168.6.163 | SUCCESS | rc=0 >>
/dev/rbd1
[root@node1 work]# kubectl create -f nginx-deploy.yml
deployment "demo" created
[root@node1 work]# kubectl get pods
NAME READY STATUS RESTARTS AGE
ceph-mysql-pod 1/1 Running 0 8d
demo-579d6c87d6-fbdc2 1/1 Running 0 4s
demo-579d6c87d6-hslhw 1/1 Running 0 4s
demo-579d6c87d6-p5dc5 1/1 Running 0 4s
[root@node1 work]# for i in `kubectl get pods -l app=demo|grep demo|awk '{print $1}'`; do kubectl exec -it $i ls /data ;done;
3.txt demo-579d6c87d6-gg9jf.txt
3.txt demo-579d6c87d6-gg9jf.txt
3.txt demo-579d6c87d6-gg9jf.txt
Only the last write survives in the RBD image.
The StorageClass approach
Since 1.4, Kubernetes has offered a more convenient way to create PVs dynamically. With a StorageClass there is no need to pre-create fixed-size PVs and wait for a consumer's PVC; creating the PVC alone allocates the storage. There is also no need to run rbd map on each node.
- Create a system-level Secret
Note: StorageClass requires the Ceph Secret's type to be kubernetes.io/rbd, so the ceph-secret created earlier must be deleted and recreated with the command below. This time the key is not base64-encoded.
[root@node1 work]# kubectl delete secret ceph-secret
secret "ceph-secret" deleted
[root@node1 work]# ceph auth get-key client.admin
AQCE48dcKDUHDRAA5FwbyC54wptpbn94Dr35Tw==
# The secret type must be kubernetes.io/rbd, otherwise the PVC cannot use it
kubectl create secret generic ceph-secret --type="kubernetes.io/rbd" --from-literal=key='AQCE48dcKDUHDRAA5FwbyC54wptpbn94Dr35Tw=='
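To double-check the type before proceeding (should print kubernetes.io/rbd):
kubectl get secret ceph-secret -o jsonpath='{.type}'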
- Create the StorageClass
cat << EOF > ceph.storageclass.yml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-storageclass
provisioner: kubernetes.io/rbd
parameters:
  monitors: 192.168.6.156:6789,192.168.6.157:6789,192.168.6.158:6789
  # Ceph client user ID (not a k8s user)
  adminId: admin
  adminSecretName: ceph-secret
  pool: rbd
  userId: admin
  userSecretName: ceph-secret
EOF
[root@node1 work]# kubectl create -f ceph.storageclass.yml
storageclass "ceph-storageclass" created
[root@node1 work]# kubectl get storageclass
NAME PROVISIONER
ceph-storageclass kubernetes.io/rbd
- Create the PVC
[root@node1 work]# vim nginx-pvc2.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nginx-pvc2
  annotations:
    volume.beta.kubernetes.io/storage-class: ceph-storageclass
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
[root@node1 work]# kubectl create -f nginx-pvc2.yaml
persistentvolumeclaim "nginx-pvc2" created
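With dynamic provisioning, the PV and the backing RBD image are created automatically; both can be verified (output omitted; the provisioner generates the image name, typically kubernetes-dynamic-pvc-<uuid>):
kubectl get pv
rbd list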
- Create the Deployment
[root@node1 work]# vim nginx-deploy2.yml
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: demo2
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: demo2
    spec:
      containers:
      - name: nginx-demo2
        image: mritd/demo
        ports:
        - containerPort: 80
        volumeMounts:
        - mountPath: "/data"
          name: data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: nginx-pvc2
[root@node1 work]# kubectl create -f nginx-deploy2.yml
deployment "demo2" created
[root@node1 work]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
ceph-mysql-pod 1/1 Running 0 8d 172.30.128.6 node3
demo2-66fd75bb8d-nwxmr 1/1 Running 0 15s 172.30.128.3 node3
demo2-66fd75bb8d-wmt7g 0/1 ContainerCreating 0 15s <none> node1
demo2-66fd75bb8d-xh47j 0/1 ContainerCreating 0 15s <none> node2
As expected, only one pod was created successfully: a single node holds the read-write mount and the pods on the other nodes fail to start.
https://github.com/kubernetes/kubernetes/issues/67474
Changing the access mode to ReadOnlyMany still does not allow multiple nodes to mount the volume; this may also be a bug.
Common Ceph commands
- List locks on an RBD image
rbd lock list nginx-image
- Show mapped RBD devices
rbd showmapped
Common k8s commands
- Get pod IPs
[root@node1 work]# kubectl get pods -l app=demo -o yaml|grep podIP
podIP: 172.30.128.3
podIP: 172.30.96.3
podIP: 172.30.184.2
- Dump the Service manifest as YAML
kubectl get svc -l app=demo -o yaml
- Describe the Service
kubectl describe svc demo
Notes on k8s ports and IPs
Ports
targetPort: the port the container receives traffic on. port: the abstract Service port, the one any other Pod uses to reach the Service.
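A minimal Service sketch tying the two together (names are hypothetical; 8080 is chosen only to make the mapping visible):
apiVersion: v1
kind: Service
metadata:
  name: demo
spec:
  selector:
    app: demo
  ports:
  - port: 80         # other Pods reach the Service on this port
    targetPort: 8080 # traffic is forwarded to this container port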
Problems
- rbd: failed to lock image nginx-image (maybe locked by other nodes), error exit status 1
Error syncing pod
May 7 10:50:19 node2 kubelet: E0507 10:50:19.891141 27177 kubelet.go:1633] Unable to mount volumes for pod "demo-67bf76f84c-z8kmx_default(3eaf6dab-7072-11e9-b0eb-000c29bda28d)": timeout expired waiting for volumes to attach/mount for pod "default"/"demo-67bf76f84c-z8kmx". list of unattached/unmounted volumes=[data]; skipping pod
May 7 10:50:19 node2 kubelet: E0507 10:50:19.891219 27177 pod_workers.go:182] Error syncing pod 3eaf6dab-7072-11e9-b0eb-000c29bda28d ("demo-67bf76f84c-z8kmx_default(3eaf6dab-7072-11e9-b0eb-000c29bda28d)"), skipping: timeout expired waiting for volumes to attach/mount for pod "default"/"demo-67bf76f84c-z8kmx". list of unattached/unmounted volumes=[data]
Solution
Run on every node: rbd map nginx-image
[root@node3 ~]# rbd showmapped
id pool image       snap device
0  rbd  db-image    -    /dev/rbd0
1  rbd  nginx-image -    /dev/rbd1
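As in the test section earlier, the mapping can be pushed to every node at once with ansible:
ansible k8s -a 'rbd map nginx-image'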
PV access modes
- Access modes:
- ReadWriteOnce — the volume can be mounted read-write by a single node
- ReadOnlyMany — the volume can be mounted read-only by many nodes
- ReadWriteMany — the volume can be mounted read-write by many nodes
- Status:
- Available: a free resource, not yet bound to a PVC
- Bound: bound to a PVC
- Released: the PVC has been deleted, but the PV has not yet been reclaimed by the cluster
- Failed: automatic reclamation of the PV failed
- Current reclaim policies:
- Retain: manual reclamation
- Recycle: the volume must be scrubbed before it can be reused
- Delete: the associated storage asset (an AWS EBS, GCE PD, Azure Disk, or OpenStack Cinder volume) is deleted
Currently only NFS and HostPath support Recycle; AWS EBS, GCE PD, Azure Disk, and OpenStack Cinder volumes support Delete.
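For example, an existing PV's reclaim policy can be switched to Retain with a patch (a sketch against the nginx-pv created above):
kubectl patch pv nginx-pv -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'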
Test results
With k8s, RBD supports only single-node read-write or multi-node read-only, never multi-node read-write. Testing also suggests multi-node read-only on Ceph RBD may be buggy, so for production use it is best to have some in-house development capability.