1. Rook Introduction
Rook is a self-managing distributed storage orchestration system that provides a convenient storage solution for Kubernetes. Rook does not provide storage itself; it sits as an adaptation layer between Kubernetes and the storage system, simplifying deployment and maintenance of that system. Rook currently supports the following storage backends: Ceph, CockroachDB, Cassandra, EdgeFS, Minio, and NFS. Of these, Ceph is Stable and the rest are Alpha. This article covers only Ceph.
Rook consists of two parts, the Operator and the Cluster:
- Operator: made up of a set of CRDs and an all-in-one image, containing everything needed to start and monitor the storage system.
- Cluster: responsible for creating the CRD objects and specifying their parameters, including the Ceph image, where metadata is persisted, disk locations, the dashboard, and so on.
The diagram below shows Rook's architecture. After the Operator starts, it first creates the Agent and Discover containers, which watch and manage the storage resources on each node. It then creates the Cluster, the CRD defined when the Operator was set up. The Operator launches the relevant Ceph containers according to the Cluster's configuration. Once the storage cluster is up, Kubernetes primitives are used to create PVCs for application containers.
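As a quick sanity check after the Operator is installed, you can list the CRDs it registers. This is only an illustrative kubectl snippet, not a step from the original guide:

kubectl get crd | grep ceph.rook.io    # cephclusters, cephblockpools, cephfilesystems, ...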
1.1 System Requirements
Installation environment used in this article:
- kubernetes 1.16
- centos7.7
- kernel 5.4.65-200.el7.x86_64
- flannel v0.12.0
The lvm2 package must be installed:
yum install -y lvm2
1.2 Kernel Requirements
RBD
Most distribution kernels are built with the RBD module, but it is best to confirm:
foxchan@~$ lsmod | grep rbd
rbd                   114688  0
libceph               368640  1 rbd
You can make it load at boot with the following command:
cat > /etc/sysconfig/modules/rbd.modules << EOF
modprobe rbd
EOF
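A hedged note: on CentOS 7, scripts under /etc/sysconfig/modules/ are conventionally made executable so they run at boot; the exact boot mechanism may vary by distribution, so verify on your own systems:

chmod 755 /etc/sysconfig/modules/rbd.modules
bash /etc/sysconfig/modules/rbd.modules   # load the module now without rebooting
lsmod | grep rbd                          # confirm rbd and libceph are loaded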
CephFS
If you want to use CephFS, the minimum required kernel version is 4.17.
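A quick way to confirm the running kernel version (the hosts in this guide run 5.4.65, which satisfies the requirement):

uname -r    # should print 4.17 or newer if you plan to use CephFS quotas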
2. Rook Deployment
2.1 Environment
[root@k8s-master ceph]# kubectl get node
NAME         STATUS     ROLES    AGE   VERSION
k8s-master   NotReady   master   92m   v1.16.2
k8s-node1    Ready      <none>   92m   v1.16.2
k8s-node2    Ready      <none>   90m   v1.16.2

[root@k8s-node1 ~]# lsblk
NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda               8:0    0   20G  0 disk
├─sda1            8:1    0  200M  0 part /boot
└─sda2            8:2    0 19.8G  0 part
  ├─centos-root 253:0    0 15.8G  0 lvm  /
  └─centos-swap 253:1    0    4G  0 lvm
sdb               8:16   0   20G  0 disk
sr0              11:0    1 10.3G  0 rom
On every node, sdb is used as the Ceph data disk.
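Rook will only consume raw devices that carry no filesystem or partitions, so it is worth checking the data disk beforehand; a minimal check might look like this:

lsblk -f /dev/sdb    # the FSTYPE column should be empty for a device Rook can use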
2.2 Clone the Project
git clone --single-branch --branch v1.5.1 https://github.com/rook/rook.git
2.3 Deploy the Rook Operator
Pull the images
Because the upstream registries may be unreachable from within mainland China, it is recommended to pull the following images in advance:
docker pull ceph/ceph:v15.2.5
docker pull rook/ceph:v1.5.1
docker pull registry.aliyuncs.com/it00021hot/cephcsi:v3.1.2
docker pull registry.aliyuncs.com/it00021hot/csi-node-driver-registrar:v2.0.1
docker pull registry.aliyuncs.com/it00021hot/csi-attacher:v3.0.0
docker pull registry.aliyuncs.com/it00021hot/csi-provisioner:v2.0.0
docker pull registry.aliyuncs.com/it00021hot/csi-snapshotter:v3.0.0
docker pull registry.aliyuncs.com/it00021hot/csi-resizer:v1.0.0

docker tag registry.aliyuncs.com/it00021hot/csi-snapshotter:v3.0.0 k8s.gcr.io/sig-storage/csi-snapshotter:v3.0.0
docker tag registry.aliyuncs.com/it00021hot/csi-resizer:v1.0.0 k8s.gcr.io/sig-storage/csi-resizer:v1.0.0
docker tag registry.aliyuncs.com/it00021hot/cephcsi:v3.1.2 quay.io/cephcsi/cephcsi:v3.1.2
docker tag registry.aliyuncs.com/it00021hot/csi-node-driver-registrar:v2.0.1 k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.0.1
docker tag registry.aliyuncs.com/it00021hot/csi-attacher:v3.0.0 k8s.gcr.io/sig-storage/csi-attacher:v3.0.0
docker tag registry.aliyuncs.com/it00021hot/csi-provisioner:v2.0.0 k8s.gcr.io/sig-storage/csi-provisioner:v2.0.0

#### You can retag these images and push them to a private registry, or tag them with the names used in the YAML files; a local private registry is recommended.
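A sketch of retagging and pushing the images into the private registry used later in this guide (10.2.55.8:5000/kubernetes is just this author's registry; substitute your own):

docker tag ceph/ceph:v15.2.5 10.2.55.8:5000/kubernetes/ceph:v15.2.5
docker push 10.2.55.8:5000/kubernetes/ceph:v15.2.5
for img in cephcsi:v3.1.2 csi-node-driver-registrar:v2.0.1 csi-attacher:v3.0.0 \
           csi-provisioner:v2.0.0 csi-snapshotter:v3.0.0 csi-resizer:v1.0.0; do
  docker tag registry.aliyuncs.com/it00021hot/$img 10.2.55.8:5000/kubernetes/$img
  docker push 10.2.55.8:5000/kubernetes/$img
done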
Change the image names in operator.yaml to point at the private registry:
ROOK_CSI_CEPH_IMAGE: "10.2.55.8:5000/kubernetes/cephcsi:v3.1.2"
ROOK_CSI_REGISTRAR_IMAGE: "10.2.55.8:5000/kubernetes/csi-node-driver-registrar:v2.0.1"
ROOK_CSI_RESIZER_IMAGE: "10.2.55.8:5000/kubernetes/csi-resizer:v1.0.0"
ROOK_CSI_PROVISIONER_IMAGE: "10.2.55.8:5000/kubernetes/csi-provisioner:v2.0.0"
ROOK_CSI_SNAPSHOTTER_IMAGE: "10.2.55.8:5000/kubernetes/csi-snapshotter:v3.0.0"
ROOK_CSI_ATTACHER_IMAGE: "10.2.55.8:5000/kubernetes/csi-attacher:v3.0.0"
ROOK_CSI_KUBELET_DIR_PATH: "/data/k8s/kubelet"   ### change this if you have customized the kubelet data directory
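If you are not sure whether the kubelet data directory was changed, one way to check (assuming the directory is set via the --root-dir flag rather than a kubelet config file) is:

ps -ef | grep kubelet | grep -o -- '--root-dir=[^ ]*'   # no output usually means the default /var/lib/kubelet is in use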
Apply operator.yaml:
cd rook/cluster/examples/kubernetes/ceph
kubectl create -f crds.yaml -f common.yaml
kubectl create -f operator.yaml
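Before moving on, it is worth confirming that the operator and discover pods are running; a typical check is:

kubectl -n rook-ceph get pod -l app=rook-ceph-operator
kubectl -n rook-ceph get pod | grep rook-discover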
2.4 Configure the Cluster
The contents of cluster.yaml must be modified to match your own hardware. Read the comments in the configuration file carefully so you can avoid the pitfalls I ran into.
Apart from adding or removing OSD devices, any change to this file only takes effect after reinstalling the Ceph cluster, so plan the cluster in advance. If you modify the file and apply it without uninstalling Ceph first, it will trigger a reinstallation of the Ceph cluster and the cluster will break.
Edit the file as follows (refer to the official documentation for more configuration options):

vi cluster.yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  # Name of the cluster; only one cluster is supported per namespace
  name: rook-ceph
  namespace: rook-ceph
spec:
  # Ceph version notes
  # v13 is mimic, v14 is nautilus, and v15 is octopus.
  cephVersion:
    # Use a local Ceph image to speed up deployment
    image: 10.2.55.8:5000/kubernetes/ceph:v15.2.5
    # Whether to allow unsupported Ceph versions
    allowUnsupported: false
  # Host path where Rook persists its data on each node
  dataDirHostPath: /data/k8s/rook
  # Whether to continue an upgrade if the pre-upgrade checks fail
  skipUpgradeChecks: false
  # Since Rook 1.5 the number of mons must be odd
  mon:
    count: 3
    # Whether multiple mon pods may run on a single node
    allowMultiplePerNode: false
  mgr:
    modules:
      - name: pg_autoscaler
        enabled: true
  # Enable the dashboard with SSL disabled on port 7000. You can keep the default HTTPS
  # configuration; plain HTTP is used here only to simplify the ingress setup.
  dashboard:
    enabled: true
    port: 7000
    ssl: false
  # Enable PrometheusRule
  monitoring:
    enabled: true
    # Namespace in which the PrometheusRule is deployed; defaults to the namespace of this CR
    rulesNamespace: rook-ceph
  # Use host networking to work around the bug that prevents CephFS PVCs from being used
  network:
    provider: host
  # Enable the crash collector; a crash collector pod is created on every node running a Ceph daemon
  crashCollector:
    disable: false
  placement:
    osd:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: ceph-osd
                  operator: In
                  values:
                    - enabled
  # Storage settings. The defaults are all true, which would wipe and initialize every device on every node.
  storage: # cluster level storage configuration and selection
    useAllNodes: false    # do not use every node
    useAllDevices: false  # do not use every device
    nodes:
      - name: "k8s-node1" # storage node hostname
        devices:
          - name: "sdb"   # use /dev/sdb
      - name: "k8s-node2"
        devices:
          - name: "sdb"
For more CephCluster CRD configuration options, refer to the official Rook documentation.
Add a label to the OSD nodes:
[root@k8s-master ceph]# kubectl label nodes k8s-node1 ceph-osd=enabled
node/k8s-node1 labeled
[root@k8s-master ceph]# kubectl label nodes k8s-node2 ceph-osd=enabled
node/k8s-node2 labeled
Run the installation:
kubectl apply -f cluster.yaml
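Once cluster.yaml is applied, the mon, mgr, and OSD pods come up one after another; watching the namespace is a simple way to follow progress:

kubectl -n rook-ceph get pod -w   # wait for rook-ceph-mon-*, rook-ceph-mgr-*, and rook-ceph-osd-* to reach Running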
2.5 Adding and Removing OSDs
2.5.1 Add the label
kubectl label nodes k8s-master ceph-osd=enabled
2.5.2 Modify cluster.yaml
storage: # cluster level storage configuration and selection
  useAllNodes: false    # do not use every node
  useAllDevices: false  # do not use every device
  nodes:
    - name: "k8s-node1" # storage node hostname
      devices:
        - name: "sdb"   # use /dev/sdb
    - name: "k8s-node2"
      devices:
        - name: "sdb"
    - name: "k8s-master"
      devices:
        - name: "sdb"
2.5.3 apply cluster.yaml
kubectl apply -f cluster.yaml
Deleting and reinstalling rook-ceph may produce an error; you can run the following command to create the missing secret.
kubectl -n rook-ceph create secret generic rook-ceph-crash-collector-keyring
2.6 Install the Toolbox
The Rook toolbox is a container that includes common tools for debugging and testing Rook. The toolbox is based on CentOS, so you can easily install any additional tools you need with yum.
kubectl apply -f toolbox.yaml
Test Rook
Once the toolbox pod is running, we can enter it with the following command:
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash
For example, check the cluster status from inside the toolbox with ceph status. The cluster is considered healthy when the following conditions are met:

- All mons should be in quorum
- The mgr should be active
- At least one OSD should be active
- If the status is not HEALTH_OK, the warnings or errors should be investigated
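The toolbox also ships the usual Ceph CLI; a few other commonly used status commands (standard Ceph commands, not specific to this guide) are:

ceph osd status   # per-OSD usage and state
ceph df           # pool and cluster capacity
rados df          # object counts per pool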
2.7 Access the Dashboard
[root@k8s-master ceph]# kubectl get svc -n rook-ceph
NAME                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
csi-cephfsplugin-metrics   ClusterIP   10.103.103.152   <none>        8080/TCP,8081/TCP   9m39s
csi-rbdplugin-metrics      ClusterIP   10.109.21.95     <none>        8080/TCP,8081/TCP   9m41s
rook-ceph-mgr              ClusterIP   10.103.36.44     <none>        9283/TCP            8m50s
rook-ceph-mgr-dashboard    NodePort    10.104.55.171    <none>        7000:30112/TCP      8m50s
rook-ceph-mon-a            ClusterIP   10.103.40.41     <none>        6789/TCP,3300/TCP   9m36s
rook-ceph-mon-b            ClusterIP   10.96.138.43     <none>        6789/TCP,3300/TCP   9m14s
rook-ceph-mon-c            ClusterIP   10.108.169.68    <none>        6789/TCP,3300/TCP   9m3s
Get the dashboard login password (the user is admin). The password is obtained as follows:
[root@k8s-master ceph]# kubectl get secrets -n rook-ceph rook-ceph-dashboard-password -o jsonpath='{.data.password}' | base64 -d
bagfSJpb/3Nj0DN5I#7Z
Log in with the admin user and this password to reach the Ceph dashboard.
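Since the dashboard above was configured for plain HTTP on port 7000 to simplify ingress, a minimal Ingress sketch might look like the following (the hostname is hypothetical, an ingress controller is assumed to be installed, and the NodePort shown by kubectl get svc also works on its own):

cat <<EOF | kubectl apply -n rook-ceph -f -
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: rook-ceph-mgr-dashboard
spec:
  rules:
    - host: ceph-dashboard.example.com      # hypothetical hostname
      http:
        paths:
          - path: /
            backend:
              serviceName: rook-ceph-mgr-dashboard
              servicePort: 7000
EOF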
3. Deploy Block Storage
3.1 Create a Pool and StorageClass
ceph-storageclass.yaml
kubectl apply -f csi/rbd/storageclass.yaml
# Define a block storage pool
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  # Each data replica must be spread across a different failure domain; "host" guarantees
  # that each replica lands on a different machine
  failureDomain: host
  # Number of replicas
  replicated:
    size: 3
    # Disallow setting pool with replica 1, this could lead to data loss without recovery.
    # Make sure you're *ABSOLUTELY CERTAIN* that is what you want
    requireSafeReplicaSize: true
    # gives a hint (%) to Ceph in terms of expected consumption of the total cluster capacity of a given pool
    # for more info: https://docs.ceph.com/docs/master/rados/operations/placement-groups/#specifying-expected-pool-size
    #targetSizeRatio: .5
---
# Define a StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
# Provisioner identifier for this StorageClass; the rook-ceph prefix is the operator's namespace
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  # clusterID is the namespace in which the cluster runs
  # If you change this namespace, also change the namespace below where the secret namespaces are defined
  clusterID: rook-ceph
  # If you want to use erasure coded pool with RBD, you need to create
  # two pools. one erasure coded and one replicated.
  # You need to specify the replicated pool here in the `pool` parameter, it is
  # used for the metadata of the images.
  # The erasure coded pool must be set as the `dataPool` parameter below.
  #dataPool: ec-data-pool
  # Pool in which RBD images are created
  pool: replicapool
  # RBD image format. Defaults to "2".
  imageFormat: "2"
  # RBD image features; CSI RBD currently only supports layering
  imageFeatures: layering
  # Ceph admin credentials, generated automatically by the operator
  # in the same namespace as the cluster.
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  # Filesystem type of the volume; defaults to ext4. xfs is not recommended because of a potential
  # deadlock when, in hyperconverged setups, a volume is mounted on the same node as an OSD.
  csi.storage.k8s.io/fstype: ext4
  # uncomment the following to use rbd-nbd as mounter on supported nodes
  # **IMPORTANT**: If you are using rbd-nbd as the mounter, during upgrade you will be hit a ceph-csi
  # issue that causes the mount to be disconnected. You will need to follow special upgrade steps
  # to restart your application pods. Therefore, this option is not recommended.
  #mounter: rbd-nbd
allowVolumeExpansion: true
reclaimPolicy: Delete
3.2 Demo
It is recommended to put the PVC and the application in the same YAML file.
vim ceph-demo.yaml
# Create the PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-demo-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-ceph-block
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: csirbd-demo-pod
  labels:
    test-cephrbd: "true"
spec:
  replicas: 1
  selector:
    matchLabels:
      test-cephrbd: "true"
  template:
    metadata:
      labels:
        test-cephrbd: "true"
    spec:
      containers:
        - name: web-server-rbd
          image: 10.2.55.8:5000/library/nginx:1.18.0
          volumeMounts:
            - name: mypvc
              mountPath: /usr/share/nginx/html
      volumes:
        - name: mypvc
          persistentVolumeClaim:
            claimName: rbd-demo-pvc
            readOnly: false
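After applying the manifest, confirm that the PVC is bound and the pod is running; for example:

kubectl apply -f ceph-demo.yaml
kubectl get pvc rbd-demo-pvc              # STATUS should become Bound
kubectl get pod -l test-cephrbd=true      # the nginx pod should reach Running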
4. Deploy a Filesystem
4.1 Create CephFS
The CephFS CSI driver uses quotas to enforce the size declared on a PVC, and only kernels 4.17 and newer support CephFS quotas.
If your kernel does not support them and you need quota management, set the Operator environment variable CSI_FORCE_CEPHFS_KERNEL_CLIENT: false to enable the FUSE client.
Note that with the FUSE client, application pods lose their mounts when the Ceph cluster is upgraded and must be restarted before they can use their PVs again.
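A hedged sketch of where that setting lives, assuming the Rook v1.5 layout where the operator settings are kept in the rook-ceph-operator-config ConfigMap defined in operator.yaml:

kubectl -n rook-ceph edit configmap rook-ceph-operator-config
# then set, under data:
#   CSI_FORCE_CEPHFS_KERNEL_CLIENT: "false"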
cd rook/cluster/examples/kubernetes/ceph
kubectl apply -f filesystem.yaml
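filesystem.yaml only creates the CephFilesystem itself; to consume it from pods you typically also create the CephFS StorageClass shipped with the examples and wait for the MDS pods. This is a hedged follow-up based on the example layout of the cloned repo:

kubectl apply -f csi/cephfs/storageclass.yaml       # StorageClass backed by the new filesystem
kubectl -n rook-ceph get pod -l app=rook-ceph-mds   # MDS pods should reach Running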
5. Deleting the Ceph Cluster
Before deleting the Ceph cluster, first clean up the pods that use its storage.
Delete the block storage and file storage:
kubectl delete -n rook-ceph cephblockpool replicapool
kubectl delete storageclass rook-ceph-block
kubectl delete -f filesystem.yaml
kubectl delete storageclass csi-cephfs
kubectl -n rook-ceph delete cephcluster rook-ceph
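Deleting the CephCluster CR can take a little while; a simple way to confirm it is gone before removing the operator:

kubectl -n rook-ceph get cephcluster   # should eventually report no resources
kubectl -n rook-ceph get pod           # the Ceph daemon pods should terminate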
Delete the operator and the related CRDs:
kubectl delete -f cluster.yaml
kubectl delete -f operator.yaml
kubectl delete -f common.yaml
kubectl delete -f crds.yaml
Clean up the data on the hosts
After the Ceph cluster is deleted, its configuration data remains in the /data/k8s/rook/ directory on the nodes where the Ceph components were deployed.
rm -rf /data/k8s/rook/*
If you later deploy a new Ceph cluster, delete this leftover data from the previous cluster first, otherwise the new monitors will fail to start.
# cat clean-rook-dir.sh
hosts=(
  192.168.130.130
  192.168.130.131
  192.168.130.132
)
for host in ${hosts[@]} ; do
  ssh $host "rm -rf /data/k8s/rook/*"
done
Wipe the devices
yum install gdisk -y
export DISK="/dev/sdb"
sgdisk --zap-all $DISK
dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync
blkdiscard $DISK
ls /dev/mapper/ceph-* | xargs -I% -- dmsetup remove %
rm -rf /dev/ceph-*
If deleting the Ceph cluster gets stuck for some reason, run the following command first to clear the finalizers; the deletion will then complete without hanging.
kubectl -n rook-ceph patch cephclusters.ceph.rook.io rook-ceph -p '{"metadata":{"finalizers": []}}' --type=merge
Reposted from: https://my.oschina.net/u/4346166/blog/4752651
https://blog.csdn.net/qq_40592377/article/details/110292089