Ceph official site: https://ceph.com/en/
Rook official site: https://rook.io/
What is Rook?
- Rook itself is not a distributed storage system. Instead it leverages the Kubernetes platform, providing a Kubernetes Operator for each supported storage provider. It is a storage "orchestrator" that can drive different backends (e.g. Ceph, EdgeFS) to do the heavy lifting of storage administration, abstracting away much of the complexity.
- Rook turns a distributed storage system into a self-managing, self-scaling, self-healing storage service. It automates the storage administrator's tasks: deployment, bootstrapping, configuration, provisioning, scaling, upgrading, migration, disaster recovery, monitoring, and resource management.
- Rook orchestrates multiple storage solutions, each with a dedicated Kubernetes Operator for automated management. It currently supports Ceph, Cassandra, and NFS.
- The most widely used backend is Ceph. Ceph provides more than just block storage: it also offers S3/Swift-compatible object storage and a distributed file system. Ceph can spread one volume's data across multiple disks, so a volume can actually use more space than a single disk provides, which is convenient. When more disks are added to the cluster, it automatically rebalances/redistributes data across them.
How Rook/Ceph integrates with Kubernetes
- Rook is an open-source cloud-native storage orchestrator: it provides a platform, a framework, and support for various storage solutions so they can integrate natively with cloud-native environments.
- Rook turns storage software into self-managing, self-scaling, and self-healing storage services. It does this by automating deployment, bootstrapping, configuration, provisioning, scaling, upgrading, migration, disaster recovery, monitoring, and resource management.
- Rook uses the facilities of the underlying cloud-native container management, scheduling, and orchestration platform to do its job.
- Rook currently supports Ceph, NFS, the Minio Object Store, and CockroachDB.
- Rook uses Kubernetes primitives to run the Ceph storage system on Kubernetes.
What Rook + Ceph provide
- Rook creates the StorageClass for us.
- A PVC only needs to name that StorageClass; the Rook provisioner recorded in the StorageClass is invoked automatically and carries out the corresponding operations on the Ceph cluster (see the PVC sketch after this list).
- Ceph
  - Block: block storage. RWO (ReadWriteOnce), single-node read-write [each Pod works on its own private read-write area]; suited to StatefulSets.
  - Shared FS (Ceph File System): shared storage. RWX (ReadWriteMany), multi-node read-write [multiple Pods read and write the same storage area]; suited to stateless applications. (file system)
- Summary: a stateless application can be replicated any number of times and always needs the RWX capability. A stateful application can also be replicated any number of times, but each replica reads and writes only its own storage, so it uses RWO (preferred) or RWX.
- Through Rook you can consume storage with any of these capabilities directly.
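As a quick preview of what this looks like in practice, here is a minimal PVC sketch; the rook-ceph-block StorageClass it names is created later in this guide (section V):

```yaml
# Minimal sketch: request a 1Gi RWO block volume from Rook's provisioner.
# Assumes the rook-ceph-block StorageClass created in section V below.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-pvc
spec:
  accessModes:
    - ReadWriteOnce      # block storage: single-node read-write
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-ceph-block
```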
Prerequisites:
Lab environment:
- 1. Two CentOS 7 VMs in VMware: one master node and one worker node
- 2. Kubernetes version: 1.17.4
- 3. CNI plugin: flannel
- 4. Remove the NoSchedule taint from the master node (optional; my machine is low-spec, so the master doubles as a worker)
- kubectl taint nodes k8s-master node-role.kubernetes.io/master:NoSchedule-
Install lvm2 on every node (required by Ceph's ceph-volume tooling):
sudo yum install -y lvm2
I. You need a disk with no partitions and no filesystem
- Raw devices (no partitions or formatted filesystems): a raw disk, with no partitions and no formatting
- Raw partitions (no formatted filesystem): a raw partition, with no filesystem on it
How to add an unpartitioned, unformatted disk to a VMware VM:
- In the VM settings, click Add
- Select Hard Disk, then Next
- SCSI, then Next
- Create a new virtual disk, then Next
- Choose the capacity, then Next
- Finish, then start the VM
```bash
lsblk -f

# Zero the disk (if the disk was attached by a cloud provider, it must be
# low-level formatted first, otherwise you will run into problems)
dd if=/dev/zero of=/dev/sdb bs=1M status=progress
```
II. Deploy the Rook Operator
Download the source and switch to the Ceph examples directory:

```bash
git clone --single-branch --branch v1.7.11 https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph
```

1. Edit the CSI images in operator.yaml. The defaults are hosted on k8s.gcr.io (Google's registry), which is unreachable from mainland China, so swap them for the Aliyun mirror:

```yaml
# ROOK_CSI_CEPH_IMAGE: "quay.io/cephcsi/cephcsi:v3.4.0"
# ROOK_CSI_REGISTRAR_IMAGE: "k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.3.0"
# ROOK_CSI_RESIZER_IMAGE: "k8s.gcr.io/sig-storage/csi-resizer:v1.3.0"
# ROOK_CSI_PROVISIONER_IMAGE: "k8s.gcr.io/sig-storage/csi-provisioner:v3.0.0"
# ROOK_CSI_SNAPSHOTTER_IMAGE: "k8s.gcr.io/sig-storage/csi-snapshotter:v4.2.0"
# ROOK_CSI_ATTACHER_IMAGE: "k8s.gcr.io/sig-storage/csi-attacher:v3.3.0"
ROOK_CSI_CEPH_IMAGE: "quay.io/cephcsi/cephcsi:v3.4.0"
ROOK_CSI_REGISTRAR_IMAGE: "registry.aliyuncs.com/google_containers/csi-node-driver-registrar:v2.3.0"
ROOK_CSI_RESIZER_IMAGE: "registry.aliyuncs.com/google_containers/csi-resizer:v1.3.0"
ROOK_CSI_PROVISIONER_IMAGE: "registry.aliyuncs.com/google_containers/csi-provisioner:v3.0.0"
ROOK_CSI_SNAPSHOTTER_IMAGE: "registry.aliyuncs.com/google_containers/csi-snapshotter:v4.2.0"
ROOK_CSI_ATTACHER_IMAGE: "registry.aliyuncs.com/google_containers/csi-attacher:v3.3.0"
```

2. Deploy the operator:

```bash
kubectl create -f crds.yaml -f common.yaml -f operator.yaml

# Verify the rook-ceph-operator is in the `Running` state before proceeding
kubectl -n rook-ceph get pod
```
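If the operator pod does not reach Running (ImagePullBackOff is common when the image substitution above was missed), check its logs and events first. A minimal sketch, assuming the default app=rook-ceph-operator label from operator.yaml:

```bash
# Tail the operator logs
kubectl -n rook-ceph logs deploy/rook-ceph-operator -f

# Show pod events such as image-pull failures
kubectl -n rook-ceph describe pod -l app=rook-ceph-operator
```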
III. Create a Ceph Cluster
Edit cluster.yaml before deploying. Note that mon.count must be an odd number and must not exceed the number of Kubernetes worker nodes:

```yaml
  mon:
    # Set the number of mons to be started. Generally recommended to be 3.
    # For highest availability, an odd number of mons should be specified.
    count: 1
    # The mons should be on unique nodes. For production, at least 3 nodes are recommended for this reason.
    # Mons should only be allowed on the same node for test environments where data loss is acceptable.
    allowMultiplePerNode: false
  mgr:
    # When higher availability of the mgr is needed, increase the count to 2.
    # In that case, one mgr will be active and one in standby. When Ceph updates which
    # mgr is active, Rook will update the mgr services to match the active mgr.
    count: 1
    modules:
      # Several modules should not need to be included in this list. The "dashboard" and "monitoring" modules
      # are already enabled by other settings in the cluster CR.
      - name: pg_autoscaler
        enabled: true
  storage: # cluster level storage configuration and selection
    useAllNodes: false
    useAllDevices: false
    #deviceFilter:
    config:
      # crushRoot: "custom-root" # specify a non-default root label for the CRUSH map
      # metadataDevice: "md0" # specify a non-rotational storage so ceph-volume will use it as block db device of bluestore.
      # databaseSizeMB: "1024" # uncomment if the disks are smaller than 100 GB
      # journalSizeMB: "1024" # uncomment if the disks are 20 GB or smaller
      osdsPerDevice: "2" # this value can be overridden at the node or device level
      # encryptedDevice: "true" # the default value for this option is "false"
    # Individual nodes and their config can be specified as well, but 'useAllNodes' above must be set to false. Then, only the named
    # nodes below will be used as storage resources. Each node's 'name' field should match their 'kubernetes.io/hostname' label.
    nodes:
      - name: "k8s-master"
        devices:
          - name: "sdb"
      - name: "k8s-node1"
        devices:
          - name: "sdb"
      - name: "k8s-node2"
        devices:
          - name: "sdb"
    # nodes:
    #   - name: "172.17.4.201"
    #     devices: # specific devices to use for storage can be specified for each node
    #       - name: "sdb"
    #       - name: "nvme01" # multiple osds can be created on high performance devices
    #         config:
    #           osdsPerDevice: "5"
    #       - name: "/dev/disk/by-id/ata-ST4000DM004-XXXX" # devices can be specified using full udev paths
    #     config: # configuration can be specified at the node level which overrides the cluster level config
    #   - name: "172.17.4.301"
    #     deviceFilter: "^sd."
    # when onlyApplyOSDPlacement is false, will merge both placement.All() and placement.osd
```
Deploy the cluster and check that all components come up:

```bash
kubectl create -f cluster.yaml

kubectl -n rook-ceph get pod
NAME                                                 READY   STATUS      RESTARTS   AGE
csi-cephfsplugin-provisioner-d77bb49c6-n5tgs         5/5     Running     0          140s
csi-cephfsplugin-provisioner-d77bb49c6-v9rvn         5/5     Running     0          140s
csi-cephfsplugin-rthrp                               3/3     Running     0          140s
csi-rbdplugin-hbsm7                                  3/3     Running     0          140s
csi-rbdplugin-provisioner-5b5cd64fd-nvk6c            6/6     Running     0          140s
csi-rbdplugin-provisioner-5b5cd64fd-q7bxl            6/6     Running     0          140s
rook-ceph-crashcollector-minikube-5b57b7c5d4-hfldl   1/1     Running     0          105s
rook-ceph-mgr-a-64cd7cdf54-j8b5p                     1/1     Running     0          77s
rook-ceph-mon-a-694bb7987d-fp9w7                     1/1     Running     0          105s
rook-ceph-mon-b-856fdd5cb9-5h2qk                     1/1     Running     0          94s
rook-ceph-mon-c-57545897fc-j576h                     1/1     Running     0          85s
rook-ceph-operator-85f5b946bd-s8grz                  1/1     Running     0          92m
rook-ceph-osd-0-6bb747b6c5-lnvb6                     1/1     Running     0          23s
rook-ceph-osd-1-7f67f9646d-44p7v                     1/1     Running     0          24s
rook-ceph-osd-2-6cd4b776ff-v4d68                     1/1     Running     0          25s
rook-ceph-osd-prepare-node1-vx2rz                    0/2     Completed   0          60s
rook-ceph-osd-prepare-node2-ab3fd                    0/2     Completed   0          60s
rook-ceph-osd-prepare-node3-w4xyz                    0/2     Completed   0          60s
```
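Besides the pod list, overall health is reported on the CephCluster custom resource itself. A sketch of what to expect; the exact columns and MESSAGE text are illustrative:

```bash
kubectl -n rook-ceph get cephcluster
# NAME        DATADIRHOSTPATH   MONCOUNT   AGE   PHASE   MESSAGE                        HEALTH
# rook-ceph   /var/lib/rook     1          10m   Ready   Cluster created successfully   HEALTH_OK
```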
After the cluster has been created, look at the attached sdb disk again:
lsblk -f
You can see that the newly attached raw disk is now managed by Ceph and carries Ceph's on-disk format.
IV. Configure the Ceph dashboard
The Ceph dashboard is installed by default, but its Service is of type ClusterIP, so it cannot be reached from outside the cluster. Expose it with a NodePort:
```bash
kubectl apply -f dashboard-external-https.yaml

kubectl get svc -n rook-ceph | grep dashboard
rook-ceph-mgr-dashboard                  ClusterIP   10.107.32.153    <none>   8443/TCP         23m
rook-ceph-mgr-dashboard-external-https   NodePort    10.106.136.221   <none>   8443:31840/TCP   7m58s
```
Open https://192.168.30.130:31840/ in a browser (your node's IP plus the NodePort from above).
The default username is admin; the password can be retrieved with:
kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}"|base64 --decode && echo
The generated password is complicated; you can log in and change it afterwards, e.g. to admin/admin123.
V. Block storage (RBD)
RBD: RADOS Block Device
RADOS: Reliable, Autonomic Distributed Object Store
Block volumes cannot be used in RWX mode.
1. Deploy the CephBlockPool and StorageClass
```bash
# Create the CephBlockPool
cd rook/cluster/examples/kubernetes/ceph
kubectl apply -f pool.yaml

# Create the StorageClass (the provisioner)
cd csi/rbd
kubectl apply -f storageclass.yaml

[root@k8s-master rbd]# kubectl get sc -n rook-ceph
NAME              PROVISIONER                  RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
rook-ceph-block   rook-ceph.rbd.csi.ceph.com   Delete          Immediate           true                   12s
```
From now on, a stateful application's PVC only has to set storageClassName: rook-ceph-block; the provisioner will allocate a PV for it automatically.
2. Example
```yaml
# pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-ceph-block
```

```yaml
# pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: csirbd-demo-pod
spec:
  containers:
    - name: web-server
      image: nginx
      volumeMounts:
        - name: mypvc
          mountPath: /var/lib/www/html
  volumes:
    - name: mypvc
      persistentVolumeClaim:
        claimName: rbd-pvc
        readOnly: false
```
```bash
kubectl apply -f pvc.yaml
kubectl apply -f pod.yaml

[root@k8s-master rbd]# kubectl get pvc
NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
rbd-pvc   Bound    pvc-af052c81-9dad-4597-943f-7b62002e551c   1Gi        RWO            rook-ceph-block   32m
```
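To double-check that the RBD volume really is attached, look at the mount from inside the pod. A small sketch; the /dev/rbd0 device name and the sizes shown are illustrative:

```bash
kubectl exec -it csirbd-demo-pod -- df -h /var/lib/www/html
# Filesystem      Size  Used Avail Use% Mounted on
# /dev/rbd0       976M  2.6M  958M   1% /var/lib/www/html
```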
VI. File storage (CephFS)
1. Deploy the CephFilesystem and StorageClass
```bash
# Deploy the CephFilesystem
cd rook/cluster/examples/kubernetes/ceph
kubectl apply -f filesystem.yaml

# Create the StorageClass (the provisioner)
cd csi/cephfs/
kubectl apply -f storageclass.yaml

[root@k8s-master cephfs]# kubectl get sc
NAME              PROVISIONER                     RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
rook-ceph-block   rook-ceph.rbd.csi.ceph.com      Delete          Immediate           true                   48m
rook-cephfs       rook-ceph.cephfs.csi.ceph.com   Delete          Immediate           true                   17s

# The rook-cephfs StorageClass is the provisioner for shared-filesystem (RWX) volumes
```
A stateless application just declares storageClassName: rook-cephfs and the provisioner dynamically allocates a PV for it.
2. Example:
```yaml
# nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deploy
  namespace: default
  labels:
    app: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80
              name: nginx
          volumeMounts:
            - name: localtime
              mountPath: /etc/localtime
            - name: nginx-html
              mountPath: /usr/share/nginx/html
      volumes:
        - name: localtime
          hostPath:
            path: /usr/share/zoneinfo/Asia/Shanghai
        - name: nginx-html
          persistentVolumeClaim:
            claimName: nginx-html
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx-html
  namespace: default
spec:
  storageClassName: rook-cephfs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  labels:
    app: nginx
spec:
  type: NodePort
  selector:
    app: nginx
  ports:
    - port: 80
      targetPort: 80
      nodePort: 32500
```
```bash
kubectl apply -f nginx.yaml

[root@k8s-master ~]# kubectl get pvc
NAME         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
nginx-html   Bound    pvc-1beafc19-2687-4394-ae19-a42768f64cd5   1Gi        RWX            rook-cephfs    24h
```
Then exec into one of the containers, create an index.html containing 123456 under /usr/share/nginx/html, and the nginx home page serves it.
/usr/share/nginx/html is mounted on the Ceph cluster's file system.
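Because the PVC is RWX, both nginx replicas mount the same CephFS directory. A quick way to confirm this (the second pod name is a placeholder; take both names from `kubectl get pods`):

```bash
# Write a file through the first replica...
kubectl exec nginx-deploy-7d7c77bdcf-7bs6k -- sh -c 'echo shared > /usr/share/nginx/html/rwx-test.html'

# ...and read it back through the second replica
kubectl exec <second-nginx-pod> -- cat /usr/share/nginx/html/rwx-test.html
# shared
```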
VII. The Ceph client toolbox
The Rook toolbox runs as a deployment inside the Kubernetes cluster; you can connect to it and run arbitrary Ceph commands from there.
```bash
cd rook/cluster/examples/kubernetes/ceph

# Deploy the toolbox
kubectl create -f toolbox.yaml

# Watch the pod until it reaches the Running state
kubectl -n rook-ceph get pod -l "app=rook-ceph-tools"

# Once it is running, open a shell in the container and use the Ceph CLI
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash
```
Example:
- ceph status
- ceph osd status
- ceph df
- rados df
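You can also run a one-off command against the toolbox without keeping a shell open:

```bash
kubectl -n rook-ceph exec $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph status
```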
Instead of exec-ing into the pod every time you need the client, you can install the ceph-common client tools on any cluster node:
1. Install ceph-common
yum install ceph-common -y
2. Exec into the toolbox container and copy the contents of its ceph.conf and keyring files to the k8s master node
```bash
# Enter the container
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash

# Look at the config files
[root@rook-ceph-tools-64fc489556-xvm7s ceph]# ls /etc/ceph
ceph.conf  keyring
```
3. On the master, create two files, ceph.conf and keyring, under /etc/ceph, and paste in the contents of the corresponding files from step 2.
Verify that it works:
```bash
[root@k8s-master ~]# ceph version
ceph version 15.2.8 (bdf3eebcd22d7d0b3dd4d5501bee5bac354d5b55) octopus (stable)
```
4. List the file systems in the cluster
```bash
[root@k8s-master ~]# ceph fs ls
name: myfs, metadata pool: myfs-metadata, data pools: [myfs-data0 ]
```
VIII. Mount the Ceph cluster's file system onto a directory on the master node
1. Create a directory on the master to mount the CephFS filesystem
mkdir -p /k8s/data
2. Mount the filesystem
```bash
# Detect the mon endpoints and the user secret for the connection
mon_endpoints=$(grep mon_host /etc/ceph/ceph.conf | awk '{print $3}')
my_secret=$(grep key /etc/ceph/keyring | awk '{print $3}')

# Mount the filesystem
mount -t ceph -o mds_namespace=myfs,name=admin,secret=$my_secret $mon_endpoints:/ /k8s/data

# See your mounted filesystem
df -h
10.96.224.164:6789:/   11G   0   11G   0%   /k8s/data

# List the contents of the mount point
[root@k8s-master ~]# ls /k8s/data
volumes

# The index.html mounted in the nginx container can be found under this directory
[root@k8s-master ~]# ls /k8s/data/volumes/csi/csi-vol-3ca5cb69-9a9b-11ec-97b1-9a59f1d96a01/9d4e99e0-0a18-4c21-81e3-4244e03cf274
index.html
```
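If the mount should survive a reboot, an /etc/fstab entry can do the same job. This is only a sketch: substitute your own mon endpoint for the address below, and /etc/ceph/admin.secret is a hypothetical file containing just the key (pointing at a secretfile avoids embedding the secret in fstab itself):

```bash
# /etc/fstab — illustrative entry; replace 10.96.224.164:6789 with your $mon_endpoints
10.96.224.164:6789:/  /k8s/data  ceph  name=admin,secretfile=/etc/ceph/admin.secret,mds_namespace=myfs,noatime,_netdev  0  0
```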
Exec into the nginx container and create a login.html under /usr/share/nginx/html:
```bash
[root@k8s-master ~]# kubectl exec -it nginx-deploy-7d7c77bdcf-7bs6k -- /bin/bash
root@nginx-deploy-7d7c77bdcf-7bs6k:/# cd /usr/share/nginx/html/
root@nginx-deploy-7d7c77bdcf-7bs6k:/usr/share/nginx/html# ls
index.html
root@nginx-deploy-7d7c77bdcf-7bs6k:/usr/share/nginx/html# echo 222 >> login.html
```
Exit the container and check from the host's mounted directory that login.html is there:
```bash
[root@k8s-master ~]# ls /k8s/data/volumes/csi/csi-vol-3ca5cb69-9a9b-11ec-97b1-9a59f1d96a01/9d4e99e0-0a18-4c21-81e3-4244e03cf274
index.html  login.html
```
So the data a container has mounted can be inspected directly from the host.
3. To unmount the filesystem:
umount /k8s/data
4. How do you tell which subdirectory under /k8s/data/volumes/csi/ belongs to which pod?
```bash
[root@k8s-master ~]# ls /k8s/data/volumes/csi
csi-vol-3ca5cb69-9a9b-11ec-97b1-9a59f1d96a01
```
First, list all PVs:
```bash
[root@k8s-master ~]# kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                STORAGECLASS   REASON   AGE
pvc-1beafc19-2687-4394-ae19-a42768f64cd5   1Gi        RWX            Delete           Bound    default/nginx-html   rook-cephfs             27h
```
Then describe pvc-1beafc19-2687-4394-ae19-a42768f64cd5:
```bash
[root@k8s-master ~]# kubectl describe pv pvc-1beafc19-2687-4394-ae19-a42768f64cd5
Name:            pvc-1beafc19-2687-4394-ae19-a42768f64cd5
Labels:          <none>
Annotations:     pv.kubernetes.io/provisioned-by: rook-ceph.cephfs.csi.ceph.com
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    rook-cephfs
Status:          Bound
Claim:           default/nginx-html
Reclaim Policy:  Delete
Access Modes:    RWX
VolumeMode:      Filesystem
Capacity:        1Gi
Node Affinity:   <none>
Message:
Source:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            rook-ceph.cephfs.csi.ceph.com
    VolumeHandle:      0001-0009-rook-ceph-0000000000000001-3ca5cb69-9a9b-11ec-97b1-9a59f1d96a01
    ReadOnly:          false
    VolumeAttributes:  clusterID=rook-ceph
                       fsName=myfs
                       pool=myfs-data0
                       storage.kubernetes.io/csiProvisionerIdentity=1646222524067-8081-rook-ceph.cephfs.csi.ceph.com
                       subvolumeName=csi-vol-3ca5cb69-9a9b-11ec-97b1-9a59f1d96a01
Events:          <none>
```
As shown above, the subvolume directory name is subvolumeName=csi-vol-3ca5cb69-9a9b-11ec-97b1-9a59f1d96a01.
So the contents of /k8s/data/volumes/csi/csi-vol-3ca5cb69-9a9b-11ec-97b1-9a59f1d96a01 are exactly what the pod stores in its mounted directory.
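The same mapping can also be pulled straight from the PV object; the field path below matches the VolumeAttributes shown in the describe output above:

```bash
kubectl get pv pvc-1beafc19-2687-4394-ae19-a42768f64cd5 \
  -o jsonpath='{.spec.csi.volumeAttributes.subvolumeName}'
# csi-vol-3ca5cb69-9a9b-11ec-97b1-9a59f1d96a01
```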
Pitfalls I have hit so far:
1. On a Kubernetes 1.21.0 cluster with the calico CNI, deploying Rook's operator.yaml kept failing; after switching to the flannel plugin it succeeded.