A Ceph Distributed Storage Solution for Kubernetes (Rook)


Ceph website: https://ceph.com/en/

Rook website: https://rook.io/

What is Rook?

  • Rook is not itself a distributed storage system. It leverages the Kubernetes platform and provides a Kubernetes Operator for each storage provider. It is a storage "orchestrator" that can drive different backends (e.g. Ceph, EdgeFS) to do the heavy lifting of storage management, abstracting away much of the complexity.
  • Rook turns a distributed storage system into a self-managing, self-scaling, self-healing storage service. It automates the tasks of a storage administrator: deployment, bootstrapping, configuration, provisioning, scaling, upgrades, migration, disaster recovery, monitoring, and resource management.
  • Rook orchestrates multiple storage solutions, each with a dedicated Kubernetes Operator for automated management. It currently supports Ceph, Cassandra, and NFS.
  • The mainstream backend today is Ceph. Ceph provides more than block storage: it also offers S3/Swift-compatible object storage and a distributed file system. Ceph can spread a volume's data across multiple disks, so a single volume can use more space than any one disk provides, which is convenient. When more disks are added to the cluster, Ceph automatically rebalances/redistributes data across them.

 

How Rook/Ceph integrates with Kubernetes

  • Rook is an open-source cloud-native storage orchestrator: it provides the platform, framework, and support for a variety of storage solutions to integrate natively with cloud-native environments.
  • Rook turns storage software into self-managing, self-scaling, self-healing storage services by automating deployment, bootstrapping, configuration, provisioning, scaling, upgrades, migration, disaster recovery, monitoring, and resource management.
  • Rook relies on the tools provided by the underlying cloud-native container management, scheduling, and orchestration platform to do its job.
  • Rook currently supports Ceph, NFS, Minio Object Store, and CockroachDB.
  • Rook uses Kubernetes primitives to run the Ceph storage system on Kubernetes.

 

What Rook + Ceph provides

  • Rook creates the StorageClass for us
    • A PVC only has to name that storage class; the provisioner referenced by the StorageClass then performs the operations against the Ceph cluster
  • Ceph
    • Block: block storage. RWO (ReadWriteOnce), single-node read/write (one Pod gets its own dedicated volume to read and write), suited to stateful workloads (StatefulSets)
    • Shared FS (Ceph File System): shared storage. RWX (ReadWriteMany), multi-node read/write (many Pods read and write the same volume), suited to stateless applications (file system)
    • Summary: a stateless application can be scaled to any number of replicas and needs RWX. A stateful application can also be scaled, but each replica reads and writes only its own volume, so it uses RWO (preferred) or RWX.
  • Through Rook you get direct access to storage with either capability; a sketch of the two access modes follows this list.
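
A minimal sketch of the two access modes as they appear in PVC manifests (the claim names here are hypothetical; the StorageClasses rook-ceph-block and rook-cephfs are created later in this guide):

# Hypothetical PVCs illustrating the two access modes
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: single-writer-claim        # RWO: mounted read/write by one node
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-ceph-block
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-claim               # RWX: mounted read/write by many nodes
spec:
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-cephfs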

 

Prerequisites:

Lab environment:

  • 1. Two CentOS 7 virtual machines on VMware: one master node and one worker node
  • 2. Kubernetes version: 1.17.4
  • 3. Network plugin: flannel
  • 4. Remove the NoSchedule taint from the master node (optional; my machine is low-spec, so the master doubles as a worker)
    • kubectl taint nodes k8s-master node-role.kubernetes.io/master:NoSchedule-
# Ceph OSDs require lvm2 on every node that provides storage
sudo yum install -y lvm2
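
It may also be worth confirming that the rbd kernel module is available on each node (my addition, based on Rook's general prerequisites rather than the original steps):

# Sketch: load and verify the rbd kernel module used by RBD block volumes
sudo modprobe rbd
lsmod | grep rbd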

1. A raw disk with no partitions and no filesystem is required

  • Raw devices (no partitions or formatted filesystems): a whole disk, unpartitioned and unformatted
  • Raw partitions (no formatted filesystem): a raw partition with no formatted filesystem

 

How to add an unpartitioned, unformatted disk in the virtual machine:

  • In the virtual machine settings, click Add

  • Select Hard Disk, then Next
  • SCSI, then Next
  • Create a new virtual disk, then Next
  • Choose the capacity, then Next
  • Finish, then start the virtual machine
# After booting, confirm the new disk (sdb here) shows no filesystem
lsblk -f

## Zero out the disk (if the disk was attached by a cloud provider, it must be low-level wiped first, otherwise you will run into problems)
dd if=/dev/zero of=/dev/sdb bs=1M status=progress
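
As an alternative to zeroing the whole disk, wiping only the existing signatures is usually sufficient (my suggestion, not part of the original steps):

# Sketch: clear filesystem/partition signatures so Rook sees a raw device
wipefs -a /dev/sdb
# or zap GPT/MBR metadata (sgdisk comes from the gdisk package)
sgdisk --zap-all /dev/sdb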

2. Deploy the Rook Operator

Download the source:
git clone --single-branch --branch v1.7.11 https://github.com/rook/rook.git

cd rook/cluster/examples/kubernetes/ceph

# 1. Replace the images in operator.yaml; the defaults are hosted on Google's registries, which are unreachable from mainland China
  # ROOK_CSI_CEPH_IMAGE: "quay.io/cephcsi/cephcsi:v3.4.0"
  # ROOK_CSI_REGISTRAR_IMAGE: "k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.3.0"
  # ROOK_CSI_RESIZER_IMAGE: "k8s.gcr.io/sig-storage/csi-resizer:v1.3.0"
  # ROOK_CSI_PROVISIONER_IMAGE: "k8s.gcr.io/sig-storage/csi-provisioner:v3.0.0"
  # ROOK_CSI_SNAPSHOTTER_IMAGE: "k8s.gcr.io/sig-storage/csi-snapshotter:v4.2.0"
  # ROOK_CSI_ATTACHER_IMAGE: "k8s.gcr.io/sig-storage/csi-attacher:v3.3.0"
  
  ROOK_CSI_CEPH_IMAGE: "quay.io/cephcsi/cephcsi:v3.4.0"
  ROOK_CSI_REGISTRAR_IMAGE: "registry.aliyuncs.com/google_containers/csi-node-driver-registrar:v2.3.0"
  ROOK_CSI_RESIZER_IMAGE: "registry.aliyuncs.com/google_containers/csi-resizer:v1.3.0"
  ROOK_CSI_PROVISIONER_IMAGE: "registry.aliyuncs.com/google_containers/csi-provisioner:v3.0.0"
  ROOK_CSI_SNAPSHOTTER_IMAGE: "registry.aliyuncs.com/google_containers/csi-snapshotter:v4.2.0"
  ROOK_CSI_ATTACHER_IMAGE: "registry.aliyuncs.com/google_containers/csi-attacher:v3.3.0"

kubectl create -f crds.yaml -f common.yaml -f operator.yaml

# verify the rook-ceph-operator is in the `Running` state before proceeding
kubectl -n rook-ceph get pod
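
If you prefer to block until the operator is ready rather than polling, something like this should work (an assumption: the operator pod carries the label app=rook-ceph-operator, as in the stock operator.yaml):

# Sketch: wait up to 5 minutes for the operator pod to become Ready
kubectl -n rook-ceph wait --for=condition=Ready pod -l app=rook-ceph-operator --timeout=300s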

3. Create a Ceph Cluster

# Edit the cluster.yaml configuration
# Note mon.count: it must be an odd number and must not exceed the number of Kubernetes worker nodes
  mon:
    # Set the number of mons to be started. Generally recommended to be 3.
    # For highest availability, an odd number of mons should be specified.
    count: 1
    # The mons should be on unique nodes. For production, at least 3 nodes are recommended for this reason.
    # Mons should only be allowed on the same node for test environments where data loss is acceptable.
    allowMultiplePerNode: false
  mgr:
    # When higher availability of the mgr is needed, increase the count to 2.
    # In that case, one mgr will be active and one in standby. When Ceph updates which
    # mgr is active, Rook will update the mgr services to match the active mgr.
    count: 1
    modules:
      # Several modules should not need to be included in this list. The "dashboard" and "monitoring" modules
      # are already enabled by other settings in the cluster CR.
      - name: pg_autoscaler
        enabled: true





  storage: # cluster level storage configuration and selection
    useAllNodes: false
    useAllDevices: false
    #deviceFilter:
    config:
      # crushRoot: "custom-root" # specify a non-default root label for the CRUSH map
      # metadataDevice: "md0" # specify a non-rotational storage so ceph-volume will use it as block db device of bluestore.
      # databaseSizeMB: "1024" # uncomment if the disks are smaller than 100 GB
      # journalSizeMB: "1024"  # uncomment if the disks are 20 GB or smaller
      osdsPerDevice: "2" # this value can be overridden at the node or device level
      # encryptedDevice: "true" # the default value for this option is "false"
# Individual nodes and their config can be specified as well, but 'useAllNodes' above must be set to false. Then, only the named
# nodes below will be used as storage resources.  Each node's 'name' field should match their 'kubernetes.io/hostname' label.
    nodes:
      - name: "k8s-master"
        devices:
          - name: "sdb"
      - name: "k8s-node1"
        devices:
          - name: "sdb"
      - name: "k8s-node2"
        devices:
          - name: "sdb"
    # nodes:
    #   - name: "172.17.4.201"
    #     devices: # specific devices to use for storage can be specified for each node
    #       - name: "sdb"
    #       - name: "nvme01" # multiple osds can be created on high performance devices
    #         config:
    #           osdsPerDevice: "5"
    #       - name: "/dev/disk/by-id/ata-ST4000DM004-XXXX" # devices can be specified using full udev paths
    #     config: # configuration can be specified at the node level which overrides the cluster level config
    #   - name: "172.17.4.301"
    #     deviceFilter: "^sd."
    # when onlyApplyOSDPlacement is false, will merge both placement.All() and placement.osd
   

# Deploy the cluster
kubectl create -f cluster.yaml

# Check that all components were created successfully
kubectl -n rook-ceph get pod

NAME                                                 READY   STATUS      RESTARTS   AGE
csi-cephfsplugin-provisioner-d77bb49c6-n5tgs         5/5     Running     0          140s
csi-cephfsplugin-provisioner-d77bb49c6-v9rvn         5/5     Running     0          140s
csi-cephfsplugin-rthrp                               3/3     Running     0          140s
csi-rbdplugin-hbsm7                                  3/3     Running     0          140s
csi-rbdplugin-provisioner-5b5cd64fd-nvk6c            6/6     Running     0          140s
csi-rbdplugin-provisioner-5b5cd64fd-q7bxl            6/6     Running     0          140s
rook-ceph-crashcollector-minikube-5b57b7c5d4-hfldl   1/1     Running     0          105s
rook-ceph-mgr-a-64cd7cdf54-j8b5p                     1/1     Running     0          77s
rook-ceph-mon-a-694bb7987d-fp9w7                     1/1     Running     0          105s
rook-ceph-mon-b-856fdd5cb9-5h2qk                     1/1     Running     0          94s
rook-ceph-mon-c-57545897fc-j576h                     1/1     Running     0          85s
rook-ceph-operator-85f5b946bd-s8grz                  1/1     Running     0          92m
rook-ceph-osd-0-6bb747b6c5-lnvb6                     1/1     Running     0          23s
rook-ceph-osd-1-7f67f9646d-44p7v                     1/1     Running     0          24s
rook-ceph-osd-2-6cd4b776ff-v4d68                     1/1     Running     0          25s
rook-ceph-osd-prepare-node1-vx2rz                    0/2     Completed   0          60s
rook-ceph-osd-prepare-node2-ab3fd                    0/2     Completed   0          60s
rook-ceph-osd-prepare-node3-w4xyz                    0/2     Completed   0          60s

After the cluster is created, check the attached sdb disk again:

lsblk -f

You can see that the newly attached raw disk is now managed by Ceph and shows a Ceph filesystem type.
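
You can also check the overall state of the cluster through the CephCluster custom resource (my addition):

# Sketch: the CephCluster CR reports its phase and health in the printer columns
kubectl -n rook-ceph get cephcluster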

4. Configure the Ceph dashboard

The Ceph dashboard module is enabled by default, but its Service is of type ClusterIP, so it cannot be reached from outside the cluster. Expose it through a NodePort Service:

kubectl apply -f dashboard-external-https.yaml

# kubectl get svc -n rook-ceph|grep dashboard

rook-ceph-mgr-dashboard                  ClusterIP   10.107.32.153    <none>        8443/TCP            23m
rook-ceph-mgr-dashboard-external-https   NodePort    10.106.136.221   <none>        8443:31840/TCP      7m58s

Open in a browser: https://192.168.30.130:31840/
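
If you prefer to read the assigned NodePort programmatically rather than picking it out of the svc listing above (my addition):

# Sketch: print the NodePort of the external dashboard Service
kubectl -n rook-ceph get svc rook-ceph-mgr-dashboard-external-https -o jsonpath='{.spec.ports[0].nodePort}'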

The default username is admin; the password can be retrieved with:

kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}"|base64 --decode && echo

The generated password is unwieldy; you can log in and change it afterwards (e.g. admin/admin123).

5. Block storage (RBD)

RBD: RADOS Block Device

RADOS: Reliable Autonomic Distributed Object Store

RBD volumes cannot be used in RWX mode.

 

1. Deploy the CephBlockPool and StorageClass

# Create the CephBlockPool
cd rook/cluster/examples/kubernetes/ceph
kubectl apply -f pool.yaml

# Create the StorageClass (provisioner)
cd csi/rbd
kubectl apply -f storageclass.yaml

[root@k8s-master rbd]# kubectl apply -f storageclass.yaml 
[root@k8s-master rbd]# kubectl get sc -n rook-ceph
NAME              PROVISIONER                  RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
rook-ceph-block   rook-ceph.rbd.csi.ceph.com   Delete          Immediate           true                   12s

From now on, a stateful application's PVC only has to set storageClassName: rook-ceph-block; the provisioner automatically allocates a PV for it, as the StatefulSet sketch below illustrates.
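
A minimal sketch of how a StatefulSet would consume this class through volumeClaimTemplates (the names demo-sts and data are hypothetical):

# Hypothetical StatefulSet: each replica gets its own RWO, RBD-backed PVC
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: demo-sts
spec:
  serviceName: demo-sts
  replicas: 2
  selector:
    matchLabels:
      app: demo-sts
  template:
    metadata:
      labels:
        app: demo-sts
    spec:
      containers:
        - name: app
          image: nginx
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: rook-ceph-block
        resources:
          requests:
            storage: 1Gi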

 

2. Example

# pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-ceph-block
  
# pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: csirbd-demo-pod
spec:
  containers:
    - name: web-server
      image: nginx
      volumeMounts:
        - name: mypvc
          mountPath: /var/lib/www/html
  volumes:
    - name: mypvc
      persistentVolumeClaim:
        claimName: rbd-pvc
        readOnly: false
kubectl apply -f pvc.yaml
kubectl apply -f pod.yaml

[root@k8s-master rbd]# kubectl get pvc
NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
rbd-pvc   Bound    pvc-af052c81-9dad-4597-943f-7b62002e551c   1Gi        RWO            rook-ceph-block   32m
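
A quick way to confirm the RBD volume is actually mounted inside the pod (my addition):

# Sketch: the mount point should show a dedicated ~1Gi filesystem backed by the RBD image
kubectl exec csirbd-demo-pod -- df -h /var/lib/www/html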

6. File storage (CephFS)

1. Deploy the CephFilesystem and StorageClass

# Deploy the CephFilesystem
cd rook/cluster/examples/kubernetes/ceph
kubectl apply -f filesystem.yaml

# Create the StorageClass (provisioner)
cd csi/cephfs/
kubectl apply -f storageclass.yaml

[root@k8s-master cephfs]# kubectl get sc
NAME              PROVISIONER                     RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
rook-ceph-block   rook-ceph.rbd.csi.ceph.com      Delete          Immediate           true                   48m
rook-cephfs       rook-ceph.cephfs.csi.ceph.com   Delete          Immediate           true                   17s

# The rook-cephfs StorageClass is the shared-file (RWX) provisioner

A stateless application just declares storageClassName: rook-cephfs and the provisioner dynamically allocates a PV for it.

2. Example:

# nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name:  nginx-deploy
  namespace: default
  labels:
    app:  nginx
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app:  nginx
    spec:
      containers:
      - name:  nginx
        image:  nginx:latest
        ports:
        - containerPort:  80
          name:  nginx
        volumeMounts:
        - name: localtime
          mountPath: /etc/localtime
        - name: nginx-html
          mountPath: /usr/share/nginx/html
      volumes:
        - name: localtime
          hostPath:
            path: /usr/share/zoneinfo/Asia/Shanghai
        - name: nginx-html
          persistentVolumeClaim:
            claimName: nginx-html
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx-html
  namespace: default
spec:
  storageClassName: rook-cephfs
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  labels:
    app: nginx
spec:
  type: NodePort
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
    nodePort: 32500
kubectl apply -f nginx.yaml

[root@k8s-master ~]# kubectl get pvc
NAME         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
nginx-html   Bound    pvc-1beafc19-2687-4394-ae19-a42768f64cd5   1Gi        RWX            rook-cephfs    24h

Then exec into the container, create an index.html under /usr/share/nginx/html containing 123456, and open the nginx home page; it serves that content.

/usr/share/nginx/html is mounted on the Ceph cluster's file system.
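
For example (a sketch; the pod name is taken from the deployment's app=nginx label, and 192.168.30.130 is the node IP used earlier):

# Sketch: write the test page from inside one replica, then fetch it through the NodePort
kubectl exec $(kubectl get pod -l app=nginx -o jsonpath='{.items[0].metadata.name}') -- \
  sh -c 'echo 123456 > /usr/share/nginx/html/index.html'
curl http://192.168.30.130:32500/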

7. The Ceph client toolbox

The Rook toolbox runs as a deployment inside the Kubernetes cluster; you can exec into it and run arbitrary Ceph commands.

cd rook/cluster/examples/kubernetes/ceph

# Deploy the toolbox
kubectl create -f toolbox.yaml

# Check the pod status until it is Running:
kubectl -n rook-ceph get pod -l "app=rook-ceph-tools"

# Once it is running, use the command below to exec into the container and run Ceph client commands
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash

Example:

  • ceph status
  • ceph osd status
  • ceph df
  • rados df

 

Having to exec into the pod every time is inconvenient; instead, install the ceph-common client tools on any cluster node:

1. Install ceph-common

# Install
yum install ceph-common -y

2. Exec into the toolbox container and copy the contents of ceph.conf and keyring to the Kubernetes master node

# Exec into the container
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash
# List the config files
ls /etc/ceph

[root@rook-ceph-tools-64fc489556-xvm7s ceph]# ls /etc/ceph
ceph.conf  keyring

3. On the master host, create /etc/ceph/ceph.conf and /etc/ceph/keyring and paste in the contents of the corresponding files from step 2.
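
Instead of copy-pasting by hand, the two files can be pulled straight out of the toolbox pod (a sketch using the same app=rook-ceph-tools label as above):

# Sketch: dump the toolbox's config files into /etc/ceph on the master
TOOLS_POD=$(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}')
mkdir -p /etc/ceph
kubectl -n rook-ceph exec "$TOOLS_POD" -- cat /etc/ceph/ceph.conf > /etc/ceph/ceph.conf
kubectl -n rook-ceph exec "$TOOLS_POD" -- cat /etc/ceph/keyring > /etc/ceph/keyring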

Verify that it works:

[root@k8s-master ~]# ceph version
ceph version 15.2.8 (bdf3eebcd22d7d0b3dd4d5501bee5bac354d5b55) octopus (stable)

4. List the file systems in the cluster

[root@k8s-master ~]# ceph fs ls
name: myfs, metadata pool: myfs-metadata, data pools: [myfs-data0 ]

8. Mount the Ceph cluster's file system into a directory on the master node

1. Create a directory on the master node to mount the CephFS file system

mkdir -p /k8s/data

2. Mount the file system

# Detect the mon endpoints and the user secret for the connection
mon_endpoints=$(grep mon_host /etc/ceph/ceph.conf | awk '{print $3}')
my_secret=$(grep key /etc/ceph/keyring | awk '{print $3}')

# Mount the filesystem
mount -t ceph -o mds_namespace=myfs,name=admin,secret=$my_secret $mon_endpoints:/ /k8s/data

# See your mounted filesystem
df -h

10.96.224.164:6789:/      11G     0   11G    0% /k8s/data

# Inspect the data in the directory
ls /k8s/data

[root@k8s-master 9d4e99e0-0a18-4c21-81e3-4244e03cf274]# ls /k8s/data
volumes

# Under the directory below you can see the index.html file mounted by the nginx container
[root@k8s-master ~]# ls /k8s/data/volumes/csi/csi-vol-3ca5cb69-9a9b-11ec-97b1-9a59f1d96a01/9d4e99e0-0a18-4c21-81e3-4244e03cf274
index.html

Exec into the nginx container and create login.html under /usr/share/nginx/html:

[root@k8s-master ~]# kubectl exec -it nginx-deploy-7d7c77bdcf-7bs6k -- /bin/bash
root@nginx-deploy-7d7c77bdcf-7bs6k:/# cd /usr/share/nginx/html/
root@nginx-deploy-7d7c77bdcf-7bs6k:/usr/share/nginx/html# ls
index.html
root@nginx-deploy-7d7c77bdcf-7bs6k:/usr/share/nginx/html# echo 222 >> login.html

Exit the container and check the mounted directory on the host for login.html:

[root@k8s-master ~]# ls /k8s/data/volumes/csi/csi-vol-3ca5cb69-9a9b-11ec-97b1-9a59f1d96a01/9d4e99e0-0a18-4c21-81e3-4244e03cf274
index.html  login.html

In this way you can browse the container's mounted directory directly from the host.

3. To unmount:

umount /k8s/data

4. Under /k8s/data/volumes/csi/, how do you tell which subdirectory belongs to which pod?

[root@k8s-master ~]# ls /k8s/data/volumes/csi
csi-vol-3ca5cb69-9a9b-11ec-97b1-9a59f1d96a01

First, list all PVs:

kubectl get pv

[root@k8s-master ~]# kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                STORAGECLASS   REASON   AGE
pvc-1beafc19-2687-4394-ae19-a42768f64cd5   1Gi        RWX            Delete           Bound    default/nginx-html   rook-cephfs             27h

Describe pvc-1beafc19-2687-4394-ae19-a42768f64cd5:

kubectl describe pv pvc-1beafc19-2687-4394-ae19-a42768f64cd5

[root@k8s-master ~]# kubectl describe pv pvc-1beafc19-2687-4394-ae19-a42768f64cd5
Name:            pvc-1beafc19-2687-4394-ae19-a42768f64cd5
Labels:          <none>
Annotations:     pv.kubernetes.io/provisioned-by: rook-ceph.cephfs.csi.ceph.com
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    rook-cephfs
Status:          Bound
Claim:           default/nginx-html
Reclaim Policy:  Delete
Access Modes:    RWX
VolumeMode:      Filesystem
Capacity:        1Gi
Node Affinity:   <none>
Message:
Source:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            rook-ceph.cephfs.csi.ceph.com
    VolumeHandle:      0001-0009-rook-ceph-0000000000000001-3ca5cb69-9a9b-11ec-97b1-9a59f1d96a01
    ReadOnly:          false
    VolumeAttributes:      clusterID=rook-ceph
                           fsName=myfs
                           pool=myfs-data0
                           storage.kubernetes.io/csiProvisionerIdentity=1646222524067-8081-rook-ceph.cephfs.csi.ceph.com
                           subvolumeName=csi-vol-3ca5cb69-9a9b-11ec-97b1-9a59f1d96a01
Events:                <none>

As shown above, the subvolume directory name is subvolumeName=csi-vol-3ca5cb69-9a9b-11ec-97b1-9a59f1d96a01.

So the contents of /k8s/data/volumes/csi/csi-vol-3ca5cb69-9a9b-11ec-97b1-9a59f1d96a01 are exactly what the pod writes to its mounted directory.
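
The same subvolume name can also be pulled out with a single jsonpath query (my addition):

# Sketch: read the CSI subvolumeName attribute straight off the PV
kubectl get pv pvc-1beafc19-2687-4394-ae19-a42768f64cd5 -o jsonpath='{.spec.csi.volumeAttributes.subvolumeName}'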

 

Pitfalls I hit during my own deployment:

1. On a Kubernetes 1.21.0 cluster with the Calico network plugin, deploying the Ceph operator.yaml kept failing; after switching to the flannel plugin it succeeded.

