Installing rook-ceph v1.8 on a Kubernetes cluster


Official documentation: https://rook.io/docs/rook/v1.8/quickstart.html

Minimum Kubernetes version

Rook v1.8 requires Kubernetes v1.16 or later.

Prerequisites

  • Install the lvm2 package on every node in the Kubernetes cluster: yum -y install lvm2
  • The kernel on every node must be version 4.17 or later.
  • The cluster has at least 3 schedulable nodes (masters and/or workers), each with an unformatted raw disk in addition to the system disk (on virtual machines this can be a virtual disk); these are used to create 3 Ceph OSDs.
  • Alternatively, a single worker node with one unformatted raw disk attached also works.
  • Run lsblk -f on each node to check whether a disk is already formatted; sample output:
NAME                  FSTYPE      LABEL UUID                                   MOUNTPOINT
vda
└─vda1                LVM2_member       eSO50t-GkUV-YKTH-WsGq-hNJY-eKNf-3i07IB
 ├─ubuntu--vg-root   ext4              c2366f76-6e21-4f10-a8f3-6776212e2fe4   /
 └─ubuntu--vg-swap_1 swap              9492a3dc-ad75-47cd-9596-678e8cf17ff9   [SWAP]
vdb

If the FSTYPE field is not empty, the disk already contains a filesystem. In the example above, disk vdb can be used for a Ceph OSD, while disk vda and its partitions cannot.
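If a disk was formatted before and its data can safely be discarded, the filesystem signatures can be wiped so that Rook will accept it. A minimal sketch, assuming the target device is /dev/vdb; this destroys all data on the disk:

# DESTRUCTIVE: remove all filesystem signatures from the disk
wipefs --all /dev/vdb
# optionally also clear the GPT/MBR partition table
sgdisk --zap-all /dev/vdb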

Download the YAML files

A zip of the files from the official GitHub repository: https://files.cnblogs.com/files/sanduzxcvbnm/rook-1.8.1.zip

# With the zip above, this step can be skipped
# git clone --single-branch --branch v1.8.1 https://github.com/rook/rook.git
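If you use the zip instead, the rough equivalent is as follows (the directory name inside the archive may differ):

unzip rook-1.8.1.zip
cd rook-1.8.1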

Pre-pull the images used by the YAML files

Every host in the Kubernetes cluster needs these images.

Some of the images referenced by these YAML files are pulled from k8s.gcr.io, which may be unreachable. Pull same-named images from a mirror on Docker Hub first, then re-tag them so they match the expected names.
If you skip this step and apply the YAML files directly, some pods will fail to start; inspecting them shows that the images could not be pulled from k8s.gcr.io.
The required images can be read from operator.yaml ahead of time (as below), which avoids having to debug failing pods after the fact.

  # The default version of CSI supported by Rook will be started. To change the version
  # of the CSI driver to something other than what is officially supported, change
  # these images to the desired release of the CSI driver.
  # ROOK_CSI_CEPH_IMAGE: "quay.io/cephcsi/cephcsi:v3.4.0"
  # ROOK_CSI_REGISTRAR_IMAGE: "k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.3.0"
  # ROOK_CSI_RESIZER_IMAGE: "k8s.gcr.io/sig-storage/csi-resizer:v1.3.0"
  # ROOK_CSI_PROVISIONER_IMAGE: "k8s.gcr.io/sig-storage/csi-provisioner:v3.0.0"
  # ROOK_CSI_SNAPSHOTTER_IMAGE: "k8s.gcr.io/sig-storage/csi-snapshotter:v4.2.0"
  # ROOK_CSI_ATTACHER_IMAGE: "k8s.gcr.io/sig-storage/csi-attacher:v3.3.0"
docker pull rook/ceph:v1.8.1
docker pull quay.io/ceph/ceph:v16.2.7
docker pull quay.io/cephcsi/cephcsi:v3.4.0

docker pull liangjw/csi-node-driver-registrar:v2.3.0
docker tag liangjw/csi-node-driver-registrar:v2.3.0 k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.3.0

docker pull liangjw/csi-provisioner:v3.0.0
docker tag liangjw/csi-provisioner:v3.0.0 k8s.gcr.io/sig-storage/csi-provisioner:v3.0.0

docker pull liangjw/csi-resizer:v1.3.0
docker tag liangjw/csi-resizer:v1.3.0 k8s.gcr.io/sig-storage/csi-resizer:v1.3.0

docker pull liangjw/csi-attacher:v3.3.0
docker tag liangjw/csi-attacher:v3.3.0 k8s.gcr.io/sig-storage/csi-attacher:v3.3.0

docker pull liangjw/csi-snapshotter:v4.2.0
docker tag liangjw/csi-snapshotter:v4.2.0 k8s.gcr.io/sig-storage/csi-snapshotter:v4.2.0
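The pulls and re-tags above can also be expressed as one small loop (same image names and tags as listed; adjust if your operator.yaml pins different versions):

for img in csi-node-driver-registrar:v2.3.0 csi-provisioner:v3.0.0 \
           csi-resizer:v1.3.0 csi-attacher:v3.3.0 csi-snapshotter:v4.2.0; do
  docker pull liangjw/${img}
  docker tag liangjw/${img} k8s.gcr.io/sig-storage/${img}
done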

Initialize

cd rook/deploy/examples
kubectl create -f crds.yaml -f common.yaml -f operator.yaml

# verify the rook-ceph-operator is in the `Running` state before proceeding
kubectl -n rook-ceph get pod
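To block until the operator is ready instead of polling by hand, kubectl wait should work; this assumes the operator pod carries the app=rook-ceph-operator label, as in the v1.8 manifests:

kubectl -n rook-ceph wait pod -l app=rook-ceph-operator --for=condition=Ready --timeout=300s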

Install rook-ceph

kubectl create -f cluster.yaml
kubectl -n rook-ceph get pod

Verify the rook-ceph status with the toolbox

kubectl create -f deploy/examples/toolbox.yaml (to remove it later: kubectl -n rook-ceph delete deploy/rook-ceph-tools)
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
# Inside the toolbox, the following four commands are useful:
  ceph status
  ceph osd status
  ceph df
  rados df

# Sample output of ceph status:
 cluster:
   id:     a0452c76-30d9-4c1a-a948-5d8405f19a7c
   health: HEALTH_OK

 services:
   mon: 3 daemons, quorum a,b,c (age 3m)
   mgr: a(active, since 2m)
   osd: 3 osds: 3 up (since 1m), 3 in (since 1m)
...

Access the web dashboard

The dashboard is enabled by default in cluster.yaml and listens on port 8443, but the corresponding Service is of type ClusterIP, so it is reachable only from inside the cluster:

  spec:
    dashboard:
      enabled: true

# kubectl -n rook-ceph get service
NAME                                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
rook-ceph-mgr                           ClusterIP   10.108.111.192   <none>        9283/TCP         4h
rook-ceph-mgr-dashboard                 ClusterIP   10.110.113.240   <none>        8443/TCP         4h
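For a quick look without exposing the Service, port-forwarding from a machine with cluster access also works:

kubectl -n rook-ceph port-forward service/rook-ceph-mgr-dashboard 8443:8443
# then open https://localhost:8443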

To reach the dashboard from outside, use an Ingress Controller or one of NodePort, LoadBalancer, or ExternalIPs. NodePort is used here; LoadBalancer and Ingress are shown below for reference.

NodePort

HTTPS: dashboard-external-https.yaml
HTTP: dashboard-external-http.yaml

# kubectl create -f dashboard-external-https.yaml

# kubectl -n rook-ceph get service
NAME                                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
rook-ceph-mgr                           ClusterIP   10.108.111.192   <none>        9283/TCP         4h
rook-ceph-mgr-dashboard                 ClusterIP   10.110.113.240   <none>        8443/TCP         4h
rook-ceph-mgr-dashboard-external-https  NodePort    10.101.209.6     <none>        8443:31176/TCP   4h

# Access URL: https://<node-ip>:31176

# Default user: admin; retrieve the password with the command below
kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo

LoadBalancer

File: dashboard-loadbalancer.yaml

# kubectl create -f dashboard-loadbalancer.yaml
# kubectl -n rook-ceph get service
NAME                                     TYPE           CLUSTER-IP       EXTERNAL-IP                                                               PORT(S)             AGE
rook-ceph-mgr                            ClusterIP      172.30.11.40     <none>                                                                    9283/TCP            4h
rook-ceph-mgr-dashboard                  ClusterIP      172.30.203.185   <none>                                                                    8443/TCP            4h
rook-ceph-mgr-dashboard-loadbalancer     LoadBalancer   172.30.27.242    a7f23e8e2839511e9b7a5122b08f2038-1251669398.us-east-1.elb.amazonaws.com   8443:32747/TCP      4h

# Access URL: https://a7f23e8e2839511e9b7a5122b08f2038-1251669398.us-east-1.elb.amazonaws.com:8443

Ingress Controller

For the nginx Ingress Controller, use dashboard-ingress-https.yaml.

# Replace the domain in the file: change host: rook-ceph.example.com to the domain you actually use

# kubectl create -f dashboard-ingress-https.yaml
# kubectl -n rook-ceph get ingress
NAME                      HOSTS                      ADDRESS   PORTS     AGE
rook-ceph-mgr-dashboard   rook-ceph.example.com      80, 443   5m

# kubectl -n rook-ceph get secret rook-ceph.example.com
NAME                       TYPE                DATA      AGE
rook-ceph.example.com      kubernetes.io/tls   2         4m
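The ingress references a TLS secret with the same name as the host. If it does not exist yet, it can be created from an existing certificate and key; the file paths here are placeholders:

kubectl -n rook-ceph create secret tls rook-ceph.example.com --cert=tls.crt --key=tls.key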

# Access URL: https://rook-ceph.example.com/

Create a shared filesystem (CephFS) named myfs

A shared filesystem can be mounted read/write by multiple pods.

# kubectl create -f filesystem.yaml
# kubectl -n rook-ceph get pod -l app=rook-ceph-mds
NAME                                      READY     STATUS    RESTARTS   AGE
rook-ceph-mds-myfs-7d59fdfcf4-h8kw9       1/1       Running   0          12s
rook-ceph-mds-myfs-7d59fdfcf4-kgkjp       1/1       Running   0          12s

# kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
ceph status

 ...
 services:
   mds: myfs-1/1/1 up {[myfs:0]=mzw58b=up:active}, 1 up:standby-replay

Create a StorageClass backed by the filesystem

kubectl create -f deploy/examples/csi/cephfs/storageclass.yaml
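As a quick smoke test, a PVC can be created against this class. A minimal sketch, assuming the StorageClass keeps its default name rook-cephfs from the example file:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-cephfs

Apply it and confirm with kubectl get pvc cephfs-pvc that it reaches the Bound state.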

Example: using the filesystem-backed StorageClass

kubectl create -f deploy/examples/csi/cephfs/kube-registry.yaml

kubectl delete -f deploy/examples/csi/cephfs/kube-registry.yaml
# Prerequisite: preserveFilesystemOnDelete: true in filesystem.yaml (the default)

Delete the shared filesystem myfs

kubectl -n rook-ceph delete cephfilesystem myfs

Create block storage with a StorageClass named rook-ceph-block

Block storage allows a single pod to mount storage.
Before Rook provisions storage, a StorageClass and a CephBlockPool need to be created. This allows Kubernetes to interoperate with Rook when provisioning persistent volumes.

Note: this example requires at least 1 OSD per node, with the OSDs spread across 3 different nodes.
Each OSD must be on a different node, because failureDomain is set to host and replicated.size is set to 3.
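For reference, the pool definition inside the example storageclass.yaml looks roughly like this (verify against your copy of the file):

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3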

kubectl create -f deploy/examples/csi/rbd/storageclass.yaml

Example

This step creates two applications, each of which uses one block device: a 20Gi volume claimed from the block StorageClass.

cd deploy/examples
kubectl create -f mysql.yaml
kubectl create -f wordpress.yaml

kubectl get pvc
NAME             STATUS    VOLUME                                     CAPACITY   ACCESSMODES   AGE
mysql-pv-claim   Bound     pvc-95402dbc-efc0-11e6-bc9a-0cc47a3459ee   20Gi       RWO           1m
wp-pv-claim      Bound     pvc-39e43169-efc1-11e6-bc9a-0cc47a3459ee   20Gi       RWO           1m

kubectl get svc wordpress
NAME        CLUSTER-IP   EXTERNAL-IP   PORT(S)        AGE
wordpress   10.3.0.155   <pending>     80:30841/TCP   2m

# echo http://$(minikube ip):$(kubectl get service wordpress -o jsonpath='{.spec.ports[0].nodePort}')
# Access URL: http://<any-node-ip>:30841

kubectl delete -f wordpress.yaml
kubectl delete -f mysql.yaml

Delete the block storage

# Equivalent to deleting via the file: kubectl delete -f deploy/examples/csi/rbd/storageclass.yaml
kubectl delete -n rook-ceph cephblockpools.ceph.rook.io replicapool
kubectl delete storageclass rook-ceph-block

Create an object store (the CephObjectStore is named my-store; its pods carry the label app=rook-ceph-rgw)

Official documentation: https://rook.io/docs/rook/v1.8/ceph-object.html

Note: this example requires at least 3 BlueStore OSDs, each on a different node.
The OSDs must be on different nodes because failureDomain is set to host and the erasureCoded chunk settings require at least 3 different OSDs (2 data chunks + 1 coding chunk).
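The corresponding part of object.yaml looks roughly like this (verify against your copy of the file):

apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: my-store
  namespace: rook-ceph
spec:
  metadataPool:
    failureDomain: host
    replicated:
      size: 3
  dataPool:
    failureDomain: host
    erasureCoded:
      dataChunks: 2
      codingChunks: 1
  gateway:
    port: 80
    instances: 1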

Create the object store

cd deploy/examples
kubectl create -f object.yaml

# To confirm the object store is configured, wait for the rgw pod to start
kubectl -n rook-ceph get pod -l app=rook-ceph-rgw
NAME                                        READY   STATUS    RESTARTS   AGE
rook-ceph-rgw-my-store-a-67c588c977-h6wc6   1/1     Running   0          22s

Connect to an external object store gateway (an existing gateway, not the one created above; this step is optional)

kubectl create -f object-external.yaml
ceph-object-controller: ceph object store gateway service running at 10.100.28.138:8080

kubectl -n rook-ceph get svc -l app=rook-ceph-rgw
NAME                     TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
rook-ceph-rgw-my-store   ClusterIP   10.100.28.138   <none>        8080/TCP   6h59m

# Any pod in the cluster can now reach this endpoint:
$ curl 10.100.28.138:8080
<?xml version="1.0" encoding="UTF-8"?><ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>

# The internally registered DNS name can also be used:
curl rook-ceph-rgw-my-store.rook-ceph:8080
<?xml version="1.0" encoding="UTF-8"?><ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>

# The DNS name follows the pattern rook-ceph-rgw-$STORE_NAME.$NAMESPACE

Create a bucket

kubectl create -f storageclass-bucket-delete.yaml (set the reclaim policy to delete the bucket and all objects when its OBC is deleted.)
# kubectl create -f storageclass-bucket-retain.yaml (set the reclaim policy to retain the bucket when its OBC is deleted.)
kubectl create -f object-bucket-claim-delete.yaml
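For reference, the claim in object-bucket-claim-delete.yaml is roughly the following; the actual bucket gets a name derived from generateBucketName (verify the names against your copy of the file):

apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: ceph-delete-bucket
spec:
  generateBucketName: ceph-bkt
  storageClassName: rook-ceph-delete-bucket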

Connect a client (this step can be skipped)

The official docs use the AWS_* variable names because the object store speaks the S3 protocol that AWS tooling expects. This is only naming and does not affect usage; other names would work just as well, this is simply how the example is written.

# The ConfigMap, Secret, and OBC are created in the default namespace unless another one is specified
# The name ceph-delete-bucket follows from the previous step; the official page uses ceph-bucket, which does not exist here
export AWS_HOST=$(kubectl -n default get cm ceph-delete-bucket -o jsonpath='{.data.BUCKET_HOST}')
export AWS_ACCESS_KEY_ID=$(kubectl -n default get secret ceph-delete-bucket -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 --decode)
export AWS_SECRET_ACCESS_KEY=$(kubectl -n default get secret ceph-delete-bucket -o jsonpath='{.data.AWS_SECRET_ACCESS_KEY}' | base64 --decode)

# Actual values after running the commands above:
export AWS_HOST=rook-ceph-rgw-my-store.rook-ceph.svc
export AWS_ACCESS_KEY_ID=RIEOBDSNISG4YPIJ4PWR
export AWS_SECRET_ACCESS_KEY=GEntrTD8Z6k1zM82h9Vj9VeWCZH0JKejYYVCbbsK

Use the object store from the toolbox

export AWS_HOST=<host>
export AWS_ENDPOINT=<endpoint>
export AWS_ACCESS_KEY_ID=<accessKey>
export AWS_SECRET_ACCESS_KEY=<secretKey>
  • Host: The DNS host name where the rgw service is found in the cluster. Assuming you are using the default rook-ceph cluster, it will be rook-ceph-rgw-my-store.rook-ceph.
  • Endpoint: The endpoint where the rgw service is listening. Run kubectl -n rook-ceph get svc rook-ceph-rgw-my-store, then combine the clusterIP and the port.
  • Access key: The user’s access_key as printed above
  • Secret key: The user’s secret_key as printed above

Endpoint=172.16.123.52:80 (obtained with: kubectl -n rook-ceph get svc rook-ceph-rgw-my-store)

# Values actually obtained in this environment:
export AWS_HOST=rook-ceph-rgw-my-store.rook-ceph.svc
export AWS_ENDPOINT=172.16.123.52:80
export AWS_ACCESS_KEY_ID=RIEOBDSNISG4YPIJ4PWR
export AWS_SECRET_ACCESS_KEY=GEntrTD8Z6k1zM82h9Vj9VeWCZH0JKejYYVCbbsK

# Sample values shown in the official docs:
export AWS_HOST=rook-ceph-rgw-my-store.rook-ceph
export AWS_ENDPOINT=10.104.35.31:80
export AWS_ACCESS_KEY_ID=XEZDB3UJ6X7HVBE7X7MA
export AWS_SECRET_ACCESS_KEY=7yGIZON7EhFORz0I40BFniML36D2rl8CQQ5kXU6l

Configure s5cmd

To test the CephObjectStore, set the object store credentials in the toolbox pod for the s5cmd tool.

# Test from inside the toolbox pod
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash

export AWS_HOST=rook-ceph-rgw-my-store.rook-ceph.svc
export AWS_ENDPOINT=172.16.123.52:80
export AWS_ACCESS_KEY_ID=RIEOBDSNISG4YPIJ4PWR
export AWS_SECRET_ACCESS_KEY=GEntrTD8Z6k1zM82h9Vj9VeWCZH0JKejYYVCbbsK

mkdir ~/.aws
cat > ~/.aws/credentials << EOF
[default]
aws_access_key_id = ${AWS_ACCESS_KEY_ID}
aws_secret_access_key = ${AWS_SECRET_ACCESS_KEY}
EOF

PUT or GET an object

echo "Hello Rook" > /tmp/rookObj
s5cmd --endpoint-url http://$AWS_ENDPOINT cp /tmp/rookObj s3://rookbucket

# Error seen here: ERROR "cp /tmp/rookObj s3://rookbucket/rookObj": NotFound: Not Found status code: 404, request id: tx00000697d87623a91fcc6-0061c2e235-e056-my-store, host id:
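# The 404 most likely means the bucket "rookbucket" does not exist: the OBC above
# creates a bucket with a generated name. The real name can be read from the OBC's
# ConfigMap (run this outside the toolbox, or look it up before exec'ing in):
#   kubectl -n default get cm ceph-delete-bucket -o jsonpath='{.data.BUCKET_NAME}'
# Then retry the copy against that bucket:
#   s5cmd --endpoint-url http://$AWS_ENDPOINT cp /tmp/rookObj s3://<bucket-name>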

s5cmd --endpoint-url http://$AWS_ENDPOINT cp s3://rookbucket/rookObj /tmp/rookObj-download
cat /tmp/rookObj-download

Access the object store from outside the cluster

Rook sets up the object store so that pods can access it from inside the cluster. If an application runs outside the cluster, expose an external service via NodePort.

kubectl -n rook-ceph get service rook-ceph-rgw-my-store
NAME                     CLUSTER-IP   EXTERNAL-IP   PORT(S)     AGE
rook-ceph-rgw-my-store   10.3.0.177   <none>        80/TCP      2m

kubectl create -f rgw-external.yaml # note: the file uses internal port 8080; change it to 80 to match the rgw service

kubectl -n rook-ceph get service rook-ceph-rgw-my-store rook-ceph-rgw-my-store-external
NAME                              TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
rook-ceph-rgw-my-store            ClusterIP   10.104.82.228    <none>        80/TCP         4m
rook-ceph-rgw-my-store-external   NodePort    10.111.113.237   <none>        80:31536/TCP   39s

Internally, the rgw service runs on port 80; the external port in this example is 31536. The CephObjectStore can now be reached from anywhere: all that is needed is the hostname of any cluster node, the external port, and user credentials.
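A quick check from outside the cluster; the node IP is a placeholder, and anonymous access should return the same empty bucket-list XML shown earlier:

curl http://<node-ip>:31536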

Create a user

kubectl create -f object-user.yaml

kubectl -n rook-ceph describe secret rook-ceph-object-user-my-store-my-user
Name:         rook-ceph-object-user-my-store-my-user
Namespace:    rook-ceph
Labels:       app=rook-ceph-rgw
              rook_cluster=rook-ceph
              rook_object_store=my-store
Annotations:  <none>

Type:	kubernetes.io/rook

Data
====
AccessKey:	20 bytes
SecretKey:	40 bytes

Retrieve the AccessKey and SecretKey the user needs to access the object store

kubectl -n rook-ceph get secret rook-ceph-object-user-my-store-my-user -o jsonpath='{.data.AccessKey}' | base64 --decode
kubectl -n rook-ceph get secret rook-ceph-object-user-my-store-my-user -o jsonpath='{.data.SecretKey}' | base64 --decode

Monitor rook-ceph with Prometheus

Prometheus Operator

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.40.0/bundle.yaml
kubectl get pod

(Verifying access from a client still needs further investigation.)

Prometheus Instances

cd rook/deploy/examples/monitoring
kubectl create -f service-monitor.yaml
kubectl create -f prometheus.yaml
kubectl create -f prometheus-service.yaml

kubectl -n rook-ceph get pod prometheus-rook-prometheus-0

Prometheus Web Console

echo "http://$(kubectl -n rook-ceph -o jsonpath={.status.hostIP} get pod prometheus-rook-prometheus-0):30900"

Prometheus Alerts

kubectl create -f deploy/examples/monitoring/rbac.yaml

# Edit cluster.yaml to enable monitoring (default: enabled: false)
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
[...]
spec:
[...]
  monitoring:
    enabled: true
    rulesNamespace: "rook-ceph"
[...]

kubectl apply -f cluster.yaml

Grafana Dashboards

Ceph - Cluster: https://grafana.com/grafana/dashboards/2842
Ceph - OSD (Single): https://grafana.com/dashboards/5336
Ceph - Pools: https://grafana.com/dashboards/5342

Updates and Upgrades

When Rook is updated, the RBAC used for monitoring may be updated as well. The changes are easy to re-apply on each update or upgrade, and this should be done at the same time the other Rook common resources (e.g. common.yaml) are updated.

kubectl apply -f deploy/examples/monitoring/rbac.yaml

Uninstall

kubectl delete -f service-monitor.yaml
kubectl delete -f prometheus.yaml
kubectl delete -f prometheus-service.yaml
kubectl delete -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.40.0/bundle.yaml

