Note: the main reason for building from source is that installing the glusterfs server with yum ran into some dependency-library problems.
- Prepare 3 glusterfs servers (the official documentation also recommends at least 3, to prevent split-brain) and add the following entries to /etc/hosts on each server (if a DNS server is used, add the name resolution there instead); a quick resolution check is sketched after the entries
10.85.3.113 glusterfs-1.example.com
10.85.3.114 glusterfs-2.example.com
10.85.3.115 glusterfs-3.example.com
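To confirm that all three names resolve on every host (whether via /etc/hosts or DNS), a minimal check using the hostnames above:

# verify that each glusterfs host name resolves
for h in glusterfs-1.example.com glusterfs-2.example.com glusterfs-3.example.com; do
  getent hosts "$h" || echo "cannot resolve $h"
done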
- Install the build dependencies as described in the official document "Build and Install GlusterFS"
# yum install autoconf automake bison cmockery2-devel dos2unix flex fuse-devel glib2-devel libacl-devel libaio-devel libattr-devel libcurl-devel libibverbs-devel librdmacm-devel libtirpc-devel libtool libxml2-devel lvm2-devel make openssl-devel pkgconfig pyliblzma python-devel python-eventlet python-netifaces python-paste-deploy python-simplejson python-sphinx python-webob pyxattr readline-devel rpm-build sqlite-devel systemtap-sdt-devel tar userspace-rcu-devel
- Download userspace-rcu-master and extract it to /home/userspace-rcu-master
- Download glusterfs-xlators (master branch) and extract it to /home/glusterfs-xlators-master
- Build and install userspace-rcu
# cd userspace-rcu-master
# ./bootstrap
# ./configure
# make && make install
# ldconfig
- Download the glusterfs source and extract it to /home/glusterfs-5.7; the version used here is 5.7
- Copy the libglusterfs sources into the system include directory (they are needed when compiling gluster)
# cd /home/glusterfs-5.7
# cp -r libglusterfs/src /usr/local/include/glusterfs
- Install the uuid dependency: yum install -y libuuid-devel
- Copy glupy into the source tree (it is needed when compiling gluster)
# cp -r /home/glusterfs-xlators-master/xlators/features/glupy/ /home/glusterfs-5.7/xlators/features/
- Build and install glusterfs following the official documentation
# cd /home/glusterfs-5.7
# ./autogen.sh
# ./configure --without-libtirpc
# ./configure --enable-gnfs
# make
# make install
The following error may occur during make:
../../../../contrib/userspace-rcu/rculist-extra.h:33:6: error: redefinition of 'cds_list_add_tail_rcu'
 void cds_list_add_tail_rcu(struct cds_list_head *newp,
      ^
In file included from glusterd-rcu.h:15:0,
                 from glusterd-sm.h:26,
                 from glusterd.h:28,
                 from glusterd.c:19:
/usr/local/include/urcu/rculist.h:44:6: note: previous definition of 'cds_list_add_tail_rcu' was here
 void cds_list_add_tail_rcu(struct cds_list_head *newp,
Wrap the cds_list_add_tail_rcu function in the following files with a conditional-compilation guard:
/usr/local/include/urcu/rculist.h
/home/glusterfs-5.7/contrib/userspace-rcu/rculist-extra.h
#ifndef CDS_LIST_ADD_TAIL_CRU
#define CDS_LIST_ADD_TAIL_CRU
static inline
void cds_list_add_tail_rcu(struct cds_list_head *newp,
		struct cds_list_head *head)
{
	newp->next = head;
	newp->prev = head->prev;
	rcu_assign_pointer(head->prev->next, newp);
	head->prev = newp;
}
#endif
- Run the following commands on all 3 servers to set up the brick on devicemapper/LVM; a short verification sketch follows the commands
mkdir -p /data/brick
chown -R root:65534 /data
chmod -R 775 /data
vgcreate vgglusterfs /dev/vdb
lvcreate -L 499G -n glusterfs vgglusterfs
mkfs.xfs /dev/mapper/vgglusterfs-glusterfs
# add to /etc/fstab: /dev/mapper/vgglusterfs-glusterfs /data/brick xfs defaults 0 0
mount -a
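Before moving on, the brick layout and mount can be checked as follows (a small verification sketch; none of these commands are required by the install itself):

lsblk /dev/vdb          # the glusterfs LV should appear under the vgglusterfs VG
df -hT /data/brick      # the brick should be mounted as xfs
xfs_info /data/brick    # optional: inspect the XFS geometry of the brick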
- Run the following commands on all 3 servers to enable glusterd at boot and start it
systemctl enable glusterd.service
Use the following command to find the path of the generated glusterd.service; on this system it is /usr/local/lib/systemd/system/glusterd.service
systemctl cat glusterd.service
The content of /usr/local/lib/systemd/system/glusterd.service is as follows:
[Unit]
Description=GlusterFS, a clustered file-system server
Requires=rpcbind.service
After=network.target rpcbind.service
Before=network-online.target

[Service]
Type=forking
PIDFile=/var/run/glusterd.pid
LimitNOFILE=1048576
Environment="LOG_LEVEL=INFO"
ExecStart=/usr/local/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS
KillMode=process
SuccessExitStatus=15

[Install]
WantedBy=multi-user.target
Run the following commands to start it:
systemctl daemon-reload
systemctl start glusterd.service
If you hit the error "Failed to restart glusterd.service: Unit not found", rpcbind.service is probably not installed; run yum install -y rpcbind to fix it.
If you hit "Failed to start GlusterFS, a clustered file-system server", try the following (a small troubleshooting sketch follows this list):
- A glusterfs service may already be running on this host; find the process and kill it
- glusterd may be misconfigured: locate the glusterd.vol file and set "working-directory" to the directory containing glusterd.vol. For example, if glusterd.vol on this host is /usr/local/etc/glusterfs/glusterd.vol, set the working-directory line to "working-directory /usr/local/etc/glusterfs"
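A minimal troubleshooting sketch for the two cases above (the glusterd.vol path is the one used on this system; adjust it if yours differs):

pgrep -af gluster                                              # is a stale glusterd/glusterfsd already running?
pkill glusterd                                                 # if so, stop it before systemctl start
grep working-directory /usr/local/etc/glusterfs/glusterd.vol   # should point at /usr/local/etc/glusterfs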
- On master1, run the following commands to probe and check the peers
# gluster peer probe glusterfs-1.example.com
# gluster peer probe glusterfs-2.example.com
# gluster peer probe glusterfs-3.example.com
# gluster peer status
- On master1, create and start a replica-3 volume named volume-5G
gluster volume create volume-5G replica 3 glusterfs-1.example.com:/data/brick/glusterfs1 glusterfs-2.example.com:/data/brick/glusterfs2 glusterfs-3.example.com:/data/brick/glusterfs3
gluster volume start volume-5G
Check the volume with the following commands:
gluster volume status
gluster volume info
The Replicated mode is recommended in container environments; see the official documentation for more details, and note that some volume types have been deprecated.
- Install the glusterfs client on the client machine
# yum install glusterfs-client -y
- Mount the server-side volume on the client. The first command below uses mount -t glusterfs (the FUSE mount), the second invokes the glusterfs client directly; the effect is the same
mount -t glusterfs glusterfs-1.example.com:/volume-5G /data/mounttest/
glusterfs --volfile-id=volume-5G --volfile-server=glusterfs-1.example.com /data/glusterfsclient
- Note that with multiple glusterfs servers the client does not need to load-balance across them when mounting; mounting the directory from any single glusterfs server is enough. In this example the client is 10.85.3.111 and the servers are 10.85.3.113~10.85.3.115. The client only mounted the volume from 10.85.3.113, yet its TCP connections look as follows (irrelevant entries removed), i.e. the client actually establishes connections to all 3 servers, so one server going down, or the removal of any one server's brick, does not affect the data stored on the other two. The server named at mount time is only used to bootstrap the topology. See StackOverflow for more details. An /etc/fstab entry for a persistent mount is sketched after the output below.
# netstat -ntp
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 10.85.3.111:1015        10.85.3.115:49152       ESTABLISHED 10679/glusterfs
tcp        0      0 10.85.3.111:1017        10.85.3.113:49152       ESTABLISHED 10679/glusterfs
tcp        0      0 10.85.3.111:1023        10.85.3.113:24007       ESTABLISHED 10679/glusterfs
tcp        0      0 10.85.3.111:1016        10.85.3.114:49152       ESTABLISHED 10679/glusterfs
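For mounts that should survive a reboot, the same volume can be listed in /etc/fstab; the backup-volfile-servers mount option (the exact option name may vary between glusterfs versions) lets the client fetch the volfile from the other nodes if the first one is down. A sketch, using the mount point from above:

# /etc/fstab
glusterfs-1.example.com:/volume-5G  /data/mounttest  glusterfs  defaults,_netdev,backup-volfile-servers=glusterfs-2.example.com:glusterfs-3.example.com  0 0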
TIPS:
- See the official documentation and the man page for the gluster command line
- After running "gluster volume delete $volume_name", re-adding the brick may fail with an error such as "Error: /data/brick/glusterfs1 is already part of a volume"; clean up the environment as follows
# rm -rf /data/brick/glusterfs1/*
# setfattr -x trusted.glusterfs.volume-id /data/brick/glusterfs1/
- Commonly used volume tuning commands (a read-back check is sketched after the block):
# enable the quota feature on a volume
$ gluster volume quota k8s-volume enable
# set a usage limit on the volume
$ gluster volume quota k8s-volume limit-usage / 1TB
# set the cache size (default 32MB)
$ gluster volume set k8s-volume performance.cache-size 4GB
# set the number of io threads (too large a value can crash the process)
$ gluster volume set k8s-volume performance.io-thread-count 16
# set the network ping timeout (default 42s)
$ gluster volume set k8s-volume network.ping-timeout 10
# set the write-behind window size (default 1MB)
$ gluster volume set k8s-volume performance.write-behind-window-size 1024MB
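To confirm that the options took effect, the current values and quota limits can be read back (a sketch; gluster volume get is available in recent releases such as the 5.x series used here):

gluster volume get k8s-volume performance.cache-size   # read back a single option
gluster volume quota k8s-volume list                   # show the configured quota limits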
- Removing and adding gluster bricks: remove-brick requires the replica count to be specified and cannot remove all replicas; use gluster volume delete if the whole volume should go. A concrete example follows the syntax lines below.
volume remove-brick <VOLNAME> [replica <COUNT>] <BRICK> ... <start|stop|status|commit|force>
volume add-brick <VOLNAME> [<stripe|replica> <COUNT> [arbiter <COUNT>]] <NEW-BRICK> ... [force]
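For example, dropping the brick on glusterfs-3 from the replica-3 volume created earlier and then adding it back might look like this (a sketch only; re-adding a brick may require the xattr cleanup shown in the tip above):

# shrink from replica 3 to replica 2 by removing one brick
gluster volume remove-brick volume-5G replica 2 glusterfs-3.example.com:/data/brick/glusterfs3 force
# grow back to replica 3 by adding the brick again
gluster volume add-brick volume-5G replica 3 glusterfs-3.example.com:/data/brick/glusterfs3 force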
- Disk benchmarking
Download fio, extract it and run "make && make install". You may also need to run "yum install libaio-devel". An example run is sketched below.
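A typical run against the mounted volume might look like the following (a sketch; the directory is the client mount point used earlier and the job parameters are only examples):

# 4k random-write test against the gluster mount
fio --name=randwrite --directory=/data/mounttest --ioengine=libaio --direct=1 \
    --rw=randwrite --bs=4k --size=1G --numjobs=4 --runtime=60 --time_based --group_reporting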
- Monitoring glusterfs
On one of the servers, download gluster_exporter, build it with "make build" and start it directly with ./gluster_exporter. (To keep per-node metrics available it is best to also install node_exporter on every server.) A simple exporter can be started as follows; the default listen port is 9189.
nohup gluster_exporter --web.telemetry-path="/usr/local/sbin/gluster" 2>&1 &
Add a new job in Prometheus to monitor glusterfs:
- job_name: glusterfs1
  static_configs:
    - targets: ['10.85.3.113:9189']
      labels:
        alias: glusterfs1
- The default glusterfs log path is /var/log/glusterfs/glusterfs.log
- The glusterfs client must use the domain name when mounting the volume
- To unmount a directory on the client, kill the corresponding mount process; find it with ps -ef | grep glusterfs
- When using glusterfs it is recommended to set the group of the data directory to 65534 and its permissions to 775, and to give the processes that use the mount a supplemental group of 65534
- Use gluster volume status [all | <VOLNAME> [nfs|shd|<BRICK>|quotad|tierd]] [detail|clients|mem|inode|fd|callpool|tasks|client-list] to inspect a volume; for example, gluster volume status <VOLNAME> clients shows the clients of that volume
Kubernetes currently needs heketi in order to use a glusterfs StorageClass; the deployment of heketi is described below.
Download the heketi tarball heketi-vx.x.x.linux.amd64.tar.gz from the official site; it can be installed following the official documentation. Deployment on a VM goes as follows:
- Create the directories:
mkdir /etc/heketi/
mkdir /var/lib/heketi/
- Copy heketi.json from the tarball to /etc/heketi/ and edit it so that ssh is used to communicate with glusterfs; the main changes are shown below (adjust the port if needed)
"executor": "ssh", "_sshexec_comment": "SSH username and private key file information", "sshexec": { "keyfile": "/etc/heketi/heketi_key", "user": "root", "port": "22", "fstab": "Optional: Specify fstab file on node. Default is /etc/fstab", "pv_data_alignment": "Optional: Specify PV data alignment size. Default is 256K", "vg_physicalextentsize": "Optional: Specify VG physical extent size. Default is 4MB", "lv_chunksize": "Optional: Specify LV chunksize. Default is 256K", "backup_lvm_metadata": false, "_debug_umount_failures": "Optional: boolean to capture more details in case brick unmounting fails", "debug_umount_failures": true },
- Set up SSH authorization. Generate the key as follows:
ssh-keygen -t rsa -q -f /etc/heketi/heketi_key -N ''
chmod 775 /etc/heketi/heketi_key
Add the heketi key to the authorized keys on all glusterfs servers (a loop using ssh-copy-id is sketched after the command)
cat heketi_key.pub >> /root/.ssh/authorized_keys
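Instead of appending the key by hand on every server, ssh-copy-id can push it in one loop (a sketch, assuming root ssh access to the hostnames configured earlier):

for h in glusterfs-1.example.com glusterfs-2.example.com glusterfs-3.example.com; do
  ssh-copy-id -i /etc/heketi/heketi_key.pub root@"$h"   # appends the key to authorized_keys on $h
done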
- Create the systemd unit file /usr/lib/systemd/system/heketi.service with the following content
[Unit]
Description=Heketi Server

[Service]
Type=simple
WorkingDirectory=/var/lib/heketi
EnvironmentFile=-/etc/heketi/heketi.json
User=heketi
ExecStart=/usr/bin/heketi --config=/etc/heketi/heketi.json
Restart=on-failure
StandardOutput=syslog
StandardError=syslog

[Install]
WantedBy=multi-user.target
and create a symlink to it in /etc/systemd/system/multi-user.target.wants
ln -s /usr/lib/systemd/system/heketi.service heketi.service
Reload systemd, then enable and start heketi
systemctl daemon-reload
systemctl enable heketi
systemctl start heketi
If startup fails with an error like the one below, delete the offending line from /etc/heketi/heketi.json so the default value is used; for the error shown, delete the "xfs_sw" line. (It is advisable to either configure or delete all of the "Optional" lines.)
ERROR: Unable to parse configuration: json: cannot unmarshal string into Go struct field KubeConfig.xfs_sw of type int
- Run the following commands to load the topology file
$ export HEKETI_CLI_SERVER=http://<heketi server and port>
$ heketi-cli topology load --json=<topology>
The topology file is shown below. Note that heketi can only use raw block devices; it cannot work on top of a filesystem or an existing LVM setup.
{ "clusters": [ { "nodes": [ { "node": { "hostnames": { "manage": [ "10.85.3.113" ], "storage": [ "10.85.3.113" ] }, "zone": 1 }, "devices": [ { "name": "/dev/vdb", "destroydata": false } ] }, { "node": { "hostnames": { "manage": [ "10.85.3.114" ], "storage": [ "10.85.3.114" ] }, "zone": 1 }, "devices": [ { "name": "/dev/vdb", "destroydata": false } ] }, { "node": { "hostnames": { "manage": [ "10.85.3.115" ], "storage": [ "10.85.3.115" ] }, "zone": 2 }, "devices": [ { "name": "/dev/vdb", "destroydata": false } ] } ] } ] }
- Create a 10GB volume with the following command
heketi-cli volume create --size=10 --name="test"
可以看到新創建的volume
# heketi-cli volume list
Id:0a0e0fd1f81973044bb0365e44c08648 Cluster:b988c638edcef9b5b44e0e42ccef30b7 Name:test
Looking at the glusterfs servers shows that heketi also uses devicemapper under the hood
# df -h
Filesystem                                                                               Size  Used Avail Use% Mounted on
...
/dev/mapper/vg_c17870f6f2870ff1c8001ad19b3b9d5b-brick_e3e02d5357b540ce340ec5328fe8b0ec   10G   33M   10G   1% /var/lib/heketi/mounts/vg_c17870f6f2870ff1c8001ad19b3b9d5b/brick_e3e02d5357b540ce340ec5328fe8b0ec
Of course, the volume can also be seen with the gluster command at this point
# gluster volume list
test
The heketi command line is generally used as follows
export HEKETI_CLI_SERVER=http://<heketi server and port>
heketi-cli cluster list
heketi-cli cluster info $ID
heketi-cli node list
heketi-cli node info $ID
heketi-cli topology info
heketi-cli volume create --size=10    // 10G
heketi-cli volume delete $ID
heketi-cli device disable $ID
heketi-cli device remove $ID
heketi-cli device delete $ID
Note: once volumes are managed by heketi, use only heketi; do not mix direct glusterfs administration with heketi.
- OpenShift can use heketi for dynamic PVC provisioning as follows
First create the StorageClass
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: glusterfs
  namespace: demo
provisioner: kubernetes.io/glusterfs
parameters:
  resturl: "http://10.86.3.113:18080"
  restuser: "root"
  restauthenabled: "false"
Then create the PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: glusterfs-test
  namespace: demo
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 30Gi
  storageClassName: glusterfs
Note that if the heketi topology file uses domain names, those names must be resolvable by kubernetes, otherwise provisioning will fail. A quick way to check the result is sketched below.
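The claim and the dynamically provisioned volume can be inspected with oc or kubectl (a sketch, using the names from the PVC above):

kubectl get pvc glusterfs-test -n demo   # should reach STATUS Bound once provisioning succeeds
kubectl get pv                           # the dynamically created glusterfs PV shows up here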
TIPS:
- Before deleting a node in heketi, remove that node's devices and volumes first
- In OpenShift 3.6, configuring a PVC may fail with the following error with no error detail (the content is nil); this is a bug in 3.6 and upgrading is recommended
Failed to provision volume with StorageClass "glusterfs": glusterfs: create volume err: failed to create endpoint/service <nil>
- If deleting heketi-managed resources runs into problems, the environment can be cleaned up manually as follows (an LVM cleanup sketch follows the commands)
systemctl stop heketi
# run the following on every glusterfs server
umount /var/lib/heketi/mounts/vg_xxxxxxx/brick_xxxx
rm -rf /var/lib/heketi/*
# then delete the LVs and VGs that heketi created
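The last step, removing the LVs and VGs that heketi created, might look like this (a sketch; heketi names its volume groups vg_<id>, and the placeholder below must be replaced with the real names shown by vgs):

vgs                      # list volume groups; heketi-created ones are named vg_<id>
vgremove -f vg_xxxxxxx   # remove the VG together with the LVs inside it
pvremove /dev/vdb        # optional: clear the LVM label so the disk can be reused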
References:
https://www.cnblogs.com/jicki/p/5801712.html
https://www.ibm.com/developerworks/cn/opensource/os-cn-glusterfs-docker-volume/index.html
https://jimmysong.io/kubernetes-handbook/practice/using-glusterfs-for-persistent-storage.html
https://pdf.us/2019/03/15/3020.html
https://github.com/psyhomb/heketi
https://github.com/heketi/heketi/blob/master/docs/admin/volume.md