Learning GlusterFS (7)


Initial environment:

OS: CentOS 7, kernel 3.10.0-514.26.2.el7.x86_64

Number of machines: two

Disks: at least two per node; one is the system disk, the other is reserved for GlusterFS

Naming: node1, node2

IP plan: 192.168.238.129 node1

192.168.238.130 node2

 

 

1. Disk setup (after installing the OS, partition the extra disk; required on both nodes)

 

[root@node1 ~]# fdisk /dev/sdb    # partition the disk
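For reference, the interactive fdisk session to create a single primary partition typically looks like the sketch below (answers on the left; accept the defaults for the sector prompts). This mirrors the n/p/1/w sequence shown in the CentOS 6 walkthrough later in this article.

[root@node1 ~]# fdisk /dev/sdb
n    # new partition
p    # primary
1    # partition number 1
     # press Enter twice to accept the default first and last sectors
w    # write the partition table and exit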

 

2. Create the filesystem and mount the disk

 

(on both nodes) Note: these examples are going to assume the brick is going to reside on /dev/sdb1. (Required on both nodes.)

[root@node1 ~]# mkfs.xfs -i size=512 /dev/sdb1

[root@node1 ~]# mkdir -p /export/sdb1 && mount /dev/sdb1 /export/sdb1    # create the mount directory and mount the disk

[root@node1 ~]# echo "/dev/sdb1 /export/sdb1 xfs defaults 0 0" >> /etc/fstab    # mount at boot

[root@node1 ~]# mount -a && mount    # test the fstab entry and review the mounts

3. Install GlusterFS

[root@node1 ~]# yum install centos-release-gluster    # install the repository

[root@node1 ~]# yum install glusterfs-server    # install GlusterFS

4. Remember to turn off the firewall

[root@node1 ~]# systemctl stop firewalld    # stops it now, but it comes back after a reboot

[root@node1 ~]# systemctl disable firewalld    # disable it permanently
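The CentOS 6 walkthrough later in this article also disables SELinux; if you want to do the same on these CentOS 7 nodes, a minimal sketch (fully effective after a reboot):

[root@node1 ~]# setenforce 0    # switch to permissive mode immediately
[root@node1 ~]# sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config    # make it permanent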

 

5. Start GlusterFS

[root@node1 ~]# systemctl start glusterd    # start the gluster daemon

[root@node1 ~]# systemctl enable glusterd    # start at boot

 

6. Build the cluster

 

Replace nodename with the hostname of the other server in the cluster, or its IP address if you don't have DNS or /etc/hosts entries.

Set this up in /etc/hosts or in DNS (ideally both). There is no DNS server here, so the names are only bound in /etc/hosts; don't forget each machine's own name.

 

For Example:

 

[root@node1 gluster]#cat /etc/hosts

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4

::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.238.129 node1

192.168.238.130 node2

192.168.238.133 node3

192.168.238.132 node4

192.168.238.134 node5

# Run the peer probe command on node1

[root@node1 ~]# gluster peer probe node2    # if the probe fails, check that the host resolves and that the firewall is off

# Check the node that was just added

[root@node1 ~]# gluster peer status

[root@node1 ~]#gluster peer probe node2

peer probe: success.

[root@node1 ~]#gluster peer status

Number of Peers: 1

Configure your Gluster volume

Removing a node (run this command only if you want to remove a node):

[root@node1 ~]# gluster peer detach node2

Run on both hosts (the mkdir; the volume create itself only needs to be run on one of them):

[root@node1 ~]# mkdir -p /export/sdb1/brick

[root@node1 ~]# gluster volume create testvol replica 2 transport tcp node1:/export/sdb1/brick node2:/export/sdb1/brick

volume create: testvol: success: please start the volume to access data

[root@node1 log]#gluster peer status

Number of Peers: 1

Hostname: node2

Uuid: 61fe987a-99ff-419d-8018-90603ea16fe7

State: Peer in Cluster (Connected)

[root@node1 log]# gluster volume info

Volume Name: testvol

Type: Replicate

Volume ID: bc637d83-0273-4373-9d00-d794a3a3d2e7

Status: Started

Snapshot Count: 0

Number of Bricks: 1 x 2 = 2

Transport-type: tcp

Bricks:

Brick1: node1:/export/sdb1/brick

Brick2: node2:/export/sdb1/brick

Options Reconfigured:

transport.address-family: inet

nfs.disable: on

[root@node1 ~]# gluster volume start testvol    # start the volume

volume start: testvol: success

[root@node1 gluster]# gluster volume info    # view the volume information

 

Volume Name: testvol

Type: Replicate

Volume ID: bc637d83-0273-4373-9d00-d794a3a3d2e7

Status: Started

Snapshot Count: 0

Number of Bricks: 1 x 2 = 2

Transport-type: tcp

Bricks:

Brick1: node1:/export/sdb1/brick

Brick2: node2:/export/sdb1/brick

Options Reconfigured:

transport.address-family: inet

nfs.disable: on

[root@node2 ~]#gluster volume status

 

Status of volume: testvol

Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick node1:/export/sdb1/brick              49152     0          Y       10774
Brick node2:/export/sdb1/brick              N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        Y       998
Self-heal Daemon on node1                   N/A       N/A        Y       10794

Task Status of Volume testvol

------------------------------------------------------------------------------

There are no active volume tasks

Note: the setup above uses a single data disk per server. With multiple disks per server, each extra disk is formatted and mounted separately, as follows:

mkfs.xfs -i size=512 /dev/sdc1

mkdir -p /export/sdc1

mount /dev/sdc1 /export/sdc1

echo "/dev/sdc1 /export/sdc1 xfs defaults 0 0" >> /etc/fstab

mkdir -p /export/sdc1/brick    # run on all nodes

Run on node1:

Command for one disk per host (the layout we configured above):

# gluster volume create testvol replica 2 transport tcp node1:/export/sdb1/brick node2:/export/sdb1/brick

Command for two disks per host:

# gluster volume create testvol replica 2 transport tcp node1:/export/sdb1/brick node2:/export/sdb1/brick node1:/export/sdc1/brick node2:/export/sdc1/brick

And so on for more disks.
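For instance, with three data disks per host (sdb1, sdc1 and sdd1, each prepared as above), the same pattern simply continues; a sketch, keeping each replica pair spread across the two nodes:

# gluster volume create testvol replica 2 transport tcp node1:/export/sdb1/brick node2:/export/sdb1/brick node1:/export/sdc1/brick node2:/export/sdc1/brick node1:/export/sdd1/brick node2:/export/sdd1/brick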

Mount test

[root@node1 ~]# mkdir /mnt/gluster/

[root@node1 ~]# mount -t glusterfs node1:/testvol /mnt/gluster/

 

Expanding Volumes (adding capacity)

To expand a volume:

 

Prerequisites: the cluster starts with node1 and node2; node3 and node4 are added later and are prepared exactly as above. Hosts must be added a pair at a time for replica 2. Remember to update /etc/hosts on every host in the cluster with the newly added nodes.

 

1) On the first server in the cluster, probe the server to which you want to add the new brick using the following command:

# gluster peer probe <server>    # the commands run here:

[root@node1 ~]#gluster peer probe node3

peer probe: success.

[root@node1 ~]#gluster peer probe node4

peer probe: success.

2) Add the brick using the following command:

# gluster volume add-brick <volname> <brick>

[root@node1 ~]# gluster volume add-brick testvol node3:/export/sdb1/brick node4:/export/sdb1/brick

volume add-brick: success    # it worked

3) Check the volume information using the following command:

[root@node1 ~]#gluster volume info

Volume Name: testvol

Type: Distributed-Replicate

Volume ID: 09363405-1c7c-4eb1-b815-b97822c1f274

Status: Started

Snapshot Count: 0

Number of Bricks: 2 x 2 = 4

Transport-type: tcp

Bricks:

Brick1: node1:/export/sdb1/brick

Brick2: node2:/export/sdb1/brick

Brick3: node3:/export/sdb1/brick

Brick4: node4:/export/sdb1/brick

Options Reconfigured:

transport.address-family: inet

nfs.disable: on

Rebalance the volume to ensure that all files are distributed to the new bricks.

You can use the rebalance command as described in Rebalancing Volumes.

[root@node1 ~]# gluster volume rebalance testvol start

volume rebalance: testvol: success: Rebalance on testvol has been started successfully. Use rebalance status command to check status of the rebalance process.

ID: c9d052e8-2b6c-40b0-8d77-52290bcdb61

To shrink a volume (shrinking online)

The operation used in this example:

[root@node1 gluster]# gluster volume remove-brick testvol node3:/export/sdb1/brick node4:/export/sdb1/brick force

Removing brick(s) can result in data loss. Do you want to Continue? (y/n)y

volume remove-brick commit force: success

See the documentation excerpt below for the details.

1. Remove the brick using the following command:

# gluster volume remove-brick <volname> <brick> start

For example, to remove server2:/exp2:

# gluster volume remove-brick test-volume server2:/exp2 force

Removing brick(s) can result in data loss. Do you want to Continue? (y/n)

2. Enter "y" to confirm the operation. The command displays the following message indicating that the remove brick operation is successfully started:

Remove Brick successful

3. (Optional) View the status of the remove brick operation using the following command:

# gluster volume remove-brick <volname> <brick> status

For example, to view the status of the remove brick operation on the server2:/exp2 brick:

# gluster volume remove-brick test-volume server2:/exp2 status

Node                                   Rebalanced-files  size  scanned  status
-----------------------------------------------
617c923e-6450-4065-8e33-865e28d9428f   34                340   162      in progress

4. Check the volume information using the following command:

# gluster volume info

The command displays information similar to the following:

# gluster volume info

Volume Name: test-volume

Type: Distribute

Status: Started

Number of Bricks: 3

Bricks:

Brick1: server1:/exp1

Brick3: server3:/exp3

Brick4: server4:/exp4

5. Rebalance the volume to ensure that all files are distributed to the new bricks.

You can use the rebalance command as described in Rebalancing Volumes.

 

Handling a failed disk on a host

Method 1: when the host still has an unused spare disk

The failure:

 

[root@node2 ~]#gluster volume status

Status of volume: testvol

Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick node1:/export/sdb1/brick              49152     0          Y       2684
Brick node2:/export/sdb1/brick              N/A       N/A        N       N/A     # sdb1 shows as offline: this is the failure
Brick node1:/export/sdc1/brick              49153     0          Y       2703
Brick node2:/export/sdc1/brick              49153     0          Y       2704
Brick node3:/export/sdb1/brick              49152     0          Y       2197
Brick node4:/export/sdb1/brick              49152     0          Y       2207
Brick node3:/export/sdc1/brick              49153     0          Y       2216
Brick node4:/export/sdc1/brick              49153     0          Y       2226
Self-heal Daemon on localhost               N/A       N/A        Y       1393
Self-heal Daemon on node1                   N/A       N/A        Y       3090
Self-heal Daemon on node4                   N/A       N/A        Y       2246
Self-heal Daemon on node3                   N/A       N/A        Y       2236

Task Status of Volume testvol
------------------------------------------------------------------------------
Task: Rebalance
ID: 8b3a04a0-0449-4424-a458-29f602571ea2
Status: completed

 

From the output above, Brick node2:/export/sdb1/brick is offline: something is wrong with it.

Solution:

1. Create a new data directory: format the spare disk and mount it (run on the faulty host)

[root@node2 ~]# mkfs.xfs -i size=512 /dev/sdd1    # format

[root@node2 ~]# mkdir -p /export/sdd1/brick    # create the directories

[root@node2 ~]# mount /dev/sdd1 /export/sdd1    # mount

[root@node2 ~]# echo "/dev/sdd1 /export/sdd1 xfs defaults 0 0" >> /etc/fstab    # mount at boot

2. Query the extended attributes of the failed brick's directory (run on a healthy host)

[root@node1 brick]#getfattr -d -m. -e hex /export/sdb1/brick/

getfattr: Removing leading '/' from absolute path names

# file: export/sdb1/brick/

security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000

trusted.afr.dirty=0x000000000000000000000000

trusted.afr.testvol-client-1=0x000000000000000000000000

trusted.gfid=0x00000000000000000000000000000001

trusted.glusterfs.dht=0x0000000100000000000000003ffffffe

trusted.glusterfs.dht.commithash=0x3000

trusted.glusterfs.volume-id=0xe107222fa1134606a9a7fcb16e4c0709

3. Mount the volume and trigger self-heal (run on the faulty host)

[root@node2 ~]# mount -t glusterfs node2:/testvol /mnt    # any unused mount point will do; node2:/testvol is the volume created earlier

[root@node2 ~]# mkdir /mnt/test    # create a directory that does not yet exist in the volume (adjust the path to your mount point)

[root@node2 ~]# rmdir /mnt/test    # delete the directory again

[root@node2 ~]# setfattr -n trusted.non-existent-key -v abc /mnt    # set an extended attribute to trigger self-heal

[root@node2 ~]# setfattr -x trusted.non-existent-key /mnt    # remove the extended attribute to trigger self-heal

4. Check whether heal is pending for the failed brick

Run on the healthy host:

[root@node1 gluster]# getfattr -d -m. -e hex /export/sdb1/brick/    # /export/sdb1/brick/ is the brick path you created

getfattr: Removing leading '/' from absolute path names

# file: export/sdb1/brick/

security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000

trusted.afr.dirty=0x000000000000000000000000

trusted.afr.testvol-client-1=0x000000000000000400000004    <<---- xattrs are marked from the source brick node1:/export/sdb1/brick ---->>

trusted.gfid=0x00000000000000000000000000000001

trusted.glusterfs.dht=0x0000000100000000000000003ffffffe

trusted.glusterfs.dht.commithash=0x3334343336363233303800

trusted.glusterfs.volume-id=0xe107222fa1134606a9a7fcb16e4c0709

Run on the faulty host (a healthy host works too):

[root@node2 gluster]# gluster volume heal testvol info    # view the heal info for testvol

Brick node1:/export/sdb1/brick

/

Status: Connected

Number of entries: 1

Brick node2:/export/sdb1/brick

Status: Transport endpoint is not connected

Number of entries: -    # the status shows the transport endpoint is not connected

Brick node1:/export/sdc1/brick

Status: Connected

Number of entries: 0

Brick node2:/export/sdc1/brick

Status: Connected

Number of entries: 0

Brick node3:/export/sdb1/brick

Status: Connected

Number of entries: 0

Brick node4:/export/sdb1/brick

Status: Connected

Number of entries: 0

Brick node3:/export/sdc1/brick

Status: Connected

Number of entries: 0

Brick node4:/export/sdc1/brick

Status: Connected

Number of entries: 0

5. Complete the repair with a forced commit

Run on the faulty host:

[root@node2 ~]# gluster volume replace-brick testvol node2:/export/sdb1/brick node2:/export/sdd1/brick commit force

volume replace-brick: success: replace-brick commit force operation successful    # success

[root@node2 ~]#gluster volume status

Status of volume: testvol

Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick node1:/export/sdb1/brick              49152     0          Y       2684
Brick node2:/export/sdd1/brick              49154     0          Y       10298   # the online brick is now sdd1; it has replaced sdb1
Brick node1:/export/sdc1/brick              49153     0          Y       2703
Brick node2:/export/sdc1/brick              49153     0          Y       2704
Brick node3:/export/sdb1/brick              49152     0          Y       2197
Brick node4:/export/sdb1/brick              49152     0          Y       2207
Brick node3:/export/sdc1/brick              49153     0          Y       2216
Brick node4:/export/sdc1/brick              49153     0          Y       2226
Self-heal Daemon on localhost               N/A       N/A        Y       10307
Self-heal Daemon on node3                   N/A       N/A        Y       9728
Self-heal Daemon on node1                   N/A       N/A        Y       3284
Self-heal Daemon on node4                   N/A       N/A        Y       9736

Task Status of Volume testvol
------------------------------------------------------------------------------
Task: Rebalance
ID: 8b3a04a0-0449-4424-a458-29f602571ea2
Status: not started

Run on the healthy host:

[root@node1 gluster]# getfattr -d -m. -e hex /export/sdb1/brick/

getfattr: Removing leading '/' from absolute path names

# file: export/sdb1/brick/

security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000

trusted.afr.dirty=0x000000000000000000000000

trusted.afr.testvol-client-1=0x000000000000000000000000    <<---- Pending changelogs are cleared.

trusted.gfid=0x00000000000000000000000000000001

trusted.glusterfs.dht=0x0000000100000000000000003ffffffe

trusted.glusterfs.dht.commithash=0x3334343336363233303800

trusted.glusterfs.volume-id=0xe107222fa1134606a9a7fcb16e4c0709

[root@node2 ~]#gluster volume heal testvol info

Brick node1:/export/sdb1/brick

Status: Connected

Number of entries: 0

Brick node2:/export/sdd1/brick

Status: Connected

Number of entries: 0

Brick node1:/export/sdc1/brick

Status: Connected

Number of entries: 0

Brick node2:/export/sdc1/brick

Status: Connected

Number of entries: 0

Brick node3:/export/sdb1/brick

Status: Connected

Number of entries: 0

Brick node4:/export/sdb1/brick

Status: Connected

Number of entries: 0

Brick node3:/export/sdc1/brick

Status: Connected

Number of entries: 0

Brick node4:/export/sdc1/brick

Status: Connected

Number of entries: 0

Alternative (the procedure above follows the official documentation; the approach below also works):

When a disk fails you need to swap a replacement in. If the cluster happens to have a spare disk already reserved in the same host, replace the failed brick with the spare; after mounting the disk as above (mount point /export/sdd1), a single command is enough:

[root@node2 ~]# gluster volume replace-brick testvol node2:/export/sdb1/brick node2:/export/sdd1/brick commit force    # the first brick is the failed one, the second is the replacement

Method 2: replace across hosts

Assume node2's sdb1 has a problem.

Prerequisite: add a new host node5 (two disks: one system disk, one data disk). Disk formatting, mounting, gluster installation and the rest of the preparation on node5 are the same as above. Remember to update /etc/hosts on every host in the cluster with the new node.

Add node5 to the trusted pool:

[root@node1 brick]#gluster peer probe node5

peer probe: success.

Mount the disk:

[root@node5 ~]#mkdir -p /export/sdb1 && mount /dev/sdb1 /export/sdb1

[root@node5 ~]#echo "/dev/sdb1 /export/sdb1 xfs defaults 0 0">> /etc/fstab

[root@node5 ~]#mount -a && mount

Run the following command:

[root@node5 ~]# gluster volume replace-brick testvol node2:/export/sdb1/brick node5:/export/sdb1/brick commit force

volume replace-brick: success: replace-brick commit force operation successful

 

After the replacement the volume can keep running as-is; or, once node2's sdb1 disk has been physically replaced, you can move the data back with the following command:

[root@node2 ~]# gluster volume replace-brick testvol node5:/export/sdb1/brick node2:/export/sdb1/brick commit force

volume replace-brick: success: replace-brick commit force operation successful

Status before the replacement:

[root@node1 brick]# gluster volume status

Status of volume: testvol

Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick node1:/export/sdb1/brick              49152     0          Y       2085
Brick node5:/export/sdb1/brick              49152     0          Y       18229
Brick node1:/export/sdc1/brick              49153     0          Y       2076
Brick node2:/export/sdc1/brick              49153     0          Y       2131
Brick node3:/export/sdb1/brick              49152     0          Y       2197
Brick node4:/export/sdb1/brick              49152     0          Y       2207
Brick node3:/export/sdc1/brick              49153     0          Y       2216
Brick node4:/export/sdc1/brick              49153     0          Y       2226
Self-heal Daemon on localhost               N/A       N/A        Y       10565
Self-heal Daemon on node2                   N/A       N/A        Y       2265
Self-heal Daemon on node3                   N/A       N/A        Y       10416
Self-heal Daemon on node4                   N/A       N/A        Y       10400
Self-heal Daemon on node5                   N/A       N/A        Y       18238

Task Status of Volume testvol
------------------------------------------------------------------------------
Task: Rebalance
ID: 8b3a04a0-0449-4424-a458-29f602571ea2
Status: not started

Status after the replacement:

[root@node1 gluster]# gluster volume status

Status of volume: testvol

Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick node1:/export/sdb1/brick              49152     0          Y       2085
Brick node2:/export/sdb1/brick              49153     0          Y       10208   # the brick is back on node2
Brick node1:/export/sdc1/brick              49153     0          Y       2076
Brick node2:/export/sdc1/brick              49152     0          Y       3474
Brick node3:/export/sdb1/brick              49152     0          Y       2197
Brick node4:/export/sdb1/brick              49152     0          Y       2207
Brick node3:/export/sdc1/brick              49153     0          Y       2216
Brick node4:/export/sdc1/brick              49153     0          Y       2226
Self-heal Daemon on localhost               N/A       N/A        Y       10684
Self-heal Daemon on node3                   N/A       N/A        Y       10498
Self-heal Daemon on node5                   N/A       N/A        Y       10075
Self-heal Daemon on node4                   N/A       N/A        Y       10488
Self-heal Daemon on node2                   N/A       N/A        Y       10201

Task Status of Volume testvol
------------------------------------------------------------------------------
Task: Rebalance
ID: 8b3a04a0-0449-4424-a458-29f602571ea2
Status: not started

 

Data rebalancing

There are generally two rebalancing scenarios:

Fix Layout: recompute the layout; existing data stays where it is and new data is written to the new nodes.

Fix Layout and Migrate Data: recompute the layout and also migrate the data that already exists.

Note that when new nodes are added you must run a fix-layout, otherwise newly written data still goes to the old nodes.

 

1. To rebalance a volume to fix layout changes (fix layout)

Start the rebalance operation on any one of the servers using the following command:

# gluster volume rebalance <volname> fix-layout start

For example:

# gluster volume rebalance test-volume fix-layout start

Starting rebalance on volume test-volume has been successful

The commands on this setup:

[root@node1 gluster]# gluster volume rebalance testvol fix-layout start

volume rebalance: testvol: success: Rebalance on testvol has been started successfully. Use rebalance status command to check status of the rebalance process.

ID: 0ea5aa16-b349-44ca-a51b-d5fcf47e1272

[root@node1 gluster]# gluster volume rebalance testvol status

Node        status                  run time in h:m:s
--------------------------------
localhost   fix-layout completed    0:0:0
node2       fix-layout completed    0:0:0
node3       fix-layout completed    0:0:0
node4       fix-layout completed    0:0:0

volume rebalance: testvol: success

2. To rebalance a volume to fix layout and migrate the existing data (Fix Layout and Migrate Data)

Start the rebalance operation on any one of the servers using the following command:

# gluster volume rebalance <volname> start

For example:

# gluster volume rebalance test-volume start

Starting rebalancing on volume test-volume has been successful

Start the migration operation forcefully on any one of the servers using the following command:

# gluster volume rebalance <volname> start force

For example:

# gluster volume rebalance test-volume start force

Starting rebalancing on volume test-volume has been successful

Commands on this setup:

[root@node1 gluster]# gluster volume rebalance testvol start

volume rebalance: testvol: success: Rebalance on testvol has been started successfully. Use rebalance status command to check status of the rebalance process.

ID: 2a47d454-fdc3-4d95-81ac-6981577d26e9

[root@node1 gluster]# gluster volume rebalance testvol status

Node        Rebalanced-files  size    scanned  failures  skipped  status     run time in h:m:s
------------------------------------------------------------------------------------------
localhost   0                 0Bytes  37       0         1        completed  0:00:00
node2       0                 0Bytes  0        0         0        completed  0:00:00
node3       0                 0Bytes  27       0         0        completed  0:00:00
node4       0                 0Bytes  0        0         0        completed  0:00:00

volume rebalance: testvol: success

3. Stopping a rebalance operation (if needed)

You can stop the rebalance operation, as needed.

Stop the rebalance operation using the following command:

# gluster volume rebalance <volname> stop

For example:

# gluster volume rebalance test-volume stop

Node                                   Rebalanced-files  size  scanned  status
-----------------------------------------------
617c923e-6450-4065-8e33-865e28d9428f   59                590   244      stopped

Stopped rebalance process on volume test-volume

Stopping Volumes

Stop the volume using the following command:

# gluster volume stop <volname>

For example, to stop test-volume:

# gluster volume stop test-volume

Stopping volume will make its data inaccessible. Do you want to continue? (y/n)

Enter y to confirm the operation. The output of the command displays the following:

Stopping volume test-volume has been successful

Deleting Volumes

Delete the volume using the following command:

# gluster volume delete <volname>

For example, to delete test-volume:

# gluster volume delete test-volume

Deleting volume will erase all information about the volume. Do you want to continue? (y/n)

Enter y to confirm the operation. The command displays the following:

Deleting volume test-volume has been successful

Triggering Self-Heal on Replicate

In the replicate module, previously you had to manually trigger a self-heal when a brick went offline and came back online, to bring all the replicas in sync. Now the pro-active self-heal daemon runs in the background, diagnoses issues and automatically initiates self-healing every 10 minutes on the files which require healing.

You can view the list of files that need healing, the list of files which are currently/previously healed, the list of files which are in split-brain state, and you can manually trigger self-heal on the entire volume or only on the files which need healing.

·Trigger self-heal only on the files which require healing:

# gluster volume heal <volname>

For example, to trigger self-heal on files which require healing of test-volume:

# gluster volume heal test-volume

Heal operation on volume test-volume has been successful

·Trigger self-heal on all the files of a volume:

# gluster volume heal <volname> full

For example, to trigger self-heal on all the files of test-volume:

# gluster volume heal test-volume full

Heal operation on volume test-volume has been successful

·View the list of files that need healing:

# gluster volume heal <volname> info

For example, to view the list of files on test-volume that need healing:

# gluster volume heal test-volume info

Brick server1:/gfs/test-volume_0

Number of entries: 0

Brick server2:/gfs/test-volume_1

Number of entries: 101

/95.txt

/32.txt

/66.txt

/35.txt

/18.txt

/26.txt

/47.txt

/55.txt

/85.txt

...

·View the list of files that are self-healed:

# gluster volume heal <volname> info healed

For example, to view the list of files on test-volume that are self-healed:

# gluster volume heal test-volume info healed

Brick Server1:/gfs/test-volume_0

Number of entries: 0

Brick Server2:/gfs/test-volume_1

Number of entries: 69

/99.txt

/93.txt

/76.txt

/11.txt

/27.txt

/64.txt

/80.txt

/19.txt

/41.txt

/29.txt

/37.txt

/46.txt

...

·View the list of files of a particular volume on which theself-heal failed:

# gluster volume heal <volname> info failed

For example, to view the list of files of test-volume that are not self-healed:

# gluster volume heal test-volume info failed

Brick Server1:/gfs/test-volume_0

Number of entries: 0

Brick Server2:/gfs/test-volume_3

Number of entries: 72

/90.txt

/95.txt

/77.txt

/71.txt

/87.txt

/24.txt

...

·View the list of files of a particular volume which are insplit-brain state:

# gluster volume heal <volname> info split-brain

For example, to view the list of files of test-volume which are in split-brain state:

# gluster volume heal test-volume info split-brain

Brick Server1:/gfs/test-volume_2

Number of entries: 12

/83.txt

/28.txt

/69.txt

...

Brick Server2:/gfs/test-volume_3

Number of entries: 12

/83.txt

/28.txt

/69.txt

...

Troubleshooting:

1. Status shows State: Peer in Cluster (Disconnected)

[root@node1 gluster]# gluster peer status

Number of Peers: 1

Hostname: node2

Uuid: 61fe987a-99ff-419d-8018-90603ea16fe7

State: Peer in Cluster (Disconnected)

Fix: check the firewall state and the /etc/hosts file. You can also allow the traffic with firewall rules, but for performance it is better to simply turn the firewall off.
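If you do need to keep firewalld running, a minimal sketch of the ports to open (assuming the defaults seen in the volume status output above: 24007-24008 for glusterd management and one port per brick starting at 49152):

firewall-cmd --permanent --add-port=24007-24008/tcp
firewall-cmd --permanent --add-port=49152-49200/tcp    # covers the first ~50 bricks; widen the range if you have more
firewall-cmd --reload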

 

2. Error when re-creating a volume

 

[root@node1 sdb1]# gluster volume start testvol

volume start: testvol: failed: Failed to get extended attribute trusted.glusterfs.volume-id for brick dir /export/sdb1/brick. Reason : No data available    # the failure: re-creating the volume errors out

Fix: check the volume info, delete the volume, clear the brick's stale metadata, and re-create the volume.

[root@node1 sdb1]# gluster volume info

Volume Name: testvol

Type: Replicate

Volume ID: 57a60503-c5ae-4671-b213-6f2a2f913615

Status: Created

Snapshot Count: 0

Number of Bricks: 1 x 2 = 2

Transport-type: tcp

Bricks:

Brick1: node1:/export/sdb1/brick

Brick2: node2:/export/sdb1/brick

Options Reconfigured:

transport.address-family: inet

nfs.disable: on

[root@node1 sdb1]# gluster volume delete testvol

Deleting volume will erase all information about the volume. Do you want to continue? (y/n) y

volume delete: testvol: success

[root@node1 sdb1]# setfattr -x trusted.glusterfs.volume-id /export/sdb1/brick && setfattr -x trusted.gfid /export/sdb1/brick && rm -rf /export/sdb1/brick/.glusterfs

setfattr: /export/sdb1/brick: No such attribute

[root@node1 sdb1]# setfattr -x trusted.glusterfs.volume-id /export/sdb1/brick && setfattr -x trusted.gfid /export/sdb1/brick && rm -rf /export/sdb1/brick/.glusterfs

setfattr: /export/sdb1/brick: No such attribute

[root@node1 sdb1]# gluster volume create testvol replica 2 transport tcp node1:/export/sdb1/brick node2:/export/sdb1/brick

volume create: testvol: success: please start the volume to access data

[root@node1 sdb1]# gluster volume start testvol

volume start: testvol: success

[root@node1 sdb1]# gluster volume status

Status of volume: testvol

Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick node1:/export/sdb1/brick              49152     0          Y       2429
Brick node2:/export/sdb1/brick              49152     0          Y       2211
Self-heal Daemon on localhost               N/A       N/A        Y       2449
Self-heal Daemon on node2                   N/A       N/A        Y       2231

Task Status of Volume testvol

------------------------------------------------------------------------------

There are no active volume tasks

You can refer to the following two articles:

glusterfs volume create: testvol: failed: /data/brick1 or a prefix of it is already part of a volume

The following error is reported when creating a volume:

[root@gluster-node1 ~]# gluster volume create testvol 192.168.11.139:/data/brick1 192.168.11.140:/data/brick2 force
volume create: testvol: failed: /data/brick1 or a prefix of it is already part of a volume

A blog post (linked below) explains this: starting with GlusterFS 3.3 there is a check for whether the directory has already been part of a volume, which causes a lot of gluster support questions.

If you remove a brick from a volume, keep using the volume, and then re-add the former brick, the file state will have changed and can cause a series of problems, many of which lead to data loss.

If you are going to reuse a brick, make sure you know what you are doing.

The solution is:

setfattr -x trusted.glusterfs.volume-id $brick_path

setfattr -x trusted.gfid $brick_path

rm -rf $brick_path/.glusterfs

[root@gluster-node1 data]# setfattr -x trusted.glusterfs.volume-id /data/ctdb/

[root@gluster-node1 data]# setfattr -x trusted.gfid /data/ctdb/

[root@gluster-node1 data]# rm -rf /data/ctdb/.

./          ../         .glusterfs/

[root@gluster-node1 data]# rm -rf /data/ctdb/.glusterfs

[root@gluster-node1 data]# service glusterd restart

Starting glusterd:                                         [ OK ]

Don't worry if it says the attribute does not exist; as long as it doesn't exist, you are in good shape.

Finally, restart glusterd to make sure it does not "remember" the old bricks.

Parts of the translation above may be imperfect; see the original post:

https://joejulian.name/blog/glusterfs-path-or-a-prefix-of-it-is-already-part-of-a-volume/


GlusterFS: {path} or a prefix of it is already part of a volume

Starting with GlusterFS 3.3, one change has been the check to see if a directory (or any of its ancestors) is already part of a volume. This is causing many support questions in #gluster.

This was implemented because if you remove a brick from a volume and continue to use the volume, you can get files into a state where re-adding a former brick can cause all sorts of problems, many of which can result in data loss.

If you're going to reuse a brick, make sure you know what you're doing.

The Solution

For the directory (or any parent directories) that was formerly part of a volume, simply:

setfattr -x trusted.glusterfs.volume-id $brick_path

setfattr -x trusted.gfid $brick_path

rm -rf $brick_path/.glusterfs

Don't worry if it says that the attribute does not exist. As long as it doesn't exist, you're in good shape.

Finally, restart glusterd to ensure it's not "remembering" the old bricks.

See the bugzilla entry for more details and see Jeff Darcy's article for more information about how GlusterFS uses extended attributes.


[root@test glusterfs]# gluster volume create hello replica 3 test.144:/data0/glusterfs test.145:/data0/glusterfs test.146:/data0/glusterfs
volume create: hello: success: please start the volume to access data
[root@test glusterfs]# gluster volume delete hello
Deleting volume will erase all information about the volume. Do you want to continue? (y/n) y
volume delete: hello: success
[root@test glusterfs]# gluster volume create gfs replica 3 test.144:/data0/glusterfs test.145:/data0/glusterfs test.146:/data0/glusterfs
volume create: gfs: failed: Staging failed on test.144. Error: /data0/glusterfs is already part of a volume
Staging failed on test.146. Error: /data0/glusterfs is already part of a volume
Staging failed on test.145. Error: /data0/glusterfs is already part of a volume
[root@test glusterfs]# ssh test.144 'setfattr -x trusted.glusterfs.volume-id /data0/glusterfs/ && setfattr -x trusted.gfid /data0/glusterfs/ && rm -rf /data0/glusterfs/.glusterfs'
setfattr: /data0/glusterfs/: No such attribute
[root@test glusterfs]# ssh test.145 'setfattr -x trusted.glusterfs.volume-id /data0/glusterfs/ && setfattr -x trusted.gfid /data0/glusterfs/ && rm -rf /data0/glusterfs/.glusterfs'
setfattr: /data0/glusterfs/: No such attribute
[root@test glusterfs]# ssh test.146 'setfattr -x trusted.glusterfs.volume-id /data0/glusterfs/ && setfattr -x trusted.gfid /data0/glusterfs/ && rm -rf /data0/glusterfs/.glusterfs'
setfattr: /data0/glusterfs/: No such attribute
[root@test glusterfs]# gluster volume create gfs replica 3 test.144:/data0/glusterfs test.145:/data0/glusterfs test.146:/data0/glusterfs
volume create: gfs: success: please start the volume to access data
[root@test glusterfs]#

Also check whether node2's firewall rules are appropriate and whether the network is reachable.

3. A brick that used to be in a volume was removed and now errors out when re-added

[root@node1 brick]# gluster volume add-brick testvol node3:/export/sdb1/brick node4:/export/sdb1/brick node3:/export/sdc1/brick node4:/export/sdc1/brick

volume add-brick: failed: Pre Validation failed on node3. /export/sdb1/brick is already part of a volume

Pre Validation failed on node4. /export/sdb1/brick is already part of a volume

After clearing the stale volume attributes on those bricks (for example with the setfattr commands shown in the previous section), the same command succeeds:

[root@node1 brick]# gluster volume add-brick testvol node3:/export/sdb1/brick node4:/export/sdb1/brick node3:/export/sdc1/brick node4:/export/sdc1/brick

volume add-brick: success

Official installation guide

Step 1 – Have at least two nodes

·Fedora 22 (or later) on two nodes named "server1" and "server2"

·A working network connection

·At least two virtual disks, one for the OS installation, and one to be used to serve GlusterFS storage (sdb). This will emulate a real world deployment, where you would want to separate GlusterFS storage from the OS install.

·Note: GlusterFS stores its dynamically generated configuration files at /var/lib/glusterd. If at any point in time GlusterFS is unable to write to these files (for example, when the backing filesystem is full), it will at minimum cause erratic behavior for your system; or worse, take your system offline completely. It is advisable to create separate partitions for directories such as /var/log to ensure this does not happen (a sketch of one way to do that follows).
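One way to act on that advice is to give /var/log its own partition; a rough sketch, assuming a spare partition /dev/sdb2 exists (adjust the device to your layout, and stop logging services first on a production box):

mkfs.xfs /dev/sdb2
mkdir -p /mnt/newlog && mount /dev/sdb2 /mnt/newlog
cp -a /var/log/. /mnt/newlog/ && umount /mnt/newlog
echo "/dev/sdb2 /var/log xfs defaults 0 0" >> /etc/fstab
mount /var/log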

Step 2 - Format and mount the bricks

(on both nodes): Note: These examples are going to assume the brick is going to reside on /dev/sdb1.

mkfs.xfs -i size=512 /dev/sdb1

mkdir -p /data/brick1

echo '/dev/sdb1 /data/brick1 xfs defaults 1 2' >> /etc/fstab

mount -a && mount

You should now see sdb1 mounted at /data/brick1

Step 3 - Installing GlusterFS

(on both servers) Install the software

yum install glusterfs-server

Start the GlusterFS management daemon:

service glusterd start

service glusterd status

glusterd.service - LSB: glusterfs server

Loaded: loaded (/etc/rc.d/init.d/glusterd)

Active: active (running) since Mon, 13 Aug 2012 13:02:11 -0700; 2s ago

Process: 19254 ExecStart=/etc/rc.d/init.d/glusterd start (code=exited, status=0/SUCCESS)

CGroup: name=systemd:/system/glusterd.service

├ 19260 /usr/sbin/glusterd -p /run/glusterd.pid

├ 19304 /usr/sbin/glusterfsd --xlator-option georep-server.listen-port=24009 -s localhost...

└ 19309 /usr/sbin/glusterfs -f /var/lib/glusterd/nfs/nfs-server.vol -p /var/lib/glusterd/...

Step 4 - Configure the trusted pool

From "server1"

gluster peer probe server2

Note: When using hostnames, the first server needs to be probed from one other server to set its hostname.

From "server2"

gluster peer probe server1

Note: Once this pool has been established, only trusted members may probe new servers into the pool. A new server cannot probe the pool, it must be probed from the pool.

Step 5 - Set up a GlusterFS volume

On both server1 and server2:

mkdir -p /data/brick1/gv0

From any single server:

gluster volume create gv0 replica 2 server1:/data/brick1/gv0 server2:/data/brick1/gv0

gluster volume start gv0

Confirm that the volume shows "Started":

gluster volume info

Note: If the volume is not started, clues as to what went wrong will be in log files under /var/log/glusterfs on one or both of the servers - usually in etc-glusterfs-glusterd.vol.log

Step 6 - Testing the GlusterFS volume

For this step, we will use one of the servers to mount the volume. Typically, you would do this from an external machine, known as a "client". Since using this method would require additional packages to be installed on the client machine, we will use one of the servers as a simple place to test first, as if it were that "client".

mount -t glusterfs server1:/gv0 /mnt

for i in `seq -w 1 100`; do cp -rp /var/log/messages /mnt/copy-test-$i; done

First, check the mount point:

ls -lA /mnt | wc -l

You should see 100 files returned. Next, check the GlusterFS mount points on each server:

ls -lA /data/brick1/gv0

You should see 100 files on each server using the method we listed here. Without replication, in a distribute only volume (not detailed here), you should see about 50 files on each one.

1. Configure Firewall (it is easiest to simply turn the firewall off)

For Gluster to communicate within a cluster, either the firewalls have to be turned off or communication has to be enabled for each server.

iptables -I INPUT -p all -s `<ip-address>` -j ACCEPT

2. Configure the trusted pool

Remember that the trusted pool is the term used to define a cluster of nodes in Gluster. Choose a server to be your "primary" server, just to keep things simple; you will generally want to run all commands from this server. Keep in mind that running many Gluster-specific commands (like gluster volume create) on one server in the cluster will execute the same command on all other servers. (It only needs to be run on one machine.)

 

3. Replace nodename with the hostname of the other server in the cluster, or its IP address if you don't have DNS or /etc/hosts entries. (Set it up in both DNS and /etc/hosts.) Let's say we want to connect to node02:

gluster peer probe node02

Notice that running gluster peer status from the second node shows that the first node has already been added.

 

4. Partition the disk (disk handling)

4.1 Assuming you have an empty disk at /dev/sdb:

fdisk /dev/sdb

4.2 And then create a single XFS partition using fdisk

Format the partition:

mkfs.xfs -i size=512 /dev/sdb1

4.3 Add an entry to /etc/fstab

echo "/dev/sdb1 /export/sdb1 xfs defaults 0 0" >> /etc/fstab

4.4 Mount the partition as a Gluster "brick"

mkdir -p /export/sdb1 && mount -a && mkdir -p /export/sdb1/brick

Set up a Gluster volume

 

The most basic Gluster volume type is a "Distribute only" volume (also referred to as a "pure DHT" volume if you want to impress the folks at the water cooler). This type of volume simply distributes the data evenly across the available bricks in a volume. So, if I write 100 files, on average, fifty will end up on one server, and fifty will end up on another. This is faster than a "replicated" volume, but isn't as popular since it doesn't give you two of the most sought after features of Gluster: multiple copies of the data, and automatic failover if something goes wrong.
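For comparison with the replicated example below, a distribute-only volume is created simply by leaving out the replica option; a sketch using the same brick paths (gv-dist is a hypothetical volume name):

gluster volume create gv-dist node01.mydomain.net:/export/sdb1/brick node02.mydomain.net:/export/sdb1/brick
gluster volume start gv-dist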

 

1. To set up a replicated volume:

gluster volume create gv0 replica 2 node01.mydomain.net:/export/sdb1/brick node02.mydomain.net:/export/sdb1/brick

 

Breaking this down into pieces:

the first part says to create a gluster volume named gv0 (the name is arbitrary, gv0 was chosen simply because it's less typing than gluster_volume_0).

make the volume a replica volume

keep a copy of the data on at least 2 bricks at any given time. Since we only have two bricks total, this means each server will house a copy of the data.

we specify which nodes to use, and which bricks on those nodes. The order here is important when you have more bricks.

It is possible (as of the most current release as of this writing, Gluster 3.3) to specify the bricks in such a way that you would make both copies of the data reside on a single node. This would make for an embarrassing explanation to your boss when your bulletproof, completely redundant, always-on super cluster comes to a grinding halt when a single point of failure occurs.
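To make the ordering point concrete, a sketch with four hypothetical bricks on two nodes; replica pairs are formed from consecutive bricks in the list:

# Good: each replica pair spans both nodes
gluster volume create gv0 replica 2 node01:/export/sdb1/brick node02:/export/sdb1/brick node01:/export/sdc1/brick node02:/export/sdc1/brick

# Risky: the first replica pair lives entirely on node01, so losing node01 loses both copies of that data (newer releases refuse this layout unless you add force)
gluster volume create gv0 replica 2 node01:/export/sdb1/brick node01:/export/sdc1/brick node02:/export/sdb1/brick node02:/export/sdc1/brick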

 

2. Now, we can check to make sure things are working as expected:

 

# gluster volume info

 

And you should see results similar to the following:

Volume Name: gv0

Type: Replicate

Volume ID: 8bc3e96b-a1b6-457d-8f7a-a91d1d4dc019

Status: Created

Number of Bricks: 1 x 2 = 2

Transport-type: tcp

Bricks:

Brick1: node01.yourdomain.net:/export/sdb1/brick

Brick2: node02.yourdomain.net:/export/sdb1/brick

This shows us essentially what we just specified during the volume creation. The one thing to mention is the Status: a status of Created means that the volume has been created but hasn't yet been started, which would cause any attempt to mount the volume to fail.

3. Now, we should start the volume.

# gluster volume start gv0

Other material

Supported data types:

Gluster does not support so-called "structured data", meaning live SQL databases. Of course, using Gluster to back up and restore the database would be fine - Gluster is traditionally better when using file sizes of at least 16KB (with a sweet spot around 128KB or so).

 

Structured data is not supported, but Gluster can be used to back up and restore such data.

Hardware requirements (can mixed hardware be used?)

 

If you want to test on bare metal, since Gluster is built with commodity hardware in mind, and because there is no centralized meta-data server, a very simple cluster can be deployed with two basic servers (2 CPU's, 4GB of RAM each, 1 Gigabit network). This is sufficient to have a nice file share or a place to put some nightly backups. Gluster is deployed successfully on all kinds of disks, from the lowliest 5200 RPM SATA to mightiest 1.21 gigawatt SSD's. The more performance you need, the more consideration you will want to put into how much hardware to buy, but the great thing about Gluster is that you can start small, and add on as your needs grow.

Do the hosts need identical configurations?

OK, but if I add servers on later, don't they have to be exactly the same?

In a perfect world, sure. Having the hardware be the same means less troubleshooting when the fires start popping up. But plenty of people deploy Gluster on mix and match hardware, and successfully.

So no, they do not.

Filesystem requirements: XFS is recommended

Typically, XFS is recommended but it can be used with other filesystems as well. Most commonly EXT4 is used when XFS isn't, but you can (and many, many people do) use another filesystem that suits you. Now that we understand that, we can define a few of the common terms used in Gluster.

Notes:

We can define a few of the common terms used in Gluster.

·A trusted pool refers collectively to the hosts in a given Gluster cluster.

·A node or "server" refers to any server that is part of a trusted pool. In general, this assumes all nodes are in the same trusted pool.

·A brick is used to refer to any device (really this means filesystem) that is being used for Gluster storage.

·An export refers to the mount path of the brick(s) on a given server, for example, /export/brick1

·The term Global Namespace is a fancy way of saying a Gluster volume

·A Gluster volume is a collection of one or more bricks (of course, typically this is two or more). This is analogous to /etc/exports entries for NFS.

·GNFS and kNFS. GNFS is how we refer to our inline NFS server. kNFS stands for kernel NFS, or, as most people would say, just plain NFS. Most often, you will want kNFS services disabled on the Gluster nodes (see the sketch just below). Gluster NFS doesn't take any additional configuration and works just like you would expect with NFSv3. It is possible to configure Gluster and NFS to live in harmony if you want to.
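For example, a minimal sketch of disabling kernel NFS on the Gluster nodes (assuming a systemd host such as the CentOS 7 nodes used earlier in this article; the unit name may differ on other distributions):

systemctl stop nfs-server
systemctl disable nfs-server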

Other notes:

·For this test, if you do not have DNS set up, you can get away with using /etc/hosts entries for the two nodes. However, when you move from this basic setup to using Gluster in production, correct DNS entries (forward and reverse) and NTP are essential.

·When you install the Operating System, do not format the Gluster storage disks! We will use specific settings with the mkfs command later on when we set up Gluster. If you are testing with a single disk (not recommended), make sure to carve out a free partition or two to be used by Gluster later, so that you can format or reformat at will during your testing.

·Firewalls are great, except when they aren't. For storage servers, being able to operate in a trusted environment without firewalls can mean huge gains in performance, and is recommended. In case you absolutely need to set up a firewall, have a look at Setting up clients for information on the ports used.

Hardware requirements (minimum)

64-bit OS, 1 CPU, 1 GB RAM, 8 GB storage

You will need to have at least two nodes with a 64 bit OS and a working network connection. At least one gig of RAM is the bare minimum recommended for testing, and you will want at least 8GB in any system you plan on doing any real work on. A single cpu is fine for testing, as long as it is 64 bit.

 

 

####################################################################################

2. GlusterFS installation and configuration

2.1 Preparation before installing GlusterFS

Server plan (VMware lab):

OS                 IP          Hostname    Data disks (2)
CentOS 6.8 x86_64  10.1.0.151  mystorage1  sdb: 10G, sdc: 10G
CentOS 6.8 x86_64  10.1.0.152  mystorage2  sdb: 10G, sdc: 10G
CentOS 6.8 x86_64  10.1.0.153  mystorage3  sdb: 10G, sdc: 10G
CentOS 6.8 x86_64  10.1.0.154  mystorage4  sdb: 10G, sdc: 10G

2.2 Installing GlusterFS

2.2.1 Change the hostname

# vim /etc/sysconfig/network

Then run: hostname <new-hostname>

The hostname change is now complete.

2.2.2 Add /etc/hosts entries so the cluster hosts can resolve each other

# vim /etc/hosts 
10.1.0.151 mystorage1
10.1.0.152 mystorage2
10.1.0.153 mystorage3
10.1.0.154 mystorage4

2.2.3 Disable SELinux and the firewall

# sed -i 's#SELINUX=enforcing#SELINUX=disabled#' /etc/selinux/config
# chkconfig iptables off
# reboot

2.2.4 Install the EPEL repository

Some packages in the GlusterFS yum repository depend on the EPEL repository.

# Remove the original repos from /etc/yum.repos.d and switch to the Aliyun mirrors
# yum install epel-release -y

2.2.5 Install the GlusterFS repository and related packages

# yum install centos-release-gluster37.noarch -y
# yum --enablerepo=centos-gluster*-test install glusterfs-server glusterfs-cli glusterfs-geo-replication -y

# Packages present after the installation
rpm -qa | grep gluster*
centos-release-gluster37-1.0-4.el6.centos.noarch
glusterfs-api-3.7.13-1.el6.x86_64
glusterfs-3.7.13-1.el6.x86_64
glusterfs-client-xlators-3.7.13-1.el6.x86_64
glusterfs-fuse-3.7.13-1.el6.x86_64
glusterfs-server-3.7.13-1.el6.x86_64
glusterfs-libs-3.7.13-1.el6.x86_64
glusterfs-cli-3.7.13-1.el6.x86_64
glusterfs-geo-replication-3.7.13-1.el6.x86_64
復制代碼

2.3配置 GlusterFS

2.3.1 查看 GlusterFS 版本信息

使用 glusterfs -V 命令

[root@mystorage1 ~]# glusterfs -V
glusterfs 3.7.20 built on Jan 30 2017 15:39:27

2.3.2 Start and stop the service

# Run on all four VMs
# /etc/init.d/glusterd start
# /etc/init.d/glusterd status
# chkconfig glusterd on

2.3.3 Add the storage hosts to the trusted pool

Run this on one host to add the others; here it is run on mystorage1:

[root@mystorage1 ~]# gluster peer probe mystorage2
peer probe: success.
[root@mystorage1 ~]# gluster peer probe mystorage3
peer probe: success.
[root@mystorage1 ~]# gluster peer probe mystorage4
peer probe: success.

2.3.4 Check the status

Check the status from one of the other machines:
[root@mystorage2 ~]# gluster peer status
Number of Peers: 3

Hostname: mystorage1
Uuid: 6e6a84af-ac7a-44eb-85c9-50f1f46acef1
State: Peer in Cluster (Connected)

Hostname: mystorage3
Uuid: 36e4c45c-466f-47b0-b829-dcd4a69ca2e7
State: Peer in Cluster (Connected)

Hostname: mystorage4
Uuid: c607f6c2-bdcb-4768-bc82-4bc2243b1b7a
State: Peer in Cluster (Connected)
復制代碼

2.3.5 配置前的准備工作

安裝 xfs 支持包(ext4文件格式支持16TB的磁盤大小,而xfs是PB級別的)

# 四台都執行
# yum install xfsprogs -y

fdisk -l 查看磁盤設備

復制代碼
# fdisk /dev/sdb
n
p
1
w
** 線上是可以不做這一步的;
復制代碼
# 解釋說明:
  如果磁盤大於 2T 的話就用 parted 來分區,這里我們不用分區(可以不分區);   做分布式文件系統的時候數據盤一般不需要做 RAID,一般系統盤會做 RAID 1;
  如果有raid卡的話,最好用上,raid卡有數據緩存功能,也能提高磁盤的iops,最好的話,用RAID 5;
  如果都不做raid的話,也是沒問題的,glusterfs也是可以保證數據的安全的。

Create the filesystem

# Run on all four
# mkfs.xfs -f /dev/sdb

On all four machines, create the mount directories and mount the disks:

# Run on all four
# mkdir -p /storage/brick{1..2}
# mount /dev/sdb /storage/brick1
# df -h
# Add to fstab so it mounts automatically at boot
# echo "/dev/sdb /storage/brick1 xfs defaults 0 0" >>/etc/fstab
# mount -a

2.3.6 Create volumes and other operations

The five GlusterFS volume types:

  • Distributed: distributed volume; files are spread across the bricks of the volume by a hash algorithm.
  • Replicated: replicated volume, similar to RAID 1; the replica count must equal the number of bricks (storage servers) in the volume; high availability.
  • Striped: striped volume, similar to RAID 0; the stripe count must equal the number of bricks in the volume; files are split into chunks stored round-robin across the bricks; concurrency is at chunk granularity; good performance for large files.
  • Distributed Striped: distributed striped volume; the number of bricks must be a multiple (at least 2x) of the stripe count; combines distribution and striping.
  • Distributed Replicated: distributed replicated volume; the number of bricks must be a multiple (at least 2x) of the replica count; combines distribution and replication.

              For a distributed replicated volume the brick order decides where files are placed: consecutive bricks form a replica pair, and the pairs are then distributed.

The most commonly used GlusterFS volume type is the distributed replicated volume.
Striping exists to improve performance and make reads faster.

Enterprises generally use the last two types, and most use distributed replication (usable capacity = total capacity / replica count). Since all data travels over the network, 10 GbE switches and NICs are recommended to get the best performance.
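As a quick worked example of that capacity rule (using this lab's disks as an assumption): if all eight 10 GB data disks (4 servers x 2 disks) were placed in a single replica-2 distributed replicated volume, the usable capacity would be roughly (8 x 10 GB) / 2 = 40 GB.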

1) Distributed volume

# Create a distributed volume
[root@mystorage1 ~]# gluster volume create gv1 mystorage1:/storage/brick1 mystorage2:/storage/brick1 force
volume create: gv1: success: please start the volume to access data

# Start the volume
[root@mystorage1 ~]# gluster volume start gv1
volume start: gv1: success

# View the volume info on another machine (mystorage4)
[root@mystorage4 ~]# gluster volume info
 
Volume Name: gv1
Type: Distribute
Volume ID: b6ec2f8a-d1f0-4d1b-806b-238efb6dcb84
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: mystorage1:/storage/brick1
Brick2: mystorage2:/storage/brick1
Options Reconfigured:
performance.readdir-ahead: on

# Mount the volume
[root@mystorage4 ~]# mount -t glusterfs 127.0.0.1:/gv1 /mnt
[root@mystorage4 ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3        33G  1.3G   30G   5% /
tmpfs           242M     0  242M   0% /dev/shm
/dev/sda1       976M   38M  888M   5% /boot
/dev/sdb         10G   33M   10G   1% /storage/brick1
127.0.0.1:/gv1   20G   65M   20G   1% /mnt

# Create test files on mystorage1
[root@mystorage1 ~]# touch /mnt/{a..d}
[root@mystorage1 ~]# ll /mnt
total 0
-rw-r--r-- 1 root root 0 Jul 30 00:54 a
-rw-r--r-- 1 root root 0 Jul 30 00:54 b
-rw-r--r-- 1 root root 0 Jul 30 00:54 c
-rw-r--r-- 1 root root 0 Jul 30 00:54 d

# The new files are also visible on mystorage4; every host in the trusted pool that mounts this volume can see them
[root@mystorage4 ~]# ll /mnt/
total 0
-rw-r--r-- 1 root root 0 Jul 30 00:54 a
-rw-r--r-- 1 root root 0 Jul 30 00:54 b
-rw-r--r-- 1 root root 0 Jul 30 00:54 c
-rw-r--r-- 1 root root 0 Jul 30 00:54 d

# Where the files actually live on the bricks
[root@mystorage1 ~]# ls /storage/brick1
a  b  c  e
[root@mystorage2 ~]# ls /storage/brick1
d

# As shown above, files are spread across the different bricks by the hash algorithm

 

Mounting via NFS
[root@mystorage3 ~]# mount -o mountproto=tcp -t nfs mystorage1:/gv1 /mnt/  
[root@mystorage3 ~]# ll /mnt
total 0
-rw-r--r-- 1 root root 0 Jul 30 00:54 a
-rw-r--r-- 1 root root 0 Jul 30 00:54 b
-rw-r--r-- 1 root root 0 Jul 30 00:54 c
-rw-r--r-- 1 root root 0 Jul 30 00:54 d

[root@mystorage2 ~]# mount -o mountproto=tcp -t nfs 192.168.56.13:/gv1 /mnt/    
# The host can also be an IP; here mystorage3's IP is used, which shows that gv1 is shared with every host in the trusted pool
[root@mystorage2 ~]# ll /mnt/
total 0
-rw-r--r-- 1 root root 0 Jul 30 00:54 a
-rw-r--r-- 1 root root 0 Jul 30 00:54 b
-rw-r--r-- 1 root root 0 Jul 30 00:54 c
-rw-r--r-- 1 root root 0 Jul 30 00:54 d
復制代碼

 

2) Replicated volume

# Create a replicated volume
[root@mystorage1 ~]# gluster volume create gv2 replica 2 mystorage3:/storage/brick1 mystorage4:/storage/brick1 force
volume create: gv2: success: please start the volume to access data

# Start the volume
[root@mystorage1 ~]# gluster volume start gv2
volume start: gv2: success

# View the volume info
[root@mystorage1 ~]# gluster volume info gv2
 
Volume Name: gv2
Type: Replicate
Volume ID: 11928696-263a-4c7a-a155-5115af29221f
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: mystorage3:/storage/brick1
Brick2: mystorage4:/storage/brick1
Options Reconfigured:
performance.readdir-ahead: on

# Mount the volume and create test files
[root@mystorage1 ~]# mount -t glusterfs 127.0.0.1:/gv2 /opt
[root@mystorage1 ~]# touch /opt/{a..d}
[root@mystorage1 ~]# ls /opt
a  b  c  d

# The new files can be seen on mystorage3 and mystorage4
[root@mystorage3 ~]# mount -t glusterfs 127.0.0.1:/gv2 /opt
[root@mystorage3 ~]# ls /opt/
a  b  c  d

[root@mystorage4 ~]# mount -t glusterfs 127.0.0.1:/gv2 /opt
[root@mystorage4 ~]# ls /opt/
a  b  c  d

# Where the files actually live
[root@mystorage3 ~]# ls /storage/brick1
a  b  c  d
[root@mystorage4 ~]# ls /storage/brick1
a  b  c  d

# As shown above, every file is present on the bricks of both machines

Format and mount the second disk

# mkfs.xfs -f /dev/sdc
# mkdir -p /storage/brick2
# echo "/dev/sdc  /storage/brick2  xfs defaults 0 0"  >> /etc/fstab
# mount -a
# df -h

3) Distributed striped volume

# Create a striped volume
[root@mystorage1 ~]# gluster volume create gv3 stripe 2 mystorage3:/storage/brick2 mystorage4:/storage/brick2 force
volume create: gv3: success: please start the volume to access data

# Start the volume
[root@mystorage1 ~]# gluster volume start gv3
volume start: gv3: success

# View the volume info
[root@mystorage1 ~]# gluster volume info gv3
 
Volume Name: gv3
Type: Stripe
Volume ID: 2871801f-b125-465c-be3a-4eeb2fb44916
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: mystorage3:/storage/brick2
Brick2: mystorage4:/storage/brick2
Options Reconfigured:
performance.readdir-ahead: on

# Mount the volumes and create test files
mkdir /gv1 /gv2 /gv3
mount -t glusterfs 127.0.0.1:gv1 /gv1
mount -t glusterfs 127.0.0.1:gv2 /gv2
mount -t glusterfs 127.0.0.1:gv3 /gv3
df -h

dd if=/dev/zero bs=1024 count=10000 of=/gv3/10M.file
dd if=/dev/zero bs=1024 count=20000 of=/gv3/20M.file

# View the newly created files
[root@mystorage1 ~]# ll /gv3/
total 30000
-rw-r--r-- 1 root root 10240000 Jul 30 02:26 10M.file
-rw-r--r-- 1 root root 20480000 Jul 30 02:26 20M.file

# Where the files actually live
[root@mystorage3 ~]# ll -h /storage/brick2/
total 15M
-rw-r--r-- 2 root root 4.9M Jul 30 02:26 10M.file
-rw-r--r-- 2 root root 9.8M Jul 30 02:26 20M.file
[root@mystorage4 ~]# ll -h /storage/brick2/
total 15M
-rw-r--r-- 2 root root 4.9M Jul 30 02:25 10M.file
-rw-r--r-- 2 root root 9.8M Jul 30 02:26 20M.file

# As shown above, the 10M and 20M files are each split into 2 chunks (the striping part), and the chunks sit on different bricks (the distribution part)

4) Distributed replicated volume

# First, look at the behavior of the replicated volume
cd /gv2
rm -f *
dd if=/dev/zero bs=1024 count=10000 of=/gv2/10M.file
dd if=/dev/zero bs=1024 count=20000 of=/gv2/20M.file
dd if=/dev/zero bs=1024 count=30000 of=/gv2/30M.file


[root@mystorage3 ~]# ll -h /storage/brick1/
total 59M
-rw-r--r-- 2 root root 9.8M Jul 30 02:41 10M.file
-rw-r--r-- 2 root root  20M Jul 30 02:41 20M.file
-rw-r--r-- 2 root root  30M Jul 30 02:41 30M.file
[root@mystorage4 ~]# ll -h /storage/brick1
total 59M
-rw-r--r-- 2 root root 9.8M Jul 30 02:40 10M.file
-rw-r--r-- 2 root root  20M Jul 30 02:40 20M.file
-rw-r--r-- 2 root root  30M Jul 30 02:40 30M.file

# Expand gv2 by adding bricks
[root@mystorage1 ~]# gluster volume stop gv2
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y

[root@mystorage1 ~]# gluster volume add-brick gv2 replica 2 mystorage1:/storage/brick2 mystorage2:/storage/brick2 force
volume add-brick: success

[root@mystorage1 ~]# gluster volume start gv2
volume start: gv2: success

[root@mystorage1 ~]# gluster volume info gv2
 
Volume Name: gv2
Type: Distributed-Replicate                # now a distributed replicated volume, formed by adding 2 more bricks to the gv2 replicated volume
Volume ID: 11928696-263a-4c7a-a155-5115af29221f
Status: Stopped
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: mystorage3:/storage/brick1
Brick2: mystorage4:/storage/brick1
Brick3: mystorage1:/storage/brick2
Brick4: mystorage2:/storage/brick2
Options Reconfigured:
performance.readdir-ahead: on
復制代碼

注意:當你給分布式復制卷和分布式條帶卷增加 bricks 時,你增加的 bricks 數目必須是復制或條帶數目的倍數,例如:你給一個分布式復制卷的 replica 為 2,你在增加 bricks 的時候數量必須為2、4、6、8等。 擴容后進行測試,發現文件都分布在擴容前的卷中。

分布式復制卷的最佳實踐:

1)搭建條件
- 塊服務器的數量必須是復制的倍數
- 將按塊服務器的排列順序指定相鄰的塊服務器成為彼此的復制
例如,8台服務器:
- 當復制副本為2時,按照服務器列表的順序,服務器1和2作為一個復制,3和4作為一個復制,5和6作為一個復制,7和8作為一個復制
- 當復制副本為4時,按照服務器列表的順序,服務器1/2/3/4作為一個復制,5/6/7/8作為一個復制
2) Creating a distributed replicated volume

# gluster volume create test-volume replica 2 transport tcp server1:/exp1 server2:/exp2 server3:/exp3 server4:/exp4
Creation of test-volume has been successful
Please start the volume to access data

Reference blog:
http://cmdschool.blog.51cto.com/2420395/1828450

 

Rebalancing disk storage

Note: rebalancing the layout is necessary because the layout is static. When new bricks are added to an existing volume, newly created files still land on the old bricks, so the layout must be rebalanced for the new bricks to take effect. A layout-only rebalance only makes the new layout effective; it does not move existing data. If you also want the existing data spread over the new layout, you must rebalance the data as well.
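If you only want to make the new layout effective without migrating the data that is already there, a layout-only rebalance is enough; a sketch using the gv2 volume from this example:

[root@mystorage1 ~]# gluster volume rebalance gv2 fix-layout start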

復制代碼
# 再在 /gv2 下創建 2 個新的文件 10M.file1 20M.file1

[root@mystorage1 ~]# dd if=/dev/zero bs=1024 count=10000 of=/gv2/10M.file1
[root@mystorage1 ~]# dd if=/dev/zero bs=1024 count=20000 of=/gv2/20M.file1
[root@mystorage1 ~]# ll -rht /gv2/
total 88M
-rw-r--r-- 1 root root 9.8M Jul 30 02:40 10M.file
-rw-r--r-- 1 root root  20M Jul 30 02:40 20M.file
-rw-r--r-- 1 root root  30M Jul 30 02:40 30M.file
-rw-r--r-- 1 root root 9.8M Jul 30 03:10 10M.file1
-rw-r--r-- 1 root root  20M Jul 30 03:10 20M.file1

[root@mystorage1 ~]# ll /storage/brick2
total 0
[root@mystorage2 ~]# ll /storage/brick2
total 0
[root@mystorage3 ~]# ll -hrt /storage/brick1
total 88M
-rw-r--r-- 2 root root 9.8M Jul 30 02:41 10M.file
-rw-r--r-- 2 root root  20M Jul 30 02:41 20M.file
-rw-r--r-- 2 root root  30M Jul 30 02:41 30M.file
-rw-r--r-- 2 root root 9.8M Jul 30 03:12 10M.file1
-rw-r--r-- 2 root root  20M Jul 30 03:13 20M.file1
[root@mystorage4 ~]# ll -hrt /storage/brick1
total 88M
-rw-r--r-- 2 root root 9.8M Jul 30 02:40 10M.file
-rw-r--r-- 2 root root  20M Jul 30 02:40 20M.file
-rw-r--r-- 2 root root  30M Jul 30 02:40 30M.file
-rw-r--r-- 2 root root 9.8M Jul 30 03:10 10M.file1
-rw-r--r-- 2 root root  20M Jul 30 03:10 20M.file1

# As shown above, the newly created files still landed in the old bricks, not in the newly added ones


# Now rebalance the storage
[root@mystorage1 ~]# gluster volume rebalance gv2 start
volume rebalance: gv2: success: Rebalance on gv2 has been started successfully. Use rebalance status command to check status of the rebalance process.
ID: e23213be-7771-4a2b-87b4-259fd048ec46

[root@mystorage1 ~]# gluster volume rebalance gv2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes             0             0             0            completed        0:0:1
                              mystorage2                0        0Bytes             0             0             0            completed        0:0:0
                              mystorage3                2        39.1MB             5             0             0            completed        0:0:2
                              mystorage4                0        0Bytes             0             0             0            completed        0:0:1
volume rebalance: gv2: success

# Check how the files are distributed across the bricks after the rebalance
[root@mystorage1 ~]# ll /storage/brick2
total 40000
-rw-r--r-- 2 root root 20480000 Jul 30 02:41 20M.file
-rw-r--r-- 2 root root 20480000 Jul 30 03:13 20M.file1
[root@mystorage2 ~]# ll /storage/brick2
total 40000
-rw-r--r-- 2 root root 20480000 Jul 30 02:41 20M.file
-rw-r--r-- 2 root root 20480000 Jul 30 03:13 20M.file1
[root@mystorage3 ~]# ll -hrt /storage/brick1
total 49M
-rw-r--r-- 2 root root 9.8M Jul 30 02:41 10M.file
-rw-r--r-- 2 root root  30M Jul 30 02:41 30M.file
-rw-r--r-- 2 root root 9.8M Jul 30 03:12 10M.file1
[root@mystorage4 ~]# ll -hrt /storage/brick1
total 49M
-rw-r--r-- 2 root root 9.8M Jul 30 02:40 10M.file
-rw-r--r-- 2 root root  30M Jul 30 02:40 30M.file
-rw-r--r-- 2 root root 9.8M Jul 30 03:10 10M.file1

# As shown above, the two files 20M.file and 20M.file1 have been rebalanced onto the 2 newly added bricks

Do a rebalance after every expansion. Rebalancing is really a last resort; usually it is simpler to just create another volume.

Removing bricks

You may want to shrink a volume online, for example when hardware is damaged or the network fails; in that case you can remove the affected bricks from the volume.
Note: while bricks are being removed you cannot access their data through the gluster mount point; only after the brick information has been removed from the configuration can you access the data left in the bricks directly. When removing bricks from a distributed replicated or distributed striped volume, the number of bricks removed must be a multiple of the replica or stripe count.
For example, for a distributed striped volume with stripe 2, bricks must be removed 2, 4, 6, 8, ... at a time.

# Remove 2 bricks from the gv2 volume

[root@mystorage1 ~]# gluster volume stop gv2
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: gv2: success
[root@mystorage1 ~]# gluster volume remove-brick gv2 replica 2 mystorage3:/storage/brick1 mystorage4:/storage/brick1 force
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit force: success
[root@mystorage1 ~]# gluster volume start gv2
volume start: gv2: success
[root@mystorage1 ~]# ll /gv2/
total 40000
-rw-r--r-- 1 root root 20480000 Jul 30 02:41 20M.file
-rw-r--r-- 1 root root 20480000 Jul 30 03:13 20M.file1

# If the removal was a mistake, the files are actually still in /storage/brick1; just add the bricks back
[root@mystorage1 ~]# gluster volume stop gv2
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: gv2: success
[root@mystorage1 ~]# gluster volume add-brick gv2 replica 2 mystorage3:/storage/brick1 mystorage4:/storage/brick1 force
volume add-brick: success
[root@mystorage1 ~]# gluster volume info gv2
 
Volume Name: gv2
Type: Distributed-Replicate
Volume ID: 11928696-263a-4c7a-a155-5115af29221f
Status: Stopped
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: mystorage1:/storage/brick2
Brick2: mystorage2:/storage/brick2
Brick3: mystorage3:/storage/brick1
Brick4: mystorage4:/storage/brick1
Options Reconfigured:
performance.readdir-ahead: on
[root@mystorage1 ~]# gluster volume start gv2
volume start: gv2: success
[root@mystorage1 ~]# ll /gv2/                    # the files are still there
total 90000
-rw-r--r-- 1 root root 10240000 Jul 30 02:40 10M.file
-rw-r--r-- 1 root root 10240000 Jul 30 03:10 10M.file1
-rw-r--r-- 1 root root 20480000 Jul 30 02:41 20M.file
-rw-r--r-- 1 root root 20480000 Jul 30 03:13 20M.file1
-rw-r--r-- 1 root root 30720000 Jul 30 02:40 30M.file
復制代碼

刪除卷

一般會用在命名不規范的時候才會刪除

復制代碼
[root@mystorage1 ~]# umount /gv1
[root@mystorage1 ~]# gluster volume stop gv1
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: gv1: success
[root@mystorage1 ~]# gluster volume delete gv1
Deleting volume will erase all information about the volume. Do you want to continue? (y/n) y
volume delete: gv1: success
[root@mystorage1 ~]# gluster volume info gv1
Volume gv1 does not exist
復制代碼

VMware WorkStation在線加硬盤:

# echo "- - -" >  /sys/class/scsi_host/host2/scan
# fdisk -l
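If you are not sure which SCSI host to rescan, you can list them and rescan each one; a small sketch (host numbers vary from VM to VM):

# ls /sys/class/scsi_host/
# for h in /sys/class/scsi_host/host*; do echo "- - -" > $h/scan; done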

 

