GlusterFS Data Storage Split-Brain Repair: A Complete Guide

Reposted article. 2020-06-03 10:28. Tags: GlusterFS, Kubernetes

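This document describes the heal info command, which GlusterFS provides for monitoring the state of replicated volumes, and the methods for resolving split-brain.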

1. Concepts

Common terms

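Term	Description
Brick	The basic storage unit of GlusterFS, represented by an export directory on a server in the trusted storage pool. A brick is written as the server name plus the absolute path of the directory: `SERVER:EXPORT`
Volume	A volume, logically made up of N bricks
Fuse	A loadable kernel module on Unix-like operating systems that lets users create their own file systems without modifying kernel code
Glusterd	Gluster management daemon, the GlusterFS background process that runs on every GlusterFS node
CLI	Command Line Interface
AFR	Automatic File Replication
GFID	The internal GlusterFS file identifier, a UUID that is unique to each file
Replicate Volume	Replicated volume
Client	The client, which mounts the storage exported by the servers
Server	A storage node server, where the data is stored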

1.1 What is split-brain

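Split-brain is a situation in which two or more replicated copies of a file have diverged. When a file is in split-brain, its data or metadata is inconsistent across the bricks of the replica, and even though all bricks are up there is not enough information to authoritatively pick one copy as pristine and heal the bad copies. For directories there is also entry split-brain, in which a file inside the directory has a different gfid or file type on the different bricks of the replica. In short, split-brain arises whenever Gluster AFR cannot determine which copy in the replica set is the correct one.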

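1.2 Types of split-brain

Data split-brain: the data in the file differs across the bricks of the replica set.
Metadata split-brain: the metadata differs across the bricks.
Entry split-brain: occurs when a file has a different gfid on each brick of the replica pair; this type cannot be healed automatically.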

1.3 Viewing split-brain information

gluster volume heal <VOLNAME> info

This command lists all the files that need healing (and that will be processed by the self-heal daemon). The output consists of file paths or GFIDs.

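How the heal info command works

When the command is invoked, a glfsheal process is spawned. It reads the entries in the subdirectories under /<brick-path>/.glusterfs/indices/ on every brick it can connect to; these entries are the gfids of the files that may need healing. Once a GFID entry has been obtained from a brick, the process looks up that file on each brick of the replica set and examines the trusted.afr.* extended attributes to determine whether the file needs healing, is in split-brain, or is in some other state.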

Sample command output

[root@gfs ~]# gluster volume heal test info
Brick <hostname:brickpath-b1>
<gfid:aaca219f-0e25-4576-8689-3bfd93ca70c2> - Is in split-brain
<gfid:39f301ae-4038-48c2-a889-7dac143e82dd> - Is in split-brain
<gfid:c3c94de2-232d-4083-b534-5da17fc476ac> - Is in split-brain
<gfid:6dc78b20-7eb6-49a3-8edb-087b90142246>

Number of entries: 4

Brick <hostname:brickpath-b2>
/dir/file2
/dir/file1 - Is in split-brain
/dir - Is in split-brain
/dir/file3
/file4 - Is in split-brain
/dir/a

Number of entries: 6

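Interpreting the output

Every file listed in the output needs healing. A listed file may carry one of the following markers:

1) Is in split-brain

A file in data or metadata split-brain has "Is in split-brain" appended after its path/GFID, as in the output for /file4. For a file in GFID split-brain, however, the parent directory is shown as being in split-brain and the file itself is shown as needing heal; above, /dir is in split-brain because of the GFID split-brain of /dir/a. Files in split-brain cannot heal until the split-brain is resolved.

2) Is possibly undergoing heal

When heal info runs, it takes a lock on each file it examines to find out whether the file needs healing. If the self-heal daemon has already started healing a file, glfsheal cannot acquire the lock, and it prints this message instead. The message can also appear when several glfsheal processes run at the same time (for example, several users running heal info simultaneously) and compete for the same locks.

Example

A replicated volume named test is built from two bricks, b1 and b2; the self-heal daemon is turned off and the mount point is /mnt.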

# gluster volume heal test info
Brick <hostname:brickpath-b1>
<gfid:aaca219f-0e25-4576-8689-3bfd93ca70c2> - Is in split-brain
<gfid:39f301ae-4038-48c2-a889-7dac143e82dd> - Is in split-brain
<gfid:c3c94de2-232d-4083-b534-5da17fc476ac> - Is in split-brain
<gfid:6dc78b20-7eb6-49a3-8edb-087b90142246>

Number of entries: 4

Brick <hostname:brickpath-b2>
/dir/file2
/dir/file1 - Is in split-brain
/dir - Is in split-brain
/dir/file3
/file4 - Is in split-brain
/dir/a

Number of entries: 6

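Analysis of the output

Brick b1 has four entries that need healing:

1) The file with gfid 6dc78b20-7eb6-49a3-8edb-087b90142246 needs healing.
2) aaca219f-0e25-4576-8689-3bfd93ca70c2, 39f301ae-4038-48c2-a889-7dac143e82dd and c3c94de2-232d-4083-b534-5da17fc476ac are in split-brain.

Brick b2 has six entries that need healing:

1) a, file2 and file3 need healing.
2) file1, file4 and /dir are in split-brain.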
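The same tally can be produced mechanically. The following is a minimal Python sketch, assuming the output has exactly the layout shown above (a "Brick ..." header, one entry per line, an optional " - <marker>" suffix, and a "Number of entries" footer). The function name parse_heal_info and the shortened SAMPLE text are illustrative and not part of GlusterFS.

```python
# Illustrative parser for `gluster volume heal <VOLNAME> info` output.
# Assumption: the text follows the layout shown in this article.
MARKERS = ("Is in split-brain", "Is possibly undergoing heal")


def parse_heal_info(text):
    """Return {brick: [(entry, marker_or_None), ...]}."""
    bricks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("Number of entries"):
            continue
        if line.startswith("Brick "):
            current = line[len("Brick "):]
            bricks[current] = []
            continue
        if current is None:
            continue  # skip anything before the first brick header (e.g. the prompt)
        marker = None
        for m in MARKERS:
            suffix = " - " + m
            if line.endswith(suffix):
                line, marker = line[:-len(suffix)], m
                break
        bricks[current].append((line, marker))
    return bricks


SAMPLE = """\
Brick <hostname:brickpath-b1>
<gfid:aaca219f-0e25-4576-8689-3bfd93ca70c2> - Is in split-brain
<gfid:6dc78b20-7eb6-49a3-8edb-087b90142246>

Number of entries: 2

Brick <hostname:brickpath-b2>
/dir/file2
/dir/file1 - Is in split-brain
/dir - Is in split-brain

Number of entries: 3
"""

result = parse_heal_info(SAMPLE)
for brick, entries in result.items():
    split = [e for e, m in entries if m == "Is in split-brain"]
    print(brick, "needs heal:", len(entries), "in split-brain:", len(split))
```

A file path that itself ends in " - Is in split-brain" would be misread by this sketch; GFID entries are not affected.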

2. Resolving split-brain

Command (lists only the entries that are in split-brain)

gluster volume heal <VOLNAME> info split-brain

Sample output

# gluster volume heal test info split-brain
Brick <hostname:brickpath-b1>
<gfid:aaca219f-0e25-4576-8689-3bfd93ca70c2>
<gfid:39f301ae-4038-48c2-a889-7dac143e82dd>
<gfid:c3c94de2-232d-4083-b534-5da17fc476ac>
Number of entries in split-brain: 3

Brick <hostname:brickpath-b2>
/dir/file1
/dir
/file4
Number of entries in split-brain: 3

Note that for GFID split-brains (same file name but different GFID), heal info reports the parent directory as being in split-brain.

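2.1 Resolving split-brain with the gluster command-line tool

Once the files in split-brain have been identified, they can be healed from the gluster command line using one of several policies. This method does not support healing entry/GFID split-brain; the following policies heal data and metadata split-brain:

2.1.1 Use the bigger file as the source

This command is useful for per-file healing when you know or have decided that the file with the bigger size should be treated as the source.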

gluster volume heal <VOLNAME> split-brain bigger-file <FILE>

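Here, <FILE> can be either the full file name as seen from the root of the volume or the GFID string of the file. When the command runs, the brick holding the largest copy of <FILE> is found and used as the source for the heal.

Example:

Before healing the file, note its size and md5 checksum on each brick:

On brick b1: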

[brick1]# stat b1/dir/file1
  File: ‘b1/dir/file1’
  Size: 17              Blocks: 16         IO Block: 4096   regular file
Device: fd03h/64771d    Inode: 919362      Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-03-06 13:55:40.149897333 +0530
Modify: 2015-03-06 13:55:37.206880347 +0530
Change: 2015-03-06 13:55:37.206880347 +0530
 Birth: -
[brick1]#
[brick1]# md5sum b1/dir/file1
040751929ceabf77c3c0b3b662f341a8  b1/dir/file1

On brick b2:

[brick2]# stat b2/dir/file1
  File: ‘b2/dir/file1’
  Size: 13              Blocks: 16         IO Block: 4096   regular file
Device: fd03h/64771d    Inode: 919365      Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-03-06 13:54:22.974451898 +0530
Modify: 2015-03-06 13:52:22.910758923 +0530
Change: 2015-03-06 13:52:22.910758923 +0530
 Birth: -
[brick2]#
[brick2]# md5sum b2/dir/file1
cb11635a45d45668a403145059c2a0d5  b2/dir/file1

Heal file1 with the following command:

gluster volume heal test split-brain bigger-file /dir/file1

After the heal completes, the md5 checksum and file size should be the same on both bricks.

On brick b1:

[brick1]# stat b1/dir/file1
  File: ‘b1/dir/file1’
  Size: 17              Blocks: 16         IO Block: 4096   regular file
Device: fd03h/64771d    Inode: 919362      Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-03-06 14:17:27.752429505 +0530
Modify: 2015-03-06 13:55:37.206880347 +0530
Change: 2015-03-06 14:17:12.880343950 +0530
 Birth: -
[brick1]#
[brick1]# md5sum b1/dir/file1
040751929ceabf77c3c0b3b662f341a8  b1/dir/file1

On brick b2:

[brick2]# stat b2/dir/file1
  File: ‘b2/dir/file1’
  Size: 17              Blocks: 16         IO Block: 4096   regular file
Device: fd03h/64771d    Inode: 919365      Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-03-06 14:17:23.249403600 +0530
Modify: 2015-03-06 13:55:37.206880000 +0530
Change: 2015-03-06 14:17:12.881343955 +0530
 Birth: -
[brick2]#
[brick2]# md5sum b2/dir/file1
040751929ceabf77c3c0b3b662f341a8  b2/dir/file1

2.1.2 Use the file with the latest modification time as the source

Command

gluster volume heal <VOLNAME> split-brain latest-mtime <FILE>

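The command uses the brick that has the latest modification time for <FILE> as the source for the heal.

2.1.3 Use one brick of the replica as the source for a particular file

Command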

gluster volume heal <VOLNAME> split-brain source-brick <HOSTNAME:BRICKNAME> <FILE>

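Here, <HOSTNAME:BRICKNAME> is chosen as the source brick, and the copy of the file on that brick is used as the source for the heal.

Example:

Note the md5 checksums and file sizes before and after the heal.

Before the heal

On brick b1: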

[brick1]# stat b1/file4
  File: ‘b1/file4’
  Size: 4               Blocks: 16         IO Block: 4096   regular file
Device: fd03h/64771d    Inode: 919356      Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-03-06 13:53:19.417085062 +0530
Modify: 2015-03-06 13:53:19.426085114 +0530
Change: 2015-03-06 13:53:19.426085114 +0530
 Birth: -
[brick1]#
[brick1]# md5sum b1/file4
b6273b589df2dfdbd8fe35b1011e3183  b1/file4

On brick b2:

[brick2]# stat b2/file4
  File: ‘b2/file4’
  Size: 4               Blocks: 16         IO Block: 4096   regular file
Device: fd03h/64771d    Inode: 919358      Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-03-06 13:52:35.761833096 +0530
Modify: 2015-03-06 13:52:35.769833142 +0530
Change: 2015-03-06 13:52:35.769833142 +0530
 Birth: -
[brick2]#
[brick2]# md5sum b2/file4
0bee89b07a248e27c83fc3d5951213c1  b2/file4

Heal the file with gfid c3c94de2-232d-4083-b534-5da17fc476ac using the following command:

gluster volume heal test split-brain source-brick test-host:/test/b1 gfid:c3c94de2-232d-4083-b534-5da17fc476ac

After the heal:

On brick b1:

# stat b1/file4
  File: ‘b1/file4’
  Size: 4               Blocks: 16         IO Block: 4096   regular file
Device: fd03h/64771d    Inode: 919356      Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-03-06 14:23:38.944609863 +0530
Modify: 2015-03-06 13:53:19.426085114 +0530
Change: 2015-03-06 14:27:15.058927962 +0530
 Birth: -
# md5sum b1/file4
b6273b589df2dfdbd8fe35b1011e3183  b1/file4

On brick b2:

# stat b2/file4
 File: ‘b2/file4’
  Size: 4               Blocks: 16         IO Block: 4096   regular file
Device: fd03h/64771d    Inode: 919358      Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-03-06 14:23:38.944609000 +0530
Modify: 2015-03-06 13:53:19.426085000 +0530
Change: 2015-03-06 14:27:15.059927968 +0530
 Birth: -
# md5sum b2/file4
b6273b589df2dfdbd8fe35b1011e3183  b2/file4

2.1.4 Use one brick as the source for all files

Scenario: many files are in split-brain, and one brick is to be used as the source for all of them.

Command

gluster volume heal <VOLNAME> split-brain source-brick <HOSTNAME:BRICKNAME>

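With this command, the copies on <HOSTNAME:BRICKNAME> of all files in split-brain are taken as the source and healed onto the other bricks of the replica.

Example:

Three files in a volume, a, b and c, are in split-brain.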

# gluster volume heal test split-brain source-brick test-host:/test/b1
Healed gfid:944b4764-c253-4f02-b35f-0d0ae2f86c0f.
Healed gfid:3256d814-961c-4e6e-8df2-3a3143269ced.
Healed gfid:b23dd8de-af03-4006-a803-96d8bc0df004.
Number of healed entries: 3

As noted above, entry/GFID split-brain cannot be healed with the CLI. Trying to heal /dir fails because it is in entry split-brain.

# gluster volume heal test split-brain source-brick test-host:/test/b1 /dir
Healing /dir failed:Operation not permitted.
Volume heal failed.

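Such a split-brain can, however, be fixed by deleting the file from all bricks except the one chosen as the source. See the section on fixing directory split-brain below.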
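The CLI forms from sections 2.1.1 to 2.1.4 differ only in their trailing arguments. The following Python sketch assembles the argument list for each form so that it could be handed to subprocess.run; it only builds the list and does not contact a cluster. The helper name heal_split_brain_cmd is hypothetical, while the sub-commands are the ones quoted in this article.

```python
# Illustrative helper: build the argv for the split-brain heal commands
# described in sections 2.1.1 to 2.1.4. Nothing is executed here.
def heal_split_brain_cmd(volume, policy, file=None, source_brick=None):
    cmd = ["gluster", "volume", "heal", volume, "split-brain"]
    if policy in ("bigger-file", "latest-mtime"):
        if file is None:
            raise ValueError(policy + " needs a file path or gfid:<GFID>")
        return cmd + [policy, file]
    if policy == "source-brick":
        if source_brick is None:
            raise ValueError("source-brick needs HOSTNAME:BRICKNAME")
        cmd += ["source-brick", source_brick]
        if file is not None:
            cmd.append(file)  # omit the file to heal every split-brain file
        return cmd
    raise ValueError("unknown policy: " + policy)


print(" ".join(heal_split_brain_cmd("test", "bigger-file", "/dir/file1")))
print(" ".join(heal_split_brain_cmd("test", "source-brick",
                                    source_brick="test-host:/test/b1")))
```

Keeping the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.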

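2.2 Resolving split-brain from the client

The getfattr and setfattr commands can be used from the mount point to detect whether a file is in data or metadata split-brain and to resolve it.

The test volume used here has bricks b0, b1, b2 and b3.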

# gluster volume info test

Volume Name: test
Type: Distributed-Replicate
Volume ID: 00161935-de9e-4b80-a643-b36693183b61
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: test-host:/test/b0
Brick2: test-host:/test/b1
Brick3: test-host:/test/b2
Brick4: test-host:/test/b3

The directory structure of the bricks is as follows:

# tree -R /test/b?
/test/b0
├── dir
│   └── a
└── file100

/test/b1
├── dir
│   └── a
└── file100

/test/b2
├── dir
├── file1
├── file2
└── file99

/test/b3
├── dir
├── file1
├── file2
└── file99

List the files that are in split-brain:

# gluster v heal test info split-brain
Brick test-host:/test/b0/
/file100
/dir
Number of entries in split-brain: 2

Brick test-host:/test/b1/
/file100
/dir
Number of entries in split-brain: 2

Brick test-host:/test/b2/
/file99
<gfid:5399a8d1-aee9-4653-bb7f-606df02b3696>
Number of entries in split-brain: 2

Brick test-host:/test/b3/
<gfid:05c4b283-af58-48ed-999e-4d706c7b97d5>
<gfid:5399a8d1-aee9-4653-bb7f-606df02b3696>
Number of entries in split-brain: 2

The data/metadata split-brain status of a file can be checked with the following command:

getfattr -n replica.split-brain-status <path-to-file>

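Run from the client, the command reports whether the file is in data or metadata split-brain, and it also lists the choices that can be used to examine the file further. The command does not apply to gfid/directory split-brain.

Example:

1) file100 is in metadata split-brain.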

# getfattr -n replica.split-brain-status file100
file: file100
replica.split-brain-status="data-split-brain:no    metadata-split-brain:yes    Choices:test-client-0,test-client-1"

2) file1 is in data split-brain.

# getfattr -n replica.split-brain-status file1
file: file1
replica.split-brain-status="data-split-brain:yes    metadata-split-brain:no    Choices:test-client-2,test-client-3"

3) file99 is in both data and metadata split-brain.

# getfattr -n replica.split-brain-status file99
file: file99
replica.split-brain-status="data-split-brain:yes    metadata-split-brain:yes    Choices:test-client-2,test-client-3"

4) dir is in directory split-brain but, as mentioned earlier, the command does not apply to this kind of split-brain.

# getfattr -n replica.split-brain-status dir
file: dir
replica.split-brain-status="The file is not under data or metadata split-brain"

5) file2 is not in any kind of split-brain.

# getfattr -n replica.split-brain-status file2
file: file2
replica.split-brain-status="The file is not under data or metadata split-brain"
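The status string is easy to split into fields when it has to be processed by a script. Here is a minimal Python sketch, assuming the value always has one of the two shapes shown in the examples above; the function name parse_split_brain_status and the returned dictionary keys are illustrative.

```python
# Illustrative parser for the value of replica.split-brain-status.
# Assumption: the value looks like one of the examples in this section.
def parse_split_brain_status(value):
    if "data-split-brain:" not in value:
        # e.g. "The file is not under data or metadata split-brain"
        return {"data": False, "metadata": False, "choices": []}
    fields = dict(token.split(":", 1) for token in value.split())
    choices = fields.get("Choices", "")
    return {
        "data": fields.get("data-split-brain") == "yes",
        "metadata": fields.get("metadata-split-brain") == "yes",
        "choices": choices.split(",") if choices else [],
    }


status = parse_split_brain_status(
    "data-split-brain:yes    metadata-split-brain:no    "
    "Choices:test-client-2,test-client-3"
)
print(status)
```

The entries under "choices" are the values that replica.split-brain-choice accepts for that file.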

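Analyzing files in data and metadata split-brain

Operations on a file in split-brain from the client (such as cat or getfattr) fail with an Input/output error. To make such files analyzable, a split-brain choice can be set on the file with the setfattr command: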

# setfattr -n replica.split-brain-choice -v "choiceX" <path-to-file>

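This command selects a particular brick through which the file in split-brain is accessed.

Example:

1) file1 is in split-brain. Trying to read the file gives an Input/output error.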

# cat file1
cat: file1: Input/output error

file1 is in split-brain between test-client-2 and test-client-3.

Setting test-client-2 as the split-brain choice for file1 makes reads of the file go to b2.

# setfattr -n replica.split-brain-choice -v test-client-2 file1

Read the file:

# cat file1
xyz

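Similarly, to view the file from the other replica, set replica.split-brain-choice to test-client-3.

Inspecting the file with a wrong choice results in an error.

To undo a split-brain choice that has been set, use the same setfattr command with none as the value of the extended attribute.

Example: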

# setfattr -n replica.split-brain-choice -v none file1

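Reading the file now gives an Input/output error again, as before.

# cat file1
cat: file1: Input/output error

Once the copy to keep has been decided, set the corresponding brick as the source for the heal with the following command: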

# setfattr -n replica.split-brain-heal-finalize -v <heal-choice> <path-to-file>

Example

# setfattr -n replica.split-brain-heal-finalize -v test-client-2 file1

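The commands above can be used to heal data and metadata split-brain on any file.

Notes:

1) If the fopen-keep-cache fuse mount option is disabled, the inode must be invalidated each time before a new replica.split-brain-choice is selected to inspect the file. This can be done with the following command:

# setfattr -n inode-invalidate -v 0 <path-to-file>

2) The procedure described above for resolving split-brain from the client does not work on NFS clients, because NFS does not provide xattr support.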

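2.3 Resolving split-brain automatically

The CLI-based and client-based methods both require manual work: someone has to run the commands. The cluster.favorite-child-policy volume option, when set to one of the available policies, resolves split-brain automatically without user intervention. Its default value is none, which means it is disabled.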

# gluster volume set help | grep -A3 cluster.favorite-child-policy
Option: cluster.favorite-child-policy
Default Value: none
Description: This option can be used to automatically resolve split-brains using various policies without user intervention. "size" picks the file with the biggest size as the source. "ctime" and "mtime" pick the file with the latest ctime and mtime respectively as the source. "majority" picks a file with identical mtime and size in more than half the number of bricks in the replica.

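cluster.favorite-child-policy applies to all files of the volume. With the option enabled, there is no need to resolve each split-brain by hand as it occurs; files are healed automatically according to the configured policy.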
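To make the four policies concrete, the sketch below applies the rules from the option description to a list of per-brick stat records. It is a simplified model written for illustration, not AFR's implementation: the Copy record and pick_source function are hypothetical, ties are broken by list order, and a "majority" vote that fails simply returns None.

```python
# Simplified model of the policies named in the option description.
# Assumption: each replica copy is summarized by its size, mtime and ctime.
from collections import Counter
from typing import NamedTuple


class Copy(NamedTuple):
    brick: str
    size: int
    mtime: float
    ctime: float


def pick_source(copies, policy):
    """Return the brick chosen as heal source, or None if no choice is made."""
    if policy == "none":
        return None
    if policy == "size":
        return max(copies, key=lambda c: c.size).brick
    if policy == "mtime":
        return max(copies, key=lambda c: c.mtime).brick
    if policy == "ctime":
        return max(copies, key=lambda c: c.ctime).brick
    if policy == "majority":
        counts = Counter((c.mtime, c.size) for c in copies)
        key, votes = counts.most_common(1)[0]
        if votes > len(copies) / 2:
            return next(c.brick for c in copies if (c.mtime, c.size) == key)
        return None
    raise ValueError("unknown policy: " + policy)


copies = [
    Copy("b1", size=17, mtime=200.0, ctime=200.0),
    Copy("b2", size=13, mtime=100.0, ctime=100.0),
]
for policy in ("size", "mtime", "ctime", "majority"):
    print(policy, "->", pick_source(copies, policy))
```

With only two copies that disagree, "majority" cannot reach more than half of the bricks, which is why that policy is mainly meaningful for replica counts of three or more.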

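2.4 Best practice

1. Get the path of the file that is in split-brain:

It can be obtained in either of the following ways:

a) With the command gluster volume heal <VOLNAME> info split-brain.
b) By identifying the files for which file operations performed from the client keep failing with Input/output error.

2. Close the applications that have the file open from the client. Virtual machines must be powered off.

3. Decide on the correct copy:

Fetch and examine the changelog extended attributes with the getfattr command, and use them to determine which bricks hold a trustworthy copy of the file.

getfattr -d -m . -e hex <file-path-on-brick>

It may be that one brick holds the correct data while another brick holds the correct metadata.

4. Using the setfattr command, reset the relevant extended attributes on the bricks that hold the "bad copy" of the file's data/metadata.

5. Trigger self-heal of the file by performing a lookup from the client: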

ls -l <file-path-on-gluster-mount>

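Detailed explanation of steps 3 to 5:

To understand how to resolve split-brain, we first need to understand the changelog extended attributes.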

getfattr -d -m . -e hex <file-path-on-brick>

Example:

[root@store3 ~]# getfattr -d -e hex -m. brick-a/file.txt
#file: brick-a/file.txt
security.selinux=0x726f6f743a6f626a6563745f723a66696c655f743a733000
trusted.afr.vol-client-2=0x000000000000000000000000
trusted.afr.vol-client-3=0x000000000200000000000000
trusted.gfid=0x307a5c9efddd4e7c96e94fd4bcdcbd1b

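trusted.afr.<volname>-client-<subvolume-index>: AFR uses these extended attributes to maintain the changelog of a file. The values are computed by the glusterfs client (fuse or nfs-server) processes: when a glusterfs client modifies a file or directory, it contacts each brick and updates the changelog extended attributes according to the bricks' responses.

Example: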

[root@pranithk-laptop ~]# gluster volume info vol
 Volume Name: vol
 Type: Distributed-Replicate
 Volume ID: 4f2d7849-fbd6-40a2-b346-d13420978a01
 Status: Created
 Number of Bricks: 4 x 2 = 8
 Transport-type: tcp
 Bricks:
 brick-a: pranithk-laptop:/gfs/brick-a
 brick-b: pranithk-laptop:/gfs/brick-b
 brick-c: pranithk-laptop:/gfs/brick-c
 brick-d: pranithk-laptop:/gfs/brick-d
 brick-e: pranithk-laptop:/gfs/brick-e
 brick-f: pranithk-laptop:/gfs/brick-f
 brick-g: pranithk-laptop:/gfs/brick-g
 brick-h: pranithk-laptop:/gfs/brick-h

In the example above:

Brick             |    Replica set        |    Brick subvolume index
----------------------------------------------------------------------------
/gfs/brick-a      |       0               |       0
/gfs/brick-b      |       0               |       1
/gfs/brick-c      |       1               |       2
/gfs/brick-d      |       1               |       3
/gfs/brick-e      |       2               |       4
/gfs/brick-f      |       2               |       5
/gfs/brick-g      |       3               |       6
/gfs/brick-h      |       3               |       7
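The table follows a simple rule: bricks are numbered in the order in which gluster volume info lists them, and consecutive groups of replica-count bricks form one replica set. The Python sketch below reproduces the table under that assumption; the function names brick_layout and changelog_xattr are illustrative.

```python
# Illustrative reconstruction of the table above.
# Assumption: bricks are given in `gluster volume info` order and each
# consecutive group of `replica_count` bricks forms one replica set.
def brick_layout(bricks, replica_count):
    if len(bricks) % replica_count != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return [(brick, index // replica_count, index)
            for index, brick in enumerate(bricks)]


def changelog_xattr(volname, subvolume_index):
    """Name of the changelog xattr that refers to the given brick."""
    return "trusted.afr.%s-client-%d" % (volname, subvolume_index)


bricks = ["/gfs/brick-" + letter for letter in "abcdefgh"]
layout = brick_layout(bricks, replica_count=2)
for brick, replica_set, index in layout:
    print(brick, replica_set, index, changelog_xattr("vol", index))
```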

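Each file on a brick maintains its own changelog together with the changelogs of the copies on all the other bricks of its replica set, as seen by that brick.

In the example volume above, every file on brick-a has two entries, one for itself and one for the copy on the other brick of its replica set, brick-b:

trusted.afr.vol-client-0=0x000000000000000000000000 --> changelog for itself (brick-a)
trusted.afr.vol-client-1=0x000000000000000000000000 --> changelog for brick-b as seen by brick-a

Likewise, every file on brick-b has:

trusted.afr.vol-client-0=0x000000000000000000000000 --> changelog for brick-a as seen by brick-b
trusted.afr.vol-client-1=0x000000000000000000000000 --> changelog for itself (brick-b)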

Interpreting changelog values

Each extended attribute has a value of 24 hexadecimal digits. The first 8 digits represent the changelog of data, the next 8 digits the changelog of metadata, and the last 8 digits the changelog of directory entries.

0x 000003d7 00000001 00000000
      |        |        |
      |        |         \_ changelog of directory entries
      |         \_ changelog of metadata
       \_ changelog of data
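Splitting a value into its three counters is plain string slicing. A minimal Python sketch follows, assuming the value is given in the hex form printed by getfattr -e hex; the function name decode_changelog is illustrative.

```python
# Illustrative decoder for a trusted.afr.* changelog value.
# Assumption: the value is "0x" followed by exactly 24 hexadecimal digits.
def decode_changelog(value):
    digits = value[2:] if value.lower().startswith("0x") else value
    if len(digits) != 24:
        raise ValueError("expected 24 hex digits, got %d" % len(digits))
    data, metadata, entry = (int(digits[i:i + 8], 16) for i in (0, 8, 16))
    return {"data": data, "metadata": metadata, "entry": entry}


decoded = decode_changelog("0x000003d70000000100000000")
print(decoded)  # {'data': 983, 'metadata': 1, 'entry': 0}
```

A non-zero counter means the brick holding the attribute recorded operations of that kind as pending for the brick the attribute name refers to.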

When a file is in split-brain, its changelogs look like this:

Example: (data and metadata split-brain on the same file, two replicas)

[root@pranithk-laptop vol]# getfattr -d -m . -e hex /gfs/brick-?/a
getfattr: Removing leading '/' from absolute path names
#file: gfs/brick-a/a
trusted.afr.vol-client-0=0x000000000000000000000000
trusted.afr.vol-client-1=0x000003d70000000100000000
trusted.gfid=0x80acdbd886524f6fbefa21fc356fed57
#file: gfs/brick-b/a
trusted.afr.vol-client-0=0x000003b00000000100000000
trusted.afr.vol-client-1=0x000000000000000000000000
trusted.gfid=0x80acdbd886524f6fbefa21fc356fed57

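Analysis of the result

Changelog extended attributes on the file /gfs/brick-a/a:

The first 8 digits of trusted.afr.vol-client-0 are all zeros (0x00000000................).
The first 8 digits of trusted.afr.vol-client-1 are not all zeros (0x000003d7................).
So the changelog on /gfs/brick-a/a indicates that some data operations succeeded on itself but failed on /gfs/brick-b/a.

The middle 8 digits of trusted.afr.vol-client-0 are all zeros (0x........00000000........).
The middle 8 digits of trusted.afr.vol-client-1 are not all zeros (0x........00000001........).
So the changelog on /gfs/brick-a/a indicates that some metadata operations succeeded on itself but failed on /gfs/brick-b/a.

Changelog extended attributes on the file /gfs/brick-b/a:

The first 8 digits of trusted.afr.vol-client-0 are not all zeros (0x000003b0................).
The first 8 digits of trusted.afr.vol-client-1 are all zeros (0x00000000................).
So the changelog on /gfs/brick-b/a indicates that some data operations succeeded on itself but failed on /gfs/brick-a/a.

The middle 8 digits of trusted.afr.vol-client-0 are not all zeros (0x........00000001........).
The middle 8 digits of trusted.afr.vol-client-1 are all zeros (0x........00000000........).
So the changelog on /gfs/brick-b/a indicates that some metadata operations succeeded on itself but failed on /gfs/brick-a/a.

Both copies have data and metadata changes that were not applied to the other copy, so the file is in both data and metadata split-brain.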
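The reasoning above amounts to looking for mutual accusations: a kind of split-brain exists when brick i records pending operations of that kind for brick j and brick j records the same for brick i. The Python sketch below encodes that check for one replica set. The helper names and the nested-list input format are assumptions made for illustration.

```python
# Illustrative split-brain check for one replica set.
# changelogs[i][j] is the value of trusted.afr.<vol>-client-j as read on
# brick i (indices local to the replica set), in "0x" + 24 hex digit form.
KINDS = ("data", "metadata", "entry")


def pending(value):
    digits = value[2:]
    return tuple(int(digits[i:i + 8], 16) for i in (0, 8, 16))


def split_brain_kinds(changelogs):
    n = len(changelogs)
    found = []
    for k, kind in enumerate(KINDS):
        mutual = any(
            pending(changelogs[i][j])[k] and pending(changelogs[j][i])[k]
            for i in range(n) for j in range(i + 1, n)
        )
        if mutual:
            found.append(kind)
    return found


ZERO = "0x000000000000000000000000"
brick_a = [ZERO, "0x000003d70000000100000000"]  # as read on /gfs/brick-a/a
brick_b = ["0x000003b00000000100000000", ZERO]  # as read on /gfs/brick-b/a
kinds = split_brain_kinds([brick_a, brick_b])
print(kinds)  # ['data', 'metadata']
```

If only one side accuses the other, the accusing brick is a valid heal source and self-heal can proceed without manual intervention, so the function reports nothing for that kind.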

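Deciding on the correct copy

Use the output of the stat and getfattr commands to decide which metadata to keep, and the contents of the file to decide which data to keep.

Continuing the example above, suppose we want to keep the data of /gfs/brick-a/a and the metadata of /gfs/brick-b/a.

Resetting the relevant changelogs to resolve the split-brain:

Resolving the data split-brain:

The changelog extended attributes must be changed so that they read as if some data operations succeeded on /gfs/brick-a/a but failed on /gfs/brick-b/a. /gfs/brick-b/a should therefore not hold any changelog showing data operations that succeeded on it but failed on /gfs/brick-a/a. Reset the data part of the changelog in trusted.afr.vol-client-0 on /gfs/brick-b/a.

Resolving the metadata split-brain:

The changelog extended attributes must be changed so that they read as if some metadata operations succeeded on /gfs/brick-b/a but failed on /gfs/brick-a/a. /gfs/brick-a/a should therefore not hold any changelog showing metadata operations that succeeded on it but failed on /gfs/brick-b/a. Reset the metadata part of the changelog in trusted.afr.vol-client-1 on /gfs/brick-a/a.

After these changes the changelogs are as follows:

On /gfs/brick-b/a, trusted.afr.vol-client-0 changes
from 0x000003b00000000100000000 to 0x000000000000000100000000

The metadata part is still not all zeros. Run setfattr -n trusted.afr.vol-client-0 -v 0x000000000000000100000000 /gfs/brick-b/a

On /gfs/brick-a/a, trusted.afr.vol-client-1 changes
from 0x000003d70000000100000000 to 0x000003d70000000000000000

The data part is still not all zeros. Run setfattr -n trusted.afr.vol-client-1 -v 0x000003d70000000000000000 /gfs/brick-a/a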
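Because the value passed to setfattr must contain exactly 24 hex digits, it is safer to derive it from the current value than to type it by hand. The Python sketch below zeroes one 8-digit part and leaves the other two untouched; the function name reset_part is illustrative.

```python
# Illustrative helper: zero one part of a changelog value.
# Assumption: the value is "0x" followed by exactly 24 hexadecimal digits.
OFFSETS = {"data": 0, "metadata": 8, "entry": 16}


def reset_part(value, part):
    digits = value[2:]
    if not value.startswith("0x") or len(digits) != 24:
        raise ValueError("expected 0x followed by 24 hex digits")
    start = OFFSETS[part]
    return "0x" + digits[:start] + "0" * 8 + digits[start + 8:]


# /gfs/brick-b/a, trusted.afr.vol-client-0: drop the data accusation
new_b = reset_part("0x000003b00000000100000000", "data")
# /gfs/brick-a/a, trusted.afr.vol-client-1: drop the metadata accusation
new_a = reset_part("0x000003d70000000100000000", "metadata")
print(new_b)  # 0x000000000000000100000000
print(new_a)  # 0x000003d70000000000000000
```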

After the commands above have been run, the changelogs look like this:

[root@pranithk-laptop vol]# getfattr -d -m . -e hex /gfs/brick-?/a
getfattr: Removing leading '/' from absolute path names
#file: gfs/brick-a/a
trusted.afr.vol-client-0=0x000000000000000000000000
trusted.afr.vol-client-1=0x000003d70000000000000000
trusted.gfid=0x80acdbd886524f6fbefa21fc356fed57

#file: gfs/brick-b/a
trusted.afr.vol-client-0=0x000000000000000100000000
trusted.afr.vol-client-1=0x000000000000000000000000
trusted.gfid=0x80acdbd886524f6fbefa21fc356fed57

Run ls -l <file-path-on-gluster-mount> to trigger self-heal.

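Fixing directory split-brain

When a directory is in split-brain, AFR can conservatively merge the differing entries. If the directory storage has entries 1, 2 on one brick and entries 3, 4 on the other brick, AFR merges them so that the directory has entries 1, 2, 3, 4 on both bricks. However, if the split-brain arose because files were deleted from the directory, the merge may cause the deleted files to reappear. Split-brain resolution needs human intervention when at least one entry has the same file name but a different gfid in that directory. For example:

On brick-a the directory has 2 entries, file1 with gfid_x and file2. On brick-b the directory has 2 entries, file1 with gfid_y and file3. Here the gfid of file1 differs between the bricks. This kind of directory split-brain needs human intervention: either file1 on brick-a or file1 on brick-b must be removed to resolve it.

In addition, the corresponding gfid-link file must be removed. The gfid-link files are located in the .glusterfs directory at the top level of the brick. If the gfid of the file is 0x307a5c9efddd4e7c96e94fd4bcdcbd1b (the trusted.gfid extended attribute obtained from the getfattr command earlier), the gfid-link file can be found at /gfs/brick-a/.glusterfs/30/7a/307a5c9efddd4e7c96e94fd4bcdcbd1b.

Note

Before deleting the gfid-link, make sure that no hard links to the file remain on that brick. If hard links exist, they must be deleted as well.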

This article is adapted from the official GlusterFS documentation.
