HBase快照遷移數據失敗原因及解決辦法


目錄

目錄 1

1. 背景 1

2. 環境 1

3. 執行語句 1

4. 問題描述 1

5. 錯誤信息 2

6. 問題原因 3

7. 解決辦法 4

 

1. 背景

機房裁撤,需將源HBase集群的數據遷移到目標HBase集群,采用快照遷移方式

2. 環境

Hadoop-3.1.2 + HBase-2.2.1

3. 執行語句

time hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -overwrite -snapshot test.snapshot -copy-from hdfs://192.168.32.30/hbase -copy-to hdfs://192.168.31.30/hbase -mappers 10 -bandwidth 30

4. 問題描述

遷移小表(耗時幾分鍾內)時沒有遇到過錯誤,但遷移大表(耗時超過30分鍾)時,一直報Can't find hfile”。網上有關該問題的內容稀少,其中只有一篇文章提到解決辦法,但按文章的方法並未能解決問題,而且從錯誤信息來看,並不是該文章所說的內存配置過小。

5. 錯誤信息

2019-10-18 19:57:27,261 INFO  [main] snapshot.ExportSnapshot: Finalize the Snapshot Export

2019-10-18 19:57:27,272 INFO  [main] snapshot.ExportSnapshot: Verify snapshot integrity

2019-10-18 19:57:27,323 ERROR [VerifySnapshot-pool1-t7] snapshot.SnapshotReferenceUtil: Can't find hfile: 643c8e0f85e5487982241077ae245f34 in the real (hdfs://192.168.31.30/hbase/data/test/135c6968cf1923ecde60afa8917354bb/cf1/643c8e0f85e5487982241077ae245f34) or archive (hdfs://192.168.31.30/hbase/archive/data/test/135c6968cf1923ecde60afa8917354bb/cf1/643c8e0f85e5487982241077ae245f34) directory for the primary table.

2019-10-18 19:57:27,325 ERROR [main] snapshot.ExportSnapshot: Snapshot export failed

org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Can't find hfile: 643c8e0f85e5487982241077ae245f34 in the real (hdfs://192.168.31.30/hbase/data/test/135c6968cf1923ecde60afa8917354bb/cf1/643c8e0f85e5487982241077ae245f34) or archive (hdfs://192.168.31.30/hbase/archive/data/test/135c6968cf1923ecde60afa8917354bb/cf1/643c8e0f85e5487982241077ae245f34) directory for the primary table.

        at org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.concurrentVisitReferencedFiles(SnapshotReferenceUtil.java:238)

        at org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.verifySnapshot(SnapshotReferenceUtil.java:197)

        at org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.verifySnapshot(SnapshotReferenceUtil.java:181)

        at org.apache.hadoop.hbase.snapshot.ExportSnapshot.verifySnapshot(ExportSnapshot.java:825)

        at org.apache.hadoop.hbase.snapshot.ExportSnapshot.run(ExportSnapshot.java:1043)

        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

        at org.apache.hadoop.hbase.snapshot.ExportSnapshot.innerMain(ExportSnapshot.java:1102)

        at org.apache.hadoop.hbase.snapshot.ExportSnapshot.main(ExportSnapshot.java:1106)

2019-10-18 19:57:27,328 ERROR [VerifySnapshot-pool1-t8] snapshot.SnapshotReferenceUtil: Can't find hfile: 26f079c7a824495fbfd53e19bb17b61b in the real (hdfs://192.168.31.30/hbase/data/test/0efcc556e8e6d9591db883f28377d94b/cf1/26f079c7a824495fbfd53e19bb17b61b) or archive (hdfs://192.168.31.30/hbase/archive/data/test/0efcc556e8e6d9591db883f28377d94b/cf1/26f079c7a824495fbfd53e19bb17b61b) directory for the primary table.

2019-10-18 19:57:27,326 ERROR [VerifySnapshot-pool1-t1] snapshot.SnapshotReferenceUtil: Can't find hfile: b99bf4640cea4b61b236f254523bb411 in the real (hdfs://192.168.31.30/hbase/data/test/01df524951ebfb58fd7997a661849f0c/cf1/b99bf4640cea4b61b236f254523bb411) or archive (hdfs://192.168.31.30/hbase/archive/data/test/01df524951ebfb58fd7997a661849f0c/cf1/b99bf4640cea4b61b236f254523bb411) directory for the primary table.

2019-10-18 19:57:27,326 ERROR [VerifySnapshot-pool1-t6] snapshot.SnapshotReferenceUtil: Can't find hfile: 0c078354218b40f989b5512e87a1d40d in the real (hdfs://192.168.31.30/hbase/data/test/0a3b62dfd3c602a34e2b1a4b5829f37a/cf1/0c078354218b40f989b5512e87a1d40d) or archive (hdfs://192.168.31.30/hbase/archive/data/test/0a3b62dfd3c602a34e2b1a4b5829f37a/cf1/0c078354218b40f989b5512e87a1d40d) directory for the primary table.

2019-10-18 19:57:27,326 ERROR [VerifySnapshot-pool1-t5] snapshot.SnapshotReferenceUtil: Can't find hfile: f539907e9608424a8403d298fddf4570 in the real (hdfs://192.168.31.30/hbase/data/test/12c92f19120021d17d531f036e3d3eac/cf1/f539907e9608424a8403d298fddf4570) or archive (hdfs://192.168.31.30/hbase/archive/data/test/12c92f19120021d17d531f036e3d3eac/cf1/f539907e9608424a8403d298fddf4570) directory for the primary table.

2019-10-18 19:57:27,326 ERROR [VerifySnapshot-pool1-t4] snapshot.SnapshotReferenceUtil: Can't find hfile: 9e7dc776fc204c2688d46e79f7810b01 in the real (hdfs://192.168.31.30/hbase/data/test/01927a47aa1ea2a1a660af6c71f19c35/cf1/9e7dc776fc204c2688d46e79f7810b01) or archive (hdfs://192.168.31.30/hbase/archive/data/test/01927a47aa1ea2a1a660af6c71f19c35/cf1/9e7dc776fc204c2688d46e79f7810b01) directory for the primary table.

2019-10-18 19:57:27,326 ERROR [VerifySnapshot-pool1-t2] snapshot.SnapshotReferenceUtil: Can't find hfile: 807e7447315344e3b4bffbd3d15f1b2a in the real (hdfs://192.168.31.30/hbase/data/test/0209bf6c89327d5b0fb07782188f73fb/cf1/807e7447315344e3b4bffbd3d15f1b2a) or archive (hdfs://192.168.31.30/hbase/archive/data/test/0209bf6c89327d5b0fb07782188f73fb/cf1/807e7447315344e3b4bffbd3d15f1b2a) directory for the primary table.

2019-10-18 19:57:27,336 ERROR [VerifySnapshot-pool1-t7] snapshot.SnapshotReferenceUtil: Can't find hfile: ec45664b691941fe84502783b88f6ed7 in the real (hdfs://192.168.31.30/hbase/data/test/02f45bdfc56d563ee4ffa924c02dcb10/cf1/ec45664b691941fe84502783b88f6ed7) or archive (hdfs://192.168.31.30/hbase/archive/data/test/02f45bdfc56d563ee4ffa924c02dcb10/cf1/ec45664b691941fe84502783b88f6ed7) directory for the primary table.

2019-10-18 19:57:27,338 ERROR [VerifySnapshot-pool1-t3] snapshot.SnapshotReferenceUtil: Can't find hfile: 0c6a05097eb743ea8366b74cdfd3df8e in the real (hdfs://192.168.31.30/hbase/data/test/092dba98be14004efdca23e1d0ecd157/cf1/0c6a05097eb743ea8366b74cdfd3df8e) or archive (hdfs://192.168.31.30/hbase/archive/data/test/092dba98be14004efdca23e1d0ecd157/cf1/0c6a05097eb743ea8366b74cdfd3df8e) directory for the primary table.

6. 問題原因

該問題的原因是從源集群復制過來的文件在目標集群上不存在,檢查目標集群,可發現目標集群的NameNode上有出現未找到的文件,也就是說文件原來是存在的,但過程中又被刪除了

7. 解決辦法

在快照未建立之前,HBase會定期清理archive目錄下的數據。實測也正是如此,可將“org.apache.hadoop.hbase.master.cleaner.CleanerChore”的DEBUG日志打開,以觀察文件被刪除痕跡(修改HBaselog4j.properties)。

log4j.logger.org.apache.hadoop.hbase.master.cleaner.CleanerChore=DEBUG

 

CleanerChore線程清理archive目錄是通過配置項hbase.master.hfilecleaner.ttl控制的,默認是5分鍾(單位:毫秒),大表的文件遷移遠超5分鍾。將hbase.master.hfilecleaner.ttl調到兩小時的足夠大值后,問題消失。

 


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM