HBase Backup and Recovery Exercise


I. Cold Backup

1. Create a test table and insert test data

[root@weekend05 ~]# hbase shell

hbase(main):005:0> create 'scores','grade','course'

0 row(s) in 0.4940 seconds

=> Hbase::Table - scores

put 'scores','Tom','grade:','5'

hbase(main):006:0> put 'scores','Tom','course:math','97'

0 row(s) in 0.0710 seconds

hbase(main):007:0> put 'scores','Tom','course:art','87'

0 row(s) in 0.0100 seconds

hbase(main):008:0> put 'scores','Tom','course:english','80'

0 row(s) in 0.0100 seconds

hbase(main):009:0> put 'scores','Jim','grade:','4'

0 row(s) in 0.0210 seconds

hbase(main):010:0> put 'scores','Jim','course:chinese','89'

0 row(s) in 0.0410 seconds

hbase(main):011:0> put 'scores','Jim','course:english','80'

0 row(s) in 0.0090 seconds

hbase(main):012:0> scan 'scores'

ROW COLUMN+CELL

Jim column=course:chinese, timestamp=1465031482172, value=89

Jim column=course:english, timestamp=1465031493584, value=80

Jim column=grade:, timestamp=1465031470174, value=4

Tom column=course:art, timestamp=1465031443686, value=87

Tom column=course:english, timestamp=1465031459071, value=80

Tom column=course:math, timestamp=1465031419695, value=97

2 row(s) in 0.0920 seconds

2. Stop HBase

[root@weekend05 ~]# stop-hbase.sh

stopping hbase................

3. Perform the backup

[root@weekend05 ~]# hadoop distcp /hbase /hbasebackup

… (lengthy MapReduce output omitted; the job counters are shown below)

File System Counters

FILE: Number of bytes read=315885

FILE: Number of bytes written=456706

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=103581

HDFS: Number of bytes written=103581

HDFS: Number of read operations=943

HDFS: Number of large read operations=0

HDFS: Number of write operations=187

Map-Reduce Framework

Map input records=161

Map output records=22

Input split bytes=151

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=7

CPU time spent (ms)=0

Physical memory (bytes) snapshot=0

Virtual memory (bytes) snapshot=0

Total committed heap usage (bytes)=130547712

File Input Format Counters

Bytes Read=37917

File Output Format Counters

Bytes Written=2424

org.apache.hadoop.tools.mapred.CopyMapper$Counter

BYTESCOPIED=103581

BYTESEXPECTED=103581

BYTESSKIPPED=10257

COPY=139

SKIP=22
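Before restarting HBase, it is worth sanity-checking that the copy completed. A minimal check using standard HDFS commands, with the same paths as the distcp invocation above; the directory, file, and byte counts reported for the two trees should match:

[root@weekend05 ~]# hdfs dfs -count /hbase /hbasebackup

[root@weekend05 ~]# hdfs dfs -ls /hbasebackup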

4. Start HBase

[root@weekend05 ~]# start-hbase.sh

[root@weekend05 ~]# hbase shell

5. Delete the test data

hbase(main):001:0> list

TABLE

hbase_student

my_data

myns1

new_scores

nyist

scores

user

7 row(s) in 0.2270 seconds

=> ["hbase_student", "my_data", "myns1", "new_scores", "nyist", "scores", "user"]

hbase(main):002:0> disable 'scores'

0 row(s) in 1.3210 seconds

hbase(main):003:0> drop 'scores'

0 row(s) in 0.4170 seconds

6. Stop HBase

[root@weekend05 ~]# stop-hbase.sh

stopping hbase...............

7. Restore the data

[root@weekend05 ~]# hdfs dfs -mv /hbase /hbase_tmp

[root@weekend05 ~]# hadoop distcp -overwrite /hbasebackup /

… (lengthy output omitted, as before; the job counters are shown below)

File System Counters

FILE: Number of bytes read=406438

FILE: Number of bytes written=500029

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=167648

HDFS: Number of bytes written=167648

HDFS: Number of read operations=1503

HDFS: Number of large read operations=0

HDFS: Number of write operations=313

Map-Reduce Framework

Map input records=211

Map output records=0

Input split bytes=152

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=44

CPU time spent (ms)=0

Physical memory (bytes) snapshot=0

Virtual memory (bytes) snapshot=0

Total committed heap usage (bytes)=159383552

File Input Format Counters

Bytes Read=53305

File Output Format Counters

Bytes Written=8

org.apache.hadoop.tools.mapred.CopyMapper$Counter

BYTESCOPIED=167648

BYTESEXPECTED=167648

COPY=211

8. Start HBase

[root@weekend05 ~]# start-hbase.sh

[root@weekend05 ~]# hbase shell

9. Verify the test data

hbase(main):001:0> scan 'scores'

ROW COLUMN+CELL

Jim column=course:chinese, timestamp=1465031482172, value=89

Jim column=course:english, timestamp=1465031493584, value=80

Jim column=grade:, timestamp=1465031470174, value=4

Tom column=course:art, timestamp=1465031443686, value=87

Tom column=course:english, timestamp=1465031459071, value=80

Tom column=course:math, timestamp=1465031419695, value=97

2 row(s) in 0.2320 seconds
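With the restored table verified, the temporary copy made in step 7 is no longer needed. As an optional cleanup step, and only once you are satisfied with the restore:

[root@weekend05 ~]# hdfs dfs -rm -r /hbase_tmp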

II. Hot Backup

(1) Using Export and Import

1. Create a test table and insert test data

hbase(main):006:0> create 'scores','grade','course'

0 row(s) in 0.4480 seconds

=> Hbase::Table - scores

hbase(main):007:0> put 'scores','Tom','grade:','5'

0 row(s) in 0.0910 seconds

hbase(main):008:0> put 'scores','Tom','course:math','97'

0 row(s) in 0.0240 seconds

hbase(main):009:0> put 'scores','Tom','course:art','87'

0 row(s) in 0.0130 seconds

hbase(main):010:0> put 'scores','Tom','course:english','80'

0 row(s) in 0.0100 seconds

hbase(main):011:0> put 'scores','Jim','grade:','4'

0 row(s) in 0.0100 seconds

hbase(main):012:0> put 'scores','Jim','course:chinese','89'

0 row(s) in 0.0130 seconds

hbase(main):013:0> put 'scores','Jim','course:english','80'

0 row(s) in 0.0210 seconds

hbase(main):014:0> scan 'scores'

ROW COLUMN+CELL

Jim column=course:chinese, timestamp=1465032277623, value=89

Jim column=course:english, timestamp=1465032284037, value=80

Jim column=grade:, timestamp=1465032263803, value=4

Tom column=course:art, timestamp=1465032250427, value=87

Tom column=course:english, timestamp=1465032257150, value=80

Tom column=course:math, timestamp=1465032242712, value=97

Tom column=grade:, timestamp=1465032224863, value=5

2 row(s) in 0.0320 seconds

2. Export the data

a. Export to /tmp/scores (see the caveat after the job output below)

[root@weekend05 ~]# hbase org.apache.hadoop.hbase.mapreduce.Export scores /tmp/scores

… (lengthy output omitted; the job counters are shown below)

File System Counters

FILE: Number of bytes read=3658123

FILE: Number of bytes written=24070263

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=19940317

HDFS: Number of bytes written=385

HDFS: Number of read operations=138

HDFS: Number of large read operations=0

HDFS: Number of write operations=3

Map-Reduce Framework

Map input records=2

Map output records=2

Input split bytes=64

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=8

CPU time spent (ms)=0

Physical memory (bytes) snapshot=0

Virtual memory (bytes) snapshot=0

Total committed heap usage (bytes)=29687808

File Input Format Counters

Bytes Read=0

File Output Format Counters

Bytes Written=385
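A caveat on the path used above: without a filesystem scheme, /tmp/scores is resolved against fs.defaultFS, which on a typical cluster means the SequenceFiles land in HDFS rather than in the local Linux /tmp. If you genuinely want them on local disk, you can pass an explicit file:// URI; a sketch (the path scores_local is just an example, and this is only sensible on a single-node setup, since each mapper writes to its own local filesystem):

[root@weekend05 ~]# hbase org.apache.hadoop.hbase.mapreduce.Export scores file:///tmp/scores_local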

b. Export to an HDFS directory via an explicit URI

[root@weekend05 ~]# hbase org.apache.hadoop.hbase.mapreduce.Export scores hdfs://master:9000/myback_scores

… (lengthy output omitted; the job counters are shown below)

File System Counters

FILE: Number of bytes read=3658123

FILE: Number of bytes written=24067137

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=19940317

HDFS: Number of bytes written=385

HDFS: Number of read operations=138

HDFS: Number of large read operations=0

HDFS: Number of write operations=3

Map-Reduce Framework

Map input records=2

Map output records=2

Input split bytes=64

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=8

CPU time spent (ms)=0

Physical memory (bytes) snapshot=0

Virtual memory (bytes) snapshot=0

Total committed heap usage (bytes)=29687808

File Input Format Counters

Bytes Read=0

File Output Format Counters

Bytes Written=385

3. Drop the table

hbase(main):015:0> disable 'scores'

0 row(s) in 1.2200 seconds

hbase(main):016:0> drop 'scores'

0 row(s) in 0.1700 seconds

4. Restore the data

First recreate the table schema in HBase:

hbase(main):017:0> create 'scores','grade','course'

0 row(s) in 0.4450 seconds

=> Hbase::Table - scores

Import the data (two options, matching the two export targets above):

1) [root@weekend05 ~]# hbase org.apache.hadoop.hbase.mapreduce.Import scores /tmp/scores

2) [root@weekend05 ~]# hbase org.apache.hadoop.hbase.mapreduce.Import scores hdfs://master:9000/myback_scores

5. Verify the data

hbase(main):016:0> scan 'scores'

ROW COLUMN+CELL

Jim column=course:chinese, timestamp=1465032277623, value=89

Jim column=course:english, timestamp=1465032284037, value=80

Jim column=grade:, timestamp=1465032263803, value=4

Tom column=course:art, timestamp=1465032250427, value=87

Tom column=course:english, timestamp=1465032257150, value=80

Tom column=course:math, timestamp=1465032242712, value=97

Tom column=grade:, timestamp=1465032224863, value=5

2 row(s) in 0.0320 seconds
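Besides scanning, you can cross-check an import with the built-in RowCounter MapReduce job; for this data set, its ROWS counter should come out to 2:

[root@weekend05 ~]# hbase org.apache.hadoop.hbase.mapreduce.RowCounter scores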

(2) Using CopyTable to back up to another table in the same cluster

Create the target table schema in HBase:

hbase(main):020:0> create 'new_scores','grade','course'

0 row(s) in 1.2730 seconds

Run CopyTable to perform the backup:

[root@weekend05 ~]# hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=new_scores scores

Check the data:

hbase(main):025:0> scan 'new_scores'

ROW COLUMN+CELL

Jim column=course:chinese, timestamp=1464608884979, value=89

Jim column=course:english, timestamp=1464608885088, value=80

Jim column=grade:, timestamp=1464608884637, value=4

Tom column=course:art, timestamp=1464608875482, value=87

Tom column=course:english, timestamp=1464608875733, value=80

Tom column=course:math, timestamp=1464608875256, value=97

Tom column=grade:, timestamp=1464608875010, value=5

2 row(s) in 0.1120 seconds

Notes and summary:

HBase is a distributed data store built on the LSM tree (log-structured merge tree), and it relies on intricate internal machinery to provide accuracy, consistency, multi-versioning, and so on. So how do you obtain a consistent backup of the many HFiles and WALs (write-ahead logs) that dozens of region servers keep in HDFS and in memory?

Snapshots

The HBase snapshot feature is rich in capabilities, and taking a snapshot does not require shutting down the cluster. Snapshots are covered in more detail in the article "Introduction to Apache HBase Snapshots".

A snapshot captures a moment-in-time view of your HBase table by creating, in HDFS, the equivalent of UNIX hard links to the table's storage files (Figure 1). A snapshot completes within seconds, has almost no performance impact on the cluster, and occupies a negligible amount of space: aside from a tiny amount of directory data kept in metadata files, your data is not duplicated. A snapshot lets you roll the table back to the moment the snapshot was taken, once you restore it. Create a snapshot of a table by running the following command in the HBase shell:

hbase(main):001:0> snapshot 'myTable', 'MySnapShot'

After running this command, you will find a few small data files in HDFS, under /hbase/.snapshot/myTable (CDH4) or hbase/.hbase-snapshots (Apache 0.94.6.1); these files hold the snapshot information. To restore the data, simply run the following commands in the shell:

hbase(main):002:0> disable 'myTable'

hbase(main):003:0> restore_snapshot 'MySnapShot'

hbase(main):004:0> enable 'myTable'
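A related option: instead of restoring in place, a snapshot can be cloned into a brand-new table and inspected there; a sketch (the table name myTable_restored is just an example):

hbase(main):005:0> clone_snapshot 'MySnapShot', 'myTable_restored'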

As you can see, restoring a snapshot requires taking the table offline, and once a snapshot is restored, any additions or updates made after the snapshot time are lost. If your business requires an off-site backup of the data, you can use the ExportSnapshot command to copy a table's data to your local HDFS or to a remote HDFS of your choice.
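For that off-site copy, HBase ships the ExportSnapshot tool, which copies a snapshot's files with a MapReduce job, reading the HFiles directly rather than going through the region servers. A sketch, where hdfs://backupmaster:9000/hbase is a hypothetical remote HBase root directory:

[root@weekend05 ~]# hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot 'MySnapShot' -copy-to hdfs://backupmaster:9000/hbase -mappers 4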

HBase Replication

HBase replication is another fairly low-overhead backup tool. Replication comes in three modes: master->slave, master<->master, and cyclic. This approach gives you the flexibility to ingest data in any data center and ensures it is replicated to all the other data centers. In the event of a catastrophic failure in one data center, client applications can use DNS tools to redirect to an alternate location. Replication is a robust, fault-tolerant process; it provides "eventual consistency", meaning that at any given moment the most recent edits to a table may not yet be visible in all of its replicas, but all replicas are guaranteed to converge eventually.

Note: for an existing table, you must first copy the source table to the destination manually, using one of the other methods described in this article; replication only takes effect for new writes/edits made after it is enabled.
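For reference, turning replication on amounts to registering the peer cluster's ZooKeeper quorum and marking column families for replication. A minimal sketch from the HBase shell (the quorum address is hypothetical; depending on the version, the table may need to be disabled before the alter, and older releases additionally require hbase.replication=true in hbase-site.xml):

hbase(main):001:0> add_peer '1', 'backupzk1,backupzk2,backupzk3:2181:/hbase'

hbase(main):002:0> alter 'scores', {NAME => 'course', REPLICATION_SCOPE => '1'}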

Export

HBase's Export tool is a built-in utility that makes it easy to dump a table's data to SequenceFiles in an HDFS directory. It creates a MapReduce job that makes a series of HBase API calls against the cluster, fetching each row of the specified table and writing the data to the specified HDFS directory. The tool is performance-intensive for the cluster, since it drives both MapReduce and the HBase client API, but it is feature-rich: it supports restricting the export to specific versions or date ranges and filtering the data, which makes incremental backups practical.

A simple example of the export command:

hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir>
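The version and time-range support mentioned above takes the form of optional trailing arguments (max versions, then start and end timestamps in milliseconds), which is what makes incremental exports possible. A sketch with placeholder values:

hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> 1 1465000000000 1465100000000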

Once your table is exported, you can copy the resulting data files anywhere you like (such as off-site or offline cluster storage). You can also specify a remote HDFS cluster/directory as the command's output-directory argument, in which case the data is exported directly to the remote cluster. This method relies on the network, so make sure the connection to the remote cluster is reliable and fast.

CopyTable

CopyTable is described in detail in the article "Online HBase Backups with CopyTable", but the basics are summarized here. Like Export, CopyTable uses the HBase API to create a MapReduce job that reads from the source table. The difference is that CopyTable's output is another HBase table, which can be in the local cluster or in a remote one.

A simple example:

hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=testCopy test

This command copies the table named test to another table in the cluster, testCopy.
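CopyTable is not limited to the local cluster: the --peer.adr option points the job at another cluster's ZooKeeper quorum, and --starttime/--endtime bound the copy for incremental use. A sketch with a hypothetical remote quorum:

hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=backupzk:2181:/hbase --new.name=testCopy test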

Note that there is a significant performance overhead here: CopyTable writes to the destination table row by row with individual puts. If your table is very large, CopyTable can fill the memstores on the destination region servers, forcing flushes that eventually lead to compactions, extra garbage collection, and so on.

You must also account for the performance impact of running a MapReduce job on top of HBase; for large data sets, this approach may not be ideal.

