Preface
This article assumes the pseudo-distributed hadoop + zookeeper + hbase + opentsdb stack from the previous two posts is already in place; see those posts for the background.
OpenTSDB creates four tables in HBase (tsdb, tsdb-meta, tsdb-tree, tsdb-uid). The tsdb table is the most important; for data migration, backing up and restoring this table is enough.
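A quick way to confirm that these tables exist is from the HBase shell (a minimal check, not captured output):
# Enter the HBase shell and list tables
hbase shell
list
# The four OpenTSDB tables should be in the list: tsdb, tsdb-meta, tsdb-tree, tsdb-uid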
I. Local data backup and restore
1. Backup
For this test, the local backup server is hostname hbase3, IP 192.168.0.214.
# Enter the HBase shell
hbase shell
# Snapshot the tsdb table; snapshot name: snapshot_tsdb_214
snapshot 'tsdb','snapshot_tsdb_214'
## View snapshots
# 1. From the HBase shell
list_snapshots
# 2. From HDFS: all snapshots live under /hbase/.hbase-snapshot
hdfs dfs -ls -R /hbase/.hbase-snapshot
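Since OpenTSDB also uses the tsdb-meta, tsdb-tree and tsdb-uid tables (and four tsdb-related snapshots are migrated later in this article), it can be convenient to snapshot all four at once. A minimal sketch, with snapshot names of my own choosing:
# Optional: snapshot all four OpenTSDB tables in one shell session
snapshot 'tsdb','snapshot_tsdb_214'
snapshot 'tsdb-meta','snapshot_tsdb_meta_214'
snapshot 'tsdb-tree','snapshot_tsdb_tree_214'
snapshot 'tsdb-uid','snapshot_tsdb_uid_214'
list_snapshots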
2. Restore
[1] View the original data
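To see what is in the table before dropping it, the HBase shell can be used (a minimal sketch; counts and rows will differ on your cluster):
# Row count of tsdb (can be slow on a large table)
count 'tsdb'
# Peek at a few rows
scan 'tsdb', {LIMIT => 5}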
[2] Drop the table
# Disable the table
disable 'tsdb'
# Drop the table
drop 'tsdb'
[3] Restore
# Restore from the snapshot
restore_snapshot 'snapshot_tsdb_214'
# Check that the table was recreated
list
[4] Verify
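A minimal way to verify the restore from the HBase shell (a sketch, not captured output):
# Confirm the table exists, is enabled, and has data again
exists 'tsdb'
is_enabled 'tsdb'
count 'tsdb'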
II. Data migration: from one server to another
This article migrates from hbase3 (IP 192.168.0.214) to hbase1 (IP 192.168.0.211). Both servers run the same stack and have passwordless SSH set up in both directions. The snapshot snapshot_tsdb_214 was already taken on hbase3 in part I; the next step is to move that snapshot over to hbase1.
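Before exporting, it is worth confirming that hbase3 can reach hbase1 and its HDFS (a sketch based on this article's hostnames and the 9000 port used below):
# From hbase3: passwordless SSH to hbase1, and remote HDFS reachability
ssh hbase1 hostname
hdfs dfs -ls hdfs://192.168.0.211:9000/hbase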
1. Migrate the snapshot
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot snapshot_tsdb_214 \
  -copy-from hdfs://192.168.0.214:9000/hbase \
  -copy-to hdfs://192.168.0.211:9000/hbase \
  -mappers 20 -bandwidth 1024
Note: the hdfs://192.168.0.214:9000 and hdfs://192.168.0.211:9000 prefixes in -copy-from and -copy-to (highlighted in the original post) are the fs.defaultFS values from each cluster's Hadoop core-site.xml.
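For reference, the relevant entry in core-site.xml looks like this on hbase3 (a sketch following this article's addresses; hbase1 has its own value):
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://192.168.0.214:9000</value>
</property>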
2. Fixing the error: Java heap space
[1] Error log
2019-07-11 10:09:09,875 INFO [main] snapshot.ExportSnapshot: Copy Snapshot Manifest from hdfs://192.168.0.214:9000/hbase/.hbase-snapshot/snapshot_tsdb_214 to hdfs://192.168.0.211:9000/hbase/.hbase-snapshot/.tmp/snapshot_tsdb_214
2019-07-11 10:09:10,942 INFO [main] client.RMProxy: Connecting to ResourceManager at hbase3/192.168.0.214:8032
2019-07-11 10:09:13,100 INFO [main] snapshot.ExportSnapshot: Loading Snapshot 'snapshot_tsdb_214' hfile list
2019-07-11 10:09:13,516 INFO [main] mapreduce.JobSubmitter: number of splits:5
2019-07-11 10:09:13,798 INFO [main] mapreduce.JobSubmitter: Submitting tokens for job: job_1562738985351_0004
2019-07-11 10:09:14,061 INFO [main] impl.YarnClientImpl: Submitted application application_1562738985351_0004
2019-07-11 10:09:14,116 INFO [main] mapreduce.Job: The url to track the job: http://hbase3:8088/proxy/application_1562738985351_0004/
2019-07-11 10:09:14,116 INFO [main] mapreduce.Job: Running job: job_1562738985351_0004
2019-07-11 10:09:23,331 INFO [main] mapreduce.Job: Job job_1562738985351_0004 running in uber mode : false
2019-07-11 10:09:23,333 INFO [main] mapreduce.Job:  map 0% reduce 0%
2019-07-11 10:09:34,529 INFO [main] mapreduce.Job: Task Id : attempt_1562738985351_0004_m_000000_0, Status : FAILED
Error: Java heap space
2019-07-11 10:09:40,659 INFO [main] mapreduce.Job: Task Id : attempt_1562738985351_0004_m_000001_0, Status : FAILED
Error: Java heap space
2019-07-11 10:09:44,717 INFO [main] mapreduce.Job: Task Id : attempt_1562738985351_0004_m_000002_0, Status : FAILED
Error: Java heap space
2019-07-11 10:09:44,719 INFO [main] mapreduce.Job: Task Id : attempt_1562738985351_0004_m_000003_0, Status : FAILED
Error: Java heap space
2019-07-11 10:09:45,728 INFO [main] mapreduce.Job: Task Id : attempt_1562738985351_0004_m_000004_0, Status : FAILED
Error: Java heap space
2019-07-11 10:09:52,781 INFO [main] mapreduce.Job: Task Id : attempt_1562738985351_0004_m_000000_1, Status : FAILED
Error: Java heap space
2019-07-11 10:09:57,830 INFO [main] mapreduce.Job: Task Id : attempt_1562738985351_0004_m_000001_1, Status : FAILED
Error: Java heap space
2019-07-11 10:10:03,886 INFO [main] mapreduce.Job: Task Id : attempt_1562738985351_0004_m_000002_1, Status : FAILED
Error: Java heap space
2019-07-11 10:10:03,887 INFO [main] mapreduce.Job: Task Id : attempt_1562738985351_0004_m_000003_1, Status : FAILED
Error: Java heap space
2019-07-11 10:10:05,910 INFO [main] mapreduce.Job: Task Id : attempt_1562738985351_0004_m_000004_1, Status : FAILED
Error: Java heap space
2019-07-11 10:10:10,965 INFO [main] mapreduce.Job: Task Id : attempt_1562738985351_0004_m_000000_2, Status : FAILED
Error: Java heap space
2019-07-11 10:10:16,015 INFO [main] mapreduce.Job: Task Id : attempt_1562738985351_0004_m_000001_2, Status : FAILED
Error: Java heap space
2019-07-11 10:10:21,068 INFO [main] mapreduce.Job: Task Id : attempt_1562738985351_0004_m_000002_2, Status : FAILED
Error: Java heap space
2019-07-11 10:10:23,083 INFO [main] mapreduce.Job: Task Id : attempt_1562738985351_0004_m_000003_2, Status : FAILED
Error: Java heap space
2019-07-11 10:10:24,089 INFO [main] mapreduce.Job: Task Id : attempt_1562738985351_0004_m_000004_2, Status : FAILED
Error: Java heap space
2019-07-11 10:10:30,148 INFO [main] mapreduce.Job:  map 20% reduce 0%
2019-07-11 10:10:31,156 INFO [main] mapreduce.Job:  map 100% reduce 0%
2019-07-11 10:10:31,161 INFO [main] mapreduce.Job: Job job_1562738985351_0004 failed with state FAILED due to: Task failed task_1562738985351_0004_m_000000 Job failed as tasks failed.
failedMaps:1 failedReduces:0 killedMaps:0 killedReduces: 0
2019-07-11 10:10:31,285 INFO [main] mapreduce.Job: Counters: 12
	Job Counters
		Failed map tasks=16
		Killed map tasks=4
		Launched map tasks=20
		Other local map tasks=20
		Total time spent by all maps in occupied slots (ms)=288607
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=288607
		Total vcore-milliseconds taken by all map tasks=288607
		Total megabyte-milliseconds taken by all map tasks=295533568
	Map-Reduce Framework
		CPU time spent (ms)=0
		Physical memory (bytes) snapshot=0
		Virtual memory (bytes) snapshot=0
2019-07-11 10:10:31,288 ERROR [main] snapshot.ExportSnapshot: Snapshot export failed
org.apache.hadoop.hbase.snapshot.ExportSnapshotException: Task failed task_1562738985351_0004_m_000000 Job failed as tasks failed.
failedMaps:1 failedReduces:0 killedMaps:0 killedReduces: 0
	at org.apache.hadoop.hbase.snapshot.ExportSnapshot.runCopyJob(ExportSnapshot.java:839)
	at org.apache.hadoop.hbase.snapshot.ExportSnapshot.doWork(ExportSnapshot.java:1078)
	at org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:154)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.hbase.util.AbstractHBaseTool.doStaticMain(AbstractHBaseTool.java:280)
	at org.apache.hadoop.hbase.snapshot.ExportSnapshot.main(ExportSnapshot.java:1141)
[2] Analysis
The error means the JVM heap of the map tasks is too small. Looking up the JVM defaults Hadoop uses when running MapReduce, the mapred.child.java.opts parameter in mapred-site.xml sets the JVM options for child tasks (heap size, garbage collection, and so on), and its default -Xmx is 200m. The error shows that is not enough, so the fix is to raise it.
How large should it be? According to https://blog.csdn.net/wjlwangluo/article/details/76667999, a common rule of thumb is total memory / number of concurrent tasks (= number of cores).
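For example (numbers of my own, not from the original post): with roughly 8 GB of memory available for MapReduce tasks and 8 concurrent map slots, the heuristic gives 8192 MB / 8 = 1024 MB per task, which matches the -Xmx1024m used below.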
[3] Solution
# Go to the Hadoop configuration directory
cd /opt/soft/hadoop/hadoop-3.1.2/etc/hadoop
# Edit the file
vim mapred-site.xml
# Add the following property
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
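Note that mapred.child.java.opts is the legacy property name; on Hadoop 2/3 the per-task equivalents would look like the following (a sketch, not what was used in this article):
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1024m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx1024m</value>
</property>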
[4] Verify
# Restart hadoop and hbase. Stop HBase first, and do not kill the processes:
# Hadoop is continuously writing and splitting, and stopping it with the scripts lets it record its state so the next start can carry on where it left off.
stop-hbase.sh
stop-all.sh
start-all.sh
start-hbase.sh
# Export the snapshot again
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot snapshot_tsdb_214 \
  -copy-from hdfs://192.168.0.214:9000/hbase \
  -copy-to hdfs://192.168.0.211:9000/hbase \
  -mappers 20 -bandwidth 1024
The job progress now shows success:
While the export is running, a temporary copy of the snapshot appears under the .tmp directory on hbase1.
# Check the snapshot files on HDFS
hdfs dfs -ls -R /hbase/.hbase-snapshot
Once the export finishes, the corresponding snapshot exists on hbase1.
# Enter the HBase shell
hbase shell
# List the snapshots
list_snapshots
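ExportSnapshot copies the snapshot metadata into .hbase-snapshot and the referenced HFiles into HBase's archive directory on the destination, so the data files can also be checked directly (a sketch; the exact layout depends on the HBase version):
# On hbase1: exported HFiles land under the HBase archive directory
hdfs dfs -ls -R /hbase/archive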
3. Restore the data
[1] Delete the original data on hbase1
## Here I again simply drop the tsdb table
# Disable the table
disable 'tsdb'
# Drop the table
drop 'tsdb'
[2] Restore the data
clone_snapshot 'snapshot_tsdb_214','tsdb'
[3] Verify
The tsdb table exists and contains data, but Grafana shows no data.
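When Grafana shows nothing, it helps to check whether OpenTSDB itself can see the restored data, independent of the dashboard (a sketch assuming OpenTSDB's default HTTP port 4242 on hbase1):
# Ask the TSD for known metric names; no response at all points at the TSD rather than the data
curl "http://192.168.0.211:4242/api/suggest?type=metrics&max=25"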
I stopped hadoop, zookeeper, hbase and opentsdb, deleted all logs and data on hbase1, reformatted HDFS, restarted everything, then exported the four tsdb-related snapshots from 214 again and restored them; after refreshing, the data appeared.
III. Summary
1. It later turned out that OpenTSDB itself had died. When OpenTSDB is running normally, you can go straight to step 2 above; backing up and restoring just the tsdb table is enough.
2. restore_snapshot could not be verified to work here, so for now the restore uses clone_snapshot.
3. Verified: after the snapshot has been exported to the destination server, the source server can be shut down and the restore still works.
4. Verified: copying the snapshot to the local filesystem first, then uploading it into the destination HDFS and restoring, did not work.