HBase: Data Migration with Snapshots (Part 1)

1. Goal

Migrate five tables from the test environment to the production environment, where their names change (they move to a new namespace). The tables must remain accessible through Phoenix.

2. How It Works

2.1 Required configuration in hbase-site.xml

<property>
  <name>hbase.snapshot.enabled</name>
  <value>true</value>
</property>

2.2 Procedure

The relevant commands:

list_snapshots    list existing snapshots
delete_snapshot   delete a snapshot
clone_snapshot    clone a snapshot into a new table, reusing the data files the snapshot references
restore_snapshot  restore a snapshot, replacing the current table's schema and data
snapshot          take a snapshot of a table

The usual single-cluster procedure is as follows (a scripted version follows the steps):

First disable the old table:
disable 'temp:old_table'
Then snapshot it:
snapshot 'temp:old_table', 'tableSnapshot'
Drop the old table:
drop 'temp:old_table'
Restore it from the snapshot:
restore_snapshot 'tableSnapshot'
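
These four steps can also be run non-interactively; a minimal sketch, assuming the hbase client is on the PATH and using the same table and snapshot names as above:

hbase shell <<'EOF'
disable 'temp:old_table'
snapshot 'temp:old_table', 'tableSnapshot'
drop 'temp:old_table'
# restore_snapshot recreates the table from the snapshot
restore_snapshot 'tableSnapshot'
EOF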

How about across clusters? Migrate the snapshot data from the old cluster to the new one:
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot <snapshot-name> -copy-to hdfs://new-nn/hbase -copy-from hdfs://old-nn/hbase
Here new-nn and old-nn are the active NameNodes of the two clusters; if HDFS is not on its default port, append the port as well.
-copy-to is the HBase root directory of the destination cluster, and -copy-from is the HBase root directory of the source cluster.
All nodes of the old and new clusters must be able to reach each other, because under the hood this command runs a MapReduce job.

The next step is to run restore_snapshot on the snapshot files copied over from the remote cluster, then create Phoenix tables mapped onto them; after that, the data can be queried through Phoenix.

However, restore_snapshot overwrites the target table and can therefore lose data, so it is generally safer to clone_snapshot into a new table and merge the data yourself.

3. Steps

A quick demonstration of the snapshot commands:

hbase(main):020:0> snapshot 'TEST','TEST_snapshot'          # snapshot table TEST
hbase(main):021:0> list_snapshots                           # list current snapshots
SNAPSHOT         TABLE + CREATION TIME
 TEST_snapshot   TEST (2021-07-30 15:13:47 +0800)
hbase(main):022:0> clone_snapshot 'TEST_snapshot','TESTAA'  # clone into new table TESTAA (the table must not exist yet, or an error is raised)
hbase(main):023:0> scan 'TESTAA'                            # check the data
ROW   COLUMN+CELL
 0    column=INFO:BROWER, timestamp=1627458022126, value=ie
 0    column=INFO:DATA, timestamp=1627458022126, value=20190520164020
 0    column=INFO:IP, timestamp=1627626495290, value=192.168.168.170
 0    column=INFO:_0, timestamp=1627626495290, value=x
 1    column=INFO:BROWER, timestamp=1627458022126, value=chorm
 1    column=INFO:DATA, timestamp=1627458022126, value=20190520164020
 1    column=INFO:IP, timestamp=1627458022126, value=139.203.75.112
 1    column=INFO:_0, timestamp=1627458022126, value=
 2    column=INFO:BROWER, timestamp=1627458022126, value=chorm
hbase(main):024:0> disable "TEST"                           # take TEST offline before dropping it
hbase(main):025:0> drop "TEST"

(Dropping the table is not mandatory: simply disabling it and then restoring also works, but the table must be enabled again afterwards. An experiment confirmed that the restore overwrites the existing data.)

hbase(main):029:0> restore_snapshot 'TEST_snapshot'
hbase(main):031:0> scan 'TEST'
ROW                                        COLUMN+CELL                                                                                                              
0                                         column=INFO:BROWER, timestamp=1627458022126, value=ie                                                                    
0                                         column=INFO:DATA, timestamp=1627458022126, value=20190520164020                                                          
0                                         column=INFO:IP, timestamp=1627626495290, value=192.168.168.170                                                           
0                                         column=INFO:_0, timestamp=1627626495290, value=x                                                                         
1                                         column=INFO:BROWER, timestamp=1627458022126, value=chorm                                                                 
1                                         column=INFO:DATA, timestamp=1627458022126, value=20190520164020                                                          
1                                         column=INFO:IP, timestamp=1627458022126, value=139.203.75.112                                                            
1                                         column=INFO:_0, timestamp=1627458022126, value=                                                                          
2                                         column=INFO:BROWER, timestamp=1627458022126, value=chorm    

With the warm-up done, on to the actual migration.

[root@master bin]# ./hbase shell
hbase(main):001:0> list

In the HBase shell, run:

snapshot "ECOMMERCE_ANALYSIS_SHOW:STORE_DSR_HISTORY","STORE_DSR_HISTORY_snapshot"
snapshot "ECOMMERCE_ANALYSIS_SHOW:STORE_EVALUATION_HISTORY","STORE_EVALUATION_HISTORY_snapshot"
snapshot "ECOMMERCE_ANALYSIS_SHOW:STORE_INFO_HISTORY","STORE_INFO_HISTORY_snapshot"
snapshot "ECOMMERCE_ANALYSIS_SHOW:STORE_ITEMS_HISTORY","STORE_ITEMS_HISTORY_snapshot"
snapshot "ECOMMERCE_ANALYSIS_SHOW:STORE_SKU_HISTORY","STORE_SKU_HISTORY_snapshot"

hbase(main):007:0> list_snapshots
SNAPSHOT                             TABLE + CREATION TIME
 STORE_DSR_HISTORY_snapshot          ECOMMERCE_ANALYSIS_SHOW:STORE_DSR_HISTORY (2022-01-18 11:17:26 +0800)
 STORE_EVALUATION_HISTORY_snapshot   ECOMMERCE_ANALYSIS_SHOW:STORE_EVALUATION_HISTORY (2022-01-18 11:19:02 +0800)
 STORE_INFO_HISTORY_snapshot         ECOMMERCE_ANALYSIS_SHOW:STORE_INFO_HISTORY (2022-01-18 11:19:02 +0800)
 STORE_ITEMS_HISTORY_snapshot        ECOMMERCE_ANALYSIS_SHOW:STORE_ITEMS_HISTORY (2022-01-18 11:19:03 +0800)
 STORE_SKU_HISTORY_snapshot          ECOMMERCE_ANALYSIS_SHOW:STORE_SKU_HISTORY (2022-01-18 11:19:05 +0800)
5 row(s)
Took 0.0437 seconds

Copy the snapshots across clusters

cd /usr/local/hbase
./hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -Dmapreduce.map.memory.mb=4096 -Dmapreduce.map.java.opts=-Xmx3686m -overwrite -snapshot STORE_DSR_HISTORY_snapshot -copy-from hdfs://192.168.88.126:9820/hbase -copy-to hdfs://192.168.188.10:9820/hbase -mappers 10 -bandwidth 30
./hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -Dmapreduce.map.memory.mb=4096 -Dmapreduce.map.java.opts=-Xmx3686m -overwrite -snapshot STORE_EVALUATION_HISTORY_snapshot -copy-from hdfs://192.168.88.126:9820/hbase -copy-to hdfs://192.168.188.10:9820/hbase -mappers 10 -bandwidth 30
./hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -Dmapreduce.map.memory.mb=4096 -Dmapreduce.map.java.opts=-Xmx3686m -overwrite -snapshot STORE_INFO_HISTORY_snapshot -copy-from hdfs://192.168.88.126:9820/hbase -copy-to hdfs://192.168.188.10:9820/hbase -mappers 10 -bandwidth 30
./hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -Dmapreduce.map.memory.mb=4096 -Dmapreduce.map.java.opts=-Xmx3686m -overwrite -snapshot STORE_ITEMS_HISTORY_snapshot -copy-from hdfs://192.168.88.126:9820/hbase -copy-to hdfs://192.168.188.10:9820/hbase -mappers 10 -bandwidth 30
./hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -Dmapreduce.map.memory.mb=4096 -Dmapreduce.map.java.opts=-Xmx3686m -overwrite -snapshot STORE_SKU_HISTORY_snapshot -copy-from hdfs://192.168.88.126:9820/hbase -copy-to hdfs://192.168.188.10:9820/hbase -mappers 10 -bandwidth 30
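
Since the five invocations differ only in the snapshot name, they can be driven by a small shell loop; a minimal sketch, using the same installation directory, addresses, and settings as the commands above:

#!/usr/bin/env bash
# Export each snapshot with identical MR memory and bandwidth settings.
cd /usr/local/hbase
for s in STORE_DSR_HISTORY_snapshot STORE_EVALUATION_HISTORY_snapshot \
         STORE_INFO_HISTORY_snapshot STORE_ITEMS_HISTORY_snapshot \
         STORE_SKU_HISTORY_snapshot; do
  ./hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -Dmapreduce.map.memory.mb=4096 -Dmapreduce.map.java.opts=-Xmx3686m \
    -overwrite -snapshot "$s" \
    -copy-from hdfs://192.168.88.126:9820/hbase \
    -copy-to hdfs://192.168.188.10:9820/hbase \
    -mappers 10 -bandwidth 30 \
    || { echo "export of $s failed" >&2; exit 1; }  # stop on the first failure
done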

Verify the copied snapshots

Check in the HBase shell on the new environment, or look directly in HDFS; the snapshot directory is /hbase/.hbase-snapshot.

hbase(main):007:0> list_snapshots
SNAPSHOT                                    TABLE + CREATION TIME                                                                                                          
STORE_DSR_HISTORY_snapshot                 ECOMMERCE_ANALYSIS_SHOW:STORE_DSR_HISTORY (2022-01-18 11:17:26 +0800)                                                          
STORE_EVALUATION_HISTORY_snapshot          ECOMMERCE_ANALYSIS_SHOW:STORE_EVALUATION_HISTORY (2022-01-18 11:19:02 +0800)                                                   
STORE_INFO_HISTORY_snapshot                ECOMMERCE_ANALYSIS_SHOW:STORE_INFO_HISTORY (2022-01-18 11:19:02 +0800)                                                         
STORE_ITEMS_HISTORY_snapshot               ECOMMERCE_ANALYSIS_SHOW:STORE_ITEMS_HISTORY (2022-01-18 11:19:03 +0800)                                                        
STORE_SKU_HISTORY_snapshot                 ECOMMERCE_ANALYSIS_SHOW:STORE_SKU_HISTORY (2022-01-18 11:19:05 +0800)                                                          
5 row(s)
Took 0.0437 seconds
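
The HDFS-side check is a one-liner (assuming the default snapshot directory mentioned above):

hdfs dfs -ls /hbase/.hbase-snapshot    # each exported snapshot appears as its own subdirectory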

On the new cluster a new namespace is created, different from the original, so clone_snapshot is used here.

Create the namespace:
create_namespace "ECOMMERCE_ANALYSIS_LDH"

Clone each snapshot into a table:
clone_snapshot 'STORE_DSR_HISTORY_snapshot',"ECOMMERCE_ANALYSIS_LDH:STORE_DSR_HISTORY"
clone_snapshot 'STORE_EVALUATION_HISTORY_snapshot',"ECOMMERCE_ANALYSIS_LDH:STORE_EVALUATION_HISTORY"
clone_snapshot 'STORE_INFO_HISTORY_snapshot',"ECOMMERCE_ANALYSIS_LDH:STORE_INFO_HISTORY"
clone_snapshot 'STORE_ITEMS_HISTORY_snapshot',"ECOMMERCE_ANALYSIS_LDH:STORE_ITEMS_HISTORY"
clone_snapshot 'STORE_SKU_HISTORY_snapshot',"ECOMMERCE_ANALYSIS_LDH:STORE_SKU_HISTORY"
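
Before mapping with Phoenix, a quick sanity check in the HBase shell confirms that the clones contain data; a sketch against the first table:

scan 'ECOMMERCE_ANALYSIS_LDH:STORE_DSR_HISTORY', {LIMIT => 1}    # should print one row and its cells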

Open Phoenix and create the tables mapped onto the cloned HBase tables.

CREATE TABLE "ECOMMERCE_ANALYSIS_LDH"."STORE_DSR_HISTORY" ("ID" BIGINT NOT NULL,"SHOP_ID" BIGINT,"DATA_JSON" VARCHAR,"START_DATE" DATE,"END_DATE" DATE,"CREATE_DATE" DATE CONSTRAINT PK PRIMARY KEY (ID)) column_encoded_bytes=0;
CREATE TABLE "ECOMMERCE_ANALYSIS_LDH"."STORE_EVALUATION_HISTORY" ("ID" BIGINT NOT NULL,"SHOP_ID" BIGINT,"ITEM_ID" BIGINT,"DATA_JSON" VARCHAR,"CREATE_DATE" DATE CONSTRAINT PK PRIMARY KEY (ID)) column_encoded_bytes=0;
CREATE TABLE "ECOMMERCE_ANALYSIS_LDH"."STORE_INFO_HISTORY" ("ID" BIGINT NOT NULL,"SHOP_ID" BIGINT,"DATA_JSON" VARCHAR,"CREATE_DATE" DATE CONSTRAINT PK PRIMARY KEY (ID)) column_encoded_bytes=0;
CREATE TABLE "ECOMMERCE_ANALYSIS_LDH"."STORE_ITEMS_HISTORY" ("ID" BIGINT NOT NULL,"SHOP_ID" BIGINT,"DATA_JSON" VARCHAR,"CREATE_DATE" DATE,"END_DATE" DATE CONSTRAINT PK PRIMARY KEY (ID)) column_encoded_bytes=0;
CREATE TABLE "ECOMMERCE_ANALYSIS_LDH"."STORE_SKU_HISTORY" ("ID" BIGINT NOT NULL,"SHOP_ID" BIGINT,"ITEM_ID" BIGINT,"DATA_JSON" VARCHAR,"CREATE_DATE" DATE,"END_DATE" DATE,"START_DATE" DATE CONSTRAINT PK PRIMARY KEY (ID)) column_encoded_bytes=0;

 

4. Results

Check the data:

select * from "ECOMMERCE_ANALYSIS_LDH"."STORE_DSR_HISTORY" limit 1;
select * from "ECOMMERCE_ANALYSIS_LDH"."STORE_EVALUATION_HISTORY" limit 1;
select * from "ECOMMERCE_ANALYSIS_LDH"."STORE_INFO_HISTORY" limit 1;
select * from "ECOMMERCE_ANALYSIS_LDH"."STORE_ITEMS_HISTORY" limit 1;
select * from "ECOMMERCE_ANALYSIS_LDH"."STORE_SKU_HISTORY" limit 1;

5. Issues and Notes

(1) Querying through Phoenix

With select *, some columns may not show up. This is only a display issue in the Phoenix terminal, not missing data, so there is no need to worry; listing the columns explicitly brings them all back:

0: jdbc:phoenix:zk1> select ID,SHOP_ID,ITEM_ID,CREATE_DATE,END_DATE,START_DATE from "ECOMMERCE_ANALYSIS_LDH"."STORE_SKU_HISTORY" limit 1;
+---------------------+-----------+--------------+--------------------------+--------------------------+--------------------------+
|         ID          |  SHOP_ID  |   ITEM_ID    |       CREATE_DATE        |         END_DATE         |        START_DATE        |
+---------------------+-----------+--------------+--------------------------+--------------------------+--------------------------+
| 374237425255313408  | 63694215  | 25115592332  | 2021-10-29 00:00:00.000  | 2021-10-28 00:00:00.000  | 2021-09-29 00:00:00.000  |
+---------------------+-----------+--------------+--------------------------+--------------------------+--------------------------+
1 row selected (0.044 seconds)

(2) After mapping the Phoenix table, the imported data is not visible, although newly inserted rows are.

Phoenix versions after 4.x enable column encoding by default, so mappings onto existing HBase tables need the option column_encoded_bytes=0.

 

1. If the table is only queried, it is strongly recommended to map it as a Phoenix view; dropping a view does not affect the underlying HBase data. The syntax:
create view if not exists TEST(ID varchar primary key, INFO.DATA varchar, INFO.IP varchar, INFO.BROWER varchar) column_encoded_bytes=0;

2. If a table mapping is required, the column-mapping rules must be disabled (which reduces query performance):
create table if not exists TEST(ID varchar primary key, INFO.DATA varchar, INFO.IP varchar, INFO.BROWER varchar) column_encoded_bytes=0;

(3) How do you undo a bad mapping, i.e. drop the Phoenix mapping table without dropping the underlying HBase table?

(a) Delete the Phoenix catalog entries:
DELETE from SYSTEM.CATALOG where TABLE_NAME = 'TEST';
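
Before running the DELETE above, it can help to see which catalog rows will go; a sketch using the standard SYSTEM.CATALOG columns:

SELECT TABLE_SCHEM, TABLE_NAME, COLUMN_NAME FROM SYSTEM.CATALOG WHERE TABLE_NAME = 'TEST';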
(b) Then clean up the Phoenix coprocessors in the HBase shell:
hbase(main):003:0> describe 'TEST'
=> {coprocessor$1 =>
……
, coprocessor$5 =>
hbase(main):004:0> disable 'TEST'
Run:
alter 'TEST',METHOD=>'table_att_unset',NAME=>'coprocessor$1'
alter 'TEST',METHOD=>'table_att_unset',NAME=>'coprocessor$2'
alter 'TEST',METHOD=>'table_att_unset',NAME=>'coprocessor$3'
alter 'TEST',METHOD=>'table_att_unset',NAME=>'coprocessor$4'
alter 'TEST',METHOD=>'table_att_unset',NAME=>'coprocessor$5'

hbase(main):009:0> alter 'TEST',METHOD=>'table_att_unset',NAME=>'coprocessor$5'
Updating all regions with the new schema...
All regions updated.
Done.
Took 1.4489 seconds
hbase(main):010:0> enable 'TEST' 
hbase(main):011:0> scan 'TEST'

Restart HBase and Phoenix, then re-create the mapping table in Phoenix:
create table if not exists TEST("ID" varchar primary key,"INFO"."DATA" varchar,"INFO"."IP" varchar,"INFO"."BROWER" varchar) column_encoded_bytes=0;
0: jdbc:phoenix:master> select * from "TEST";
+-----+-----------------+-----------------+----------+
| ID  |      DATA       |       IP        |  BROWER  |
+-----+-----------------+-----------------+----------+
| 0   | 20190520164020  | 171.15.136.201  | ie       |
| 1   | 20190520164020  | 139.203.75.112  | chorm    |
| 2   | 20190520164020  | 121.77.62.91    | chorm    |
| 3   | 20190520164020  | 139.213.175.14  | ie       |
| 4   | 20190520164020  | 210.45.253.237  | chorm    |
| 5   | 20190520164020  | 171.12.45.87    | chrome   |
| 6   | 20190520164020  | 139.200.93.224  | firefox  |
| 7   | 20190520164020  | 222.61.160.72   | chorm    |
+-----+-----------------+-----------------+----------+

(4) The data volume is large and the business must not be affected

Use snapshots.

The export puts no extra load on the region servers, because it operates at the HDFS level; you only need to point it at HDFS locations.

The snapshot files can also be transferred to the target location manually; the snapshot directory in HDFS is /hbase/.hbase-snapshot. A manual-copy sketch follows.
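
A minimal sketch with hadoop distcp, reusing the old-nn/new-nn placeholders from section 2.2; note that copying the metadata directory alone is not sufficient, because a snapshot also references HFiles under /hbase/archive and the table directories, which is exactly what ExportSnapshot automates:

# copies only the snapshot metadata; the referenced HFiles must be copied too
hadoop distcp hdfs://old-nn:9820/hbase/.hbase-snapshot/TEST_snapshot \
              hdfs://new-nn:9820/hbase/.hbase-snapshot/TEST_snapshot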

Alternatively, when copying across clusters, limit the resources of the MapReduce job.

The ExportSnapshot command can cap the number of mappers:
$ bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot test_table_snapshot -copy-to hdfs://follow_cluster_namenode:9820/hbase -mappers n
It can also throttle the copy bandwidth; the following limits it to 200 MB/s:
$ bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot test_table_snapshot -copy-to hdfs://follow_cluster_namenode:9820/hbase -mappers n -bandwidth 200

(5) The table is very large and migration takes a long time

The fix is the hbase.master.hfilecleaner.ttl parameter.

Migrating small tables (a few minutes) never failed, but migrating large tables (over 30 minutes) kept failing with "Can't find hfile".

The CleanerChore thread's cleanup of the archive directory is governed by hbase.master.hfilecleaner.ttl, which defaults to 5 minutes (the value is in milliseconds), and a large-table export takes far longer than that. After raising hbase.master.hfilecleaner.ttl to a comfortably large value of two hours, the problem disappeared.
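
A minimal hbase-site.xml sketch of that change (7200000 ms = 2 hours, the value mentioned above):

<property>
  <name>hbase.master.hfilecleaner.ttl</name>
  <value>7200000</value>
</property>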

 

 

 
