Invoking MapReduce: using org.apache.hadoop.hbase.mapreduce to work with HBase

This article is a repost. 2019-03-28 09:17, ambari

hbase org.apache.hadoop.hbase.mapreduce.Export
ERROR: Wrong number of arguments: 0
Usage: Export [-D <property=value>]* <tablename> <outputdir> [<versions> [<starttime> [<endtime>]] [^[regex pattern] or [Prefix] to filter]]

  Note: -D properties will be applied to the conf used. 
  For example: 
   -D mapreduce.output.fileoutputformat.compress=true
   -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec
   -D mapreduce.output.fileoutputformat.compress.type=BLOCK
  Additionally, the following SCAN properties can be specified
  to control/limit what is exported..
   -D hbase.mapreduce.scan.column.family=<familyName>
   -D hbase.mapreduce.include.deleted.rows=true
   -D hbase.mapreduce.scan.row.start=<ROWSTART>
   -D hbase.mapreduce.scan.row.stop=<ROWSTOP>
For performance consider the following properties:
   -Dhbase.client.scanner.caching=100
   -Dmapreduce.map.speculative=false
   -Dmapreduce.reduce.speculative=false
For tables with very wide rows consider setting the batch size as below:
   -Dhbase.export.scanner.batch=10

Fast row count for large HBase tables

hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'osinfo_xdja'
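
RowCounter runs as a MapReduce job and reports its result among the job counters rather than as a single number. The following is a minimal sketch for picking the count out of the job log. It assumes the log contains a counter line of the form ROWS=<number>; extract_rows is a hypothetical helper name, not part of HBase.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull the row count out of RowCounter's job log.
# Assumption: the log contains a counter line of the form "ROWS=<number>".
extract_rows() {
  # Read log text on stdin and print the number from the first ROWS= line.
  sed -n 's/^[[:space:]]*ROWS=\([0-9][0-9]*\).*$/\1/p' | head -n 1
}

# Intended use (needs a working HBase client; 2>&1 merges both output streams):
#   hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'osinfo_xdja' 2>&1 | extract_rows
```

If no matching line is found the function prints nothing, so an empty result should be treated as "count unknown" rather than zero.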

(1) Exporting from an HBase table (# if you do not write file://, the export goes to HDFS by default)

Exporting HBase data to HDFS or to a local file (the file:// prefix in the command below selects the local file system)

hbase org.apache.hadoop.hbase.mapreduce.Export emp file:///Users/a6/Applications/experiment_data/hbase_data/bak

Exporting HBase data to HDFS (no file:// prefix)

hbase org.apache.hadoop.hbase.mapreduce.Export emp /hbase/emp_bak
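
The only difference between the two commands above is the form of the output path. As a small sketch, the choice can be wrapped in a helper that prints the command to run. export_cmd and its local/hdfs mode names are made up for illustration, and "hdfs" here assumes the cluster's default file system is HDFS.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Export command for a table and a destination.
# Mode "local" adds the file:// prefix; mode "hdfs" passes the path unchanged
# (assumption: the cluster's default file system is HDFS).
export_cmd() {
  local table=$1 mode=$2 path=$3
  case "$mode" in
    local) path="file://$path" ;;
    hdfs)  ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  printf 'hbase org.apache.hadoop.hbase.mapreduce.Export %s %s\n' "$table" "$path"
}

# Reproduce the two commands from the text.
export_cmd emp local /Users/a6/Applications/experiment_data/hbase_data/bak
export_cmd emp hdfs /hbase/emp_bak
```

The helper only prints the commands, so they can be reviewed before being run.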

Limiting the scanner batch size during export

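If a single row in HBase holds a very large amount of data, the export can fail with a ScannerTimeoutException. In that case, set the hbase.export.scanner.batch parameter, which the usage text above recommends for tables with very wide rows. With a batch size set, the error can be avoided.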

hbase org.apache.hadoop.hbase.mapreduce.Export -Dhbase.export.scanner.batch=2000  emp file:///Users/a6/Applications/experiment_data/hbase_data/bak

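When exporting HBase data, the output size with and without the compress option can differ by as much as a factor of five, so enabling compression can save a good deal of space for backups.
I also tested the export speed with the compress option, and it was almost the same as without it. (mapred.output.compress, used in the command below, is the older name of the mapreduce.output.fileoutputformat.compress property listed in the usage text above.)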

hbase org.apache.hadoop.hbase.mapreduce.Export -Dhbase.export.scanner.batch=2000 -D mapred.output.compress=true <tablename> <outputdir>

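With the compress option added, the size of the exported file in this small test went from 335 bytes to 323 bytes, as the job counters show:

Without compress:
File Output Format Counters
Bytes Written=335

With compress:
File Output Format Counters
Bytes Written=323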
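
As a sketch, the same export can be written with the compression properties that the Export usage text lists, and the saving can be computed from the two Bytes Written values. compressed_export_cmd and saving_percent are hypothetical helper names, and /hbase/emp_bak_gz is a made-up output path.

```shell
#!/usr/bin/env bash
# Print an Export command that enables compressed output, using the
# -D properties shown in the tool's own usage text.
compressed_export_cmd() {
  local table=$1 outdir=$2
  printf 'hbase org.apache.hadoop.hbase.mapreduce.Export'
  printf ' -D mapreduce.output.fileoutputformat.compress=true'
  printf ' -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec'
  printf ' -D mapreduce.output.fileoutputformat.compress.type=BLOCK'
  printf ' %s %s\n' "$table" "$outdir"
}

# Relative saving, in percent, between two "Bytes Written" counter values.
saving_percent() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", (before - after) * 100 / before }'
}

compressed_export_cmd emp /hbase/emp_bak_gz   # made-up output path
saving_percent 335 323                        # prints 3.6
```

For the test above the saving is about 3.6 percent. The table is tiny, so this figure says little about what a large table would gain.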

Exporting a specific row-key range and column family

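The company was preparing to switch data centers, so the data in the HBase database had to be migrated. The bundled import and export tools are convenient for this kind of migration, but with a large amount of data a run can take a very long time and may even fail. In that case, you can specify a row-key range and a column family to shorten each run of the export tool. As the usage text above shows, several options are supported. Suppose we want to export the data of table test, only column family Info, with row keys between 000 and 001. The command can be written like this:

./hbase org.apache.hadoop.hbase.mapreduce.Export -D hbase.mapreduce.scan.column.family=Info -D hbase.mapreduce.scan.row.start=000 -D hbase.mapreduce.scan.row.stop=001 test /test_datas

That is all, and the data will be saved in HDFS.
By specifying a column family and a row-key range, you can export only part of the data and keep the MapReduce job started by export from running too long. In other words, the data can be exported in several passes.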
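
Exporting in several passes can be sketched as a loop over row-key boundaries, with one Export command and one output directory per range. chunked_export_cmds, the extra boundaries 002 and 003, and the per-range directory names are assumptions for illustration. The sketch also assumes the usual HBase scan semantics of an inclusive start row and an exclusive stop row, so adjacent ranges do not overlap.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: split one large export into several smaller jobs,
# one per row-key range. Prints the commands instead of running them.
chunked_export_cmds() {
  local table=$1 family=$2 outbase=$3
  shift 3
  # Remaining arguments are boundaries: b0 b1 b2 ... gives [b0,b1), [b1,b2), ...
  local start=$1
  shift
  local stop
  for stop in "$@"; do
    printf 'hbase org.apache.hadoop.hbase.mapreduce.Export'
    printf ' -D hbase.mapreduce.scan.column.family=%s' "$family"
    printf ' -D hbase.mapreduce.scan.row.start=%s' "$start"
    printf ' -D hbase.mapreduce.scan.row.stop=%s' "$stop"
    # Each range gets its own output directory under the base path.
    printf ' %s %s/%s_%s\n' "$table" "$outbase" "$start" "$stop"
    start=$stop
  done
}

# Three ranges: 000-001, 001-002, 002-003 (boundaries beyond 001 are made up).
chunked_export_cmds test Info /test_datas 000 001 002 003
```

Because every range is a separate job, a failed pass can be rerun on its own without repeating the ranges that already succeeded.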

Importing into an HBase table (# if you do not write file://, the input path is read from HDFS by default)

Import the data on HDFS into the backup target table:
hbase org.apache.hadoop.hbase.mapreduce.Driver import emp_bak /hbase/emp_bak/*
Import the data in local files into the backup target table:
hbase org.apache.hadoop.hbase.mapreduce.Driver import emp_bak file:///Users/a6/Applications/experiment_data/hbase_data/bak/*
