1. First, let's look at a standard example that uses HBase as both the input source and the output sink:
```java
Configuration conf = HBaseConfiguration.create();
Job job = new Job(conf, "job name");
job.setJarByClass(test.class);

Scan scan = new Scan();
TableMapReduceUtil.initTableMapperJob(inputTable, scan, mapper.class,
        Writable.class, Writable.class, job);
TableMapReduceUtil.initTableReducerJob(outputTable, reducer.class, job);
job.waitForCompletion(true);
```
First we create the configuration and the job object and set the job's main class. This is all the same as in an ordinary MapReduce job; the only difference is how the data source and sink are declared, which is done through TableMapReduceUtil's initTableMapperJob and initTableReducerJob methods.
In the code above:

The input source is the HBase table inputTable. The map phase is carried out by mapper.class, whose output key/value types are ImmutableBytesWritable and Put, and the last argument is the job object. Note that a Scan object must be declared for reading data from the table; the scan can take configuration parameters, but to keep the example simple they are not covered in detail here.
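For reference, a minimal sketch of how the Scan might be configured before passing it to initTableMapperJob (the column family name "cf" and the setting values here are assumptions, not part of the example above):

```java
Scan scan = new Scan();
scan.setCaching(500);                 // rows fetched per RPC; larger values speed up full scans
scan.setCacheBlocks(false);           // usually turned off for MapReduce table scans
scan.addFamily(Bytes.toBytes("cf"));  // hypothetical column family to restrict the scan
```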
The output target is the HBase table outputTable, the reduce phase is carried out by the reducer.class class, and the job object is again the target. Compared with the map declaration, the output type arguments are missing: they are not needed, because as the source code shows, in TableRecordWriter's write(key, value) method the key is never used, and the value can only be a Put or a Delete; write tells the two apart by itself, so the user does not have to specify.
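To illustrate (row keys and column names below are assumed, not from the example), both of the following are valid inside a TableReducer's reduce method, since TableRecordWriter dispatches on the value's concrete type:

```java
// Insert or update a cell: the key passed to write() is ignored.
Put put = new Put(Bytes.toBytes("row1"));
put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));  // addColumn() in newer HBase
context.write(null, put);

// Or delete a row the same way.
Delete delete = new Delete(Bytes.toBytes("row2"));
context.write(null, delete);
```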
Next comes the mapper class:
```java
public class mapper extends TableMapper<KEYOUT, VALUEOUT> {
    public void map(ImmutableBytesWritable key, Result value, Context context)
            throws IOException, InterruptedException {
        // mapper logic: emit KEYOUT/VALUEOUT pairs
        context.write(key, value);
    }
}
```
It extends the TableMapper class provided by HBase, which itself just extends the Mapper class from MapReduce. The two generic parameters specify the mapper's output types, which must implement Writable; Put and Delete, which you are likely to use, both qualify. Note that these types must be consistent with the ones given to initTableMapperJob. The framework automatically reads the specified HBase table row by row and feeds the rows into the map function; a concrete sketch is shown below.
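As a concrete illustration (the table layout and the info:word column are hypothetical), here is a word-count-style TableMapper that reads one column of each row and emits (word, 1); its Text/LongWritable output types would then be passed to initTableMapperJob:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

public class WordMapper extends TableMapper<Text, LongWritable> {
    private static final LongWritable ONE = new LongWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
        // The input types are fixed by TableMapper: row key + the row's Result.
        byte[] cell = value.getValue(Bytes.toBytes("info"), Bytes.toBytes("word"));
        if (cell != null) {
            word.set(cell);
            context.write(word, ONE);
        }
    }
}
```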
Then the reducer class:
```java
public class reducer extends TableReducer<KEYIN, VALUEIN, KEYOUT> {
    public void reduce(KEYIN key, Iterable<VALUEIN> values, Context context)
            throws IOException, InterruptedException {
        // reducer logic
        context.write(null, put);  // the value must be a Put or a Delete
    }
}
```
The reducer extends the TableReducer class. Of the three generic parameters that follow, the first two must match the key/value output types of the map phase, and the third must be Put or Delete. When writing, the key can be passed as null, since it is not needed; the reducer's output is then automatically inserted into the table named by outputTable. A concrete sketch follows.
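Continuing the hypothetical word-count example, a matching TableReducer might sum the counts and write one Put per word (the info:count column is assumed):

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

public class WordCountReducer
        extends TableReducer<Text, LongWritable, ImmutableBytesWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        for (LongWritable v : values) {
            sum += v.get();
        }
        Put put = new Put(Bytes.toBytes(key.toString()));  // row key = the word
        put.add(Bytes.toBytes("info"), Bytes.toBytes("count"),
                Bytes.toBytes(sum));                       // addColumn() in newer HBase
        context.write(null, put);  // the key is ignored by TableRecordWriter
    }
}
```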
2. Sometimes we need the data source to be text on HDFS while the output target is HBase. The changes are equally simple:
```java
Configuration conf = HBaseConfiguration.create();
Job job = new Job(conf, "job name");
job.setJarByClass(test.class);

job.setMapperClass(mapper.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(LongWritable.class);
FileInputFormat.setInputPaths(job, path);

TableMapReduceUtil.initTableReducerJob(tableName, reducer.class, job);
```
Notice that you declare the job just as you would for an ordinary MapReduce job: specify the mapper class and its output key/value types, point FileInputFormat.setInputPaths at the input data, and leave the output declaration unchanged. That completes the job declaration for reading text from HDFS and writing to HBase. The mapper and reducer look like this:
```java
public class mapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    public void map(LongWritable key, Text line, Context context)
            throws IOException, InterruptedException {
        // mapper logic
        context.write(k, one);  // k: a Text key, one: a LongWritable count
    }
}

public class reducer extends TableReducer<Text, LongWritable, ImmutableBytesWritable> {
    public void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        // reducer logic
        context.write(null, put);  // the value must be a Put or a Delete
    }
}
```
The mapper simply extends the ordinary Mapper class from MapReduce. As before, make sure the key/value types stay consistent from one stage to the next; a concrete version of this mapper is sketched below.
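For instance (the whitespace tokenizer is a hypothetical choice; any parsing logic works), a concrete version of the mapper splits each text line into words and emits (word, 1), which the TableReducer sketched earlier can then aggregate into the HBase table:

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LineMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private static final LongWritable ONE = new LongWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text line, Context context)
            throws IOException, InterruptedException {
        // key = byte offset of the line in the file; line = the text itself.
        for (String token : line.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}
```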
3. Finally, the reverse: reading from an HBase table as the data source with HDFS as the output. It is just as simple:
```java
Configuration conf = HBaseConfiguration.create();
Job job = new Job(conf, "job name");
job.setJarByClass(test.class);

Scan scan = new Scan();
TableMapReduceUtil.initTableMapperJob(inputTable, scan, mapper.class,
        Writable.class, Writable.class, job);

job.setOutputKeyClass(Writable.class);
job.setOutputValueClass(Writable.class);
FileOutputFormat.setOutputPath(job, path);
job.waitForCompletion(true);
```
The mapper and reducer are simply:
```java
public class mapper extends TableMapper<KEYOUT, VALUEOUT> {
    public void map(ImmutableBytesWritable key, Result value, Context context)
            throws IOException, InterruptedException {
        // mapper logic: emit KEYOUT/VALUEOUT pairs
        context.write(key, value);
    }
}

public class reducer extends Reducer<Writable, Writable, Writable, Writable> {
    public void reduce(Writable key, Iterable<Writable> values, Context context)
            throws IOException, InterruptedException {
        // reducer logic: an ordinary reducer writing to HDFS
        for (Writable value : values) {
            context.write(key, value);
        }
    }
}
```
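As a concrete sketch of this third case (column names are hypothetical; the Text types must match the arguments passed to initTableMapperJob and setOutputKeyClass/setOutputValueClass in the driver), here is a mapper that exports one column of the table as text; TextOutputFormat then writes the pairs tab-separated into the HDFS output path:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;

public class ExportMapper extends TableMapper<Text, Text> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
        byte[] cell = value.getValue(Bytes.toBytes("info"), Bytes.toBytes("word"));
        if (cell != null) {
            // Emit (row key, cell value) as text.
            String rowKey = Bytes.toString(row.get(), row.getOffset(), row.getLength());
            context.write(new Text(rowKey), new Text(cell));
        }
    }
}
```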
To close, a word on what TableMapper and TableReducer really are: the two classes exist purely to save some typing. Because certain of the four generic parameters always have fixed types, they are just specializations of Mapper and Reducer; in essence there is no difference at all. The source code:
```java
public abstract class TableMapper<KEYOUT, VALUEOUT>
        extends Mapper<ImmutableBytesWritable, Result, KEYOUT, VALUEOUT> {
}

public abstract class TableReducer<KEYIN, VALUEIN, KEYOUT>
        extends Reducer<KEYIN, VALUEIN, KEYOUT, Writable> {
}
```
That's it; you're ready to go write your first word-count HBase MapReduce program.