As shown in the figure: with three reduce tasks, the processed data is stored in three output files.
By default numReduceTasks is 1, which is why all of the output in the earlier experiments went into a single file. By writing a custom MyPartitioner class, the data handled by reduce can be split into separate groups. Here Partitioner is the base class: any custom partitioner must extend it.
HashPartitioner is MapReduce's default partitioner. It determines the target reducer as
which reducer = (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks.
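For reference, the stock HashPartitioner amounts to little more than that formula; the sketch below mirrors the standard Hadoop class (it is not part of this experiment's code):

    import org.apache.hadoop.mapreduce.Partitioner;

    public class HashPartitioner<K, V> extends Partitioner<K, V> {
        // Masking with Integer.MAX_VALUE clears the sign bit, so a negative
        // hashCode() can never produce a negative partition index.
        @Override
        public int getPartition(K key, V value, int numReduceTasks) {
            return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
        }
    }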
Experiment: building on the custom sort from the previous section, sort squares and rectangles separately, i.e. configure two reduce tasks and route keys to them with a custom MyPartitioner.
package com.nwpulisz;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SelfDefineSort {

    static final String INPUT_PATH = "hdfs://192.168.255.132:9000/input";
    static final String OUTPUT_PATH = "hdfs://192.168.255.132:9000/output";

    /**
     * @param args
     * @author nwpulisz
     * @date 2016.4.1
     */
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path outPut_path = new Path(OUTPUT_PATH);
        Job job = new Job(conf, "SelfDefineSort");

        // Delete the output path in advance if it already exists.
        FileSystem fileSystem = FileSystem.get(new URI(OUTPUT_PATH), conf);
        if (fileSystem.exists(outPut_path)) {
            fileSystem.delete(outPut_path, true);
        }

        // setJarByClass must not be omitted, otherwise the job fails. As the
        // Hadoop source explains: "Set the Jar by finding where a given class came from."
        job.setJarByClass(RectangleWritable.class);

        FileInputFormat.setInputPaths(job, INPUT_PATH);
        FileOutputFormat.setOutputPath(job, outPut_path);

        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);
        job.setMapOutputKeyClass(RectangleWritable.class);
        job.setMapOutputValueClass(NullWritable.class);

        job.setPartitionerClass(MyPartitioner.class); // custom partitioner that splits the reduce output by shape
        job.setNumReduceTasks(2);                     // two reduce tasks, one per shape

        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(IntWritable.class);
        job.waitForCompletion(true);
    }

    static class MyMapper extends Mapper<LongWritable, Text, RectangleWritable, NullWritable> {
        @Override
        protected void map(LongWritable k1, Text v1, Context context)
                throws IOException, InterruptedException {
            String[] splits = v1.toString().split("\t");
            RectangleWritable k2 = new RectangleWritable(Integer.parseInt(splits[0]),
                    Integer.parseInt(splits[1]));
            context.write(k2, NullWritable.get());
        }
    }

    static class MyReducer extends Reducer<RectangleWritable, NullWritable, IntWritable, IntWritable> {
        @Override
        protected void reduce(RectangleWritable k2, Iterable<NullWritable> v2s, Context context)
                throws IOException, InterruptedException {
            context.write(new IntWritable(k2.getLength()), new IntWritable(k2.getWidth()));
        }
    }
}

class MyPartitioner extends Partitioner<RectangleWritable, NullWritable> {
    @Override
    public int getPartition(RectangleWritable k2, NullWritable v2, int numPartitions) {
        // Separate squares from rectangles: squares go to reducer 0, rectangles to reducer 1.
        if (k2.getLength() == k2.getWidth()) {
            return 0;
        } else {
            return 1;
        }
    }
}
The RectangleWritable class is the same one defined in the previous section.
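For readers who skipped that section, the sketch below shows roughly what such a key class looks like; the field names and the by-area ordering in compareTo are assumptions here, not the previous section's exact code:

    package com.nwpulisz;

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.WritableComparable;

    // Hypothetical reconstruction of the map output key used above.
    class RectangleWritable implements WritableComparable<RectangleWritable> {
        private int length;
        private int width;

        public RectangleWritable() {}                 // required no-arg constructor for Hadoop serialization

        public RectangleWritable(int length, int width) {
            this.length = length;
            this.width = width;
        }

        public int getLength() { return length; }
        public int getWidth()  { return width; }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(length);
            out.writeInt(width);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            length = in.readInt();
            width = in.readInt();
        }

        @Override
        public int compareTo(RectangleWritable other) {
            // Assumed ordering: smaller area sorts first.
            return Integer.compare(length * width, other.length * other.width);
        }
    }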
Running this code directly in Eclipse reports an error, shown in the figure below:
This is probably due to the Hadoop version, so the source needs to be packaged into a jar and run on the Hadoop server. The jar contains:
Run it on Hadoop with hadoop jar SelfDefinePartitioner.jar (the jar name is user-defined).
The results are shown in the figures below:
Job start:
Output:
