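1. Download a subset of the data. Since this is only an experiment, I downloaded just part of the weather data for 2003.
2. Decompress the archives with zcat *gz > sample.txt, redirecting the output into a single file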
[hadoop@Master test_data]$ zcat *gz > /home/hadoop/input/sample.txt
3. Inspect the data format
4. Put the file sample.txt into the HDFS file system
[hadoop@Master input]$ hadoop fs -put /home/hadoop/input/sample.txt /user/hadoop/in/sample.txt
5. Mapper: MinTemperatureMapper.java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MinTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Sentinel value that marks a missing temperature reading
    private static final int MISSING = -9999;

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        // The year occupies columns 0-3 of each record
        String year = line.substring(0, 4);
        // The temperature occupies columns 14-18 and may be padded with spaces
        int airTemperature = Integer.parseInt(line.substring(14, 19).trim());
        // Emit (year, temperature) only for valid readings
        if (airTemperature != MISSING) {
            context.write(new Text(year), new IntWritable(airTemperature));
        }
    }
}
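The field extraction in the mapper can be tried without a cluster. The following is a minimal sketch, assuming a fixed-width record with the year in columns 0-3 and the temperature in columns 14-18, as the mapper does. The class name FieldParseSketch and both sample lines are made up for illustration and are not taken from the real data set.

```java
public class FieldParseSketch {

    // Same sentinel the mapper uses for a missing reading
    public static final int MISSING = -9999;

    // Columns 0-3 hold the year, as in the mapper
    public static String parseYear(String line) {
        return line.substring(0, 4);
    }

    // Columns 14-18 hold the temperature, possibly left-padded with spaces
    public static int parseTemperature(String line) {
        return Integer.parseInt(line.substring(14, 19).trim());
    }

    public static void main(String[] args) {
        // Hypothetical records: only columns 0-3 and 14-18 matter here
        String[] lines = {
            "2003 01 01 00   -56",
            "2003 01 01 01 -9999"
        };
        for (String line : lines) {
            int temperature = parseTemperature(line);
            if (temperature != MISSING) {
                // This is the pair the mapper would emit
                System.out.println(parseYear(line) + "\t" + temperature);
            } else {
                System.out.println("skipped missing reading");
            }
        }
    }
}
```

Note that a line shorter than 19 characters makes substring(14, 19) throw StringIndexOutOfBoundsException, both in this sketch and in the mapper, so a truncated record in the input would fail the map task.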
6. Reducer: MinTemperatureReducer.java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MinTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Start from the largest int so that any real reading replaces it
        int minValue = Integer.MAX_VALUE;
        for (IntWritable value : values) {
            minValue = Math.min(minValue, value.get());
        }
        // Emit (year, minimum temperature)
        context.write(key, new IntWritable(minValue));
    }
}
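What the reducer computes can be checked locally as well. This is only a sketch in plain Java with no Hadoop classes: a TreeMap stands in for the grouping that the shuffle phase performs, and the class name MinReduceSketch, the key "2004" and all the values are invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MinReduceSketch {

    // Mirrors the reducer body: fold one group of values down to its minimum
    public static int minOf(Iterable<Integer> values) {
        int minValue = Integer.MAX_VALUE;
        for (int value : values) {
            minValue = Math.min(minValue, value);
        }
        return minValue;
    }

    // Stand-in for shuffle + reduce: each key's values are reduced independently
    public static Map<String, Integer> minPerKey(Map<String, List<Integer>> grouped) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), minOf(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up (year -> temperatures) groups, as the reducer would receive them
        Map<String, List<Integer>> grouped = new TreeMap<>();
        grouped.put("2003", Arrays.asList(-56, 12, 230));
        grouped.put("2004", Arrays.asList(45, -3));
        System.out.println(minPerKey(grouped));
    }
}
```

Starting from Integer.MAX_VALUE means an empty group would yield Integer.MAX_VALUE. In the real job this does not happen, because reduce is only called for keys that have at least one value.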
7. M-R Job: MinTemperature.java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MinTemperature {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MinTemperature <input path> <output path>");
            System.exit(-1);
        }

        Job job = new Job();
        // Lets Hadoop locate the jar that contains this class
        job.setJarByClass(MinTemperature.class);
        job.setJobName("Min temperature");

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(MinTemperatureMapper.class);
        job.setReducerClass(MinTemperatureReducer.class);

        // Types of the (key, value) pairs written by the reducer
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Exit with 0 on success, 1 on failure
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
8. Compile and package into a jar
[hadoop@Master myclass]$ javac -classpath /usr/hadoop/hadoop-core-1.2.1.jar MinTemperature*.java
[hadoop@Master myclass]$ jar cvf MinTemperature.jar MinTemperature*.class
added manifest
adding: MinTemperature.class(in = 1417) (out= 799)(deflated 43%)
adding: MinTemperatureMapper.class(in = 1740) (out= 722)(deflated 58%)
adding: MinTemperatureReducer.class(in = 1664) (out= 707)(deflated 57%)
9. Run the job
[hadoop@Master myclass]$ hadoop jar /usr/hadoop/myclass/MinTemperature.jar MinTemperature /user/hadoop/in/sample.txt ./out2
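The run failed. The error message was as follows:
After a long search for the cause, it turned out that I had not deleted the .class files, so the program could not find the class. Delete the .class files in the myclass directory and keep only the generated jar.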
[hadoop@Master myclass]$ rm MinTemperature*.class
10. View the results