Original post: http://www.cnblogs.com/myresearch/p/mapreduce-compile-jar-run.html. Please credit the source when reposting.
For compiling WordCount.java, the approach used with older releases such as 0.20 is well known:
javac -classpath /usr/local/hadoop/hadoop-1.0.1/hadoop-core-1.0.1.jar WordCount.java
In the newer 2.x releases, however, there is no longer a hadoop-core*.jar, so compiling and packaging your own MapReduce program works differently than before.
This article walks through the WordCount example on Hadoop 2.6 to show how to compile and package your own MapReduce program under 2.x.
Dependency jars in Hadoop 2.x
In Hadoop 2.x the classes are no longer bundled into a single hadoop-core*.jar but are split across several jars; running the WordCount example requires the following three:
- $HADOOP_HOME/share/hadoop/common/hadoop-common-2.6.0.jar
- $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.6.0.jar
- $HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar
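The version numbers in these file names follow the Hadoop release. If you are not on 2.6.0, a quick way to locate the equivalents (a sketch, assuming $HADOOP_HOME points at your installation) is:
find $HADOOP_HOME/share/hadoop -name 'hadoop-common-*.jar' -o -name 'hadoop-mapreduce-client-core-*.jar' -o -name 'commons-cli-*.jar'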
Compiling and packaging a Hadoop MapReduce program
Add the jars above to the classpath:
hadoop@ubuntu:~$ export CLASSPATH="$HADOOP_HOME/share/hadoop/common/hadoop-common-2.6.0.jar:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.6.0.jar:$HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar:$CLASSPATH"
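Alternatively, Hadoop 2.x can print the classpath it needs by itself; the hadoop classpath subcommand avoids listing jars by hand (again assuming $HADOOP_HOME is set):
hadoop@ubuntu:~$ export CLASSPATH="$($HADOOP_HOME/bin/hadoop classpath):$CLASSPATH"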
With the classpath in place, WordCount.java can be compiled (this walkthrough uses the WordCount.java from the 2.6.0 source tree, located under hadoop-2.6.0-src/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples):
javac WordCount.java
Compilation emits a warning, which can safely be ignored. Afterwards you can see that several .class files have been generated:
/home/hadoop/opt/hadoop-2.6.0/share/hadoop/common/hadoop-common-2.6.0.jar(org/apache/hadoop/fs/Path.class): warning: Cannot find annotation method 'value()' in type 'LimitedPrivate': class file for org.apache.hadoop.classification.InterfaceAudience not found
1 warning
hadoop@ubuntu:~/opt/code$ ls
WordCount.class WordCount.java WordCount$MapClass.class WordCount$Reduce.class
Next, the .class files must be packaged into a jar before Hadoop can run them:
hadoop@ubuntu:~/opt/code$ jar -cvf WordCount.jar ./WordCount*.class
added manifest
adding: WordCount.class(in = 3363) (out= 1687)(deflated 49%)
adding: WordCount$MapClass.class(in = 1978) (out= 800)(deflated 59%)
adding: WordCount$Reduce.class(in = 1641) (out= 645)(deflated 60%)
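One caveat: the source listed at the end of this post declares a package (org.apache.hadoop.examples), and Java resolves a packaged class only when its .class file sits under the matching directory path inside the jar. The flat jar built above therefore works only if you remove the package declaration and invoke the class by its bare name. To keep the declaration, compile into a directory tree and jar that tree instead, for example:
javac -d . WordCount.java
jar -cvf WordCount.jar org/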
Create the input folder and some test files for the job:
hadoop@ubuntu:~/opt/code$ mkdir input
hadoop@ubuntu:~/opt/code$ echo "Hello Hadoop Goodbye Hadoop" > ./input/file1
hadoop@ubuntu:~/opt/code$ echo "Hello World Bye World" > ./input/file2
hadoop@ubuntu:~/opt/code$ ls ./input
file1 file2
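These files live on the local filesystem, which is sufficient when Hadoop runs in standalone mode. On a pseudo-distributed or fully distributed setup, the input has to be uploaded to HDFS first, along these lines:
hadoop@ubuntu:~/opt/code$ ~/opt/hadoop-2.6.0/bin/hdfs dfs -mkdir -p input
hadoop@ubuntu:~/opt/code$ ~/opt/hadoop-2.6.0/bin/hdfs dfs -put ./input/file* input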
Run our wordcount program:
hadoop@ubuntu:~$ cd ~/opt/code
hadoop@ubuntu:~/opt/code$ ~/opt/hadoop-2.6.0/bin/hadoop jar WordCount.jar org.apache.hadoop.examples.WordCount input output
Once the job finishes, check the output:
hadoop@ubuntu:~/opt/code$ ls ./output
part-r-00000 _SUCCESS
hadoop@ubuntu:~/opt/code$ cat ./output/part-r-00000
Bye 1
Goodbye 1
Hadoop 2
Hello 2
World 2
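Note that rerunning the job fails if the output directory still exists, because Hadoop refuses to overwrite it; remove it first (local mode shown here; on HDFS use hdfs dfs -rm -r output):
hadoop@ubuntu:~/opt/code$ rm -r ./output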
PS: the WordCount.java source code is as follows:
package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * This is an example Hadoop Map/Reduce application.
 * It reads the text input files, breaks each line into words
 * and counts them. The output is a locally sorted list of words and the
 * count of how often they occurred.
 *
 * To run: bin/hadoop jar build/hadoop-examples.jar wordcount
 *            [-m <i>maps</i>] [-r <i>reduces</i>] <i>in-dir</i> <i>out-dir</i>
 */
public class WordCount extends Configured implements Tool {

  /**
   * Counts the words in each line.
   * For each line of input, break the line into words and emit them as
   * (<b>word</b>, <b>1</b>).
   */
  public static class MapClass extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
      String line = value.toString();
      StringTokenizer itr = new StringTokenizer(line);
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        output.collect(word, one);
      }
    }
  }

  /**
   * A reducer class that just emits the sum of the input values.
   */
  public static class Reduce extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  static int printUsage() {
    System.out.println("wordcount [-m <maps>] [-r <reduces>] <input> <output>");
    ToolRunner.printGenericCommandUsage(System.out);
    return -1;
  }

  /**
   * The main driver for word count map/reduce program.
   * Invoke this method to submit the map/reduce job.
   * @throws IOException When there is communication problems with the
   *                     job tracker.
   */
  public int run(String[] args) throws Exception {
    JobConf conf = new JobConf(getConf(), WordCount.class);
    conf.setJobName("wordcount");

    // the keys are words (strings)
    conf.setOutputKeyClass(Text.class);
    // the values are counts (ints)
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(MapClass.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

    List<String> other_args = new ArrayList<String>();
    for (int i = 0; i < args.length; ++i) {
      try {
        if ("-m".equals(args[i])) {
          conf.setNumMapTasks(Integer.parseInt(args[++i]));
        } else if ("-r".equals(args[i])) {
          conf.setNumReduceTasks(Integer.parseInt(args[++i]));
        } else {
          other_args.add(args[i]);
        }
      } catch (NumberFormatException except) {
        System.out.println("ERROR: Integer expected instead of " + args[i]);
        return printUsage();
      } catch (ArrayIndexOutOfBoundsException except) {
        System.out.println("ERROR: Required parameter missing from " +
                           args[i-1]);
        return printUsage();
      }
    }
    // Make sure there are exactly 2 parameters left.
    if (other_args.size() != 2) {
      System.out.println("ERROR: Wrong number of parameters: " +
                         other_args.size() + " instead of 2.");
      return printUsage();
    }
    FileInputFormat.setInputPaths(conf, other_args.get(0));
    FileOutputFormat.setOutputPath(conf, new Path(other_args.get(1)));

    JobClient.runJob(conf);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    int res = ToolRunner.run(new Configuration(), new WordCount(), args);
    System.exit(res);
  }
}
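As a sanity check, Hadoop 2.6.0 also ships a precompiled examples jar containing its own (new-API) WordCount, which can be run directly against the same input:
hadoop@ubuntu:~/opt/code$ ~/opt/hadoop-2.6.0/bin/hadoop jar ~/opt/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount input output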