Software packages:
hadoop-2.7.2.tar.gz
hadoop-eclipse-plugin-2.7.2.jar
hadoop-common-2.7.1-bin.zip
eclipse
jdk1.8.45
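hadoop-2.7.2 (one copy each for Linux and Windows)
Linux system (CentOS or another distribution)
Hadoop installation environment
Prepare the environment:
Install Hadoop; for the installation steps, see the Hadoop installation chapter.
Install Eclipse.
The setup process is as follows: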
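1. Copy hadoop-eclipse-plugin-2.7.2.jar into the eclipse/dropins directory.
2. Extract hadoop-2.7.2.tar.gz to the E: drive.
3. Download or compile hadoop-common-2.7.2 (hadoop-common-2.7.1 is compatible with hadoop-common-2.7.2, so hadoop-common-2.7.1 is used here). If you want to compile it yourself, refer to related articles.
4. Copy all files under hadoop-common-2.7.1 into E:\hadoop-2.7.2\bin. A copy of hadoop.dll must also be placed in the system32 directory; otherwise the error shown in the figure below is reported:
Also configure the system environment variable HADOOP_HOME: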
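Before starting Eclipse, it can help to confirm that step 4 really took effect. The following is a minimal sketch, not part of Hadoop: the class name HadoopHomeCheck is hypothetical, and the assumption that winutils.exe sits next to hadoop.dll in the copied bin files is mine (only hadoop.dll is named above).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reports which native helper files are missing under HADOOP_HOME\bin. */
class HadoopHomeCheck {

    // hadoop.dll is named in step 4; winutils.exe is an assumption about
    // what the hadoop-common bin package contains.
    static final String[] REQUIRED = {"hadoop.dll", "winutils.exe"};

    /** Returns the names of required files that are absent from <hadoopHome>/bin. */
    static List<String> missingFiles(Path hadoopHome) {
        List<String> missing = new ArrayList<>();
        Path bin = hadoopHome.resolve("bin");
        for (String name : REQUIRED) {
            if (!Files.isRegularFile(bin.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Reads the same environment variable that step 4 asks you to configure.
        String home = System.getenv("HADOOP_HOME");
        if (home == null || home.isEmpty()) {
            System.err.println("HADOOP_HOME is not set");
            return;
        }
        List<String> missing = missingFiles(Paths.get(home));
        System.out.println(missing.isEmpty()
                ? "HADOOP_HOME looks complete: " + home
                : "Missing under " + home + "\\bin: " + missing);
    }
}
```

If the variable was set while Eclipse was already open, restart Eclipse so that the new value is picked up.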
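5. Start Eclipse and open Window->Preferences; in the Hadoop Map/Reduce section, set the Hadoop installation directory:
6. Open the Map/Reduce perspective from Window->Open Perspective and develop Hadoop programs in this perspective.
7. Open Map/Reduce Locations from Window->Show View, then right-click as shown in the figure below and choose New Hadoop location… to create a new Hadoop connection.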
8.
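9. Create a new project and add a WordCount class:
10. Add log4j.properties and the Hadoop cluster's core-site.xml to the classpath. My sample project is organized with Maven, so both files go in the src/main/resources directory.
11. The contents of log4j.properties are as follows: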
log4j.rootLogger=debug,stdout,R
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%5p - %m%n
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.File=mapreduce_test.log
log4j.appender.R.MaxFileSize=1MB
log4j.appender.R.MaxBackupIndex=1
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%p %t %c - %m%n
log4j.logger.com.codefutures=DEBUG
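A common mistake in a file like this is listing an appender on log4j.rootLogger without defining it. The sketch below checks for that using only java.util.Properties; it does not use log4j itself, and the class name Log4jConfigCheck is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper: finds root-logger appenders that have no definition. */
class Log4jConfigCheck {

    /** Returns appender names listed on log4j.rootLogger without a log4j.appender.<name> entry. */
    static List<String> undefinedAppenders(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String[] parts = props.getProperty("log4j.rootLogger", "").split(",");
        List<String> undefined = new ArrayList<>();
        // parts[0] is the level (for example "debug"); the rest are appender names.
        for (int i = 1; i < parts.length; i++) {
            String name = parts[i].trim();
            if (!name.isEmpty() && !props.containsKey("log4j.appender." + name)) {
                undefined.add(name);
            }
        }
        return undefined;
    }

    public static void main(String[] args) {
        String sample = "log4j.rootLogger=debug,stdout,R\n"
                + "log4j.appender.stdout=org.apache.log4j.ConsoleAppender\n";
        // R is referenced but not defined in this shortened sample.
        System.out.println("Undefined appenders: " + undefinedAppenders(sample));
    }
}
```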
12. Create the directory input on HDFS
hadoop dfs -mkdir input
13. Copy the local README.txt into input on HDFS
hadoop dfs -copyFromLocal /usr/local/hadoop/README.txt input
14. Add the following configuration to hdfs-site.xml on the Hadoop cluster; otherwise files cannot be uploaded to HDFS from Eclipse:
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
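To confirm that the setting is actually present in the file you edited, a small parser is enough. This is a sketch using only the JDK XML parser, not Hadoop's Configuration class; the class name HdfsSiteCheck is hypothetical. Hadoop 2.x also knows this switch under the name dfs.permissions.enabled, so the sample main looks for both names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: looks up one property in Hadoop-style configuration XML. */
class HdfsSiteCheck {

    /** Returns the value of the named property, or null when it is not present. */
    static String propertyValue(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                if (name.equals(text(p, "name"))) {
                    return text(p, "value");
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a readable configuration file", e);
        }
    }

    /** Trimmed text of the first child element with the given tag, or null. */
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<configuration><property><name>dfs.permissions</name>"
                + "<value>false</value></property></configuration>";
        System.out.println("dfs.permissions = " + propertyValue(xml, "dfs.permissions"));
        System.out.println("dfs.permissions.enabled = " + propertyValue(xml, "dfs.permissions.enabled"));
    }
}
```

Note that disabling permission checks is only reasonable on a development cluster.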
15. If you encounter Cannot connect to VM com.sun.jdi.connect.TransportTimeoutException, turn off the firewall.
16. Write the code as follows:
package com.hadoop.example;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    // Mapper: splits each input line on whitespace and emits (word, 1).
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            System.out.print("--map: " + value.toString() + "\n");
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                System.out.print("--map token: " + word.toString() + "\n");
                context.write(word, one);
                System.out.print("--context: " + word.toString() + "," + one.toString() + "\n");
            }
        }
    }

    // Reducer (also used as combiner): sums the counts for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
            System.out.print("--reduce: " + key.toString() + "," + result.toString() + "\n");
        }
    }

    public static void main(String[] args) throws Exception {
        // Points Hadoop at the Windows installation prepared in the earlier steps.
        System.setProperty("hadoop.home.dir", "E:\\hadoop-2.7.2");
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        // Job.getInstance replaces the deprecated new Job(conf, name) constructor.
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setNumReduceTasks(2);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
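To predict what the job should output for a small input, the map and reduce logic can be imitated without Hadoop. The following is only an illustrative stand-in under that assumption: the class LocalWordCount is hypothetical, it tokenizes with StringTokenizer like TokenizerMapper and sums like IntSumReducer, and it uses a TreeMap so that the words come out in sorted order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/** Hypothetical local stand-in for the WordCount job; it does not use Hadoop. */
class LocalWordCount {

    /** Splits each line on whitespace and sums the occurrences of every word. */
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> totals = new TreeMap<>(); // keeps words in sorted order
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                // Equivalent of emitting (word, 1) and adding it to the running sum.
                totals.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("hello world", "hello hadoop");
        for (Map.Entry<String, Integer> e : count(sample).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

For the two sample lines this prints hadoop 1, hello 2 and world 1, one pair per line, which is handy to compare against the cluster result for the same input.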
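17. Right-click WordCount.java and choose Run As->Run Configurations, then configure the program arguments, that is, the input and output folders. If WordCount is not listed under Java Application, first run the current project once with Run As->Java Application.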
hdfs://localhost:9000/user/hadoop/input hdfs://localhost:9000/user/hadoop/output
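Here localhost is the domain name of the Hadoop cluster; you can also use the IP address directly. If you use a domain name, edit C:\Windows\System32\drivers\etc\HOSTS and add the mapping between the IP address and the domain name.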
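The two program arguments are ordinary URIs, so they can be sanity-checked with java.net.URI before launching the job. This is a sketch with a hypothetical class JobArgsCheck; requiring the full hdfs://host:port/path form is a strictness chosen for this example, not something the job itself demands.

```java
import java.net.URI;

/** Hypothetical helper: sanity-checks the <in> <out> program arguments. */
class JobArgsCheck {

    /** Returns null when the arguments look usable, otherwise a short description of the problem. */
    static String problem(String[] args) {
        if (args.length != 2) {
            return "expected exactly two arguments: <in> <out>";
        }
        URI in = URI.create(args[0]);
        URI out = URI.create(args[1]);
        for (URI u : new URI[] {in, out}) {
            // This sketch insists on scheme, host and port being spelled out.
            if (!"hdfs".equals(u.getScheme()) || u.getHost() == null || u.getPort() < 0) {
                return "not a full hdfs://host:port/path URI: " + u;
            }
        }
        if (!in.getHost().equals(out.getHost()) || in.getPort() != out.getPort()) {
            return "input and output point at different hosts or ports";
        }
        if (in.getPath().equals(out.getPath())) {
            return "input and output paths are identical";
        }
        return null;
    }

    public static void main(String[] args) {
        String p = problem(args);
        System.out.println(p == null ? "arguments look fine" : p);
    }
}
```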
18. After the run completes, check the results:
Method 1:
hadoop dfs -ls output
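You can see the output entries _SUCCESS and part-r-00000 (because the code above sets two reduce tasks, expect a part-r-00001 file as well).
Run hadoop dfs -cat output/*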
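The part-r-* files contain one word and its count per line, separated by a tab, which is the default for text output. If you save the output of the cat command locally, a few lines of Java can merge it into a single sorted table. This is a sketch; the class name PartFileReader is hypothetical and it works on plain strings rather than reading HDFS.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: merges "word<TAB>count" lines from one or more part files. */
class PartFileReader {

    /** Parses tab-separated lines and returns the counts sorted by word. */
    static Map<String, Integer> merge(List<String> lines) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue; // skip blank or malformed lines
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            totals.merge(word, count, Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Sample lines as they might appear across two part files.
        List<String> lines = Arrays.asList("hello\t2", "world\t1", "hadoop\t1");
        System.out.println(merge(lines));
    }
}
```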
Method 2:
