This post documents setting up a Hadoop development environment with Maven.
Software: Eclipse Kepler x64, Hadoop 1.2.1, Maven 3
Operating system: CentOS 6.5 x64
Prerequisite: Maven is already installed; see http://www.cnblogs.com/guarder/p/3734309.html for details.
1. Create the Project with Maven
Create the project manually from the command line (CMD) in your workspace:
E:\ws>mvn archetype:generate -DarchetypeGroupId=org.apache.maven.archetypes -DgroupId=org.conan.myhadoop.mr -DartifactId=myHadoop -DpackageName=org.conan.myhadoop.mr -Dversion=1.0-SNAPSHOT -DinteractiveMode=false
This command downloads the archetype and the project's dependencies, which can take quite a while.
[INFO] Generating project in Batch mode
If the build hangs at this step, the cause is usually network speed or permissions. Add the parameter -DarchetypeCatalog=internal so that Maven does not fetch the catalog from the remote server:
E:\ws>mvn archetype:generate -DarchetypeCatalog=internal -DarchetypeGroupId=org.apache.maven.archetypes -DgroupId=org.conan.myhadoop.mr -DartifactId=myHadoop -DpackageName=org.conan.myhadoop.mr -Dversion=1.0-SNAPSHOT -DinteractiveMode=false
[INFO] Parameter: groupId, Value: org.conan.myhadoop.mr
[INFO] Parameter: packageName, Value: org.conan.myhadoop.mr
[INFO] Parameter: package, Value: org.conan.myhadoop.mr
[INFO] Parameter: artifactId, Value: myHadoop
[INFO] Parameter: basedir, Value: E:\ws
[INFO] Parameter: version, Value: 1.0-SNAPSHOT
[INFO] project created from Old (1.x) Archetype in dir: E:\ws\myHadoop
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.303 s
[INFO] Finished at: 2014-05-17T22:21:09+08:00
[INFO] Final Memory: 8M/71M
[INFO] ------------------------------------------------------------------------
2. Import the Project into Eclipse
Import it as an existing Maven project, not as a plain Java project.
3. Add the Hadoop Dependency
Edit pom.xml and add the Hadoop dependency:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.2.1</version>
</dependency>
4. Download the Dependencies
E:\ws\myHadoop>mvn clean install
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 19:39 min
[INFO] Finished at: 2014-05-17T23:30:24+08:00
[INFO] Final Memory: 13M/99M
[INFO] ------------------------------------------------------------------------
After the download finishes, refresh the project in Eclipse; the dependency libraries are loaded automatically.
5. Copy the Cluster Configuration Files
Save the cluster's core-site.xml, hdfs-site.xml, and mapred-site.xml into the src/main/resources/hadoop directory of the local Eclipse project.
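A common pitfall with this step is that the job silently falls back to the default configuration when these XML files are not actually on the runtime classpath. A quick plain-Java sanity check (a sketch of mine, no Hadoop required; the /hadoop/... paths assume the resource layout above):

```java
public class ResourceCheck {

    // Returns true if the given absolute classpath resource is visible
    // to the current class loader.
    public static boolean onClasspath(String path) {
        return ResourceCheck.class.getResource(path) != null;
    }

    public static void main(String[] args) {
        // With the files under src/main/resources/hadoop, these should print true.
        System.out.println(onClasspath("/hadoop/core-site.xml"));
        System.out.println(onClasspath("/hadoop/hdfs-site.xml"));
        System.out.println(onClasspath("/hadoop/mapred-site.xml"));
    }
}
```

If any of these prints false, check that the files were copied under src/main/resources (not src/main/java) and that the project has been refreshed/rebuilt.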
6. Configure the Local Hosts File
Edit C:\Windows\System32\drivers\etc\hosts and add the entries:
192.168.1.115 master
192.168.1.111 slave1
192.168.1.112 slave2
7. Write a Test Program
package org.conan.myhadoop.mr;

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WordCount {

    public static class WordCountMapper extends MapReduceBase
            implements Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                output.collect(word, one);
            }
        }
    }

    public static class WordCountReducer extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            result.set(sum);
            output.collect(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        String input = "hdfs://192.168.1.115:9000/user/huser/in";
        String output = "hdfs://192.168.1.115:9000/user/huser/output/result";

        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("WordCount");
        conf.addResource("classpath:/hadoop/core-site.xml");
        conf.addResource("classpath:/hadoop/hdfs-site.xml");
        conf.addResource("classpath:/hadoop/mapred-site.xml");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(WordCountMapper.class);
        conf.setCombinerClass(WordCountReducer.class);
        conf.setReducerClass(WordCountReducer.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(input));
        FileOutputFormat.setOutputPath(conf, new Path(output));

        JobClient.runJob(conf);
        System.exit(0);
    }
}
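Before going near a cluster, the map/reduce logic above can be sanity-checked in plain Java: for word count it amounts to tokenizing on whitespace and summing per token, which is exactly what the mapper, combiner, and reducer do together. A minimal local sketch (no Hadoop dependency; the class and method names are mine):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class LocalWordCount {

    // Same tokenization as the mapper (StringTokenizer on whitespace),
    // same per-word summation as the reducer, just in memory.
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        StringTokenizer itr = new StringTokenizer(text);
        while (itr.hasMoreTokens()) {
            String word = itr.nextToken();
            Integer c = counts.get(word);
            counts.put(word, c == null ? 1 : c + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("hello hadoop hello"));
    }
}
```

Running the MapReduce job on a small input and comparing against this gives a quick correctness check of the expected output.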
8. Run the Program
Run the Java program from inside Eclipse.
2014-5-18 9:46:36 org.apache.hadoop.util.NativeCodeLoader <clinit> WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-5-18 9:46:36 org.apache.hadoop.security.UserGroupInformation doAs SEVERE: PriviledgedActionException as:Administrator cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Administrator\mapred\staging\Administrator1092236978\.staging to 0700
Exception in thread "main" java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Administrator\mapred\staging\Administrator1092236978\.staging to 0700
    at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:691)
    at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:664)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:514)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:349)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:193)
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:942)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
    at org.conan.myhadoop.mr.WordCount.main(WordCount.java:79)
The run fails with a local file-permission error (Hadoop 1.x cannot set POSIX permissions on Windows). One workaround is to modify the Hadoop source file src\core\org\apache\hadoop\fs\FileUtil.java:

private static void checkReturnValue(boolean rv, File p,
                                     FsPermission permission) throws IOException {
    // if (!rv) {
    //     throw new IOException("Failed to set permissions of path: " + p +
    //                           " to " +
    //                           String.format("%04o", permission.toShort()));
    // }
}

Comment out the lines shown above, rebuild the Hadoop JAR, replace the corresponding JAR in the local Maven repository, and run the program again.
2014-5-18 10:03:00 org.apache.hadoop.util.NativeCodeLoader <clinit> WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-5-18 10:03:00 org.apache.hadoop.mapred.JobClient copyAndConfigureFiles WARNING: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2014-5-18 10:03:00 org.apache.hadoop.mapred.JobClient copyAndConfigureFiles WARNING: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
2014-5-18 10:03:00 org.apache.hadoop.io.compress.snappy.LoadSnappy <clinit> WARNING: Snappy native library not loaded
2014-5-18 10:03:00 org.apache.hadoop.mapred.FileInputFormat listStatus INFO: Total input paths to process : 3
2014-5-18 10:03:01 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: Running job: job_local1959811789_0001
2014-5-18 10:03:01 org.apache.hadoop.mapred.LocalJobRunner$Job run WARNING: job_local1959811789_0001
org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=Administrator, access=WRITE, inode="output":huser:supergroup:rwxr-xr-x
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
    at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:1459)
    at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:362)
    at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1161)
    at org.apache.hadoop.mapred.FileOutputCommitter.setupJob(FileOutputCommitter.java:52)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:319)
The run fails again, this time with an HDFS permission error. Fix it by relaxing the permissions of the HDFS directory and updating hdfs-site.xml:

[huser@master hadoop-1.2.1]$ bin/hadoop fs -chmod 777 /user/huser
Warning: $HADOOP_HOME is deprecated.

<property>
    <name>dfs.permissions</name>
    <value>false</value>
    <description>
        If "true", enable permission checking in HDFS. If "false", permission
        checking is turned off, but all other behavior is unchanged. Switching
        from one parameter value to the other does not change the mode, owner
        or group of files or directories.
    </description>
</property>

Restart the cluster and run the Java program again.
2014-5-18 10:40:22 org.apache.hadoop.util.NativeCodeLoader <clinit> WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-5-18 10:40:22 org.apache.hadoop.mapred.JobClient copyAndConfigureFiles WARNING: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2014-5-18 10:40:22 org.apache.hadoop.mapred.JobClient copyAndConfigureFiles WARNING: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
2014-5-18 10:40:23 org.apache.hadoop.io.compress.snappy.LoadSnappy <clinit> WARNING: Snappy native library not loaded
2014-5-18 10:40:23 org.apache.hadoop.mapred.FileInputFormat listStatus INFO: Total input paths to process : 3
2014-5-18 10:40:23 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: Running job: job_local985253743_0001
2014-5-18 10:40:23 org.apache.hadoop.mapred.LocalJobRunner$Job run INFO: Waiting for map tasks
2014-5-18 10:40:23 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable run INFO: Starting task: attempt_local985253743_0001_m_000000_0
2014-5-18 10:40:23 org.apache.hadoop.mapred.Task initialize INFO: Using ResourceCalculatorPlugin : null
2014-5-18 10:40:23 org.apache.hadoop.mapred.MapTask updateJobWithSplit INFO: Processing split: hdfs://192.168.1.115:9000/user/huser/in/test.txt:0+172
2014-5-18 10:40:23 org.apache.hadoop.mapred.MapTask runOldMapper INFO: numReduceTasks: 1
2014-5-18 10:40:23 org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init> INFO: io.sort.mb = 100
2014-5-18 10:40:23 org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init> INFO: data buffer = 79691776/99614720
2014-5-18 10:40:23 org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init> INFO: record buffer = 262144/327680
2014-5-18 10:40:23 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush INFO: Starting flush of map output
2014-5-18 10:40:24 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill INFO: Finished spill 0
2014-5-18 10:40:24 org.apache.hadoop.mapred.Task done INFO: Task:attempt_local985253743_0001_m_000000_0 is done. And is in the process of commiting
2014-5-18 10:40:24 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate INFO: hdfs://192.168.1.115:9000/user/huser/in/test.txt:0+172
2014-5-18 10:40:24 org.apache.hadoop.mapred.Task sendDone INFO: Task 'attempt_local985253743_0001_m_000000_0' done.
2014-5-18 10:40:24 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable run INFO: Finishing task: attempt_local985253743_0001_m_000000_0
2014-5-18 10:40:24 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable run INFO: Starting task: attempt_local985253743_0001_m_000001_0
2014-5-18 10:40:24 org.apache.hadoop.mapred.Task initialize INFO: Using ResourceCalculatorPlugin : null
2014-5-18 10:40:24 org.apache.hadoop.mapred.MapTask updateJobWithSplit INFO: Processing split: hdfs://192.168.1.115:9000/user/huser/in/test3.txt:0+20
2014-5-18 10:40:24 org.apache.hadoop.mapred.MapTask runOldMapper INFO: numReduceTasks: 1
2014-5-18 10:40:24 org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init> INFO: io.sort.mb = 100
2014-5-18 10:40:24 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: map 33% reduce 0%
2014-5-18 10:40:24 org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init> INFO: data buffer = 79691776/99614720
2014-5-18 10:40:24 org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init> INFO: record buffer = 262144/327680
2014-5-18 10:40:24 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush INFO: Starting flush of map output
2014-5-18 10:40:24 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill INFO: Finished spill 0
2014-5-18 10:40:24 org.apache.hadoop.mapred.Task done INFO: Task:attempt_local985253743_0001_m_000001_0 is done. And is in the process of commiting
2014-5-18 10:40:24 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate INFO: hdfs://192.168.1.115:9000/user/huser/in/test3.txt:0+20
2014-5-18 10:40:24 org.apache.hadoop.mapred.Task sendDone INFO: Task 'attempt_local985253743_0001_m_000001_0' done.
2014-5-18 10:40:24 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable run INFO: Finishing task: attempt_local985253743_0001_m_000001_0
2014-5-18 10:40:24 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable run INFO: Starting task: attempt_local985253743_0001_m_000002_0
2014-5-18 10:40:24 org.apache.hadoop.mapred.Task initialize INFO: Using ResourceCalculatorPlugin : null
2014-5-18 10:40:24 org.apache.hadoop.mapred.MapTask updateJobWithSplit INFO: Processing split: hdfs://192.168.1.115:9000/user/huser/in/test2.txt:0+13
2014-5-18 10:40:24 org.apache.hadoop.mapred.MapTask runOldMapper INFO: numReduceTasks: 1
2014-5-18 10:40:24 org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init> INFO: io.sort.mb = 100
2014-5-18 10:40:24 org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init> INFO: data buffer = 79691776/99614720
2014-5-18 10:40:24 org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init> INFO: record buffer = 262144/327680
2014-5-18 10:40:24 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush INFO: Starting flush of map output
2014-5-18 10:40:24 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill INFO: Finished spill 0
2014-5-18 10:40:24 org.apache.hadoop.mapred.Task done INFO: Task:attempt_local985253743_0001_m_000002_0 is done. And is in the process of commiting
2014-5-18 10:40:24 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate INFO: hdfs://192.168.1.115:9000/user/huser/in/test2.txt:0+13
2014-5-18 10:40:24 org.apache.hadoop.mapred.Task sendDone INFO: Task 'attempt_local985253743_0001_m_000002_0' done.
2014-5-18 10:40:24 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable run INFO: Finishing task: attempt_local985253743_0001_m_000002_0
2014-5-18 10:40:24 org.apache.hadoop.mapred.LocalJobRunner$Job run INFO: Map task executor complete.
2014-5-18 10:40:24 org.apache.hadoop.mapred.Task initialize INFO: Using ResourceCalculatorPlugin : null
2014-5-18 10:40:24 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate INFO:
2014-5-18 10:40:24 org.apache.hadoop.mapred.Merger$MergeQueue merge INFO: Merging 3 sorted segments
2014-5-18 10:40:25 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: map 100% reduce 0%
2014-5-18 10:40:25 org.apache.hadoop.mapred.Merger$MergeQueue merge INFO: Down to the last merge-pass, with 3 segments left of total size: 266 bytes
2014-5-18 10:40:25 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate INFO:
2014-5-18 10:40:27 org.apache.hadoop.mapred.Task done INFO: Task:attempt_local985253743_0001_r_000000_0 is done. And is in the process of commiting
2014-5-18 10:40:27 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate INFO:
2014-5-18 10:40:27 org.apache.hadoop.mapred.Task commit INFO: Task attempt_local985253743_0001_r_000000_0 is allowed to commit now
2014-5-18 10:40:27 org.apache.hadoop.mapred.FileOutputCommitter commitTask INFO: Saved output of task 'attempt_local985253743_0001_r_000000_0' to hdfs://192.168.1.115:9000/user/huser/output/result
2014-5-18 10:40:27 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate INFO: reduce > reduce
2014-5-18 10:40:27 org.apache.hadoop.mapred.Task sendDone INFO: Task 'attempt_local985253743_0001_r_000000_0' done.
2014-5-18 10:40:28 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: map 100% reduce 100%
2014-5-18 10:40:29 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: Job complete: job_local985253743_0001
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: Counters: 20
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: File Input Format Counters
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: Bytes Read=205
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: File Output Format Counters
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: Bytes Written=224
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: FileSystemCounters
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: FILE_BYTES_READ=3410
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: HDFS_BYTES_READ=774
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: FILE_BYTES_WRITTEN=276151
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: HDFS_BYTES_WRITTEN=224
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: Map-Reduce Framework
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: Map output materialized bytes=278
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: Map input records=8
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: Reduce shuffle bytes=0
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: Spilled Records=18
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: Map output bytes=242
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: Total committed heap usage (bytes)=1128595456
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: Map input bytes=205
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: Combine input records=9
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: SPLIT_RAW_BYTES=305
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: Reduce input records=9
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: Reduce input groups=9
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: Combine output records=9
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: Reduce output records=9
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: Map output records=9
The job succeeds; inspect the output:

[huser@master hadoop-1.2.1]$ bin/hadoop fs -ls /user/huser/output/result
Warning: $HADOOP_HOME is deprecated.
Found 2 items
-rw-r--r--   3 Administrator supergroup          0 2014-04-18 10:11 /user/huser/output/result/_SUCCESS
-rw-r--r--   3 Administrator supergroup        224 2014-04-18 10:11 /user/huser/output/result/part-00000
[huser@master hadoop-1.2.1]$ bin/hadoop fs -cat /user/huser/output/result/part-00000
Warning: $HADOOP_HOME is deprecated.
111111111ccc11111222222222eeeeeeee222222	1
11111111tttttttttttttttttttffffffffffffffffffff	1
222222222222ccc2222222222f	1
2ccc2222222222f	1
33333333333ttttttttttttttttt	1
4fff	1
4fffffffffffffffffffffffff	1
hadoop	1
hello	1
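TextOutputFormat writes one key-value pair per line with a tab separator, which is why each word in the listing is followed by its count. To verify results programmatically, the output can be parsed back into a map; a small plain-Java sketch (parsing from a string here rather than reading part-00000 off HDFS; class name is mine):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ResultParser {

    // Parse TextOutputFormat lines of the form "key<TAB>value"
    // into an insertion-ordered map.
    public static Map<String, Integer> parse(String output) {
        Map<String, Integer> result = new LinkedHashMap<String, Integer>();
        for (String line : output.split("\n")) {
            if (line.isEmpty()) {
                continue;
            }
            String[] kv = line.split("\t");
            result.put(kv[0], Integer.parseInt(kv[1]));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("hadoop\t1\nhello\t1"));
    }
}
```

Comparing this parsed map against a local count of the same input files gives an end-to-end check of the job.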