一、前言
環境:
- 系統:centos6.5
- hadoop版本:Apache hadoop2.7.3(Windows和centos都是同一個)
- eclipse版本:4.2.0(juno版本,windows)
- ant版本:ant 1.7.1(windows)
- java版本:1.8.0_05(windows)
我是在虛擬機中安裝的系統,具體的安裝和配置參考:Hadoop單機偽分布部署。
二、制作插件
1. 下載hadoop2x-eclipse-plugin-master.zip
在github下載:https://github.com/winghc/hadoop2x-eclipse-plugin。
2. 修改配置
原來的配置是編譯2.4.1的,為了編譯成2.7.3版本,首先需要改2個文件。
1) 修改hadoop2x-eclipse-plugin-master/src/contrib/eclipse-plugin/build.xml
在第82行 找到 <target name="jar" depends="compile" unless="skip.contrib">標簽,找到下面的子標簽(在127行):
<copy file="${hadoop.home}/share/hadoop/common/lib/htrace-core-${htrace.version}.jar" todir="${build.dir}/lib" verbose="true"/>
刪掉這行,然后在下面增加多三行信息:
<copy file="${hadoop.home}/share/hadoop/common/lib/htrace-core-${htrace.version}-incubating.jar" todir="${build.dir}/lib" verbose="true"/> <copy file="${hadoop.home}/share/hadoop/common/lib/servlet-api-${servlet-api.version}.jar" todir="${build.dir}/lib" verbose="true"/> <copy file="${hadoop.home}/share/hadoop/common/lib/commons-io-${commons-io.version}.jar" todir="${build.dir}/lib" verbose="true"/>
然后找到標簽<attribute name="Bundle-ClassPath",在下面的標簽中,刪除lib/htrace-core-${htrace.version}.jar這行,同樣加多三行:
lib/servlet-api-${servlet-api.version}.jar,
lib/commons-io-${commons-io.version}.jar,
lib/htrace-core-${htrace.version}-incubating.jar"
修改完的文件:
<?xml version="1.0" encoding="UTF-8" standalone="no"?> <!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. --> <project default="jar" name="eclipse-plugin"> <import file="../build-contrib.xml"/> <path id="eclipse-sdk-jars"> <fileset dir="${eclipse.home}/plugins/"> <include name="org.eclipse.ui*.jar"/> <include name="org.eclipse.jdt*.jar"/> <include name="org.eclipse.core*.jar"/> <include name="org.eclipse.equinox*.jar"/> <include name="org.eclipse.debug*.jar"/> <include name="org.eclipse.osgi*.jar"/> <include name="org.eclipse.swt*.jar"/> <include name="org.eclipse.jface*.jar"/> <include name="org.eclipse.team.cvs.ssh2*.jar"/> <include name="com.jcraft.jsch*.jar"/> </fileset> </path> <path id="hadoop-sdk-jars"> <fileset dir="${hadoop.home}/share/hadoop/mapreduce"> <include name="hadoop*.jar"/> </fileset> <fileset dir="${hadoop.home}/share/hadoop/hdfs"> <include name="hadoop*.jar"/> </fileset> <fileset dir="${hadoop.home}/share/hadoop/common"> <include name="hadoop*.jar"/> </fileset> </path> <!-- Override classpath to include Eclipse SDK jars --> <path id="classpath"> <pathelement location="${build.classes}"/> <!--pathelement location="${hadoop.root}/build/classes"/--> <path refid="eclipse-sdk-jars"/> <path refid="hadoop-sdk-jars"/> </path> <!-- Skip building if eclipse.home is unset. --> <target name="check-contrib" unless="eclipse.home"> <property name="skip.contrib" value="yes"/> <echo message="eclipse.home unset: skipping eclipse plugin"/> </target> <target name="compile" depends="init, ivy-retrieve-common" unless="skip.contrib"> <echo message="contrib: ${name}"/> <javac encoding="${build.encoding}" srcdir="${src.dir}" includes="**/*.java" destdir="${build.classes}" debug="${javac.debug}" deprecation="${javac.deprecation}"> <classpath refid="classpath"/> </javac> </target> <!-- Override jar target to specify manifest --> <target name="jar" depends="compile" unless="skip.contrib"> <mkdir dir="${build.dir}/lib"/> <copy todir="${build.dir}/lib/" verbose="true"> <fileset dir="${hadoop.home}/share/hadoop/mapreduce"> <include name="hadoop*.jar"/> </fileset> </copy> <copy todir="${build.dir}/lib/" verbose="true"> <fileset dir="${hadoop.home}/share/hadoop/common"> <include name="hadoop*.jar"/> </fileset> </copy> <copy todir="${build.dir}/lib/" verbose="true"> <fileset dir="${hadoop.home}/share/hadoop/hdfs"> <include name="hadoop*.jar"/> </fileset> </copy> <copy todir="${build.dir}/lib/" verbose="true"> <fileset dir="${hadoop.home}/share/hadoop/yarn"> <include name="hadoop*.jar"/> </fileset> </copy> <copy todir="${build.dir}/classes" verbose="true"> <fileset dir="${root}/src/java"> <include name="*.xml"/> </fileset> </copy> <copy file="${hadoop.home}/share/hadoop/common/lib/protobuf-java-${protobuf.version}.jar" todir="${build.dir}/lib" verbose="true"/> <copy file="${hadoop.home}/share/hadoop/common/lib/log4j-${log4j.version}.jar" todir="${build.dir}/lib" verbose="true"/> <copy file="${hadoop.home}/share/hadoop/common/lib/commons-cli-${commons-cli.version}.jar" todir="${build.dir}/lib" verbose="true"/> <copy file="${hadoop.home}/share/hadoop/common/lib/commons-configuration-${commons-configuration.version}.jar" todir="${build.dir}/lib" verbose="true"/> <copy file="${hadoop.home}/share/hadoop/common/lib/commons-lang-${commons-lang.version}.jar" todir="${build.dir}/lib" verbose="true"/> <copy file="${hadoop.home}/share/hadoop/common/lib/commons-collections-${commons-collections.version}.jar" todir="${build.dir}/lib" verbose="true"/> <copy file="${hadoop.home}/share/hadoop/common/lib/jackson-core-asl-${jackson.version}.jar" todir="${build.dir}/lib" verbose="true"/> <copy file="${hadoop.home}/share/hadoop/common/lib/jackson-mapper-asl-${jackson.version}.jar" todir="${build.dir}/lib" verbose="true"/> <copy file="${hadoop.home}/share/hadoop/common/lib/slf4j-log4j12-${slf4j-log4j12.version}.jar" todir="${build.dir}/lib" verbose="true"/> <copy file="${hadoop.home}/share/hadoop/common/lib/slf4j-api-${slf4j-api.version}.jar" todir="${build.dir}/lib" verbose="true"/> <copy file="${hadoop.home}/share/hadoop/common/lib/guava-${guava.version}.jar" todir="${build.dir}/lib" verbose="true"/> <copy file="${hadoop.home}/share/hadoop/common/lib/hadoop-auth-${hadoop.version}.jar" todir="${build.dir}/lib" verbose="true"/> <copy file="${hadoop.home}/share/hadoop/common/lib/commons-cli-${commons-cli.version}.jar" todir="${build.dir}/lib" verbose="true"/> <copy file="${hadoop.home}/share/hadoop/common/lib/netty-${netty.version}.jar" todir="${build.dir}/lib" verbose="true"/> <copy file="${hadoop.home}/share/hadoop/common/lib/htrace-core-${htrace.version}-incubating.jar" todir="${build.dir}/lib" verbose="true"/> <copy file="${hadoop.home}/share/hadoop/common/lib/servlet-api-${servlet-api.version}.jar" todir="${build.dir}/lib" verbose="true"/> <copy file="${hadoop.home}/share/hadoop/common/lib/commons-io-${commons-io.version}.jar" todir="${build.dir}/lib" verbose="true"/> <jar jarfile="${build.dir}/hadoop-${name}-${hadoop.version}.jar" manifest="${root}/META-INF/MANIFEST.MF"> <manifest> <attribute name="Bundle-ClassPath" value="classes/, lib/hadoop-mapreduce-client-core-${hadoop.version}.jar, lib/hadoop-mapreduce-client-common-${hadoop.version}.jar, lib/hadoop-mapreduce-client-jobclient-${hadoop.version}.jar, lib/hadoop-auth-${hadoop.version}.jar, lib/hadoop-common-${hadoop.version}.jar, lib/hadoop-hdfs-${hadoop.version}.jar, lib/protobuf-java-${protobuf.version}.jar, lib/log4j-${log4j.version}.jar, lib/commons-cli-${commons-cli.version}.jar, lib/commons-configuration-${commons-configuration.version}.jar, lib/commons-httpclient-${commons-httpclient.version}.jar, lib/commons-lang-${commons-lang.version}.jar, lib/commons-collections-${commons-collections.version}.jar, lib/jackson-core-asl-${jackson.version}.jar, lib/jackson-mapper-asl-${jackson.version}.jar, lib/slf4j-log4j12-${slf4j-log4j12.version}.jar, lib/slf4j-api-${slf4j-api.version}.jar, lib/guava-${guava.version}.jar, lib/netty-${netty.version}.jar, lib/servlet-api-${servlet-api.version}.jar, lib/commons-io-${commons-io.version}.jar, lib/htrace-core-${htrace.version}-incubating.jar"/> </manifest> <fileset dir="${build.dir}" includes="classes/ lib/"/> <!--fileset dir="${build.dir}" includes="*.xml"/--> <fileset dir="${root}" includes="resources/ plugin.xml"/> </jar> </target> </project>
2) 修改hadoop2x-eclipse-plugin-master/ivy/libraries.properties
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#This properties file lists the versions of the various artifacts used by hadoop and components.
#It drives ivy and the generation of a maven POM
# This is the version of hadoop we are generating
#hadoop.version=2.6.0 modify
hadoop.version=2.7.3
hadoop-gpl-compression.version=0.1.0
#These are the versions of our dependencies (in alphabetical order)
apacheant.version=1.7.1
ant-task.version=2.0.10
asm.version=3.2
aspectj.version=1.6.5
aspectj.version=1.6.11
checkstyle.version=4.2
commons-cli.version=1.2
commons-codec.version=1.4
#commons-collections.version=3.2.1 modify
commons-collections.version=3.2.2
commons-configuration.version=1.6
commons-daemon.version=1.0.13
#commons-httpclient.version=3.0.1 modify
commons-httpclient.version=3.1
commons-lang.version=2.6
#commons-logging.version=1.0.4 modify
commons-logging.version=1.1.3
#commons-logging-api.version=1.0.4 modify
commons-logging-api.version=1.1.3
#commons-math.version=2.1 modify
commons-math.version=3.1.1
commons-el.version=1.0
commons-fileupload.version=1.2
#commons-io.version=2.1 modify
commons-io.version=2.4
commons-net.version=3.1
core.version=3.1.1
coreplugin.version=1.3.2
#hsqldb.version=1.8.0.10 modify
hsqldb.version=2.0.0
#htrace.version=3.0.4 modify
htrace.version=3.1.0
ivy.version=2.1.0
jasper.version=5.5.12
jackson.version=1.9.13
#not able to figureout the version of jsp & jsp-api version to get it resolved throught ivy
# but still declared here as we are going to have a local copy from the lib folder
jsp.version=2.1
jsp-api.version=5.5.12
jsp-api-2.1.version=6.1.14
jsp-2.1.version=6.1.14
#jets3t.version=0.6.1 modify
jets3t.version=0.9.0
jetty.version=6.1.26
jetty-util.version=6.1.26
#jersey-core.version=1.8 modify
#jersey-json.version=1.8 modify
#jersey-server.version=1.8 modify
jersey-core.version=1.9
jersey-json.version=1.9
jersey-server.version=1.9
#junit.version=4.5 modify
junit.version=4.11
jdeb.version=0.8
jdiff.version=1.0.9
json.version=1.0
kfs.version=0.1
log4j.version=1.2.17
lucene-core.version=2.3.1
mockito-all.version=1.8.5
jsch.version=0.1.42
oro.version=2.0.8
rats-lib.version=0.5.1
servlet.version=4.0.6
servlet-api.version=2.5
#slf4j-api.version=1.7.5 modify
#slf4j-log4j12.version=1.7.5 modify
slf4j-api.version=1.7.10
slf4j-log4j12.version=1.7.10
wagon-http.version=1.0-beta-2
xmlenc.version=0.52
#xerces.version=1.4.4 modify
xerces.version=2.9.1
protobuf.version=2.5.0
guava.version=11.0.2
netty.version=3.6.2.Final
3. 用ant進行構建
這個時候在centos系統上,hadoop已經能夠正常運行了,為了后面方便eclipse訪問,關掉集群的權限校驗(注意不要在生產環境這么做)。在mapred-site.xml增加一條配置:
<property> <name>dfs.permissions</name> <value>false</value> </property>
重啟集群。
將已經能在centos運行的hadoop2.7.3安裝目錄復制到windows,另外確保ant和eclipse已經配置好了,在ant命令中需要用到三個目錄信息,以我本地為例:
- eclipse:C:\eclipse
- hadoop:E:\5-Hadoop\hadoop-2.7.3
- hadoop2x-eclipse-plugin-master:E:\5-Hadoop\hadoop2x-eclipse-plugin-master
進入目錄E:\5-Hadoop\hadoop2x-eclipse-plugin-master\src\contrib\eclipse-plugin\
輸入命令進行構建:
ant jar -Dhadoop.version=2.7.3 -Declipse.home=C:\eclipse -Dhadoop.home=E:\5-Hadoop\hadoop-2.7.3
如果路徑有空格的話,記得用雙引號括起來,不過建議把軟件都安裝沒有空格的路徑上。等待幾分鍾,顯示構建成功,最后的生成的hadoop-eclipse-plugin-2.7.3.jar存放在目錄E:\5-Hadoop\hadoop2x-eclipse-plugin-master\build\contrib\eclipse-plugin下。
三、配置插件
1. 安裝插件
將hadoop-eclipse-plugin-2.7.3.jar放在eclipse的plugin目錄中:C:\eclipse\plugins,重啟eclipse。
如果順利,打開eclipse,通過Windows->Preferences->Hadoop Map/Reduce設置安裝路徑:
2. 設置location
通過Windows–> Show View –> Others –> Map/Redure Location ,通過圖標新建一個location:
設置參數:
location name隨便起一個,user name實際上也是默認就行,我之前以為需要填root,實際上什么名字都可以,重點是host跟兩個port,在這里卡了一段時間。
實際上在centos端,我的主機名字是harry.com:
所以在windows端增加hosts配置:
192.168.0.210 harry.com
DFS Master的端口,根據hadoop的配置文件core-site.xml配置的端口去填:
至於Map/Reduce(V2) Master據說是job tracker的端口號(可是hadoop2已經用yarn了),在mapred-site.xml配置,我沒有配這個,所以直接用9001的就行了。
完成之后,在左側應該就可以看到hdfs里面的內容了:
如果host和port沒配對,在連接hdfs的時候可能會報錯:
An internal error occurred during: "Map/Reduce location status updater". java.lang.NullPointerExcept
如何還有其他報錯,請先確保集群能正常運行,關閉centos的防火牆,如果能順利走到這步,整個部署過程就已經完成了一半了。
四、運行MapReduce作業
1. 新建測試項目
新建hadoop里的hello world程序word count,並從源碼中拷貝WordCount.java過來,如下:
把包名改成Test,源碼如下:
/** * Licensed to the Apache Software Foundation (ASF) under one * or more contributor license agreements. See the NOTICE file * distributed with this work for additional information * regarding copyright ownership. The ASF licenses this file * to you under the Apache License, Version 2.0 (the * "License"); you may not use this file except in compliance * with the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package Test; import java.io.IOException; import java.util.StringTokenizer; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.Job; import org.apache.hadoop.mapreduce.Mapper; import org.apache.hadoop.mapreduce.Reducer; import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; import org.apache.hadoop.util.GenericOptionsParser; public class WordCount { public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable>{ private final static IntWritable one = new IntWritable(1); private Text word = new Text(); public void map(Object key, Text value, Context context ) throws IOException, InterruptedException { StringTokenizer itr = new StringTokenizer(value.toString()); while (itr.hasMoreTokens()) { word.set(itr.nextToken()); context.write(word, one); } } } public static class IntSumReducer extends Reducer<Text,IntWritable,Text,IntWritable> { private IntWritable result = new IntWritable(); public void reduce(Text key, Iterable<IntWritable> values, Context context ) throws IOException, InterruptedException { int sum = 0; for (IntWritable val : values) { sum += val.get(); } result.set(sum); context.write(key, result); } } public static void main(String[] args) throws Exception { Configuration conf = new Configuration(); String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs(); if (otherArgs.length < 2) { System.err.println("Usage: wordcount <in> [<in>...] <out>"); System.exit(2); } Job job = Job.getInstance(conf, "word count"); job.setJarByClass(WordCount.class); job.setMapperClass(TokenizerMapper.class); job.setCombinerClass(IntSumReducer.class); job.setReducerClass(IntSumReducer.class); job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); for (int i = 0; i < otherArgs.length - 1; ++i) { FileInputFormat.addInputPath(job, new Path(otherArgs[i])); } FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1])); System.exit(job.waitForCompletion(true) ? 0 : 1); } }
實際上,剛開始構建的時候,會報錯:
The type java.util.Map$Entry cannot be resolved. It is indirectly referenced from required .class files WordCount.java /WordCount/src/Test line 1 Java Problem
查了一下,原來我的eclipse版本是Helios,而java版本是1.8,不支持。於是有兩個選擇:降到java1.8以下或者換更高的eclipse版本,我的選擇是升eclipse版本到juno版本,問題解決。
2. 設置運行參數並運行
點擊WordCount類,右鍵Run As –> Run Configurations ,點擊Arguments,填寫輸入目錄,輸出目錄參數:
運行,報錯:
Exception in thread "main" java.io.IOException: (null) entry in command string: null chmod 0700 C:\tmp\hadoop-Administrator\mapred\staging\Administrator134847124\.staging ...
設置環境變量HADOOP_HOME為E:\5-Hadoop\hadoop-2.7.3,在 https://github.com/SweetInk/hadoop-common-2.7.1-bin 中下載winutils.exe,libwinutils.lib拷貝到%HADOOP_HOME%\bin目錄,重新運行,嗯,重新報錯:
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
...
在https://github.com/SweetInk/hadoop-common-2.7.1-bin中下載hadoop.dll,並拷貝到c:\windows\system32目錄中。重新執行,報WARN:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory). log4j:WARN Please initialize the log4j system properly. log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
在項目的src下面新建file名為log4j.properties的文件,內容為:
# Configure logging for testing: optionally with log file #log4j.rootLogger=debug,appender log4j.rootLogger=info,appender #log4j.rootLogger=error,appender #\u8F93\u51FA\u5230\u63A7\u5236\u53F0 log4j.appender.appender=org.apache.log4j.ConsoleAppender #\u6837\u5F0F\u4E3ATTCCLayout log4j.appender.appender.layout=org.apache.log4j.TTCCLayout
因為剛才其實已經運行成功了,所以這次需要先刪掉hdfs中的out1目錄,然后運行,然后在console中會打印出這次任務的運行數據,最后在hdfs中查看運行結果:
至此,終於成功利用在eclipse通過hadoop-eclipse-plugin-2.7.3.jar插件,成功連接centos下運行中的hadoop,開發環境搭建完成!!!
五、參考
1. hadoop 2.7.3 (hadoop2.x)使用ant制作eclipse插件hadoop-eclipse-plugin-2.7.3.jar
2. An internal error occurred during: "Map/Reduce location status updater". java.lang.NullPointerExcept
4. Hadoop學習筆記2:eclipse運行Mapreduce程序問題總結
5. Eclipse搭建hadoop開發環境[hadoop-eclipse-plugin-2.5.2]
7. JDK8之The type java.util.Map$Entry cannot be resolved
(完)