Deploying a Hadoop Development Environment: Connecting Eclipse Remotely to Hadoop 2.7.3


I. Preface

Environment:

  • OS: CentOS 6.5
  • Hadoop version: Apache Hadoop 2.7.3 (the same build on both Windows and CentOS)
  • Eclipse version: 4.2.0 (Juno, on Windows)
  • Ant version: 1.7.1 (on Windows)
  • Java version: 1.8.0_05 (on Windows)

I installed the OS in a virtual machine; for the installation and configuration details, see: Hadoop Single-Node Pseudo-Distributed Deployment.

 

II. Building the Plugin

1. Download hadoop2x-eclipse-plugin-master.zip

Download it from GitHub: https://github.com/winghc/hadoop2x-eclipse-plugin

 

2. Modify the configuration

The original configuration builds against Hadoop 2.4.1; to build for 2.7.3, two files need to be changed first.

1) Modify hadoop2x-eclipse-plugin-master/src/contrib/eclipse-plugin/build.xml

At line 82 of the file, find the <target name="jar" depends="compile" unless="skip.contrib"> tag, then locate the following child element inside it (at line 127):

<copy file="${hadoop.home}/share/hadoop/common/lib/htrace-core-${htrace.version}.jar"  todir="${build.dir}/lib" verbose="true"/>

Delete that line, then add the following three lines in its place:

<copy file="${hadoop.home}/share/hadoop/common/lib/htrace-core-${htrace.version}-incubating.jar"  todir="${build.dir}/lib" verbose="true"/>  
<copy file="${hadoop.home}/share/hadoop/common/lib/servlet-api-${servlet-api.version}.jar"  todir="${build.dir}/lib" verbose="true"/>  
<copy file="${hadoop.home}/share/hadoop/common/lib/commons-io-${commons-io.version}.jar"  todir="${build.dir}/lib" verbose="true"/>

Next, find the <attribute name="Bundle-ClassPath"> tag; in its value, delete the line lib/htrace-core-${htrace.version}.jar and likewise add three entries:

lib/servlet-api-${servlet-api.version}.jar,  
lib/commons-io-${commons-io.version}.jar,  
lib/htrace-core-${htrace.version}-incubating.jar"

The modified file:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>

<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
-->

<project default="jar" name="eclipse-plugin">

  <import file="../build-contrib.xml"/>

  <path id="eclipse-sdk-jars">
    <fileset dir="${eclipse.home}/plugins/">
      <include name="org.eclipse.ui*.jar"/>
      <include name="org.eclipse.jdt*.jar"/>
      <include name="org.eclipse.core*.jar"/>
      <include name="org.eclipse.equinox*.jar"/>
      <include name="org.eclipse.debug*.jar"/>
      <include name="org.eclipse.osgi*.jar"/>
      <include name="org.eclipse.swt*.jar"/>
      <include name="org.eclipse.jface*.jar"/>

      <include name="org.eclipse.team.cvs.ssh2*.jar"/>
      <include name="com.jcraft.jsch*.jar"/>
    </fileset> 
  </path>

  <path id="hadoop-sdk-jars">
    <fileset dir="${hadoop.home}/share/hadoop/mapreduce">
      <include name="hadoop*.jar"/>
    </fileset> 
    <fileset dir="${hadoop.home}/share/hadoop/hdfs">
      <include name="hadoop*.jar"/>
    </fileset> 
    <fileset dir="${hadoop.home}/share/hadoop/common">
      <include name="hadoop*.jar"/>
    </fileset> 
  </path>



  <!-- Override classpath to include Eclipse SDK jars -->
  <path id="classpath">
    <pathelement location="${build.classes}"/>
    <!--pathelement location="${hadoop.root}/build/classes"/-->
    <path refid="eclipse-sdk-jars"/>
    <path refid="hadoop-sdk-jars"/>
  </path>

  <!-- Skip building if eclipse.home is unset. -->
  <target name="check-contrib" unless="eclipse.home">
    <property name="skip.contrib" value="yes"/>
    <echo message="eclipse.home unset: skipping eclipse plugin"/>
  </target>

 <target name="compile" depends="init, ivy-retrieve-common" unless="skip.contrib">
    <echo message="contrib: ${name}"/>
    <javac
     encoding="${build.encoding}"
     srcdir="${src.dir}"
     includes="**/*.java"
     destdir="${build.classes}"
     debug="${javac.debug}"
     deprecation="${javac.deprecation}">
     <classpath refid="classpath"/>
    </javac>
  </target>

  <!-- Override jar target to specify manifest -->
  <target name="jar" depends="compile" unless="skip.contrib">
    <mkdir dir="${build.dir}/lib"/>
    <copy  todir="${build.dir}/lib/" verbose="true">
          <fileset dir="${hadoop.home}/share/hadoop/mapreduce">
           <include name="hadoop*.jar"/>
          </fileset>
    </copy>
    <copy  todir="${build.dir}/lib/" verbose="true">
          <fileset dir="${hadoop.home}/share/hadoop/common">
           <include name="hadoop*.jar"/>
          </fileset>
    </copy>
    <copy  todir="${build.dir}/lib/" verbose="true">
          <fileset dir="${hadoop.home}/share/hadoop/hdfs">
           <include name="hadoop*.jar"/>
          </fileset>
    </copy>
    <copy  todir="${build.dir}/lib/" verbose="true">
          <fileset dir="${hadoop.home}/share/hadoop/yarn">
           <include name="hadoop*.jar"/>
          </fileset>
    </copy>

    <copy  todir="${build.dir}/classes" verbose="true">
          <fileset dir="${root}/src/java">
           <include name="*.xml"/>
          </fileset>
    </copy>



    <copy file="${hadoop.home}/share/hadoop/common/lib/protobuf-java-${protobuf.version}.jar"  todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/log4j-${log4j.version}.jar"  todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/commons-cli-${commons-cli.version}.jar"  todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/commons-configuration-${commons-configuration.version}.jar"  todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/commons-lang-${commons-lang.version}.jar"  todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/commons-collections-${commons-collections.version}.jar"  todir="${build.dir}/lib" verbose="true"/>  
    <copy file="${hadoop.home}/share/hadoop/common/lib/jackson-core-asl-${jackson.version}.jar"  todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/jackson-mapper-asl-${jackson.version}.jar"  todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/slf4j-log4j12-${slf4j-log4j12.version}.jar"  todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/slf4j-api-${slf4j-api.version}.jar"  todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/guava-${guava.version}.jar"  todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/hadoop-auth-${hadoop.version}.jar"  todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/commons-cli-${commons-cli.version}.jar"  todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/netty-${netty.version}.jar"  todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/htrace-core-${htrace.version}-incubating.jar"  todir="${build.dir}/lib" verbose="true"/>  
    <copy file="${hadoop.home}/share/hadoop/common/lib/servlet-api-${servlet-api.version}.jar"  todir="${build.dir}/lib" verbose="true"/>  
    <copy file="${hadoop.home}/share/hadoop/common/lib/commons-io-${commons-io.version}.jar"  todir="${build.dir}/lib" verbose="true"/> 

    <jar
      jarfile="${build.dir}/hadoop-${name}-${hadoop.version}.jar"
      manifest="${root}/META-INF/MANIFEST.MF">
      <manifest>
   <attribute name="Bundle-ClassPath" 
    value="classes/, 
 lib/hadoop-mapreduce-client-core-${hadoop.version}.jar,
 lib/hadoop-mapreduce-client-common-${hadoop.version}.jar,
 lib/hadoop-mapreduce-client-jobclient-${hadoop.version}.jar,
 lib/hadoop-auth-${hadoop.version}.jar,
 lib/hadoop-common-${hadoop.version}.jar,
 lib/hadoop-hdfs-${hadoop.version}.jar,
 lib/protobuf-java-${protobuf.version}.jar,
 lib/log4j-${log4j.version}.jar,
 lib/commons-cli-${commons-cli.version}.jar,
 lib/commons-configuration-${commons-configuration.version}.jar,
 lib/commons-httpclient-${commons-httpclient.version}.jar,
 lib/commons-lang-${commons-lang.version}.jar,  
 lib/commons-collections-${commons-collections.version}.jar,  
 lib/jackson-core-asl-${jackson.version}.jar,
 lib/jackson-mapper-asl-${jackson.version}.jar,
 lib/slf4j-log4j12-${slf4j-log4j12.version}.jar,
 lib/slf4j-api-${slf4j-api.version}.jar,
 lib/guava-${guava.version}.jar,
 lib/netty-${netty.version}.jar,
 lib/servlet-api-${servlet-api.version}.jar,  
 lib/commons-io-${commons-io.version}.jar,  
 lib/htrace-core-${htrace.version}-incubating.jar"/>
   </manifest>
      <fileset dir="${build.dir}" includes="classes/ lib/"/>
      <!--fileset dir="${build.dir}" includes="*.xml"/-->
      <fileset dir="${root}" includes="resources/ plugin.xml"/>
    </jar>
  </target>

</project>

2) Modify hadoop2x-eclipse-plugin-master/ivy/libraries.properties

#   Licensed under the Apache License, Version 2.0 (the "License");
#   you may not use this file except in compliance with the License.
#   You may obtain a copy of the License at
#
#       http://www.apache.org/licenses/LICENSE-2.0
#
#   Unless required by applicable law or agreed to in writing, software
#   distributed under the License is distributed on an "AS IS" BASIS,
#   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#   See the License for the specific language governing permissions and
#   limitations under the License.

#This properties file lists the versions of the various artifacts used by hadoop and components.
#It drives ivy and the generation of a maven POM

# This is the version of hadoop we are generating
#hadoop.version=2.6.0    modify
hadoop.version=2.7.3
hadoop-gpl-compression.version=0.1.0

#These are the versions of our dependencies (in alphabetical order)
apacheant.version=1.7.1
ant-task.version=2.0.10

asm.version=3.2
aspectj.version=1.6.5
aspectj.version=1.6.11

checkstyle.version=4.2

commons-cli.version=1.2
commons-codec.version=1.4
#commons-collections.version=3.2.1    modify
commons-collections.version=3.2.2
commons-configuration.version=1.6
commons-daemon.version=1.0.13
#commons-httpclient.version=3.0.1    modify
commons-httpclient.version=3.1
commons-lang.version=2.6
#commons-logging.version=1.0.4        modify
commons-logging.version=1.1.3
#commons-logging-api.version=1.0.4    modify
commons-logging-api.version=1.1.3
#commons-math.version=2.1    modify
commons-math.version=3.1.1
commons-el.version=1.0
commons-fileupload.version=1.2
#commons-io.version=2.1        modify
commons-io.version=2.4
commons-net.version=3.1
core.version=3.1.1
coreplugin.version=1.3.2

#hsqldb.version=1.8.0.10    modify
hsqldb.version=2.0.0
#htrace.version=3.0.4    modify
htrace.version=3.1.0

ivy.version=2.1.0

jasper.version=5.5.12
jackson.version=1.9.13
#not able to figureout the version of jsp & jsp-api version to get it resolved throught ivy
# but still declared here as we are going to have a local copy from the lib folder
jsp.version=2.1
jsp-api.version=5.5.12
jsp-api-2.1.version=6.1.14
jsp-2.1.version=6.1.14
#jets3t.version=0.6.1    modify
jets3t.version=0.9.0
jetty.version=6.1.26
jetty-util.version=6.1.26
#jersey-core.version=1.8    modify
#jersey-json.version=1.8    modify
#jersey-server.version=1.8    modify
jersey-core.version=1.9
jersey-json.version=1.9
jersey-server.version=1.9
#junit.version=4.5    modify
junit.version=4.11
jdeb.version=0.8
jdiff.version=1.0.9
json.version=1.0

kfs.version=0.1

log4j.version=1.2.17
lucene-core.version=2.3.1

mockito-all.version=1.8.5
jsch.version=0.1.42

oro.version=2.0.8

rats-lib.version=0.5.1

servlet.version=4.0.6
servlet-api.version=2.5
#slf4j-api.version=1.7.5    modify
#slf4j-log4j12.version=1.7.5    modify
slf4j-api.version=1.7.10
slf4j-log4j12.version=1.7.10

wagon-http.version=1.0-beta-2
xmlenc.version=0.52
#xerces.version=1.4.4    modify
xerces.version=2.9.1

protobuf.version=2.5.0
guava.version=11.0.2
netty.version=3.6.2.Final

 

3. Build with Ant

At this point Hadoop is already running normally on the CentOS system. To make access from Eclipse easier later on, disable the cluster's permission checking (do not do this in a production environment). Add the following property to hdfs-site.xml (dfs.permissions is an HDFS setting, the deprecated alias of dfs.permissions.enabled in Hadoop 2.x):

<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>

Then restart the cluster.

Copy the Hadoop 2.7.3 installation directory that already runs on CentOS over to Windows, and make sure Ant and Eclipse are properly set up. The ant command needs three directory paths; on my machine they are:

  • eclipse:C:\eclipse
  • hadoop:E:\5-Hadoop\hadoop-2.7.3
  • hadoop2x-eclipse-plugin-master:E:\5-Hadoop\hadoop2x-eclipse-plugin-master

Change into the directory E:\5-Hadoop\hadoop2x-eclipse-plugin-master\src\contrib\eclipse-plugin\

and run the build command:

ant jar -Dhadoop.version=2.7.3 -Declipse.home=C:\eclipse -Dhadoop.home=E:\5-Hadoop\hadoop-2.7.3

If a path contains spaces, remember to wrap it in double quotes, although it is better to install everything to paths without spaces. After a few minutes the build reports success, and the resulting hadoop-eclipse-plugin-2.7.3.jar is placed in E:\5-Hadoop\hadoop2x-eclipse-plugin-master\build\contrib\eclipse-plugin.

 

III. Configuring the Plugin

1. Install the plugin

Copy hadoop-eclipse-plugin-2.7.3.jar into Eclipse's plugins directory (C:\eclipse\plugins) and restart Eclipse.

If all went well, you can now set the Hadoop installation directory via Window -> Preferences -> Hadoop Map/Reduce:

[Screenshot: Hadoop Map/Reduce preference page]

 

2. Set up a location

Open Window -> Show View -> Other -> Map/Reduce Locations, and create a new location with the toolbar icon:

[Screenshot: New Hadoop location wizard]

Set the parameters:

[Screenshot: Hadoop location parameters]

The location name can be anything, and the user name can simply be left at its default; I originally thought it had to be root, but any name works. What really matters are the host and the two ports, which is where I got stuck for a while.

On the CentOS side, my hostname is actually harry.com:

[Screenshot: hostname on the CentOS machine]

So on the Windows side, add an entry to the hosts file (C:\Windows\System32\drivers\etc\hosts):

192.168.0.210 harry.com

For the DFS Master port, fill in the port configured in Hadoop's core-site.xml:

[Screenshot: port configured in core-site.xml]
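For reference, the relevant core-site.xml entry looks roughly like the fragment below; the 9000 port is only an illustrative assumption, so use whatever value your own file declares:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://harry.com:9000</value>
</property>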

As for the Map/Reduce(V2) Master port, it is said to be the JobTracker port (even though Hadoop 2 has moved to YARN), configured in mapred-site.xml. I did not configure it, so the default 9001 works fine.

Once this is done, you should see the contents of HDFS in the left-hand panel:

[Screenshot: HDFS contents shown under DFS Locations]

If the host or port is wrong, connecting to HDFS may fail with:

An internal error occurred during: "Map/Reduce location status updater". java.lang.NullPointerExcept

If you run into other errors, first make sure the cluster is running normally and that the CentOS firewall is turned off; the small Java check below can also help separate connection problems from plugin problems. If you make it to this point, the whole deployment is already half done.
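As a sanity check independent of the plugin, you can list the HDFS root from a plain Java program. This is a minimal sketch assuming fs.defaultFS is hdfs://harry.com:9000 to match the setup above; substitute the value from your own core-site.xml:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsSmokeTest {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // hdfs://harry.com:9000 is an assumption matching this article's host;
    // use the fs.defaultFS value from your own core-site.xml.
    FileSystem fs = FileSystem.get(URI.create("hdfs://harry.com:9000"), conf);
    // Print every entry directly under the HDFS root.
    for (FileStatus status : fs.listStatus(new Path("/"))) {
      System.out.println(status.getPath());
    }
    fs.close();
  }
}

If this lists the same directories you see in the DFS Locations view, the host, port, and hosts-file entry are all correct.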

 

IV. Running a MapReduce Job

1. Create a test project

Create a new project for Hadoop's hello-world program, word count, and copy WordCount.java over from the Hadoop source, as shown below:

[Screenshot: the new project with WordCount.java]

Change the package name to Test; the source is as follows:

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package Test;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  public static class TokenizerMapper 
       extends Mapper<Object, Text, Text, IntWritable>{
    
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
      
    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }
  
  public static class IntSumReducer 
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, 
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length < 2) {
      System.err.println("Usage: wordcount <in> [<in>...] <out>");
      System.exit(2);
    }
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    for (int i = 0; i < otherArgs.length - 1; ++i) {
      FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
    }
    FileOutputFormat.setOutputPath(job,
      new Path(otherArgs[otherArgs.length - 1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

In fact, the project initially failed to build with this error:

The type java.util.Map$Entry cannot be resolved. It is indirectly referenced from required .class files    WordCount.java    /WordCount/src/Test    line 1    Java Problem

A little research showed that my Eclipse version at the time was Helios, while my Java version was 1.8, which that release does not support. That left two options: drop below Java 1.8 or move to a newer Eclipse. I chose to upgrade Eclipse to the Juno release, which solved the problem.

 

2. Set the run arguments and run

Right-click the WordCount class, choose Run As -> Run Configurations, open the Arguments tab, and fill in the input-directory and output-directory arguments:

[Screenshot: program arguments in the Run Configurations dialog]
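For reference, the two program arguments are HDFS paths, one or more input directories followed by the output directory. With the host used in this article they would look something like the line below; the /in name is only an illustrative assumption, while out1 is the output directory deleted before the rerun later on:

hdfs://harry.com:9000/in hdfs://harry.com:9000/out1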

Running it fails with:

Exception in thread "main" java.io.IOException: (null) entry in command string: null chmod 0700 C:\tmp\hadoop-Administrator\mapred\staging\Administrator134847124\.staging
...

Set the HADOOP_HOME environment variable to E:\5-Hadoop\hadoop-2.7.3, download winutils.exe and libwinutils.lib from https://github.com/SweetInk/hadoop-common-2.7.1-bin, and copy them into the %HADOOP_HOME%\bin directory. Run again, and it fails again, this time with:

Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
...

Download hadoop.dll from https://github.com/SweetInk/hadoop-common-2.7.1-bin and copy it into the c:\windows\system32 directory. Run once more; now there is only a WARN:

log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

Create a file named log4j.properties under the project's src folder with the following contents:

# Configure logging for testing: optionally with log file
#log4j.rootLogger=debug,appender
log4j.rootLogger=info,appender
#log4j.rootLogger=error,appender
# Log to the console
log4j.appender.appender=org.apache.log4j.ConsoleAppender
# Use the TTCC layout
log4j.appender.appender.layout=org.apache.log4j.TTCCLayout
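As an aside, if you would rather not set a machine-wide HADOOP_HOME, Hadoop's Shell utility also reads the hadoop.home.dir system property before falling back to the environment variable, so the same effect can be achieved in code; a sketch using the install path from this article:

// Must run before the first Hadoop class is loaded in the JVM,
// e.g. as the first statement of main().
System.setProperty("hadoop.home.dir", "E:\\5-Hadoop\\hadoop-2.7.3");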

Because the previous run had actually already succeeded, this time you first need to delete the out1 directory from HDFS before running again. The console then prints the job's runtime counters, and finally the result can be viewed in HDFS:

[Screenshot: WordCount results in HDFS]
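Rather than deleting the output directory by hand before every rerun, you can have the driver remove it first. Below is an optional sketch, not part of the stock example, that could go in WordCount.main() right after otherArgs is parsed; it additionally needs an import of org.apache.hadoop.fs.FileSystem:

// Delete the output directory if it already exists, so reruns don't fail.
Path outputPath = new Path(otherArgs[otherArgs.length - 1]);
FileSystem fs = outputPath.getFileSystem(conf);
if (fs.exists(outputPath)) {
  fs.delete(outputPath, true); // true = recursive delete
}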

At this point, Eclipse finally connects successfully, via the hadoop-eclipse-plugin-2.7.3.jar plugin, to the Hadoop instance running on CentOS. The development environment is complete!

 

V. References

1. hadoop 2.7.3 (hadoop2.x): building the Eclipse plugin hadoop-eclipse-plugin-2.7.3.jar with Ant

2. An internal error occurred during: "Map/Reduce location status updater". java.lang.NullPointerExcept

3. The type java.util.Map$Entry cannot be resolved. It is indirectly referenced from required .class files [duplicate]

4. Hadoop study notes 2: a summary of problems running MapReduce programs in Eclipse

5. Setting up a Hadoop development environment in Eclipse [hadoop-eclipse-plugin-2.5.2]

6. Setting up a Hadoop 2.7.3 development environment under Eclipse

7. JDK 8: The type java.util.Map$Entry cannot be resolved

(End)

