[b0008] Hadoop 2.6.4 local development and debugging with Eclipse on Windows 7


Purpose:

The approach described in the previous post, [0007] A sample HDFS program developed with Eclipse on Windows, is inconvenient for day-to-day development.

Install the Eclipse plugin so that later development and debugging can be done directly in the IDE on Windows.

Environment:

  • Hadoop 2.6.4 on Linux, set up as described in post [0001]
  • Eclipse Luna Service Release 1 (4.4.1) on 64-bit Windows 7

Notes:

   The steps below were written up after everything was already working. I made changes many times along the way, and to finish quickly some content was copied from other sources. So if you follow the steps literally you may need some small adjustments; the overall approach is correct.

1.  Prepare the Hadoop package

Unpack the Hadoop 2.6.4 distribution on Windows, then replace all of the configuration files under etc/hadoop in the unpacked directory with the configuration files from the Hadoop installation directory on Linux.

2 . Install the HDFS Eclipse plugin

  • With Eclipse closed, copy hadoop-eclipse-plugin-2.6.4.jar into the plugins\ folder of the Eclipse installation directory
  • Start Eclipse
  • Menu bar -> Window -> Preferences -> Hadoop Map/Reduce, and set the Hadoop path to the unpack path from step 1
  • Menu bar -> Window -> Open Perspective -> Other -> select Map/Reduce, OK -> Map/Reduce Locations tab -> click the blue elephant icon on the right to open the configuration window, fill it in as follows, and click OK

Position 1 is a name for this configuration; anything will do.

Position 2 is the mapreduce.jobhistory.address setting from mapred-site.xml; if it is not set, the default is 10020.

Position 3 is the fs.defaultFS setting from core-site.xml: hdfs://ssmaster:9000.

The screenshot is one found online; my own settings, for pseudo-distributed Hadoop 2.6, are ssmaster:10020 and ssmaster:9000.

 


 

Once the location is set up successfully, the file tree of HDFS on the Linux Hadoop node is shown directly in Eclipse.

You can download, upload, and delete files on HDFS right here, which is very convenient.

 

3  Configure the MapReduce Windows helper package

 3.1  Download the Hadoop 2.6 Windows helper package

I could not find one for 2.6.4; the 2.6 package also worked in the end.

A reference download address: http://download.csdn.net/detail/myamor/8393459. That one appears to be for Win8; my system is Win7 and I downloaded it from somewhere else, which I no longer remember — searching for winutils.exe + win7 should find it. The download should contain: hadoop.dll, hadoop.pdb, hadoop.lib, hadoop.exp, winutils.exe, winutils.pdb, libwinutils.lib.

 

 3.2   Configure

      a  Unpack the helper package and copy all of its files into G:\RSoftware\hadoop-2.6.4\hadoop-2.6.4\bin, the bin folder of the Hadoop path specified in "2. Install the HDFS Eclipse plugin" above.

      b  Set environment variables:

         HADOOP_HOME = G:\RSoftware\hadoop-2.6.4\hadoop-2.6.4

         Add G:\RSoftware\hadoop-2.6.4\hadoop-2.6.4\bin to Path

         Make sure HADOOP_USER_NAME = hadoop is set (configured in the previous post [0007])

      Restart Eclipse so it picks up the new environment variables.
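After restarting, the setup can be sanity-checked from plain Java before involving Hadoop at all. The sketch below (the class and method names are mine, not from the original post) just verifies that HADOOP_HOME points at a directory containing bin\winutils.exe, which is the file Hadoop's Shell class looks for on Windows:

```java
import java.io.File;

public class WinutilsCheck {
    // Returns true only if hadoopHome is set and contains bin/winutils.exe,
    // the helper binary Hadoop needs on Windows.
    static boolean winutilsPresent(String hadoopHome) {
        if (hadoopHome == null) {
            return false;
        }
        File winutils = new File(hadoopHome, "bin" + File.separator + "winutils.exe");
        return winutils.isFile();
    }

    public static void main(String[] args) {
        // Prints true when HADOOP_HOME is configured as in step 3.2.
        System.out.println(winutilsPresent(System.getenv("HADOOP_HOME")));
    }
}
```

If this prints false, Hadoop will later fail with the "Could not locate executable null\bin\winutils.exe" error covered in the exceptions section.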

 

4    Test MapReduce

   4.1 Create a Map/Reduce project

After the project is created, all of Hadoop's jar files are imported into it automatically.

 

4.2  Configure log4j for the project

Create a log4j.properties file under the src directory with the following content:

log4j.rootLogger=debug,stdout,R
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%5p - %m%n
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.File=mapreduce_test.log
log4j.appender.R.MaxFileSize=1MB
log4j.appender.R.MaxBackupIndex=1
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%p %t %c - %m%n
log4j.logger.com.codefutures=DEBUG

 

4.3 Add the WordCount code

Right-click src in the project and create a new class. The code below uses package mp.filetest and class name WordCount2; pick your own names if you like, but keep them consistent with the code.

 

package mp.filetest;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

/**
 * WordCount example, explained by xxm
 * @author xxm
 */
public class WordCount2 {

 /**
 * Map class: defines our own map method
 */
 public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    /**
    * LongWritable, IntWritable and Text are Hadoop classes that wrap Java data types.
    * They can all be serialized to simplify data exchange in a distributed environment,
    * and can be thought of as replacements for long, int and String respectively.
    */
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    /**
    * The map method of the Mapper class:
    * protected void map(KEYIN key, VALUEIN value, Context context)
    * maps a single input k/v pair to intermediate k/v pairs.
    * The Context collects the <k,v> pairs output by the Mapper.
    */
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
 } 

 /**
 * Reduce class: defines our own reduce method
 */       
 public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    /**
    * The reduce method of the Reducer class:
    * protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context)
    * reduces the set of intermediate values that share a key to a smaller set of values.
    * The Context collects the <k,v> pairs output by the Reducer.
    */
    public void reduce(Text key, Iterable<IntWritable> values, Context context) 
      throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
 }

 /**
 * main function
 */       
 public static void main(String[] args) throws Exception {

    Configuration conf = new Configuration();      // configuration object holding all settings

    Job job = Job.getInstance(conf, "wordcount2"); // create a new job with a name
                                                   // (Job.getInstance replaces the deprecated new Job(conf, name))

    job.setOutputKeyClass(Text.class);             // key class of the job's output
    job.setOutputValueClass(IntWritable.class);    // value class of the job's output
    
    job.setMapperClass(Map.class);                 // Mapper class of the job
    job.setReducerClass(Reduce.class);             // Reducer class of the job
    job.setJarByClass(WordCount2.class);

    job.setInputFormatClass(TextInputFormat.class);   // InputFormat implementation for the job
    job.setOutputFormatClass(TextOutputFormat.class); // OutputFormat implementation for the job

    FileInputFormat.addInputPath(job, new Path(args[0]));   // input path of the job
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output path of the job
    job.waitForCompletion(true);                   // run the job and wait for it to finish
 }

}
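To see the map/reduce logic in isolation, without a cluster, the tokenize-then-sum core of the job above can be exercised in plain Java (the MiniWordCount class below is my illustration, not part of the original post):

```java
import java.util.*;

// Plain-Java sketch of the WordCount2 pipeline: tokenizing each line is the
// map step, summing the counts per word is the reduce step.
public class MiniWordCount {
    static java.util.Map<String, Integer> count(List<String> lines) {
        java.util.Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // same whitespace tokenization the Mapper uses
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                // merge plays the Reducer's role: sum the 1s emitted per word
                counts.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("hello hadoop", "hello world")));
        // prints {hadoop=1, hello=2, world=1}
    }
}
```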

 

Optional: if the native library is not configured, the run may end with the UnsatisfiedLinkError on NativeIO$Windows.access0 listed under Y.2 in the exceptions section at the end; follow the fix described there.

 

4.4 Run

Make sure Hadoop is running.

In the WordCount code editor, right-click and choose Run As -> Run Configurations, then set the program arguments: the input and output directories. Make sure the path given as the second argument does not yet exist on HDFS.
hdfs://ssmaster:9000/input 
hdfs://ssmaster:9000/output  
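The driver reads these two values as args[0] and args[1] and nothing else. A small guard like the hypothetical one below (not in the original code) makes a missing argument fail fast with a usage message instead of an ArrayIndexOutOfBoundsException:

```java
public class UsageGuard {
    // Hypothetical helper: validates the two program arguments WordCount2 expects.
    // Returns a usage message on error, or null when the arguments look fine.
    static String validate(String[] args) {
        if (args.length != 2) {
            return "Usage: WordCount2 <hdfs input dir> <hdfs output dir (must not exist)>";
        }
        return null;
    }

    public static void main(String[] args) {
        // With only one argument supplied, the usage message is returned.
        System.out.println(validate(new String[]{"hdfs://ssmaster:9000/input"}));
    }
}
```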


 

Click Run; the progress and the result can be watched directly in the Eclipse console.

INFO - Job job_local1914346901_0001 completed successfully

  INFO - Counters: 38 
    File System Counters 
        FILE: Number of bytes read=4109 
        FILE: Number of bytes written=1029438 
        FILE: Number of read operations=0 
        FILE: Number of large read operations=0 
        FILE: Number of write operations=0 
        HDFS: Number of bytes read=134 
        HDFS: Number of bytes written=40 
        HDFS: Number of read operations=37 
        HDFS: Number of large read operations=0 
        HDFS: Number of write operations=6 
    Map-Reduce Framework 
        Map input records=3 
        Map output records=7 
        Map output bytes=70 
        Map output materialized bytes=102 
        Input split bytes=354 
        Combine input records=7 
        Combine output records=7 
        Reduce input groups=5 
        Reduce shuffle bytes=102 
        Reduce input records=7 
        Reduce output records=5 
        Spilled Records=14 
        Shuffled Maps =3 
        Failed Shuffles=0 
        Merged Map outputs=3 
        GC time elapsed (ms)=21 
        CPU time spent (ms)=0 
        Physical memory (bytes) snapshot=0 
        Virtual memory (bytes) snapshot=0 
        Total committed heap usage (bytes)=1556611072 
    Shuffle Errors 
        BAD_ID=0 
        CONNECTION=0 
        IO_ERROR=0 
        WRONG_LENGTH=0 
        WRONG_MAP=0 
        WRONG_REDUCE=0 
    File Input Format Counters 
        Bytes Read=42 
    File Output Format Counters 
        Bytes Written=40

Under "DFS Locations", refresh the "hadoop" location created earlier and check whether the task's output directory now contains the output files.


 

4.5 Optional: run from the command line by exporting a jar and uploading it to Linux

Right-click the project name -> Export -> Java/JAR file -> specify the jar path and name -> specify the main class -> Finish.

First delete the previous output directory:

 hadoop@ssmaster:~/java_program$ hadoop fs -rm -r /output
hadoop@ssmaster:~/java_program$ hadoop fs -ls /
Found 4 items
drwxr-xr-x   - hadoop supergroup          0 2016-10-24 05:04 /data
drwxr-xr-x   - hadoop supergroup          0 2016-10-23 00:45 /input
drwxr-xr-x   - hadoop supergroup          0 2016-10-24 05:04 /test
drwx------   - hadoop supergroup          0 2016-10-23 00:05 /tmp




Run: hadoop jar hadoop_mapr_wordcount.jar /input /output

hadoop@ssmaster:~/java_program$ hadoop  jar hadoop_mapr_wordcount.jar  /input /output 
16/10/24 08:30:32 INFO client.RMProxy: Connecting to ResourceManager at ssmaster/192.168.249.144:8032
16/10/24 08:30:33 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/10/24 08:30:33 INFO input.FileInputFormat: Total input paths to process : 1
16/10/24 08:30:34 INFO mapreduce.JobSubmitter: number of splits:1
16/10/24 08:30:34 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1477315002921_0004
16/10/24 08:30:34 INFO impl.YarnClientImpl: Submitted application application_1477315002921_0004
16/10/24 08:30:34 INFO mapreduce.Job: The url to track the job: http://ssmaster:8088/proxy/application_1477315002921_0004/
16/10/24 08:30:34 INFO mapreduce.Job: Running job: job_1477315002921_0004
16/10/24 08:30:43 INFO mapreduce.Job: Job job_1477315002921_0004 running in uber mode : false
16/10/24 08:30:43 INFO mapreduce.Job:  map 0% reduce 0%
16/10/24 08:30:52 INFO mapreduce.Job:  map 100% reduce 0%
16/10/24 08:31:02 INFO mapreduce.Job:  map 100% reduce 100%
16/10/24 08:31:04 INFO mapreduce.Job: Job job_1477315002921_0004 completed successfully
16/10/24 08:31:05 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=3581
        FILE: Number of bytes written=220839
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1863
        HDFS: Number of bytes written=1425
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=6483
        Total time spent by all reduces in occupied slots (ms)=7797
        Total time spent by all map tasks (ms)=6483
        Total time spent by all reduce tasks (ms)=7797
        Total vcore-milliseconds taken by all map tasks=6483
        Total vcore-milliseconds taken by all reduce tasks=7797
        Total megabyte-milliseconds taken by all map tasks=6638592
        Total megabyte-milliseconds taken by all reduce tasks=7984128
    Map-Reduce Framework
        Map input records=11
        Map output records=303
        Map output bytes=2969
        Map output materialized bytes=3581
        Input split bytes=101
        Combine input records=0
        Combine output records=0
        Reduce input groups=158
        Reduce shuffle bytes=3581
        Reduce input records=303
        Reduce output records=158
        Spilled Records=606
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=241
        CPU time spent (ms)=4530
        Physical memory (bytes) snapshot=456400896
        Virtual memory (bytes) snapshot=1441251328
        Total committed heap usage (bytes)=312999936
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=1762
    File Output Format Counters 
        Bytes Written=1425

Note: work out how to export the package so that it can be run as hadoop jar xxxx.jar wordcount /input /output [left for later]

 

Y Exceptions

Y.1    Permission denied: user=Administrator

At the end of step 2, some HDFS directory may report:

 Permission denied: user=Administrator, access=WRITE, inode="hadoop": hadoop:supergroup:rwxr-xr-x

The user Administrator was rejected by the permission system when writing to Hadoop: by default, Eclipse on Windows accesses Hadoop's files as the user Administrator.

Fix:

On Windows, add the environment variable HADOOP_USER_NAME with the value hadoop (the user name of the Hadoop 2.6.4 installation on Linux).

Restart Eclipse for it to take effect.
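If changing system environment variables is inconvenient, the same user can also be set programmatically: setting the HADOOP_USER_NAME system property before any HDFS access has the same effect in the Hadoop 2.x client (this variant is my addition, not from the original post):

```java
public class HadoopUserName {
    public static void main(String[] args) {
        // Must run before the first FileSystem/Job call in the process,
        // because the Hadoop client reads it when it determines the user.
        System.setProperty("HADOOP_USER_NAME", "hadoop"); // the Linux-side hadoop user
        System.out.println(System.getProperty("HADOOP_USER_NAME"));
        // prints hadoop
    }
}
```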

 

Y.2  Exceptions while running

1  Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z

Cause: unknown.

Fix:

a  Copy the hadoop.dll file from the helper package downloaded earlier into C:\Windows\System32; the reference notes that a reboot may be needed.

b  Unpack the source package hadoop-2.6.4-src.tar.gz and copy NativeIO.java from hadoop-2.6.4-src\hadoop-common-project\hadoop-common\src\main\java\org\apache\hadoop\io\nativeio into the corresponding package of the Eclipse project,

       then modify the Windows.access method (the original post showed the change in a screenshot; the commonly cited edit is to return true instead of calling the native access0).
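As a sketch only (the exact context depends on your Hadoop 2.6.x source tree): inside the copied NativeIO.java, the Windows.access method is changed so it no longer reaches the missing native method. NativeIOPatchSketch is a stand-in class of mine so the change can be shown in isolation:

```java
// Illustration of the commonly cited NativeIO edit for Windows.
public class NativeIOPatchSketch {
    // In NativeIO.java the body originally delegates to the native call:
    //     return access0(path, desiredAccess.accessRight());
    // The patched version skips the native Windows permission check entirely:
    public static boolean access(String path, int desiredAccess) {
        return true;
    }

    public static void main(String[] args) {
        System.out.println(access("C:\\any\\path", 0)); // prints true
    }
}
```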

       

 

2  log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory)

log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://ssmaster:9000/output already exists
    at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
    at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:267)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:140)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1315)
    at mp.filetest.WordCount2.main(WordCount2.java:88)

Cause: the log4j.properties file is missing.

Fix: follow step 4.2.

3  Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

2016-10-24 20:42:03,603 WARN  [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-10-24 20:42:03,709 ERROR [main] util.Shell (Shell.java:getWinUtilsPath(373)) - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
    at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:355)
    at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:370)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:363)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
    at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:116)
    at org.apache.hadoop.security.Groups.<init>(Groups.java:93)
    at org.apache.hadoop.security.Groups.<init>(Groups.java:73)
    at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:293)

Cause: the Hadoop 2.6 Windows helper package is not configured correctly.

Fix: configure it as in step 3.2.

 

 

Z Summary:

 Keep it up; good work.

 Follow-ups:

  Run the program from the references to confirm it works when launched directly — done

  When there is time, work out what each parameter in the log4j.properties configuration means

  Import the Hadoop source package into the project to make tracing and debugging easier

 

C References:

c.1  Setup: Win7+Eclipse+Hadoop2.6.4 development environment setup

c.2  Setup: Hadoop study notes (4) - Setting up a Hadoop 2.6.4 development environment with Eclipse on Linux Ubuntu

c.3  Troubleshooting: Fixing the Permission denied problem when developing with the Hadoop MR Eclipse plugin

c.4  Troubleshooting: Fixing Exception: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z and a series of related problems

 

 

