Setting Up a Basic Hadoop Development Environment (original, tested in practice)


Software packages:

  hadoop-2.7.2.tar.gz

  hadoop-eclipse-plugin-2.7.2.jar

  hadoop-common-2.7.1-bin.zip

  Eclipse

  jdk1.8.45

  hadoop-2.7.2 (one copy for Linux, one for Windows)

  A Linux system (CentOS or another distribution)

  A Hadoop installation environment

 

Prerequisites:

  Install Hadoop; for the installation steps, see the Hadoop installation chapter.

  Install Eclipse.

The setup procedure is as follows:

1. Copy hadoop-eclipse-plugin-2.7.2.jar into the eclipse/dropins directory.

2. Extract hadoop-2.7.2.tar.gz to drive E:.

3. Download or build hadoop-common for 2.7.2 (hadoop-common-2.7.1 is compatible with hadoop-common-2.7.2, so hadoop-common-2.7.1 is used here); if you want to build it yourself, refer to the related articles.

 

4. Copy all of the files under hadoop-common-2.7.1 into E:\hadoop-2.7.2\bin, and also place a copy of hadoop.dll under system32; otherwise jobs fail at runtime with a native-library error (typically an UnsatisfiedLinkError from NativeIO).

Then configure the system environment variable HADOOP_HOME to point to E:\hadoop-2.7.2.
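To verify this layout before launching anything from Eclipse, a small stand-alone check can be run on the development machine. The helper below is only a sketch (the class name HadoopHomeCheck is made up here); it assumes, as is usually the case, that the hadoop-common bin package supplies both winutils.exe and hadoop.dll:

import java.io.File;

// Hypothetical helper: verify the Windows-side Hadoop layout before launching a job.
public class HadoopHomeCheck {
    public static void main(String[] args) {
        // Prefer the HADOOP_HOME environment variable; fall back to the path from step 2.
        String home = System.getenv("HADOOP_HOME");
        if (home == null) {
            home = "E:\\hadoop-2.7.2";
        }
        System.out.println("HADOOP_HOME = " + home);
        // Both native files must be present under %HADOOP_HOME%\bin
        // (hadoop.dll additionally goes into C:\Windows\System32).
        for (String name : new String[] { "winutils.exe", "hadoop.dll" }) {
            File f = new File(home, "bin" + File.separator + name);
            System.out.println(f.getAbsolutePath() + (f.exists() ? "  found" : "  MISSING"));
        }
    }
}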

 

 

5. Start Eclipse, open Window -> Preferences, and set the Hadoop installation directory under Hadoop Map/Reduce.

6. Open Window -> Open Perspective -> Map/Reduce; Hadoop programs are developed under this perspective.

 

 

7. Open Window -> Show View -> Map/Reduce Locations; in that view, right-click and choose New Hadoop location… to create a new Hadoop connection.

 

8. In the New Hadoop location… dialog, give the location a name and fill in the Map/Reduce(V2) Master and DFS Master host/port to match the cluster; the DFS Master port should match fs.defaultFS in core-site.xml (9000 in the example further down).

 

9. Create a new project and add a WordCount class:

 

10. Put log4j.properties and the core-site.xml from the Hadoop cluster on the classpath. The example project here is organized with Maven, so they go into the src/main/resources directory.
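A quick way to confirm that the copied core-site.xml is really on the classpath is to create a plain Configuration and print fs.defaultFS, since Hadoop's Configuration loads core-site.xml from the classpath automatically. The check below is just a sketch (the class name ConfCheck is made up):

import org.apache.hadoop.conf.Configuration;

// Hypothetical check: confirm core-site.xml in src/main/resources is visible on the classpath.
public class ConfCheck {
    public static void main(String[] args) {
        Configuration conf = new Configuration();   // loads core-default.xml and core-site.xml from the classpath
        // Prints the cluster address (e.g. hdfs://localhost:9000) when the file is found,
        // or the built-in default file:/// when it is not.
        System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS"));
    }
}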

   

 

11. The contents of log4j.properties are as follows:

log4j.rootLogger=debug,stdout,R 
log4j.appender.stdout=org.apache.log4j.ConsoleAppender 
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout 
log4j.appender.stdout.layout.ConversionPattern=%5p - %m%n 
log4j.appender.R=org.apache.log4j.RollingFileAppender 
log4j.appender.R.File=mapreduce_test.log 
log4j.appender.R.MaxFileSize=1MB 
log4j.appender.R.MaxBackupIndex=1 
log4j.appender.R.layout=org.apache.log4j.PatternLayout 
log4j.appender.R.layout.ConversionPattern=%p %t %c - %m%n 
log4j.logger.com.codefutures=DEBUG 

12. Create the directory input on HDFS:

  hdfs dfs -mkdir input

13. Copy the local README.txt into the input directory on HDFS:

   hdfs dfs -copyFromLocal /usr/local/hadoop/README.txt input
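Steps 12 and 13 can also be performed through the HDFS Java API instead of the command line. The sketch below assumes core-site.xml is on the classpath (step 10) and that the local README.txt path exists on whatever machine runs the code, so adjust the path when running it from Windows:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical equivalent of steps 12 and 13 using the HDFS Java API.
public class HdfsSetup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();        // reads core-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);
        fs.mkdirs(new Path("input"));                    // relative path resolves to /user/<current user>/input
        // Local source file as used above; change it to a path that exists on this machine.
        fs.copyFromLocalFile(new Path("/usr/local/hadoop/README.txt"), new Path("input"));
        fs.close();
    }
}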

14. The following property has to be added to hdfs-site.xml on the Hadoop cluster, otherwise files cannot be uploaded to HDFS from Eclipse (in Hadoop 2.x the canonical name of this setting is dfs.permissions.enabled; the old name dfs.permissions still works as a deprecated alias):

<property>
     <name>dfs.permissions</name>
     <value>false</value>
</property>
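If turning off permission checks on the cluster is not acceptable, a commonly used alternative is to make the Windows client identify itself as the HDFS user that owns the /user/<name> directory. This is only a sketch ("hadoop" below is an assumed user name) and relies on Hadoop honouring the HADOOP_USER_NAME setting when security is in the default simple mode:

// Hedged alternative to disabling dfs.permissions: identify as the cluster-side user.
// Must run before the first Configuration/FileSystem/Job object is created,
// e.g. as the first line of WordCount.main() below.
System.setProperty("HADOOP_USER_NAME", "hadoop");    // "hadoop" is an assumed user name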

15. If you run into "Cannot connect to VM com.sun.jdi.connect.TransportTimeoutException", turn off the firewall.

16. The code is as follows:

  

package com.hadoop.example;

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {
    // Mapper: tokenize each input line and emit (word, 1) for every token.
    public static class TokenizerMapper extends
            Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {

            StringTokenizer itr = new StringTokenizer(value.toString());

            System.out.print("--map: " + value.toString() + "\n");
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                System.out.print("--map token: " + word.toString() + "\n");
                context.write(word, one);
                
                System.out.print("--context: " + word.toString() + "," + one.toString() + "\n");
            }
        }
    }

    // Reducer (also used as the combiner): sum the counts collected for each word.
    public static class IntSumReducer extends
            Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {

            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
            
            System.out.print("--reduce: " + key.toString() + "," + result.toString() + "\n");
        }
    }

    public static void main(String[] args) throws Exception {

        // Point the client at the Windows copy of Hadoop extracted in step 2.
        System.setProperty("hadoop.home.dir", "E:\\hadoop-2.7.2");
        
        Configuration conf = new Configuration();

        String[] otherArgs = new GenericOptionsParser(conf, args)
                .getRemainingArgs();

        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }

    
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setNumReduceTasks(2);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

17. Right-click WordCount.java, choose Run As -> Run Configurations…, and configure the program arguments, i.e. the input and output folders. If WordCount does not show up under Java Application yet, first run the current project once via Run As -> Java Application.

  hdfs://localhost:9000/user/hadoop/input hdfs://localhost:9000/user/hadoop/output

  

Here localhost is the hostname of the Hadoop cluster; an IP address can be used directly instead. If a hostname is used, edit C:\Windows\System32\drivers\etc\hosts and add the IP-to-hostname mapping.

  

 

18. After the job has finished, check the results:

    Method 1:

        hdfs dfs -ls output

        You will see _SUCCESS plus one part-r-NNNNN file per reducer (part-r-00000 and part-r-00001 with the two reduce tasks configured above).

        Run hdfs dfs -cat output/* to print the contents.

    Method 2:

        Expand DFS Locations and double-click a part-r-* file (e.g. part-r-00000) to view the result.
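A third option is to read the output directly through the HDFS API, mirroring the -cat command from method 1. The class below is only a sketch (its name is made up) and prints every part-r-* file under the output directory:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper: print the contents of every reducer output file under "output".
public class PrintWordCountOutput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        for (FileStatus status : fs.listStatus(new Path("output"))) {
            if (!status.getPath().getName().startsWith("part-")) {
                continue;                                 // skip _SUCCESS and other markers
            }
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(fs.open(status.getPath())))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
        fs.close();
    }
}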
  

 

 

  

