Windows + IDEA 手動開發MapReduce程序

本文轉載自查看原文 2017-10-20 12:22 2409 大數據

參見馬士兵老師的博文：map_reduce

環境配置

Windows本地解壓Hadoop壓縮包，然后像配置JDK環境變量一樣在系統環境變量里配置HADOOP_HOME和path環境變量。注意：hadoop安裝目錄盡量不要包含空格或者中文字符。

形如：

添加windows環境下依賴的庫文件

把盤中（盤地址提取碼：s6uv）共享的bin目錄覆蓋HADOOP_HOME/bin目錄下的文件。
如果還是不行，把其中hadoop.dll復制到C:\windows\system32目錄下，可能需要重啟機器。
注意：配置好之后不需要啟動Windows上的Hadoop

pom.xml

 <!-- hadoop start -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-minicluster</artifactId>
      <version>2.7.4</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.7.4</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-assemblies</artifactId>
      <version>2.7.4</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-maven-plugins</artifactId>
      <version>2.7.4</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.7.4</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs</artifactId>
      <version>2.7.4</version>
    </dependency>
    <!-- hadoop end -->

代碼

WordMapper：

public class WordMapper extends Mapper<Object,Text,Text,IntWritable> {

    private final  static  IntWritable one = new IntWritable(1);

    private Text word = new Text();

    @Override
    public void map(Object key , Text value , Context context) throws IOException, InterruptedException{

        StringTokenizer itr = new StringTokenizer(value.toString()) ;

        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word,one);
        }
    }
}

WordReducer：：

public class WordReducer extends Reducer<Text,IntWritable,Text,IntWritable> {

    private IntWritable result = new IntWritable() ;

    public void reduce(Text key , Iterable<IntWritable> values, Context context) throws IOException , InterruptedException {
        int sum = 0 ;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key,result);
    }
}

本地計算 + 本地HDFS文件

public static void main(String[] args) throws Exception{

    //如果配置好環境變量，沒有重啟機器，然后報錯找不到hadoop.home  可以手動指定
   // System.setProperty("hadoop.home.dir","E:\\hadoop\\hadoop-2.7.4");

    List<String> lists = Arrays.asList("E:\\input","E:\\output");

    Configuration configuration = new Configuration();

    Job job = new Job(configuration,"word count") ;

    job.setJarByClass(WordMain.class); // 主類
    job.setMapperClass(WordMapper.class); // Mapper
    job.setCombinerClass(WordReducer.class); //作業合成類
    job.setReducerClass(WordReducer.class); // reducer
    job.setOutputKeyClass(Text.class); // 設置作業輸出數據的關鍵類
    job.setOutputValueClass(IntWritable.class); // 設置作業輸出值類

    FileInputFormat.addInputPath(job,new Path(lists.get(0))); //文件輸入
    FileOutputFormat.setOutputPath(job,new Path(lists.get(1))); // 文件輸出

    System.exit(job.waitForCompletion(true) ? 0 : 1); //等待完成退出
}

本地計算 + 遠程HDFS文件

把遠程HDFS文件系統中的文件拉到本地來運行。

相比上面的改動點:

FileInputFormat.setInputPaths(job, "hdfs://master:9000/wcinput/");
FileOutputFormat.setOutputPath(job, new Path("hdfs://master:9000/wcoutput2/"));

注意這里是把HDFS文件拉到本地來運行，如果觀察輸出的話會觀察到jobID帶有local字樣，同時這樣的運行方式是不需要yarn的（自己停掉jarn服務做實驗）。

遠程計算 + 遠程HDFS文件

這個方式是將文件打成一個jar文件，通過Hadoop Client自動上傳到Hadoop集群，然后使用遠程HDFS文件進行計算。

java代碼：

public static void main(String[] args) throws Exception{

    Configuration configuration = new Configuration();

    configuration.set("fs.defaultFS", "hdfs://master:9000/");

    configuration.set("mapreduce.job.jar", "target/wc.jar");
    configuration.set("mapreduce.framework.name", "yarn");
    configuration.set("yarn.resourcemanager.hostname", "master");
    configuration.set("mapreduce.app-submission.cross-platform", "true");


    Job job = new Job(configuration,"word count") ;

    job.setJarByClass(WordMain2.class); // 主類
    job.setMapperClass(WordMapper.class); // Mapper
    job.setCombinerClass(WordReducer.class); //作業合成類
    job.setReducerClass(WordReducer.class); // reducer
    job.setCombinerClass(WordReducer.class); //作業合成類
    job.setOutputKeyClass(Text.class); // 設置作業輸出數據的關鍵類
    job.setOutputValueClass(IntWritable.class); // 設置作業輸出值類

    FileInputFormat.setInputPaths(job, "/opt/learning/hadoop/wordcount/*.txt");
    FileOutputFormat.setOutputPath(job, new Path("/opt/learning/output7/"));

    System.exit(job.waitForCompletion(true) ? 0 : 1); //等待完成退出
}

如果運行過程中遇到權限問題，配置執行時的虛擬機參數 -DHADOOP_USER_NAME=root 。

形如下圖：

免責聲明！

本站轉載的文章為個人學習借鑒使用，本站對版權不負任何法律責任。如果侵犯了您的隱私權益，請聯系本站郵箱yoyou2525@163.com刪除。

猜您在找 idea手動保存文本 IDEA 手動開啟 annotation processors IDEA手動增加lib目錄 idea手動安裝lombok插件 Windows手動安裝MySQL 手動監控Windows端口 IDEA手動編譯 --- 無法自動編譯問題十九、IDEA的pom文件手動添加依賴 Idea：手動安裝更新包 idea 中手動導入jar包