Wordcount on YARN: A MapReduce Example

This article is a repost. Posted 2014-06-01 22:21.

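Hadoop YARN version: 2.2.0

For setting up a Hadoop YARN environment, see the blog post "Hadoop 2.0 installation and adding a DataNode without stopping the cluster".

With Hadoop HDFS and YARN running in pseudo-distributed mode, the following processes are present (process ID followed by process name):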

1320 DataNode
1665 ResourceManager
1771 NodeManager
1195 NameNode
1487 SecondaryNameNode

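Next we write a MapReduce example, the classic word count, and run it on YARN.

The code is on GitHub: https://github.com/huahuiyang/yarn-demo

Step 1

The input to process is shown below. Each line contains one or more words separated by spaces. You can use hadoop fs -put ... to copy the local file to HDFS so that the MapReduce program can read it.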

hadoop yarn
mapreduce
hello redis
java hadoop
hello world
here we go

The wordcount program counts how many times each word occurs. Each output line has the form <word, occurrence count>.
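
Before bringing Hadoop into the picture, the task itself can be sketched in plain Java. This is only an illustration of what the job is expected to compute on the sample input above, not the MapReduce implementation; the class name WordCountSketch and its count method are hypothetical, and a TreeMap is assumed here simply to produce keys in sorted order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the word-count task; it does not use Hadoop.
final class WordCountSketch {

    // Counts occurrences of each whitespace-separated word.
    // A TreeMap keeps the keys sorted, so printing it yields ordered output.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "hadoop yarn", "mapreduce", "hello redis",
                "java hadoop", "hello world", "here we go");
        // Print one "word<TAB>count" pair per line.
        for (Map.Entry<String, Integer> e : count(input).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

The MapReduce version splits the same work in two: the mapper emits a (word, 1) pair per word, and the reducer adds up the pairs that share a word.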

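Step 2

Create a new Maven-managed project. It contains a pom.xml and three classes in the package com.yhh.mapreduce.wordcount: WordCountMapper, WordCountReducer, and WordCount.

The source code is as follows:

pom.xml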

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>hadoop-yarn</groupId>
    <artifactId>hadoop-demo</artifactId>
    <version>0.0.1-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>2.1.1-beta</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.1.1-beta</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-common</artifactId>
            <version>2.1.1-beta</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
            <version>2.1.1-beta</version>
        </dependency>
    </dependencies>
</project>

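// WordCountMapper.java
package com.yhh.mapreduce.wordcount;

import java.io.IOException;

import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

// Emits a (word, 1) pair for every word in an input line.
// Uses the old org.apache.hadoop.mapred API.
public class WordCountMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {

    // Reused across calls to avoid allocating new objects for every word.
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        // value holds one line of input.
        // Split on runs of whitespace so repeated spaces do not produce empty words.
        for (String token : value.toString().trim().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                output.collect(word, ONE);
            }
        }
    }

}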
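
How a line is tokenized affects the counts: splitting on a single space character yields an empty token wherever two spaces are adjacent, and each empty token would then be counted as a word. The following plain-Java sketch compares the two approaches; the class and method names are hypothetical and nothing here depends on Hadoop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Compares two ways of splitting a line into words.
final class TokenizeSketch {

    // Splits on every single space; adjacent spaces produce empty tokens.
    static List<String> splitOnSingleSpace(String line) {
        return Arrays.asList(line.split(" "));
    }

    // Splits on runs of whitespace and drops any empty token.
    static List<String> splitOnWhitespace(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String line = "hello  world"; // two spaces between the words
        System.out.println(splitOnSingleSpace(line).size()); // 3 tokens, one of them empty
        System.out.println(splitOnWhitespace(line).size());  // 2 tokens
    }
}
```

For the sample input in this article, where words are separated by exactly one space, both approaches give the same result.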

// WordCountReducer.java
package com.yhh.mapreduce.wordcount;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

// Receives all values emitted for one word and writes (word, total).
public class WordCountReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        // Sum the values instead of counting them, so the result stays
        // correct even if a combiner pre-aggregates the map output.
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }

}
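
In the reduce step, adding up the values and counting how many values arrive give the same answer only while every value is 1. If a combiner were added later to pre-aggregate map output, the reducer could receive partial sums such as 2, and only summation would still be correct. The sketch below illustrates the difference in plain Java; the names are hypothetical, and the value lists stand in for what a reducer might receive.

```java
import java.util.Arrays;
import java.util.List;

// Contrasts summing the values with counting how many values arrive.
final class ReduceSketch {

    // Adds up the values received for one key.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Counts the values without looking at what they contain.
    static int countValues(List<Integer> values) {
        return values.size();
    }

    public static void main(String[] args) {
        List<Integer> raw = Arrays.asList(1, 1, 1);   // no combiner: three 1s
        List<Integer> combined = Arrays.asList(2, 1); // hypothetical combiner merged two of them
        System.out.println(sum(raw) + " " + countValues(raw));           // both are 3
        System.out.println(sum(combined) + " " + countValues(combined)); // 3 versus 2
    }
}
```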

// WordCount.java
package com.yhh.mapreduce.wordcount;

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;

// Driver: configures the job and submits it to the cluster.
public class WordCount {
    public static void main(String[] args) throws IOException {
        // Expect exactly two arguments: the input path and the output path.
        if (args.length != 2) {
            System.err.println("Usage: WordCount <input path> <output path>");
            System.exit(1);
        }

        // Passing WordCount.class lets Hadoop locate the JAR that contains the job classes.
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("word count mapreduce demo");

        conf.setMapperClass(WordCountMapper.class);
        conf.setReducerClass(WordCountReducer.class);
        // Types of the keys and values written by the job.
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // Submit the job and block until it finishes.
        JobClient.runJob(conf);
    }

}

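Step 3

Package the project as a JAR: right-click the Java project, choose Export..., then select JAR file and the output directory. Here the JAR is named wordcount.jar. Then upload it to the Hadoop cluster.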

[root@hadoop-namenodenew ~]# ll wordcount.jar 
-rw-r--r--. 1 root root 4401 6月   1 22:05 wordcount.jar

Run the MapReduce job with the following command:

hadoop jar ~/wordcount.jar com.yhh.mapreduce.wordcount.WordCount data.txt /wordcount/result

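You can check the job status with hadoop job -list (in Hadoop 2.x, mapred job -list is the non-deprecated equivalent). A successful run prints output roughly like the following: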

14/06/01 22:06:25 INFO mapreduce.Job: The url to track the job: http://hadoop-namenodenew:8088/proxy/application_1401631066126_0003/
14/06/01 22:06:25 INFO mapreduce.Job: Running job: job_1401631066126_0003
14/06/01 22:06:33 INFO mapreduce.Job: Job job_1401631066126_0003 running in uber mode : false
14/06/01 22:06:33 INFO mapreduce.Job:  map 0% reduce 0%
14/06/01 22:06:40 INFO mapreduce.Job:  map 50% reduce 0%
14/06/01 22:06:41 INFO mapreduce.Job:  map 100% reduce 0%
14/06/01 22:06:47 INFO mapreduce.Job:  map 100% reduce 100%
14/06/01 22:06:48 INFO mapreduce.Job: Job job_1401631066126_0003 completed successfully
14/06/01 22:06:49 INFO mapreduce.Job: Counters: 43

The job's output is shown below. The words are sorted in lexicographic order.

hadoop fs -cat /wordcount/result/part-00000

go    1
hadoop    2
hello    2
here    1
java    1
mapreduce    1
redis    1
we    1
world    1
yarn    1
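
The ordering comes from the sort that MapReduce applies to keys before they reach the reducer. Text keys are compared by their encoded bytes, so the order is case-sensitive: uppercase ASCII letters sort before lowercase ones. Assuming ASCII-only words, Java's natural String ordering gives the same result, which the following sketch uses as a stand-in; the class name and the sample words are hypothetical and are not taken from the job above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Shows the case-sensitive ordering to expect for ASCII keys.
final class KeyOrderSketch {

    // Returns a sorted copy using String's natural (code-unit) ordering,
    // which matches byte ordering when all characters are ASCII.
    static List<String> sortedKeys(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // Capitalized words come first: [Hadoop, Zoo, hello, yarn]
        System.out.println(sortedKeys(Arrays.asList("yarn", "Hadoop", "hello", "Zoo")));
    }
}
```

If a case-insensitive count is wanted, one option is to lowercase each word in the mapper before emitting it.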
