[hadoop2.6.0] Writing MapReduce in C++

Reposted article, dated 2015-01-07 11:08. Tag: hadoop2.6.0

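Hadoop supports MapReduce code written in languages other than Java through Hadoop Streaming. For someone like me who does not know any Java at all, that is very good news.

The official introduction to Hadoop Streaming is at: http://hadoop.apache.org/docs/r2.6.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/HadoopStreaming.html

We use the wordcount example to illustrate. My input file is the English edition of the seventh Harry Potter book, downloaded from the internet and named h.txt.

A map program written in C++ only has to read its input from standard input and write <key, value> pairs to standard output.

For wordcount the map program is very simple: print each word followed by a 1, meaning that the word occurred once.

The wordcount_map.cpp program is as follows: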

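#include <iostream>
#include <string>
using namespace std;

int main(int argc, char** argv)
{
    string word;
    // Read whitespace-separated words from standard input.
    while(cin >> word)
    {
        // Emit "<word><TAB>1"; Hadoop Streaming splits key from value at the first tab.
        cout << word << "\t" << "1" << endl;
    }
    return 0;
}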

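The reduce program has to read the key-value pairs produced by map, merge the pairs that share the same key (the word), and print the merged result.

The wordcount_reduce.cpp program is as follows: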

#include <iostream>
#include <string>
#include <map>
using namespace std;

int main(int argc, char** argv)
{
    string key;
    int num;
    map<string, int> count;
    map<string, int>::iterator it;
    // Each input line is "<word><TAB><count>" as written by the mapper.
    while(cin >> key >> num)
    {
        it = count.find(key);
        if(it != count.end())
        {
            // Add the value that was read instead of assuming it is always 1.
            it->second += num;
        }
        else
        {
            count.insert(make_pair(key, num));
        }
    }

    // Print every word with its total, separated by a tab.
    for(it = count.begin(); it != count.end(); it++)
    {
        cout << it->first << "\t" << it->second << endl;
    }
    return 0;
}

Compile the two .cpp files into executables and place both executables in the Hadoop root directory:

g++ -o mapperC wordcount_map.cpp
g++ -o reduceC wordcount_reduce.cpp

Upload the file to be processed, h.txt, to /user/kzy/input on HDFS:

bin/hdfs dfs -put h.txt  /user/kzy/input

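Running Hadoop Streaming requires hadoop-streaming-2.6.0.jar, which is located at hadoop-2.6.0/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar. At first I could not get anything to run, precisely because the file locations in the new version differ from those in older releases.

Run the MapReduce job. I do not fully understand every option, but the command below runs correctly. Note that the -jobconf option from older versions has been replaced by -D.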

bin/hadoop jar share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar  \
-D  mapred.job.name="word count~"  \
-input /user/kzy/input/h.txt  \
-output /user/output/c++_out  \
-mapper ./mapperC  \
-reducer ./reduceC  \
-file mapperC  -file reduceC
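As a rough mental model of what the streaming job does with the two executables, the following sketch imitates the three stages in a single process: map every token to a "<word><TAB>1" line, sort the lines so equal keys are adjacent, then reduce by splitting each line at the first tab and summing per key. The names run_map, run_reduce and simulate_job are hypothetical. A real job spreads this work across nodes and partitions keys among reducers, which the sketch ignores.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Map stage: one "<word><TAB>1" line per whitespace-separated token.
std::vector<std::string> run_map(std::istream& in) {
    std::vector<std::string> lines;
    std::string word;
    while (in >> word) {
        lines.push_back(word + "\t1");
    }
    return lines;
}

// Reduce stage: expects lines sorted so that equal keys are adjacent.
// The key is everything before the first tab, the value everything after.
std::string run_reduce(const std::vector<std::string>& sorted_lines) {
    std::ostringstream out;
    std::string current;
    long long total = 0;
    bool have_key = false;
    for (const std::string& line : sorted_lines) {
        const std::size_t tab = line.find('\t');
        if (tab == std::string::npos) {
            continue; // skip lines without a key/value separator
        }
        const std::string key = line.substr(0, tab);
        const long long value = std::stoll(line.substr(tab + 1));
        if (have_key && key != current) {
            out << current << '\t' << total << '\n';
            total = 0;
        }
        current = key;
        have_key = true;
        total += value;
    }
    if (have_key) {
        out << current << '\t' << total << '\n';
    }
    return out.str();
}

// Whole pipeline: map, sort (stand-in for the shuffle), reduce.
std::string simulate_job(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> lines = run_map(in);
    std::sort(lines.begin(), lines.end());
    return run_reduce(lines);
}
```

Calling simulate_job on a small string is a quick way to check the counting logic before submitting anything to the cluster.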

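View the result. In the sort command, -k 2 sorts on the second field (the count that follows the tab), -n compares the values numerically, and -r sorts in descending order; head -20 shows the first 20 lines of the result.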

bin/hdfs dfs -cat /user/output/c++_out/* | sort -k 2 -n -r | head -20

The result is as follows:
