First, download the mongo-hadoop adapter and switch to the release-1.0 branch:
git clone https://github.com/mongodb/mongo-hadoop.git
cd mongo-hadoop
git checkout release-1.0
Then, in the mongo-hadoop directory, open build.sbt and change the hadoopRelease in ThisBuild setting as follows:
hadoopRelease in ThisBuild := "0.20"
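Then run ./sbt package (for background on sbt, see https://github.com/harrah/xsbt/wiki).
sbt downloads its dependencies during the build, so this step needs a connection that can get past the firewall (a proxy or VPN).
When the build finishes, it produces a jar file, mongo-hadoop-core_0.20.205.0-1.0.1.jar, in the core/target directory. Next, download the MongoDB Java driver: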
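wget --no-check-certificate https://github.com/downloads/mongodb/mongo-java-driver/mongo-2.7.3.jar
Once the download completes, you can start developing mongo-hadoop programs.
Running the bundled example: first, import the sample data into MongoDB with the following command.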
./sbt load-sample-data
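Then create a new project in Eclipse, for example treasury, and copy the source files and resource files from mongo-hadoop/examples/treasury_yield into it.
As shown in the figure:
Then edit mongo-treasury_yield.xml and set the MongoDB URIs, including the collections to read from and write to: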
<property>
  <!-- If you are reading from mongo, the URI -->
  <name>mongo.input.uri</name>
  <value>mongodb://127.0.0.1/mongo_hadoop.yield_historical.in</value>
</property>
<property>
  <!-- If you are writing to mongo, the URI -->
  <name>mongo.output.uri</name>
  <value>mongodb://127.0.0.1/mongo_hadoop.yield_historical.out</value>
</property>
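Each URI in the configuration above has the form mongodb://host[:port]/database.collection: the part of the path before the first dot is the database (mongo_hadoop) and the rest is the collection (yield_historical.in or yield_historical.out). The following is a minimal sketch of that split using only the Java standard library. MongoUriParts is a hypothetical helper written for illustration, not a class from mongo-hadoop or the MongoDB driver, and it assumes a single host with no credentials or options in the URI.

```java
import java.net.URI;

// Hypothetical helper (not part of mongo-hadoop): splits a URI of the form
// mongodb://host[:port]/database.collection into its parts.
final class MongoUriParts {
    final String host;
    final int port;
    final String database;
    final String collection;

    private MongoUriParts(String host, int port, String database, String collection) {
        this.host = host;
        this.port = port;
        this.database = database;
        this.collection = collection;
    }

    static MongoUriParts parse(String uri) {
        // trim() tolerates stray whitespace around the value in the XML file
        URI u = URI.create(uri.trim());
        if (!"mongodb".equals(u.getScheme())) {
            throw new IllegalArgumentException("not a mongodb:// URI: " + uri);
        }
        String path = u.getPath(); // e.g. "/mongo_hadoop.yield_historical.in"
        if (path == null || path.length() < 2) {
            throw new IllegalArgumentException("missing database.collection: " + uri);
        }
        String namespace = path.substring(1);
        // The database name ends at the first dot; collection names may contain dots.
        int dot = namespace.indexOf('.');
        if (dot <= 0 || dot == namespace.length() - 1) {
            throw new IllegalArgumentException("expected database.collection: " + uri);
        }
        // 27017 is MongoDB's default port when the URI does not give one.
        int port = (u.getPort() == -1) ? 27017 : u.getPort();
        return new MongoUriParts(u.getHost(), port,
                namespace.substring(0, dot), namespace.substring(dot + 1));
    }

    public static void main(String[] args) {
        MongoUriParts in = parse("mongodb://127.0.0.1/mongo_hadoop.yield_historical.in");
        System.out.println(in.database + " / " + in.collection);
    }
}
```

Checking the two URIs this way before running the job makes it easy to confirm that the input collection matches the one the sample data was loaded into.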
Then modify TreasuryYieldXMLConfig.java as follows:
Configuration.addDefaultResource( "resources/mongo-treasury_yield.xml");
Configuration.addDefaultResource( "resources/mongo-defaults.xml" );
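Hadoop's Configuration.addDefaultResource takes the name of a file that must be present on the classpath, so the two paths above only resolve if the resources directory ends up inside the jar (or otherwise on the classpath) under exactly those names. The sketch below is one way to check that with the standard library before submitting the job. ClasspathResourceCheck is a hypothetical helper, and the assumption that the context class loader is the one Hadoop will use is only approximately true in general.

```java
// Hypothetical helper (not part of Hadoop or mongo-hadoop): reports whether
// the named resources can be found on the current classpath.
final class ClasspathResourceCheck {

    static boolean onClasspath(String name) {
        // Assumption: the thread context class loader is the relevant one;
        // fall back to the system class loader if none is set.
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ClassLoader.getSystemClassLoader();
        }
        return cl.getResource(name) != null;
    }

    public static void main(String[] args) {
        String[] names = {
            "resources/mongo-treasury_yield.xml",
            "resources/mongo-defaults.xml"
        };
        for (String name : names) {
            System.out.println(name + " -> " + (onClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

If either file is reported as missing, the job would start without the MongoDB input and output URIs, so it is worth confirming the jar layout first.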
Then package the project as a jar file.
To run the Hadoop job, execute:
hadoop jar treasury.jar com.mongodb.hadoop.treasury.TreasuryYieldXMLConfig
The result is shown in the figure below: the data in MongoDB.