Before installing Hadoop, prepare the following:
1. JDK; installation guide: http://www.cnblogs.com/stardjyeah/p/4640917.html
2. Passwordless SSH; configuration guide: http://www.cnblogs.com/stardjyeah/p/4641524.html
3. A static IP on Linux; guide: http://www.cnblogs.com/stardjyeah/p/4640691.html
With those in place, you can install and configure Hadoop 2.5.0.
1) Extract Hadoop into your own hadoop directory.
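A minimal sketch, assuming the downloaded tarball is named hadoop-2.5.0.tar.gz and the target is /home/hadoop/hadoop (the install path used in the configs below):
$ tar -zxvf hadoop-2.5.0.tar.gz -C /home/hadoop/hadoop/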
2) Version 2.X is a major change from 1.X: the Hadoop MapReduce V2 (YARN) framework replaces the first-generation architecture. JobTracker and TaskTracker are gone, replaced by three components: ResourceManager, ApplicationMaster, and NodeManager. The locations and contents of the configuration files have changed accordingly; for details see: http://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop-yarn/
3) Edit hadoop/etc/hadoop/hadoop-env.sh and hadoop/etc/hadoop/yarn-env.sh, setting JAVA_HOME in both files.
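For example, add a line like the following to both files (the JDK path here is a placeholder; use the path from your own JDK installation):
export JAVA_HOME=/usr/java/jdk1.7.0_79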
4) Configure etc/hadoop/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop/hadoop-2.5.0/tmp</value>
  </property>
</configuration>
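Note: in Hadoop 2.x the key fs.default.name is deprecated in favor of fs.defaultFS; the old name still works here but logs a deprecation warning.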
5) Configure etc/hadoop/hdfs-site.xml. (Note: you must create the name and data directories yourself with mkdir; the commands are shown after the config. Their locations are up to you, and on a real distributed cluster dfs.replication should match the actual number of DataNode hosts.)
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/hadoop/hadoop-2.5.0/hdfs/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/hadoop/hadoop-2.5.0/hdfs/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address</name>
    <value>localhost:9000</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>localhost:50090</value>
  </property>
</configuration>
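Create the two directories referenced in the config above before starting HDFS:
$ mkdir -p /home/hadoop/hadoop/hadoop-2.5.0/hdfs/name
$ mkdir -p /home/hadoop/hadoop/hadoop-2.5.0/hdfs/data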
6) Configure etc/hadoop/mapred-site.xml. (Hadoop 2.5.0 ships only mapred-site.xml.template in etc/hadoop; if mapred-site.xml does not exist yet, copy the template to that name first.)
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>localhost:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>localhost:19888</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/home/hadoop/hadoop/hadoop-2.5.0/mr-history/tmp</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>/home/hadoop/hadoop/hadoop-2.5.0/mr-history/done</value>
  </property>
</configuration>
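The two jobhistory addresses above only take effect while the JobHistory server is running, and it is not started by start-dfs.sh or start-yarn.sh; start it separately:
$ sbin/mr-jobhistory-daemon.sh start historyserver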
7) Configure etc/hadoop/yarn-site.xml:
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>localhost:18040</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>localhost:18030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>localhost:18025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>localhost:18041</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>localhost:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/home/hadoop/hadoop/hadoop-2.5.0/mynode/my</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/home/hadoop/hadoop/hadoop-2.5.0/mynode/logs</value>
  </property>
  <property>
    <name>yarn.nodemanager.log.retain-seconds</name>
    <value>10800</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/logs</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
    <value>logs</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>-1</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-check-interval-seconds</name>
    <value>-1</value>
  </property>
</configuration>
8) Start up and test
First, format the NameNode: bin/hdfs namenode -format
If no errors are reported, the format succeeded.
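Typically the tail of the format output includes a line like the following (exact wording may vary by version):
Storage directory /home/hadoop/hadoop/hadoop-2.5.0/hdfs/name has been successfully formatted.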
Start HDFS: sbin/start-dfs.sh
Run jps and check that NameNode, DataNode, and SecondaryNameNode have started.
Start YARN: sbin/start-yarn.sh
Run jps again and check that NodeManager and ResourceManager have started.
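With everything running, jps prints something like this (the PIDs are illustrative):
3088 NameNode
3209 DataNode
3375 SecondaryNameNode
3525 ResourceManager
3628 NodeManager
3742 Jps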
Then browse to port 8088 and check that the YARN ResourceManager web UI appears.
Browse to port 50070 and check that the NameNode web UI appears.
Browse to port 50090 and check that the SecondaryNameNode status page appears.
If all three pages load, Hadoop is installed successfully!
Next, test the HDFS file system.
Create a directory: bin/hdfs dfs -mkdir /TestDir/
Upload a file: bin/hdfs dfs -put ./test.txt /TestDir/
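You can confirm the upload with a listing:
$ bin/hdfs dfs -ls /TestDir/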
Once the upload succeeds, run a WordCount test.
1. Create an input directory on HDFS (a relative path like input lands under the current user's HDFS home directory, /user/<username>/input)
$ bin/hadoop fs -mkdir -p input
2. Copy test.txt from the hadoop directory into the newly created input directory on HDFS
$ bin/hadoop fs -copyFromLocal test.txt input
3. Run WordCount
$ bin/hadoop jar share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.5.0-sources.jar org.apache.hadoop.examples.WordCount input output
(The sources jar contains only .java files; the WordCount class is actually resolved from the compiled examples jar already on Hadoop's classpath. The more conventional invocation is bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount input output.)
4. When the job finishes, view the word counts
$ bin/hadoop fs -cat output/*
If the job's output path is output and that directory already exists, delete it first; otherwise the job fails, because MapReduce refuses to overwrite an existing output directory:
$ bin/hadoop fs -rm -r output
The WordCount output lists each word together with its count.
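A hypothetical example: if test.txt contained the single line "hello hadoop hello world", then bin/hadoop fs -cat output/* would print:
hadoop	1
hello	2
world	1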