1. Hadoop Cluster Overview
Deploying a Hadoop cluster means deploying Hadoop in cluster mode.
A Hadoop cluster is made up of the following daemons:
HDFS daemons: NameNode, SecondaryNameNode, DataNode
YARN daemons: ResourceManager, NodeManager, WebAppProxy
MapReduce Job History Server
2. Cluster Deployment
The distributed environment for this test consists of one Master (test166) and one Slave (test167).
2.1 Install Hadoop on each node
For the installation procedure, see Hadoop Series (1): Hadoop Standalone Deployment.
2.2 Set the host names on each node
# cat /etc/hosts
10.86.255.166 test166
10.86.255.167 test167
2.3 Set up passwordless SSH login on each node
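A minimal sketch of one common way to do this, assuming the root account is used on both machines (the key type and account are assumptions, not from the original): generate a key pair on the Master and copy the public key to every node, including the Master itself.
# ssh-keygen -t rsa
# ssh-copy-id root@test166
# ssh-copy-id root@test167
Afterwards, ssh test167 should log in without prompting for a password.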
2.4 Set the Hadoop environment variables
# vi /etc/profile
export HADOOP_HOME=/usr/local/hadoop-2.7.0
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
Apply the settings:
# source /etc/profile
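As a quick sanity check, the variable should now resolve to the install path used above:
# echo $HADOOP_HOME
/usr/local/hadoop-2.7.0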
2.5 Hadoop Configuration
2.5.1 Specify the Slave nodes in the Master node's configuration file
# vi etc/hadoop/slaves
test167
2.5.2 Settings shared by all nodes (Master and Slave)
# vi etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://test166:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop-2.7.0/tmp</value>
  </property>
</configuration>

# vi etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

# vi etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
2.5.3 Specify where HDFS stores its files on each node (the default is /tmp)
Master node: namenode
Create the directory and grant permissions:
# mkdir -p /usr/local/hadoop-2.7.0/tmp/dfs/name
# chmod -R 777 /usr/local/hadoop-2.7.0/tmp

# vi etc/hadoop/hdfs-site.xml
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///usr/local/hadoop-2.7.0/tmp/dfs/name</value>
</property>
Slave node: datanode
Create the directory and grant permissions:
# mkdir -p /usr/local/hadoop-2.7.0/tmp/dfs/data
# chmod -R 777 /usr/local/hadoop-2.7.0/tmp

# vi etc/hadoop/hdfs-site.xml
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///usr/local/hadoop-2.7.0/tmp/dfs/data</value>
</property>
2.5.4 YARN Configuration
Master node: resourcemanager
# vi etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>test166</value>
  </property>
</configuration>
Slave node: nodemanager
# vi etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>test166</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
2.5.5 Run the job history server on the Master and point the Slave nodes to it
Slave node:
# vi etc/hadoop/mapred-site.xml
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>test166:10020</value>
</property>
2.5.6 Format HDFS (Master, Slave)
# hadoop namenode -format
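Strictly speaking, formatting only needs to succeed on the Master, since it initializes the NameNode metadata directory. Note also that in Hadoop 2.x the hadoop namenode script is deprecated; the equivalent current form is:
# hdfs namenode -format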
2.5.7 Start the daemons on the Master; the services on the Slave will start along with them
Start HDFS:
# sbin/start-dfs.sh
Start YARN:
# sbin/start-yarn.sh
Start the job history server:
# sbin/mr-jobhistory-daemon.sh start historyserver
Verify:
Master node:
# jps
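If everything started correctly, output along these lines should appear (process IDs are illustrative):
2130 NameNode
2342 SecondaryNameNode
2551 ResourceManager
2763 JobHistoryServer
2845 Jps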
Slave node:
# jps
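Expected daemons on the Slave (process IDs are illustrative):
1987 DataNode
2105 NodeManager
2177 Jps
The web UIs are another way to verify, using the Hadoop 2.7 default ports: http://test166:50070 (NameNode), http://test166:8088 (ResourceManager), and http://test166:19888 (JobHistoryServer).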
2.5.8 Create the HDFS directories
# hdfs dfs -mkdir /user
# hdfs dfs -mkdir /user/test22
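In Hadoop 2.x the two commands can also be collapsed into one, since hdfs dfs -mkdir supports the -p flag for creating parent directories:
# hdfs dfs -mkdir -p /user/test22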
2.5.9 Copy the input files into the HDFS directory
# hdfs dfs -put etc/hadoop /user/test22/input
Check:
# hdfs dfs -ls /user/test22/input
2.5.10 Run a Hadoop job
# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar grep /user/test22/input output 'dfs[a-z.]+'
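Note that output is a relative HDFS path, so the job writes its results under the current user's HDFS home directory (for example /user/root/output when running as root); the same relative path is used when viewing the results below.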
Check the execution result:
# hdfs dfs -cat output/*
3. Postscript
This cluster deployment was mainly for testing and verification; settings needed in production environments, such as HA and security, will be covered in upcoming posts.