Setting Up a Hadoop 2.7.3 Cluster on Linux
This article covers the most basic steps for building a distributed Hadoop/HDFS environment suitable for production use. It serves as a summary and reference for myself, and should also be handy for newcomers.
Basic Environment
Installing and Configuring the JDK
It is no longer easy to find a JDK 7 installer on the Oracle website (http://www.oracle.com/), since Oracle now recommends JDK 8. It took a while to locate the JDK 7 download page (http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html). Since the deployment servers run 64-bit Linux, choose the Linux x64 build; I used jdk-7u131-linux-x64.rpm.
Here, install directly from the rpm package:
rpm -ivh jdk-7u131-linux-x64.rpm
Return to the /home/hadoop directory and configure the Java environment variables:
vi .bash_profile
Add the following to .bash_profile:
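The original post omits the actual snippet here. A minimal sketch, based on the JAVA_HOME path used later in hadoop-env.sh:
# presumed content; the rpm installs the JDK under /usr/java
export JAVA_HOME=/usr/java/jdk1.7.0_131
export PATH=$PATH:$JAVA_HOME/bin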
To make the Java environment variables take effect immediately, run:
source .bash_profile
Finally, verify that Java is installed and configured correctly:
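java -version
The output should report version 1.7.0_131.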
Hosts
Since the Hadoop cluster I am building consists of three machines, the hosts file on each machine needs to be adjusted accordingly:
vi /etc/hosts
If you lack sufficient permissions, switch to the root user.
Add the same host entries on all three machines:
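The exact entries depend on your network. A sketch with placeholder IP addresses (replace them with your machines' real addresses):
192.168.1.100 Master
192.168.1.101 Slave1
192.168.1.102 Slave2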
SSH Trust
Hadoop's control scripts use SSH to start daemons across the cluster, so passwordless login needs to be configured so that the Master can SSH into each Slave without a password.
For detailed configuration steps, see:
http://www.cnblogs.com/chenjunjie/p/4000228.html
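In short, the setup amounts to something like the following (a minimal sketch, assuming the hadoop user exists on all three machines):
# on Master: generate a key pair with an empty passphrase
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# push the public key to each Slave
ssh-copy-id hadoop@Slave1
ssh-copy-id hadoop@Slave2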
Directory Layout
For easier management, create directories on the Master under the user's home directory for the HDFS NameNode data, DataNode data, and temporary files:
/home/hadoop/hdfs/name
/home/hadoop/hdfs/data
/home/hadoop/hdfs/tmp
Then copy these directories to the same location on Slave1 and Slave2 with scp, as sketched below.
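A minimal sketch of these commands, assuming passwordless SSH is already in place:
# on Master: create the directories, then mirror them to the Slaves
mkdir -p /home/hadoop/hdfs/{name,data,tmp}
scp -r /home/hadoop/hdfs hadoop@Slave1:/home/hadoop/
scp -r /home/hadoop/hdfs hadoop@Slave2:/home/hadoop/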
Installing and Configuring Hadoop
Download
First, download Hadoop from the Apache website (http://www.apache.org/dyn/closer.cgi/hadoop/common/), picking one of the suggested mirrors (http://mirrors.hust.edu.cn/apache/hadoop/common/). I chose version hadoop-2.7.3.
Extract hadoop-2.7.3.tar.gz into the /home/hadoop directory with the following command:
tar -zxvf hadoop-2.7.3.tar.gz
Environment Variables
Return to the /home/hadoop directory and configure the Hadoop environment variables:
vi .bash_profile
Add the following to .bash_profile:
export HADOOP_DEV_HOME=/home/hadoop/hadoop-2.7.3
export PATH=$PATH:$HADOOP_DEV_HOME/bin
export PATH=$PATH:$HADOOP_DEV_HOME/sbin
export HADOOP_MAPRED_HOME=${HADOOP_DEV_HOME}
export HADOOP_COMMON_HOME=${HADOOP_DEV_HOME}
export HADOOP_HDFS_HOME=${HADOOP_DEV_HOME}
export YARN_HOME=${HADOOP_DEV_HOME}
export HADOOP_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
export HDFS_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
export JAVA_LIBRARY_PATH='/home/hadoop/hadoop-2.7.3/lib/native'
export HBASE_HOME=/home/hadoop/hbase-1.2.4
export PATH=$PATH:$HBASE_HOME/bin
To make the Hadoop environment variables take effect immediately, run:
source .bash_profile
Configuring Hadoop
Enter the configuration directory of hadoop-2.7.3:
cd /home/hadoop/hadoop-2.7.3/etc/hadoop
Modify core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml in turn.
core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/hadoop/hdfs/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://Master:9000</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.groups</name>
        <value>*</value>
    </property>
</configuration>
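Note: fs.default.name is the deprecated name for this property; it still works in Hadoop 2.7.3, but fs.defaultFS is the preferred key.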
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/hadoop/hdfs/name</value>
        <final>true</final>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/hadoop/hdfs/data</value>
        <final>true</final>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>Master:9001</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
</configuration>
mapred-site.xml
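Note that the Hadoop 2.7.3 distribution ships only a template for this file, so create it first:
cp mapred-site.xml.template mapred-site.xml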
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
yarn-site.xml
<?xml version="1.0"?>
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>Master:18040</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>Master:18030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>Master:18088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>Master:18025</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>Master:18141</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>
Add the following to hadoop-env.sh:
export JAVA_HOME=/usr/java/jdk1.7.0_131
Add the following to the masters file:
Master
And add the following to the slaves file:
Slave1
Slave2
Finally, copy the entire hadoop-2.7.3 directory (including all subdirectories) to the same location on both Slaves using scp:
scp -r hadoop-2.7.3 hadoop@Slave1:/home/hadoop/
scp -r hadoop-2.7.3 hadoop@Slave2:/home/hadoop/
Running Hadoop
Running HDFS
Formatting the NameNode
Run the following command:
hadoop namenode -format
The command prints a lengthy log; near the end you should see a message that the storage directory /home/hadoop/hdfs/name has been successfully formatted. (In Hadoop 2.x this command is deprecated in favor of hdfs namenode -format, although both still work.)
Starting the NameNode
hadoop-daemon.sh start namenode
Then run ps -ef | grep hadoop and jps on the Master. If a NameNode process appears in the output, the NameNode has started successfully.
Starting the DataNodes
Run the following command:
hadoop-daemons.sh start datanode
Then run jps on Slave1 and Slave2. A DataNode process should appear on each, indicating that the DataNodes on Slave1 and Slave2 are running normally.
Instead of starting the NameNode and DataNodes separately as above, you can use the start-dfs.sh script:
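start-dfs.sh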
Running YARN
YARN can be run in a similar way to HDFS. To start the ResourceManager, use the following command:
yarn-daemon.sh start resourcemanager
To start the NodeManagers on all slaves in one batch, use the following command:
yarn-daemons.sh start nodemanager
We will not dwell on those commands; instead, here is the more concise startup using start-yarn.sh:
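start-yarn.sh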
Running jps on the Master should now show a ResourceManager process, indicating that the ResourceManager is running normally. Running jps on the two Slaves should likewise show a NodeManager process on each.
Testing Hadoop
Testing HDFS
Finally, test whether the hand-built Hadoop cluster works correctly, starting with HDFS:
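The original screenshot of the test commands is not reproduced here; a typical smoke test (the /test path is arbitrary) looks like this:
# create a directory, upload a local file, and list it back
hadoop fs -mkdir -p /test
hadoop fs -put /etc/hosts /test
hadoop fs -ls /test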
Testing YARN
To verify YARN, open its web management UI in a browser; with the configuration above it is served at http://Master:18088 (the yarn.resourcemanager.webapp.address setting).
Testing MapReduce
Being too lazy to write MapReduce code myself, I fell back on the ready-made examples that ship with Hadoop under share/hadoop/mapreduce. Run one of them:
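For example, the bundled pi estimator (the jar name matches the 2.7.3 distribution; adjust the path if yours differs):
hadoop jar /home/hadoop/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 10 100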
Problems Encountered While Configuring and Running Hadoop
yarn.nodemanager.aux-services error
When starting YARN with the start-yarn.sh script, jps on Slave1 and Slave2 showed no NodeManager process. Logging in to the Slave machines and checking the logs revealed an error about the yarn.nodemanager.aux-services setting.
As explained in solutions found online, the cause is that the old value mapreduce.shuffle for yarn.nodemanager.aux-services has been replaced by mapreduce_shuffle in this version. Some reference books even print it incorrectly as mapreduce-shuffle.