Hadoop Cluster Installation - CDH5 (3-server cluster)


CDH5 package downloads: http://archive.cloudera.com/cdh5/

Host planning:

| IP | Host | Roles | Processes |
|---|---|---|---|
| 192.168.107.82 | Hadoop-NN-01 | NameNode, ResourceManager | NameNode, DFSZKFailoverController, ResourceManager |
| 192.168.107.83 | Hadoop-DN-01, Zookeeper-01 | DataNode, NodeManager, Zookeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain |
| 192.168.107.84 | Hadoop-DN-02, Zookeeper-02 | DataNode, NodeManager, Zookeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain |

What each process does:

  • NameNode
  • ResourceManager
  • DFSZKFC: DFS Zookeeper Failover Controller; activates the Standby NameNode on failover
  • DataNode
  • NodeManager
  • JournalNode: service through which the NameNodes share the edit log (if NFS is used for sharing instead, this process and all of its related configuration can be omitted)
  • QuorumPeerMain: the main ZooKeeper process
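As an illustrative aid (our addition, not part of the original guide): once everything is running, each host's `jps` output should match the plan above. A small helper can diff the two; the expected process names come from the planning table, while the sample `jps` listing below is invented for demonstration.

```shell
#!/usr/bin/env bash
# check_procs EXPECTED JPS_OUTPUT
# Prints "OK" when every expected Java process name appears in the jps
# output, otherwise prints the missing names.
check_procs() {
  local expected=$1 jps_out=$2 missing=""
  for p in $expected; do
    # jps prints "<pid> <name>"; compare against the name column only
    echo "$jps_out" | awk '{print $2}' | grep -qx "$p" || missing="$missing $p"
  done
  [ -z "$missing" ] && echo "OK" || echo "MISSING:$missing"
}

# Example: the plan for Hadoop-DN-01, with a fabricated jps listing
check_procs "DataNode NodeManager JournalNode QuorumPeerMain" \
"2829 QuorumPeerMain
3100 DataNode
3200 NodeManager
3300 JournalNode"   # prints: OK
```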

Directory planning:

| Name | Path |
|---|---|
| $HADOOP_HOME | /home/hadoopuser/hadoop-2.6.0-cdh5.6.0 |
| Data | $HADOOP_HOME/data |
| Log | $HADOOP_HOME/logs |

 

Configuration:

I. Disable the firewall (it can be configured properly later)

II. Install the JDK (omitted)

III. Change the hostnames and configure /etc/hosts (all 3 servers)

[root@Linux01 ~]# vim /etc/sysconfig/network
[root@Linux01 ~]# vim /etc/hosts

192.168.107.82 Hadoop-NN-01
192.168.107.83 Hadoop-DN-01 Zookeeper-01
192.168.107.84 Hadoop-DN-02 Zookeeper-02
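The three mappings can be appended in one shot with a heredoc. A hedged sketch: `HOSTS_FILE` defaults to a scratch path so it can be tried safely; point it at /etc/hosts (as root) on the real servers.

```shell
#!/usr/bin/env bash
# Append the cluster host mappings. Set HOSTS_FILE=/etc/hosts (as root)
# on the actual servers; the default is a scratch file for a dry run.
HOSTS_FILE=${HOSTS_FILE:-/tmp/hosts.demo}
cat >> "$HOSTS_FILE" <<'EOF'
192.168.107.82 Hadoop-NN-01
192.168.107.83 Hadoop-DN-01 Zookeeper-01
192.168.107.84 Hadoop-DN-02 Zookeeper-02
EOF
grep -c "Hadoop" "$HOSTS_FILE"   # at least 3 matching lines
```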

IV. For security, create a dedicated Hadoop login user (all 3 servers)

[root@Linux01 ~]# useradd hadoopuser
[root@Linux01 ~]# passwd hadoopuser
[root@Linux01 ~]# su - hadoopuser        # switch to the new user

V. Configure passwordless SSH login (the NameNode hosts)

[hadoopuser@Linux05 hadoop-2.6.0-cdh5.6.0]$ ssh-keygen   # generate the key pair
[hadoopuser@Linux05 hadoop-2.6.0-cdh5.6.0]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoopuser@Hadoop-NN-01

-i specifies which identity (public key) file to install; here it is ~/.ssh/id_rsa.pub.

Or, in shortened form:

[hadoopuser@Linux05 hadoop-2.6.0-cdh5.6.0]$ ssh-copy-id Hadoop-NN-01   # copies the public key to the remote server (an IP such as 10.10.51.231 also works)
[hadoopuser@Linux05 hadoop-2.6.0-cdh5.6.0]$ ssh-copy-id -p 6000 Hadoop-NN-01  # use this form if SSH listens on a non-default port

Note: if you changed the SSH port, also update Hadoop's configuration file hadoop-env.sh:

export HADOOP_SSH_OPTS="-p 6000"

[hadoopuser@Linux05 hadoop-2.6.0-cdh5.6.0]$ ssh Hadoop-NN-01  # verify (leave the remote session with exit or logout)
[hadoopuser@Linux05 hadoop-2.6.0-cdh5.6.0]$ ssh Hadoop-NN-01 -p 6000  # use this form with a non-default port
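With more hosts this step gets repetitive, so it can be wrapped in a loop. A sketch (the helper name push_keys is ours, not from the guide): it only prints the ssh-copy-id commands, so they can be reviewed first and then pasted back or piped to sh on the machine that holds the key pair.

```shell
#!/usr/bin/env bash
# Print one ssh-copy-id command per target host. Review the output, then
# run it for real (e.g. push_keys ... | sh).
push_keys() {
  for h in "$@"; do
    echo "ssh-copy-id -i ~/.ssh/id_rsa.pub hadoopuser@$h"
  done
}

push_keys Hadoop-NN-01 Hadoop-DN-01 Hadoop-DN-02
```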

VI. Configure environment variables: vi ~/.bashrc, then source ~/.bashrc (all 3 servers)

[hadoopuser@Linux01 ~]$ vi ~/.bashrc
# hadoop cdh5
export HADOOP_HOME=/home/hadoopuser/hadoop-2.6.0-cdh5.6.0
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

[hadoopuser@Linux01 ~]$ source ~/.bashrc  # apply the changes
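A quick way to confirm the change took effect is to check that $HADOOP_HOME is set and its bin directory is on $PATH after sourcing. A sketch, demonstrated against a temporary file rather than the real ~/.bashrc:

```shell
#!/usr/bin/env bash
# Source the same two export lines from a temp file, then verify that
# $HADOOP_HOME/bin ended up on $PATH.
rc=$(mktemp)
cat > "$rc" <<'EOF'
export HADOOP_HOME=/home/hadoopuser/hadoop-2.6.0-cdh5.6.0
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
EOF
. "$rc"

case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) echo "PATH ok" ;;        # prints: PATH ok
  *) echo "PATH is missing $HADOOP_HOME/bin" ;;
esac
```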

VII. Install ZooKeeper (on the 2 DataNodes)

1. Extract the archive

2. Configure environment variables: vi ~/.bashrc

[hadoopuser@Linux01 ~]$ vi ~/.bashrc
# zookeeper cdh5
export ZOOKEEPER_HOME=/home/hadoopuser/zookeeper-3.4.5-cdh5.6.0
export PATH=$PATH:$ZOOKEEPER_HOME/bin

[hadoopuser@Linux01 ~]$ source ~/.bashrc  # apply the changes

3. Change the log output location

[hadoopuser@Linux01 ~]$ vi $ZOOKEEPER_HOME/libexec/zkEnv.sh
Around line 56, change the log directory setting to: ZOO_LOG_DIR="$ZOOKEEPER_HOME/logs"

4. Edit the configuration file

[hadoopuser@Linux01 ~]$ vi $ZOOKEEPER_HOME/conf/zoo.cfg

# zookeeper
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/hadoopuser/zookeeper-3.4.5-cdh5.6.0/data
clientPort=2181

# cluster
server.1=Zookeeper-01:2888:3888
server.2=Zookeeper-02:2888:3888

5. Set the myid files

(1) Hadoop-DN-01:

mkdir $ZOOKEEPER_HOME/data
echo 1 > $ZOOKEEPER_HOME/data/myid

(2) Hadoop-DN-02:

mkdir $ZOOKEEPER_HOME/data
echo 2 > $ZOOKEEPER_HOME/data/myid
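The two blocks above differ only in the id that is written, so they can be expressed as one small helper. A sketch (set_myid and the scratch ZK_DATA default are ours): run it with 1 on Hadoop-DN-01 and 2 on Hadoop-DN-02, with ZK_DATA pointed at $ZOOKEEPER_HOME/data.

```shell
#!/usr/bin/env bash
# Write a ZooKeeper server id into dataDir/myid. On the real nodes set
# ZK_DATA="$ZOOKEEPER_HOME/data"; the default is a scratch dir for trying out.
ZK_DATA=${ZK_DATA:-/tmp/zk-demo/data}
set_myid() {
  mkdir -p "$ZK_DATA"
  echo "$1" > "$ZK_DATA/myid"
}

set_myid 1            # 1 on Hadoop-DN-01, 2 on Hadoop-DN-02
cat "$ZK_DATA/myid"   # prints: 1
```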

6. Start ZooKeeper on each node:

[hadoopuser@Linux01 ~]$ zkServer.sh start

7. Verify

[hadoopuser@Linux01 ~]$ jps

3051 Jps
2829 QuorumPeerMain

8. Check the status

[hadoopuser@Linux01 ~]$ zkServer.sh status

JMX enabled by default
Using config: /home/zero/zookeeper/zookeeper-3.4.5-cdh5.0.1/bin/../conf/zoo.cfg
Mode: follower

9. Appendix: zoo.cfg settings explained

 

| Property | Meaning |
|---|---|
| tickTime | Basic time unit in milliseconds; the heartbeat interval and the minimum session timeout is twice tickTime. |
| dataDir | Where data is stored: in-memory snapshots and the transaction update log. |
| clientPort | Port that clients connect to. |
| initLimit | Maximum number of tickTime intervals a connecting "client" (here meaning a Follower server in the ensemble connecting to the Leader, not a user client) may take for its initial connection and sync. If the Leader has heard nothing back after 10 heartbeats, the connection is considered failed; with this configuration that is 10 × 2000 ms = 20 seconds. |
| syncLimit | Maximum number of tickTime intervals allowed between a request and its acknowledgment in Leader-Follower messaging; here 5 × 2000 ms = 10 seconds. |
| server.A=B:C:D | One entry per ensemble node. A is a number identifying the server (it must match that node's myid); B is the server's IP address or host name; C is the port used to exchange data with the Leader; D is the port used to hold a new leader election if the Leader fails. In a pseudo-cluster all B values are the same host, so each ZooKeeper instance must be given distinct C and D ports. |
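One caveat worth adding (our note, not from the original guide): ZooKeeper needs a majority quorum of floor(n/2)+1 servers, so the two-node ensemble configured here tolerates zero failures; if either node goes down, ZooKeeper stops serving. Three nodes is the usual practical minimum. The arithmetic:

```shell
#!/usr/bin/env bash
# Quorum size and tolerated failures for an n-server ZooKeeper ensemble.
quorum()    { echo $(( $1 / 2 + 1 )); }
tolerated() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 2 3 5; do
  echo "n=$n quorum=$(quorum $n) tolerates=$(tolerated $n)"
done
# prints:
# n=2 quorum=2 tolerates=0
# n=3 quorum=2 tolerates=1
# n=5 quorum=3 tolerates=2
```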

 

VIII. Install and configure Hadoop (install on one node only; after configuring, distribute to the other nodes)

1. Extract the archive

2. Edit the configuration files

(1) Edit $HADOOP_HOME/etc/hadoop/masters

Hadoop-NN-01

(2) Edit $HADOOP_HOME/etc/hadoop/slaves

Hadoop-DN-01
Hadoop-DN-02

(3) Edit $HADOOP_HOME/etc/hadoop/core-site.xml

<configuration>
        <property>
               <name>fs.defaultFS</name>
               <value>hdfs://Hadoop-NN-01:9000</value>
               <description>URI and port of the Hadoop master (NameNode)</description>
        </property>
        <property>
               <name>io.file.buffer.size</name>
               <value>131072</value>
               <description>Read/write buffer size used when processing sequence files</description>
        </property>
        <property>
               <name>hadoop.tmp.dir</name>
               <value>/home/hadoopuser/hadoop-2.6.0-cdh5.6.0/tmp</value>
               <description>Directory for temporary data</description>
        </property>
</configuration>

(4) Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml

<configuration>
        <property>
               <name>dfs.namenode.name.dir</name>
               <value>/home/hadoopuser/hadoop-2.6.0-cdh5.6.0/dfs/name</value>
               <description>Local directory where the NameNode stores the name table (fsimage) (change as needed)</description>
        </property>
        <property>
               <name>dfs.datanode.data.dir</name>
               <value>/home/hadoopuser/hadoop-2.6.0-cdh5.6.0/dfs/data</value>
               <description>Local directory where the DataNode stores blocks (change as needed)</description>
        </property>
        <property>
               <name>dfs.replication</name>
               <value>1</value>
               <description>Number of replicas per file; the default is 3</description>
        </property>
        <property>
            <name>dfs.blocksize</name>
            <value>134217728</value>
            <description>Block size: 128 MB</description>
        </property>
        <property>
            <name>dfs.permissions</name>
            <value>false</value>
            <description>Whether HDFS enforces file permissions (usually false for testing)</description>
        </property>
</configuration>

 

(5) Edit $HADOOP_HOME/etc/hadoop/yarn-site.xml

<configuration>
        <property>
               <name>yarn.resourcemanager.address</name>
               <value>Hadoop-NN-01:8032</value>
        </property>
        <property>
               <name>yarn.resourcemanager.scheduler.address</name>
               <value>Hadoop-NN-01:8030</value>
        </property>
        <property>
               <name>yarn.resourcemanager.resource-tracker.address</name>
               <value>Hadoop-NN-01:8031</value>
        </property>
        <property>
               <name>yarn.resourcemanager.admin.address</name>
               <value>Hadoop-NN-01:8033</value>
        </property>
        <property>
               <name>yarn.resourcemanager.webapp.address</name>
               <value>Hadoop-NN-01:8088</value>
        </property>
        <property>
               <name>yarn.nodemanager.aux-services</name>
               <value>mapreduce_shuffle</value>
        </property>
        <property>
               <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
               <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
</configuration>

(6) Edit $HADOOP_HOME/etc/hadoop/mapred-site.xml

<configuration>
        <property>
               <name>mapreduce.framework.name</name>
               <value>yarn</value>
        </property>
        <property>
               <name>mapreduce.jobhistory.address</name>
               <value>Hadoop-NN-01:10020</value>
        </property>
        <property>
               <name>mapreduce.jobhistory.webapp.address</name>
               <value>Hadoop-NN-01:19888</value>
        </property>
</configuration>

(7) Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh

#--------------------Java Env------------------------------
export JAVA_HOME="/usr/java/jdk1.8.0_73"
#--------------------Hadoop Env----------------------------
#export HADOOP_PID_DIR=${HADOOP_PID_DIR}
export HADOOP_PREFIX="/home/hadoopuser/hadoop-2.6.0-cdh5.6.0"
#--------------------Hadoop Daemon Options-----------------
# export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
# export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"
#--------------------Hadoop Logs---------------------------
#export HADOOP_LOG_DIR=${HADOOP_LOG_DIR}/$USER
#--------------------SSH PORT-------------------------------
export HADOOP_SSH_OPTS="-p 6000"        # required if you changed the SSH login port

(8) Edit $HADOOP_HOME/etc/hadoop/yarn-env.sh

#Yarn Daemon Options
#export YARN_RESOURCEMANAGER_OPTS
#export YARN_NODEMANAGER_OPTS
#export YARN_PROXYSERVER_OPTS
#export HADOOP_JOB_HISTORYSERVER_OPTS

#Yarn Logs
export YARN_LOG_DIR="/home/hadoopuser/hadoop-2.6.0-cdh5.6.0/logs"

3. Distribute the installation

scp -r /home/hadoopuser/hadoop-2.6.0-cdh5.6.0 hadoopuser@Hadoop-DN-01:/home/hadoopuser
scp -r /home/hadoopuser/hadoop-2.6.0-cdh5.6.0 hadoopuser@Hadoop-DN-02:/home/hadoopuser
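The two commands above generalize to a loop over the worker list, which keeps the host list in one place if more DataNodes are added later. A sketch: COPY defaults to a dry-run echo that only prints the commands; set COPY="scp -r" to actually execute them.

```shell
#!/usr/bin/env bash
# Copy the configured Hadoop tree to every worker node.
# Dry run by default (prints the commands); set COPY="scp -r" to execute.
COPY=${COPY:-"echo scp -r"}
SRC=/home/hadoopuser/hadoop-2.6.0-cdh5.6.0

for h in Hadoop-DN-01 Hadoop-DN-02; do
  $COPY "$SRC" "hadoopuser@$h:/home/hadoopuser"
done
```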

 

4. Format the NameNode

[hadoopuser@Linux01 hadoop-2.6.0-cdh5.6.0]$ hadoop namenode -format

5. Start the JournalNodes

[hadoopuser@Linux01 hadoop-2.6.0-cdh5.6.0]$ hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/hadoopuser/hadoop-2.6.0-cdh5.6.0/logs/hadoop-puppet-journalnode-BigData-03.out

Verify the JournalNode:

[hadoopuser@Linux01 hadoop-2.6.0-cdh5.6.0]$ jps

9076 Jps
9029 JournalNode

6. Start HDFS

Cluster start: run start-dfs.sh on Hadoop-NN-01

[hadoopuser@Linux01 hadoop-2.6.0-cdh5.6.0]$ start-dfs.sh

Per-daemon start:

<1> NameNode (Hadoop-NN-01, Hadoop-NN-02): hadoop-daemon.sh start namenode

<2> DataNode (Hadoop-DN-01, Hadoop-DN-02, Hadoop-DN-03): hadoop-daemon.sh start datanode

<3> JournalNode (Hadoop-DN-01, Hadoop-DN-02, Hadoop-DN-03): hadoop-daemon.sh start journalnode

7. Start YARN

<1> Cluster start

Start YARN on Hadoop-NN-01; the scripts are in $HADOOP_HOME/sbin

[hadoopuser@Linux01 hadoop-2.6.0-cdh5.6.0]$ start-yarn.sh

<2> Per-daemon start

ResourceManager (Hadoop-NN-01, Hadoop-NN-02): yarn-daemon.sh start resourcemanager

NodeManager (Hadoop-DN-01, Hadoop-DN-02, Hadoop-DN-03): yarn-daemon.sh start nodemanager

Verification (omitted)
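As a sketch of the verification the guide leaves out: after start-up, `jps` on each host should show the processes from the planning table, and a trivial HDFS round trip should succeed. The commands below default to a dry-run echo so they can be previewed; set HDFS=hdfs on the running cluster to actually execute them.

```shell
#!/usr/bin/env bash
# Minimal HDFS smoke test: create a directory, upload a file, list, clean up.
# Dry run by default; set HDFS=hdfs on the running cluster.
HDFS=${HDFS:-"echo hdfs"}

$HDFS dfs -mkdir -p /tmp/smoke
$HDFS dfs -put /etc/hostname /tmp/smoke/
$HDFS dfs -ls /tmp/smoke
$HDFS dfs -rm -r /tmp/smoke
```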

