Prerequisites:
hadoop3.2.0 + jdk1.8 + centos7 + zookeeper3.4.5
These are the base packages I used to build the cluster.
I. Environment preparation
| Host | Software | Processes |
| --- | --- | --- |
| master1 | jdk, hadoop | NameNode, DFSZKFailoverController (zkfc) |
| master2 | jdk, hadoop | NameNode, DFSZKFailoverController (zkfc) |
| slave1 | jdk, hadoop, zookeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain |
| slave2 | jdk, hadoop, zookeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain |
| slave3 | jdk, hadoop, zookeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain |
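Notes:
An HA Hadoop cluster usually runs two NameNodes, one in the active state and one in the standby state. The Active NameNode serves client requests. The Standby NameNode does not; it only keeps its state in sync with the Active NameNode so that it can take over quickly when the active one fails.
Hadoop officially offers two HDFS HA solutions, one based on NFS and one based on QJM (Quorum Journal Manager). Here we use the simpler QJM. In this scheme the active and standby NameNodes synchronize metadata through a group of JournalNodes, and an edit counts as written once a majority of the JournalNodes have stored it, which is why an odd number of JournalNodes is usually configured. A zookeeper cluster is also set up here for ZKFC (DFSZKFailoverController) failover: when the Active NameNode goes down, the Standby NameNode is automatically switched to the active state.
One more problem remains: a cluster with only one ResourceManager has a single point of failure. hadoop-3.2.0 supports ResourceManager HA (the feature dates back to the 2.x line), in which one ResourceManager is Active and the others are Standby, with the state coordinated by zookeeper.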
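The majority rule for JournalNodes can be made concrete with a little arithmetic. The sketch below is only an illustration; the function names `quorum_size` and `tolerated_failures` are made up for this example and are not part of Hadoop.

```shell
# Majority-quorum arithmetic for a JournalNode group of size n.

# Number of JournalNodes that must store an edit before it counts as written.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Number of JournalNodes that may be down while a majority is still reachable.
tolerated_failures() {
  echo $(( ($1 - 1) / 2 ))
}

for n in 3 4 5; do
  echo "journalnodes=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

With 3 JournalNodes the quorum is 2 and one failure is tolerated. A fourth node raises the quorum to 3 without tolerating any additional failure, which is the usual argument for an odd count.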
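Stop the firewall on all five virtual machines and change the hostnames:
systemctl stop firewalld
systemctl disable firewalld
vim /etc/hostname
Edit and save the file on each of the five virtual machines in turn, one name per machine:
master1
master2
slave1
slave2
slave3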
Configure the hosts file:
vim /etc/hosts
#add the following (IP address first, then hostname)
192.168.60.10 master1
192.168.60.11 master2
192.168.60.12 slave1
192.168.60.13 slave2
192.168.60.14 slave3
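A reversed or missing hosts entry is easy to overlook, so a quick check can help. This is a minimal sketch; `missing_hosts` is a hypothetical helper written for this post, and it only looks at the file it is given, not at DNS.

```shell
# Print every cluster hostname that has no "IP hostname" entry in a hosts file.
# Usage: missing_hosts <hosts-file> <name>...
missing_hosts() {
  hosts_file=$1
  shift
  for name in "$@"; do
    # Skip comment lines. Field 1 is the IP; names and aliases start at field 2.
    if ! awk -v n="$name" '
        /^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit (found ? 0 : 1) }' "$hosts_file"; then
      echo "$name"
    fi
  done
}

missing_hosts /etc/hosts master1 master2 slave1 slave2 slave3
```

No output means every name was found. A line written as hostname first and IP second is reported as missing, because the name is only accepted from the second field onward.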
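Configure passwordless SSH login:
ssh-keygen -t rsa #run on every virtual machine
cd /root/.ssh/
cat id_rsa.pub >> authorized_keys
scp authorized_keys root@master2:/root/.ssh/
#Run the steps above on each node in turn so that authorized_keys collects every node's public key, then distribute the final authorized_keys file to all nodes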
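An alternative to passing authorized_keys from node to node by hand is `ssh-copy-id`, which appends the local public key to a remote authorized_keys file. Note that this is a different technique from the manual steps above. The sketch assumes the root account and the node names used in this post; `distribute_key` and the `DRY_RUN` switch are made up for the example, and by default the commands are only printed.

```shell
NODES="master1 master2 slave1 slave2 slave3"

# Print (DRY_RUN=1, the default here) or run ssh-copy-id for every node.
# Run it on each machine after ssh-keygen so every node trusts every other node.
distribute_key() {
  for node in $NODES; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "ssh-copy-id -i /root/.ssh/id_rsa.pub root@$node"
    else
      ssh-copy-id -i /root/.ssh/id_rsa.pub "root@$node"
    fi
  done
}

distribute_key
```

Setting DRY_RUN=0 before calling `distribute_key` runs the commands for real; each one asks for the remote root password once.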
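II. Installation steps
Installing jdk1.8:
1. Extract the archive
tar -zxvf jdk1.8.tar.gz -C /usr/local #choose your own directory
2. Configure environment variables
vim /etc/profile
#JAVA_HOME must match the directory the archive was extracted to
export JAVA_HOME=/usr/local/jdk
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
source /etc/profile #reload the profile
java -version #verify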
Installing zookeeper:
1. Extract zookeeper
tar -zxvf zookeeper-3.4.5.tar.gz -C /usr/local/soft #choose your own directory
2. Edit the configuration file
cd /usr/local/soft/zookeeper-3.4.5/conf/
cp zoo_sample.cfg zoo.cfg
vim zoo.cfg
#change the following setting
dataDir=/usr/local/soft/zookeeper-3.4.5/tmp
Append at the end:
server.1=slave1:2888:3888
server.2=slave2:2888:3888
server.3=slave3:2888:3888
#save and exit
3. Create a tmp folder in the zookeeper directory
mkdir /usr/local/soft/zookeeper-3.4.5/tmp
Then create an empty file
touch /usr/local/soft/zookeeper-3.4.5/tmp/myid
Finally write the ID into that file
echo 1 > /usr/local/soft/zookeeper-3.4.5/tmp/myid
4. Copy the configured zookeeper to the other nodes (first create the soft directory on slave2 and slave3: mkdir /usr/local/soft/)
scp -r /usr/local/soft/zookeeper-3.4.5/ slave2:/usr/local/soft
scp -r /usr/local/soft/zookeeper-3.4.5/ slave3:/usr/local/soft
5. Remember to change the myid content on each node
slave2:
echo 2 > /usr/local/soft/zookeeper-3.4.5/tmp/myid
slave3:
echo 3 > /usr/local/soft/zookeeper-3.4.5/tmp/myid
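Each node's myid has to equal the N of its own server.N line in zoo.cfg, so the value can be read from the config instead of typed by hand. A minimal sketch follows; `myid_for` is a hypothetical helper, and it assumes plain hostnames such as slave1 with no regex metacharacters in them.

```shell
# Print the myid for a host by looking up its server.N line in zoo.cfg.
# Usage: myid_for <zoo.cfg> <hostname>
myid_for() {
  # Matching lines look like: server.2=slave2:2888:3888
  sed -n "s/^server\.\([0-9][0-9]*\)=$2:.*/\1/p" "$1"
}

# Demo against a throwaway copy of the three server lines used in this post.
demo_cfg=$(mktemp)
printf 'server.1=slave1:2888:3888\nserver.2=slave2:2888:3888\nserver.3=slave3:2888:3888\n' > "$demo_cfg"
myid_for "$demo_cfg" slave2   # prints 2
rm -f "$demo_cfg"
```

On a real node the same lookup would be `myid_for /usr/local/soft/zookeeper-3.4.5/conf/zoo.cfg "$(hostname)"`, with the result redirected into tmp/myid. An empty result means the host is not listed in zoo.cfg.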
6. Start the zookeeper cluster (on all three machines)
cd into zookeeper's bin directory
./zkServer.sh start
Configuring the hadoop cluster:
1. Extract the archive
tar -zxvf hadoop-3.2.0.tar.gz -C /usr/local/soft/
2. Add environment variables
vim /etc/profile
export HADOOP_HOME=/usr/local/soft/hadoop-3.2.0
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source /etc/profile #reload the profile
hadoop version #verify
3. Configure hadoop-env.sh and add JAVA_HOME
export JAVA_HOME=/usr/local/jdk
4. Configure core-site.xml
<configuration>
  <!-- Set the HDFS nameservice to ns1 -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns1</value>
  </property>
  <!-- Hadoop temporary directory -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/soft/hadoop-3.2.0/tmp</value>
  </property>
  <!-- ZooKeeper addresses -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>slave1:2181,slave2:2181,slave3:2181</value>
  </property>
</configuration>
5. Configure hdfs-site.xml
<configuration>
  <!-- Set the HDFS nameservice to ns1; must match core-site.xml -->
  <property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
  </property>
  <!-- ns1 has two NameNodes, nn1 and nn2 -->
  <property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn1,nn2</value>
  </property>
  <!-- RPC address of nn1 -->
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn1</name>
    <value>master1:9000</value>
  </property>
  <!-- HTTP address of nn1 -->
  <property>
    <name>dfs.namenode.http-address.ns1.nn1</name>
    <value>master1:50070</value>
  </property>
  <!-- RPC address of nn2 -->
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn2</name>
    <value>master2:9000</value>
  </property>
  <!-- HTTP address of nn2 -->
  <property>
    <name>dfs.namenode.http-address.ns1.nn2</name>
    <value>master2:50070</value>
  </property>
  <!-- Where the NameNode edit log is stored on the JournalNodes -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://slave1:8485;slave2:8485;slave3:8485/ns1</value>
  </property>
  <!-- Where each JournalNode stores its data on local disk -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/usr/local/soft/hadoop-3.2.0/journal</value>
  </property>
  <!-- Enable automatic NameNode failover -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <!-- Client-side failover proxy provider -->
  <property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- Fencing methods; multiple methods are separated by newlines, one per line -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
sshfence
shell(/bin/true)
    </value>
  </property>
  <!-- sshfence needs passwordless SSH; point this at the private key of the user running HDFS (root in this setup, generated under /root/.ssh above) -->
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>
  <!-- sshfence connection timeout in milliseconds -->
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
</configuration>
6. Configure mapred-site.xml
<configuration>
  <!-- Run MapReduce on yarn -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
7. Configure yarn-site.xml
<configuration>
  <!-- Enable RM high availability -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- RM cluster id -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yrc</value>
  </property>
  <!-- Logical names of the RMs -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2,rm3</value>
  </property>
  <!-- Host of each RM -->
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>slave1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>slave2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm3</name>
    <value>slave3</value>
  </property>
  <!-- zk cluster addresses -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>slave1:2181,slave2:2181,slave3:2181</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Prevents MapReduce jobs from failing; take the value from the output of the hadoop classpath command -->
  <property>
    <name>yarn.application.classpath</name>
    <value>/usr/local/soft/hadoop-3.2.0/etc/hadoop:/usr/local/soft/hadoop-3.2.0/share/hadoop/common/lib/*:/usr/local/soft/hadoop-3.2.0/share/hadoop/common/*:/usr/local/soft/hadoop-3.2.0/share/hadoop/hdfs:/usr/local/soft/hadoop-3.2.0/share/hadoop/hdfs/lib/*:/usr/local/soft/hadoop-3.2.0/share/hadoop/hdfs/*:/usr/local/soft/hadoop-3.2.0/share/hadoop/mapreduce/lib/*:/usr/local/soft/hadoop-3.2.0/share/hadoop/mapreduce/*:/usr/local/soft/hadoop-3.2.0/share/hadoop/yarn:/usr/local/soft/hadoop-3.2.0/share/hadoop/yarn/lib/*:/usr/local/soft/hadoop-3.2.0/share/hadoop/yarn/*</value>
  </property>
</configuration>
8. Configure workers
slave1
slave2
slave3
9. Edit sbin/start-yarn.sh and sbin/stop-yarn.sh, and sbin/start-dfs.sh and sbin/stop-dfs.sh
Add to the dfs scripts:
HDFS_NAMENODE_USER=root
HDFS_DATANODE_USER=root
HDFS_JOURNALNODE_USER=root
HDFS_ZKFC_USER=root
Add to the yarn scripts:
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
10. Distribute Hadoop 3.2.0 to the other nodes
scp -r /usr/local/soft/hadoop-3.2.0 root@master2:/usr/local/soft/
scp -r /usr/local/soft/hadoop-3.2.0 root@slave1:/usr/local/soft/
scp -r /usr/local/soft/hadoop-3.2.0 root@slave2:/usr/local/soft/
scp -r /usr/local/soft/hadoop-3.2.0 root@slave3:/usr/local/soft/
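III. Starting the cluster
The zookeeper cluster was already started above. If it is not running, start it on slave1, slave2, and slave3:
cd /usr/local/soft/zookeeper-3.4.5/bin/
./zkServer.sh start
#check the status: one leader and two followers
./zkServer.sh status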

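Start the journalnodes (run on each of slave1, slave2, and slave3)
cd /usr/local/soft/hadoop-3.2.0
sbin/hadoop-daemon.sh start journalnode
#Hadoop 3 marks hadoop-daemon.sh as deprecated; hdfs --daemon start journalnode is the current equivalent
#run jps to check: a JournalNode process should now be listed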
Format HDFS
#run on master1:
hdfs namenode -format
#Formatting creates files under the directory set by hadoop.tmp.dir in core-site.xml, which here is /usr/local/soft/hadoop-3.2.0/tmp. Copy that directory to the same path on master2 so the standby NameNode starts from the same metadata:
scp -r /usr/local/soft/hadoop-3.2.0/tmp root@master2:/usr/local/soft/hadoop-3.2.0/
Format ZK (run on master1)
hdfs zkfc -formatZK
Start the cluster (run on master1, from /usr/local/soft/hadoop-3.2.0)
sbin/start-dfs.sh
sbin/start-yarn.sh
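After the start scripts finish, the jps output on each node can be compared with the process list from the planning table. This is a sketch under that assumption; `expected_daemons` and `missing_daemons` are hypothetical helpers, and the lists cover only the processes named in the table, so a ResourceManager on the hosts configured in yarn-site.xml is not checked here.

```shell
# Daemons each role should show in jps, taken from the planning table.
expected_daemons() {
  case $1 in
    master*) echo "NameNode DFSZKFailoverController" ;;
    slave*)  echo "DataNode NodeManager JournalNode QuorumPeerMain" ;;
  esac
}

# Read jps output on stdin and print every expected daemon that is absent.
# Usage: jps | missing_daemons <hostname>
missing_daemons() {
  jps_output=$(cat)
  for daemon in $(expected_daemons "$1"); do
    # jps prints "<pid> <MainClass>"; compare the class name in column 2 exactly.
    echo "$jps_output" \
      | awk -v d="$daemon" '$2 == d { ok = 1 } END { exit (ok ? 0 : 1) }' \
      || echo "$daemon"
  done
}

# Example on a live node:
#   jps | missing_daemons "$(hostname)"
```

An empty result means every expected process is running on that node. To see which NameNode won the election, `hdfs haadmin -getServiceState nn1` (and the same for nn2) reports active or standby.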

Verify YARN:
Run the WordCount program from the examples that ship with hadoop.
cd into the hadoop directory:
hadoop fs -put /etc/profile /
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar wordcount /profile /out
