Configuring a highly available hadoop+yarn+hbase+storm+kafka+spark+zookeeper cluster, plus the supporting components: JDK, MySQL, Hive, Flume
Environment
Nodes
- Number of virtual machines: 8
- OS version: CentOS-7-x86_64-Minimal-1611.iso
Each virtual machine is configured as follows:
VM name | CPU cores | RAM (GB) | Disk (GB) | NICs |
---|---|---|---|---|
hadoop1 | 2 | 8 | 100 | 2 |
hadoop2 | 2 | 8 | 100 | 2 |
hadoop3 | 2 | 8 | 100 | 2 |
hadoop4 | 2 | 8 | 100 | 2 |
hadoop5 | 2 | 8 | 100 | 2 |
hadoop6 | 2 | 8 | 100 | 2 |
hadoop7 | 2 | 8 | 100 | 2 |
hadoop8 | 2 | 8 | 100 | 2 |
Clusters
Layout of the 8-node Hadoop+Yarn+Spark+Hbase+Kafka+Storm+ZooKeeper high-availability cluster:
Cluster | Nodes |
---|---|
Hadoop HA cluster | hadoop1,hadoop2,hadoop3,hadoop4,hadoop5,hadoop6,hadoop7,hadoop8 |
Yarn HA cluster | hadoop1,hadoop2,hadoop3,hadoop4,hadoop5,hadoop6,hadoop7,hadoop8 |
ZooKeeper cluster | hadoop3,hadoop4,hadoop5 |
Hbase cluster | hadoop3,hadoop4,hadoop5,hadoop6,hadoop7 |
Kafka cluster | hadoop6,hadoop7,hadoop8 |
Storm cluster | hadoop3,hadoop4,hadoop5,hadoop6,hadoop7 |
Spark HA cluster | hadoop1,hadoop2,hadoop3,hadoop4,hadoop5,hadoop6,hadoop7,hadoop8 |
Detailed cluster plan:
VM name | IP | Installed software | Processes | Role |
---|---|---|---|---|
hadoop1 | 59.68.29.79 | jdk,hadoop,mysql | NameNode,ResourceManager,DFSZKFailoverController(zkfc),master(spark) | Hadoop NameNode, Spark master, Yarn ResourceManager |
hadoop2 | 10.230.203.11 | jdk,hadoop,spark | NameNode,ResourceManager,DFSZKFailoverController(zkfc),worker(spark) | failover (standby) node for Hadoop/Yarn and for Spark |
hadoop3 | 10.230.203.12 | jdk,hadoop,zookeeper,hbase,storm,spark | DataNode,NodeManager,journalnode,QuorumPeerMain(zk),HMaster,…(storm),worker(spark) | master node for Storm, Hbase, and ZooKeeper |
hadoop4 | 10.230.203.13 | jdk,hadoop,zookeeper,hbase,storm,spark | DataNode,NodeManager,journalnode,QuorumPeerMain(zk),HRegionServer,…(storm),worker(spark) | |
hadoop5 | 10.230.203.14 | jdk,hadoop,zookeeper,hbase,storm,spark | DataNode,NodeManager,journalnode,QuorumPeerMain(zk),HRegionServer,…(storm),worker(spark) | |
hadoop6 | 10.230.203.15 | jdk,hadoop,hbase,storm,kafka,spark | DataNode,NodeManager,journalnode,kafka,HRegionServer,…(storm),worker(spark) | Kafka master node |
hadoop7 | 10.230.203.16 | jdk,hadoop,hbase,storm,kafka,spark | DataNode,NodeManager,journalnode,kafka,HRegionServer,…(storm),worker(spark) | |
hadoop8 | 10.230.203.17 | jdk,hadoop,kafka,spark | DataNode,NodeManager,journalnode,kafka,worker(spark) | |
Software versions
- JDK: jdk-8u65-linux-x64.tar.gz
- hadoop: hadoop-2.7.6.tar.gz
- zookeeper: zookeeper-3.4.12.tar.gz
- hbase: hbase-1.2.6-bin.tar.gz
- Storm: apache-storm-1.1.3.tar.gz
- kafka: kafka_2.11-2.0.0.tgz
- MySQL: mysql-5.6.41-linux-glibc2.12-x86_64.tar.gz
- hive: apache-hive-2.3.3-bin.tar.gz
- Flume: apache-flume-1.8.0-bin.tar.gz
- Spark: spark-2.3.1-bin-hadoop2.7.tgz
Preliminary setup
Common configuration
Apply the same settings on every node.
Create the centos user
Important: do not configure the cluster as root.
- Create the centos group

```shell
$> groupadd centos
```

- Create the centos user and add it to the centos group

```shell
$> useradd centos -g centos
```

- Set a password for the centos user

```shell
$> passwd centos
```
Grant sudo privileges
- Switch to root and edit the /etc/sudoers file (`visudo` is the safer way to edit it, since it checks the syntax)

```shell
$> nano /etc/sudoers
```

Add the following lines:

```
## Allow root to run any commands anywhere
root    ALL=(ALL)   ALL
centos  ALL=(ALL)   ALL
```
Set the hostname
- Edit /etc/hostname, delete the existing content, and enter the new hostname

```shell
$> sudo nano /etc/hostname
```

Hostnames: hadoop1, hadoop2, ..., hadoop8
Hostname-to-IP mapping
- Edit /etc/hosts, delete the existing content, and add the mapping between all the nodes

```shell
$> sudo nano /etc/hosts
```

Contents:

```
127.0.0.1 localhost
59.68.29.79 hadoop1
10.230.203.11 hadoop2
10.230.203.12 hadoop3
10.230.203.13 hadoop4
10.230.203.14 hadoop5
10.230.203.15 hadoop6
10.230.203.16 hadoop7
10.230.203.17 hadoop8
```
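As a quick sanity check, the hosts mapping above can be parsed and queried programmatically. This is a minimal sketch (not part of the cluster setup); the addresses are copied verbatim from the file above:

```python
# Hostname-to-IP mapping copied from the /etc/hosts contents above.
HOSTS = """
127.0.0.1 localhost
59.68.29.79 hadoop1
10.230.203.11 hadoop2
10.230.203.12 hadoop3
10.230.203.13 hadoop4
10.230.203.14 hadoop5
10.230.203.15 hadoop6
10.230.203.16 hadoop7
10.230.203.17 hadoop8
"""

# Parse "IP hostname" pairs into a dict keyed by hostname.
mapping = {}
for line in HOSTS.strip().splitlines():
    ip, name = line.split()
    mapping[name] = ip

# Every node hadoop1..hadoop8 must be present.
assert all("hadoop%d" % i in mapping for i in range(1, 9))
print(mapping["hadoop3"])  # → 10.230.203.12
```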
Show the absolute path in the prompt
The `pwd` command prints the working directory; the setting below makes the prompt display `~` expanded to `/home/centos`, so the current path is always visible.
- Configure it in /etc/profile

```shell
[centos@hadoop1 ~]$ sudo nano /etc/profile
# append at the end:
export PS1='[\u@\h `pwd`]\$'
# apply immediately:
[centos@hadoop1 ~]$ source /etc/profile
[centos@hadoop1 /home/centos]$
```
Passwordless SSH login
hadoop1 and hadoop2 are the failover nodes (they eliminate the single point of failure), so besides reaching each other they must also be able to log in to every other node without a password.
- Check that the SSH packages are installed (openssh-server + openssh-clients + openssh)

```shell
[centos@hadoop1 /home/centos]$ yum list installed | grep ssh
```

- Check that the sshd process is running

```shell
[centos@hadoop1 /home/centos]$ ps -Af | grep sshd
```

- On every node hadoop1–hadoop8, create the .ssh directory under ~ (/home/centos) and restrict its permissions

```shell
[centos@hadoop1 /home/centos]$ mkdir .ssh
[centos@hadoop1 /home/centos]$ chmod 700 ~/.ssh
```
- On hadoop1, generate a key pair, append the public key to ~/.ssh/authorized_keys, and change the permissions of authorized_keys to 644 (CentOS)

```shell
# generate the key pair
[centos@hadoop1 /home/centos]$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# enter ~/.ssh
[centos@hadoop1 /home/centos]$ cd ~/.ssh
# append the public key to ~/.ssh/authorized_keys
[centos@hadoop1 /home/centos/.ssh]$ cat id_rsa.pub >> authorized_keys
# change the permissions of authorized_keys to 644
[centos@hadoop1 /home/centos/.ssh]$ chmod 644 authorized_keys
```
- Copy hadoop1's public key id_rsa.pub to the other seven nodes, placing it at /home/centos/.ssh/authorized_keys

```shell
# rename (on hadoop1)
[centos@hadoop1 /home/centos/.ssh]$ mv id_rsa.pub id_rsa_hadoop1.pub
[centos@hadoop1 /home/centos/.ssh]$ scp id_rsa_hadoop1.pub centos@hadoop2:/home/centos/.ssh/authorized_keys
[centos@hadoop1 /home/centos/.ssh]$ scp id_rsa_hadoop1.pub centos@hadoop3:/home/centos/.ssh/authorized_keys
[centos@hadoop1 /home/centos/.ssh]$ scp id_rsa_hadoop1.pub centos@hadoop4:/home/centos/.ssh/authorized_keys
[centos@hadoop1 /home/centos/.ssh]$ scp id_rsa_hadoop1.pub centos@hadoop5:/home/centos/.ssh/authorized_keys
[centos@hadoop1 /home/centos/.ssh]$ scp id_rsa_hadoop1.pub centos@hadoop6:/home/centos/.ssh/authorized_keys
[centos@hadoop1 /home/centos/.ssh]$ scp id_rsa_hadoop1.pub centos@hadoop7:/home/centos/.ssh/authorized_keys
[centos@hadoop1 /home/centos/.ssh]$ scp id_rsa_hadoop1.pub centos@hadoop8:/home/centos/.ssh/authorized_keys
```
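The seven scp commands above differ only in the target host, so they can also be written as a loop. A dry-run sketch (echo prints each command instead of executing it, since this snippet doesn't assume the key file or the hosts exist):

```shell
# Print the scp command for each target node; replace echo with the real
# command to actually copy the key.
for i in $(seq 2 8); do
  echo "scp id_rsa_hadoop1.pub centos@hadoop$i:/home/centos/.ssh/authorized_keys"
done
```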
- On hadoop2, generate a key pair. To distinguish it from hadoop1's public key, rename it id_rsa_hadoop2.pub. Append it to ~/.ssh/authorized_keys on hadoop1, then distribute the combined file to the other seven nodes

```shell
# generate the key pair (on hadoop2)
[centos@hadoop2 /home/centos]$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# rename
[centos@hadoop2 /home/centos/.ssh]$ mv id_rsa.pub id_rsa_hadoop2.pub
# append the public key (copied over to hadoop1) to ~/.ssh/authorized_keys
[centos@hadoop1 /home/centos/.ssh]$ cat id_rsa_hadoop2.pub >> authorized_keys
# distribute authorized_keys to the other nodes
[centos@hadoop1 /home/centos/.ssh]$ scp authorized_keys centos@hadoopN:/home/centos/.ssh/
... repeat for each of the other nodes
```
Disable the firewall
To make sure the cluster starts cleanly, first disable the firewall on every node. The relevant commands (CentOS 6 and earlier use the iptables service; firewalld only exists from CentOS 7 on):

```shell
# CentOS 6.5 and earlier (iptables service)
$> sudo service iptables stop       # stop the service
$> sudo service iptables start      # start the service
$> sudo service iptables status     # check status
$> sudo chkconfig iptables on       # enable at boot
$> sudo chkconfig iptables off      # disable at boot

# CentOS 7 (firewalld)
$> sudo systemctl stop firewalld.service      # stop the firewall
$> sudo systemctl start firewalld.service     # start the firewall
$> sudo systemctl status firewalld.service    # check firewall status
$> sudo systemctl enable firewalld.service    # enable at boot
$> sudo systemctl disable firewalld.service   # disable at boot
```
Two batch scripts
Tip: place both scripts in /usr/local/bin so they are available everywhere. They only need to be set up on hadoop1 and hadoop2.

```shell
# create xcall.sh as the local user (centos)
$> touch ~/xcall.sh
# move it to /usr/local/bin
$> sudo mv ~/xcall.sh /usr/local/bin
# make it executable
$> sudo chmod a+x /usr/local/bin/xcall.sh
# edit the script
$> sudo nano /usr/local/bin/xcall.sh
```
Batch command script (xcall.sh): runs the same command on all 8 nodes

```bash
#!/bin/bash
# run the given command on hadoop1..hadoop8 over ssh
params=$@
for (( i = 1 ; i <= 8 ; i = $i + 1 )) ; do
    echo ============= hadoop$i $params =============
    ssh hadoop$i "$params"
done
```
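A dry run of the loop shows what the script does without touching the network (ssh replaced by a comment; `jps` is just an example argument, as in `xcall.sh jps`):

```shell
# Dry-run sketch of xcall.sh's loop: print the banner for each node
# instead of executing the command over ssh.
params="jps"
for (( i = 1 ; i <= 8 ; i = i + 1 )) ; do
  echo "============= hadoop$i $params ============="
  # ssh hadoop$i "$params" would run here
done
```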
Batch sync script (xsync.sh), similar to the scp command:

```bash
#!/bin/bash
# sync a file or directory to the same absolute path on hadoop1..hadoop8
if [[ $# -lt 1 ]] ; then
    echo no params
    exit
fi
p=$1
#echo p=$p
dir=`dirname $p`
#echo dir=$dir
filename=`basename $p`
#echo filename=$filename
cd $dir
fullpath=`pwd -P .`
#echo fullpath=$fullpath
user=`whoami`
for (( i = 1 ; i <= 8 ; i = $i + 1 )) ; do
    echo ======= hadoop$i =======
    rsync -lr $p ${user}@hadoop$i:$fullpath
done
```
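The path handling in xsync.sh can be checked in isolation: `dirname`/`basename` split the argument, and `pwd -P` resolves the containing directory to a physical absolute path. A standalone sketch using a throwaway file under /tmp (the path is hypothetical, chosen only for the demo):

```shell
# Standalone check of xsync.sh's path resolution.
mkdir -p /tmp/xsync-demo
touch /tmp/xsync-demo/example.conf
p=/tmp/xsync-demo/example.conf
dir=$(dirname "$p")        # containing directory
filename=$(basename "$p")  # file name only
cd "$dir"
fullpath=$(pwd -P)         # physical absolute path, symlinks resolved
echo "$fullpath/$filename"
```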
Cluster setup
Installing the JDK
- Get jdk-8u65-linux-x64.tar.gz and upload it to /home/centos/localsoft on hadoop1; this directory holds all the installation packages.
- Create a soft directory under the root (/) and change its owner and group to centos; all software will be installed under it.
```shell
# create the soft directory
[centos@hadoop1 /home/centos]$ sudo mkdir /soft
# change ownership (use your own user and group)
[centos@hadoop1 /home/centos]$ sudo chown centos:centos /soft
```
- Extract jdk-8u65-linux-x64.tar.gz into /soft and create a symbolic link

```shell
# extract from /home/centos/localsoft into /soft
[centos@hadoop1 /home/centos/localsoft]$ tar -xzvf jdk-8u65-linux-x64.tar.gz -C /soft
# create the symbolic link
[centos@hadoop1 /soft]$ ln -s /soft/jdk1.8.0_65 jdk
```
- Configure the environment variables in /etc/profile, then `source /etc/profile` to apply them immediately

```shell
# edit profile
[centos@hadoop1 /home/centos]$ sudo nano /etc/profile
# environment variables:
# jdk
export JAVA_HOME=/soft/jdk
export PATH=$PATH:$JAVA_HOME/bin
# apply immediately
[centos@hadoop1 /home/centos]$ source /etc/profile
```
- Verify the installation

```shell
[centos@hadoop1 /home/centos]$ java -version
# output:
java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
```
- Configure the other hosts (hadoop2–hadoop8) the same way; the batch sync script (xsync.sh) can distribute the files to the other seven nodes.
Installing and configuring Hadoop (manual HA)
1. Hadoop installation
- Get hadoop-2.7.6.tar.gz, extract it into /soft, and create a symbolic link

```shell
# extract from /home/centos/localsoft into /soft
[centos@hadoop1 /home/centos/localsoft]$ tar -xzvf hadoop-2.7.6.tar.gz -C /soft
# create the symbolic link
[centos@hadoop1 /soft]$ ln -s /soft/hadoop-2.7.6 hadoop
```
- Configure the environment variables in /etc/profile, `source /etc/profile` to apply them, then run `hadoop version` to check the installation

```shell
# edit profile
[centos@hadoop1 /home/centos]$ sudo nano /etc/profile
# environment variables:
# hadoop
export HADOOP_HOME=/soft/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# apply immediately
[centos@hadoop1 /home/centos]$ source /etc/profile
# check the installation
[centos@hadoop1 /home/centos]$ hadoop version
# output:
Hadoop 2.7.6
Subversion https://shv@git-wip-us.apache.org/repos/asf/hadoop.git -r 085099c66cf28be31604560c376fa282e69282b8
Compiled by kshvachk on 2018-04-18T01:33Z
Compiled with protoc 2.5.0
From source with checksum 71e2695531cb3360ab74598755d036
This command was run using /soft/hadoop-2.7.6/share/hadoop/common/hadoop-common-2.7.6.jar
```
Note: everything so far is done on hadoop1. Don't install or configure the other nodes yet; once the configuration is finished it will be pushed to all of them at once, which saves a great deal of work.
2. Manual NameNode HA setup
This sets up Hadoop's native NameNode HA. It will later be integrated with the ZooKeeper cluster for automatic failover (Yarn + NameNode).
- In /soft/hadoop/etc, copy the hadoop directory to full, ha, and pesudo, then create a symbolic link named hadoop pointing at ha

```shell
[centos@hadoop1 /soft/hadoop/etc]$ cp -r hadoop ha
[centos@hadoop1 /soft/hadoop/etc]$ cp -r hadoop full
[centos@hadoop1 /soft/hadoop/etc]$ cp -r hadoop pesudo
# create the symbolic link (remove or rename the original hadoop
# directory first, otherwise the link is created inside it)
[centos@hadoop1 /soft/hadoop/etc]$ ln -s /soft/hadoop/etc/ha hadoop
```
- In the ha directory, configure four files: core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml

[core-site.xml]
```xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
    </property>
    <!-- new local data directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/centos/hadoop</value>
    </property>
    <property>
        <name>ipc.client.connect.max.retries</name>
        <value>20</value>
    </property>
    <property>
        <name>ipc.client.connect.retry.interval</name>
        <value>5000</value>
    </property>
</configuration>
```
[hdfs-site.xml]

```xml
<configuration>
    <!-- nameservice -->
    <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
    </property>
    <!-- the two NameNode ids under mycluster -->
    <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
    </property>
    <!-- RPC address of each NameNode -->
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>hadoop1:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>hadoop2:8020</value>
    </property>
    <!-- web UI ports -->
    <property>
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>hadoop1:50070</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.mycluster.nn2</name>
        <value>hadoop2:50070</value>
    </property>
    <!-- shared edits directory of the NameNodes -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://hadoop3:8485;hadoop4:8485;hadoop5:8485;hadoop6:8485;hadoop7:8485;hadoop8:8485/mycluster</value>
    </property>
    <!-- Java class clients use to determine which NameNode is active -->
    <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
```