I. Fully Distributed Cluster (Single Master)
Official Hadoop site: http://hadoop.apache.org/
1 Prepare Three Machines
1.1 Firewall, Static IP, Hostname
Disabling the firewall and configuring static IPs and hostnames are omitted here; see "Linux之CentOS7.5安裝及克隆" (CentOS 7.5 installation and cloning).
1.2 Edit the hosts File
We want the three hosts to be able to reach one another by hostname rather than by IP, so each host needs entries for the other hosts in its hosts file. Add the following to /etc/hosts on every host:
[root@hadoop0 ~]# vi /etc/hosts
# host entries
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.100.21 hadoop0
192.168.100.22 hadoop1
192.168.100.23 hadoop2
Copy the file to the other hosts (or make the same change on each host directly):
[root@hadoop0 ~]# scp -r /etc/hosts root@hadoop1:/etc/
[root@hadoop0 ~]# scp -r /etc/hosts root@hadoop2:/etc/
Test:
[root@hadoop0 ~]# ping hadoop0
[root@hadoop0 ~]# ping hadoop1
[root@hadoop0 ~]# ping hadoop2
1.3 Add a User Account
On every host, create a hadoop account to run Hadoop, and add it to sudoers.
[root@hadoop0 ~]# useradd hadoop    # add the user
[root@hadoop0 ~]# passwd hadoop     # set the password for user hadoop (entered interactively)
hadoop    # the password
# passwd: all authentication tokens updated successfully.
Give the hadoop user root privileges: edit /etc/sudoers, find the section below, and add a line for hadoop under the root line:
[root@hadoop0 ~]# vim /etc/sudoers
# find this section
## Allow root to run any commands anywhere
root    ALL=(ALL)   ALL
hadoop  ALL=(ALL)   ALL
Save and exit with :wq!. You can now switch to the hadoop account with su hadoop; because it is in sudoers, it can run privileged commands through sudo.
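Editing /etc/sudoers directly with vim works, but visudo is the safer option because it validates the file's syntax before saving (a suggestion, not part of the original steps):
[root@hadoop0 ~]# visudo    # opens /etc/sudoers and checks syntax on save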
1.4 Create a Directory under /
1) As root, create the hadoop directory:
[root@hadoop0 /]# cd /
[root@hadoop0 /]# mkdir hadoop
2) Change the owner of the hadoop directory:
[root@hadoop0 /]# chown -R hadoop:hadoop /hadoop
3) Check the owner of the hadoop directory:
[root@hadoop0 /]# ll
drwxr-xr-x. 4 hadoop hadoop 46 May 10 16:56 hadoop
2 Install and Configure JDK 1.8
[root@hadoop0 ~]# rpm -qa | grep java    # check whether Java is already installed
Here the JDK tarball downloaded from the official site has been uploaded to /hadoop on the server.
[root@hadoop0 hadoop]# tar -zxvf jdk-8u181-linux-x64.tar.gz
[root@hadoop0 hadoop]# mv jdk1.8.0_181 jdk1.8
Set JAVA_HOME:
[root@hadoop0 hadoop]# vim /etc/profile
# append at the end of the file
#JAVA
export JAVA_HOME=/hadoop/jdk1.8
export JRE_HOME=/hadoop/jdk1.8/jre
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
[root@hadoop0 hadoop]# source /etc/profile    # reload the profile so the changes take effect
# check that the configuration works
[hadoop@hadoop0 /]$ java -version
java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
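As an extra sanity check (not in the original steps), the environment variables themselves can be inspected:
[hadoop@hadoop0 /]$ echo $JAVA_HOME    # should print /hadoop/jdk1.8
[hadoop@hadoop0 /]$ which java         # should resolve under /hadoop/jdk1.8/bin if no other JDK is installed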
3 Install the Hadoop Cluster
3.1 Cluster Deployment Plan
| Node    | NN1      | NN2               | DN       | RM              | NM          |
| ------- | -------- | ----------------- | -------- | --------------- | ----------- |
| hadoop0 | NameNode |                   | DataNode | ResourceManager | NodeManager |
| hadoop1 |          | SecondaryNameNode | DataNode |                 | NodeManager |
| hadoop2 |          |                   | DataNode |                 | NodeManager |
3.2 Set Up Passwordless SSH
Passwordless SSH must be configured between every pair of hosts, and from each host to itself. These steps can be run as the hadoop user; when they are done, the public key is at /home/hadoop/.ssh/id_rsa.pub.
[hadoop@hadoop0 ~]$ ssh-keygen -t rsa
[hadoop@hadoop0 ~]$ ssh-copy-id hadoop0
[hadoop@hadoop0 ~]$ ssh-copy-id hadoop1
[hadoop@hadoop0 ~]$ ssh-copy-id hadoop2
The NameNode hosts also need passwordless SSH to each other (this is what HDFS HA requires), so repeat the steps on hadoop1:
[hadoop@hadoop1 ~]$ ssh-keygen -t rsa
[hadoop@hadoop1 ~]$ ssh-copy-id hadoop1
[hadoop@hadoop1 ~]$ ssh-copy-id hadoop0
[hadoop@hadoop1 ~]$ ssh-copy-id hadoop2
The YARN hosts likewise need passwordless SSH to each other (this is what YARN HA requires), so repeat the steps on hadoop2:
[hadoop@hadoop2 ~]$ ssh-keygen -t rsa
[hadoop@hadoop2 ~]$ ssh-copy-id hadoop2
[hadoop@hadoop2 ~]$ ssh-copy-id hadoop0
[hadoop@hadoop2 ~]$ ssh-copy-id hadoop1
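To confirm the keys took effect, an ssh from any node to any other should log in without prompting for a password (a quick check, not in the original steps):
[hadoop@hadoop0 ~]$ ssh hadoop1 hostname    # should print hadoop1 with no password prompt
[hadoop@hadoop0 ~]$ ssh hadoop2 hostname    # should print hadoop2 with no password prompt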
3.3 Extract and Install Hadoop
[hadoop@hadoop0 hadoop]$ tar -zxvf hadoop-2.7.7.tar.gz
4 Configure the Hadoop Cluster
Note: the configuration files are under hadoop-2.7.7/etc/hadoop/.
4.1 Edit core-site.xml
[hadoop@hadoop0 hadoop]$ vi core-site.xml
<configuration>
<!-- Address of the HDFS NameNode -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop0:9000</value>
</property>
<!-- Directory for files Hadoop generates at runtime; note that this tmp directory must be created -->
<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop/hadoop-2.7.7/tmp</value>
</property>
</configuration>
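The tmp directory referenced by hadoop.tmp.dir does not exist in a fresh extract, so create it (the path is the value configured above):
[hadoop@hadoop0 ~]$ mkdir -p /hadoop/hadoop-2.7.7/tmp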
4.2 Edit hadoop-env.sh
[hadoop@hadoop0 hadoop]$ vi hadoop-env.sh
Change the JAVA_HOME line to:
export JAVA_HOME=/hadoop/jdk1.8
4.3 Edit hdfs-site.xml
[hadoop@hadoop0 hadoop]$ vi hdfs-site.xml
<configuration>
<!-- HDFS replication factor; defaults to 3 if not set -->
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<!-- NameNode metadata directory -->
<property>
<name>dfs.name.dir</name>
<value>/hadoop/hadoop-2.7.7/dfs/name</value>
</property>
<!-- DataNode data directory -->
<property>
<name>dfs.data.dir</name>
<value>/hadoop/hadoop-2.7.7/dfs/data</value>
</property>
<!-- SecondaryNameNode HTTP address and port -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop1:50090</value>
</property>
</configuration>
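A side note: in Hadoop 2.x, dfs.name.dir and dfs.data.dir still work but are deprecated aliases; the current property names are dfs.namenode.name.dir and dfs.datanode.data.dir, so the same paths could equivalently be written as:
<property>
    <name>dfs.namenode.name.dir</name>
    <value>/hadoop/hadoop-2.7.7/dfs/name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>/hadoop/hadoop-2.7.7/dfs/data</value>
</property>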
4.4 Edit slaves
[hadoop@hadoop0 hadoop]$ vi slaves
hadoop0
hadoop1
hadoop2
4.5 Edit mapred-site.xml
[hadoop@hadoop0 hadoop]$ mv mapred-site.xml.template mapred-site.xml
[hadoop@hadoop0 hadoop]$ vi mapred-site.xml
<configuration>
<!-- Run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
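Optionally, the MapReduce JobHistory server can also be configured in this file; a minimal sketch assuming it will run on hadoop0 (this is an addition, not part of the original steps; 10020 and 19888 are the default ports):
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop0:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop0:19888</value>
</property>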
4.6 Edit yarn-site.xml
[hadoop@hadoop0 hadoop]$ vi yarn-site.xml
<configuration>
<!-- How reducers fetch data (shuffle service) -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Hostname of the YARN ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop0</value>
</property>
</configuration>
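Optionally (not part of the original steps), YARN log aggregation can be enabled here so that container logs are collected into HDFS after an application finishes, which makes debugging jobs easier:
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>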
4.7 Distribute Hadoop to the Other Nodes
[hadoop@hadoop0 /]$ scp -r hadoop/ hadoop@hadoop1:$PWD
[hadoop@hadoop0 /]$ scp -r hadoop/ hadoop@hadoop2:$PWD
4.8 Configure Environment Variables
[hadoop@hadoop0 ~]$ sudo vim /etc/profile
Append at the end:
export HADOOP_HOME=/hadoop/hadoop-2.7.7
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Reload so the changes take effect:
[hadoop@hadoop0 ~]$ source /etc/profile
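A quick check that the variables are picked up (not in the original steps):
[hadoop@hadoop0 ~]$ hadoop version    # should report Hadoop 2.7.7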
5 Start and Verify the Cluster
5.1 Start the Cluster
If this is the first time the cluster is started, the NameNode must be formatted:
[hadoop@hadoop0 hadoop-2.7.7]$ hdfs namenode -format

Start HDFS:
[hadoop@hadoop0 ~]# start-dfs.sh
Starting namenodes on [hadoop0]
hadoop0: starting namenode, logging to /hadoop/hadoop-2.7.7/logs/hadoop-hadoop-namenode-hadoop0.out
hadoop0: starting datanode, logging to /hadoop/hadoop-2.7.7/logs/hadoop-hadoop-datanode-hadoop0.out
hadoop2: starting datanode, logging to /hadoop/hadoop-2.7.7/logs/hadoop-hadoop-datanode-hadoop2.out
hadoop1: starting datanode, logging to /hadoop/hadoop-2.7.7/logs/hadoop-hadoop-datanode-hadoop1.out
Starting secondary namenodes [hadoop1]
hadoop1: starting secondarynamenode, logging to /hadoop/hadoop-2.7.7/logs/hadoop-hadoop-secondarynamenode-hadoop1.out
Start YARN. Note: if the NameNode and the ResourceManager are not on the same machine, do not start YARN on the NameNode host; start it on the host where the ResourceManager runs. In this plan both run on hadoop0:
[hadoop@hadoop0 ~]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /hadoop/hadoop-2.7.7/logs/yarn-hadoop-resourcemanager-hadoop0.out
hadoop2: starting nodemanager, logging to /hadoop/hadoop-2.7.7/logs/yarn-hadoop-nodemanager-hadoop2.out
hadoop1: starting nodemanager, logging to /hadoop/hadoop-2.7.7/logs/yarn-hadoop-nodemanager-hadoop1.out
hadoop0: starting nodemanager, logging to /hadoop/hadoop-2.7.7/logs/yarn-hadoop-nodemanager-hadoop0.out
Check the running processes with jps:
[hadoop@hadoop0 hadoop-2.7.7]$ jps
2162 NodeManager
2058 ResourceManager
2458 Jps
1692 NameNode
1820 DataNode
[hadoop@hadoop1 hadoop]$ jps
1650 Jps
1379 DataNode
1478 SecondaryNameNode
1549 NodeManager
[hadoop@hadoop2 hadoop]$ jps
1506 DataNode
1611 NodeManager
1711 Jps
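With all daemons running, the web UIs should also be reachable (by default in Hadoop 2.x the NameNode UI is at http://hadoop0:50070 and the ResourceManager UI at http://hadoop0:8088), and a sample job from the examples jar bundled with the distribution makes a quick end-to-end test; a sketch, not part of the original steps:
[hadoop@hadoop0 hadoop-2.7.7]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar pi 2 10
# runs the pi estimator with 2 map tasks and 10 samples per map; it completes only if HDFS and YARN are both healthy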
5.2 Ways to Start and Stop Hadoop
1) Start or stop individual daemons one at a time:
   HDFS daemons: hadoop-daemon.sh start|stop namenode|datanode|secondarynamenode
   YARN daemons: yarn-daemon.sh start|stop resourcemanager|nodemanager
2) Start or stop each module as a whole (passwordless SSH is a prerequisite; this is the usual way):
   start-dfs.sh | stop-dfs.sh
   start-yarn.sh | stop-yarn.sh
3) Start or stop everything at once (not recommended):
   start-all.sh | stop-all.sh
5.3 Cluster Time Synchronization
Check whether ntp is installed:
[hadoop@hadoop0 hadoop-2.7.7]$ ntp
bash: ntp: command not found
[hadoop@hadoop0 hadoop-2.7.7]$ sudo yum install ntp    # install ntp
[hadoop@hadoop0 hadoop-2.7.7]$ sudo ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime    # set the system time zone to Asia/Shanghai
[hadoop@hadoop0 hadoop-2.7.7]$ sudo ntpdate time.windows.com    # sync the clock against a public NTP server
12 May 12:33:09 ntpdate[2571]: adjust time server 13.70.22.122 offset 0.029582 sec
[hadoop@hadoop0 hadoop-2.7.7]$ date    # check that the time was synced
Sun May 12 12:33:16 CST 2019
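A one-off ntpdate only corrects the clock once; to keep the three nodes from drifting apart over time, one option (an addition, not in the original steps) is a periodic re-sync from root's crontab on every node:
[hadoop@hadoop0 ~]$ sudo crontab -e
# add, for example:
*/30 * * * * /usr/sbin/ntpdate time.windows.com > /dev/null 2>&1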

