一、Hadoop Environment Preparation
1. Cluster Planning
Hostname  | IP         | HDFS                        | YARN
hadoop102 | 10.0.0.102 | NameNode, DataNode          | NodeManager
hadoop103 | 10.0.0.103 | DataNode, SecondaryNameNode | NodeManager, ResourceManager
hadoop104 | 10.0.0.104 | DataNode                    | NodeManager
#1. Notes:
1) Do not install the NameNode and the SecondaryNameNode on the same server.
2) The ResourceManager also consumes a lot of memory; do not place it on the same machine as the NameNode or the SecondaryNameNode.
#2. Configuration File Overview
Hadoop configuration files come in two kinds: default configuration files and custom (site) configuration files. Only when you want to override a default value do you need to edit the corresponding custom file.
1) Default configuration files:
Default file          | Location inside the Hadoop jars
[core-default.xml]    | hadoop-common-3.3.1.jar/core-default.xml
[hdfs-default.xml]    | hadoop-hdfs-3.3.1.jar/hdfs-default.xml
[yarn-default.xml]    | hadoop-yarn-common-3.3.1.jar/yarn-default.xml
[mapred-default.xml]  | hadoop-mapreduce-client-core-3.3.1.jar/mapred-default.xml
2) Custom configuration files:
core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml live under $HADOOP_HOME/etc/hadoop; edit them there as the project requires.
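Once Hadoop is installed (section 五 below), a default file can be read straight out of its jar to check a default value; a hedged example, assuming the standard layout of the 3.3.1 binary distribution:
[delopy@hadoop102 ~]$ unzip -p $HADOOP_HOME/share/hadoop/common/hadoop-common-3.3.1.jar core-default.xml | less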
2. Configure Host Name Mapping
#1. Edit the hosts file on hadoop102
[root@hadoop102 ~]# vim /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.0.0.102 hadoop102
10.0.0.103 hadoop103
10.0.0.104 hadoop104
#2. Copy the hosts file from hadoop102 to hadoop103
[root@hadoop102 ~]# scp /etc/hosts root@hadoop103:/etc/hosts
root@hadoop103's password:
hosts 100% 222 1.5KB/s 00:00
#3. Copy the hosts file from hadoop102 to hadoop104
[root@hadoop102 ~]# scp /etc/hosts root@hadoop104:/etc/hosts
root@hadoop104's password:
hosts 100% 222 108.8KB/s 00:00
3. Create the Deployment User
#1. Create the delopy user on every node
[root@hadoop102 ~]# useradd delopy
[root@hadoop103 ~]# useradd delopy
[root@hadoop104 ~]# useradd delopy
#2. Grant sudo privileges
[root@hadoop102 ~]# vim /etc/sudoers
## Allow root to run any commands anywhere
root ALL=(ALL) ALL
delopy ALL=(ALL) ALL
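Optional sanity check: since /etc/sudoers is edited by hand here, visudo can confirm the syntax is valid before the file is copied to the other nodes (a hedged suggestion, not part of the original procedure):
[root@hadoop102 ~]# visudo -c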
#3. Copy the sudoers file to hadoop103
[root@hadoop102 ~]# scp /etc/sudoers root@hadoop103:/etc/sudoers
root@hadoop103's password:
sudoers 100% 4356 1.0MB/s 00:00
#4. Copy the sudoers file to hadoop104
[root@hadoop102 ~]# scp /etc/sudoers root@hadoop104:/etc/sudoers
root@hadoop104's password:
sudoers 100% 4356 769.0KB/s 00:00
#5. Create program and data directories
[root@hadoop102 ~]# mkdir /data/
[root@hadoop102 ~]# mkdir /opt/module
[root@hadoop102 ~]# chown -R delopy.delopy /data/
[root@hadoop102 ~]# chown -R delopy.delopy /opt/module/
二、Passwordless SSH Login
1. Generate a Key Pair (all machines)
#1. Set a password for the delopy user
[root@hadoop102 ~]# passwd delopy
Changing password for user delopy.
New password:
BAD PASSWORD: The password is shorter than 8 characters
Retype new password:
passwd: all authentication tokens updated successfully.
#2. Switch to the delopy user
[root@hadoop102 ~]# su delopy
#3. Generate a key pair; just press Enter at every prompt
[delopy@hadoop102 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/delopy/.ssh/id_rsa):
Created directory '/home/delopy/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/delopy/.ssh/id_rsa.
Your public key has been saved in /home/delopy/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:8gV808AJIHCQTE8uEkUPuCn16A8IuSQrMfQSf2CBBEc delopy@hadoop102
The key's randomart image is:
+---[RSA 2048]----+
|=OE== ...o.. |
|.*+X . . oo |
|o=*o= o o . |
|X+.+.. o . |
|=B. . . S . |
|= o o . |
|. o . |
| . |
| |
+----[SHA256]-----+
2. Inspect the Keys (all machines)
#1. List the generated key pair
[delopy@hadoop102 ~]$ cd ~/.ssh
[delopy@hadoop102 ~/.ssh]$ ll
total 8
-rw------- 1 delopy delopy 1679 2021-08-31 14:59 id_rsa
-rw-r--r-- 1 delopy delopy 398 2021-08-31 14:59 id_rsa.pub
#2. View the public keys
[root@hadoop102 ~/.ssh]# cat id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCebmCdoFk9XrT5AVoNJFlhwoYArJY80BU9JyNwwXziR6NjuTrS4pzENBwx/Lbq0/qMI/PdZMMdiBYhpZTL/DkZyDoRf+2zRzPNQUMvTrK3bjIH4CAs3L7qSrkGICeaWQ9PIJwaRqF2yPS16qFTnq8aAimz08UiGzLfhGUHiEA+QF8usoe3titLXQ9fguRxyCfigdCEeq+xhPVuDpXCNoi6Woh4mnegGoVtJWgguFG0DU1gfUGckl0oKHM4ZbVBaQWTmQjHUKgvwwlXAO4gZ3qkVcGzMxfcc0P/OMqojYEbD5n/RFiMbN8ylCJt6QjOj23NzTG/LTNFFRbDfbLRhhm1 root@hadoop102
[delopy@hadoop102 ~/.ssh]$ cat id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC7WKwWyb3lliFTQ1HPxZS63NvFLPYYiLovVhspMCWkiRKrgIGXB++tBRi2vJvLpLyMOpJVRc0hIUD2ycBgHuWLtWYNqma/1xzeIu67OrsK+v8+CeTCzqZ97DPp881Uu+4SoVQOkla7evpH40DOibvKd7SN8L7Mk+PEsVCeIrNyA/g2iZ9+M+XWaZIIYJb15QRPZLcgj1GHcR0cf6DtuTt26pCVimSYJ8DOYNNfHfwWKyJfBKKaQUX3ByYDbKIIH+yw3VbLgyU3v9oseYCA5psqeuD0YLuERrr45rydNRL7/oeoW2NicHSG2V1H6KBQBq861HcdbmcE2nbZtWrAsKpv delopy@hadoop102
[delopy@hadoop103 ~/.ssh]$ cat id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDLlKkbKYyIpUpwYLRBIqLhhU2YYb9o1dafpNwR8IkIj6rDBc2OzD1fqdzSQSpHX8LXShDTv2nr4R++SG1MabwqJ4q7JKwmZRSjuy/flQK0uhtSW6rPNqZX3P8Tl8rSqUMInOwwna9qCZTI8gajPrXRHAJ+oKRWWtGQ3M6t6larC4tXSoFQ4nBkPEgXUFnYphX1mYJiD0QduUXZwK7IMzFXPP/SkW+PddepFlsV2gTf2xCsLh7RHhsh0zWThkJGqLb6nPbIjOydQ84C3Z5DusAxOqlvuQk2FKpOQrB0dAgtHog7Oc/1vJqAMRe6MPdzaExl+OIEW2Xh8jJf9JWSkcs3 delopy@hadoop103
[delopy@hadoop104 ~/.ssh]$ cat id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC2EXMXB9V4f86vRhD2cHhZEd+gqatotEy9HkwKfajelPgH1KcD4jepM7h+RmutGj+QfCSE/fj56GuebjHFJmB8eB1X5wZ0B3lBbz+KV/bNB7IAHvEWn7KG6nkdkzT47zLJrWVY6zxS0BMW86WF4wNGeyHq4R3XZnRxEW/LJ/ZjENpJkh7X2Om2H6d+tq8WjBSCvlidSB8WlG+OAnLxk/rVUaUdRmBTXqBUhcWqIsD+vMaa/rESxvXbrn/0pl83ZVguRpbNPHbpEPvUujBn/FPSvwv0DN9JEB+v+AzOQADJvT+2mDI/FDzCPpashoeSN31p1vdgXJUQEsBaIlxrm94H delopy@hadoop104
3. Configure Passwordless SSH (all machines)
#1. Create an authorized_keys file and add all of the public keys to it
[delopy@hadoop102 ~/.ssh]$ vim authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCebmCdoFk9XrT5AVoNJFlhwoYArJY80BU9JyNwwXziR6NjuTrS4pzENBwx/Lbq0/qMI/PdZMMdiBYhpZTL/DkZyDoRf+2zRzPNQUMvTrK3bjIH4CAs3L7qSrkGICeaWQ9PIJwaRqF2yPS16qFTnq8aAimz08UiGzLfhGUHiEA+QF8usoe3titLXQ9fguRxyCfigdCEeq+xhPVuDpXCNoi6Woh4mnegGoVtJWgguFG0DU1gfUGckl0oKHM4ZbVBaQWTmQjHUKgvwwlXAO4gZ3qkVcGzMxfcc0P/OMqojYEbD5n/RFiMbN8ylCJt6QjOj23NzTG/LTNFFRbDfbLRhhm1 root@hadoop102
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC7WKwWyb3lliFTQ1HPxZS63NvFLPYYiLovVhspMCWkiRKrgIGXB++tBRi2vJvLpLyMOpJVRc0hIUD2ycBgHuWLtWYNqma/1xzeIu67OrsK+v8+CeTCzqZ97DPp881Uu+4SoVQOkla7evpH40DOibvKd7SN8L7Mk+PEsVCeIrNyA/g2iZ9+M+XWaZIIYJb15QRPZLcgj1GHcR0cf6DtuTt26pCVimSYJ8DOYNNfHfwWKyJfBKKaQUX3ByYDbKIIH+yw3VbLgyU3v9oseYCA5psqeuD0YLuERrr45rydNRL7/oeoW2NicHSG2V1H6KBQBq861HcdbmcE2nbZtWrAsKpv delopy@hadoop102
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDLlKkbKYyIpUpwYLRBIqLhhU2YYb9o1dafpNwR8IkIj6rDBc2OzD1fqdzSQSpHX8LXShDTv2nr4R++SG1MabwqJ4q7JKwmZRSjuy/flQK0uhtSW6rPNqZX3P8Tl8rSqUMInOwwna9qCZTI8gajPrXRHAJ+oKRWWtGQ3M6t6larC4tXSoFQ4nBkPEgXUFnYphX1mYJiD0QduUXZwK7IMzFXPP/SkW+PddepFlsV2gTf2xCsLh7RHhsh0zWThkJGqLb6nPbIjOydQ84C3Z5DusAxOqlvuQk2FKpOQrB0dAgtHog7Oc/1vJqAMRe6MPdzaExl+OIEW2Xh8jJf9JWSkcs3 delopy@hadoop103
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC2EXMXB9V4f86vRhD2cHhZEd+gqatotEy9HkwKfajelPgH1KcD4jepM7h+RmutGj+QfCSE/fj56GuebjHFJmB8eB1X5wZ0B3lBbz+KV/bNB7IAHvEWn7KG6nkdkzT47zLJrWVY6zxS0BMW86WF4wNGeyHq4R3XZnRxEW/LJ/ZjENpJkh7X2Om2H6d+tq8WjBSCvlidSB8WlG+OAnLxk/rVUaUdRmBTXqBUhcWqIsD+vMaa/rESxvXbrn/0pl83ZVguRpbNPHbpEPvUujBn/FPSvwv0DN9JEB+v+AzOQADJvT+2mDI/FDzCPpashoeSN31p1vdgXJUQEsBaIlxrm94H delopy@hadoop104
#2. Change the file permissions to 600
[delopy@hadoop102 ~/.ssh]$ chmod 600 authorized_keys
#3. Files under ~/.ssh and what they are for
known_hosts      public keys of hosts that ssh has connected to
id_rsa           the generated private key
id_rsa.pub       the generated public key
authorized_keys  public keys authorized for passwordless login
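An alternative to building authorized_keys by hand: ssh-copy-id appends the local public key to the target user's authorized_keys and fixes the permissions for you. Run it from every machine toward every node (passwords are still required at this point):
[delopy@hadoop102 ~]$ ssh-copy-id hadoop102
[delopy@hadoop102 ~]$ ssh-copy-id hadoop103
[delopy@hadoop102 ~]$ ssh-copy-id hadoop104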
4. Test Passwordless SSH (all machines)
#1. ssh into hadoop102 without a password
[delopy@hadoop102 ~/.ssh]$ ssh hadoop102
The authenticity of host 'hadoop102 (10.0.0.102)' can't be established.
ECDSA key fingerprint is SHA256:g6buQ4QMSFl+5MMAh8dTCmLtkIfdT8sgRFYc6uCzV3c.
ECDSA key fingerprint is MD5:5f:d7:ad:07:e8:fe:d2:49:ec:79:2f:d4:91:59:c5:03.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hadoop102,10.0.0.102' (ECDSA) to the list of known hosts.
Last login: Tue Aug 31 15:21:35 2021
[delopy@hadoop102 ~]$ logout
Connection to hadoop102 closed.
#2. ssh into hadoop103 without a password
[delopy@hadoop102 ~/.ssh]$ ssh hadoop103
The authenticity of host 'hadoop103 (10.0.0.103)' can't be established.
ECDSA key fingerprint is SHA256:g6buQ4QMSFl+5MMAh8dTCmLtkIfdT8sgRFYc6uCzV3c.
ECDSA key fingerprint is MD5:5f:d7:ad:07:e8:fe:d2:49:ec:79:2f:d4:91:59:c5:03.
Are you sure you want to continue connecting (yes/no)? yes
There were 16 failed login attempts since the last successful login.
Last login: Tue Aug 31 14:58:54 2021
[delopy@hadoop103 ~]$ logout
Connection to hadoop103 closed.
#3. ssh into hadoop104 without a password
[delopy@hadoop102 ~/.ssh]$ ssh hadoop104
The authenticity of host 'hadoop104 (10.0.0.104)' can't be established.
ECDSA key fingerprint is SHA256:g6buQ4QMSFl+5MMAh8dTCmLtkIfdT8sgRFYc6uCzV3c.
ECDSA key fingerprint is MD5:5f:d7:ad:07:e8:fe:d2:49:ec:79:2f:d4:91:59:c5:03.
Are you sure you want to continue connecting (yes/no)? yes
Last failed login: Tue Aug 31 15:12:11 CST 2021 from 10.0.0.102 on ssh:notty
There were 4 failed login attempts since the last successful login.
Last login: Tue Aug 31 15:01:13 2021
[delopy@hadoop104 ~]$ logout
Connection to hadoop104 closed.
三、Writing the Cluster Distribution Script xsync
1. scp (secure copy)
#1. What scp does
scp copies data between servers (from server1 to server2).
#2. Basic syntax
scp -r $pdir/$fname $user@$host:$pdir/$fname
command | -r recurse | source path/name | destination user@host:destination path/name
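For example, pushing a directory from hadoop102 to hadoop103 might look like this (illustration only; the JDK directory is installed later, in section 四):
scp -r /opt/module/jdk1.8.0_131 delopy@hadoop103:/opt/module/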
2. rsync Remote Synchronization Tool
#1. What rsync does
rsync is mainly used for backup and mirroring. It is fast, avoids re-copying identical content, and supports symbolic links.
Difference from scp: rsync only transfers files that differ, so it is faster; scp copies everything every time.
#2. Basic syntax
rsync -av $pdir/$fname $user@$host:$pdir/$fname
command | options | source path/name | destination user@host:destination path/name
Options:
-a  archive mode
-v  verbose, show the copy as it happens
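The equivalent rsync call; on repeated runs only files that actually differ are transferred (illustration only):
rsync -av /opt/module/jdk1.8.0_131 delopy@hadoop103:/opt/module/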
3. Requirements
#1. Requirement: copy files in a loop to the same directory on every node.
#2. Analysis:
1) A plain rsync copy looks like:
rsync -av /opt/module delopy@hadoop103:/opt/
2) Desired usage:
xsync <file or directory to synchronize>
3) The script should work from any directory, so it must live in a directory that is on the PATH:
[delopy@hadoop102 ~]$ echo $PATH
/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
4. Write the xsync Cluster Distribution Script
#1. Create the xsync file under /home/delopy/bin
[delopy@hadoop102 ~]$ mkdir bin
[delopy@hadoop102 ~]$ cd bin/
[delopy@hadoop102 ~/bin]$ vim xsync
#!/bin/bash
#1. Check the number of arguments
if [ $# -lt 1 ]
then
    echo Not Enough Arguments!
    exit
fi
#2. Iterate over every machine in the cluster
for host in hadoop102 hadoop103 hadoop104
do
    echo ==================== $host ====================
    #3. Iterate over the files/directories given and send them one by one
    for file in $@
    do
        #4. Check that the file exists
        if [ -e $file ]
        then
            #5. Get the parent directory
            pdir=$(cd -P $(dirname $file); pwd)
            #6. Get the file name
            fname=$(basename $file)
            ssh $host "mkdir -p $pdir"
            rsync -av $pdir/$fname $host:$pdir
        else
            echo $file does not exist!
        fi
    done
done
#2. Make the xsync script executable
[delopy@hadoop102 ~/bin]$ chmod +x xsync
#3. Test the script
[delopy@hadoop102 ~/bin]$ ./xsync /home/delopy/bin
#4. Configure environment variables
[delopy@hadoop102 ~]$ sudo vim /etc/profile.d/my_env.sh
# RSYNC_HOME
export PATH=/home/delopy/bin:$PATH
# JAVA_HOME
export JAVA_HOME=/opt/module/jdk
export PATH=$PATH:$JAVA_HOME/bin
# HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_HOME/lib/native"
#5. Distribute the environment variable file (owned by root)
[delopy@hadoop102 ~]$ sudo ./bin/xsync /etc/profile.d/my_env.sh
Note: when xsync is run through sudo, give it an explicit path, because sudo does not search the user's own PATH.
#6. Reload the environment variables on every machine and verify
[delopy@hadoop103 ~]$ source /etc/profile
[delopy@hadoop102 ~]$ echo $PATH
/opt/hadoop/bin:/opt/hadoop/sbin:/home/delopy/bin:/home/delopy/bin:/home/delopy/bin/xsync:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/opt/jdk/bin
四、JDK Installation
JDK download (Oracle): https://www.oracle.com
1. Create a Directory for Software Packages
[delopy@hadoop102 ~]$ mkdir /data/software/
[delopy@hadoop102 ~]$ cd /data/software/
2. Upload the JDK Package
[delopy@hadoop102 /data/software]$ rz
[delopy@hadoop102 /data/software]$ ll
total 181192
-rw-r--r-- 1 delopy delopy 185540433 2021-06-16 14:21 jdk-8u131-linux-x64.tar.gz
3. Extract the Package
[delopy@hadoop102 /data/software]$ tar xf jdk-8u131-linux-x64.tar.gz -C /opt/module/
[delopy@hadoop102 /data/software]$ cd /opt/module/
[delopy@hadoop102 /opt/module]$ ll
total 0
drwxr-xr-x 8 delopy delopy 255 2017-03-15 16:35 jdk1.8.0_131
4. Create a Symlink (JAVA_HOME in my_env.sh points to /opt/module/jdk)
[delopy@hadoop102 /data/software]$ cd /opt/module/
[delopy@hadoop102 /opt/module]$ ln -s jdk1.8.0_131 jdk
5. Push the JDK to the Other Machines
[delopy@hadoop102 /opt/module]$ xsync /opt/module/
6. Verify the JDK Version (all machines)
[delopy@hadoop102 /opt/module]$ java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
五、Hadoop Installation
Hadoop download page: https://hadoop.apache.org/releases.html
1. Download the Package
[delopy@hadoop102 ~]$ cd /data/software/
[delopy@hadoop102 /data/software]$ wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
[delopy@hadoop102 /data/software]$ ll
total 772196
-rw-r--r-- 1 delopy delopy 605187279 2021-06-15 17:55 hadoop-3.3.1.tar.gz
2. Extract the Package
[delopy@hadoop102 /data/software]$ tar xf hadoop-3.3.1.tar.gz -C /opt/module/
[delopy@hadoop102 /data/software]$ cd /opt/module/
[delopy@hadoop102 /opt/module]$ ll
total 0
drwxr-xr-x 10 delopy delopy 215 2021-06-15 13:52 hadoop-3.3.1
3. Create a Symlink
[delopy@hadoop102 /opt/module]$ ln -s hadoop-3.3.1 hadoop
[delopy@hadoop102 /opt/module]$ ll
total 0
lrwxrwxrwx 1 delopy delopy 12 2021-09-01 11:43 hadoop -> hadoop-3.3.1
drwxr-xr-x 10 delopy delopy 215 2021-06-15 13:52 hadoop-3.3.1
4. Sync the Hadoop Installation to the Other Machines
[delopy@hadoop102 /opt/module]$ xsync /opt/module/
5. Verify Hadoop (all machines)
[delopy@hadoop102 /opt/module]$ hadoop version
Hadoop 3.3.1
Source code repository https://github.com/apache/hadoop.git -r a3b9c37a397ad4188041dd80621bdeefc46885f2
Compiled by ubuntu on 2021-06-15T05:13Z
Compiled with protoc 3.7.1
From source with checksum 88a4ddb2299aca054416d6b7f81ca55
This command was run using /opt/module/hadoop-3.3.1/share/hadoop/common/hadoop-common-3.3.1.jar
六、Hadoop Cluster Configuration
1. Core Configuration File
[delopy@hadoop102 ~]$ cd /opt/module/hadoop/etc/hadoop/
[delopy@hadoop102 /opt/module/hadoop/etc/hadoop]$ vim core-site.xml
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop102:8020</value>
<description>NameNode address (default file system)</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/data/hadoop/data</value>
<description>Directory where Hadoop stores its data</description>
</property>
<property>
<name>hadoop.http.staticuser.user</name>
<value>delopy</value>
<description>Static user for the HDFS web UI (delopy)</description>
</property>
</configuration>
2. HDFS Configuration File
[delopy@hadoop102 /opt/module/hadoop/etc/hadoop]$ vim hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.namenode.http-address</name>
<value>hadoop102:9870</value>
<description>NameNode web UI address</description>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop104:9868</value>
<description>SecondaryNameNode web UI address</description>
</property>
</configuration>
3. YARN Configuration File
[delopy@hadoop102 /opt/module/hadoop/etc/hadoop]$ vim yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description>Enable the MapReduce shuffle auxiliary service</description>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop103</value>
<description>ResourceManager hostname</description>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
<description>Environment variables inherited by containers</description>
</property>
</configuration>
4. MapReduce Configuration File
[delopy@hadoop102 /opt/module/hadoop/etc/hadoop]$ vim mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>Run MapReduce jobs on YARN</description>
</property>
</configuration>
5. Configure workers
[delopy@hadoop102 /opt/module/hadoop/etc/hadoop]$ vim workers
hadoop102
hadoop103
hadoop104
Note: entries in this file must not end with spaces, and the file must not contain blank lines.
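To confirm the file is clean, cat -A makes line endings visible; every line should end directly in $ with no space before it:
[delopy@hadoop102 /opt/module/hadoop/etc/hadoop]$ cat -A workers
hadoop102$
hadoop103$
hadoop104$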
6. Distribute the Finished Hadoop Configuration
[delopy@hadoop102 /opt/module/hadoop/etc/hadoop]$ xsync /opt/module/
七、Starting the Hadoop Cluster
1. Format HDFS
#1. If this is the first start of the cluster, format the NameNode on hadoop102. (Note: formatting the NameNode generates a new cluster ID; if the NameNode and DataNode cluster IDs no longer match, the cluster cannot find its existing data. If the cluster breaks while running and the NameNode really must be re-formatted, first stop the namenode and datanode processes and delete the data and logs directories on every machine, then format again; see the cleanup sketch below.)
[delopy@hadoop102 ~]$ hdfs namenode -format
... ...
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop102/10.0.0.102
************************************************************/
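If the NameNode ever has to be re-formatted later, clean up first along these lines (a hedged sketch; /data/hadoop/data is the hadoop.tmp.dir set in core-site.xml, and the logs directory sits under the Hadoop install):
# 1) stop the cluster (stop-dfs.sh on hadoop102, stop-yarn.sh on hadoop103)
stop-yarn.sh
stop-dfs.sh
# 2) on every node, remove the data and logs directories
rm -rf /data/hadoop/data /opt/module/hadoop/logs
# 3) format again, on hadoop102 only
hdfs namenode -format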
2. Start HDFS
#1. Start HDFS from hadoop102
[delopy@hadoop102 ~]$ start-dfs.sh
Starting namenodes on [hadoop102]
Starting datanodes
hadoop103: WARNING: /opt/module/hadoop/logs does not exist. Creating.
hadoop104: WARNING: /opt/module/hadoop/logs does not exist. Creating.
Starting secondary namenodes [hadoop104]
#2. Check the HDFS processes across the cluster
[delopy@hadoop102 ~]$ jps
18016 Jps
17653 NameNode
17756 DataNode
[delopy@hadoop103 ~]$ jps
16681 DataNode
16748 Jps
[delopy@hadoop104 ~]$ jps
31880 DataNode
31976 SecondaryNameNode
32024 Jps
3. Start YARN
#1. Start YARN from hadoop103
[delopy@hadoop103 ~]$ start-yarn.sh
Starting resourcemanager
Starting nodemanagers
#2. Check the YARN processes across the cluster
[delopy@hadoop103 ~]$ jps
16968 NodeManager
16681 DataNode
17052 Jps
16862 ResourceManager
[delopy@hadoop102 ~]$ jps
18800 NameNode
18905 DataNode
19323 Jps
19229 NodeManager
[delopy@hadoop104 ~]$ jps
32197 Jps
31880 DataNode
31976 SecondaryNameNode
32090 NodeManager
4. View the HDFS NameNode in the Browser
#1. Open http://hadoop102:9870; the page shows Live Nodes: 3 and roughly 300 GB of disk capacity.

5. View the YARN ResourceManager in the Browser
#1. Open http://hadoop103:8088.

八、Configuring the History Server
To be able to review jobs after they finish, configure the JobHistory server. The steps are as follows:
1. Configure mapred-site.xml
[delopy@hadoop102 /opt/module/hadoop/etc/hadoop]$ vim mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>Run MapReduce jobs on YARN</description>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop102:10020</value>
<description>JobHistory server address</description>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop102:19888</value>
<description>JobHistory server web UI address</description>
</property>
</configuration>
2. Distribute the Configuration
[delopy@hadoop102 /opt/module/hadoop/etc/hadoop]$ xsync /opt/module/
3. Start the History Server
#1. Start the history server on hadoop102
[delopy@hadoop102 ~]$ mapred --daemon start historyserver
#2. Check that the history server is running
[delopy@hadoop102 ~]$ jps
18800 NameNode
20133 Jps
18905 DataNode
19229 NodeManager
20077 JobHistoryServer
4. View JobHistory
#1. Open http://hadoop102:19888/jobhistory in a browser

九、Configuring Log Aggregation
Log aggregation: after an application finishes, its run logs are uploaded to HDFS.

Benefit: job execution details can be reviewed in one place, which makes development and debugging easier.
Note: enabling log aggregation requires restarting the NodeManagers, ResourceManager, and HistoryServer.
The steps to enable it are as follows:
1. Configure yarn-site.xml
[delopy@hadoop102 /opt/module/hadoop/etc/hadoop]$ vim yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description>Enable the MapReduce shuffle auxiliary service</description>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop103</value>
<description>ResourceManager hostname</description>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
<description>Environment variables inherited by containers</description>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
<description>Enable log aggregation</description>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://hadoop102:19888/jobhistory/logs</value>
<description>Log server URL for aggregated logs</description>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
<description>Keep aggregated logs for 7 days</description>
</property>
</configuration>
2. Distribute the Configuration
[delopy@hadoop102 /opt/module/hadoop/etc/hadoop]$ xsync /opt/module/
3. Stop the NodeManagers, ResourceManager, and HistoryServer
#1. On hadoop103
[delopy@hadoop103 ~]$ stop-yarn.sh
Stopping nodemanagers
Stopping resourcemanager
#2. On hadoop102
[delopy@hadoop102 ~]$ mapred --daemon stop historyserver
#3. Check the remaining processes
[delopy@hadoop103 ~]$ jps
16681 DataNode
21466 Jps
4. Start the NodeManagers, ResourceManager, and HistoryServer
#1. On hadoop103, start YARN
[delopy@hadoop103 ~]$ start-yarn.sh
Starting resourcemanager
Starting nodemanagers
#2. On hadoop102, start the history server
[delopy@hadoop102 ~]$ mapred --daemon start historyserver
#3. Check the running processes
[delopy@hadoop103 ~]$ jps
21584 ResourceManager
16681 DataNode
21849 JobHistoryServer
21692 NodeManager
21903 Jps
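With aggregation enabled, the logs of a finished application can also be fetched from the command line; the application ID comes from the ResourceManager UI or from yarn application -list:
[delopy@hadoop102 ~]$ yarn logs -applicationId <application_id>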
十、Cluster Function Test
1. Create a Small Test File
[delopy@hadoop102 ~]$ vim /data/software/1.txt
c 是世界上最好的語言!
java 是世界上最好的語言!
python 是世界上最好的語言!
go 是世界上最好的語言!
2. Upload the File to HDFS
[delopy@hadoop102 ~]$ hadoop fs -mkdir /input
[delopy@hadoop102 ~]$ hadoop fs -put /data/software/1.txt /input
3. Run the wordcount Example
[delopy@hadoop102 ~]$ hadoop jar /opt/module/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar wordcount /input /output
2021-09-02 11:20:50,127 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at hadoop103/10.0.0.103:8032
2021-09-02 11:20:51,862 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/delopy/.staging/job_1630552034170_0001
2021-09-02 11:20:53,199 INFO input.FileInputFormat: Total input files to process : 1
2021-09-02 11:20:53,675 INFO mapreduce.JobSubmitter: number of splits:1
2021-09-02 11:20:54,787 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1630552034170_0001
2021-09-02 11:20:54,788 INFO mapreduce.JobSubmitter: Executing with tokens: []
2021-09-02 11:20:56,510 INFO conf.Configuration: resource-types.xml not found
2021-09-02 11:20:56,510 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2021-09-02 11:20:57,688 INFO impl.YarnClientImpl: Submitted application application_1630552034170_0001
2021-09-02 11:20:57,969 INFO mapreduce.Job: The url to track the job: http://hadoop103:8088/proxy/application_1630552034170_0001/
2021-09-02 11:20:57,970 INFO mapreduce.Job: Running job: job_1630552034170_0001
2021-09-02 11:21:47,852 INFO mapreduce.Job: Job job_1630552034170_0001 running in uber mode : false
2021-09-02 11:21:47,854 INFO mapreduce.Job: map 0% reduce 0%
2021-09-02 11:22:12,655 INFO mapreduce.Job: map 100% reduce 0%
2021-09-02 11:22:54,276 INFO mapreduce.Job: map 100% reduce 100%
2021-09-02 11:22:56,374 INFO mapreduce.Job: Job job_1630552034170_0001 completed successfully
2021-09-02 11:22:56,735 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=84
FILE: Number of bytes written=545011
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=239
HDFS: Number of bytes written=58
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
HDFS: Number of bytes read erasure-coded=0
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=19108
Total time spent by all reduces in occupied slots (ms)=39504
Total time spent by all map tasks (ms)=19108
Total time spent by all reduce tasks (ms)=39504
Total vcore-milliseconds taken by all map tasks=19108
Total vcore-milliseconds taken by all reduce tasks=39504
Total megabyte-milliseconds taken by all map tasks=19566592
Total megabyte-milliseconds taken by all reduce tasks=40452096
Map-Reduce Framework
Map input records=4
Map output records=8
Map output bytes=173
Map output materialized bytes=84
Input split bytes=98
Combine input records=8
Combine output records=5
Reduce input groups=5
Reduce shuffle bytes=84
Reduce input records=5
Reduce output records=5
Spilled Records=10
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=1328
CPU time spent (ms)=4710
Physical memory (bytes) snapshot=347209728
Virtual memory (bytes) snapshot=5058207744
Total committed heap usage (bytes)=230821888
Peak Map Physical memory (bytes)=223215616
Peak Map Virtual memory (bytes)=2524590080
Peak Reduce Physical memory (bytes)=123994112
Peak Reduce Virtual memory (bytes)=2533617664
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=141
File Output Format Counters
Bytes Written=58
[delopy@hadoop102 ~]$
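The result can also be checked directly from the command line; the reducer writes its output to a part-r-00000 file under /output:
[delopy@hadoop102 ~]$ hadoop fs -ls /output
[delopy@hadoop102 ~]$ hadoop fs -cat /output/part-r-00000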
4. Check the Results in the Web UI
#1. Open the HDFS web UI and browse the file system.

#2. In the file browser, set the directory to /; three directories are visible:
input   the directory we created
output  the directory the job writes its results to, created automatically
tmp     a temporary directory

#3. Click input to enter the directory; it contains the 1.txt file we uploaded. Clicking the file lets you download it or view its details.


#4. Go back to /, open the output directory, and you will see two files. Click the second one to view its contents: the number of times each word occurred.


#5. Open the YARN web UI and click Applications to see the job we just ran; the rightmost column shows whether it has finished.


#6. From the job entry in the YARN UI, click History to jump to the history server, then click logs to see the task logs.

#7. Review the log details. With that, the distributed Hadoop installation is essentially complete.

十一、Cluster Start/Stop Summary
1. Start/Stop Whole Modules (passwordless SSH is a prerequisite; the common approach)
#1. Start/stop HDFS as a whole
start-dfs.sh / stop-dfs.sh
#2. Start/stop YARN as a whole
start-yarn.sh / stop-yarn.sh
2. Start/Stop Individual Service Components
#1. Start/stop HDFS components one at a time
hdfs --daemon start/stop namenode/datanode/secondarynamenode
#2. Start/stop YARN components one at a time
yarn --daemon start/stop resourcemanager/nodemanager
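For day-to-day use, the sequence above can be wrapped in a small helper placed in /home/delopy/bin; a minimal sketch (the name hdp.sh is made up here; full paths are used because a non-interactive ssh session may not load /etc/profile.d/my_env.sh, and JAVA_HOME may additionally need to be set in hadoop-env.sh for the daemons to start this way):
#!/bin/bash
# hdp.sh: start or stop the whole cluster in the right order
case $1 in
"start")
    ssh hadoop102 "/opt/module/hadoop/sbin/start-dfs.sh"
    ssh hadoop103 "/opt/module/hadoop/sbin/start-yarn.sh"
    ssh hadoop102 "/opt/module/hadoop/bin/mapred --daemon start historyserver"
;;
"stop")
    ssh hadoop102 "/opt/module/hadoop/bin/mapred --daemon stop historyserver"
    ssh hadoop103 "/opt/module/hadoop/sbin/stop-yarn.sh"
    ssh hadoop102 "/opt/module/hadoop/sbin/stop-dfs.sh"
;;
*)
    echo "Usage: hdp.sh start|stop"
;;
esac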