Big Data - Distributed Systems - An Introduction to Hadoop


I. Introduction to Hadoop

        Hadoop is a foundational big-data component. Big data refers to the techniques for processing and analyzing massive volumes of data, which requires a distributed framework. A distributed system coordinates processes on multiple hosts so that together they form a single application.

        Hadoop is a distributed-system infrastructure developed under the Apache Software Foundation. It lets users develop distributed programs without understanding the low-level details of distribution, harnessing the power of a cluster for high-speed computation and storage. Hadoop solves two major problems, big-data storage and big-data analysis, through its two cores: HDFS and MapReduce.

        Hadoop is implemented mainly in Java and consists of three core subsystems: HDFS, YARN, and MapReduce. HDFS is a distributed file system; YARN is the resource management system; MapReduce is an application framework that runs on YARN and handles distributed processing. From an operating-system perspective, HDFS is analogous to Linux's ext3/ext4 file system, while YARN corresponds to Linux's process-scheduling and memory-allocation modules.

1. HDFS (Hadoop Distributed File System) is a scalable, fault-tolerant, high-performance distributed file system. It replicates data asynchronously, follows a write-once-read-many model, and is primarily responsible for storage. It is designed to run on large numbers of inexpensive machines and provides high-throughput data access.

2. YARN: the resource manager. It provides unified resource management and scheduling for upper-layer applications and is compatible with multiple computation frameworks.

3. MapReduce is a distributed programming model: the processing of a large data set is distributed across many nodes in the network, and the partial results are then collected and reduced into a final answer.
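The map, shuffle, and reduce stages map cleanly onto ordinary Unix pipes, which makes a handy single-host analogy (this only illustrates the model; it is not how Hadoop executes jobs):

```shell
# Word count in the MapReduce style, on one host:
#   map:     emit one word record per input token
#   shuffle: sort, so that identical keys become adjacent
#   reduce:  aggregate each run of identical keys into a count
printf 'hadoop yarn\nhadoop mapreduce\n' |
  tr -s ' ' '\n' |        # map
  sort |                  # shuffle
  uniq -c |               # reduce
  awk '{print $2, $1}'    # tidy the output as "word count"
# prints:
#   hadoop 2
#   mapreduce 1
#   yarn 1
```

Hadoop's bundled wordcount example, used later in this article, performs exactly these stages, except that the map and reduce steps run as distributed tasks across the cluster.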

II. The Three Hadoop Cluster Modes

1. Setting up Hadoop on CentOS 7.6

1) Set the hostname and stop the firewall

[root@kvm01 ~]# hostnamectl set-hostname hadoop101
[root@hadoop101 ~]# systemctl stop firewalld
[root@hadoop101 ~]# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:firewalld(1)

2) Create the module and software directories under /opt

[root@hadoop101 ~]# mkdir /opt/{module,software}

3) Install the JDK and Hadoop

# Download the JDK

http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

# Download hadoop-2.7.3.tar.gz

https://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz

[root@hadoop101 ~]# cd /opt/software/
[root@hadoop101 software]# ll
total 574904
-rw-r--r-- 1 root root 214092195 Oct 28 22:07 hadoop-2.7.3.tar.gz
-rw-r--r-- 1 root root 179472367 May 19 20:30 jdk-8u251-linux-x64.rpm
-rw-r--r-- 1 root root 195132576 May 27 18:05 jdk-8u251-linux-x64.tar.gz
[root@hadoop101 software]# tar -zxvf jdk-8u251-linux-x64.tar.gz
[root@hadoop101 software]# tar -zxvf hadoop-2.7.3.tar.gz 

4) Create symbolic links

[root@hadoop101 software]# ll
total 574904
drwxr-xr-x 9 root  root        149 Aug 18  2016 hadoop-2.7.3
-rw-r--r-- 1 root  root  214092195 Oct 28 22:07 hadoop-2.7.3.tar.gz
drwxr-xr-x 7 10143 10143       245 Mar 12  2020 jdk1.8.0_251
-rw-r--r-- 1 root  root  179472367 May 19 20:30 jdk-8u251-linux-x64.rpm
-rw-r--r-- 1 root  root  195132576 May 27 18:05 jdk-8u251-linux-x64.tar.gz
[root@hadoop101 software]# ln -s jdk1.8.0_251/ jdk
[root@hadoop101 software]# ln -s hadoop-2.7.3/ hadoop
[root@hadoop101 software]# ll
total 574904
lrwxrwxrwx 1 root  root         13 Oct 28 22:24 hadoop -> hadoop-2.7.3/
drwxr-xr-x 9 root  root        149 Aug 18  2016 hadoop-2.7.3
-rw-r--r-- 1 root  root  214092195 Oct 28 22:07 hadoop-2.7.3.tar.gz
lrwxrwxrwx 1 root  root         13 Oct 28 22:23 jdk -> jdk1.8.0_251/
drwxr-xr-x 7 10143 10143       245 Mar 12  2020 jdk1.8.0_251
-rw-r--r-- 1 root  root  179472367 May 19 20:30 jdk-8u251-linux-x64.rpm
-rw-r--r-- 1 root  root  195132576 May 27 18:05 jdk-8u251-linux-x64.tar.gz

5) Configure the environment variables

[root@hadoop101 software]# vim /etc/profile
[root@hadoop101 software]# tail /etc/profile
done

unset i
unset -f pathmunge
# Add the following lines
export JAVA_HOME=/opt/software/jdk
export PATH=$PATH:$JAVA_HOME/bin

export HADOOP_HOME=/opt/software/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# Apply the changes
[root@hadoop101 software]# source /etc/profile
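Appending exports to /etc/profile by hand is easy to duplicate if the setup is ever re-run. A hypothetical add_export helper (sketched against a temporary file, not the real /etc/profile) keeps the append idempotent:

```shell
#!/bin/sh
# Append a line to a profile file only if it is not already present verbatim.
profile=$(mktemp)   # stand-in for /etc/profile

add_export() {
  # -x: match the whole line; -F: treat the line as a fixed string
  grep -qxF "$1" "$profile" || echo "$1" >> "$profile"
}

add_export 'export JAVA_HOME=/opt/software/jdk'
add_export 'export PATH=$PATH:$JAVA_HOME/bin'
add_export 'export JAVA_HOME=/opt/software/jdk'   # re-run: skipped

grep -c 'JAVA_HOME=' "$profile"   # prints 1: the export appears once
rm -f "$profile"
```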

6) Check the versions

[root@hadoop101 software]# java -version
java version "1.8.0_251"
Java(TM) SE Runtime Environment (build 1.8.0_251-b08)
Java HotSpot(TM) 64-Bit Server VM (build 25.251-b08, mixed mode)

[root@hadoop101 software]# hadoop version
Hadoop 2.7.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by root on 2016-08-18T01:41Z
Compiled with protoc 2.5.0
From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4
This command was run using /opt/software/hadoop-2.7.3/share/hadoop/common/hadoop-common-2.7.3.jar

7) Common commands

# Create a file in the current directory
[root@hadoop101 software]# hdfs dfs -touchz hadoop.txt
# List the files and directories in the current directory
[root@hadoop101 software]# hdfs dfs -ls
Found 8 items
drwxr-xr-x   - root  root        149 2016-08-18 09:49 hadoop
drwxr-xr-x   - root  root        149 2016-08-18 09:49 hadoop-2.7.3
-rw-r--r--   1 root  root  214092195 2020-10-28 22:07 hadoop-2.7.3.tar.gz
-rw-r--r--   1 root  root          0 2020-10-28 22:40 hadoop.txt
drwxr-xr-x   - 10143 10143       245 2020-03-12 14:37 jdk
-rw-r--r--   1 root  root  179472367 2020-05-19 20:30 jdk-8u251-linux-x64.rpm
-rw-r--r--   1 root  root  195132576 2020-05-27 18:05 jdk-8u251-linux-x64.tar.gz
drwxr-xr-x   - 10143 10143       245 2020-03-12 14:37 jdk1.8.0_251
# Delete a file
[root@hadoop101 software]# hdfs dfs -rm hadoop.txt
20/10/28 22:40:55 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
20/10/28 22:40:55 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted hadoop.txt
[root@hadoop101 software]# hdfs dfs -ls
Found 7 items
drwxr-xr-x   - root  root        149 2016-08-18 09:49 hadoop
drwxr-xr-x   - root  root        149 2016-08-18 09:49 hadoop-2.7.3
-rw-r--r--   1 root  root  214092195 2020-10-28 22:07 hadoop-2.7.3.tar.gz
drwxr-xr-x   - 10143 10143       245 2020-03-12 14:37 jdk
-rw-r--r--   1 root  root  179472367 2020-05-19 20:30 jdk-8u251-linux-x64.rpm
-rw-r--r--   1 root  root  195132576 2020-05-27 18:05 jdk-8u251-linux-x64.tar.gz
drwxr-xr-x   - 10143 10143       245 2020-03-12 14:37 jdk1.8.0_251
# Create a directory
[root@hadoop101 software]# hdfs dfs -mkdir aaa
[root@hadoop101 software]# hdfs dfs -ls
Found 8 items
drwxr-xr-x   - root  root          6 2020-10-28 22:41 aaa
drwxr-xr-x   - root  root        149 2016-08-18 09:49 hadoop
drwxr-xr-x   - root  root        149 2016-08-18 09:49 hadoop-2.7.3
-rw-r--r--   1 root  root  214092195 2020-10-28 22:07 hadoop-2.7.3.tar.gz
drwxr-xr-x   - 10143 10143       245 2020-03-12 14:37 jdk
-rw-r--r--   1 root  root  179472367 2020-05-19 20:30 jdk-8u251-linux-x64.rpm
-rw-r--r--   1 root  root  195132576 2020-05-27 18:05 jdk-8u251-linux-x64.tar.gz
drwxr-xr-x   - 10143 10143       245 2020-03-12 14:37 jdk1.8.0_251

2. Hadoop's directory layout

1) The bin directory: scripts for operating Hadoop's services (HDFS, YARN)

2) The etc directory: Hadoop's configuration files

3) The lib directory: Hadoop's native libraries (used for data compression and decompression)

4) The sbin directory: scripts for starting and stopping Hadoop's services

5) The share directory: Hadoop's dependency jars, documentation, and official examples

3. Hadoop's three cluster modes: local, pseudo-distributed, and fully distributed

Official Hadoop website: http://hadoop.apache.org/

1) Local mode

Single-host mode; no daemons need to be started.

The official Grep example

a. Create an input directory under the hadoop directory

[root@hadoop101 software]# cd hadoop
[root@hadoop101 hadoop]# ll
total 108
drwxr-xr-x 2 root root   194 Aug 18  2016 bin
drwxr-xr-x 3 root root    20 Aug 18  2016 etc
drwxr-xr-x 2 root root   106 Aug 18  2016 include
drwxr-xr-x 3 root root    20 Aug 18  2016 lib
drwxr-xr-x 2 root root   239 Aug 18  2016 libexec
-rw-r--r-- 1 root root 84854 Aug 18  2016 LICENSE.txt
-rw-r--r-- 1 root root 14978 Aug 18  2016 NOTICE.txt
-rw-r--r-- 1 root root  1366 Aug 18  2016 README.txt
drwxr-xr-x 2 root root  4096 Aug 18  2016 sbin
drwxr-xr-x 4 root root    31 Aug 18  2016 share
[root@hadoop101 hadoop]# mkdir input

b. Copy Hadoop's XML configuration files into input

[root@hadoop101 hadoop]# cp etc/hadoop/*.xml input
[root@hadoop101 hadoop]# ll input/
total 48
-rw-r--r-- 1 root root 4436 Oct 28 22:47 capacity-scheduler.xml
-rw-r--r-- 1 root root  774 Oct 28 22:47 core-site.xml
-rw-r--r-- 1 root root 9683 Oct 28 22:47 hadoop-policy.xml
-rw-r--r-- 1 root root  775 Oct 28 22:47 hdfs-site.xml
-rw-r--r-- 1 root root  620 Oct 28 22:47 httpfs-site.xml
-rw-r--r-- 1 root root 3518 Oct 28 22:47 kms-acls.xml
-rw-r--r-- 1 root root 5511 Oct 28 22:47 kms-site.xml
-rw-r--r-- 1 root root  690 Oct 28 22:47 yarn-site.xml

c. Run the MapReduce example program from the share directory

[root@hadoop101 hadoop]# ll share/hadoop/mapreduce
total 4972
-rw-r--r-- 1 root root  537521 Aug 18  2016 hadoop-mapreduce-client-app-2.7.3.jar
-rw-r--r-- 1 root root  773501 Aug 18  2016 hadoop-mapreduce-client-common-2.7.3.jar
-rw-r--r-- 1 root root 1554595 Aug 18  2016 hadoop-mapreduce-client-core-2.7.3.jar
-rw-r--r-- 1 root root  189714 Aug 18  2016 hadoop-mapreduce-client-hs-2.7.3.jar
-rw-r--r-- 1 root root   27598 Aug 18  2016 hadoop-mapreduce-client-hs-plugins-2.7.3.jar
-rw-r--r-- 1 root root   61745 Aug 18  2016 hadoop-mapreduce-client-jobclient-2.7.3.jar
-rw-r--r-- 1 root root 1551594 Aug 18  2016 hadoop-mapreduce-client-jobclient-2.7.3-tests.jar
-rw-r--r-- 1 root root   71310 Aug 18  2016 hadoop-mapreduce-client-shuffle-2.7.3.jar
-rw-r--r-- 1 root root  295812 Aug 18  2016 hadoop-mapreduce-examples-2.7.3.jar
drwxr-xr-x 2 root root    4096 Aug 18  2016 lib
drwxr-xr-x 2 root root      30 Aug 18  2016 lib-examples
drwxr-xr-x 2 root root    4096 Aug 18  2016 sources
[root@hadoop101 hadoop]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input/ output 'dfs[a-z.]+'

d. View the output

[root@hadoop101 hadoop]# ll output/
total 4
-rw-r--r-- 1 root root 11 Oct 28 22:54 part-r-00000
-rw-r--r-- 1 root root  0 Oct 28 22:54 _SUCCESS
[root@hadoop101 hadoop]# cat output/*
1    dfsadmin
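The single hit, dfsadmin, comes from a comment in hadoop-policy.xml. You can try the same regex with plain grep to see what the job matched (the sample line below is illustrative, not the file's exact text):

```shell
# Reproduce the 'dfs[a-z.]+' match outside MapReduce with grep -oE:
# -o prints only the matched part, -E enables extended regex syntax.
echo 'ACL used for the dfsadmin command protocol.' |
  grep -oE 'dfs[a-z.]+'
# prints: dfsadmin
```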

The official WordCount example

# Create the input directory and file
[root@hadoop101 hadoop]# mkdir wcinput
[root@hadoop101 hadoop]# cd wcinput/
[root@hadoop101 wcinput]# touch wc.input
# Edit the wc.input file
[root@hadoop101 wcinput]# vi wc.input
[root@hadoop101 wcinput]# cat wc.input
hadoop yarn
hadoop mapreduce
docker
xixi
# Return to the Hadoop directory
[root@hadoop101 wcinput]# cd ..
# Run the example
[root@hadoop101 hadoop]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount wcinput wcoutput
# View the results
[root@hadoop101 hadoop]# ll wcoutput/
total 4
-rw-r--r-- 1 root root 44 Oct 28 23:04 part-r-00000
-rw-r--r-- 1 root root  0 Oct 28 23:04 _SUCCESS
[root@hadoop101 hadoop]# cat wcoutput/part-r-00000
docker    1
hadoop    2
mapreduce    1
xixi    1
yarn    1

2) Pseudo-distributed mode

A single host with daemons running: start HDFS and run MapReduce programs on it.

a. Keep the configurations for all three modes side by side

[root@hadoop101 hadoop]# pwd
/opt/software/hadoop
[root@hadoop101 hadoop]# cd etc/
[root@hadoop101 etc]# ll
total 4
drwxr-xr-x 2 root root 4096 Aug 18  2016 hadoop
# Copy the hadoop configuration directory
[root@hadoop101 etc]# cp -r hadoop local
[root@hadoop101 etc]# cp -r hadoop pseudo
[root@hadoop101 etc]# cp -r hadoop full
[root@hadoop101 etc]# ll
total 16
drwxr-xr-x 2 root root 4096 Oct 28 23:14 full
drwxr-xr-x 2 root root 4096 Aug 18  2016 hadoop
drwxr-xr-x 2 root root 4096 Oct 28 23:14 local
drwxr-xr-x 2 root root 4096 Oct 28 23:14 pseudo
[root@hadoop101 etc]# rm -rf hadoop/
[root@hadoop101 etc]# ln -s pseudo/ hadoop
[root@hadoop101 etc]# ll
total 12
drwxr-xr-x 2 root root 4096 Oct 28 23:14 full
lrwxrwxrwx 1 root root    7 Oct 28 23:15 hadoop -> pseudo/
drwxr-xr-x 2 root root 4096 Oct 28 23:14 local
drwxr-xr-x 2 root root 4096 Oct 28 23:14 pseudo
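With local/, pseudo/, and full/ kept side by side, switching modes later is just a matter of repointing the hadoop symlink. A hypothetical use_mode helper, sketched in a throwaway directory (ln -sfn replaces an existing link instead of creating a new link inside it):

```shell
#!/bin/sh
# Switch the active Hadoop configuration by relinking etc/hadoop.
etc=$(mktemp -d)    # stand-in for /opt/software/hadoop/etc
cd "$etc" || exit 1
mkdir local pseudo full

use_mode() {
  ln -sfn "$1" hadoop   # -n: do not follow an existing symlink
}

use_mode pseudo
readlink hadoop   # prints: pseudo
use_mode full
readlink hadoop   # prints: full
cd / && rm -rf "$etc"
```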

b. Enter the Hadoop configuration directory and edit the configuration files

[root@hadoop101 etc]# cd hadoop/
[root@hadoop101 hadoop]# pwd
/opt/software/hadoop/etc/hadoop
# Configure core-site.xml
[root@hadoop101 hadoop]# vim core-site.xml 
[root@hadoop101 hadoop]# cat core-site.xml 
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <!-- The address of the HDFS NameNode -->
        <value>hdfs://localhost/</value>
    </property>
</configuration>

# Configure hdfs-site.xml
[root@hadoop101 hadoop]# vim hdfs-site.xml
[root@hadoop101 hadoop]# cat hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<!-- The number of HDFS replicas -->
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
# Configure mapred-site.xml
[root@hadoop101 hadoop]# cp mapred-site.xml.template mapred-site.xml
[root@hadoop101 hadoop]# vim mapred-site.xml
[root@hadoop101 hadoop]# cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <!-- Run MapReduce on YARN -->
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

# Configure yarn-site.xml
[root@hadoop101 hadoop]# vim yarn-site.xml
[root@hadoop101 hadoop]# cat yarn-site.xml
<?xml version="1.0"?>
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <!-- The hostname of the YARN ResourceManager -->
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop101</value>
    </property>
    <property>
        <!-- How reducers fetch map output -->
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
[root@hadoop101 hadoop]# vim hadoop-env.sh 
[root@hadoop101 hadoop]# cat hadoop-env.sh |grep JAVA_HOME
# The only required environment variable is JAVA_HOME.  All others are
# set JAVA_HOME in this file, so that it is correctly defined on
export JAVA_HOME=/opt/software/jdk

c. Configure passwordless SSH login

[root@hadoop101 hadoop]# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
d6:a3:c3:ef:c8:76:09:6c:cd:1b:5c:1a:c6:a1:78:f4 root@hadoop101
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
|        . .      |
|       o + .     |
|      . o.E .    |
|       oS=o+     |
|       o+.*.     |
|       .+. +     |
|       ..++      |
|       .ooo      |
+-----------------+
[root@hadoop101 hadoop]# ssh-copy-id hadoop101
The authenticity of host 'hadoop101 (10.0.0.131)' can't be established.
ECDSA key fingerprint is 80:18:52:09:3f:b9:8b:95:3c:dd:fa:d6:0c:98:15:7b.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@hadoop101's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'hadoop101'"
and check to make sure that only the key(s) you wanted were added.

d. Start the cluster

# Format the HDFS file system

[root@hadoop101 hadoop]# hdfs namenode -format

# Start the Hadoop daemons

[root@hadoop101 hadoop]# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /opt/software/hadoop-2.7.3/logs/hadoop-root-namenode-hadoop101.out
localhost: starting datanode, logging to /opt/software/hadoop-2.7.3/logs/hadoop-root-datanode-hadoop101.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /opt/software/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-hadoop101.out
starting yarn daemons
resourcemanager running as process 2014. Stop it first.
localhost: starting nodemanager, logging to /opt/software/hadoop-2.7.3/logs/yarn-root-nodemanager-hadoop101.out

e. Inspect the cluster

[root@hadoop101 hadoop]# jps
2548 NameNode
3174 Jps
3063 NodeManager
2651 DataNode
2014 ResourceManager
2815 SecondaryNameNode

f. Browse the HDFS file system from the web UI

HDFS, the distributed file system, runs the following daemons:

DataNode

NameNode

SecondaryNameNode

http://10.0.0.131:50070   // replace the IP with your own machine's IP

g. Work with the cluster

# Create a file on the HDFS file system

[root@hadoop101 hadoop]# hdfs dfs -touchz /hadoop.txt

[root@hadoop101 hadoop]# hdfs dfs -ls /
Found 1 items
-rw-r--r--   1 root supergroup          0 2020-10-30 21:31 /hadoop.txt
[root@hadoop101 hadoop]# hdfs dfs -mkdir /aaa
[root@hadoop101 hadoop]# hdfs dfs -ls /
Found 2 items
drwxr-xr-x   - root supergroup          0 2020-10-30 21:34 /aaa
-rw-r--r--   1 root supergroup          0 2020-10-30 21:31 /hadoop.txt

# MapReduce: distributed computation

Do a simple word-frequency count with MapReduce, using the bundled example.

# Create a local wordcount.txt file containing some text; in this run it simply contains the two command lines shown below

[root@hadoop101 hadoop]# vim wordcount.txt
[root@hadoop101 hadoop]# cat wordcount.txt 
hdfs dfs -put wordcount.txt /

hadoop jar /soft/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /wordcount.txt /out

# Upload wordcount.txt to HDFS

[root@hadoop101 hadoop]# hdfs dfs -put wordcount.txt /
[root@hadoop101 hadoop]# hdfs dfs -ls /
Found 3 items
drwxr-xr-x   - root supergroup          0 2020-10-30 21:34 /aaa
-rw-r--r--   1 root supergroup          0 2020-10-30 21:31 /hadoop.txt
-rw-r--r--   1 root supergroup        144 2020-10-30 21:43 /wordcount.txt

# Run the MapReduce program

[root@hadoop101 hadoop]# hadoop jar /opt/software/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /wordcount.txt /out

 

[root@hadoop101 ~]# hdfs dfs -get /out/part-r-00000 .
[root@hadoop101 ~]# ll
total 8
-rw-------. 1 root root 1423 Sep  5 12:14 anaconda-ks.cfg
-rw-r--r--  1 root root  165 Oct 30 22:22 part-r-00000
[root@hadoop101 ~]# cat part-r-00000 
-put    1
/    1
/out    1
/soft/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar    1
/wordcount.txt    1
dfs    1
hadoop    1
hdfs    1
jar    1
wordcount    1
wordcount.txt    1

# View the job in the YARN web UI

http://10.0.0.131:8088/cluster

 

3) Fully distributed mode

a. Prepare the virtual machines

192.168.0.231    hadoop100
192.168.0.67     hadoop101
192.168.0.224    hadoop102

b. Configure /etc/hosts (on every host)

[root@hadoop101 ~]# vim /etc/hosts
[root@hadoop101 ~]# cat /etc/hosts
::1    localhost    localhost.localdomain    localhost6    localhost6.localdomain6
127.0.0.1    localhost    localhost.localdomain    localhost4    localhost4.localdomain4
192.168.0.231    hadoop100
192.168.0.67     hadoop101
192.168.0.224    hadoop102
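Since every host needs the same three entries, a small idempotent helper avoids duplicate lines when the setup is re-run. The add_host function is hypothetical, and the sketch works on a temporary file rather than the real /etc/hosts:

```shell
#!/bin/sh
# Add a hosts entry only if that hostname is not present yet.
hosts_file=$(mktemp)   # stand-in for /etc/hosts

add_host() {
  # $1=file $2=ip $3=hostname; -w matches the hostname as a whole word
  grep -qw "$3" "$1" || printf '%s\t%s\n' "$2" "$3" >> "$1"
}

add_host "$hosts_file" 192.168.0.231 hadoop100
add_host "$hosts_file" 192.168.0.67  hadoop101
add_host "$hosts_file" 192.168.0.224 hadoop102
add_host "$hosts_file" 192.168.0.231 hadoop100   # re-run: no duplicate

wc -l < "$hosts_file"   # prints 3, not 4
rm -f "$hosts_file"
```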

c. Configure passwordless SSH login

[root@hadoop100 ~]# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:ICTqNaED9KkO30osSsa2prTPKQh/ju3CvLPSFZLNcEY root@hadoop101
The key's randomart image is:
+---[RSA 2048]----+
|o..oE            |
|..+o=            |
|.o %. .          |
|. * =. .         |
|.o . .  S        |
|=o ..            |
|+%+..            |
|O=@+o            |
|=+B%+            |
+----[SHA256]-----+
[root@hadoop100 ~]# ssh-copy-id 192.168.0.67
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host '192.168.0.67 (192.168.0.67)' can't be established.
ECDSA key fingerprint is SHA256:2SoZqpuaEJKIuDV64Hydg+892ggDkDkYAASKS0QFC5o.
ECDSA key fingerprint is MD5:3d:a5:c5:95:b9:43:f7:3e:e1:3f:63:d0:57:0b:14:25.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@192.168.0.67's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh '192.168.0.67'"
and check to make sure that only the key(s) you wanted were added.

[root@hadoop100 ~]# ssh-copy-id 192.168.0.224
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host '192.168.0.224 (192.168.0.224)' can't be established.
ECDSA key fingerprint is SHA256:kKoP8PDUCMrFswXVR+xlX6uzwrM/b69911yWEeykBmg.
ECDSA key fingerprint is MD5:37:ab:d1:42:70:60:f9:94:7e:03:02:af:9a:f8:95:9f.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@192.168.0.224's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh '192.168.0.224'"
and check to make sure that only the key(s) you wanted were added.
# Verify
[root@hadoop100 ~]# ssh root@192.168.0.67
Last login: Tue Nov  3 21:42:15 2020 from 192.168.0.231
[root@hadoop101 ~]# exit
logout
Connection to 192.168.0.67 closed.
[root@hadoop100 ~]# ssh root@192.168.0.224
Last login: Tue Nov  3 21:14:02 2020 from 123.139.40.135
[root@hadoop102 ~]# exit
logout
Connection to 192.168.0.224 closed.

d. Configure JAVA_HOME

The following three scripts in the /server/tools/hadoop/etc/hadoop directory:
hadoop-env.sh
yarn-env.sh
mapred-env.sh
must all have the JAVA_HOME variable set to its full path:
export JAVA_HOME=/server/tools/jdk
[root@hadoop100 tools]# cd hadoop/etc/hadoop/
[root@hadoop100 hadoop]# vim hadoop-env.sh 
[root@hadoop100 hadoop]# vim yarn-env.sh
[root@hadoop100 hadoop]# vim mapred-env.sh 
[root@hadoop100 hadoop]# grep JAVA_HOME hadoop-env.sh 
# The only required environment variable is JAVA_HOME.  All others are
# set JAVA_HOME in this file, so that it is correctly defined on
export JAVA_HOME=/server/tools/jdk
[root@hadoop101 hadoop]# grep JAVA_HOME yarn-env.sh 
# export JAVA_HOME=/home/y/libexec/jdk1.6.0/
if [ "$JAVA_HOME" != "" ]; then
  #echo "run java in $JAVA_HOME"
  JAVA_HOME=/server/tools/jdk
if [ "$JAVA_HOME" = "" ]; then
  echo "Error: JAVA_HOME is not set."
JAVA=$JAVA_HOME/bin/java
[root@hadoop100 hadoop]# grep JAVA_HOME mapred-env.sh 
export JAVA_HOME=/server/tools/jdk
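Instead of editing each file by hand, the export can be stamped in with sed. As the greps above show, hadoop-env.sh and mapred-env.sh ship with an `export JAVA_HOME=` line to replace; yarn-env.sh sets it inside an if block, so it may still need a manual edit. The sketch below (using GNU sed's -i) works on throwaway copies, not the real files:

```shell
#!/bin/sh
# Rewrite the JAVA_HOME export in each *-env.sh in one pass.
jdk=/server/tools/jdk
dir=$(mktemp -d)   # stand-in for /server/tools/hadoop/etc/hadoop

# Throwaway stand-ins for hadoop-env.sh and mapred-env.sh
for f in hadoop-env.sh mapred-env.sh; do
  echo 'export JAVA_HOME=${JAVA_HOME}' > "$dir/$f"
done

# Replace any existing export line with the absolute JDK path
for f in "$dir"/*-env.sh; do
  sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=$jdk|" "$f"
done

grep -h '^export JAVA_HOME=' "$dir"/*-env.sh
# prints (once per file): export JAVA_HOME=/server/tools/jdk
rm -rf "$dir"
```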

e. Cluster configuration

# core-site.xml
[root@hadoop100 hadoop]# cat core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <!-- The HDFS URI: scheme://namenode-host:port -->
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop100:8020</value>
    </property>
    <property>
        <!-- Hadoop's local temporary directory on the namenode -->
        <name>hadoop.tmp.dir</name>
        <value>/data/hadoop</value>
    </property>
</configuration>
[root@hadoop100 hadoop]# mkdir -p /data/hadoop
#hdfs-site.xml
[root@hadoop100 hadoop]# vim hdfs-site.xml
[root@hadoop100 hadoop]# cat hdfs-site.xml 
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.replication</name>
        <!-- Replica count; the default is 3, and it should not exceed the number of DataNodes -->
        <value>3</value>
    </property>
</configuration>
# mapred-site.xml
[root@hadoop100 hadoop]# cp mapred-site.xml.template mapred-site.xml
[root@hadoop100 hadoop]# vim mapred-site.xml
[root@hadoop100 hadoop]# cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
#yarn-site.xml
[root@hadoop100 hadoop]# vim yarn-site.xml
[root@hadoop100 hadoop]# cat yarn-site.xml 
<?xml version="1.0"?>
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop100</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
# slaves
[root@hadoop100 hadoop]# vim slaves
[root@hadoop100 hadoop]# cat slaves
hadoop101
hadoop102

f. Distribute the configuration files

[root@hadoop100 hadoop]# scp -r /server/tools/hadoop/etc/hadoop/ root@hadoop101:/server/tools/hadoop/etc/
[root@hadoop100 hadoop]# scp -r /server/tools/hadoop/etc/hadoop/ root@hadoop102:/server/tools/hadoop/etc/

g. Format HDFS and start Hadoop

[root@hadoop100 hadoop]# hdfs namenode -format
[root@hadoop100 hadoop]# start-all.sh

h. Verify

[root@hadoop100 hadoop]# jps
3825 SecondaryNameNode
3634 NameNode
4245 Jps
3981 ResourceManager
1263 WrapperSimpleApp

[root@hadoop101 tools]# jps
4214 NodeManager
1258 WrapperSimpleApp
4364 Jps
3934 SecondaryNameNode
3774 DataNode
[root@hadoop102 hadoop]# jps
3376 NodeManager
3207 DataNode
3673 SecondaryNameNode
1261 WrapperSimpleApp
3951 Jps
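Eyeballing jps on three machines gets error-prone. A small check can assert that every expected daemon appears in the output; the check_daemons helper below is hypothetical, and the sketch runs against a captured listing so it works offline (on a live node you would feed it the output of jps):

```shell
#!/bin/sh
# check_daemons "<jps output>" DAEMON...: report any expected daemon
# that is missing; exit status 0 only if all are present.
check_daemons() {
  out=$1; shift
  missing=0
  for d in "$@"; do
    echo "$out" | grep -qw "$d" || { echo "missing: $d"; missing=1; }
  done
  return $missing
}

# Offline demo on a captured master-node listing
master='3634 NameNode
3825 SecondaryNameNode
3981 ResourceManager'

check_daemons "$master" NameNode SecondaryNameNode ResourceManager &&
  echo "master OK"                 # prints: master OK
check_daemons "$master" DataNode ||
  echo "worker daemons not expected here"   # prints: missing: DataNode
```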

 

