Hadoop Standalone and Pseudo-Distributed Installation


This tutorial covers a standalone plus pseudo-distributed Hadoop setup. The steps are written fairly briefly; they serve mainly as notes for my own Hadoop study.

Environment

OS: CentOS 6.5 (64-bit)
Hostname: hadoop001
IP: 192.168.3.128
JDK: jdk-8u40-linux-x64.rpm
Hadoop: 2.7.3

 

Hadoop has two major release lines, Hadoop 1.x.y and Hadoop 2.x.y; older textbooks may use something like version 0.20. Hadoop 2.x is still being updated, and this tutorial applies to all of its releases. If you need to install a version such as 0.20 or 1.2.1, this tutorial can still serve as a reference; the main differences are in the configuration items, for which you should consult the official documentation or other tutorials.

 

Standalone Installation


 

1. Create a Hadoop User

To keep later steps simple and avoid interfering with other users, first create a dedicated hadoop user and set its password.

[root@localhost ~]# useradd -m hadoop -s /bin/bash

[root@localhost ~]# passwd hadoop
Changing password for user hadoop.
New password: 
BAD PASSWORD: it is based on a dictionary word
BAD PASSWORD: is too simple
Retype new password: 
passwd: all authentication tokens updated successfully.

 

// also update the hosts file
[root@hadoop001 .ssh]# vim /etc/hosts
192.168.3.128 hadoop001
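To confirm the new entry resolves, a quick lookup can be run (this verification step is my addition; the original notes skip it):

[root@hadoop001 ~]# getent hosts hadoop001
192.168.3.128   hadoop001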

2. Set Up Passwordless SSH Login

Both single-node and cluster deployments rely on SSH logins, so passwordless SSH makes access and communication friction-free.

[hadoop@hadoop001 .ssh]$ cd ~/.ssh/
[hadoop@hadoop001 .ssh]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa): // press Enter
Enter passphrase (empty for no passphrase):   // press Enter
Enter same passphrase again: 
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
97:75:b0:56:3b:57:8c:1f:b1:51:b6:d9:9f:77:f3:cf hadoop@hadoop001
The key's randomart image is:
+--[ RSA 2048]----+
|            . .=*|
|             +.+O|
|            + +=+|
|           + . o+|
|        S o    o+|
|         .      =|
|                .|
|               ..|
|                E|
+-----------------+
[hadoop@hadoop001 .ssh]$ cat ./id_rsa.pub >> ./authorized_keys
[hadoop@hadoop001 .ssh]$ ll
total 12
-rw-rw-r--. 1 hadoop hadoop  398 Mar 14 14:09 authorized_keys
-rw-------. 1 hadoop hadoop 1675 Mar 14 14:09 id_rsa
-rw-r--r--. 1 hadoop hadoop  398 Mar 14 14:09 id_rsa.pub
[hadoop@hadoop001 .ssh]$ chmod 644 authorized_keys
[hadoop@hadoop001 .ssh]$ ssh hadoop001
Last login: Tue Mar 14 14:11:52 2017 from hadoop001

With that, passwordless login to the local machine is configured.

 

3. Install the JDK

rpm -qa | grep java

// uninstall every Java package the query lists
rpm -e --nodeps java-x.x.x-gcj-compat-x.x.x.x-40jpp.115

// install the jdk-8u40-linux-x64.rpm package; no PATH setup is needed, but JAVA_HOME must be added
rpm -ivh jdk-8u40-linux-x64.rpm

echo "JAVA_HOME=/usr/java/latest" >> /etc/environment
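Note that /etc/environment is read by PAM at login rather than sourced by the shell, so log out and back in, or export the variable manually for the current session (this note and the commands below are my addition):

export JAVA_HOME=/usr/java/latest
echo $JAVA_HOME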

 

Verify the installation:

[hadoop@hadoop001 soft]$ java -version
java version "1.8.0_40"
Java(TM) SE Runtime Environment (build 1.8.0_40-b25)
Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode)


4. Install Hadoop

// extract into /opt
[root@hadoop001 soft]# tar -zxf hadoop-2.7.3.tar.gz -C /opt/

Change the ownership of the directory:

[root@hadoop001 opt]# ll
total 20
drwxr-xr-x.  9 root  root  4096 Aug 17  2016 hadoop-2.7.3

[root@hadoop001 opt]# chown -R hadoop:hadoop hadoop-2.7.3/
[root@hadoop001 opt]# ll
total 20
drwxr-xr-x.  9 hadoop hadoop 4096 Aug 17  2016 hadoop-2.7.3

Add the environment variables:

[hadoop@hadoop001 bin]$ vim ~/.bash_profile
# hadoop 
HADOOP_HOME=/opt/hadoop-2.7.3

PATH=$PATH:$HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

export PATH
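Reload the profile so the new PATH takes effect in the current shell (my addition; the original transcript omits this step):

[hadoop@hadoop001 ~]$ source ~/.bash_profile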

 

Verify the installation:

[hadoop@hadoop001 bin]$ hadoop
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
  CLASSNAME            run the class named CLASSNAME
 or
  where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
                       note: please use "yarn jar" to launch
                             YARN applications, not this command.
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  daemonlog            get/set the log level for each daemon
  trace                view and modify Hadoop tracing settings

Most commands print help when invoked w/o parameters.

 

Word Count

Create an input directory to hold the input file:

[root@hadoop001 /]# mkdir -p /data/input

// create the test file word.txt

[root@hadoop001 /]# vim /data/input/word.txt

Hi, This is a test file.
Hi, I love hadoop and love you .

// grant ownership to the hadoop user
[root@hadoop001 /]# chown hadoop:hadoop /data/input/word.txt
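One caveat before running the job: MapReduce refuses to write to an output directory that already exists, so /data/output must not be present beforehand. When re-running the job, remove it first (this reminder is my addition):

rm -rf /data/output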

// run the word count example
[hadoop@hadoop001 hadoop-2.7.3]$ hadoop jar /opt/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /data/input/word.txt /data/output/

// ... intermediate log output omitted
17/03/14 15:22:44 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=592316
		FILE: Number of bytes written=1165170
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
	Map-Reduce Framework
		Map input records=3
		Map output records=14
		Map output bytes=114
		Map output materialized bytes=127
		Input split bytes=90
		Combine input records=14
		Combine output records=12
		Reduce input groups=12
		Reduce shuffle bytes=127
		Reduce input records=12
		Reduce output records=12
		Spilled Records=24
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=0
		Total committed heap usage (bytes)=525336576
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=59
	File Output Format Counters 
		Bytes Written=85

The job succeeded. Note the FILE counters above: with HDFS not yet configured, Hadoop runs in local (standalone) mode and reads and writes the local filesystem directly. Check the result in the output directory:

[hadoop@hadoop001 output]$ vim part-r-00000
.       1
Hi,     2
I       1
This    1
a       1
and     1
file.   1
hadoop  1
is      1
love    2
test    1
you     1

This completes the standalone installation.

 

Pseudo-Distributed Installation


Hadoop can run on a single node in pseudo-distributed mode: its daemons run as separate Java processes, the node acts as both NameNode and DataNode, and files are read from HDFS.

Hadoop's configuration files live in $HADOOP_HOME/etc/hadoop/. Pseudo-distributed mode requires editing at least two of them: core-site.xml and hdfs-site.xml.

The configuration files are XML; each setting is declared as a property with a name and a value.

 

Edit core-site.xml

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/opt/hadoop-2.7.3/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop001:9000</value>
  </property>
</configuration>

 

Edit hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/data/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/data/dfs/data</value>
  </property>
</configuration>

Although pseudo-distributed mode can run with only fs.defaultFS and dfs.replication configured (the official tutorial does exactly that), if hadoop.tmp.dir is left unset the default temporary directory is /tmp/hadoop-hadoop, which the system may wipe on reboot, forcing another format. So we set it explicitly, and also specify dfs.namenode.name.dir and dfs.datanode.data.dir; otherwise later steps may fail.
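Since the NameNode and DataNode need to write to these locations, it is safest to create them up front and hand them to the hadoop user (the paths come from the configs above; the commands themselves are my addition):

[root@hadoop001 /]# mkdir -p /opt/hadoop-2.7.3/tmp /data/dfs/name /data/dfs/data
[root@hadoop001 /]# chown -R hadoop:hadoop /opt/hadoop-2.7.3/tmp /data/dfs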

 

Edit mapred-site.xml

This file does not exist by default; only a template ships with Hadoop, so make a copy:

[hadoop@hadoop001 hadoop]$ cp mapred-site.xml.template mapred-site.xml

Add the following inside <configuration>:

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>hadoop001:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>hadoop001:19888</value>
</property>

  

Edit yarn-site.xml

Add the following inside <configuration>:

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>hadoop001:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>hadoop001:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>hadoop001:8035</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>hadoop001:8033</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>hadoop001:8088</value>
</property>


Format the NameNode

[hadoop@hadoop001 hadoop]$ hdfs namenode -format
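If the format succeeds, the log should contain a line roughly like the following (my addition; exact wording may vary between versions, the path comes from dfs.namenode.name.dir above):

// INFO common.Storage: Storage directory /data/dfs/name has been successfully formatted.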

 

After formatting, start the NameNode and DataNode daemons; this fails with an error:

[screenshot omitted: the daemons fail to start with a JAVA_HOME error]

Edit the hadoop-env.sh file and replace ${JAVA_HOME} with the absolute path:

[hadoop@hadoop001 hadoop-2.7.3]$ vim etc/hadoop/hadoop-env.sh

export JAVA_HOME=/usr/java/jdk1.8.0_40/

Restart with start-dfs.sh plus start-yarn.sh, or with start-all.sh.

[screenshot omitted: all daemons now running]

The daemons are now up, which confirms the pseudo-distributed configuration works.
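A quick check is jps; it should list all five daemons, each preceded by a PID (a sketch of the expected listing, my addition; PIDs omitted):

[hadoop@hadoop001 ~]$ jps
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Jps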

 

Remote access to http://192.168.3.128:50070 fails, although local access works.

The actual cause was that I had not re-formatted the NameNode after editing hadoop-env.sh; after re-formatting, the DataNode would no longer start.

Finally, deleting the VERSION file under the DataNode's data directory, formatting again, and restarting fixed it.
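For reference, the concrete commands would look roughly like this (a sketch assuming dfs.datanode.data.dir=/data/dfs/data as configured above; the DataNode fails because each format gives the NameNode a new clusterID while the DataNode keeps the old one in its VERSION file):

[hadoop@hadoop001 ~]$ stop-all.sh
[hadoop@hadoop001 ~]$ rm /data/dfs/data/current/VERSION
[hadoop@hadoop001 ~]$ hdfs namenode -format
[hadoop@hadoop001 ~]$ start-all.sh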

