Hadoop Installation and Configuration: A Detailed Walkthrough


Step 1: Basic Environment Setup

 

1. Download and install ubuntukylin-15.10-desktop-amd64.iso

2. Install SSH

sudo apt-get install openssh-server openssh-client
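To confirm the SSH daemon came up after installation (a quick check, assuming the default service name on Ubuntu):

sudo service ssh status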

3. Set up vsftpd

sudo apt-get update

sudo apt-get install vsftpd

Configuration references:

http://www.linuxidc.com/Linux/2015-01/111970.htm

http://jingyan.baidu.com/article/67508eb4d6c4fd9ccb1ce470.html

http://zhidao.baidu.com/link?url=vEmPmg5sV6IUfT4qZqivtiHtXWUoAQalGAL7bOC5XrTumpLRDfa-OmFcTzPetNZUqAi0hgjBGGdpnldob6hL5IhgtGVWDGSmS88iLvhCO4C

Starting, stopping, and restarting vsftpd:

sudo /etc/init.d/vsftpd start    # start
sudo /etc/init.d/vsftpd stop     # stop
sudo /etc/init.d/vsftpd restart  # restart
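To confirm the service is actually listening (a quick check; port 21 is the vsftpd default):

sudo service vsftpd status
sudo netstat -tlnp | grep :21    # should show vsftpd bound to port 21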

4. Install JDK 1.7

sudo chown -R hadoop:hadoop /opt

cp /soft/jdk-7u79-linux-x64.gz /opt

sudo vi /etc/profile    # add the following alias at the end of the file:

alias untar='tar -zxvf'

source /etc/profile     # note: "sudo source /etc/profile" does not work, since source is a shell builtin

cd /opt

untar jdk*

Configure the environment variables:

sudo vi /etc/profile

● Append the following at the end of profile:

# set java environment
export JAVA_HOME=/opt/jdk1.7.0_79
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH

Save and exit when done.

● To apply the changes without rebooting:

source /etc/profile

● Test that the installation succeeded:

java -version
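If everything is in place, the output should look roughly like the following (build numbers will vary with the exact JDK package installed):

java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)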

Troubleshooting:

1. Fix for "unable to resolve host" when running sudo

Reference: http://blog.csdn.net/yuzhiyuxia/article/details/19998665

2. Fix for Linux hanging at "Starting sendmail" during boot

Reference: http://blog.chinaunix.net/uid-21675795-id-356995.html

3. "E: Unable to locate package vsftpd" when installing software on Ubuntu

Reference: http://www.ithao123.cn/content-2584008.html

4. [Linux/Ubuntu] A guide to using vi/vim

Reference: http://www.cnblogs.com/emanlee/archive/2011/11/10/2243930.html

 

Step 2: Clone the Environment

 

1. Clone the master VM to create node1 and node2

Set the hostnames accordingly: master for the master VM, node1 for node1, and node2 for node2.

(When node1 and node2 boot, the system assigns incrementing IPs by default, so no manual change is needed.)

On each node, edit /etc/hosts with the IPs and hostnames of all nodes (including the other nodes), as in the example below.
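A minimal /etc/hosts sketch, using the addresses that appear later in this guide (master 192.168.219.128, node1 192.168.219.129); the node2 address is an assumption that follows the incrementing pattern:

127.0.0.1        localhost
192.168.219.128  master
192.168.219.129  node1
192.168.219.130  node2    # assumed; substitute the IP actually assigned to node2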

---------

Step 3: Configure Passwordless SSH Login

hadoop@node1:~$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

Generating public/private dsa key pair.

Created directory '/home/hadoop/.ssh'.

Your identification has been saved in /home/hadoop/.ssh/id_dsa.

Your public key has been saved in /home/hadoop/.ssh/id_dsa.pub.

The key fingerprint is:

SHA256:B8vBju/uc3kl/v9lrMqtltttttCcXgRkQPbVoU hadoop@node1

The key's randomart image is:

+---[DSA 1024]----+
(randomart pattern omitted; it differs for every key)
+----[SHA256]-----+

hadoop@node1:~$ cd .ssh

hadoop@node1:~/.ssh$ ll

total 16

drwx------ 2 hadoop hadoop 4096 Jul 24 20:31 ./

drwxr-xr-x 18 hadoop hadoop 4096 Jul 24 20:31 ../

-rw------- 1 hadoop hadoop 668 Jul 24 20:31 id_dsa

-rw-r--r-- 1 hadoop hadoop 602 Jul 24 20:31 id_dsa.pub

hadoop@node1:~/.ssh$ cat id_dsa.pub >> authorized_keys

hadoop@node1:~/.ssh$ ll

total 20

drwx------ 2 hadoop hadoop 4096 Jul 24 20:32 ./

drwxr-xr-x 18 hadoop hadoop 4096 Jul 24 20:31 ../

-rw-rw-r-- 1 hadoop hadoop 602 Jul 24 20:32 authorized_keys

-rw------- 1 hadoop hadoop 668 Jul 24 20:31 id_dsa

-rw-r--r-- 1 hadoop hadoop 602 Jul 24 20:31 id_dsa.pub
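One caveat worth noting here (a common pitfall, not from the original text): sshd refuses keys when file permissions are too open. If passwordless login later still prompts for a password, tightening the permissions usually fixes it:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys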

Step 4: Test Passwordless SSH (Local Loopback First)

hadoop@node1:~/.ssh$ ssh localhost

The authenticity of host 'localhost (127.0.0.1)' can't be established.

ECDSA key fingerprint is SHA256:daO0dssyqt12tt9yGUauImOh6tt6A1SgxzSfSmpQqJVEiQTxas.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.

Welcome to Ubuntu 15.10 (GNU/Linux 4.2.0-16-generic x86_64)

* Documentation: https://help.ubuntu.com/

270 packages can be updated.

178 updates are security updates.

New release '16.04 LTS' available.

Run 'do-release-upgrade' to upgrade to it.

Last login: Sun Jul 24 20:21:39 2016 from 192.168.219.1

hadoop@node1:~$ exit

logout

Connection to localhost closed.

hadoop@node1:~/.ssh$

Seeing the output above means the operation succeeded; repeat the same steps on the other two nodes.

Next, the master node must be able to SSH into the two slave nodes without a password:

hadoop@node1:~/.ssh$ scp hadoop@master:~/.ssh/id_dsa.pub ./master_dsa.pub

The authenticity of host 'master (192.168.219.128)' can't be established.

ECDSA key fingerprint is SHA256:daO0dssyqtt9yGUuImOh646A1SgxzSfatSmpQqJVEiQTxas.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'master,192.168.219.128' (ECDSA) to the list of known hosts.

hadoop@master's password:

id_dsa.pub 100% 603 0.6KB/s 00:00

hadoop@node1:~/.ssh$ cat master_dsa.pub >> authorized_keys

The steps above show node1 using scp to copy master's public key file from the remote master node into the current directory; this step still requires password authentication. Master's public key is then appended to the authorized_keys file.

If nothing went wrong, master can now connect to node1 over SSH without a password. On the master node:

hadoop@master:~/.ssh$ ssh node1

The authenticity of host 'node1 (192.168.219.129)' can't be established.

ECDSA key fingerprint is SHA256:daO0dssyqt9yGUuImOh3466A1SttgxzSfSmpQqJVEiQTxas.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'node1,192.168.219.129' (ECDSA) to the list of known hosts.

Welcome to Ubuntu 15.10 (GNU/Linux 4.2.0-16-generic x86_64)

* Documentation: https://help.ubuntu.com/

270 packages can be updated.

178 updates are security updates.

New release '16.04 LTS' available.

Run 'do-release-upgrade' to upgrade to it.

Last login: Sun Jul 24 20:39:30 2016 from 192.168.219.1

hadoop@node1:~$ exit

logout

Connection to node1 closed.

hadoop@master:~/.ssh$

As shown above, the first connection to node1 asks for a "yes" confirmation, which means master cannot yet connect to node1 fully automatically: a human prompt is still involved.

After entering yes, the login succeeds, and we log out back to master. To achieve fully automatic passwordless SSH to the other nodes, one step remains: run ssh node1 once more, and if it no longer asks you to type "yes", the setup has succeeded. The process looks like this:

hadoop@master:~/.ssh$ ssh node1

Welcome to Ubuntu 15.10 (GNU/Linux 4.2.0-16-generic x86_64)

* Documentation: https://help.ubuntu.com/

270 packages can be updated.

178 updates are security updates.

New release '16.04 LTS' available.

Run 'do-release-upgrade' to upgrade to it.

Last login: Sun Jul 24 20:47:20 2016 from 192.168.219.128

hadoop@node1:~$ exit

logout

Connection to node1 closed.

hadoop@master:~/.ssh$

As shown above, master can now log in to node1 over SSH without a password.

The same procedure applies to node2.

On the surface, passwordless SSH login to the two slave nodes is now configured, but the same work still needs to be done for the master node itself.

This step may seem puzzling, but there is a reason for it (the exact reason is hard to pin down): reportedly, on real physical nodes the jobtracker may be placed on other nodes, so the jobtracker is not guaranteed to live on the master node.

Test passwordless SSH login to master itself:

hadoop@master:~/.ssh$ scp hadoop@master:~/.ssh/id_dsa.pub ./master_dsa.pub

The authenticity of host 'master (127.0.0.1)' can't be established.

ECDSA key fingerprint is SHA256:daO0dssttqt9yGUuImOahtt166AgxttzSfSmpQqJVEiQTxas.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'master' (ECDSA) to the list of known hosts.

id_dsa.pub 100% 603 0.6KB/s 00:00

hadoop@master:~/.ssh$ cat master_dsa.pub >> authorized_keys

hadoop@master:~/.ssh$ ssh master

Welcome to Ubuntu 15.10 (GNU/Linux 4.2.0-16-generic x86_64)

* Documentation: https://help.ubuntu.com/

270 packages can be updated.

178 updates are security updates.

New release '16.04 LTS' available.

Run 'do-release-upgrade' to upgrade to it.

Last login: Sun Jul 24 20:39:24 2016 from 192.168.219.1

hadoop@master:~$ exit

logout

Connection to master closed.

At this point, passwordless SSH login is fully configured.
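A quick way to verify every link at once (a convenience sketch, not a step from the original text): run the loop below from master. Each line should print the remote hostname with no password or yes/no prompt.

for h in master node1 node2; do ssh $h hostname; done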

-------------------------

Extract hadoop-2.6.4.tar.gz:

cd /opt
untar hadoop-2.6.4.tar.gz

mv hadoop-2.6.4 hadoop    # rename the extracted directory (not the tarball) to /opt/hadoop

Step 5: Update the Environment Variables

vi /etc/profile

export JAVA_HOME=/opt/jdk1.7.0_79

export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

export HADOOP_HOME=/opt/hadoop

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

alias untar='tar -zxvf'

alias viprofile='vi /etc/profile'

alias sourceprofile='source /etc/profile'

alias catprofile='cat /etc/profile'

alias cdhadoop='cd /opt/hadoop/'

source /etc/profile
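To confirm the new variables took effect, a quick sanity check (the exact version string depends on your build):

hadoop version    # should report Hadoop 2.6.4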

------------------

Step 6: Edit the Configuration Files

Seven files in total need to be modified:

$HADOOP_HOME/etc/hadoop/hadoop-env.sh

$HADOOP_HOME/etc/hadoop/yarn-env.sh

$HADOOP_HOME/etc/hadoop/core-site.xml

$HADOOP_HOME/etc/hadoop/hdfs-site.xml

$HADOOP_HOME/etc/hadoop/mapred-site.xml

$HADOOP_HOME/etc/hadoop/yarn-site.xml

$HADOOP_HOME/etc/hadoop/slaves

where $HADOOP_HOME denotes the Hadoop root directory.

a) hadoop-env.sh and yarn-env.sh

In both files, change the directory assigned to JAVA_HOME to the actual JDK location on your machine:

vi etc/hadoop/hadoop-env.sh (and vi etc/hadoop/yarn-env.sh)

Find the line below and change it to your JDK path (adjust to your own setup):

export JAVA_HOME=/opt/jdk1.7.0_79

Additionally, in hadoop-env.sh, it is recommended to add:

export HADOOP_PREFIX=/opt/hadoop
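If you prefer to script the edit instead of opening vi, a sed sketch (assuming the stock "export JAVA_HOME=${JAVA_HOME}" line that ships in the Hadoop 2.x config template):

sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/opt/jdk1.7.0_79|' etc/hadoop/hadoop-env.sh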

b) core-site.xml: modify it along the following lines

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

<name>fs.defaultFS</name>

<value>hdfs://master:9000</value>

</property>

<property>

<name>hadoop.tmp.dir</name>

<value>/opt/hadoop/tmp</value>

</property>

</configuration>

Note: if the /opt/hadoop/tmp directory does not exist, create it manually with mkdir first, e.g.:
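mkdir -p /opt/hadoop/tmp    # -p creates parents as needed and is harmless if the directory already exists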

For the complete core-site.xml parameter reference, see:

http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/core-default.xml

c) hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

<name>dfs.datanode.ipc.address</name>

<value>0.0.0.0:50020</value>

</property>

<property>

<name>dfs.datanode.http.address</name>

<value>0.0.0.0:50075</value>

</property>

<property>

<name>dfs.replication</name>

<value>2</value>

</property>

</configuration>

Note: dfs.replication is the number of data replicas; it should generally not exceed the number of datanodes.
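Once HDFS is running, you can verify the effective replication of stored files with the standard fsck tool (a verification step, not part of the original text):

hdfs fsck / -files -blocks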

For the complete hdfs-site.xml parameter reference, see:

http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

d) mapred-site.xml (note: Hadoop 2.x ships this file as mapred-site.xml.template; copy it to mapred-site.xml first)

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

<name>mapreduce.framework.name</name>

<value>yarn</value>

</property>

</configuration>

For the complete mapred-site.xml parameter reference, see:

http://hadoop.apache.org/docs/r2.6.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

e) yarn-site.xml

<?xml version="1.0"?>

<configuration>

<property>

<name>yarn.nodemanager.aux-services</name>

<value>mapreduce_shuffle</value>

</property>

</configuration>

For the complete yarn-site.xml parameter reference, see:

http://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

Also note that many Hadoop 1.x parameters have been marked deprecated in 2.x; for details see:

http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/DeprecatedProperties.html

Leave the last file, slaves, alone for now (you can rename it out of the way with mv slaves slaves.bak). With the above configuration in place, you can bring up the NameNode on master for a test, as follows:

$HADOOP_HOME/bin/hdfs namenode -format    # format the namenode

16/07/25 ... (earlier log lines omitted)

16/07/25 20:34:42 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1076359968-127.0.0.1-140082506

16/07/25 20:34:42 INFO common.Storage: Storage directory /opt/hadoop/tmp/dfs/name has been successfully formatted.

16/07/25 20:34:43 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0

16/07/25 20:34:43 INFO util.ExitUtil: Exiting with status 0

16/07/25 20:34:43 INFO namenode.NameNode: SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at master/127.0.0.1

************************************************************/

When you see this, formatting has succeeded. Next, start HDFS:

$HADOOP_HOME/sbin/start-dfs.sh

After startup completes, run jps (or ps -ef | grep ...) to check the processes. If you see the following two processes:

5161 SecondaryNameNode

4989 NameNode

then the master node is basically OK.

Next run $HADOOP_HOME/sbin/start-yarn.sh; when it finishes, run jps again and check the processes:

5161 SecondaryNameNode

5320 ResourceManager

4989 NameNode

If you see these 3 processes, YARN is OK as well.

f) Edit /opt/hadoop/etc/hadoop/slaves

If you renamed this file earlier with mv slaves slaves.bak, first run mv slaves.bak slaves to restore the name, then

edit the file with vi slaves and enter:

node1

node2

Save and exit, then finally run

$HADOOP_HOME/sbin/stop-dfs.sh

$HADOOP_HOME/sbin/stop-yarn.sh

to stop the services started earlier.

Step 7: Copy the Hadoop Directory from master to node1 and node2

Still on the master machine:

cd /opt

zip -r hadoop.zip hadoop

scp hadoop.zip hadoop@node1:/opt/    # -r is unnecessary when copying a single file

scp hadoop.zip hadoop@node2:/opt/

Then, on node1 and node2 respectively:

cd /opt
unzip hadoop.zip

Note: the Hadoop temporary directory (tmp) and data directory (data) on node1 and node2 still need to be created manually first; see the sketch below.
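A convenience sketch for creating them from master over SSH (the path assumes the hadoop.tmp.dir configured in core-site.xml above):

ssh node1 'mkdir -p /opt/hadoop/tmp'
ssh node2 'mkdir -p /opt/hadoop/tmp'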

-----

Step 8: Verify

On the master node, start the services again:

$HADOOP_HOME/sbin/start-dfs.sh

$HADOOP_HOME/sbin/start-yarn.sh

------

hadoop@master:/opt/hadoop/sbin$ start-dfs.sh

Starting namenodes on [master]

master: starting namenode, logging to /opt/hadoop/logs/hadoop-hadoop-namenode-master.out

node1: starting datanode, logging to /opt/hadoop/logs/hadoop-hadoop-datanode-node1.out

node2: starting datanode, logging to /opt/hadoop/logs/hadoop-hadoop-datanode-node2.out

Starting secondary namenodes [0.0.0.0]

0.0.0.0: starting secondarynamenode, logging to /opt/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out

------

hadoop@master:/opt/hadoop/sbin$ start-yarn.sh

starting yarn daemons

starting resourcemanager, logging to /opt/hadoop/logs/yarn-hadoop-resourcemanager-master.out

node1: starting nodemanager, logging to /opt/hadoop/logs/yarn-hadoop-nodemanager-node1.out

node2: starting nodemanager, logging to /opt/hadoop/logs/yarn-hadoop-nodemanager-node2.out

------

If everything went well, the master node should be running the following 3 processes:

ps -ef | grep ResourceManager

ps -ef | grep SecondaryNameNode

ps -ef | grep NameNode

7482 ResourceManager

7335 SecondaryNameNode

7159 NameNode

and node1 and node2 (the slaves) should each be running the following 2 processes:

ps -ef | grep DataNode

ps -ef | grep NodeManager

2296 DataNode

2398 NodeManager

At the same time, you can browse:

http://master:50070/

http://master:8088/

to view the cluster status.


 

Alternatively, run bin/hdfs dfsadmin -report to see an HDFS status report.
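As one more end-to-end smoke test (not part of the original steps), you can submit the example job that ships with Hadoop; the jar path below assumes the stock 2.6.4 layout:

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar pi 2 10

If YARN is healthy, this runs a small MapReduce job across the cluster and prints an estimate of pi on completion.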

Other notes:

a) If the master (i.e. the namenode) needs to be re-formatted, first clear the data directory on every datanode (best to clear the tmp directory along with it); otherwise, after formatting, the datanodes will fail to start when dfs is brought up.

b) If running only the namenode on the master machine feels wasteful and you want master to double as a datanode, simply add a line reading master to the slaves file.

c) For convenience, you can edit /etc/profile to add the library jars Hadoop needs to the CLASSPATH environment variable, and add the hadoop/bin and hadoop/sbin directories to PATH as well. The following can serve as a reference (adjust to your actual setup):

export HADOOP_HOME=/home/hadoop/hadoop-2.6.0

export JAVA_HOME=/usr/java/jdk1.7.0_51

export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$HADOOP_HOME/share/hadoop/common/hadoop-common-2.6.0.jar:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.6.0.jar:$HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
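With that CLASSPATH in place, compiling a standalone MapReduce class against the Hadoop jars is straightforward. MyJob.java below is a hypothetical source file named only for illustration; javac picks up the CLASSPATH environment variable automatically when -cp is not given:

javac MyJob.java              # resolves Hadoop classes via the CLASSPATH set in /etc/profile
jar cf myjob.jar MyJob*.class # package the compiled classes for hadoop jar submission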

 

 

by colplay

2016.07.25

