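Fully distributed mode:
The previous posts covered local (standalone) mode and pseudo-distributed mode. Neither is used in real Hadoop deployments, because almost nobody builds an entire Hadoop cluster on a single server. Hadoop is built around distributed computing and distributed storage, so running it on one machine defeats its core design. Put simply, local mode is just Hadoop installed, and pseudo-distributed mode is a simulated cluster on one machine. (Of course, it isn't really that simple; I'll explain when I get the chance!)
So the mode that is actually used in production is fully distributed mode: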
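Overview of the approach
Hostname resolution
Passwordless SSH login
Java and Hadoop environment
Configure the Hadoop files
Copy the master node to the other nodes
Format the master node
Hadoop setup process + introduction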
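Before building the fully distributed cluster, you should understand the following points. They make the Hadoop environment easier to follow:
1. Hadoop's core is distributed storage and distributed computing (officially, HDFS and MapReduce).
2. Cluster structure: 1+1+n (master node + standby master node + multiple slave nodes). Strictly speaking, the "standby master" runs the SecondaryNameNode. That daemon periodically merges the NameNode's edit log into its fsimage (a checkpoint). It is not a hot standby that takes over when the master fails.
3. Hostname resolution: for convenience we edit /etc/hosts. Hadoop lists the slave nodes in .../etc/hadoop/slaves, and those names must resolve. You can also enter IP addresses there directly, which is even simpler.
4. Hadoop issues its commands by logging in to the other servers over SSH, so passwordless SSH login must be configured.
5. This article uses a 1+1+3 cluster with these hostnames: s100 (master), s10 (standby master), and s1, s2, s3 (slaves).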
I. Configure hostname resolution
Master (s100):
[root@localhost ~]# vim /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.68   s100
192.168.1.108  s1
192.168.1.104  s2
192.168.1.198  s3
192.168.1.197  s10
Copy s100's /etc/hosts to the other servers in the Hadoop cluster. For example, to copy it to s1:
[root@localhost ~]# scp /etc/hosts root@192.168.1.108:/etc/hosts
The authenticity of host '192.168.1.108 (192.168.1.108)' can't be established.
RSA key fingerprint is dd:64:75:5f:96:11:07:39:a3:fb:aa:3c:30:ae:59:82.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.1.108' (RSA) to the list of known hosts.
root@192.168.1.108's password:
hosts                    100%  246     0.2KB/s   00:00
Once hostname resolution is configured on every server, move on to the next step.
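With five nodes, the same scp has to be repeated four times. The loop below is a minimal sketch of that repetition. It assumes you run it as root on s100, whose /etc/hosts (shown above) already resolves s10, s1, s2 and s3. The `run` helper and the `DRY_RUN` switch are hypothetical additions for illustration: with `DRY_RUN=1` the commands are only printed.

```sh
#!/bin/sh
# Sketch: push s100's /etc/hosts to the other four nodes.
# Assumptions: run as root on s100, whose /etc/hosts already resolves these names.
# run() and DRY_RUN are hypothetical helpers, not part of any tool.

NODES="s10 s1 s2 s3"   # every node except the master s100

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

push_hosts() {
  for node in $NODES; do
    # Same command as the manual example above, repeated for each node.
    run scp /etc/hosts "root@${node}:/etc/hosts"
  done
}
```

Source the file, preview with `DRY_RUN=1 push_hosts`, then run `push_hosts`. At this stage each scp still asks for the root password, because passwordless SSH is only set up in the next step.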
II. Configure passwordless SSH login
Master (s100):
Generate an SSH key pair: id_rsa (private key) and id_rsa.pub (public key).
[root@localhost ~]# ssh-keygen -t rsa -P ''
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
a4:6e:8d:31:66:e1:92:04:37:8e:1c:a5:83:5e:39:c5 root@localhost.localdomain
The key's randomart image is:
+--[ RSA 2048]----+
| o.=. |
| o BoE |
|. =+o . . |
|. .o.o + |
| . o B S |
| = = |
| + . |
| . |
| |
+-----------------+
[root@localhost ~]# cd /root/.ssh/
[root@localhost .ssh]# ls
id_rsa  id_rsa.pub  known_hosts
By default the keys are stored in the current user's .ssh directory (/root/.ssh or /home/user/.ssh)!
With the key pair in place, add id_rsa.pub to the authorized keys (inside /root/.ssh):
[root@localhost .ssh]# cat id_rsa.pub >> authorized_keys
[root@localhost .ssh]# ls
authorized_keys id_rsa id_rsa.pub known_hosts
Test whether local passwordless login works. The first connection asks you to confirm the host key:
[root@localhost .ssh]# ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is 9e:e0:91:0f:1f:98:af:1a:83:5d:33:06:03:8a:39:93.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Last login: Tue Dec 26 19:09:23 2017 from 192.168.1.156
[root@localhost ~]# exit
logout
Connection to localhost closed.
OK, no problems! To configure the other servers, all we need to do is copy s100's id_rsa.pub to them.
Here we use the ssh-copy-id command to send it to the other servers.
Note that s1 is the target host's name, which resolves through /etc/hosts. I point this out because someone has asked me about it ╭(╯^╰)╮
[root@localhost .ssh]# ssh-copy-id root@s1
The authenticity of host 's1 (192.168.1.108)' can't be established.
RSA key fingerprint is dd:64:75:5f:96:11:07:39:a3:fb:aa:3c:30:ae:59:82.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 's1' (RSA) to the list of known hosts.
root@s1's password:
Now try logging into the machine, with "ssh 'root@s1'", and check in:
  .ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.
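The same ssh-copy-id step is needed for s2, s3 and s10. It also makes sense for s100 itself, because the start scripts reach every host by name over SSH. The sketch below loops over all five names and then checks that each login works without a password. It assumes the key pair generated above and that it runs as root on s100. `run`, `DRY_RUN`, `copy_keys` and `check_keys` are hypothetical names. `ssh-copy-id -i` and `ssh -o BatchMode=yes` are standard OpenSSH usage.

```sh
#!/bin/sh
# Sketch: install s100's public key on every node, then verify passwordless login.
# Assumptions: run as root on s100 with the key generated above; helper names are hypothetical.

NODES="s100 s10 s1 s2 s3"
KEY="/root/.ssh/id_rsa.pub"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

copy_keys() {
  for node in $NODES; do
    # Appends $KEY to the node's ~/.ssh/authorized_keys (asks for the password one last time).
    run ssh-copy-id -i "$KEY" "root@${node}"
  done
}

check_keys() {
  for node in $NODES; do
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # so a node that still wants a password is reported as FAIL.
    if run ssh -o BatchMode=yes "root@${node}" true; then
      echo "OK   ${node}"
    else
      echo "FAIL ${node}"
    fi
  done
}
```

Run `copy_keys` once (typing each password a final time), then `check_keys`. Every line should start with OK before you move on.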
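III. Configure the Java environment and install Hadoop (the Hadoop environment)
Note: every node has exactly the same Java and Hadoop environment, whether it is the master, a slave or the standby master. So we only need to configure one server and then copy its setup to the others (step V).
Fully distributed mode is built on top of local mode, so we first set up local mode:
Fully distributed mode = local mode + configuration files
Installing Java and Hadoop is the local-mode setup described earlier, so I won't repeat it here.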
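IV. Configuration
Note: I plan to write a separate article about these configuration files when I have time.
The main files to modify are these five: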
Hadoop's configuration directory: /data/hadoop/etc/hadoop
[root@localhost hadoop]# cd /data/hadoop/etc/hadoop
[root@localhost hadoop]# ls
core-site.xml  hdfs-site.xml  mapred-site.xml  yarn-site.xml  slaves
Configure core-site.xml:
Purpose: specify the master node (the NameNode address)
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://s100/</value>
  </property>
  <!-- directory for temporary files -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/root/hadoop</value>
  </property>
</configuration>
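Configure hdfs-site.xml:
Purpose: specify the number of replicas
<configuration>
  <!-- The default replication factor is 3. If you keep the default, slaves should list
       at least 3 nodes; otherwise blocks stay under-replicated. -->
  <property>
    <name>dfs.replication</name>
    <value>2</value> <!-- number of replicas -->
  </property>
  <!-- standby master: SecondaryNameNode HTTP address -->
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>s10:50000</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file://${hadoop.tmp.dir}/dfs/name</value>
  </property>
</configuration>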
Configure mapred-site.xml:
Purpose: run MapReduce on YARN (the resource management framework)
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Configure yarn-site.xml:
Purpose: set the ResourceManager host and enable the MapReduce shuffle service
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>s100</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
Configure slaves (one slave hostname per line):
s1
s2
s3
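The hdfs-site.xml comment above ties dfs.replication to the number of hosts in slaves. The sketch below turns that rule into a quick check. It assumes one hostname per line in slaves and skips blank lines and `#` comments. `count_slaves` and `check_replication` are hypothetical helper names.

```sh
#!/bin/sh
# Sketch: make sure dfs.replication does not exceed the number of DataNodes listed in slaves.
# Assumption: one hostname per line; blank lines and '#' comment lines are ignored.

# Count the hostnames in a slaves file.
count_slaves() {
  grep -cv -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1"
}

# Usage: check_replication <replication> <slaves-file>
check_replication() {
  replication="$1"
  datanodes="$(count_slaves "$2")"
  if [ "$replication" -le "$datanodes" ]; then
    echo "ok: replication=${replication}, datanodes=${datanodes}"
    return 0
  fi
  echo "warning: replication=${replication} > datanodes=${datanodes}, blocks would stay under-replicated" >&2
  return 1
}
```

For this article's setup, `check_replication 2 /data/hadoop/etc/hadoop/slaves` should report 2 replicas against 3 DataNodes. Once the configuration is in place, `hdfs getconf -confKey dfs.replication` can supply the first argument instead of a hard-coded number.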
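V. Copy the Java environment settings and the Hadoop files with scp (Java itself is installed directly on each node)
The setup is now essentially done. Next, copy all of the master's configuration to the slave nodes!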
Copy Hadoop:
[root@localhost ~]# scp -r /data/hadoop root@s1:/data/
Copy the environment variables:
[root@localhost ~]# scp /etc/profile root@s1:/etc/profile
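Log in to s1 and run source. This only affects that shell session; later login shells read /etc/profile automatically:
[root@localhost ~]# ssh s1
Last login: Wed Dec 27 23:18:48 2017 from s100
[root@localhost ~]# source /etc/profile
s1 is done. Repeat the same steps for the other servers (s2, s3 and the standby master s10)!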
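The per-node copy can be scripted in the same way as the earlier steps. This is a sketch that assumes passwordless SSH already works, Hadoop lives in /data/hadoop, and Java is installed at the same path on every node (so the copied /etc/profile is valid there). `run`, `DRY_RUN` and `distribute` are hypothetical names.

```sh
#!/bin/sh
# Sketch: repeat step V for every node other than the master.
# Assumptions: run as root on s100 after passwordless SSH works; Hadoop in /data/hadoop;
# Java installed at the same path on each node. Helper names are hypothetical.

NODES="s10 s1 s2 s3"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

distribute() {
  for node in $NODES; do
    run ssh "root@${node}" mkdir -p /data             # scp -r into /data/ needs the directory to exist
    run scp -r /data/hadoop "root@${node}:/data/"     # Hadoop plus the edited configuration files
    run scp /etc/profile "root@${node}:/etc/profile"  # environment variables from the local-mode setup
  done
}
```

The script does not ssh in to run `source`. The copied /etc/profile takes effect at the next login on each node.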
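VI. Format the master node
Run this once, on s100 only. On Hadoop 2.x this form prints a deprecation warning; hdfs namenode -format is the current equivalent.
[root@localhost ~]# hadoop namenode -format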
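Formatting should happen only once. Each format writes a new clusterID into the NameNode's storage directory, and DataNodes that already joined the old cluster then fail to start because of the clusterID mismatch. The guard below is a sketch that assumes dfs.namenode.name.dir resolves to /root/hadoop/dfs/name, following hadoop.tmp.dir in core-site.xml. A formatted directory contains current/VERSION. `format_once`, `run` and `DRY_RUN` are hypothetical names.

```sh
#!/bin/sh
# Sketch: format the NameNode only if it has never been formatted.
# Assumption: dfs.namenode.name.dir = /root/hadoop/dfs/name (from hadoop.tmp.dir).
# Helper names are hypothetical.

NAME_DIR="/root/hadoop/dfs/name"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

format_once() {
  # A formatted name directory holds current/VERSION, which records the clusterID.
  if [ -f "${NAME_DIR}/current/VERSION" ]; then
    echo "already formatted: ${NAME_DIR}, skipping" >&2
    return 1
  fi
  run hdfs namenode -format
}
```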
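Start Hadoop:
start-all.sh
Stop Hadoop:
stop-all.sh
On Hadoop 2.x, start-all.sh and stop-all.sh are deprecated wrappers. Running start-dfs.sh followed by start-yarn.sh (and stop-yarn.sh followed by stop-dfs.sh) does the same thing.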
Use jps to check the running processes.
Master (s100):
[root@localhost ~]# jps
30996 Jps
30645 NameNode
30917 ResourceManager
Standby master (s10):
[root@localhost ~]# jps
33571 Jps
33533 SecondaryNameNode
Slave nodes (s1, s2, s3):
[root@localhost ~]# jps
33720 Jps
33691 NodeManager
33630 DataNode
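Reading jps output on five machines by eye is error-prone. The checker below takes jps output on stdin and reports which expected daemons are present. The role-to-daemon mapping follows the listings above. `check_daemons` is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch: verify that jps output contains the daemons a node's role should run.
# Input: jps output on stdin, one "<pid> <name>" pair per line.

check_daemons() {
  names="$(awk '{print $2}')"
  status=0
  for daemon in "$@"; do
    # -x requires an exact match, so NameNode is not satisfied by SecondaryNameNode.
    if printf '%s\n' "$names" | grep -qx "$daemon"; then
      echo "running: ${daemon}"
    else
      echo "missing: ${daemon}"
      status=1
    fi
  done
  return "$status"
}
```

From s100 you could run, for example, `ssh s1 '. /etc/profile; jps' | check_daemons DataNode NodeManager`. Use `NameNode ResourceManager` for s100 and `SecondaryNameNode` for s10. The `. /etc/profile` part is there because a non-interactive ssh command may not read /etc/profile, so jps might not be on the PATH otherwise.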