This article walks through a basic Hadoop cluster setup. First, my environment: my laptop runs Windows 10 Professional, the virtualization software is VMware Workstation Pro, and the virtual machines run CentOS 7. The software needed for the installation is hadoop-2.6.0 and jdk-1.8.0. Other versions will also work; download them from your usual sources.
Overall plan
1. The cluster consists of four nodes, each a minimal installation of CentOS 7, and each with a zgw user. The hadoop and jdk archives needed for the installation have been placed in the zgw user's home directory in advance.
2. The four nodes are named namenode, datanode1, datanode2, and SecondNamenode, with namenode acting as the Master node.
CentOS preparation before installing Hadoop
1. Prepare four installed CentOS 7 virtual machines. The CentOS installation itself is not covered here; look it up if needed.
2. Configure a static IP.
sudo cp /etc/sysconfig/network-scripts/ifcfg-eno16777736 /etc/sysconfig/network-scripts/ifcfg-eno16777736.bak.zgw
sudo vi /etc/sysconfig/network-scripts/ifcfg-eno16777736
The contents are as follows:
TYPE=Ethernet
BOOTPROTO=static
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
NAME=eno16777736
UUID=32b53370-f40b-4b40-b29a-daef1a58d6dc
DEVICE=eno16777736
ONBOOT=yes
IPADDR=192.168.190.11
NETMASK=255.255.255.0
DNS1=192.168.190.2
DNS2=223.5.5.5
GATEWAY=192.168.190.2
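For the new static IP to take effect, the network service generally has to be restarted; a minimal check, assuming the interface is named eno16777736 as above:
sudo systemctl restart network    # reload the ifcfg files and apply the static address
ip addr show eno16777736          # confirm the interface now carries 192.168.190.11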
3. Disable the CentOS 7 firewall.
systemctl stop firewalld.service       # stop firewalld
systemctl disable firewalld.service    # keep firewalld from starting at boot
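A quick sanity check that the firewall really is off (not part of the original steps):
systemctl is-active firewalld     # prints "inactive" once the service is stopped
systemctl is-enabled firewalld    # prints "disabled" once autostart has been removed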
4. Edit /etc/hostname on each node and set its hostname (the four nodes must each have a different name).
sudo vi /etc/hostname
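On CentOS 7 the same thing can be done with hostnamectl, which writes /etc/hostname for you; shown here for the master node as an example, repeat with the other names on the other nodes:
sudo hostnamectl set-hostname namenode
hostname    # verify the new name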
5. Record the IP-address-to-hostname mapping of all nodes in /etc/hosts on every node.
5.1 Edit the hosts file.
sudo vi /etc/hosts
The contents are as follows; the last four lines are the additions.
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.190.11 namenode
192.168.190.12 datanode1
192.168.190.13 datanode2
192.168.190.14 SecondNamenode
5.2 Test
[zgw@namenode ~]$ ping datanode1
PING datanode1 (192.168.190.12) 56(84) bytes of data.
64 bytes from datanode1 (192.168.190.12): icmp_seq=1 ttl=64 time=0.711 ms
64 bytes from datanode1 (192.168.190.12): icmp_seq=2 ttl=64 time=0.377 ms
64 bytes from datanode1 (192.168.190.12): icmp_seq=3 ttl=64 time=0.424 ms
^C
--- datanode1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2016ms
rtt min/avg/max/mdev = 0.377/0.504/0.711/0.147 ms
[zgw@namenode ~]$ ping datanode2
PING datanode2 (192.168.190.13) 56(84) bytes of data.
64 bytes from datanode2 (192.168.190.13): icmp_seq=1 ttl=64 time=2.31 ms
64 bytes from datanode2 (192.168.190.13): icmp_seq=2 ttl=64 time=3.22 ms
64 bytes from datanode2 (192.168.190.13): icmp_seq=3 ttl=64 time=2.62 ms
^C
--- datanode2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2025ms
rtt min/avg/max/mdev = 2.316/2.722/3.221/0.375 ms
[zgw@namenode ~]$ ping SecondNamenode
PING SecondNamenode (192.168.190.14) 56(84) bytes of data.
64 bytes from SecondNamenode (192.168.190.14): icmp_seq=1 ttl=64 time=1.23 ms
64 bytes from SecondNamenode (192.168.190.14): icmp_seq=2 ttl=64 time=0.404 ms
^C
--- SecondNamenode ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1011ms
rtt min/avg/max/mdev = 0.404/0.817/1.230/0.413 ms
6. Set up passwordless SSH login.
6.1 Generate a key pair on the namenode node with ssh-keygen.
[zgw@namenode ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/zgw/.ssh/id_rsa):
Created directory '/home/zgw/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/zgw/.ssh/id_rsa.
Your public key has been saved in /home/zgw/.ssh/id_rsa.pub.
The key fingerprint is:
b1:a5:c5:c6:81:9e:8a:68:0c:ba:b6:76:24:3c:5c:33 zgw@namenode
The key's randomart image is:
+--[ RSA 2048]----+
| ..              |
| .o .            |
| ...*            |
|. E oB           |
|+o..o. .S        |
|.=+.. .          |
| o+              |
|.o .             |
|o.o              |
+-----------------+
6.2 Copy ~/.ssh/id_rsa.pub from the namenode machine to the other machines. Note: be sure to send a copy to namenode itself as well.
[zgw@namenode ~]$ ssh-copy-id namenode
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
zgw@namenode's password:

Number of key(s) added: 1

Now try logging into the machine, with: "ssh 'namenode'"
and check to make sure that only the key(s) you wanted were added.

[zgw@namenode ~]$ ssh-copy-id datanode1
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
zgw@datanode1's password:

Number of key(s) added: 1

Now try logging into the machine, with: "ssh 'datanode1'"
and check to make sure that only the key(s) you wanted were added.

[zgw@namenode ~]$ ssh-copy-id datanode2
The authenticity of host 'datanode2 (192.168.190.13)' can't be established.
ECDSA key fingerprint is 63:6b:24:0d:60:93:5c:a0:98:2f:b9:79:85:ca:90:dd.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
zgw@datanode2's password:

Number of key(s) added: 1

Now try logging into the machine, with: "ssh 'datanode2'"
and check to make sure that only the key(s) you wanted were added.

[zgw@namenode ~]$ ssh-copy-id SecondNamenode
The authenticity of host 'secondnamenode (192.168.190.14)' can't be established.
ECDSA key fingerprint is 63:6b:24:0d:60:93:5c:a0:98:2f:b9:79:85:ca:90:dd.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
zgw@secondnamenode's password:

Number of key(s) added: 1

Now try logging into the machine, with: "ssh 'SecondNamenode'"
and check to make sure that only the key(s) you wanted were added.

[zgw@namenode ~]$
6.3 Test after the copying is complete.
[zgw@namenode ~]$ ssh datanode1
Last login: Tue Dec 27 06:26:37 2016 from 192.168.190.1
[zgw@datanode1 ~]$ exit
logout
Connection to datanode1 closed.
[zgw@namenode ~]$ ssh datanode2
Last login: Tue Dec 27 05:56:22 2016 from 192.168.190.1
[zgw@datanode2 ~]$ exit
logout
Connection to datanode2 closed.
[zgw@namenode ~]$ ssh SecnodNamenode
ssh: Could not resolve hostname secnodnamenode: Name or service not known
[zgw@namenode ~]$ ssh SecondNamenode
Last login: Tue Dec 27 05:56:27 2016 from 192.168.190.1
[zgw@SecondNamenode ~]$ exit
logout
Connection to secondnamenode closed.
[zgw@namenode ~]$
Preparation for the Hadoop installation
1. Install and configure the JDK.
1.1 Extract the JDK to /opt. If the JDK archive is not in the current directory, include its path.
tar -zxvf jdk-8u91-linux-x64.tar.gz -C /opt
1.2 Create a symbolic link to make future upgrades easier.
ln -s /opt/jdk1.8.0_91 /opt/jdk
1.3 Set the environment variables. The source command must be executed for them to take effect.
1 echo "export JAVA_HOME=/opt/jdk" >> /etc/profile #設置JAVA_HONE
2 echo 'export PATH=$JAVA_HOME/bin:$PATH' >> /etc/profile #添加PATH 3 source /etc/profile
1.4 Test the JDK. Output like the following indicates a successful installation.
[zgw@namenode ~]$ java -version
java version "1.8.0_91"
Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)
[zgw@namenode ~]$
2. Create the Hadoop users.
2.1 Create the hadoop group.
groupadd -g 20000 hadoop    # group ID 20000
2.2 Create the users hdfs, yarn, and mr.
useradd -m -d /home/hdfs -u 20001 -s /bin/bash -g hadoop hdfs
useradd -m -d /home/yarn -u 20002 -s /bin/bash -g hadoop yarn
useradd -m -d /home/mr -u 20003 -s /bin/bash -g hadoop mr
2.3 Set passwords for the users.
echo hdfs:zgw | chpasswd
echo yarn:zgw | chpasswd
echo mr:zgw | chpasswd
2.4 Add the users to the administrators group so they can use sudo (on CentOS 7 this is the wheel group; -a preserves their existing groups).
usermod -aG wheel hdfs
usermod -aG wheel yarn
usermod -aG wheel mr
2.5 Set up passwordless SSH login for each of these users, in the same way as described earlier; the steps are not repeated here. Remember: this step must be done!
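As a minimal sketch of what that means for the hdfs user (do the same as yarn and mr; the empty passphrase and the host names are assumptions matching the setup above):
su - hdfs
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa       # generate a key pair without prompts
for host in namenode datanode1 datanode2 SecondNamenode; do
    ssh-copy-id "$host"                         # enter the hdfs password once per node
done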
3. Create directories.
3.1 Create the directories Hadoop needs, as follows.
mkdir -p /data/hadoop/hdfs/nn
mkdir -p /data/hadoop/hdfs/snn
mkdir -p /data/hadoop/hdfs/dn
mkdir -p /data/hadoop/yarn/nm
3.2 Set the directory permissions. /opt is the Hadoop installation directory, so it is handled here as well.
chown -R 20000:hadoop /data
chown -R hdfs /data/hadoop/hdfs
chown -R yarn /data/hadoop/yarn
chmod -R 777 /opt
chmod -R 777 /data/hadoop/hdfs
chmod -R 777 /data/hadoop/yarn
Hadoop installation
1. Extract Hadoop. If the archive is not in the current directory, include its path.
tar -zxvf hadoop-2.6.0.tar.gz -C /opt
2. Create a symbolic link.
ln -s /opt/hadoop-2.6.0 /opt/hadoop
3. Set the environment variables.
echo "export HADOOP_HOME=/opt/hadoop" >> /etc/profile
echo 'export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH' >> /etc/profile
source /etc/profile
4. Test the hadoop command.
[zgw@namenode ~]$ hadoop version
Hadoop 2.6.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1
Compiled by jenkins on 2014-11-13T21:10Z
Compiled with protoc 2.5.0
From source with checksum 18e43357c8f927c0695f1e9522859d6a
This command was run using /opt/hadoop-2.6.0/share/hadoop/common/hadoop-common-2.6.0.jar
[zgw@namenode ~]$
5. Configure Hadoop as follows.
5.1 core-site.xml settings.
sudo vi /opt/hadoop/etc/hadoop/core-site.xml
The contents are as follows:
1 <?xml version="1.0" encoding="UTF-8"?>
2 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
3 <!--
4 Licensed under the Apache License, Version 2.0 (the "License");
5 you may not use this file except in compliance with the License. 6 You may obtain a copy of the License at 7 8 http://www.apache.org/licenses/LICENSE-2.0 9 10 Unless required by applicable law or agreed to in writing, software 11 distributed under the License is distributed on an "AS IS" BASIS, 12 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 See the License for the specific language governing permissions and 14 limitations under the License. See accompanying LICENSE file. 15 --> 16 17 <!-- Put site-specific property overrides in this file. --> 18 19 <configuration> 20 <property> 21 <name>fs.defaultFS</name> 22 <value>hdfs://192.168.190.11:9000</value> 23 </property> 24 </configuration>
The fs.defaultFS value is the IP of the Master node; mine is the namenode node.
5.2 hdfs-site.xml settings.
sudo vi /opt/hadoop/etc/hadoop/hdfs-site.xml
The contents are as follows:
1 <?xml version="1.0" encoding="UTF-8"?>
2 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
3 <!--
4 Licensed under the Apache License, Version 2.0 (the "License");
5 you may not use this file except in compliance with the License. 6 You may obtain a copy of the License at 7 8 http://www.apache.org/licenses/LICENSE-2.0 9 10 Unless required by applicable law or agreed to in writing, software 11 distributed under the License is distributed on an "AS IS" BASIS, 12 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 See the License for the specific language governing permissions and 14 limitations under the License. See accompanying LICENSE file. 15 --> 16 17 <!-- Put site-specific property overrides in this file. --> 18 19 <configuration> 20 <property> 21 <name>dfs.permissions.enabled</name> 22 <value>false</value> 23 </property> 24 <property> 25 <name>dfs.blocksize</name> 26 <value>32m</value> 27 <description> 28 The default block size for new files, in bytes. 29 You can use the following suffix (case insensitive): 30 k(kilo), m(mega), g(giga), t(tera), p(peta), e(exa) to specify the size (such as 128k, 512m, 1g, etc.), 31 Or provide complete size in bytes (such as 134217728 for 128 MB). 32 </description> 33 </property> 34 35 <property> 36 <name>dfs.nameservices</name> 37 <value>hadoop-cluster-zgw</value> 38 </property> 39 <property> 40 <name>dfs.replication</name> 41 <value>3</value> 42 </property> 43 <property> 44 <name>dfs.namenode.name.dir</name> 45 <value>/data/hadoop/hdfs/nn</value> 46 </property> 47 <property> 48 <name>dfs.namenode.checkpoint.dir</name> 49 <value>/data/hadoop/hdfs/snn</value> 50 </property> 51 <property> 52 <name>dfs.namenode.checkpoint.edits.dir</name> 53 <value>/data/hadoop/hdfs/snn</value> 54 </property> 55 <property> 56 <name>dfs.datanode.data.dir</name> 57 <value>/data/hadoop/hdfs/dn</value> 58 </property> 59 <property> 60 <name>dfs.namenode.secondary.http-address</name> 61 <value>192.168.190.14:50090</value> 62 </property> 63 </configuration>
The IP in dfs.namenode.secondary.http-address is the SecondNamenode node.
5.3 yarn-site.xml settings.
sudo vi /opt/hadoop/etc/hadoop/yarn-site.xml
The contents are as follows:
1 <?xml version="1.0"?>
2 <!--
3 Licensed under the Apache License, Version 2.0 (the "License");
4 you may not use this file except in compliance with the License. 5 You may obtain a copy of the License at 6 7 http://www.apache.org/licenses/LICENSE-2.0 8 9 Unless required by applicable law or agreed to in writing, software 10 distributed under the License is distributed on an "AS IS" BASIS, 11 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 See the License for the specific language governing permissions and 13 limitations under the License. See accompanying LICENSE file. 14 --> 15 <configuration> 16 17 <!-- Site specific YARN configuration properties --> 18 <property> 19 <name>yarn.resourcemanager.hostname</name> 20 <value>192.168.190.11</value> 21 </property> 22 <property> 23 <name>yarn.nodemanager.aux-services</name> 24 <value>mapreduce_shuffle</value> 25 </property> 26 <property> 27 <name>yarn.nodemanager.local-dirs</name> 28 <value>/data/hadoop/yarn/nm</value> 29 </property> 30 </configuration>
The IP in yarn.resourcemanager.hostname is the ResourceManager node of the YARN cluster. It can be the same machine as the namenode or a different one; the two roles are not inherently tied together.
5.4 mapred-site.xml settings. In Hadoop 2.6.0 this file ships as mapred-site.xml.template; copy it to mapred-site.xml if the file does not already exist.
sudo vi /opt/hadoop/etc/hadoop/mapred-site.xml
The contents are as follows:
1 <?xml version="1.0"?>
2 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
3 <!--
4 Licensed under the Apache License, Version 2.0 (the "License");
5 you may not use this file except in compliance with the License. 6 You may obtain a copy of the License at 7 8 http://www.apache.org/licenses/LICENSE-2.0 9 10 Unless required by applicable law or agreed to in writing, software 11 distributed under the License is distributed on an "AS IS" BASIS, 12 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 See the License for the specific language governing permissions and 14 limitations under the License. See accompanying LICENSE file. 15 --> 16 17 <!-- Put site-specific property overrides in this file. --> 18 19 <configuration> 20 <property> 21 <name>mapreduce.framework.name</name> 22 <value>yarn</value> 23 </property> 24 </configuration>
5.5 slaves settings. List the datanodes in slaves; I also use SecondNamenode as a datanode.
sudo vi /opt/hadoop/etc/hadoop/slaves
The contents are as follows:
datanode1
datanode2
SecondNamenode
5.6 Set the JDK path.
sudo vi /opt/hadoop/etc/hadoop/hadoop-env.sh
Make the following change: find the commented-out JAVA_HOME line (#export JAVA_HOME=/opt/jdk, at line 25 in my copy) and set it as follows.
export JAVA_HOME=/opt/jdk
Note: be sure to remove the leading #.
5.7 Create the logs directory. Look under /opt/hadoop/: if a logs directory already exists, nothing needs to be done; if not, create it.
sudo mkdir /opt/hadoop/logs
Then change its ownership and permissions:
sudo chown -R mr:hadoop /opt/hadoop/logs
sudo chmod 777 /opt/hadoop/logs
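If Hadoop was unpacked on every node rather than the configuration being edited four times by hand, the files under /opt/hadoop/etc/hadoop must end up identical on all nodes. A minimal sketch of pushing them out from namenode, assuming passwordless SSH for the current user and the hostnames used above:
for host in datanode1 datanode2 SecondNamenode; do
    scp /opt/hadoop/etc/hadoop/*-site.xml /opt/hadoop/etc/hadoop/slaves /opt/hadoop/etc/hadoop/hadoop-env.sh "$host":/opt/hadoop/etc/hadoop/
done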
Starting the Hadoop cluster
1. Format the HDFS filesystem on every node. Formatting only on the master node is usually enough, but occasionally that alone has not worked for me, so I generally format on every node.
hdfs namenode -format
2. Start the HDFS cluster.
2.1 Switch to the hdfs user.
su - hdfs
2.2 On namenode, run the following command to start the HDFS cluster.
start-dfs.sh
2.3 Run jps on namenode to check.
[hdfs@namenode ~]$ start-dfs.sh
Starting namenodes on [namenode]
namenode: starting namenode, logging to /opt/hadoop-2.6.0/logs/hadoop-hdfs-namenode-namenode.out
SecondNamenode: starting datanode, logging to /opt/hadoop-2.6.0/logs/hadoop-hdfs-datanode-SecondNamenode.out
datanode2: starting datanode, logging to /opt/hadoop-2.6.0/logs/hadoop-hdfs-datanode-datanode2.out
datanode1: starting datanode, logging to /opt/hadoop-2.6.0/logs/hadoop-hdfs-datanode-datanode1.out
[hdfs@namenode ~]$ jps
10902 Jps
10712 NameNode
2.4 Run jps on the other three nodes; the output is as follows.
[hdfs@datanode1 ~]$ jps
4547 DataNode
5190 Jps

[hdfs@datanode2 ~]$ jps
4416 DataNode
5070 Jps

[hdfs@SecondNamenode ~]$ jps
5110 Jps
4394 DataNode
2.5 Check the web UI at http://192.168.190.11:50070.
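The same information is also available from the command line (a standard HDFS command, run here as the hdfs user):
hdfs dfsadmin -report    # lists the live datanodes and their capacity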
2.6 Stop the cluster from namenode.
stop-dfs.sh
3. Start the YARN cluster.
3.1 Switch to the yarn user.
su - yarn
3.2 On namenode, run the following command to start the YARN cluster.
start-yarn.sh
3.3 Run jps on namenode to check.
[yarn@namenode ~]$ start-yarn.sh
starting yarn daemons
resourcemanager running as process 9741. Stop it first.
datanode2: nodemanager running as process 4713. Stop it first.
SecondNamenode: nodemanager running as process 4695. Stop it first.
datanode1: nodemanager running as process 4828. Stop it first.
[yarn@namenode ~]$ jps
11004 Jps
9741 ResourceManager
3.4 Run jps on the other three nodes; the output is as follows.
[yarn@datanode1 ~]$ jps
5398 Jps
4828 NodeManager

[yarn@datanode2 ~]$ jps
5281 Jps
4713 NodeManager

[yarn@SecondNamenode ~]$ jps
4695 NodeManager
5308 Jps
3.5 Check the web UI at http://192.168.190.11:8088.
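A quick command-line check of the NodeManagers (a standard YARN command, run here as the yarn user):
yarn node -list    # datanode1, datanode2 and SecondNamenode should show up as RUNNING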
3.6 Stop the YARN cluster from namenode.
stop-yarn.sh
4. Start the job history server.
4.1 Switch to the mr user.
su - mr
4.2 Run the following command to start the job history server.
mr-jobhistory-daemon.sh start historyserver
4.3 Run jps to check; the output is as follows.
[mr@namenode ~]$ mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /opt/hadoop-2.6.0/logs/mapred-mr-historyserver-namenode.out
[mr@namenode ~]$ jps
11157 Jps
11126 JobHistoryServer
4.4 Check the web UI at http://192.168.190.11:19888.
4.5 Stop the job history server from namenode.
mr-jobhistory-daemon.sh stop historyserver
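To exercise HDFS, YARN and the history server together, the example jar bundled with Hadoop 2.6.0 can be run as a smoke test; a minimal sketch, assuming the services above are running:
hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 10    # 2 map tasks, 10 samples each
Once the job finishes, it should appear both at http://192.168.190.11:8088 and at http://192.168.190.11:19888.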