Hadoop 2.6 Cluster Setup (HDFS HA + YARN): Even 4GB of RAM Can Splurge for Once


Preparation:

1. Laptop with 4GB of RAM, running Windows 7 (a poor man's setup)

2. Tool: VMware Workstation

3. Virtual machines: four CentOS 6.4 VMs

VM settings:

Each VM: 512 MB of RAM, 40 GB disk, network adapter in NAT mode.

 

Under Advanced, generate a new MAC address for the VM (cloning a VM does not change its MAC address, so regenerate it manually after each clone).

 

Edit the virtual network:

 

Click NAT Settings, note the NAT gateway IP and remember it; this IP is very important when configuring the VMs' networking.

By default the NAT network assigns IPs automatically, but the IPs in our cluster need to be set by hand.

 

On the Win7 host: VMnet8 network settings

 

Experiment environment:

IP                Hostname   Role
192.168.249.130   SY-0130    Active NameNode
192.168.249.131   SY-0131    Standby NameNode
192.168.249.132   SY-0132    DataNode1
192.168.249.133   SY-0133    DataNode2

Linux network settings:

1. Create a new user, e.g. hadoop. Building the cluster as the root user is not recommended (root has too much privilege).

2. Give the hadoop user sudo privileges.

[root@SY-0130 ~]# vi /etc/sudoers

## Allow root to run any commands anywhere
root    ALL=(ALL)    ALL
hadoop  ALL=(ALL)    ALL

3. Check which network interface the VM is currently using

[root@SY-0130 hadoop]# ifconfig
eth2      Link encap:Ethernet  HWaddr 00:50:56:35:8E:E8
          inet addr:192.168.249.130  Bcast:192.168.249.255  Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:fe35:8ee8/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:877059 errors:0 dropped:0 overruns:0 frame:0
          TX packets:597769 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:865720294 (825.6 MiB)  TX bytes:324530557 (309.4 MiB)
          Interrupt:19 Base address:0x2024

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:1354 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1354 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:196675 (192.0 KiB)  TX bytes:196675 (192.0 KiB)

 

# Check the NIC in use; mine is eth2
[root@SY-0130 ~]# cat /proc/net/dev
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:   196675    1354    0    0    0     0          0         0    196675    1354    0    0    0     0       0          0
  eth2:865576893  875205    0    0    0     0          0         0 324425517  596433    0    0    0     0       0          0

 

4. Check the MAC address of the current NIC

[root@SY-0130 ~]# cat /etc/udev/rules.d/70-persistent-net.rules
# This file was automatically generated by the /lib/udev/write_net_rules
# program, run by the persistent-net-generator.rules rules file.
#
# You can modify it, as long as you keep each rule on a single
# line, and change only the value of the NAME= key.

# PCI device 0x1022:0x2000 (vmxnet)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:0c:29:b5:fd:bb", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"

# PCI device 0x1022:0x2000 (vmxnet)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:50:56:35:8e:e8", ATTR{type}=="1", KERNEL=="eth*", NAME="eth2"

 

5. Network Configuration

   [root@SY-0130 ~]# setup 

 

 

Select eth2 (the NIC currently in use) and configure its IP, gateway, and DNS.

 

Use the same DNS server as the Win7 host's network, so the VMs can also reach the Internet, which makes downloading and installing software easier.
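For reference, on CentOS 6 the values entered in setup end up in the interface config file. A sketch of what /etc/sysconfig/network-scripts/ifcfg-eth2 might look like on SY-0130 (the GATEWAY and DNS1 values below are assumptions; use the NAT gateway IP noted earlier and the DNS server from the Win7 host):

DEVICE=eth2
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.249.130
NETMASK=255.255.255.0
GATEWAY=192.168.249.2      # assumption: VMware NAT gateways commonly end in .2
DNS1=192.168.1.1           # assumption: replace with the host's DNS server
HWADDR=00:50:56:35:8E:E8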

Also turn off the firewall.
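A minimal sketch of doing that on CentOS 6, assuming the stock iptables service:

[root@SY-0130 ~]# service iptables stop      # stop the firewall immediately
[root@SY-0130 ~]# chkconfig iptables off     # keep it disabled across reboots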

6. Change the hostname

[root@SY-0130 ~]# vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=SY-0130

7. Edit hosts

[hadoop@SY-0130 ~]$ sudo vi /etc/hosts

# Add the following entries

192.168.249.130 SY-0130
192.168.249.131 SY-0131
192.168.249.132 SY-0132
192.168.249.133 SY-0133

8. Reboot the VM: reboot

Software installation:

(Note: log in to SY-0130 as the hadoop user)

1. In the hadoop user's home directory on SY-0130, create a toolkit folder to hold all the installation packages, and a labc folder as the working directory for this experiment.

[hadoop@SY-0130 ~]$ mkdir labc

[hadoop@SY-0130 ~]$ mkdir toolkit

[hadoop@SY-0130 ~]$ ls

labc  toolkit

# I keep the downloaded packages in toolkit, as shown below

[hadoop@SY-0130 toolkit]$ ls
hadoop-2.5.2.tar.gz  hadoop-2.6.0.tar.gz  jdk-7u71-linux-i586.gz  scala-2.10.3.tgz  spark-1.2.0-bin-hadoop2.3.tgz  zookeeper-3.4.6.tar.gz

2. JDK installation and environment variables

[hadoop@SY-0130 ~]$ mkdir lab

# I installed JDK 7 under the lab directory

[hadoop@SY-0130 jdk1.7.0_71]$ pwd

/home/hadoop/lab/jdk1.7.0_71
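For completeness, the extraction itself could be done like this (a sketch, assuming the tarball name from the toolkit listing above):

[hadoop@SY-0130 ~]$ tar -zxf ~/toolkit/jdk-7u71-linux-i586.gz -C ~/lab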

# Environment variable settings:

[hadoop@SY-0130 ~]$ vi .bash_profile

# User specific environment and startup programs
export JAVA_HOME=/home/hadoop/lab/jdk1.7.0_71
PATH=$JAVA_HOME/bin:$PATH:$HOME/bin
export PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

# Apply the settings

[hadoop@SY-0130 ~]$ source .bash_profile
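A quick check that the new variables took effect (the version string is what the jdk1.7.0_71 directory above should report):

[hadoop@SY-0130 ~]$ java -version
java version "1.7.0_71"
[hadoop@SY-0130 ~]$ echo $JAVA_HOME
/home/hadoop/lab/jdk1.7.0_71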

3. Hadoop 2.6 installation and configuration

# Extract hadoop-2.6.0.tar.gz from the toolkit folder into /home/hadoop/labc

[hadoop@SY-0130 hadoop-2.6.0]$ pwd

/home/hadoop/labc/hadoop-2.6.0
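As with the JDK, the extraction could be done along these lines (a sketch using the paths from this setup):

[hadoop@SY-0130 ~]$ tar -zxf ~/toolkit/hadoop-2.6.0.tar.gz -C ~/labc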

With the JDK and Hadoop basically installed (only the configuration remains), this VM can now be cloned.

This VM is named:

130.ActiveNameNode

I cloned it three times and named the clones:

131.StandbyNameNode

132.DataNode1

133.DataNode2

For each of the three clones, I regenerated the MAC address, checked which NIC it was using, updated its IP, DNS, hostname, and hosts file, and turned off the firewall, following the steps described above. The network configuration is where I spent most of my time.

 

Software configuration:

At this point I have four Linux VMs with the JDK and Hadoop installed, their IPs configured, and Internet access.

Before the actual Hadoop HA configuration, set up passwordless SSH login among the four nodes so they can communicate conveniently.

 

1. Passwordless SSH login

[hadoop@SY-0130 ~]$ ssh-keygen -t rsa     # just press Enter at every prompt

# Check the generated public key

[hadoop@SY-0130 .ssh]$ ls

 id_rsa  id_rsa.pub  known_hosts

# Copy id_rsa.pub to the SY-0131, SY-0132, and SY-0133 nodes.

[hadoop@SY-0130 .ssh]$ scp id_rsa.pub hadoop@SY-0131:.ssh/authorized_keys

[hadoop@SY-0130 .ssh]$ scp id_rsa.pub hadoop@SY-0132:.ssh/authorized_keys

[hadoop@SY-0130 .ssh]$ scp id_rsa.pub hadoop@SY-0133:.ssh/authorized_keys

# Note: SY-0130 is the Active NameNode. I only set up passwordless login from SY-0130 to the other nodes, i.e. one-way only, not in both directions.
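One thing worth checking if passwordless login does not work right away: sshd ignores keys with overly open permissions. A hedged aside, not a step from the original write-up, to be run on each target node:

[hadoop@SY-0131 ~]$ chmod 700 ~/.ssh
[hadoop@SY-0131 ~]$ chmod 600 ~/.ssh/authorized_keys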

# With the above configuration done, test passwordless login from SY-0130

# Connect to sy-0131
[hadoop@SY-0130 ~]$ ssh sy-0131
Last login: Tue Jan 6 07:32:46 2015 from 192.168.249.1
[hadoop@SY-0131 ~]$        # Ctrl+D exits the connection

# Connect to sy-0132
[hadoop@SY-0130 ~]$ ssh sy-0132
Last login: Tue Jan 6 21:25:16 2015 from 192.168.249.1
[hadoop@SY-0132 ~]$

# Connect to sy-0133
[hadoop@SY-0130 ~]$ ssh sy-0133
Last login: Tue Jan 6 21:25:18 2015 from 192.168.249.1
[hadoop@SY-0133 ~]$

# Test successful

 

2. Hadoop configuration

# Go to the Hadoop installation directory

[hadoop@SY-0130 hadoop-2.6.0]$ pwd

/home/hadoop/labc/hadoop-2.6.0

# Edit hadoop-env.sh and add the Java environment variable

[hadoop@SY-0130 hadoop-2.6.0]$ vi etc/hadoop/hadoop-env.sh

# The java implementation to use.
export JAVA_HOME=/home/hadoop/lab/jdk1.7.0_71

# Edit core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>


<!-- Do not modify this file directly. Instead, copy entries that you -->
<!-- wish to modify from this file into core-site.xml and change them -->
<!-- there. If core-site.xml does not already exist, create it. -->

<configuration>

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://SY-0130:8020</value>
  <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>

</configuration>
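A side note, not part of the original configuration: because automatic failover is disabled and failover is done by hand later on, fs.defaultFS here points at nn1 directly. A fully HA-aware client setup would normally point fs.defaultFS at the nameservice and add a failover proxy provider, roughly like this:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop-test</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.hadoop-test</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>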

 

# Edit hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>


<!-- Do not modify this file directly. Instead, copy entries that you -->
<!-- wish to modify from this file into hdfs-site.xml and change them -->
<!-- there. If hdfs-site.xml does not already exist, create it. -->

<configuration>

<property>
  <name>dfs.nameservices</name>
  <value>hadoop-test</value>
  <description> Comma-separated list of nameservices. </description>
</property>

<property>
  <name>dfs.ha.namenodes.hadoop-test</name>
  <value>nn1,nn2</value>
  <description> The prefix for a given nameservice, contains a comma-separated list of namenodes for a given nameservice (eg EXAMPLENAMESERVICE). </description>
</property>

<property>
  <name>dfs.namenode.rpc-address.hadoop-test.nn1</name>
  <value>SY-0130:8020</value>
  <description> RPC address for namenode1 of hadoop-test </description>
</property>

<property>
  <name>dfs.namenode.rpc-address.hadoop-test.nn2</name>
  <value>SY-0131:8020</value>
  <description> RPC address for namenode2 of hadoop-test </description>
</property>

<property>
  <name>dfs.namenode.http-address.hadoop-test.nn1</name>
  <value>SY-0130:50070</value>
  <description> The address and the base port where the dfs namenode1 web ui will listen on. </description>
</property>

<property>
  <name>dfs.namenode.http-address.hadoop-test.nn2</name>
  <value>SY-0131:50070</value>
  <description> The address and the base port where the dfs namenode2 web ui will listen on. </description>
</property>

<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///home/hadoop/labc/hdfs/name</value>
  <description>Determines where on the local filesystem the DFS name node should store the name table(fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy. </description>
</property>

<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://SY-0131:8485;SY-0132:8485;SY-0133:8485/hadoop-test</value>
  <description>A directory on shared storage between the multiple namenodes in an HA cluster. This directory will be written by the active and read by the standby in order to keep the namespaces synchronized. This directory does not need to be listed in dfs.namenode.edits.dir above. It should be left empty in a non-HA cluster. </description>
</property>

<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///home/hadoop/labc/hdfs/data</value>
  <description>Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored. </description>
</property>

<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>false</value>
  <description> Whether automatic failover is enabled. See the HDFS High Availability documentation for details on automatic HA configuration. </description>
</property>

<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/home/hadoop/labc/hdfs/journal/</value>
</property>

</configuration>
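The properties above point at several local directories (name, data, journal). Hadoop creates most of them on demand, but pre-creating them on each node avoids surprises on the first start (a precaution, not a step from the original write-up):

[hadoop@SY-0130 ~]$ mkdir -p /home/hadoop/labc/hdfs/{name,data,journal}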

 

# Edit mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Do not modify this file directly. Instead, copy entries that you -->
<!-- wish to modify from this file into mapred-site.xml and change them -->
<!-- there. If mapred-site.xml does not already exist, create it. -->

<configuration>

<!-- MR YARN Application properties -->

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
  <description>The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn. </description>
</property>

<!-- jobhistory properties -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>SY-0131:10020</value>
  <description>MapReduce JobHistory Server IPC host:port</description>
</property>

<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>SY-0131:19888</value>
  <description>MapReduce JobHistory Server Web UI host:port</description>
</property>

</configuration>

# Edit yarn-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>


<!-- Do not modify this file directly. Instead, copy entries that you -->
<!-- wish to modify from this file into yarn-site.xml and change them -->
<!-- there. If yarn-site.xml does not already exist, create it. -->

<configuration>
  
  <!-- Resource Manager Configs -->
  <property>
    <description>The hostname of the RM.</description>
    <name>yarn.resourcemanager.hostname</name>
    <value>SY-0130</value>
  </property>    
  
  <property>
    <description>The address of the applications manager interface in the RM.</description>
    <name>yarn.resourcemanager.address</name>
    <value>${yarn.resourcemanager.hostname}:8032</value>
  </property>

  <property>
    <description>The address of the scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>${yarn.resourcemanager.hostname}:8030</value>
  </property>

  <property>
    <description>The http address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>${yarn.resourcemanager.hostname}:8088</value>
  </property>

  <property>
    <description>The https address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.https.address</name>
    <value>${yarn.resourcemanager.hostname}:8090</value>
  </property>

  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>${yarn.resourcemanager.hostname}:8031</value>
  </property>

  <property>
    <description>The address of the RM admin interface.</description>
    <name>yarn.resourcemanager.admin.address</name>
    <value>${yarn.resourcemanager.hostname}:8033</value>
  </property>

  <property>
    <description>The class to use as the resource scheduler.</description>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>

  <property>
    <description>fair-scheduler conf location</description>
    <name>yarn.scheduler.fair.allocation.file</name>
    <value>${yarn.home.dir}/etc/hadoop/fairscheduler.xml</value>
  </property>
  <property>
    <description>List of directories to store localized files in. An application's localized file directory will be found in: ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}. Individual containers' work directories, called container_${contid}, will be subdirectories of this. </description>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/home/hadoop/labc/yarn/local</value>
  </property>

  <property>
    <description>Whether to enable log aggregation</description>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>

  <property>
    <description>Where to aggregate logs to.</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/tmp/logs</value>
  </property>

  <property>
    <description>Amount of physical memory, in MB, that can be allocated for containers.</description>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>30720</value>
  </property>

  <property>
    <description>Number of CPU cores that can be allocated for containers.</description>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>12</value>
  </property>

  <property>
    <description>the valid service name should only contain a-zA-Z0-9_ and can not start with numbers</description>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  
</configuration>

 

# Edit slaves

SY-0131
SY-0132
SY-0133

# Under /home/hadoop/labc/hadoop-2.6.0/etc/hadoop, add fairscheduler.xml

<?xml version="1.0"?>
<allocations>

  <queue name="infrastructure">
    <minResources>102400 mb, 50 vcores </minResources>
    <maxResources>153600 mb, 100 vcores </maxResources>
    <maxRunningApps>200</maxRunningApps>
    <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
    <weight>1.0</weight>
    <aclSubmitApps>root,yarn,search,hdfs</aclSubmitApps>
  </queue>

   <queue name="tool">
      <minResources>102400 mb, 30 vcores</minResources>
      <maxResources>153600 mb, 50 vcores</maxResources>
   </queue>

   <queue name="sentiment">
      <minResources>102400 mb, 30 vcores</minResources>
      <maxResources>153600 mb, 50 vcores</maxResources>
   </queue>

</allocations>

Copy these configuration files under etc/hadoop/ to the corresponding directory on the SY-0131, SY-0132, and SY-0133 nodes with scp, as sketched below.
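A sketch of that copy, run from SY-0130 (the file list is the set of files edited above; paths assumed from this setup):

[hadoop@SY-0130 hadoop-2.6.0]$ for h in SY-0131 SY-0132 SY-0133; do
>   scp etc/hadoop/{hadoop-env.sh,core-site.xml,hdfs-site.xml,mapred-site.xml,yarn-site.xml,slaves,fairscheduler.xml} \
>       hadoop@$h:/home/hadoop/labc/hadoop-2.6.0/etc/hadoop/
> done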

3. Starting Hadoop (HDFS and YARN)

Note: all commands below are run from the Hadoop installation directory.

Start the Hadoop cluster:
Step 1:
On each JournalNode (SY-0131, SY-0132, SY-0133), start the journalnode service:
sbin/hadoop-daemon.sh start journalnode

Step 2:
On [nn1] (SY-0130), format the NameNode and start it:
bin/hdfs namenode -format
sbin/hadoop-daemon.sh start namenode

Step 3:
On [nn2] (SY-0131), sync nn1's metadata:
bin/hdfs namenode -bootstrapStandby

Step 4:
Start [nn2]:
sbin/hadoop-daemon.sh start namenode

After these four steps, nn1 and nn2 are both in standby state.
Step 5:
Switch [nn1] to Active:
bin/hdfs haadmin -transitionToActive nn1

Step 6:
On [nn1], start all the DataNodes:
sbin/hadoop-daemons.sh start datanode
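The heading of this section also mentions YARN, but the original steps stop at HDFS. Based on the yarn-site.xml and mapred-site.xml above (ResourceManager on SY-0130, JobHistory Server on SY-0131), a plausible continuation would be:

Step 7:
On SY-0130, start the ResourceManager and the NodeManagers listed in slaves:
sbin/start-yarn.sh
On SY-0131, start the MapReduce JobHistory Server:
sbin/mr-jobhistory-daemon.sh start historyserver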

Stop the Hadoop cluster:
On [nn1], run:
sbin/stop-dfs.sh

 

Web UI access (ports as configured in hdfs-site.xml):

Active NameNode:   http://SY-0130:50070

Standby NameNode:  http://SY-0131:50070

DataNodes info:    the Datanodes tab on the active NameNode's web UI

4GB of RAM, splurging for once: the Hadoop journey begins!

PS:

I. Differences between HDFS commands:

1. If the Apache Hadoop version is 0.x or 1.x:

  bin/hadoop fs -mkdir /in

  bin/hadoop fs -put /home/du/input /in

 

2. If the Apache Hadoop version is 2.x:

  bin/hdfs dfs -mkdir -p /in

  bin/hdfs dfs -put /home/du/input /in

II. Sometimes the DataNode will not start, for the following reasons:

1. Re-formatting the NameNode changes the cluster ID; DataNodes that already hold data still record the old cluster ID, and this mismatch with the NameNode prevents the DataNode from starting.

After formatting DFS the first time, Hadoop was started and used; running the format command again (hdfs namenode -format) regenerates the NameNode's clusterID, while the DataNodes' clusterID stays the same.

# Compare the clusterIDs:

NameNode:

[hadoop@SY-0131 current]$ pwd

/home/hadoop/labc/hdfs/name/current

[hadoop@SY-0131 current]$ cat VERSION

#Tue Jan 06 23:39:38 PST 2015

namespaceID=313333531

clusterID=CID-c402aa07-4128-4cad-9d65-75afc5241fe1

cTime=0

storageType=NAME_NODE

blockpoolID=BP-1463638609-192.168.249.130-1420523102441

layoutVersion=-60

 

DataNode:

[hadoop@SY-0132 current]$ pwd

/home/hadoop/labc/hdfs/data/current

 

[hadoop@SY-0132 current]$ cat VERSION

#Tue Jan 06 23:41:36 PST 2015

storageID=DS-9475efc9-f890-4890-99e2-fdedaf1540c5

clusterID=CID-c402aa07-4128-4cad-9d65-75afc5241fe1

cTime=0

datanodeUuid=d3f6a297-9b79-4e17-9e67-631732f94698

storageType=DATA_NODE

layoutVersion=-56
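Given the layout above, the usual way out on a test cluster (a sketch; note it wipes the DataNode's blocks, which is acceptable on a freshly built cluster) is to either clear the DataNode data directory or copy the NameNode's clusterID into the DataNode's VERSION file, then restart the DataNode:

[hadoop@SY-0132 ~]$ rm -rf /home/hadoop/labc/hdfs/data/*
[hadoop@SY-0132 ~]$ ~/labc/hadoop-2.6.0/sbin/hadoop-daemon.sh start datanode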

2. Insufficient permissions on the data directory.
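If it is a permissions problem instead, making sure the directory belongs to the hadoop user and is not world-writable usually clears it (paths assumed from this setup):

[hadoop@SY-0132 ~]$ sudo chown -R hadoop:hadoop /home/hadoop/labc/hdfs
[hadoop@SY-0132 ~]$ chmod -R 755 /home/hadoop/labc/hdfs/data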

Unless otherwise stated, every article on this blog is original!

Reposting is allowed, but the original source, author information, and copyright notice must be indicated with a hyperlink.

Please respect the original work. Reposted from JackyKen (http://www.cnblogs.com/xiejin).

Original article: "Hadoop 2.6 Cluster Setup (HDFS HA + YARN): Even 4GB of RAM Can Splurge for Once" (http://www.cnblogs.com/xiejin/p/4208741.html)

