Hadoop HA on YARN: Cluster Configuration


Cluster Setup

Because the number of servers is limited, each server here runs quite a few processes:

Hostname     Installed software    Running processes
hadoop001    Hadoop, Zookeeper     NameNode, DFSZKFailoverController, ResourceManager,
                                   DataNode, NodeManager, QuorumPeerMain, JournalNode
hadoop002    Hadoop, Zookeeper     NameNode, DFSZKFailoverController, ResourceManager,
                                   DataNode, NodeManager, QuorumPeerMain, JournalNode
hadoop003    Hadoop, Zookeeper     DataNode, NodeManager, QuorumPeerMain, JournalNode

(hadoop003 must also run a JournalNode, since dfs.namenode.shared.edits.dir below lists hadoop003:8485.)
Notes [2]:

In Hadoop 2.x, HDFS HA usually consists of two NameNodes, one in the active state and one in the standby state. The active NameNode serves all client requests; the standby serves none, and only synchronizes the active NameNode's state so that it can take over quickly if the active fails.

Hadoop 2.0 officially offers two HDFS HA solutions: one based on NFS and one based on QJM (proposed by Cloudera; similar in principle to ZooKeeper). I use QJM here. The active and standby NameNodes synchronize metadata through a group of JournalNodes; an edit is considered committed once it has been written to a majority of the JournalNodes, which is why an odd number of JournalNodes is usually configured.

The installation of the JDK, Hadoop and Zookeeper, and the environment-variable setup, are omitted here.
 

Passwordless Login

Pay close attention to the passwordless-login configuration:

ssh-keygen -t rsa

This generates two files, id_rsa and id_rsa.pub, in the ~/.ssh/ directory.

To log in from hadoop001 to hadoop002 without a password, run the following on hadoop001:

ssh-copy-id -i ~/.ssh/id_rsa.pub [username]@hadoop002

To allow passwordless login between any two machines, run the command above twice more on hadoop001 (changing the hostname after the @ to hadoop001 and hadoop003), and finally copy the resulting authorized_keys to every node, as scripted below.
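The whole key setup can be scripted. The sketch below is a minimal version run from hadoop001; the user name "hadoop" is a hypothetical placeholder for [username], and it assumes the same account exists on all three hosts:

#!/bin/bash
# Minimal sketch: set up and verify passwordless SSH from hadoop001.
USER_NAME=hadoop   # hypothetical placeholder; substitute your own [username]

# Generate the RSA key pair once (skipped if it already exists).
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Push the public key to every node, including hadoop001 itself.
for host in hadoop001 hadoop002 hadoop003; do
  ssh-copy-id -i ~/.ssh/id_rsa.pub "${USER_NAME}@${host}"
done

# Verify: each command should print the remote hostname without a password prompt.
for host in hadoop001 hadoop002 hadoop003; do
  ssh "${USER_NAME}@${host}" hostname
done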

 

Hadoop Configuration

core-site.xml

<configuration>
  <!-- Default filesystem: the HDFS nameservice defined in hdfs-site.xml -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://appcluster</value>
  </property>

  <!-- Hadoop temporary directory -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/storage/tmp</value>
  </property>

  <!-- ZooKeeper quorum addresses -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
  </property>

  <!-- ZKFC session timeout with ZooKeeper, in milliseconds -->
  <property>
    <name>ha.zookeeper.session-timeout.ms</name>
    <value>2000</value>
  </property>
</configuration>
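A quick way to confirm that clients actually pick these values up is the stock hdfs getconf tool:

hdfs getconf -confKey fs.defaultFS          # expect: hdfs://appcluster
hdfs getconf -confKey ha.zookeeper.quorum   # expect: the three ZooKeeper addresses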

 

hdfs-site.xml

<configuration>
  <!-- Where the NameNode stores its namespace (fsimage) -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/hadoop/storage/hdfs/name</value>
  </property>

  <!-- Where DataNodes store block data -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///data/hadoop/storage/hdfs/data</value>
  </property>

  <!-- Number of block replicas -->
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>

  <!-- The HDFS nameservice is appcluster; must match fs.defaultFS in core-site.xml -->
  <property>
    <name>dfs.nameservices</name>
    <value>appcluster</value>
  </property>

  <!-- appcluster has two NameNodes: nn1 and nn2 -->
  <property>
    <name>dfs.ha.namenodes.appcluster</name>
    <value>nn1,nn2</value>
  </property>

  <!-- RPC address of nn1 -->
  <property>
    <name>dfs.namenode.rpc-address.appcluster.nn1</name>
    <value>hadoop001:8020</value>
  </property>

  <!-- RPC address of nn2 -->
  <property>
    <name>dfs.namenode.rpc-address.appcluster.nn2</name>
    <value>hadoop002:8020</value>
  </property>

  <!-- HTTP address of nn1 -->
  <property>
    <name>dfs.namenode.http-address.appcluster.nn1</name>
    <value>hadoop001:50070</value>
  </property>

  <!-- HTTP address of nn2 -->
  <property>
    <name>dfs.namenode.http-address.appcluster.nn2</name>
    <value>hadoop002:50070</value>
  </property>

  <!-- Where the NameNodes' shared edit log lives on the JournalNodes -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop001:8485;hadoop002:8485;hadoop003:8485/appcluster</value>
  </property>

  <!-- Enable automatic failover for the appcluster nameservice -->
  <property>
    <name>dfs.ha.automatic-failover.enabled.appcluster</name>
    <value>true</value>
  </property>

  <!-- How clients determine which NameNode is active -->
  <property>
    <name>dfs.client.failover.proxy.provider.appcluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>

  <!-- Fencing method -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>

  <!-- sshfence requires passwordless SSH; point it at the private key -->
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/[username]/.ssh/id_rsa</value>
  </property>

  <!-- Local directory where each JournalNode stores its edits -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/data/hadoop/tmp/journal</value>
  </property>
</configuration>
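The configuration alone does not bring HA up: on first start the cluster has to be initialized in a specific order. The commands below are a sketch of the standard Hadoop 2.x first-start procedure for a QJM setup like this one (all are stock Hadoop/ZooKeeper scripts; the host to run each step on is given in the comments):

# 0. Start ZooKeeper on all three nodes:
zkServer.sh start

# 1. Start a JournalNode on hadoop001, hadoop002 and hadoop003:
hadoop-daemon.sh start journalnode

# 2. Format HDFS (on hadoop001 only) and start the first NameNode:
hdfs namenode -format
hadoop-daemon.sh start namenode

# 3. On hadoop002, copy the formatted metadata over and start the standby:
hdfs namenode -bootstrapStandby
hadoop-daemon.sh start namenode

# 4. Initialize the failover znode in ZooKeeper (on either NameNode host):
hdfs zkfc -formatZK

# 5. Bring up the remaining daemons (DataNodes, ZKFCs) from hadoop001:
start-dfs.sh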

 

mapred-site.xml

<configuration>
  <!-- Run MapReduce on YARN -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>

  <!-- MapReduce JobHistory Server address; the default port is 10020 -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>0.0.0.0:10020</value>
  </property>

  <!-- MapReduce JobHistory Server web UI address; the default port is 19888 -->
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>0.0.0.0:19888</value>
  </property>
</configuration>
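Note that start-yarn.sh does not launch the JobHistory server configured above; with the stock Hadoop 2.x sbin scripts it is started separately:

mr-jobhistory-daemon.sh start historyserver
# The web UI should then answer at http://<host>:19888/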

 

yarn-site.xml

<?xml version="1.0"?>
<configuration>
  <!-- Retry interval for reconnecting after losing contact with the RM -->
  <property>
    <name>yarn.resourcemanager.connect.retry-interval.ms</name>
    <value>2000</value>
  </property>

  <!-- Enable ResourceManager HA (default: false) -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>

  <!-- Logical IDs of the ResourceManagers -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>

  <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
  </property>

  <!-- Enable automatic failover -->
  <property>
    <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>

  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>hadoop001</value>
  </property>

  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>hadoop002</value>
  </property>

  <!--
    Set this to rm1 on hadoop001 and to rm2 on hadoop002.
    Note: it is tempting to copy the finished file to the other machines
    verbatim, but this value MUST be changed on the other YARN machine.
  -->
  <property>
    <name>yarn.resourcemanager.ha.id</name>
    <value>rm1</value>
    <description>If we want to launch more than one RM in a single node, we need this configuration</description>
  </property>

  <!-- Enable RM state recovery -->
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>

  <!-- ZooKeeper connection address for the RM state store -->
  <property>
    <name>yarn.resourcemanager.zk-state-store.address</name>
    <value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
  </property>

  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>

  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
  </property>

  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>appcluster-yarn</value>
  </property>

  <!-- How long the ApplicationMaster waits between attempts to reconnect to the scheduler -->
  <property>
    <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
    <value>5000</value>
  </property>

  <!-- Addresses for rm1 -->
  <property>
    <name>yarn.resourcemanager.address.rm1</name>
    <value>hadoop001:8032</value>
  </property>

  <property>
    <name>yarn.resourcemanager.scheduler.address.rm1</name>
    <value>hadoop001:8030</value>
  </property>

  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>hadoop001:8088</value>
  </property>

  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
    <value>hadoop001:8031</value>
  </property>

  <property>
    <name>yarn.resourcemanager.admin.address.rm1</name>
    <value>hadoop001:8033</value>
  </property>

  <property>
    <name>yarn.resourcemanager.ha.admin.address.rm1</name>
    <value>hadoop001:23142</value>
  </property>

  <!-- Addresses for rm2 -->
  <property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>hadoop002:8032</value>
  </property>

  <property>
    <name>yarn.resourcemanager.scheduler.address.rm2</name>
    <value>hadoop002:8030</value>
  </property>

  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>hadoop002:8088</value>
  </property>

  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
    <value>hadoop002:8031</value>
  </property>

  <property>
    <name>yarn.resourcemanager.admin.address.rm2</name>
    <value>hadoop002:8033</value>
  </property>

  <property>
    <name>yarn.resourcemanager.ha.admin.address.rm2</name>
    <value>hadoop002:23142</value>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>

  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/data/hadoop/yarn/local</value>
  </property>

  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/data/hadoop/yarn/log</value>
  </property>

  <property>
    <name>mapreduce.shuffle.port</name>
    <value>23080</value>
  </property>

  <!-- Client failover proxy provider -->
  <property>
    <name>yarn.client.failover-proxy-provider</name>
    <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
  </property>

  <property>
    <name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>
    <value>/yarn-leader-election</value>
    <description>Optional setting. The default value is /yarn-leader-election</description>
  </property>
</configuration>
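After distributing this file, remember to change yarn.resourcemanager.ha.id to rm2 on hadoop002. The lines below are a sketch of the copy-and-patch step plus the standard HA check; they assume the config lives in $HADOOP_CONF_DIR, reuse the [username] placeholder from earlier, and rely on rm1 appearing as a full <value> only in the ha.id property (true for the file above):

# On hadoop001: ship the file, then patch the RM id on hadoop002.
scp "$HADOOP_CONF_DIR/yarn-site.xml" [username]@hadoop002:"$HADOOP_CONF_DIR/"
ssh [username]@hadoop002 \
  "sed -i 's#<value>rm1</value>#<value>rm2</value>#' $HADOOP_CONF_DIR/yarn-site.xml"

# start-yarn.sh only starts the local RM; start the second one on hadoop002 with
# yarn-daemon.sh start resourcemanager, then check which RM is active:
yarn rmadmin -getServiceState rm1   # expect: active (or standby)
yarn rmadmin -getServiceState rm2   # expect: the other state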

 

 

hadoop-env.sh & mapred-env.sh & yarn-env.sh

export JAVA_HOME=/usr/java/jdk1.7.0_60 
export CLASS_PATH=$JAVA_HOME/lib:$JAVA_HOME/jre/lib 
  
export HADOOP_HOME=/data/hadoop-2.6.0
export HADOOP_PID_DIR=/data/hadoop/pids 
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native 
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"
  
export HADOOP_PREFIX=$HADOOP_HOME 
  
export HADOOP_MAPRED_HOME=$HADOOP_HOME 
export HADOOP_COMMON_HOME=$HADOOP_HOME 
export HADOOP_HDFS_HOME=$HADOOP_HOME 
export YARN_HOME=$HADOOP_HOME 
  
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop 
export HDFS_CONF_DIR=$HADOOP_HOME/etc/hadoop 
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop 
  
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native 
  
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
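Since java.library.path points at the bundled native libraries, it is worth verifying that they actually load; hadoop checknative is the stock tool for this:

hadoop checknative -a   # each library (hadoop, zlib, ...) should report true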

  

References

[1] hdfs-site.xml: http://www.21ops.com/front-tech/10744.html

[2] yarn-site.xml: http://www.aboutyun.com/thread-10572-1-1.html (the comment thread there is also worth reading)

After configuring from these two references alone, startup failed with:

15/07/17 13:58:55 FATAL ha.ZKFailoverController: Automatic failover is not enabled for NameNode at hadoop001/**.**.**.**:8020. Please ensure that automatic failover is enabled in the configuration before running the ZK failover controller.

Then, following

[3] http://www.cnblogs.com/meiyuanbao/p/3545929.html (which stops short of YARN HA),

I found that the following property had to be added to hdfs-site.xml:

<property>
  <name>dfs.ha.automatic-failover.enabled.appcluster</name>
  <value>true</value>
</property>
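With that property added, the ZKFC starts cleanly. To confirm that automatic failover really works, query both NameNodes with the standard hdfs haadmin tool and then kill the active one:

hdfs haadmin -getServiceState nn1   # expect: active
hdfs haadmin -getServiceState nn2   # expect: standby
# Now kill the NameNode process on the active host; within the ZooKeeper
# session timeout the standby should be promoted:
hdfs haadmin -getServiceState nn2   # expect: active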

 

