We know that the NameNode in Hadoop 1.x had two major problems: (1) the NameNode memory bottleneck, and (2) the NameNode single point of failure. Hadoop 2.x addresses both. Problem 1, the memory bottleneck, is solved by scaling out the NameNode (HDFS Federation). Problem 2, the single point of failure, is solved with an HA setup. The Apache Hadoop website documents two HA solutions: HDFS High Availability Using the Quorum Journal Manager, and High Availability with NFS. This article uses the Quorum Journal Manager approach, and additionally enables automatic failover between the NameNodes, which requires a ZooKeeper cluster. Below is a detailed walkthrough of QJM-based HDFS HA with automatic failover via ZooKeeper.

Before starting, here is my cluster layout: 2 NameNodes (hadoop1, hadoop5) and 3 DataNodes (hadoop2, hadoop3, hadoop4).

| IP address   | Hostname | NameNode | JournalNode | DataNode | ZooKeeper |
|--------------|----------|----------|-------------|----------|-----------|
| 192.168.1.21 | hadoop1  | yes      | yes         | no       | yes       |
| 192.168.1.22 | hadoop2  | no       | yes         | yes      | yes       |
| 192.168.1.23 | hadoop3  | no       | yes         | yes      | yes       |
| 192.168.1.24 | hadoop4  | no       | yes         | yes      | yes       |
| 192.168.1.25 | hadoop5  | yes      | yes         | no       | yes       |

1. Install the ZooKeeper cluster

For the installation, see this other article: http://www.cnblogs.com/ljy2013/p/4510143.html . It covers the ZooKeeper installation in detail.

2. With the ZooKeeper cluster in place, deploy your own Hadoop 2.x cluster.

I deployed Hadoop 2.6.0; the deployment method is described in http://www.cnblogs.com/ljy2013/articles/4345172.html , which explains in detail how to install and deploy a basic Hadoop cluster.

3. A note on the JournalNode: it is a lightweight process and can run on the same machines as the rest of the Hadoop cluster; deploying it only requires adding the corresponding Hadoop configuration parameters.

4. Modify the Hadoop cluster configuration files. Quite a few files and even more parameters need to change here, and they are important.

(1) core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop1:2181,hadoop2:2181,hadoop3:2181,hadoop4:2181,hadoop5:2181</value>
  </property>
  <!--
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/grid/.ssh/id_rsa_nn</value>
  </property>
  -->
  <property>
    <name>ha.zookeeper.session-timeout.ms</name>
    <value>60000</value>
  </property>
  <property>
    <name>ha.failover-controller.cli-check.rpc-timeout.ms</name>
    <value>60000</value>
  </property>
  <property>
    <name>ipc.client.connect.timeout</name>
    <value>20000</value>
  </property>
</configuration>
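Since fs.defaultFS now points at the logical nameservice and ha.zookeeper.quorum lists all five ZooKeeper servers, it is worth confirming the quorum is actually reachable before going further. A minimal sketch, assuming zkServer.sh is on the PATH and nc is installed (the ruok four-letter-word command is enabled by default in the ZooKeeper 3.4.x line typically used with Hadoop 2.6):

# A healthy ZooKeeper server answers "imok" (some nc builds need "-q 1" to exit)
for h in hadoop1 hadoop2 hadoop3 hadoop4 hadoop5; do
    echo "$h: $(echo ruok | nc "$h" 2181)"
done

# Run on each node; additionally reports whether that server is leader or follower
zkServer.sh status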
(2) hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>hadoop1:9000</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>hadoop5:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>hadoop1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>hadoop5:50070</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-address.mycluster.nn1</name>
    <value>hadoop1:53310</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-address.mycluster.nn2</name>
    <value>hadoop5:53310</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485;hadoop4:8485;hadoop5:8485/mycluster</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/grid/hadoop-2.6.0/journal/data</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/grid/hadoop-2.6.0/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/grid/hadoop-2.6.0/dfs/data</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.permissions.enable</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.image.transfer.bandwidthPerSec</name>
    <value>1048576</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>shell(/bin/true)</value>
  </property>
  <!--
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/grid/.ssh/id_rsa_nn</value>
  </property>
  -->
</configuration>

Two points about this file are worth explaining.

First, the official hdfs-default.xml does not list the dfs.ha.fencing.methods parameter; it only appears in core-default.xml, which suggests it should be set through core-site.xml. In practice, however, it has to be set in hdfs-site.xml; otherwise `hadoop-daemon.sh start zkfc` fails and the DFSZKFailoverController process never starts.

Second, the official documentation configures fencing with the two parameters below, which control how, when a failure occurs, the failover controller logs in to the other NameNode to take over. With these settings my cluster reported errors: the connection could not be made, i.e. the connection to the other NameNode was refused.

<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/grid/.ssh/id_rsa_nn</value>
</property>

So I switched to a different value:

<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/bin/true)</value>
</property>
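If you would rather keep sshfence, the usual cause of the "connection refused" error is a wrong key, user, or sshd port, and it is easy to reproduce by hand what sshfence tries to do. This is only a sketch, assuming the key path from the commented-out configuration above, the default sshd port, and the grid user implied by the /home/grid paths:

# Run on each NameNode host against the other one (hadoop1 -> hadoop5 and back).
# sshfence needs a passphrase-less key for the user that runs the NameNode.
ssh -i /home/grid/.ssh/id_rsa_nn -o StrictHostKeyChecking=no grid@hadoop5 "echo fencing login ok"

# If this prompts for a password or is refused, sshfence will fail the same way;
# fix the key/sshd setup or keep the shell(/bin/true) fallback above.

Keep in mind that shell(/bin/true) does not actually fence anything; it simply reports success. With QJM this is usually tolerable because the JournalNodes accept edits from only one writer at a time, but a NameNode that was not truly fenced could still briefly serve stale reads.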
At this point the HDFS HA configuration is complete. The corresponding yarn-site.xml and mapred-site.xml can be configured as in http://www.cnblogs.com/ljy2013/articles/4345172.html . I also set up a hot standby for the ResourceManager, so my files look like the following.

(3) yarn-site.xml. This file needs a small per-machine adjustment; see the comment on yarn.resourcemanager.ha.id below.

<?xml version="1.0"?>
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.connect.retry-interval.ms</name>
    <value>60000</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>rm-cluster</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <!-- This is the only parameter that differs between the two ResourceManager nodes:
       on the standby machine set it to rm2; everything else in yarn-site.xml is identical. -->
  <property>
    <name>yarn.resourcemanager.ha.id</name>
    <value>rm1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>hadoop1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>hadoop5</value>
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>hadoop1:2181,hadoop2:2181,hadoop3:2181,hadoop4:2181,hadoop5:2181</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address.rm1</name>
    <value>${yarn.resourcemanager.hostname.rm1}:23140</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm1</name>
    <value>${yarn.resourcemanager.hostname.rm1}:23130</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.https.address.rm1</name>
    <value>${yarn.resourcemanager.hostname.rm1}:23189</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>${yarn.resourcemanager.hostname.rm1}:23188</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
    <value>${yarn.resourcemanager.hostname.rm1}:23125</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address.rm1</name>
    <value>${yarn.resourcemanager.hostname.rm1}:23141</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>${yarn.resourcemanager.hostname.rm2}:23140</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm2</name>
    <value>${yarn.resourcemanager.hostname.rm2}:23130</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.https.address.rm2</name>
    <value>${yarn.resourcemanager.hostname.rm2}:23189</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>${yarn.resourcemanager.hostname.rm2}:23188</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
    <value>${yarn.resourcemanager.hostname.rm2}:23125</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address.rm2</name>
    <value>${yarn.resourcemanager.hostname.rm2}:23141</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/home/hadoop/logs/yarn_local</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/home/hadoop/logs/yarn_log</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/home/hadoop/logs/yarn_remotelog</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4.2</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>2</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
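Once everything is started (see the startup steps in section 5 below), ResourceManager HA can be checked the same way as the NameNode pair. A small sketch; the note about starting the second ResourceManager reflects how the 2.6-era scripts behave in my understanding, so verify it against your own scripts:

# Check which ResourceManager is active and which is standby
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2

# start-yarn.sh normally brings up only the ResourceManager on the machine where it runs,
# so the second ResourceManager (rm2 on hadoop5) usually has to be started there by hand:
yarn-daemon.sh start resourcemanager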
(4) mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop1:10020,hadoop5:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop1:19888,hadoop5:19888</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/tmp/hadoop-yarn/staging</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>${yarn.app.mapreduce.am.staging-dir}/history/done</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate</value>
  </property>
  <property>
    <name>mapreduce.task.io.sort.factor</name>
    <value>100</value>
  </property>
  <property>
    <name>mapreduce.reduce.shuffle.parallelcopies</name>
    <value>10</value>
  </property>
</configuration>

OK! That is all of the configuration changes. Next, how to start everything up.

5. Startup

(1) Start the ZooKeeper cluster first.

Since I have five nodes, I use all of them as ZooKeeper servers, so run the following on every node:

zkServer.sh start

Once every node has started the ZooKeeper service, the ZooKeeper cluster is up.

(2) Initialize the HA state in ZooKeeper (this is often described as "formatting ZooKeeper"; it creates the znode the failover controllers use):

hdfs zkfc -formatZK

(3) Start the JournalNode processes. The first time through, this must happen in this order, otherwise HDFS cannot be formatted in the next step.

Again, I use all of my nodes as JournalNodes, so run the following on every node:

hadoop-daemon.sh start journalnode

(4) Format HDFS. The first format requires the JournalNode processes from the previous step to be running, and it is executed on one of the NameNode nodes; I chose hadoop1:

hdfs namenode -format mycluster

(5) Start the NameNode that was formatted in step (4), i.e. run this on that same node:

hadoop-daemon.sh start namenode

(6) On the other NameNode node (hadoop5 in this article), run the following two commands in order to bring up its NameNode process:

hdfs namenode -bootstrapStandby
hadoop-daemon.sh start namenode

(7) On one of the NameNode nodes, start all remaining processes with:

start-dfs.sh
start-yarn.sh

(8) Once everything is up, check whether each NameNode is standby or active:

hdfs haadmin -getServiceState nn1
standby
hdfs haadmin -getServiceState nn2
active

Here nn1 and nn2 are the IDs set in the configuration above: nn1 corresponds to hadoop1 and nn2 to hadoop5.
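It can also be reassuring to see the ZooKeeper side of the election. The ZKFC daemons keep their state under the /hadoop-ha znode (the default ha.zookeeper.parent-znode); the znode names below are what Hadoop 2.6 creates by default, so treat this as a sketch rather than exact output:

# Open a ZooKeeper shell against any server in the quorum
zkCli.sh -server hadoop1:2181

# Inside the ZooKeeper shell:
ls /hadoop-ha
ls /hadoop-ha/mycluster                       # expect ActiveBreadCrumb and ActiveStandbyElectorLock
get /hadoop-ha/mycluster/ActiveBreadCrumb     # records which NameNode currently holds the active role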
6. Verify automatic failover by killing the active NameNode and checking that the standby takes over.

(1) Using step 5(8) above, find the NameNode whose state is active, then list the processes on that node with jps and note the NameNode process ID.

(2) On the active NameNode node, run kill -9 7048 (7048 is the NameNode process ID from my jps output; yours will differ). Simply rebooting that node works for this test as well.

(3) On the standby NameNode node, watch its state change:

hdfs haadmin -getServiceState nn1

You can see its state switch from standby to active.

7. Check HDFS health by uploading a file

Run: hadoop fs -put /hadoop-2.6.0/etc/hadoop/hdfs-site.xml /

The uploaded hdfs-site.xml can then be viewed through the web UI.

8. Test whether, when the NameNode dies while a job is running, failover happens automatically and the job still completes.

I prepared a file of about 2 GB, a movie zr.mp4 (2.13 GB), to upload to HDFS, and killed the active NameNode during the upload to see the final result.

On the standby NameNode node I ran: hadoop fs -put zr.mp4 /
While it was running, on the active NameNode node I ran: kill -9 7048 (again, 7048 is the NameNode process ID).

The upload ran to completion and zr.mp4 ended up in HDFS despite the active NameNode being killed partway through. With that, the HDFS HA setup is complete and ready for real use.
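As an extra check of step 7, the uploaded file can also be read back from the command line or over WebHDFS, which was enabled with dfs.webhdfs.enabled above. A minimal sketch; read requests must go to whichever NameNode is currently active, so swap hadoop1 for hadoop5 if the roles are reversed:

# List the HDFS root and print the uploaded config file from any cluster node
hadoop fs -ls /
hadoop fs -cat /hdfs-site.xml

# The same file via the WebHDFS REST API; -L follows the redirect to a DataNode
curl -L "http://hadoop1:50070/webhdfs/v1/hdfs-site.xml?op=OPEN"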