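Solution
The cause lies in how my Hadoop HA cluster is configured, as described below.
1. First, add the following parameter to hdfs-site.xml. Its default value is false: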
<property>
  <name>dfs.ha.automatic-failover.enabled.ns</name>
  <value>true</value>
</property>
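2. In core-site.xml, add the parameter that holds the addresses of the ZooKeeper servers (ha.zookeeper.quorum). ZKFC uses these addresses to connect to ZooKeeper.
In an HA or HDFS federation setup, the two parameters above can also be suffixed with the NameServiceID, for example dfs.ha.automatic-failover.enabled.mycluster. In my cluster the nameservice is ns, which is why the parameter in step 1 ends in .ns. Besides these two parameters, a few others affect automatic failover, such as ha.zookeeper.session-timeout.ms, but most installations do not need to set them.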
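Before moving on, it can help to confirm that both parameters really ended up in the config files. The function below is only a minimal sketch: get_hadoop_property is a hypothetical helper name, it does a plain text scan rather than real XML parsing (so commented-out properties would confuse it), and ha.zookeeper.quorum is the standard Hadoop key for the ZooKeeper address list.

```shell
#!/usr/bin/env bash
# get_hadoop_property FILE NAME
# Prints the <value> of the <property> whose <name> is exactly NAME.
# Returns non-zero if no such property is found.
# Hypothetical helper: a text scan, not an XML parser.
get_hadoop_property() {
  local file="$1" name="$2"
  # Join all lines so the lookup works whether a property is written
  # on one line or spread over several, then split on </property>.
  tr '\n' ' ' < "$file" | awk -v name="$name" '{
    n = split($0, props, "</property>")
    for (i = 1; i <= n; i++) {
      if (index(props[i], "<name>" name "</name>") &&
          match(props[i], /<value>[^<]*<\/value>/)) {
        # Strip the 7-char <value> and 8-char </value> wrappers.
        print substr(props[i], RSTART + 7, RLENGTH - 15)
        found = 1
        exit
      }
    }
  }
  END { exit !found }'
}

# Example usage (paths are assumptions based on the install directory used in this post):
#   get_hadoop_property /opt/modules/hadoop-2.6.0/etc/hadoop/hdfs-site.xml dfs.ha.automatic-failover.enabled.ns
#   get_hadoop_property /opt/modules/hadoop-2.6.0/etc/hadoop/core-site.xml ha.zookeeper.quorum
```

On a node where Hadoop is installed, bin/hdfs getconf -confKey <key> reports the value Hadoop itself resolves, which is the more reliable check; the scan above is mainly useful for inspecting a file before it is deployed.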
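After adding the configuration parameters above, the next step is to initialize the required state in ZooKeeper. You can do this by running hdfs zkfc -formatZK on either NameNode; the command creates a znode in ZooKeeper (see the transcript below).
To run it, go to the bin directory under the Hadoop installation directory, where the hdfs command lives, and execute the command there. That fixes the problem.
Note: before running it, make sure the ZooKeeper process has been started on every machine.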
[kfk@bigdata-pro01 bin]$ pwd
/opt/modules/hadoop-2.6.0/bin
[kfk@bigdata-pro01 bin]$ ./hdfs zkfc -formatZK
18/06/16 10:44:28 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=bigdata-pro01.kfk.com:2181,bigdata-pro02.kfk.com:2181,bigdata-pro03.kfk.com:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@20deea7f
18/06/16 10:44:28 INFO zookeeper.ClientCnxn: Opening socket connection to server bigdata-pro01.kfk.com/192.168.80.151:2181. Will not attempt to authenticate using SASL (unknown error)
18/06/16 10:44:28 INFO zookeeper.ClientCnxn: Socket connection established to bigdata-pro01.kfk.com/192.168.80.151:2181, initiating session
18/06/16 10:44:28 INFO zookeeper.ClientCnxn: Session establishment complete on server bigdata-pro01.kfk.com/192.168.80.151:2181, sessionid = 0x164065bc2a90001, negotiated timeout = 5000
===============================================
The configured parent znode /hadoop-ha/ns already exists.
Are you sure you want to clear all failover information from ZooKeeper?
WARNING: Before proceeding, ensure that all HDFS services and failover controllers are stopped!
===============================================
Proceed formatting /hadoop-ha/ns? (Y or N) 18/06/16 10:44:28 INFO ha.ActiveStandbyElector: Session connected.
y
18/06/16 10:44:57 INFO ha.ActiveStandbyElector: Recursively deleting /hadoop-ha/ns from ZK...
18/06/16 10:44:57 INFO ha.ActiveStandbyElector: Successfully deleted /hadoop-ha/ns from ZK.
18/06/16 10:44:57 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/ns in ZK.
18/06/16 10:44:57 INFO zookeeper.ClientCnxn: EventThread shut down
18/06/16 10:44:57 INFO zookeeper.ZooKeeper: Session: 0x164065bc2a90001 closed
[kfk@bigdata-pro01 bin]$
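The note about having ZooKeeper running on every machine before formatting can be checked up front. The sketch below takes a connect string in the same host:port,host:port form that appears in the log above and tries to open a TCP connection to each server. The function names are hypothetical, the fallback port 2181 is an assumption, and it relies on bash's /dev/tcp feature plus the coreutils timeout command. An open port only shows that something is listening, not that the ensemble is healthy.

```shell
#!/usr/bin/env bash
# split_quorum "host1:2181,host2:2181" prints one "host port" pair per line.
split_quorum() {
  local quorum="$1" entry
  local IFS=','
  for entry in $quorum; do
    case "$entry" in
      *:*) printf '%s %s\n' "${entry%%:*}" "${entry##*:}" ;;
      # No explicit port: fall back to 2181 (assumed default client port).
      *)   printf '%s %s\n' "$entry" 2181 ;;
    esac
  done
}

# zk_port_open HOST PORT succeeds if a TCP connection can be opened within 3 s.
zk_port_open() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# check_quorum QUORUM prints OK/DOWN per server; returns 1 if any is unreachable.
check_quorum() {
  local host port status=0
  while read -r host port; do
    if zk_port_open "$host" "$port"; then
      echo "OK   $host:$port"
    else
      echo "DOWN $host:$port"
      status=1
    fi
  done <<EOF
$(split_quorum "$1")
EOF
  return "$status"
}

# Example usage, with the connect string from the log above:
#   check_quorum "bigdata-pro01.kfk.com:2181,bigdata-pro02.kfk.com:2181,bigdata-pro03.kfk.com:2181" \
#     && ./hdfs zkfc -formatZK
```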
Start and test
1. First stop the Hadoop and ZooKeeper processes.
2. Start the ZooKeeper process.
3. Start the zkfc process.
[kfk@bigdata-pro01 hadoop-2.6.0]$ pwd
/opt/modules/hadoop-2.6.0
[kfk@bigdata-pro01 hadoop-2.6.0]$ sbin/hadoop-daemon.sh start zkfc
starting zkfc, logging to /opt/modules/hadoop-2.6.0/logs/hadoop-kfk-zkfc-bigdata-pro01.kfk.com.out
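4. Go to the sbin directory under the Hadoop installation directory and run start-dfs.sh, which starts HDFS including the NameNode. Run it on the node that you configured as the primary NameNode.
Alternatively, just run sbin/start-all.sh (Hadoop 2.x marks this script as deprecated in favor of start-dfs.sh and start-yarn.sh, but it still works).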
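Steps 1 to 4 can be wrapped in a small script so the order is not forgotten. This is only a sketch for one host: restart_ha_local and run are hypothetical names, ZK_HOME is a made-up path (this post never says where ZooKeeper is installed), and HADOOP_HOME defaults to the directory seen in the transcripts. ZooKeeper still has to be started on every machine, and zkfc on each NameNode host. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
HADOOP_HOME="${HADOOP_HOME:-/opt/modules/hadoop-2.6.0}"  # path from the transcripts
ZK_HOME="${ZK_HOME:-/opt/modules/zookeeper}"             # hypothetical path, adjust

# run CMD [ARGS...]: execute the command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# restart_ha_local: follow the four steps on the local host only.
restart_ha_local() {
  # Step 1: stop HDFS and the local ZooKeeper server; "not running" is not an error here.
  run "$HADOOP_HOME/sbin/stop-dfs.sh" || true
  run "$ZK_HOME/bin/zkServer.sh" stop || true
  # Step 2: start ZooKeeper again (repeat on every ZooKeeper machine).
  run "$ZK_HOME/bin/zkServer.sh" start || return 1
  # Step 3: start the ZK Failover Controller on this NameNode host.
  run "$HADOOP_HOME/sbin/hadoop-daemon.sh" start zkfc || return 1
  # Step 4: start HDFS.
  run "$HADOOP_HOME/sbin/start-dfs.sh"
}

# Example usage:
#   DRY_RUN=1 restart_ha_local   # preview the commands
#   restart_ha_local             # actually run them
```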
[kfk@bigdata-pro02 hadoop-2.6.0]$ bin/hdfs -help
Usage: hdfs [--config confdir] COMMAND
       where COMMAND is one of:
  dfs                  run a filesystem command on the file systems supported in Hadoop.
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  journalnode          run the DFS journalnode
  zkfc                 run the ZK Failover Controller daemon
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  haadmin              run a DFS HA admin client
  fsck                 run a DFS filesystem checking utility
  balancer             run a cluster balancing utility
  jmxget               get JMX exported values from NameNode or DataNode.
  mover                run a utility to move block replicas across storage types
  oiv                  apply the offline fsimage viewer to an fsimage
  oiv_legacy           apply the offline fsimage viewer to an legacy fsimage
  oev                  apply the offline edits viewer to an edits file
  fetchdt              fetch a delegation token from the NameNode
  getconf              get config values from configuration
  groups               get the groups which users belong to
  snapshotDiff         diff two snapshots of a directory or diff the current directory contents with a snapshot
  lsSnapshottableDir   list all snapshottable dirs owned by the current user
                       Use -help to see options
  portmap              run a portmap service
  nfs3                 run an NFS version 3 gateway
  cacheadmin           configure the HDFS cache
  crypto               configure HDFS encryption zones
  storagepolicies      get all the existing block storage policies
  version              print the version

Most commands print help when invoked w/o parameters.
[kfk@bigdata-pro02 hadoop-2.6.0]$
[kfk@bigdata-pro02 hadoop-2.6.0]$ bin/hdfs haadmin -help
Usage: DFSHAAdmin [-ns <nameserviceId>]
    [-transitionToActive <serviceId> [--forceactive]]
    [-transitionToStandby <serviceId>]
    [-failover [--forcefence] [--forceactive] <serviceId> <serviceId>]
    [-getServiceState <serviceId>]
    [-checkHealth <serviceId>]
    [-help <command>]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

[kfk@bigdata-pro02 hadoop-2.6.0]$
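Note: the built-in commands shown above already cover what to run when both NameNodes are in standby state and what to run when both are active, so I won't go into detail here.
If the problem still isn't resolved, run: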
bin/hdfs haadmin -transitionToActive nn1
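Before forcing a transition, it is worth looking at what state each NameNode reports. The sketch below uses the -getServiceState option listed in the haadmin help above and classifies the pair. nn1 comes from the command above; nn2 is an assumed ID for the second NameNode, and the function names and HDFS_BIN variable are hypothetical. Also keep in mind that, as far as I know, haadmin may refuse a manual -transitionToActive while automatic failover is enabled, so treat the manual command as a last resort.

```shell
#!/usr/bin/env bash
# get_state SERVICE_ID prints "active" or "standby" as reported by haadmin.
# HDFS_BIN is a hypothetical override; the default matches the commands in this post.
get_state() {
  "${HDFS_BIN:-bin/hdfs}" haadmin -getServiceState "$1" 2>/dev/null
}

# diagnose_ha STATE1 STATE2 classifies the combination of the two states.
diagnose_ha() {
  case "$1/$2" in
    active/standby|standby/active) echo "healthy" ;;
    standby/standby)               echo "no-active" ;;
    active/active)                 echo "split-brain" ;;
    *)                             echo "unknown" ;;
  esac
}

# check_ha ID1 ID2 queries both NameNodes and prints a one-line summary.
check_ha() {
  local s1 s2
  s1=$(get_state "$1") || s1=unreachable
  s2=$(get_state "$2") || s2=unreachable
  echo "$1=$s1 $2=$s2 -> $(diagnose_ha "$s1" "$s2")"
}

# Example usage, run from the Hadoop installation directory
# (nn2 is an assumed service ID):
#   check_ha nn1 nn2
```

A result of no-active is the case where the transitionToActive command above applies; split-brain or unknown calls for checking the zkfc and NameNode logs first.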