The Most Detailed Fix for a Hadoop HA Cluster Where Both NameNodes Are Standby After Startup (Illustrated Walkthrough)

This article is a repost. 2019-08-14 15:09, tagged: hadoop

Solution

Here is how I fixed the problem on my Hadoop HA cluster, where both NameNodes came up as standby.

1. First, add the following parameter to hdfs-site.xml. Its default value is false:

<property>
    <name>dfs.ha.automatic-failover.enabled.ns</name>
    <value>true</value>
</property>

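2. In core-site.xml, add the ha.zookeeper.quorum parameter. Its value is the list of ZooKeeper server addresses that ZKFC connects to. On this cluster it corresponds to the connectString that ZKFC logs further down:

<property>
    <name>ha.zookeeper.quorum</name>
    <value>bigdata-pro01.kfk.com:2181,bigdata-pro02.kfk.com:2181,bigdata-pro03.kfk.com:2181</value>
</property>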

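In an HA or HDFS federation setup, both parameters can also be set per nameservice by appending the NameServiceID as a suffix, for example dfs.ha.automatic-failover.enabled.mycluster. This cluster's nameservice is ns, hence the .ns suffix in step 1. A few other parameters also affect automatic failover, such as ha.zookeeper.session-timeout.ms, but most installations do not need to set them.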
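To double-check that both settings are actually picked up, one option is to query the effective configuration with hdfs getconf (it appears in the hdfs help output further down). This is a minimal sketch: the HDFS variable and the show_ha_conf function name are illustrative assumptions, and the keys to pass depend on your own nameservice ID.

```shell
# Assumption: HDFS points at the hdfs launcher; override it for your install.
HDFS="${HDFS:-bin/hdfs}"

# Print the effective value of each configuration key given as an argument.
# A non-zero exit from getconf is treated here as "key not set".
show_ha_conf() {
  local key value
  for key in "$@"; do
    if value="$("$HDFS" getconf -confKey "$key" 2>/dev/null)"; then
      printf '%s = %s\n' "$key" "$value"
    else
      printf '%s = (not set)\n' "$key"
    fi
  done
}

# Example call for this cluster (nameservice "ns"):
#   show_ha_conf dfs.ha.automatic-failover.enabled.ns ha.zookeeper.quorum
```

If the first key does not come back as true, or the quorum is reported as not set, ZKFC has nothing to work with and neither NameNode will be elected active.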

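After adding these parameters, the next step is to initialize the required state in ZooKeeper. Run hdfs zkfc -formatZK on either NameNode. The command creates the znode in ZooKeeper that automatic failover relies on.

The hdfs command lives in the bin directory under the Hadoop installation directory. Change into that directory and run the command there, which should fix the problem.

Note: before running it, make sure the ZooKeeper process is running on every machine. As the command's own warning says, all HDFS services and failover controllers should be stopped before you confirm the format.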
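One way to confirm the note above before formatting is to probe every server in the quorum. The sketch below assumes that nc is installed and that the ZooKeeper servers accept the "ruok" four-letter command (newer ZooKeeper releases may restrict it). The function names are illustrative.

```shell
# Split a comma-separated ZooKeeper connect string into one host:port per line.
split_quorum() {
  printf '%s\n' "$1" | tr ',' '\n'
}

# Send "ruok" to every server in the connect string; a healthy server
# answers "imok". Prints one status line per server.
check_quorum() {
  local hostport host port reply
  for hostport in $(split_quorum "$1"); do
    host="${hostport%:*}"
    port="${hostport##*:}"
    reply="$(printf 'ruok' | nc -w 2 "$host" "$port" 2>/dev/null)"
    if [ "$reply" = "imok" ]; then
      echo "$hostport ok"
    else
      echo "$hostport NOT responding"
    fi
  done
}

# Example call with this cluster's quorum:
#   check_quorum "bigdata-pro01.kfk.com:2181,bigdata-pro02.kfk.com:2181,bigdata-pro03.kfk.com:2181"
```

If any server is reported as not responding, start ZooKeeper on that machine first and only then run the format command.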

[kfk@bigdata-pro01 bin]$ pwd
/opt/modules/hadoop-2.6.0/bin
[kfk@bigdata-pro01 bin]$ ./hdfs zkfc -formatZK

18/06/16 10:44:28 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=bigdata-pro01.kfk.com:2181,bigdata-pro02.kfk.com:2181,bigdata-pro03.kfk.com:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@20deea7f
18/06/16 10:44:28 INFO zookeeper.ClientCnxn: Opening socket connection to server bigdata-pro01.kfk.com/192.168.80.151:2181. Will not attempt to authenticate using SASL (unknown error)
18/06/16 10:44:28 INFO zookeeper.ClientCnxn: Socket connection established to bigdata-pro01.kfk.com/192.168.80.151:2181, initiating session
18/06/16 10:44:28 INFO zookeeper.ClientCnxn: Session establishment complete on server bigdata-pro01.kfk.com/192.168.80.151:2181, sessionid = 0x164065bc2a90001, negotiated timeout = 5000
===============================================
The configured parent znode /hadoop-ha/ns already exists.
Are you sure you want to clear all failover information from
ZooKeeper?
WARNING: Before proceeding, ensure that all HDFS services and
failover controllers are stopped!
===============================================
Proceed formatting /hadoop-ha/ns? (Y or N) 18/06/16 10:44:28 INFO ha.ActiveStandbyElector: Session connected.
y
18/06/16 10:44:57 INFO ha.ActiveStandbyElector: Recursively deleting /hadoop-ha/ns from ZK...
18/06/16 10:44:57 INFO ha.ActiveStandbyElector: Successfully deleted /hadoop-ha/ns from ZK.
18/06/16 10:44:57 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/ns in ZK.
18/06/16 10:44:57 INFO zookeeper.ClientCnxn: EventThread shut down
18/06/16 10:44:57 INFO zookeeper.ZooKeeper: Session: 0x164065bc2a90001 closed
[kfk@bigdata-pro01 bin]$

Start and test

1. First stop the Hadoop and ZooKeeper processes.

2. Start the ZooKeeper process.

3. Start the zkfc process:

[kfk@bigdata-pro01 hadoop-2.6.0]$ pwd
/opt/modules/hadoop-2.6.0
[kfk@bigdata-pro01 hadoop-2.6.0]$ sbin/hadoop-daemon.sh start zkfc 
starting zkfc, logging to /opt/modules/hadoop-2.6.0/logs/hadoop-kfk-zkfc-bigdata-pro01.kfk.com.out

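4. Go to the sbin directory under the Hadoop installation directory and run start-dfs.sh to start the NameNodes. Run it on the Hadoop node that is configured as the primary NameNode.

Alternatively, run sbin/start-all.sh directly (in Hadoop 2.x this script is deprecated in favor of start-dfs.sh and start-yarn.sh).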
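The four steps above can be sketched as two small shell functions. This is only an illustration under stated assumptions: zkServer.sh is assumed to be on the PATH, HADOOP_HOME defaults to the install path shown in the transcripts, and the function names and the DRY_RUN switch are hypothetical helpers, not part of Hadoop.

```shell
# Assumption: same install path as in the transcripts; override as needed.
HADOOP_HOME="${HADOOP_HOME:-/opt/modules/hadoop-2.6.0}"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Steps 1 and 2, ZooKeeper part: run this on every ZooKeeper machine.
restart_zookeeper() {
  run zkServer.sh stop
  run zkServer.sh start
}

# Step 1 (HDFS part), then steps 3 and 4: run this on the NameNode host
# once ZooKeeper is up on all machines.
start_hdfs_ha() {
  run "$HADOOP_HOME/sbin/stop-dfs.sh"
  run "$HADOOP_HOME/sbin/hadoop-daemon.sh" start zkfc
  run "$HADOOP_HOME/sbin/start-dfs.sh"
}

# Preview the commands without executing anything:
#   DRY_RUN=1 start_hdfs_ha
```

Previewing with DRY_RUN=1 first is a cheap way to check the paths before stopping anything on a live cluster.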

[kfk@bigdata-pro02 hadoop-2.6.0]$ bin/hdfs -help
Usage: hdfs [--config confdir] COMMAND
       where COMMAND is one of:
  dfs                  run a filesystem command on the file systems supported in Hadoop.
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  journalnode          run the DFS journalnode
  zkfc                 run the ZK Failover Controller daemon
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  haadmin              run a DFS HA admin client
  fsck                 run a DFS filesystem checking utility
  balancer             run a cluster balancing utility
  jmxget               get JMX exported values from NameNode or DataNode.
  mover                run a utility to move block replicas across
                       storage types
  oiv                  apply the offline fsimage viewer to an fsimage
  oiv_legacy           apply the offline fsimage viewer to an legacy fsimage
  oev                  apply the offline edits viewer to an edits file
  fetchdt              fetch a delegation token from the NameNode
  getconf              get config values from configuration
  groups               get the groups which users belong to
  snapshotDiff         diff two snapshots of a directory or diff the
                       current directory contents with a snapshot
  lsSnapshottableDir   list all snapshottable dirs owned by the current user
                        Use -help to see options
  portmap              run a portmap service
  nfs3                 run an NFS version 3 gateway
  cacheadmin           configure the HDFS cache
  crypto               configure HDFS encryption zones
  storagepolicies      get all the existing block storage policies
  version              print the version

Most commands print help when invoked w/o parameters.

[kfk@bigdata-pro02 hadoop-2.6.0]$ 
[kfk@bigdata-pro02 hadoop-2.6.0]$ bin/hdfs haadmin -help
Usage: DFSHAAdmin [-ns <nameserviceId>]
    [-transitionToActive <serviceId> [--forceactive]]
    [-transitionToStandby <serviceId>]
    [-failover [--forcefence] [--forceactive] <serviceId> <serviceId>]
    [-getServiceState <serviceId>]
    [-checkHealth <serviceId>]
    [-help <command>]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

[kfk@bigdata-pro02 hadoop-2.6.0]$

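Note: the built-in commands already cover both cases, what to run when both NameNodes are standby and what to run when both are active, so I won't go into detail here.

If the problem is still not solved, run: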

bin/hdfs haadmin -transitionToActive nn1
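Before forcing a transition, it can help to confirm that every NameNode really reports standby, using the -getServiceState option from the haadmin help above. In this sketch, nn1 comes from the command above, while nn2, the HDFS variable, and the all_standby function name are assumptions for illustration.

```shell
# Assumption: HDFS points at the hdfs launcher; override it for your install.
HDFS="${HDFS:-bin/hdfs}"

# Print the HA state of each listed NameNode ID.
# Exit status is 0 only if every one of them reports "standby".
all_standby() {
  local id state result
  result=0
  for id in "$@"; do
    state="$("$HDFS" haadmin -getServiceState "$id" 2>/dev/null)"
    echo "$id: ${state:-unknown}"
    [ "$state" = "standby" ] || result=1
  done
  return "$result"
}

# Example: only attempt the manual transition when both are standby.
#   if all_standby nn1 nn2; then "$HDFS" haadmin -transitionToActive nn1; fi
```

With automatic failover enabled, haadmin may refuse a manual transition and point to a --forcemanual override. In that situation it is usually safer to check the zkfc logs first, since ZKFC is what should be electing the active NameNode.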
