Handling the Hadoop error "NameNode is not formatted" in a CDH environment


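Background

Because of a JournalNode (JN) upgrade, the JN role had to be moved to other machines. There are three JN nodes, and I migrated one of them.
On the HDFS page I ran the role migration, selected the host currently holding the role and the target host, and was told that the whole cluster had to be restarted (first make sure nobody is using it). After the restart an error occurred and the master NameNode in the HA pair would not start.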

Error message

Bootstrap Standby NameNode
Failed to bootstrap Standby NameNode NameNode (cluster-master): STARTUP_MSG:   build = http://github.com/cloudera/hadoop -r 91e45acfc3e208d656c3ec1c1a0abe4a8de6ad4c; compiled by 'jenkins' on 2016-01-26T00:19Z
STARTUP_MSG:   java = 1.7.0_67
************************************************************/
19/01/15 11:11:47 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
19/01/15 11:11:47 INFO namenode.NameNode: createNameNode [-bootstrapStandby, -nonInteractive]
Running in non-interactive mode, and data appears to exist in Storage Directory /data1/dfs/nn. Not formatting.
19/01/15 11:11:49 INFO util.ExitUtil: Exiting with status 5
19/01/15 11:11:49 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at cluster-master.gyyx.cn/10.12.50.49
************************************************************/

Checking the log

2019-01-15 13:24:55,058 INFO org.apache.hadoop.util.GSet: Computing capacity for map NameNodeRetryCache
2019-01-15 13:24:55,058 INFO org.apache.hadoop.util.GSet: VM type       = 64-bit
2019-01-15 13:24:55,058 INFO org.apache.hadoop.util.GSet: 0.029999999329447746% max memory 4.9 GB = 1.5 MB
2019-01-15 13:24:55,058 INFO org.apache.hadoop.util.GSet: capacity      = 2^18 = 262144 entries
2019-01-15 13:24:55,063 INFO org.apache.hadoop.hdfs.server.namenode.NNConf: ACLs enabled? false
2019-01-15 13:24:55,063 INFO org.apache.hadoop.hdfs.server.namenode.NNConf: XAttrs enabled? true
2019-01-15 13:24:55,063 INFO org.apache.hadoop.hdfs.server.namenode.NNConf: Maximum size of an xattr: 16384
2019-01-15 13:24:55,080 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data1/dfs/nn/in_use.lock acquired by nodename 15050@cluster-master.gyyx.cn
2019-01-15 13:24:55,083 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
java.io.IOException: NameNode is not formatted.
	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:212)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1061)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:765)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:609)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:666)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:838)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:817)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1538)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1606)
2019-01-15 13:24:55,096 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@cluster-master.gyyx.cn:50070
2019-01-15 13:24:55,196 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2019-01-15 13:24:55,197 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2019-01-15 13:24:55,198 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2019-01-15 13:24:55,198 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
java.io.IOException: NameNode is not formatted.
	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:212)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1061)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:765)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:609)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:666)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:838)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:817)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1538)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1606)
2019-01-15 13:24:55,202 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2019-01-15 13:24:55,205 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: 

The lines to focus on are:

2019-01-15 13:24:55,083 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
java.io.IOException: NameNode is not formatted.
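
When the log is long, a small filter makes it quicker to find lines like the two above. The following is a minimal sketch and was not part of the original troubleshooting: the function name first_exception is made up for illustration, and it assumes that exception lines start with a Java class name ending in Exception or Error.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first exception line in a log file,
# preceded by the log line that came right before it (usually the WARN/ERROR).
first_exception() {
  local logfile="$1"
  awk '
    # An exception line starts with a class name such as java.io.IOException:
    /^[A-Za-z0-9_.$]+(Exception|Error):/ {
      if (prev != "") print prev
      print
      exit
    }
    { prev = $0 }
  ' "$logfile"
}
```

Usage would look like `first_exception /path/to/namenode.log`, where the path is a placeholder for wherever the NameNode log lives on your host.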

Every result from Baidu and Google says the NameNode has to be formatted:

hadoop namenode -format

This is a production environment. Formatting would wipe the HDFS metadata, so it is not something to do casually.

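Approach

The message says the fsimage could not be loaded,
so I looked for where the fsimage is stored, which is also where the edits files are stored.
Under /data1/dfs/nn there was only a root-owned current.bak, which means the current directory had been renamed.
Because my NameNodes run in HA, the current directory can be copied over from the other NameNode. (Renaming current.bak back to current is not an option, because the data has changed since then.)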
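
Both errors fit this picture. As far as I can tell, the NameNode treats a storage directory without current/VERSION as not formatted, while bootstrapStandby in non-interactive mode refuses to format a directory that still contains anything, here current.bak. The check can be sketched as below; nn_dir_state and its state names are made up for illustration, and equating "formatted" with the presence of current/VERSION is a simplifying assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a NameNode metadata directory by what is on disk.
# Assumption: "formatted" here only means that current/VERSION exists.
nn_dir_state() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "missing"
  elif [ -f "$dir/current/VERSION" ]; then
    echo "formatted"
  elif [ -z "$(ls -A "$dir")" ]; then
    echo "empty"
  else
    # Something is there (for example current.bak) but no current/VERSION.
    echo "not-formatted-but-not-empty"
  fi
}
```

Running `nn_dir_state /data1/dfs/nn` on the failed node would have reported the last state, which matches the situation described above.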

Procedure

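1. Notify the lead of each team that the Hadoop cluster needs to be repaired, and have them pause queries and other operations.
2. Shut down the entire cluster and confirm that all services have stopped.
3. Copy the current directory from the healthy NameNode to the failed NameNode (run in /data1/dfs/nn on the failed node).
scp -r -P63008 root@YOUR_NAMENODE:/data1/dfs/nn/current .
4. Fix the ownership.
chown -R hdfs:hdfs current
5. Delete the temporary files under /tmp.
6. Restart the cluster.
7. Check the Hadoop logs and confirm that Cloudera Manager shows a healthy status.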
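
One extra sanity check that could be run between steps 4 and 6, not part of the original procedure: compare the identity fields in the copied current/VERSION with those in current.bak/VERSION, to confirm the copy came from the same cluster. This is a sketch under assumptions: the helper names are made up, and it relies on VERSION being a simple key=value file containing namespaceID, clusterID and blockpoolID.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of KEY from a key=value VERSION file.
version_field() {
  local file="$1" key="$2"
  sed -n "s/^${key}=//p" "$file" | head -n 1
}

# Hypothetical helper: succeed only if both VERSION files carry the same,
# non-empty namespaceID, clusterID and blockpoolID.
same_namespace() {
  local a="$1" b="$2" key va vb
  for key in namespaceID clusterID blockpoolID; do
    va=$(version_field "$a" "$key")
    vb=$(version_field "$b" "$key")
    if [ -z "$va" ] || [ "$va" != "$vb" ]; then
      return 1
    fi
  done
  return 0
}
```

For example, from /data1/dfs/nn: `same_namespace current/VERSION current.bak/VERSION && echo "same cluster"`. A mismatch would suggest the metadata was copied from the wrong host, so stop before restarting.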

Problem solved.

