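1. Basic information
Hadoop version: hadoop-0.20.205.0.tar.gz
Operating system: Ubuntu
2. Problem
I ran into a problem early on while developing with Hadoop. After every system reboot, Hadoop would not start properly. I had to run bin/hadoop namenode -format to reformat the NameNode before Hadoop would run again, but that meant losing the previously recorded name data (the NameNode metadata).
The NameNode log shows the following error: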
21:08:48,103 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStateMBean and NameNodeMXBean
21:08:48,125 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times
21:08:48,129 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /tmp/hadoop-sylar/dfs/name does not exist.
21:08:48,130 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /tmp/hadoop-sylar/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:288)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:97)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:384)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:358)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:276)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:497)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1268)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1277)
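3. Cause
Checking the documentation showed that on Linux, Hadoop stores its data under /tmp by default (the log above points at /tmp/hadoop-sylar/dfs/name). After a reboot the contents of /tmp are cleared, so the NameNode storage directory is gone and Hadoop fails to start. Running bin/hadoop namenode -format recreates that default directory, after which Hadoop starts normally, but with an empty namespace.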
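To make the cause concrete, here is a minimal Python sketch of how the defaults resolve to the path seen in the log. It assumes the stock defaults of the 0.20 line (hadoop.tmp.dir = /tmp/hadoop-${user.name}, dfs.name.dir = ${hadoop.tmp.dir}/dfs/name, dfs.data.dir = ${hadoop.tmp.dir}/dfs/data). The resolve helper is hypothetical: it only imitates Hadoop's ${...} variable substitution and is not Hadoop code.

```python
import re

# Assumed stock defaults from core-default.xml / hdfs-default.xml (0.20 line).
DEFAULTS = {
    "hadoop.tmp.dir": "/tmp/hadoop-${user.name}",
    "dfs.name.dir": "${hadoop.tmp.dir}/dfs/name",
    "dfs.data.dir": "${hadoop.tmp.dir}/dfs/data",
}

_VAR = re.compile(r"\$\{([^}]+)\}")


def resolve(key, overrides=None, user="sylar"):
    """Hypothetical helper: expand ${var} references in a property value,
    with site overrides taking precedence over the assumed defaults."""
    props = dict(DEFAULTS)
    props.update(overrides or {})
    props["user.name"] = user
    value = props[key]
    # Expand one reference per pass until none is left (bounded depth).
    for _ in range(20):
        match = _VAR.search(value)
        if match is None:
            break
        name = match.group(1)
        if name not in props:
            break  # leave unknown variables untouched
        value = value[:match.start()] + props[name] + value[match.end():]
    return value


if __name__ == "__main__":
    # With no overrides the NameNode directory lives under /tmp.
    print(resolve("dfs.name.dir"))
    # Overriding only hadoop.tmp.dir already moves it off /tmp.
    print(resolve("dfs.name.dir", {"hadoop.tmp.dir": "/home/hadoopdata/tmp"}))
```

Under these assumptions, moving hadoop.tmp.dir alone is enough to take the derived directories off /tmp; setting dfs.name.dir and dfs.data.dir explicitly just makes the locations independent of the temporary directory.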
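4. Solution
Specify where the temporary directory is stored in the configuration file core-site.xml, so that it no longer lives under /tmp. The modified configuration is: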
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://127.0.0.1:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoopdata/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoopdata/filesystem/name</value>
    <description>Determines where on the local filesystem the DFS name node should store the name table. If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.</description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoopdata/filesystem/data</value>
    <description>Determines where on the local filesystem an DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
  </property>
</configuration>
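dfs.name.dir is the path on the local filesystem where the NameNode persistently stores the namespace and the transaction logs. When the value is a comma-separated list of directories, the name table is replicated to every directory in the list for redundancy.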
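As a quick sanity check on a configuration like the one above, the sketch below parses a site file with Python's standard library, splits the directory properties on commas, and flags any directory that still sits under /tmp. The helpers load_properties, split_dirs and dirs_under_tmp are hypothetical names introduced for illustration, and the embedded XML is a shortened copy of the configuration above.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the core-site.xml shown above.
SAMPLE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoopdata/tmp</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoopdata/filesystem/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoopdata/filesystem/data</value>
  </property>
</configuration>
"""

DIR_KEYS = ("hadoop.tmp.dir", "dfs.name.dir", "dfs.data.dir")


def load_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop site file."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def split_dirs(value):
    """Split a comma-delimited directory list, dropping empty entries."""
    return [part.strip() for part in value.split(",") if part.strip()]


def dirs_under_tmp(props):
    """List (key, directory) pairs that would be lost when /tmp is cleared.

    Only keys present in the file are checked; a missing key falls back to
    Hadoop's default, which this sketch does not model.
    """
    risky = []
    for key in DIR_KEYS:
        for directory in split_dirs(props.get(key, "")):
            if directory == "/tmp" or directory.startswith("/tmp/"):
                risky.append((key, directory))
    return risky


if __name__ == "__main__":
    props = load_properties(SAMPLE)
    print(split_dirs("/disk1/name, /disk2/name"))  # a redundant name-table list
    print(dirs_under_tmp(props))
```

The paths /disk1/name and /disk2/name in the usage lines are made-up examples of a redundant list; in practice the entries would usually sit on different physical devices so that one disk failure does not take out every copy of the name table.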