https://segmentfault.com/a/1190000006838239
Various problems encountered when starting Hadoop

1. HDFS initialized but not 'healthy' yet, waiting...
This message appears in the JobTracker's log file when Hadoop starts. It means something is wrong on the HDFS side and the DataNodes cannot come up. The only fix here is to delete everything under the paths the NameNode manages and re-run namenode -format. The places to clear are the tmp path holding temporary data, the data path, and the name path. Once all three are removed, re-format and the problem is resolved (note that this wipes everything stored in HDFS).
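The clean-and-reformat procedure can be sketched as below. The three paths are assumptions (read the real values of hadoop.tmp.dir, dfs.name.dir and dfs.data.dir from your core-site.xml/hdfs-site.xml); the demo uses throwaway /tmp directories so it can run anywhere:

```shell
# Clean every storage path the NameNode manages, then re-format.
# WARNING: on a real cluster this destroys all HDFS data.
# The /tmp/hadoop-demo paths are stand-ins for hadoop.tmp.dir,
# dfs.name.dir and dfs.data.dir from your configuration.
TMP_DIR=/tmp/hadoop-demo/tmp
NAME_DIR=/tmp/hadoop-demo/name
DATA_DIR=/tmp/hadoop-demo/data

mkdir -p "$TMP_DIR" "$NAME_DIR" "$DATA_DIR"      # simulate existing dirs
touch "$NAME_DIR/in_use.lock"                    # simulate leftover state

rm -rf "$TMP_DIR"/* "$NAME_DIR"/* "$DATA_DIR"/*  # wipe old state

# On a real cluster, now re-create the filesystem metadata:
# bin/hadoop namenode -format        (or on newer releases: hdfs namenode -format)
ls -A "$NAME_DIR"                                # prints nothing: dir is empty
```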
2. Name node is in safe mode while running a Hadoop program
This exception usually shows up directly in the IDE console. The main cause is that the DataNodes keep losing blocks, so the NameNode forces itself into safe mode. In that mode the filesystem is read-only: data can be read but not written. Resolving the exception is simple: issue the command that tells the NameNode to leave safe mode.
./hadoop dfsadmin -safemode leave
(on newer releases: hdfs dfsadmin -safemode leave)
3. java.io.FileNotFoundException: /data/dfs/namesecondary/in_use.lock (Permission denied):
2016-09-07 10:18:42,902 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: SecondaryNameNode metrics system started
2016-09-07 10:18:43,053 FATAL org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Failed to start secondary namenode
java.io.FileNotFoundException: /data/dfs/namesecondary/in_use.lock (Permission denied)
    at java.io.RandomAccessFile.open0(Native Method)
    at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
    at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.tryLock(Storage.java:706)
    at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:678)
    at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:499)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.recoverCreate(SecondaryNameNode.java:962)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:243)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:192)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:671)
2016-09-07 10:18:43,056 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2016-09-07 10:18:43,057 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down SecondaryNameNode at joyven/192.168.2.35
************************************************************/
This occurs in two scenarios:
1) A cluster that used to start normally was started once by a different user than usual. That run leaves an in_use.lock file, owned by the other user, in the configured storage directory. Delete the lock file and restart.
2) The user who formatted Hadoop is not the same as the user now starting it. In that case, restart Hadoop as the user who ran the format; that also clears the error.
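Both scenarios boil down to an ownership mismatch on the storage directory, which can be checked up front. A small sketch; the path is a stand-in for your fs.checkpoint.dir (the log above used /data/dfs/namesecondary), and the demo creates it so the snippet runs anywhere:

```shell
# Check that the checkpoint/storage dir is owned by the user who will
# start Hadoop; a mismatch produces the in_use.lock "Permission denied".
# CKPT_DIR is a stand-in for your configured fs.checkpoint.dir.
CKPT_DIR=/tmp/namesecondary-demo
mkdir -p "$CKPT_DIR"

owner=$(stat -c '%U' "$CKPT_DIR" 2>/dev/null || stat -f '%Su' "$CKPT_DIR")
me=$(id -un)
if [ "$owner" = "$me" ]; then
  echo "ownership ok"
else
  # Reclaim the dir for the user that starts Hadoop, e.g.:
  # sudo chown -R "$me" "$CKPT_DIR"
  echo "ownership mismatch: $owner (dir) vs $me (you)"
fi
```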
4. hadoop /tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
After starting the cluster, the NameNode came up, but none of the DataNodes on the slave nodes did. The NameNode log shows the error:
INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 9000, call addBlock(/opt/hadoop/tmp/mapred/system/jobtracker.info, DFSClient_502181644) from 127.0.0.1:2278: error: java.io.IOException: File /opt/hadoop/tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
java.io.IOException: File /opt/hadoop/tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
The exact cause is not entirely clear. It can appear when the firewall is left enabled (so the DataNodes cannot reach the NameNode), and also after the whole system crashes abnormally and is restarted. The workaround is to re-format on master and slaves at the same time (which, again, destroys all HDFS data).
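Before resorting to a re-format, it is worth ruling out the firewall symptom: check from a slave node that the NameNode RPC port is reachable. A minimal probe, where the host and port 9000 are assumptions taken from a typical fs.default.name setting:

```shell
# Probe the NameNode RPC port from a slave node; if this fails while
# the NameNode is running, suspect the firewall (iptables/firewalld).
NN_HOST=localhost      # assumption: replace with your master host
NN_PORT=9000           # assumption: the port from fs.default.name
if timeout 2 bash -c "exec 3<>/dev/tcp/$NN_HOST/$NN_PORT" 2>/dev/null; then
  echo "namenode port reachable"
else
  echo "namenode port unreachable: check firewall and that the NameNode is up"
fi
```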
5. ERROR mapred.JvmManager: Caught Throwable in JVMRunner. Aborting TaskTracker.
java.lang.OutOfMemoryError: unable to create new native thread
While a job was running, computation suddenly stopped. The TaskTracker log on the compute node showed the error above being thrown mid-computation. The cause turned out to be that the job created more threads than the system's per-user process/thread limit allows: despite the OutOfMemoryError wording, the JVM simply cannot start another native thread once that limit is hit. Raise the limit by adding the following to /etc/security/limits.conf:
hadoop soft nproc 10000
hadoop hard nproc 64000
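limits.conf only takes effect on a new login session, so the change should be verified from a fresh shell of the hadoop user. A quick check (10000/64000 mirror the values above):

```shell
# Show the current per-user process/thread limits for this shell.
# "unable to create new native thread" means the soft limit was hit.
soft=$(ulimit -Su)   # soft limit: what the JVM actually runs against
hard=$(ulimit -Hu)   # hard limit: ceiling the user may raise soft to
echo "soft=$soft hard=$hard"
# After re-login as the hadoop user, the soft limit should report 10000:
# su - hadoop -c 'ulimit -Su'
```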
6. NameNode exception
2013-08-20 14:10:08,946 INFO org.apache.hadoop.hdfs.server.common.Storage: Cannot access storage directory /var/lib/hadoop/cache/hadoop/dfs/name
2013-08-20 14:10:08,947 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /var/lib/hadoop/cache/hadoop/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:316)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:104)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:427)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:388)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:277)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:497)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1298)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1307)
2013-08-20 14:10:08,948 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /var/lib/hadoop/cache/hadoop/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:316)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:104)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:427)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:388)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:277)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:497)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1298)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1307)
The message itself points at the fix: the directory configured as the NameNode's storage path (/var/lib/hadoop/cache/hadoop/dfs/name here) does not exist or cannot be accessed by the user starting the NameNode. Check that the directory exists and that its ownership and permissions are correct (compare problems 1 and 3 above) before starting the daemon.
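The pre-flight check implied by this exception can be sketched as follows: verify the name directory exists and is readable, writable and traversable by the current user. The demo path is a stand-in (the machine in the log used /var/lib/hadoop/cache/hadoop/dfs/name) so the snippet runs anywhere:

```shell
# Verify a NameNode storage directory before starting the daemon.
# DIR is a stand-in for the dfs.name.dir value from the log.
DIR=/tmp/dfs-name-demo
mkdir -p "$DIR"      # in real life the directory should already exist

if [ -d "$DIR" ] && [ -r "$DIR" ] && [ -w "$DIR" ] && [ -x "$DIR" ]; then
  echo "storage directory ok"
else
  echo "storage directory missing or not accessible: $DIR"
fi
```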
7. NameNode fails to start (or SecondaryNameNode fails to start)
The NameNode log shows that its port is already taken:
2016-09-07 10:18:08,547 INFO org.apache.hadoop.http.HttpServer2: HttpServer.start() threw a non Bind IOException
java.net.BindException: Port in use: 0.0.0.0:50070
    at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:919)
    at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:856)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeHttpServer.start(NameNodeHttpServer.java:142)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.startHttpServer(NameNode.java:752)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:638)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:811)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:795)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1488)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1554)
Caused by: java.net.BindException: Address already in use
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:433)
    at sun.nio.ch.Net.bind(Net.java:425)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
    at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
    at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:914)
    ... 8 more
2016-09-07 10:18:08,550 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2016-09-07 10:18:08,550 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2016-09-07 10:18:08,550 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
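The fix is to find whichever process is still bound to 0.0.0.0:50070 (often a half-dead NameNode from a previous start) and stop it, or point the HTTP address at a free port. A sketch for locating the culprit; 50070 comes from the log above:

```shell
# Find the process holding the NameNode HTTP port from the log above.
PORT=50070
# ss ships with modern Linux (iproute2); lsof -i :"$PORT" also works.
holder=$(ss -ltnp 2>/dev/null | grep ":$PORT " || true)
if [ -n "$holder" ]; then
  echo "port $PORT busy: $holder"   # kill the PID shown, then restart
else
  echo "port $PORT free: safe to start the NameNode"
fi
```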