Symptom:
On one node in the cluster, the DataNode service shut down immediately after being started. No DataNode log could be found on the operating system (possibly the service failed to start and removed the log file automatically); fortunately, the error log could still be viewed in the management UI:



Clicking on the error message shows the following details:

The HDFS DataNode port is 50010, yet netstat -ntulp | grep 50010 shows nothing listening on that port.
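Because netstat -ntulp only lists listening sockets, a connection that still lingers on the port will not appear in that output. A rough sketch of the extra checks that cover all socket states (tool availability varies by distribution):

# Show sockets in every state (ESTABLISHED, CLOSE_WAIT, TIME_WAIT, ...) that involve port 50010, not just LISTEN.
netstat -an | grep 50010
# The same check with ss: -t TCP, -a all states, -n numeric output.
ss -tan | grep 50010
# Identify any process that still holds the port open.
lsof -i :50010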
Analysis:
Cause: when an application crashes, it can leave behind a lingering socket. To reuse that socket right away, the process binding to it has to set the SO_REUSEADDR socket flag, but the HDFS DataNode does not do so.
The workaround is to bind to the port with an application that does set SO_REUSEADDR, and then stop that application; the netcat tool can be used for this.
Fix: install the nc tool, use nc to occupy port 50010, then stop nc and start the DataNode again; it comes up normally (see the sketch below).
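A minimal sketch of that sequence, assuming a CDH package install where the DataNode can be restarted with the hadoop-hdfs-datanode init script (on a Cloudera Manager-managed node, restart it from the CM UI instead); some netcat variants need -p before the port number:

# Bind to the stuck port with netcat, which sets SO_REUSEADDR ("nc -l -p 50010" on variants that require -p).
nc -l 50010
# Stop netcat (Ctrl-C) once it is listening, then confirm nothing holds the port any more.
netstat -an | grep 50010
# Restart the DataNode; this service name is an assumption for a CDH package install.
service hadoop-hdfs-datanode restart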


Reference link:
http://www.nosql.se/2013/10/hadoop-hdfs-datanode-java-net-bindexception-address-already-in-use/
Reference text:
After an application crashes it might leave a lingering socket, so to reuse that
socket early you need to set the socket flag SO_REUSEADDR when attempting to bind to
it to be allowed to reuse it. The HDFS datanode doesn’t do that, and I didn’t want to
restart the HBase regionserver (which was locking the socket with a connection it hadn’t realized was dead).
The solution was to bind to the port with an application that sets SO_REUSEADDR and
then stop that application, I used netcat for that:
# nc -l 50010

Error log from the DataNode:
2017-02-17 20:54:52,250 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Shutdown complete.
2017-02-17 20:54:52,251 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:444)
at sun.nio.ch.Net.bind(Net.java:436)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at com.cloudera.io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:125)
at com.cloudera.io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:475)
at com.cloudera.io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1021)
at com.cloudera.io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:455)
at com.cloudera.io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:440)
at com.cloudera.io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:844)
at com.cloudera.io.netty.channel.AbstractChannel.bind(AbstractChannel.java:194)
at com.cloudera.io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:340)
at com.cloudera.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:380)
at com.cloudera.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at com.cloudera.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at com.cloudera.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:745)
2017-02-17 20:54:52,262 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2017-02-17 20:54:52,264 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at cdh1/192.168.5.78