1. HBase RegionServer error log
2020-04-07 15:53:36,604 WARN [hadoop01:16020-0.append-pool2-t1] wal.FSHLog: Append sequenceId=3897, requesting roll of WAL
java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[110.221.140.165:50010, 110.221.140.163:50010], original=[110.221.140.165:50010, 10.221.140.163:50010]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:969)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1035)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1184)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:933)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:487)
2020-04-07 15:53:36,620 ERROR [MemStoreFlusher.3] regionserver.MemStoreFlusher: Cache flush failed for region hbase:meta,,1
org.apache.hadoop.hbase.regionserver.wal.DamagedWALException: Append sequenceId=3897, requesting roll of WAL
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.append(FSHLog.java:1971)
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1815)
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1725)
    at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
2. Analysis
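The DataNode cluster has only four nodes, a typical small cluster. When HBase writes data to HDFS and several DataNodes in the write pipeline fail, the client kicks the bad DataNodes out of the pipeline and tries to replace them. Because no more good DataNodes are available to take their place, the required number of replicas cannot be met, the WAL append fails, and the RegionServer goes down.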
3. Solution
Two parameters are related to this error:
dfs.client.block.write.replace-datanode-on-failure.enable=true
dfs.client.block.write.replace-datanode-on-failure.policy=DEFAULT
The policy property only takes effect when dfs.client.block.write.replace-datanode-on-failure.enable is set to true. Its possible values are:
ALWAYS: always add a new DataNode when an existing DataNode is removed from the pipeline.
NEVER: never add a new DataNode.
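DEFAULT: let r be the replication factor and n the number of existing DataNodes in the pipeline. Add a new DataNode only if r >= 3 and either (1) floor(r/2) >= n, or (2) r > n and the block is hflushed/appended.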
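The DEFAULT rule is easy to misread, so here is a minimal Python sketch of the decision as described above. The function name should_add_datanode and its parameters are hypothetical, chosen only for illustration; this is not the HDFS client API. The early return when the pipeline already has r nodes reflects the idea that there is nothing to replace, and is an assumption of this sketch.

```python
def should_add_datanode(policy: str, r: int, n: int,
                        hflushed_or_appended: bool, enable: bool = True) -> bool:
    """Hypothetical helper: should the client add a replacement DataNode?

    r: replication factor of the block
    n: number of DataNodes still in the pipeline after removing bad ones
    """
    if not enable:
        # Feature disabled: keep writing with the remaining DataNodes.
        return False
    if n >= r:
        # Assumption of this sketch: the pipeline is full, nothing to replace.
        return False
    if policy == "ALWAYS":
        return True
    if policy == "NEVER":
        return False
    if policy == "DEFAULT":
        if r < 3:
            # DEFAULT never adds a node for replication factors below 3.
            return False
        # (1) at most half of the replicas remain, or
        # (2) replicas are missing and the block is hflushed/appended.
        return r // 2 >= n or hflushed_or_appended
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    # r = 3, one node lost, hflushed block (typical for a WAL): replacement needed.
    print(should_add_datanode("DEFAULT", r=3, n=2, hflushed_or_appended=True))
    # Same situation with NEVER: keep writing on the two remaining nodes.
    print(should_add_datanode("NEVER", r=3, n=2, hflushed_or_appended=True))
```

For example, with r = 3 and an hflushed block such as an HBase WAL, losing a single node (n = 2) is already enough for DEFAULT to demand a replacement, and on a cluster with few DataNodes there may be none left to pick. The description of dfs.client.block.write.replace-datanode-on-failure.enable in hdfs-default.xml notes that on extremely small clusters (3 nodes or less) administrators may want to set the policy to NEVER or disable the feature.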
Reference:
https://blog.csdn.net/wangweislk/article/details/78890163