Symptoms:
Connecting with hbase shell reports the following errors:
2019-10-09 10:37:18,855 ERROR [main] zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 4 attempts
2019-10-09 10:37:18,856 WARN [main] zookeeper.ZKUtil: hconnection-0x6ef784bf0x0, quorum=xxx:2181,xxx:2181,xxx:2181, baseZNode=/hbase Unable to set watcher on znode (/hbase/hbaseid)
The HBase log contains the following errors:
2019-10-09 09:26:58,701 WARN [regionserver/xxx/192.168.1.8:16020-longCompactions-1569222224980-SendThread(xxx:2181)] zookeeper.ClientCnxn: Session 0x0 for server xxx/192.168.1.24:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
Troubleshooting:
The symptoms above point to ZooKeeper, so first check whether ZooKeeper is down.
1. Test the connection with telnet <host or ip> 2181
# telnet xxx 2181
Trying 192.168.1.23...
Connected to xxx.
Escape character is '^]'.
Connection closed by foreign host.
The connection cannot be used: it is accepted and then closed immediately by the remote server or the remote program.
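A telnet session that connects and is dropped straight away shows that the port is open, but not whether the server is willing to talk. As a rough sketch, ZooKeeper's `ruok` four-letter command can be sent with `nc` instead. The helper name `zk_ruok` is made up for illustration, and the sketch assumes `nc` is installed and that the server allows `ruok` (newer ZooKeeper releases restrict four-letter commands through `4lw.commands.whitelist`).

```shell
# zk_ruok is a hypothetical helper, not part of ZooKeeper or HBase.
# It sends the "ruok" four-letter command and prints "ok" only when
# the server answers "imok"; anything else prints "no-reply".
zk_ruok() {
  host="$1"
  port="${2:-2181}"
  # -w 3: give up after 3 seconds so a dropped connection does not hang
  reply=$(printf 'ruok' | nc -w 3 "$host" "$port" 2>/dev/null)
  if [ "$reply" = "imok" ]; then
    echo "ok"
  else
    echo "no-reply"
  fi
}
```

Running `zk_ruok xxx 2181` from the affected host and again from another host helps tell a dead server apart from one that only rejects this particular client.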
2. Log in to the ZooKeeper node and check the number of socket connections
:~$ netstat -anl|grep 2181|grep -i '192.168.1.7'|grep ESTABLISHED|wc -l
1
:~$ netstat -anl|grep 2181|grep -i '192.168.1.8'|grep ESTABLISHED|wc -l
60
192.168.1.8 above is the server where HBase reports the errors. It holds 60 established connections to this ZooKeeper node, which is far below the system limit.
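Checking one client IP at a time is slow, so the count in step 2 can also be produced for every client at once. The following is a minimal sketch, assuming Linux-style `netstat -an` output (local address in column 4, foreign address in column 5, state in column 6); `count_zk_clients` is a hypothetical name.

```shell
# count_zk_clients is a hypothetical helper: it reads `netstat -an` output
# on stdin and prints "<count> <client-ip>" for ESTABLISHED connections
# whose local port is the given port (default 2181), busiest client first.
count_zk_clients() {
  port="${1:-2181}"
  awk -v port="$port" '
    $6 == "ESTABLISHED" && $4 ~ (":" port "$") {
      ip = $5
      sub(/:[0-9]+$/, "", ip)   # strip the client-side port
      count[ip]++
    }
    END { for (ip in count) print count[ip], ip }
  ' | sort -rn
}
```

Usage on the ZooKeeper node: `netstat -an | count_zk_clients 2181`. Unlike a plain `grep 2181`, the match is anchored to the local port, so client-side ports that merely contain 2181 are not counted.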
3. Raise HBase's ZooKeeper connection limit
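<property>
  <name>hbase.zookeeper.property.maxClientCnxns</name>
  <value>300</value>
</property>
The default is 30. After changing the value and restarting the RegionServer, nothing improved, most likely because hbase.zookeeper.property.* settings only apply to a ZooKeeper instance that HBase itself starts, not to a separately managed ZooKeeper.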
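4. Edit the zoo.cfg file under ZooKeeper
#maxClientCnxns=60
This value exactly matches the ESTABLISHED connection count seen in step 2 (60 is also the default in recent ZooKeeper releases, so the limit applies even while the line is commented out). Uncomment the line, change the value to 150, and restart ZooKeeper.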
Problem solved.
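To double-check which per-client limit a given zoo.cfg actually yields, a small helper along these lines may help. It is only a sketch: `zk_max_client_cnxns` is a hypothetical name, and the fallback of 60 assumes a ZooKeeper release whose default `maxClientCnxns` is 60.

```shell
# zk_max_client_cnxns is a hypothetical helper: it prints the effective
# maxClientCnxns for the zoo.cfg passed as $1. Commented-out lines are
# ignored; when no active setting exists it prints 60 (assumed default).
zk_max_client_cnxns() {
  cfg="$1"
  value=$(sed -n 's/^[[:space:]]*maxClientCnxns[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$cfg" | tail -n 1)
  echo "${value:-60}"
}
```

Comparing this number with the per-client ESTABLISHED count from step 2 shows whether a client has hit the limit. The path to zoo.cfg depends on the installation.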