Resolving the "Too many connections" Problem When HiveServer2 Connects to ZooKeeper


Author: 大圓那些事 | This article may be reposted; please indicate the original source and the author with a hyperlink.

Original URL: http://www.cnblogs.com/panfeng412/archive/2013/03/23/hiveserver2-too-many-zookeeper-connections-issues.html

HiveServer2 supports concurrent access from multiple clients and uses ZooKeeper to manage read/write locks on Hive tables. In our production environment, HiveServer2 ran into a "Too many connections" problem when connecting to ZooKeeper. This post records how the problem was diagnosed and resolved.

Problem Description

The HiveServer2 service could not execute Hive commands, and its log reported the following error:

2013-03-22 12:54:43,946 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1089)) - Session 0x0 for server hostname/***.***.***.***:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
        at sun.nio.ch.IOUtil.read(IOUtil.java:200)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)

Troubleshooting

1. First, the HiveServer2 error log reports "Connection reset by peer", i.e. the connection was closed by the ZooKeeper side.

2. Checking the logs of the ZooKeeper ensemble configured for HiveServer2 (the one used to manage read/write locks on Hive tables) revealed the following error:

2013-03-22 12:52:48,938 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /***.***.***.*** - max is 50

3. Combined with the HiveServer2 log, this shows that the number of connections from the HiveServer2 machine to ZooKeeper exceeded the per-client connection limit configured on the ZooKeeper side (50 in this case).

4. We then verified whether all 50 connections were really held by HiveServer2. They were: all 50 connections belonged to the HiveServer2 process (PID 26871), as the listing and the counting sketch after it show:

[user@hostname ~]$ sudo netstat -nap  | grep 2181
tcp    0      0 ***.***.***.***:58089   ***.***.***.***:2181    ESTABLISHED 26871/java          
tcp    0      0 ***.***.***.***:57837   ***.***.***.***:2181    ESTABLISHED 26871/java          
tcp    0      0 ***.***.***.***:57853   ***.***.***.***:2181    ESTABLISHED 26871/java         
……
(50 connections in total)
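
To make the count explicit, the established connections can be tallied per process, and the same picture can be confirmed from the ZooKeeper side with the cons four-letter admin command, which lists open connections per client. A minimal sketch, assuming the PID 26871 from above and a placeholder ZooKeeper host zkhost (on recent ZooKeeper releases, four-letter commands may need to be whitelisted via 4lw.commands.whitelist):

# Count established connections from the HiveServer2 process (PID 26871)
# to the ZooKeeper client port.
sudo netstat -nap | grep 2181 | grep -c '26871/java'

# On the ZooKeeper side, list every open connection per client IP.
echo cons | nc zkhost 2181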

5. Why does HiveServer2 hold this many connections when the actual number of concurrent requests is nowhere near that high? The only place to look for clues is HiveServer2's implementation. Since HiveServer2 is built on Thrift, we suspected that an internally maintained pool was responsible. Checking hive-default.xml showed that a pool of Thrift worker threads is configured by default (our guess is that each worker thread keeps its own connection to ZooKeeper, though this still needs to be verified at the code level; a rough cross-check is sketched after the excerpt below):

<property>
  <name>hive.server2.thrift.min.worker.threads</name>
  <value>5</value>
  <description>Minimum number of Thrift worker threads</description>
</property>
<property>
  <name>hive.server2.thrift.max.worker.threads</name>
  <value>100</value>
  <description>Maximum number of Thrift worker threads</description>
</property>
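
If the one-connection-per-worker-thread guess holds, the number of live Thrift worker threads should roughly match the number of ZooKeeper connections observed above. A minimal sketch of that cross-check, run as the user owning the HiveServer2 process, assuming the worker threads carry "HiveServer2-Handler" in their names (the exact prefix may vary between Hive versions) and reusing PID 26871 from the netstat output:

# Dump the HiveServer2 JVM's thread stacks and count the Thrift worker
# threads; compare the result against the 50 connections seen in netstat.
jstack 26871 | grep -c 'HiveServer2-Handler'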

Resolution

Method 1:

Lower the number of HiveServer2 Thrift worker threads in hive-site.xml, which reduces the number of ZooKeeper connections requested. The trade-off is that this may also reduce HiveServer2's capacity for handling concurrent requests.

Method 2:

Raise the per-client connection limit by increasing the maxClientCnxns option in ZooKeeper's zoo.cfg.

Whichever of the two methods you choose, set the values according to your actual production workload.

The relevant configuration options:

1. In hive-site.xml:

<property>
  <name>hive.server2.thrift.min.worker.threads</name>
  <value>10</value>
  <description>Minimum number of Thrift worker threads</description>
</property>
<property>
  <name>hive.server2.thrift.max.worker.threads</name>
  <value>200</value>
  <description>Maximum number of Thrift worker threads</description>
</property>
<property>
  <name>hive.zookeeper.session.timeout</name>
  <value>60000</value>
  <description>Zookeeper client's session timeout. The client is disconnected, and as a result, all locks released, if a heartbeat is not sent in the timeout.</description>
</property>

2. In zoo.cfg:

# Limits the number of concurrent connections (at the socket level) that a single client, identified by IP address, may make to a single member of the ZooKeeper ensemble
maxClientCnxns=200
# The minimum session timeout in milliseconds that the server will allow the client to negotiate
minSessionTimeout=1000
# The maximum session timeout in milliseconds that the server will allow the client to negotiate
maxSessionTimeout=60000
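
After restarting ZooKeeper, the effective values can be double-checked with the conf four-letter admin command, which prints the server's runtime settings, including maxClientCnxns, minSessionTimeout, and maxSessionTimeout. A minimal sketch, again assuming a placeholder host zkhost:

# Print the ZooKeeper server's runtime configuration and pick out the limits.
echo conf | nc zkhost 2181 | grep -E 'maxClientCnxns|SessionTimeout'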

