Application fails to connect to HBase: java.net.SocketTimeoutException: callTimeout=60000


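Background

  Today we added nodes to the production HBase cluster. In the afternoon a colleague reported that the application backend was logging the following error: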

Tue Feb 26 17:35:35 CST 2019, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68451: row 'SYSTEM.CATALOG,TARGETCUST_DATA,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=host-10-191-36-24,16020,1551146724629, seqNum=0

        at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:276)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:210)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:210)
        at org.apache.hadoop.hbase.client.ClientSmallReversedScanner.loadCache(ClientSmallReversedScanner.java:212)
        at org.apache.hadoop.hbase.client.ClientSmallReversedScanner.next(ClientSmallReversedScanner.java:186)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1275)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1181)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1165)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1122)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:957)
        at org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:83)
        at org.apache.hadoop.hbase.client.HTable.getRegionLocation(HTable.java:506)
        at org.apache.hadoop.hbase.client.HTable.getKeysAndRegionsInRange(HTable.java:722)
        at org.apache.hadoop.hbase.client.HTable.getKeysAndRegionsInRange(HTable.java:692)
        at org.apache.hadoop.hbase.client.HTable.getStartKeysInRange(HTable.java:1769)
        at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1724)
        at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1704)
        at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1301)
        ... 47 more
Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=68451: row 'SYSTEM.CATALOG,TARGETCUST_DATA,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=host-10-191-36-24,16020,1551146724629, seqNum=0
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:169)
        at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:65)
        ... 3 more
Caused by: java.net.UnknownHostException: host-10-191-36-24
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.<init>(AbstractRpcClient.java:315)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient.createBlockingRpcChannel(AbstractRpcClient.java:267)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getClient(ConnectionManager.java:1639)
        at org.apache.hadoop.hbase.client.ScannerCallable.prepare(ScannerCallable.java:162)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.prepare(ScannerCallableWithReplicas.java:376)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:134)
        ... 4 more
2019-02-26 17:35:35 [com.asiainfo.cb2.consumer.ReceiveSMSID]-[ERROR]:33 - sms consumer begin 
2019-02-26 17:35:35 [com.asiainfo.cb2.consumer.ReceiveSMSID]-[ERROR]:38 - sms consumer pushType3

 

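Analysis

  At first I was fixated on the timeout itself, java.net.SocketTimeoutException: callTimeout=60000, and looked for a way to raise the client-side timeout. I also read the DataNode logs and suspected a performance problem when the DataNodes wrote data to disk: with new Hadoop nodes added, the cluster was rebalancing data between nodes, which could have hurt performance. Neither line of analysis led to a fix.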

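Findings

  Later, reading the error more carefully, I noticed Caused by: java.net.UnknownHostException: host-10-191-36-24, a host-not-found exception, and it finally made sense. The application first connects to ZooKeeper (zk), zk tells it which regionserver hosts the region, and the application then connects to that HBase regionserver to read and write data. As the trace shows, the regionserver is identified by hostname (hostname=host-10-191-36-24), so a client that cannot resolve the new node's hostname never reaches it, and the call ends up reported as a timeout.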
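
  The resolution step that failed here can be reproduced outside the application. The following is a minimal sketch, not part of the original troubleshooting: the class name HostResolutionCheck and the method findUnresolvable are hypothetical names chosen for illustration. It only asks the JVM to resolve each hostname with the standard InetAddress.getByName call and does not talk to HBase or ZooKeeper at all.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not from the original post): reports which
// hostnames the local JVM cannot resolve to an IP address.
public class HostResolutionCheck {

    /** Returns the hostnames from the input list that fail name resolution. */
    public static List<String> findUnresolvable(List<String> hostnames) {
        List<String> unresolved = new ArrayList<>();
        for (String host : hostnames) {
            try {
                // Uses the same JVM name lookup (hosts file, DNS) as the HBase client.
                InetAddress.getByName(host);
            } catch (UnknownHostException e) {
                unresolved.add(host);
            }
        }
        return unresolved;
    }

    public static void main(String[] args) {
        // Pass regionserver hostnames as arguments; the default is the
        // hostname that appears in the stack trace above.
        List<String> hosts = args.length > 0
                ? Arrays.asList(args)
                : Arrays.asList("host-10-191-36-24");
        List<String> unresolved = findUnresolvable(hosts);
        if (unresolved.isEmpty()) {
            System.out.println("All hostnames resolve.");
        } else {
            System.out.println("Cannot resolve: " + unresolved);
        }
    }
}
```

  Run it on the application host with the regionserver hostnames as arguments. Any name it lists would be expected to produce the same UnknownHostException inside the HBase client.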

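Solution

  So I added hosts entries for the newly added HBase regionserver nodes to /etc/hosts on the application host. After that, the application no longer reported the error and the problem was solved.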
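
  For illustration, an entry of this kind might look like the fragment below. The IP address is an assumption inferred from the hostname pattern host-10-191-36-24; the original post does not state it, so substitute the node's real address, and add one line per new regionserver.

```
# /etc/hosts on the application host
# ASSUMPTION: 10.191.36.24 is guessed from the hostname; use the actual IP.
10.191.36.24    host-10-191-36-24
```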

 

Document created: 2019-02-27 10:34:38

