hbase時間不同步問題引起的bug


查看步驟:

一:讀取hbase數據庫時出現異常

2018-12-10 10:00:13,620 ERROR [hconnection-0x2609b277-metaLookup-shared--pool1-t2] zookeeper.ZooKeeperWatcher - hconnection-0x2609b277-0x267942b66f701d1, quorum=10.100.2.92:2181,10.100.2.93:2181,10.100.2.94:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/meta-region-server
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:354)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:623)
    at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaRegionState(MetaTableLocator.java:487)
    at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaRegionLocation(MetaTableLocator.java:168)
    at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:608)
    at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:588)
    at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:561)
    at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:61)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateMeta(ConnectionManager.java:1211)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1178)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.relocateRegion(ConnectionManager.java:1152)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:303)
    at org.apache.hadoop.hbase.client.ReversedScannerCallable.prepare(ReversedScannerCallable.java:105)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.prepare(ScannerCallableWithReplicas.java:376)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:134)
    at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:65)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

二:首先看了下hbase的監控,http://masterHostIp:60010/master-status

發現少了個serverName。下圖是正常狀態。

三:重新啟動hbase,命令如下。期間也試過重啟zookeeper,再啟動hbase。

啟動HBase集群:
bin/start-hbase.sh
單獨啟動一個HMaster進程:
bin/hbase-daemon.sh start master
單獨停止一個HMaster進程:
bin/hbase-daemon.sh stop master
單獨啟動一個HRegionServer進程:
bin/hbase-daemon.sh start regionserver
單獨停止一個HRegionServer進程:
bin/hbase-daemon.sh stop regionserver

四:發現仍然是有一個服務器的hbase沒有啟動起來。看hbase的日志:

2018-12-10 10:40:38,785 FATAL [regionserver/dev-hadoop2/10.100.2.93:16020] regionserver.HRegionServer: Master rejected startup because clock is out of sync
org.apache.hadoop.hbase.ClockOutOfSyncException: org.apache.hadoop.hbase.ClockOutOfSyncException: Server dev-hadoop2,16020,1544409635868 has been rejected; Reported time is too far out of sync with master.  Time difference of 79232ms > max allowed of 30000ms
    at org.apache.hadoop.hbase.master.ServerManager.checkClockSkew(ServerManager.java:409)
    at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:275)
    at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerStartup(MasterRpcServices.java:361)
    at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8615)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2196)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
    at java.lang.Thread.run(Thread.java:748)

    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
    at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:330)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2318)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:907)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.ClockOutOfSyncException): org.apache.hadoop.hbase.ClockOutOfSyncException: Server dev-hadoop2,16020,1544409635868 has been rejected; Reported time is too far out of sync with master.  Time difference of 79232ms > max allowed of 30000ms
    at org.apache.hadoop.hbase.master.ServerManager.checkClockSkew(ServerManager.java:409)
    at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:275)
    at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerStartup(MasterRpcServices.java:361)
    at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8615)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2196)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
    at java.lang.Thread.run(Thread.java:748)

    at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1267)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336)
    at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:8982)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2316)
    ... 2 more

2018-12-10 10:40:38,866 INFO  [regionserver/dev-hadoop2/10.100.2.93:16020] regionserver.HRegionServer: STOPPED: Unhandled: org.apache.hadoop.hbase.ClockOutOfSyncException: Server dev-hadoop2,16020,1544409635868 has been rejected; Reported time is too far out of sync with master.  Time difference of 79232ms > max allowed of 30000ms
    at org.apache.hadoop.hbase.master.ServerManager.checkClockSkew(ServerManager.java:409)
    at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:275)
    at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerStartup(MasterRpcServices.java:361)
    at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8615)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2196)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
    at java.lang.Thread.run(Thread.java:748)

2018-12-10 10:40:38,995 ERROR [main] regionserver.HRegionServerCommandLine: Region server exiting
java.lang.RuntimeException: HRegionServer Aborted
    at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:68)
    at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:87)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2697)

原因是:Time difference of 79232ms > max allowed of 30000ms,總結就是系統時間不同步。

五:解決方法:

1- vi /etc/ntp.conf   加上黃色的一行,意思是所有的時間都和10.100.2.93時間同步。

server 127.127.1.0
fudge 127.127.1.0 stratum 8
Broadcastdelay 0.008
server 0.centos.pool.ntp.org
server 1.centos.pool.ntp.org
server 2.centos.pool.ntp.org
server 10.100.2.93

2- service ntpd restart    重啟ntpd,使配置生效。

3- ntpq -pn 查看狀態(在非2.93上查看):

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 185.134.197.4   .STEP.          16 u    -  512    0    0.000    0.000   0.000
 193.228.143.14  .STEP.          16 u    -  512    0    0.000    0.000   0.000
 119.28.206.193  .STEP.          16 u    -  512    0    0.000    0.000   0.000
 85.199.214.100  .STEP.          16 u    -  512    0    0.000    0.000   0.000
*10.100.2.93     LOCAL(0)         9 u   21   64  377    2.026    0.781   0.804

沒有任何兩樣東西一樣,晶振(計算機硬件)也是有差異的。帶來的問題是,時間差異會越來越大。

refid:參考的上一層NTP主機的地址

st:即stratum階層

when:幾秒前曾做過時間同步更新的操作

poll:下次更新在幾秒之后

reach:已經向上層NTP服務器要求更新的次數

delay:網絡傳輸過程鍾延遲的時間

offset:時間補償的結果

jitter:Linux系統時間與BIOS硬件時間的差異時間

最后提及一點,ntp服務,默認只會同步系統時間。如果想要讓ntp同時同步硬件時間,可以設置/etc/sysconfig/ntpd 文件。

在/etc/sysconfig/ntpd文件中,添加 SYNC_HWCLOCK=yes 這樣,就可以讓硬件時間與系統時間一起同步。

 

4- 使用定時器定時同步時間。

詳情可參考:https://my.oschina.net/myaniu/blog/182959

 


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM