Symptoms

I deployed an etcd cluster with three members: 10.8.65.106, 10.8.65.107 and 10.8.65.108.

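I then used etcdctl to set a TTL on a key and watched the key. The key did not expire at the expected time, and the error varied randomly from run to run.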

For example, I set a TTL of 10 seconds on /mytest/test:

[root@node-106 ~]# date && etcdctl set --ttl 10 /mytest/test hello && date
Fri Sep  2 05:31:10 EDT 2016
hello
Fri Sep  2 05:31:10 EDT 2016

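The shell prints EDT (UTC-4), so 05:31:10 EDT is 09:31:10 UTC, and the key should expire at 2016-09-02T09:31:20 UTC.

When I watched the key, however, etcd had set its expiration to 2016-09-02T09:31:18 rather than 2016-09-02T09:31:20: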

[root@node-106 ~]# curl -X GET "http://10.8.65.108:2379/v2/keys/mytest/test1?recursive=false&wait=true&stream=true"
{"action":"set","node":{"key":"/mytest/test","value":"hello","expiration":"2016-09-02T09:31:18.221701998Z","ttl":17,"modifiedIndex":306840,"createdIndex":306840}}
{"action":"expire","node":{"key":"/mytest/test","modifiedIndex":306844,"createdIndex":306840},"prevNode":{"key":"/mytest/test","value":"hello","expiration":"2016-09-02T09:31:18.221701998Z","ttl":9,"modifiedIndex":306840,"createdIndex":306840}}
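The arithmetic above can be checked with a short Python sketch. It converts the EDT time printed by `date` to UTC, adds the TTL, and compares the result with the `expiration` etcd reported. The helper names are my own, and because `date` only has one-second resolution, the offset it computes is approximate and relative to node-106's clock.

```python
from datetime import datetime, timedelta, timezone

# `date` on node-106 printed EDT, which is UTC-4.
EDT = timezone(timedelta(hours=-4), "EDT")


def parse_etcd_time(value: str) -> datetime:
    """Parse an etcd v2 expiration such as '2016-09-02T09:31:18.221701998Z'.

    datetime stores microseconds, so digits beyond the sixth are dropped.
    """
    base, _, frac = value.rstrip("Z").partition(".")
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    micros = int((frac or "0")[:6].ljust(6, "0"))
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


def expiration_offset(set_time: datetime, ttl: int, etcd_expiration: str) -> float:
    """Seconds between etcd's expiration and set_time + ttl (negative means early)."""
    expected = set_time.astimezone(timezone.utc) + timedelta(seconds=ttl)
    return (parse_etcd_time(etcd_expiration) - expected).total_seconds()


if __name__ == "__main__":
    set_time = datetime(2016, 9, 2, 5, 31, 10, tzinfo=EDT)  # from `date` on node-106
    expected = set_time.astimezone(timezone.utc) + timedelta(seconds=10)
    print(expected.isoformat())  # 2016-09-02T09:31:20+00:00
    print(expiration_offset(set_time, 10, "2016-09-02T09:31:18.221701998Z"))  # -1.778299
```

With the values from this run the key was scheduled roughly 2 seconds early: between about 1.8 and 2.8 seconds, depending on where inside 05:31:10 the write actually happened.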

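I repeated the experiment many times. The gap between the intended 10-second lifetime and the actual expiration ranged from 0 up to as much as 9 seconds, and appeared to be random.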

Analysis

I turned on debug mode to look at what happens in detail:

[root@node-106 ~]# date && etcdctl --debug set --ttl 10 /mytest/test1 hello && date
Fri Sep  2 05:57:20 EDT 2016
start to sync cluster using endpoints(http://127.0.0.1:4001,http://127.0.0.1:2379)
cURL Command: curl -X GET http://127.0.0.1:4001/v2/members
cURL Command: curl -X GET http://127.0.0.1:2379/v2/members
got endpoints(http://10.8.65.107:2379,http://10.8.65.106:2379,http://10.8.65.108:2379) after sync
Cluster-Endpoints: http://10.8.65.107:2379, http://10.8.65.106:2379, http://10.8.65.108:2379
cURL Command: curl -X PUT http://10.8.65.107:2379/v2/keys/mytest/test1 -d "ttl=10&value=hello"
hello
Fri Sep  2 05:57:20 EDT 2016

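The debug output shows that etcdctl first syncs the cluster's member list and then sends the set request for /mytest/test1 to one of the members, chosen at random. In the run above, the request went to 10.8.65.107.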
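To make the request flow in the debug output concrete, here is a minimal Python sketch of the same two steps: read the members' `clientURLs` from a `GET /v2/members` response, pick one at random, and build the `PUT /v2/keys/...` request with a `ttl=10&value=hello` form body. This is not etcdctl's implementation; the function names and the sample member ids are hypothetical, and the random choice only models the behavior described above.

```python
import json
import random
import urllib.parse
import urllib.request


def client_urls(members_body: str) -> list[str]:
    """Collect every member's clientURLs from a GET /v2/members response body."""
    members = json.loads(members_body)["members"]
    return [url for member in members for url in member.get("clientURLs", [])]


def build_set_request(endpoint: str, key: str, value: str, ttl: int) -> urllib.request.Request:
    """Build the same PUT the debug output shows (form-encoded ttl and value)."""
    body = urllib.parse.urlencode({"ttl": ttl, "value": value}).encode()
    return urllib.request.Request(
        f"{endpoint}/v2/keys/{key.lstrip('/')}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


if __name__ == "__main__":
    # Response shaped like GET /v2/members; the ids and names are hypothetical.
    members_body = json.dumps({"members": [
        {"id": "m107", "name": "node-107", "clientURLs": ["http://10.8.65.107:2379"]},
        {"id": "m106", "name": "node-106", "clientURLs": ["http://10.8.65.106:2379"]},
        {"id": "m108", "name": "node-108", "clientURLs": ["http://10.8.65.108:2379"]},
    ]})
    endpoint = random.choice(client_urls(members_body))  # may differ on every run
    request = build_set_request(endpoint, "/mytest/test1", "hello", 10)
    print(request.get_method(), request.full_url, request.data.decode())
    # urllib.request.urlopen(request) would actually send the write.
```

The point of the random choice is that consecutive runs may send the write to different members.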

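I then checked the time on all three machines and found that their clocks were not synchronized. Suspecting clock skew as the cause, I synchronized them with NTP: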

[root@node-106 ~]# ntpdate pool.ntp.org
 2 Sep 05:45:23 ntpdate[24846]: adjust time server 120.25.108.11 offset -0.000273 sec

After that, setting a TTL again produced an accurate expiration time.
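Why would skew make the error look random? As I understand the etcd v2 API (this is my reading, not something this post verified), the member that receives the write converts the TTL into an absolute expiration using its own clock, and the leader later deletes keys whose expiration its own clock has passed. Under that assumption a key really lives `ttl + (receiver offset - leader offset)` seconds, and because etcdctl picks the receiving member at random, the error changes from run to run. The Python model below illustrates this; the clock offsets and the choice of leader are hypothetical values picked for illustration, not measurements from this cluster.

```python
import random


def effective_lifetime(ttl: float, receiver_offset: float, leader_offset: float) -> float:
    """Real seconds a key lives under the assumed model.

    Offsets are each member's clock minus true time (positive means the clock is ahead).
    Assumed: the receiving member sets expiration = its clock + ttl, and the leader
    deletes the key once its own clock passes that expiration. The small delay of the
    leader's periodic expiry sweep is ignored.
    """
    return ttl + receiver_offset - leader_offset


def simulate(ttl: float, offsets: dict[str, float], leader: str, trials: int, seed: int = 0) -> list[float]:
    """Send `trials` writes to randomly chosen members and return each key's lifetime."""
    rng = random.Random(seed)
    members = list(offsets)
    return [effective_lifetime(ttl, offsets[rng.choice(members)], offsets[leader]) for _ in range(trials)]


if __name__ == "__main__":
    # Hypothetical offsets in seconds; not measured on the cluster in this post.
    offsets = {"10.8.65.106": 0.0, "10.8.65.107": -2.0, "10.8.65.108": -9.0}
    leader = "10.8.65.106"  # hypothetical leader
    per_receiver = {m: effective_lifetime(10, offsets[m], offsets[leader]) for m in offsets}
    print(per_receiver)  # {'10.8.65.106': 10.0, '10.8.65.107': 8.0, '10.8.65.108': 1.0}
    print(simulate(10, offsets, leader, trials=5))  # a random mix of those values
    synced = dict.fromkeys(offsets, 0.0)
    print(set(simulate(10, synced, leader, trials=5)))  # {10.0}
```

With these made-up offsets the key lives 10, 8 or 1 seconds depending on which member received the write, an error of 0 to 9 seconds; once all offsets are zero, every run gives 10 seconds.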

Review and solution

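Looking back, the root cause was that the machines' clocks were not synchronized. If the problem appears again, you can diagnose it from the watch output: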

[root@node-106 ~]# curl -X GET "http://10.8.65.108:2379/v2/keys/mytest/test1?recursive=false&wait=true&stream=true"
{"action":"set","node":{"key":"/mytest/test","value":"hello","expiration":"2016-09-02T09:31:18.221701998Z","ttl":17,"modifiedIndex":306840,"createdIndex":306840}}

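In the event whose action is set, the ttl field should match the TTL you set. If it differs (here it is 17 rather than 10), clock skew between the nodes is the most likely cause.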
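This check is easy to automate. A minimal Python sketch, assuming you feed it one JSON line from the v2 watch stream: it confirms the event is a `set` and compares the reported `ttl` with the TTL you configured. The function name is hypothetical, and the one-second tolerance for rounding is my own assumption, not an etcd setting.

```python
import json


def check_set_event(line: str, configured_ttl: int, tolerance: int = 1) -> tuple[bool, int]:
    """Return (ttl_looks_right, reported_ttl) for one line of a v2 watch stream.

    `tolerance` absorbs rounding of the reported ttl; 1 second is an assumed value.
    """
    event = json.loads(line)
    if event.get("action") != "set":
        raise ValueError(f"expected a 'set' event, got {event.get('action')!r}")
    reported = event["node"].get("ttl")
    if reported is None:
        raise ValueError("the set event carries no ttl; was the key written with a TTL?")
    return abs(reported - configured_ttl) <= tolerance, reported


if __name__ == "__main__":
    line = '{"action":"set","node":{"key":"/mytest/test","value":"hello","expiration":"2016-09-02T09:31:18.221701998Z","ttl":17,"modifiedIndex":306840,"createdIndex":306840}}'
    ok, reported = check_set_event(line, configured_ttl=10)
    if not ok:
        print(f"ttl is {reported}, expected about 10: check clock sync between members")
```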

The fix, then, is to synchronize the clocks of all three machines; once they are in sync, the TTL expiration is accurate again.
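After synchronizing, it is worth confirming that the clocks actually agree. A minimal Python sketch, assuming each member answers `GET /version` on its client port and that the response carries the standard HTTP `Date` header: it turns each member's `Date` into an offset from the local clock and reports the largest skew. `Date` has one-second resolution and the measurement includes request latency, so this only catches skew of a second or more; `ntpdate -q` gives finer numbers. The helper names and the 2-second threshold are assumptions.

```python
import urllib.request
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def offset_from_date_header(date_header: str, local_now: datetime) -> float:
    """Seconds the server clock (HTTP Date header) is ahead of local_now; negative means behind."""
    return (parsedate_to_datetime(date_header) - local_now).total_seconds()


def probe(endpoint: str, timeout: float = 2.0) -> float:
    """GET /version from one etcd member and return its approximate clock offset."""
    with urllib.request.urlopen(f"{endpoint}/version", timeout=timeout) as resp:
        local_now = datetime.now(timezone.utc)
        return offset_from_date_header(resp.headers["Date"], local_now)


def max_skew(offsets: dict[str, float]) -> float:
    """Largest difference between any two measured offsets."""
    return max(offsets.values()) - min(offsets.values())


def clocks_in_sync(offsets: dict[str, float], threshold: float = 2.0) -> bool:
    # Date has 1-second resolution plus latency, so the (assumed) threshold must exceed 1 s.
    return max_skew(offsets) <= threshold


if __name__ == "__main__":
    # Offline demo with hypothetical Date headers; against a live cluster use
    # {ep: probe(ep) for ep in endpoints} instead.
    local_now = datetime(2016, 9, 2, 9, 31, 10, tzinfo=timezone.utc)
    headers = {
        "http://10.8.65.106:2379": "Fri, 02 Sep 2016 09:31:10 GMT",
        "http://10.8.65.107:2379": "Fri, 02 Sep 2016 09:31:08 GMT",
        "http://10.8.65.108:2379": "Fri, 02 Sep 2016 09:31:01 GMT",
    }
    offsets = {ep: offset_from_date_header(h, local_now) for ep, h in headers.items()}
    print(offsets, max_skew(offsets), clocks_in_sync(offsets))
```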
