Our centralized log monitoring platform had been live for a while, but most of the time it was only used as a convenience for checking logs during releases or incidents; nobody proactively looked at it. So a few days ago we hooked the application error logs up to email, Hi, and SMS alerts, and immediately got flooded with error alerts, which finally got everyone's attention. One of them was a Redis error:
Push notify error: => redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
at redis.clients.util.Pool.getResource(Pool.java:53)
at redis.clients.jedis.JedisPool.getResource(JedisPool.java:99)
at com.baidu.global.mobile.server.dataaccess.cache.CacheClient.execute(CacheClient.java:285)
at com.baidu.global.mobile.server.dataaccess.cache.CacheClient.getAsRaw(CacheClient.java:793)
at com.baidu.global.mobile.server.dataaccess.cache.CacheClient.get(CacheClient.java:432)
at com.baidu.global.mobile.server.guanxing.facade.MessageTaskFacade.checkPushValidationID(MessageTaskFacade.java:952)
at com.baidu.global.mobile.server.guanxing.facade.MessageTaskFacade.pushNotify(MessageTaskFacade.java:983)
at com.baidu.global.mobile.server.guanxing.facade.MessageTaskFacade.buildMessage(MessageTaskFacade.java:586)
at com.baidu.global.mobile.server.guanxing.facade.MessageTaskFacade.pushRightNow(MessageTaskFacade.java:372)
at com.baidu.global.mobile.server.guanxing.facade.MessageTaskFacade$AsynMessagePush.run(MessageTaskFacade.java:350)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.NoSuchElementException: Unable to validate object
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:502)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:361)
at redis.clients.util.Pool.getResource(Pool.java:51)
... 12 more
It looked fairly serious: we couldn't get a Redis connection, and the error was thrown during validation:
java.util.NoSuchElementException: Unable to validate object
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:502)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:361)
We use Jedis, and by default validation sends a PING command to Redis and expects the string PONG back. Was the connection pool really exhausted?
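In other words, validation only cares whether the reply is the literal string PONG. A minimal sketch of that semantic (the helper name is made up; this is not the actual Jedis source):

```python
def ping_validates(reply: str) -> bool:
    # A borrowed connection is considered healthy only if PING comes
    # back as exactly "PONG"; any error reply (MISCONF, LOADING, ...)
    # fails validation just like a dead connection would.
    return reply == "PONG"

print(ping_validates("PONG"))   # True: healthy server
print(ping_validates("MISCONF Redis is configured to save RDB snapshots, "
                     "but is currently not able to persist on disk."))   # False
```

So a server that is up and answering, but answering with an error, fails validation in exactly the same way as an exhausted pool would.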
So I logged on to the server running Redis and PINGed it by hand; sure enough, an error:
[work@id01-xxx-mob28.id01 redis-2.8.19]$ src/redis-cli -p 6379
127.0.0.1:6379> ping
(error) MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the error.
127.0.0.1:6379> set hello world
(error) MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the error.
Read commands were fine, but all write operations failed.
A look at the Redis log showed it had been erroring nonstop:
[26378] 03 May 17:58:51.215 * 50000 changes in 60 seconds. Saving...
[26378] 03 May 17:58:51.254 * Background saving started by pid 2854
[2854] 03 May 17:58:58.949 * DB saved on disk
[2854] 03 May 17:58:58.967 * RDB: 4 MB of memory used by copy-on-write
[26378] 03 May 17:58:59.063 * Background saving terminated with success
...
[55231] 11 Aug 10:43:12.034 # Can't save in background: fork: Cannot allocate memory
[55231] 11 Aug 10:43:18.043 * 100 changes in 3600 seconds. Saving...
[55231] 11 Aug 10:43:18.044 # Can't save in background: fork: Cannot allocate memory
[55231] 11 Aug 10:43:24.052 * 100 changes in 3600 seconds. Saving...
[55231] 11 Aug 10:43:24.052 # Can't save in background: fork: Cannot allocate memory
So the machine was short of memory, and Redis could not fork a child process for RDB persistence. A look at memory usage:
[work@id01-xxx-mob28.id01 ~]$ free -g
                 total       used       free     shared    buffers     cached
Mem:                62         61          1          0          0         25
-/+ buffers/cache:             35         27
Swap:                0          0          0
There was actually quite a lot of free memory: 27G (after buffers/cache). An INFO then showed that the number of connected clients was small, but memory usage really was high: a peak of 27.55G, more than the free memory.
# Clients
connected_clients:41
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
# Memory
used_memory:29281598848
used_memory_human:27.27G
used_memory_rss:30031474688
used_memory_peak:29581256416
used_memory_peak_human:27.55G
used_memory_lua:35840
mem_fragmentation_ratio:1.03
mem_allocator:jemalloc-3.6.0
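As a quick sanity check, the human-readable figures in that INFO output follow directly from the raw byte counts (a throwaway snippet, nothing more):

```python
GiB = 2 ** 30

# Raw byte counts copied from the INFO output above:
used_memory      = 29_281_598_848
used_memory_peak = 29_581_256_416
used_memory_rss  = 30_031_474_688

print(round(used_memory / GiB, 2))              # 27.27 -> used_memory_human
print(round(used_memory_peak / GiB, 2))         # 27.55 -> used_memory_peak_human
print(round(used_memory_rss / used_memory, 2))  # 1.03  -> mem_fragmentation_ratio
```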
Redis implements RDB persistence by forking a child process and letting the child dump the in-memory image to the RDB file. In theory the child needs as much memory as the parent, i.e. 27.55G, but thanks to copy-on-write, which Linux has supported for a long time, nowhere near that much physical memory is actually needed, as the logs confirm: this Redis never used more than about 150M of COW memory. The official documentation covers exactly this: "Background saving is failing with a fork() error under Linux even if I've a lot of free RAM!"
Redis background saving schema relies on the copy-on-write semantic of fork in modern operating systems: Redis forks (creates a child process) that is an exact copy of the parent. The child process dumps the DB on disk and finally exits. In theory the child should use as much memory as the parent being a copy, but actually thanks to the copy-on-write semantic implemented by most modern operating systems the parent and child process will share the common memory pages. A page will be duplicated only when it changes in the child or in the parent. Since in theory all the pages may change while the child process is saving, Linux can't tell in advance how much memory the child will take, so if the overcommit_memory setting is set to zero fork will fail unless there is as much free RAM as required to really duplicate all the parent memory pages, with the result that if you have a Redis dataset of 3 GB and just 2 GB of free memory it will fail.
Setting overcommit_memory to 1 says Linux to relax and perform the fork in a more optimistic allocation fashion, and this is indeed what you want for Redis.
This blunt up-front check can be relaxed simply by setting the overcommit_memory kernel parameter to 1:
sysctl vm.overcommit_memory=1
It took effect right away:
[55231] 11 Aug 10:59:31.036 * Background saving started by pid 11820
[11820] 11 Aug 11:04:27.567 * DB saved on disk
To make the change survive a reboot, also edit /etc/sysctl.conf:
vm.overcommit_memory=1
TIPS: when connections can't be borrowed and validation fails, the first instinct is always to blame an undersized pool or too many connections. But given how validation is implemented, anything other than a PONG reply counts as a failure, so just log in and PING by hand and you'll know what's going on :).
Postscript: a Redis troubleshooting story, part two
On the morning of the day before yesterday (August 18, 2015), we suddenly received a similar error again:
get jedis connection has error => redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
at redis.clients.util.Pool.getResource(Pool.java:53)
at redis.clients.jedis.JedisPool.getResource(JedisPool.java:99)
at com.baidu.global.mobile.hao123.dataaccess.utilscache.jedis.RedisUtils.doTask(RedisUtils.java:82)
at com.baidu.global.mobile.hao123.nantianmen.utils.CacheHelper.getLatestEntriesByCategory(CacheHelper.java:123)
at com.baidu.global.mobile.hao123.nantianmen.service.NewsServiceImpl.getLatestEntriesByCategory(NewsServiceImpl.java:591)
at com.baidu.global.mobile.hao123.nantianmen.facade.NanTianMenFacadeImpl.getLatestEntriesByCategory(NanTianMenFacadeImpl.java:192)
at com.baidu.global.mobile.hao123.nantianmen.web.MbrowserController.getLatestEntriesByCategory(MbrowserController.java:248)
at com.baidu.global.mobile.hao123.nantianmen.web.MbrowserController.mbrowser(MbrowserController.java:138)
at sun.reflect.GeneratedMethodAccessor143.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.springframework.web.method.support.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:215)
at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:132)
at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:104)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandleMethod(RequestMappingHandlerAdapter.java:745)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:685)
at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:80)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:919)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:851)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:953)
at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:844)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:618)
at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:829)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:725)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:301)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:721)
at org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:468)
at org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:391)
at org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:318)
at org.apache.catalina.core.StandardHostValve.custom(StandardHostValve.java:423)
at org.apache.catalina.core.StandardHostValve.status(StandardHostValve.java:291)
at org.apache.catalina.core.StandardHostValve.throwable(StandardHostValve.java:383)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:166)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:78)
at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:610)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:526)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1033)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:652)
at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:222)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1566)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1523)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.NoSuchElementException: Unable to validate object
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:497)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:360)
at redis.clients.util.Pool.getResource(Pool.java:51)
The log looked essentially the same as last time: validation during borrowObject failed, meaning PING did not return a normal PONG! My first reaction was once again to PING manually, and I found nothing wrong. INFO was normal too. Hitting the failing API directly also showed no problem.
Searching the centralized log platform showed the errors started at 11:25 the morning before yesterday, with roughly 20k of them by 4pm yesterday.
Correlating with our own activity: we had migrated this very Redis at 11:30 that morning. Because the machine c32-06 was being taken out of service, Li Bei copied the Redis to a new machine, m32-06, and demoted c32-06 from master to slave. From that point on, these errors kept streaming in.
After the previous validate incident I had actually added a production monitor that PINGs every one of our Redis masters every 5 minutes and fires SMS and email alerts whenever the reply is not PONG. But this particular Redis had slipped through: it serves news recommendation and had always been maintained by Wei, so I had missed it. I immediately did two things:
- Switched most API-layer requests over to the m32-06 master, leaving a single machine still reading from the c32-06 slave.
- Added c32-06 to the periodic PING monitoring script.
That was around 5pm on the 19th.
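The check itself is trivial. A minimal sketch of the PING probe, assuming a raw-socket approach (this is not our actual script, and the alerting part is left out):

```python
import socket

def parse_ping_reply(raw: bytes) -> tuple[bool, str]:
    # RESP encodes a healthy reply as b"+PONG\r\n" and any error
    # (MISCONF, LOADING, ...) as b"-<message>\r\n".
    reply = raw.decode("utf-8", "replace").strip()
    if reply == "+PONG":
        return True, "OK"
    return False, reply.lstrip("+-")

def check_redis(host: str, port: int, timeout: float = 3.0) -> tuple[bool, str]:
    # Raw-socket PING so the check does not depend on a client library.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"PING\r\n")
        return parse_ping_reply(sock.recv(4096))

# The reply the monitor later caught from the slave:
print(parse_ping_reply(b"-LOADING Redis is loading the dataset in memory\r\n"))
```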
And today it paid off:
- After switching to the master, errors dropped from 20k to 2k, which pretty much confirms the slave was the problem.
- The monitoring script caught the c32-06 slave returning an error to PING: LOADING Redis is loading the dataset in memory.
So the slave really was unavailable at certain moments.
The message LOADING Redis is loading the dataset in memory means that while Redis loads the RDB dataset into memory it blocks and cannot serve requests, for a time proportional to the RDB size. In theory this only happens:
- at startup
- during a full slave resync. Since 2.8, in theory only the first sync is full; later ones should be partial.
Could the slave be doing full resyncs all the time? A look at the Redis logs quickly revealed the problem.
This is the Redis log of the c32-06 slave:
17859:S 18 Aug 11:26:16.363 # Server started, Redis version 3.0.0
17859:S 18 Aug 11:26:16.363 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
17859:S 18 Aug 11:26:52.802 * DB loaded from disk: 36.438 seconds
17859:S 18 Aug 11:26:52.802 * The server is now ready to accept connections on port 6666
17859:S 18 Aug 11:26:53.364 * Connecting to MASTER 10.242.73.46:6666
17859:S 18 Aug 11:26:53.364 * MASTER <-> SLAVE sync started
17859:S 18 Aug 11:26:53.364 * Non blocking connect for SYNC fired the event.
17859:S 18 Aug 11:26:53.365 * Master replied to PING, replication can continue...
17859:S 18 Aug 11:26:53.365 * Partial resynchronization not possible (no cached master)
17859:S 18 Aug 11:26:53.365 * Full resync from master: 91fa2b65639c2ce2caf88d279aba53e8c4ad3a22:1
17859:S 18 Aug 11:27:38.686 * MASTER <-> SLAVE sync: receiving 2550593979 bytes from master
17859:S 18 Aug 11:27:59.262 * MASTER <-> SLAVE sync: Flushing old data
17859:S 18 Aug 11:28:05.517 * MASTER <-> SLAVE sync: Loading DB in memory
17859:S 18 Aug 11:28:27.393 * MASTER <-> SLAVE sync: Finished with success
17859:S 18 Aug 12:51:54.051 # Connection with master lost.
17859:S 18 Aug 12:51:54.052 * Caching the disconnected master state.
17859:S 18 Aug 12:51:54.657 * Connecting to MASTER 10.242.73.46:6666
17859:S 18 Aug 12:51:54.657 * MASTER <-> SLAVE sync started
17859:S 18 Aug 12:51:54.657 * Non blocking connect for SYNC fired the event.
17859:S 18 Aug 12:51:54.657 * Master replied to PING, replication can continue...
17859:S 18 Aug 12:51:54.658 * Trying a partial resynchronization (request 91fa2b65639c2ce2caf88d279aba53e8c4ad3a22:1688184130).
17859:S 18 Aug 12:51:54.658 * Full resync from master: 91fa2b65639c2ce2caf88d279aba53e8c4ad3a22:1934263988
17859:S 18 Aug 12:51:54.658 * Discarding previously cached master state.
17859:S 18 Aug 12:52:39.353 * MASTER <-> SLAVE sync: receiving 2550573652 bytes from master
17859:S 18 Aug 12:52:56.391 # I/O error trying to sync with MASTER: connection lost
17859:S 18 Aug 12:52:57.584 * Connecting to MASTER 10.242.73.46:6666
17859:S 18 Aug 12:52:57.584 * MASTER <-> SLAVE sync started
17859:S 18 Aug 12:52:57.584 * Non blocking connect for SYNC fired the event.
17859:S 18 Aug 12:52:57.584 * Master replied to PING, replication can continue...
17859:S 18 Aug 12:52:57.584 * Partial resynchronization not possible (no cached master)
17859:S 18 Aug 12:52:57.585 * Full resync from master: 91fa2b65639c2ce2caf88d279aba53e8c4ad3a22:2022862913
17859:S 18 Aug 12:53:42.309 * MASTER <-> SLAVE sync: receiving 2550573652 bytes from master
17859:S 18 Aug 12:54:02.551 * MASTER <-> SLAVE sync: Flushing old data
17859:S 18 Aug 12:54:09.241 * MASTER <-> SLAVE sync: Loading DB in memory
17859:S 18 Aug 12:54:31.375 * MASTER <-> SLAVE sync: Finished with success
17859:S 18 Aug 13:22:01.266 # Connection with master lost.
17859:S 18 Aug 13:22:01.267 * Caching the disconnected master state.
17859:S 18 Aug 13:22:02.340 * Connecting to MASTER 10.242.73.46:6666
17859:S 18 Aug 13:22:02.341 * MASTER <-> SLAVE sync started
17859:S 18 Aug 13:22:02.380 * Non blocking connect for SYNC fired the event.
17859:S 18 Aug 13:22:02.404 * Master replied to PING, replication can continue...
17859:S 18 Aug 13:22:02.404 * Trying a partial resynchronization (request 91fa2b65639c2ce2caf88d279aba53e8c4ad3a22:2548455709).
17859:S 18 Aug 13:22:02.404 * Full resync from master: 91fa2b65639c2ce2caf88d279aba53e8c4ad3a22:2964290063
17859:S 18 Aug 13:22:02.405 * Discarding previously cached master state.
17859:S 18 Aug 13:22:47.186 * MASTER <-> SLAVE sync: receiving 2550563247 bytes from master
17859:S 18 Aug 13:23:10.751 # I/O error trying to sync with MASTER: connection lost
17859:S 18 Aug 13:23:11.097 * Connecting to MASTER 10.242.73.46:6666
17859:S 18 Aug 13:23:11.097 * MASTER <-> SLAVE sync started
17859:S 18 Aug 13:23:11.097 * Non blocking connect for SYNC fired the event.
17859:S 18 Aug 13:23:11.097 * Master replied to PING, replication can continue...
17859:S 18 Aug 13:23:11.097 * Partial resynchronization not possible (no cached master)
17859:S 18 Aug 13:23:11.098 * Full resync from master: 91fa2b65639c2ce2caf88d279aba53e8c4ad3a22:3034293813
17859:S 18 Aug 13:23:55.925 * MASTER <-> SLAVE sync: receiving 2550563310 bytes from master
17859:S 18 Aug 13:24:16.412 * MASTER <-> SLAVE sync: Flushing old data
17859:S 18 Aug 13:24:24.694 * MASTER <-> SLAVE sync: Loading DB in memory
17859:S 18 Aug 13:24:47.453 * MASTER <-> SLAVE sync: Finished with success
17859:S 18 Aug 13:52:27.452 # Connection with master lost.
17859:S 18 Aug 13:52:27.452 * Caching the disconnected master state.
17859:S 18 Aug 13:52:28.512 * Connecting to MASTER 10.242.73.46:6666
17859:S 18 Aug 13:52:28.512 * MASTER <-> SLAVE sync started
17859:S 18 Aug 13:52:28.574 * Non blocking connect for SYNC fired the event.
17859:S 18 Aug 13:52:28.574 * Master replied to PING, replication can continue...
17859:S 18 Aug 13:52:28.574 * Trying a partial resynchronization (request 91fa2b65639c2ce2caf88d279aba53e8c4ad3a22:3560083735).
17859:S 18 Aug 13:52:28.575 * Full resync from master: 91fa2b65639c2ce2caf88d279aba53e8c4ad3a22:3994311298
17859:S 18 Aug 13:52:28.575 * Discarding previously cached master state.
17859:S 18 Aug 13:53:13.042 * MASTER <-> SLAVE sync: receiving 2550551123 bytes from master
17859:S 18 Aug 13:53:33.314 * MASTER <-> SLAVE sync: Flushing old data
17859:S 18 Aug 13:53:41.168 * MASTER <-> SLAVE sync: Loading DB in memory
17859:S 18 Aug 13:54:03.683 * MASTER <-> SLAVE sync: Finished with success
17859:S 18 Aug 13:54:03.688 # Connection with master lost.
17859:S 18 Aug 13:54:03.688 * Caching the disconnected master state.
17859:S 18 Aug 13:54:03.783 * Connecting to MASTER 10.242.73.46:6666
17859:S 18 Aug 13:54:03.783 * MASTER <-> SLAVE sync started
17859:S 18 Aug 13:54:03.783 * Non blocking connect for SYNC fired the event.
17859:S 18 Aug 13:54:03.783 * Master replied to PING, replication can continue...
17859:S 18 Aug 13:54:03.783 * Trying a partial resynchronization (request 91fa2b65639c2ce2caf88d279aba53e8c4ad3a22:3996662976).
17859:S 18 Aug 13:54:03.783 * Full resync from master: 91fa2b65639c2ce2caf88d279aba53e8c4ad3a22:4045725312
17859:S 18 Aug 13:54:03.783 * Discarding previously cached master state.
17859:S 18 Aug 13:54:48.517 * MASTER <-> SLAVE sync: receiving 2550550079 bytes from master
17859:S 18 Aug 13:55:08.913 * MASTER <-> SLAVE sync: Flushing old data
17859:S 18 Aug 13:55:15.929 * MASTER <-> SLAVE sync: Loading DB in memory
17859:S 18 Aug 13:55:38.210 * MASTER <-> SLAVE sync: Finished with success
...
There are many "Connection with master lost" errors, after each of which the slave asks the master for a partial resync: Trying a partial resynchronization (request 91fa2b65639c2ce2caf88d279aba53e8c4ad3a22:3996662976). But the master still demands a full resync: Full resync from master: 91fa2b65639c2ce2caf88d279aba53e8c4ad3a22:4045725312. Then comes the familiar full-sync sequence. This whole cycle repeated roughly every minute or so, over and over.
Now compare the log of the m32-06 master:
12345:M 18 Aug 11:24:38.122 # Server started, Redis version 3.0.0
12345:M 18 Aug 11:24:38.122 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
12345:M 18 Aug 11:25:14.322 * DB loaded from disk: 36.200 seconds
12345:M 18 Aug 11:25:14.322 * The server is now ready to accept connections on port 6666
12345:M 18 Aug 11:26:53.366 * Slave 10.242.117.36:6666 asks for synchronization
12345:M 18 Aug 11:26:53.366 * Full resync requested by slave 10.242.117.36:6666
12345:M 18 Aug 11:26:53.366 * Starting BGSAVE for SYNC with target: disk
12345:M 18 Aug 11:26:53.523 * Background saving started by pid 17147
17147:C 18 Aug 11:27:38.393 * DB saved on disk
17147:C 18 Aug 11:27:38.482 * RDB: 0 MB of memory used by copy-on-write
12345:M 18 Aug 11:27:38.686 * Background saving terminated with success
12345:M 18 Aug 11:27:58.746 * Synchronization with slave 10.242.117.36:6666 succeeded
12345:M 18 Aug 12:51:54.071 # Client id=2 addr=10.242.117.36:21949 fd=5 name= age=5100 idle=0 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=12899 omem=268443524 events=rw cmd=replconf scheduled to be closed ASAP for overcoming of output buffer limits.
12345:M 18 Aug 12:51:54.081 # Connection with slave 10.242.117.36:6666 lost.
12345:M 18 Aug 12:51:54.700 * Slave 10.242.117.36:6666 asks for synchronization
12345:M 18 Aug 12:51:54.700 * Unable to partial resync with slave 10.242.117.36:6666 for lack of backlog (Slave request was: 1688184130).
12345:M 18 Aug 12:51:54.700 * Starting BGSAVE for SYNC with target: disk
12345:M 18 Aug 12:51:54.865 * Background saving started by pid 15986
15986:C 18 Aug 12:52:39.111 * DB saved on disk
15986:C 18 Aug 12:52:39.201 * RDB: 474 MB of memory used by copy-on-write
12345:M 18 Aug 12:52:39.396 * Background saving terminated with success
12345:M 18 Aug 12:52:56.304 # Client id=17 addr=10.242.117.36:46501 fd=5 name= age=62 idle=62 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=15882 oll=7597 omem=131532334 events=rw cmd=psync scheduled to be closed ASAP for overcoming of output buffer limits.
12345:M 18 Aug 12:52:56.405 # Connection with slave 10.242.117.36:6666 lost.
12345:M 18 Aug 12:52:57.627 * Slave 10.242.117.36:6666 asks for synchronization
12345:M 18 Aug 12:52:57.627 * Full resync requested by slave 10.242.117.36:6666
12345:M 18 Aug 12:52:57.627 * Starting BGSAVE for SYNC with target: disk
12345:M 18 Aug 12:52:57.789 * Background saving started by pid 18402
18402:C 18 Aug 12:53:42.084 * DB saved on disk
18402:C 18 Aug 12:53:42.175 * RDB: 2 MB of memory used by copy-on-write
12345:M 18 Aug 12:53:42.353 * Background saving terminated with success
12345:M 18 Aug 12:54:02.103 * Synchronization with slave 10.242.117.36:6666 succeeded
12345:M 18 Aug 13:21:59.734 # Client id=19 addr=10.242.117.36:46801 fd=5 name= age=1742 idle=0 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=12899 omem=268443524 events=rw cmd=replconf scheduled to be closed ASAP for overcoming of output buffer limits.
12345:M 18 Aug 13:21:59.783 # Connection with slave 10.242.117.36:6666 lost.
12345:M 18 Aug 13:22:02.430 * Slave 10.242.117.36:6666 asks for synchronization
12345:M 18 Aug 13:22:02.430 * Unable to partial resync with slave 10.242.117.36:6666 for lack of backlog (Slave request was: 2548455709).
12345:M 18 Aug 13:22:02.430 * Starting BGSAVE for SYNC with target: disk
12345:M 18 Aug 13:22:02.602 * Background saving started by pid 31064
31064:C 18 Aug 13:22:46.863 * DB saved on disk
31064:C 18 Aug 13:22:46.954 * RDB: 444 MB of memory used by copy-on-write
12345:M 18 Aug 13:22:47.138 * Background saving terminated with success
12345:M 18 Aug 13:23:10.557 # Client id=26 addr=10.242.117.36:55208 fd=5 name= age=68 idle=68 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=15411 oll=6422 omem=107089538 events=rw cmd=psync scheduled to be closed ASAP for overcoming of output buffer limits.
12345:M 18 Aug 13:23:10.657 # Connection with slave 10.242.117.36:6666 lost.
12345:M 18 Aug 13:23:11.121 * Slave 10.242.117.36:6666 asks for synchronization
12345:M 18 Aug 13:23:11.121 * Full resync requested by slave 10.242.117.36:6666
12345:M 18 Aug 13:23:11.121 * Starting BGSAVE for SYNC with target: disk
12345:M 18 Aug 13:23:11.288 * Background saving started by pid 1363
1363:C 18 Aug 13:23:55.725 * DB saved on disk
1363:C 18 Aug 13:23:55.818 * RDB: 1 MB of memory used by copy-on-write
12345:M 18 Aug 13:23:55.948 * Background saving terminated with success
12345:M 18 Aug 13:24:15.940 * Synchronization with slave 10.242.117.36:6666 succeeded
12345:M 18 Aug 13:52:25.584 # Client id=28 addr=10.242.117.36:55550 fd=5 name= age=1754 idle=0 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=12899 omem=268443524 events=rw cmd=replconf scheduled to be closed ASAP for overcoming of output buffer limits.
12345:M 18 Aug 13:52:25.593 # Connection with slave 10.242.117.36:6666 lost.
12345:M 18 Aug 13:52:28.577 * Slave 10.242.117.36:6666 asks for synchronization
12345:M 18 Aug 13:52:28.577 * Unable to partial resync with slave 10.242.117.36:6666 for lack of backlog (Slave request was: 3560083735).
12345:M 18 Aug 13:52:28.577 * Starting BGSAVE for SYNC with target: disk
12345:M 18 Aug 13:52:28.750 * Background saving started by pid 14284
14284:C 18 Aug 13:53:12.715 * DB saved on disk
14284:C 18 Aug 13:53:12.807 * RDB: 344 MB of memory used by copy-on-write
12345:M 18 Aug 13:53:13.044 * Background saving terminated with success
12345:M 18 Aug 13:53:32.805 * Synchronization with slave 10.242.117.36:6666 succeeded
12345:M 18 Aug 13:53:34.763 # Client id=35 addr=10.242.117.36:12979 fd=5 name= age=66 idle=1 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=4991 omem=77418232 events=rw cmd=psync scheduled to be closed ASAP for overcoming of output buffer limits.
12345:M 18 Aug 13:53:34.864 # Connection with slave 10.242.117.36:6666 lost.
12345:M 18 Aug 13:54:03.785 * Slave 10.242.117.36:6666 asks for synchronization
12345:M 18 Aug 13:54:03.785 * Unable to partial resync with slave 10.242.117.36:6666 for lack of backlog (Slave request was: 3996662976).
12345:M 18 Aug 13:54:03.785 * Starting BGSAVE for SYNC with target: disk
12345:M 18 Aug 13:54:03.952 * Background saving started by pid 18056
18056:C 18 Aug 13:54:48.271 * DB saved on disk
18056:C 18 Aug 13:54:48.363 * RDB: 2 MB of memory used by copy-on-write
12345:M 18 Aug 13:54:48.518 * Background saving terminated with success
12345:M 18 Aug 13:55:08.460 * Synchronization with slave 10.242.117.36:6666 succeeded
It lines up with the slave's log exactly:
12345:M 18 Aug 12:51:54.081 # Connection with slave 10.242.117.36:6666 lost.
12345:M 18 Aug 12:51:54.700 * Slave 10.242.117.36:6666 asks for synchronization
12345:M 18 Aug 12:51:54.700 * Unable to partial resync with slave 10.242.117.36:6666 for lack of backlog (Slave request was: 1688184130).
12345:M 18 Aug 12:51:54.700 * Starting BGSAVE for SYNC with target: disk
12345:M 18 Aug 12:51:54.865 * Background saving started by pid 15986
15986:C 18 Aug 12:52:39.111 * DB saved on disk
15986:C 18 Aug 12:52:39.201 * RDB: 474 MB of memory used by copy-on-write
12345:M 18 Aug 12:52:39.396 * Background saving terminated with success
12345:M 18 Aug 12:52:56.304 # Client id=17 addr=10.242.117.36:46501 fd=5 name= age=62 idle=62 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=15882 oll=7597 omem=131532334 events=rw cmd=psync scheduled to be closed ASAP for overcoming of output buffer limits.
Moreover, the log tells us why the master could not do a partial resync with the slave: Unable to partial resync with slave 10.242.117.36:6666 for lack of backlog (Slave request was: 1688184130). This comes down to how Redis implements partial resynchronization. In version 2.8 Redis introduced a new replication scheme with a replication backlog to support partial resync.
The master then starts background saving, and starts to buffer all new commands received that will modify the dataset. When the background saving is complete, the master transfers the database file to the slave, which saves it on disk, and then loads it into memory. The master will then send to the slave all buffered commands. This is done as a stream of commands and is in the same format of the Redis protocol itself.
During command propagation, the master not only sends every data-modifying command into each slave's replication buffer, it also writes them into the replication backlog. When a disconnected slave reconnects, it sends a PSYNC command carrying its replication offset, requesting a partial resync. If the requested offset is no longer in the backlog, a full SYNC is performed instead, which is equivalent to setting up replication from scratch.
replication backlog是一個環形緩沖區,整個master進程中只會存在一個,所有的slave公用。backlog的大小通過repl-backlog-size參數設置,默認大小是1M,其大小可以根據每秒產生的命令、(master執行rdb bgsave) +(master發送rdb到slave) + (slave load rdb文件)時間之和來估算積壓緩沖區的大小,repl-backlog-size值不小於這兩者的乘積。
So during a full sync the slave lags the master by the sum of (master RDB bgsave) + (master sending the RDB to the slave) + (slave loading the RDB file). If the master's data changes during that window exceed the replication backlog, the oldest change commands are discarded and a full resync becomes necessary.
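Plugging rough numbers from the logs above into that rule of thumb (the timings are read off the log timestamps; the write rate is an assumption, not a measurement):

```python
MB = 1024 * 1024

# Rough phase timings from the slave/master logs above:
bgsave_s   = 45    # 11:26:53 "Starting BGSAVE" -> 11:27:38 "DB saved on disk"
transfer_s = 20    # shipping the ~2.5GB RDB to the slave
load_s     = 22    # 11:28:05 "Loading DB in memory" -> 11:28:27 "Finished"

write_rate = 10 * MB   # assumed master write traffic, bytes per second

min_backlog = write_rate * (bgsave_s + transfer_s + load_s)
print(min_backlog // MB)   # 870 -> backlog should be hundreds of MB here, not the 1MB default
```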
How big are this Redis's data changes, then? The logs give a rough idea: RDB: 474 MB of memory used by copy-on-write. The COW memory swings widely, sometimes 2MB, sometimes as much as 1414MB, while the entire Redis dataset is only about 10G. With churn like that, the default 1MB repl-backlog-size doesn't stand a chance.
So now we know why every resync was a full one. The remaining, more fundamental question: why did the master-slave link drop roughly once a minute, forcing a resync in the first place?
The logs again offer a clue:
Client id=17 addr=10.242.117.36:46501 fd=5 name= age=62 idle=62 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=15882 oll=7597 omem=131532334 events=rw cmd=psync scheduled to be closed ASAP for overcoming of output buffer limits.
Note this phrase: psync scheduled to be closed ASAP for overcoming of output buffer limits. The psync client is apparently being closed for exceeding the output buffer limits.
A quick Google turned up plenty of people hitting the same problem: redis 2.8 psync #1400. Setting
config set client-output-buffer-limit "slave 0 0 0"
made the problem go away for them.
So I looked into client-output-buffer-limit and found it is one of Redis's self-protection mechanisms. The configuration format is:
client-output-buffer-limit <class> <hard limit> <soft limit> <soft seconds>
The parameters mean:
- class: the client class, one of Normal, Slaves, and Pub/Sub
- Normal: ordinary clients. The default limit is 0, i.e. unlimited.
- Pub/Sub: publish/subscribe clients. Default hard limit 32MB, soft limit 8MB/60s.
- Slaves: slave replication clients. Default hard limit 256MB, soft limit 64MB/60s.
- hard limit: the hard cap on the buffer size.
- soft limit: the soft cap on the buffer size.
- soft seconds: how long the buffer may stay at (or above) the soft limit.
client-output-buffer-limit caps the buffer that may be allocated, preventing unbounded memory growth; Redis protects itself as follows:
- if a client's buffer reaches the soft limit and stays there for soft seconds, the connection is closed immediately
- if a client's buffer reaches the hard limit, the server likewise closes the connection immediately
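The two rules can be sketched like this (illustrative, not the actual Redis source):

```python
def should_disconnect(buf_bytes: int, secs_over_soft: int,
                      hard: int, soft: int, soft_seconds: int) -> bool:
    # Hard limit: disconnect immediately. Soft limit: disconnect only
    # after the buffer has stayed over it for soft_seconds continuously.
    # A limit of 0 disables that check.
    if hard and buf_bytes >= hard:
        return True
    if soft and buf_bytes >= soft and secs_over_soft >= soft_seconds:
        return True
    return False

MB = 1024 * 1024
# omem=268443524 from the master's log vs. the default slave limits (256MB hard, 64MB/60s soft):
print(should_disconnect(268443524, 0, 256 * MB, 64 * MB, 60))   # True -> "scheduled to be closed ASAP"
print(should_disconnect(100 * MB, 10, 256 * MB, 64 * MB, 60))   # False -> over soft, but only for 10s
```

Note that omem=268443524 in the log is just past the 256MB (268435456-byte) hard limit, which is exactly why the master killed the replication client.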
Our slave's setting was just the default:
# 客戶端的輸出緩沖區的限制,因為某種原因客戶端從服務器讀取數據的速度不夠快,
# 可用於強制斷開連接(一個常見的原因是一個發布 / 訂閱客戶端消費消息的速度無法趕上生產它們的速度)。
# 可以三種不同客戶端的方式進行設置:
# normal -> 正常客戶端
# slave -> slave 和 MONITOR 客戶端
# pubsub -> 至少訂閱了一個 pubsub channel 或 pattern 的客戶端
# 每個 client-output-buffer-limit 語法 :
# client-output-buffer-limit <class><hard limit> <soft limit> <soft seconds>
# 一旦達到硬限制客戶端會立即斷開,或者達到軟限制並保持達成的指定秒數(連續)。
# 例如,如果硬限制為 32 兆字節和軟限制為 16 兆字節 /10 秒,客戶端將會立即斷開
# 如果輸出緩沖區的大小達到 32 兆字節,客戶端達到 16 兆字節和連續超過了限制 10 秒,也將斷開連接。
# 默認 normal 客戶端不做限制,因為他們在一個請求后未要求時(以推的方式)不接收數據,
# 只有異步客戶端可能會出現請求數據的速度比它可以讀取的速度快的場景。
# 把硬限制和軟限制都設置為 0 來禁用該特性
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
Redis's replication buffer is in fact just a kind of client buffer. It holds every data-modifying command the master issues during three phases:
- while the master runs the RDB bgsave that produces the snapshot
- while the master transfers the RDB to the slave over the network
- while the slave loads the RDB file to restore the data into memory
Exactly the same window as the replication backlog!
The replication buffer is bounded by client-output-buffer-limit slave; when that value is too small, the replication link breaks:
- when the master-slave replication connection breaks, the server frees the connection's data structures, the data in the replication buffer is lost, and replication has to start over from scratch.
- worse, the broken connection can trap master and slave in an endless loop of RDB bgsave and RDB retransmission.
So the server (here, the master) really does actively close the client (the slave) connection over buffer size: our change volume was too large and blew through client-output-buffer-limit, which cut the replication connection. The slave then requested a PSYNC, but because repl-backlog-size was too small the PSYNC failed and a full sync was needed. The full sync meant "Discarding previously cached master state" and reloading the RDB file into memory, and that load is blocking, hence the slave's intermittent unavailability. And after switching traffic to the master, everything was fine because on the master the whole dump happens in a forked child process, so the parent keeps serving. Every observed symptom is now fully explained.
This incident shows that healthy Redis master-slave replication depends heavily on two parameters being configured sensibly:
- client-output-buffer-limit
- repl-backlog-size
You really do have to hit a problem before you dig into the underlying implementation.
--EOF--
References and recommended articles
- Redis 3.0 documentation (Chinese edition) - Replication
- Redis Clients Handling - Output buffers limits
- redis master-slave replication (2) — replication buffer and replication backlog (very clear and concise, highly recommended!)
- redis master-slave replication (4) — client buffer
- Top Redis Headaches for Devops – Replication Buffer (https://redislabs.com/blog/top-redis-headaches-for-devops-replication-buffer)
- Top Redis Headaches for Devops – Client Buffers
