Replication
A few things to understand ASAP about Redis replication.
1) Redis replication is asynchronous, but you can configure a master to stop accepting writes if it appears to be not connected with at least a given number of slaves.
2) Redis slaves are able to perform a partial resynchronization with the master if the replication link is lost for a relatively small amount of time. You may want to configure the replication backlog size (see the next sections of this file) with a sensible value depending on your needs.
3) Replication is automatic and does not need user intervention. After a network partition slaves automatically try to reconnect to masters and resynchronize with them.
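How replication is implemented
1. Set the master's address and port.
In short, this means running the SLAVEOF command. SLAVEOF is asynchronous: once the slave has set its masterhost and masterport attributes, it returns OK to the client that sent SLAVEOF. The OK only means that the replication request has been accepted; the actual replication work starts after it has been returned.
2. Create the socket connection.
After SLAVEOF has run, the slave creates a socket connection to the master using the IP and port given in the command. If the connection succeeds, the slave associates the socket with a file event handler dedicated to replication. This handler carries out the rest of the replication work, such as receiving the RDB file and receiving the write commands propagated by the master.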
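3. Send the PING command.
Once the slave has become a client of the master, it first sends the master a PING command, which serves two purposes:
1. Check that the socket can be read from and written to normally.
2. Check that the master can process command requests normally.
If the slave reads a "PONG" reply, the network connection between master and slave is healthy and the master can process the commands the slave sends.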
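4. Authenticate.
After receiving the "PONG" reply from the master, the slave moves on to authentication. If the slave has the masterauth option set, it authenticates; otherwise it skips this step.
When authentication is required, the slave sends the master an AUTH command whose argument is the value of the slave's masterauth option.
5. Send the port information.
After authentication, the slave runs REPLCONF listening-port <port-number> to send its listening port to the master.
The master records it in the slave_listening_port attribute of the corresponding client state. It is visible in the output of info Replication: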
127.0.0.1:6379> info Replication
# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=6380,state=online,offset=3696,lag=0
6. Synchronize.
The slave sends the PSYNC command to the master to perform synchronization and bring its own database up to the master's current state.
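Steps 3 to 6 can be sketched as the sequence of commands a slave writes to the socket. This is a minimal illustration, not Redis's own code: encode_command and handshake_commands are hypothetical helper names, and the real handshake sends further REPLCONF options that are left out here. The encoding is the standard RESP array of bulk strings, and "PSYNC ? -1" is what a slave with no cached master sends to request a full resynchronization.

```python
from typing import List, Optional


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def handshake_commands(listening_port: int,
                       masterauth: Optional[str] = None,
                       replid: str = "?",
                       offset: int = -1) -> List[bytes]:
    """Commands a slave sends after connecting, in order (steps 3 to 6)."""
    commands = [encode_command("PING")]                    # step 3
    if masterauth is not None:                             # step 4, skipped without masterauth
        commands.append(encode_command("AUTH", masterauth))
    commands.append(                                       # step 5
        encode_command("REPLCONF", "listening-port", str(listening_port)))
    # Step 6: with the defaults this is "PSYNC ? -1", a request for a full resync.
    commands.append(encode_command("PSYNC", replid, str(offset)))
    return commands
```

For example, handshake_commands(6380) yields three commands (PING, REPLCONF, PSYNC), and passing masterauth adds an AUTH command after PING.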
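7. Command propagation.
Once synchronization is complete, the master and slave enter the command propagation phase. From this point the master only has to keep sending the write commands it executes to the slave, and the slave only has to keep receiving and executing them, for the two to stay consistent.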
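8. Heartbeat detection.
During the command propagation phase, the slave sends the following command to the master, once per second by default:
REPLCONF ACK <replication_offset>
Here replication_offset is the slave's current replication offset.
Sending REPLCONF ACK serves three purposes for the master and slave:
1> Detect the state of the network connection between master and slave.
2> Help implement the min-slaves options (min-slaves-to-write and min-slaves-max-lag).
3> Detect lost commands.
The REPLCONF ACK command and the replication backlog were both added in Redis 2.8. Before that, neither the master nor the slave would notice if commands were lost during propagation.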
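The three uses of REPLCONF ACK can be modelled in a few lines. The sketch below is a simplified model rather than the Redis implementation: SlaveState and the function names are assumptions made for illustration, and time is passed in explicitly so the logic is easy to follow.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SlaveState:
    ack_offset: int       # offset reported in the last REPLCONF ACK
    last_ack_time: float  # when that ACK arrived, in seconds


def lag_seconds(slave: SlaveState, now: float) -> int:
    """Purpose 1: seconds since the slave was last heard from."""
    return int(now - slave.last_ack_time)


def good_slaves(slaves: List[SlaveState], now: float, max_lag: int) -> int:
    """Count slaves whose last ACK is at most max_lag seconds old."""
    return sum(1 for s in slaves if lag_seconds(s, now) <= max_lag)


def writes_allowed(slaves: List[SlaveState], now: float,
                   min_slaves_to_write: int, min_slaves_max_lag: int) -> bool:
    """Purpose 2: refuse writes when too few slaves are in good state."""
    if min_slaves_to_write == 0:  # feature disabled
        return True
    return good_slaves(slaves, now, min_slaves_max_lag) >= min_slaves_to_write


def missing_bytes(master_offset: int, slave: SlaveState) -> int:
    """Purpose 3: bytes the slave has not acknowledged yet."""
    return max(0, master_offset - slave.ack_offset)
```

With min-slaves-to-write 3 and min-slaves-max-lag 10, as in the sample configuration, writes_allowed returns False as soon as fewer than three slaves have acknowledged within the last 10 seconds.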
Replication-related parameters
slaveof <masterip> <masterport>
masterauth <master-password>
slave-serve-stale-data yes
slave-read-only yes
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-ping-slave-period 10
repl-timeout 60
repl-disable-tcp-nodelay no
repl-backlog-size 1mb
repl-backlog-ttl 3600
slave-priority 100
min-slaves-to-write 3
min-slaves-max-lag 10
slave-announce-ip 5.5.5.5
slave-announce-port 1234
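Where:
slaveof <masterip> <masterport>: enables replication. This one directive is all that is required.
masterauth <master-password>: must be set on the slave if the master has a password set through requirepass.
slave-serve-stale-data: whether the slave may serve clients while the link to the master is down or while replication is still being established. The default is yes, which allows it, but clients may then read stale data.
slave-read-only: puts the slave in read-only mode. Note that read-only mode applies only to write operations from clients; it has no effect on administrative commands.
repl-diskless-sync, repl-diskless-sync-delay: whether to use diskless replication. To reduce disk load on the master, Redis supports diskless replication, in which the generated RDB file is not saved to disk but sent directly to the slaves over the network. It suits a master whose disk is slow but whose network bandwidth is plentiful. Note that diskless replication was still experimental at the time of writing.
repl-ping-slave-period: the interval, in seconds, at which the master sends a PING command to its slaves.
repl-timeout: the replication timeout.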
# The following option sets the replication timeout for:
#
# 1) Bulk transfer I/O during SYNC, from the point of view of slave.
# 2) Master timeout from the point of view of slaves (data, pings).
# 3) Slave timeout from the point of view of masters (REPLCONF ACK pings).
#
# It is important to make sure that this value is greater than the value
# specified for repl-ping-slave-period otherwise a timeout will be detected
# every time there is low traffic between the master and the slave.
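repl-disable-tcp-nodelay: when set to yes, the master waits for a while before sending TCP packets. The exact wait depends on the Linux kernel and is typically 40 ms. This suits complex network conditions between master and slave, or tight bandwidth. The default is no.
repl-backlog-size: the size of the replication backlog. The replication backlog is a fixed-length queue kept on the master and is used for partial resynchronization, which was introduced in Redis 2.8.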
# Set the replication backlog size. The backlog is a buffer that accumulates
# slave data when slaves are disconnected for some time, so that when a slave
# wants to reconnect again, often a full resync is not needed, but a partial
# resync is enough, just passing the portion of data the slave missed while
# disconnected.
#
# The bigger the replication backlog, the longer the time the slave can be
# disconnected and later be able to perform a partial resynchronization.
#
# The backlog is only allocated once there is at least a slave connected.
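The backlog is allocated only once a slave has connected.
repl-backlog-ttl: if all of the master's slaves have disconnected and none reconnects within the given time, the master frees the backlog. repl-backlog-ttl sets that time. The default is 3600 s; a value of 0 means the backlog is never freed.
slave-priority: the slave's priority, used by Redis Sentinel during failover. The lower the value, the higher the slave's priority for promotion to master. Note that a value of 0 means the slave never takes part in the election.
slave-announce-ip, slave-announce-port: commonly used with port forwarding or NAT, to tell the master the IP and port at which the slave can actually be reached.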
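Two of the constraints described above lend themselves to an automated check: repl-timeout must be greater than repl-ping-slave-period, and slave-priority 0 excludes a slave from promotion. The sketch below assumes the configuration is available as text; parse_directives and check_replication_conf are hypothetical helpers, and the fallback values 10, 60 and 100 are the ones shown in the sample configuration.

```python
from typing import Dict, List


def parse_directives(conf_text: str) -> Dict[str, List[str]]:
    """Parse 'name arg ...' lines, skipping blank lines and # comments."""
    directives: Dict[str, List[str]] = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, *args = line.split()
        directives[name.lower()] = args
    return directives


def check_replication_conf(conf_text: str) -> List[str]:
    """Return warnings for replication settings that contradict each other."""
    d = parse_directives(conf_text)
    warnings: List[str] = []
    ping_period = int(d.get("repl-ping-slave-period", ["10"])[0])
    timeout = int(d.get("repl-timeout", ["60"])[0])
    if timeout <= ping_period:
        warnings.append(
            "repl-timeout should be greater than repl-ping-slave-period")
    if d.get("slave-priority", ["100"])[0] == "0":
        warnings.append(
            "slave-priority 0: this slave will never be promoted by Sentinel")
    return warnings
```

Running check_replication_conf over the sample values above returns an empty list, since 60 is greater than 10 and the priority is 100.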
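The synchronization process
1. The slave sends the PSYNC command to the master.
2. On receiving PSYNC, the master runs BGSAVE to generate an RDB file in the background, and uses a buffer to record every write command executed from that point on.
3. When BGSAVE finishes, the master sends the resulting RDB file to the slave. The slave receives and loads it, bringing its database to the state the master was in when BGSAVE ran.
4. The master sends all the write commands recorded in the buffer to the slave. The slave executes them, bringing its database to the master's current state.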
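The four steps can be illustrated with a toy model in which a database is a plain dict and a write command is a (key, value) pair. This is only a sketch of the snapshot-plus-buffer idea under those assumptions; full_sync is a hypothetical name and nothing here touches a real Redis server.

```python
from typing import Dict, List, Tuple

Write = Tuple[str, str]  # (key, value), standing in for a write command


def full_sync(master_db: Dict[str, str],
              writes_during_bgsave: List[Write]) -> Dict[str, str]:
    """Return the slave's database after a simulated full resynchronization."""
    # Step 2: BGSAVE captures the master's state at this moment...
    snapshot = dict(master_db)
    # ...while the master keeps serving writes and records them in a buffer.
    buffered: List[Write] = []
    for key, value in writes_during_bgsave:
        master_db[key] = value
        buffered.append((key, value))
    # Step 3: the slave loads the snapshot (the RDB file).
    slave_db = dict(snapshot)
    # Step 4: the slave replays the buffered writes to catch up.
    for key, value in buffered:
        slave_db[key] = value
    return slave_db
```

After the call the returned slave database equals master_db, even though the snapshot itself was taken before the buffered writes were applied.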
Note that the buffer mentioned in step 2 is limited in size. The limit is set by client-output-buffer-limit slave 256mb 64mb 60, whose syntax and meaning are as follows:
# client-output-buffer-limit <class> <hard limit> <soft limit> <soft seconds>
#
# A client is immediately disconnected once the hard limit is reached, or if
# the soft limit is reached and remains reached for the specified number of
# seconds (continuously).
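In other words, if the buffer grows beyond 256 MB, or stays above 64 MB continuously for 60 s, the master immediately disconnects the slave. After the disconnection, once 60 s (repl-timeout) have passed, the slave notices that it is receiving no data from the master and restarts replication.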
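The hard and soft limit rule can be written as a small pure function. This is a simplified model under stated assumptions, not the Redis source: check_output_buffer is a hypothetical name, the caller is assumed to keep track of when the soft limit was first crossed (soft_since), and whether the boundary comparisons are strict is also an assumption.

```python
from typing import Optional, Tuple

MB = 1024 * 1024


def check_output_buffer(used_bytes: int, now: float,
                        soft_since: Optional[float],
                        hard: int = 256 * MB, soft: int = 64 * MB,
                        soft_seconds: int = 60) -> Tuple[bool, Optional[float]]:
    """Return (disconnect, new_soft_since) for one check of a slave's buffer."""
    if used_bytes >= hard:
        return True, soft_since          # hard limit: disconnect at once
    if used_bytes >= soft:
        if soft_since is None:
            return False, now            # soft limit just crossed, start the clock
        # Disconnect only if the buffer stayed above the soft limit long enough.
        return now - soft_since > soft_seconds, soft_since
    return False, None                   # back under the soft limit, reset the clock
```

The defaults mirror client-output-buffer-limit slave 256mb 64mb 60. A buffer that drops below 64 MB even briefly resets the timer, which is what "continuously" means in the comment above.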
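Before Redis 2.8, if replication between master and slave was interrupted by a network problem, the slave still ran the SYNC command and performed a full resynchronization when it reconnected, which was inefficient. Starting with Redis 2.8, the PSYNC command replaces SYNC for the synchronization step of replication.
PSYNC has two modes: full resynchronization and partial resynchronization.
Log of a full resynchronization: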
master:
19544:M 05 Oct 20:44:04.713 * Slave 127.0.0.1:6380 asks for synchronization
19544:M 05 Oct 20:44:04.713 * Partial resynchronization not accepted: Replication ID mismatch (Slave asked for 'dc419fe03ddc9ba30cf2a2cf1894872513f1ef96', my replication IDs are 'f8a035fdbb7cfe435652b3445c2141f98a65e437' and '0000000000000000000000000000000000000000')
19544:M 05 Oct 20:44:04.713 * Starting BGSAVE for SYNC with target: disk
19544:M 05 Oct 20:44:04.713 * Background saving started by pid 20585
20585:C 05 Oct 20:44:04.723 * DB saved on disk
20585:C 05 Oct 20:44:04.723 * RDB: 0 MB of memory used by copy-on-write
19544:M 05 Oct 20:44:04.813 * Background saving terminated with success
19544:M 05 Oct 20:44:04.814 * Synchronization with slave 127.0.0.1:6380 succeeded
slave:
19746:S 05 Oct 20:44:04.288 * Before turning into a slave, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
19746:S 05 Oct 20:44:04.288 * SLAVE OF 127.0.0.1:6379 enabled (user request from 'id=3 addr=127.0.0.1:37128 fd=8 name= age=929 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=slaveof')
19746:S 05 Oct 20:44:04.712 * Connecting to MASTER 127.0.0.1:6379
19746:S 05 Oct 20:44:04.712 * MASTER <-> SLAVE sync started
19746:S 05 Oct 20:44:04.712 * Non blocking connect for SYNC fired the event.
19746:S 05 Oct 20:44:04.713 * Master replied to PING, replication can continue...
19746:S 05 Oct 20:44:04.713 * Trying a partial resynchronization (request dc419fe03ddc9ba30cf2a2cf1894872513f1ef96:1191).
19746:S 05 Oct 20:44:04.713 * Full resync from master: f8a035fdbb7cfe435652b3445c2141f98a65e437:1190
19746:S 05 Oct 20:44:04.713 * Discarding previously cached master state.
19746:S 05 Oct 20:44:04.814 * MASTER <-> SLAVE sync: receiving 224566 bytes from master
19746:S 05 Oct 20:44:04.814 * MASTER <-> SLAVE sync: Flushing old data
19746:S 05 Oct 20:44:04.815 * MASTER <-> SLAVE sync: Loading DB in memory
19746:S 05 Oct 20:44:04.817 * MASTER <-> SLAVE sync: Finished with success
Log of a partial resynchronization:
master:
19544:M 05 Oct 20:42:06.423 # Connection with slave 127.0.0.1:6380 lost.
19544:M 05 Oct 20:42:06.753 * Slave 127.0.0.1:6380 asks for synchronization
19544:M 05 Oct 20:42:06.753 * Partial resynchronization request from 127.0.0.1:6380 accepted. Sending 0 bytes of backlog starting from offset 1037.
slave:
19746:S 05 Oct 20:42:06.423 # Connection with master lost.
19746:S 05 Oct 20:42:06.423 * Caching the disconnected master state.
19746:S 05 Oct 20:42:06.752 * Connecting to MASTER 127.0.0.1:6379
19746:S 05 Oct 20:42:06.752 * MASTER <-> SLAVE sync started
19746:S 05 Oct 20:42:06.752 * Non blocking connect for SYNC fired the event.
19746:S 05 Oct 20:42:06.753 * Master replied to PING, replication can continue...
19746:S 05 Oct 20:42:06.753 * Trying a partial resynchronization (request f8a035fdbb7cfe435652b3445c2141f98a65e437:1037).
19746:S 05 Oct 20:42:06.753 * Successful partial resynchronization with master.
19746:S 05 Oct 20:42:06.753 * MASTER <-> SLAVE sync: Master accepted a Partial Resynchronization.
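In Redis 4.0, master_replid and offset are stored in the RDB file. When a slave is shut down gracefully and restarted, Redis can reload master_replid and offset from the RDB file, which makes partial resynchronization possible.
Partial resynchronization relies on three things:
1. The replication offsets of the master and the slave.
2. The master's replication backlog.
3. The node's run ID (from Redis 4.0 on, the replication ID master_replid).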
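How the offsets and the backlog work together can be shown with a small model. The class below is a sketch, not Redis's data structure: for brevity it trims a bytearray instead of using a true circular buffer, and ReplicationBacklog, feed and read_from are hypothetical names. The properties mirror the repl_backlog_first_byte_offset and repl_backlog_histlen fields of INFO.

```python
from typing import Optional


class ReplicationBacklog:
    """Keeps only the most recent `size` bytes of the replication stream."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.data = bytearray()
        self.master_repl_offset = 0  # offset of the last byte written

    def feed(self, payload: bytes) -> None:
        """Append propagated bytes, dropping the oldest ones when full."""
        self.data += payload
        self.master_repl_offset += len(payload)
        if len(self.data) > self.size:
            del self.data[:len(self.data) - self.size]

    @property
    def histlen(self) -> int:
        return len(self.data)

    @property
    def first_byte_offset(self) -> int:
        return self.master_repl_offset - len(self.data) + 1

    def read_from(self, offset: int) -> Optional[bytes]:
        """Bytes from `offset` onward, or None if only a full resync will do."""
        if offset < self.first_byte_offset or offset > self.master_repl_offset + 1:
            return None
        return bytes(self.data[offset - self.first_byte_offset:])
```

A slave that asks for master_repl_offset + 1 gets an empty byte string back, which corresponds to the "Sending 0 bytes of backlog" line in the partial resynchronization log. A slave whose requested offset has already been trimmed away gets None and has to do a full resynchronization.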
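When a slave is promoted to master, the other slaves must resynchronize with the new master. Before Redis 4.0 this was a full resynchronization, because master_replid had changed. From Redis 4.0 on, the new master records the old master's master_replid and offset, so it can accept partial resynchronization requests from the other slaves even when the master_replid in the request differs from its own. Under the hood, when slaveof no one is executed, master_replid and master_repl_offset+1 are copied to master_replid2 and second_repl_offset.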
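The decision a master makes on receiving PSYNC <replid> <offset>, including the secondary replication ID, can be sketched as follows. This is a simplified model of the rule described above and not a copy of the Redis source; MasterState, can_partial_resync and promote are names invented for the example.

```python
from dataclasses import dataclass

EMPTY_REPLID = "0" * 40


@dataclass
class MasterState:
    replid: str                     # master_replid
    replid2: str                    # master_replid2
    second_repl_offset: int         # -1 while replid2 is unused
    backlog_first_byte_offset: int  # repl_backlog_first_byte_offset
    backlog_histlen: int            # repl_backlog_histlen


def can_partial_resync(m: MasterState, asked_replid: str, asked_offset: int) -> bool:
    """True if a PSYNC request can be served from the backlog."""
    same_history = asked_replid == m.replid
    old_history = (asked_replid == m.replid2
                   and asked_offset <= m.second_repl_offset)
    if not (same_history or old_history):
        return False                # replication ID mismatch: full resync
    backlog_end = m.backlog_first_byte_offset + m.backlog_histlen
    return m.backlog_first_byte_offset <= asked_offset <= backlog_end


def promote(m: MasterState, master_repl_offset: int, new_replid: str) -> MasterState:
    """Model of SLAVEOF NO ONE: keep the old ID as replid2, switch to a new one."""
    return MasterState(
        replid=new_replid,
        replid2=m.replid,
        second_repl_offset=master_repl_offset + 1,
        backlog_first_byte_offset=m.backlog_first_byte_offset,
        backlog_histlen=m.backlog_histlen,
    )
```

After promote, a slave that still presents the old replication ID is accepted for a partial resynchronization as long as the offset it asks for does not go past second_repl_offset.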
Replication-related variables
# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6380,state=online,offset=5698,lag=0
slave1:ip=127.0.0.1,port=6381,state=online,offset=5698,lag=0
master_replid:e071f49c8d9d6719d88c56fa632435fba83e145d
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:5698
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:5698

# Replication
role:slave
master_host:127.0.0.1
master_port:6379
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:126
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:15715bc0bd37a71cae3d08b9566f001ccbc739de
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:126
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:126
Where:
role: Value is "master" if the instance is replica of no one, or "slave" if the instance is a replica of some master instance. Note that a replica can be master of another replica (chained replication).
master_replid: The replication ID of the Redis server. It is a 40-character hexadecimal string generated dynamically when the node starts, and it plays the role that the master's run ID played before Redis 4.0.
master_replid2: The secondary replication ID, used for PSYNC after a failover. When slaveof no one is executed, master_replid and master_repl_offset+1 are copied to master_replid2 and second_repl_offset.
master_repl_offset: The server's current replication offset.
second_repl_offset: The offset up to which replication IDs are accepted.
repl_backlog_active: Flag indicating replication backlog is active.
repl_backlog_size: Total size in bytes of the replication backlog buffer, as set by repl-backlog-size.
repl_backlog_first_byte_offset: The master offset of the replication backlog buffer, that is, the earliest master offset still held in the backlog.
repl_backlog_histlen: Size in bytes of the data in the replication backlog buffer.
If the instance is a replica, these additional fields are provided:
master_host: Host or IP address of the master.
master_port: Master listening TCP port.
master_link_status: Status of the link (up/down).
master_last_io_seconds_ago: Number of seconds since the last interaction with master. The master sends a PING command to its slaves every 10 s (repl-ping-slave-period) to check that they are alive and connected. This field shows how long ago the master and slave last exchanged such traffic.
master_sync_in_progress: Indicate the master is syncing to the replica. Presumably this covers both full and partial resynchronization.
slave_repl_offset: The replication offset of the replica instance.
slave_priority: The priority of the instance as a candidate for failover.
slave_read_only: Flag indicating if the replica is read-only.
If a SYNC operation is on-going, these additional fields are provided:
master_sync_left_bytes: Number of bytes left before syncing is complete.
master_sync_last_io_seconds_ago: Number of seconds since last transfer I/O during a SYNC operation.
If the link between master and replica is down, an additional field is provided:
master_link_down_since_seconds: Number of seconds since the link is down.
The following field is always provided:
connected_slaves: Number of connected replicas.
If the server is configured with the min-slaves-to-write (or starting with Redis 5 with the min-replicas-to-write) directive, an additional field is provided:
min_slaves_good_slaves: Number of replicas currently considered good.
For each replica, the following line is added:
slaveXXX: id, IP address, port, state, offset, lag.
slave0:ip=127.0.0.1,port=6381,state=online,offset=1288,lag=1
How to monitor master-slave lag
# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6381,state=online,offset=560,lag=0
slave1:ip=127.0.0.1,port=6380,state=online,offset=560,lag=0
master_replid:15715bc0bd37a71cae3d08b9566f001ccbc739de
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:560
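Here master_repl_offset is the master's replication offset, and the offset in each slaveX line is that slave's replication offset. The difference between the two is the replication lag, in bytes.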
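The subtraction can be automated by parsing the text of info Replication taken on the master. The helper below is a minimal sketch that works on the raw text, so it does not depend on any particular client library; replication_lag_bytes is a hypothetical name. Note that the lag field inside each slaveX line is a different measure: it counts seconds since the slave's last REPLCONF ACK, not bytes.

```python
import re
from typing import Dict

SLAVE_LINE = re.compile(r"^slave(\d+):(.*)$")


def replication_lag_bytes(info_text: str) -> Dict[str, int]:
    """Map each slaveN entry to master_repl_offset minus that slave's offset."""
    master_offset = None
    slave_offsets: Dict[str, int] = {}
    for line in info_text.splitlines():
        line = line.strip()
        if line.startswith("master_repl_offset:"):
            master_offset = int(line.split(":", 1)[1])
            continue
        match = SLAVE_LINE.match(line)
        if match:
            # Fields look like ip=...,port=...,state=...,offset=...,lag=...
            fields = dict(item.split("=", 1) for item in match.group(2).split(","))
            slave_offsets["slave" + match.group(1)] = int(fields["offset"])
    if master_offset is None:
        raise ValueError("master_repl_offset not found in INFO output")
    return {name: master_offset - off for name, off in slave_offsets.items()}
```

For the output shown above, both slaves report offset 560 against a master_repl_offset of 560, so the function returns a lag of 0 for each.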
How to estimate the backlog size
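t * (master_repl_offset2 - master_repl_offset1) / (t2 - t1)
t is how long the disconnections may last in seconds. master_repl_offset1 and master_repl_offset2 are the master's replication offset sampled at times t1 and t2, so the fraction is the average write rate in bytes per second.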
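As a worked example, the formula translates directly into a function. The name estimate_backlog_size and its parameter names are illustrative; the two offsets are assumed to be master_repl_offset values read from INFO at times t1 and t2 (in seconds).

```python
def estimate_backlog_size(offset1: int, offset2: int,
                          t1: float, t2: float,
                          max_disconnect_seconds: float) -> float:
    """Bytes of backlog needed to cover a disconnection of the given length."""
    if t2 <= t1:
        raise ValueError("t2 must be later than t1")
    write_rate = (offset2 - offset1) / (t2 - t1)  # bytes per second
    return max_disconnect_seconds * write_rate


# The offset grew from 1000 to 7000 in 10 s, i.e. 600 bytes per second,
# so surviving a 60 s disconnection needs about 36000 bytes of backlog.
print(estimate_backlog_size(1000, 7000, 0, 10, 60))  # prints 36000.0
```

In practice it is reasonable to sample during peak write traffic and round the result up, since the estimate is only an average over the sampling window.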