06 Workflow of the Redis Sentinel system

Reposted article. 2021-05-12 23:00. Category: Redis basics / Database

1 Overview of Sentinel
2 Configuration file of a Sentinel instance
- 2-1 Contents of sentinel.conf
- 2-2 Starting a Sentinel instance
3 How Sentinel works
References

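1 Overview of Sentinel

sentinel [ˈsɛntənəl]

Definition: Sentinel is a distributed system that monitors every server in a master-slave deployment. When a failure occurs, it selects a new master and reconnects all slaves to it.

Functions provided by Sentinel:

Monitoring

Continuously checks whether the master and slaves are running normally.
Master liveness checks; checks on the running state of master and slaves.

Notification

When a monitored server has a problem, sends notifications to others (other Sentinels and clients).

Automatic failover

Disconnects the failed master from its slaves, selects one slave as the new master, points the other slaves at the new master, and informs clients of the new server address.

Note: a Sentinel deployment normally consists of an odd number of Redis servers running in Sentinel mode; they just do not serve data.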

2 Configuration file of a Sentinel instance

2-1 Contents of sentinel.conf

cat sentinel.conf | grep -v '#' | grep -v "^$"

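port 26379
dir /tmp
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000

Annotations (not part of the file):
sentinel monitor ... 2: once 2 Sentinel instances judge the monitored server subjectively down, it becomes objectively down; failover can only start after it is objectively down.
sentinel down-after-milliseconds ... 30000: if the monitored master gives no valid reply for 30000 ms, this Sentinel judges it subjectively down.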

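Setting 1: sentinel monitor

Syntax: sentinel monitor <master-name> <ip> <port> <quorum>. It tells the Sentinel node to monitor the master named <master-name> at <ip>:<port>. <quorum> is the number of votes needed to decide that the master has failed: with <quorum> set to 2, at least two Sentinels must consider the master failed before it counts as objectively down. It is usually set to half the number of Sentinel nodes plus one.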
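The "half plus one" rule of thumb can be written as a one-line helper. This is only a sketch: `majority_quorum` is a hypothetical name used for illustration, not a function from Redis.

```c
#include <assert.h>

/* Hypothetical helper: the "half the Sentinels plus one" rule of thumb.
 * Integer division rounds down, so 3 -> 2, 4 -> 3, 5 -> 3. */
unsigned int majority_quorum(unsigned int sentinel_count) {
    assert(sentinel_count > 0);   /* a deployment has at least one Sentinel */
    return sentinel_count / 2 + 1;
}
```

With three Sentinels this gives 2, which matches the quorum in the sample sentinel.conf above.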

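Setting 2: sentinel down-after-milliseconds

Each Sentinel node periodically sends PING to the Redis nodes and to the other Sentinel nodes to check that they are reachable. If no valid reply (such as PONG) arrives within the configured time, the Sentinel subjectively judges the node unreachable. The value is in milliseconds.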
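The timeout rule amounts to comparing the time since the last valid reply with the configured threshold. A minimal sketch, assuming timestamps are kept in milliseconds; `is_subjectively_down` and its parameter names are hypothetical, and `mstime_t` is defined here as `long long` only to mirror the type name used in the Redis source.

```c
#include <assert.h>
#include <stdbool.h>

typedef long long mstime_t;   /* milliseconds; assumed to mirror the Redis typedef */

/* Hypothetical check: true once no valid reply has been seen for longer
 * than the down-after-milliseconds value. */
bool is_subjectively_down(mstime_t now_ms,
                          mstime_t last_valid_reply_ms,
                          mstime_t down_after_ms) {
    assert(down_after_ms > 0);
    return (now_ms - last_valid_reply_ms) > down_after_ms;
}
```

With the sample value of 30000, a node whose last valid reply is 30001 ms old is flagged, while one at exactly 30000 ms is not.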

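Setting 3: sentinel parallel-syncs

Once the Sentinels agree that the master has failed, the leader they elect performs the failover and picks a new master. The former slaves then start replicating from the new master. This setting limits how many slaves may resynchronize with the new master at the same time after a failover; without a limit, the new master's network and disk I/O would take a heavy load.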
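The limit can be pictured as a fixed number of resync slots. The sketch below is an illustration under that assumption, not the logic in sentinel.c; `replicas_to_start` and its parameters are hypothetical names.

```c
#include <assert.h>

/* Hypothetical helper: how many waiting slaves may begin resyncing now,
 * given how many are already in progress and the parallel-syncs limit. */
int replicas_to_start(int pending, int in_progress, int parallel_syncs) {
    assert(parallel_syncs >= 1);
    int free_slots = parallel_syncs - in_progress;
    if (free_slots < 0) free_slots = 0;           /* limit already exceeded */
    return pending < free_slots ? pending : free_slots;
}
```

With parallel-syncs set to 1, as in the sample file, three waiting slaves are reconfigured one after another: each call returns 1 while nothing is in progress and 0 while one slave is still syncing.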

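Setting 4: sentinel failover-timeout

If a failover takes longer than the configured time, it is considered to have timed out and failed.

Setting 5: sentinel auth-pass

Required when the master is password-protected; otherwise Sentinel cannot monitor it.

2-2 Starting a Sentinel instance

redis-sentinel sentinel-<port>.conf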

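3 How Sentinel works

3-1 Workflow overview

Question: what does a Sentinel system consist of, and what does it do?

A Sentinel system consists of one or more Sentinel instances (each is essentially a Redis server running in a special mode):

1) It can monitor any number of masters, together with all slaves under each master.

2) When a monitored master goes offline (crash or planned maintenance), it automatically promotes one of that master's slaves to master.

Note: a server running as Sentinel cannot execute transaction commands, script commands, database commands, and the like.

Question: which functions must Sentinel support?

1 Monitoring
2 Notification
3 Automatic failover
4 Configuration provider (clients connect to a Sentinel instance to get the current master's latest address)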

Monitoring. Sentinel constantly checks if your master and replica instances are working as expected.
Notification. Sentinel can notify the system administrator, or other computer programs, via an API, that something is wrong with one of the monitored Redis instances.
Automatic failover. If a master is not working as expected, Sentinel can start a failover process where a replica is promoted to master, the other additional replicas are reconfigured to use the new master, and the applications using the Redis server are informed about the new address to use when connecting.
Configuration provider. Sentinel acts as a source of authority for clients service discovery: clients connect to Sentinels in order to ask for the address of the current Redis master responsible for a given service. If a failover occurs, Sentinels will report the new address.

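3-1 Phase 1: establishing Sentinel connections

3-1-2 Overall flow of the monitoring phase (focus on how connections to the three instance types are established)

step1: start and initialize the Sentinel server

Main work: initialize the state (sentinelState) and open network connections to the masters.

As Figure 16-7 shows, the server creates a sentinelState struct, which holds a masters dictionary (key: master name, value: sentinelRedisInstance) recording the monitored masters.

Sub-steps:

1) Initialize the server
2) Replace the regular Redis server code with Sentinel-specific code (this is why a Sentinel server cannot execute regular Redis commands)
3) Run the Sentinel-specific code to initialize sentinelState (see the sentinelState struct definition in sentinel.c)
4) Initialize the list of monitored masters from the given configuration file (in essence, fill the masters dictionary in sentinelState)
5) Create network connections to the masters: two asynchronous connections per master, a command connection and a subscription connection

Command connection: used to send commands and receive replies (Sentinel later uses it to fetch master information)
Subscription connection: subscribes to the master's __sentinel__:hello channel (Sentinel later uses it to discover and sync with other Sentinels)

Source of the sentinelState definition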

/* Main state. */
struct sentinelState {
    char myid[CONFIG_RUN_ID_SIZE+1];  /* This sentinel ID. */
    uint64_t current_epoch;           /* Current epoch. */
    // =============================== each Sentinel server maintains a dictionary of masters ========================================
    dict *masters;                    /* Dictionary of master sentinelRedisInstances.Key is the instance name, value is the sentinelRedisInstance structure pointer. */
    int tilt;                         /* Are we in TILT mode? */
    int running_scripts;            /* Number of scripts in execution right now. */
    mstime_t tilt_start_time;       /* When TILT started. */
    mstime_t previous_time;         /* Last time we ran the time handler. */
    list *scripts_queue;            /* Queue of user scripts to execute. */
    char *announce_ip;              /* IP addr that is gossiped to other sentinels if not NULL. */
    int announce_port;              /* Port that is gossiped to other sentinels if non zero. */
    unsigned long simfailure_flags; /* Failures simulation. */
    int deny_scripts_reconfig;     /* Allow SENTINEL SET ... to change script paths at runtime? */
} sentinel;

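As the code shows, the state contains a masters dictionary that records information about every master this Sentinel monitors.

The dictionary key is the master's name from the configuration file; the value is a sentinelRedisInstance struct.

Source of the sentinelRedisInstance struct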

typedef struct sentinelRedisInstance {
    int flags;      /* See SRI_... defines */
    char *name;     /* Master name from the point of view of this sentinel. */
    char *runid;    /* Run ID of this instance, or unique ID if is a Sentinel.*/
    uint64_t config_epoch;  /* Configuration epoch. */
    sentinelAddr *addr; /* Master host. */
    instanceLink *link; /* Link to the instance, may be shared for Sentinels. */
    mstime_t last_pub_time;   /* Last time we sent hello via Pub/Sub. */
    mstime_t last_hello_time; /* Only used if SRI_SENTINEL is set. Last time
                                 we received a hello from this Sentinel
                                 via Pub/Sub. */
    mstime_t last_master_down_reply_time; /* Time of last reply to
                                             SENTINEL is-master-down command. */
    mstime_t s_down_since_time; /* Subjectively down since time. */
    mstime_t o_down_since_time; /* Objectively down since time. */
    mstime_t down_after_period; /* Consider it down after that period. */
    mstime_t info_refresh;  /* Time at which we received INFO output from it. */

    /* Role and the first time we observed it.
     * This is useful in order to delay replacing what the instance reports
     * with our own configuration. We need to always wait some time in order
     * to give a chance to the leader to report the new configuration before
     * we do silly things. */
    int role_reported;
    mstime_t role_reported_time;
    mstime_t slave_conf_change_time; /* Last time slave master addr changed. */

    /* Master specific. */
    //============================ dictionary of the Sentinels monitoring this master =======================================
    dict *sentinels;    /* Other sentinels monitoring the same master. */
    //============================ dictionary of slaves ===============================================
    dict *slaves;       /* Slaves for this master instance.*/
    unsigned int quorum;/* Number of sentinels that need to agree on failure. */
    int parallel_syncs; /* How many slaves to reconfigure at same time. */
    char *auth_pass;    /* Password to use for AUTH against master & slaves. */

    /* Slave specific. */
    mstime_t master_link_down_time; /* Slave replication link down time. */
    int slave_priority; /* Slave priority according to its INFO output. */
    mstime_t slave_reconf_sent_time; /* Time at which we sent SLAVE OF <new> */
    struct sentinelRedisInstance *master; /* Master instance if it's slave. */
    char *slave_master_host;    /* Master host as reported by INFO */
    int slave_master_port;      /* Master port as reported by INFO */
    int slave_master_link_status; /* Master link status as reported by INFO */
    unsigned long long slave_repl_offset; /* Slave replication offset. */
    /* Failover */
    char *leader;       /* If this is a master instance, this is the runid of
                           the Sentinel that should perform the failover. If
                           this is a Sentinel, this is the runid of the Sentinel
                           that this Sentinel voted as leader. */
    uint64_t leader_epoch; /* Epoch of the 'leader' field. */
    uint64_t failover_epoch; /* Epoch of the currently started failover. */
    int failover_state; /* See SENTINEL_FAILOVER_STATE_* defines. */
    mstime_t failover_state_change_time;
    mstime_t failover_start_time;   /* Last failover attempt start time. */
    mstime_t failover_timeout;      /* Max time to refresh failover state. */
    mstime_t failover_delay_logged; /* For what failover_start_time value we
                                       logged the failover delay. */
    struct sentinelRedisInstance *promoted_slave; /* Promoted slave instance. */
    /* Scripts executed to notify admin or reconfigure clients: when they
     * are set to NULL no script is executed. */
    char *notification_script;
    char *client_reconfig_script;
    sds info; /* cached INFO output */
} sentinelRedisInstance;

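Note: at startup, each master entry in the dictionary is initialized from the configuration file, so the file must specify the IP and port of every master to monitor.

step2: after initialization, the Sentinel fetches master information over the network for the first time and updates its state

Flow: over the command connection, the Sentinel sends INFO every 10 s to get information about the master and about the slaves attached to it.

Slave information is kept in the slaves dictionary of the master's sentinelRedisInstance; the key is the slave's ip:port and the value is again a sentinelRedisInstance struct.

Notes:

Master and slave records are both instances of the sentinelRedisInstance struct.
They are told apart by the flags field, and both are kept in dictionaries. A master's key/name is set by the user in the configuration file, whereas a slave's key/name is its IP address plus port.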
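The two notes above can be sketched in a few lines of C. The `SRI_MASTER` and `SRI_SLAVE` values are copied from the flag definitions quoted in this article; `replica_key` and `role_name` are hypothetical helpers written for illustration and do not appear in sentinel.c.

```c
#include <assert.h>
#include <stdio.h>

#define SRI_MASTER (1<<0)   /* same values as the SRI_* defines in the source */
#define SRI_SLAVE  (1<<1)

/* Hypothetical helper: build the "ip:port" key used for a slave entry.
 * Returns 0 on success, -1 if the buffer is too small. */
int replica_key(char *buf, size_t buflen, const char *ip, int port) {
    assert(buf != NULL && ip != NULL);
    int n = snprintf(buf, buflen, "%s:%d", ip, port);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

/* Hypothetical helper: tell a master record from a slave record by flags. */
const char *role_name(int flags) {
    if (flags & SRI_MASTER) return "master";
    if (flags & SRI_SLAVE)  return "slave";
    return "other";
}
```

A slave at 127.0.0.1 port 6380 would be stored under the key `127.0.0.1:6380`, while the master keeps the name from sentinel.conf (`mymaster` in the sample file).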

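step3: using the slave list reported by the master, connect to each slave for the first time and fetch its information

Flow: using each slave's IP and port as reported by the master, the Sentinel again opens two connections, a command connection and a subscription connection. It then sends INFO over the command connection to get the slave's details and stores them in the corresponding sentinelRedisInstance.

The main fields updated are the ones the struct above documents as coming from INFO, such as slave_priority, slave_master_host, slave_master_port and slave_master_link_status.

step4: the Sentinel publishes to the channel on every master and slave, and receives everything published on the channels it subscribed to

Flow: every 2 s the Sentinel publishes a message to the subscribed channel on every master and slave.

The Redis PUBLISH command sends a message to a given channel (here, the channel named __sentinel__:hello).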
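To make the published message concrete, here is a sketch of building the hello payload. The eight comma-separated fields (Sentinel ip, port, runid, current epoch, then master name, ip, port, config epoch) follow my reading of sentinel.c and should be checked against the Redis version in use; `format_hello` is a hypothetical helper, not a Redis function.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format the payload a Sentinel would PUBLISH to
 * __sentinel__:hello. Assumed field order:
 *   ip,port,runid,current_epoch,master_name,master_ip,master_port,master_config_epoch
 * Returns the payload length, or -1 if the buffer is too small. */
int format_hello(char *buf, size_t buflen,
                 const char *ip, int port, const char *runid,
                 unsigned long long current_epoch,
                 const char *master_name, const char *master_ip,
                 int master_port, unsigned long long master_config_epoch) {
    assert(buf != NULL);
    int n = snprintf(buf, buflen, "%s,%d,%s,%llu,%s,%s,%d,%llu",
                     ip, port, runid, current_epoch,
                     master_name, master_ip, master_port, master_config_epoch);
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}
```

The resulting string is what would be passed as the message argument of `PUBLISH __sentinel__:hello <message>`. The first three fields let receivers discover the sender; the last four carry the sender's view of the master.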

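Notes:

Command connection: the Sentinel sends its commands to masters and slaves, including the initial information requests, over the command (cmd) connection.
Subscription connection: masters and slaves deliver the messages published on the channel to every subscribed Sentinel over the subscription connection.

Sentinels monitoring the same server discover each other by parsing the messages received on the subscribed channel. They then open command connections to each other (there are no subscription connections between Sentinels).

Each Sentinel also keeps information about the other Sentinels; it is stored in the state of the corresponding master.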

    /* Master specific. */
    //============================ dictionary of the Sentinels monitoring this master =======================================
    dict *sentinels;    /* Other sentinels monitoring the same master. */
    //============================ dictionary of slaves ===============================================
    dict *slaves;       /* Slaves for this master instance.*/

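3-1-3 Summary of the connection phase (important)

A single Sentinel starts =>
it initializes, and from the master entries in the configuration file opens a command connection and a subscription connection to every master, then sends INFO over the command connection to get each master's information
=> using that information, it opens command and subscription connections to every slave and likewise uses INFO to get each slave's information
=> with all subscription connections in place, the Sentinel sends PUBLISH over the command connection to the channel on each master/slave; the message is also received by every other Sentinel subscribed to the same channel (and by the sender itself)
=> Sentinels monitoring the same server parse the channel messages arriving on the subscription connection (if the sending Sentinel is not yet known, a command connection to it is created) and keep that Sentinel's information up to date. At this point all connections in the deployment are established
=> the Sentinel then periodically:
    a) sends PUBLISH over the command connection to announce the information it maintains
    b) sends PING over the command connection to the three kinds of instance (masters, slaves, other Sentinels) to check their state.

For more detail on the monitoring phase, see chapter 16 of the book Redis Design and Implementation (《redis設計與實現》).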

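3-2 Phase 2: monitoring/notification

Question: what does a Sentinel instance monitor?

Answer: it monitors and maintains information about its masters, their slaves, and the other Sentinels. Periodic PING confirms their state; periodic INFO fetches details.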

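Notification: the long-running maintenance phase, in which Sentinels keep their internal information up to date.

When several Sentinels monitor the same server, a message sent by one is received by the others. The receivers use it to update what they know about the sender and about the monitored server.

Information maintenance between Sentinels (see the figure): Sentinels 1-3 monitor the same server. When Sentinel 1 publishes to the monitored server's __sentinel__:hello channel over its command (cmd) connection, every other Sentinel subscribed to the channel (2 and 3) receives the message. The message contains the current configuration of the monitored server; each receiver checks whether its own copy is older and, if so, updates it.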

Because Sentinels exchange this information through the monitored server's channel, they need no subscription connections between themselves.

Sentinels and replicas auto discovery

Sentinels stay connected with other Sentinels in order to reciprocally check the availability of each other, and to exchange messages. However you don't need to configure a list of other Sentinel addresses in every Sentinel instance you run, as Sentinel uses the Redis instances Pub/Sub capabilities in order to discover the other Sentinels that are monitoring the same masters and replicas.

This feature is implemented by sending hello messages into the channel named __sentinel__:hello.

Similarly you don't need to configure what is the list of the replicas attached to a master, as Sentinel will auto discover this list querying Redis.

Every Sentinel publishes a message to every monitored master and replica Pub/Sub channel __sentinel__:hello, every two seconds, announcing its presence with ip, port, runid.
Every Sentinel is subscribed to the Pub/Sub channel __sentinel__:hello of every master and replica, looking for unknown sentinels. When new sentinels are detected, they are added as sentinels of this master.
Hello messages also include the full current configuration of the master. If the receiving Sentinel has a configuration for a given master which is older than the one received, it updates to the new configuration immediately.
Before adding a new sentinel to a master a Sentinel always checks if there is already a sentinel with the same runid or the same address (ip and port pair). In that case all the matching sentinels are removed, and the new added.

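Question: why does a Sentinel open a subscription connection to masters and slaves in addition to the command connection?

Sentinels synchronize their internal information with each other through the messages received on the subscription connection.
Automatic discovery of Sentinels also runs over the subscription connection, which removes the need to configure other Sentinels' IPs and ports by hand.

3-3 Phase 3: failover

Background: all possible states of a server monitored by Sentinel (values of the flags field of sentinelRedisInstance)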

/* A Sentinel Redis Instance object is monitoring. */
#define SRI_MASTER  (1<<0)      // this instance is a master
#define SRI_SLAVE   (1<<1)      // this instance is a slave
#define SRI_SENTINEL (1<<2)     // this instance is a Sentinel
#define SRI_S_DOWN (1<<3)   /* Subjectively down (no quorum). */
#define SRI_O_DOWN (1<<4)   /* Objectively down (confirmed by others). */
#define SRI_MASTER_DOWN (1<<5) /* A Sentinel with this flag set thinks that
                                   its master is down. */
#define SRI_FAILOVER_IN_PROGRESS (1<<6) /* Failover is in progress for
                                           this master. */
#define SRI_PROMOTED (1<<7)            /* Slave selected for promotion. */
#define SRI_RECONF_SENT (1<<8)     /* SLAVEOF <newmaster> sent. */
#define SRI_RECONF_INPROG (1<<9)   /* Slave synchronization in progress. */
#define SRI_RECONF_DONE (1<<10)     /* Slave synchronized with new master. */
#define SRI_FORCE_FAILOVER (1<<11)  /* Force failover with master up. */
#define SRI_SCRIPT_KILL_SENT (1<<12) /* SCRIPT KILL already sent on -BUSY */

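step1: subjective-down detection

By default, a Sentinel sends PING once per second to every instance it has a command connection with (masters, slaves, other Sentinels) and uses the reply to judge whether the instance is online.

When a Sentinel finds a monitored server offline, it sets SRI_S_DOWN (subjectively down) on that server's state.

Question: when does a Sentinel judge a monitored instance subjectively down?

Answer: when the instance keeps returning invalid replies (or none at all) for as long as the threshold set in the configuration file.

The wait before declaring subjective down is set with the down-after-milliseconds option in sentinel.conf.

Note: the down-after-milliseconds value configured for a master is also applied to the other instance types associated with it (its slaves and the Sentinels monitoring it).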

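step2: objective-down detection

When enough Sentinels (at least the configured quorum) find the monitored server offline, its state is set to SRI_O_DOWN (objectively down).

The number of Sentinels required is the quorum set in the configuration file (usually half the number of Sentinels plus one).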
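The quorum test can be sketched with the flag bits quoted earlier. The `SRI_S_DOWN` and `SRI_O_DOWN` values are copied from those definitions; `update_odown` is a hypothetical helper, and counting the Sentinel's own opinion as one vote is an assumption of this sketch.

```c
#include <assert.h>

#define SRI_S_DOWN (1<<3)   /* subjectively down, as in the defines above */
#define SRI_O_DOWN (1<<4)   /* objectively down */

/* Hypothetical helper: return the flags with SRI_O_DOWN set or cleared.
 * votes_from_others is the number of other Sentinels reporting the master
 * as down; this Sentinel's own SRI_S_DOWN counts as one more vote. */
int update_odown(int flags, unsigned int votes_from_others, unsigned int quorum) {
    assert(quorum >= 1);
    if (!(flags & SRI_S_DOWN)) return flags & ~SRI_O_DOWN;  /* we see it up */
    unsigned int votes = 1 + votes_from_others;
    if (votes >= quorum) return flags | SRI_O_DOWN;
    return flags & ~SRI_O_DOWN;
}
```

With quorum 2, a Sentinel that already considers the master subjectively down needs one agreeing peer before the objectively-down flag is set.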

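step3: elect the leader Sentinel

For the detailed election mechanism, see section 16.8 of Redis Design and Implementation.

Underlying principles of Redis's seven core mechanisms

step4: the leader Sentinel picks a new master and makes the other slaves connect to it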

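3-4 Summary of the Sentinel workflow (important)

Sentinel instance work log

1) Connection phase:
  To work, a Sentinel first uses its configuration file to open command and subscription connections to the master and fetches the slaves' information, then connects to the slaves, and then learns of other Sentinels from their messages on the subscribed channel and connects to them.
2) Monitoring/notification phase:
-- Monitoring (keeping monitored-server information in sync): the Sentinel sends PING over the command connection to check the state of the three kinds of instance.
-- Notification (merging the latest information among Sentinels): think of it as synchronizing the latest state between Sentinels. Each Sentinel periodically publishes its own information and that of the servers it monitors; every other Sentinel subscribed to the same channel receives it and uses the version in the message to decide whether its own copy is stale and needs updating.
3) Failover phase:
master judged subjectively, then objectively, down => leader Sentinel elected => new master chosen from the slaves by some rule
=> commands issued so the new master takes over, the other slaves switch to it, and the old master becomes a slave

Overall workflow of a Sentinel instance (in an interview, connection setup need not be covered in detail)

connection setup => monitoring/notification (information maintenance) => failover => monitoring/notification (information maintenance) ...

Summary of what a Sentinel sends and receives

Commonly sent commands

info: fetch detailed information about masters and slaves
ping: check that a server is alive
publish: announce its own information and the latest information it maintains

Commonly received messages

1) Replies to info/ping
2) Messages on the subscribed channel (its own published messages echoed back, plus those published by other Sentinels)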

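References

Redis 4.0 source code

Redis Design and Implementation (chapter 16) (recommended)

Nyima's blog: reading notes on Redis Design and Implementation, Sentinel (good)

Redis basics course

Summary of key Redis topics

JavaGuide: Redis

Redis Sentinel Documentation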
