[ MongoDB ] - Replica Set Election Strategy [repost]


First, the nodes in a replica set fall into three types:
1. primary: handles reads and writes from clients.
2. secondary: a hot standby; it applies the operation log (oplog) read from the primary to stay consistent with it, and by default serves no reads or writes.
  There are two kinds of secondary:
  1) normal secondary: stays in sync with the primary at all times;
  2) delayed secondary: syncs with the primary after a configured delay, as protection against accidental operations.
3. arbiter: serves no reads or writes; it acts purely as a tie-breaker, voting on a new primary from the remaining nodes when the primary goes down.
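For reference, here is a minimal configuration sketch covering all three roles in one set (the host names, ports, and the one-hour delay are illustrative, not from the original setup); in shells of this era a delayed secondary is declared with priority: 0 plus slaveDelay:

rs.initiate({
    _id : "myset",
    members : [
        {_id : 0, host : "node1:27017"},                                   // normal secondary, eligible to become primary
        {_id : 1, host : "node2:27017", priority : 0, slaveDelay : 3600},  // delayed secondary: stays 1 hour behind, never elected
        {_id : 2, host : "node3:27017", arbiterOnly : true}                // arbiter: votes only, holds no data
    ]
})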
    If the primary goes down, a failover is needed. The cluster's election strategy is as follows:
When the primary goes down, the remaining nodes elect a new primary, and arbiters also vote, which avoids deadlock (without an arbiter, in a two-node replica set the primary steps down to secondary when its peer fails, leaving the whole set unusable). The node chosen is the one with the highest priority and the freshest data.
    The primary uses heartbeats to track how many nodes in the cluster are visible to it. If it cannot see a strict majority (more than half) of the set, the active primary automatically steps down to secondary. This prevents the deadlock described above, and also covers the case where a network partition has cut the primary off from the rest of the cluster.
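That visibility check can be approximated from the shell. A quick sketch that counts how many members the current node sees as healthy:

// count members this node currently sees as healthy (health: 1)
var stat = rs.status();
var visible = stat.members.filter(function (m) { return m.health === 1; }).length;
// the primary steps down if visible is not a strict majority of stat.members.length
print("visible: " + visible + " / " + stat.members.length);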
An example from the official documentation.
Initial state:
server-a: secondary oplog: ()
server-b: secondary oplog: ()
server-c: secondary oplog: ()
The primary accepts writes:
server-a: primary oplog: (a1,a2,a3,a4,a5)
server-b: secondary oplog: ()
server-c: secondary oplog: ()
The secondaries apply the data:
server-a: primary oplog: (a1,a2,a3,a4,a5)
server-b: secondary oplog: (a1)
server-c: secondary oplog: (a1,a2,a3)
The primary, server-a, goes down:
server-b: secondary oplog: (a1)
server-c: secondary oplog: (a1,a2,a3)
...
server-b: secondary oplog: (a1)
server-c: primary oplog: (a1,a2,a3) // c has the freshest data, so it is elected primary
...
server-b: secondary oplog: (a1,a2,a3)
server-c: primary oplog: (a1,a2,a3,c4)
...
server-a recovers (comes back up):
...
server-a: recovering oplog: (a1,a2,a3,a4,a5) -- performing data recovery
server-b: secondary oplog: (a1,a2,a3)
server-c: primary oplog: (a1,a2,a3,c4)
…server-a applies the data from server-c; at this point operations a4 and a5 are lost
server-a: recovering oplog: (a1,a2,a3,c4)
server-b: secondary oplog: (a1,a2,a3,c4)
server-c: primary oplog: (a1,a2,a3,c4)
The new primary, server-c, accepts writes:
server-a: secondary oplog: (a1,a2,a3,c4)
server-b: secondary oplog: (a1,a2,a3,c4)
server-c: primary oplog: (a1,a2,a3,c4,c5,c6,c7,c8)
server-a: secondary oplog: (a1,a2,a3,c4,c5,c6,c7,c8)
server-b: secondary oplog: (a1,a2,a3,c4,c5,c6,c7,c8)
server-c: primary oplog: (a1,a2,a3,c4,c5,c6,c7,c8)
The process above shows that server-c becomes the primary and the other nodes apply its log; operations a4 and a5 are lost.
    Once a new primary is elected, its data set is assumed to be the most current in the cluster; conflicting operations on the other nodes (including the former primary, even after it has recovered) are rolled back. To complete the rollback, all nodes resynchronize after connecting to the new primary. The process is:
each node scans its own oplog for operations the new primary never applied, then requests the current copy of every document affected by those operations from the primary, and syncs that data.
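A client can avoid losing writes the way a4 and a5 were lost by waiting for a majority acknowledgement before treating a write as durable. A sketch using the getLastError command available in this shell version (the collection name foo is illustrative):

db.foo.insert({x : 1});
// block until the write has replicated to a majority of the set, or give up after 5s;
// a write acknowledged by a majority will survive a failover without being rolled back
printjson(db.runCommand({getLastError : 1, w : "majority", wtimeout : 5000}));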
 
The election strategy for a replica set, quoting the official documentation:
We use a consensus protocol to pick a primary. Exact details will be spared here but that basic process is:
1. get maxLocalOpOrdinal from each server.
2. if a majority of servers are not up (from this server's POV), remain in Secondary mode and stop.
3. if the last op time seems very old, stop and await human intervention.
4. else, using a consensus protocol, pick the server with the highest maxLocalOpOrdinal as the Primary.
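Rendered as shell-JavaScript pseudocode, the four steps look roughly like this (electPrimary, canSee, and tooOld are hypothetical stand-ins for internal server logic, not a real API):

// illustrative pseudocode of the election steps above -- not a real API
function electPrimary(servers, self) {
    // step 1: consider only the servers this node can reach
    var candidates = servers.filter(function (s) { return canSee(self, s); });
    // step 2: without a strict majority visible, remain secondary
    if (candidates.length * 2 <= servers.length) return null;
    // sort by maxLocalOpOrdinal, freshest first
    candidates.sort(function (a, b) { return b.maxLocalOpOrdinal - a.maxLocalOpOrdinal; });
    // step 3: if even the freshest op time is very old, await human intervention
    if (tooOld(candidates[0].maxLocalOpOrdinal)) throw "await human intervention";
    // step 4: the reachable server with the highest maxLocalOpOrdinal wins
    return candidates[0];
}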
 
Regarding step 2: when a majority of the servers in the cluster are down, the remaining nodes stay in secondary mode and stop serving.
My experiment confirms this: in a 4-node replica set, when two secondary nodes went down, the primary stepped down to secondary. The whole cluster was effectively dead, since secondaries serve neither reads nor writes.
Shut down two secondary nodes in the cluster, rac4:27019 and rac3:27017:
[mongodb@rac4 bin]$ ./mongo 127.0.0.1:27019
MongoDB shell version: 2.0.1
connecting to: 127.0.0.1:27019/test
SECONDARY> 
SECONDARY> use admin
switched to db admin
SECONDARY> db.shutdownServer();
Wed Nov  2 11:02:29 DBClientCursor::init call() failed
Wed Nov  2 11:02:29 query failed : admin.$cmd { shutdown: 1.0 } to: 127.0.0.1:27019
server should be down...
 
[mongodb@rac3 bin]$ ./mongo  10.250.7.241:27017
MongoDB shell version: 2.0.1
connecting to: 127.0.0.1:27017/test
SECONDARY> 
SECONDARY> use admin
switched to db admin
SECONDARY> db.shutdownServer();
Tue Nov  1 22:02:46 DBClientCursor::init call() failed
Tue Nov  1 22:02:46 query failed : admin.$cmd { shutdown: 1.0 } to: 127.0.0.1:27017
server should be down...
Tue Nov  1 22:02:46 trying reconnect to 127.0.0.1:27017
Tue Nov  1 22:02:46 reconnect 127.0.0.1:27017 failed couldn't connect to server 127.0.0.1:27017
Tue Nov  1 22:02:46 Error: error doing query: unknown shell/collection.js:150
After exiting the shell that was connected to the primary and reconnecting, the prompt has changed from PRIMARY to SECONDARY. Check the replica set's status:
[mongodb@rac4 bin]$ ./mongo 127.0.0.1:27020       
MongoDB shell version: 2.0.1
connecting to: 127.0.0.1:27020/test
SECONDARY> 
SECONDARY> rs.status();
{
        "set" : "myset",
        "date" : ISODate("2011-11-01T13:56:05Z"),
        "myState" : 2,
        "members" : [
                {
                        "_id" : 0,
                        "name" : "10.250.7.220:27018",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 101,
                        "optime" : {
                                "t" : 1320154033000,
                                "i" : 1
                        },
                        "optimeDate" : ISODate("2011-11-01T13:27:13Z"),
                        "lastHeartbeat" : ISODate("2011-11-01T13:56:04Z"),
                        "pingMs" : 0
                },
                {
                        "_id" : 1,
                        "name" : "10.250.7.220:27019",
                       "health" : 0,  --已經關閉
                        "state" : 8,
                        "stateStr" : "(not reachable/healthy)",
                        "uptime" : 0,
                        "optime" : {
                                "t" : 1320154033000,
                                "i" : 1
                        },
                        "optimeDate" : ISODate("2011-11-01T13:27:13Z"),
                        "lastHeartbeat" : ISODate("2011-11-01T13:53:50Z"),
                        "pingMs" : 0,
                        "errmsg" : "socket exception"
                },
                {
                        "_id" : 2,
                        "name" : "10.250.7.220:27020",
                        "health" : 1,
                        "state" : 2,
                       "stateStr" : "SECONDARY", ---由主庫變為從庫
                        "optime" : {
                                "t" : 1320154033000,
                                "i" : 1
                        },
                        "optimeDate" : ISODate("2011-11-01T13:27:13Z"),
                        "self" : true
                },
                {
                        "_id" : 3,
                        "name" : "10.250.7.241:27017",
                        "health" : 0,
                        "state" : 8,
                        "stateStr" : "(not reachable/healthy)",
                        "uptime" : 0,
                        "optime" : {
                                "t" : 1320154033000,
                                "i" : 1
                        },
                        "optimeDate" : ISODate("2011-11-01T13:27:13Z"),
                        "lastHeartbeat" : ISODate("2011-11-01T13:53:54Z"),
                        "pingMs" : 0,
                        "errmsg" : "socket exception"
                }
        ],
        "ok" : 1
}
SECONDARY> exit
bye
 
Continuing from the previous article, more on the replica set election mechanism.
Create a two-node replica set, one primary and one secondary. If the secondary goes down, the primary turns into a secondary, and the cluster is left with no primary at all! Why does this happen?
[mongodb@rac4 bin]$ mongo 127.0.0.1:27018 init1node.js 
MongoDB shell version: 2.0.1
connecting to: 127.0.0.1:27018/test
[mongodb@rac4 bin]$ ./mongo 127.0.0.1:27019
MongoDB shell version: 2.0.1
connecting to: 127.0.0.1:27019/test
RECOVERING> 
SECONDARY> 
SECONDARY> use admin
switched to db admin
SECONDARY> db.shutdownServer() 
Sun Nov  6 20:16:11 DBClientCursor::init call() failed
Sun Nov  6 20:16:11 query failed : admin.$cmd { shutdown: 1.0 } to: 127.0.0.1:27019
server should be down...
Sun Nov  6 20:16:11 trying reconnect to 127.0.0.1:27019
Sun Nov  6 20:16:11 reconnect 127.0.0.1:27019 failed couldn't connect to server 127.0.0.1:27019
Sun Nov  6 20:16:11 Error: error doing query: unknown shell/collection.js:150
After the secondary goes down, the primary changes from PRIMARY to SECONDARY:
[mongodb@rac4 bin]$ mongo 127.0.0.1:27018 
MongoDB shell version: 2.0.1
connecting to: 127.0.0.1:27018/test
PRIMARY> 
PRIMARY> 
PRIMARY> 
SECONDARY> 
The log shows what happens on the primary after the secondary goes down:
Sun Nov  6 20:16:13 [rsHealthPoll] replSet info 10.250.7.220:27019 is down (or slow to respond): DBClientBase::findN: transport error: 10.250.7.220:27019 query: { replSetHeartbeat: "myset", v: 1, pv: 1, checkEmpty: false, from: "10.250.7.220:27018" }
Sun Nov  6 20:16:13 [rsHealthPoll] replSet member 10.250.7.220:27019 is now in state DOWN
Sun Nov  6 20:16:13 [conn7] end connection 10.250.7.220:13217
Sun Nov  6 20:16:37 [rsMgr] can't see a majority of the set, relinquishing primary
Sun Nov  6 20:16:37 [rsMgr] replSet relinquishing primary state
Sun Nov  6 20:16:37 [rsMgr] replSet SECONDARY
This is a consequence of MongoDB's primary-election strategy. If the failure were not the secondary crashing but a network partition, each of the two nodes would elect itself primary, since the only node each can reach is itself. Once the network recovered, complex consistency conflicts would have to be resolved, and the longer the partition lasts, the more complex they get. So the strategy MongoDB chose is: a node that can only see itself in the cluster does not elect itself primary.
The correct approach is therefore to add at least two more nodes, or to add an arbiter. Adding an arbiter is the best and most convenient option: an arbiter only takes part in elections and carries almost no load, so it can run on any idle machine. This not only avoids the unelectable-primary situation above, it also makes elections faster: with three data nodes, when one goes down the other two may each vote for themselves, and it can take a long time for the election to settle. In practice, the cluster elects its primary on two criteria: priority and data freshness.
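Priority can be changed at runtime with rs.reconfig(); a minimal sketch (which member index to bump is illustrative):

// from the primary's shell: prefer member 1 in future elections
var cfg = rs.conf();
cfg.members[1].priority = 2;   // the default priority is 1
rs.reconfig(cfg);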
From the official documentation:
Example: if B and C are candidates in an election, B having a higher priority but C being the most up to date:
1. C will be elected primary.
2. Once B catches up, a re-election should be triggered and B (the higher priority node) should win the election between B and C.
3. Alternatively, suppose that, once B is within 12 seconds of synced to C, C goes down.
B will be elected primary.
When C comes back up, those 12 seconds of unsynced writes will be written to a file in the rollback directory of your data directory (rollback is created when needed).
You can manually apply the rolled-back data, see Replica Sets - Rollbacks.
Rebuild the replica set, this time adding an arbiter:
[mongodb@rac4 bin]$ cat init2node.js 
rs.initiate({
    _id : "myset",
    members : [
        {_id : 0, host : "10.250.7.220:28018"},
        {_id : 1, host : "10.250.7.220:28019"},
        {_id : 2, host : "10.250.7.220:28020", arbiterOnly: true}
    ]
})
[mongodb@rac4 bin]$ ./mongo 127.0.0.1:28018 init2node.js 
[mongodb@rac4 bin]$ ./mongo 127.0.0.1:28018 
MongoDB shell version: 2.0.1
connecting to: 127.0.0.1:28018/test
PRIMARY> rs.status()
{
        "set" : "myset",
        "date" : ISODate("2011-11-06T14:16:13Z"),
        "myState" : 1,
        "members" : [
                {
                        "_id" : 0,
                        "name" : "10.250.7.220:28018",
                        "health" : 1,
                        "state" : 1,
...
                },
                {
                        "_id" : 1,
                        "name" : "10.250.7.220:28019",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
....
                },
                {
                        "_id" : 2,
                        "name" : "10.250.7.220:28020",
                        "health" : 1,
                        "state" : 7,
                        "stateStr" : "ARBITER",
....
                }
        ],
        "ok" : 1
}
PRIMARY> 
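As a side note, an arbiter can also be added to an already-running set without re-initiating, using rs.addArb() from the primary's shell:

// host:port taken from the config above
rs.addArb("10.250.7.220:28020")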
Test again, this time checking whether the primary gets demoted to secondary.
For a multi-node set like the one in the previous article, e.g. four data nodes (one primary plus secondaries) and one arbiter, taking two nodes down no longer renders the whole cluster unavailable the way losing 1/2 of the machines did above; but when 3 of the 4 data nodes are down, the whole cluster becomes unavailable!
The "majority of" in the log message does not name a concrete number; in my experiments the cluster became unavailable once more than 1/2 of the members were unreachable (see the sketch after the log line below):
Sun Nov  6 19:34:16 [rsMgr] can't see a majority of the set, relinquishing primary 
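A quick sketch of that threshold (strictly more than half of all voting members):

// majority threshold for a set with n voting members
function majority(n) { return Math.floor(n / 2) + 1; }
print(majority(5));   // 3 -- a 5-member set (4 data nodes + 1 arbiter) survives 2 failures
print(majority(4));   // 3 -- a 4-member set also needs 3 visible, so it tolerates only 1 failure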
 
References:
http://www.mongodb.org/display/DOCS/Replica+Sets+-+Priority
http://blog.nosqlfan.com/html/2523.html
 
 