Reposted from: http://www.lanceyan.com/tech/mongodb/mongodb_repset1.html
The previous article, "Building a Highly Available MongoDB Cluster (Part 1): Configuring MongoDB", left several questions open:
- If the primary node goes down, can connections fail over automatically? For now it requires a manual switch.
- How do we relieve the read/write pressure on the primary node?
- Each secondary holds a full copy of the database. Will the secondaries come under too much pressure?
- When the data volume grows beyond what the machines can handle, can the cluster scale out automatically?
By the end of this article those questions will be taken care of. NoSQL arose to handle huge data volumes, high scalability, high performance, flexible data models, and high availability. A plain master-slave architecture falls far short of those goals, which is why MongoDB provides replica sets and sharding. This article focuses on replica sets:
MongoDB officially no longer recommends the master-slave mode; the recommended replacement is the replica set mode (see the official documentation), as shown in the figure:
So what is a replica set? World of Warcraft players talk about running "instances", and the two ideas are much the same. In the game, when too many players crowd into one area at peak time there are far more players than monsters, so to keep the experience good the developers spin up an identical copy of the area, with the same monsters, for each batch of players. Each copied area is an instance, and however many players there are, each group plays in its own copy without affecting the others. MongoDB replicas work the same way. Master-slave replication is essentially a single-copy deployment, with poor scalability and fault tolerance. A replica set keeps multiple copies, which gives fault tolerance: even if one copy dies the others are still there, and it also solves the first question above, because when the primary node goes down the cluster fails over automatically. No wonder MongoDB officially recommends this mode. Let's look at the architecture diagram of a MongoDB replica set:
As the diagram shows, the client connects to the replica set as a whole and does not care whether any particular machine is down. The primary handles all reads and writes for the replica set, and the secondaries continuously replicate the data. Once the primary goes down, the secondaries elect a new primary, and none of this requires any attention from the application servers. Here is the architecture after the primary fails:
When the primary goes down, the secondaries detect it through the heartbeat mechanism and start an election inside the cluster, automatically choosing a new primary. Sounds impressive, so let's deploy it right away!
The official recommendation is at least three machines in a replica set, so we will set up the test with that number.
1. Prepare three machines: 192.168.1.136, 192.168.1.137 and 192.168.1.138. 192.168.1.136 will act as the replica set primary; 192.168.1.137 and 192.168.1.138 will act as secondaries.
2. On each machine, create the directories for the MongoDB replica set test.
```
# directory for all the mongodb files
mkdir -p /data/mongodbtest/replset

# directory for the mongodb data files
mkdir -p /data/mongodbtest/replset/data

# enter the mongodb test directory
cd /data/mongodbtest
```
3. Download the MongoDB installation package.
```
wget http://fastdl.mongodb.org/linux/mongodb-linux-x86_64-2.4.8.tgz
```
Note: do not install the 32-bit build of MongoDB in a Linux production environment, because the 32-bit build is limited to roughly 2 GB of data per server (a memory-mapped file limitation on 32-bit systems).
```
# extract the downloaded archive
tar xvzf mongodb-linux-x86_64-2.4.8.tgz
```
4. Start mongod on each machine.
```
/data/mongodbtest/mongodb-linux-x86_64-2.4.8/bin/mongod --dbpath /data/mongodbtest/replset/data --replSet repset
```
The console output shows that the replica set has not been initialized with a configuration yet.
```
Sun Dec 29 20:12:02.953 [rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)
Sun Dec 29 20:12:02.953 [rsStart] replSet info you may need to run replSetInitiate -- rs.initiate() in the shell -- if that is not already done
```
5. Initialize the replica set.
Log in to MongoDB from any one of the three machines.
```
/data/mongodbtest/mongodb-linux-x86_64-2.4.8/bin/mongo

# switch to the admin database
use admin
```
# Define the replica set configuration variable. The _id: "repset" here must match the "--replSet repset" parameter passed to mongod above.
```
config = { _id:"repset", members:[
... {_id:0,host:"192.168.1.136:27017"},
... {_id:1,host:"192.168.1.137:27017"},
... {_id:2,host:"192.168.1.138:27017"}]
... }
```
# Output
```
{
    "_id" : "repset",
    "members" : [
        {
            "_id" : 0,
            "host" : "192.168.1.136:27017"
        },
        {
            "_id" : 1,
            "host" : "192.168.1.137:27017"
        },
        {
            "_id" : 2,
            "host" : "192.168.1.138:27017"
        }
    ]
}
```
```
# initialize the replica set with this configuration
rs.initiate(config);
```
# Output on success
```
{
    "info" : "Config now saved locally. Should come online in about a minute.",
    "ok" : 1
}
```
# Check the logs. Once the replica set has started successfully, 138 becomes the PRIMARY node and 136 and 137 become SECONDARY nodes.
```
Sun Dec 29 20:26:13.842 [conn3] replSet replSetInitiate admin command received from client
Sun Dec 29 20:26:13.842 [conn3] replSet replSetInitiate config object parses ok, 3 members specified
Sun Dec 29 20:26:13.847 [conn3] replSet replSetInitiate all members seem up
Sun Dec 29 20:26:13.848 [conn3] ******
Sun Dec 29 20:26:13.848 [conn3] creating replication oplog of size: 990MB...
Sun Dec 29 20:26:13.849 [FileAllocator] allocating new datafile /data/mongodbtest/replset/data/local.1, filling with zeroes...
Sun Dec 29 20:26:13.862 [FileAllocator] done allocating datafile /data/mongodbtest/replset/data/local.1, size: 1024MB, took 0.012 secs
Sun Dec 29 20:26:13.863 [conn3] ******
Sun Dec 29 20:26:13.863 [conn3] replSet info saving a newer config version to local.system.replset
Sun Dec 29 20:26:13.864 [conn3] replSet saveConfigLocally done
Sun Dec 29 20:26:13.864 [conn3] replSet replSetInitiate config now saved locally. Should come online in about a minute.
Sun Dec 29 20:26:23.047 [rsStart] replSet I am 192.168.1.138:27017
Sun Dec 29 20:26:23.048 [rsStart] replSet STARTUP2
Sun Dec 29 20:26:23.049 [rsHealthPoll] replSet member 192.168.1.137:27017 is up
Sun Dec 29 20:26:23.049 [rsHealthPoll] replSet member 192.168.1.136:27017 is up
Sun Dec 29 20:26:24.051 [rsSync] replSet SECONDARY
Sun Dec 29 20:26:25.053 [rsHealthPoll] replset info 192.168.1.136:27017 thinks that we are down
Sun Dec 29 20:26:25.053 [rsHealthPoll] replSet member 192.168.1.136:27017 is now in state STARTUP2
Sun Dec 29 20:26:25.056 [rsMgr] not electing self, 192.168.1.136:27017 would veto with 'I don't think 192.168.1.138:27017 is electable'
Sun Dec 29 20:26:31.059 [rsHealthPoll] replset info 192.168.1.137:27017 thinks that we are down
Sun Dec 29 20:26:31.059 [rsHealthPoll] replSet member 192.168.1.137:27017 is now in state STARTUP2
Sun Dec 29 20:26:31.062 [rsMgr] not electing self, 192.168.1.137:27017 would veto with 'I don't think 192.168.1.138:27017 is electable'
Sun Dec 29 20:26:37.074 [rsMgr] replSet info electSelf 2
Sun Dec 29 20:26:38.062 [rsMgr] replSet PRIMARY
Sun Dec 29 20:26:39.071 [rsHealthPoll] replSet member 192.168.1.137:27017 is now in state RECOVERING
Sun Dec 29 20:26:39.075 [rsHealthPoll] replSet member 192.168.1.136:27017 is now in state RECOVERING
Sun Dec 29 20:26:42.201 [slaveTracking] build index local.slaves { _id: 1 }
Sun Dec 29 20:26:42.207 [slaveTracking] build index done. scanned 0 total records. 0.005 secs
Sun Dec 29 20:26:43.079 [rsHealthPoll] replSet member 192.168.1.136:27017 is now in state SECONDARY
Sun Dec 29 20:26:49.080 [rsHealthPoll] replSet member 192.168.1.137:27017 is now in state SECONDARY
```
```
# check the status of the replica set members
rs.status();
```
# Output
```
{
    "set" : "repset",
    "date" : ISODate("2013-12-29T12:54:25Z"),
    "myState" : 1,
    "members" : [
        {
            "_id" : 0,
            "name" : "192.168.1.136:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 1682,
            "optime" : Timestamp(1388319973, 1),
            "optimeDate" : ISODate("2013-12-29T12:26:13Z"),
            "lastHeartbeat" : ISODate("2013-12-29T12:54:25Z"),
            "lastHeartbeatRecv" : ISODate("2013-12-29T12:54:24Z"),
            "pingMs" : 1,
            "syncingTo" : "192.168.1.138:27017"
        },
        {
            "_id" : 1,
            "name" : "192.168.1.137:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 1682,
            "optime" : Timestamp(1388319973, 1),
            "optimeDate" : ISODate("2013-12-29T12:26:13Z"),
            "lastHeartbeat" : ISODate("2013-12-29T12:54:25Z"),
            "lastHeartbeatRecv" : ISODate("2013-12-29T12:54:24Z"),
            "pingMs" : 1,
            "syncingTo" : "192.168.1.138:27017"
        },
        {
            "_id" : 2,
            "name" : "192.168.1.138:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 2543,
            "optime" : Timestamp(1388319973, 1),
            "optimeDate" : ISODate("2013-12-29T12:26:13Z"),
            "self" : true
        }
    ],
    "ok" : 1
}
```
The whole replica set is now up and running.
6. Test data replication across the replica set.
```
# On the primary node 192.168.1.138, connect with the mongo shell:
mongo 127.0.0.1

# Create (switch to) the test database.
use test;

# Insert a document into the testdb collection.
> db.testdb.insert({"test1":"testval1"})

# On the secondary nodes 192.168.1.136 and 192.168.1.137, connect and check whether the data has been replicated.
/data/mongodbtest/mongodb-linux-x86_64-2.4.8/bin/mongo 192.168.1.136:27017

# Switch to the test database.
repset:SECONDARY> use test;
repset:SECONDARY> show tables;
```
# Output
```
Sun Dec 29 21:50:48.590 error: { "$err" : "not master and slaveOk=false", "code" : 13435 } at src/mongo/shell/query.js:128
```
```
# By default MongoDB reads and writes on the primary node; reads on a secondary
# are not allowed until you explicitly enable them on that secondary.
repset:SECONDARY> db.getMongo().setSlaveOk();
# Now the data can be seen to have replicated to the secondary.
repset:SECONDARY> db.testdb.find();
```
# Output
```
{ "_id" : ObjectId("52c028460c7505626a93944f"), "test1" : "testval1" }
```
7. Test replica set failover.
First stop mongod on the primary node, 138.
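The original post does not show the shutdown command itself; one minimal way to stop the 138 primary for this test is sketched below (assuming you are connected to 138 with the mongo shell; simply killing the mongod process on 138 works just as well):
```
// run on the 192.168.1.138 primary
use admin
db.shutdownServer()
```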
In the logs of 136 and 137 you can then see that, after a round of voting, 137 is elected as the new primary and 136 starts syncing data from 137:
```
Sun Dec 29 22:03:05.351 [rsBackgroundSync] replSet sync source problem: 10278 dbclient error communicating with server: 192.168.1.138:27017
Sun Dec 29 22:03:05.354 [rsBackgroundSync] replSet syncing to: 192.168.1.138:27017
Sun Dec 29 22:03:05.356 [rsBackgroundSync] repl: couldn't connect to server 192.168.1.138:27017
Sun Dec 29 22:03:05.356 [rsBackgroundSync] replSet not trying to sync from 192.168.1.138:27017, it is vetoed for 10 more seconds
Sun Dec 29 22:03:05.499 [rsHealthPoll] DBClientCursor::init call() failed
Sun Dec 29 22:03:05.499 [rsHealthPoll] replset info 192.168.1.138:27017 heartbeat failed, retrying
Sun Dec 29 22:03:05.501 [rsHealthPoll] replSet info 192.168.1.138:27017 is down (or slow to respond):
Sun Dec 29 22:03:05.501 [rsHealthPoll] replSet member 192.168.1.138:27017 is now in state DOWN
Sun Dec 29 22:03:05.511 [rsMgr] not electing self, 192.168.1.137:27017 would veto with '192.168.1.136:27017 is trying to elect itself but 192.168.1.138:27017 is already primary and more up-to-date'
Sun Dec 29 22:03:07.330 [conn393] replSet info voting yea for 192.168.1.137:27017 (1)
Sun Dec 29 22:03:07.503 [rsHealthPoll] replset info 192.168.1.138:27017 heartbeat failed, retrying
Sun Dec 29 22:03:08.462 [rsHealthPoll] replSet member 192.168.1.137:27017 is now in state PRIMARY
Sun Dec 29 22:03:09.359 [rsBackgroundSync] replSet syncing to: 192.168.1.137:27017
Sun Dec 29 22:03:09.507 [rsHealthPoll] replset info 192.168.1.138:27017 heartbeat failed, retrying
```
Check the status of the whole cluster; 138 is now reported as not reachable.
```
/data/mongodbtest/mongodb-linux-x86_64-2.4.8/bin/mongo 192.168.1.136:27017
repset:SECONDARY> rs.status();
```
# Output
```
{
    "set" : "repset",
    "date" : ISODate("2013-12-29T14:28:35Z"),
    "myState" : 2,
    "syncingTo" : "192.168.1.137:27017",
    "members" : [
        {
            "_id" : 0,
            "name" : "192.168.1.136:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 9072,
            "optime" : Timestamp(1388324934, 1),
            "optimeDate" : ISODate("2013-12-29T13:48:54Z"),
            "self" : true
        },
        {
            "_id" : 1,
            "name" : "192.168.1.137:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 7329,
            "optime" : Timestamp(1388324934, 1),
            "optimeDate" : ISODate("2013-12-29T13:48:54Z"),
            "lastHeartbeat" : ISODate("2013-12-29T14:28:34Z"),
            "lastHeartbeatRecv" : ISODate("2013-12-29T14:28:34Z"),
            "pingMs" : 1,
            "syncingTo" : "192.168.1.138:27017"
        },
        {
            "_id" : 2,
            "name" : "192.168.1.138:27017",
            "health" : 0,
            "state" : 8,
            "stateStr" : "(not reachable/healthy)",
            "uptime" : 0,
            "optime" : Timestamp(1388324934, 1),
            "optimeDate" : ISODate("2013-12-29T13:48:54Z"),
            "lastHeartbeat" : ISODate("2013-12-29T14:28:35Z"),
            "lastHeartbeatRecv" : ISODate("2013-12-29T14:28:23Z"),
            "pingMs" : 0,
            "syncingTo" : "192.168.1.137:27017"
        }
    ],
    "ok" : 1
}
```
Now restart the original primary, 138 (using the same mongod --dbpath ... --replSet repset command from step 4). It comes back as a SECONDARY, while 137 remains the PRIMARY.
```
Sun Dec 29 22:21:06.619 [rsStart] replSet I am 192.168.1.138:27017
Sun Dec 29 22:21:06.619 [rsStart] replSet STARTUP2
Sun Dec 29 22:21:06.627 [rsHealthPoll] replset info 192.168.1.136:27017 thinks that we are down
Sun Dec 29 22:21:06.627 [rsHealthPoll] replSet member 192.168.1.136:27017 is up
Sun Dec 29 22:21:06.627 [rsHealthPoll] replSet member 192.168.1.136:27017 is now in state SECONDARY
Sun Dec 29 22:21:07.628 [rsSync] replSet SECONDARY
Sun Dec 29 22:21:08.623 [rsHealthPoll] replSet member 192.168.1.137:27017 is up
Sun Dec 29 22:21:08.624 [rsHealthPoll] replSet member 192.168.1.137:27017 is now in state PRIMARY
```
8. Test connecting to the replica set from a Java program. Even if one of the three nodes goes down, the application client can still read from and write to the replica set!
```
import java.util.ArrayList;
import java.util.List;

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;
import com.mongodb.ServerAddress;

public class TestMongoDBReplSet {

    public static void main(String[] args) {
        try {
            // list all three replica set members; the driver discovers the primary on its own
            List<ServerAddress> addresses = new ArrayList<ServerAddress>();
            ServerAddress address1 = new ServerAddress("192.168.1.136", 27017);
            ServerAddress address2 = new ServerAddress("192.168.1.137", 27017);
            ServerAddress address3 = new ServerAddress("192.168.1.138", 27017);
            addresses.add(address1);
            addresses.add(address2);
            addresses.add(address3);

            MongoClient client = new MongoClient(addresses);
            DB db = client.getDB("test");
            DBCollection coll = db.getCollection("testdb");

            // insert a document
            BasicDBObject object = new BasicDBObject();
            object.append("test2", "testval2");
            coll.insert(object);

            // read everything back
            DBCursor dbCursor = coll.find();
            while (dbCursor.hasNext()) {
                DBObject dbObject = dbCursor.next();
                System.out.println(dbObject.toString());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```
Failover now looks fully functional, so is this architecture already perfect? There is still plenty of room for optimization, for example the second question from the beginning: how do we relieve the read/write pressure on the primary node? The common answer is read/write splitting, so how is read/write splitting done with a MongoDB replica set?
Let the diagram do the talking:
In a typical workload there are far more reads than writes, so one primary handles the writes while the two secondaries handle the reads.
1. To enable read/write splitting, first run setSlaveOk on the SECONDARY nodes.
2. Then, in the application, direct reads to the secondary nodes, as in the following code:
```
import java.util.ArrayList;
import java.util.List;

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;
import com.mongodb.ReadPreference;
import com.mongodb.ServerAddress;

public class TestMongoDBReplSetReadSplit {

    public static void main(String[] args) {
        try {
            List<ServerAddress> addresses = new ArrayList<ServerAddress>();
            ServerAddress address1 = new ServerAddress("192.168.1.136", 27017);
            ServerAddress address2 = new ServerAddress("192.168.1.137", 27017);
            ServerAddress address3 = new ServerAddress("192.168.1.138", 27017);
            addresses.add(address1);
            addresses.add(address2);
            addresses.add(address3);

            MongoClient client = new MongoClient(addresses);
            DB db = client.getDB("test");
            DBCollection coll = db.getCollection("testdb");

            BasicDBObject object = new BasicDBObject();
            object.append("test2", "testval2");

            // route the read to a secondary node
            ReadPreference preference = ReadPreference.secondary();
            DBObject dbObject = coll.findOne(object, null, preference);

            System.out.println(dbObject);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
There are five read preference modes in total: primary, primaryPreferred, secondary, secondaryPreferred, and nearest (a short shell example follows the list).
primary: the default mode; all reads go to the primary node.
primaryPreferred: reads go to the primary in most cases, and fall back to a secondary only when the primary is unavailable.
secondary: reads only go to secondary nodes; the drawback is that a secondary's data may be staler than the primary's.
secondaryPreferred: reads prefer secondary nodes, and fall back to the primary when no secondary is available.
nearest: reads go to whichever node, primary or secondary, has the lowest network latency.
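As a small illustration that is not in the original post, the same modes can also be selected per query in the mongo shell, assuming a shell new enough to support the cursor.readPref() helper:
```
// read this query from a secondary if one is available, otherwise from the primary
db.testdb.find().readPref("secondaryPreferred")
```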
Good: with read/write splitting in place we can spread the traffic and ease the load, which answers "how do we relieve the read/write pressure on the primary node?". However, as the number of secondaries grows, the replication load on the primary also grows. Is there a way around that? MongoDB has long had a solution for this: the arbiter node.
An arbiter stores no data; it only takes part in the vote during failover, so it adds no extra replication load. Quite thoughtful, isn't it? Clearly the MongoDB developers know their big-data architectures. And besides primary, secondary, and arbiter nodes there are also Secondary-Only, Hidden, Delayed, and Non-Voting members; a small configuration sketch follows the list below.
Secondary-Only: can never become primary and only serves as a secondary, which prevents low-spec machines from being elected primary.
Hidden: not visible to client applications and cannot become primary, but it can vote; generally used for backups.
Delayed: replicates from the primary with a configurable time delay; mainly used for backups, because with real-time replication an accidental delete propagates to the secondaries immediately and cannot be undone.
Non-Voting: a secondary with no vote in elections, purely a data backup node.
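The post stops at these descriptions; as a rough sketch of how such members are declared in a MongoDB 2.4 replica set (the hosts 192.168.1.139/192.168.1.140, the _id value, and the delay below are hypothetical examples, not part of the original setup):
```
// add an arbiter: it votes in elections but stores no data
// (a mongod started with --replSet repset must already be listening on this hypothetical host)
rs.addArb("192.168.1.139:27017")

// add a data-bearing member with special roles (hypothetical host; in practice
// you would enable only the options you actually need on a given member)
rs.add({
    _id: 4,
    host: "192.168.1.140:27017",
    priority: 0,       // Secondary-Only: priority 0 means it can never become primary
    hidden: true,      // Hidden: invisible to client applications (requires priority 0)
    slaveDelay: 3600,  // Delayed: stays one hour behind the primary (requires priority 0)
    votes: 0           // Non-Voting: keeps the data but has no vote in elections
})
```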
At this point the MongoDB replica set has taken care of two of the questions:
- If the primary node goes down, can connections fail over automatically? (This used to require a manual switch.)
- How do we relieve the read/write pressure on the primary node?
These two remain to be tackled later:
- Each secondary holds a full copy of the database. Will the secondaries come under too much pressure?
- When the data volume grows beyond what the machines can handle, can the cluster scale out automatically?
And building the replica set has raised some new questions:
- How is the primary elected during a replica set failover? Can we manually intervene and step down a primary?
- The official recommendation is an odd number of replica set members. Why?
- How does a MongoDB replica set synchronize data? What happens if replication lags behind? Can the data become inconsistent?
- Can MongoDB failover fire spontaneously for no apparent reason? What conditions trigger it? Frequent failovers would add noticeable load to the system.