Mongodb 故障分享 初始化時"errmsg" : "exception: new file allocation failure" 並且長時間處於STARTUP2


Hello,大家下午好。

近幾天的項目有點趕,所以耽誤了更新。現在給大家分享下,在安裝mongodb的過程中,遇到的故障一則。其實很小白的問題,當時遇到這個問題的時候比較心慌,浪費了很多時間,跟大家分享下解決的思路吧。

先描述下我的環境,請參照前一篇博客,利用腳本初始化出4個節點(由於實驗原因,我使用了單台的服務器)。

首先我們啟動這四個初始化節點。(具體的文件配置,請參照前文,初始化的內容,這里僅用/etc/mongodb.conf為例)

[root@1]# cat /etc/mongodb.conf
dbpath=/usr/local/mongodb-3.0.2/data
logpath=/usr/local/mongodb-3.0.2/logs/mongo.log
pidfilepath=/var/run/mongodb/mongodb.pid
unixSocketPrefix=/usr/local/mongodb-3.0.2/socket
directoryperdb=true
replSet=picture
shardsvr=true
logappend=true
port = 27017
maxConns=20000
oplogSize=30720
fork=true
nohttpinterface=true
nojournal=true

[root@1]# mongod --config=/etc/mongodb.conf
about to fork child process, waiting until server is ready for connections.
forked process: 32560
child process started successfully, parent exiting
[root@1]# mongod --config=/etc/mongodb1.conf
about to fork child process, waiting until server is ready for connections.
forked process: 32574
child process started successfully, parent exiting
[root@1]# mongod --config=/etc/mongodb2.conf
about to fork child process, waiting until server is ready for connections.
forked process: 32588
child process started successfully, parent exiting
[root@1]# mongod --config=/etc/mongodb3.conf
about to fork child process, waiting until server is ready for connections.
forked process: 32602
child process started successfully, parent exiting

[root@1]# ps -ef | grep mongo                                  //我們可以看到四個實例已經啟動。下一步我們將進行角色分配(即副本集搭建)
root     32560     1  0 15:43 ?        00:00:00 mongod --config=/etc/mongodb.conf
root     32574     1  0 15:43 ?        00:00:00 mongod --config=/etc/mongodb1.conf
root     32588     1  0 15:43 ?        00:00:00 mongod --config=/etc/mongodb2.conf
root     32602     1  0 15:43 ?        00:00:00 mongod --config=/etc/mongodb3.conf
root     32616 28203  0 15:46 pts/0    00:00:00 grep mongo
[root@1]#

[root@1]# mongo
MongoDB shell version: 3.0.2
connecting to: test
Server has startup warnings:
2016-01-25T15:43:46.645+0800 I CONTROL  [initandlisten] ** WARNING: You are running this process as the root user, which is not recommended.
2016-01-25T15:43:46.645+0800 I CONTROL  [initandlisten]
2016-01-25T15:43:46.645+0800 I CONTROL  [initandlisten]
2016-01-25T15:43:46.645+0800 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
2016-01-25T15:43:46.645+0800 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2016-01-25T15:43:46.645+0800 I CONTROL  [initandlisten]
2016-01-25T15:43:46.645+0800 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
2016-01-25T15:43:46.645+0800 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2016-01-25T15:43:46.645+0800 I CONTROL  [initandlisten]
2016-01-25T15:43:46.645+0800 I CONTROL  [initandlisten] ** WARNING: soft rlimits too low. rlimits set to 65535 processes, 655350 files. Number of processes should be at least 327675 : 0.5 times number of files.
2016-01-25T15:43:46.645+0800 I CONTROL  [initandlisten]
> use admin                                            //首先切換到admindb,進行副本集配置
switched to db admin
> config = {_id: 'picture', members: [
...       {_id: 0, host: '$IP:27017'},
...       {_id: 1, host: '$IP:37017'},
...       {_id: 2, host: '$IP:47017'},
...       {_id: 3, host: '$IP:30000',arbiterOnly:true}]                //此時配置將產生一個主節點,兩個從節點,一個仲裁節點
... }
{
    "_id" : "picture",
    "members" : [
        {
            "_id" : 0,
            "host" : "$IP:27017"
        },
        {
            "_id" : 1,
            "host" : "$IP:37017"
        },
        {
            "_id" : 2,
            "host" : "$IP:47017"
        },
        {
            "_id" : 3,
            "host" : "$IP:30000",
            "arbiterOnly" : true
        }
    ]
}
> rs.initiate(config);
{
    "errmsg" : "exception: new file allocation failure",                   //大家可以看到初始化失敗
    "code" : 12520,
    "ok" : 0
}

查詢下各個節點狀態,悲劇了,長時間停留在STARTUP2狀態

picture:OTHER> rs.status()
{
    "set" : "picture",
    "date" : ISODate("2016-01-25T07:51:15.665Z"),
    "myState" : 5,
    "members" : [
        {
            "_id" : 0,
            "name" : "$IP:27017",
            "health" : 1,
            "state" : 5,
            "stateStr" : "STARTUP2",
            "uptime" : 449,
            "optime" : Timestamp(0, 0),
            "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
            "configVersion" : 1,
            "self" : true
        },
        {
            "_id" : 1,
            "name" : "$IP:37017",
            "health" : 1,
            "state" : 5,
            "stateStr" : "STARTUP2",
            "uptime" : 154,
            "optime" : Timestamp(0, 0),
            "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
            "lastHeartbeat" : ISODate("2016-01-25T07:51:15.326Z"),
            "lastHeartbeatRecv" : ISODate("2016-01-25T07:51:15.326Z"),
            "pingMs" : 0,
            "configVersion" : 1
        },
        {
            "_id" : 2,
            "name" : "$IP:47017",
            "health" : 1,
            "state" : 5,
            "stateStr" : "STARTUP2",
            "uptime" : 154,
            "optime" : Timestamp(0, 0),
            "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
            "lastHeartbeat" : ISODate("2016-01-25T07:51:15.326Z"),
            "lastHeartbeatRecv" : ISODate("2016-01-25T07:51:15.324Z"),
            "pingMs" : 0,
            "configVersion" : 1
        },
        {
            "_id" : 3,
            "name" : "$IP:30000",
            "health" : 1,
            "state" : 7,
            "stateStr" : "ARBITER",
            "uptime" : 154,
            "lastHeartbeat" : ISODate("2016-01-25T07:51:15.324Z"),
            "lastHeartbeatRecv" : ISODate("2016-01-25T07:51:15.324Z"),
            "pingMs" : 0,
            "configVersion" : 1
        }
    ],
    "ok" : 1
}
picture:OTHER> //大家發現本節點狀態為other,………… ,很久很久以后  依然如此。

 

我們的解決思路是什么?

首先,想跟大家分享下mongodb的整個的啟動和副本集初始化過程(可以關注下一章節的博客)。我們知道  STARTUP是其中的一個階段。

然后,出現問題。其實第一個想法就是看log。那么我們看下log告訴我們什么了。

【$IP:27017-log】

2016-01-25T15:48:41.275+0800 I REPL     [conn1] replSetInitiate admin command received from client
2016-01-25T15:48:41.277+0800 I REPL     [conn1] replSet replSetInitiate config object with 4 members parses ok
2016-01-25T15:48:41.278+0800 I NETWORK  [initandlisten] connection accepted from $IP:38296 #2 (2 connections now open)
2016-01-25T15:48:41.279+0800 I NETWORK  [initandlisten] connection accepted from $IP:38299 #3 (3 connections now open)
2016-01-25T15:48:41.279+0800 I NETWORK  [initandlisten] connection accepted from $IP:38300 #4 (4 connections now open)
2016-01-25T15:48:41.280+0800 I REPL     [ReplicationExecutor] New replica set config in use: { _id: "picture", version: 1, members: [ { _id: 0, host: "$IP:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1, host: "$IP:37017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 2, host: "$IP:47017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 3, host: "$IP:30000", arbiterOnly: true, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatTimeoutSecs: 10, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 } } }
2016-01-25T15:48:41.280+0800 I REPL     [ReplicationExecutor] This node is $IP:27017 in the config
2016-01-25T15:48:41.280+0800 I REPL     [ReplicationExecutor] transition to STARTUP2
2016-01-25T15:48:41.280+0800 I REPL     [conn1] ******
2016-01-25T15:48:41.280+0800 I REPL     [conn1] creating replication oplog of size: 30720MB...
2016-01-25T15:48:41.281+0800 I STORAGE  [FileAllocator] allocating new datafile /usr/local/mongodb-3.0.2/data/local/local.1, filling with zeroes...
2016-01-25T15:48:41.281+0800 I REPL     [ReplicationExecutor] Member $IP:47017 is now in state STARTUP
2016-01-25T15:48:41.281+0800 I REPL     [ReplicationExecutor] Member $IP:37017 is now in state STARTUP
2016-01-25T15:48:41.281+0800 I REPL     [ReplicationExecutor] Member $IP:30000 is now in state STARTUP
2016-01-25T15:48:41.283+0800 I STORAGE  [FileAllocator] done allocating datafile /usr/local/mongodb-3.0.2/data/local/local.1, size: 2047MB,  took 0.002 secs
2016-01-25T15:48:41.283+0800 I STORAGE  [FileAllocator] allocating new datafile /usr/local/mongodb-3.0.2/data/local/local.2, filling with zeroes...
2016-01-25T15:48:41.286+0800 I STORAGE  [FileAllocator] done allocating datafile /usr/local/mongodb-3.0.2/data/local/local.2, size: 2047MB,  took 0.002 secs
2016-01-25T15:48:41.286+0800 I STORAGE  [FileAllocator] allocating new datafile /usr/local/mongodb-3.0.2/data/local/local.3, filling with zeroes...
2016-01-25T15:48:41.288+0800 I STORAGE  [FileAllocator] done allocating datafile /usr/local/mongodb-3.0.2/data/local/local.3, size: 2047MB,  took 0.002 secs
2016-01-25T15:48:41.288+0800 I STORAGE  [FileAllocator] allocating new datafile /usr/local/mongodb-3.0.2/data/local/local.4, filling with zeroes...
2016-01-25T15:48:41.291+0800 I STORAGE  [FileAllocator] done allocating datafile /usr/local/mongodb-3.0.2/data/local/local.4, size: 2047MB,  took 0.002 secs
2016-01-25T15:48:41.291+0800 I STORAGE  [FileAllocator] allocating new datafile /usr/local/mongodb-3.0.2/data/local/local.5, filling with zeroes...
2016-01-25T15:48:41.293+0800 I STORAGE  [FileAllocator] done allocating datafile /usr/local/mongodb-3.0.2/data/local/local.5, size: 2047MB,  took 0.002 secs
2016-01-25T15:48:41.293+0800 I STORAGE  [FileAllocator] allocating new datafile /usr/local/mongodb-3.0.2/data/local/local.6, filling with zeroes...
2016-01-25T15:48:41.298+0800 I STORAGE  [FileAllocator] done allocating datafile /usr/local/mongodb-3.0.2/data/local/local.6, size: 2047MB,  took 0.004 secs
2016-01-25T15:48:41.298+0800 I STORAGE  [FileAllocator] allocating new datafile /usr/local/mongodb-3.0.2/data/local/local.7, filling with zeroes...
2016-01-25T15:48:41.301+0800 I STORAGE  [FileAllocator] done allocating datafile /usr/local/mongodb-3.0.2/data/local/local.7, size: 2047MB,  took 0.002 secs
2016-01-25T15:48:41.301+0800 I STORAGE  [FileAllocator] allocating new datafile /usr/local/mongodb-3.0.2/data/local/local.8, filling with zeroes...
2016-01-25T15:48:41.303+0800 I STORAGE  [FileAllocator] done allocating datafile /usr/local/mongodb-3.0.2/data/local/local.8, size: 2047MB,  took 0.002 secs
2016-01-25T15:48:41.304+0800 I STORAGE  [FileAllocator] allocating new datafile /usr/local/mongodb-3.0.2/data/local/local.9, filling with zeroes...
2016-01-25T15:48:41.318+0800 I STORAGE  [FileAllocator] FileAllocator: posix_fallocate  //發現日志至此不動了

【$IP:37017-log】

2016-01-25T15:51:59.127+0800 I STORAGE  [FileAllocator] FileAllocator: posix_fallocate failed: errno:28 No space left on device falling back

//Congratulations!!!!!~~~

沒有空間!!我們查看下系統的空間使用情況。

[root@1 logs]# df -TH
Filesystem     Type   Size  Used Avail Use% Mounted on
/dev/vda1      ext4    22G   21G     0 100% /
tmpfs          tmpfs  8.4G     0  8.4G   0% /dev/shm
/dev/vdb       ext4    22G   21G     0 100% /
/dev/vdb       ext4   529G  208M  502G   1% /data
[root@1 logs]#

就是這么簡單的問題。這個是一位好朋友在運行了我的腳本之后,反饋給我的。分享給大家。

其實在實踐中,能夠遇到各種預期外的狀況,很開心。

希望大家能多多交流。

謝謝,反饋給我這個問題的張同學~~撒花~~

 


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM