MongoDB cluster: config server query timeout causes writes through mongos to fail


Environment

OS: CentOS 7.x
DB: MongoDB 3.6.12
Cluster topology: mongod-shard1 *3 + mongod-shard2 *3 + mongod-conf-shard *3 + mongos *3

Application error log

caused by :: NetworkInterfaceExceededTimeLimit: Operation time out on server ****:27018
....
at org.springframework.data.mongodb.core.MongoExceptionTranslator.translateExceptionIfPossible(MongoExceptionTranslator.java:107)

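Reproducing the failure


An insert into one existing collection fails with NetworkInterfaceExceededTimeLimit: Operation time out.
The same insert into a different collection that does not exist yet succeeds.

This suggested that the config server was failing when queried for sharding metadata.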

Troubleshooting

2020-07-07T09:55:36.605+0800 D REPL     [conn52850] Required snapshot optime: { ts: Timestamp(1594086936, 7), t: 19 } is not yet part of the current 'committed' snapshot: { ts: Timestamp(1594086936, 3), t: 19 }
2020-07-07T09:55:36.605+0800 D REPL     [conn35081] Required snapshot optime: { ts: Timestamp(1594086936, 7), t: 19 } is not yet part of the current 'committed' snapshot: { ts: Timestamp(1594086936, 3), t: 19 }
2020-07-07T09:55:37.084+0800 D REPL     [conn72545] waitUntilOpTime: waiting for optime:{ ts: Timestamp(1594086683, 2), t: 20 } to be in a snapshot -- current snapshot: { ts: Timestamp(1594086936, 7), t: 19 }
2020-07-07T09:55:37.187+0800 I COMMAND  [conn72537] Command on database config timed out waiting for read concern to be satisfied. Command: { find: "shards", readConcern: { level: "majority", afterOpTime: { ts: Timestamp(1594086804, 1), t: 20 } }, maxTimeMS: 30000, $readPreference: { mode: "nearest" }, $replData: 1, $clusterTime: { clusterTime: Timestamp(1594086903, 1), signature: { hash: BinData(0, CD6262BF59D2AAC318183C6109F3B31DEE2E1837), keyId: 6807014219125358676 } }, $configServerState: { opTime: { ts: Timestamp(1594086804, 1), t: 20 } }, $db: "config" }
2020-07-07T09:55:37.187+0800 I COMMAND  [conn72537] command config.$cmd command: find { find: "shards", readConcern: { level: "majority", afterOpTime: { ts: Timestamp(1594086804, 1), t: 20 } }, maxTimeMS: 30000, $readPreference: { mode: "nearest" }, $replData: 1, $clusterTime: { clusterTime: Timestamp(1594086903, 1), signature: { hash: BinData(0, CD6262BF59D2AAC318183C6109F3B31DEE2E1837), keyId: 6807014219125358676 } }, $configServerState: { opTime: { ts: Timestamp(1594086804, 1), t: 20 } }, $db: "config" } numYields:0 reslen:517 locks:{} protocol:op_msg 30009ms
2020-07-07T09:55:37.187+0800 I NETWORK  [conn72537] end connection *.*.*.*:45296 (34 connections now open)
2020-07-07T09:55:40.425+0800 D REPL     [conn72539] Required snapshot optime: { ts: Timestamp(1594086940, 1), t: 19 } is not yet part of the current 'committed' snapshot: { ts: Timestamp(1594086936, 7), t: 19 }

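The config server log contains the line: Command on database config timed out waiting for read concern to be satisfied.
The root cause is unknown, but the log shows that a find on "shards" in the config database timed out on the config server (30009ms against maxTimeMS: 30000). This matches the error seen in the application log.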
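One detail in the log excerpt above is easier to see once the optimes are decoded: the waitUntilOpTime line waits for an optime in term 20 while the current snapshot is still in term 19. The following is a minimal sketch, not part of the original investigation, that parses that log line with a regular expression and prints both optimes with their wall-clock time. The names `parse_wait_line` and `describe` are made up for illustration, and the pattern is only assumed to match the message format shown in this 3.6.12 log.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches an optime as printed in the log, e.g. { ts: Timestamp(1594086683, 2), t: 20 }
OPTIME = r"\{ ts: Timestamp\((\d+), (\d+)\), t: (\d+) \}"
WAIT_LINE = re.compile(
    r"waitUntilOpTime: waiting for optime:" + OPTIME
    + r" to be in a snapshot -- current snapshot: " + OPTIME
)

CST = timezone(timedelta(hours=8))  # the log timestamps carry a +0800 offset


def parse_wait_line(line):
    """Return (requested, snapshot) optimes as dicts, or None if the line does not match."""
    m = WAIT_LINE.search(line)
    if m is None:
        return None
    nums = [int(g) for g in m.groups()]
    requested = {"secs": nums[0], "inc": nums[1], "term": nums[2]}
    snapshot = {"secs": nums[3], "inc": nums[4], "term": nums[5]}
    return requested, snapshot


def describe(optime):
    """Format an optime as 'term N at HH:MM:SS' in the log's time zone."""
    wall = datetime.fromtimestamp(optime["secs"], CST).strftime("%H:%M:%S")
    return "term {} at {}".format(optime["term"], wall)


# The waitUntilOpTime line copied from the config server log above.
line = (
    "2020-07-07T09:55:37.084+0800 D REPL     [conn72545] waitUntilOpTime: "
    "waiting for optime:{ ts: Timestamp(1594086683, 2), t: 20 } to be in a snapshot "
    "-- current snapshot: { ts: Timestamp(1594086936, 7), t: 19 }"
)
requested, snapshot = parse_wait_line(line)
print("requested:", describe(requested))
print("snapshot: ", describe(snapshot))
print("requested term is ahead of snapshot term:", requested["term"] > snapshot["term"])
```

For this line the requested optime is in term 20 and the snapshot is in term 19. That is only an observation from the log; it does not by itself establish the root cause.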

Restarting the PRIMARY node of the config server replica set forced an election among the SECONDARY nodes.
After that, the cluster recovered.

