概要:MongoDB元數據損壞,通過自身--repair無法修復,使用更高版本--repair修復。
背景
MongoDB 3.2.6單點,因存儲空間耗盡停止,修復存儲空間后啟動失敗。
以下內容使用v3.2.22替代v3.2.6進行復盤,使用數據來自故障現場的備份,即v3.2.6的數據,特此說明。
啟動日志:
2021-10-16T16:55:34.784+0800 I CONTROL [initandlisten] MongoDB starting : pid=13724 port=27017 dbpath=/home/voyager/repair/data2 64-bit host=node
2021-10-16T16:55:34.785+0800 I CONTROL [initandlisten] db version v3.2.22
2021-10-16T16:55:34.785+0800 I CONTROL [initandlisten] git version: 105acca0d443f9a47c1a5bd608fd7133840a58dd
2021-10-16T16:55:34.785+0800 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.1e-fips 11 Feb 2013
2021-10-16T16:55:34.785+0800 I CONTROL [initandlisten] allocator: tcmalloc
2021-10-16T16:55:34.785+0800 I CONTROL [initandlisten] modules: none
2021-10-16T16:55:34.785+0800 I CONTROL [initandlisten] build environment:
2021-10-16T16:55:34.785+0800 I CONTROL [initandlisten] distmod: rhel70
2021-10-16T16:55:34.785+0800 I CONTROL [initandlisten] distarch: x86_64
2021-10-16T16:55:34.785+0800 I CONTROL [initandlisten] target_arch: x86_64
2021-10-16T16:55:34.785+0800 I CONTROL [initandlisten] options: { config: "./mongo.conf", storage: { dbPath: "/home/voyager/repair/data2" } }
2021-10-16T16:55:34.803+0800 I - [initandlisten] Detected data files in /home/voyager/repair/data2 created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2021-10-16T16:55:34.803+0800 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=4G,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),verbose=(recovery_progress),
2021-10-16T16:55:34.839+0800 E STORAGE [initandlisten] WiredTiger (0) [1634374534:839158][13724:0x7f2dc0282dc0], file:WiredTiger.wt, connection: read checksum error for 4096B block at offset 102400: block header checksum of 2702779842 doesn't match expected checksum of 3569473964
2021-10-16T16:55:34.839+0800 E STORAGE [initandlisten] WiredTiger (0) [1634374534:839200][13724:0x7f2dc0282dc0], file:WiredTiger.wt, connection: WiredTiger.wt: encountered an illegal file format or internal value
2021-10-16T16:55:34.839+0800 E STORAGE [initandlisten] WiredTiger (-31804) [1634374534:839211][13724:0x7f2dc0282dc0], file:WiredTiger.wt, connection: the process must exit and restart: WT_PANIC: WiredTiger library panic
2021-10-16T16:55:34.839+0800 I - [initandlisten] Fatal Assertion 28558
2021-10-16T16:55:34.839+0800 I - [initandlisten]
***aborting after fassert() failure
2021-10-16T16:55:34.855+0800 F - [initandlisten] Got signal: 6 (Aborted).
0x1354f62 0x13540b9 0x13548c2 0x7f2dbee715d0 0x7f2dbeacb207 0x7f2dbeacc8f8 0x12d5122 0x10c8943 0x9790db 0x9792d8 0x9794a6 0x19ea16c 0x19ea6d3 0x19e6f69 0x19eb969 0x1a04bd5 0x1a3e08b 0x1ac40ed 0x1ac4609 0x1ac482b 0x1a4b876 0x1ac0fed 0x1a8576f 0x1a8586e 0x1a3a81a 0x10afe36 0x10ac083 0xfd4a6e 0x9c4510 0x979dbd 0x7f2dbeab73d5 0x9bfed7
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"F54F62","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"F540B9"},{"b":"400000","o":"F548C2"},{"b":"7F2DBEE62000","o":"F5D0"},{"b":"7F2DBEA95000","o":"36207","s":"gsignal"},{"b":"7F2DBEA95000","o":"378F8","s":"abort"},{"b":"400000","o":"ED5122","s":"_ZN5mongo13fassertFailedEi"},{"b":"400000","o":"CC8943"},{"b":"400000","o":"5790DB","s":"__wt_eventv"},{"b":"400000","o":"5792D8","s":"__wt_err"},{"b":"400000","o":"5794A6","s":"__wt_panic"},{"b":"400000","o":"15EA16C","s":"__wt_block_extlist_read"},{"b":"400000","o":"15EA6D3","s":"__wt_block_extlist_read_avail"},{"b":"400000","o":"15E6F69","s":"__wt_block_checkpoint_load"},{"b":"400000","o":"15EB969"},{"b":"400000","o":"1604BD5","s":"__wt_btree_open"},{"b":"400000","o":"163E08B","s":"__wt_conn_btree_open"},{"b":"400000","o":"16C40ED","s":"__wt_session_get_btree"},{"b":"400000","o":"16C4609","s":"__wt_session_get_btree"},{"b":"400000","o":"16C482B","s":"__wt_session_get_btree_ckpt"},{"b":"400000","o":"164B876","s":"__wt_curfile_open"},{"b":"400000","o":"16C0FED"},{"b":"400000","o":"168576F","s":"__wt_metadata_cursor_open"},{"b":"400000","o":"168586E","s":"__wt_metadata_cursor"},{"b":"400000","o":"163A81A","s":"wiredtiger_open"},{"b":"400000","o":"CAFE36","s":"_ZN5mongo18WiredTigerKVEngineC2ERKSsS2_S2_mbbb"},{"b":"400000","o":"CAC083"},{"b":"400000","o":"BD4A6E","s":"_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv"},{"b":"400000","o":"5C4510"},{"b":"400000","o":"579DBD","s":"main"},{"b":"7F2DBEA95000","o":"223D5","s":"__libc_start_main"},{"b":"400000","o":"5BFED7"}],"processInfo":{ "mongodbVersion" : "3.2.22", "gitVersion" : "105acca0d443f9a47c1a5bd608fd7133840a58dd", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.10.0-957.el7.x86_64", "version" : "#1 SMP Thu Nov 8 23:39:32 UTC 2018", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "8AA2D2ED91A86A9E3F04BAF741D95CC01B610ED4" }, { "b" : "7FFDB8D3D000", "elfType" : 3, "buildId" : "163C2DC43405427478788BAD0AFD537A7ACF7A13" }, { "b" : "7F2DBFE05000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "3B305C3BA17FE394862E749763F2956C9C890C2E" }, { "b" : "7F2DBF9A2000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "4CF1939F660008CFA869D8364651F31AACD2C1C4" }, { "b" : "7F2DBF79A000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "EFDE2029C9A4A20BE5B8D8AE7E6551FF9B5755D2" }, { "b" : "7F2DBF596000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "67AD3498AC7DE3EAB952A243094DF5C12A21CD7D" }, { "b" : "7F2DBF294000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "918D3696BF321AA8D32950AB2AB8D0F1B21AC907" }, { "b" : "7F2DBF07E000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "6B4F3D896CD0F06FCB3DEF0245F204ECE3220D7E" }, { "b" : "7F2DBEE62000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "3D9441083D079DC2977F1BD50C8068D11767232D" }, { "b" : "7F2DBEA95000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "3C61131D1DAC9DA79B73188E7702BEF786C2AD54" }, { "b" : "7F2DC0077000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "5DA2D47925497B2F5875A7D8D1799A1227E2FDE4" }, { "b" : "7F2DBE848000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "E2AA8CA3D3164E7DBEC293BFA0B55D2B10DAC05D" }, { "b" : "7F2DBE55F000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "3EE7267AF7BFD3B132E6A222D997DA09C96C90DD" }, { "b" : "7F2DBE35B000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "67E935BFABA2C914C01156B88947DD515EA51170" }, { "b" : "7F2DBE128000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "82E28CACB60C27CD6F14A6D2268F0CFF621664D0" }, { "b" : "7F2DBDF12000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "B9D5F73428BD6AD68C96986B57BEA3B7CEDB9745" }, { "b" : "7F2DBDD02000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "4F5FBB2087BE132892467C4E7A46A3D07E5DA40B" }, { "b" : "7F2DBDAFE000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "2E01D5AC08C1280D013AAB96B292AC58BC30A263" }, { "b" : "7F2DBD8E5000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "4C488F6E7044BB966162C1F7081ABBA6EBB2B485" }, { "b" : "7F2DBD6BE000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "D2DD4DA3FDE1477D25BFFF80F3A25FDB541A8179" }, { "b" : "7F2DBD45C000", "path" : "/lib64/libpcre.so.1", "elfType" : 3, "buildId" : "9CA3D11F018BEEB719CDB34BE800BF1641350D0A" } ] }}
mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x1354f62]
mongod(+0xF540B9) [0x13540b9]
mongod(+0xF548C2) [0x13548c2]
libpthread.so.0(+0xF5D0) [0x7f2dbee715d0]
libc.so.6(gsignal+0x37) [0x7f2dbeacb207]
libc.so.6(abort+0x148) [0x7f2dbeacc8f8]
mongod(_ZN5mongo13fassertFailedEi+0x82) [0x12d5122]
mongod(+0xCC8943) [0x10c8943]
mongod(__wt_eventv+0x440) [0x9790db]
mongod(__wt_err+0x8D) [0x9792d8]
mongod(__wt_panic+0x2E) [0x9794a6]
mongod(__wt_block_extlist_read+0x6C) [0x19ea16c]
mongod(__wt_block_extlist_read_avail+0x33) [0x19ea6d3]
mongod(__wt_block_checkpoint_load+0x359) [0x19e6f69]
mongod(+0x15EB969) [0x19eb969]
mongod(__wt_btree_open+0xBD5) [0x1a04bd5]
mongod(__wt_conn_btree_open+0x16B) [0x1a3e08b]
mongod(__wt_session_get_btree+0xDD) [0x1ac40ed]
mongod(__wt_session_get_btree+0x5F9) [0x1ac4609]
mongod(__wt_session_get_btree_ckpt+0xAB) [0x1ac482b]
mongod(__wt_curfile_open+0x86) [0x1a4b876]
mongod(+0x16C0FED) [0x1ac0fed]
mongod(__wt_metadata_cursor_open+0x5F) [0x1a8576f]
mongod(__wt_metadata_cursor+0x7E) [0x1a8586e]
mongod(wiredtiger_open+0x191A) [0x1a3a81a]
mongod(_ZN5mongo18WiredTigerKVEngineC2ERKSsS2_S2_mbbb+0x5B6) [0x10afe36]
mongod(+0xCAC083) [0x10ac083]
mongod(_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv+0x59E) [0xfd4a6e]
mongod(+0x5C4510) [0x9c4510]
mongod(main+0x15D) [0x979dbd]
libc.so.6(__libc_start_main+0xF5) [0x7f2dbeab73d5]
mongod(+0x5BFED7) [0x9bfed7]
----- END BACKTRACE -----
Aborted (core dumped)
關鍵信息:
2021-10-16T16:55:34.839+0800 E STORAGE [initandlisten] WiredTiger (0) [1634374534:839158][13724:0x7f2dc0282dc0], file:WiredTiger.wt, connection: read checksum error for 4096B block at offset 102400: block header checksum of 2702779842 doesn't match expected checksum of 3569473964
2021-10-16T16:55:34.839+0800 E STORAGE [initandlisten] WiredTiger (0) [1634374534:839200][13724:0x7f2dc0282dc0], file:WiredTiger.wt, connection: WiredTiger.wt: encountered an illegal file format or internal value
2021-10-16T16:55:34.839+0800 E STORAGE [initandlisten] WiredTiger (-31804) [1634374534:839211][13724:0x7f2dc0282dc0], file:WiredTiger.wt, connection: the process must exit and restart: WT_PANIC: WiredTiger library panic
2021-10-16T16:55:34.839+0800 I - [initandlisten] Fatal Assertion 28558
錯誤日志中已經表明是“WiredTiger.wt”損壞,它屬於元數據,因此這起事件為元數據損壞。
使用mongod --repair修復時,報錯與啟動時一致。
數據修復方法
下載和安裝v4.0的最新版本(測試的時候是4.0.27),用這個版本的mongod --repair修復損壞的數據,具體為:
mongod --dbpath <損壞的數據目錄> --repair
修復日志關鍵信息:
[voyager@node repair]$ ./mongodb-4.0.27/bin/mongod -f ./mongo.conf --repair
2021-10-16T17:09:43.658+0800 I CONTROL [main] Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'
2021-10-16T17:09:43.662+0800 I CONTROL [initandlisten] MongoDB starting : pid=15243 port=27017 dbpath=/home/voyager/repair/data2 64-bit host=node
2021-10-16T17:09:43.662+0800 I CONTROL [initandlisten] db version v4.0.27
2021-10-16T17:09:43.662+0800 I CONTROL [initandlisten] git version: d47b151b55f286546e7c7c98888ae0577856ca20
2021-10-16T17:09:43.663+0800 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.1e-fips 11 Feb 2013
2021-10-16T17:09:43.663+0800 I CONTROL [initandlisten] allocator: tcmalloc
2021-10-16T17:09:43.663+0800 I CONTROL [initandlisten] modules: none
2021-10-16T17:09:43.663+0800 I CONTROL [initandlisten] build environment:
2021-10-16T17:09:43.663+0800 I CONTROL [initandlisten] distmod: rhel70
2021-10-16T17:09:43.663+0800 I CONTROL [initandlisten] distarch: x86_64
2021-10-16T17:09:43.663+0800 I CONTROL [initandlisten] target_arch: x86_64
2021-10-16T17:09:43.663+0800 I CONTROL [initandlisten] options: { config: "./mongo.conf", repair: true, storage: { dbPath: "/home/voyager/repair/data2" } }
2021-10-16T17:09:43.675+0800 I STORAGE [initandlisten] Detected data files in /home/voyager/repair/data2 created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
......
2021-10-16T17:09:58.111+0800 W STORAGE [initandlisten] Modifications made by repair:
2021-10-16T17:09:58.111+0800 W STORAGE [initandlisten] WiredTiger metadata salvaged
2021-10-16T17:09:58.115+0800 I STORAGE [initandlisten] Finished adjusting the table logging settings for existing WiredTiger tables
......
2021-10-16T17:10:00.174+0800 I STORAGE [initandlisten] Starting to check the table logging settings for existing WiredTiger tables
2021-10-16T17:10:01.592+0800 I STORAGE [initandlisten] shutdown: removing fs lock...
2021-10-16T17:10:01.592+0800 I - [initandlisten] Dropping the scope cache for shutdown
2021-10-16T17:10:01.592+0800 I CONTROL [initandlisten] now exiting
2021-10-16T17:10:01.592+0800 I CONTROL [initandlisten] shutting down with code:0
[END] 2021/10/16 17:10:09
使用修復后的數據首次啟動v3.2.22會有非常多的初始化和檢查導致的日志,這個過程結束后將成功啟動:
......
2021-10-16T17:18:43.789+0800 I NETWORK [HostnameCanonicalizationWorker] Starting hostname canonicalization worker
2021-10-16T17:18:43.789+0800 I FTDC [initandlisten] Initializing full-time diagnostic data capture with directory '/home/voyager/repair/data2/diagnostic.data'
2021-10-16T17:18:43.790+0800 I NETWORK [initandlisten] waiting for connections on port 27017
過程和總結
數據修復的過程並不是很順利,期間查找到一些資料,但都是基於修復原理的論述,按照這些原理我不能快速的達成修復數據的目的,能夠看到的修復成功的案例都來自官方的幫助,官方基於損壞的數據和錯誤日志盡力修復元數據,幫助客戶的進程成功啟動,於是我也向官方尋求幫助,最終得到回復並成功修復了數據,梳理下時間線:
- 發現數據損壞,並確認是元數據損壞,--repair修復失敗,保留現場數據;
- 在網絡上尋求幫助,無果,期間嘗試使用wt工具,因一些運行報錯放棄(主要因為數據不是非常重要,並且還有其他途徑彌補數據丟失帶來的損失,並且這個時候已經有了處理這個事件的方案,可以放棄數據恢復);
- 出於好奇嘗試恢復數據,在官方服務SERVER-19815中找到線索,嘗試使用v4.2.7 --repair方式恢復,報錯失敗,提示降低版本;
- 尋求官方服務的幫助,申請服務單SERVER-60703,大概24小時后,官方回復並給出建議,首先建議嘗試使用最新版本--repair修復,然后遵從官方建議開始嘗試:
a. v5.0.3提升版本過高,建議v4.2和更早的版本;
b. 因為一開始就嘗試過v4.2,所以直接測試v4.0.27,然后成功。
如果你閱讀了SERVER-19815可以發現,官方說“FIX VERSIONS: This issue is fixed in MongoDB 4.0.3 as well as in 4.1.4, and will be available in the 4.2 production release.”,這也是我在嘗試v4.2.7之后沒有進行更多嘗試的原因。
有幾個點需要注意:
- 首先備份損壞的數據,防止二次破壞;
- 評估哪些數據遭到損壞,然后嘗試恢復;
- 官方的支持是及時的,但是有時差存在。
一些參考,幫你了解數據損壞的更多細節:
WiredTiger存儲引擎之四:WT工具編譯與元數據文件剖析
MongoDB無法啟動,如何恢復數據?
MongoDB 無法啟動,WiredTiger 如何恢復數據(二)?