MongoDB数据损坏恢复



概要:MongoDB元数据损坏,通过自身--repair无法修复,使用更高版本--repair修复。

背景

MongoDB 3.2.6单点,因存储空间耗尽停止,修复存储空间后启动失败。

以下内容使用v3.2.22替代v3.2.6进行复盘,使用数据来自故障现场的备份,即v3.2.6的数据,特此说明。

启动日志:

2021-10-16T16:55:34.784+0800 I CONTROL  [initandlisten] MongoDB starting : pid=13724 port=27017 dbpath=/home/voyager/repair/data2 64-bit host=node
2021-10-16T16:55:34.785+0800 I CONTROL  [initandlisten] db version v3.2.22
2021-10-16T16:55:34.785+0800 I CONTROL  [initandlisten] git version: 105acca0d443f9a47c1a5bd608fd7133840a58dd
2021-10-16T16:55:34.785+0800 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.1e-fips 11 Feb 2013
2021-10-16T16:55:34.785+0800 I CONTROL  [initandlisten] allocator: tcmalloc
2021-10-16T16:55:34.785+0800 I CONTROL  [initandlisten] modules: none
2021-10-16T16:55:34.785+0800 I CONTROL  [initandlisten] build environment:
2021-10-16T16:55:34.785+0800 I CONTROL  [initandlisten]     distmod: rhel70
2021-10-16T16:55:34.785+0800 I CONTROL  [initandlisten]     distarch: x86_64
2021-10-16T16:55:34.785+0800 I CONTROL  [initandlisten]     target_arch: x86_64
2021-10-16T16:55:34.785+0800 I CONTROL  [initandlisten] options: { config: "./mongo.conf", storage: { dbPath: "/home/voyager/repair/data2" } }
2021-10-16T16:55:34.803+0800 I -        [initandlisten] Detected data files in /home/voyager/repair/data2 created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2021-10-16T16:55:34.803+0800 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=4G,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),verbose=(recovery_progress),
2021-10-16T16:55:34.839+0800 E STORAGE  [initandlisten] WiredTiger (0) [1634374534:839158][13724:0x7f2dc0282dc0], file:WiredTiger.wt, connection: read checksum error for 4096B block at offset 102400: block header checksum of 2702779842 doesn't match expected checksum of 3569473964
2021-10-16T16:55:34.839+0800 E STORAGE  [initandlisten] WiredTiger (0) [1634374534:839200][13724:0x7f2dc0282dc0], file:WiredTiger.wt, connection: WiredTiger.wt: encountered an illegal file format or internal value
2021-10-16T16:55:34.839+0800 E STORAGE  [initandlisten] WiredTiger (-31804) [1634374534:839211][13724:0x7f2dc0282dc0], file:WiredTiger.wt, connection: the process must exit and restart: WT_PANIC: WiredTiger library panic
2021-10-16T16:55:34.839+0800 I -        [initandlisten] Fatal Assertion 28558
2021-10-16T16:55:34.839+0800 I -        [initandlisten] 

***aborting after fassert() failure


2021-10-16T16:55:34.855+0800 F -        [initandlisten] Got signal: 6 (Aborted).

 0x1354f62 0x13540b9 0x13548c2 0x7f2dbee715d0 0x7f2dbeacb207 0x7f2dbeacc8f8 0x12d5122 0x10c8943 0x9790db 0x9792d8 0x9794a6 0x19ea16c 0x19ea6d3 0x19e6f69 0x19eb969 0x1a04bd5 0x1a3e08b 0x1ac40ed 0x1ac4609 0x1ac482b 0x1a4b876 0x1ac0fed 0x1a8576f 0x1a8586e 0x1a3a81a 0x10afe36 0x10ac083 0xfd4a6e 0x9c4510 0x979dbd 0x7f2dbeab73d5 0x9bfed7
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"F54F62","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"F540B9"},{"b":"400000","o":"F548C2"},{"b":"7F2DBEE62000","o":"F5D0"},{"b":"7F2DBEA95000","o":"36207","s":"gsignal"},{"b":"7F2DBEA95000","o":"378F8","s":"abort"},{"b":"400000","o":"ED5122","s":"_ZN5mongo13fassertFailedEi"},{"b":"400000","o":"CC8943"},{"b":"400000","o":"5790DB","s":"__wt_eventv"},{"b":"400000","o":"5792D8","s":"__wt_err"},{"b":"400000","o":"5794A6","s":"__wt_panic"},{"b":"400000","o":"15EA16C","s":"__wt_block_extlist_read"},{"b":"400000","o":"15EA6D3","s":"__wt_block_extlist_read_avail"},{"b":"400000","o":"15E6F69","s":"__wt_block_checkpoint_load"},{"b":"400000","o":"15EB969"},{"b":"400000","o":"1604BD5","s":"__wt_btree_open"},{"b":"400000","o":"163E08B","s":"__wt_conn_btree_open"},{"b":"400000","o":"16C40ED","s":"__wt_session_get_btree"},{"b":"400000","o":"16C4609","s":"__wt_session_get_btree"},{"b":"400000","o":"16C482B","s":"__wt_session_get_btree_ckpt"},{"b":"400000","o":"164B876","s":"__wt_curfile_open"},{"b":"400000","o":"16C0FED"},{"b":"400000","o":"168576F","s":"__wt_metadata_cursor_open"},{"b":"400000","o":"168586E","s":"__wt_metadata_cursor"},{"b":"400000","o":"163A81A","s":"wiredtiger_open"},{"b":"400000","o":"CAFE36","s":"_ZN5mongo18WiredTigerKVEngineC2ERKSsS2_S2_mbbb"},{"b":"400000","o":"CAC083"},{"b":"400000","o":"BD4A6E","s":"_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv"},{"b":"400000","o":"5C4510"},{"b":"400000","o":"579DBD","s":"main"},{"b":"7F2DBEA95000","o":"223D5","s":"__libc_start_main"},{"b":"400000","o":"5BFED7"}],"processInfo":{ "mongodbVersion" : "3.2.22", "gitVersion" : "105acca0d443f9a47c1a5bd608fd7133840a58dd", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.10.0-957.el7.x86_64", "version" : "#1 SMP Thu Nov 8 23:39:32 UTC 2018", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "8AA2D2ED91A86A9E3F04BAF741D95CC01B610ED4" }, { "b" : "7FFDB8D3D000", "elfType" : 3, "buildId" : "163C2DC43405427478788BAD0AFD537A7ACF7A13" }, { "b" : "7F2DBFE05000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "3B305C3BA17FE394862E749763F2956C9C890C2E" }, { "b" : "7F2DBF9A2000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "4CF1939F660008CFA869D8364651F31AACD2C1C4" }, { "b" : "7F2DBF79A000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "EFDE2029C9A4A20BE5B8D8AE7E6551FF9B5755D2" }, { "b" : "7F2DBF596000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "67AD3498AC7DE3EAB952A243094DF5C12A21CD7D" }, { "b" : "7F2DBF294000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "918D3696BF321AA8D32950AB2AB8D0F1B21AC907" }, { "b" : "7F2DBF07E000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "6B4F3D896CD0F06FCB3DEF0245F204ECE3220D7E" }, { "b" : "7F2DBEE62000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "3D9441083D079DC2977F1BD50C8068D11767232D" }, { "b" : "7F2DBEA95000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "3C61131D1DAC9DA79B73188E7702BEF786C2AD54" }, { "b" : "7F2DC0077000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "5DA2D47925497B2F5875A7D8D1799A1227E2FDE4" }, { "b" : "7F2DBE848000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "E2AA8CA3D3164E7DBEC293BFA0B55D2B10DAC05D" }, { "b" : "7F2DBE55F000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "3EE7267AF7BFD3B132E6A222D997DA09C96C90DD" }, { "b" : "7F2DBE35B000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "67E935BFABA2C914C01156B88947DD515EA51170" }, { "b" : "7F2DBE128000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "82E28CACB60C27CD6F14A6D2268F0CFF621664D0" }, { "b" : "7F2DBDF12000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "B9D5F73428BD6AD68C96986B57BEA3B7CEDB9745" }, { "b" : "7F2DBDD02000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "4F5FBB2087BE132892467C4E7A46A3D07E5DA40B" }, { "b" : "7F2DBDAFE000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "2E01D5AC08C1280D013AAB96B292AC58BC30A263" }, { "b" : "7F2DBD8E5000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "4C488F6E7044BB966162C1F7081ABBA6EBB2B485" }, { "b" : "7F2DBD6BE000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "D2DD4DA3FDE1477D25BFFF80F3A25FDB541A8179" }, { "b" : "7F2DBD45C000", "path" : "/lib64/libpcre.so.1", "elfType" : 3, "buildId" : "9CA3D11F018BEEB719CDB34BE800BF1641350D0A" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x1354f62]
 mongod(+0xF540B9) [0x13540b9]
 mongod(+0xF548C2) [0x13548c2]
 libpthread.so.0(+0xF5D0) [0x7f2dbee715d0]
 libc.so.6(gsignal+0x37) [0x7f2dbeacb207]
 libc.so.6(abort+0x148) [0x7f2dbeacc8f8]
 mongod(_ZN5mongo13fassertFailedEi+0x82) [0x12d5122]
 mongod(+0xCC8943) [0x10c8943]
 mongod(__wt_eventv+0x440) [0x9790db]
 mongod(__wt_err+0x8D) [0x9792d8]
 mongod(__wt_panic+0x2E) [0x9794a6]
 mongod(__wt_block_extlist_read+0x6C) [0x19ea16c]
 mongod(__wt_block_extlist_read_avail+0x33) [0x19ea6d3]
 mongod(__wt_block_checkpoint_load+0x359) [0x19e6f69]
 mongod(+0x15EB969) [0x19eb969]
 mongod(__wt_btree_open+0xBD5) [0x1a04bd5]
 mongod(__wt_conn_btree_open+0x16B) [0x1a3e08b]
 mongod(__wt_session_get_btree+0xDD) [0x1ac40ed]
 mongod(__wt_session_get_btree+0x5F9) [0x1ac4609]
 mongod(__wt_session_get_btree_ckpt+0xAB) [0x1ac482b]
 mongod(__wt_curfile_open+0x86) [0x1a4b876]
 mongod(+0x16C0FED) [0x1ac0fed]
 mongod(__wt_metadata_cursor_open+0x5F) [0x1a8576f]
 mongod(__wt_metadata_cursor+0x7E) [0x1a8586e]
 mongod(wiredtiger_open+0x191A) [0x1a3a81a]
 mongod(_ZN5mongo18WiredTigerKVEngineC2ERKSsS2_S2_mbbb+0x5B6) [0x10afe36]
 mongod(+0xCAC083) [0x10ac083]
 mongod(_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv+0x59E) [0xfd4a6e]
 mongod(+0x5C4510) [0x9c4510]
 mongod(main+0x15D) [0x979dbd]
 libc.so.6(__libc_start_main+0xF5) [0x7f2dbeab73d5]
 mongod(+0x5BFED7) [0x9bfed7]
-----  END BACKTRACE  -----
Aborted (core dumped)


关键信息:

2021-10-16T16:55:34.839+0800 E STORAGE  [initandlisten] WiredTiger (0) [1634374534:839158][13724:0x7f2dc0282dc0], file:WiredTiger.wt, connection: read checksum error for 4096B block at offset 102400: block header checksum of 2702779842 doesn't match expected checksum of 3569473964
2021-10-16T16:55:34.839+0800 E STORAGE  [initandlisten] WiredTiger (0) [1634374534:839200][13724:0x7f2dc0282dc0], file:WiredTiger.wt, connection: WiredTiger.wt: encountered an illegal file format or internal value
2021-10-16T16:55:34.839+0800 E STORAGE  [initandlisten] WiredTiger (-31804) [1634374534:839211][13724:0x7f2dc0282dc0], file:WiredTiger.wt, connection: the process must exit and restart: WT_PANIC: WiredTiger library panic
2021-10-16T16:55:34.839+0800 I -        [initandlisten] Fatal Assertion 28558

错误日志中已经表明是“WiredTiger.wt”损坏,它属于元数据,因此这起事件为元数据损坏。

使用mongod --repair修复时,报错与启动时一致。

数据修复方法

下载和安装v4.0的最新版本(测试的时候是4.0.27),用这个版本的mongod --repair修复损坏的数据,具体为:

mongod --dbpath <损坏的数据目录> --repair

修复日志关键信息:

[voyager@node repair]$ ./mongodb-4.0.27/bin/mongod -f ./mongo.conf --repair
2021-10-16T17:09:43.658+0800 I CONTROL  [main] Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'
2021-10-16T17:09:43.662+0800 I CONTROL  [initandlisten] MongoDB starting : pid=15243 port=27017 dbpath=/home/voyager/repair/data2 64-bit host=node
2021-10-16T17:09:43.662+0800 I CONTROL  [initandlisten] db version v4.0.27
2021-10-16T17:09:43.662+0800 I CONTROL  [initandlisten] git version: d47b151b55f286546e7c7c98888ae0577856ca20
2021-10-16T17:09:43.663+0800 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.1e-fips 11 Feb 2013
2021-10-16T17:09:43.663+0800 I CONTROL  [initandlisten] allocator: tcmalloc
2021-10-16T17:09:43.663+0800 I CONTROL  [initandlisten] modules: none
2021-10-16T17:09:43.663+0800 I CONTROL  [initandlisten] build environment:
2021-10-16T17:09:43.663+0800 I CONTROL  [initandlisten]     distmod: rhel70
2021-10-16T17:09:43.663+0800 I CONTROL  [initandlisten]     distarch: x86_64
2021-10-16T17:09:43.663+0800 I CONTROL  [initandlisten]     target_arch: x86_64
2021-10-16T17:09:43.663+0800 I CONTROL  [initandlisten] options: { config: "./mongo.conf", repair: true, storage: { dbPath: "/home/voyager/repair/data2" } }
2021-10-16T17:09:43.675+0800 I STORAGE  [initandlisten] Detected data files in /home/voyager/repair/data2 created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.

......

2021-10-16T17:09:58.111+0800 W STORAGE  [initandlisten] Modifications made by repair:
2021-10-16T17:09:58.111+0800 W STORAGE  [initandlisten]   WiredTiger metadata salvaged
2021-10-16T17:09:58.115+0800 I STORAGE  [initandlisten] Finished adjusting the table logging settings for existing WiredTiger tables

......

2021-10-16T17:10:00.174+0800 I STORAGE  [initandlisten] Starting to check the table logging settings for existing WiredTiger tables
2021-10-16T17:10:01.592+0800 I STORAGE  [initandlisten] shutdown: removing fs lock...
2021-10-16T17:10:01.592+0800 I -        [initandlisten] Dropping the scope cache for shutdown
2021-10-16T17:10:01.592+0800 I CONTROL  [initandlisten] now exiting
2021-10-16T17:10:01.592+0800 I CONTROL  [initandlisten] shutting down with code:0

[END] 2021/10/16 17:10:09

使用修复后的数据首次启动v3.2.22会有非常多的初始化和检查导致的日志,这个过程结束后将成功启动:

......

2021-10-16T17:18:43.789+0800 I NETWORK  [HostnameCanonicalizationWorker] Starting hostname canonicalization worker
2021-10-16T17:18:43.789+0800 I FTDC     [initandlisten] Initializing full-time diagnostic data capture with directory '/home/voyager/repair/data2/diagnostic.data'
2021-10-16T17:18:43.790+0800 I NETWORK  [initandlisten] waiting for connections on port 27017

过程和总结

数据修复的过程并不是很顺利,期间查找到一些资料,但都是基于修复原理的论述,按照这些原理我不能快速的达成修复数据的目的,能够看到的修复成功的案例都来自官方的帮助,官方基于损坏的数据和错误日志尽力修复元数据,帮助客户的进程成功启动,于是我也向官方寻求帮助,最终得到回复并成功修复了数据,梳理下时间线:

  1. 发现数据损坏,并确认是元数据损坏,--repair修复失败,保留现场数据;
  2. 在网络上寻求帮助,无果,期间尝试使用wt工具,因一些运行报错放弃(主要因为数据不是非常重要,并且还有其他途径弥补数据丢失带来的损失,并且这个时候已经有了处理这个事件的方案,可以放弃数据恢复);
  3. 出于好奇尝试恢复数据,在官方服务SERVER-19815中找到线索,尝试使用v4.2.7 --repair方式恢复,报错失败,提示降低版本;
  4. 寻求官方服务的帮助,申请服务单SERVER-60703,大概24小时后,官方回复并给出建议,首先建议尝试使用最新版本--repair修复,然后遵从官方建议开始尝试:
    a. v5.0.3提升版本过高,建议v4.2和更早的版本;
    b. 因为一开始就尝试过v4.2,所以直接测试v4.0.27,然后成功。

如果你阅读了SERVER-19815可以发现,官方说“FIX VERSIONS: This issue is fixed in MongoDB 4.0.3 as well as in 4.1.4, and will be available in the 4.2 production release.”,这也是我在尝试v4.2.7之后没有进行更多尝试的原因。

有几个点需要注意:

  1. 首先备份损坏的数据,防止二次破坏;
  2. 评估哪些数据遭到损坏,然后尝试恢复;
  3. 官方的支持是及时的,但是有时差存在。

一些参考,帮你了解数据损坏的更多细节:
WiredTiger存储引擎之四:WT工具编译与元数据文件剖析
MongoDB无法启动,如何恢复数据?
MongoDB 无法启动,WiredTiger 如何恢复数据(二)?


免责声明!

本站转载的文章为个人学习借鉴使用,本站对版权不负任何法律责任。如果侵犯了您的隐私权益,请联系本站邮箱yoyou2525@163.com删除。



 
粤ICP备18138465号  © 2018-2025 CODEPRJ.COM