MySQL 短時間內磁盤暴漲導致 MySQL 空間被占滿

本文轉載自查看原文 2020-09-23 15:47 1022 MySQL

我們遇到了一次在下午 17:30 開始到晚上 23:00 期間 MySQL 存儲空間穩定下降最終導致 MySQL 可用空間使用完

可用空間監控圖展示為

可以看到從 17:00 開始存儲空間開始線性下降直至最后 1TB 空間被完全使用完。

當時查找問題時候的第一反應是不是有什么程序起起來周期性寫入某個數據表，最后把表寫爆導致了這個問題。所以想起來先看看 MySQL 的 binlog。但是比較遺憾的是我們機器的 MySQL 配置幾乎沒有什么優化和修改，

所以 binlog 的默認存儲大小是在 500 M 左右滾動，單個切分文件大小是 128M 我找了一下 aws 似乎沒有能設置 max-size 的地方，官方文檔也只寫了 aws 會盡快刪除和將 binlog 文件進行滾動。實際上它也是這么做的，一邊增加

滾動，然后刪除。設置單個文件 128M 並使得總的 binlog 文件維持在 500M 左右。

因為我們的 binlog 被大量數據沖掉了，發現的時候已經無法通過 binlog 回溯當時是否有大批量不符合預期的 insert 語句插入數據表。

於是我們換了一個方法，直接掃描數據表的 information_schema.table 和直接對一些表進行 count 查詢。以確定是否有大量不符合預期的數據被寫入數據庫，結果都完全沒有發現有任何異常。

平白無故多出 1TB 數據卻在數據庫表里無法體現？

非常奇怪的情況，於是重新去找了一下數據庫的 error.log 信息仔細看了一下得到以下信息

1 2020-09-22T14:39:43.466320Z 26855093 [ERROR] InnoDB: posix_fallocate(): Failed to preallocate data for file /rdsdbdata/db/innodb/ibtmp1, desired size 67108864 bytes. Operating system error number 28. Check that the disk is not full or a disk quota exceeded. Make sure the file system supports this function. Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/ operating-system-error-codes.html
2 2020-09-22T14:39:43.517029Z 26869328 [ERROR] Disk is full writing '/rdsdbdata/log/binlog/mysql-bin-changelog.672430' (Errcode: 15885504 - No space left on device). Waiting for someone to free space...
3 2020-09-22T14:39:43.517726Z 26869328 [ERROR] Retry in 60 secs. Message reprinted in 600 secs
4 2020-09-22T14:39:43.617317Z 26855093 [Warning] InnoDB: 1048576 bytes should have been written. Only 176128 bytes written. Retrying for the remaining bytes.
5 2020-09-22T14:39:43.617364Z 26855093 [Warning] InnoDB: Retry attempts for writing partial data failed.
6 2020-09-22T14:39:43.617370Z 26855093 [ERROR] InnoDB: Write to file /rdsdbdata/db/innodb/ibtmp1failed at offset 1283732799488, 1048576 bytes should have been written, only 176128 were written. Operating system error number 28. Check that your OS and file system support files of this size. Check also that the disk is not full or a disk quota exceeded.
7 2020-09-22T14:39:43.617387Z 26855093 [ERROR] InnoDB: Error number 28 means 'No space left on device'
8 2020-09-22T14:39:43.617391Z 26855093 [Note] InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html
9 2020-09-22T14:39:43.617403Z 26855093 [Warning] InnoDB: Error while writing 67108864 zeroes to /rdsdbdata/db/innodb/ibtmp1 starting at offset 1283670933504
10 2020-09-22T14:39:43.617444Z 26855093 [ERROR] /rdsdbbin/mysql/bin/mysqld: The table '/rdsdbdata/tmp/#sql_2111_1' is full
11 2020-09-22T14:39:43.617482Z 26854253 [ERROR] InnoDB: posix_fallocate(): Failed to preallocate data for file /rdsdbdata/db/innodb/ibtmp1, desired size 5066752 bytes. Operating system error number 28. Check that the disk is not full or a disk quota exceeded. Make sure the file system supports this function. Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/ operating-system-error-codes.html
12 2020-09-22T14:39:43.617658Z 26854253 [Warning] InnoDB: Retry attempts for writing partial data failed.
13 2020-09-22T14:39:43.617670Z 26854253 [Warning] InnoDB: Error while writing 5066752 zeroes to /rdsdbdata/db/innodb/ibtmp1 starting at offset 1283732975616
14 2020-09-22T14:39:43.617689Z 26854253 [ERROR] /rdsdbbin/mysql/bin/mysqld: The table '/rdsdbdata/tmp/#sql_2111_0' is full15 2020-09-22T14:43:24.521888Z 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4087ms. The settings might not be optimal. (flushed=20002020-09-22T14:57:43.521368Z 26870123 [Note] Aborted connection 26870123 (Got an error reading communication packets)

可以注意到第6行

6  2020-09-22T14:39:43.617370Z 26855093 [ERROR] InnoDB: Write to file /rdsdbdata/db/innodb/ibtmp1failed at offset 1283732799488, 1048576 bytes should have been written, only 176128 were written. Operating system error number 28. Check that your OS and file system support files of this size. Check also that the disk is not full or a disk quota exceeded.

臨時表存儲文件 ibtmp1 的偏移量高達 1283732799488kb => 1195gb 這不正好一個 t 嗎

看這個文件 ibtmp1 是和臨時表相關的，馬上反應過來是不是其實根本不是數據插入了，而是 join 創建臨時表過大導致的上述情況。

很明顯完全有可能在最后存儲耗盡存儲之后掛掉，然后被寫入 slow query 里面。

於是查找了掛掉時候的 slow_query.log 找到了這條執行了上萬 s 的自鏈接語句。

# Query_time: 19010.513418  Lock_time: 0.000048 Rows_sent: 0  Rows_examined: 876574
select f1.CreatorChannelId, f1.tweets, f1.createdAt, f2.updatedAt from xx f1, xx f2 where f1.createdAt != f2.createdAt ORDER BY updatedAt DESC limit 200;

問題很明顯了 xx 這個表是一個上千萬的大表。由於 createdAt 的時間有太多的不同，所以滿足這個條件的情況就會非常多，是一個很大很大的笛卡爾積。

由於我們 MySQL 沒有設置查詢語句超時時間，導致該語句需要的空間無限膨脹最終擊穿了我們的可用存儲。

High Light 那么最后總結一下，供參考：

〇在線上環境執行的 SQL，還是需要非常非常非常謹慎，包括通過 client 執行的錯誤查詢語句 cancel 掉之后需要仔細檢查是否真的已經被殺掉。
〇考慮是否可以限制 ibtmp1 的大小，也就是設定 innodb_temp_data_file_path 的最大值來防止過度膨脹。
〇也可以設置 MySQL 的超時時間來避免這種超長時間無法結束的語句一直執行消耗資源。
〇做好監控和及時響應，這次最讓人意外的就是主數據庫存儲資源一直降低卻最終無人發現，最后掛了才發現引發了線上的 P0 故障。稍微有一個環節做得再好一些都不至於如此。
〇所以一個系統有太多的問題一直拖延不解決，最終總是會以非常極端的事件進行反饋，這大概就是技術債吧。

Reference:

http://blog.itpub.net/29773961/viewspace-2142197/ 【MySQL】一條SQL使磁盤暴漲並導致MySQL Crash

https://zhuanlan.zhihu.com/p/82554609 從MySQL的ibtmp1文件太大說起

免責聲明！

本站轉載的文章為個人學習借鑒使用，本站對版權不負任何法律責任。如果侵犯了您的隱私權益，請聯系本站郵箱yoyou2525@163.com刪除。

猜您在找 短時間內多個請求狀態更新，導致react 不能及時響應問題總結如何短時間內快速通過Java面試 vue防止按鈕在短時間內被多次點擊的方法 vue前端防止按鈕在短時間內多次點擊【Javascript Demo】防止按鈕在短時間內被多次點擊如何避免在短時間內按鈕被多次重復點擊 Linux SSH遠程鏈接短時間內斷開 UWP忽略短時間內重復觸發的事件 Java面試，如何在短時間內做突擊連點處理，禁止按鈕在短時間內被多次點擊