Symptoms
The host hangs for no apparent reason, and the services running on it stop responding.
The logs fill up with: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Log output:
# tail -f /var/log/messages
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: INFO: task keepalived:21553 blocked for more than 120 seconds.
kernel: INFO: task vnetd:18082 blocked for more than 120 seconds.
kernel: INFO: task zabbix_agentd:15274 blocked for more than 120 seconds.
kernel: INFO: task jbd2/dm-3-8:848 blocked for more than 120 seconds.
kernel: INFO: task pickup:21858 blocked for more than 120 seconds.
kernel: INFO: task xfsaild/dm-0:476 blocked for more than 120 seconds.
Runtime journal is using 832.0M (max allowed 794.3M, trying to leave 1.1G free of 6.9G available → current limit 832.0M)
# dmesg |grep '/proc/sys/kernel/hung_task_timeout_secs' -B 1
[51140129.902940] INFO: task systemd:1 blocked for more than 120 seconds.
[51140129.902992] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
--
[51140129.903265] INFO: task xfsaild/dm-0:476 blocked for more than 120 seconds.
[51140129.903298] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
--
[51140129.903636] INFO: task jbd2/dm-3-8:848 blocked for more than 120 seconds.
[51140129.903668] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
--
[51140129.903796] INFO: task keepalived:21553 blocked for more than 120 seconds.
[51140129.903829] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
--
[51140129.904034] INFO: task vnetd:18082 blocked for more than 120 seconds.
[51140269.655352] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
--
[51140269.655521] INFO: task zabbix_agentd:15274 blocked for more than 120 seconds.
[51140269.655546] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
--
[51140269.655638] INFO: task zabbix_agentd:15275 blocked for more than 120 seconds.
[51140269.655661] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
--
[51140269.655745] INFO: task zabbix_agentd:15276 blocked for more than 120 seconds.
[51140269.655767] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
--
[51140269.655852] INFO: task kworker/4:0:29226 blocked for more than 120 seconds.
[51140269.655874] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
--
[51140269.656181] INFO: task pickup:21858 blocked for more than 120 seconds.
[51140269.656204] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
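To see at a glance which tasks are reported most often, the log lines above can be condensed into a per-task count. The following is a minimal sketch, not part of the original troubleshooting session; the function name summarize_hung_tasks is hypothetical, and it assumes the "INFO: task NAME:PID blocked for more than N seconds" format shown above.

```shell
# Count hung-task reports per task name.
# Reads kernel log text on stdin and prints "COUNT NAME", most frequent first.
# summarize_hung_tasks is a hypothetical helper name used for illustration.
summarize_hung_tasks() {
    # Keep only the task name; the greedy match leaves the trailing :PID out,
    # so names such as kworker/4:0 survive intact.
    sed -nE 's/.*INFO: task (.+):[0-9]+ blocked for more than [0-9]+ seconds.*/\1/p' \
        | sort | uniq -c | sort -k1,1nr -k2,2 \
        | sed -E 's/^ +//'
}
```

It can be fed from either source used above, for example: dmesg | summarize_hung_tasks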
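Analysis:
The hint "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" only disables the hung-task check, so that the messages above stop appearing. It does not fix the underlying stall, so disabling the check is not recommended.
The kernel parameter is set to kernel.hung_task_timeout_secs = 120, which means the kernel reports any task that stays blocked in uninterruptible sleep (D state, typically waiting on I/O) for more than 120 seconds. It is a detection threshold, not a time limit on writing memory to disk.
Since the blocked tasks are waiting on I/O, the likely cause is that writing memory back to disk backed up and stalled I/O, leaving the system unresponsive. The sequence is as follows: dirty pages first reach vm.dirty_background_ratio, which wakes the flush threads to write them back asynchronously, while applications can still keep writing. If the applications write more than the flush threads can flush, the dirty pages eventually reach the limit set by vm.dirty_ratio. At that point the operating system switches to handling dirty pages synchronously and blocks the application processes.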
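Whether dirty pages are actually piling up can be checked in /proc/meminfo, which exposes Dirty and Writeback counters. A minimal sketch follows; the helper name show_dirty_pages is hypothetical, and the optional file argument is only there so that a saved copy of meminfo can be inspected as well.

```shell
# Print the Dirty and Writeback counters from a meminfo-format file.
# Defaults to /proc/meminfo; pass another path to inspect a saved copy.
# show_dirty_pages is a hypothetical helper name used for illustration.
show_dirty_pages() {
    awk '$1 == "Dirty:" || $1 == "Writeback:" { print $1, $2, $3 }' \
        "${1:-/proc/meminfo}"
}
```

Running show_dirty_pages repeatedly (for example under watch) while the host is under write load shows whether Dirty keeps growing faster than it is written back.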
Root cause:
By default, older Linux kernels let dirty file system cache grow to as much as 40% of the available memory (vm.dirty_ratio; many newer kernels default to 20%).
Once this mark is reached, the kernel flushes the outstanding data to disk, and all subsequent writes become synchronous.
The hung-task check has a default threshold of 120 seconds. In this case the I/O subsystem is not fast enough to flush the data within 120 seconds, so the tasks waiting on it are reported as blocked.
Because the I/O system responds slowly, more and more requests pile up until memory is used up and the system stops responding.
This is a side effect of the Linux delayed-write mechanism, and it is more likely on hosts with a lot of memory, because the same percentage corresponds to more data.
According to the source of this explanation, the problem is solved in later kernels.
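Solution:
Depending on the workload, tune the two parameters vm.dirty_ratio and vm.dirty_background_ratio.
Tuning approach:
- Reduce the proportion of dirty data, so that a flush does not run past the timeout
- Shorten the time dirty data stays in memory, so that it does not pile up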
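Apply temporarily
Check the current values first with "sysctl vm.dirty_ratio vm.dirty_background_ratio"; the values below only help if they are lower than what the system already uses.
# sysctl -w vm.dirty_ratio=40
# sysctl -w vm.dirty_background_ratio=10
Persist the kernel parameters
# vi /etc/sysctl.conf
vm.dirty_ratio = 40
vm.dirty_background_ratio = 10
# sysctl -p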
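After editing the file, it can be worth confirming that the value written to /etc/sysctl.conf is the one that ends up in effect. The sketch below is an illustration only; sysctl_conf_value is a hypothetical helper, and it assumes the simple "key = value" format shown above, with comment lines starting with # or ;.

```shell
# Print the value configured for a key in a sysctl.conf-style file.
# If the key appears more than once, the last occurrence is printed,
# because later lines override earlier ones when the file is applied.
# sysctl_conf_value is a hypothetical helper name used for illustration.
sysctl_conf_value() {
    # $1 = key (e.g. vm.dirty_ratio), $2 = path to the configuration file
    awk -F= -v key="$1" '
        /^[ \t]*[#;]/ { next }            # skip comment lines
        {
            k = $1; v = $2
            gsub(/^[ \t]+|[ \t]+$/, "", k)  # trim whitespace around the key
            gsub(/^[ \t]+|[ \t]+$/, "", v)  # trim whitespace around the value
            if (k == key) last = v
        }
        END { if (last != "") print last }
    ' "$2"
}
```

For example, the output of "sysctl_conf_value vm.dirty_ratio /etc/sysctl.conf" can be compared with "sysctl -n vm.dirty_ratio". If they differ, another file (for instance under /etc/sysctl.d) may be overriding the setting.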
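vm.dirty_background_ratio is the percentage of memory that may fill with dirty data before background writeback starts. This dirty data is written to disk later; the pdflush/flush/kdmflush background threads clean it up. For example, with 32 GB of memory and a value of 10, up to 3.2 GB of dirty data can sit in memory, and beyond 3.2 GB the background threads start cleaning it up. (The kernel measures the percentage against available memory rather than total memory, so the real threshold is somewhat lower.)
vm.dirty_ratio is the hard limit on dirty data: the percentage of dirty data in memory may not exceed this value. Once it is exceeded, dirty data is forcibly written to disk and new I/O requests are blocked until the dirty data has been written out. This is a major cause of I/O stalls, but it is also the safeguard that keeps an excessive amount of dirty data from accumulating in memory.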
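The 32 GB example can be worked through for both thresholds. This is a rough sketch under a stated simplification: it applies the percentage to total memory, whereas the kernel applies it to available memory, so real thresholds are lower. The function name dirty_threshold_mb is hypothetical.

```shell
# Approximate dirty-data threshold in MB for a memory size and a ratio.
# Simplification: uses total memory; the kernel bases the percentage on
# available (free plus reclaimable) memory, so real thresholds are lower.
# dirty_threshold_mb is a hypothetical helper name used for illustration.
dirty_threshold_mb() {
    # $1 = memory in MB, $2 = ratio in percent (integer arithmetic)
    echo $(( $1 * $2 / 100 ))
}

# 32 GB host with the values used in this article:
dirty_threshold_mb 32768 10   # vm.dirty_background_ratio = 10: background writeback starts
dirty_threshold_mb 32768 40   # vm.dirty_ratio = 40: writing processes start to block
```

With these inputs the two results are roughly 3.2 GB and 12.8 GB. The second figure is the amount of dirty data that may have to reach disk while writers are blocked, which is why a slow disk on a large-memory host can exceed the 120-second threshold.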
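vm.dirty_expire_centisecs sets how long dirty data may live in memory, in hundredths of a second. Here its value is 3000, that is, 30 seconds. When the pdflush/flush/kdmflush threads run, they check whether any data is older than this limit and, if so, write it to disk asynchronously. After all, data that sits in memory for too long is at risk of being lost.
vm.dirty_writeback_centisecs sets how often the pdflush/flush/kdmflush threads wake up, in hundredths of a second.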
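Because both timing parameters are expressed in hundredths of a second, it is easy to misread them. The sketch below converts a value to seconds and, where /proc/sys/vm is readable, prints the current settings; centisecs_to_secs is a hypothetical helper name.

```shell
# Convert a value in centiseconds (1/100 s) to seconds with two decimals.
# centisecs_to_secs is a hypothetical helper name used for illustration.
centisecs_to_secs() {
    awk -v cs="$1" 'BEGIN { printf "%.2f\n", cs / 100 }'
}

# Show the current timing settings in seconds, if the files are readable.
for name in dirty_expire_centisecs dirty_writeback_centisecs; do
    path="/proc/sys/vm/$name"
    if [ -r "$path" ]; then
        value=$(cat "$path")
        echo "vm.$name = $value ($(centisecs_to_secs "$value") s)"
    fi
done
```

Lowering these values follows the second tuning idea above: dirty data is written out sooner, so less of it accumulates between flushes.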
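Dirty data
Dirty data: the Linux kernel implements a primary disk cache called the page cache. Because of the page cache, write operations are in effect deferred. When data in the page cache is newer than the data on the backing storage, that data is called dirty data.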
References:
https://blog.csdn.net/weixin_43279032/article/details/87718804
http://ilinuxkernel.com/?p=1578
Page cache, dirty data and other I/O terminology: https://blog.51cto.com/qixue/1906775