問題背景:
筆者所在的項目組最近把生產環境Tomcat遷移到Linux,算是順利運行了一段時間,最近一個低概率密度的(too many open files)問題導致服務假死並停止響應客戶端客戶端請求。
進入服務器查看日志,發現tomcat凌晨6-7點的日志丟失,查看進程端口仍舊開放。
root@# lsof -i:8080 COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME java 29511 root 67u IPv4 147648 0t0 TCP *:8080 (LISTEN)
從存檔的日志找到一些端倪,發現凌晨1點的日志就開始出現異常
23-Dec-2018 01:12:12.514 嚴重 [http-nio-83-Acceptor-0] org.apache.tomcat.util.net.Acceptor.run Socket accept failed java.io.IOException: 打開的文件過多 at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250) at org.apache.tomcat.util.net.NioEndpoint.serverSocketAccept(NioEndpoint.java:448) at org.apache.tomcat.util.net.NioEndpoint.serverSocketAccept(NioEndpoint.java:70) at org.apache.tomcat.util.net.Acceptor.run(Acceptor.java:95) at java.lang.Thread.run(Thread.java:748)
錯誤日志一直追到凌晨6-7點,這個錯誤仍舊比較高密度的出現
23-Dec-2018 07:35:06.932 嚴重 [http-nio-83-Acceptor-0] org.apache.tomcat.util.net.Acceptor.run Socket accept failed java.io.IOException: 打開的文件過多 at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250) at org.apache.tomcat.util.net.NioEndpoint.serverSocketAccept(NioEndpoint.java:448) at org.apache.tomcat.util.net.NioEndpoint.serverSocketAccept(NioEndpoint.java:70) at org.apache.tomcat.util.net.Acceptor.run(Acceptor.java:95) at java.lang.Thread.run(Thread.java:748) 23-Dec-2018 07:35:07.692 嚴重 [http-nio-81-Acceptor-0] org.apache.tomcat.util.net.Acceptor.run Socket accept failed java.io.IOException: 打開的文件過多 at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250) at org.apache.tomcat.util.net.NioEndpoint.serverSocketAccept(NioEndpoint.java:448) at org.apache.tomcat.util.net.NioEndpoint.serverSocketAccept(NioEndpoint.java:70) at org.apache.tomcat.util.net.Acceptor.run(Acceptor.java:95) at java.lang.Thread.run(Thread.java:748) 23-Dec-2018 07:35:08.532 嚴重 [http-nio-83-Acceptor-0] org.apache.tomcat.util.net.Acceptor.run Socket accept failed java.io.IOException: 打開的文件過多 at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250) at org.apache.tomcat.util.net.NioEndpoint.serverSocketAccept(NioEndpoint.java:448) at org.apache.tomcat.util.net.NioEndpoint.serverSocketAccept(NioEndpoint.java:70) at org.apache.tomcat.util.net.Acceptor.run(Acceptor.java:95) at java.lang.Thread.run(Thread.java:748)
故障排除步驟:
筆者最初懷疑自己在預安裝生產環境的過程中忘記調優內核參數,於是按步驟查詢了最大文件打開數:
root@# ulimit -a core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 1031211 max locked memory (kbytes, -l) 64 max memory size (kbytes, -m) unlimited open files (-n) 4096 pipe size (512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 8192 cpu time (seconds, -t) unlimited max user processes (-u) 1031211 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited
open files那一行就代表系統目前允許單個進程打開的最大句柄數,這里是4096。
按照這個參數想了一下,即使class文件沒有打包,項目中也不會消耗這么多文件句柄的。
有打開內存參數確認了一下
vim /etc/security/limits.conf
#<domain> <type> <item> <value> # #* soft core 0 #root hard core 100000 #* hard rss 10000 #@student hard nproc 20 #@faculty soft nproc 20 #@faculty hard nproc 50 #ftp hard nproc 0 #ftp - chroot /ftp #@student - maxlogins 4 # End of file *soft nofile 63335 *hard nofile 65535
這一切看上去都沒有文件,文件句柄看上去足夠tomcat使用。
查看tomcat進程
jps
或者
ps -ef | grep -i 'bootstrap.jar' | grep -v grep | awk '{print $2}'
查得tomcat進程
root@# ps -ef | grep -i 'bootstrap.jar' | grep -v grep | awk '{print $2}' 29511
筆者調大文件句柄最多打開數
然后用命令查看進程打開句柄數
lsof -p 29511| wc -l
root@# lsof -p 29511| wc -l
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/108/gvfs
Output information may be incomplete.
265
root@# lsof -p 29511| wc -l
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/108/gvfs
Output information may be incomplete.
268
通過以下命令查看進程打開的文件句柄詳情:
root@# lsof -p 29511 lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/108/gvfs Output information may be incomplete. COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME java 29511 root cwd DIR 8,16 53248 160366607 /opt/tomcat-9.0.13/logs java 29511 root rtd DIR 8,1 4096 2 / java 29511 root txt REG 8,1 7734 15729394 /opt/jdk1.8.0_181/bin/java java 29511 root mem REG 8,1 41080 23855198 /var/cache/fontconfig/945677eb7aeaf62f1d50efc3fb3ec7d8-le64.cache-6 java 29511 root mem REG 8,1 20584 23855180 /var/cache/fontconfig/2cd17615ca594fa2959ae173292e504c-le64.cache-6 java 29511 root mem REG 8,1 84432 23855174 /var/cache/fontconfig/04aabc0a78ac019cf9454389977116d2-le64.cache-6 java 29511 root mem REG 8,1 12392 23855184 /var/cache/fontconfig/385c0604a188198f04d133e54aba7fe7-le64.cache-6 java 29511 root mem REG 8,1 265387 15731055 /opt/jdk1.8.0_181/jre/lib/amd64/libjpeg.so java 29511 root mem REG 8,1 493889 15731072 /opt/jdk1.8.0_181/jre/lib/amd64/libt2k.so java 29511 root mem REG 8,1 525421 15731090 /opt/jdk1.8.0_181/jre/lib/amd64/libfontmanager.so java 29511 root mem REG 8,1 3416 23855177 /var/cache/fontconfig/0d8c3b2ac0904cb8a57a757ad11a4a08-le64.cache-6 java 29511 root mem REG 8,1 3144 23855179 /var/cache/fontconfig/1ac9eb803944fde146138c791f5cc56a-le64.cache-6 java 29511 root mem REG 8,1 1632 23855211 /var/cache/fontconfig/dc05db6664285cc2f12bf69c139ae4c3-le64.cache-6 java 29511 root mem REG 8,1 8640 23855194 /var/cache/fontconfig/767a8244fc0220cfb567a839d0392e0b-le64.cache-6 java 29511 root mem REG 8,1 2320 23860602 /var/cache/fontconfig/4794a0821666d79190d59a36cb4f44b5-le64.cache-6 java 29511 root mem REG 8,1 17256 23855197 /var/cache/fontconfig/8801497958630a81b71ace7c5f9b32a8-le64.cache-6 java 29511 root mem REG 8,1 1800 23855204 /var/cache/fontconfig/bab58bb527bb656aaa9f116d68a48d89-le64.cache-6 java 29511 root mem REG 8,1 25168 23855181 /var/cache/fontconfig/3047814df9a2f067bd2d96a2b9c36e5a-le64.cache-6 java 29511 root mem REG 8,1 1816 23855188 /var/cache/fontconfig/56cf4f4769d0f4abc89a4895d7bd3ae1-le64.cache-6 java 29511 root mem REG 8,1 3184 23855203 /var/cache/fontconfig/b9d506c9ac06c20b433354fa67a72993-le64.cache-6 java 29511 root mem REG 8,1 21520 23855202 /var/cache/fontconfig/b47c4e1ecd0709278f4910c18777a504-le64.cache-6 java 29511 root mem REG 8,1 76688 23855208 /var/cache/fontconfig/d52a8644073d54c13679302ca1180695-le64.cache-6 java 29511 root mem REG 8,1 1816 23855187 /var/cache/fontconfig/551ecf3b0e8b0bca0f25c0944f561853-le64.cache-6 java 29511 root mem REG 8,1 15216 23855209 /var/cache/fontconfig/d589a48862398ed80a3d6066f4f56f4c-le64.cache-6 ..... java 29511 root 133r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 134r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 135r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 136r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 137r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 138r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 139r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 140r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 141r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 142r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 143r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 144r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 145r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 146r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 147r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 148w REG 8,16 1416 160368886 /opt/tomcat-9.0.13/logs/logs_tomcat.log
發現一直還在長,於是查看進程打開文件,多次對比查看打開的文件句柄,發現tomcat-users.xml只增不減
java 29511 root 171r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 172r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 173r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 174r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 175r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 176r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 177r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 178r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 179r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 180r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 181r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 182r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 183r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 184r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 185r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 186r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 187r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 188r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 189r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 190r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 191r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 192r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 193r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 194r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 195r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 196r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 197r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 198r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 199r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 200r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 201r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 202r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 203r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml java 29511 root 204r REG 8,16 2164 160366602 /opt/tomcat-9.0.13/conf/tomcat-users.xml
至此,問題單症結基本確定了,就是tomcat這里
通過Google查詢:too many tomcat-users.xml open by tomcat
最終確定這是tomcat9.0.13的一個bug:Tomcat too many files open (tomcat-users.xml)
This was caused by a bug in Tomcat 9.0.13 which has been fixed in Tomcat 9.0.14.
升級tomcat到9.0.14解決問題
筆者按:以筆者目前粗淺的win32知識觸類旁通來分析這個問題,進程打開的文件句柄數達到設定的最大值,在向操作系統申請資源的時候一直處於等待狀態,操作系統不再分撥資源
給他,於是進程還在,沒死掉,只是資源被他耗盡,待客戶端來訪問的時候,他還會去申請文件句柄,一直就處於假死狀態了。
參考:
Tomcat報java.io.IOException: 打開的文件過多
too many open files(打開的文件過多)解決方法
Error in tomcat “too many open files”
How to really fix the too many open files problem for Tomcat in Ubuntu