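Symptoms: the server's memory usage kept growing, performance was poor, concurrency could not be raised, and the server went down every few days.
Approach: identify the module that leaks memory, locate the performance bottlenecks, and tune the JVM.
Tools used: jconsole, jprofiler
The charts recorded in jconsole during testing suggested either an improperly configured JVM or a memory leak.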

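A thread dump showed that most threads were blocked in HttpClient's connection-acquisition method (MultiThreadedHttpConnectionManager.doGetConnection). This suggested that HttpClient connections were not being released promptly.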
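The blocking pattern in the thread dump can be reproduced in miniature. The sketch below is a hypothetical stand-in, not HttpClient code. A java.util.concurrent.Semaphore plays the role of the connection limit, and a caller that never releases its permit permanently shrinks the pool. Later callers then wait and time out, much like the threads parked in doGetConnection.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of a bounded connection pool (not the HttpClient implementation). */
public class PoolStarvationDemo {
    private final Semaphore permits;

    public PoolStarvationDemo(int maxConnections) {
        this.permits = new Semaphore(maxConnections);
    }

    /** Waits up to timeoutMs for a free connection; false means the pool stayed exhausted. */
    public boolean acquire(long timeoutMs) {
        try {
            return permits.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Returns a connection to the pool. */
    public void release() {
        permits.release();
    }

    /** Runs requests one after another; with leak=true no request returns its connection. */
    public static int successfulRequests(int poolSize, int requests, boolean leak) {
        PoolStarvationDemo pool = new PoolStarvationDemo(poolSize);
        int ok = 0;
        for (int i = 0; i < requests; i++) {
            if (!pool.acquire(50)) {
                continue; // timed out waiting for a free connection
            }
            ok++;
            if (!leak) {
                pool.release(); // returning the connection keeps the pool usable
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        System.out.println("with release: " + successfulRequests(2, 10, false)); // 10
        System.out.println("leaking:      " + successfulRequests(2, 10, true));  // 2
    }
}
```

With a pool of two, the leaking variant serves exactly two requests and every later one times out. This is why the fix below focuses on releasing connections.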
Problem 1: HttpClient connections were not actually released
Thread dump log:
"[ACTIVE] ExecuteThread: '3' for queue: 'weblogic.kernel.Default (self-tuning)'" id=50 idx=0xbc tid=1596 prio=5 alive, in native, waiting, daemon
-- Waiting for notification on: org/apache/commons/httpclient/MultiThreadedHttpConnectionManager$ConnectionPool@0x03801D20[fat lock]
at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
at java/lang/Object.wait(J)V(Native Method)[optimized]
at org/apache/commons/httpclient/MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:518)[optimized]
^-- Lock released while waiting: org/apache/commons/httpclient/MultiThreadedHttpConnectionManager$ConnectionPool@0x03801D20[fat lock]
at org/apache/commons/httpclient/MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)[optimized]
at org/apache/commons/httpclient/HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)[optimized]
at org/apache/commons/httpclient/HttpClient.executeMethod(HttpClient.java:397)[optimized]
at org/apache/commons/httpclient/HttpClient.executeMethod(HttpClient.java:346)[inlined]
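Solution: always release the connection after use. To avoid a buildup of sockets in CLOSE_WAIT, also close idle connections periodically:
try {
    httpClient.executeMethod(httpMethod);
} finally {
    httpMethod.releaseConnection(); // return the connection to the pool even if the request fails
}
connectionManager.closeIdleConnections(3000); // run periodically: close connections idle for more than 3000 ms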
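What an idle sweep does can be sketched with the standard library alone. In this hypothetical model, the class and method names are illustrative and are not the HttpClient API. Each pooled connection records when it was returned, and a sweep closes the ones that have been idle longer than the threshold. In a real deployment the sweep would run on a timer, for example a ScheduledExecutorService, rather than once per request.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

/** Hypothetical idle-connection sweep; times are passed in explicitly so the logic is testable. */
public class IdleSweepSketch {
    /** Stand-in for a pooled socket; "closing" only sets a flag here. */
    public static final class PooledConn {
        final int id;
        long lastReleasedAtMs;
        boolean closed;

        public PooledConn(int id) {
            this.id = id;
        }
    }

    private final Deque<PooledConn> idle = new ArrayDeque<>();

    /** Puts a connection back on the idle list and stamps the release time. */
    public void release(PooledConn c, long nowMs) {
        c.lastReleasedAtMs = nowMs;
        idle.addLast(c);
    }

    /** Closes and drops every idle connection unused for longer than idleTimeoutMs; returns how many were closed. */
    public int closeIdleConnections(long idleTimeoutMs, long nowMs) {
        int closedCount = 0;
        for (Iterator<PooledConn> it = idle.iterator(); it.hasNext(); ) {
            PooledConn c = it.next();
            if (nowMs - c.lastReleasedAtMs > idleTimeoutMs) {
                c.closed = true;
                it.remove();
                closedCount++;
            }
        }
        return closedCount;
    }

    public int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) {
        IdleSweepSketch pool = new IdleSweepSketch();
        pool.release(new PooledConn(1), 0);    // idle since t=0
        pool.release(new PooledConn(2), 2500); // idle since t=2500
        // Sweep at t=4000 with a 3000 ms threshold: only connection 1 is stale.
        int n = pool.closeIdleConnections(3000, 4000);
        System.out.println(n + " closed, " + pool.idleCount() + " still idle"); // 1 closed, 1 still idle
    }
}
```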
Problem 2: JVM tuning, with the JMX service enabled so that jconsole can connect
SUN JDK
-Xms1024m -Xmx1024m -Xss192k -XX:PermSize=192M -XX:+UseParallelGC -XX:ParallelGCThreads=8 -Djava.rmi.server.hostname=172.30.0.232 -Dcom.sun.management.jmxremote.port=7001 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
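With these options, jconsole can attach remotely (for example to 172.30.0.232:7001). The numbers it charts come from the JVM's platform MXBeans. The sketch below reads a few of the same MXBeans in-process through java.lang.management. This is useful as a quick check that the flags above actually took effect, or for logging heap and thread counts when no GUI is available. The class name is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.List;

/** Hypothetical helper that prints a few of the metrics jconsole displays. */
public class JvmSnapshot {
    /** The JVM options this process was started with, e.g. -Xmx1024m. */
    public static List<String> inputArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    /** Heap currently in use, in MB. */
    public static long heapUsedMb() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed() / (1024 * 1024);
    }

    /** Number of live threads, the same value as jconsole's thread chart. */
    public static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    public static void main(String[] args) {
        System.out.println("JVM args:     " + inputArguments());
        System.out.println("heap used MB: " + heapUsedMb());
        System.out.println("live threads: " + liveThreads());
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("GC " + gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms total");
        }
    }
}
```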
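[After tuning]
The server ran a 72-hour stress test and served 2.91 million requests in total. During that time the thread count rose by 100 and memory usage rose by 50 MB. We judge that the server is currently running normally, and today's work focused on explaining why the thread count and memory grew.
Memory usage: the lowest level that memory now drops back to after collection is about 170 MB, compared with about 120 MB before, an increase of about 50 MB.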

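Thread chart: there were two rounds of thread growth, and both heap and stack memory usage rose with them. This growth differs from the previous incident, based on evidence from thread dump analysis, log analysis, and the WebLogic console. Last time, too many files under tmp caused I/O timeouts when reading local files, which made the thread count surge (400 threads). This time the cause appears to be network trouble (the ERR log contains some Read Time Out exceptions) that blocked the network for a period, and WebLogic's self-tuning thread manager responded by adding threads. In addition, a single host is limited to at most 100 connections, which matches the size of this thread increase. More threads mean more stack memory (about 20 MB), and the objects those threads use naturally grow the heap (about 50 MB), which also seems reasonable. On this basis we judge that the server is currently running normally.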
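The stack and heap figures above agree with a rough back-of-the-envelope check. This assumes each of the 100 extra threads reserves the full 192 KB stack set by -Xss192k; the operating system may actually commit less:

```latex
\begin{aligned}
\Delta M_{\text{stack}} &\approx \Delta N_{\text{threads}} \times \text{Xss} = 100 \times 192\ \text{KB} = 19{,}200\ \text{KB} = 18.75\ \text{MB} \approx 20\ \text{MB} \\
\Delta M_{\text{heap}} &\approx 170\ \text{MB} - 120\ \text{MB} = 50\ \text{MB}
\end{aligned}
```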
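LoadRunner charts: the network seemed to be in good shape over the weekend. TPS reached 16 and response times stayed under 2 s, which is better than in the previous charts. As before, the network performed better in the early morning than during the day.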

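Problem 3: Memory leak
Analysis with the jmap command showed that jtidy was leaking memory. The cause was that resources were not released when an exception occurred. The org.w3c.tidy.Node instance count kept growing: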
#instances   #bytes       class name
1371376      164565120    org.w3c.tidy.Node
1413005      169560600    org.w3c.tidy.Node
1437838      172540560    org.w3c.tidy.Node
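The fix for Problem 3 is to release resources on the exception path as well as on the success path. The sketch below is a hypothetical illustration and does not use the JTidy API. A parse session keeps the nodes it builds reachable from a long-lived list until close() runs. The leaky variant skips close() when parsing throws, so nodes accumulate. The try-with-resources variant always closes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a parser that leaks objects if it is not closed on errors. */
public class ReleaseOnErrorSketch {
    /** Stands in for any long-lived holder (static cache, pool) that keeps nodes reachable. */
    public static final List<Object> LIVE_NODES = new ArrayList<>();

    static final class ParseSession implements AutoCloseable {
        private final List<Object> nodes = new ArrayList<>();

        void parse(String html) {
            for (int i = 0; i < html.length(); i++) {
                Object node = new Object(); // stand-in for a parsed DOM node
                nodes.add(node);
                LIVE_NODES.add(node);
            }
            if (html.contains("<bad")) {
                throw new IllegalArgumentException("malformed input");
            }
        }

        @Override
        public void close() {
            LIVE_NODES.removeAll(nodes); // drop the references so the nodes can be collected
            nodes.clear();
        }
    }

    /** BUG pattern: close() is skipped whenever parse() throws. */
    public static boolean leakyParse(String html) {
        ParseSession s = new ParseSession();
        try {
            s.parse(html);
            s.close();
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    /** Fixed pattern: try-with-resources runs close() before the catch block. */
    public static boolean safeParse(String html) {
        try (ParseSession s = new ParseSession()) {
            s.parse(html);
            return true;
        } catch (IllegalArgumentException e) {
            return false; // log and continue; the session is already closed
        }
    }

    public static void main(String[] args) {
        safeParse("<p>ok</p>");
        safeParse("<bad html");
        System.out.println("nodes still referenced: " + LIVE_NODES.size()); // 0
    }
}
```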
Problem 4: The number of connections spiked; analysis found that the disk was full
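A full disk, like the earlier pile-up of files under tmp, is easy to catch before it stalls threads. Below is a minimal watchdog sketch. The class name, the default directory, and the 10% threshold are all arbitrary assumptions; you would point it at the real log or tmp path and run it from cron or a monitoring agent. It uses only java.io.File's space queries.

```java
import java.io.File;

/** Hypothetical free-space check for the partition holding a given directory. */
public class DiskSpaceCheck {
    /** True when usable space is below minFreeRatio of the total; unknown sizes (total <= 0) are not flagged. */
    public static boolean belowThreshold(long usableBytes, long totalBytes, double minFreeRatio) {
        if (totalBytes <= 0) {
            return false;
        }
        return (double) usableBytes / totalBytes < minFreeRatio;
    }

    public static void main(String[] args) {
        // Directory to watch: first argument, else the JVM temp dir (an illustrative default).
        File dir = new File(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        long usable = dir.getUsableSpace(); // 0 if the path does not name a partition
        long total = dir.getTotalSpace();
        System.out.printf("%s: %d MB usable of %d MB%n", dir, usable >> 20, total >> 20);
        if (belowThreshold(usable, total, 0.10)) {
            System.out.println("WARNING: less than 10% of the disk is free");
        }
    }
}
```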

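Problem 5: Performance bottlenecks. JProfiler was used to find the methods that consumed the most time, and those methods were then optimized.
The analysis showed that string operations and XSLT transformations of XML were expensive. Both were optimized, with caching used where appropriate.
Problem 6: The database connection limit was reached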
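One common way to cache XSLT work with only the JDK is to compile each stylesheet once into a javax.xml.transform.Templates object. Templates is documented as thread-safe, so it can be shared across requests. Each call then creates a cheap, non-shared Transformer from it. The sketch below makes some assumptions: the key scheme and class name are hypothetical, and the original does not say exactly what was cached.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical cache of compiled XSLT stylesheets. */
public class XsltCache {
    private static final TransformerFactory FACTORY = TransformerFactory.newInstance();
    // Compiled stylesheets keyed by a caller-chosen name; Templates instances are thread-safe.
    private static final ConcurrentMap<String, Templates> CACHE = new ConcurrentHashMap<>();

    /** Compiles the stylesheet the first time a key is seen and reuses it afterwards. */
    public static Templates compiled(String key, String xslt) {
        return CACHE.computeIfAbsent(key, k -> {
            try {
                synchronized (FACTORY) { // TransformerFactory itself is not guaranteed thread-safe
                    return FACTORY.newTemplates(new StreamSource(new StringReader(xslt)));
                }
            } catch (TransformerConfigurationException e) {
                throw new IllegalStateException("cannot compile stylesheet " + k, e);
            }
        });
    }

    /** Applies the cached stylesheet; a new Transformer per call because Transformer is not thread-safe. */
    public static String transform(String key, String xslt, String xml) {
        StringWriter out = new StringWriter();
        try {
            compiled(key, xslt).newTransformer()
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        } catch (TransformerException e) {
            throw new IllegalStateException("transform failed", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String xslt = "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + "<xsl:output method='text'/>"
                + "<xsl:template match='/'><xsl:value-of select='/a/b'/></xsl:template>"
                + "</xsl:stylesheet>";
        for (int i = 0; i < 3; i++) {
            System.out.println(transform("demo", xslt, "<a><b>hello " + i + "</b></a>")); // hello 0, 1, 2
        }
    }
}
```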
####<2011-8-25 上午08時21分07秒 CST> <Info> <Console> <RD-DCM-03> <AdminServer> <[ACTIVE] ExecuteThread: '32' for queue: 'weblogic.kernel.Default (self-tuning)'> <weblogic> <> <> <1314231667171> <BEA-240002> <Struts module /core is configured to use com.bea.console.internal.ConsolePageFlowRequestProcessor as the request processor, but the <controller> element does not contain a <set-property> for "controllerClass". Page Flow actions in this module may not be handled correctly.>
####<2011-8-25 上午08時21分07秒 CST> <Info> <Console> <RD-DCM-03> <AdminServer> <[ACTIVE] ExecuteThread: '32' for queue: 'weblogic.kernel.Default (self-tuning)'> <weblogic> <> <> <1314231667171> <BEA-240001> <Attempting to instantiate SharedFlowControllers for request /console/core/CoreServerThreadStackDump.do>
####<2011-8-25 上午08時21分07秒 CST> <Info> <Console> <RD-DCM-03> <AdminServer> <[ACTIVE] ExecuteThread: '32' for queue: 'weblogic.kernel.Default (self-tuning)'> <weblogic> <> <> <1314231667171> <BEA-240001> <<ConsoleInteraction: User is viewing <com.bea.console.actions.core.server.ThreadStackDumpAction> for <Server> <com.bea.console.handles.JMXHandle%28%22com.bea%3AName%3DAdminServer%2CType%3DServer%22%29>>>
####<2011-8-25 上午08時21分17秒 CST> <Info> <Common> <RD-DCM-03> <AdminServer> <[ACTIVE] ExecuteThread: '26' for queue: 'weblogic.kernel.Default (self-tuning)'> <<anonymous>> <> <> <1314231677484> <BEA-000627> <Reached maximum capacity of pool "dcm", making "0" new resource instances instead of "1".>
####<2011-8-25 上午08時21分17秒 CST> <Info> <Common> <RD-DCM-03> <AdminServer> <[ACTIVE] ExecuteThread: '27' for queue: 'weblogic.kernel.Default (self-tuning)'> <<anonymous>> <> <> <1314231677500> <BEA-000627> <Reached maximum capacity of pool "dcm", making "0" new resource instances instead of "1".>
The JDBC connection pool had reached its limit; the fix was to adjust the pool's Max Capacity setting (JDBC Connection Max Capacity).
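Before raising Max Capacity, it can help to know how often the limit was hit and for which pool. A larger pool only helps if connections are also returned promptly, as in Problem 1. Below is a minimal, hypothetical log scanner for the BEA-000627 message quoted above. The regular expression is written against those sample lines only.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical counter of "Reached maximum capacity of pool" messages per pool name. */
public class PoolExhaustionScanner {
    // Matches the BEA-000627 message format shown in the log and captures the pool name.
    private static final Pattern MAX_CAPACITY =
            Pattern.compile("<BEA-000627> <Reached maximum capacity of pool \"([^\"]+)\"");

    public static Map<String, Integer> countByPool(List<String> logLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = MAX_CAPACITY.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String hit = "<BEA-000627> <Reached maximum capacity of pool \"dcm\", "
                + "making \"0\" new resource instances instead of \"1\".>";
        List<String> sample = Arrays.asList(
                "<BEA-240001> <Attempting to instantiate SharedFlowControllers for request /console/core/CoreServerThreadStackDump.do>",
                hit,
                hit);
        System.out.println(countByPool(sample)); // {dcm=2}
    }
}
```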

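Problem 7: Other issues
When jps was used to look up the Java process ID, it reported "process information unavailable", and jstat could not connect either. As a result, the JVM's runtime state could not be viewed graphically, which made troubleshooting harder. The cause turned out to be an underscore ("_") in the administrator account name. After the startup script was run under a different administrator account, everything worked normally.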
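jps and jstat discover local HotSpot JVMs through per-user performance-data files. These live in a directory named hsperfdata_<user> under the temp directory: typically /tmp on Linux and %TEMP% on Windows. Checking that this directory exists and contains a file named after the server's PID is therefore a quick first step when those tools report "process information unavailable". The helper below is a hypothetical sketch of that check; the /tmp default is an assumption for Linux.

```java
import java.io.File;

/** Hypothetical helper that locates the hsperfdata directory jps/jstat read from. */
public class PerfDataLocator {
    /** Builds the conventional HotSpot path <tmpDir>/hsperfdata_<userName>. */
    public static File perfDataDir(String tmpDir, String userName) {
        return new File(tmpDir, "hsperfdata_" + userName);
    }

    public static void main(String[] args) {
        // Assumption: /tmp on Linux; pass the temp directory explicitly on other platforms.
        String tmp = args.length > 0 ? args[0] : "/tmp";
        File dir = perfDataDir(tmp, System.getProperty("user.name"));
        System.out.println(dir + (dir.isDirectory() ? " exists" : " is missing"));
        String[] entries = dir.list(); // one file per running JVM, named after its PID
        if (entries != null) {
            for (String pid : entries) {
                System.out.println("  " + pid);
            }
        }
    }
}
```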