Notes on diagnosing and fixing a Tomcat process that hangs on stop


Parts of this post are based on a CSDN article.

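In the test environment, some code was injected through an agent, including several Timer instances.

When Tomcat was restarted through the startup script, a stop process always hung, so Tomcat could not restart normally.

Running jstack against the Tomcat process showed no deadlocked threads; only two threads were in TIMED_WAITING. Both belonged to the two plain java.util.Timer instances injected by the agent. Using a plain Timer is strongly discouraged.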

        /**
         * 1. Poll route nodes from the queue; once the batch reaches Config.Message.NODES, send it to GRCC
         */
        new Timer("route-nodes-to-grcc-timer-1").scheduleAtFixedRate(new TimerTask() {
            @Override
            public void run() {
                try {
                    // NOTE: this loop never returns, so the timer thread never finishes its first run
                    while (true) {
                        try {
                            if(isCanPoll){
                                Node node = nodesLinkedQueue.poll();
                                if(null != node){
                                    nodes.add(node);
                                    if(nodes.size() >= Config.Message.NODES){
                                        try{
                                            semaphore.acquire();
                                            if(!nodes.isEmpty()){
                                                sendToGRCC(nodes);
                                            }
                                        }catch (Exception e){
                                            logger.error("Consumer task failed to send trace data to GRCC: "+e.getMessage());
                                        }finally {
                                            semaphore.release();
                                        }
                                    }
                                }
                            }
                            Thread.sleep(10L);
                        } catch (Exception e) {
                            logger.error("nodesLinkedQueue poll failed: "+e.getMessage());
                        }
                    }
                } catch (Exception e) {
                    logger.error("Scheduled task failed: "+e.getMessage());
                }
            }
        }, 1000L, 1000L);

        /**
         * 2. Send the collected route nodes to GRCC once every Config.Message.INTERVAL
         */
        new Timer("route-nodes-to-grcc-timer-2").scheduleAtFixedRate(new TimerTask() {
            @Override
            public void run() {
                /**
                 * Acquire the semaphore and send only when nodes is not empty
                 */
                if (!nodes.isEmpty()) {
                    try {
                        semaphore.acquire();
                        isCanPoll = false;
                        if (!nodes.isEmpty()) {
                            sendToGRCC(nodes);
                        }
                    } catch (Exception e) {
                        logger.error("Schedule task failed to send trace data to GRCC: " + e.getMessage());
                    } finally {
                        semaphore.release();
                        isCanPoll = true;
                    }
                }
            }
        },1000L,Config.Message.INTERVAL);
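
A likely reason these timers get in the way of a clean stop: `new Timer(String)` does not mark its thread as a daemon, so when the Timer is created from an ordinary non-daemon thread, the timer thread is non-daemon as well, and a non-daemon thread that never ends keeps the JVM from exiting. The sketch below only checks the daemon flag of a timer thread. The class and method names (`TimerDaemonCheck`, `runsOnDaemonThread`) are made up for illustration and are not part of the injected agent code.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class TimerDaemonCheck {

    /** Runs one task on the given Timer and reports whether the timer thread is a daemon. */
    static boolean runsOnDaemonThread(Timer timer) throws InterruptedException {
        final AtomicBoolean daemon = new AtomicBoolean();
        final CountDownLatch done = new CountDownLatch(1);
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                // This code executes on the Timer's own thread.
                daemon.set(Thread.currentThread().isDaemon());
                done.countDown();
            }
        }, 0L);
        try {
            if (!done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timer task did not run");
            }
        } finally {
            // Without cancel(), a non-daemon timer thread would keep the JVM alive.
            timer.cancel();
        }
        return daemon.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Timer(name)        daemon = "
                + runsOnDaemonThread(new Timer("demo-timer-1")));
        System.out.println("Timer(name, true)  daemon = "
                + runsOnDaemonThread(new Timer("demo-timer-2", true)));
    }
}
```

Note that `cancel()` only helps when the running task eventually returns; a task that loops forever, like the first timer above, keeps its thread alive regardless.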

On every Tomcat restart, while the process hung, Tomcat's log printed memory-leak warnings like the following:

02-Jan-2019 19:37:46.918 警告 [localhost-startStop-2] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [fx-route] appears to have started a thread named [dubbo-remoting-client-heartbeat-thread-2] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
 sun.misc.Unsafe.park(Native Method)
 java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
 java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
 java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
 java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 java.lang.Thread.run(Thread.java:745)
02-Jan-2019 19:37:46.920 警告 [localhost-startStop-2] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [fx-route] appears to have started a thread named [pool-13-thread-1] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
 sun.misc.Unsafe.park(Native Method)
 java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
 java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
 java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
 java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 java.lang.Thread.run(Thread.java:745)
02-Jan-2019 19:37:46.922 警告 [localhost-startStop-2] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [fx-route] appears to have started a thread named [pool-14-thread-1] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
 sun.misc.Unsafe.park(Native Method)
 java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
 java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
 java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
 java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 java.lang.Thread.run(Thread.java:745)
02-Jan-2019 19:37:46.923 警告 [localhost-startStop-2] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [fx-route] appears to have started a thread named [Abandoned connection cleanup thread] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
 java.lang.Object.wait(Native Method)
 java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
 java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
 com.mysql.jdbc.NonRegisteringDriver$1.run(NonRegisteringDriver.java:93)

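After some searching, the cause matched what the CSDN article mentioned above describes: threads that were never shut down, such as Timer threads.

After replacing Timer with ScheduledExecutorService, Tomcat restarted normally.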

        ScheduledExecutorService executor = Executors.newScheduledThreadPool(2, new DefaultNamedThreadFactory("route-nodes-to-grcc"));
        /**
         * 1. Poll route nodes from the queue; once the batch reaches Config.Message.NODES, send it to GRCC
         */
        executor.scheduleAtFixedRate(new Runnable() {
            @Override
            public void run() {
                try {
                    // NOTE: this loop still never returns, so it permanently occupies one of the two pool threads
                    while (true) {
                        try {
                            if(isCanPoll){
                                Node node = nodesLinkedQueue.poll();
                                if(null != node){
                                    nodes.add(node);
                                    if(nodes.size() >= Config.Message.NODES){
                                        try{
                                            semaphore.acquire();
                                            if(!nodes.isEmpty()){
                                                sendToGRCC(nodes);
                                            }
                                        }catch (Exception e){
                                            logger.error("Consumer task failed to send trace data to GRCC: "+e.getMessage());
                                        }finally {
                                            semaphore.release();
                                        }
                                    }
                                }
                            }
                            Thread.sleep(10L);
                        } catch (Exception e) {
                            logger.error("nodesLinkedQueue poll failed: "+e.getMessage());
                        }
                    }
                } catch (Exception e) {
                    logger.error("Scheduled task failed: "+e.getMessage());
                }
            }
        },1000L, 1000L, TimeUnit.MILLISECONDS);

        /**
         * 2. Send the collected route nodes to GRCC once every Config.Message.INTERVAL
         */
        executor.scheduleAtFixedRate(new Runnable() {
            @Override
            public void run() {
                /**
                 * Acquire the semaphore and send only when nodes is not empty
                 */
                if (!nodes.isEmpty()) {
                    try {
                        semaphore.acquire();
                        isCanPoll = false;
                        if (!nodes.isEmpty()) {
                            sendToGRCC(nodes);
                        }
                    } catch (Exception e) {
                        logger.error("Schedule task failed to send trace data to GRCC: " + e.getMessage());
                    } finally {
                        semaphore.release();
                        isCanPoll = true;
                    }
                }
            }
        },1000L,Config.Message.INTERVAL,TimeUnit.MILLISECONDS);
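
The code above does not show how the executor is shut down when the web application stops. As a minimal sketch, and not the author's actual code, the scheduler could be wrapped in an object with an explicit stop method. The names `StoppableScheduler`, `start`, `stop` and `ticks` are hypothetical, and calling `stop` from a lifecycle hook such as a `ServletContextListener.contextDestroyed` implementation is an assumption about how it would be wired in.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class StoppableScheduler {
    private final ScheduledExecutorService executor = Executors.newScheduledThreadPool(2);
    private final AtomicInteger ticks = new AtomicInteger();

    /** Schedules a short periodic task; each run returns promptly instead of looping forever. */
    void start(long periodMillis) {
        executor.scheduleAtFixedRate(new Runnable() {
            @Override
            public void run() {
                // Placeholder for "poll the queue and send one batch".
                ticks.incrementAndGet();
            }
        }, 0L, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** Stops scheduling and waits for the pool threads to end; returns true if they did. */
    boolean stop(long timeoutMillis) throws InterruptedException {
        // shutdown() cancels pending periodic runs under the default executor policy.
        executor.shutdown();
        if (!executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS)) {
            // Interrupts tasks that are still sleeping or blocked.
            executor.shutdownNow();
            return executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        }
        return true;
    }

    /** Number of completed task runs so far. */
    int ticks() {
        return ticks.get();
    }
}
```

The design point is that every scheduled run is short and returns, so `shutdown()` can take effect. A task containing `while (true)` that swallows interrupts would not stop even after `shutdownNow()`.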

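Alibaba's Java coding guidelines include the following two rules:

    • [Rule] Threads must be obtained from a thread pool; creating threads explicitly in application code is not allowed.
      Note: a thread pool reduces the time and system resources spent creating and destroying threads and mitigates resource exhaustion. Without a thread pool, the system may create large numbers of similar threads, exhausting memory or causing excessive context switching.
    • [Rule] Thread pools must not be created with Executors; use ThreadPoolExecutor directly. This makes the pool's operating rules explicit to whoever writes the code and avoids the risk of resource exhaustion.
      Note: the thread pools returned by Executors have the following drawbacks:
      1) FixedThreadPool and SingleThreadPool:
         The request queue may grow up to Integer.MAX_VALUE, so a large backlog of requests can accumulate and cause an OOM.
      2) CachedThreadPool and ScheduledThreadPool:
         The number of threads may grow up to Integer.MAX_VALUE, so a large number of threads can be created and cause an OOM.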
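
To make the second rule concrete, here is a minimal sketch of a pool built directly with `ThreadPoolExecutor`, with explicit thread limits, a bounded queue and a rejection policy. The class name `BoundedPools`, the method `newBoundedPool` and all sizes are illustrative assumptions, not values from the guidelines or from the code above.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class BoundedPools {

    /** Builds a pool whose thread count and queue length are both capped. */
    static ThreadPoolExecutor newBoundedPool(final String namePrefix,
                                             int coreSize, int maxSize, int queueCapacity) {
        final AtomicInteger seq = new AtomicInteger(1);
        ThreadFactory factory = new ThreadFactory() {
            @Override
            public Thread newThread(Runnable r) {
                Thread t = new Thread(r, namePrefix + "-" + seq.getAndIncrement());
                // Daemon threads do not block JVM exit.
                t.setDaemon(true);
                return t;
            }
        };
        return new ThreadPoolExecutor(
                coreSize, maxSize,
                60L, TimeUnit.SECONDS,                            // idle timeout for threads above coreSize
                new ArrayBlockingQueue<Runnable>(queueCapacity),  // bounded queue, so no unbounded backlog
                factory,
                new ThreadPoolExecutor.CallerRunsPolicy());       // when saturated, the submitting thread runs the task
    }
}
```

With `CallerRunsPolicy`, a saturated pool slows down the submitter instead of queueing without limit; other policies, such as rejecting with an exception, may fit better depending on the workload.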

