記一次CountDownLatch引發的問題


背景:使用countDownLatch,10個線程計算結果,最后通過主線程來匯總結果;

有個spring scheduler的程序,運行一段時間后,不運行了,查看日志發現CountDownLatch無法歸零,導致整個主線程Hang在那里,此時如果調用其外部接口還能響應:

懷疑是不是死鎖了? 查一下jstack發現沒有出現死鎖:

"Attach Listener" daemon prio=10 tid=0x00007fc1b4001000 nid=0x162e4 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Abandoned connection cleanup thread" daemon prio=10 tid=0x00007fc1801e5800 nid=0x1e632 in Object.wait() [0x00007fc1ec79a000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
    - locked <0x0000000703f97058> (a java.lang.ref.ReferenceQueue$Lock)
    at com.mysql.jdbc.AbandonedConnectionCleanupThread.run(AbandonedConnectionCleanupThread.java:43)

"Tomcat JDBC Pool Cleaner[1485406391:1509516002697]" daemon prio=10 tid=0x00007fc1801b8000 nid=0x1e631 in Object.wait() [0x00007fc1ec89b000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.util.TimerThread.mainLoop(Timer.java:552)
    - locked <0x000000070406b8d8> (a java.util.TaskQueue)
    at java.util.TimerThread.run(Timer.java:505)

"http-nio-8119-Acceptor-0" daemon prio=10 tid=0x00007fc19ec27800 nid=0x2d11 runnable [0x00007fc1ba16b000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
    - locked <0x000000070408f7d8> (a java.lang.Object)
    at org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(NioEndpoint.java:686)
    at java.lang.Thread.run(Thread.java:745)

"http-nio-8119-ClientPoller-1" daemon prio=10 tid=0x00007fc19f9a2800 nid=0x2d10 runnable [0x00007fc1cc1af000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
    - locked <0x000000070408fbe0> (a sun.nio.ch.Util$2)
    - locked <0x000000070408fbf0> (a java.util.Collections$UnmodifiableSet)
    - locked <0x000000070408fb98> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
    at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:1049)
    at java.lang.Thread.run(Thread.java:745)

"http-nio-8119-ClientPoller-0" daemon prio=10 tid=0x00007fc19d143000 nid=0x2d0f runnable [0x00007fc1cc2b0000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
    - locked <0x000000070416a268> (a sun.nio.ch.Util$2)
    - locked <0x000000070416a278> (a java.util.Collections$UnmodifiableSet)
    - locked <0x000000070416a220> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
    at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:1049)
    at java.lang.Thread.run(Thread.java:745)

"NioBlockingSelector.BlockPoller-1" daemon prio=10 tid=0x00007fc19e89b000 nid=0x2d0e runnable [0x00007fc1cc3b1000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
    - locked <0x0000000703d3c3a0> (a sun.nio.ch.Util$2)
    - locked <0x0000000703d3c390> (a java.util.Collections$UnmodifiableSet)
    - locked <0x0000000703d3c258> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
    at org.apache.tomcat.util.net.NioBlockingSelector$BlockPoller.run(NioBlockingSelector.java:342)

"pool-1-thread-1" prio=10 tid=0x00007fc1a0749800 nid=0x2d0b waiting on condition [0x00007fc1cc4b1000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000007047385d8> (a java.util.concurrent.CountDownLatch$Sync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
    at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:236)
    at com.didi.bigdataqa.tools.HiveBaseTool.doHealthyScore(HiveBaseTool.java:139)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:65)
    at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
    at org.springframework.scheduling.concurrent.ReschedulingRunnable.run(ReschedulingRunnable.java:81)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

"pool-2-thread-1" daemon prio=10 tid=0x00007fc1a0381000 nid=0x2ceb waiting on condition [0x00007fc1cc5b3000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000007037d9338> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1090)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

"cluster-ClusterId{value='59f823ec4d62e12bc9d4a556', description='null'}-localhost:27017" daemon prio=10 tid=0x00007fc1a1734800 nid=0x2cea waiting on condition [0x00007fc1ec497000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000007037c1d30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
    at com.mongodb.connection.DefaultServerMonitor$ServerMonitorRunnable.waitForSignalOrTimeout(DefaultServerMonitor.java:224)
    at com.mongodb.connection.DefaultServerMonitor$ServerMonitorRunnable.waitForNext(DefaultServerMonitor.java:205)
    at com.mongodb.connection.DefaultServerMonitor$ServerMonitorRunnable.run(DefaultServerMonitor.java:153)
    - locked <0x00000007037c19e8> (a com.mongodb.connection.DefaultServerMonitor$ServerMonitorRunnable)
    at java.lang.Thread.run(Thread.java:745)

"commons-pool-EvictionTimer" daemon prio=10 tid=0x00007fc19ecc5800 nid=0x2cbb in Object.wait() [0x00007fc1ec396000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.util.TimerThread.mainLoop(Timer.java:552)
    - locked <0x0000000703b48458> (a java.util.TaskQueue)
    at java.util.TimerThread.run(Timer.java:505)

"container-0" prio=10 tid=0x00007fc19dfd7800 nid=0x2c95 waiting on condition [0x00007fc1ec598000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.catalina.core.StandardServer.await(StandardServer.java:407)
    at org.springframework.boot.context.embedded.tomcat.TomcatEmbeddedServletContainer$1.run(TomcatEmbeddedServletContainer.java:139)

"ContainerBackgroundProcessor[StandardEngine[Tomcat].StandardHost[localhost].StandardContext[/healthy_score]]" daemon prio=10 tid=0x00007fc199247800 nid=0x2c94 waiting on condition [0x00007fc1ec699000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1345)
    at java.lang.Thread.run(Thread.java:745)

"DestroyJavaVM" prio=10 tid=0x00007fc208009000 nid=0x2bca waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Service Thread" daemon prio=10 tid=0x00007fc208244800 nid=0x2bdc runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=10 tid=0x00007fc208242000 nid=0x2bdb waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=10 tid=0x00007fc20823f000 nid=0x2bda waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00007fc20823d000 nid=0x2bd9 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x00007fc208210000 nid=0x2bd7 in Object.wait() [0x00007fc1edb29000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
    - locked <0x000000070001e8f0> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

"Reference Handler" daemon prio=10 tid=0x00007fc20820e000 nid=0x2bd6 in Object.wait() [0x00007fc1edc2a000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:503)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
    - locked <0x00000006fff96908> (a java.lang.ref.Reference$Lock)

"VM Thread" prio=10 tid=0x00007fc208209800 nid=0x2bd5 runnable 

"GC task thread#0 (ParallelGC)" prio=10 tid=0x00007fc20801f000 nid=0x2bcb runnable 

"GC task thread#1 (ParallelGC)" prio=10 tid=0x00007fc208021000 nid=0x2bcc runnable 

"GC task thread#2 (ParallelGC)" prio=10 tid=0x00007fc208023000 nid=0x2bcd runnable 

"GC task thread#3 (ParallelGC)" prio=10 tid=0x00007fc208025000 nid=0x2bce runnable 

"GC task thread#4 (ParallelGC)" prio=10 tid=0x00007fc208026800 nid=0x2bcf runnable 

"GC task thread#5 (ParallelGC)" prio=10 tid=0x00007fc208028800 nid=0x2bd0 runnable 

"GC task thread#6 (ParallelGC)" prio=10 tid=0x00007fc20802a800 nid=0x2bd1 runnable 

"GC task thread#7 (ParallelGC)" prio=10 tid=0x00007fc20802c800 nid=0x2bd2 runnable 

"GC task thread#8 (ParallelGC)" prio=10 tid=0x00007fc20802e800 nid=0x2bd3 runnable 

"GC task thread#9 (ParallelGC)" prio=10 tid=0x00007fc208030800 nid=0x2bd4 runnable 

"VM Periodic Task Thread" prio=10 tid=0x00007fc20824f800 nid=0x2bdd waiting on condition 

JNI global references: 151
View Code

多次輸入jstack觀察:

 

jstack 輸出 nid=0x2d0f  查看這個線程ID,轉換為十進制為11535

查看進程下的線程,下面這幾個都是沒有釋放的線程

 ps -T -p 11209

 

 11209  11534 ?        00:00:15 java

 11209  11535 ?        00:00:17 java

 11209  11536 ?        00:00:17 java

 11209  11537 ?        00:00:00 java

 

在使用countDownLatch的await時,加入一個超時等待時間,這樣就可以避免這個問題,但是問題的根源是什么呢?

cnt.await(1, TimeUnit.HOURS);

繼續定位,懷疑是哪個地方沒有設置超時機制,導致一直等待,或許哪里出現了死循環? jstack打印任務的線程都已經看不到了,只能看到進程
代碼里面沒看到死循環的地方,懷疑是GC的問題,導致停機了
最后再定位GC,發現有個List的超級大,最大的時候有30多w,堆設置的有點小。。。
果斷加大堆內存空間得到解決


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM