今日線上突發一個問題,應用假死,不接受http請求,排查日志,發現有大量的 The last packet successfully received from 異常。
首先排查數據庫和鏈接池配置
<bean id="dataSource" class="org.apache.tomcat.jdbc.pool.DataSource"
destroy-method="close">
<!-- Connection Info -->
<property name="driverClassName" value="${jdbc.driver}"/>
<property name="url" value="${jdbc.url}"/>
<property name="username" value="${jdbc.username}"/>
<property name="password" value="${jdbc.password}"/>
<!-- Connection Pooling Info -->
<property name="maxActive" value="${jdbc.pool.maxActive}"/>
<property name="maxIdle" value="${jdbc.pool.maxIdle}"/>
<property name="defaultAutoCommit" value="true"/>
<property name="validationQuery" value="${jdbc.validationQuery}"/>
<property name="testOnBorrow" value="true"/>
<!-- 連接Idle一個小時后超時 -->
<property name="timeBetweenEvictionRunsMillis" value="3600000"/>
<property name="minEvictableIdleTimeMillis" value="3600000"/>
</bean>
對比mysql 配置 的timeout配置
鏈接池的超時時間遠遠小於 數據庫的wait_timeout ,配置沒有問題
wait_timeout wait_timeout 服務器關閉非交互連接之前等待活動的秒數
數據庫wait_timeout配置的3600s(1小時)、業務線程閑置狀態下和數據庫保持的連接存活1個小時后數據庫主動斷開連接,這個時候有新的數據庫操作請求、拿到該連接去執行validationQuery檢測連接是否有效,由於數據庫已經主動斷開連接、執行檢測sql就會拋出上面的錯誤。問題的本質還是druid線程池里沒有及時清除無效的數據庫連接導致。
但數據庫的判斷認為的連接超時時間是 timeBetweenEvictionRunsMillis = 3600000 < wait_timeout = 31536000 故問題不在此
排查應用程序
使用arthas監控線程,果不其然,出現了10running的http-nio線程,打開一個查看一下,果然有線程死鎖
[arthas@23223]$ thread 167
"http-nio-30030-exec-113" Id=167 RUNNABLE (in native)
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:112)
at com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:159)
at com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:187)
- locked com.mysql.jdbc.util.ReadAheadInputStream@5c14cebc
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3140)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3597)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3586)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4131)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2597)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2758)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2820)
- locked com.mysql.jdbc.JDBC4Connection@eba881e
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2769)
at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:907)
- locked com.mysql.jdbc.JDBC4Connection@eba881e
at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:791)
at net.sf.log4jdbc.StatementSpy.execute(StatementSpy.java:842)
at org.apache.tomcat.jdbc.pool.PooledConnection.validate(PooledConnection.java:455)
at org.apache.tomcat.jdbc.pool.PooledConnection.validate(PooledConnection.java:395)
at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:777)
at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:618)
at org.apache.tomcat.jdbc.pool.ConnectionPool.getConnection(ConnectionPool.java:188)
at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:128)
at org.springside.modules.persistence.Hibernates.getJdbcUrlFromDataSource(Hibernates.java:47)
at org.springside.modules.persistence.Hibernates.getDialect(Hibernates.java:30)
at cn.infisa.obstetrics.respository.DaoUtils.getDatabaseDialect(DaoUtils.java:130)
at cn.infisa.obstetrics.respository.DaoUtils.getUrl(DaoUtils.java:151)
at cn.infisa.obstetrics.respository.DaoUtils.getDataList(DaoUtils.java:174)
at cn.infisa.obstetrics.respository.hospital.PatientInfoJdbcDao.getTimeline(PatientInfoJdbcDao.java:136)
at cn.infisa.obstetrics.service.hospital.PatientInfoService.getTimeline(PatientInfoService.java:74)
at cn.infisa.obstetrics.service.hospital.PatientInfoMenuService.assemPatientTimeline(PatientInfoMenuService.java:375)
at cn.infisa.obstetrics.web.record.PatientInfoController.getTimeline(PatientInfoController.java:134)
at sun.reflect.GeneratedMethodAccessor149.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
線程調用過程中,有三把鎖,分別位於:
- com.mysql.jdbc.StatementImpl.execute synchronized (locallyScopedConn.getConnectionMutex()) Line:797
- com.mysql.jdbc.ConnectionImpl.execSQL synchronized (getConnectionMutex()) Line:2779
- com.mysql.jdbc.util.ReadAheadInputStream public synchronized int read(byte b[], int off, int len) throws IOException Line 176
最后鎖住在 socketRead0 方法,而socketRead0 是一個native方法,有一個參數表示方法的超時設置-timeout
private int socketRead(FileDescriptor fd,
byte b[], int off, int len,
int timeout)
throws IOException {
return socketRead0(fd, b, off, len, timeout);
}
private native int socketRead0(FileDescriptor fd,
byte b[], int off, int len,
int timeout)
throws IOException;
什么是 Socket 超時
JDBC 驅動是用 Socket 方式與數據庫連接的,應用程序和數據庫之間的連接超時並不是由數據庫處理的。
當數據庫突然宕掉或發生網絡錯誤(設備故障等)時,JDBC 驅動的 Socket 超時的值是必須的。由於 TCP/IP 的結構,Socket 沒有辦法檢測到網絡錯誤,因此應用不能檢測到與數據庫到連接斷開了。如果沒有設置 Socket 超時,應用程序會一直等待數據庫返回結果。(這個連接也被叫做“死連接”) 為了避免死連接,Socket 必須要設置超時時間。Socket 超時可以通過 JDBC 驅動程序配置。通過設置 Socket 超時,可以防止出現網絡錯誤時一直等待的情況並縮短故障時間。
原因分析
通過這兒明白了,socket鏈接超時,導致鏈接池中數據庫占滿,新線程申請不到鏈接,故http請求不響應;在socket超時后,釋放鏈接,就會拋出大量的超時(The last packet successfully received from)錯誤。
解決辦法
配置jdbc參數,在一個合理的范圍內影響,超時即停止。
jdbc.url=jdbc:log4jdbc:mysql://ip:port/dbname?useUnicode=true&characterEncoding=utf-8&connectTimeout=60000&socketTimeout=60000
參考:
https://blog.csdn.net/jiadajing267/article/details/79516989
https://blog.csdn.net/java1993666/article/details/87889941
深入理解 JDBC 的超時