背景
項目使用了阿里druid jdbc連接池。某天環境出現網絡波動,持續20分鍾后,網絡恢復,單服務一直無法連接數據庫。重啟之后正常運行
druid 版本:1.1.21
項目配置
spring.datasource.druid.game.driver-class-name = com.mysql.cj.jdbc.Driver
spring.datasource.druid.game.initial-size = 5
spring.datasource.druid.game.min-idle = 5
spring.datasource.druid.game.max-active = 100
spring.datasource.druid.game.max-wait = 30000
spring.datasource.druid.game.time-between-eviction-runs-millis = 60000
spring.datasource.druid.game.min-evictable-idle-time-millis = 300000
spring.datasource.druid.game.validationQuery = SELECT 'x'
spring.datasource.druid.game.test-while-idle = true
spring.datasource.druid.game.test-on-borrow = false
spring.datasource.druid.game.test-on-return = false
spring.datasource.druid.game.poolPreparedStatements = true
spring.datasource.druid.game.maxPoolPreparedStatementPerConnectionSize = 20
spring.datasource.druid.game.maxOpenPreparedStatements = 20
Druid的Bug現場
翻看當時日志,發現大量CannotGetJdbcConnectionException異常日志:
org.mybatis.spring.MyBatisSystemException: nested exception is org.apache.ibatis.exceptions.PersistenceException:
### Error querying database. Cause: org.springframework.jdbc.CannotGetJdbcConnectionException: Failed to obtain JDBC Connection; nested exception is com.alibaba.druid.pool.GetConnectionTimeoutException: wait millis 60000, active 0, maxActive 100, creating 1, createElapseMillis 686180, createErrorCount 3
|
Caused by: org.apache.ibatis.exceptions.PersistenceException: ### Error querying database. Cause: org.springframework.jdbc.CannotGetJdbcConnectionException: Failed to obtain JDBC Connection; nested exception is com.alibaba.druid.pool.GetConnectionTimeoutException: wait millis 60000, active 0, maxActive 100, creating 1, createElapseMillis 686180, createErrorCount 3 ### The error may exist in class path resource [mapper/SysDictMapper.xml] ### The error may involve com.stnts.suileyoo.game.api.dao.SysDictDao.select ### The error occurred while executing a query ### Cause: org.springframework.jdbc.CannotGetJdbcConnectionException: Failed to obtain JDBC Connection; nested exception is com.alibaba.druid.pool.GetConnectionTimeoutException: wait millis 60000, active 0, maxActive 100, creating 1, createElapseMillis 686180, createErrorCount 3 at org.apache.ibatis.exceptions.ExceptionFactory.wrapException(ExceptionFactory.java:30) at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:149) at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:140) at org.apache.ibatis.session.defaults.DefaultSqlSession.selectOne(DefaultSqlSession.java:76) at sun.reflect.GeneratedMethodAccessor135.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:426) ... 95 common frames omitted Caused by: org.springframework.jdbc.CannotGetJdbcConnectionException: Failed to obtain JDBC Connection; nested exception is com.alibaba.druid.pool.GetConnectionTimeoutException: wait millis 60000, active 0, maxActive 100, creating 1, createElapseMillis 686180, createErrorCount 3 at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:82) at org.mybatis.spring.transaction.SpringManagedTransaction.openConnection(SpringManagedTransaction.java:80) at org.mybatis.spring.transaction.SpringManagedTransaction.getConnection(SpringManagedTransaction.java:67) at org.apache.ibatis.executor.BaseExecutor.getConnection(BaseExecutor.java:336) at org.apache.ibatis.executor.SimpleExecutor.prepareStatement(SimpleExecutor.java:86) at org.apache.ibatis.executor.SimpleExecutor.doQuery(SimpleExecutor.java:62) at org.apache.ibatis.executor.BaseExecutor.queryFromDatabase(BaseExecutor.java:324) at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:156) at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:109) at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:83) at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:147) ... 101 common frames omitted Caused by: com.alibaba.druid.pool.GetConnectionTimeoutException: wait millis 60000, active 0, maxActive 100, creating 1, createElapseMillis 686180, createErrorCount 3 at com.alibaba.druid.pool.DruidDataSource.getConnectionInternal(DruidDataSource.java:1722) at com.alibaba.druid.pool.DruidDataSource.getConnectionDirect(DruidDataSource.java:1402) at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1382) at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1372) at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:109) at org.springframework.jdbc.datasource.DataSourceUtils.fetchConnection(DataSourceUtils.java:158) at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:116) at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:79) ... 111 common frames omitted
Bug定位
系統是通過Druid連接
獲取連接超時(GetConnectionTimeoutException)此錯誤的出現,只有兩種可能:
1.業務系統本身Druid獲取連接失敗。 2.網絡有問題或者數據庫不可用。
在這個Bug里面很明顯是Druid創建連接失敗,原因如下:
1.之前網絡的確有問題,單已恢復,另外還有另外一個服務也是訪問這個數據庫的。但是那個訪問正常。 2.重啟服務之后一切正常。
如果說還是網絡連接異常,那么並不能解釋重啟后正常這一現象。
Druid獲取連接的過程
在分析這個問題之前,先得看下Druid是如何創建連接的,下面是本人在上網查找到的,示意圖:
(備注:來源 https://my.oschina.net/alchemystar/blog/899987)
可見druid創建連接都是通過一個專門的線程來進行的,此圖省略了大量的源碼細節。
然后對比DruidDataSource源碼,驗證了的確是上圖所示的流程。
定位源碼
根據日志“
Failed to obtain JDBC Connection; nested exception is com.alibaba.druid.pool.GetConnectionTimeoutException: wait millis 60000, active 0, maxActive 100, creating 1, createElapseMillis 686180, createErrorCount 3
”,發現日志打印位置,如下圖:
holder == null 是關鍵。然后查找holder是如果被賦值的,經查找源碼,發現其如下圖:
因為已經配置過maxWait屬性,故定位到pollLast方法中,再結合druid連接流程,找到CreateConnectionTask的runInternal方法。仔細閱讀發現如下邏輯
經過比對日志輸出,最終發現創建連接時,沒有創建就直接返回了。真相了,原來是未設置keepAlive屬性導致的。。。。。淚奔
起伏
已經定位到了問題,但是在查找資料的過程中,發現有人說在1.1.21版本中配置了keepAlive也是出現類似的情況
詳見https://github.com/alibaba/druid/issues/3889
后查找druid發版日志,找到如下日志:
隨將druid版本到1.1.24版本,但是發現日志中會頻繁出現“discard long time none received connection”的錯誤日志。然后進一步查找相關問題,找到下面的一篇文章https://gitee.com/y_project/RuoYi-Vue/issues/I1XWZE?_from=gitee_search
此錯誤不影響鏈接數據庫,去除需要設置System.setProperty("druid.mysql.usePingMethod", "false"); 因為采用的spring-boot,隨將這行代碼加入到Application的main方法中,經測試之后,已無此錯誤日志頻繁出現的問題。
解決
最終,升級druid版本到1.1.24版本,另外在配置中增加了keepAlive屬性,並在Application的main方法中加入System.setProperty("druid.mysql.usePingMethod", "false");
題外
不配置maxWait可能帶了的BUG https://zhuanlan.zhihu.com/p/144446516