項目場景:
公司某現場經常在停服務之前報SQLException:ORA-01013,正常業務也偶發SQLException:ORA-01013但是頻率較低。問題描述:
軟件版本如下:- 數據源:alibaba druid-1.1.9版本
- 數據庫:Oracle11g 11.2.0.3.0
- JDBC驅動:ojdbc7-11
報錯異常堆棧如下:
2020-11-04 09:41:22.228 ERROR [Thread-170] druid.sql.Statement[Slf4jLogFilter.java:149]
{conn-10738, pstmt-20197} execute error. select *
from xxxxx
where xxxxxxxxxxxxxxx
java.sql.SQLException: ORA-01013: 用戶請求取消當前的操作
at oracle.jdbc.driver.SQLStateMapping.newSQLException(SQLStateMapping.java:70)
at oracle.jdbc.driver.DatabaseError.newSQLException(DatabaseError.java:133)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:206)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:455)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:413)
at oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:1035)
at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:194)
at oracle.jdbc.driver.T4CPreparedStatement.executeForRows(T4CPreparedStatement.java:953)
at oracle.jdbc.driver.T4CPreparedStatement.executeMaybeDescribe(T4CPreparedStatement.java:897)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1188)
at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3386)
at oracle.jdbc.driver.OraclePreparedStatement.execute(OraclePreparedStatement.java:3487)
at oracle.jdbc.driver.OraclePreparedStatementWrapper.execute(OraclePreparedStatementWrapper.java:1374)
at com.alibaba.druid.filter.FilterChainImpl.preparedStatement_execute(FilterChainImpl.java:3051)
at com.alibaba.druid.filter.FilterEventAdapter.preparedStatement_execute(FilterEventAdapter.java:440)
at com.alibaba.druid.filter.FilterChainImpl.preparedStatement_execute(FilterChainImpl.java:3049)
at com.alibaba.druid.filter.FilterEventAdapter.preparedStatement_execute(FilterEventAdapter.java:440)
at com.alibaba.druid.filter.FilterChainImpl.preparedStatement_execute(FilterChainImpl.java:3049)
at com.alibaba.druid.wall.WallFilter.preparedStatement_execute(WallFilter.java:619)
at com.alibaba.druid.filter.FilterChainImpl.preparedStatement_execute(FilterChainImpl.java:3049)
at com.alibaba.druid.filter.FilterEventAdapter.preparedStatement_execute(FilterEventAdapter.java:440)
at com.alibaba.druid.filter.FilterChainImpl.preparedStatement_execute(FilterChainImpl.java:3049)
at com.alibaba.druid.filter.FilterAdapter.preparedStatement_execute(FilterAdapter.java:1080)
at com.alibaba.druid.filter.FilterChainImpl.preparedStatement_execute(FilterChainImpl.java:3049)
at com.alibaba.druid.proxy.jdbc.PreparedStatementProxyImpl.execute(PreparedStatementProxyImpl.java:167)
at com.alibaba.druid.pool.DruidPooledPreparedStatement.execute(DruidPooledPreparedStatement.java:498)
at sun.reflect.GeneratedMethodAccessor514.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.ibatis.logging.jdbc.PreparedStatementLogger.invoke(PreparedStatementLogger.java:59)
at com.sun.proxy.$Proxy36.execute(Unknown Source)
at org.apache.ibatis.executor.statement.PreparedStatementHandler.query(PreparedStatementHandler.java:63)
at org.apache.ibatis.executor.statement.RoutingStatementHandler.query(RoutingStatementHandler.java:79)
at sun.reflect.GeneratedMethodAccessor503.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.ibatis.plugin.Plugin.invoke(Plugin.java:63)
at com.sun.proxy.$Proxy34.query(Unknown Source)
at sun.reflect.GeneratedMethodAccessor503.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.ibatis.plugin.Invocation.proceed(Invocation.java:49)
...
at org.apache.ibatis.plugin.Plugin.invoke(Plugin.java:61)
at com.sun.proxy.$Proxy34.query(Unknown Source)
at org.apache.ibatis.executor.SimpleExecutor.doQuery(SimpleExecutor.java:63)
at org.apache.ibatis.executor.BaseExecutor.queryFromDatabase(BaseExecutor.java:324)
at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:156)
at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:109)
at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:83)
at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:148)
at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:141)
...
at sun.reflect.GeneratedMethodAccessor666.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:720)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:653)
...
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:649)
...
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
原因分析:
-
觀察Oracle數據庫服務端是否存在異常
結論: 查看Oracle Alert日志,發現每次報錯時,數據庫都在進行redo日志切換,懷疑可能是redo日志文件大小設置過小導致,查閱資料發現設置為20分鍾切換一次比較合理,調大redo日志大小之后問題仍然存在。但是異常從SQLException:ORA-01013變成了SQLTimeOutException:ORA-01013。證明問題不在Oracle服務端。 -
測試環境做壓力測試,復現問題,觀察問題出現是否存在規律
結論: 復現之后發現報錯位置並無規律。 -
發現每次停服務之前問題發生頻率較高,看stop.sh腳本中發現每次停服務之前會進行jmap dump操作。jmap dump時會發生STW,可能導致SQL執行超時導致異常。
結論: stop.sh腳本去掉dump操作之后,問題發生頻率明顯降低。 -
懷疑druid在調用oralce jdbc時傳遞的queryTimeout參數存在問題。
驗證方案:
首先在本地復現此報錯。
給某條記錄加行級鎖,編寫測試代碼設置queryTimeout為1之后執行更新操作,復現了ORA-01013問題
通過druid filter,打印出PreparedStatement的queryTimeout變量值,觀察是否此值配置存在問題。
結論:
(1) 默認未配置時queryTimeout變量值全為0(為0代表永不超時),問題存在。
(2) 將DruidDataSource的queryTimeout設置為1800,代表SQL執行超時時間為30分鍾,問題存在。
不是此參數存在問題導致,可能是數據庫驅動對此變量處理存在問題,無論此值設置多少,都會存在ORA-01013. -
升級oracle驅動包為ojdbc8-12.2.0.1.0.jar
結論: 更換之后,壓測之后未復現此問題。
此問題在github上有多個issue都未解決,有的公司將數據源更換為HikariCP之后問題解決。此方案我嘗試之后確實可以解決,但是具體為什么更換數據源、升級jdbc驅動包都可解決此問題,仍是未解之謎,革命仍未成功,同志仍需努力呀!!!
github issue地址:
https://github.com/alibaba/druid/issues/4178
解決方案:
- 升級JDBC驅動程序
- 更換數據源(許多公司使用druid存在此問題,druid官方未給出解決方案)