疑難雜症--SQL SERVER 18056的錯誤


朋友遇到一個很棘手的問題,查看服務器日志,報以下錯誤:

01/21/2014 11:47:43,spid296,未知,錯誤: 18056,嚴重性: 20,狀態: 2901/21/2014 11:47:43,spid495,未知,
The client was unable to reuse a session with SPID 495<c/> 
which had been reset for connection pooling. The failure ID is 29. 
This error may have been caused by an earlier operation failing. 
Check the error logs for failed operations immediately before this error message.

百度一下, 找到相關文章:http://blog.csdn.net/yangzhawen/article/details/8209167

一方面讓開發從IIS角度去解決,另一方面從SQL SERVER入手,繼續查看錯誤日志,發現以下錯誤:

01/21/2014 11:46:10,spid8s,未知,
SQL Server has encountered 3 occurrence(s) of I/O requests taking longer 
than 15 seconds to complete on file [H:templog.ldf] in database [tempdb]
  
The OS file handle is 0x0000000000001254.  
The offset of the latest long I/O is: 0x000000184a6a00

01/21/2014 11:46:10,spid8s,未知,
SQL Server has encountered 3 occurrence(s) of I/O 
requests taking longer than 15 seconds to complete 
on file [H:\HisData\TXX.mdf] in database [xxx] 

The OS file handle is 0x0000000000001268.  
The offset of the latest long I/O is: 0x00000349a6e000

 

使用以下代碼查看當前耗CPU和IO比較多的執行計划

--最耗費CPU的前個查詢以及它們的執行計划
SETTRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
WITHTEMP AS(
SELECTCAST((qs.total_worker_time) / 1000000.0 AS DECIMAL(28,2)) AS TotalSecondsForCPUTime
,CAST(qs.total_worker_time* 100.0 / qs.total_elapsed_time  AS DECIMAL(28,2)) AS CPUPersent
,CAST((qs.total_elapsed_time- qs.total_worker_time)* 100.0 / qs.total_elapsed_time AS DECIMAL(28, 2)) AS WaitingPersent
,qs.execution_countExecutionCount
,CAST((qs.total_worker_time)/ 1000000.0 / qs.execution_count AS DECIMAL(28, 2)) AS AvgSecondsForCPUTime
,SUBSTRING (qt.text,(qs.statement_start_offset/2) + 1,     
    ((CASEWHEN qs.statement_end_offset = -1
      THEN LEN(CONVERT(NVARCHAR(MAX), qt.text)) * 2
      ELSE qs.statement_end_offset
      END - qs.statement_start_offset)/2) + 1) AS IndividualQuery
,qt.text AS ParentQuery
,DB_NAME(qt.dbid) AS DatabaseName
,qp.query_planQueryPlan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as qt
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
WHEREqs.total_elapsed_time > 0 )
 
SELECTTOP(20)* FROM TEMP
ORDERBY TEMP.TotalSecondsForCPUTime DESC
 
--最占IO的前個查詢以及它們的執行計划
SETTRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
WITHTEMP AS(
SELECT (qs.total_logical_reads + qs.total_logical_writes) AS TotalIO
,(qs.total_logical_reads+ qs.total_logical_writes) / qs.execution_count AS AvgIO
,qs.execution_count AS ExecutionCount
,SUBSTRING (qt.text,(qs.statement_start_offset/2) + 1,     
((CASEWHEN qs.statement_end_offset = -1
THENLEN(CONVERT(NVARCHAR(MAX), qt.text)) * 2
ELSE qs.statement_end_offset
END- qs.statement_start_offset)/2) + 1) AS IndividualQuery
,qt.text AS ParentQuery
, DB_NAME(qt.dbid) AS DatabaseName
,qp.query_plan AS QueryPlan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as qt
CROSSAPPLY sys.dm_exec_query_plan(qs.plan_handle) qp)
 
SELECTTOP(20)* FROM TEMP
ORDERBY TEMP.TotalIO DESC
View Code

發現一部分耗時長和IO高的執行語句,提起這些語句待分析優化

 

在使用以下代碼來查看當前請求和阻塞:

SELECT
    SPID                = er.session_id 
    ,STATUS             = ses.STATUS 
    ,[LOGIN]            = ses.login_name 
    ,HOST               = ses.host_name 
    ,BlkBy              = er.blocking_session_id 
    ,DBName             = DB_NAME(er.database_id) 
    ,CommandType        = er.command 
    ,SQLStatement       = st.text 
    ,ObjectName         = OBJECT_NAME(st.objectid) 
    ,ElapsedMS          = er.total_elapsed_time 
    ,CPUTime            = er.cpu_time 
    ,IOReads            = er.logical_reads + er.reads 
    ,IOWrites           = er.writes 
    ,LastWaitType       = er.last_wait_type 
    ,StartTime          = er.start_time 
    ,Protocol           = con.net_transport 
    ,ConnectionWrites   = con.num_writes 
    ,ConnectionReads    = con.num_reads 
    ,ClientAddress      = con.client_net_address 
    ,Authentication     = con.auth_scheme 
FROM sys.dm_exec_requests er 
OUTER APPLY sys.dm_exec_sql_text(er.sql_handle) st 
LEFT JOIN sys.dm_exec_sessions ses 
ON ses.session_id = er.session_id 
LEFT JOIN sys.dm_exec_connections con 
ON con.session_id = ses.session_id 
WHERE er.session_id > 50 
ORDER BY er.blocking_session_id DESC,er.session_id
View Code


發現以下問題:

MSDN上有如下介紹:

ConnectionWrites:此連接中已發生的讀包次數。可為空值。
ConnectionReads:此連接中已發生的寫數據包次數。可為空值。


在普通的OLTP數據上,ConnectionWrites和ConnectionReads 基本在幾十到上百,而在這台服務器上達到了352W之多,經確認,192.168.8.16 上是新產品服務器,服務器出現問題與該產品上線時間基本吻合,基本可斷定問題根源就是該新產品。

 

剩下問題表示分析為什么需要如此龐大的網絡包,尋找解決之道。

 

 

 


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM