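1. Symptoms
During one test run, the program stopped responding. Because the program integrates the EurekaLog component, an error dialog popped up. Its Call Stack information showed that no thread deadlock had occurred (DeadLock=0;); the notable entry was Wait Chain=The specified procedure could not be found.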

Call Stack Information:
----------------------------------------------------------------------------------------------------------------------------------------------------------------
|Methods |Details|Stack   |Address |Module                       |Offset  |Unit               |Class                 |Procedure/Method                 |Line   |
----------------------------------------------------------------------------------------------------------------------------------------------------------------
|*Exception Thread: ID=14008; Parent=0; Priority=0                                                                                                             |
|Class=; Name=MAIN                                                                                                                                             |
|DeadLock=0; Wait Chain=The specified procedure could not be found.                                                                                            |
|Comment=                                                                                                                                                      |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------|
|7FFFFFFE|03     |00000000|7C9583AC|ntdll.dll                    |000283AC|ntdll              |                      |KiFastSystemCallRet              |       |
|00000020|03     |0013EF6C|7C957AC7|ntdll.dll                    |00027AC7|ntdll              |                      |ZwWaitForSingleObject            |       |
|00000020|03     |0013EFAC|71A8C502|mswsock.dll                  |0000C502|mswsock            |                      |(possible ServiceMain+7745)      |       |
|00000020|03     |0013F024|71B62FEB|ws2_32.dll                   |00002FEB|WS2_32             |                      |recv                             |       |
|00000020|03     |0013F11C|0391D42F|myodbc5.dll                  |0000D42F|myodbc5            |                      |(possible SQLCopyDesc+4927)      |       |
|00000020|03     |0013F138|0391EBC1|myodbc5.dll                  |0000EBC1|myodbc5            |                      |(possible SQLCancel+1361)        |       |
|00000020|03     |0013F150|0392794B|myodbc5.dll                  |0001794B|myodbc5            |                      |(possible SQLGetCursorNameW+4955)|       |
|00000020|03     |0013F15C|4B7567F0|odbc32.dll                   |000067F0|ODBC32             |                      |(possible SQLParamOptions+379)   |       |
|00000020|03     |0013F17C|4B7568C2|odbc32.dll                   |000068C2|ODBC32             |                      |SQLExecDirectW                   |       |
|00000020|03     |0013F208|004B6C8F|Plate_collect_and_forward.exe|000B6C8F|Graphics           |                      |FreeMemoryContexts               |       |
|00000020|03     |0013F29C|4B43A64A|msado15.dll                  |0000A64A|msado15            |                      |(possible DllGetClassObject+7071)|       |
|00000020|03     |0013F32C|749A4EEE|msdart.dll                   |00014EEE|CEXAutoBackupFile  |                      |(possible CEXAutoBackupFile+303) |       |
|00000020|03     |0013F338|74998351|msdart.dll                   |00008351|MSDART             |                      |(possible mpMalloc+350)          |       |
|00000020|03     |0013F34C|77600CAC|oleaut32.dll                 |00010CAC|oleaut32           |                      |VariantChangeType                |       |
|00000020|03     |0013F378|74998351|msdart.dll                   |00008351|MSDART             |                      |(possible mpMalloc+350)          |       |
|00000020|03     |0013F3B4|749984A6|msdart.dll                   |000084A6|MSDART             |                      |(possible MpGetHeapHandle+299)   |       |
|00000020|03     |0013F3CC|749984A6|msdart.dll                   |000084A6|MSDART             |                      |(possible MpGetHeapHandle+299)   |       |
|00000020|03     |0013F3D8|749984DD|msdart.dll                   |000084DD|MSDART             |                      |MpHeapFree                       |       |
|00000020|03     |0013F3F0|749984DD|msdart.dll                   |000084DD|MSDART             |                      |MpHeapFree                       |       |
|00000020|03     |0013F454|749980B9|msdart.dll                   |000080B9|`CLKRHashTableStats|(possible BucketSizes'|`2'.s_aBucketSizes+177)          |       |
|00000020|03     |0013F490|749980B9|msdart.dll                   |000080B9|`CLKRHashTableStats|(possible BucketSizes'|`2'.s_aBucketSizes+177)          |       |
|00000020|03     |0013F4AC|4B43A4A2|msado15.dll                  |0000A4A2|msado15            |                      |(possible DllGetClassObject+6647)|       |
|00000020|03     |0013F4C0|4B43A380|msado15.dll                  |0000A380|msado15            |                      |(possible DllGetClassObject+6357)|       |
|00000020|03     |0013F54C|749A4EEE|msdart.dll                   |00014EEE|CEXAutoBackupFile  |                      |(possible CEXAutoBackupFile+303) |       |
|00000020|03     |0013F558|74998351|msdart.dll                   |00008351|MSDART             |                      |(possible mpMalloc+350)          |       |
|00000020|03     |0013F55C|74998192|msdart.dll                   |00008192|MSDART             |                      |(possible UMSEnterCSWraper+178)  |       |
|00000020|03     |0013F5A8|4B43A253|msado15.dll                  |0000A253|msado15            |                      |(possible DllGetClassObject+6056)|       |
|00000020|03     |0013F5F0|749984A6|msdart.dll                   |000084A6|MSDART             |                      |(possible MpGetHeapHandle+299)   |       |
|00000020|03     |0013F79C|749980B9|msdart.dll                   |000080B9|`CLKRHashTableStats|(possible BucketSizes'|`2'.s_aBucketSizes+177)          |       |
|00000020|03     |0013F820|005A9A64|Plate_collect_and_forward.exe|001A9A64|ADODB              |TCustomADODataSet     |OpenCursor                       |       |
|00000020|03     |0013F8B4|00599C35|Plate_collect_and_forward.exe|00199C35|DB                 |TDataSet              |SetActive                        |       |
|00000020|03     |0013F8D0|00599A80|Plate_collect_and_forward.exe|00199A80|DB                 |TDataSet              |Open                             |       |
|00000020|04     |0013F8D4|005C8A0A|Plate_collect_and_forward.exe|001C8A0A|Ufunc              |                      |UpdateOneRec                     |403[18]|
|00000020|04     |0013F900|005C8CE3|Plate_collect_and_forward.exe|001C8CE3|Ufunc              |                      |Handle_DB_OP_1000                |440[18]|
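Task Manager showed the program holding more than 300 threads and over 100 MB of memory, yet using 0% CPU. It appeared to have stopped and to be waiting for something.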
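2. Analysis
So which procedure could not be found? EurekaLog's output does not say, and because the program had already stopped responding, no further details could be obtained from it.
The call stack does show, however, that the program was performing a database operation when the error occurred, so the database is the place to start.
The program uses a MySQL database, so the first step was to connect to the database and check whether it responded. It could be connected to and used normally, so the database server itself was not at fault.
Was the program's connection to the database being refused, then? The process list under server monitoring in Navicat showed that the connection did exist, but its connection time was clearly wrong: only 61 seconds, so it had evidently just been established.
Moreover, an SQL statement appeared along with each newly established connection.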
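The same process list that Navicat displays can also be read with the `SHOW FULL PROCESSLIST` statement. The following is only a minimal sketch, assuming a Python DB-API 2.0 cursor connected to the MySQL server; the function name `list_connections` and the idea of filtering by user are illustrative assumptions, not part of the original investigation.

```python
from typing import Any, Dict, List


def list_connections(cursor: Any, user: str) -> List[Dict[str, Any]]:
    """Return the SHOW FULL PROCESSLIST rows that belong to one MySQL user.

    `cursor` is assumed to be any DB-API 2.0 cursor connected to MySQL.
    """
    cursor.execute("SHOW FULL PROCESSLIST")
    # Column names (Id, User, Host, db, Command, Time, State, Info)
    # come from the cursor's result description.
    columns = [col[0] for col in cursor.description]
    rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
    # Keep only the connections opened by the account the program uses.
    return [row for row in rows if row.get("User") == user]
```

Printing the `Time` and `Info` fields of the returned rows gives the two things observed above: how long the connection has been there and which SQL statement it is running.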
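Evidently, after connecting to the DB the program ran this query, and the DB forcibly dropped the connection because of it. The program then kept trying to reconnect and the DB kept dropping the connection, producing an impasse that left the program blocked in the database operation.
Searching online for why MySQL would forcibly drop a client's connection turned up the following explanation: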
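Your SQL statement was too large, i.e. the query result exceeded max_allowed_packet. (MySQL applies this limit to a single packet, such as one SQL statement sent to the server or one row returned to the client.)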
max_allowed_packet defaults to 1048576 bytes (1 MB); my query includes a Blob column, which clearly exceeds this limit!
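To make the size comparison concrete, here is a rough sketch in Python. It converts MySQL-style size values (plain bytes or a K/M/G suffix) to bytes and flags rows whose payload is larger than the limit. The helper names `parse_size` and `oversized_rows` are hypothetical, and the check is only an approximation because it ignores protocol overhead.

```python
from typing import Iterable, List

# Default quoted above: 1048576 bytes, i.e. 1 MB.
DEFAULT_MAX_ALLOWED_PACKET = 1048576


def parse_size(text: str) -> int:
    """Convert a size such as '20M', '512K', '1G' or '1048576' to bytes."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    text = text.strip().upper()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


def oversized_rows(row_sizes: Iterable[int],
                   limit: int = DEFAULT_MAX_ALLOWED_PACKET) -> List[int]:
    """Return the indexes of rows whose byte size exceeds `limit`."""
    return [i for i, size in enumerate(row_sizes) if size > limit]
```

For example, `oversized_rows([100, 2 * 1024 * 1024])` reports index 1, because a 2 MB Blob does not fit under the 1 MB default.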
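3. Verifying the cause
Since the suspected cause of the hang was an oversized query result, the test was to modify the database and delete part of the data.
Once the deletion finished, the program resumed running on its own and the main window appeared normally. This confirms that the cause was a query result larger than the max_allowed_packet value.
Incidentally, after the program recovered, the log on the main window showed a runtime error message:
Error updating database table: [MySQL][ODBC 5.1 Driver][mysqld-5.5.19]MySQL server has gone away
MySQL server has gone away: that is the real reason the program stopped responding. Hence the title of this article, "A 'MySQL server has gone away' failure and its resolution".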
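One way a client could avoid the endless reconnect cycle described in section 2 is to recognise this error text and give up after a fixed number of attempts, so the failure is reported instead of blocking forever. The sketch below is an assumption about how that might look in Python, not how the original Delphi program works; `is_gone_away` and `run_with_bounded_retry` are hypothetical names, and matching on the message text is a simplification.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def is_gone_away(error: Exception) -> bool:
    """True if the error message contains MySQL's 'server has gone away'."""
    return "server has gone away" in str(error).lower()


def run_with_bounded_retry(operation: Callable[[], T],
                           max_attempts: int = 3,
                           delay_seconds: float = 1.0,
                           sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `operation`, retrying only on 'gone away' errors, at most max_attempts times."""
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as error:
            if not is_gone_away(error):
                raise  # unrelated errors are not retried
            last_error = error
            if attempt < max_attempts:
                sleep(delay_seconds)
    # Every attempt failed the same way: report it instead of looping forever.
    raise RuntimeError(f"giving up after {max_attempts} attempts") from last_error
```

Retrying cannot fix an oversized packet, so the value of the bound is simply that the error reaches the log while the program stays responsive.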
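4. Solution
The problem clearly involves both the program and the database, so it should be addressed on both sides.
1) The program should avoid fetching large amounts of data to the front end for processing. SQL queries should be optimized.
2) The database's configuration parameters should be adjusted to fit the usage requirements; in general, max_allowed_packet can be raised to 20M.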
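As an illustration of measure 1), the sketch below reads a table in small batches and leaves the Blob column out of the SELECT list, so no single fetch pulls a large amount of data to the client. It uses Python's built-in sqlite3 module purely as a stand-in so that it runs without a MySQL server; the table `records` and its columns `id`, `plate` and `image` are hypothetical and not taken from the original program.

```python
import sqlite3
from typing import Iterator, List, Tuple


def fetch_in_batches(conn: sqlite3.Connection,
                     batch_size: int) -> Iterator[List[Tuple[int, str]]]:
    """Yield (id, plate) rows in batches, paging by primary key."""
    last_id = 0
    while True:
        # Only the small columns are selected; the Blob column is skipped.
        rows = conn.execute(
            "SELECT id, plate FROM records WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]  # continue after the last id already seen


def demo() -> List[int]:
    """Build a small in-memory table and return the size of each batch."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records (id INTEGER PRIMARY KEY, plate TEXT, image BLOB)"
    )
    conn.executemany(
        "INSERT INTO records (plate, image) VALUES (?, ?)",
        [(f"P{i:03d}", b"\x00" * 1024) for i in range(1, 8)],
    )
    return [len(batch) for batch in fetch_in_batches(conn, 3)]
```

With seven rows and a batch size of 3, `demo()` returns `[3, 3, 1]`. When a Blob is really needed, it can be fetched one row at a time by id, which keeps each transfer small.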