起因是發現一個同事編寫的程序運行兩個月左右,占用了服務器20G左右的內存。用WinDbg查看發現存在大量的Async Pinned Handles,而它們的gcroot都來自於SocketAsyncEventArgs。下面是場景的簡易模擬代碼(為了說明問題添加了手動GC):
for (var i = 0; i < 1000; ++i) { var endPoint = new IPEndPoint(IPAddress.Parse(host), port); var socket = new Socket(endPoint.AddressFamily, SocketType.Stream, ProtocolType.Tcp); socket.ReceiveBufferSize = bufferSize; socket.SendBufferSize = bufferSize; var iocp = new SocketAsyncEventArgs(); iocp.Completed += new EventHandler<SocketAsyncEventArgs>(OnIoSocketCompleted); iocp.SetBuffer(new Byte[bufferSize], 0, bufferSize); iocp.AcceptSocket = socket; try { socket.Connect(endPoint); Console.WriteLine(i); } catch (SocketException ex) { Console.WriteLine(ex.Message); } socket.Close(); //iocp.Dispose(); } GC.Collect(); GC.WaitForPendingFinalizers(); GC.Collect();
SocketAsyncEventArgs的SetBuffer函數內部會pin住buffer數據(查閱SetBufferInternal實現),它的析構函數會調用FreeOverlapped函數釋放資源。而問題就是在於FreeOverlapped函數的實現和傳遞參數。來看一下Dispose、~SocketAsyncEventArgs的實現:
public void Dispose() { this.m_DisposeCalled = true; if (Interlocked.CompareExchange(ref this.m_Operating, 2, 0) != 0) { return; } this.FreeOverlapped(false); GC.SuppressFinalize(this); }
~SocketAsyncEventArgs() { this.FreeOverlapped(true); }
兩者都會調用FreeOverlapped函數釋放,但是一個傳遞了false、一個傳遞了true參數。再來看FreeOverlapped實現:
private void FreeOverlapped(bool checkForShutdown) { if (!checkForShutdown || !NclUtilities.HasShutdownStarted) { if (this.m_PtrNativeOverlapped != null && !this.m_PtrNativeOverlapped.IsInvalid) { this.m_PtrNativeOverlapped.Dispose(); this.m_PtrNativeOverlapped = null; this.m_Overlapped = null; this.m_PinState = SocketAsyncEventArgs.PinState.None; this.m_PinnedAcceptBuffer = null; this.m_PinnedSingleBuffer = null; this.m_PinnedSingleBufferOffset = 0; this.m_PinnedSingleBufferCount = 0; } if (this.m_SocketAddressGCHandle.IsAllocated) { this.m_SocketAddressGCHandle.Free(); } if (this.m_WSAMessageBufferGCHandle.IsAllocated) { this.m_WSAMessageBufferGCHandle.Free(); } if (this.m_WSARecvMsgWSABufferArrayGCHandle.IsAllocated) { this.m_WSARecvMsgWSABufferArrayGCHandle.Free(); } if (this.m_ControlBufferGCHandle.IsAllocated) { this.m_ControlBufferGCHandle.Free(); } } }
用WinDbg查看gchandles統計:
Statistics: MT Count TotalSize Class Name 000007fb1bdb6ae8 1 24 System.Object 000007fb1bdb6b80 1 48 System.SharedStatics 000007fb1bdb7f58 1 64 System.Security.PermissionSet 000007fb1bdb6a10 1 160 System.ExecutionEngineException 000007fb1bdb6998 1 160 System.StackOverflowException 000007fb1bdb6920 1 160 System.OutOfMemoryException 000007fb1bdb6738 1 160 System.Exception 000007fb1bdb7b90 2 192 System.Threading.Thread 000007fb1bdb6c40 1 216 System.AppDomain 000007fb1bdb6a88 2 320 System.Threading.ThreadAbortException 000007fb1ae19770 6 336 System.Net.Logging+NclTraceSource 000007fb1bdbf958 3 480 System.RuntimeType+RuntimeTypeCache 000007fb1ae19900 6 480 System.Diagnostics.SourceSwitch 000007fb1bd64458 6 34520 System.Object[] 000007fb1b6f5e40 1000 112000 System.Threading.OverlappedData Total 1033 objects Handles: Strong Handles: 12 Pinned Handles: 5 Async Pinned Handles: 1000 Weak Long Handles: 3 Weak Short Handles: 13
確實存在1000個Async Pinned Handles,也就是說無法通過SocketAsyncEventArgs的析構函數釋放SafeNativeOverlapped相關的資源。將示例代碼的"iocp.Dispose();"注釋去除,並重新執行再次查看gchandles:
Statistics: MT Count TotalSize Class Name 000007fb1bdb6ae8 1 24 System.Object 000007fb1bdb6b80 1 48 System.SharedStatics 000007fb1bdb7f58 1 64 System.Security.PermissionSet 000007fb1bdb6a10 1 160 System.ExecutionEngineException 000007fb1bdb6998 1 160 System.StackOverflowException 000007fb1bdb6920 1 160 System.OutOfMemoryException 000007fb1bdb6738 1 160 System.Exception 000007fb1bdb7b90 2 192 System.Threading.Thread 000007fb1bdb6c40 1 216 System.AppDomain 000007fb1bdb6a88 2 320 System.Threading.ThreadAbortException 000007fb1ae19770 6 336 System.Net.Logging+NclTraceSource 000007fb1bdbf958 3 480 System.RuntimeType+RuntimeTypeCache 000007fb1ae19900 6 480 System.Diagnostics.SourceSwitch 000007fb1bd64458 6 34520 System.Object[] Total 33 objects Handles: Strong Handles: 12 Pinned Handles: 5 Weak Long Handles: 3 Weak Short Handles: 13
1000個Async Pinned Handles已不存在,但SocketAsyncEventArgs的析構函數從實現來看應該也可以完成釋放,為什么失敗了?
用!bpmd命令添加NclUtilities.HasShutdownStarted、 m_PtrNativeOverlapped.Dispose()的斷點。
!bpmd System.dll System.Net.NclUtilities.get_HasShutdownStarted
!bpmd mscorlib.dll System.Runtime.InteropServices.SafeHandle.Dispose
以上函數無法命中,由於.NET Framework ngen的關系。實際可以看到SocketAsyncEventArgs的Finalize函數內聯了FreeOverlapped。
0:002> !clrstack -a OS Thread Id: 0x46c (2) Child SP IP Call Site 000000002656f940 000007ff138485e9 System.Net.Sockets.SocketAsyncEventArgs.FreeOverlapped(Boolean) PARAMETERS: this = <no data> checkForShutdown = <no data> 000000002656f980 000007ff1385222d System.Net.Sockets.SocketAsyncEventArgs.Finalize() PARAMETERS: this (0x000000002656f9c0) = 0x000000000e140438 000000002656fd38 000007ff72c1c446 [DebuggerU2MCatchHandlerFrame: 000000002656fd38]
逐一單步執行可以發現NclUtilities.HasShutdownStarted也被內聯,直接調用外部函數clr!SystemNative::HasShutdownStarted。
0:002> t 000007ff`138485ec 85c0 test eax,eax 0:002> t 000007ff`138485ee 7438 je 000007ff`13848628 [br=0] 0:002> t 000007ff`138485f0 e8c798485f call clr!SystemNative::HasShutdownStarted (000007ff`72cd1ebc)
那問題就在於HasShutdownStarted的返回值了,如果為false則肯定會釋放資源。
0:002> t clr!SystemNative::HasShutdownStarted: 000007ff`72cd1ebc 4883ec28 sub rsp,28h 0:002> t clr!SystemNative::HasShutdownStarted+0x4: 000007ff`72cd1ec0 f60501767a0004 test byte ptr [clr!g_fEEShutDown (000007ff`734794c8)],4 ds:000007ff`734794c8=07 0:002> t clr!SystemNative::HasShutdownStarted+0xb: 000007ff`72cd1ec7 751a jne clr!SystemNative::HasShutdownStarted+0x27 (000007ff`72cd1ee3) [br=1] 0:002> t clr!SystemNative::HasShutdownStarted+0x27: 000007ff`72cd1ee3 b901000000 mov ecx,1 0:002> t clr!SystemNative::HasShutdownStarted+0x2c: 000007ff`72cd1ee8 ebf2 jmp clr!SystemNative::HasShutdownStarted+0x20 (000007ff`72cd1edc) 0:002> t clr!SystemNative::HasShutdownStarted+0x20: 000007ff`72cd1edc 8bc1 mov eax,ecx 0:002> t clr!SystemNative::HasShutdownStarted+0x22: 000007ff`72cd1ede 4883c428 add rsp,28h 0:002> t clr!SystemNative::HasShutdownStarted+0x26: 000007ff`72cd1ee2 c3 ret 0:002> t 000007ff`138485f5 0fb6c8 movzx ecx,al 0:002> t 000007ff`138485f8 85c9 test ecx,ecx
查看ecx的值發現竟然是1。
0:002> r ecx ecx=1 0:002> r al al=1
