Details to Watch When Encoding Characters with Encoding

Reposted article. 2012-04-26 13:58

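Character encoding is usually done with the Encoding.GetBytes method, but have you ever looked at what that method actually does? Encoding.GetBytes has many overloads, yet most code simply calls Encoding.GetBytes(string). What is the problem with that overload? Let's look at its implementation with a decompiler.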

public virtual byte[] GetBytes(string s)
{
	if (s == null)
	{
		throw new ArgumentNullException("s", Environment.GetResourceString("ArgumentNull_String"));
	}
	char[] array = s.ToCharArray();
	return this.GetBytes(array, 0, array.Length);
}

The decompiled code obtains a char[] from s.ToCharArray(). Let's open that method and take a closer look.

public unsafe char[] ToCharArray()
{
	int length = this.Length;
	char[] array = new char[length];
	if (length > 0)
	{
		fixed (char* ptr = &this.m_firstChar)
		{
			fixed (char* ptr2 = array)
			{
				string.wstrcpyPtrAligned(ptr2, ptr, length);
			}
		}
	}
	return array;
}

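The code creates a new char[] and copies the string's contents into it, so encoding through this method always allocates a new char[]. Going back to GetBytes, let's see how return this.GetBytes(array, 0, array.Length); is implemented.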

public virtual byte[] GetBytes(char[] chars, int index, int count)
{
	byte[] array = new byte[this.GetByteCount(chars, index, count)];
	this.GetBytes(chars, index, count, array, 0);
	return array;
}

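This works the same way as String.ToCharArray: it returns a new byte[]. So, according to the analysis above, encoding a string with the default method allocates both a new char[] and a new byte[]. That means memory is allocated and later reclaimed. Memory allocation in .NET is quite efficient, but automatic reclamation by the GC is the more expensive part. Profiling tools show that GC accounts for a non-trivial share of a program's total running time, and it hurts most when large amounts of memory have to be reclaimed.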
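The allocation of the result array can be observed directly. The following is a minimal sketch, not part of the original article; the class and method names (AllocationDemo, ReturnsFreshArrayEachCall) are made up for illustration. It only shows the byte[] side: the intermediate char[] is internal to the call and cannot be seen from outside.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class AllocationDemo
{
    // Returns true when two consecutive GetBytes(string) calls hand back
    // different array instances, i.e. each call allocated its own byte[].
    // Intended for non-empty input; an empty string may yield a shared
    // empty array on some runtimes.
    public static bool ReturnsFreshArrayEachCall(string text)
    {
        byte[] first = Encoding.UTF8.GetBytes(text);
        byte[] second = Encoding.UTF8.GetBytes(text);
        return !ReferenceEquals(first, second);
    }
}
```

For any non-empty string, ReturnsFreshArrayEachCall should return true, because the caller owns the returned array and is free to modify it.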

In fact, both Encoding and String provide methods that avoid these allocations.

string:

public unsafe void CopyTo(int sourceIndex, char[] destination, int destinationIndex, int count)

encoding:

public abstract int GetBytes(char[] chars, int charIndex, int charCount, byte[] bytes, int byteIndex);

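Both take caller-supplied char[] and byte[] arguments, so the same char[] and byte[] can be reused across calls. Besides reducing allocation and collection overhead, this also improves the program's efficiency. To make these methods easy to use, build a simple reusable object: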

public class EncodingData
{
    public byte[] Bytes = new byte[1024 * 8];
    public char[] Chars = new char[1024 * 8];
}
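The two methods listed above, String.CopyTo and the five-argument Encoding.GetBytes, can be combined roughly as follows. This is a sketch under assumptions: ReusableEncoder and EncodeInto are hypothetical names, and the size checks are an addition that the article itself does not perform.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class ReusableEncoder
{
    // Encodes text as UTF-8 into caller-supplied buffers and returns the
    // number of bytes written, starting at bytes[0].
    public static int EncodeInto(string text, char[] chars, byte[] bytes)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (text.Length > chars.Length)
            throw new ArgumentException("char buffer too small", nameof(chars));
        // Conservative worst-case check so GetBytes cannot run out of space.
        if (Encoding.UTF8.GetMaxByteCount(text.Length) > bytes.Length)
            throw new ArgumentException("byte buffer too small", nameof(bytes));

        // Copy the characters into the reusable char[] instead of ToCharArray().
        text.CopyTo(0, chars, 0, text.Length);
        // Encode into the reusable byte[] instead of allocating a new one.
        return Encoding.UTF8.GetBytes(chars, 0, text.Length, bytes, 0);
    }
}
```

GetMaxByteCount is deliberately pessimistic, so this check rejects some inputs that would actually fit. Calling GetByteCount on the copied characters gives an exact answer at the cost of one more pass over the data.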

Then build a matching object pool:

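static Stack<EncodingData> Pool = new Stack<EncodingData>();

static EncodingData Pop()
{
    lock (Pool)
    {
        // Stack<T>.Pop throws on an empty stack, so create a new
        // instance when the pool has nothing to hand out.
        return Pool.Count > 0 ? Pool.Pop() : new EncodingData();
    }
}

static void Push(EncodingData data)
{
    lock (Pool)
    {
        Pool.Push(data);
    }
}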
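One way to make such a pool harder to misuse is to wrap the rent, encode, and return steps in a single method with try/finally, so the buffers go back to the pool even if encoding throws. The sketch below is an assumption-laden variant, not the article's code: BufferPair stands in for EncodingData (repeated so the block compiles on its own), and BufferPool, Rent, Return, and Encode are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Stand-in for the article's EncodingData (hypothetical name).
public sealed class BufferPair
{
    public readonly byte[] Bytes = new byte[1024 * 8];
    public readonly char[] Chars = new char[1024 * 8];
}

// Hypothetical pool wrapper, for illustration only.
public static class BufferPool
{
    private static readonly Stack<BufferPair> Pool = new Stack<BufferPair>();

    public static BufferPair Rent()
    {
        lock (Pool)
        {
            // Fall back to a new instance when the pool is empty.
            return Pool.Count > 0 ? Pool.Pop() : new BufferPair();
        }
    }

    public static void Return(BufferPair pair)
    {
        lock (Pool) { Pool.Push(pair); }
    }

    // Encodes text as UTF-8 and hands the filled buffer plus byte count to
    // the consumer while the buffers are still rented.
    public static void Encode(string text, Action<byte[], int> consumer)
    {
        BufferPair pair = Rent();
        try
        {
            text.CopyTo(0, pair.Chars, 0, text.Length);
            int n = Encoding.UTF8.GetBytes(pair.Chars, 0, text.Length, pair.Bytes, 0);
            consumer(pair.Bytes, n);
        }
        finally
        {
            Return(pair); // always give the buffers back, even on exceptions
        }
    }
}
```

The consumer must not keep a reference to the byte[] after it returns, since the next caller will overwrite the same buffer. Strings longer than the fixed buffers make CopyTo or GetBytes throw; the finally block still returns the pair in that case.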

The implementation is fairly simple. To see the effect in practice, write a simple test case:

private static void test(string text)
{
    Console.WriteLine("string length:" + text.Length);
    System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();

    // Default path: every call allocates a new char[] and a new byte[].
    sw.Reset();
    sw.Start();
    for (int i = 0; i < 10000; i++)
    {
        Encoding.UTF8.GetBytes(text);
    }
    sw.Stop();
    Console.WriteLine(sw.Elapsed.TotalMilliseconds);

    // Pooled path: copy into reusable buffers and encode in place.
    // The text and its encoded form must fit in the 8K buffers.
    sw.Reset();
    sw.Start();
    for (int i = 0; i < 10000; i++)
    {
        EncodingData ed = Pop();
        text.CopyTo(0, ed.Chars, 0, text.Length);
        Encoding.UTF8.GetBytes(ed.Chars, 0, text.Length, ed.Bytes, 0);
        Push(ed);
    }
    sw.Stop();
    Console.WriteLine(sw.Elapsed.TotalMilliseconds);
}

Test results (the screenshot from the original post is not preserved)

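The test is actually biased in favor of the default GetBytes method, because the time spent in GC is not captured by these timings. Even so, the results clearly show that the reusable-buffer approach is much more efficient than the default method.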
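Since the Stopwatch timings leave out GC cost, one rough way to make that cost visible is to count generation-0 collections around each loop with GC.CollectionCount. The sketch below is an addition under assumptions, not the author's measurement: GcProbe and its methods are hypothetical names, the count is process-wide (other threads can influence it), and the actual numbers depend on the runtime, GC settings, and string length.

```csharp
using System;
using System.Text;

// Hypothetical measurement helper, for illustration only.
public static class GcProbe
{
    // Runs the action and returns how many generation-0 collections
    // occurred in the process while it ran.
    public static int Gen0Collections(Action action)
    {
        // Start from a clean heap so earlier garbage is not attributed
        // to the measured action.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        int before = GC.CollectionCount(0);
        action();
        return GC.CollectionCount(0) - before;
    }

    // Default path: GetBytes(string) allocates on every iteration.
    public static int MeasureDefault(string text, int iterations)
    {
        return Gen0Collections(() =>
        {
            for (int i = 0; i < iterations; i++)
                Encoding.UTF8.GetBytes(text);
        });
    }

    // Reuse path: buffers are allocated once, outside the measured loop.
    public static int MeasureReused(string text, int iterations)
    {
        char[] chars = new char[text.Length];
        byte[] bytes = new byte[Encoding.UTF8.GetMaxByteCount(text.Length)];
        return Gen0Collections(() =>
        {
            for (int i = 0; i < iterations; i++)
            {
                text.CopyTo(0, chars, 0, text.Length);
                Encoding.UTF8.GetBytes(chars, 0, text.Length, bytes, 0);
            }
        });
    }
}
```

Comparing MeasureDefault and MeasureReused for the same string and iteration count gives a rough indication of how much collection work the default path triggers; it complements the elapsed-time figures rather than replacing them.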
